Cloud Run · Chio Docs

Architecture

Cloud Run routes external traffic to the one container that declares a containerPort: the sidecar, on 9090. The app listens on 8080 and is reachable only over the instance's loopback interface. The sidecar reverse-proxies a request to the app only after the kernel guard pipeline has accepted it.

Cloud Run sends external traffic to the sidecar (port 9090). The sidecar reverse-proxies to the app on localhost:8080 after the kernel evaluates each request. Secret Manager mounts the OpenAPI spec and the authority seed as files; the sidecar control token is an env-injected secret. Receipts stay in memory because Cloud Run has no per-instance disk.

Single ingress container

Cloud Run only routes external traffic to the container that declares a containerPort. If you accidentally also declare a port on the app container, the sidecar is bypassed and every request escapes the kernel.
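
A pre-deploy check can catch a stray port declaration before the manifest ships. The sketch below counts containerPort declarations and fails unless exactly one is present. The function names and the grep-based parsing are assumptions for illustration, not part of the Chio tooling; a YAML-aware linter would be more robust.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy guard (not part of Chio): require exactly one
# containerPort declaration in the manifest.

count_container_ports() {
  # Count lines that declare a containerPort; comment lines do not match.
  grep -cE '^[[:space:]]*-?[[:space:]]*containerPort:' "$1"
}

check_single_ingress() {
  local manifest="$1" n
  n="$(count_container_ports "$manifest")"
  if [ "$n" -ne 1 ]; then
    echo "expected exactly 1 containerPort in $manifest, found $n" >&2
    return 1
  fi
}

# Usage: check_single_ingress deploy/cloud-run/service.yaml
```

The check only counts declarations. It does not confirm that the one port belongs to the sidecar, so treat it as a first line of defence rather than proof.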

Manifest Walkthrough

The reference manifest is a Knative Service named agent-tool-server. Walk it top-to-bottom; each chunk maps to one operational concern.

Service metadata and template annotations

deploy/cloud-run/service.yaml

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: agent-tool-server
  annotations:
    run.googleapis.com/launch-stage: GA
spec:
  template:
    metadata:
      annotations:
        # Keep at least one warm instance to amortise sidecar cold starts.
        # maxScale stays 1 so a single instance keeps one coherent
        # in-memory audit stream (--allow-ephemeral-receipts, below).
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "1"
        # App waits until chio-sidecar startupProbe succeeds.
        run.googleapis.com/container-dependencies: '{"app":["chio-sidecar"]}'
        run.googleapis.com/execution-environment: gen2
        # /metrics is served on :9090 but gated on the sidecar control token
        # (loopback callers are exempt). A bare prometheus.io/scrape: "true"
        # is omitted so no unauthenticated scraper hits public ingress.
        prometheus.io/path: "/metrics"
        prometheus.io/port: "9090"

Four groups of annotations carry the operational contract:

autoscaling.knative.dev/minScale: "1" keeps one instance warm so the sidecar cold start never lands on a request. maxScale: "1" pins the ceiling to one instance because this reference runs ephemeral in-memory receipts; a single writer keeps one coherent audit stream.
run.googleapis.com/container-dependencies wires the app to wait until the sidecar reports healthy via its startupProbe. Without this, the app could accept traffic before the kernel is ready.
run.googleapis.com/execution-environment: gen2 selects the Linux cgroup-v2 sandbox required for multi-container services and Secret Manager volume mounts.
prometheus.io/path and prometheus.io/port document the scrape target. The route is gated: only a loopback caller or one bearing Authorization: Bearer $CHIO_SIDECAR_CONTROL_TOKEN reaches /metrics, so prometheus.io/scrape: "true" is intentionally left off to avoid an unauthenticated scraper on public ingress.

Service account and concurrency

deploy/cloud-run/service.yaml

    spec:
      serviceAccountName: chio-sidecar@PROJECT_ID.iam.gserviceaccount.com
      containerConcurrency: 80
      timeoutSeconds: 300

The revision runs as chio-sidecar@PROJECT_ID.iam.gserviceaccount.com. That identity needs roles/secretmanager.secretAccessor on every secret the manifest references, plus whatever your app itself requires (BigQuery, GCS, etc.). containerConcurrency: 80 is the number of concurrent requests Cloud Run will send to one instance before it looks for another; with maxScale pinned to "1" there is no other instance, so 80 is also the service-wide ceiling. timeoutSeconds: 300 is the per-request budget; long-running tool calls should chunk their work or switch to streaming.

Application container

deploy/cloud-run/service.yaml

      containers:
        - name: app
          image: APP_IMAGE_PLACEHOLDER
          env:
            - name: CHIO_SIDECAR_URL
              value: "http://localhost:9090"
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 2
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3

The app declares no ports block. That is intentional. One env var, CHIO_SIDECAR_URL, tells your handler where to find the sidecar for evaluate / record calls; your own startup hooks can also gate readiness on $CHIO_SIDECAR_URL/chio/health. The startup probe gives the app about 60 seconds (30 attempts × 2 s, after a 2-second initial delay) to come online; the liveness probe checks every 10 seconds and recycles the container after 3 consecutive failures.
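
A startup hook that gates on the sidecar health route could look like the following. This is a minimal sketch: CHIO_SIDECAR_URL and /chio/health come from the manifest, while the function name, the retry defaults, the PROBE override, and the presence of curl in the app image are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical startup hook for the app container. The retry defaults and
# the PROBE override (handy for tests) are illustrative assumptions.

wait_for_sidecar() {
  local attempts="${1:-30}" delay="${2:-1}"
  local url="${CHIO_SIDECAR_URL:-http://localhost:9090}/chio/health"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # curl -f turns HTTP errors such as 503 into a non-zero exit status.
    if ${PROBE:-curl -fsS -o /dev/null} "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "sidecar not healthy after $attempts attempts: $url" >&2
  return 1
}

# Usage: wait_for_sidecar 30 1 && exec ./server   # ./server is a placeholder
```

Cloud Run already orders container startup through the dependency annotation, so a hook like this is a second guard rather than the primary mechanism.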

Sidecar container

deploy/cloud-run/service.yaml

        - name: chio-sidecar
          image: ghcr.io/backbay-labs/chio-sidecar:latest
          ports:
            - containerPort: 9090
          args:
            - "api"
            - "protect"
            - "--upstream"
            - "http://127.0.0.1:8080"
            - "--spec"
            - "/etc/chio/spec/openapi.yaml"
            - "--listen"
            - "0.0.0.0:9090"
            # Cloud Run has no per-instance persistent disk, so opt into
            # ephemeral in-memory receipts explicitly rather than point
            # --receipt-store at scratch storage.
            - "--allow-ephemeral-receipts"
            - "--authority-seed-file"
            - "/etc/chio/seed/authority.seed"

The sidecar image's default CMD is --help, which exits immediately. The manifest overrides CMD via args so the image entrypoint is preserved. That image is the distroless deploy/sidecar/Dockerfile build, whose entrypoint is the chio-sidecar binary directly, with no tini. api protect reverse-proxies upstream traffic through the kernel. Two extra flags matter on Cloud Run: --allow-ephemeral-receipts is the explicit opt-in for in-memory receipts (Cloud Run has no per-instance disk for a durable SQLite log), and the global --authority-seed-file reads the signing seed from the mounted secret file. Swap to mcp serve-http -- <wrapped server cmd> when the sidecar is fronting an MCP tool server instead of an HTTP app.

Sidecar env and secret mounts

deploy/cloud-run/service.yaml

          env:
            - name: CHIO_LOG_LEVEL
              value: "info"
            # Bearer token gating /metrics and the control endpoints. A
            # remote scraper sends it as Authorization: Bearer <token>; a
            # loopback collector needs no token.
            - name: CHIO_SIDECAR_CONTROL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: chio-sidecar-control-token
                  key: latest
          volumeMounts:
            - name: chio-openapi-spec
              mountPath: /etc/chio/spec
            - name: chio-authority-seed
              mountPath: /etc/chio/seed

Configuration is CLI-flag driven, so the env block is short. The only plain value is CHIO_LOG_LEVEL; the only secret env var is CHIO_SIDECAR_CONTROL_TOKEN, resolved through secretKeyRef against Secret Manager. Two file-mounted secrets land at /etc/chio/spec/openapi.yaml and /etc/chio/seed/authority.seed; api protect --spec reads the first for its route and scope table, and --authority-seed-file reads the second as the signing seed. There is no kernel-config file.

Sidecar probes and resources

deploy/cloud-run/service.yaml

          resources:
            limits:
              cpu: "500m"
              memory: 128Mi
          startupProbe:
            httpGet:
              path: /chio/health
              port: 9090
            initialDelaySeconds: 1
            periodSeconds: 1
            failureThreshold: 30
          readinessProbe:
            # Dependency-aware: /chio/health returns 503 when the receipt
            # store can no longer persist, pulling the instance from
            # rotation on a post-startup outage.
            httpGet:
              path: /chio/health
              port: 9090
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            # Process-only: a dependency blip must not restart a serving
            # container. Readiness gates on /chio/health.
            httpGet:
              path: /chio/live
              port: 9090
            periodSeconds: 10
            failureThreshold: 3

The sidecar runs on 500m CPU and 128Mi memory. The startup and readiness probes both poll /chio/health, the dependency-aware route that returns 503 when the receipt store can no longer persist; only after startup succeeds does Cloud Run start the app container per the dependency annotation. Liveness polls the separate process-only /chio/live route so a transient dependency issue never recycles a container that is still serving.
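
The probe settings translate into a startup budget per container. The helper below is a rough sketch of that arithmetic: it treats the budget as initialDelaySeconds plus periodSeconds times failureThreshold and ignores per-probe timeouts, so read the results as approximate upper bounds. The function name is an assumption.

```bash
#!/usr/bin/env bash
# Approximate time in seconds before a startup probe gives up:
# initialDelaySeconds + periodSeconds * failureThreshold.
startup_budget() {
  local initial_delay="$1" period="$2" failure_threshold="$3"
  echo $(( initial_delay + period * failure_threshold ))
}

startup_budget 1 1 30   # chio-sidecar container: prints 31
startup_budget 2 2 30   # app container: prints 62
```

Because the app starts only after the sidecar startup probe succeeds, the two budgets add up: under these assumptions a cold instance can take roughly 93 seconds in the worst case before it is declared unhealthy.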

Volumes (Secret Manager-backed)

deploy/cloud-run/service.yaml

      volumes:
        - name: chio-openapi-spec
          secret:
            secretName: chio-openapi-spec
            items:
              - key: latest
                path: openapi.yaml
        - name: chio-authority-seed
          secret:
            secretName: chio-authority-seed
            items:
              - key: latest
                path: authority.seed

Cloud Run resolves both volumes against Secret Manager at revision start. There is no third volume for a kernel config file: the kernel policy is derived from the OpenAPI spec plus the CLI flags. key: latest tracks the most recent enabled version; pin a numeric version (key: "7") to make rollouts reproducible across regions.

Secrets

Create the three secrets before the first deploy. The OpenAPI spec and the authority seed mount as files; the sidecar control token injects as an env var.

bash

# OpenAPI spec for the upstream app (mounted at /etc/chio/spec)
$ gcloud secrets create chio-openapi-spec --replication-policy=automatic
$ gcloud secrets versions add chio-openapi-spec --data-file=./openapi.yaml

# Authority signing seed (raw 32-byte Ed25519 seed, mounted at /etc/chio/seed)
$ gcloud secrets create chio-authority-seed --replication-policy=automatic
$ gcloud secrets versions add chio-authority-seed --data-file=./authority.seed

# Sidecar control token (bearer token gating /metrics + control endpoints)
$ gcloud secrets create chio-sidecar-control-token --replication-policy=automatic
$ openssl rand -hex 32 | \
    gcloud secrets versions add chio-sidecar-control-token --data-file=-
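
Since the authority seed is described as a raw 32-byte value, a size check before upload can catch a file that picked up a trailing newline or was hex-encoded by mistake. The sketch below is an assumption-laden helper, not part of the Chio CLI; whether the sidecar itself rejects a wrong-sized seed is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check: the authority seed file must be exactly
# 32 bytes (raw Ed25519 seed, no trailing newline, no hex encoding).
check_seed_size() {
  local file="$1" size
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  if [ "$size" -ne 32 ]; then
    echo "authority seed is $size bytes, expected 32" >&2
    return 1
  fi
}

# Usage: check_seed_size ./authority.seed && \
#   gcloud secrets versions add chio-authority-seed --data-file=./authority.seed
```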

Grant the runtime service account read access on every secret:

bash

$ for s in chio-openapi-spec chio-authority-seed \
           chio-sidecar-control-token; do
    gcloud secrets add-iam-policy-binding "$s" \
      --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/secretmanager.secretAccessor"
  done

Rotate the authority seed behind the trusted-key history

Update chio-authority-seed by adding a new version, not by overwriting the latest version in place. The capability authority's trusted-key history keeps the previous key valid for in-flight capabilities; replacing the value without a rotation invalidates every live capability the next time the revision restarts.

Networking

The default *.run.app URL is publicly reachable over HTTPS with a Google-managed certificate. For internal-only services, add run.googleapis.com/ingress: internal to the service annotations and front it with an internal HTTPS load balancer.

If the kernel needs to reach a private capability authority or a VPC-peered receipt store, attach a Serverless VPC connector:

deploy/cloud-run/service.yaml

spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/vpc-access-connector: projects/PROJECT_ID/locations/REGION/connectors/chio-vpc
        run.googleapis.com/vpc-access-egress: private-ranges-only

If the kernel joins a shared trust-control service over that connector, point it there with the global --control-url flag and the CHIO_CONTROL_TOKEN bearer token. Terminate any additional mTLS at the load balancer or an Envoy sidecar rather than in the kernel.

Health Probes and Graceful Shutdown

Probe configuration is covered in the manifest walkthrough above. On scale-in or revision rollover, Cloud Run sends SIGTERM and waits up to 10 seconds before sending SIGKILL. The sidecar handles SIGTERM by stopping ingress, draining in-flight evaluations, flushing queued receipts, and exiting. The app should follow the same contract: stop accepting new requests, finish in-flight ones, then exit.

Drain longer than 10 seconds

For long-running tool calls, keep timeoutSeconds on the request side tight so in-flight work finishes inside the grace window. Cloud Run's 10-second termination grace cannot be raised on regional services; if you need longer, switch to Cloud Run jobs or a GKE deployment.
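
If the app container starts from a shell entrypoint, the shell has to pass SIGTERM on, because a shell running as the main process does not forward signals to its children by default. The wrapper below is a minimal sketch of that forwarding. The function name and the example command are hypothetical, and it does not enforce the 10-second window itself.

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint wrapper for the app container (an assumption, not
# part of the reference manifest): forward SIGTERM to the app process and
# wait for it to finish in-flight work before the wrapper exits.
run_with_drain() {
  local child status got_term=0
  "$@" &
  child=$!
  # On SIGTERM, record it and pass the signal on to the app.
  trap 'got_term=1; kill -TERM "$child" 2>/dev/null' TERM
  wait "$child"
  status=$?
  # A trapped signal interrupts wait early; wait again so the app can drain.
  if [ "$got_term" -eq 1 ] && kill -0 "$child" 2>/dev/null; then
    wait "$child"
    status=$?
  fi
  trap - TERM
  return "$status"
}

# Usage (placeholder command): run_with_drain ./server --port 8080
```

Where the app binary can be the container entrypoint directly, that is simpler: the process then receives SIGTERM without a wrapper.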

Scaling

Three knobs control scale: min instances, max instances, and per-instance concurrency. The reference pins maxScale to "1" because it runs ephemeral in-memory receipts and a second instance would split the audit stream. Raising it requires a durable, shared audit store (front a client-server store, or move to a per-instance-disk platform).

Setting	Manifest field	Default in manifest
Minimum warm instances	`autoscaling.knative.dev/minScale`	`"1"`
Maximum instances	`autoscaling.knative.dev/maxScale`	`"1"` (pinned; single in-memory audit stream)
Per-instance concurrency	`containerConcurrency`	`80`
Request timeout	`timeoutSeconds`	`300`

CPU is request-based by default: the sidecar gets CPU only while a request is in flight. Background timers in the kernel (policy refresh, receipt flush) stall under request-based CPU, so set run.googleapis.com/cpu-throttling: "false" on the template annotations to switch to always-allocated CPU. That changes billing from per-request CPU to per-instance CPU, so match it to the workload.

deploy/cloud-run/service.yaml

spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: "false"

Observability

Stdout and stderr from both containers route to Cloud Logging automatically, tagged with resource.type=cloud_run_revision and the container name. The kernel emits structured JSON; query receipts and decisions with:

bash

$ gcloud logging read \
    'resource.type="cloud_run_revision"
     AND resource.labels.service_name="agent-tool-server"
     AND jsonPayload.event="receipt"
     AND jsonPayload.verdict="deny"' \
    --limit=50 --format=json
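
To reuse the query for other services or verdicts, the filter can be built by a small helper. This is a sketch: the function name is an assumption, the field names are the ones used in the query above, and verdict values other than "deny" are not confirmed here.

```bash
#!/usr/bin/env bash
# Build a Cloud Logging filter for Chio receipts (hypothetical helper).
# Arguments: Cloud Run service name, verdict value.
receipt_filter() {
  local service="$1" verdict="$2"
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND jsonPayload.event="receipt" AND jsonPayload.verdict="%s"' \
    "$service" "$verdict"
}

# Usage:
#   gcloud logging read "$(receipt_filter agent-tool-server deny)" \
#     --limit=50 --format=json
```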

For metrics and traces, run an OTel collector as an additional sidecar or use the in-process exporter, and point it at Cloud Trace and Cloud Monitoring. See Observability for the full collector wiring.

Cost Considerations

Cloud Run bills on vCPU-seconds, memory-GiB-seconds, and request count. Three knobs dominate the bill:

minScale: a warm instance bills 24/7. minScale: "1" on the reference revision keeps one instance, and therefore both the sidecar and the app container, running at all times. Drop to "0" in dev to remove the floor.
CPU allocation: switching to always-allocated CPU roughly triples the per-instance bill but is required if the kernel runs background timers or your app serves streaming responses.
Concurrency: raising containerConcurrency packs more requests per instance and lowers per-request cost, bounded by your tool server's actual concurrency safety.
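
To size the always-on floor, it helps to convert the manifest limits into resource-seconds before applying current prices. The sketch below assumes a 30-day month, an instance that is billed continuously, and billing based on the container limits from the manifest (1 vCPU + 500m, 512Mi + 128Mi). It deliberately contains no prices; the helper names are assumptions.

```bash
#!/usr/bin/env bash
# Resource-seconds for one always-on instance over a 30-day month.
# Multiply the results by current Cloud Run rates to estimate cost.
SECONDS_PER_MONTH=$(( 30 * 24 * 3600 ))

# Argument: total CPU limit in millicores.
instance_vcpu_seconds() { echo $(( $1 * SECONDS_PER_MONTH / 1000 )); }

# Argument: total memory limit in MiB.
instance_gib_seconds() { echo $(( $1 * SECONDS_PER_MONTH / 1024 )); }

instance_vcpu_seconds 1500   # app 1000m + sidecar 500m: prints 3888000
instance_gib_seconds 640     # app 512Mi + sidecar 128Mi: prints 1620000
```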

Operations

Deploy

bash

$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID

services replace creates a new revision and routes 100% of traffic to it once the startup probe and dependency graph clear. To stage traffic, deploy with --no-traffic and split afterwards:

bash

$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00042-abc=10,agent-tool-server-00041-zyx=90

Rollback

bash

$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00041-zyx=100

Logs and debugging

bash

# Tail the chio sidecar
$ gcloud beta run services logs tail agent-tool-server \
    --region=us-central1 \
    --container=chio-sidecar

# Inspect a specific revision
$ gcloud run revisions describe agent-tool-server-00042-abc \
    --region=us-central1 \
    --format="value(status.conditions)"

Worked Example

Deploy from a new Google Cloud project:

bash

# 1. Create the runtime service account and bind the role.
$ gcloud iam service-accounts create chio-sidecar \
    --project=PROJECT_ID

$ gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/run.invoker"

# 2. Create secrets and grant the service account secretAccessor on each
#    (see Secrets section above).

# 3. Deploy.
$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID

# 4. Capture the URL.
$ URL=$(gcloud run services describe agent-tool-server \
    --region=us-central1 \
    --format='value(status.url)')

$ echo "$URL"
https://agent-tool-server-7gh3a-uc.a.run.app

# 5. Verify the sidecar is the front door.
$ curl -fsS "$URL/chio/health" | jq
{
  "status": "healthy",
  "version": "0.1.0",
  "receipt_backend": "ephemeral",
  "revocation_backend": "ephemeral"
}

# 6. Verify the app is unreachable except through the kernel.
#    (No -f here: with -f, curl would suppress the error body.)
$ curl -sS "$URL/healthz"
# 403 capability_required
# {"error": "capability_denied", "reason": "missing_capability_token", ...}

# 7. With a valid capability token, the call lands on the app:
$ curl -fsS "$URL/api/search" \
    -H "Authorization: Bearer $CHIO_CAPABILITY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"query":"hello"}'
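
The verification steps can be folded into a repeatable smoke test. The sketch below is one way to do that, assuming curl is available. The helper names and the FETCH override are hypothetical, and the expected codes are the ones shown in steps 5 and 6.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers for a deployed service.

# Print only the HTTP status code for a request; extra curl args pass through.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Fail unless the request returns the wanted status code.
# FETCH can name a replacement for fetch_status (useful in tests).
expect_status() {
  local want="$1"; shift
  local got
  got="$("${FETCH:-fetch_status}" "$@")"
  if [ "$got" != "$want" ]; then
    echo "expected HTTP $want, got $got: $*" >&2
    return 1
  fi
}

# Usage, with URL captured as in step 4:
#   expect_status 200 "$URL/chio/health"
#   expect_status 403 "$URL/healthz"
```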

If the revision wedges on cold start

Check the order of conditions in status.conditions. The most common cause is an unreadable secret: the sidecar fails closed when it cannot mount the OpenAPI spec or the authority seed, the startup probe never goes green, the dependency graph blocks the app, and the revision is marked unhealthy. gcloud run revisions describe surfaces the underlying Secret Manager IAM denial.

For other deployment shapes, see Sidecar, ECS Fargate, and Azure Container Apps. For receipt querying and key rotation, see Trust Control Plane.
