Cloud Run

Cloud Run runs a chio-governed tool server as a single Knative Service with two containers: your application and the chio sidecar. The sidecar is the only ingress container, the app stays bound to localhost, and Cloud Run enforces startup ordering so the app does not accept traffic until the sidecar reports healthy.

The reference manifest is deploy/cloud-run/service.yaml in the Arc repo.

Prerequisites: a GCP project with the Cloud Run, Secret Manager, and IAM APIs enabled, a service account that the revision will run as, and a registry that Cloud Run can pull from (Artifact Registry or GHCR).


Architecture

Cloud Run accepts external traffic on the only container that declares a containerPort. That is the sidecar on 9090. The app listens on 8080 and is reachable only over the in-pod loopback. The sidecar reverse-proxies traffic to the app after the kernel guard pipeline has accepted the request.

Figure: Cloud Run sends external traffic to the sidecar (port 9090). The sidecar reverse-proxies to the app on localhost:8080 after the kernel evaluates each request. Secret Manager mounts kernel config and the OpenAPI spec; signing keys and the capability authority URL are env-injected secrets.

Single ingress container

Cloud Run only routes external traffic to the container that declares a containerPort. If you accidentally also declare a port on the app container, the sidecar is bypassed and every request escapes the kernel.
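
A pre-deploy guard can catch a stray port declaration before it ships. The sketch below is not part of the Chio tooling: the function name check_single_ingress is hypothetical, and it relies on plain text matching rather than real YAML parsing, so treat it as a starting point for a CI check.

```shell
# Hypothetical pre-deploy guard: succeed only if the manifest declares
# exactly one containerPort (text match only, not a YAML parser).
check_single_ingress() {
  local manifest="$1" count
  # Drop comment lines, then count the remaining containerPort declarations.
  count=$(grep -v '^[[:space:]]*#' "$manifest" | grep -c 'containerPort:' || true)
  if [ "$count" -eq 1 ]; then
    echo "ok: one ingress container"
  else
    echo "error: expected 1 containerPort, found $count" >&2
    return 1
  fi
}
```

Run it as check_single_ingress deploy/cloud-run/service.yaml ahead of the deploy step; a non-zero exit means the manifest should be reviewed before it goes out.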

Manifest Walkthrough

The full reference is a Knative Service named agent-tool-server. Walk it top-to-bottom; each chunk maps to one operational concern.

Service metadata and template annotations

deploy/cloud-run/service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: agent-tool-server
  annotations:
    run.googleapis.com/launch-stage: GA
spec:
  template:
    metadata:
      annotations:
        # Keep at least one warm instance to amortise sidecar cold starts.
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        # App waits until chio-sidecar startupProbe succeeds.
        run.googleapis.com/container-dependencies: '{"app":["chio-sidecar"]}'
        run.googleapis.com/execution-environment: gen2

Three annotations carry the operational contract:

  • autoscaling.knative.dev/minScale: "1" keeps one warm instance so the sidecar cold start never lands on a request.
  • run.googleapis.com/container-dependencies wires the app to wait until the sidecar reports healthy via its startupProbe. Without this, the app could accept traffic before the kernel has loaded policy.
  • run.googleapis.com/execution-environment: gen2 selects the Linux cgroup-v2 sandbox required for multi-container services and Secret Manager volume mounts.

Service account and concurrency

deploy/cloud-run/service.yaml
    spec:
      serviceAccountName: chio-sidecar@PROJECT_ID.iam.gserviceaccount.com
      containerConcurrency: 80
      timeoutSeconds: 300

The revision runs as chio-sidecar@PROJECT_ID. That identity needs roles/secretmanager.secretAccessor on every secret the manifest references, plus whatever your app and the receipt sink require (BigQuery, GCS, etc.). containerConcurrency: 80 is the maximum number of concurrent requests Cloud Run sends to one instance before starting another. timeoutSeconds: 300 is the per-request time budget; long-running tool calls should chunk their work or switch to streaming.

Application container

deploy/cloud-run/service.yaml
      containers:
        - name: app
          image: APP_IMAGE_PLACEHOLDER
          env:
            - name: CHIO_SIDECAR_URL
              value: "http://localhost:9090"
            - name: CHIO_SIDECAR_HEALTH_URL
              value: "http://localhost:9090/chio/health"
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 2
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3

The app declares no ports block. That is intentional. Two env vars tell your handler where to find the sidecar: CHIO_SIDECAR_URL for evaluate / record calls and CHIO_SIDECAR_HEALTH_URL for readiness gating in your own startup hooks. The startup probe gives the app up to 60 seconds (30 attempts × 2s) to come online; the liveness probe checks every 10 seconds and recycles after 3 consecutive failures.

Sidecar container

deploy/cloud-run/service.yaml
        - name: chio-sidecar
          image: ghcr.io/backbay-labs/chio-sidecar:latest
          ports:
            - containerPort: 9090
          args:
            - "api"
            - "protect"
            - "--upstream"
            - "http://127.0.0.1:8080"
            - "--spec"
            - "/etc/chio/spec/openapi.yaml"
            - "--listen"
            - "0.0.0.0:9090"

The sidecar image's default CMD is --help, which exits immediately. The manifest overrides CMD via args so the entrypoint (/sbin/tini -- /usr/local/bin/chio) is preserved. api protect reverse-proxies upstream traffic through the kernel. Swap to mcp serve-http -- <wrapped server cmd> when the sidecar is fronting an MCP tool server instead of an HTTP app.

Sidecar env and secret mounts

deploy/cloud-run/service.yaml
          env:
            - name: CHIO_LISTEN_ADDR
              value: "0.0.0.0:9090"
            - name: CHIO_HEALTH_PATH
              value: "/chio/health"
            - name: CHIO_KERNEL_CONFIG_PATH
              value: "/etc/chio/kernel/kernel.yaml"
            - name: CHIO_POLICY_SOURCE
              value: "gs://PROJECT_ID-chio-config/policy.yaml"
            - name: CHIO_RECEIPT_SINK
              value: "bigquery://PROJECT_ID.chio.receipts"
            - name: CHIO_LOG_LEVEL
              value: "info"
            - name: CHIO_CAPABILITY_AUTHORITY_URL
              valueFrom:
                secretKeyRef:
                  name: chio-capability-authority-url
                  key: latest
            - name: CHIO_SIGNING_KEY
              valueFrom:
                secretKeyRef:
                  name: chio-signing-key
                  key: latest
          volumeMounts:
            - name: chio-kernel-config
              mountPath: /etc/chio/kernel
            - name: chio-openapi-spec
              mountPath: /etc/chio/spec

Inline values cover non-sensitive operational knobs (listen address, health path, policy URI, receipt sink, log level). The two sensitive env vars resolve through secretKeyRef against Secret Manager. Two file-mounted secrets land at /etc/chio/kernel/kernel.yaml and /etc/chio/spec/openapi.yaml; the kernel reads the first as its config root and api protect --spec reads the second for OpenAPI-shape policy.

Sidecar probes and resources

deploy/cloud-run/service.yaml
          resources:
            limits:
              cpu: "500m"
              memory: 128Mi
          startupProbe:
            httpGet:
              path: /chio/health
              port: 9090
            initialDelaySeconds: 1
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /chio/health
              port: 9090
            periodSeconds: 10
            failureThreshold: 3

The sidecar runs on 500m CPU and 128Mi memory. The startup probe polls /chio/health once a second for up to 30 seconds; only after it succeeds does Cloud Run start the app container per the dependency annotation. The liveness probe checks every 10 seconds and fails after 3 consecutive misses, which recycles the instance if the kernel hangs.
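
Because the app only starts after the sidecar's startup probe succeeds, the two probe budgets add up on a cold start. The arithmetic can be sketched as follows; probe_budget is a hypothetical helper, and it uses the same periodSeconds × failureThreshold estimate as the text, so initialDelaySeconds and per-probe timeouts add a little on top.

```shell
# Approximate seconds a startup probe may keep retrying:
# periodSeconds * failureThreshold (initial delay not included).
probe_budget() {
  local period="$1" failure_threshold="$2"
  echo $(( period * failure_threshold ))
}

sidecar=$(probe_budget 1 30)   # sidecar startupProbe values from the manifest
app=$(probe_budget 2 30)       # app startupProbe values from the manifest
echo "sidecar: ${sidecar}s, app: ${app}s, sequential worst case: $(( sidecar + app ))s"
# prints: sidecar: 30s, app: 60s, sequential worst case: 90s
```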

Volumes (Secret Manager-backed)

deploy/cloud-run/service.yaml
      volumes:
        - name: chio-kernel-config
          secret:
            secretName: chio-kernel-config
            items:
              - key: latest
                path: kernel.yaml
        - name: chio-openapi-spec
          secret:
            secretName: chio-openapi-spec
            items:
              - key: latest
                path: openapi.yaml

Cloud Run resolves both volumes against Secret Manager at revision start. key: latest tracks the most recent enabled version; pin a numeric version (key: "7") to make rollouts reproducible across regions.
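
To find the number to pin, one option is to ask Secret Manager for the newest enabled version. This is a minimal sketch: latest_enabled_version is a hypothetical helper name, and it assumes the caller's gcloud credentials can list versions of the secret.

```shell
# Print the newest ENABLED version number of a secret, so the manifest
# can reference it explicitly (key: "7") instead of key: latest.
latest_enabled_version() {
  local secret="$1"
  gcloud secrets versions list "$secret" \
    --filter="state=ENABLED" \
    --sort-by="~createTime" \
    --limit=1 \
    --format="value(name.basename())"
}
```

For example, latest_enabled_version chio-kernel-config prints a single version number that can be copied into the items entry of the matching volume.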


Secrets

Create the four secrets before the first deploy. The signing key and capability authority URL inject as env vars; the kernel config and OpenAPI spec mount as files.

bash
# Signing key (raw key material, base64 in the secret payload)
$ gcloud secrets create chio-signing-key --replication-policy=automatic
$ gcloud secrets versions add chio-signing-key --data-file=./signing-key.b64

# Capability authority URL (e.g. https://ctl-a.chio.internal:8940)
$ gcloud secrets create chio-capability-authority-url --replication-policy=automatic
$ printf 'https://ctl-a.chio.internal:8940' | \
    gcloud secrets versions add chio-capability-authority-url --data-file=-

# Kernel config (full kernel.yaml)
$ gcloud secrets create chio-kernel-config --replication-policy=automatic
$ gcloud secrets versions add chio-kernel-config --data-file=./kernel.yaml

# OpenAPI spec for the upstream app
$ gcloud secrets create chio-openapi-spec --replication-policy=automatic
$ gcloud secrets versions add chio-openapi-spec --data-file=./openapi.yaml

Grant the runtime service account read access on every secret:

bash
$ for s in chio-signing-key chio-capability-authority-url \
           chio-kernel-config chio-openapi-spec; do
    gcloud secrets add-iam-policy-binding "$s" \
      --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/secretmanager.secretAccessor"
  done
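
Before the first deploy it can help to confirm that all four secrets exist. The following is a hedged sketch, not a Chio command: check_chio_secrets is a hypothetical name, and it only proves that the secrets are visible to the credentials running gcloud, not that the runtime service account can read them.

```shell
# Hypothetical preflight: report which of the four secrets can be
# described with the caller's credentials. Non-zero exit if any is missing.
check_chio_secrets() {
  local missing=0 s
  for s in chio-signing-key chio-capability-authority-url \
           chio-kernel-config chio-openapi-spec; do
    if gcloud secrets describe "$s" >/dev/null 2>&1; then
      echo "found:   $s"
    else
      echo "missing: $s"
      missing=1
    fi
  done
  return "$missing"
}
```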

Rotate signing keys behind the trusted-key history

Update chio-signing-key by adding a new version, not by overwriting the latest version in place. The capability authority's trusted-key history keeps the previous key valid for in-flight capabilities; replacing the value without a rotation invalidates every live capability the next time the revision restarts.

Networking

The default *.run.app URL is publicly reachable over HTTPS with a Google-managed certificate. For internal-only services, add run.googleapis.com/ingress: internal to the service annotations and front it with an internal HTTPS load balancer.

If the kernel needs to reach a private capability authority or a VPC-peered receipt store, attach a Serverless VPC connector:

deploy/cloud-run/service.yaml
metadata:
  annotations:
    run.googleapis.com/vpc-access-connector: projects/PROJECT_ID/locations/REGION/connectors/chio-vpc
    run.googleapis.com/vpc-access-egress: private-ranges-only

For mTLS between the sidecar and an upstream control plane, terminate at the kernel via CHIO_CONTROL_TLS_CLIENT_CERT and CHIO_CONTROL_TLS_CLIENT_KEY env vars sourced from Secret Manager.


Health Probes and Graceful Shutdown

Probe configuration is covered in the manifest walkthrough above. On scale-in or revision rollover, Cloud Run sends SIGTERM and waits up to 10 seconds before sending SIGKILL. The sidecar handles SIGTERM by stopping ingress, draining in-flight evaluations, flushing queued receipts, and exiting. The app should follow the same contract: stop accepting new requests, finish in-flight ones, then exit.

Drain longer than 10 seconds

For long-running tool calls, set timeoutSeconds on the request side and tune drain via CHIO_SHUTDOWN_DRAIN_MS. Cloud Run's 10-second termination grace cannot be raised on regional services; if you need longer, switch to Cloud Run jobs or a GKE deployment.
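
When choosing a value for CHIO_SHUTDOWN_DRAIN_MS, a quick sanity check is that the drain budget fits inside the 10-second grace period with some room to spare. The sketch below assumes the variable is an upper bound on drain time in milliseconds; the 1000 ms default headroom and the check_drain_budget name are illustrative assumptions, not documented Chio behaviour.

```shell
# Cloud Run's SIGTERM-to-SIGKILL window described above, in milliseconds.
GRACE_MS=10000

# Hypothetical check: succeed only if the drain budget plus headroom
# (default 1000 ms) fits inside the termination grace period.
check_drain_budget() {
  local drain_ms="$1" headroom_ms="${2:-1000}"
  if [ $(( drain_ms + headroom_ms )) -le "$GRACE_MS" ]; then
    echo "ok: ${drain_ms}ms drain fits within ${GRACE_MS}ms grace"
  else
    echo "error: ${drain_ms}ms drain plus ${headroom_ms}ms headroom exceeds ${GRACE_MS}ms" >&2
    return 1
  fi
}
```

With these assumptions, check_drain_budget 8000 passes and check_drain_budget 9500 fails.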

Scaling

Three knobs control scale: min instances, max instances, and per-instance concurrency.

Setting                   Manifest field                      Default in manifest
Minimum warm instances    autoscaling.knative.dev/minScale    "1"
Maximum instances         autoscaling.knative.dev/maxScale    "100"
Per-instance concurrency  containerConcurrency                80
Request timeout           timeoutSeconds                      300
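
Maximum instances and per-instance concurrency together bound how many requests the service can hold in flight at once. A small worked calculation with the reference values (max_in_flight is a hypothetical helper name):

```shell
# Upper bound on concurrent requests across the service:
# maxScale instances, each accepting containerConcurrency requests.
max_in_flight() {
  local max_scale="$1" concurrency="$2"
  echo $(( max_scale * concurrency ))
}

max_in_flight 100 80   # reference manifest values; prints 8000
```

Downstream dependencies such as the capability authority and the receipt sink need to tolerate that ceiling, or maxScale should be lowered to match them.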

CPU is request-based by default: the sidecar gets CPU only while a request is in flight. Background timers in the kernel (policy refresh, receipt flush) stall under request-based CPU, so set run.googleapis.com/cpu-throttling: "false" on the template annotations to switch to always-allocated CPU. That changes billing from per-request CPU to per-instance CPU, so match it to the workload.

deploy/cloud-run/service.yaml
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: "false"

Observability

Stdout and stderr from both containers route to Cloud Logging automatically, tagged with resource.type=cloud_run_revision and the container name. The kernel emits structured JSON; query receipts and decisions with:

bash
$ gcloud logging read \
    'resource.type="cloud_run_revision"
     AND resource.labels.service_name="agent-tool-server"
     AND jsonPayload.event="receipt"
     AND jsonPayload.verdict="deny"' \
    --limit=50 --format=json

For metrics and traces, run an OTel collector as an additional sidecar container or use the in-process exporter, and point it at Cloud Trace and Cloud Monitoring. See Observability for the full collector wiring.


Cost Considerations

Cloud Run bills on vCPU-seconds, memory-GiB-seconds, and request count. Three knobs dominate the bill:

  • minScale: a warm instance bills 24/7. minScale: 1 on the reference revision is one container always-on for the sidecar plus one for the app. Drop to "0" in dev to remove the floor.
  • CPU allocation: switching to always-allocated CPU roughly triples the per-instance bill but is required if the kernel runs background timers or your app serves streaming responses.
  • Concurrency: raising containerConcurrency packs more requests per instance and lowers per-request cost, bounded by your tool server's actual concurrency safety.
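
To put a rough size on the minScale floor, the resource-seconds that one always-on instance accrues in a month can be computed from the manifest limits. This sketch deliberately assumes no prices; the helper names are hypothetical, and the totals map most directly onto the bill when CPU is always allocated. Multiply them by the current published rates for your region.

```shell
# Resource-seconds accrued by one always-on instance over a 30-day month.
# Limits come from the manifest: app 1000m CPU / 512Mi, sidecar 500m CPU / 128Mi.
SECONDS_PER_MONTH=$(( 30 * 24 * 3600 ))

monthly_vcpu_seconds() {
  local cpu_millis="$1"
  echo $(( cpu_millis * SECONDS_PER_MONTH / 1000 ))
}

monthly_gib_seconds() {
  local memory_mib="$1"
  echo $(( memory_mib * SECONDS_PER_MONTH / 1024 ))
}

echo "vCPU-seconds: $(monthly_vcpu_seconds $(( 1000 + 500 )))"   # 3888000
echo "GiB-seconds:  $(monthly_gib_seconds $(( 512 + 128 )))"     # 1620000
```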

Operations

Deploy

bash
$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID

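services replace creates a new revision and routes 100% of traffic to it once the startup probe and dependency graph clear. To stage traffic, create the revision with no traffic assigned (gcloud run deploy accepts --no-traffic; with services replace, pin traffic to the current revision in the manifest's spec.traffic), then split afterwards: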

bash
$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00042-abc=10,agent-tool-server-00041-zyx=90
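
When the split is scripted, it is easy to mistype a percentage. The helper below is a hypothetical guard of our own rather than anything gcloud requires: it assembles the --to-revisions value from REVISION=PERCENT pairs and refuses to print it unless the percentages add up to 100.

```shell
# Build a --to-revisions value from REVISION=PERCENT pairs and refuse
# to emit it unless the percentages sum to exactly 100.
traffic_spec() {
  local total=0 spec="" pair
  for pair in "$@"; do
    total=$(( total + ${pair##*=} ))   # text after the last '=' is the percent
    spec="${spec:+$spec,}$pair"        # comma-join the pairs
  done
  if [ "$total" -ne 100 ]; then
    echo "error: percentages sum to $total, expected 100" >&2
    return 1
  fi
  echo "$spec"
}
```

The result can be passed straight through, for example --to-revisions="$(traffic_spec agent-tool-server-00042-abc=10 agent-tool-server-00041-zyx=90)".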

Rollback

bash
$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00041-zyx=100

Logs and debugging

bash
# Tail the chio sidecar
$ gcloud beta run services logs tail agent-tool-server \
    --region=us-central1 \
    --container=chio-sidecar

# Inspect a specific revision
$ gcloud run revisions describe agent-tool-server-00042-abc \
    --region=us-central1 \
    --format="value(status.conditions)"

Worked Example

Full sequence from a clean project to a verified deploy:

bash
# 1. Create the runtime service account and bind the role.
$ gcloud iam service-accounts create chio-sidecar \
    --project=PROJECT_ID

$ gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/run.invoker"

# 2. Create secrets (see Secrets section above).

# 3. Deploy.
$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID

# 4. Capture the URL.
$ URL=$(gcloud run services describe agent-tool-server \
    --region=us-central1 \
    --format='value(status.url)')

$ echo "$URL"
https://agent-tool-server-7gh3a-uc.a.run.app

# 5. Verify the sidecar is the front door.
$ curl -fsS "$URL/chio/health" | jq
{
  "ok": true,
  "kernel": "ready",
  "policy_loaded": true,
  "authority_generation": 7
}

# 6. Verify the app is unreachable except through the kernel.
#    No -f here: with -f, curl exits non-zero on a 403 and hides the response body.
$ curl -sS -i "$URL/healthz"
# 403 capability_required
# {"error": "capability_denied", "reason": "missing_capability_token", ...}

# 7. With a valid capability token, the call lands on the app:
$ curl -fsS "$URL/api/search" \
    -H "Authorization: Bearer $CHIO_CAPABILITY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"query":"hello"}'
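
In a pipeline, step 5 usually needs to retry until the new revision is serving. A minimal polling sketch follows, assuming the health JSON shape shown in step 5; wait_for_chio_health is a hypothetical name, and the grep match stands in for a real JSON parser.

```shell
# Poll the sidecar health endpoint until it reports a loaded policy.
# Arguments: base URL, attempts (default 30), delay in seconds (default 2).
wait_for_chio_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" body i=0
  while [ "$i" -lt "$attempts" ]; do
    if body=$(curl -fsS "$url/chio/health" 2>/dev/null) &&
       printf '%s' "$body" | grep -Eq '"policy_loaded":[[:space:]]*true'; then
      echo "healthy after $(( i + 1 )) attempt(s)"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  echo "error: sidecar not healthy after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_chio_health "$URL" between steps 4 and 6 stops the script early when the kernel never finishes loading policy.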

If the revision wedges on cold start

Check the order of conditions in status.conditions. The most common cause is an unreadable secret: the sidecar fails closed on CHIO_KERNEL_CONFIG_PATH load failure, the startup probe never goes green, the dependency graph blocks the app, and the revision is marked unhealthy. gcloud run revisions describe surfaces the underlying Secret Manager IAM denial.

For other deployment shapes, see Sidecar, ECS Fargate, and Azure Container Apps. For receipt querying and key rotation, see Trust Control Plane.
