Cloud Run
Cloud Run runs a chio-governed tool server as a single Knative Service with two containers: your application and the chio sidecar. The sidecar is the only ingress container, the app stays bound to localhost, and Cloud Run enforces startup ordering so the app does not accept traffic until the sidecar reports healthy. The reference manifest is deploy/cloud-run/service.yaml in the Arc repo.
Prerequisites: a GCP project with the Cloud Run, Secret Manager, and IAM APIs enabled, a service account that the revision will run as, and a registry that Cloud Run can pull from (Artifact Registry or GHCR).
Architecture
Cloud Run accepts external traffic on the only container that declares a containerPort. That is the sidecar on 9090. The app listens on 8080 and is reachable only over the in-pod loopback. The sidecar reverse-proxies traffic to the app after the kernel guard pipeline has accepted the request.
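The single-ingress rule can be checked mechanically before a deploy. The following is a minimal sketch, not part of the reference repo: it assumes the manifest has already been parsed into a Python dict (the parsing step is left out), and the helper names `ingress_containers` and `check_single_ingress` are hypothetical.

```python
from typing import Any


def ingress_containers(service: dict[str, Any]) -> list[str]:
    """Return the names of containers that declare a non-empty ports block."""
    containers = service["spec"]["template"]["spec"]["containers"]
    return [c["name"] for c in containers if c.get("ports")]


def check_single_ingress(service: dict[str, Any], expected: str = "chio-sidecar") -> None:
    """Raise ValueError unless exactly the expected container declares a port."""
    found = ingress_containers(service)
    if found != [expected]:
        raise ValueError(f"expected only {expected!r} to declare ports, found {found}")


# Trimmed stand-in for the parsed manifest (only the fields the check reads).
manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "APP_IMAGE_PLACEHOLDER"},
        {"name": "chio-sidecar", "ports": [{"containerPort": 9090}]},
    ]}}}
}

check_single_ingress(manifest)       # passes silently
print(ingress_containers(manifest))  # ['chio-sidecar']
```

A check like this fits in CI ahead of the deploy step, so a stray ports block on the app container fails the pipeline instead of reaching a revision.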
Single ingress container
Only the sidecar declares a containerPort. If you accidentally also declare a port on the app container, the sidecar is bypassed and every request escapes the kernel.
Manifest Walkthrough
The full reference is a Knative Service named agent-tool-server. Walk it top-to-bottom; each chunk maps to one operational concern.
Service metadata and template annotations
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: agent-tool-server
  annotations:
    run.googleapis.com/launch-stage: GA
spec:
  template:
    metadata:
      annotations:
        # Keep at least one warm instance to amortise sidecar cold starts.
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        # App waits until chio-sidecar startupProbe succeeds.
        run.googleapis.com/container-dependencies: '{"app":["chio-sidecar"]}'
        run.googleapis.com/execution-environment: gen2
Three annotations carry the operational contract:
- autoscaling.knative.dev/minScale: "1" keeps one instance warm so the sidecar cold start does not land on a request after an idle period.
- run.googleapis.com/container-dependencies makes the app wait until the sidecar reports healthy via its startupProbe. Without this, the app could accept traffic before the kernel has loaded policy.
- run.googleapis.com/execution-environment: gen2 selects the Linux cgroup-v2 sandbox required for multi-container services and Secret Manager volume mounts.
Service account and concurrency
spec:
  serviceAccountName: chio-sidecar@PROJECT_ID.iam.gserviceaccount.com
  containerConcurrency: 80
  timeoutSeconds: 300
The revision runs as chio-sidecar@PROJECT_ID. That identity needs roles/secretmanager.secretAccessor on every secret the manifest references, plus whatever your app and the receipt sink require (BigQuery, GCS, etc.). containerConcurrency: 80 is the maximum number of concurrent requests Cloud Run sends to one instance before it starts another. timeoutSeconds: 300 is the per-request budget; long-running tool calls should chunk their work or switch to streaming.
Application container
containers:
  - name: app
    image: APP_IMAGE_PLACEHOLDER
    env:
      - name: CHIO_SIDECAR_URL
        value: "http://localhost:9090"
      - name: CHIO_SIDECAR_HEALTH_URL
        value: "http://localhost:9090/chio/health"
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 2
      periodSeconds: 2
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
The app declares no ports block. That is intentional. Two env vars tell your handler where to find the sidecar: CHIO_SIDECAR_URL for evaluate / record calls and CHIO_SIDECAR_HEALTH_URL for readiness gating in your own startup hooks. The startup probe gives the app about 60 seconds (30 attempts, 2 seconds apart, after a 2-second initial delay) to come online; the liveness probe checks every 10 seconds and recycles the instance after 3 consecutive failures.
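Readiness gating against CHIO_SIDECAR_HEALTH_URL could look like the sketch below. It is an illustration under assumptions, not the project's client code: the function names are hypothetical, an HTTP 200 is assumed to mean healthy, and the 30-second deadline simply mirrors the sidecar's startup probe window.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_sidecar(
    url: str,
    deadline_s: float = 30.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health URL until it is healthy or the attempt budget is used up."""
    attempts = max(1, int(deadline_s / interval_s))
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False


# Demo with a fake probe that turns healthy on the third attempt (no network needed).
answers = iter([False, False, True])
ready = wait_for_sidecar(
    "http://localhost:9090/chio/health",
    probe=lambda _url: next(answers),
    sleep=lambda _seconds: None,
)
print(ready)  # True
```

In a real startup hook the URL would come from `os.environ["CHIO_SIDECAR_HEALTH_URL"]` and the default `http_ok` probe would be used; the app would refuse to serve if `wait_for_sidecar` returns False.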
Sidecar container
  - name: chio-sidecar
    image: ghcr.io/backbay-labs/chio-sidecar:latest
    ports:
      - containerPort: 9090
    args:
      - "api"
      - "protect"
      - "--upstream"
      - "http://127.0.0.1:8080"
      - "--spec"
      - "/etc/chio/spec/openapi.yaml"
      - "--listen"
      - "0.0.0.0:9090"
The sidecar image's default CMD is --help, which exits immediately. The manifest overrides CMD via args so the entrypoint (/sbin/tini -- /usr/local/bin/chio) is preserved. api protect reverse-proxies upstream traffic through the kernel. Swap to mcp serve-http -- <wrapped server cmd> when the sidecar is fronting an MCP tool server instead of an HTTP app.
Sidecar env and secret mounts
    env:
      - name: CHIO_LISTEN_ADDR
        value: "0.0.0.0:9090"
      - name: CHIO_HEALTH_PATH
        value: "/chio/health"
      - name: CHIO_KERNEL_CONFIG_PATH
        value: "/etc/chio/kernel/kernel.yaml"
      - name: CHIO_POLICY_SOURCE
        value: "gs://PROJECT_ID-chio-config/policy.yaml"
      - name: CHIO_RECEIPT_SINK
        value: "bigquery://PROJECT_ID.chio.receipts"
      - name: CHIO_LOG_LEVEL
        value: "info"
      - name: CHIO_CAPABILITY_AUTHORITY_URL
        valueFrom:
          secretKeyRef:
            name: chio-capability-authority-url
            key: latest
      - name: CHIO_SIGNING_KEY
        valueFrom:
          secretKeyRef:
            name: chio-signing-key
            key: latest
    volumeMounts:
      - name: chio-kernel-config
        mountPath: /etc/chio/kernel
      - name: chio-openapi-spec
        mountPath: /etc/chio/spec
Inline values cover non-sensitive operational knobs (listen address, health path, policy URI, receipt sink, log level). The two sensitive env vars resolve through secretKeyRef against Secret Manager. Two file-mounted secrets land at /etc/chio/kernel/kernel.yaml and /etc/chio/spec/openapi.yaml; the kernel reads the first as its config root and api protect --spec reads the second for OpenAPI-shape policy.
Sidecar probes and resources
    resources:
      limits:
        cpu: "500m"
        memory: 128Mi
    startupProbe:
      httpGet:
        path: /chio/health
        port: 9090
      initialDelaySeconds: 1
      periodSeconds: 1
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /chio/health
        port: 9090
      periodSeconds: 10
      failureThreshold: 3
The sidecar runs on 500m CPU and 128Mi memory. The startup probe polls /chio/health once a second for up to 30 seconds; only after it succeeds does Cloud Run start the app container, per the dependency annotation. The liveness probe checks every 10 seconds and fails after 3 consecutive misses, which recycles the instance if the kernel hangs.
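The probe windows quoted in this walkthrough follow from simple arithmetic on the probe fields. The sketch below uses the approximation initial delay + period × failure threshold as an upper bound; it ignores per-probe timeouts and container boot time, and it assumes the liveness probes start with no initial delay because the manifest does not set one. The `Probe` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_s: int
    period_s: int
    failure_threshold: int

    def budget_s(self) -> int:
        """Approximate upper bound, in seconds, before the probe is declared failed."""
        return self.initial_delay_s + self.period_s * self.failure_threshold


# Values copied from the reference manifest.
sidecar_startup = Probe(initial_delay_s=1, period_s=1, failure_threshold=30)
app_startup = Probe(initial_delay_s=2, period_s=2, failure_threshold=30)
liveness = Probe(initial_delay_s=0, period_s=10, failure_threshold=3)  # delay assumed 0

# The app starts only after the sidecar's startup probe succeeds, so the budgets add up.
worst_case_cold_start_s = sidecar_startup.budget_s() + app_startup.budget_s()

print(sidecar_startup.budget_s())  # 31
print(app_startup.budget_s())      # 62
print(liveness.budget_s())         # 30
print(worst_case_cold_start_s)     # 93
```

Under these assumptions a hung kernel is detected within roughly 30 seconds, and a cold start that uses both startup budgets in full takes on the order of a minute and a half before the revision is marked unhealthy.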
Volumes (Secret Manager-backed)
volumes:
  - name: chio-kernel-config
    secret:
      secretName: chio-kernel-config
      items:
        - key: latest
          path: kernel.yaml
  - name: chio-openapi-spec
    secret:
      secretName: chio-openapi-spec
      items:
        - key: latest
          path: openapi.yaml
Cloud Run resolves both volumes against Secret Manager at revision start. key: latest tracks the most recent enabled version; pin a numeric version (key: "7") to make rollouts reproducible across regions.
Secrets
Create the four secrets before the first deploy. The signing key and capability authority URL inject as env vars; the kernel config and OpenAPI spec mount as files.
# Signing key (raw key material, base64 in the secret payload)
$ gcloud secrets create chio-signing-key --replication-policy=automatic
$ gcloud secrets versions add chio-signing-key --data-file=./signing-key.b64
# Capability authority URL (e.g. https://ctl-a.chio.internal:8940)
$ gcloud secrets create chio-capability-authority-url --replication-policy=automatic
$ printf 'https://ctl-a.chio.internal:8940' | \
    gcloud secrets versions add chio-capability-authority-url --data-file=-
# Kernel config (full kernel.yaml)
$ gcloud secrets create chio-kernel-config --replication-policy=automatic
$ gcloud secrets versions add chio-kernel-config --data-file=./kernel.yaml
# OpenAPI spec for the upstream app
$ gcloud secrets create chio-openapi-spec --replication-policy=automatic
$ gcloud secrets versions add chio-openapi-spec --data-file=./openapi.yaml
Grant the runtime service account read access on every secret:
$ for s in chio-signing-key chio-capability-authority-url \
    chio-kernel-config chio-openapi-spec; do
    gcloud secrets add-iam-policy-binding "$s" \
      --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/secretmanager.secretAccessor"
  done
Rotate signing keys behind the trusted-key history
Rotate chio-signing-key by adding a new version, not by overwriting the latest version in place. The capability authority's trusted-key history keeps the previous key valid for in-flight capabilities; replacing the value without a rotation invalidates every live capability the next time the revision restarts.
Networking
The default *.run.app URL is publicly reachable over HTTPS with a Google-managed certificate. For internal-only services, add run.googleapis.com/ingress: internal to the service annotations and front it with an internal HTTPS load balancer.
If the kernel needs to reach a private capability authority or a VPC-peered receipt store, attach a Serverless VPC connector:
metadata:
  annotations:
    run.googleapis.com/vpc-access-connector: projects/PROJECT_ID/locations/REGION/connectors/chio-vpc
    run.googleapis.com/vpc-access-egress: private-ranges-only
For mTLS between the sidecar and an upstream control plane, terminate at the kernel via CHIO_CONTROL_TLS_CLIENT_CERT and CHIO_CONTROL_TLS_CLIENT_KEY env vars sourced from Secret Manager.
Health Probes and Graceful Shutdown
The probe configuration is covered in the manifest walkthrough above. On scale-in or revision rollover, Cloud Run sends SIGTERM and waits up to 10 seconds before SIGKILL. The sidecar handles SIGTERM by stopping ingress, draining in-flight evaluations, flushing queued receipts, and exiting. The app should mirror the same contract: stop accepting new requests, finish in-flight ones, exit.
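On the app side, the stop, drain, exit contract can be sketched with a small in-flight counter. This is an illustrative sketch only: `DrainTracker`, `on_sigterm`, and `install_handler` are hypothetical names, and the 8-second drain budget is an assumption chosen to leave headroom inside the 10-second termination window.

```python
import signal
import threading


class DrainTracker:
    """Tracks in-flight requests so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self) -> bool:
        """Register a new request; returns False once draining has started."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Mark one request as finished and wake any waiter."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s: float) -> bool:
        """Stop admitting requests; return True if in-flight work hit zero in time."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)


stop_requested = threading.Event()


def on_sigterm(signum, frame) -> None:
    # Only set a flag here; the main thread performs the actual drain.
    stop_requested.set()


def install_handler() -> None:
    """Call once from the main thread during app startup."""
    signal.signal(signal.SIGTERM, on_sigterm)


demo = DrainTracker()
demo.try_begin()          # one request arrives
demo.end()                # ...and finishes
print(demo.drain(8.0))    # True: nothing is in flight
print(demo.try_begin())   # False: new work is refused while draining
```

A request handler would wrap its work in `try_begin()` / `end()` and answer with an error when `try_begin()` returns False; the main thread would wait on `stop_requested`, call `drain(8.0)`, and then exit.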
Drain longer than 10 seconds
timeoutSeconds on the request side and tune drain via CHIO_SHUTDOWN_DRAIN_MS. Cloud Run's 10-second termination grace cannot be raised on regional services; if you need longer, switch to Cloud Run jobs or a GKE deployment.
Scaling
Three knobs control scale: min instances, max instances, and per-instance concurrency.
| Setting | Manifest field | Default in manifest |
|---|---|---|
| Minimum warm instances | autoscaling.knative.dev/minScale | "1" |
| Maximum instances | autoscaling.knative.dev/maxScale | "100" |
| Per-instance concurrency | containerConcurrency | 80 |
| Request timeout | timeoutSeconds | 300 |
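The values in the table bound how much traffic the service can hold at once. The arithmetic below is a rough sketch: it treats containerConcurrency as a hard per-instance ceiling and ignores that the autoscaler may add instances before that ceiling is reached. The function names are hypothetical.

```python
import math

MAX_SCALE = 100             # autoscaling.knative.dev/maxScale
CONTAINER_CONCURRENCY = 80  # containerConcurrency


def max_in_flight(max_scale: int, concurrency: int) -> int:
    """Upper bound on simultaneous requests across all instances."""
    return max_scale * concurrency


def instances_needed(concurrent_requests: int, concurrency: int) -> int:
    """Minimum number of instances that can hold the given concurrent load."""
    return math.ceil(concurrent_requests / concurrency)


print(max_in_flight(MAX_SCALE, CONTAINER_CONCURRENCY))  # 8000
print(instances_needed(1000, CONTAINER_CONCURRENCY))    # 13
```

With the reference values, the service tops out at 8000 simultaneous requests, and a steady load of 1000 concurrent requests needs at least 13 instances.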
CPU is request-based by default: the sidecar gets CPU only while a request is in flight. Background timers in the kernel (policy refresh, receipt flush) stall under request-based CPU, so set run.googleapis.com/cpu-throttling: "false" on the template annotations to switch to always-allocated CPU. That changes billing from per-request CPU to per-instance CPU, so match it to the workload.
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: "false"
Observability
Stdout and stderr from both containers route to Cloud Logging automatically, tagged with resource.type=cloud_run_revision and the container name. The kernel emits structured JSON; query receipts and decisions with:
$ gcloud logging read \
    'resource.type="cloud_run_revision"
     AND resource.labels.service_name="agent-tool-server"
     AND jsonPayload.event="receipt"
     AND jsonPayload.verdict="deny"' \
    --limit=50 --format=json
For metrics and traces, run an OTel collector as an additional sidecar or use the in-process exporter, and point it at Cloud Trace and Cloud Monitoring. See Observability for the full collector wiring.
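The JSON printed by the log query above can be summarised offline. The sketch below assumes the output is a JSON array of log entries and relies only on the jsonPayload.event and jsonPayload.verdict fields used in that filter; the "allow" verdict in the sample data and the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable


def receipts(entries: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the jsonPayload of every entry whose event is "receipt"."""
    payloads = (entry.get("jsonPayload") or {} for entry in entries)
    return [p for p in payloads if p.get("event") == "receipt"]


def verdict_counts(entries: Iterable[dict[str, Any]]) -> Counter:
    """Count receipts per verdict; entries without a verdict count as "unknown"."""
    return Counter(p.get("verdict", "unknown") for p in receipts(entries))


# Sample data standing in for the saved output of the log query.
sample = json.loads("""[
  {"jsonPayload": {"event": "receipt", "verdict": "deny"}},
  {"jsonPayload": {"event": "receipt", "verdict": "allow"}},
  {"jsonPayload": {"event": "receipt", "verdict": "deny"}},
  {"textPayload": "plain line from the app"}
]""")

print(dict(verdict_counts(sample)))  # {'deny': 2, 'allow': 1}
```

To see more than denials, drop the verdict clause from the filter, save the output to a file, and load it with `json.load` in place of the inline sample.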
Cost Considerations
Cloud Run bills on vCPU-seconds, memory-GiB-seconds, and request count. Three knobs dominate the bill:
- minScale: a warm instance bills 24/7. minScale: 1 on the reference revision keeps one sidecar container and one app container always on. Drop to "0" in dev to remove the floor.
- CPU allocation: switching to always-allocated CPU roughly triples the per-instance bill but is required if the kernel runs background timers or your app serves streaming responses.
- Concurrency: raising containerConcurrency packs more requests per instance and lowers per-request cost, bounded by your tool server's actual concurrency safety.
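To put a number on the minScale floor, the resource limits from the manifest can be turned into billable units. This is a back-of-the-envelope sketch: it assumes a 30-day month and that the warm instance is billed at its full limits for every second, as with always-allocated CPU. Prices are deliberately left out; multiply the results by the current rates for your region.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (vCPU limit, memory limit in MiB) per container, from the reference manifest.
CONTAINERS = {
    "app": (1.0, 512),
    "chio-sidecar": (0.5, 128),  # 500m CPU is 0.5 vCPU
}


def monthly_resource_seconds(instances: int, days: int = 30) -> tuple[float, float]:
    """Return (vCPU-seconds, GiB-seconds) for always-on instances over the period."""
    seconds = instances * days * SECONDS_PER_DAY
    vcpu = sum(cpu for cpu, _ in CONTAINERS.values())
    gib = sum(mib for _, mib in CONTAINERS.values()) / 1024
    return vcpu * seconds, gib * seconds


vcpu_seconds, gib_seconds = monthly_resource_seconds(instances=1)
print(vcpu_seconds)  # 3888000.0
print(gib_seconds)   # 1620000.0
```

One warm instance therefore accounts for about 3.9 million vCPU-seconds and 1.6 million GiB-seconds a month under these assumptions, before any request traffic.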
Operations
Deploy
$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID
services replace creates a new revision and routes 100% of traffic to it once the startup probe and dependency graph clear. To stage traffic, deploy with --no-traffic and split afterwards:
$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00042-abc=10,agent-tool-server-00041-zyx=90
Rollback
$ gcloud run services update-traffic agent-tool-server \
    --region=us-central1 \
    --to-revisions=agent-tool-server-00041-zyx=100
Logs and debugging
# Tail the chio sidecar
$ gcloud beta run services logs tail agent-tool-server \
    --region=us-central1 \
    --container=chio-sidecar
# Inspect a specific revision
$ gcloud run revisions describe agent-tool-server-00042-abc \
    --region=us-central1 \
    --format="value(status.conditions)"
Worked Example
Full sequence from a clean project to a verified deploy:
# 1. Create the runtime service account and bind the role.
$ gcloud iam service-accounts create chio-sidecar \
    --project=PROJECT_ID
$ gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:chio-sidecar@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/run.invoker"
# 2. Create secrets (see Secrets section above).
# 3. Deploy.
$ gcloud run services replace deploy/cloud-run/service.yaml \
    --region=us-central1 \
    --project=PROJECT_ID
# 4. Capture the URL.
$ URL=$(gcloud run services describe agent-tool-server \
    --region=us-central1 \
    --format='value(status.url)')
$ echo "$URL"
https://agent-tool-server-7gh3a-uc.a.run.app
# 5. Verify the sidecar is the front door.
$ curl -fsS "$URL/chio/health" | jq
{
  "ok": true,
  "kernel": "ready",
  "policy_loaded": true,
  "authority_generation": 7
}
# 6. Verify the app is unreachable except through the kernel.
#    -f is omitted here so curl prints the error body instead of exiting on the 403.
$ curl -sS "$URL/healthz"
# 403 capability_required
# {"error": "capability_denied", "reason": "missing_capability_token", ...}
# 7. With a valid capability token, the call lands on the app:
$ curl -fsS "$URL/api/search" \
    -H "Authorization: Bearer $CHIO_CAPABILITY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"query":"hello"}'
If the revision wedges on cold start
Inspect status.conditions. The most common cause is an unreadable secret: the sidecar fails closed on CHIO_KERNEL_CONFIG_PATH load failure, the startup probe never goes green, the dependency graph blocks the app, and the revision is marked unhealthy. gcloud run revisions describe surfaces the underlying Secret Manager IAM denial.
For other deployment shapes, see Sidecar, ECS Fargate, and Azure Container Apps. For receipt querying and key rotation, see Trust Control Plane.