ECS Fargate · Chio Docs

Architecture

The task uses awsvpc network mode, so both containers share an ENI and a single private IP. The sidecar listens on 9090 and is registered to the ALB target group. The app listens on 8080 on the same loopback and is reachable only through the kernel.

ECS Fargate runs the Chio sidecar and app as one task with awsvpc networking. The sidecar is the only ALB target. EFS mounts the OpenAPI spec read-only and the authority seed via an access point; a per-task EBS volume holds the durable SQLite receipt store; Secrets Manager injects the sidecar control token.

Sidecar is the only target group target

Register the ALB target group to port 9090, never 8080. The app's 8080 port mapping is not used for ingress; the sidecar reaches the app over the task's shared loopback.

Manifest Walkthrough

Task-level shape

deploy/ecs/task-definition.json

{
  "family": "agent-tool-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/chio-sidecar-task-role",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }

The task runs on Fargate with awsvpc networking. Total task budget is 0.5 vCPU and 1024 MiB; the per-container values below subdivide that ceiling. Two roles separate concerns:

executionRoleArn: ECS uses this at boot to pull the image from ECR / GHCR, decrypt Secrets Manager values into env vars, and create CloudWatch log groups.
taskRoleArn: the kernel and your app inherit this at runtime for the EFS mounts and any AWS API calls your app makes (S3, KMS, etc.). The sidecar writes receipts to its local SQLite store, so it needs no cloud API permissions for them.
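
A quick way to check that the per-container reservations fit the task-level ceiling is to sum them. The sketch below hard-codes the values from this walkthrough's manifest (384 CPU units and 896 MiB for the app, 128 and 128 for the sidecar); the helper name fits_budget is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm per-container reservations fit inside the task-level ceiling.
# Values are copied from the manifest in this walkthrough.

TASK_CPU=512        # task-level "cpu" (CPU units; 1024 units = 1 vCPU)
TASK_MEMORY=1024    # task-level "memory" (MiB)

# fits_budget LIMIT VALUE... -> exit 0 if the values sum to at most LIMIT.
fits_budget() {
  limit=$1
  shift
  total=0
  for value in "$@"; do
    total=$((total + value))
  done
  [ "$total" -le "$limit" ]
}

fits_budget "$TASK_CPU" 384 128 && echo "cpu: ok"
fits_budget "$TASK_MEMORY" 896 128 && echo "memory: ok"
```

With the reference values both checks pass exactly at the limit, so any increase to one container has to come out of the other or out of a larger task size.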

Application container

deploy/ecs/task-definition.json

    {
      "name": "app",
      "image": "APP_IMAGE_PLACEHOLDER",
      "essential": true,
      "cpu": 384,
      "memory": 896,
      "portMappings": [
        { "containerPort": 8080, "hostPort": 8080, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "CHIO_SIDECAR_URL", "value": "http://localhost:9090" }
      ],
      "dependsOn": [
        { "containerName": "chio-sidecar", "condition": "HEALTHY" }
      ],
      "restartPolicy": {
        "enabled": true,
        "ignoredExitCodes": [],
        "restartAttemptPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agent-tool-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "app",
          "awslogs-create-group": "true"
        }
      }
    }

Two operational contracts here:

dependsOn.condition: HEALTHY blocks the app until the sidecar reports HEALTHY from its docker healthcheck. The app never starts before the kernel is ready.
restartPolicy.enabled: true lets ECS restart a flapping container in place without recycling the entire task. restartAttemptPeriod: 60 means a container must have run for at least 60 seconds before ECS attempts a restart.

Sidecar container

deploy/ecs/task-definition.json

    {
      "name": "chio-sidecar",
      "image": "ghcr.io/backbay-labs/chio-sidecar:latest",
      "essential": true,
      "cpu": 128,
      "memory": 128,
      "portMappings": [
        { "containerPort": 9090, "hostPort": 9090, "protocol": "tcp" }
      ],
      "dockerLabels": {
        "prometheus.io/path": "/metrics",
        "prometheus.io/port": "9090"
      },
      "command": [
        "api",
        "protect",
        "--upstream",
        "http://127.0.0.1:8080",
        "--spec",
        "/etc/chio/spec/openapi.yaml",
        "--listen",
        "0.0.0.0:9090",
        "--receipt-store",
        "/var/lib/chio/receipts.db",
        "--authority-seed-file",
        "/etc/chio/seed/authority.seed"
      ]

command overrides the image's default CMD (--help) so the container stays up and serves requests. The kernel reverse-proxies to 127.0.0.1:8080 after guard evaluation. Two sidecar flags tie the container to the task's storage: --receipt-store points at a durable SQLite audit log on the per-task EBS volume mounted at /var/lib/chio, and the global --authority-seed-file reads the signing seed from the EFS-mounted secret file. The dockerLabels advertise a Prometheus scrape target on the ingress port (9090). The /metrics route answers only a loopback caller or one bearing Authorization: Bearer $CHIO_SIDECAR_CONTROL_TOKEN, so a bare prometheus.io/scrape: "true" label is deliberately omitted.

Sidecar environment and secrets

deploy/ecs/task-definition.json

      "environment": [
        { "name": "CHIO_LOG_LEVEL", "value": "info" }
      ],
      "secrets": [
        {
          "name": "CHIO_SIDECAR_CONTROL_TOKEN",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:chio/sidecar-control-token"
        }
      ]

The sidecar takes all of its operational configuration from the command-array flags above; there is no listen-address, health-path, kernel-config, policy-source, or receipt-sink env var. The lone plain value is CHIO_LOG_LEVEL, and the lone secret is CHIO_SIDECAR_CONTROL_TOKEN, a bearer token that gates the admin and /metrics endpoints. ECS resolves it at task start using the execution role. The signing seed is not a secret env var; it is the EFS-mounted file at /etc/chio/seed/authority.seed, read through --authority-seed-file.

Volume mounts and health check

deploy/ecs/task-definition.json

      "mountPoints": [
        {
          "sourceVolume": "chio-config",
          "containerPath": "/etc/chio",
          "readOnly": true
        },
        {
          "sourceVolume": "chio-seed",
          "containerPath": "/etc/chio/seed",
          "readOnly": true
        },
        {
          "sourceVolume": "chio-receipts",
          "containerPath": "/var/lib/chio",
          "readOnly": false
        }
      ],
      "healthCheck": {
        "command": ["CMD", "/usr/bin/curl", "-fsS", "http://localhost:9090/chio/health"],
        "interval": 10,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 15
      },
      "restartPolicy": {
        "enabled": true,
        "ignoredExitCodes": [],
        "restartAttemptPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agent-tool-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "chio-sidecar",
          "awslogs-create-group": "true"
        }
      },
      "readonlyRootFilesystem": true,
      "user": "65532:65532"
    }

The sidecar mounts three volumes: the read-only OpenAPI-spec share at /etc/chio, the read-only seed access point at /etc/chio/seed, and the read-write per-task EBS volume at /var/lib/chio for the SQLite receipt store. The health check curls the readiness route /chio/health every 10 seconds with a 5-second timeout, marks the container HEALTHY after the first success, and gives a 15-second grace period at startup. After 3 consecutive failures the container is marked unhealthy and ECS recycles the task. Two security defaults are worth keeping:

readonlyRootFilesystem: true prevents in-container writes outside mounted volumes.
user: "65532:65532" runs the kernel as the non-root distroless user baked into the sidecar image.

Volumes

deploy/ecs/task-definition.json

  "volumes": [
    {
      "name": "chio-config",
      "efsVolumeConfiguration": {
        "fileSystemId": "EFS_FILESYSTEM_ID",
        "rootDirectory": "/chio-config",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "iam": "ENABLED"
        }
      }
    },
    {
      "name": "chio-seed",
      "efsVolumeConfiguration": {
        "fileSystemId": "EFS_FILESYSTEM_ID",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "accessPointId": "EFS_SEED_ACCESS_POINT_ID",
          "iam": "ENABLED"
        }
      }
    },
    {
      "name": "chio-receipts",
      "configuredAtLaunch": true
    }
  ]
}

Three volumes back the task. chio-config is EFS, read-only, and holds only the OpenAPI spec at /chio-config/spec/openapi.yaml; there is no kernel config file. chio-seed is a dedicated EFS access point holding the authority seed. Both EFS volumes enable transit encryption and IAM authorization; ECS requires transit encryption whenever an access point or IAM authorization is used. chio-receipts is different: "configuredAtLaunch": true makes it an ECS-managed per-task EBS volume, attached at launch, not EFS. WAL-mode SQLite needs a local (non-network) disk, so the durable receipt store cannot live on EFS.
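
Because chio-receipts is configured at launch, the volume's size and type have to be supplied when the service is created or updated. The sketch below writes a volume configuration file that could be passed with the create-service --volume-configurations option. The 20 GiB size, gp3 type, ext4 filesystem, and the ecsInfrastructureRole role name are assumptions for illustration and are not part of the reference manifest.

```shell
# Sketch: write a volume configuration for the launch-configured
# "chio-receipts" volume. Size, volume type, filesystem, and the role name
# are placeholders chosen for illustration, not values from the manifest.
write_receipts_volume_config() {
  out_file=$1
  cat > "$out_file" <<'EOF'
[
  {
    "name": "chio-receipts",
    "managedEBSVolume": {
      "roleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsInfrastructureRole",
      "volumeType": "gp3",
      "sizeInGiB": 20,
      "filesystemType": "ext4",
      "encrypted": true
    }
  }
]
EOF
}

write_receipts_volume_config ./receipts-volume.json
# Then pass the file at service creation, for example:
#   aws ecs create-service ... --volume-configurations file://receipts-volume.json
```

The roleArn here is an ECS infrastructure role that lets ECS manage the EBS volume on the task's behalf; it is separate from both the execution role and the task role.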

IAM Roles

Two roles separate boot-time and runtime concerns. The execution role is consumed by ECS itself before any container runs; the task role is what the kernel and app see at runtime.

Role	Required actions	Scoped to
Execution	`ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`, `ecr:GetAuthorizationToken`	Image pulls
Execution	`logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents`	`/ecs/agent-tool-server:*`
Execution	`secretsmanager:GetSecretValue`	`chio/sidecar-control-token-*`
Task	`elasticfilesystem:ClientMount`, `elasticfilesystem:ClientWrite`	EFS file system (spec share and seed access point)
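
As one way to express the secrets row of the table, the sketch below writes an inline policy scoped to the control-token secret and attaches it to the execution role. The policy name chio-control-token-read and the local file name are hypothetical; the role name and secret ARN pattern come from the manifest above.

```shell
# Sketch: inline policy granting the execution role read access to the
# control-token secret only. The policy name is a hypothetical label.
write_secret_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:chio/sidecar-control-token-*"
    }
  ]
}
EOF
}

# Writes the policy document, then attaches it to the execution role.
attach_secret_policy() {
  write_secret_policy ./execution-secret-policy.json
  aws iam put-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-name chio-control-token-read \
    --policy-document file://execution-secret-policy.json
}

# Usage (replace ACCOUNT_ID in the document first):
#   attach_secret_policy
```

The trailing -* in the resource matches the random suffix Secrets Manager appends to secret ARNs.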

Container Ordering

The dependsOn array on the app forces the sidecar to reach HEALTHY before the app starts. ECS evaluates the docker health check on the sidecar:

Sidecar starts, opens listener on :9090.
ECS runs curl /chio/health every 10 seconds. Failures during the 15-second startPeriod do not count toward the 3-retry limit.
First successful curl marks sidecar HEALTHY.
App container becomes eligible to start. The app's own probe (configured at the load balancer or your runtime) takes over after that.
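
When running the two containers outside ECS (for example on a workstation), the same gate can be approximated with a small polling loop. This is only a sketch of the retry semantics described above, not how ECS implements them: the wait_healthy helper is hypothetical and it does not model startPeriod.

```shell
# wait_healthy RETRIES INTERVAL PROBE...
# Runs PROBE up to RETRIES times, INTERVAL seconds apart.
# Returns 0 on the first success, 1 once every attempt has failed.
wait_healthy() {
  retries=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt remains.
    if [ "$attempt" -le "$retries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}

# Usage, mirroring the manifest's interval and retries:
#   wait_healthy 3 10 curl -fsS http://localhost:9090/chio/health && start_the_app
```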

Bake curl into the sidecar image

The health check uses /usr/bin/curl. The published ghcr.io/backbay-labs/chio-sidecar image includes curl. If you build a slimmer derivative without curl, switch the health check to ["CMD-SHELL", "wget -qO- http://localhost:9090/chio/health || exit 1"] and confirm wget is present.

Secrets and Mounted Files

Create the single Secrets Manager entry and stage the EFS-resident files before registering the task definition.

bash

# Sidecar control token (bearer token gating admin + /metrics).
$ aws secretsmanager create-secret --name chio/sidecar-control-token \
    --secret-string "$(openssl rand -hex 32)"

# Stage the OpenAPI spec into the chio-config EFS share and the
# authority seed into the chio-seed EFS access point (DataSync or a
# one-shot helper task), layout:
#   /chio-config/spec/openapi.yaml   -> mounted read-only at /etc/chio
#   <seed access point>/authority.seed -> mounted read-only at /etc/chio/seed

The signing seed is delivered as a mounted file, never as key material in a secret env var. To store the control token in SSM Parameter Store instead of Secrets Manager, swap the valueFrom ARN for the SSM parameter ARN and grant ssm:GetParameters on the execution role.

Networking

The task gets its own ENI in the subnet you assign at service creation. Two security groups matter: the task SG (ingress on 9090 from the ALB SG only, no ingress on 8080, egress to outbound dependencies) and the EFS SG (NFS port 2049 ingress from the task SG). Register the ALB target group to port 9090 on the chio-sidecar container:

bash

$ aws ecs create-service --cluster prod-cluster --service-name agent-tool-server \
    --task-definition agent-tool-server:1 --desired-count 2 --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-aaa,subnet-bbb],securityGroups=[sg-task],assignPublicIp=DISABLED}" \
    --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:ACCOUNT_ID:targetgroup/chio-sidecar-tg/abc,containerName=chio-sidecar,containerPort=9090"
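
The two security group rules described above could be created along these lines. The group IDs sg-task, sg-alb, and sg-efs are placeholders (only sg-task appears in the command above), and the helper name is hypothetical.

```shell
# allow_tcp_from_sg TARGET_SG PORT SOURCE_SG
# Adds a TCP ingress rule on TARGET_SG that admits traffic from SOURCE_SG only.
allow_tcp_from_sg() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port "$2" \
    --source-group "$3"
}

# Usage (placeholder group IDs):
#   allow_tcp_from_sg sg-task 9090 sg-alb    # ALB -> sidecar ingress
#   allow_tcp_from_sg sg-efs  2049 sg-task   # task -> EFS over NFS
```

No rule is added for port 8080, which keeps the app reachable only over the task's loopback.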

Configure the ALB target group health check to GET /chio/health on port 9090, expecting HTTP 200. That doubles as load-balancer-level draining: a sidecar that fails the kernel-side health check is removed from the target group before the docker healthcheck recycles the task.
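
A minimal sketch of that health check configuration with the AWS CLI, assuming the target group already exists and its ARN is passed as the first argument (the function name is hypothetical):

```shell
# configure_tg_health_check TARGET_GROUP_ARN
# Points the target group health check at the sidecar's health route.
configure_tg_health_check() {
  aws elbv2 modify-target-group \
    --target-group-arn "$1" \
    --health-check-protocol HTTP \
    --health-check-port 9090 \
    --health-check-path /chio/health \
    --matcher HttpCode=200
}

# Usage:
#   configure_tg_health_check "$TARGET_GROUP_ARN"
```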

Health Probes and Graceful Shutdown

ECS sends SIGTERM to all containers on task stop and waits up to stopTimeout seconds (30 by default; raise it per container in the task definition, up to 120 on Fargate, for a longer drain) before sending SIGKILL. The sidecar handles SIGTERM by stopping ingress, draining in-flight evaluations, flushing receipts to the SQLite receipt store, and exiting.

Pair this with ALB connection draining: set the target group's deregistration delay to a few seconds longer than the in-flight request budget so the load balancer stops sending traffic before the task gets SIGTERM.
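
The deregistration delay is a target group attribute. The sketch below derives it from a request budget plus a margin and applies it; the 20-second budget and 5-second margin in the usage line are illustrative assumptions, and both helper names are hypothetical.

```shell
# drain_delay BUDGET_SECONDS MARGIN_SECONDS -> prints the deregistration delay.
drain_delay() {
  echo $(( $1 + $2 ))
}

# set_drain_delay TARGET_GROUP_ARN SECONDS
set_drain_delay() {
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$1" \
    --attributes "Key=deregistration_delay.timeout_seconds,Value=$2"
}

# Usage: a 20-second in-flight budget plus 5 seconds of margin.
#   set_drain_delay "$TARGET_GROUP_ARN" "$(drain_delay 20 5)"
```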

Scaling

Scale horizontally with the ECS service's desiredCount and Application Auto Scaling target tracking. CPU and ALBRequestCountPerTarget are the two predefined metrics worth wiring first.

bash

# Register the service as a scalable target.
$ aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --resource-id service/prod-cluster/agent-tool-server \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity 2 --max-capacity 50

# Track average CPU at 60% across the service.
$ aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --resource-id service/prod-cluster/agent-tool-server \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-name cpu60 --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
      "TargetValue": 60.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "ScaleOutCooldown": 60, "ScaleInCooldown": 300
    }'

# Blend Fargate Spot for cost: one on-demand base, the rest on Spot.
$ aws ecs update-service --cluster prod-cluster --service agent-tool-server \
    --capacity-provider-strategy \
        capacityProvider=FARGATE,weight=1,base=1 \
        capacityProvider=FARGATE_SPOT,weight=4,base=0

Observability

The task definition routes both containers to /ecs/agent-tool-server with stream prefixes per container. Tail with:

bash

# Live tail (aws logs tail is built into AWS CLI v2; no plugin needed)
$ aws logs tail /ecs/agent-tool-server --follow --since 5m

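# Filter denied receipts. start-query returns a queryId; read the rows with
#   aws logs get-query-results --query-id <query-id>
# (date -d below is GNU date syntax.)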
$ aws logs start-query \
    --log-group-name /ecs/agent-tool-server \
    --start-time $(date -d '1 hour ago' +%s) \
    --end-time $(date +%s) \
    --query-string 'fields @timestamp, @message
                    | filter event = "receipt" and verdict = "deny"
                    | sort @timestamp desc
                    | limit 50'

Set retention on the log group explicitly; CloudWatch defaults to never expire:

bash

$ aws logs put-retention-policy \
    --log-group-name /ecs/agent-tool-server \
    --retention-in-days 30

The sidecar also serves Prometheus /metrics on port 9090 behind the chio/sidecar-control-token gate: a remote scraper must present Authorization: Bearer $CHIO_SIDECAR_CONTROL_TOKEN, while a co-located loopback collector needs no token. For traces, add a third container to the task running the OTel collector and point the kernel at it via the standard OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317. Details are in Observability.
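
A remote scrape could be tested by hand along these lines. The sketch assumes the caller has network access to port 9090 (the security group rules above admit only the ALB, so this would have to go through it or through an extra rule) and permission to read the secret; the fetch_metrics helper is hypothetical.

```shell
# fetch_metrics BASE_URL
# Reads the control token from Secrets Manager and requests /metrics with it.
fetch_metrics() {
  base_url=$1
  token=$(aws secretsmanager get-secret-value \
    --secret-id chio/sidecar-control-token \
    --query SecretString --output text) || return 1
  curl -fsS -H "Authorization: Bearer $token" "$base_url/metrics"
}

# Usage (BASE_URL is whatever address reaches the sidecar listener):
#   fetch_metrics "http://<sidecar-address>:9090"
```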

Cost Considerations

Fargate bills per second for vCPU and memory, plus storage and data transfer. Three knobs dominate: task size (the reference 0.5 vCPU / 1024 MiB shape; drop to 0.25 vCPU only if app and sidecar together fit), Spot share (the FARGATE_SPOT capacity provider can cut per-task cost by up to about 70%; the sidecar drains cleanly inside the 2-minute interruption notice), and log retention (receipts live in the SQLite store on the per-task EBS volume, so CloudWatch holds only process logs; cap its retention at 7-30 days).

Operations

Deploy is two calls: register a task definition revision, then roll the service forward. Rollback points the service at a previous revision number. ECS Exec attaches a shell to a running container.

bash

# Deploy a new revision.
$ aws ecs register-task-definition --cli-input-json file://deploy/ecs/task-definition.json
$ aws ecs update-service --cluster prod-cluster --service agent-tool-server \
    --task-definition agent-tool-server --force-new-deployment

# Roll back to revision 42.
$ aws ecs update-service --cluster prod-cluster --service agent-tool-server \
    --task-definition agent-tool-server:42

# Open a shell on a running sidecar (requires --enable-execute-command on the service).
$ aws ecs execute-command --cluster prod-cluster --task <task-arn> \
    --container chio-sidecar --interactive --command "/bin/sh"

Worked Example

After IAM roles, secrets, and EFS config are in place:

bash

# Register the task definition.
$ aws ecs register-task-definition \
    --cli-input-json file://deploy/ecs/task-definition.json

# Create the service behind the ALB.
$ aws ecs create-service \
    --cluster prod-cluster \
    --service-name agent-tool-server \
    --task-definition agent-tool-server:1 \
    --desired-count 2 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-aaa,subnet-bbb],securityGroups=[sg-task]}" \
    --load-balancers "targetGroupArn=...,containerName=chio-sidecar,containerPort=9090" \
    --enable-execute-command

# Wait for steady state.
$ aws ecs wait services-stable --cluster prod-cluster --services agent-tool-server

# Verify via the ALB.
$ ALB=$(aws elbv2 describe-load-balancers --names chio-alb \
    --query 'LoadBalancers[0].DNSName' --output text)

$ curl -fsS "https://$ALB/chio/health" | jq
{
  "status": "healthy",
  "version": "0.1.0",
  "receipt_backend": "durable",
  "revocation_backend": "durable"
}

# The app is unreachable except through the kernel.
$ curl -fsS "https://$ALB/api/search" \
    -H "Authorization: Bearer $CHIO_CAPABILITY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"query":"hello"}'

If the task keeps recycling

The most common cause is the dependency block: the sidecar fails its docker healthcheck because the EFS mount is empty or the execution role cannot read a referenced secret. Start with aws ecs describe-tasks and look at stoppedReason plus per-container exit codes. EFS denials are reported as ResourceInitializationError before any container runs.
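
The triage step above can be scripted roughly as follows. The helper names are hypothetical; the fields queried (stoppedReason, and each container's name, exitCode, and reason) are part of the describe-tasks output. Stopped tasks are only listed for a limited time after they stop, so run this soon after a failure.

```shell
# latest_stopped_task CLUSTER SERVICE -> prints the ARN of one stopped task.
latest_stopped_task() {
  aws ecs list-tasks \
    --cluster "$1" \
    --service-name "$2" \
    --desired-status STOPPED \
    --query 'taskArns[0]' --output text
}

# explain_stopped_task CLUSTER TASK_ARN -> stop reason plus per-container exit codes.
explain_stopped_task() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --query 'tasks[0].{stoppedReason: stoppedReason, containers: containers[].{name: name, exitCode: exitCode, reason: reason}}' \
    --output json
}

# Usage:
#   task=$(latest_stopped_task prod-cluster agent-tool-server)
#   explain_stopped_task prod-cluster "$task"
```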

For other deployment shapes, see Cloud Run and Azure Container Apps. For the Lambda Extension flavour of AWS-native Chio, see AWS Lambda.
