Trust Control Plane
The chio trust-control service centralizes capability issuance, revocation, receipt ingestion, and budget accounting for a fleet of kernel nodes. In production it runs as a small replicated cluster with deterministic leader election, monotonic budget guarantees, and seamless authority rotation. One command, chio trust serve, starts a node that knows how to be either a leader or a follower.
Why Run a Cluster
A single trust-control node is fine for development. In production, three invariants make a cluster non-optional:
- Cross-node receipts: one edge kernel writes a receipt; another edge kernel must be able to query it.
- Cross-node revocation: revoking a capability through one node must be enforced by every other node on the next request.
- Shared budget accounting: invocation budgets must exhaust consistently across nodes, not independently per-process.
The control plane solves all three from a single shared HTTP surface, so edge kernels stay thin and stateless with respect to trust.
Topology
A recommended deployment runs two or more trust-control nodes in front of a fleet of chio mcp serve-http edges. Every control node owns local durable SQLite state. Writes route to the current leader, follower nodes replicate on a short interval, and edges keep a multi-endpoint client list.
Not a consensus system
Leader Rule
The write leader is the lexicographically smallest healthy advertise_url in the cluster membership set. Health is based on successful control-peer syncs and local self-health.
Why this rule:
- Deterministic and easy to reason about.
- Requires no external coordinator.
- Automatic failover when the current leader goes unhealthy: the next smallest URL becomes the new leader.
- Trivial to test because the rule is pure input-to-output on the membership set.
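Because the rule is a pure function of the healthy membership set, it can be sketched in a few lines. The function name `pick_leader` and the `members` dictionary below are illustrative assumptions, not part of the chio codebase; only the "smallest healthy advertise_url" rule comes from the text above.

```python
from typing import Iterable, Optional


def pick_leader(healthy_urls: Iterable[str]) -> Optional[str]:
    """Return the lexicographically smallest healthy advertise URL, or None."""
    # min() on strings compares code points, which gives a plain lexicographic order.
    return min(healthy_urls, default=None)


# Hypothetical membership view: advertise_url -> healthy flag.
members = {
    "https://ctl-a.chio.internal:8940": True,
    "https://ctl-b.chio.internal:8940": True,
    "https://ctl-c.chio.internal:8940": True,
}

healthy = [url for url, ok in members.items() if ok]
print(pick_leader(healthy))  # https://ctl-a.chio.internal:8940

# Simulate ctl-a going unhealthy: the next smallest URL takes over.
members["https://ctl-a.chio.internal:8940"] = False
healthy = [url for url, ok in members.items() if ok]
print(pick_leader(healthy))  # https://ctl-b.chio.internal:8940
```

Every node that sees the same healthy set computes the same answer. Nodes with different health views can disagree for a short time.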
Replication Model
Replication is per-store. Every kind of state has an idempotent replication contract so repair syncs converge even after transient peer failures.
| Store | Replication Shape | Merge Rule |
|---|---|---|
| Authority | Signed authority snapshots including signing seed, generation, rotated timestamp, trusted-key history | Highest observed generation wins; trusted-key history is the union |
| Revocations | Idempotent records keyed by capability ID | Union; a revocation cannot be undone by replication |
| Tool receipts | Idempotent append-only records keyed by receipt ID and sequence | Max observed sequence wins; never deletes |
| Child receipts | Idempotent append-only records keyed by receipt ID | Max observed sequence wins; never deletes |
| Budgets | Monotonic usage records keyed by (capability_id, grant_index) | Max observed invocation count wins |
Periodic repair sync runs even after successful write forwarding, so missed updates eventually converge. The default interval is --cluster-sync-interval-ms 500.
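The merge rules in the table can be illustrated with a small sketch. The `AuthoritySnapshot` type, its field names, and both merge functions are hypothetical and chosen only for illustration; the real snapshot is signed and carries more fields. The point is that applying the same merge in either order, or repeatedly, yields the same state.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class AuthoritySnapshot:
    # Hypothetical, simplified shape of an authority snapshot.
    generation: int
    current_key: str
    trusted_keys: FrozenSet[str] = field(default_factory=frozenset)


def merge_authority(local: AuthoritySnapshot, remote: AuthoritySnapshot) -> AuthoritySnapshot:
    """Highest generation wins; trusted-key history is the union of both sides."""
    winner = remote if remote.generation > local.generation else local
    return AuthoritySnapshot(
        generation=winner.generation,
        current_key=winner.current_key,
        trusted_keys=local.trusted_keys | remote.trusted_keys,
    )


def merge_revocations(local: FrozenSet[str], remote: FrozenSet[str]) -> FrozenSet[str]:
    """Union of revoked capability IDs; nothing is ever removed."""
    return local | remote


a = AuthoritySnapshot(6, "key-6", frozenset({"key-5", "key-6"}))
b = AuthoritySnapshot(7, "key-7", frozenset({"key-6", "key-7"}))

merged = merge_authority(a, b)
print(merged.generation, sorted(merged.trusted_keys))  # 7 ['key-5', 'key-6', 'key-7']

# Order does not matter, and re-applying a merge changes nothing.
assert merge_authority(b, a) == merged
assert merge_authority(merged, b) == merged
assert merge_revocations(frozenset({"cap-1"}), frozenset({"cap-2"})) == frozenset({"cap-1", "cap-2"})
```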
Budget-State Durability
Invocation budgets live in a pluggable BudgetStore. In distributed mode the store is backed by the control plane, so budget increments route to the current leader and replicate to followers as monotonic usage records.
- Local mode: in-memory or local SQLite (SqliteBudgetStore). With SQLite, budget exhaustion persists across process restarts.
- Distributed mode: remote BudgetStore backed by the control plane. Exhaustion semantics stay strong across nodes.
- Follower merge rule: max observed invocation count per (capability_id, grant_index). Budgets can only ever count up.
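The follower merge rule can be sketched as a per-key maximum. The `merge_budgets` function and the sample capability IDs are assumptions made for illustration; only the key shape `(capability_id, grant_index)` and the max-observed rule come from the text.

```python
from typing import Dict, Tuple

BudgetKey = Tuple[str, int]  # (capability_id, grant_index)


def merge_budgets(local: Dict[BudgetKey, int], remote: Dict[BudgetKey, int]) -> Dict[BudgetKey, int]:
    """Keep the highest invocation count seen for each key; counts never go down."""
    merged = dict(local)
    for key, count in remote.items():
        merged[key] = max(merged.get(key, 0), count)
    return merged


follower = {("cap-1", 0): 4, ("cap-2", 0): 9}
leader = {("cap-1", 0): 7, ("cap-2", 0): 3}

print(merge_budgets(follower, leader))  # {('cap-1', 0): 7, ('cap-2', 0): 9}
```

A stale record from a lagging peer therefore cannot lower a count that another node has already observed.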
Monotonic budgets survive failover
Store Responsibilities
Capability Authority
The capability authority issues ed25519 signatures over capability tokens. The control plane owns the signing seed and the rotated trusted-key history. Edge kernels verify capability signatures against the full trusted set fetched from the control plane on a short TTL.
- Issue capabilities: POST /v1/capabilities/issue.
- Read authority state: GET /v1/authority.
- Rotate authority: POST /v1/authority.
Revocation Store
Revocations are idempotent records keyed by capability ID. Once a capability is revoked on any node, every other node observes the revocation on the next repair sync or the next cross-node request. The edge kernel re-checks revocation status at the start of every tool call, so there is no stale-decision window on the critical path.
- Query: GET /v1/revocations.
- Mutate: POST /v1/revocations.
Receipt Store
Tool receipts and child-request receipts are append-only. A receipt emitted on one edge is queryable from another edge as soon as the repair sync has run. Because the store is idempotent on receipt ID, replaying receipts during replication is safe.
- Append tool receipts: POST /v1/receipts/tools.
- Query tool receipts: GET /v1/receipts/tools or GET /v1/receipts/query.
- Append child receipts: POST /v1/receipts/children.
- Query child receipts: GET /v1/receipts/children.
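Idempotent append on receipt ID is what makes replay during replication safe. The toy `ReceiptLog` class below is a hypothetical in-memory stand-in, not the actual ReceiptStore; it only shows that appending the same receipt twice leaves the store unchanged.

```python
from typing import Dict


class ReceiptLog:
    """Toy append-only store that is idempotent on receipt ID."""

    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def append(self, receipt_id: str, body: dict) -> bool:
        """Store the receipt once. Return True if it was new, False on replay."""
        if receipt_id in self._by_id:
            return False
        self._by_id[receipt_id] = body
        return True

    def __len__(self) -> int:
        return len(self._by_id)


log = ReceiptLog()
print(log.append("r-1", {"tool": "search"}))  # True
print(log.append("r-1", {"tool": "search"}))  # False (replayed, ignored)
print(len(log))  # 1
```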
Failover Behavior
When the current leader becomes unhealthy, the next smallest advertised URL takes over. Edge clients keep a multi-endpoint list and retry writes across the set, so a dead first URL is transparent.
- Revocations: the new leader already has the last-known revocation set from repair sync. No revocation is lost.
- Receipts: in-flight writes may be retried. Idempotent append on receipt ID makes this safe.
- Budgets: max-observed merge means the new leader picks up the highest usage count seen anywhere. Monotonic guarantee preserved.
- Authority: the trusted-key history is the union across nodes, so existing capabilities keep verifying under the new leader without restart.
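The client-side half of failover can be sketched as an ordered retry across the endpoint list. The `write_with_failover` helper and the `fake_send` transport are hypothetical; a real edge client would issue an HTTP request where `send` is called, and the retry is only safe because the writes are idempotent.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def write_with_failover(endpoints: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    last_error = None
    for url in endpoints:
        try:
            return send(url)
        except ConnectionError as err:
            # Remember the failure and move on to the next control node.
            last_error = err
    raise ConnectionError(f"all {len(endpoints)} control endpoints failed") from last_error


def fake_send(url: str) -> str:
    # Stand-in transport: pretend ctl-a is down and every other node accepts.
    if "ctl-a" in url:
        raise ConnectionError("ctl-a is down")
    return f"accepted by {url}"


endpoints = [
    "https://ctl-a.chio.internal:8940",
    "https://ctl-b.chio.internal:8940",
    "https://ctl-c.chio.internal:8940",
]
print(write_with_failover(endpoints, fake_send))  # accepted by https://ctl-b.chio.internal:8940
```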
Client URL lists must include followers
--control-url accepts a comma-separated cluster endpoint list. Configure every edge with every control node. Followers forward writes to the current leader, so any endpoint in the list is a valid entry point.
Running a Node
Single-node (development)
$ chio trust serve \
    --bind 127.0.0.1:8940 \
    --authority-db ./state/authority.sqlite \
    --revocation-db ./state/revocations.sqlite \
    --receipt-db ./state/receipts.sqlite \
    --budget-db ./state/budgets.sqlite \
    --admin-token-file ./secrets/admin.txt
Three-node cluster
trust_control:
  bind: 0.0.0.0:8940
  advertise_url: https://ctl-a.chio.internal:8940
  peer_urls:
    - https://ctl-b.chio.internal:8940
    - https://ctl-c.chio.internal:8940
  cluster_sync_interval_ms: 500
  authority_db: /var/lib/chio/authority.sqlite
  revocation_db: /var/lib/chio/revocations.sqlite
  receipt_db: /var/lib/chio/receipts.sqlite
  budget_db: /var/lib/chio/budgets.sqlite
  service_token_file: /run/secrets/chio_service
  admin_token_file: /run/secrets/chio_admin
$ chio trust serve --config ./ctl-a.yaml
Run the same command with ctl-b.yaml and ctl-c.yaml on the other two hosts, each advertising its own URL and listing the other two as peers. The cluster self-organizes; the smallest healthy URL becomes the leader.
Edge kernel wiring
$ chio mcp serve-http \
    --control-url https://ctl-a.chio.internal:8940,https://ctl-b.chio.internal:8940,https://ctl-c.chio.internal:8940 \
    --control-token-file /run/secrets/chio_service
Local and remote are exclusive
Configure an edge with either the remote control plane via --control-url and --control-token, or local stores via --receipt-db, --revocation-db, and --authority-*. Never both at the same time.
Health and Status Endpoints
Every node exposes health and status for operators and load balancers:
| Endpoint | Surfaces |
|---|---|
| GET /health | Liveness. Returns 200 when the node can serve reads. |
| GET /v1/cluster/status | Current leader URL, peer membership, peer-sync timestamps, replication positions per store. |
| GET /v1/authority | Current authority generation, rotated timestamp, trusted-key history. |
| GET /v1/receipts/query | Filterable receipt read surface used by the dashboard and the CLI. |
$ curl -s -H "Authorization: Bearer $CHIO_ADMIN" \
    https://ctl-a.chio.internal:8940/v1/cluster/status | jq
{
  "self": "https://ctl-a.chio.internal:8940",
  "leader": "https://ctl-a.chio.internal:8940",
  "peers": [
    {
      "url": "https://ctl-b.chio.internal:8940",
      "healthy": true,
      "last_sync_unix": 1765012345,
      "receipt_seq": 918422,
      "revocation_seq": 812,
      "authority_generation": 7,
      "budget_seq": 41234
    }
  ]
}
Key Rotation
The authority signing seed is rotated with chio trust rotate-authority. Rotation is non-disruptive: existing capabilities keep verifying because the kernel verifies against the full trusted-key history, not only the current key.
# Rotate the authority signing seed on the current leader.
# Existing capabilities remain valid under the trusted-key history.
$ chio trust rotate-authority \
    --control-url https://ctl-a.chio.internal:8940 \
    --control-token-file ./secrets/chio_admin.txt
# Confirm the new generation has propagated.
$ chio trust authority-status \
    --control-url https://ctl-a.chio.internal:8940
Rotation invariants:
- Capabilities issued under the previous seed still verify. Existing live sessions keep working.
- New capabilities are signed under the new seed.
- Trusted-key history replicates to every follower as part of the authority snapshot. Remote authority clients refresh trusted-key state on a short TTL, so every edge picks up the new generation without a process restart.
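Verification against the trusted-key history, rather than only the current key, can be sketched as follows. This is an illustration under loud assumptions: HMAC-SHA256 stands in for ed25519 so the example runs with the standard library alone, and the `sign` and `verifies` helpers are hypothetical. The real authority uses asymmetric signatures and the kernel holds only public keys.

```python
import hashlib
import hmac
from typing import Iterable


def sign(key: bytes, token: bytes) -> bytes:
    # Stand-in for an ed25519 signature; HMAC keeps the sketch stdlib-only.
    return hmac.new(key, token, hashlib.sha256).digest()


def verifies(token: bytes, signature: bytes, trusted_keys: Iterable[bytes]) -> bool:
    """Accept the token if any key in the trusted history produced the signature."""
    return any(hmac.compare_digest(sign(k, token), signature) for k in trusted_keys)


old_key, new_key = b"generation-6-seed", b"generation-7-seed"
token = b"capability:cap-1"
old_signature = sign(old_key, token)  # issued before rotation

# After rotation the history holds both keys, so the old capability still verifies.
print(verifies(token, old_signature, [old_key, new_key]))  # True

# Checking against the current key alone would reject it.
print(verifies(token, old_signature, [new_key]))  # False
```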
Rotate the authority seed and the admin token separately
Security Stance
The control plane enforces a small and boring set of security invariants:
- Kernel-mediated trust: the control plane stores and issues trust state. It never bypasses kernel checks. Every tool call still runs the full guard pipeline at the edge.
- Separate service and admin tokens: service tokens authenticate edge kernels; admin tokens authenticate operators. Rotate independently.
- HTTPS everywhere: terminate TLS in front of both the control plane and the hosted MCP / auth plane.
- Dedicated auth signing seed: when hosted OAuth is enabled, use a signing seed distinct from the capability authority.
- Shared budget in every multi-node deployment: never run edges against independent local budget stores in production. Budgets must exhaust once, everywhere.
Non-Goals
The trust control plane is deliberately scoped. It does not attempt:
- Multi-datacenter consensus.
- Byzantine quorum rotation.
- HSM-backed signing.
- Dynamic user login or identity-provider federation.
Those remain follow-on work. The goal here is strong single-region HA plus real hosted OAuth behavior, on top of the same kernel extension seams (CapabilityAuthority, RevocationStore, ReceiptStore, BudgetStore) already used in single-node mode.
For background on how the control plane fits into the broader economic story, see Economics and Trust Model.