
Trust Control Plane

The chio trust-control service centralizes capability issuance, revocation, receipt ingestion, and budget accounting for a fleet of kernel nodes. In production it runs as a small replicated cluster with deterministic leader election, monotonic budget guarantees, and non-disruptive authority rotation. A single command, chio trust serve, starts a node that can act as either leader or follower.

Why Run a Cluster

A single trust-control node is fine for development. In production, three requirements make a cluster necessary:

  • Cross-node receipts: one edge kernel writes a receipt; another edge kernel must be able to query it.
  • Cross-node revocation: revoking a capability through one node must be enforced by every other node on the next request.
  • Shared budget accounting: invocation budgets must exhaust consistently across nodes, not independently per-process.

The control plane solves all three from a single shared HTTP surface, so edge kernels stay thin and stateless with respect to trust.


Topology

A recommended deployment runs two or more trust-control nodes in front of a fleet of chio mcp serve-http edges. Every control node owns local durable SQLite state. Writes route to the current leader, follower nodes replicate on a short interval, and edges keep a multi-endpoint client list.

Edge kernels point at a multi-endpoint control-URL list. Inside the cluster one node is the deterministic leader, peers follow on a repair-sync loop, and every node owns its own replicated SQLite stores.

Not a consensus system

This is a deterministic leader plus repair loop. The design target is strong-enough operational HA for normal deployments, not Byzantine fault tolerance. For multi-datacenter consensus, front the cluster with your usual consensus layer.

Leader Rule

The write leader is the lexicographically smallest healthy advertise_url in the cluster membership set. Health is based on successful control-peer syncs and local self-health.

Why this rule:

  • Deterministic and easy to reason about.
  • Requires no external coordinator.
  • Automatic failover when the current leader becomes unhealthy: the next-smallest healthy URL becomes the new leader.
  • Trivial to test because the rule is pure input-to-output on the membership set.
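
The rule above is small enough to sketch as a pure function. This is an illustrative Python sketch, not the chio implementation: the function name pick_leader and the shape of the health map are assumptions, and Python's default string ordering stands in for whatever comparison chio applies to advertise_url values.

```python
from typing import Iterable, Mapping, Optional


def pick_leader(members: Iterable[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the lexicographically smallest healthy advertise_url, or None."""
    # Members missing from the health map are treated as unhealthy (assumption).
    candidates = [url for url in members if healthy.get(url, False)]
    return min(candidates) if candidates else None


members = [
    "https://ctl-c.chio.internal:8940",
    "https://ctl-a.chio.internal:8940",
    "https://ctl-b.chio.internal:8940",
]

# All nodes healthy: the smallest URL leads.
all_up = {url: True for url in members}
print(pick_leader(members, all_up))

# ctl-a becomes unhealthy: the next-smallest healthy URL takes over.
a_down = dict(all_up)
a_down["https://ctl-a.chio.internal:8940"] = False
print(pick_leader(members, a_down))
```

Because the function depends only on the membership set and the health map, every node that sees the same inputs computes the same leader without talking to a coordinator.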

Replication Model

Replication is per-store. Every kind of state has an idempotent replication contract so repair syncs converge even after transient peer failures.

Store | Replication Shape | Merge Rule
Authority | Signed authority snapshots including signing seed, generation, rotated timestamp, trusted-key history | Highest observed generation wins; trusted-key history is the union
Revocations | Idempotent records keyed by capability ID | Union; a revocation cannot be undone by replication
Tool receipts | Idempotent append-only records keyed by receipt ID and sequence | Max observed sequence wins; never deletes
Child receipts | Idempotent append-only records keyed by receipt ID | Max observed sequence wins; never deletes
Budgets | Monotonic usage records keyed by (capability_id, grant_index) | Max observed invocation count wins

Periodic repair sync runs even after successful write forwarding, so missed updates eventually converge. The interval is set with --cluster-sync-interval-ms and defaults to 500 ms.
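
To see why these merge rules converge under repeated repair syncs, here is a minimal Python sketch of the authority and revocation rows. The type AuthoritySnapshot, its fields, and both function names are hypothetical and chosen for illustration; the real snapshot also carries the signing seed and rotated timestamp, which are omitted here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class AuthoritySnapshot:
    generation: int
    trusted_keys: FrozenSet[str]  # hypothetical: key identifiers as strings


def merge_authority(local: AuthoritySnapshot, remote: AuthoritySnapshot) -> AuthoritySnapshot:
    """Highest observed generation wins; trusted-key history is the union."""
    generation = max(local.generation, remote.generation)
    return AuthoritySnapshot(generation, local.trusted_keys | remote.trusted_keys)


def merge_revocations(local: Set[str], remote: Set[str]) -> Set[str]:
    """Union: replication can add a revocation but never remove one."""
    return local | remote


a = AuthoritySnapshot(6, frozenset({"key-5", "key-6"}))
b = AuthoritySnapshot(7, frozenset({"key-6", "key-7"}))

once = merge_authority(a, b)
twice = merge_authority(once, b)  # replaying the same sync changes nothing
print(once.generation, sorted(once.trusted_keys))
print(once == twice)

print(sorted(merge_revocations({"cap-1"}, {"cap-1", "cap-2"})))
```

Both merges are idempotent and order-independent, which is the property that lets a repair sync be retried after a transient peer failure without coordination.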


Budget-State Durability

Invocation budgets live in a pluggable BudgetStore. In distributed mode the store is backed by the control plane, so budget increments route to the current leader and replicate to followers as monotonic usage records.

  • Local mode: in-memory or local SQLite (SqliteBudgetStore). With SQLite, budget exhaustion persists across process restarts.
  • Distributed mode: remote BudgetStore backed by the control plane. Exhaustion semantics stay strong across nodes.
  • Follower merge rule: max observed invocation count per (capability_id, grant_index). Budgets can only ever count up.

Monotonic budgets survive failover

Because the merge rule is max observed, a leader failover can never lower a recorded usage count. At worst, a newly promoted follower briefly serves a slightly stale count until the next repair sync or the next write through the new leader brings it up to the real max. Recorded exhaustion is never forgotten.
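
The max-observed rule can be illustrated with a small in-memory model. This is a sketch under assumptions, not the BudgetStore interface: the class BudgetCounts, its method names, and the sample capability ID are all hypothetical.

```python
from typing import Dict, Tuple

BudgetKey = Tuple[str, int]  # (capability_id, grant_index)


class BudgetCounts:
    """Toy usage counter with a max-observed merge (illustrative only)."""

    def __init__(self) -> None:
        self.counts: Dict[BudgetKey, int] = {}

    def try_invoke(self, key: BudgetKey, limit: int) -> bool:
        """Count one invocation unless the budget is already exhausted."""
        used = self.counts.get(key, 0)
        if used >= limit:
            return False
        self.counts[key] = used + 1
        return True

    def merge_from(self, other: "BudgetCounts") -> None:
        """Keep the max observed count per key; counts never go down."""
        for key, count in other.counts.items():
            self.counts[key] = max(self.counts.get(key, 0), count)


key = ("cap-123", 0)
leader = BudgetCounts()
results = [leader.try_invoke(key, limit=3) for _ in range(4)]
print(results)  # three grants, then exhaustion

# A follower that last synced at count 2 catches up on the next repair sync.
follower = BudgetCounts()
follower.counts[key] = 2
follower.merge_from(leader)

# Merging an older snapshot afterwards cannot roll the count back.
stale = BudgetCounts()
stale.counts[key] = 1
follower.merge_from(stale)
print(follower.counts[key], follower.try_invoke(key, limit=3))
```

The follower ends at the highest count any node has reported, so once it has merged the exhausted record it refuses further invocations no matter which older snapshots arrive later.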

Store Responsibilities

Capability Authority

The capability authority issues ed25519 signatures over capability tokens. The control plane owns the signing seed and the rotated trusted-key history. Edge kernels verify capability signatures against the full trusted set fetched from the control plane on a short TTL.

  • Issue capabilities: POST /v1/capabilities/issue.
  • Read authority state: GET /v1/authority.
  • Rotate authority: POST /v1/authority.

Revocation Store

Revocations are idempotent records keyed by capability ID. Once a capability is revoked on any node, every other node observes the revocation on the next repair sync or the next cross-node request. The edge kernel re-checks revocation status at the start of every tool call, so there is no stale-decision window on the critical path.

  • Query: GET /v1/revocations.
  • Mutate: POST /v1/revocations.
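
The per-call re-check described above can be pictured as a guard that runs before the tool does. The following Python sketch is illustrative only: call_tool, CapabilityRevoked, and the is_revoked callback are hypothetical names, and a local set stands in for a lookup against GET /v1/revocations.

```python
from typing import Callable, Set


class CapabilityRevoked(Exception):
    """Raised when a tool call presents a revoked capability."""


def call_tool(capability_id: str,
              is_revoked: Callable[[str], bool],
              run_tool: Callable[[], str]) -> str:
    # Check revocation first, on every call, so a cached "allowed" decision
    # from an earlier call is never reused.
    if is_revoked(capability_id):
        raise CapabilityRevoked(capability_id)
    return run_tool()


revoked: Set[str] = set()  # stand-in for the revocation store


def lookup(capability_id: str) -> bool:
    return capability_id in revoked


first = call_tool("cap-1", lookup, lambda: "ok")
print(first)

revoked.add("cap-1")  # the capability is revoked between calls
try:
    call_tool("cap-1", lookup, lambda: "ok")
    second = "ran"
except CapabilityRevoked:
    second = "refused"
print(second)
```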

Receipt Store

Tool receipts and child-request receipts are append-only. A receipt emitted on one edge is queryable from another edge as soon as the repair sync has run. Because the store is idempotent on receipt ID, replaying receipts during replication is safe.

  • Append tool receipts: POST /v1/receipts/tools.
  • Query tool receipts: GET /v1/receipts/tools or GET /v1/receipts/query.
  • Append child receipts: POST /v1/receipts/children.
  • Query child receipts: GET /v1/receipts/children.
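
Idempotent append is what makes replaying receipts harmless. A minimal Python sketch of the idea follows; the class ReceiptLog and its methods are hypothetical, and the receipt bodies are placeholder dictionaries rather than the real receipt schema.

```python
from typing import Dict


class ReceiptLog:
    """Toy append-only log that is idempotent on receipt ID (illustrative)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def append(self, receipt_id: str, body: dict) -> bool:
        """Store a receipt; return False if this ID is already present."""
        if receipt_id in self._by_id:
            return False
        self._by_id[receipt_id] = body
        return True

    def replicate_from(self, other: "ReceiptLog") -> int:
        """Replay every receipt from a peer; return how many were new."""
        return sum(self.append(rid, body) for rid, body in other._by_id.items())

    def __len__(self) -> int:
        return len(self._by_id)


edge_a = ReceiptLog()
edge_a.append("r-1", {"tool": "search"})
edge_a.append("r-2", {"tool": "fetch"})

edge_b = ReceiptLog()
first_sync = edge_b.replicate_from(edge_a)
second_sync = edge_b.replicate_from(edge_a)  # replay adds nothing
print(first_sync, second_sync, len(edge_b))
```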

Failover Behavior

When the current leader becomes unhealthy, the next-smallest healthy advertise_url takes over. Edge clients keep a multi-endpoint list and retry writes across the set, so a dead first URL is transparent to callers.

  • Revocations: the new leader already holds the revocation set it received through repair sync, and the union merge never drops a record. No replicated revocation is lost.
  • Receipts: in-flight writes may be retried. Idempotent append on receipt ID makes this safe.
  • Budgets: max-observed merge means the new leader picks up the highest usage count seen anywhere. Monotonic guarantee preserved.
  • Authority: the trusted-key history is the union across nodes, so existing capabilities keep verifying under the new leader without restart.

Client URL lists must include followers

--control-url accepts a comma-separated cluster endpoint list. Configure every edge with every control node. Followers forward writes to the current leader, so any endpoint in the list is a valid entry point.
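
As a rough picture of what a multi-endpoint client does with that list, the Python sketch below parses a comma-separated value and tries each endpoint in order. It is not the chio client: parse_control_urls, post_with_failover, the injected send callable, and the payload shape are all assumptions made for illustration, and fake_send replaces real HTTP so the example runs offline.

```python
from typing import Callable, List


def parse_control_urls(value: str) -> List[str]:
    """Split a comma-separated endpoint list, dropping blanks and whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


def post_with_failover(urls: List[str], path: str, payload: dict,
                       send: Callable[[str, dict], dict]) -> dict:
    """Try each endpoint in order; return the first successful response."""
    last_error = None
    for base in urls:
        try:
            return send(base.rstrip("/") + path, payload)
        except ConnectionError as exc:  # endpoint unreachable, try the next one
            last_error = exc
    raise ConnectionError(f"all {len(urls)} control endpoints failed") from last_error


def fake_send(url: str, payload: dict) -> dict:
    """Hypothetical transport: ctl-a is down, every other node accepts the write."""
    if "ctl-a" in url:
        raise ConnectionError("ctl-a unreachable")
    return {"accepted_by": url, "receipt_id": payload["receipt_id"]}


urls = parse_control_urls(
    "https://ctl-a.chio.internal:8940,https://ctl-b.chio.internal:8940,"
    "https://ctl-c.chio.internal:8940"
)
response = post_with_failover(urls, "/v1/receipts/tools", {"receipt_id": "r-1"}, fake_send)
print(response["accepted_by"])
```

Retrying the same write against another endpoint is only safe because the stores are idempotent on their keys, as described in the bullets above.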

Running a Node

Single-node (development)

bash
$ chio trust serve \
    --bind 127.0.0.1:8940 \
    --authority-db ./state/authority.sqlite \
    --revocation-db ./state/revocations.sqlite \
    --receipt-db ./state/receipts.sqlite \
    --budget-db ./state/budgets.sqlite \
    --admin-token-file ./secrets/admin.txt

Three-node cluster

ctl-a.yaml
trust_control:
  bind: 0.0.0.0:8940
  advertise_url: https://ctl-a.chio.internal:8940
  peer_urls:
    - https://ctl-b.chio.internal:8940
    - https://ctl-c.chio.internal:8940
  cluster_sync_interval_ms: 500

  authority_db: /var/lib/chio/authority.sqlite
  revocation_db: /var/lib/chio/revocations.sqlite
  receipt_db:   /var/lib/chio/receipts.sqlite
  budget_db:    /var/lib/chio/budgets.sqlite

  service_token_file: /run/secrets/chio_service
  admin_token_file:   /run/secrets/chio_admin

bash
$ chio trust serve --config ./ctl-a.yaml

Run the same command with ctl-b.yaml and ctl-c.yaml on the other two hosts, each advertising its own URL and listing the other two as peers. The cluster self-organizes; the smallest healthy URL becomes the leader.

Edge kernel wiring

bash
$ chio mcp serve-http \
    --control-url https://ctl-a.chio.internal:8940,https://ctl-b.chio.internal:8940,https://ctl-c.chio.internal:8940 \
    --control-token-file /run/secrets/chio_service

Local and remote are exclusive

Edge kernels pick one mode: remote stores via --control-url and --control-token, or local stores via --receipt-db, --revocation-db, and --authority-*. Never both at the same time.

Health and Status Endpoints

Every node exposes health and status for operators and load balancers:

Endpoint | Surfaces
GET /health | Liveness. Returns 200 when the node can serve reads.
GET /v1/cluster/status | Current leader URL, peer membership, peer-sync timestamps, replication positions per store.
GET /v1/authority | Current authority generation, rotated timestamp, trusted-key history.
GET /v1/receipts/query | Filterable receipt read surface used by the dashboard and the CLI.

bash
$ curl -s -H "Authorization: Bearer $CHIO_ADMIN" \
    https://ctl-a.chio.internal:8940/v1/cluster/status | jq .
{
  "self": "https://ctl-a.chio.internal:8940",
  "leader": "https://ctl-a.chio.internal:8940",
  "peers": [
    {
      "url": "https://ctl-b.chio.internal:8940",
      "healthy": true,
      "last_sync_unix": 1765012345,
      "receipt_seq": 918422,
      "revocation_seq": 812,
      "authority_generation": 7,
      "budget_seq": 41234
    }
  ]
}
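
For scripted checks, the status document can be reduced to a few operator-facing facts. The Python sketch below works on a copy of the sample response above; the summarize function and the keys it returns are hypothetical, and the field names are taken from that sample rather than from a published schema.

```python
import json

# Copy of the sample /v1/cluster/status response shown above.
STATUS = """
{
  "self": "https://ctl-a.chio.internal:8940",
  "leader": "https://ctl-a.chio.internal:8940",
  "peers": [
    {
      "url": "https://ctl-b.chio.internal:8940",
      "healthy": true,
      "last_sync_unix": 1765012345,
      "receipt_seq": 918422,
      "revocation_seq": 812,
      "authority_generation": 7,
      "budget_seq": 41234
    }
  ]
}
"""


def summarize(status: dict) -> dict:
    """Pull out leadership, unhealthy peers, and the oldest peer sync time."""
    peers = status["peers"]
    return {
        "is_leader": status["self"] == status["leader"],
        "unhealthy_peers": [p["url"] for p in peers if not p["healthy"]],
        "oldest_sync_unix": min((p["last_sync_unix"] for p in peers), default=None),
    }


summary = summarize(json.loads(STATUS))
print(summary)
```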

Key Rotation

The authority signing seed is rotated with chio trust rotate-authority. Rotation is non-disruptive: existing capabilities keep verifying because the kernel verifies against the full trusted-key history, not only the current key.

bash
# Rotate the authority signing seed on the current leader.
# Existing capabilities remain valid under the trusted-key history.
$ chio trust rotate-authority \
    --control-url https://ctl-a.chio.internal:8940 \
    --control-token-file ./secrets/chio_admin.txt

# Confirm the new generation has propagated.
$ chio trust authority-status \
    --control-url https://ctl-a.chio.internal:8940

Rotation invariants:

  1. Capabilities issued under the previous seed still verify. Existing live sessions keep working.
  2. New capabilities are signed under the new seed.
  3. Trusted-key history replicates to every follower as part of the authority snapshot. Remote authority clients refresh trusted-key state on a short TTL, so every edge picks up the new generation without a process restart.
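
The first two invariants follow from verifying against the whole key history. The toy model below shows the mechanism in Python, with one loud caveat: it uses HMAC-SHA256 from the standard library as a stand-in for ed25519, so the "keys" here are shared secrets, whereas real verifiers would hold only public keys. The class ToyAuthority and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import List


class ToyAuthority:
    """Toy signer with a trusted-key history. HMAC stands in for ed25519."""

    def __init__(self) -> None:
        self.generation = 0
        self.trusted_keys: List[bytes] = []
        self.rotate()  # create the first key

    def rotate(self) -> None:
        """Add a fresh key and bump the generation; old keys stay trusted."""
        self.trusted_keys.append(secrets.token_bytes(32))
        self.generation += 1

    def sign(self, token: bytes) -> bytes:
        """Sign with the current (most recent) key only."""
        return hmac.new(self.trusted_keys[-1], token, hashlib.sha256).digest()

    def verify(self, token: bytes, tag: bytes) -> bool:
        """Accept a tag produced under any key in the trusted history."""
        return any(
            hmac.compare_digest(hmac.new(key, token, hashlib.sha256).digest(), tag)
            for key in self.trusted_keys
        )


authority = ToyAuthority()
old_tag = authority.sign(b"cap-1")

authority.rotate()
new_tag = authority.sign(b"cap-2")

print(authority.generation)
print(authority.verify(b"cap-1", old_tag))  # issued before rotation
print(authority.verify(b"cap-2", new_tag))  # issued after rotation
print(authority.verify(b"cap-2", old_tag))  # tag does not match this token
```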

Rotate the authority seed and the admin token separately

The authority signing seed and the control-plane admin bearer token are distinct credentials with distinct blast radii. Rotate them on independent schedules and store them in separate secret stores. Never reuse one secret for both.

Security Stance

The control plane enforces a small and boring set of security invariants:

  • Kernel-mediated trust: the control plane stores and issues trust state. It never bypasses kernel checks. Every tool call still runs the full guard pipeline at the edge.
  • Separate service and admin tokens: service tokens authenticate edge kernels; admin tokens authenticate operators. Rotate independently.
  • HTTPS everywhere: terminate TLS in front of both the control plane and the hosted MCP / auth plane.
  • Dedicated auth signing seed: when hosted OAuth is enabled, use a signing seed distinct from the capability authority.
  • Shared budget in every multi-node deployment: never run edges against independent local budget stores in production. Budgets must exhaust once, everywhere.

Non-Goals

The trust control plane is deliberately scoped. It does not attempt:

  • Multi-datacenter consensus.
  • Byzantine quorum rotation.
  • HSM-backed signing.
  • Dynamic user login or identity-provider federation.

Those remain follow-on work. The goal here is strong single-region HA plus real hosted OAuth behavior, on top of the same kernel extension seams (CapabilityAuthority, RevocationStore, ReceiptStore, BudgetStore) already used in single-node mode.

For background on how the control plane fits into the broader economic story, see Economics and Trust Model.
