Trust Control Plane · Chio Docs

Why Run a Cluster

A single trust-control node is fine for development. In production, three invariants make a cluster non-optional:

Cross-node receipts: one edge kernel writes a receipt; another edge kernel must be able to query it.
Cross-node revocation: revoking a capability through one node must be enforced by every other node on the next request.
Shared budget accounting: invocation budgets must exhaust consistently across nodes, not independently per-process.

The control plane exposes all three through one shared HTTP API, so edge kernels stay thin and stateless with respect to trust.

Topology

Run two or more trust-control nodes in front of a fleet of chio mcp serve-http edges. Every control node owns local durable SQLite state. Writes route to the current leader, follower nodes replicate on a short interval, and edges keep a multi-endpoint client list.

Edge kernels point at a multi-endpoint control-URL list. Inside the cluster one node is the deterministic leader, peers follow on a repair-sync loop, and every node owns its own replicated SQLite stores.

Not a consensus system

This is a deterministic leader with a repair loop. It provides operational high availability; it does not provide Byzantine fault tolerance. For multi-datacenter consensus, place the cluster behind your existing consensus system.

Leader Rule

The write leader is the lexicographically smallest healthy advertise_url in the cluster membership set. Health is based on successful control-peer syncs and local self-health.

Why this rule:

Deterministic and easy to reason about.
Requires no external coordinator.
Automatic failover when the current leader goes unhealthy: the next smallest URL becomes the new leader.
Trivial to test because the rule is pure input-to-output on the membership set.
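
The rule can be sketched as a pure function over the membership set. This is a minimal illustration, not Chio's implementation: the `elect_leader` name and the shape of the `health` mapping are hypothetical, and plain string comparison stands in for whatever ordering the service applies to `advertise_url` values.

```python
from typing import Mapping, Optional


def elect_leader(health: Mapping[str, bool]) -> Optional[str]:
    """Return the lexicographically smallest healthy advertise_url, or None.

    `health` maps each member's advertise_url to a health flag. Both the
    function name and the mapping shape are assumptions for illustration.
    """
    healthy = [url for url, ok in health.items() if ok]
    return min(healthy) if healthy else None


members = {
    "https://ctl-a.chio.internal:8940": True,
    "https://ctl-b.chio.internal:8940": True,
    "https://ctl-c.chio.internal:8940": True,
}
leader_before = elect_leader(members)

# Simulate ctl-a going unhealthy: the next smallest URL takes over.
members["https://ctl-a.chio.internal:8940"] = False
leader_after = elect_leader(members)
```

Because the function depends only on its input, every node that sees the same membership and health view arrives at the same leader without talking to a coordinator.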

Replication Model

Replication is per-store. Every kind of state has an idempotent replication contract so repair syncs converge even after transient peer failures.

Store	Replication Shape	Merge Rule
Authority	Signed authority snapshots including signing seed, generation, rotated timestamp, trusted-key history	Highest observed generation wins; trusted-key history is the union
Revocations	Idempotent records keyed by capability ID	Union; a revocation cannot be undone by replication
Tool receipts	Idempotent append-only records keyed by receipt ID and sequence	Max observed sequence wins; never deletes
Child receipts	Idempotent append-only records keyed by receipt ID	Max observed sequence wins; never deletes
Budgets	Monotonic usage records keyed by `(capability_id, grant_index)`	Max observed invocation count wins

Periodic repair sync runs even after successful write forwarding, so missed updates eventually converge. The interval is set with --cluster-sync-interval-ms and defaults to 500 ms.
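
To see why repair syncs converge, the merge rules in the table can be modelled as one function that is idempotent and order-independent. The sketch below is an assumption-heavy simplification: `NodeState` and `merge` are hypothetical names, keys and capability IDs are opaque strings, receipts are omitted, and the authority snapshot is reduced to its generation number and trusted-key set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class NodeState:
    """Hypothetical in-memory stand-in for a node's replicated stores."""
    authority_generation: int = 0
    trusted_keys: Set[str] = field(default_factory=set)
    revoked: Set[str] = field(default_factory=set)
    # Usage keyed by (capability_id, grant_index), as in the table above.
    budget_usage: Dict[Tuple[str, int], int] = field(default_factory=dict)


def merge(local: NodeState, remote: NodeState) -> NodeState:
    """Combine two states: max generation, set unions, max usage counts."""
    usage = dict(local.budget_usage)
    for key, count in remote.budget_usage.items():
        usage[key] = max(usage.get(key, 0), count)
    return NodeState(
        authority_generation=max(
            local.authority_generation, remote.authority_generation
        ),
        trusted_keys=local.trusted_keys | remote.trusted_keys,
        revoked=local.revoked | remote.revoked,
        budget_usage=usage,
    )


a = NodeState(3, {"k1", "k2"}, {"cap-1"}, {("cap-9", 0): 4})
b = NodeState(4, {"k2", "k3"}, {"cap-2"}, {("cap-9", 0): 7, ("cap-9", 1): 1})
merged = merge(a, b)

# Replaying the same peer state again changes nothing (idempotent),
# and the merge order does not matter (commutative).
replayed = merge(merged, b)
reversed_order = merge(b, a)
```

Under these assumptions a sync that is retried, duplicated, or applied in a different order leaves every node with the same state, which is the property the repair loop relies on.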

Budget-State Durability

Invocation budgets live in a pluggable BudgetStore. In distributed mode the store is backed by the control plane, so budget increments route to the current leader and replicate to followers as monotonic usage records.

Local mode: in-memory or local SQLite (SqliteBudgetStore). With the SQLite store, budget exhaustion persists across process restarts.
Distributed mode: remote BudgetStore backed by the control plane. Exhaustion semantics stay strong across nodes.
Follower merge rule: max observed invocation count per (capability_id, grant_index). Budgets can only ever count up.

Budget behavior after failover

Because the merge rule is max observed, a leader failover cannot hand out extra budget. At worst the follower serves a slightly stale usage count and the next write through the new leader picks up the highest observed count. An exhausted grant remains exhausted.
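
A small simulation makes the failover argument concrete. It is only a sketch: `UsageCounter`, `try_consume`, and the `limit` parameter are hypothetical and are not the real BudgetStore interface, and the three counters stand in for three control nodes.

```python
from typing import Dict, Tuple

GrantKey = Tuple[str, int]  # (capability_id, grant_index)


class UsageCounter:
    """Hypothetical monotonic usage table; not the real BudgetStore API."""

    def __init__(self) -> None:
        self.used: Dict[GrantKey, int] = {}

    def try_consume(self, key: GrantKey, limit: int) -> bool:
        """Count one invocation unless the grant is already exhausted."""
        current = self.used.get(key, 0)
        if current >= limit:
            return False
        self.used[key] = current + 1
        return True

    def merge_from(self, peer: "UsageCounter") -> None:
        """Max-observed merge: counts only ever move up."""
        for key, count in peer.used.items():
            self.used[key] = max(self.used.get(key, 0), count)


key = ("cap-42", 0)
old_leader, new_leader, peer = UsageCounter(), UsageCounter(), UsageCounter()

# The old leader serves all three invocations of a limit-3 grant.
served = [old_leader.try_consume(key, limit=3) for _ in range(3)]

# Before the old leader fails, one peer synced the final count and the
# other only saw two invocations.
peer.merge_from(old_leader)
new_leader.used[key] = 2

# Repair sync after failover: the stale node adopts the highest count seen.
new_leader.merge_from(peer)
allowed_after_failover = new_leader.try_consume(key, limit=3)

# Merging in the other direction never lowers a count.
stale = UsageCounter()
stale.used[key] = 1
peer.merge_from(stale)
```

In this model the new leader refuses the fourth invocation once it has merged the highest observed count, and a stale peer can never pull a count back down.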

Store Responsibilities

Capability Authority

The capability authority issues ed25519 signatures over capability tokens. The control plane owns the signing seed and the rotated trusted-key history. Edge kernels verify capability signatures against the full trusted set fetched from the control plane on a short TTL.

Issue capabilities: POST /v1/capabilities/issue.
Read authority state: GET /v1/authority.
Rotate authority: POST /v1/authority.

Revocation Store

Revocations are idempotent records keyed by capability ID. Once a capability is revoked on any node, every other node observes the revocation on the next repair sync or the next cross-node request. The edge kernel re-checks revocation status at the start of every tool call. A request cannot use a revoked capability after the edge kernel has received the revocation.

Query: GET /v1/revocations.
Mutate: POST /v1/revocations.

Receipt Store

Tool receipts and child-request receipts are append-only. A receipt emitted on one edge is queryable from another edge as soon as the repair sync has run. Because the store is idempotent on receipt ID, replaying receipts during replication is safe.

Append tool receipts: POST /v1/receipts/tools.
Query tool receipts: GET /v1/receipts/tools or GET /v1/receipts/query.
Append child receipts: POST /v1/receipts/children.
Query child receipts: GET /v1/receipts/children.
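
The reason replays are safe can be shown with a toy append-only store. This is an illustrative sketch only: `ReceiptLog` and its methods are hypothetical, and a dictionary stands in for the SQLite-backed store.

```python
from typing import Dict, List


class ReceiptLog:
    """Hypothetical append-only store that is idempotent on receipt ID."""

    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def append(self, receipt_id: str, body: dict) -> bool:
        """Insert the receipt if unseen; return False for a replay."""
        if receipt_id in self._by_id:
            return False
        self._by_id[receipt_id] = body
        return True

    def ids(self) -> List[str]:
        """Return stored receipt IDs in insertion order."""
        return list(self._by_id)


log = ReceiptLog()
first = log.append("rcpt-001", {"tool": "search"})
second = log.append("rcpt-002", {"tool": "fetch"})

# A retried write or a repeated repair sync delivers rcpt-001 again.
replay = log.append("rcpt-001", {"tool": "search"})
```

The replayed append is a no-op, so a sender that is unsure whether a write landed can simply send it again.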

Failover Behavior

When the current leader becomes unhealthy, the node with the next smallest advertise_url takes over. Edge clients keep a multi-endpoint list and retry writes across the set, so an unreachable first URL is transparent to callers.

Revocations: the new leader already has the last-known revocation set from repair sync. No revocation is lost.
Receipts: in-flight writes may be retried. Idempotent append on receipt ID makes this safe.
Budgets: max-observed merge means the new leader picks up the highest usage count seen anywhere. Monotonic guarantee preserved.
Authority: the trusted-key history is the union across nodes, so existing capabilities keep verifying under the new leader without restart.

Client URL lists must include followers

--control-url accepts a comma-separated cluster endpoint list. Configure every edge with every control node. Followers forward writes to the current leader, so any endpoint in the list is a valid entry point.
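
The retry behavior an edge client needs can be sketched as follows. This is not the Chio client: `parse_control_urls`, `send_with_failover`, and the `send` transport callback are hypothetical names, and the transport is injected so the example runs without a network.

```python
from typing import Callable, List, Sequence


def parse_control_urls(value: str) -> List[str]:
    """Split a comma-separated --control-url value into endpoints."""
    return [u.strip() for u in value.split(",") if u.strip()]


def send_with_failover(endpoints: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each endpoint in order and return the first successful response.

    `send` is a hypothetical transport callback that raises ConnectionError
    when an endpoint is unreachable.
    """
    last_error = None
    for url in endpoints:
        try:
            return send(url)
        except ConnectionError as err:
            last_error = err
    raise ConnectionError(
        f"all {len(endpoints)} control endpoints failed"
    ) from last_error


urls = parse_control_urls(
    "https://ctl-a.chio.internal:8940,https://ctl-b.chio.internal:8940"
)


def fake_send(url: str) -> str:
    """Stand-in transport: ctl-a is down, every other node answers."""
    if "ctl-a" in url:
        raise ConnectionError("ctl-a is down")
    return f"ok via {url}"


response = send_with_failover(urls, fake_send)
```

Since followers forward writes to the leader, the client does not need to know which endpoint currently leads; it only needs one reachable node.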

Running a Node

Single-node (development)

bash

# --service-token is required. Prefer CHIO_TRUST_SERVICE_TOKEN so the
# bearer is not visible via ps / proc.
$ export CHIO_TRUST_SERVICE_TOKEN="$(cat ./secrets/service-token.txt)"
$ chio trust serve \
    --listen 127.0.0.1:8940 \
    --authority-db ./state/authority.sqlite \
    --revocation-db ./state/revocations.sqlite \
    --receipt-db ./state/receipts.sqlite \
    --budget-db ./state/budgets.sqlite

Three-node cluster

bash

# Every setting is a CLI flag; there is no config-file mode.
# CHIO_TRUST_SERVICE_TOKEN carries the required service token.
$ export CHIO_TRUST_SERVICE_TOKEN="$(cat /run/secrets/chio_service)"
$ chio trust serve \
    --listen 0.0.0.0:8940 \
    --advertise-url https://ctl-a.chio.internal:8940 \
    --peer-url https://ctl-b.chio.internal:8940 \
    --peer-url https://ctl-c.chio.internal:8940 \
    --cluster-sync-interval-ms 500 \
    --authority-db /var/lib/chio/authority.sqlite \
    --revocation-db /var/lib/chio/revocations.sqlite \
    --receipt-db /var/lib/chio/receipts.sqlite \
    --budget-db /var/lib/chio/budgets.sqlite

Run the same command on the other two hosts, each with its own --advertise-url and listing the other two as --peer-url values. The cluster self-organizes; the smallest healthy URL becomes the leader.

Beyond the cluster and store flags, chio trust serve accepts a set of optional file-backed registries that light up federation and identity features. Each is off unless its flag is passed: --scim-lifecycle-file (SCIM provisioning and deprovisioning for an external IdP), --enterprise-providers-file, --federation-policies-file, --verifier-policies-file, and the passport and certification registry files.

Edge kernel wiring

bash

# --control-token / CHIO_CONTROL_TOKEN carries the trust-control
# service token as a value, not a file path.
$ export CHIO_CONTROL_TOKEN="$(cat /run/secrets/chio_service)"
$ chio mcp serve-http \
    --control-url https://ctl-a.chio.internal:8940,https://ctl-b.chio.internal:8940,https://ctl-c.chio.internal:8940

Local and remote are exclusive

Edge kernels pick one mode: remote stores via --control-url and --control-token, or local stores via --receipt-db, --revocation-db, and --authority-*. Never both at the same time.

Health and Status Endpoints

Every node exposes health and status for operators and load balancers:

Endpoint	Surfaces
`GET /health`	Liveness. Returns 200 when the node can serve reads.
`GET /v1/internal/cluster/status`	Current leader URL, peer membership, peer-sync timestamps, replication positions per store.
`GET /v1/authority`	Current authority generation, rotated timestamp, trusted-key history.
`GET /v1/receipts/query`	Filterable receipt endpoint used by the dashboard and the CLI.

bash

$ curl -s -H "Authorization: Bearer $CHIO_TRUST_SERVICE_TOKEN" \
    https://ctl-a.chio.internal:8940/v1/internal/cluster/status | jq
{
  "self": "https://ctl-a.chio.internal:8940",
  "leader": "https://ctl-a.chio.internal:8940",
  "peers": [
    {
      "url": "https://ctl-b.chio.internal:8940",
      "healthy": true,
      "last_sync_unix": 1765012345,
      "receipt_seq": 918422,
      "revocation_seq": 812,
      "authority_generation": 7,
      "budget_seq": 41234
    }
  ]
}
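
For monitoring, a status document like the one above can be reduced to a few alertable facts. The sketch below uses only the field names visible in that sample response; the `summarize` helper and the `max_lag_s` threshold are assumptions for illustration, not part of Chio.

```python
import json

# Sample body copied from the status response shown above.
SAMPLE = """
{
  "self": "https://ctl-a.chio.internal:8940",
  "leader": "https://ctl-a.chio.internal:8940",
  "peers": [
    {
      "url": "https://ctl-b.chio.internal:8940",
      "healthy": true,
      "last_sync_unix": 1765012345,
      "receipt_seq": 918422,
      "revocation_seq": 812,
      "authority_generation": 7,
      "budget_seq": 41234
    }
  ]
}
"""


def summarize(status: dict, now_unix: int, max_lag_s: int = 5) -> dict:
    """Reduce a status document to the facts an alert rule might need."""
    peers = status["peers"]
    return {
        "is_leader": status["self"] == status["leader"],
        "unhealthy": [p["url"] for p in peers if not p["healthy"]],
        "lagging": [
            p["url"] for p in peers if now_unix - p["last_sync_unix"] > max_lag_s
        ],
    }


status = json.loads(SAMPLE)
summary = summarize(status, now_unix=1765012347)
late_summary = summarize(status, now_unix=1765012345 + 60)
```

A peer that stays in the `lagging` list for much longer than the configured sync interval is a reasonable signal that replication needs attention.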

Key Rotation

The authority signing seed is rotated through the running node's authority endpoint: POST /v1/authority rotates, GET /v1/authority reads status. Rotation is non-disruptive: existing capabilities keep verifying because the kernel verifies against the full trusted-key history, not only the current key.

bash

# Rotate the authority signing seed on the current leader.
# Existing capabilities remain valid under the trusted-key history.
$ curl -X POST \
    -H "Authorization: Bearer $CHIO_TRUST_SERVICE_TOKEN" \
    https://ctl-a.chio.internal:8940/v1/authority

# Confirm the new generation has propagated.
$ curl -s \
    -H "Authorization: Bearer $CHIO_TRUST_SERVICE_TOKEN" \
    https://ctl-a.chio.internal:8940/v1/authority | jq

Rotation invariants:

Capabilities issued under the previous seed still verify. Existing live sessions keep working.
New capabilities are signed under the new seed.
Trusted-key history replicates to every follower as part of the authority snapshot. Remote authority clients refresh trusted-key state on a short TTL, so every edge picks up the new generation without a process restart.
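
The bookkeeping behind these invariants can be modelled without any cryptography. The sketch below is deliberately simplified and hypothetical: `AuthorityState` is not a Chio type, key labels are opaque strings standing in for ed25519 public keys, and `trusts` replaces real signature verification with a membership check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuthorityState:
    """Hypothetical bookkeeping only; real keys are ed25519, not labels."""
    generation: int = 1
    trusted_keys: List[str] = field(default_factory=lambda: ["key-gen-1"])

    @property
    def current_key(self) -> str:
        """The key that signs newly issued capabilities."""
        return self.trusted_keys[-1]

    def rotate(self) -> None:
        """Advance the generation and append a new key to the history."""
        self.generation += 1
        self.trusted_keys.append(f"key-gen-{self.generation}")

    def trusts(self, signer_key: str) -> bool:
        """Accept any key in the history, not only the current one."""
        return signer_key in self.trusted_keys


authority = AuthorityState()
old_key = authority.current_key  # a capability issued now is signed by this key

authority.rotate()
old_still_trusted = authority.trusts(old_key)
new_key = authority.current_key
```

Because the history only grows in this model, a capability signed before the rotation keeps verifying while new capabilities pick up the new key.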

Rotate the authority seed and the service token separately

The authority signing seed and the trust-control service token are distinct credentials with distinct blast radii. Rotate them on independent schedules and store them in separate secret stores. Never share one secret across both.

Security Stance

The control plane enforces a small and boring set of security invariants:

Kernel-mediated trust: the control plane stores and issues trust state. It never bypasses kernel checks. Every tool call still runs the full guard pipeline at the edge.
One service token, optional tenant-read tokens: the required --service-token (env CHIO_TRUST_SERVICE_TOKEN) authenticates every request and carries cross-tenant admin access. Repeatable --tenant-read-token tenant_id=token entries grant read-only access confined to a single tenant's receipts.
HTTPS everywhere: terminate TLS in front of both the control plane and the hosted MCP / auth plane.
Dedicated auth signing seed: when hosted OAuth is enabled, use a signing seed distinct from the capability authority.
Shared budget in every multi-node deployment: never run edges against independent local budget stores in production. Budgets must exhaust once, everywhere.

Non-Goals

The trust control plane is deliberately scoped. It does not attempt:

Multi-datacenter consensus.
Byzantine quorum rotation.
HSM-backed signing.

Those are not part of this service. It provides single-region high availability and hosted OAuth behavior through the same kernel extension interfaces (CapabilityAuthority, RevocationStore, ReceiptStore, BudgetStore) already used in single-node mode.

For background on how the control plane fits into the broader economic model, see Economics and Trust Model.
