Performance & Tuning

Latency classes

The figures in the tables below are illustrative order-of-magnitude estimates, not measurements from any specific build or benchmark. Treat them as operator-facing rules of thumb, not SLOs. Build a histogram per guard from chio_guard_eval_duration_seconds against the target deployment before sizing it.
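One way to turn an exported histogram into a sizing number is to interpolate a quantile from its cumulative buckets, which is the same idea Prometheus's histogram_quantile applies. The sketch below is illustrative only: the bucket bounds and counts are invented sample data, and it assumes the buckets have already been read from chio_guard_eval_duration_seconds for a single guard, with the +Inf bucket omitted.

```python
def estimate_quantile(buckets, q):
    """Estimate quantile q (0..1) from cumulative (upper_bound_seconds, count) pairs.

    Uses linear interpolation inside the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return buckets[-1][0]


# Hypothetical cumulative bucket counts for one guard (not measured data).
sample = [(0.001, 900), (0.005, 980), (0.025, 995), (0.1, 1000)]
p50 = estimate_quantile(sample, 0.50)
p99 = estimate_quantile(sample, 0.99)
print(f"p50 ~ {p50 * 1000:.2f} ms, p99 ~ {p99 * 1000:.2f} ms")
```

The estimate is only as fine as the bucket boundaries, so pick bounds that bracket the latency class you expect for each guard.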

Latency by Guard

Guard	Operation	Latency class	Scaling
`ForbiddenPathGuard`	Path normalization + glob match	<1ms	O(n patterns)
`EgressAllowlistGuard`	URL parse + glob match	<1ms	O(n patterns)
`ShellCommandGuard`	shlex tokenization + regex	<2ms	O(n tokens × n patterns)
`InternalNetworkGuard`	IP parse + CIDR membership	<0.5ms	O(log CIDRs)
`AgentVelocityGuard`	Token bucket update	<0.1ms amortized	O(1)
`DataFlowGuard`	Session journal sum	<1ms	O(n history)
`BehavioralSequenceGuard`	Sequence pattern check	<1ms	O(n window)
`JailbreakGuard` (cached)	Cache hit on prompt hash	<0.1ms	O(1)
`JailbreakGuard` (full)	Heuristic + classifier eval	20-50ms	O(prompt length)
`ResponseSanitizationGuard`	Regex pass over response	5-20ms	O(response size × n patterns)
WASM custom guard	Module call within fuel limit	10-100ms	Fuel-dependent
`AsyncGuardAdapter` (cached)	TtlCache hit, no provider call	<0.5ms	O(1)
`AsyncGuardAdapter` (miss)	Live HTTP to external provider	100-500ms	Network-bound

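The cache hit/miss ratio dominates practical tail latency. Shorten the cache TTL only when freshness genuinely matters. On steady traffic over a stable key set, doubling the TTL from 60s to 120s roughly halves the number of provider calls, which cuts mean external-guard latency; p99 improves only once misses fall below 1% of requests.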
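A two-point model makes the hit/miss effect concrete. This is a rough sketch, not a measurement: it assumes every request costs either the cached bound (0.5 ms) or an assumed midpoint of the miss class (300 ms), both taken from the illustrative table above.

```python
def mean_latency_ms(hit_rate, hit_ms, miss_ms):
    """Average latency when a fraction hit_rate of requests is served from cache."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def latency_at_quantile_ms(hit_rate, hit_ms, miss_ms, q):
    """Latency at quantile q in the two-point model: hits are fast, misses slow."""
    return hit_ms if q <= hit_rate else miss_ms


HIT_MS = 0.5     # cached AsyncGuardAdapter bound from the table
MISS_MS = 300.0  # assumed midpoint of the 100-500 ms miss class

for hit_rate in (0.90, 0.95, 0.995):
    mean = mean_latency_ms(hit_rate, HIT_MS, MISS_MS)
    p99 = latency_at_quantile_ms(hit_rate, HIT_MS, MISS_MS, 0.99)
    print(f"hit rate {hit_rate:.1%}: mean {mean:.2f} ms, p99 {p99:.1f} ms")
```

In this model, halving the miss rate from 10% to 5% halves the mean, but p99 stays at the miss latency until fewer than 1% of requests miss.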

Throughput Targets

Pipeline shape	Target	Bound
Default 7-guard pipeline	~1000 req/s/core	CPU
Session-aware (with journal locks)	~500 req/s/core	Mutex contention
WASM custom guards	100-1000 req/s	Fuel + module size
Async external (cache miss heavy)	<100 req/s	Network

These are per-process numbers. Horizontal scaling is the answer for traffic above 5K req/s, but it shifts the bottleneck from CPU to the receipt store. See Bottlenecks below.
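As a rough sizing aid, the replica count follows from the target rate and the per-replica rating. The sketch below assumes a 15% headroom factor, which is an illustrative choice rather than a documented recommendation, and takes the ~1000 req/s figure from the table as the per-replica rating.

```python
import math


def replicas_needed(target_rps, per_replica_rps, headroom=0.85):
    """Replicas required so each runs at no more than `headroom` of its rated throughput."""
    return math.ceil(target_rps / (per_replica_rps * headroom))


fleet = replicas_needed(5000, 1000)   # 5000 / 850 -> 5.88 -> 6 replicas
per_replica = 5000 / fleet            # load each replica actually carries
print(f"{fleet} replicas at ~{per_replica:.0f} req/s each")
```

Replace the rating with a number measured on the target deployment before relying on the result.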

Memory Footprint

Subsystem	Per-unit	Notes
Receipt store (SQLite row)	~500 bytes / receipt	Includes raw_json plus indexed columns
Session journal	~1 KB / session	Grows with history depth
WASM linear memory	1-64 MB / module	Configurable; per-instance
LRU caches (TtlCache)	~100 bytes / entry	Default capacity 1024

At default settings (90-day retention, 1000 req/s, 1 KB sessions) and ~500 bytes per receipt, the SQLite receipt file grows about 500 MB per million receipts. A kernel running 5K req/s for a day writes 432 million receipts, or ~216 GB. Plan archive rotation accordingly via RetentionConfig.
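The growth arithmetic can be checked directly from the ~500 bytes per receipt figure in the memory table. The sketch below is a back-of-the-envelope estimate: it ignores SQLite page and index overhead beyond that per-row figure, and it assumes the 10 GB max_size_bytes default means 10 × 10^9 bytes.

```python
BYTES_PER_RECEIPT = 500      # per-row estimate from the memory table
SECONDS_PER_DAY = 86_400
MAX_SIZE_BYTES = 10 * 10**9  # assumed decimal reading of the 10 GB default


def receipt_bytes(rate_rps, seconds, bytes_per_receipt=BYTES_PER_RECEIPT):
    """Bytes written to the receipt store at a steady request rate."""
    return rate_rps * seconds * bytes_per_receipt


per_million = 1_000_000 * BYTES_PER_RECEIPT
per_day_5k = receipt_bytes(5000, SECONDS_PER_DAY)
hours_to_cap_1k = MAX_SIZE_BYTES / receipt_bytes(1000, 3600)

print(f"per million receipts: {per_million / 1e6:.0f} MB")
print(f"5K req/s for a day:   {per_day_5k / 1e9:.0f} GB")
print(f"1K req/s reaches the size cap in ~{hours_to_cap_1k:.1f} hours")
```

Under these assumptions the size cap, not the retention window, is the limit that triggers first at sustained load.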

Bottlenecks

Four bottlenecks dominate, in this order:

Receipt store I/O. Every allowed or denied call writes one receipt. SQLite INSERT latency is the floor for kernel evaluation throughput on a single node. Mitigate with WAL mode (already enabled in the bootstrap), bigger checkpoint batches, and per-tenant store sharding.
Session journal locks. The session-aware guards — DataFlowGuard and BehavioralSequenceGuard — each hold an Arc<SessionJournal> and take that journal's Mutex per evaluation. High concurrency on the same session serializes. Mitigate by sharding sessions across journals or by batching low-stakes calls outside the session. The velocity guards are a separate case: VelocityGuard and AgentVelocityGuard hold no journal, only a guard-wide Mutex over their own token-bucket map (VelocityGuard keys buckets by (capability_id, grant_index)), so journal sharding does nothing for them — relieving that contention means sharding the bucket map itself.
WASM fuel. A WASM guard that exhausts its fuel returns Verdict::Deny with reason_class = "fuel", but only after it has burned the whole budget, so the fuel ceiling sets the worst-case latency before that deny. Mitigate by lowering the per-module fuel limit, which shortens how long a runaway module can run before it is denied.
External guard circuit breaker. A degraded provider can stall a synchronous pipeline; the breaker prevents that but at the cost of dropping calls during the open window. Mitigate by tuning RetryConfig::max_retries and CircuitBreakerConfig::reset_timeout for your provider's actual SLA.
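The sharding idea behind the lock-contention mitigations can be sketched as a map split into independently locked shards, so two keys contend only when they land in the same shard. This is a hypothetical illustration, not the kernel's implementation: ShardedMap, its methods, and the crc32-based shard choice are all invented here, and the keys merely imitate the (capability_id, grant_index) shape mentioned above.

```python
import threading
import zlib


class ShardedMap:
    """Hypothetical sketch: N independent locks, each guarding its own dict."""

    def __init__(self, shard_count=16):
        self._shards = [(threading.Lock(), {}) for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 over repr(key) gives a stable shard index for a given key.
        index = zlib.crc32(repr(key).encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def update(self, key, fn, default):
        """Apply fn to the value under key while holding only that shard's lock."""
        lock, table = self._shard_for(key)
        with lock:
            table[key] = fn(table.get(key, default))
            return table[key]


buckets = ShardedMap(shard_count=4)
key_a = ("cap-1", 0)  # shaped like (capability_id, grant_index)
key_b = ("cap-2", 3)
buckets.update(key_a, lambda n: n + 1, 0)
buckets.update(key_a, lambda n: n + 1, 0)
buckets.update(key_b, lambda n: n + 1, 0)
print(buckets.update(key_a, lambda n: n, 0), buckets.update(key_b, lambda n: n, 0))
```

Sharding helps only when load is spread over many keys; a single hot key still serializes on its one shard.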

Tuning Knobs

Checkpoint cadence. checkpoint_interval in the receipts section (default 100 receipts per Merkle batch; it feeds the kernel's checkpoint_batch_size). Raise this to amortize signing cost; lower it for shorter recovery windows. 0 disables checkpointing.
Receipt retention. RetentionConfig.retention_days (default 90) and max_size_bytes (default 10 GB). Aged-out rows move to a read-only archive on rotation, preserving inclusion proofs.
Session journal sharding. Shard by agent ID or session ID. Sharding by agent splits hot sessions across journals; pick the dimension that matches your contention pattern.
WASM fuel limits. Per-module ceiling, expressed in Wasmtime fuel units. Lower ceilings cut tail latency; raise them only when a module hits the ceiling on legitimate input.
AsyncGuardAdapter cache TTL. cache_ttl on AsyncGuardAdapterConfig, a Duration (default Duration::from_secs(60)). Bigger TTLs raise hit rate at the cost of evidence freshness.
AsyncGuardAdapter rate limit. rate_per_second and rate_burst (defaults 20 / 20). Sized to typical provider QPS budgets; raise after confirming your contract.
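To see how a rate and a burst setting interact, here is a minimal token-bucket sketch using the 20 / 20 defaults. It assumes rate_per_second and rate_burst follow conventional token-bucket semantics, which the text above does not confirm; the TokenBucket class and its try_acquire method are hypothetical and take the clock as an argument so the behavior is deterministic.

```python
class TokenBucket:
    """Hypothetical token bucket: refills at `rate_per_second`, holds at most `burst`."""

    def __init__(self, rate_per_second, burst):
        self.rate = float(rate_per_second)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full
        self.last = 0.0

    def try_acquire(self, now):
        """Refill for the time elapsed since the last call, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_second=20, burst=20)
burst_ok = sum(bucket.try_acquire(now=0.0) for _ in range(25))   # instant burst of 25 calls
refill_ok = sum(bucket.try_acquire(now=0.5) for _ in range(25))  # half a second later
print(burst_ok, refill_ok)
```

In this model the burst setting bounds how many calls pass back to back, while the rate bounds the sustained throughput once the bucket is drained.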

Worked Example: 5K req/s Deployment

A six-replica horizontally-scaled fleet running the default pipeline, a policy-wired content-safety provider, and a custom WASM classifier. ~833 req/s per replica. Each key below is a supported ChioConfig field — each section is deny_unknown_fields, so a typo or an invented section fails to parse.

chio.yaml

kernel:
  signing_key: "${CHIO_SIGNING_KEY}"

adapters:
  # At least one adapter is mandatory. An absent or empty adapters block
  # parses but fails validation with "at least one adapter is required".
  - id: petstore
    protocol: openapi
    upstream: "http://petstore.example/api"

receipts:
  store: "sqlite:///var/lib/chio/receipts.db"
  # Receipts between Merkle checkpoints. Larger batches amortize signing
  # cost; at 833 req/s/replica, 500 checkpoints about every 0.6s.
  checkpoint_interval: 500
  # Live retention window. Aged receipts move to a read-only archive on
  # rotation and stay verifiable against their checkpoint roots.
  retention_days: 30

logging:
  level: info
  format: json

telemetry:
  enabled: true
  endpoint: "http://otel-collector:4317"
  service_name: chio-edge

guards:
  # Force these guards to run on every request, regardless of route.
  required:
    - internal-network
    - agent-velocity

wasm_guards:
  # wasm_guards is a LIST of entries, each with its own fuel_limit.
  - name: content-classifier
    path: /etc/chio/guards/content-classifier/content_classifier.wasm
    fuel_limit: 5000000
    priority: 100

Settings outside chio.yaml

The content-safety provider is an external guard wired through a HushSpec policy, not a chio.yaml section (see External Guards). Session-journal sharding and the deeper RetentionConfig knobs (max_size_bytes, archive_path) are not yet expressed in the file schema; they keep their Rust defaults today. Only retention_days is settable from chio.yaml.

Two things this configuration does not do:

It does not enable any CircuitOpenVerdict::Allow or RateLimitedVerdict::Allow fail-open paths. Those are reserved for advisory guards; the deny defaults remain in place.
It does not co-locate the receipt store with the agent. At 5K req/s, the SQLite receipt file is on the kernel's local disk; cross-replica receipt aggregation happens out-of-band via archive rotation or a streaming receipt sink.

Sharded receipt stores need careful checkpointing

Per-replica SQLite stores produce per-replica checkpoint chains. That is fine for audit, but operators who want one canonical per-tenant chain must consolidate either through a single-writer store or through the federated-evidence import path. Don't merge raw checkpoint records by hand; the chain links via previous_checkpoint_sha256 and a hand-merge corrupts the continuity proof.
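The continuity problem can be shown with a toy hash chain. The sketch below is an assumption-heavy illustration: the record layout, the seq and merkle_root fields, and the canonical-JSON hashing are invented for the example, and only the previous_checkpoint_sha256 link name comes from the text above.

```python
import hashlib
import json


def checkpoint_digest(record):
    """SHA-256 over a canonical JSON encoding (hypothetical serialization)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build_chain(merkle_roots):
    """Build a chain in which each record stores the digest of its predecessor."""
    chain, previous = [], None
    for seq, root in enumerate(merkle_roots):
        record = {"seq": seq, "merkle_root": root, "previous_checkpoint_sha256": previous}
        chain.append(record)
        previous = checkpoint_digest(record)
    return chain


def first_broken_link(chain):
    """Return the index of the first record whose back-link is wrong, or None."""
    previous = None
    for index, record in enumerate(chain):
        if record["previous_checkpoint_sha256"] != previous:
            return index
        previous = checkpoint_digest(record)
    return None


replica_a = build_chain(["a0", "a1", "a2"])
replica_b = build_chain(["b0", "b1"])
hand_merged = [replica_a[0], replica_b[0], replica_a[1]]  # interleaved by hand

print(first_broken_link(replica_a), first_broken_link(hand_merged))
```

Each replica's chain verifies on its own, while the interleaved list breaks at the first record taken from the other replica, because that record's back-link points into a different chain.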

Next Steps

Failure & Recovery · what each fail mode costs in latency and verdict shape
Observability · the histograms and counters you build dashboards from
Deployment Topologies · in-process versus sidecar trade-offs that affect throughput
External Guards · adapter knobs that drive the network-bound tail
