Performance & Tuning
Chio is CPU-bound for the default seven-guard pipeline and turns IO-bound the moment a session-aware guard, an external adapter, or a custom WASM module enters the chain. This page is a lookup table: which guards run in microseconds, which run in tens of milliseconds, which knobs change throughput, and where the bottlenecks actually live.
Latency classes
Measure chio_guard_eval_duration_seconds for your real environment before sizing.
Latency by Guard
| Guard | Operation | Latency class | Scaling |
|---|---|---|---|
| ForbiddenPathGuard | Path normalization + glob match | <1ms | O(n patterns) |
| EgressAllowlistGuard | URL parse + glob match | <1ms | O(n patterns) |
| ShellCommandGuard | shlex tokenization + regex | <2ms | O(n tokens × n patterns) |
| InternalNetworkGuard | IP parse + CIDR membership | <0.5ms | O(log CIDRs) |
| AgentVelocityGuard | Token bucket update | <0.1ms amortized | O(1) |
| DataFlowGuard | Session journal sum | <1ms | O(n history) |
| BehavioralSequenceGuard | Sequence pattern check | <1ms | O(n window) |
| JailbreakGuard (cached) | Cache hit on prompt hash | <0.1ms | O(1) |
| JailbreakGuard (full) | Heuristic + classifier eval | 20-50ms | O(prompt length) |
| ResponseSanitizationGuard | Regex pass over response | 5-20ms | O(response size × n patterns) |
| WASM custom guard | Module call within fuel limit | 10-100ms | Fuel-dependent |
| AsyncGuardAdapter (cached) | TtlCache hit, no provider call | <0.5ms | O(1) |
| AsyncGuardAdapter (miss) | Live HTTP to external provider | 100-500ms | Network-bound |
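Cache hits versus misses dominate the practical tail latency. Tighten cache TTL only when freshness genuinely matters; on steady, repetitive traffic, doubling TTL from 60s to 120s typically halves the number of cache misses, and external-guard p99 falls into the cached class only once misses drop below 1% of calls.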
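To see why the hit rate matters more than any single guard's cost, the following minimal sketch models an adapter as a two-class mixture of cache hits and misses. The function names are hypothetical, and the latencies are the upper bounds of the cached and miss classes in the table above, not measurements.

```python
def mean_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Expected latency of a two-class hit/miss mixture."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def p99_is_miss_bound(hit_rate: float) -> bool:
    """True when more than 1% of calls miss, so the 99th percentile is a miss."""
    return (1.0 - hit_rate) > 0.01


# Illustrative inputs: upper bounds of the cached and miss latency classes.
HIT_MS, MISS_MS = 0.5, 500.0

for hit_rate in (0.90, 0.98, 0.995):
    mean = mean_latency_ms(hit_rate, HIT_MS, MISS_MS)
    print(f"hit_rate={hit_rate:.3f} mean={mean:.2f}ms "
          f"p99 miss-bound={p99_is_miss_bound(hit_rate)}")
```

Under this model the mean improves steadily with hit rate, but the p99 stays in the network-bound class until fewer than one call in a hundred misses.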
Throughput Targets
| Pipeline shape | Target | Bound |
|---|---|---|
| Default 7-guard pipeline | ~1000 req/s/core | CPU |
| Session-aware (with journal locks) | ~500 req/s/core | Mutex contention |
| WASM custom guards | 100-1000 req/s | Fuel + module size |
| Async external (cache miss heavy) | <100 req/s | Network |
These are per-process numbers. Horizontal scaling is the answer for traffic above 5K req/s, but it shifts the bottleneck from CPU to the receipt store. See Bottlenecks below.
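As a rough sizing aid, the targets above can be turned into a replica count. This is a sketch under stated assumptions: replicas_needed is a hypothetical helper, each replica is assumed to sustain the table's per-core target, and the 85% utilization ceiling is an illustrative choice rather than a Chio default.

```python
import math


def replicas_needed(target_rps: float, per_replica_rps: float,
                    max_utilization: float = 0.85) -> int:
    """Replicas required to serve target_rps without exceeding max_utilization."""
    if not 0.0 < max_utilization <= 1.0:
        raise ValueError("max_utilization must be in (0, 1]")
    return math.ceil(target_rps / (per_replica_rps * max_utilization))


# Default pipeline (~1000 req/s) versus session-aware pipeline (~500 req/s).
print(replicas_needed(5000, 1000))
print(replicas_needed(5000, 500))
```

Leaving utilization headroom keeps a single replica failure from pushing the remaining replicas past their CPU-bound ceiling.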
Memory Footprint
| Subsystem | Per-unit | Notes |
|---|---|---|
| Receipt store (SQLite row) | ~500 bytes / receipt | Includes raw_json plus indexed columns |
| Session journal | ~1 KB / session | Grows with history depth |
| WASM linear memory | 1-64 MB / module | Configurable; per-instance |
| LRU caches (TtlCache) | ~100 bytes / entry | Default capacity 1024 |
At default settings (90-day retention, 1000 req/s, 1 KB sessions) the SQLite receipt file grows about 0.5 GB per million receipts, at ~500 bytes each. A kernel running 5K req/s for a day writes about 432 million receipts, or roughly 216 GB. Plan archive rotation accordingly via RetentionConfig.
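The growth figures follow directly from the ~500 bytes per receipt in the table above. The sketch below makes that arithmetic reproducible; the helper names are hypothetical, and the 500-byte figure is the table's estimate, so substitute your own measured row size.

```python
RECEIPT_BYTES = 500        # per-receipt estimate from the table above
SECONDS_PER_DAY = 86_400


def receipts_per_day(rps: float) -> int:
    """Receipts written in one day at a constant request rate."""
    return int(rps * SECONDS_PER_DAY)


def store_bytes(receipts: int, bytes_per_receipt: int = RECEIPT_BYTES) -> int:
    """Approximate on-disk size of a given number of receipts."""
    return receipts * bytes_per_receipt


def days_to_reach(size_bytes: int, rps: float) -> float:
    """Days of constant traffic until the store reaches size_bytes."""
    return size_bytes / store_bytes(receipts_per_day(rps))


daily = store_bytes(receipts_per_day(5000))
print(f"per million receipts: {store_bytes(1_000_000) / 1e9:.1f} GB")
print(f"per day at 5K req/s:  {daily / 1e9:.0f} GB")
print(f"days to reach 10 GiB at 1000 req/s: {days_to_reach(10 * 1024**3, 1000):.2f}")
```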
Bottlenecks
Four bottlenecks dominate, in this order:
- Receipt store I/O. Every allowed or denied call writes one receipt. SQLite INSERT latency is the floor for kernel evaluation throughput on a single node. Mitigate with WAL mode (already enabled in the bootstrap), bigger checkpoint batches, and per-tenant store sharding.
- Session journal locks. Session-aware guards (data-flow, behavioral-sequence, velocity) take a per-session Mutex on the journal. High concurrency on the same session serializes. Mitigate by sharding sessions across journals or by batching low-stakes calls outside the session.
- WASM fuel. A WASM guard that exhausts its fuel returns Verdict::Deny with reason_class = "fuel". Pre-deny tail latency is the full fuel ceiling. Mitigate by tuning the per-module fuel limit downward and rejecting cheaply, rather than letting modules run to ceiling.
- External guard circuit breaker. A degraded provider can stall a synchronous pipeline; the breaker prevents that but at the cost of dropping calls during the open window. Mitigate by tuning RetryConfig::max_retries and CircuitBreakerConfig::reset_timeout for your provider's actual SLA.
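The trade-off in the circuit-breaker item is easier to reason about with a concrete state machine. The following is a minimal, generic sketch, not Chio's implementation: the CircuitBreaker class and its methods are hypothetical, and the parameter names merely echo the failure-threshold and reset-timeout settings discussed on this page.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: closed until N consecutive failures, then open."""

    def __init__(self, failure_threshold: int = 5,
                 reset_timeout_secs: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_secs = reset_timeout_secs
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Closed: always allow. Open: allow a probe only after the reset timeout."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.reset_timeout_secs

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; at the threshold, open (or re-open) the breaker."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A shorter reset timeout shrinks the window in which calls are dropped, but it also sends probes to a provider that may not have recovered, which is why the value should track the provider's actual recovery behavior.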
Tuning Knobs
- Checkpoint batch size. checkpoint_batch_size on kernel config. Default 100 receipts per Merkle batch. Raise this to amortize signing cost; lower it for shorter recovery windows.
- Receipt retention. RetentionConfig.retention_days (default 90) and max_size_bytes (default 10 GB). Aged-out rows move to a read-only archive on rotation, preserving inclusion proofs.
- Session journal sharding. Shard by agent ID or session ID. Sharding by agent splits hot sessions across journals; pick the dimension that matches your contention pattern.
- WASM fuel limits. Per-module ceiling, expressed in Wasmtime fuel units. Lower ceilings cut tail latency; raise them only when a module hits the ceiling on legitimate input.
- AsyncGuardAdapter cache TTL. cache_ttl_seconds on adapter config (default 60s). Bigger TTLs raise hit rate at the cost of evidence freshness.
- AsyncGuardAdapter rate limit. rate_per_second and rate_burst (defaults 20 / 20). Sized to typical provider QPS budgets; raise after confirming your contract.
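To illustrate how a rate and a burst setting interact, here is a generic token-bucket sketch. It is an assumption that the adapter's limiter behaves this way; the TokenBucket class is hypothetical, and its parameters only mirror the rate_per_second and rate_burst knobs named above.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at a fixed rate up to a burst capacity."""

    def __init__(self, rate_per_second: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_second
        self.capacity = burst
        self._clock = clock
        self._tokens = burst          # start full, so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With both values at 20, a caller can send 20 calls back to back and is then held to 20 calls per second; raising only the burst admits larger spikes without changing the sustained rate.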
Worked Example: 5K req/s Deployment
A six-replica horizontally-scaled fleet running the default pipeline plus a content-safety provider. ~833 req/s per replica.
```yaml
hushspec: "0.1.0"
kernel:
  # Larger checkpoint batches amortize signing cost across more receipts.
  # At 833 req/s/replica, 500 produces a checkpoint about every 0.6s.
  checkpoint_batch_size: 500
  retention:
    # 30-day live retention plus archive rotation. Aged receipts remain
    # verifiable against their checkpoint roots after archiving.
    retention_days: 30
    max_size_bytes: 21_474_836_480 # 20 GB
    archive_path: "/var/lib/chio/receipts-archive.sqlite3"
  session:
    # Shard by agent_id so hot agents do not contend on a shared journal.
    journal_shards: 16
    shard_dimension: agent_id
guards:
  cloud_guardrails:
    azure_content_safety:
      enabled: true
      endpoint: "https://eastus.cognitiveservices.azure.com"
      api_key: "azure-key"
      tool_patterns:
        - "post_message_*"
      adapter:
        # 60s default TTL is fine for content-safety evidence; bump to 300
        # only if your compliance posture allows.
        cache_ttl_seconds: 60
        cache_capacity: 4096
        # Provider QPS is 100; leave 20% headroom.
        rate_per_second: 80
        rate_burst: 80
        circuit_failure_threshold: 5
        circuit_reset_timeout_secs: 30
        retry_max_retries: 3
  wasm_guards:
    # Per-module fuel ceiling in Wasmtime fuel units.
    default_fuel_limit: 5_000_000
observability:
  log_level: info
  metrics:
    # Cap stays at the static MAX_GUARD_METRIC_CARDINALITY (1024) unless
    # explicitly raised. With 6 replicas hosting ~50 guard variants, this
    # is plenty of headroom.
    max_guard_cardinality: 1024
```
Two things this configuration does not do:
- It does not enable any CircuitOpenVerdict::Allow or RateLimitedVerdict::Allow fail-open paths. Those are reserved for advisory guards; the default-deny posture stays in place.
- It does not co-locate the receipt store with the agent. At 5K req/s, the SQLite receipt file is on the kernel's local disk; cross-replica receipt aggregation happens out-of-band via archive rotation or a streaming receipt sink.
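The numbers quoted in the configuration comments can be re-derived when adapting the example to a different fleet size. This is a small sketch with hypothetical helper names; the inputs are the ones stated in the worked example (5K req/s, six replicas, a 500-receipt batch, a provider budget of 100 QPS with 20% headroom).

```python
def per_replica_rps(total_rps: float, replicas: int) -> float:
    """Even split of fleet traffic across replicas."""
    return total_rps / replicas


def checkpoint_interval_secs(batch_size: int, rps: float) -> float:
    """Seconds between checkpoints when every request writes one receipt."""
    return batch_size / rps


def rate_with_headroom(provider_qps: float, headroom: float) -> float:
    """Adapter rate limit that leaves `headroom` (0..1) of the provider budget unused."""
    return provider_qps * (1.0 - headroom)


rps = per_replica_rps(5000, 6)
print(f"per-replica load:    {rps:.0f} req/s")
print(f"checkpoint interval: {checkpoint_interval_secs(500, rps):.1f} s")
print(f"adapter rate limit:  {rate_with_headroom(100, 0.20):.0f} req/s")
```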
Sharded receipt stores need careful checkpointing
previous_checkpoint_sha256 and a hand-merge corrupts the continuity proof.
Next Steps
- Failure & Recovery · what each fail mode costs in latency and verdict shape
- Observability · the histograms and counters you build dashboards from
- Deployment Topologies · in-process versus sidecar trade-offs that affect throughput
- External Guards · adapter knobs that drive the network-bound tail