
Performance & Tuning

Chio is CPU-bound for the default seven-guard pipeline and turns IO-bound the moment a session-aware guard, an external adapter, or a custom WASM module enters the chain. This page is a lookup table: which guards run in microseconds, which run in tens of milliseconds, which knobs change throughput, and where the bottlenecks actually live.

Latency classes

Numbers below are order-of-magnitude bands measured against the default kernel build on commodity x86 hardware. Treat them as operator-facing rules of thumb, not SLOs. Build a histogram per guard from chio_guard_eval_duration_seconds for your real environment before sizing.
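One way to turn exported histogram buckets into a latency class is linear interpolation inside the bucket that contains the target rank, the same idea Prometheus uses for histogram_quantile. The sketch below is plain Python. The bucket bounds and counts in `sample` are invented for illustration and are not Chio measurements.

```python
import math

def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    bound, ending with the overflow bucket whose bound is math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the overflow bucket: report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical sample: 1000 evaluations of one guard.
sample = [(0.001, 900), (0.005, 950), (0.05, 990), (0.5, 1000), (math.inf, 1000)]
p50 = quantile_from_buckets(0.50, sample)
p99 = quantile_from_buckets(0.99, sample)
print(f"p50={p50 * 1000:.3f} ms  p99={p99 * 1000:.1f} ms")
```

Bucket boundaries limit the resolution of the estimate, so pick boundaries that bracket the latency classes you care about.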

Latency by Guard

| Guard | Operation | Latency class | Scaling |
|---|---|---|---|
| ForbiddenPathGuard | Path normalization + glob match | <1 ms | O(n patterns) |
| EgressAllowlistGuard | URL parse + glob match | <1 ms | O(n patterns) |
| ShellCommandGuard | shlex tokenization + regex | <2 ms | O(n tokens × n patterns) |
| InternalNetworkGuard | IP parse + CIDR membership | <0.5 ms | O(log CIDRs) |
| AgentVelocityGuard | Token bucket update | <0.1 ms amortized | O(1) |
| DataFlowGuard | Session journal sum | <1 ms | O(n history) |
| BehavioralSequenceGuard | Sequence pattern check | <1 ms | O(n window) |
| JailbreakGuard (cached) | Cache hit on prompt hash | <0.1 ms | O(1) |
| JailbreakGuard (full) | Heuristic + classifier eval | 20-50 ms | O(prompt length) |
| ResponseSanitizationGuard | Regex pass over response | 5-20 ms | O(response size × n patterns) |
| WASM custom guard | Module call within fuel limit | 10-100 ms | Fuel-dependent |
| AsyncGuardAdapter (cached) | TtlCache hit, no provider call | <0.5 ms | O(1) |
| AsyncGuardAdapter (miss) | Live HTTP to external provider | 100-500 ms | Network-bound |

Cache hits versus misses dominate the practical tail latency. Shorten the cache TTL only when freshness genuinely matters; on steady traffic, doubling the TTL from 60s to 120s typically halves the number of cache misses, and external-guard p99 falls into the cached band once misses drop below 1% of calls.
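A two-point model makes the hit/miss effect concrete: every call costs either the cached latency or the miss latency. The sketch below is illustrative only. The 0.5 ms and 300 ms inputs are picked from inside the bands in the table above, and the hit rates are assumptions, not measurements.

```python
def expected_latency_ms(hit_rate, hit_ms, miss_ms):
    """Mean latency when a fraction hit_rate of calls is served from cache."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

def percentile_latency_ms(p, hit_rate, hit_ms, miss_ms):
    """p-th percentile of the two-point model.

    The percentile sits in the cached band only when hits cover at least
    a fraction p of all calls; otherwise it is the miss latency.
    """
    return hit_ms if hit_rate >= p else miss_ms

HIT_MS, MISS_MS = 0.5, 300.0  # assumed values inside the documented bands

for hit_rate in (0.95, 0.995):
    mean = expected_latency_ms(hit_rate, HIT_MS, MISS_MS)
    p99 = percentile_latency_ms(0.99, hit_rate, HIT_MS, MISS_MS)
    print(f"hit_rate={hit_rate}: mean={mean:.3f} ms, p99={p99} ms")
```

In this model the mean improves smoothly with hit rate, while p99 stays at the miss latency until fewer than 1% of calls miss.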


Throughput Targets

| Pipeline shape | Target | Bound |
|---|---|---|
| Default 7-guard pipeline | ~1000 req/s/core | CPU |
| Session-aware (with journal locks) | ~500 req/s/core | Mutex contention |
| WASM custom guards | 100-1000 req/s | Fuel + module size |
| Async external (cache miss heavy) | <100 req/s | Network |

These are per-process numbers. Horizontal scaling is the answer for traffic above 5K req/s, but it shifts the bottleneck from CPU to the receipt store. See Bottlenecks below.


Memory Footprint

| Subsystem | Per-unit | Notes |
|---|---|---|
| Receipt store (SQLite row) | ~500 bytes / receipt | Includes raw_json plus indexed columns |
| Session journal | ~1 KB / session | Grows with history depth |
| WASM linear memory | 1-64 MB / module | Configurable; per-instance |
| LRU caches (TtlCache) | ~100 bytes / entry | Default capacity 1024 |

At default settings (90-day retention, 1000 req/s, 1 KB sessions) and ~500 bytes per receipt, the SQLite receipt file grows about 0.5 GB per million receipts. A kernel running 5K req/s for a day writes about 432 million receipts, or roughly 200 GB. Plan archive rotation accordingly via RetentionConfig.
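A back-of-envelope sizing helper, assuming the ~500 bytes per receipt figure from the table above and one receipt per request. The function names are hypothetical, and the result is an estimate to check against your own store, not a guarantee.

```python
SECONDS_PER_DAY = 86_400

def receipts_per_day(req_per_s):
    """One receipt is written per allowed or denied call."""
    return req_per_s * SECONDS_PER_DAY

def receipt_bytes_per_day(req_per_s, bytes_per_receipt=500):
    """Daily receipt-store growth in bytes at a constant request rate."""
    return receipts_per_day(req_per_s) * bytes_per_receipt

for rate in (1_000, 5_000):
    gb = receipt_bytes_per_day(rate) / 1e9
    print(f"{rate} req/s -> {receipts_per_day(rate):,} receipts/day, ~{gb:.0f} GB/day")
```

Comparing the daily figure with max_size_bytes shows how soon size-based rotation triggers, which is usually well before the retention_days limit at these rates.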


Bottlenecks

Four bottlenecks dominate, in this order:

  1. Receipt store I/O. Every allowed or denied call writes one receipt. SQLite INSERT latency is the floor for kernel evaluation throughput on a single node. Mitigate with WAL mode (already enabled in the bootstrap), bigger checkpoint batches, and per-tenant store sharding.
  2. Session journal locks. Session-aware guards (data-flow, behavioral-sequence, velocity) take a per-session Mutex on the journal. High concurrency on the same session serializes. Mitigate by sharding sessions across journals or by batching low-stakes calls outside the session.
  3. WASM fuel. A WASM guard that exhausts its fuel returns Verdict::Deny with reason_class = "fuel". Pre-deny tail latency is the full fuel ceiling. Mitigate by tuning the per-module fuel limit downward and rejecting cheaply, rather than letting modules run to ceiling.
  4. External guard circuit breaker. A degraded provider can stall a synchronous pipeline; the breaker prevents that but at the cost of dropping calls during the open window. Mitigate by tuning RetryConfig::max_retries and CircuitBreakerConfig::reset_timeout for your provider's actual SLA.
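The breaker trade-off in item 4 can be seen in a minimal sketch of the generic circuit-breaker pattern. This is not Chio's implementation. The class, its method names, and the injectable clock are hypothetical; the two parameters play the role of the failure-threshold and reset-timeout settings.

```python
import time

class CircuitBreaker:
    """Generic closed/open/half-open breaker (illustrative only)."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Open: block calls until the reset timeout elapses, then let
        # probe calls through (half-open).
        return self._clock() - self._opened_at >= self.reset_timeout_s

    def record_success(self):
        # A successful call closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Trip, or re-trip after a failed probe, and restart the timer.
            self._opened_at = self._clock()

breaker = CircuitBreaker(failure_threshold=5, reset_timeout_s=30.0)
for _ in range(5):
    breaker.record_failure()
print(breaker.allow_request())  # False: calls are dropped while the breaker is open
```

A shorter reset timeout drops fewer calls but probes a degraded provider more often; a higher threshold tolerates flakiness at the cost of more slow calls before the breaker trips.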

Tuning Knobs

  • Checkpoint batch size. checkpoint_batch_size on kernel config. Default 100 receipts per Merkle batch. Raise this to amortize signing cost; lower it for shorter recovery windows.
  • Receipt retention. RetentionConfig.retention_days (default 90) and max_size_bytes (default 10 GB). Aged-out rows move to a read-only archive on rotation, preserving inclusion proofs.
  • Session journal sharding. Shard by agent ID or session ID. Sharding by agent splits hot sessions across journals; pick the dimension that matches your contention pattern.
  • WASM fuel limits. Per-module ceiling, expressed in Wasmtime fuel units. Lower ceilings cut tail latency; raise them only when a module hits the ceiling on legitimate input.
  • AsyncGuardAdapter cache TTL. cache_ttl_seconds on adapter config (default 60s). Bigger TTLs raise hit rate at the cost of evidence freshness.
  • AsyncGuardAdapter rate limit. rate_per_second and rate_burst (defaults 20 / 20). Sized to typical provider QPS budgets; raise after confirming your contract.
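To see how a rate and a burst setting interact, here is a minimal token-bucket sketch. It illustrates the general technique and is not the adapter's actual code. The class and method names are hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time

class TokenBucket:
    """Token bucket: refills at rate_per_second, holds at most burst tokens."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = float(burst)
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, cost=1.0):
        now = self._clock()
        # Refill for the elapsed time, capped at the burst capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False

# With the documented defaults (20 / 20), an idle bucket admits 20
# back-to-back calls and then one more call every 50 ms on average.
bucket = TokenBucket(rate_per_second=20, burst=20)
admitted = sum(bucket.try_acquire() for _ in range(20))
print(admitted)
```

Raising the burst without raising the rate only changes how large a spike is absorbed; sustained throughput is still bounded by the rate.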

Worked Example: 5K req/s Deployment

A six-replica horizontally-scaled fleet running the default pipeline plus a content-safety provider. ~833 req/s per replica.

chio.yaml
hushspec: "0.1.0"

kernel:
  # Larger checkpoint batches amortize signing cost across more receipts.
  # At 833 req/s/replica, 500 produces a checkpoint about every 0.6s.
  checkpoint_batch_size: 500

retention:
  # 30-day live retention plus archive rotation. Aged receipts remain
  # verifiable against their checkpoint roots after archiving.
  retention_days: 30
  max_size_bytes: 21_474_836_480   # 20 GB
  archive_path: "/var/lib/chio/receipts-archive.sqlite3"

session:
  # Shard by agent_id so hot agents do not contend on a shared journal.
  journal_shards: 16
  shard_dimension: agent_id

guards:
  cloud_guardrails:
    azure_content_safety:
      enabled: true
      endpoint: "https://eastus.cognitiveservices.azure.com"
      api_key: "azure-key"
      tool_patterns:
        - "post_message_*"
      adapter:
        # 60s default TTL is fine for content-safety evidence; bump to 300
        # only if your compliance posture allows.
        cache_ttl_seconds: 60
        cache_capacity: 4096
        # Provider QPS is 100; leave 20% headroom.
        rate_per_second: 80
        rate_burst: 80
        circuit_failure_threshold: 5
        circuit_reset_timeout_secs: 30
        retry_max_retries: 3

wasm_guards:
  # Per-module fuel ceiling in Wasmtime fuel units.
  default_fuel_limit: 5_000_000

observability:
  log_level: info
  metrics:
    # Cap stays at the static MAX_GUARD_METRIC_CARDINALITY (1024) unless
    # explicitly raised. With 6 replicas hosting ~50 guard variants, this
    # is plenty of headroom.
    max_guard_cardinality: 1024

Two things this configuration does not do:

  • It does not enable any CircuitOpenVerdict::Allow or RateLimitedVerdict::Allow fail-open paths. Those are reserved for advisory guards; the default-deny posture stays in place.
  • It does not co-locate the receipt store with the agent. At 5K req/s, the SQLite receipt file is on the kernel's local disk; cross-replica receipt aggregation happens out-of-band via archive rotation or a streaming receipt sink.

Sharded receipt stores need careful checkpointing

Per-replica SQLite stores produce per-replica checkpoint chains. That is fine for audit, but operators who want one canonical per-tenant chain must consolidate either through a single-writer store or through the federated-evidence import path. Don't merge raw checkpoint records by hand; the chain links via previous_checkpoint_sha256 and a hand-merge corrupts the continuity proof.
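The continuity problem can be shown with a toy hash chain. Only the previous_checkpoint_sha256 field name comes from the text above. The record layout (`seq`, `merkle_root`), the canonical-JSON hashing, and the function names are assumptions for illustration and do not describe Chio's on-disk format.

```python
import hashlib
import json

def checkpoint_digest(checkpoint):
    # Assumed encoding: SHA-256 over canonical JSON of the whole record.
    encoded = json.dumps(checkpoint, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_checkpoint(chain, merkle_root):
    """Append a record that links back to the digest of its predecessor."""
    prev = checkpoint_digest(chain[-1]) if chain else None
    chain.append({
        "seq": len(chain),
        "merkle_root": merkle_root,
        "previous_checkpoint_sha256": prev,
    })

def first_broken_link(chain):
    """Return the index of the first record whose back-link fails, else None."""
    for i, cp in enumerate(chain):
        expected = checkpoint_digest(chain[i - 1]) if i > 0 else None
        if cp["previous_checkpoint_sha256"] != expected:
            return i
    return None

replica_a, replica_b = [], []
for root in ("a-root-0", "a-root-1", "a-root-2"):
    append_checkpoint(replica_a, root)
for root in ("b-root-0", "b-root-1"):
    append_checkpoint(replica_b, root)

# Each per-replica chain verifies on its own.
print(first_broken_link(replica_a), first_broken_link(replica_b))

# Interleaving records by hand breaks the chain at the first foreign record.
hand_merged = [replica_a[0], replica_b[0], replica_a[1], replica_b[1], replica_a[2]]
print(first_broken_link(hand_merged))
```

Because each record commits to the exact bytes of its predecessor, any reordering or interleaving invalidates every link from that point on.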
