Observability
Chio's primary audit trail is the signed receipt log. Metrics and traces are operational telemetry that complement receipts but do not replace them. This page covers the three signals operators actually run dashboards against: tracing spans, the WASM guard metric family, and the OpenTelemetry receipt exporter that bridges OTLP traces into the receipt store.
Source
crates/chio-wasm-guards/src/metrics.rs, crates/chio-wasm-guards/src/observability.rs, crates/chio-otel-receipt-exporter/src/lib.rs, and tracing call sites across chio-kernel.
Tracing & Structured Logging
Every chio crate uses the tracing crate. Guards, kernel evaluation, and adapters emit structured events through five standard macros:
- tracing::trace! · per-step state in evaluation. Off in production.
- tracing::debug! · per-call detail useful for local development. Off in production unless investigating.
- tracing::info! · lifecycle: kernel started, policy reloaded, checkpoint written, revocation propagated.
- tracing::warn! · recoverable degradation: cache miss escalation, regex compile failures (which lead to permissive guards), circuit-breaker trips, advisory verdicts.
- tracing::error! · fail-closed events that produce Verdict::Deny via the fallback path: caught panics, poisoned mutexes, signing failures, store unreachable.
Each guard emits events with a fixed structured-field convention:
tracing::warn!(
    guard = "ForbiddenPathGuard",
    decision = "deny",
    path = %normalized_path,
    reason = %reason_class,
    "denied path outside allowlist",
);
The four canonical fields:
- guard · stable guard identifier. Match this against the same field in receipts and metrics for join correlation.
- decision · allow, deny, or rewrite for guards that mutate.
- path · (when applicable) the canonical request path or tool name.
- reason · short reason class. The same vocabulary as the metric label reason_class below.
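To make the field convention concrete, here is a minimal std-only sketch that models the four fields as a typed record and renders them as a key=value line. GuardEvent, Decision, and the Display rendering are hypothetical names used for illustration only; real guards emit these fields through the tracing macros shown above, not through this type.

```rust
use std::fmt;

/// Closed set of values for the `decision` field (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
    Rewrite,
}

impl Decision {
    /// Maps each variant to the string used in the `decision` field.
    pub fn as_str(self) -> &'static str {
        match self {
            Decision::Allow => "allow",
            Decision::Deny => "deny",
            Decision::Rewrite => "rewrite",
        }
    }
}

/// The four canonical fields. `path` is optional because it only applies
/// to guards that act on a request path or tool name.
pub struct GuardEvent<'a> {
    pub guard: &'a str,
    pub decision: Decision,
    pub path: Option<&'a str>,
    pub reason: &'a str,
}

impl fmt::Display for GuardEvent<'_> {
    /// Renders the fields in a fixed order, skipping `path` when absent.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "guard={} decision={}", self.guard, self.decision.as_str())?;
        if let Some(path) = self.path {
            write!(f, " path={}", path)?;
        }
        write!(f, " reason={}", self.reason)
    }
}
```

Typing decision as an enum keeps the field inside its documented vocabulary at compile time, which is what makes joins against receipts and metrics reliable.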
Receipt > log for audit
ReceiptQuery; the signed-by-the-kernel evidence beats grep through stdout.
WASM Guard Metrics
Custom WASM guards emit a Prometheus-shaped metric family registered through GuardMetricRegistry. The family descriptors are static constants in chio_wasm_guards::metrics:
| Metric | Kind | Labels | Unit |
|---|---|---|---|
| chio_guard_eval_duration_seconds | Histogram | guard_id, verdict | seconds |
| chio_guard_fuel_consumed_total | Counter | guard_id | fuel units |
| chio_guard_verdict_total | Counter | guard_id, verdict | count |
| chio_guard_deny_total | Counter | guard_id, reason_class | count |
| chio_guard_reload_total | Counter | guard_id, outcome | count |
| chio_guard_host_call_duration_seconds | Histogram | guard_id, host_fn | seconds |
| chio_guard_module_bytes | Gauge | guard_id, epoch | bytes |
Histogram Buckets
Eval-duration buckets are tuned for WASM guard latencies:
pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];
pub const HOST_CALL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1,
];
Override these only if your dashboards demand different SLO breakpoints. They are shared by every guard so cross-guard p99 comparisons are meaningful out of the box.
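Prometheus-style histogram buckets are cumulative and bounded above (the le label), so an observation counts toward every bucket whose upper bound is at least the observed value. The sketch below finds the smallest such bucket for a given duration. bucket_upper_bound is a hypothetical helper written for this page, not part of chio_wasm_guards::metrics.

```rust
/// Copied from the eval-duration constant above so the sketch is self-contained.
pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];

/// Returns the upper bound of the smallest bucket that contains `seconds`,
/// or `None` when the value only fits the implicit +Inf bucket.
/// Assumes `buckets` is sorted in ascending order.
pub fn bucket_upper_bound(buckets: &[f64], seconds: f64) -> Option<f64> {
    buckets.iter().copied().find(|&le| seconds <= le)
}
```

Under these bounds a 700 µs evaluation lands in the 1 ms bucket, and anything slower than 1 s is visible only in the +Inf bucket. If your guards routinely exceed 1 s, the p99 estimate saturates at the top bound, which is one concrete reason to override the defaults.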
Label Vocabulary
The label values are closed sets. Anything else is a coding bug:
// chio_guard_verdict_total.verdict
pub const VERDICT_LABEL_VALUES: &[&str] =
    &[VERDICT_ALLOW, VERDICT_DENY, VERDICT_REWRITE, VERDICT_ERROR];
// chio_guard_deny_total.reason_class
pub const REASON_CLASS_LABEL_VALUES: &[&str] = &[
    "policy", "pii", "secret", "prompt_injection",
    "oversize", "fuel", "trap",
];
// chio_guard_host_call_duration_seconds.host_fn
pub const HOST_FN_LABEL_VALUES: &[&str] = &[
    HOST_LOG, HOST_GET_CONFIG, HOST_GET_TIME_UNIX_SECS, HOST_FETCH_BLOB,
];
// chio_guard_reload_total.outcome
pub const RELOAD_OUTCOME_LABEL_VALUES: &[&str] =
    &[RELOAD_APPLIED, RELOAD_CANARY_FAILED, RELOAD_ROLLED_BACK];
Cardinality Limits
The registry caps unique guard IDs at MAX_GUARD_METRIC_CARDINALITY = 1024. Beyond that, registration returns GuardMetricRegistrationError with code E_GUARD_METRIC_CARDINALITY_EXCEEDED and the guard is dropped from the metric family.
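The cap behaviour can be sketched with a small std-only stand-in. GuardIdCap is a hypothetical type, not the real GuardMetricRegistry, and treating re-registration of a known guard ID as a no-op is an assumption made for this example; only the constant name, its value, and the error code come from the text above.

```rust
use std::collections::HashSet;

pub const MAX_GUARD_METRIC_CARDINALITY: usize = 1024;
pub const E_GUARD_METRIC_CARDINALITY_EXCEEDED: &str = "E_GUARD_METRIC_CARDINALITY_EXCEEDED";

/// Hypothetical stand-in for the registry's guard-ID bookkeeping.
pub struct GuardIdCap {
    max_guards: usize,
    seen: HashSet<String>,
}

impl GuardIdCap {
    pub fn new(max_guards: usize) -> Self {
        Self { max_guards, seen: HashSet::new() }
    }

    /// Registers a guard ID. A known ID is accepted again without consuming
    /// capacity (assumption); a new ID past the cap is rejected with the code.
    pub fn register(&mut self, guard_id: &str) -> Result<(), &'static str> {
        if self.seen.contains(guard_id) {
            return Ok(());
        }
        if self.seen.len() >= self.max_guards {
            return Err(E_GUARD_METRIC_CARDINALITY_EXCEEDED);
        }
        self.seen.insert(guard_id.to_string());
        Ok(())
    }

    /// Number of distinct guard IDs currently tracked.
    pub fn len(&self) -> usize {
        self.seen.len()
    }
}
```

Rejecting at registration time bounds the number of time series up front instead of letting the TSDB absorb an unbounded label set.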
The registry derives a stable 12-character guard ID from the guard digest:
pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}
Two guards with the same first 12 hex digits would collide. In practice this is a non-issue at 1024 active guards; if your fleet approaches the cap, raise the cardinality limit explicitly via GuardMetricRegistry::with_max_guards(n) rather than truncating to fewer characters.
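As a rough check on the collision claim: assuming the label is 12 hex characters of a uniformly distributed SHA-256 digest, there are 2^48 possible labels, and the birthday approximation n^2 / (2 · 2^48) gives about 2^-29 (roughly 2 in a billion) for n = 1024 guards. The sketch below repeats the derivation and adds a hypothetical helper, colliding_digests, that reports any digests in a fleet that do share a label.

```rust
use std::collections::HashMap;

/// Same derivation as the function above, repeated so the sketch is self-contained.
pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}

/// Returns (earlier, later) pairs of distinct digests whose 12-character
/// labels collide. Hypothetical helper, not part of the chio crates.
pub fn colliding_digests<'a>(digests: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut first_seen: HashMap<String, &'a str> = HashMap::new();
    let mut collisions = Vec::new();
    for &digest in digests {
        let label = guard_id_label_from_digest(digest);
        // Copy the earlier digest out so the map is free to be mutated below.
        let earlier = first_seen.get(&label).copied();
        match earlier {
            Some(e) if e != digest => collisions.push((e, digest)),
            Some(_) => {} // same digest listed twice: not a collision
            None => {
                first_seen.insert(label, digest);
            }
        }
    }
    collisions
}
```

Running a check like this over the deployed guard digests at build or deploy time would surface a collision before two guards silently share one metric series.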
Cardinality is a metrics hazard, not a guard hazard
E_GUARD_METRIC_CARDINALITY_EXCEEDED in tracing logs.
OpenTelemetry Receipt Exporter
The crate chio-otel-receipt-exporter accepts OTLP trace batches and writes derived chio receipts to a configured store. Two pieces:
- OtlpGrpcIngress · accepts OTLP/gRPC trace exports in a narrow Rust representation.
- ReceiptStoreSink · builds a ChioReceipt per OTLP span, signs it via ReceiptStoreSinkConfig.signing_keypair, and appends it to the configured Arc<dyn ReceiptStore>.
pub struct ReceiptStoreSinkConfig {
    pub signing_keypair: Keypair,
    // ... additional kernel-key, tenant, schema fields
}
pub struct ReceiptStoreSink { /* ... */ }
impl ReceiptStoreSink {
    pub fn new(store: Arc<dyn ReceiptStore>, config: ReceiptStoreSinkConfig) -> Self;
    pub fn export_traces(&self, batch: OtlpGrpcTraceExport)
        -> Result<ReceiptStoreSinkSummary, OTelReceiptExportError>;
    pub fn receipt_for_span(&self, span: &OtlpSpan)
        -> Result<ChioReceipt, OTelReceiptExportError>;
}
High-Cardinality Attribute Denylist
Span attributes that would explode Prometheus cardinality (request IDs, user IDs, raw URLs) are stripped before forwarding to Prometheus-shaped sinks. The denylist is exposed via:
pub use denylist::{
    denied_attribute_keys,
    is_denied_attribute,
    strip_denied_attributes,
    strip_denied_batch_attributes,
    strip_denied_span_attributes,
    PROMETHEUS_DENIED_ATTRIBUTES,
};
Use strip_denied_span_attributes before pushing spans to a Prometheus exporter; the denylist is opinionated about the keys that produce unbounded series.
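The stripping step itself is simple to picture. The sketch below is a std-only analogue, not the crate's implementation: SKETCH_DENIED_ATTRIBUTES, is_denied, and strip_denied are hypothetical, the attribute keys are placeholders chosen to match the categories named above (request IDs, user IDs, raw URLs), and the real key list lives in PROMETHEUS_DENIED_ATTRIBUTES.

```rust
use std::collections::BTreeMap;

/// Placeholder denylist for illustration; consult PROMETHEUS_DENIED_ATTRIBUTES
/// for the keys the exporter actually strips.
const SKETCH_DENIED_ATTRIBUTES: &[&str] = &["request.id", "user.id", "url.full"];

/// True when `key` is on the placeholder denylist.
pub fn is_denied(key: &str) -> bool {
    SKETCH_DENIED_ATTRIBUTES.contains(&key)
}

/// Removes denied keys in place and returns how many were stripped.
pub fn strip_denied(attributes: &mut BTreeMap<String, String>) -> usize {
    let before = attributes.len();
    attributes.retain(|key, _| !is_denied(key));
    before - attributes.len()
}
```

Returning the stripped count makes it easy to log or meter how often high-cardinality attributes arrive, which helps spot an upstream service that has started attaching per-request identifiers to spans.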
Signal Hierarchy
Three signals, three roles:
| Signal | Role | Retention | Trust |
|---|---|---|---|
| Receipts | Audit-of-record | 90 days default; archive on rotation | Signed by kernel key |
| Tracing | Debugging, incident timelines | Per your log retention policy | Unsigned |
| Metrics | Dashboards, SLO alerts | Per your TSDB retention | Unsigned, aggregated |
For dispute resolution and compliance evidence, use receipts. For "is the kernel healthy right now?", use metrics. For "why did request X fail at 03:14", use tracing joined to the receipt for X by capability ID.
Operational Dashboards
Four signals matter on day-to-day dashboards:
- Deny rate per guard. rate(chio_guard_deny_total[5m]) grouped by guard_id, reason_class. Alert when a guard's deny rate jumps above its baseline; this is the early-warning channel for both attack signals and policy regressions.
- Fuel consumption tail. histogram_quantile(0.99, ...) on chio_guard_eval_duration_seconds with verdict="deny", reason_class="fuel". Indicates WASM modules approaching their fuel ceiling.
- Circuit-breaker state changes. External-guard adapters log breaker open/half-open/closed transitions at warn. Build an alert on the rate of these messages per provider.
- Reload outcomes. chio_guard_reload_total grouped by outcome. A rising canary_failed or rolled_back count means hot-reload deployments are not landing cleanly.
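For readers wondering how a p99 comes out of bucketed data, the following sketch shows the general idea behind quantile estimation over cumulative histogram buckets: locate the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration under stated assumptions (no +Inf bucket, lowest bucket starts at 0, bounds sorted ascending) and estimate_quantile is a hypothetical name, not a chio or Prometheus API.

```rust
/// Estimates quantile `q` (0.0..=1.0) from cumulative bucket counts.
/// `bounds[i]` is the upper bound of bucket i; `cumulative[i]` is the number
/// of observations <= bounds[i]. Observations above the last bound are not
/// modelled in this simplified sketch.
pub fn estimate_quantile(bounds: &[f64], cumulative: &[u64], q: f64) -> Option<f64> {
    if bounds.len() != cumulative.len() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total = *cumulative.last()?;
    if total == 0 {
        return None;
    }
    let rank = q * total as f64;
    let mut lower_bound = 0.0;
    let mut count_below = 0u64;
    for (&upper_bound, &count) in bounds.iter().zip(cumulative) {
        if count as f64 >= rank {
            let in_bucket = (count - count_below) as f64;
            if in_bucket == 0.0 {
                return Some(upper_bound);
            }
            // Linear interpolation between the bucket's lower and upper bound.
            let fraction = (rank - count_below as f64) / in_bucket;
            return Some(lower_bound + (upper_bound - lower_bound) * fraction);
        }
        lower_bound = upper_bound;
        count_below = count;
    }
    None
}
```

With bounds of 0.1, 0.5, and 1.0 seconds and cumulative counts of 50, 90, and 100, the 99th-percentile rank falls nine tenths of the way through the last bucket, so the estimate is about 0.95 s. The result can never be more precise than the bucket it falls in, which is why bucket boundaries near your SLO thresholds matter for alerting.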
Next Steps
- Receipts & Audit · the canonical audit signal that metrics and traces complement
- Performance & Tuning · which signals point at which bottlenecks
- Failure & Recovery · how to read circuit-breaker telemetry under load
- Custom Guards · how a guard registers its own metrics