Observability

Chio's primary audit trail is the signed receipt log. Metrics and traces are operational telemetry that complement receipts but do not replace them. This page covers the three signals operators actually run dashboards against: tracing spans, the WASM guard metric family, and the OpenTelemetry receipt exporter that bridges OTLP traces into the receipt store.

Source

Verified against crates/chio-wasm-guards/src/metrics.rs, crates/chio-wasm-guards/src/observability.rs, crates/chio-otel-receipt-exporter/src/lib.rs, and tracing call sites across chio-kernel.

Tracing & Structured Logging

Every chio crate uses the tracing crate. Guards, kernel evaluation, and adapters emit structured events through five standard macros:

  • tracing::trace! · per-step state in evaluation. Off in production.
  • tracing::debug! · per-call detail useful for local development. Off in production unless investigating.
  • tracing::info! · lifecycle: kernel started, policy reloaded, checkpoint written, revocation propagated.
  • tracing::warn! · recoverable degradation: cache miss escalation, regex compile failures (which lead to permissive guards), circuit-breaker trips, advisory verdicts.
  • tracing::error! · fail-closed events that produce Verdict::Deny via the fallback path: caught panics, poisoned mutexes, signing failures, store unreachable.

Each guard emits events with a fixed structured-field convention:

rust
tracing::warn!(
    guard = "ForbiddenPathGuard",
    decision = "deny",
    path = %normalized_path,
    reason = %reason_class,
    "denied path outside allowlist",
);

The four canonical fields:

  • guard · stable guard identifier. Match this against the same field in receipts and metrics for join correlation.
  • decision · allow, deny, or (for guards that mutate the request) rewrite.
  • path · (when applicable) the canonical request path or tool name.
  • reason · short reason class. Uses the same vocabulary as the reason_class metric label below.

Receipt > log for audit

A log line is operational telemetry. The audit-of-record is the signed receipt. If you find yourself querying unstructured log lines to answer "was this call denied yesterday?", switch to ReceiptQuery: kernel-signed evidence beats grepping through stdout.

WASM Guard Metrics

Custom WASM guards emit a Prometheus-shaped metric family registered through GuardMetricRegistry. The family descriptors are static constants in chio_wasm_guards::metrics:

| Metric | Kind | Labels | Unit |
| --- | --- | --- | --- |
| chio_guard_eval_duration_seconds | Histogram | guard_id, verdict | seconds |
| chio_guard_fuel_consumed_total | Counter | guard_id | fuel units |
| chio_guard_verdict_total | Counter | guard_id, verdict | count |
| chio_guard_deny_total | Counter | guard_id, reason_class | count |
| chio_guard_reload_total | Counter | guard_id, outcome | count |
| chio_guard_host_call_duration_seconds | Histogram | guard_id, host_fn | seconds |
| chio_guard_module_bytes | Gauge | guard_id, epoch | bytes |

Histogram Buckets

Both bucket sets are tuned for WASM guard latencies: eval-duration buckets span 100 µs to 1 s, and host-call buckets span 10 µs to 100 ms:

crates/chio-wasm-guards/src/metrics.rs
pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];

pub const HOST_CALL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1,
];

Override these only if your dashboards demand different SLO breakpoints. They are shared by every guard so cross-guard p99 comparisons are meaningful out of the box.
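To make the bucket boundaries concrete, the following is a minimal sketch of how a Prometheus-style histogram assigns an observation to the first bucket whose upper bound is greater than or equal to the value. The constant is copied from the listing above; bucket_upper_bound is a hypothetical helper written for illustration and is not part of chio_wasm_guards.

```rust
/// Copied from the listing above.
pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];

/// Hypothetical helper (not a chio API): returns the smallest bucket upper
/// bound that is >= `seconds`, or `None` when the observation is only
/// counted in the implicit `+Inf` bucket.
pub fn bucket_upper_bound(buckets: &[f64], seconds: f64) -> Option<f64> {
    // Buckets are sorted ascending, so the first match is the tightest one.
    buckets.iter().copied().find(|&le| seconds <= le)
}
```

Under these assumptions a 3 ms evaluation lands in the 5 ms bucket, while anything slower than 1 s is visible only in +Inf. That is one reason to revisit the breakpoints if your SLO sits near either end of the range.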

Label Vocabulary

The label values are closed sets. Anything else is a coding bug:

crates/chio-wasm-guards/src/metrics.rs
// chio_guard_verdict_total.verdict
pub const VERDICT_LABEL_VALUES: &[&str] =
    &[VERDICT_ALLOW, VERDICT_DENY, VERDICT_REWRITE, VERDICT_ERROR];

// chio_guard_deny_total.reason_class
pub const REASON_CLASS_LABEL_VALUES: &[&str] = &[
    "policy", "pii", "secret", "prompt_injection",
    "oversize", "fuel", "trap",
];

// chio_guard_host_call_duration_seconds.host_fn
pub const HOST_FN_LABEL_VALUES: &[&str] = &[
    HOST_LOG, HOST_GET_CONFIG, HOST_GET_TIME_UNIX_SECS, HOST_FETCH_BLOB,
];

// chio_guard_reload_total.outcome
pub const RELOAD_OUTCOME_LABEL_VALUES: &[&str] =
    &[RELOAD_APPLIED, RELOAD_CANARY_FAILED, RELOAD_ROLLED_BACK];
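Because the sets are closed, a value can be checked before it ever reaches a metric. The sketch below does this for reason_class, using the literal values from the listing above. check_reason_class is a hypothetical helper for illustration; the source does not show how the registry itself enforces the vocabulary.

```rust
/// Copied from the listing above.
pub const REASON_CLASS_LABEL_VALUES: &[&str] = &[
    "policy", "pii", "secret", "prompt_injection",
    "oversize", "fuel", "trap",
];

/// Hypothetical check (not a chio API): returns the canonical static label
/// when `value` is in the closed set, or an error message otherwise.
pub fn check_reason_class(value: &str) -> Result<&'static str, String> {
    REASON_CLASS_LABEL_VALUES
        .iter()
        .copied()
        .find(|allowed| *allowed == value)
        .ok_or_else(|| format!("unknown reason_class label: {value}"))
}
```

Returning the static string rather than the caller's input keeps arbitrary strings out of label positions, which is what keeps the series count bounded.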

Cardinality Limits

The registry caps unique guard IDs at MAX_GUARD_METRIC_CARDINALITY = 1024. Beyond that, registration returns GuardMetricRegistrationError with code E_GUARD_METRIC_CARDINALITY_EXCEEDED and the guard is dropped from the metric family.

The registry derives a stable 12-character guard ID from the guard digest:

crates/chio-wasm-guards/src/metrics.rs
pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}

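Two guards with the same first 12 hex digits would collide. Twelve hex digits give 48 bits of label space, so for uniformly distributed SHA-256 digests the chance of any collision among 1024 active guards is on the order of 10^-9, a non-issue in practice. If your fleet approaches the cap, raise the cardinality limit explicitly via GuardMetricRegistry::with_max_guards(n) rather than truncating to fewer characters.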
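The derivation and the collision estimate can be checked with a short sketch. guard_id_label_from_digest is copied from the listing above; find_label_collisions and approx_collision_probability are hypothetical helpers added for illustration, and the estimate assumes digests are uniformly distributed hex strings.

```rust
use std::collections::HashMap;

/// Copied from the listing above.
pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}

/// Hypothetical pre-flight check (not a chio API): returns each pair of
/// digests (first seen, later one) that map to the same 12-character label.
pub fn find_label_collisions<'a>(digests: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<String, &'a str> = HashMap::new();
    let mut collisions = Vec::new();
    for &digest in digests {
        let label = guard_id_label_from_digest(digest);
        match seen.get(&label).copied() {
            Some(first) => collisions.push((first, digest)),
            None => {
                seen.insert(label, digest);
            }
        }
    }
    collisions
}

/// Birthday-bound approximation: n(n-1)/2 pairs, each colliding with
/// probability 16^-12 = 2^-48 under the uniformity assumption.
pub fn approx_collision_probability(guards: u64) -> f64 {
    let pairs = (guards as f64) * (guards.saturating_sub(1) as f64) / 2.0;
    pairs / 2f64.powi(48)
}
```

For 1024 guards the approximation gives roughly 1.9e-9, which is why the 12-character truncation is safe at the default cap.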

Cardinality is a metrics hazard, not a guard hazard

Hitting the cap drops the guard from metrics; it does not affect evaluation. The kernel still runs the guard and still records receipts. Operators who care about completeness must monitor for E_GUARD_METRIC_CARDINALITY_EXCEEDED in tracing logs.
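The cap behaviour can be modelled with a toy registry. This is a sketch only: ToyRegistry and its register method are hypothetical stand-ins and do not reflect the real GuardMetricRegistry internals. The constant's integer type, the plain-string error, and the rule that re-registering a known ID is a no-op are all assumptions.

```rust
use std::collections::HashSet;

/// Value quoted in the text above; the `usize` type is an assumption.
pub const MAX_GUARD_METRIC_CARDINALITY: usize = 1024;
/// Error code quoted in the text above, modelled here as a plain string.
pub const E_GUARD_METRIC_CARDINALITY_EXCEEDED: &str = "E_GUARD_METRIC_CARDINALITY_EXCEEDED";

/// Toy stand-in for the real registry; it models only the cardinality cap.
pub struct ToyRegistry {
    max_guards: usize,
    guard_ids: HashSet<String>,
}

impl ToyRegistry {
    pub fn with_max_guards(max_guards: usize) -> Self {
        Self { max_guards, guard_ids: HashSet::new() }
    }

    /// Registers a guard ID. Known IDs are accepted again without consuming
    /// a slot (an assumption); a new ID beyond the cap is rejected.
    pub fn register(&mut self, guard_id: &str) -> Result<(), &'static str> {
        if self.guard_ids.contains(guard_id) {
            return Ok(());
        }
        if self.guard_ids.len() >= self.max_guards {
            return Err(E_GUARD_METRIC_CARDINALITY_EXCEEDED);
        }
        self.guard_ids.insert(guard_id.to_owned());
        Ok(())
    }
}
```

In this model a rejected registration changes nothing except that the guard has no series. That matches the point above: the error is the only signal of the gap, so it has to be alerted on.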

OpenTelemetry Receipt Exporter

The crate chio-otel-receipt-exporter accepts OTLP trace batches and writes derived chio receipts to a configured store. Two pieces:

  • OtlpGrpcIngress · accepts OTLP/gRPC trace exports in a narrow Rust representation.
  • ReceiptStoreSink · builds a ChioReceipt per OTLP span, signs it via ReceiptStoreSinkConfig.signing_keypair, and appends it to the configured Arc<dyn ReceiptStore>.

crates/chio-otel-receipt-exporter/src/lib.rs
pub struct ReceiptStoreSinkConfig {
    pub signing_keypair: Keypair,
    // ... additional kernel-key, tenant, schema fields
}

pub struct ReceiptStoreSink { /* ... */ }

impl ReceiptStoreSink {
    pub fn new(store: Arc<dyn ReceiptStore>, config: ReceiptStoreSinkConfig) -> Self;
    pub fn export_traces(&self, batch: OtlpGrpcTraceExport)
        -> Result<ReceiptStoreSinkSummary, OTelReceiptExportError>;
    pub fn receipt_for_span(&self, span: &OtlpSpan)
        -> Result<ChioReceipt, OTelReceiptExportError>;
}

High-Cardinality Attribute Denylist

Span attributes that would explode Prometheus cardinality (request IDs, user IDs, raw URLs) are stripped before forwarding to Prometheus-shaped sinks. The denylist is exposed via:

crates/chio-otel-receipt-exporter/src/lib.rs
pub use denylist::{
    denied_attribute_keys,
    is_denied_attribute,
    strip_denied_attributes,
    strip_denied_batch_attributes,
    strip_denied_span_attributes,
    PROMETHEUS_DENIED_ATTRIBUTES,
};

Use strip_denied_span_attributes before pushing spans to a Prometheus exporter; the denylist is opinionated about the keys that produce unbounded series.
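The signatures of the exported helpers are not shown here, so the following only sketches the idea with standard-library types. EXAMPLE_DENIED_KEYS and strip_denied are hypothetical; the real PROMETHEUS_DENIED_ATTRIBUTES list and the real span attribute type may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical denylist for illustration; the actual contents of
/// PROMETHEUS_DENIED_ATTRIBUTES are not reproduced here.
pub const EXAMPLE_DENIED_KEYS: &[&str] = &["request.id", "user.id", "url.full"];

/// Hypothetical helper (not the chio API): removes denied keys in place and
/// returns how many attributes were stripped.
pub fn strip_denied(attributes: &mut BTreeMap<String, String>) -> usize {
    let before = attributes.len();
    attributes.retain(|key, _| !EXAMPLE_DENIED_KEYS.contains(&key.as_str()));
    before - attributes.len()
}
```

Stripping by key rather than by value keeps the filter cheap and predictable: a bounded attribute such as guard survives, while per-request identifiers never become label values.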


Signal Hierarchy

Three signals, three roles:

| Signal | Role | Retention | Trust |
| --- | --- | --- | --- |
| Receipts | Audit-of-record | 90 days default; archive on rotation | Signed by kernel key |
| Tracing | Debugging, incident timelines | Per your log retention policy | Unsigned |
| Metrics | Dashboards, SLO alerts | Per your TSDB retention | Unsigned, aggregated |

For dispute resolution and compliance evidence, use receipts. For "is the kernel healthy right now?", use metrics. For "why did request X fail at 03:14?", use tracing, joined to the receipt for X by capability ID.


Operational Dashboards

Four views matter on day-to-day dashboards:

  • Deny rate per guard. rate(chio_guard_deny_total[5m]) grouped by guard_id, reason_class. Alert when a guard's deny rate jumps above its baseline; this is the early-warning channel for both attack signals and policy regressions.
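  • Fuel consumption tail. histogram_quantile(0.99, ...) on chio_guard_eval_duration_seconds, read alongside rate(chio_guard_deny_total{reason_class="fuel"}[5m]) and rate(chio_guard_fuel_consumed_total[5m]). The duration histogram carries only guard_id and verdict, so fuel denials have to come from the deny counter. Together these indicate WASM modules approaching their fuel ceiling.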
  • Circuit-breaker state changes. External-guard adapters log breaker open/half-open/closed transitions at warn. Build an alert on the rate of these messages per provider.
  • Reload outcomes. chio_guard_reload_total grouped by outcome. A rising canary_failed or rolled_back count means hot-reload deployments are not landing cleanly.
