Observability · Chio Docs

Source

Verified against crates/guards/chio-wasm-guards/src/metrics.rs, crates/guards/chio-wasm-guards/src/observability.rs, crates/observability/chio-otel-receipt-exporter/src/sink.rs, and tracing call sites across chio-kernel.

Tracing & Structured Logging

Every chio crate uses the tracing crate. Guards, kernel evaluation, and adapters emit structured events through five standard macros:

tracing::trace! · per-step state in evaluation. Off in production.
tracing::debug! · per-call detail useful for local development. Off in production unless investigating.
tracing::info! · lifecycle: kernel started, policy reloaded, checkpoint written, revocation propagated.
tracing::warn! · recoverable degradation and fail-closed fallbacks: cache-miss escalation, a guard rejecting invalid config and constructing a deny-all fallback (e.g. SqlQueryGuard::new on an over-broad denylist pattern), external-guard call failures, advisory verdicts.
tracing::error! · fail-closed events that produce Verdict::Deny via the fallback path: caught panics, poisoned mutexes, signing failures, store unreachable.

There is no single fixed field set across guards. The guards that emit tracing share one convention — a guard field carrying the guard's kebab-case Guard::name() — and otherwise attach whatever fields describe that guard's decision. Many deterministic guards, ForbiddenPathGuard among them, emit no tracing at all; their evidence lives in the receipt, not the log. Two representative call sites:

crates/guards/chio-guards/src/code_execution.rs

tracing::warn!(
    guard = "code-execution",
    module = %name,
    "denying code execution: dangerous module detected",
);

crates/guards/chio-guards/src/content_review.rs

tracing::warn!(
    guard = "content-review",
    service = %service,
    endpoint = %endpoint,
    detected_categories = ?categories,
    "content-review denied outbound message",
);

The guard field is the one stable join key. Its value is the same kebab-case string the guard reports through Guard::name() ("code-execution", "content-review", "forbidden-path") and the same string the pipeline records as GuardEvidence.guard_name on the receipt, so a log line and its receipt correlate on that value. The remaining fields (module, service, endpoint, code_len, ...) are guard-specific; do not assume a decision, path, or reason field on an arbitrary guard's events.
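The join described above can be sketched in a few lines. This is an illustration only, not chio's API: LogEvent and EvidenceRow are hypothetical stand-ins for a parsed log line and one GuardEvidence entry, and the sample values in main are invented. The only thing taken from the text is that the log's guard field and the receipt's guard_name carry the same kebab-case string.

```rust
use std::collections::HashMap;

/// Hypothetical parsed log line. Only `guard` is assumed to be present.
#[derive(Debug, Clone, PartialEq)]
struct LogEvent {
    guard: String,
    message: String,
    /// Guard-specific fields (module, service, endpoint, ...); may be empty.
    fields: HashMap<String, String>,
}

/// Hypothetical flattened view of one GuardEvidence entry on a receipt.
#[derive(Debug, Clone, PartialEq)]
struct EvidenceRow {
    receipt_id: String,
    guard_name: String,
}

/// Pair each evidence row with the log events whose `guard` field equals
/// its `guard_name`. No other field is assumed to exist on the events.
fn join_on_guard<'a>(
    events: &'a [LogEvent],
    evidence: &'a [EvidenceRow],
) -> Vec<(&'a EvidenceRow, Vec<&'a LogEvent>)> {
    let mut by_guard: HashMap<&str, Vec<&LogEvent>> = HashMap::new();
    for ev in events {
        by_guard.entry(ev.guard.as_str()).or_default().push(ev);
    }
    evidence
        .iter()
        .map(|row| {
            let matched = by_guard
                .get(row.guard_name.as_str())
                .cloned()
                .unwrap_or_default();
            (row, matched)
        })
        .collect()
}

fn main() {
    let events = vec![LogEvent {
        guard: "code-execution".to_string(),
        message: "denying code execution: dangerous module detected".to_string(),
        fields: HashMap::from([("module".to_string(), "os".to_string())]),
    }];
    let evidence = vec![
        EvidenceRow { receipt_id: "r-1".to_string(), guard_name: "code-execution".to_string() },
        EvidenceRow { receipt_id: "r-1".to_string(), guard_name: "forbidden-path".to_string() },
    ];
    for (row, logs) in join_on_guard(&events, &evidence) {
        println!("{} / {}: {} log event(s)", row.receipt_id, row.guard_name, logs.len());
    }
}
```

A guard that emits no tracing, such as forbidden-path here, simply joins to an empty list; the receipt row is still the evidence.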

Receipt > log for audit

A log line is operational telemetry. The audit-of-record is the signed receipt. If you find yourself writing a query against unstructured log lines to answer "was this call denied yesterday?", pivot to ReceiptQuery; the signed-by-the-kernel evidence beats grep through stdout.

WASM Guard Metrics

Custom WASM guards emit a Prometheus-shaped metric family registered through GuardMetricRegistry. The family descriptors are static constants in chio_wasm_guards::metrics:

Metric	Kind	Labels	Unit
`chio_guard_eval_duration_seconds`	Histogram	`guard_id`, `verdict`	seconds
`chio_guard_fuel_consumed_total`	Counter	`guard_id`	fuel units
`chio_guard_verdict_total`	Counter	`guard_id`, `verdict`	count
`chio_guard_deny_total`	Counter	`guard_id`, `reason_class`	count
`chio_guard_reload_total`	Counter	`guard_id`, `outcome`	count
`chio_guard_host_call_duration_seconds`	Histogram	`guard_id`, `host_fn`	seconds
`chio_guard_module_bytes`	Gauge	`guard_id`, `epoch`	bytes

Histogram Buckets

Eval-duration and host-call buckets are tuned for WASM guard latencies, spanning 100 µs to 1 s for evaluation and 10 µs to 100 ms for host calls:

crates/guards/chio-wasm-guards/src/metrics.rs

pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];

pub const HOST_CALL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1,
];

Override these only if your dashboards demand different SLO breakpoints. They are shared by every guard so cross-guard p99 comparisons are meaningful out of the box.
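To make the bucket boundaries concrete, here is a minimal sketch of how an observation maps onto them under the usual Prometheus "less than or equal" bucket semantics. The constant is copied from the listing above; first_bucket and cumulative_counts are hypothetical helpers written for this example and are not part of chio_wasm_guards.

```rust
/// Copied from the listing above.
pub const EVAL_DURATION_BUCKETS_SECONDS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0,
];

/// Index of the first bucket whose upper bound is >= the observation.
/// `None` means the observation only lands in the implicit +Inf bucket.
fn first_bucket(bounds: &[f64], seconds: f64) -> Option<usize> {
    bounds.iter().position(|&le| seconds <= le)
}

/// Cumulative counts per bound plus a final +Inf slot, which is the shape
/// a Prometheus histogram exposes.
fn cumulative_counts(bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    let mut counts = vec![0u64; bounds.len() + 1];
    for &obs in observations {
        let start = first_bucket(bounds, obs).unwrap_or(bounds.len());
        // Every bucket at or above the first matching bound sees the sample.
        for slot in &mut counts[start..] {
            *slot += 1;
        }
    }
    counts
}

fn main() {
    // A 3 ms evaluation falls into the 0.005 s bucket (index 3).
    println!("{:?}", first_bucket(EVAL_DURATION_BUCKETS_SECONDS, 0.003));
    // One fast, one slow, and one over-the-top observation.
    println!("{:?}", cumulative_counts(EVAL_DURATION_BUCKETS_SECONDS, &[0.003, 0.2, 2.0]));
}
```

Anything slower than the last bound (1 s for evaluation) is only visible in the +Inf slot, which is one reason to revisit the bounds if your p99 regularly sits near the top bucket.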

Label Vocabulary

The label values are closed sets, bounded so each series has a finite label domain:

crates/guards/chio-wasm-guards/src/metrics.rs

// chio_guard_verdict_total.verdict
pub const VERDICT_LABEL_VALUES: &[&str] =
    &[VERDICT_ALLOW, VERDICT_DENY, VERDICT_REWRITE, VERDICT_ERROR];

// chio_guard_deny_total.reason_class
pub const REASON_CLASS_LABEL_VALUES: &[&str] = &[
    "policy", "pii", "secret", "prompt_injection",
    "oversize", "fuel", "trap", "malformed", "other",
];

// chio_guard_host_call_duration_seconds.host_fn
pub const HOST_FN_LABEL_VALUES: &[&str] = &[
    HOST_LOG, HOST_GET_CONFIG, HOST_GET_TIME_UNIX_SECS, HOST_FETCH_BLOB,
];

// chio_guard_reload_total.outcome
pub const RELOAD_OUTCOME_LABEL_VALUES: &[&str] =
    &[RELOAD_APPLIED, RELOAD_CANARY_FAILED, RELOAD_ROLLED_BACK];

Two of the nine reason_class values are the bounding valves, not error states. malformed marks a deny where host-side argument extraction failed before the guard module ran. other is the deliberate fallback that keeps chio_guard_deny_total{reason_class} finite even when a guard emits a novel free-form reason: classify_deny_reason_class maps any unrecognized or absent reason to it. Seeing reason_class="other" on a dashboard therefore means a guard denied with a reason the classifier does not recognize, or with no reason at all; it does not mean evaluation failed.
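The fallback behaviour can be sketched as below. This is not the real classify_deny_reason_class, whose matching rules are not shown on this page; classify_reason_sketch is a hypothetical stand-in that assumes an exact match after trimming and lowercasing. What it does share with the description above is the guarantee that every output is one of the nine vocabulary values.

```rust
/// Copied from the listing above.
pub const REASON_CLASS_LABEL_VALUES: &[&str] = &[
    "policy", "pii", "secret", "prompt_injection",
    "oversize", "fuel", "trap", "malformed", "other",
];

/// Hypothetical classifier: returns a value from the closed vocabulary,
/// falling back to "other" for absent or unrecognized reasons.
/// The exact-match rule is an assumption made for this sketch.
fn classify_reason_sketch(reason: Option<&str>) -> &'static str {
    let normalized = match reason {
        Some(r) => r.trim().to_ascii_lowercase(),
        None => return "other",
    };
    REASON_CLASS_LABEL_VALUES
        .iter()
        .copied()
        .find(|known| *known == normalized.as_str())
        .unwrap_or("other")
}

fn main() {
    for reason in [Some("pii"), Some("Fuel"), Some("quota exhausted upstream"), None] {
        println!("{:?} -> {}", reason, classify_reason_sketch(reason));
    }
}
```

Because the return type is a value drawn from the constant slice, a guard cannot widen the label domain by inventing a new reason string.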

Cardinality Limits

The registry caps unique guard IDs at MAX_GUARD_METRIC_CARDINALITY = 1024. Beyond that, registration returns GuardMetricRegistrationError with code E_GUARD_METRIC_CARDINALITY_EXCEEDED and the guard is dropped from the metric family.

The registry derives a stable 12-character guard ID from the guard digest:

crates/guards/chio-wasm-guards/src/metrics.rs

pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}

Two guards whose digests share the same first 12 hex digits would collide on guard_id. In practice this is a non-issue at 1024 active guards; if your fleet approaches the cap, raise the cardinality limit explicitly via GuardMetricRegistry::with_max_guards(n) rather than shortening the ID to fewer characters.
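A short usage sketch shows what the derived label looks like and why collisions are unlikely at this scale. guard_id_label_from_digest is copied from the listing above; approx_collision_probability is a hypothetical helper that applies the standard birthday approximation, assuming digests are hex-encoded and uniformly distributed, so that 12 characters give 16^12 (2^48) possible labels. The sample digests are invented.

```rust
/// Copied from the listing above.
pub fn guard_id_label_from_digest(digest: &str) -> String {
    digest
        .strip_prefix("sha256:")
        .unwrap_or(digest)
        .chars()
        .take(12)
        .collect()
}

/// Birthday approximation n(n-1) / (2 * 16^12) for the chance that any two
/// of `n_guards` labels collide. Assumes uniformly distributed hex digests.
fn approx_collision_probability(n_guards: u64) -> f64 {
    let space = 16f64.powi(12); // 2^48 possible 12-hex-digit labels
    let n = n_guards as f64;
    n * (n - 1.0) / (2.0 * space)
}

fn main() {
    // With and without the "sha256:" prefix, the label is the first 12 chars.
    println!("{}", guard_id_label_from_digest("sha256:0123456789abcdef0123456789abcdef"));
    println!("{}", guard_id_label_from_digest("abcdef0123456789"));
    // At the default cap of 1024 guards the estimate is on the order of 1e-9.
    println!("{:e}", approx_collision_probability(1024));
}
```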

Cardinality limits apply to metrics

Hitting the cap drops the guard from metrics; it does not affect evaluation. The kernel still runs the guard and still records receipts. Operators who care about completeness must monitor for E_GUARD_METRIC_CARDINALITY_EXCEEDED in tracing logs.
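The separation between metering and evaluation can be sketched as follows. SketchRegistry, CardinalityExceeded, and run_guard are hypothetical types written for this example; they are not GuardMetricRegistry or GuardMetricRegistrationError, and treating re-registration of a known ID as a no-op is an assumption. Only the cap value and the error code string come from the text above.

```rust
use std::collections::HashSet;

/// Default cap quoted in the text above.
const MAX_GUARD_METRIC_CARDINALITY: usize = 1024;

/// Hypothetical error carrying the documented code string.
#[derive(Debug, PartialEq)]
struct CardinalityExceeded {
    code: &'static str,
    limit: usize,
}

/// Hypothetical registry that tracks unique guard IDs up to a cap.
struct SketchRegistry {
    max_guards: usize,
    guard_ids: HashSet<String>,
}

impl SketchRegistry {
    fn with_max_guards(max_guards: usize) -> Self {
        Self { max_guards, guard_ids: HashSet::new() }
    }

    fn register(&mut self, guard_id: &str) -> Result<(), CardinalityExceeded> {
        if self.guard_ids.contains(guard_id) {
            return Ok(()); // assumption: a known ID does not count twice
        }
        if self.guard_ids.len() >= self.max_guards {
            return Err(CardinalityExceeded {
                code: "E_GUARD_METRIC_CARDINALITY_EXCEEDED",
                limit: self.max_guards,
            });
        }
        self.guard_ids.insert(guard_id.to_string());
        Ok(())
    }
}

/// Returns (evaluated, metered). Evaluation does not depend on registration.
fn run_guard(reg: &mut SketchRegistry, guard_id: &str) -> (bool, bool) {
    let metered = match reg.register(guard_id) {
        Ok(()) => true,
        Err(err) => {
            // Stand-in for the log event operators should monitor.
            eprintln!("{}: guard {} not metered (limit {})", err.code, guard_id, err.limit);
            false
        }
    };
    let evaluated = true; // placeholder for the actual guard call
    (evaluated, metered)
}

fn main() {
    let mut reg = SketchRegistry::with_max_guards(MAX_GUARD_METRIC_CARDINALITY);
    println!("{:?}", run_guard(&mut reg, "0123456789ab"));
}
```

The point of the shape is that a registration error is logged and swallowed on the metrics side, so the only symptom of hitting the cap is a missing series plus the error code in the logs.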

OpenTelemetry Receipt Exporter

The crate chio-otel-receipt-exporter accepts OTLP trace batches and writes derived chio receipts to a configured store. Two pieces:

OtlpGrpcIngress · accepts OTLP/gRPC trace exports in a narrow Rust representation.
ReceiptStoreSink · builds a ChioReceipt per OTLP span, signs it via ReceiptStoreSinkConfig.signing_keypair, and appends it to the configured Arc<dyn ReceiptStore>.

crates/observability/chio-otel-receipt-exporter/src/sink.rs

pub struct ReceiptStoreSinkConfig {
    pub signing_keypair: Keypair,
    // ... additional kernel-key, tenant, schema fields
}

pub struct ReceiptStoreSink { /* ... */ }

impl ReceiptStoreSink {
    pub fn new(store: Arc<dyn ReceiptStore>, config: ReceiptStoreSinkConfig) -> Self;
    pub fn export_traces(&self, export: &OtlpGrpcTraceExport)
        -> Result<ReceiptStoreSinkSummary, OTelReceiptExportError>;
    pub fn receipt_for_span(&self, span: &OtlpSpan)
        -> Result<ChioReceipt, OTelReceiptExportError>;
}

High-Cardinality Attribute Denylist

Span attributes that would explode Prometheus cardinality (request IDs, user IDs, raw URLs) are stripped before forwarding to Prometheus-shaped sinks. The denylist is exposed via:

crates/observability/chio-otel-receipt-exporter/src/lib.rs

pub use denylist::{
    denied_attribute_keys,
    is_denied_attribute,
    strip_denied_attributes,
    strip_denied_batch_attributes,
    strip_denied_span_attributes,
    PROMETHEUS_DENIED_ATTRIBUTES,
};

Use strip_denied_span_attributes before pushing spans to a Prometheus exporter; the denylist is opinionated about the keys that produce unbounded series.
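The stripping step amounts to a filter over a span's attribute map. The sketch below is a stand-in, not the crate's implementation: the signatures of the re-exported functions and the contents of PROMETHEUS_DENIED_ATTRIBUTES are not shown on this page, so SKETCH_DENIED_ATTRIBUTES, is_denied_sketch, and strip_denied_sketch are hypothetical names, and the three keys are invented examples of the request-ID, user-ID, and raw-URL categories mentioned above.

```rust
use std::collections::BTreeMap;

/// Hypothetical denylist for illustration; the real list may differ.
const SKETCH_DENIED_ATTRIBUTES: &[&str] = &["request.id", "user.id", "url.full"];

/// True when the key is on the sketch denylist.
fn is_denied_sketch(key: &str) -> bool {
    SKETCH_DENIED_ATTRIBUTES.iter().any(|denied| *denied == key)
}

/// Remove denied keys in place and report how many were dropped.
fn strip_denied_sketch(attrs: &mut BTreeMap<String, String>) -> usize {
    let before = attrs.len();
    attrs.retain(|key, _| !is_denied_sketch(key));
    before - attrs.len()
}

fn main() {
    let mut attrs = BTreeMap::from([
        ("request.id".to_string(), "7f3a".to_string()),
        ("http.method".to_string(), "POST".to_string()),
        ("user.id".to_string(), "u-42".to_string()),
    ]);
    let dropped = strip_denied_sketch(&mut attrs);
    println!("dropped {dropped}, kept {:?}", attrs.keys().collect::<Vec<_>>());
}
```

Bounded attributes such as an HTTP method survive, while per-request identifiers never reach the label set of a Prometheus-shaped sink.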

Signal Hierarchy

Three signals, three roles:

Signal	Role	Retention	Trust
Receipts	Audit-of-record	90 days default; archive on rotation	Signed by kernel key
Tracing	Debugging, incident timelines	Per your log retention policy	Unsigned
Metrics	Dashboards, SLO alerts	Per your TSDB retention	Unsigned, aggregated

For dispute resolution and compliance evidence, use receipts. For "is the kernel healthy right now?", use metrics. For "why did request X fail at 03:14", use tracing joined to the receipt for X by capability ID.

Operational Dashboards

Four signals matter on day-to-day dashboards:

Deny rate per guard. rate(chio_guard_deny_total[5m]) grouped by guard_id, reason_class. Alert when a guard's deny rate jumps above its baseline; this is the early-warning channel for both attack signals and policy regressions.
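Fuel consumption tail. histogram_quantile(0.99, ...) on chio_guard_eval_duration_seconds with verdict="deny", read alongside rate(chio_guard_deny_total{reason_class="fuel"}[5m]). The duration histogram carries only guard_id and verdict, so the reason_class filter has to come from the deny counter. A rising pair indicates WASM modules approaching their fuel ceiling.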
External-guard call failures. The breaker state machine itself does not log transitions. What you alert on is the adapter's failure path: every failed provider call logs tracing::warn!(guard, error, "external guard failed") and trips the circuit breaker toward open. Build an alert on the rate of these per guard; a sustained rate means a provider is degraded and the breaker is shedding load.
Reload outcomes. chio_guard_reload_total grouped by outcome. A rising canary_failed or rolled_back count means hot-reload deployments are not landing cleanly.
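The deny-rate alert in the first item reduces to simple arithmetic on counter samples. The sketch below is a deliberate simplification, assuming two scrapes of chio_guard_deny_total for one guard_id and reason_class: Prometheus's own rate() also extrapolates to the window edges and corrects for counter resets, which this code does not attempt. Sample, per_second_rate, exceeds_baseline, and the multiplier-style threshold are all hypothetical.

```rust
/// One scrape of a monotonically increasing counter.
#[derive(Clone, Copy, Debug)]
struct Sample {
    unix_secs: f64,
    value: f64,
}

/// Per-second increase between two samples. Returns None for an empty
/// window or when the counter went backwards (a reset), which this
/// sketch does not try to repair.
fn per_second_rate(older: Sample, newer: Sample) -> Option<f64> {
    let dt = newer.unix_secs - older.unix_secs;
    if dt <= 0.0 || newer.value < older.value {
        return None;
    }
    Some((newer.value - older.value) / dt)
}

/// Hypothetical alert rule: fire when the rate exceeds `factor` times
/// the guard's baseline rate.
fn exceeds_baseline(rate: f64, baseline: f64, factor: f64) -> bool {
    rate > baseline * factor
}

fn main() {
    // 60 denies over a 5-minute window.
    let older = Sample { unix_secs: 0.0, value: 100.0 };
    let newer = Sample { unix_secs: 300.0, value: 160.0 };
    if let Some(rate) = per_second_rate(older, newer) {
        println!("rate = {rate}/s, alert = {}", exceeds_baseline(rate, 0.05, 3.0));
    }
}
```

How the baseline is established (a fixed value, a trailing average, a recording rule) is a dashboard design choice this page does not prescribe.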

Next Steps

Receipts & Audit · the canonical audit signal that metrics and traces complement
Performance & Tuning · which signals point at which bottlenecks
Failure & Recovery · how to read circuit-breaker telemetry under load
Custom Guards · how a guard registers its own metrics
