Chio/Docs

Failure Modes

When a Chio formal gate fails in CI, the failure surfaces in a specific way: a Lean tactic that does not close, a Kani counterexample trace, a proptest divergence, an Apalache invariant violation, a dudect t-statistic above 4.5, a libFuzzer crash. This page is the catalog. Each section names the gate, the error format you will see, the common triggers, and the triage path back to a green run.

Some output is illustrative

Where the error format is fixed by the tool (Lean's elaborator, CBMC's counterexample shape, Apalache's checker output) the snippets below are verbatim or near-verbatim. Where the format depends on the failing input (an actual bug we have not seen) the snippet is illustrative; treat it as the shape rather than the literal bytes.

Lean 4 Build Failure

The first line of triage on scripts/check-formal-proofs.sh is whether lake build itself returned non-zero. The script prints ==> Lean 4 proof build and then runs the build under formal/lean4/Chio. A build failure looks like:

lake build stdout (illustrative)
==> Lean 4 proof build
info: leanprover/lean4:v4.28.0-rc1: trusting cached toolchain
warning: Chio.Proofs.Monotonicity:42:0: declaration uses 'sorry'
error: Chio.Proofs.Monotonicity:88:2: unknown tactic 'List.foldr_eq_true'
Build failed: Chio.Proofs.Monotonicity

Common triggers:

  • Tactic typo or rename. A Mathlib lemma name changed under the toolchain bump and the old name no longer resolves. Fix: search Mathlib for the new name; the toolchain pin is leanprover/lean4:v4.28.0-rc1 in formal/lean4/Chio/lean-toolchain and Mathlib version comes from lake-manifest.json.
  • Broken import. A new theorem in Proofs/<X>.lean imports a module not yet wired into Chio.lean. The gate requires every theorem in the inventory to be root-imported; Build failed on the missing module is the surface.
  • Axiom shift. A new axiom slipped in. The manifest's allowed_axioms list contains exactly Chio.Core.verifyCapabilitySignature; a new entry is rejected by the manifest sanity step.

Triage path:

  • Reproduce locally: cd formal/lean4/Chio && lake build. Lean prints the failing module and line number directly.
  • For a Mathlib rename, check the Mathlib changelog for the toolchain release; most renames are mechanical.
  • For an axiom shift, run #print axioms <name> on the suspect theorem to see which axioms it depends on.

Lean 4 Theorem Unproved

A literal sorry in any shipped Lean module fails the gate. The script runs ripgrep (or falls back to grep -RInw) and exits non-zero on any match.

check-formal-proofs.sh placeholder scan
==> Lean 4 placeholder scan
formal/lean4/Chio/Chio/Proofs/Monotonicity.lean:142:  sorry
formal proof check failed: found literal sorry in shipped Lean modules

The build can still succeed with a sorry because Lean elaborates sorry as an unsafe witness, but the placeholder scan rejects it independent of the build. Common triggers:

  • A draft proof landed in a PR before the proof body was finished. Fix: finish the proof, or revert the theorem to its previous statement.
  • A reviewer suggestion replaced a complex tactic chain with a placeholder while iterating. Fix: complete the tactic.

Build green but theorem unproved

A passing lake build with no errors does not mean the theorems are proven. The manifest gate at scripts/check-formal-proofs.sh is the load-bearing one: it fails the build on any sorry, on any inventory theorem missing from the elaborated environment, and on any theorem not root-imported from Chio.lean.

Aeneas Extraction Failure

The Aeneas pipeline reads crates/chio-kernel-core/src/formal_aeneas.rs and emits target/formal/aeneas-production/lean/{Funs.lean,Types.lean}. When the Rust source uses a feature Aeneas cannot extract, the run fails with an unsupported-feature message:

check-aeneas-production.sh stdout (illustrative)
error: Charon failed to translate the following item: chio_kernel_core::formal_aeneas::guard_step_allows
caused by: unsupported syntax: trait method dispatch through dyn
note: see https://github.com/AeneasVerif/charon for supported subset
extraction failed: 1 unsupported item

Common triggers:

  • A new branch added dyn dispatch, async, IO, or string formatting to a function inside formal_aeneas.rs. The file is documented as the pure numeric/boolean core; anything beyond that surface needs to live elsewhere.
  • A struct in the same file gained a non-extractable field type (e.g. Box<dyn Trait>).
  • The Charon or Aeneas pin moved and a previously-supported construct now sits outside the supported subset.

Triage path:

  • Reproduce: ./scripts/check-aeneas-production.sh.
  • Read the unsupported-item line; the Rust path it cites is the starting point.
  • If the addition belongs in the kernel core but not in the Aeneas-extracted subset, move the new code out of formal_aeneas.rs into a sibling module not in the manifest's covered_rust_modules for Aeneas. Add an audited assumption if the new code is load-bearing for a property.
  • If the addition belongs in the extraction, the manifest's required_assumption_ids may need a new entry; this is a P3 ticket per the runbook in .planning/trajectory/03-capability-algebra-properties.md.

Aeneas Equivalence Theorem Fails

The equivalence gate at scripts/check-aeneas-equivalence.sh elaborates Chio.Proofs.AeneasEquivalence. A break here means the extracted Lean function and the handwritten Lean mirror diverge in their definitional shape:

check-aeneas-equivalence.sh stdout (illustrative)
error: Chio.Proofs.AeneasEquivalence:108:0:
  type mismatch
    AeneasMirror.optionalCapIsSubset childHasCap childValue parentHasCap parentValue
  has type
    Bool
  but is expected to have type
    Bool /\ Nat
note: 'rfl' failed: the two sides do not unfold to the same term
Build failed: Chio.Proofs.AeneasEquivalence

Common triggers:

  • The Rust source in formal_aeneas.rs changed and the Aeneas extraction now produces a different Lean shape. The simp [AeneasMirror.<name>] rewrite no longer closes.
  • The handwritten Lean mirror moved (a new field added, parameter order changed).
  • A new extracted symbol was added to production.toml without a matching equivalence theorem.

Triage path:

  • Inspect the generated target/formal/aeneas-production/lean/Funs.lean for the failing symbol. Compare to the handwritten model in formal/lean4/Chio/Chio/Core/<X>.lean.
  • Update the matching theorem in formal/lean4/Chio/Chio/Proofs/AeneasEquivalence.lean (most P1-shaped equivalences are one tactic line: cases <axes> <;> simp [AeneasMirror.<name>] at h_subset h_parent).
  • Check the artifact JSON at target/formal/aeneas-production/equivalence-artifacts.json to confirm the SHA-256 of the Rust source changed; if it didn't, the failure is on the Lean side.

Kani Assertion Violation

A Kani run that finds a counterexample prints VERIFICATION:- FAILED plus a CBMC trace. The failure mode is per-harness; here is the shape for verify_delegation_chain_step:

cargo kani stdout (illustrative)
Failed Checks: assertion failed: server_parent_is_wildcard || server_parent_equals_child
File: kani_public_harnesses.rs
Line: 594

Counterexample:
  server_parent_is_wildcard:    false
  server_parent_equals_child:   false
  tool_parent_is_wildcard:      true
  tool_parent_equals_child:     true
  operations_child_subset:      true
  constraints_child_superset:   true
  parent_has_inv_cap:           false
  child_has_inv_cap:            false
  ...
  attenuates:                   true

SUMMARY:
 ** 1 of 5 failed
VERIFICATION:- FAILED
Verification Time: 1.247s

How to read it:

  • The first line names the failing assertion.
  • The counterexample lists every kani::any input plus every assigned variable at the failure point. Each line is one symbolic axis at one concrete value.
  • The summary line tallies failed checks. Each Kani harness has multiple assertions; one failure on any of them flips the whole harness to FAILED.

Common triggers:

  • Implementation regression. A change in NormalizedToolGrant::is_subset_of flipped a comparison axis. The harness's symbolic state space hits the new bug.
  • Spec regression. The assertions in the harness encode the wrong invariant. The implementation is fine but the harness asserts something the code is not supposed to satisfy.
  • Harness bug. The kani::assume predicates are too weak and admit an input that the production path would never see.

Triage path:

  • Reproduce as a regular Rust test: cargo kani --harness verify_delegation_chain_step --concrete-playback inplace. Kani writes the counterexample as a #[test] next to the harness so you can step through it under a debugger.
  • Classify (impl regression, spec regression, harness bug). File via formal/issue-templates/property-counterexample.md.
  • Fix the underlying issue; add a regression test that pins the counterexample so the same shrink shape is rejected even if the symbolic search misses it next time.

Kani Timeout or Unwind Exceeded

Kani uses a fixed unwind depth (the public lane runs with --default-unwind 8 --no-unwinding-checks). A timeout looks like:

cargo kani stdout (illustrative)
error: timed out after 600 seconds
note: try raising --default-unwind, or narrowing kani::assume bounds
SUMMARY:
 ** check incomplete (timeout)
VERIFICATION:- TIMEOUT
Verification Time: 600.000s

An unwind-exceeded message looks like:

cargo kani stdout (illustrative)
Failed Checks: unwinding assertion loop 0
note: this loop iterates more than 8 times; raise --default-unwind
File: kani_public_harnesses.rs
Line: 248

Common triggers:

  • A new branch added a wider symbolic input (e.g. widened from u8 to u32 directly) without a matching kani::assume bound. The state space exploded.
  • A loop bound increased above 8. The harness convention is to construct inputs so loops bound at 1 or 2 iterations regardless; a violation usually means the harness needs an assume_single_normalized_tool_grant-style structural pin.
  • The CI runner is slower than the budget. The reference timings in formal/rust-verification/kani-public-harnesses.toml:47-62 come from aarch64-apple-darwin; an x86_64 runner with cold cache may double them.

Triage path:

  • Run the failing harness alone: cargo kani -p chio-kernel-core --lib --harness <name> --default-unwind 8 --no-unwinding-checks.
  • Add kani::assume predicates to constrain new symbolic axes back to u8-shaped state.
  • If the timeout is structural, consider moving the harness to lanes.nightly_only in the toml and leaving a faster shape on PR.

TLA+ TLC or Apalache Invariant Violation

Apalache's safety lane runs SafetyInv = DomainsOK ∧ NoAllowAfterRevoke ∧ MonotoneLog ∧ AttenuationPreserving. A violation prints the offending state path and writes counterexample1.tla:

apalache stdout (illustrative)
PASS #11: BoundedChecker
Step 4: state matches the invariant violation
NoAllowAfterRevoke is violated.
The counterexample is in counterexample1.tla
EXITCODE: ERROR (12)
It took me 0 days  0 hours 14 min 22 sec

The counterexample file is a TLA+ trace from Init through the sequence of Next steps that lead to the violation. Read it top-to-bottom: each state block lists the values of every variable, and each transition block names which action fired (Attenuate, Revoke, Evaluate, or PropagateAny).

Common triggers:

  • A change to RevocationPropagation.tla made an action fire under conditions the invariant did not anticipate.
  • The model bounds (PROCS, CAPS, DEPTH_MAX) widened and a previously-unreachable state became reachable.

Triage path:

  • Read counterexample1.tla; identify the action that drove the violating step.
  • Compare to the matching Lean theorem. From formal/MAPPING.md: NoAllowAfterRevoke corresponds to evalToolCall_revoked_token_never_allows and evalToolCall_revoked_ancestor_never_allows in formal/lean4/Chio/Chio/Proofs/Evaluation.lean. A divergence between the two means one of them is wrong.
  • File via formal/issue-templates/property-counterexample.md and follow the runbook in .planning/trajectory/03-capability-algebra-properties.md.

TLA+ Deadlock

A deadlock means the model reached a state where no Next action is enabled. Apalache reports it like an invariant violation but with a deadlock-specific marker:

apalache stdout (illustrative)
PASS #11: BoundedChecker
Step 6: the system has reached a deadlock
The counterexample is in counterexample1.tla
EXITCODE: ERROR (12)

For RevocationPropagation.tla, a true deadlock should never happen: every authority can always take a stuttering step, and PropagateAny is guarded only by pending non-empty. A deadlock report usually means the model file gained a guard that conflicts with the action contract. Triage:

  • Check recent edits to RevocationPropagation.tla; a missing or over-tight guard on Attenuate, Revoke, or Evaluate is the usual cause.
  • Verify [][Next]_vars is intact in Spec.

Diff Test Divergence

The differential test crate at formal/diff-tests/ runs the reference spec against production Rust on random input pairs. A divergence prints both verdicts and the shrunken input:

cargo test -p chio-formal-diff-tests stdout (illustrative)
---- scope_subset_spec_matches_impl stdout ----
proptest: Saving this and similar failures in proptest-regressions/scope_diff.txt
Test failed: Scope subset mismatch!
  spec: false
  impl: true
  child grants: 1
  parent grants: 1
minimal failing input: ((SpecChioScope { grants: [...] }, ChioScope { ... }),
                       (SpecChioScope { grants: [...] }, ChioScope { ... }))

failures:
    scope_subset_spec_matches_impl

test result: FAILED. 7 passed; 1 failed; 0 ignored

proptest shrinks toward the smallest input that still produces the divergence and writes a regression seed to formal/diff-tests/proptest-regressions/<test>.txt so subsequent runs replay that seed first. The failing test prints both spec_result and impl_result plus a debug rendering of the input.

Common triggers:

  • The reference spec at formal/diff-tests/src/spec.rs drifted from the production Rust. Either the spec is wrong (rare; the spec is intentionally readable) or the impl is wrong.
  • A new constraint variant was added to SpecConstraint or to the production Constraint enum without a matching update on the other side.

Triage path:

  • Read the minimal failing input. The proptest debug output is usually small (one or two grants) after shrinking.
  • Run the spec and the impl on the same input by hand; one of them is wrong.
  • Fix the offending side; the regression seed file commits alongside the fix so future runs replay the exact case.

dudect Timing Leak

The dudect lane runs in crates/chio-kernel-core/tests/dudect/ and produces a Welch's t-statistic per harness. The gate script scripts/check-dudect-threshold.sh parses the dudect-bencher stdout for max t = <float> and compares against the literal threshold 4.5 from the dudect paper.

dudect-bencher stdout (illustrative)
running 1 bench
bench mac_eq_bench seeded with 0xdeadbeef_cafef00d
bench mac_eq_bench ... : n == +0.100M, max t = +5.34, max tau = +0.017

leak verdict: LEAK (max |t| = 5.34, threshold = 4.5)

A single leak verdict is not a CI failure on its own; the workflow at .github/workflows/dudect.yml applies a two-consecutive-runs rule: a leak alarm requires both the current and the prior nightly run to return non-zero from the threshold check. This filters one-off measurement noise.

Common triggers:

  • A change to the byte-equality compare path (signature, MAC tag, or other) introduced a short-circuit. The mac_eq harness drives two input classes (differ at first byte vs differ at last byte); a constant-time compare runs through every byte regardless and produces statistically identical timing.
  • A scope-subset compiler optimization started branching on attacker-visible values. The scope_subset harness drives small and large scopes through the subset check.
  • A red herring: the runner was noisy. dudect is a statistical detector, not a proof; a single-run leak verdict on a busy CI runner is common and the two-consecutive-runs rule is the correct response.

Triage path:

  • Reproduce locally: cargo test -p chio-kernel-core --features dudect --release mac_eq. Quiesce the machine.
  • If the leak persists on a quiet machine, inspect the compare path. The conventional fix for byte-equality is subtle::ConstantTimeEq; the conventional fix for higher-level branching is to lift the data-dependent branch out of the timing-sensitive path.
  • See Constant-Time Tests for the harness layout and the threshold rationale.

Fuzz Crash

ClusterFuzzLite drives the fuzz harnesses in fuzz/fuzz_targets/ and reports a crash like:

libFuzzer crash stdout (illustrative)
==42==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
READ of size 1 at 0x... thread T0
    #0 0x... in chio_manifest::parse_envelope manifest.rs:142
    #1 0x... in fuzz_target_manifest_envelope fuzz_targets/manifest_envelope.rs:18

Test unit written to ./crash-7d9b1f...
artifact_prefix='./'; Test unit written to ./crash-7d9b1f...
SUMMARY: AddressSanitizer: heap-buffer-overflow ... in chio_manifest::parse_envelope

libFuzzer writes the crash input to a file under the artifact prefix. Common triggers:

  • A trust-boundary parser ( chio-manifest, chio-core-types, chio-mcp-adapter, and so on) panicked on an adversarial input. The fuzz workspace inherits unwrap_used = deny and expect_used = deny so a panic surfaces as a real bug, not a missing error-handling.
  • An ASAN report on a buffer overflow or use-after-free.

Triage path:

  • Reproduce: cargo +nightly fuzz run <target> crash-7d9b1f....
  • Minimize: cargo +nightly fuzz cmin <target>.
  • Determine the owning crate via fuzz/owners.toml; file the issue against that owner.
  • Add the minimized input to the corpus at fuzz/corpus/<target>/ as a permanent regression seed; the seed survives crash repros.

CI Gate Not Invoked

The proof manifest at formal/proof-manifest.toml names ten gate commands. Every release-facing claim depends on every named gate running. A drift between the manifest and the workflow files at .github/workflows/ is itself a gate failure.

How to detect:

  • The proof report aggregator at scripts/check-proof-report.sh reads target/formal/ for per-gate artifacts. A missing artifact for a manifest-named gate fails the aggregator.
  • The CI workflow at .github/workflows/ci.yml and the per-tool workflows at .github/workflows/{dudect,fuzz,mutants,...}.yml each invoke specific scripts; a manifest entry without a matching workflow invocation is silent until the aggregator runs.

Triage path:

  • Run the aggregator locally: ./scripts/check-proof-report.sh. The script prints which gate is missing.
  • Add the missing invocation to the appropriate workflow file. The convention is one shell line per gate so the failure surface stays tied to the manifest.

Closing

Every gate failure on this page is recoverable with the matching triage path. The constant across all of them: the proof manifest at formal/proof-manifest.toml plus the theorem inventory at formal/theorem-inventory.json plus the assumption registry at formal/assumptions.toml are the source of truth. If the manifests disagree with a gate's verdict, fix the gate or update the manifest in the same PR.