Failure Modes
When a Chio formal gate fails in CI, the failure surfaces in a specific way: a Lean tactic that does not close, a Kani counterexample trace, a proptest divergence, an Apalache invariant violation, a dudect t-statistic above 4.5, a libFuzzer crash. This page is the catalog. Each section names the gate, the error format you will see, the common triggers, and the triage path back to a green run.
Some output is illustrative
Lean 4 Build Failure
The first line of triage on scripts/check-formal-proofs.sh is whether lake build itself returned non-zero. The script prints ==> Lean 4 proof build and then runs the build under formal/lean4/Chio. A build failure looks like:
==> Lean 4 proof build
info: leanprover/lean4:v4.28.0-rc1: trusting cached toolchain
warning: Chio.Proofs.Monotonicity:42:0: declaration uses 'sorry'
error: Chio.Proofs.Monotonicity:88:2: unknown tactic 'List.foldr_eq_true'
Build failed: Chio.Proofs.MonotonicityCommon triggers:
- Tactic typo or rename. A Mathlib lemma name changed under the toolchain bump and the old name no longer resolves. Fix: search Mathlib for the new name; the toolchain pin is
leanprover/lean4:v4.28.0-rc1informal/lean4/Chio/lean-toolchainand Mathlib version comes fromlake-manifest.json. - Broken import. A new theorem in
Proofs/<X>.leanimports a module not yet wired intoChio.lean. The gate requires every theorem in the inventory to be root-imported;Build failedon the missing module is the surface. - Axiom shift. A new axiom slipped in. The manifest's
allowed_axiomslist contains exactlyChio.Core.verifyCapabilitySignature; a new entry is rejected by the manifest sanity step.
Triage path:
- Reproduce locally:
cd formal/lean4/Chio && lake build. Lean prints the failing module and line number directly. - For a Mathlib rename, check the Mathlib changelog for the toolchain release; most renames are mechanical.
- For an axiom shift, run
#print axioms <name>on the suspect theorem to see which axioms it depends on.
Lean 4 Theorem Unproved
A literal sorry in any shipped Lean module fails the gate. The script runs ripgrep (or falls back to grep -RInw) and exits non-zero on any match.
==> Lean 4 placeholder scan
formal/lean4/Chio/Chio/Proofs/Monotonicity.lean:142: sorry
formal proof check failed: found literal sorry in shipped Lean modulesThe build can still succeed with a sorry because Lean elaborates sorry as an unsafe witness, but the placeholder scan rejects it independent of the build. Common triggers:
- A draft proof landed in a PR before the proof body was finished. Fix: finish the proof, or revert the theorem to its previous statement.
- A reviewer suggestion replaced a complex tactic chain with a placeholder while iterating. Fix: complete the tactic.
Build green but theorem unproved
lake build with no errors does not mean the theorems are proven. The manifest gate at scripts/check-formal-proofs.sh is the load-bearing one: it fails the build on any sorry, on any inventory theorem missing from the elaborated environment, and on any theorem not root-imported from Chio.lean.Aeneas Extraction Failure
The Aeneas pipeline reads crates/chio-kernel-core/src/formal_aeneas.rs and emits target/formal/aeneas-production/lean/{Funs.lean,Types.lean}. When the Rust source uses a feature Aeneas cannot extract, the run fails with an unsupported-feature message:
error: Charon failed to translate the following item: chio_kernel_core::formal_aeneas::guard_step_allows
caused by: unsupported syntax: trait method dispatch through dyn
note: see https://github.com/AeneasVerif/charon for supported subset
extraction failed: 1 unsupported itemCommon triggers:
- A new branch added
dyndispatch, async, IO, or string formatting to a function insideformal_aeneas.rs. The file is documented as the pure numeric/boolean core; anything beyond that surface needs to live elsewhere. - A struct in the same file gained a non-extractable field type (e.g.
Box<dyn Trait>). - The Charon or Aeneas pin moved and a previously-supported construct now sits outside the supported subset.
Triage path:
- Reproduce:
./scripts/check-aeneas-production.sh. - Read the unsupported-item line; the Rust path it cites is the starting point.
- If the addition belongs in the kernel core but not in the Aeneas-extracted subset, move the new code out of
formal_aeneas.rsinto a sibling module not in the manifest'scovered_rust_modulesfor Aeneas. Add an audited assumption if the new code is load-bearing for a property. - If the addition belongs in the extraction, the manifest's
required_assumption_idsmay need a new entry; this is a P3 ticket per the runbook in.planning/trajectory/03-capability-algebra-properties.md.
Aeneas Equivalence Theorem Fails
The equivalence gate at scripts/check-aeneas-equivalence.sh elaborates Chio.Proofs.AeneasEquivalence. A break here means the extracted Lean function and the handwritten Lean mirror diverge in their definitional shape:
error: Chio.Proofs.AeneasEquivalence:108:0:
type mismatch
AeneasMirror.optionalCapIsSubset childHasCap childValue parentHasCap parentValue
has type
Bool
but is expected to have type
Bool /\ Nat
note: 'rfl' failed: the two sides do not unfold to the same term
Build failed: Chio.Proofs.AeneasEquivalenceCommon triggers:
- The Rust source in
formal_aeneas.rschanged and the Aeneas extraction now produces a different Lean shape. Thesimp [AeneasMirror.<name>]rewrite no longer closes. - The handwritten Lean mirror moved (a new field added, parameter order changed).
- A new extracted symbol was added to
production.tomlwithout a matching equivalence theorem.
Triage path:
- Inspect the generated
target/formal/aeneas-production/lean/Funs.leanfor the failing symbol. Compare to the handwritten model informal/lean4/Chio/Chio/Core/<X>.lean. - Update the matching theorem in
formal/lean4/Chio/Chio/Proofs/AeneasEquivalence.lean(most P1-shaped equivalences are one tactic line:cases <axes> <;> simp [AeneasMirror.<name>] at h_subset h_parent). - Check the artifact JSON at
target/formal/aeneas-production/equivalence-artifacts.jsonto confirm the SHA-256 of the Rust source changed; if it didn't, the failure is on the Lean side.
Kani Assertion Violation
A Kani run that finds a counterexample prints VERIFICATION:- FAILED plus a CBMC trace. The failure mode is per-harness; here is the shape for verify_delegation_chain_step:
Failed Checks: assertion failed: server_parent_is_wildcard || server_parent_equals_child
File: kani_public_harnesses.rs
Line: 594
Counterexample:
server_parent_is_wildcard: false
server_parent_equals_child: false
tool_parent_is_wildcard: true
tool_parent_equals_child: true
operations_child_subset: true
constraints_child_superset: true
parent_has_inv_cap: false
child_has_inv_cap: false
...
attenuates: true
SUMMARY:
** 1 of 5 failed
VERIFICATION:- FAILED
Verification Time: 1.247sHow to read it:
- The first line names the failing assertion.
- The counterexample lists every
kani::anyinput plus every assigned variable at the failure point. Each line is one symbolic axis at one concrete value. - The summary line tallies failed checks. Each Kani harness has multiple assertions; one failure on any of them flips the whole harness to FAILED.
Common triggers:
- Implementation regression. A change in
NormalizedToolGrant::is_subset_offlipped a comparison axis. The harness's symbolic state space hits the new bug. - Spec regression. The assertions in the harness encode the wrong invariant. The implementation is fine but the harness asserts something the code is not supposed to satisfy.
- Harness bug. The
kani::assumepredicates are too weak and admit an input that the production path would never see.
Triage path:
- Reproduce as a regular Rust test:
cargo kani --harness verify_delegation_chain_step --concrete-playback inplace. Kani writes the counterexample as a#[test]next to the harness so you can step through it under a debugger. - Classify (impl regression, spec regression, harness bug). File via
formal/issue-templates/property-counterexample.md. - Fix the underlying issue; add a regression test that pins the counterexample so the same shrink shape is rejected even if the symbolic search misses it next time.
Kani Timeout or Unwind Exceeded
Kani uses a fixed unwind depth (the public lane runs with --default-unwind 8 --no-unwinding-checks). A timeout looks like:
error: timed out after 600 seconds
note: try raising --default-unwind, or narrowing kani::assume bounds
SUMMARY:
** check incomplete (timeout)
VERIFICATION:- TIMEOUT
Verification Time: 600.000sAn unwind-exceeded message looks like:
Failed Checks: unwinding assertion loop 0
note: this loop iterates more than 8 times; raise --default-unwind
File: kani_public_harnesses.rs
Line: 248Common triggers:
- A new branch added a wider symbolic input (e.g. widened from
u8tou32directly) without a matchingkani::assumebound. The state space exploded. - A loop bound increased above 8. The harness convention is to construct inputs so loops bound at 1 or 2 iterations regardless; a violation usually means the harness needs an
assume_single_normalized_tool_grant-style structural pin. - The CI runner is slower than the budget. The reference timings in
formal/rust-verification/kani-public-harnesses.toml:47-62come from aarch64-apple-darwin; an x86_64 runner with cold cache may double them.
Triage path:
- Run the failing harness alone:
cargo kani -p chio-kernel-core --lib --harness <name> --default-unwind 8 --no-unwinding-checks. - Add
kani::assumepredicates to constrain new symbolic axes back tou8-shaped state. - If the timeout is structural, consider moving the harness to
lanes.nightly_onlyin the toml and leaving a faster shape on PR.
TLA+ TLC or Apalache Invariant Violation
Apalache's safety lane runs SafetyInv = DomainsOK ∧ NoAllowAfterRevoke ∧ MonotoneLog ∧ AttenuationPreserving. A violation prints the offending state path and writes counterexample1.tla:
PASS #11: BoundedChecker
Step 4: state matches the invariant violation
NoAllowAfterRevoke is violated.
The counterexample is in counterexample1.tla
EXITCODE: ERROR (12)
It took me 0 days 0 hours 14 min 22 secThe counterexample file is a TLA+ trace from Init through the sequence of Next steps that lead to the violation. Read it top-to-bottom: each state block lists the values of every variable, and each transition block names which action fired (Attenuate, Revoke, Evaluate, or PropagateAny).
Common triggers:
- A change to
RevocationPropagation.tlamade an action fire under conditions the invariant did not anticipate. - The model bounds (
PROCS,CAPS,DEPTH_MAX) widened and a previously-unreachable state became reachable.
Triage path:
- Read
counterexample1.tla; identify the action that drove the violating step. - Compare to the matching Lean theorem. From
formal/MAPPING.md:NoAllowAfterRevokecorresponds toevalToolCall_revoked_token_never_allowsandevalToolCall_revoked_ancestor_never_allowsinformal/lean4/Chio/Chio/Proofs/Evaluation.lean. A divergence between the two means one of them is wrong. - File via
formal/issue-templates/property-counterexample.mdand follow the runbook in.planning/trajectory/03-capability-algebra-properties.md.
TLA+ Deadlock
A deadlock means the model reached a state where no Next action is enabled. Apalache reports it like an invariant violation but with a deadlock-specific marker:
PASS #11: BoundedChecker
Step 6: the system has reached a deadlock
The counterexample is in counterexample1.tla
EXITCODE: ERROR (12)For RevocationPropagation.tla, a true deadlock should never happen: every authority can always take a stuttering step, and PropagateAny is guarded only by pending non-empty. A deadlock report usually means the model file gained a guard that conflicts with the action contract. Triage:
- Check recent edits to
RevocationPropagation.tla; a missing or over-tight guard onAttenuate,Revoke, orEvaluateis the usual cause. - Verify
[][Next]_varsis intact inSpec.
Diff Test Divergence
The differential test crate at formal/diff-tests/ runs the reference spec against production Rust on random input pairs. A divergence prints both verdicts and the shrunken input:
---- scope_subset_spec_matches_impl stdout ----
proptest: Saving this and similar failures in proptest-regressions/scope_diff.txt
Test failed: Scope subset mismatch!
spec: false
impl: true
child grants: 1
parent grants: 1
minimal failing input: ((SpecChioScope { grants: [...] }, ChioScope { ... }),
(SpecChioScope { grants: [...] }, ChioScope { ... }))
failures:
scope_subset_spec_matches_impl
test result: FAILED. 7 passed; 1 failed; 0 ignoredproptest shrinks toward the smallest input that still produces the divergence and writes a regression seed to formal/diff-tests/proptest-regressions/<test>.txt so subsequent runs replay that seed first. The failing test prints both spec_result and impl_result plus a debug rendering of the input.
Common triggers:
- The reference spec at
formal/diff-tests/src/spec.rsdrifted from the production Rust. Either the spec is wrong (rare; the spec is intentionally readable) or the impl is wrong. - A new constraint variant was added to
SpecConstraintor to the productionConstraintenum without a matching update on the other side.
Triage path:
- Read the minimal failing input. The proptest debug output is usually small (one or two grants) after shrinking.
- Run the spec and the impl on the same input by hand; one of them is wrong.
- Fix the offending side; the regression seed file commits alongside the fix so future runs replay the exact case.
dudect Timing Leak
The dudect lane runs in crates/chio-kernel-core/tests/dudect/ and produces a Welch's t-statistic per harness. The gate script scripts/check-dudect-threshold.sh parses the dudect-bencher stdout for max t = <float> and compares against the literal threshold 4.5 from the dudect paper.
running 1 bench
bench mac_eq_bench seeded with 0xdeadbeef_cafef00d
bench mac_eq_bench ... : n == +0.100M, max t = +5.34, max tau = +0.017
leak verdict: LEAK (max |t| = 5.34, threshold = 4.5)A single leak verdict is not a CI failure on its own; the workflow at .github/workflows/dudect.yml applies a two-consecutive-runs rule: a leak alarm requires both the current and the prior nightly run to return non-zero from the threshold check. This filters one-off measurement noise.
Common triggers:
- A change to the byte-equality compare path (signature, MAC tag, or other) introduced a short-circuit. The mac_eq harness drives two input classes (differ at first byte vs differ at last byte); a constant-time compare runs through every byte regardless and produces statistically identical timing.
- A scope-subset compiler optimization started branching on attacker-visible values. The scope_subset harness drives small and large scopes through the subset check.
- A red herring: the runner was noisy. dudect is a statistical detector, not a proof; a single-run leak verdict on a busy CI runner is common and the two-consecutive-runs rule is the correct response.
Triage path:
- Reproduce locally:
cargo test -p chio-kernel-core --features dudect --release mac_eq. Quiesce the machine. - If the leak persists on a quiet machine, inspect the compare path. The conventional fix for byte-equality is
subtle::ConstantTimeEq; the conventional fix for higher-level branching is to lift the data-dependent branch out of the timing-sensitive path. - See Constant-Time Tests for the harness layout and the threshold rationale.
Fuzz Crash
ClusterFuzzLite drives the fuzz harnesses in fuzz/fuzz_targets/ and reports a crash like:
==42==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
READ of size 1 at 0x... thread T0
#0 0x... in chio_manifest::parse_envelope manifest.rs:142
#1 0x... in fuzz_target_manifest_envelope fuzz_targets/manifest_envelope.rs:18
Test unit written to ./crash-7d9b1f...
artifact_prefix='./'; Test unit written to ./crash-7d9b1f...
SUMMARY: AddressSanitizer: heap-buffer-overflow ... in chio_manifest::parse_envelopelibFuzzer writes the crash input to a file under the artifact prefix. Common triggers:
- A trust-boundary parser (
chio-manifest,chio-core-types,chio-mcp-adapter, and so on) panicked on an adversarial input. The fuzz workspace inheritsunwrap_used = denyandexpect_used = denyso a panic surfaces as a real bug, not a missing error-handling. - An ASAN report on a buffer overflow or use-after-free.
Triage path:
- Reproduce:
cargo +nightly fuzz run <target> crash-7d9b1f.... - Minimize:
cargo +nightly fuzz cmin <target>. - Determine the owning crate via
fuzz/owners.toml; file the issue against that owner. - Add the minimized input to the corpus at
fuzz/corpus/<target>/as a permanent regression seed; the seed survives crash repros.
CI Gate Not Invoked
The proof manifest at formal/proof-manifest.toml names ten gate commands. Every release-facing claim depends on every named gate running. A drift between the manifest and the workflow files at .github/workflows/ is itself a gate failure.
How to detect:
- The proof report aggregator at
scripts/check-proof-report.shreadstarget/formal/for per-gate artifacts. A missing artifact for a manifest-named gate fails the aggregator. - The CI workflow at
.github/workflows/ci.ymland the per-tool workflows at.github/workflows/{dudect,fuzz,mutants,...}.ymleach invoke specific scripts; a manifest entry without a matching workflow invocation is silent until the aggregator runs.
Triage path:
- Run the aggregator locally:
./scripts/check-proof-report.sh. The script prints which gate is missing. - Add the missing invocation to the appropriate workflow file. The convention is one shell line per gate so the failure surface stays tied to the manifest.
Closing
Every gate failure on this page is recoverable with the matching triage path. The constant across all of them: the proof manifest at formal/proof-manifest.toml plus the theorem inventory at formal/theorem-inventory.json plus the assumption registry at formal/assumptions.toml are the source of truth. If the manifests disagree with a gate's verdict, fix the gate or update the manifest in the same PR.
- Formal Quickstart · re-run every gate from a fresh clone in ten minutes.
- Lean 4 Proofs · the proof structure and the build flow.
- Aeneas Pipeline · the extraction pipeline and the equivalence theorems.
- Kani Harnesses · the public harness inventory and the unwind bound.
- TLA+ Specs · the cross-authority protocol model and the named invariants.
- Differential Tests · the proptest generators and the spec/impl comparison.
- Constant-Time Tests · the dudect harnesses and the leak threshold rationale.
- Fuzz Infrastructure · the libFuzzer trust-boundary harness inventory.