Formal Quickstart
This page walks an SRE, an engineer, or an academic from a fresh clone of chio to a green run of every formal gate listed in formal/proof-manifest.toml. Wall clock on a current Apple silicon laptop with hot caches is roughly ten minutes. The first run takes longer because the Lean toolchain and Mathlib cache download from upstream.
What success looks like
passed or SUCCESSFUL. The Lean build prints Build completed successfully with no sorry findings. Kani prints VERIFICATION:- SUCCESSFUL for every harness in lanes.pr. Apalache prints EXITCODE: OK. The differential test crate finishes with test result: ok.Time and Machine
Approximate wall clock per stage on an M-series laptop with hot cargo and lake caches:
| Stage | First run | Hot cache |
|---|---|---|
| Toolchain install (rustup, elan, kani, apalache) | 5 to 8 min | 0 |
Lean root build (lake build) | 3 to 5 min | 10 to 20 s |
| Aeneas pilot + production extraction | 1 to 2 min | 15 to 30 s |
| Aeneas-Lean equivalence theorems | 15 to 30 s | 5 to 10 s |
| Kani public lane (14 harnesses) | 3 to 5 min | 2 to 3 min |
Differential tests (cargo test -p chio-formal-diff-tests) | 1 to 2 min | 20 to 40 s |
| TLA+ safety lane via Apalache | 3 to 4 hr | 3 to 4 hr |
| Adapter no-bypass + portable-kernel checks | 30 to 60 s | 5 to 10 s |
Apalache is slow
computation length 16 onRevocationPropagation.tla and finishes in roughly three hours of wall clock per the manifest's sample stdout. The other gates fit inside a ten-minute window. Run Apalache as a separate background job, or rely on the nightly CI lane to drive it.Machine baseline: 4 physical cores, 8 GB RAM, 10 GB free disk. Kani's symbolic backend (CBMC) wants room for SAT search; if the machine has under 8 GB you may see the heavier harnesses (public_verify_capability_rejects_untrusted_issuer_before_signature at 31s budget, public_evaluate_rejects_untrusted_issuer_before_dispatch at 28s budget) swap and miss their wall clock. The reference budgets in formal/rust-verification/kani-public-harnesses.toml:47-62 were measured on aarch64-apple-darwin with a hot cargo cache.
Prerequisites
Five toolchains pin to specific versions. Install each one before you run the gates.
Rust
The workspace at Cargo.toml sets rust-version = "1.93" on edition 2021. Install via rustup; cargo will refuse to build with an older toolchain.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup toolchain install 1.93.0
rustup default 1.93.0
rustc --version # rustc 1.93.0Lean 4
The Lean toolchain pin lives at formal/lean4/Chio/lean-toolchain: leanprover/lean4:v4.28.0-rc1. Install elan; it reads the pin and downloads the matching toolchain automatically on first lake build.
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh -s -- -y
source "$HOME/.elan/env"
# elan picks up the toolchain pin on the first lake invocation; no
# 'rustup default'-style command is required.
elan --versionKani
Kani is the bounded model checker for Rust on top of CBMC. It installs as a cargo subcommand. The PR lane uses cargo-kani 0.67.0 per the comment block at formal/rust-verification/kani-public-harnesses.toml:47-62.
cargo install --locked kani-verifier
cargo kani setup
cargo kani --versionApalache (TLA+)
The TLA+ lane uses Apalache, not the older TLC model checker. Apalache ships as a self-contained Java jar. tools/install-apalache.sh in the chio tree pins the version and unpacks it under tools/apalache/.
./tools/install-apalache.sh
./tools/apalache/bin/apalache-mc --versionAeneas and Charon
Aeneas extracts a functional model from Rust source via Charon. The chio repo pins both versions in its installer scripts under tools/; the production extraction at formal/aeneas/production.toml names the source as crates/chio-kernel-core/src/formal_aeneas.rs. The check scripts call into the pinned binaries directly, so a manual install is rarely needed beyond the first run.
Other tools on PATH
python3(3.10+). The gate scripts call into Python for manifest parsing; macOS and most Linux distros ship a working version.jqfor the proof-report aggregator atscripts/check-proof-report.sh.rg(ripgrep) for the sorry-scan inscripts/check-formal-proofs.sh. The script falls back togrep -RInwif rg is missing.
Clone and Bootstrap
Clone chio and warm the cargo cache. The first cargo fetch pulls every dependency the formal lanes need; subsequent gate runs work offline.
git clone https://github.com/backbay-labs/chio.git
cd chio
cargo fetch --workspace
( cd formal/lean4/Chio && lake update && lake exe cache get )lake exe cache get pulls the precompiled Mathlib oleans from the upstream cache server. On a cold machine that step downloads a few hundred megabytes; on a warm machine it is a no-op.
Run Each Gate
The proof manifest at formal/proof-manifest.toml names every gate command. Each one is a shell script under scripts/ and prints a per-stage verdict.
Lean root build and inventory check
scripts/check-formal-proofs.sh wraps three things: lake build on the project at formal/lean4/Chio/, a literal sorry scan across every shipped Lean module, and a Python pass that checks the theorem inventory and the manifest agree.
./scripts/check-formal-proofs.sh==> Lean 4 proof build
info: leanprover/lean4:v4.28.0-rc1: trusting cached toolchain
Build completed successfully.
==> Lean 4 placeholder scan
==> Proof manifest and theorem inventory sanity
formal proof check passedWall clock: 10 to 20 seconds with a hot lake cache, 3 to 5 minutes on first run.
Aeneas pilot, production, equivalence
Three scripts cover the Aeneas lanes. The pilot reads formal/aeneas/verified_core.rs and reports six extracted symbols. The production lane reads crates/chio-kernel-core/src/formal_aeneas.rs and reports the fifteen extracted symbols listed in formal/aeneas/production.toml. The equivalence lane elaborates Chio.Proofs.AeneasEquivalence and writes an artifact JSON with the SHA-256 of every generated Lean file.
./scripts/check-aeneas-pilot.sh
./scripts/check-aeneas-production.sh
./scripts/check-aeneas-equivalence.shAeneas equivalence gate passedThe equivalence script writes target/formal/aeneas-production/equivalence-artifacts.json with three sha256 entries (the source rust file, the generated Funs.lean, and the generated Types.lean), the expected symbol list, and a schema tag of chio.aeneas-equivalence-artifacts.v1. Wall clock: 5 to 10 seconds for equivalence, 15 to 30 seconds for each extraction.
Kani public lane
scripts/check-kani-public-core.sh reads the harness list from kani_public_harnesses.rs, confirms every expected name is present, and runs each one with --default-unwind 8 --no-unwinding-checks. The total wall clock is roughly 134 seconds with a hot cargo cache per the timing table in the toml file.
./scripts/check-kani-public-core.shSUMMARY:
** 0 of N failed (0 unreachable)
VERIFICATION:- SUCCESSFUL
Verification Time: 31.247s
...
SUMMARY:
** 0 of N failed (0 unreachable)
VERIFICATION:- SUCCESSFUL
Verification Time: 1.041s
Kani public core harnesses passedThe script invokes cargo kani -p chio-kernel-core --lib --harness <name> once per name; nine of the fourteen harnesses come from the bash array in the script, and the remaining five algebraic harnesses live alongside them in the same source file. Per-harness budgets run from 1 second (verify_delegation_chain_step) to 31 seconds (public_verify_capability_rejects_untrusted_issuer_before_signature).
Creusot plus required Kani
scripts/check-rust-verification-gates.sh is the umbrella for the required-strict Rust refinement lanes named in rust_refinement_lanes in the manifest. It invokes the Creusot smoke and core harnesses plus the proof-facing Kani harnesses in kani_harnesses.rs.
./scripts/check-rust-verification-gates.shAdapter no-bypass
Every protocol adapter must route every tool call through chio_kernel_core::evaluate. The script greps the adapters for the literal call and fails the build if any one bypasses it.
./scripts/check-adapter-no-bypass.shDifferential tests
The differential test crate at formal/diff-tests/ uses proptest to compare the reference spec in src/spec.rs against the production Rust on randomly generated input pairs. Default sample count is 256 per property; override with the PROPTEST_CASES env var.
cargo test -p chio-formal-diff-testsrunning 8 tests
test scope_subset_spec_matches_impl ... ok
test grant_subset_spec_matches_impl ... ok
test resource_grant_subset_spec_matches_impl ... ok
test prompt_grant_subset_spec_matches_impl ... ok
test normalized_scope_projection_matches_spec ... ok
test normalized_scope_subset_spec_matches_impl ... ok
test normalized_grant_projection_matches_spec ... ok
test normalized_grant_subset_spec_matches_impl ... ok
test result: ok. 8 passed; 0 failed; 0 ignoredPortable-kernel parity
Chio ships three kernel variants: chio-kernel (full), chio-kernel-core (proof target), and chio-kernel-browser / chio-kernel-mobile (portable shells). The portable-kernel check ensures every variant produces the same verdict on a fixture input set.
./scripts/check-portable-kernel.shProof report aggregator
scripts/check-proof-report.sh is the final aggregator. It reads the artifacts every prior gate wrote to target/formal/ and emits a release-facing JSON report tagged chio.proof-report.v1.
./scripts/check-proof-report.shTLA+ via Apalache (separate lane)
The TLA+ safety lane is gated separately because Apalache wall clock dwarfs the rest. From the manifest's sample stdout the lane runs to length 16 and finishes in roughly three hours.
apalache check \
--inv=SafetyInv \
--config=formal/tla/MCRevocationPropagation.cfg \
formal/tla/RevocationPropagation.tlaPASS #11: BoundedChecker
Checker reports no error up to computation length 16
It took me 0 days 3 hours 12 min 47 sec
Total time: 11567.881 sec
EXITCODE: OKThe aggregate SafetyInv is the conjunction of three named invariants from formal/tla/RevocationPropagation.tla: NoAllowAfterRevoke (every allow receipt was issued before the issuing authority observed any revocation), MonotoneLog (per-authority receipt-log timestamps strictly increase), and AttenuationPreserving (depth stays in 0..DEPTH_MAX and any attenuated cap has been delegated at least once).
Run All Gates At Once
There is no single make target that drives every gate, by design: the manifest is the source of truth and the CI workflow files invoke gates one by one so a failure surfaces against a named gate. To reproduce the assurance posture in one shell, run the ten commands from gate_commands in order:
# From formal/proof-manifest.toml gate_commands, run in order.
./scripts/check-formal-proofs.sh
./scripts/check-aeneas-pilot.sh
./scripts/check-aeneas-production.sh
./scripts/check-aeneas-equivalence.sh
./scripts/check-rust-verification-gates.sh
./scripts/check-kani-public-core.sh
./scripts/check-adapter-no-bypass.sh
cargo test -p chio-formal-diff-tests
./scripts/check-portable-kernel.sh
./scripts/check-proof-report.shOn a hot-cache laptop the sequence above takes 8 to 12 minutes end to end. Add the Apalache lane on a separate background job:
( apalache check \
--inv=SafetyInv \
--config=formal/tla/MCRevocationPropagation.cfg \
formal/tla/RevocationPropagation.tla \
> apalache.log 2>&1 ) &
echo "apalache pid: $!"If a Gate Fails
Every gate has a known failure shape. The Failure Modes page catalogs each one with the literal error format and a triage path. Common cases:
- A
sorryscan hit means a Lean proof ships unfinished. The placeholder scan inscripts/check-formal-proofs.shfails the build. - A Kani
VERIFICATION:- FAILEDline means CBMC found a counterexample within the unwind bound. See Failure Modes for how to read the counterexample trace and convert it to a regression test with--concrete-playback inplace. - A diff-test divergence prints both spec and impl values plus the shrunken proptest input. The crate writes regression seeds under
formal/diff-tests/proptest-regressions/automatically. - An Apalache invariant violation prints the offending state path to stdout and writes a
counterexample1.tlafile. Read it as a state-machine trace fromInitthrough the firstNextstep that breaks the invariant.
The manifest is the source of truth
formal/proof-manifest.toml. The manifest names every root module, every gate command, every property ID, and every required theorem. Drift between the manifest and the source is itself a gate failure.Continue
- P1 Tour: Capability Monotonicity · trace the keystone safety property through every assurance tool.
- Theorem Inventory · the full table of theorems by ID, statement, file, claim class, and property mapping.
- Failure Modes · what each gate failure looks like and how to triage.
- Formal Assurance Overview · the assurance pyramid, the proof boundary, and the audited assumptions.