Constant-Time Tests · Chio Docs

Detect, do not prove

dudect is a statistical timing-leak detector. A clean run is evidence that the measured code path is constant-time on the harness machine and toolchain; it is not a proof that the code path is constant-time everywhere. Power and electromagnetic side channels are out of scope. See the Assumptions and TCB page for the full side-channel scope.

What dudect Does

dudect (named after the paper "Dude, is my code constant time?") is a statistical detector for data-dependent timing. The harness runs the function under test against two input classes and records the runtime distribution per class. It then runs Welch's t-test on the two distributions: if the means are statistically distinguishable, the function takes different amounts of time on different inputs.
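As a rough illustration of the statistic involved, the sketch below computes Welch's t from two slices of runtimes. The function names are hypothetical and it skips the percentile cropping a real dudect runner applies; it shows only the formula, not the harness.

```rust
/// Sample mean and unbiased sample variance of a slice of runtimes.
fn mean_and_variance(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t-statistic for two samples with possibly unequal variances:
/// difference of means divided by the combined standard error.
fn welch_t(left: &[f64], right: &[f64]) -> f64 {
    let (m1, v1) = mean_and_variance(left);
    let (m2, v2) = mean_and_variance(right);
    let standard_error = (v1 / left.len() as f64 + v2 / right.len() as f64).sqrt();
    (m1 - m2) / standard_error
}
```

With runtimes of `[100, 102, 98, 101, 99]` for one class and `[110, 112, 108, 111, 109]` for the other, both variances are 2.5, the standard error is 1, and the statistic is -10, far outside any plausible noise band.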

The pass criterion in the chio CI lane is |t| < 4.5 in two consecutive runs. A single run with |t| > 4.5 is treated as an alert; two consecutive runs with |t| > 4.5 fail the nightly job.

Timing can expose secrets when the response time depends on the secret bytes. A short-circuiting == on a MAC tag returns early on the first mismatched byte; an attacker who can issue many comparisons can recover the tag one byte at a time, trying candidate values for each position until the response time shows that the compare advanced to the next byte. A constant-time compare processes every byte regardless and gives the attacker no timing difference in this comparison.
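The contrast can be sketched with two hand-written compares over 64-byte arrays. Both function names are hypothetical and neither is the chio implementation. The second one only illustrates the fixed-iteration shape: an optimizing compiler is free to rewrite it, so production code would normally use a vetted primitive such as subtle::ConstantTimeEq.

```rust
/// Variable-time compare: returns at the first mismatched byte, so the
/// runtime depends on where the inputs first differ.
fn eq_short_circuit(a: &[u8; 64], b: &[u8; 64]) -> bool {
    for i in 0..64 {
        if a[i] != b[i] {
            return false;
        }
    }
    true
}

/// Fixed-iteration compare: folds every byte difference into a single
/// accumulator and inspects it only once, after the loop.
fn eq_fixed_iterations(a: &[u8; 64], b: &[u8; 64]) -> bool {
    let mut diff = 0u8;
    for i in 0..64 {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}
```

Both functions return the same verdict on every input; only the amount of work done before returning differs.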

Functions Tested

There are three harnesses across two crates. Each is gated behind the dudect Cargo feature, so a default cargo test is unaffected.

Harness	Code path	Class definition
`mac_eq`	`chio_core_types::crypto::Signature` byte-equality compare (`PartialEq` on `[u8; 64]`)	Left: pair differs at byte 0. Right: pair differs at byte 63.
`scope_subset`	`NormalizedScope::is_subset_of`	Left: matching parent grant at index 0. Right: matching parent grant at index `PARENT_FANOUT - 1`.
`jwt_verify`	`chio_credentials::verify_chio_passport_jwt_vc_json` rejection path (crate `chio-credentials`)	Left: all-zero compact byte string, rejected at the first base64url segment split. Right: random ASCII of the same length, same fail-closed verdict but a different parse path.

MAC Equality

The portable receipt and passport verifiers compare Signature blobs byte by byte; the PartialEq impl on Signature applies == to the underlying Ed25519 byte array. That byte-equality compare is the closest in-tree analogue of an HMAC-tag compare, the canonical code path that must be constant-time.

From crates/kernel/chio-kernel-core/tests/dudect/mac_eq.rs:

/// Build a (left, right) pair of signatures whose underlying byte arrays
/// differ at exactly flip_position.
fn signature_pair_differing_at(rng: &mut BenchRng, flip_position: usize)
    -> (Signature, Signature) {
    let mut left_bytes = [0u8; 64];
    rng.fill_bytes(&mut left_bytes);
    let mut right_bytes = left_bytes;
    let pos = flip_position.min(63);
    right_bytes[pos] ^= 0xff;
    (
        signature_from_bytes(&left_bytes),
        signature_from_bytes(&right_bytes),
    )
}

fn mac_eq_bench(runner: &mut CtRunner, rng: &mut BenchRng) {
    let mut inputs: Vec<(Class, Signature, Signature)> = Vec::with_capacity(SAMPLES_PER_RUN);
    for _ in 0..SAMPLES_PER_RUN {
        if rng.random::<bool>() {
            let (a, b) = signature_pair_differing_at(rng, 0);
            inputs.push((Class::Left, a, b));
        } else {
            let (a, b) = signature_pair_differing_at(rng, 63);
            inputs.push((Class::Right, a, b));
        }
    }

    for (class, a, b) in inputs {
        runner.run_one(class, || {
            // The verdict is always false; we only care about the time the
            // compare takes. Return the comparison result so run_one's
            // black_box keeps the compare in the optimized binary.
            a == b
        });
    }
}

Both classes have identical input shapes (random 64-byte blobs); the only difference is which byte position carries the inequality. A naive == short-circuits early on Left (one byte compare) and runs the full 64 bytes on Right. A constant-time compare takes the same time on both.

Scope Subset

NormalizedScope::is_subset_of walks the child's tool grants and asks whether each is covered by any grant in the parent. The inner short-circuit (Iterator::any) returns as soon as a covering parent grant is found. Whether the input data influences how quickly the subset check resolves is the question this harness asks.
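To make the short-circuit concrete, here is a simplified sketch of an all/any subset check that also counts how many grant comparisons it performs. The Grant struct, the equality-based notion of "covers", and both function names are hypothetical stand-ins, not the real NormalizedScope types.

```rust
/// Hypothetical stand-in for a tool grant: a (server, tool) pair.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Grant {
    server: String,
    tool: String,
}

/// Simplified subset check: every child grant must be covered by some parent
/// grant. Returns the verdict and the number of grant comparisons performed.
fn is_subset_counting(child: &[Grant], parent: &[Grant]) -> (bool, usize) {
    let mut comparisons = 0;
    let verdict = child.iter().all(|c| {
        // `any` stops at the first covering parent grant.
        parent.iter().any(|p| {
            comparisons += 1;
            p == c
        })
    });
    (verdict, comparisons)
}

/// Build a parent grant list whose only matching grant sits at `match_index`.
fn parent_with_match_at(match_index: usize, fanout: usize) -> Vec<Grant> {
    (0..fanout)
        .map(|i| Grant {
            server: "server.example".to_string(),
            tool: if i == match_index {
                "tool_match".to_string()
            } else {
                format!("tool_other_{i:03}")
            },
        })
        .collect()
}
```

With a fan-out of 16 and a single child grant, a match at index 0 costs one comparison and a match at index 15 costs sixteen. That difference in work is the kind of data dependence the harness looks for in the timing distribution.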

From crates/kernel/chio-kernel-core/tests/dudect/scope_subset.rs:

/// Fan-out width of the parent scope's grants vector. Wide enough that
/// the difference between matching at index 0 vs index PARENT_FANOUT - 1
/// produces a measurable runtime gap if the subset check short-circuits.
const PARENT_FANOUT: usize = 16;

fn parent_scope_with_match_at(match_index: usize) -> NormalizedScope {
    let mut grants = Vec::with_capacity(PARENT_FANOUT);
    for i in 0..PARENT_FANOUT {
        let tool = if i == match_index {
            "tool_match".to_string()
        } else {
            format!("tool_other_{i:03}")
        };
        grants.push(grant("server.example", &tool));
    }
    NormalizedScope {
        grants,
        resource_grants: Vec::new(),
        prompt_grants: Vec::new(),
    }
}

fn scope_subset_bench(runner: &mut CtRunner, rng: &mut BenchRng) {
    let child = child_scope_matching();

    let mut inputs: Vec<(Class, NormalizedScope)> = Vec::with_capacity(SAMPLES_PER_RUN);
    for _ in 0..SAMPLES_PER_RUN {
        if rng.random::<bool>() {
            inputs.push((Class::Left, parent_scope_with_match_at(0)));
        } else {
            inputs.push((Class::Right, parent_scope_with_match_at(PARENT_FANOUT - 1)));
        }
    }

    for (class, parent) in inputs {
        runner.run_one(class, || {
            let _ = child.is_subset_of(&parent);
        });
    }
}

Both classes resolve to true; the question is whether the time taken to reach that verdict is data-dependent in a way an off-path attacker could use to learn which parent grant matched.

Scope evaluation runs on the verdict-producing hot path for capability-bearing tool calls. A timing leak here would let a tenant learn the structure of another tenant's parent capability through response-time analysis.

JWT VC Signature Verification

chio_credentials::verify_chio_passport_jwt_vc_json is the compact-JWT verifier behind Chio passport verifiable credentials. It is fail-closed: no arbitrary byte stream passes the issuer signature check. The harness asks whether the rejection path is constant-time with respect to the input contents.

In crates/trust/chio-credentials/tests/dudect/jwt_verify.rs, the two classes have the same length, so only the contents differ:

Class::Left · an all-zero compact byte string. The compact-JWT decoder rejects it at the first base64url segment split, so the rejection path is short.
Class::Right · a random printable-ASCII string of the same length. Same fail-closed verdict, but the parse path may run a different number of base64url-decode or serde_json steps before failing.

The issuer keypair is materialized once from a fixed seed so signature-mismatch noise stays out of the timing distribution. A distinguishable Left/Right distribution would mean the verifier leaks something about why a candidate JWT was rejected.
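A minimal sketch of how two equal-length input classes of this kind could be generated is shown below. It assumes "all-zero" means 0x00 bytes and uses a small xorshift generator purely to avoid dependencies; the type and function names are hypothetical and are not taken from jwt_verify.rs.

```rust
/// Tiny xorshift64 generator, used only so the sketch has no dependencies.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Left class: `len` zero bytes.
fn left_input(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

/// Right class: `len` printable ASCII bytes in the range 0x21..=0x7e.
/// The modulo introduces a negligible bias that does not matter here.
fn right_input(len: usize, rng: &mut XorShift64) -> Vec<u8> {
    (0..len)
        .map(|_| 0x21 + (rng.next_u64() % 94) as u8)
        .collect()
}
```

Keeping the lengths identical means any timing difference between the classes comes from the contents rather than from the amount of data the verifier has to read.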

Running

All three harnesses are gated behind the dudect Cargo feature and require --release.

run dudect harnesses

# MAC equality
cargo test -p chio-kernel-core \
  --features dudect --release mac_eq

# Scope subset
cargo test -p chio-kernel-core \
  --features dudect --release scope_subset

# JWT VC verify (chio-credentials)
cargo test -p chio-credentials \
  --features dudect --release jwt_verify

The CI lane .github/workflows/dudect.yml runs all three harnesses on a nightly schedule (and on manual workflow_dispatch); there is no pull-request trigger. The gate evaluates the two-consecutive-runs rule across successive nightly runs: one night over the 4.5 threshold raises an alert, and a second night in a row fails the job.

Reading the t-Statistic

A clean dudect run prints, for each bench, the accumulated sample count n and max t, the largest Welch's t across percentile cutoffs. The shape:

dudect stdout (clean run)

bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq                       ... : n == +1.00M, max t = +1.32, max tau = ...
bench scope_subset                 ... : n == +1.00M, max t = +0.84, max tau = ...

Interpreting:

max t is the worst-case Welch's t across all percentile cutoffs.
|t| < 4.5 is the published dudect threshold for declaring the code path plausibly constant-time.
|t| > 4.5 in two consecutive runs is treated as a real signal and the CI lane fails.
A single elevated reading is an alert; statistical noise produces occasional outliers, and dudect's recommended guard is two consecutive runs.
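The decision rule above can be sketched as a small function over the previous and current max t. The names are hypothetical and this is not the logic in dudect.yml. Treating a reading of exactly 4.5 as under the threshold is an assumption, since the text only specifies the strict inequalities.

```rust
#[derive(Debug, PartialEq, Eq)]
enum GateVerdict {
    Pass,
    Alert,
    Fail,
}

const T_THRESHOLD: f64 = 4.5;

/// True when the reading is over the threshold in absolute value.
fn exceeds_threshold(max_t: f64) -> bool {
    max_t.abs() > T_THRESHOLD
}

/// Two-consecutive-runs rule: one elevated run is an alert, and two elevated
/// runs in a row fail. `previous_max_t` is None when there is no earlier run.
fn gate(previous_max_t: Option<f64>, current_max_t: f64) -> GateVerdict {
    match (
        previous_max_t.map(exceeds_threshold),
        exceeds_threshold(current_max_t),
    ) {
        (_, false) => GateVerdict::Pass,
        (Some(true), true) => GateVerdict::Fail,
        (_, true) => GateVerdict::Alert,
    }
}
```

For example, `gate(Some(1.3), 5.18)` yields an alert, while `gate(Some(5.18), 8.74)` fails.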

The harness sample count is SAMPLES_PER_RUN: usize = 100_000 per invocation. The dudect runner repeats invocations to accumulate the distribution; CI runs typically reach 10M to 100M total samples.

Example dudect Run

The harness invocation is the one quoted at the top of each source file:

harness invocation

# crates/kernel/chio-kernel-core/tests/dudect/mac_eq.rs
cargo test -p chio-kernel-core --features dudect --release mac_eq

# crates/kernel/chio-kernel-core/tests/dudect/scope_subset.rs
cargo test -p chio-kernel-core --features dudect --release scope_subset

# crates/trust/chio-credentials/tests/dudect/jwt_verify.rs
cargo test -p chio-credentials --features dudect --release jwt_verify

The three harnesses use ctbench_main! from the dudect-bencher crate, which prints a summary line (sample count, max t, max tau) as samples accumulate. A clean run on mac_eq looks like:

dudect stdout (mac_eq passing)

bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq          ... : n == +0.10M, max t = +1.04, max tau = +3.30e-03, (5/tau)^2 = 2.30e+06
bench mac_eq          ... : n == +0.20M, max t = +1.21, max tau = +2.71e-03, (5/tau)^2 = 3.41e+06
bench mac_eq          ... : n == +0.50M, max t = +1.46, max tau = +2.07e-03, (5/tau)^2 = 5.84e+06
bench mac_eq          ... : n == +1.00M, max t = +1.32, max tau = +1.32e-03, (5/tau)^2 = 1.43e+07
bench mac_eq          ... : n == +5.00M, max t = +2.18, max tau = +9.74e-04, (5/tau)^2 = 2.64e+07
bench mac_eq          ... : n == +10.0M, max t = +1.93, max tau = +6.10e-04, (5/tau)^2 = 6.72e+07

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

And on scope_subset:

dudect stdout (scope_subset passing)

bench scope_subset seeded with 0xfacecafe
running 1 benches
bench scope_subset    ... : n == +0.10M, max t = +0.62, max tau = +1.96e-03, (5/tau)^2 = 6.51e+06
bench scope_subset    ... : n == +0.20M, max t = +0.84, max tau = +1.88e-03, (5/tau)^2 = 7.07e+06
bench scope_subset    ... : n == +0.50M, max t = +1.07, max tau = +1.51e-03, (5/tau)^2 = 1.10e+07
bench scope_subset    ... : n == +1.00M, max t = +0.91, max tau = +9.10e-04, (5/tau)^2 = 3.02e+07
bench scope_subset    ... : n == +5.00M, max t = +1.34, max tau = +5.99e-04, (5/tau)^2 = 6.97e+07
bench scope_subset    ... : n == +10.0M, max t = +1.18, max tau = +3.73e-04, (5/tau)^2 = 1.80e+08

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Both runs sit well under the t < 4.5 threshold across every sample-count bucket. The max t drifts within about 0.5 to 2.5 across re-runs. That range is expected statistical variation for 1M-to-10M sample counts on a typical CI machine.

A Failing Run (What a Leak Looks Like)

If the byte-equality compare regressed to a short-circuiting loop (a textbook variable-time tag check), the t-statistic would climb past 4.5 within a few hundred thousand samples. The run would look like:

dudect stdout (mac_eq leak)

bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq          ... : n == +0.10M, max t = +3.41, max tau = +1.08e-02, (5/tau)^2 = 2.14e+05
bench mac_eq          ... : n == +0.20M, max t = +5.18, max tau = +1.16e-02, (5/tau)^2 = 1.86e+05
bench mac_eq          ... : n == +0.50M, max t = +8.74, max tau = +1.24e-02, (5/tau)^2 = 1.63e+05
bench mac_eq          ... : n == +1.00M, max t = +12.37, max tau = +1.24e-02, (5/tau)^2 = 1.63e+05
bench mac_eq          ... : n == +5.00M, max t = +27.61, max tau = +1.23e-02, (5/tau)^2 = 1.65e+05

Reading the failure:

t = 5.18 at 200K samples. A single elevated reading is an alert; the CI lane's rule is two consecutive runs over the threshold to fail nightly. A single run with t > 4.5 opens an investigation but does not halt the build.
t climbs as samples accumulate. A timing leak commonly produces steady growth in max t as the sample count rises; statistical noise does not. If the 200K-sample bucket sits at 5.2 and the 1M-sample bucket is back at 1.8, the alert was likely noise. If max t keeps rising from bucket to bucket, the measured code path has a timing leak.
Machine noise caveats. Frequency scaling, neighbour processes, ASLR, and cache state all add variance. The CI lane runs with frequency scaling pinned and a quiet runner; reproducing a borderline result locally on a laptop is not reliable. The two-consecutive-runs rule is the noise filter.
Triage. A confirmed leak in mac_eq means the byte-equality compare regressed; the fix is to route through subtle::ConstantTimeEq or equivalent. A leak in scope_subset means a recent change to NormalizedScope::is_subset_of introduced an iteration order or short-circuit that exposes parent-grant structure.
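The other two columns in the logs help separate a leak from noise. The figures shown are consistent with tau = t / sqrt(n) and with (5/tau)^2 being the sample count at which t would reach 5 if tau stayed constant. That reading is inferred from the arithmetic of the logs in this page, and the function names below are hypothetical.

```rust
/// t normalized by the square root of the sample count.
fn tau(max_t: f64, samples: f64) -> f64 {
    max_t / samples.sqrt()
}

/// Sample count at which t would reach 5, assuming tau stays constant.
fn samples_to_reach_t5(tau: f64) -> f64 {
    (5.0 / tau).powi(2)
}
```

In the leak log, t = 12.37 at 1M samples gives tau of about 1.24e-2, and t = 27.61 at 5M samples gives about 1.23e-2: tau stays flat while t grows with sqrt(n). In the clean mac_eq log, tau falls from 1.32e-3 at 1M samples to 6.10e-4 at 10M, which is the pattern expected when there is no stable effect to measure.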

Timing-Sensitive Trust-Boundary Code

chio-kernel-core does not expose its own mac_eq symbol; the kernel delegates byte-equality to the Signature type from chio-core-types, which is part of the same trust boundary. Measuring the underlying == directly catches data-dependent behavior in the implementation under test, without a wrapper that can change the measurement.

The same logic applies to NormalizedScope::is_subset_of: it is the authoritative capability-algebra subset check used by the proof-facing evaluation lane, so timing measurements there measure the relevant evaluation path.

Caveats

Detection, not proof. A clean dudect run is evidence the measured code path is constant-time on the harness machine and toolchain. It is not a proof. Microarchitectural variation (different CPU, different frequency scaling, different cache state) could surface a leak that the harness machine did not see.
Statistical noise. Welch's t at 100K samples can be noisy; the two-run requirement is the noise filter. Running with a higher sample count tightens the confidence band.
Compiler escape. The harness uses runner.run_one (which wraps black_box) so --release LLVM cannot eliminate the compare. The a == b return value is propagated to the runner so the optimizer keeps the work.
Other side channels are out of scope. Power, electromagnetic, and microarchitectural (Spectre, etc.) channels are not addressed by chio. Operators who need protection against those channels must apply OS, hypervisor, and hardware countermeasures themselves.
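The compiler-escape point can be illustrated with a hypothetical per-sample timer built on std::hint::black_box. This is a sketch of the general pattern, not the dudect-bencher runner, and time_one is an invented name.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Time one call of `f`. Passing the result through black_box tells the
/// optimizer the value is used, so the work that produces it is kept.
fn time_one<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = black_box(f());
    (out, start.elapsed())
}
```

A closure that discards its result (for example by ending in `let _ = ...;`) returns the unit value, so black_box has nothing to pin and the optimizer has more freedom to drop the work. Returning the computed value from the closure avoids that.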

Assumptions and TCB · the side-channel scope, including what dudect does not cover.
Kani Harnesses · the bounded model-checking lane for the same algebraic helpers.
Fuzz Infrastructure · the random-input lane that complements timing analysis.