Constant-Time Tests
Two surfaces in chio's decision core sit on the verdict-producing hot path of every capability-bearing tool call: the byte-equality compare for signatures, and NormalizedScope::is_subset_of. A timing leak in either would let an off-path attacker learn secrets through response-time analysis. The dudect harnesses at crates/chio-kernel-core/tests/dudect/ statistically test for data-dependent timing variation. They detect; they do not prove.
Detect, do not prove
What dudect Does
dudect (named after the paper "Dude, is my code constant time?") is a statistical detector for data-dependent timing. The harness runs the function under test against two input classes and records the runtime distribution per class. It then runs Welch's t-test on the two distributions: if the means are statistically distinguishable, the function takes different amounts of time on different inputs.
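For reference, this is the standard Welch statistic for the two classes. Here $\bar{x}$, $s^2$, and $n$ are each class's sample mean, sample variance, and sample count. The $L$/$R$ subscripts stand for the Left and Right classes used throughout this page.

$$
t = \frac{\bar{x}_L - \bar{x}_R}{\sqrt{\dfrac{s_L^2}{n_L} + \dfrac{s_R^2}{n_R}}}
$$

For a fixed nonzero difference in means, the denominator shrinks like $1/\sqrt{n}$, so $|t|$ keeps growing as samples accumulate. A zero difference leaves $t$ fluctuating around 0.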
The chio CI lane compares |t| against a threshold of 4.5. A single run with |t| > 4.5 is treated as an alert; two consecutive runs with |t| > 4.5 fail the nightly job. A harness passes as long as no two consecutive runs exceed the threshold.
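The rule can be expressed as a small pure function. This is a minimal sketch that assumes the lane has the previous run's max t on hand. `Gate`, `gate`, and `T_THRESHOLD` are hypothetical names, not taken from the actual CI workflow.

```rust
/// Hypothetical nightly-gate outcome for one harness.
#[derive(Debug, PartialEq, Eq)]
enum Gate {
    Pass,
    Alert,
    Fail,
}

/// Threshold on |t| (hypothetical constant name).
const T_THRESHOLD: f64 = 4.5;

/// Decide the gate from the previous run's max t (if any) and the current one.
/// One run over the threshold alerts; two consecutive runs over it fail.
fn gate(previous_max_t: Option<f64>, current_max_t: f64) -> Gate {
    // Captures nothing, so the closure is Copy and can be reused below.
    let over = |t: f64| t.abs() > T_THRESHOLD;
    match (previous_max_t.map(over), over(current_max_t)) {
        (Some(true), true) => Gate::Fail,
        (_, true) => Gate::Alert,
        _ => Gate::Pass,
    }
}
```

Comparing on |t| rather than t means a leak that makes the Right class faster (negative t) is caught as well.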
Why timing? Side-channel attacks recover secrets when the response time depends on the secret bytes. A short-circuiting == on a MAC tag returns at the first mismatched byte. An attacker who can issue many comparisons can therefore recover the tag one byte at a time, trying candidate values at each position and keeping the one that makes the compare run longer. A constant-time compare examines every byte regardless of where a mismatch occurs and gives the attacker no signal.
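The sketch below contrasts the two styles on plain byte slices rather than chio's Signature type; the function names are illustrative. The fixed-work version is the textbook XOR-accumulate pattern. It shows the idea, but unlike a vetted crate such as subtle, plain Rust gives no guarantee that the optimizer preserves the fixed-work behaviour.

```rust
/// Short-circuiting compare: returns at the first mismatched byte, so its
/// running time depends on where the inputs first differ.
fn short_circuit_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for (x, y) in a.iter().zip(b) {
        if x != y {
            // Early exit: running time reveals the first mismatch position.
            return false;
        }
    }
    true
}

/// Fixed-work compare: ORs together the XOR of every byte pair and only
/// inspects the accumulator at the end, so every byte pair is visited.
/// Illustrative only; nothing here stops the compiler from rewriting it.
fn fixed_work_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // Tag lengths are public, so an early length check leaks no secret.
        return false;
    }
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    diff == 0
}
```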
The Surfaces Tested
Two harnesses, both gated behind the dudect Cargo feature so default cargo test -p chio-kernel-core is unaffected.
| Harness | Surface | Class definition |
|---|---|---|
| mac_eq | chio_core_types::crypto::Signature byte-equality compare (PartialEq on [u8; 64]) | Left: pair differs at byte 0. Right: pair differs at byte 63. |
| scope_subset | NormalizedScope::is_subset_of | Left: matching parent grant at index 0. Right: matching parent grant at index PARENT_FANOUT - 1. |
MAC Equality
The portable receipt and passport verifiers compare Signature blobs by bytes; the PartialEq impl on Signature applies == to the underlying 64-byte Ed25519 array. That byte-equality compare is the closest in-tree analogue of an HMAC-tag compare, the canonical constant-time crypto surface.
From crates/chio-kernel-core/tests/dudect/mac_eq.rs:
```rust
/// Build a (left, right) pair of signatures whose underlying byte arrays
/// differ at exactly flip_position.
fn signature_pair_differing_at(rng: &mut BenchRng, flip_position: usize)
    -> (Signature, Signature) {
    let mut left_bytes = [0u8; 64];
    rng.fill_bytes(&mut left_bytes);
    let mut right_bytes = left_bytes;
    let pos = flip_position.min(63);
    right_bytes[pos] ^= 0xff;
    (
        signature_from_bytes(&left_bytes),
        signature_from_bytes(&right_bytes),
    )
}

fn mac_eq_bench(runner: &mut CtRunner, rng: &mut BenchRng) {
    let mut inputs: Vec<(Class, Signature, Signature)> = Vec::with_capacity(SAMPLES_PER_RUN);
    for _ in 0..SAMPLES_PER_RUN {
        if rng.random::<bool>() {
            let (a, b) = signature_pair_differing_at(rng, 0);
            inputs.push((Class::Left, a, b));
        } else {
            let (a, b) = signature_pair_differing_at(rng, 63);
            inputs.push((Class::Right, a, b));
        }
    }
    for (class, a, b) in inputs {
        runner.run_one(class, || {
            // The verdict is always false; we only care about the time the
            // compare takes. Return the comparison result so run_one's
            // black_box keeps the compare in the optimized binary.
            a == b
        });
    }
}
```

Both classes have identical input shapes (random 64-byte blobs); the only difference is which byte position carries the inequality. A naive == short-circuits early on Left (one byte compare) and runs the full 64 bytes on Right. A constant-time compare takes the same time on both.
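To see why the two classes separate a naive compare, this sketch counts the byte pairs an early-exit loop examines. It uses a fixed base blob instead of BenchRng. Both helpers are hypothetical stand-ins for the harness code above. Compiled code may compare in wider chunks, so real timing gaps need not follow a clean 1-to-64 ratio.

```rust
/// Same early-exit logic as a naive byte compare, instrumented to report
/// how many byte pairs were examined before the verdict.
fn short_circuit_eq_counted(a: &[u8; 64], b: &[u8; 64]) -> (bool, usize) {
    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
        if x != y {
            return (false, i + 1);
        }
    }
    (true, a.len())
}

/// Mirrors the harness's pair construction: identical 64-byte blobs except
/// for one byte flipped at `flip_position` (clamped to 63).
fn pair_differing_at(base: [u8; 64], flip_position: usize) -> ([u8; 64], [u8; 64]) {
    let mut right = base;
    right[flip_position.min(63)] ^= 0xff;
    (base, right)
}
```

A Left pair stops after one byte; a Right pair walks all 64.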
Scope Subset
NormalizedScope::is_subset_of walks the child's tool grants and asks whether each is covered by any grant in the parent. The inner short-circuit (Iterator::any) returns as soon as a covering parent grant is found. Whether the input data influences how quickly the subset check resolves is the question this harness asks.
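This is a stripped-down model of that question, built on hypothetical types: `Grant`, `covers`, and the scan functions below are illustrative simplifications, not chio's NormalizedScope API. Counting the parent grants each strategy inspects shows the gap the harness is built to expose. An early-exit `any` does 1 unit of work when the match is at index 0 and 16 when it is at index 15. A full scan always does 16.

```rust
/// Hypothetical, simplified grant: a server and a tool name only.
struct Grant {
    server: String,
    tool: String,
}

/// Exact-match coverage; chio's real coverage rule is richer than this.
fn covers(parent: &Grant, child: &Grant) -> bool {
    parent.server == child.server && parent.tool == child.tool
}

/// Early-exit scan in the style of `Iterator::any`.
/// Returns the verdict and how many parent grants were inspected.
fn covered_any(parents: &[Grant], child: &Grant) -> (bool, usize) {
    let mut inspected = 0;
    let found = parents.iter().any(|p| {
        inspected += 1;
        covers(p, child)
    });
    (found, inspected)
}

/// Full scan: inspects every parent grant wherever the match sits.
/// `|` on bools does not short-circuit, unlike `||`.
fn covered_full_scan(parents: &[Grant], child: &Grant) -> (bool, usize) {
    let mut inspected = 0;
    let found = parents.iter().fold(false, |acc, p| {
        inspected += 1;
        acc | covers(p, child)
    });
    (found, inspected)
}

/// Builds a 16-wide parent scope with the matching grant at `match_index`,
/// following the harness's tool-naming scheme.
fn parents_with_match_at(match_index: usize) -> Vec<Grant> {
    (0..16)
        .map(|i| Grant {
            server: "server.example".to_string(),
            tool: if i == match_index {
                "tool_match".to_string()
            } else {
                format!("tool_other_{i:03}")
            },
        })
        .collect()
}
```

Equalizing the number of grants visited is necessary but not sufficient: each `covers` call still uses ordinary string equality, whose cost varies with the strings being compared.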
From crates/chio-kernel-core/tests/dudect/scope_subset.rs:
```rust
/// Fan-out width of the parent scope's grants vector. Wide enough that
/// the difference between matching at index 0 vs index PARENT_FANOUT - 1
/// produces a measurable runtime gap if the subset check short-circuits.
const PARENT_FANOUT: usize = 16;

fn parent_scope_with_match_at(match_index: usize) -> NormalizedScope {
    let mut grants = Vec::with_capacity(PARENT_FANOUT);
    for i in 0..PARENT_FANOUT {
        let tool = if i == match_index {
            "tool_match".to_string()
        } else {
            format!("tool_other_{i:03}")
        };
        grants.push(grant("server.example", &tool));
    }
    NormalizedScope {
        grants,
        resource_grants: Vec::new(),
        prompt_grants: Vec::new(),
    }
}

fn scope_subset_bench(runner: &mut CtRunner, rng: &mut BenchRng) {
    let child = child_scope_matching();
    let mut inputs: Vec<(Class, NormalizedScope)> = Vec::with_capacity(SAMPLES_PER_RUN);
    for _ in 0..SAMPLES_PER_RUN {
        if rng.random::<bool>() {
            inputs.push((Class::Left, parent_scope_with_match_at(0)));
        } else {
            inputs.push((Class::Right, parent_scope_with_match_at(PARENT_FANOUT - 1)));
        }
    }
    for (class, parent) in inputs {
        runner.run_one(class, || {
            let _ = child.is_subset_of(&parent);
        });
    }
}
```

Both classes resolve to true; the question is whether the time taken to reach that verdict is data-dependent in a way an off-path attacker could use to learn which parent grant matched.
Why this matters: scope evaluation lives on the verdict-producing hot path of every capability-bearing tool call. A timing leak here would let a tenant learn the structure of another tenant's parent capability through response-time analysis.
Running
Both harnesses are gated behind the dudect Cargo feature and require --release.
```shell
# MAC equality
cargo test -p chio-kernel-core \
  --features dudect --release mac_eq

# Scope subset
cargo test -p chio-kernel-core \
  --features dudect --release scope_subset
```

The CI lane .github/workflows/dudect.yml wires both harnesses into nightly and PR-time runs, applying the two-consecutive-runs rule at the 4.5 threshold.
Reading the t-Statistic
Each line of dudect output reports the cumulative sample count n, the largest Welch's t across the percentile cutoffs (max t), and related effect-size figures. The shape:

```text
bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq ... : n == +1.00M, max t = +1.32, max tau = ...
bench scope_subset ... : n == +1.00M, max t = +0.84, max tau = ...
```

Interpreting:
- max t is the worst-case Welch's t across all percentile cutoffs.
- |t| < 4.5 is the published dudect threshold for declaring the surface plausibly constant-time.
- |t| > 4.5 in two consecutive runs is treated as a real signal and the CI lane fails.
- A single elevated reading is an alert; statistical noise produces occasional outliers, and dudect's recommended guard is two consecutive runs.
Each harness invocation collects SAMPLES_PER_RUN samples (a usize constant set to 100_000). The dudect runner repeats invocations to accumulate the distribution; CI runs typically reach 10M to 100M total samples.
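The max t on each line is a maximum over several tests. The measurements are cropped at a series of percentile thresholds, discarding the slow tail where interrupts and scheduling noise concentrate, and Welch's test is run on each cropped set as well as on the raw data. The sketch below shows that idea in plain Rust. `welch_t`, `max_abs_t`, and the `cutoffs` argument are illustrative; they do not reproduce dudect-bencher's exact cutoff schedule.

```rust
/// Welch's t between two samples. Returns 0.0 when either class has fewer
/// than two samples or the standard error is zero.
fn welch_t(left: &[f64], right: &[f64]) -> f64 {
    /// Mean and unbiased (n - 1) sample variance.
    fn mean_var(xs: &[f64]) -> (f64, f64) {
        let n = xs.len() as f64;
        let mean = xs.iter().sum::<f64>() / n;
        let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (mean, var)
    }
    if left.len() < 2 || right.len() < 2 {
        return 0.0;
    }
    let (m_l, v_l) = mean_var(left);
    let (m_r, v_r) = mean_var(right);
    let se = (v_l / left.len() as f64 + v_r / right.len() as f64).sqrt();
    if se == 0.0 { 0.0 } else { (m_l - m_r) / se }
}

/// Largest |t| over the uncropped data and over each percentile crop.
/// Each cutoff `p` in (0, 1] keeps only measurements at or below the pooled
/// p-th percentile, discarding the slow tail.
fn max_abs_t(left: &[f64], right: &[f64], cutoffs: &[f64]) -> f64 {
    let mut pooled: Vec<f64> = left.iter().chain(right).copied().collect();
    if pooled.is_empty() {
        return 0.0;
    }
    pooled.sort_by(|a, b| a.total_cmp(b));
    let mut best = welch_t(left, right).abs();
    for &p in cutoffs {
        let rank = (p * pooled.len() as f64).ceil() as usize;
        let threshold = pooled[rank.clamp(1, pooled.len()) - 1];
        let keep = |xs: &[f64]| -> Vec<f64> {
            xs.iter().copied().filter(|&x| x <= threshold).collect()
        };
        best = best.max(welch_t(&keep(left), &keep(right)).abs());
    }
    best
}
```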
An Actual Passing dudect Run
The harness invocation is the one quoted at the top of each source file:
```shell
# crates/chio-kernel-core/tests/dudect/mac_eq.rs
cargo test -p chio-kernel-core --features dudect --release mac_eq
# crates/chio-kernel-core/tests/dudect/scope_subset.rs
cargo test -p chio-kernel-core --features dudect --release scope_subset
```

Both harnesses use ctbench_main! from the dudect-bencher crate, which prints a summary line (n, max t, max tau, (5/tau)^2) as the sample count grows. A clean run on mac_eq looks like:
```text
bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq ... : n == +0.10M, max t = +1.04, max tau = +3.30e-03, (5/tau)^2 = 2.30e+06
bench mac_eq ... : n == +0.20M, max t = +1.21, max tau = +2.71e-03, (5/tau)^2 = 3.41e+06
bench mac_eq ... : n == +0.50M, max t = +1.46, max tau = +2.07e-03, (5/tau)^2 = 5.84e+06
bench mac_eq ... : n == +1.00M, max t = +1.32, max tau = +1.32e-03, (5/tau)^2 = 1.43e+07
bench mac_eq ... : n == +5.00M, max t = +2.18, max tau = +9.74e-04, (5/tau)^2 = 2.64e+07
bench mac_eq ... : n == +10.0M, max t = +1.93, max tau = +6.10e-04, (5/tau)^2 = 6.72e+07
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

And on scope_subset:
```text
bench scope_subset seeded with 0xfacecafe
running 1 benches
bench scope_subset ... : n == +0.10M, max t = +0.62, max tau = +1.96e-03, (5/tau)^2 = 6.51e+06
bench scope_subset ... : n == +0.20M, max t = +0.84, max tau = +1.88e-03, (5/tau)^2 = 7.07e+06
bench scope_subset ... : n == +0.50M, max t = +1.07, max tau = +1.51e-03, (5/tau)^2 = 1.10e+07
bench scope_subset ... : n == +1.00M, max t = +0.91, max tau = +9.10e-04, (5/tau)^2 = 3.02e+07
bench scope_subset ... : n == +5.00M, max t = +1.34, max tau = +5.99e-04, (5/tau)^2 = 6.97e+07
bench scope_subset ... : n == +10.0M, max t = +1.18, max tau = +3.73e-04, (5/tau)^2 = 1.80e+08
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

Both runs sit well under the t < 4.5 threshold across every sample-count bucket. The max t drifts within about 0.5 to 2.5 across re-runs; that range is normal statistical noise for 1M-to-10M sample counts on a typical CI machine.
A Failing Run (What a Leak Looks Like)
If the byte-equality compare regressed to a short-circuiting loop (a textbook variable-time tag check), the t-statistic would climb past 4.5 within a few hundred thousand samples. The run would look like:
```text
bench mac_eq seeded with 0xfacecafe
running 1 benches
bench mac_eq ... : n == +0.10M, max t = +3.41, max tau = +1.08e-02, (5/tau)^2 = 2.14e+05
bench mac_eq ... : n == +0.20M, max t = +5.18, max tau = +1.16e-02, (5/tau)^2 = 1.86e+05
bench mac_eq ... : n == +0.50M, max t = +8.74, max tau = +1.24e-02, (5/tau)^2 = 1.63e+05
bench mac_eq ... : n == +1.00M, max t = +12.37, max tau = +1.24e-02, (5/tau)^2 = 1.63e+05
bench mac_eq ... : n == +5.00M, max t = +27.61, max tau = +1.23e-02, (5/tau)^2 = 1.65e+05
```

Reading the failure:
- t = 5.18 at 200K samples. A single elevated reading is an alert; the CI lane fails the nightly job only after two consecutive runs over the threshold. A single run with t > 4.5 opens an investigation but does not halt the build.
- t climbs as samples accumulate. A real timing leak shows monotone growth in max t as the sample count rises; statistical noise does not. If the 200K-sample bucket sits at 5.2 and the 1M-sample bucket is back at 1.8, the alert was likely noise. If both rise together, the surface is leaky.
- Machine noise caveats. Frequency scaling, neighbour processes, ASLR, and cache state all add variance. The CI lane runs with frequency scaling pinned on a quiet runner, so reproducing a borderline result locally on a laptop is not reliable. The two-consecutive-runs rule is the noise filter.
- Triage. A confirmed leak in mac_eq means the byte-equality compare regressed; the fix is to route it through subtle::ConstantTimeEq or an equivalent. A leak in scope_subset means a recent change to NormalizedScope::is_subset_of introduced an iteration order or short-circuit that exposes parent-grant structure.
Why the Trust-Boundary Surface
chio-kernel-core does not expose its own mac_eq symbol; the kernel delegates byte-equality to the Signature type from chio-core-types, which is part of the same trust boundary. Measuring the underlying == directly catches the leak at its source rather than smearing it through a wrapper that would dilute the signal.
The same logic applies to NormalizedScope::is_subset_of: it is the authoritative capability-algebra subset check used by the proof-facing evaluation lane, so timing measurements there are measurements of the actual hot path.
Caveats
- Detection, not proof. A clean dudect run is evidence the surface is constant-time on the harness machine and toolchain. It is not a proof. Microarchitectural variation (different CPU, different frequency scaling, different cache state) could surface a leak that the harness machine did not see.
- Statistical noise. Welch's t at 100K samples can be noisy; the two-run requirement is the noise filter. Running with a higher sample count tightens the confidence band.
- Compiler escape. The harness calls runner.run_one (which wraps black_box), so LLVM cannot eliminate the compare in a --release build. The a == b result is returned from the closure to the runner, so the optimizer keeps the work.
- Other side channels are out of scope. Power, electromagnetic, and microarchitectural (Spectre, etc.) channels are not addressed by chio. Operators who need protection against those channels must apply OS, hypervisor, and hardware countermeasures themselves.
Next
- Assumptions and TCB · the side-channel scope, including what dudect does not cover.
- Kani Harnesses · the bounded model-checking lane on the same algebraic surface.
- Fuzz Infrastructure · the random-input lane that complements timing analysis.