Rate Limit Guards · Chio Docs

The token bucket

Both guards embed a private TokenBucket with the same fields and arithmetic. From crates/guards/chio-guards/src/velocity.rs and agent_velocity.rs:

crates/guards/chio-guards/src/velocity.rs

const MT_PER_TOKEN: u64 = 1_000;

struct TokenBucket {
    capacity_mt: u64,
    tokens_mt: u64,
    refill_rate_mpm: u64, // milli-tokens per millisecond
    last_refill: Instant,
}

One logical token equals 1,000 milli-tokens. The refill rate is derived as:

crates/guards/chio-guards/src/velocity.rs

let window_ms = window_secs.saturating_mul(1_000).max(1);
let refill_rate_mpm = (max_per_window.saturating_mul(MT_PER_TOKEN))
    .checked_div(window_ms)
    .unwrap_or(1)
    .max(1);

The bucket starts full at capacity_mt. Every try_consume call first refills based on elapsed wall time, then attempts to deduct the requested cost in milli-tokens. The minimum refill rate is one milli-token per millisecond so very slow rates still make progress.
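As a rough worked example of the derivation above, the sketch below copies the arithmetic into a free function and can be evaluated for a few configurations. The function form and the sample inputs are assumptions made for illustration; they are not part of the crate's API.

```rust
const MT_PER_TOKEN: u64 = 1_000;

/// Hypothetical free-function form of the refill-rate derivation quoted above.
/// Returns milli-tokens per millisecond.
fn refill_rate_mpm(max_per_window: u64, window_secs: u64) -> u64 {
    // A zero-length window is floored to 1 ms so the division is always defined.
    let window_ms = window_secs.saturating_mul(1_000).max(1);
    max_per_window
        .saturating_mul(MT_PER_TOKEN)
        .checked_div(window_ms) // integer division: fractions are truncated
        .unwrap_or(1)
        .max(1) // floor of one milli-token per millisecond
}
```

With this arithmetic, 6,000 per 60 s gives 100 milli-tokens per millisecond, and 1 per hour is lifted to the floor of 1. Because the division is integer division, 100 per 60 s also yields 1 (the exact quotient is about 1.67), which works out to 60 logical tokens per minute; whether the crate compensates for that truncation elsewhere is not shown here.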

Wall-clock based, not request-counter based

Refill is keyed off Instant::now(), so it tracks elapsed time rather than request counts. Because Instant is monotonic, long-lived guards keep working when the system clock is adjusted. There is no per-request decay step; refill is opportunistic at consume time.
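The refill-then-consume behavior can be sketched as follows. This is a minimal illustration, not the crate's implementation: the method names `new` and `refill`, and the choice to pass `now` as a parameter instead of calling Instant::now() internally, are assumptions made so the sketch can be exercised without sleeping.

```rust
use std::time::Instant;

const MT_PER_TOKEN: u64 = 1_000;

struct TokenBucket {
    capacity_mt: u64,
    tokens_mt: u64,
    refill_rate_mpm: u64, // milli-tokens per millisecond
    last_refill: Instant,
}

impl TokenBucket {
    /// The bucket starts full.
    fn new(capacity_tokens: u64, refill_rate_mpm: u64, now: Instant) -> Self {
        let capacity_mt = capacity_tokens.saturating_mul(MT_PER_TOKEN);
        Self { capacity_mt, tokens_mt: capacity_mt, refill_rate_mpm, last_refill: now }
    }

    /// Credits milli-tokens for the whole milliseconds elapsed, capped at capacity.
    fn refill(&mut self, now: Instant) {
        let elapsed_ms = now.saturating_duration_since(self.last_refill).as_millis() as u64;
        if elapsed_ms == 0 {
            return; // keep last_refill so sub-millisecond time is not discarded
        }
        let added = elapsed_ms.saturating_mul(self.refill_rate_mpm);
        self.tokens_mt = self.tokens_mt.saturating_add(added).min(self.capacity_mt);
        self.last_refill = now;
    }

    /// Refills first, then deducts the cost only if the bucket can afford it.
    fn try_consume(&mut self, cost_tokens: u64, now: Instant) -> bool {
        self.refill(now);
        let cost_mt = cost_tokens.saturating_mul(MT_PER_TOKEN);
        if self.tokens_mt >= cost_mt {
            self.tokens_mt -= cost_mt;
            true
        } else {
            false
        }
    }
}
```

No timer or background task is involved: a bucket that nobody touches does no work, and the next try_consume call credits the whole idle period at once.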

VelocityGuard

Source: crates/guards/chio-guards/src/velocity.rs. Guard name: velocity. Keys buckets by (capability_id, grant_index) so each grant within a capability has its own bucket.

Struct

crates/guards/chio-guards/src/velocity.rs

#[derive(Clone, Debug)]
pub struct VelocityConfig {
    pub max_invocations_per_window: Option<u32>,
    pub max_spend_per_window: Option<u64>,
    pub window_secs: u64,
    pub burst_factor: f64,
}

pub struct VelocityGuard {
    // Both maps live behind ONE mutex so the combined-cap check and
    // the insert are atomic across them.
    state: Mutex<VelocityState>,
    config: VelocityConfig,
    bucket_cap: usize,
}

struct VelocityState {
    invocation_buckets: HashMap<(String, usize), TokenBucket>,
    spend_buckets: HashMap<(String, usize), TokenBucket>,
}

Both bucket maps sit behind a single Mutex<VelocityState>. Holding one lock makes the combined-cap check and the insert atomic across both maps, so no evaluate call ever samples a stale sibling-map size. bucket_cap bounds the combined number of buckets across both maps; it defaults to chio_kernel::MemoryBudgetConfig::defaults().velocity_bucket_cap, so lowering the process memory budget tightens the table.

Defaults

Knob	Type	Default	Purpose
`max_invocations_per_window`	`Option<u32>`	`None`	Hard ceiling on logical tokens added per window. `None` = unlimited.
`max_spend_per_window`	`Option<u64>`	`None`	Maximum monetary units spendable per window. Requires `max_cost_per_invocation` on the matched grant.
`window_secs`	`u64`	`60`	Window length. Floored at 1 second internally.
`burst_factor`	`f64`	`1.0`	Capacity multiplier. `1.0` = no burst above steady rate.

Bucket capacity

Verified from evaluate:

crates/guards/chio-guards/src/velocity.rs

let capacity = ((max_inv as f64 * self.config.burst_factor)
    .round() as u64)
    .max(1);

With a 100/min limit and burst_factor = 1.5, capacity is 150 logical tokens (150,000 milli-tokens). burst_factor scales only the capacity; the refill rate is still derived from the 100-per-window limit. An idle bucket can therefore absorb up to 150 back-to-back calls before it is throttled to the refill rate.
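To make the rounding and the floor concrete, here is a small sketch of the capacity expression as standalone functions. The helper names `capacity_tokens` and `capacity_mt` are hypothetical; only the expression inside is taken from the snippet above.

```rust
const MT_PER_TOKEN: u64 = 1_000;

/// Hypothetical helper mirroring the capacity expression quoted from evaluate.
fn capacity_tokens(max_inv: u32, burst_factor: f64) -> u64 {
    // round() rounds half away from zero; the float-to-integer cast saturates,
    // so a negative or NaN product becomes 0 before the floor of 1 applies.
    ((max_inv as f64 * burst_factor).round() as u64).max(1)
}

/// The same capacity expressed in milli-tokens.
fn capacity_mt(max_inv: u32, burst_factor: f64) -> u64 {
    capacity_tokens(max_inv, burst_factor).saturating_mul(MT_PER_TOKEN)
}
```

Under these assumptions a burst_factor below 1.0 shrinks the bucket below the per-window limit, and the `.max(1)` floor guarantees that every bucket can hold at least one logical token.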

Spend limiting

When max_spend_per_window is set, the guard reads grant.max_cost_per_invocation.units from the matched grant and consumes that many tokens from the spend bucket. Verified from planned_spend_units:

crates/guards/chio-guards/src/velocity.rs

fn planned_spend_units(ctx: &GuardContext) -> Result<u64, KernelError> {
    let grant_index = ctx.matched_grant_index.ok_or_else(|| {
        KernelError::Internal(
            "velocity guard spend limiting requires matched_grant_index".to_string(),
        )
    })?;
    let grant = ctx.scope.grants.get(grant_index).ok_or_else(|| {
        KernelError::Internal(format!(
            "velocity guard could not resolve grant index {grant_index}"
        ))
    })?;
    grant
        .max_cost_per_invocation
        .as_ref()
        .map(|amount| amount.units)
        .ok_or_else(|| {
            KernelError::Internal(
                "velocity guard spend limiting requires max_cost_per_invocation \
                 on the matched grant".to_string(),
            )
        })
}

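Three things must be true for spend limiting to fire: max_spend_per_window is Some, matched_grant_index is Some, and the matched grant carries max_cost_per_invocation. When the limit is set but the grant index is missing or cannot be resolved, or the grant has no max_cost_per_invocation, planned_spend_units returns Err, which the kernel treats as a deny.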
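The possible outcomes can be modeled with simplified stand-in types. Everything in this sketch is hypothetical (the `Grant` struct, the `spend_cost` function, and the String error type); in particular, returning `Ok(None)` when no spend limit is configured is an assumption about how "spend limiting does not fire" would be represented.

```rust
/// Simplified stand-in for a grant; the real type lives in the kernel.
struct Grant {
    max_cost_per_invocation: Option<u64>,
}

/// Returns the spend units to charge, `Ok(None)` when spend limiting is off,
/// or `Err` when the limit is on but the cost cannot be determined.
fn spend_cost(
    max_spend_per_window: Option<u64>,
    matched_grant_index: Option<usize>,
    grants: &[Grant],
) -> Result<Option<u64>, String> {
    // No spend limit configured: nothing to charge (assumed representation).
    if max_spend_per_window.is_none() {
        return Ok(None);
    }
    let index = matched_grant_index
        .ok_or_else(|| "spend limiting requires matched_grant_index".to_string())?;
    let grant = grants
        .get(index)
        .ok_or_else(|| format!("could not resolve grant index {index}"))?;
    match grant.max_cost_per_invocation {
        Some(units) => Ok(Some(units)),
        None => Err("spend limiting requires max_cost_per_invocation".to_string()),
    }
}
```

Every `Err` branch corresponds to a fail-closed deny: a misconfigured grant blocks the request instead of letting it through uncharged.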

Two-phase reserve and commit

evaluate resolves the invocation and spend buckets together under one state lock in three steps (a pre-check, then the reserve and commit phases), so a denial from one limit does not affect the other:

Resolve and reject impossible spend. It reads planned_spend_units first. A planned cost larger than the spend bucket's own burst ceiling cannot fit, regardless of wait time, so it denies before creating a bucket. An unaffordable spend cannot occupy a table slot.
Reserve. It reserves a slot for every genuinely-new bucket the request needs across both maps at once. A brand-new key with both limits enabled needs two slots (one per map); reserving them together means a request that will be denied for lack of capacity cannot leave an unpaired invocation bucket in the last free slot.
Peek, then commit. It calls can_consume on both buckets before consume on either. A request denied by one limit does not partially drain the other.
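The ordering of the pre-check, peek, and commit can be sketched with two plain integer buckets. This is an illustration under stated assumptions, not the guard's code: the `Bucket` and `Outcome` types and the `check_both` function are hypothetical, refill is left out, and slot reservation (the second step) is omitted because both buckets already exist here.

```rust
/// Minimal integer bucket; refill is omitted to keep the focus on ordering.
struct Bucket {
    tokens: u64,
    capacity: u64,
}

impl Bucket {
    /// Peek: reports whether `cost` would fit, without changing state.
    fn can_consume(&self, cost: u64) -> bool {
        self.tokens >= cost
    }

    /// Commit: callers must have checked `can_consume` first.
    fn consume(&mut self, cost: u64) {
        self.tokens -= cost;
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Allow,
    Deny,
}

fn check_both(invocations: &mut Bucket, spend: &mut Bucket, spend_cost: u64) -> Outcome {
    // Pre-check: a cost above the spend bucket's ceiling can never fit.
    if spend_cost > spend.capacity {
        return Outcome::Deny;
    }
    // Peek at both buckets before touching either one.
    if !invocations.can_consume(1) || !spend.can_consume(spend_cost) {
        return Outcome::Deny;
    }
    // Commit: both deductions happen, or neither does.
    invocations.consume(1);
    spend.consume(spend_cost);
    Outcome::Allow
}
```

If the spend bucket is short, the invocation bucket keeps all of its tokens, which is the property the peek-then-commit step is meant to preserve.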

Key shape

crates/guards/chio-guards/src/velocity.rs

let grant_index = if self.config.max_invocations_per_window.is_some()
    || self.config.max_spend_per_window.is_some()
{
    ctx.matched_grant_index.ok_or_else(|| {
        KernelError::Internal(
            "velocity guard rate limiting requires matched_grant_index".to_string(),
        )
    })?
} else {
    0
};
let key = (ctx.request.capability.id.clone(), grant_index);

The grant index defaults to 0 only when neither max_invocations_per_window nor max_spend_per_window is set, that is, when the guard enforces nothing anyway. When either limit is configured, a missing matched_grant_index is a fail-closed Err(KernelError::Internal) deny, not a silent fallback to index 0. Two capabilities with the same ID share buckets; two grants on the same capability do not.

Failure modes

Mutex poisoning :: Err(KernelError::Internal("velocity guard state lock poisoned")). Both bucket maps share one state lock, so a single message covers either map. The kernel treats this as a deny.
Missing grant index when either limit is on :: deny via Err.
Bucket underflow (not enough tokens) :: Ok(GuardDecision::deny(Vec::new())), not Err. The evaluate signature is Result<GuardDecision, KernelError>; rate-limited requests are routine, not errors.
Bounded memory :: A brand-new key is admitted only while the combined bucket count stays under bucket_cap. When a new key would exceed the cap, the guard first calls prune_refilled to reclaim buckets that have refilled to capacity (these are idle and equivalent to a new bucket). If only buckets still holding live rate-limit state remain, the new key is denied fail-closed instead of evicting an active bucket, because evicting a live bucket would let a caller reset a depleted limit by presenting a new capability ID. The guard supplies the bound and pruning behavior; it does not use an operator-provided LRU.
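The prune-then-deny admission policy can be sketched with a single map. This is a simplified model under stated assumptions: the `Table` and `Bucket` types and the `admit` method are hypothetical, the real guard counts buckets across two maps, and treating a table that already holds `bucket_cap` buckets as full is one reading of "stays under bucket_cap".

```rust
use std::collections::HashMap;

struct Bucket {
    tokens: u64,
    capacity: u64,
}

/// Hypothetical single-map model of admission under a bucket cap.
struct Table {
    buckets: HashMap<String, Bucket>,
    bucket_cap: usize,
}

impl Table {
    /// Drops buckets that have refilled to capacity; they hold no live state.
    fn prune_refilled(&mut self) {
        self.buckets.retain(|_, b| b.tokens < b.capacity);
    }

    /// Returns false (deny) when a new key cannot be admitted.
    fn admit(&mut self, key: &str, capacity: u64) -> bool {
        if self.buckets.contains_key(key) {
            return true; // existing keys never need a new slot
        }
        if self.buckets.len() >= self.bucket_cap {
            self.prune_refilled();
        }
        if self.buckets.len() >= self.bucket_cap {
            return false; // only live buckets remain: fail closed, evict nothing
        }
        self.buckets
            .insert(key.to_string(), Bucket { tokens: capacity, capacity });
        true
    }
}
```

A partially drained bucket is never evicted in this model, so a caller cannot reset its own limit by flooding the table with fresh keys.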

AgentVelocityGuard

Source: crates/guards/chio-guards/src/agent_velocity.rs. Guard name: agent-velocity. Same bucket math, different keys.

Struct

crates/guards/chio-guards/src/agent_velocity.rs

#[derive(Clone, Debug)]
pub struct AgentVelocityConfig {
    pub max_requests_per_agent: Option<u32>,
    pub max_requests_per_session: Option<u32>,
    pub window_secs: u64,
    pub burst_factor: f64,
}

pub struct AgentVelocityGuard {
    state: Mutex<AgentVelocityState>,
    config: AgentVelocityConfig,
    bucket_cap: usize,
}

struct AgentVelocityState {
    agent_buckets: HashMap<String, TokenBucket>,
    session_buckets: HashMap<(String, String), TokenBucket>,
}

Like VelocityGuard, both maps share one Mutex<AgentVelocityState>, and the combined bucket count is capped by bucket_cap (default velocity_bucket_cap from the process memory budget). Refilled buckets are pruned to reclaim slots; when only live buckets remain, a new key is denied fail-closed.

Defaults

Knob	Type	Default	Purpose
`max_requests_per_agent`	`Option<u32>`	`None`	Per-agent ceiling. Keyed by `agent_id`.
`max_requests_per_session`	`Option<u32>`	`None`	Per-session ceiling. Keyed by `(agent_id, capability_id)`.
`window_secs`	`u64`	`60`	Window length. Floored at 1.
`burst_factor`	`f64`	`1.0`	Capacity multiplier.

Key shape

From evaluate:

crates/guards/chio-guards/src/agent_velocity.rs

let agent_id = ctx.agent_id.clone();
let cap_id = ctx.request.capability.id.clone();
// per-agent: agent_id
// per-session: (agent_id, cap_id)

The capability ID stands in for a session because the guard context does not expose a session ID directly. Two distinct capabilities under the same agent are treated as separate sessions.
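To show how the two key shapes partition traffic, the sketch below counts requests per key instead of holding real buckets. The `Seen` type and its `record` method are hypothetical and exist only for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical counter keyed the same way as the two bucket maps.
#[derive(Default)]
struct Seen {
    per_agent: HashMap<String, u32>,
    per_session: HashMap<(String, String), u32>,
}

impl Seen {
    /// Records one request against both the agent key and the session key.
    fn record(&mut self, agent_id: &str, cap_id: &str) {
        *self.per_agent.entry(agent_id.to_string()).or_insert(0) += 1;
        *self
            .per_session
            .entry((agent_id.to_string(), cap_id.to_string()))
            .or_insert(0) += 1;
    }
}
```

An agent that spreads requests over several capabilities accumulates against one per-agent key but a separate per-session key for each capability, so the per-agent limit is the one that bounds its total volume.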

Comparison

Property	`VelocityGuard`	`AgentVelocityGuard`
Bucket key	`(capability_id, grant_index)`	`agent_id` and `(agent_id, capability_id)`
Grain	Per-grant within a capability	Per-agent and per-session, cross-capability
Spend (monetary) limit	Yes, via grant's `max_cost_per_invocation`	No
Default window	60 s	60 s
Default burst factor	1.0	1.0
Bucket arithmetic	Integer milli-token, monotonic `Instant`	Integer milli-token, monotonic `Instant`
Mutex poisoning	`Err` ⇒ deny	`Err` ⇒ deny
Bucket eviction	Bounded: refilled buckets pruned, new keys denied at `bucket_cap`	Bounded: refilled buckets pruned, new keys denied at `bucket_cap`

Composition

Run both guards in series when you want capability-grain ceilings (for billing) and agent-grain ceilings (for abuse defense). They use independent state, so a request that clears one can still be denied by the other.

rust

use chio_guards::velocity::VelocityConfig;
use chio_guards::{AgentVelocityConfig, AgentVelocityGuard, VelocityGuard};

let mut pipeline = chio_guards::GuardPipeline::new();

// Agent-grain throttle (cross-capability, broad)
pipeline.add(Box::new(AgentVelocityGuard::new(AgentVelocityConfig {
    max_requests_per_agent: Some(600),
    max_requests_per_session: Some(120),
    window_secs: 60,
    burst_factor: 1.0,
})));

// Capability-grain throttle (per-grant, narrow)
pipeline.add(Box::new(VelocityGuard::new(VelocityConfig {
    max_invocations_per_window: Some(60),
    max_spend_per_window: Some(1_000),
    window_secs: 60,
    burst_factor: 1.5,
})));

Where to enforce spend limits

Spend limits live on VelocityGuard because the unit cost is grant-shaped (it lives on grant.max_cost_per_invocation). For a cross-capability dollar ceiling on a single agent, you need a custom guard that aggregates over the receipt store (or a meter outside the kernel). The session-aware data-flow guard does this for byte volume, not money.

Next Steps

Session-Aware Guards :: data-flow and behavioral-sequence checks that share the session journal.
Fail-Closed Behavior :: how the kernel treats Err from these guards.
Budgets :: pairing rate limits with monetary budgets at the receipt layer.