
Rate Limit Guards

Two synchronous token-bucket guards throttle tool invocations. VelocityGuard keys buckets by (capability_id, grant_index), so different grants on the same capability burn independent budgets. AgentVelocityGuard keys by agent identity and by session, providing a cross-capability throttle on a single agent. Both guards use the same integer milli-token bucket arithmetic (each guard keeps its own buckets), so floating-point drift does not accumulate.


The token bucket

Both guards embed a private TokenBucket with the same fields and arithmetic. From crates/chio-guards/src/velocity.rs and agent_velocity.rs:

crates/chio-guards/src/velocity.rs
const MT_PER_TOKEN: u64 = 1_000;

struct TokenBucket {
    capacity_mt: u64,
    tokens_mt: u64,
    refill_rate_mpm: u64, // milli-tokens per millisecond
    last_refill: Instant,
}

One logical token equals 1,000 milli-tokens. The refill rate is derived as:

crates/chio-guards/src/velocity.rs
let window_ms = window_secs.saturating_mul(1_000).max(1);
let refill_rate_mpm = (max_per_window.saturating_mul(MT_PER_TOKEN))
    .checked_div(window_ms)
    .unwrap_or(1)
    .max(1);

The bucket starts full at capacity_mt. Every try_consume call first refills based on the time elapsed since last_refill, then attempts to deduct the requested cost in milli-tokens. The refill rate is floored at one milli-token per millisecond (one logical token per second), so very slow rates still make progress.
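To make the derivation concrete, the sketch below wraps the two statements from the excerpt in a small helper. The function name refill_rate_mpm and its signature are assumptions made for illustration; only the arithmetic is taken from the excerpt.

```rust
/// Milli-tokens per logical token, as in the excerpt.
const MT_PER_TOKEN: u64 = 1_000;

/// Hypothetical helper (not part of the crate's API): the same arithmetic
/// as the excerpt, wrapped in a function so it can be evaluated in isolation.
fn refill_rate_mpm(max_per_window: u64, window_secs: u64) -> u64 {
    // Window length in milliseconds, never zero.
    let window_ms = window_secs.saturating_mul(1_000).max(1);
    // Integer division: any fractional milli-token per millisecond is dropped,
    // and the result is floored at 1.
    max_per_window
        .saturating_mul(MT_PER_TOKEN)
        .checked_div(window_ms)
        .unwrap_or(1)
        .max(1)
}
```

Under this arithmetic, 600 per 60 s divides evenly to 10 milli-tokens per millisecond, 100 per 60 s truncates from 1.67 to 1, and 10 per 60 s would be 0 and is lifted to the floor of 1.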

Time-based, not request-counter based

Refill is keyed off Instant::now(). Because Instant is monotonic, long-lived guards are unaffected by system clock adjustments. There is no per-request decay step; refill happens lazily, at consume time.
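The refill-then-deduct behavior described above can be sketched as follows. The struct fields match the excerpt; the method bodies, the constructor, and the injected now parameter are assumptions of this sketch (the source only names try_consume and says it reads Instant::now()), chosen so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

const MT_PER_TOKEN: u64 = 1_000;

struct TokenBucket {
    capacity_mt: u64,
    tokens_mt: u64,
    refill_rate_mpm: u64, // milli-tokens per millisecond
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full. `now` is passed in (an assumption of this sketch).
    fn new(capacity_tokens: u64, refill_rate_mpm: u64, now: Instant) -> Self {
        let capacity_mt = capacity_tokens.saturating_mul(MT_PER_TOKEN);
        TokenBucket {
            capacity_mt,
            tokens_mt: capacity_mt,
            refill_rate_mpm: refill_rate_mpm.max(1),
            last_refill: now,
        }
    }

    /// Credits milli-tokens for whole milliseconds elapsed, clamped to capacity.
    fn refill(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.last_refill);
        let elapsed_ms = elapsed.as_millis().min(u64::MAX as u128) as u64;
        if elapsed_ms == 0 {
            return; // nothing to credit yet; keep last_refill unchanged
        }
        let added_mt = elapsed_ms.saturating_mul(self.refill_rate_mpm);
        self.tokens_mt = self.tokens_mt.saturating_add(added_mt).min(self.capacity_mt);
        // Advance by whole milliseconds only, so sub-millisecond remainders
        // carry over to the next call.
        self.last_refill = self
            .last_refill
            .checked_add(Duration::from_millis(elapsed_ms))
            .unwrap_or(now);
    }

    /// Refills first, then deducts `cost_tokens` if enough balance remains.
    fn try_consume(&mut self, cost_tokens: u64, now: Instant) -> bool {
        self.refill(now);
        let cost_mt = cost_tokens.saturating_mul(MT_PER_TOKEN);
        if self.tokens_mt >= cost_mt {
            self.tokens_mt -= cost_mt;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 2 tokens and a rate of 1 milli-token per millisecond, two immediate calls succeed, a third is refused, and the next call succeeds once a full second has been credited.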

VelocityGuard

Source: crates/chio-guards/src/velocity.rs. Guard name: velocity. Keys buckets by (capability_id, grant_index) so each grant within a capability has its own bucket.

Struct

crates/chio-guards/src/velocity.rs
#[derive(Clone, Debug)]
pub struct VelocityConfig {
    pub max_invocations_per_window: Option<u32>,
    pub max_spend_per_window: Option<u64>,
    pub window_secs: u64,
    pub burst_factor: f64,
}

pub struct VelocityGuard {
    invocation_buckets: Mutex<HashMap<(String, usize), TokenBucket>>,
    spend_buckets: Mutex<HashMap<(String, usize), TokenBucket>>,
    config: VelocityConfig,
}

Defaults

| Knob | Type | Default | Purpose |
| --- | --- | --- | --- |
| max_invocations_per_window | Option<u32> | None | Hard ceiling on logical tokens added per window. None = unlimited. |
| max_spend_per_window | Option<u64> | None | Maximum monetary units spendable per window. Requires max_cost_per_invocation on the matched grant. |
| window_secs | u64 | 60 | Window length. Floored at 1 second internally. |
| burst_factor | f64 | 1.0 | Capacity multiplier. 1.0 = no burst above steady rate. |

Bucket capacity

Verified from evaluate:

crates/chio-guards/src/velocity.rs
let capacity = ((max_inv as f64 * self.config.burst_factor)
    .round() as u64)
    .max(1);

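With a 100/min limit and burst_factor = 1.5, capacity is 150 logical tokens (150,000 milli-tokens), while the configured refill rate stays at 100 logical tokens per minute. An idle bucket can therefore absorb a burst of up to 150 consecutive calls before throughput settles to the steady refill rate. If the rate is derived exactly as in the refill excerpt earlier, the integer division 100,000 / 60,000 truncates to 1 milli-token per millisecond, so the effective steady rate for this configuration is 60 logical tokens per minute rather than 100.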
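The capacity expression is easy to check in isolation. The helper below is a hypothetical wrapper around the excerpt's expression (the name capacity_tokens is an assumption); it is a sketch for experimenting with burst factors, not the crate's API.

```rust
/// Hypothetical helper mirroring the capacity expression from the excerpt:
/// scale the per-window limit by the burst factor, round to the nearest
/// whole token, and never go below one token.
fn capacity_tokens(max_inv: u32, burst_factor: f64) -> u64 {
    ((max_inv as f64 * burst_factor).round() as u64).max(1)
}
```

A burst_factor below 1.0 shrinks capacity below the per-window limit, and the .max(1) floor keeps at least one logical token available even when the product rounds to zero.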

Spend limiting

When max_spend_per_window is set, the guard reads grant.max_cost_per_invocation.units from the matched grant and consumes that many tokens from the spend bucket. Verified from planned_spend_units:

crates/chio-guards/src/velocity.rs
fn planned_spend_units(ctx: &GuardContext) -> Result<u64, KernelError> {
    let grant_index = ctx.matched_grant_index.ok_or_else(|| {
        KernelError::Internal(
            "velocity guard spend limiting requires matched_grant_index".to_string(),
        )
    })?;
    let grant = ctx.scope.grants.get(grant_index).ok_or_else(|| {
        KernelError::Internal(format!(
            "velocity guard could not resolve grant index {grant_index}"
        ))
    })?;
    grant
        .max_cost_per_invocation
        .as_ref()
        .map(|amount| amount.units)
        .ok_or_else(|| {
            KernelError::Internal(
                "velocity guard spend limiting requires max_cost_per_invocation \
                 on the matched grant".to_string(),
            )
        })
}

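Three things must be true for spend limiting to fire: max_spend_per_window is Some, matched_grant_index is Some and resolves to a grant in ctx.scope.grants, and that grant carries max_cost_per_invocation. When max_spend_per_window is set and either of the other two conditions fails, planned_spend_units returns Err(KernelError::Internal), which the kernel treats as a deny.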
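The decision can be modelled with stand-in types. Everything below is a simplified sketch: Amount, Grant, and spend_cost are hypothetical, the error type is a plain String instead of KernelError, and the Ok(None) branch for a disabled limit is an assumption about how the guard skips spend limiting.

```rust
/// Stand-in types for this sketch; the real context, grant, and error
/// types live in the Chio crates and carry more fields.
struct Amount {
    units: u64,
}

struct Grant {
    max_cost_per_invocation: Option<Amount>,
}

/// Returns Ok(None) when spend limiting is off (assumed), Ok(Some(units))
/// when a cost was resolved, and Err for any request that cannot be priced.
fn spend_cost(
    max_spend_per_window: Option<u64>,
    matched_grant_index: Option<usize>,
    grants: &[Grant],
) -> Result<Option<u64>, String> {
    if max_spend_per_window.is_none() {
        return Ok(None);
    }
    let index = matched_grant_index
        .ok_or_else(|| "spend limiting requires matched_grant_index".to_string())?;
    let grant = grants
        .get(index)
        .ok_or_else(|| format!("could not resolve grant index {index}"))?;
    grant
        .max_cost_per_invocation
        .as_ref()
        .map(|amount| Some(amount.units))
        .ok_or_else(|| "spend limiting requires max_cost_per_invocation".to_string())
}
```

In this model a missing index, an out-of-range index, and a grant without a declared cost all surface as Err, which matches the fail-closed behavior described above.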

Key shape

crates/chio-guards/src/velocity.rs
let grant_index = ctx.matched_grant_index.unwrap_or(0);
let key = (ctx.request.capability.id.clone(), grant_index);

When the kernel does not pass a matched grant index, the guard falls back to index 0, so such requests draw from the same bucket as grant 0. Requests that present the same capability ID and grant index share a bucket; two grants on the same capability do not.
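A small model shows which requests collide. The helper names bucket_key and hits_per_key are hypothetical, and a hit counter stands in for the bucket map; only the key construction follows the excerpt.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds the bucket key the same way as the excerpt,
/// falling back to grant index 0 when none was matched.
fn bucket_key(capability_id: &str, matched_grant_index: Option<usize>) -> (String, usize) {
    (capability_id.to_string(), matched_grant_index.unwrap_or(0))
}

/// Counts how many requests land on each key; a stand-in for the bucket map.
fn hits_per_key(requests: &[(&str, Option<usize>)]) -> HashMap<(String, usize), u32> {
    let mut hits = HashMap::new();
    for (capability_id, grant_index) in requests {
        *hits.entry(bucket_key(capability_id, *grant_index)).or_insert(0) += 1;
    }
    hits
}
```

Feeding it ("cap-a", Some(0)), ("cap-a", Some(1)), ("cap-a", None), and ("cap-b", Some(0)) yields three keys: the request without a grant index lands on ("cap-a", 0) together with grant 0.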

Failure modes

  • Mutex poisoning :: Err(KernelError::Internal) with a message identifying which lock (invocation or spend) was poisoned. The kernel treats this as a deny.
  • Missing grant index when spend limiting is on :: Err(KernelError::Internal), which the kernel treats as a deny.
  • Empty bucket (not enough tokens for the request) :: Ok(Verdict::Deny), not Err. Rate-limited requests are routine, not errors.
  • Memory growth :: a fresh bucket is created on first use of a key and never evicted. A long-lived process that sees many capability IDs grows the maps without bound. If your tenancy churns capabilities, you are responsible for restarting the process or wrapping the guard with your own LRU.
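One way to bound memory is sketched below. It uses idle-timeout eviction, a simpler stand-in for a true LRU, and the type IdleEvictingMap is entirely hypothetical. Because the guard's maps are private, how such a structure is wired around VelocityGuard depends on your deployment and is not shown. Under the bucket semantics described earlier, a bucket that has been idle long enough to refill to capacity behaves like a fresh one, so dropping it loses no state.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical container (not part of chio-guards): remembers when each
/// key was last used so idle entries can be pruned.
struct IdleEvictingMap<K, V> {
    entries: HashMap<K, (V, Instant)>,
    max_idle: Duration,
}

impl<K: Eq + Hash, V> IdleEvictingMap<K, V> {
    fn new(max_idle: Duration) -> Self {
        IdleEvictingMap { entries: HashMap::new(), max_idle }
    }

    /// Fetches (or creates) the value for `key` and stamps it as used at `now`.
    fn get_or_insert_with(&mut self, key: K, now: Instant, make: impl FnOnce() -> V) -> &mut V {
        let slot = self.entries.entry(key).or_insert_with(|| (make(), now));
        slot.1 = now;
        &mut slot.0
    }

    /// Drops every entry idle for longer than `max_idle`; returns how many.
    fn evict_idle(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        let max_idle = self.max_idle;
        self.entries
            .retain(|_, (_, last_used)| now.saturating_duration_since(*last_used) <= max_idle);
        before - self.entries.len()
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Calling evict_idle periodically (for example once per window) keeps the map proportional to the number of recently active keys rather than to every key ever seen.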

AgentVelocityGuard

Source: crates/chio-guards/src/agent_velocity.rs. Guard name: agent-velocity. Same bucket math, different keys.

Struct

crates/chio-guards/src/agent_velocity.rs
#[derive(Clone, Debug)]
pub struct AgentVelocityConfig {
    pub max_requests_per_agent: Option<u32>,
    pub max_requests_per_session: Option<u32>,
    pub window_secs: u64,
    pub burst_factor: f64,
}

pub struct AgentVelocityGuard {
    agent_buckets: Mutex<HashMap<String, TokenBucket>>,
    session_buckets: Mutex<HashMap<(String, String), TokenBucket>>,
    config: AgentVelocityConfig,
}

Defaults

| Knob | Type | Default | Purpose |
| --- | --- | --- | --- |
| max_requests_per_agent | Option<u32> | None | Per-agent ceiling. Keyed by agent_id. |
| max_requests_per_session | Option<u32> | None | Per-session ceiling. Keyed by (agent_id, capability_id). |
| window_secs | u64 | 60 | Window length. Floored at 1. |
| burst_factor | f64 | 1.0 | Capacity multiplier. |

Key shape

From evaluate:

crates/chio-guards/src/agent_velocity.rs
let agent_id = ctx.agent_id.clone();
let cap_id = ctx.request.capability.id.clone();
// per-agent: agent_id
// per-session: (agent_id, cap_id)

The capability ID stands in for a session because the guard context does not surface a session ID directly. Two distinct capabilities under the same agent are treated as separate sessions.
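The effect of the two key shapes can be seen with plain counters in place of token buckets. KeyedCounters and record are hypothetical names used only for this sketch.

```rust
use std::collections::HashMap;

/// Stand-in for the two bucket maps; counters replace token buckets so the
/// keying is easy to see. Type and method names are hypothetical.
#[derive(Default)]
struct KeyedCounters {
    per_agent: HashMap<String, u32>,
    per_session: HashMap<(String, String), u32>,
}

impl KeyedCounters {
    /// Records one request against both the agent key and the session key.
    fn record(&mut self, agent_id: &str, capability_id: &str) {
        *self.per_agent.entry(agent_id.to_string()).or_insert(0) += 1;
        *self
            .per_session
            .entry((agent_id.to_string(), capability_id.to_string()))
            .or_insert(0) += 1;
    }
}
```

Three requests from agent-1, two on cap-a and one on cap-b, all count against the single agent-1 entry but split two and one across the session entries.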


Comparison

| Property | VelocityGuard | AgentVelocityGuard |
| --- | --- | --- |
| Bucket key | (capability_id, grant_index) | agent_id and (agent_id, capability_id) |
| Grain | Per-grant within a capability | Per-agent and per-session, cross-capability |
| Spend (monetary) limit | Yes, via grant's max_cost_per_invocation | No |
| Default window | 60 s | 60 s |
| Default burst factor | 1.0 | 1.0 |
| Bucket arithmetic | Integer milli-token, monotonic Instant | Integer milli-token, monotonic Instant |
| Mutex poisoning | Err ⇒ deny | Err ⇒ deny |
| Bucket eviction | None | None |

Composition

Run both guards in series when you want capability-grain ceilings (for billing) and agent-grain ceilings (for abuse defense). They use independent state, so a request that clears one can still be denied by the other.

rust
use chio_guards::{
    AgentVelocityConfig, AgentVelocityGuard, VelocityConfig, VelocityGuard,
};

let mut pipeline = chio_guards::GuardPipeline::new();

// Agent-grain throttle (cross-capability, broad)
pipeline.add(Box::new(AgentVelocityGuard::new(AgentVelocityConfig {
    max_requests_per_agent: Some(600),
    max_requests_per_session: Some(120),
    window_secs: 60,
    burst_factor: 1.0,
})));

// Capability-grain throttle (per-grant, narrow)
pipeline.add(Box::new(VelocityGuard::new(VelocityConfig {
    max_invocations_per_window: Some(60),
    max_spend_per_window: Some(1_000),
    window_secs: 60,
    burst_factor: 1.5,
})));
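To illustrate what "in series" implies, here is a simplified, hypothetical model: the Verdict enum, Guard trait, Quota guard, and run_in_series function below are stand-ins written for this sketch and do not reproduce the real GuardPipeline or guard trait in chio-guards.

```rust
/// Hypothetical, simplified verdict type for this model.
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny,
}

/// Hypothetical guard interface; the real trait takes a request context.
trait Guard {
    fn name(&self) -> &str;
    fn evaluate(&mut self) -> Result<Verdict, String>;
}

/// A guard that allows a fixed number of calls, then denies.
struct Quota {
    name: &'static str,
    remaining: u32,
}

impl Guard for Quota {
    fn name(&self) -> &str {
        self.name
    }

    fn evaluate(&mut self) -> Result<Verdict, String> {
        if self.remaining == 0 {
            return Ok(Verdict::Deny);
        }
        self.remaining -= 1;
        Ok(Verdict::Allow)
    }
}

/// Runs guards in order and stops at the first deny; an Err is treated as
/// a deny (fail closed). Returns the name of the denying guard, if any.
fn run_in_series(guards: &mut [Box<dyn Guard>]) -> Option<String> {
    for guard in guards.iter_mut() {
        match guard.evaluate() {
            Ok(Verdict::Allow) => continue,
            Ok(Verdict::Deny) | Err(_) => return Some(guard.name().to_string()),
        }
    }
    None
}
```

In this model an earlier guard has already spent its token by the time a later guard denies. Whether the real pipeline behaves the same way is not stated here, so treat ordering as something to verify against the kernel.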

Where to enforce spend limits

Spend limits live on VelocityGuard because the unit cost is grant-shaped (it lives on grant.max_cost_per_invocation). For a cross-capability monetary ceiling on a single agent, you need a custom guard that aggregates over the receipt store (or a meter outside the kernel). The session-aware data-flow guard does this for byte volume, not money.
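The aggregation such a custom guard would perform might look like the sketch below. It is a sliding-window sum rather than a token bucket, and every name in it (Receipt, spent_in_window, within_agent_ceiling) is hypothetical; the real receipt store's shape and query interface are not described here.

```rust
use std::time::{Duration, Instant};

/// Hypothetical receipt shape for this sketch.
struct Receipt {
    agent_id: String,
    units: u64,
    at: Instant,
}

/// Sums what `agent_id` spent across all capabilities inside the trailing
/// window that ends at `now`.
fn spent_in_window(receipts: &[Receipt], agent_id: &str, now: Instant, window: Duration) -> u64 {
    receipts
        .iter()
        .filter(|r| r.agent_id == agent_id)
        .filter(|r| now.saturating_duration_since(r.at) <= window)
        .fold(0u64, |sum, r| sum.saturating_add(r.units))
}

/// Allows the call only if the planned cost keeps the agent at or under
/// its ceiling for the window.
fn within_agent_ceiling(
    receipts: &[Receipt],
    agent_id: &str,
    planned_units: u64,
    ceiling_units: u64,
    now: Instant,
    window: Duration,
) -> bool {
    spent_in_window(receipts, agent_id, now, window).saturating_add(planned_units) <= ceiling_units
}
```

A linear scan is fine for a sketch; a production meter would more likely query an indexed store or keep a running total per agent.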

Next Steps

  • Session-Aware Guards :: data-flow and behavioral-sequence checks that share the session journal.
  • Fail-Closed Behavior :: how the kernel treats Err from these guards.
  • Budgets :: pairing rate limits with monetary budgets at the receipt layer.