Rate Limit Guards
Two synchronous token-bucket guards throttle tool invocations. VelocityGuard keys buckets by (capability_id, grant_index) so different grants on the same capability burn independent budgets. AgentVelocityGuard keys by agent identity and session, providing a cross-capability throttle on a single agent. Both implementations share an integer-milli-token bucket so floating-point drift does not accumulate.
The token bucket
Both guards embed a private TokenBucket with the same fields and arithmetic. From crates/chio-guards/src/velocity.rs and agent_velocity.rs:
const MT_PER_TOKEN: u64 = 1_000;
struct TokenBucket {
capacity_mt: u64,
tokens_mt: u64,
refill_rate_mpm: u64, // milli-tokens per millisecond
last_refill: Instant,
}
One logical token equals 1,000 milli-tokens. The refill rate is derived as:
let window_ms = window_secs.saturating_mul(1_000).max(1);
let refill_rate_mpm = (max_per_window.saturating_mul(MT_PER_TOKEN))
.checked_div(window_ms)
.unwrap_or(1)
.max(1);
The bucket starts full at capacity_mt. Every try_consume call first refills based on elapsed wall time, then attempts to deduct the requested cost in milli-tokens. The minimum refill rate is one milli-token per millisecond (one logical token per second), so very slow rates still make progress.
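The refill-then-deduct sequence described above can be sketched as follows. This is a minimal illustration, not the crate's code: the constructor signature and the try_consume_at method (which takes the current Instant as a parameter so the behavior is easy to exercise) are assumptions; only the fields and the rate formula come from the excerpts above.

```rust
use std::time::Instant;

const MT_PER_TOKEN: u64 = 1_000;

struct TokenBucket {
    capacity_mt: u64,
    tokens_mt: u64,
    refill_rate_mpm: u64, // milli-tokens per millisecond
    last_refill: Instant,
}

impl TokenBucket {
    // Hypothetical constructor: the real signature is not shown in this document.
    fn new(max_per_window: u64, capacity_tokens: u64, window_secs: u64, now: Instant) -> Self {
        let window_ms = window_secs.saturating_mul(1_000).max(1);
        // Same derivation as quoted above; integer division truncates.
        let refill_rate_mpm = max_per_window
            .saturating_mul(MT_PER_TOKEN)
            .checked_div(window_ms)
            .unwrap_or(1)
            .max(1);
        let capacity_mt = capacity_tokens.saturating_mul(MT_PER_TOKEN);
        // The bucket starts full.
        Self { capacity_mt, tokens_mt: capacity_mt, refill_rate_mpm, last_refill: now }
    }

    // Hypothetical variant of try_consume with an injected clock reading.
    fn try_consume_at(&mut self, cost_tokens: u64, now: Instant) -> bool {
        // Step 1: refill from elapsed time, capped at capacity.
        let elapsed_ms = now.saturating_duration_since(self.last_refill).as_millis() as u64;
        if elapsed_ms > 0 {
            let added_mt = elapsed_ms.saturating_mul(self.refill_rate_mpm);
            self.tokens_mt = self.tokens_mt.saturating_add(added_mt).min(self.capacity_mt);
            self.last_refill = now;
        }
        // Step 2: deduct the cost in milli-tokens, or refuse without changing state.
        let cost_mt = cost_tokens.saturating_mul(MT_PER_TOKEN);
        if self.tokens_mt >= cost_mt {
            self.tokens_mt -= cost_mt;
            true
        } else {
            false
        }
    }
}
```

Under the formula as quoted, the integer division truncates: 120 per 60 s yields 2 milli-tokens per millisecond, while 100 per 60 s yields 1.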
Wall-clock based, not request-counter based
Refill is computed from the time elapsed since last_refill, measured with Instant::now(). Long-lived guards keep working under clock skew because Instant is monotonic. There is no per-request decay step; refill is opportunistic at consume time.
VelocityGuard
Source: crates/chio-guards/src/velocity.rs. Guard name: velocity. Keys buckets by (capability_id, grant_index) so each grant within a capability has its own bucket.
Struct
#[derive(Clone, Debug)]
pub struct VelocityConfig {
pub max_invocations_per_window: Option<u32>,
pub max_spend_per_window: Option<u64>,
pub window_secs: u64,
pub burst_factor: f64,
}
pub struct VelocityGuard {
invocation_buckets: Mutex<HashMap<(String, usize), TokenBucket>>,
spend_buckets: Mutex<HashMap<(String, usize), TokenBucket>>,
config: VelocityConfig,
}
Defaults
| Knob | Type | Default | Purpose |
|---|---|---|---|
| max_invocations_per_window | Option<u32> | None | Hard ceiling on logical tokens added per window. None = unlimited. |
| max_spend_per_window | Option<u64> | None | Maximum monetary units spendable per window. Requires max_cost_per_invocation on the matched grant. |
| window_secs | u64 | 60 | Window length. Floored at 1 second internally. |
| burst_factor | f64 | 1.0 | Capacity multiplier. 1.0 = no burst above steady rate. |
Bucket capacity
Verified from evaluate:
let capacity = ((max_inv as f64 * self.config.burst_factor)
.round() as u64)
.max(1);
With a 100/min limit and burst_factor = 1.5, capacity is 150 logical tokens (150,000 milli-tokens). The burst factor scales capacity only; the refill rate is still derived from the 100-per-minute limit. An idle bucket can therefore absorb up to 150 consecutive calls, then settles to the steady refill rate.
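To make the capacity arithmetic concrete, here is a small sketch that wraps the quoted expression in standalone functions. The function names are hypothetical and chosen for illustration; only the expression itself comes from the excerpt above.

```rust
const MT_PER_TOKEN: u64 = 1_000;

/// Bucket capacity in logical tokens, mirroring the expression quoted from `evaluate`.
/// (Hypothetical helper name.)
fn burst_capacity_tokens(max_inv: u32, burst_factor: f64) -> u64 {
    ((max_inv as f64 * burst_factor).round() as u64).max(1)
}

/// The same capacity expressed in milli-tokens. (Hypothetical helper name.)
fn burst_capacity_mt(max_inv: u32, burst_factor: f64) -> u64 {
    burst_capacity_tokens(max_inv, burst_factor).saturating_mul(MT_PER_TOKEN)
}
```

Note that a burst factor below 1.0 shrinks capacity below the per-window limit, and the trailing .max(1) keeps at least one token: 3 × 0.1 rounds to 0 and is clamped to 1.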
Spend limiting
When max_spend_per_window is set, the guard reads grant.max_cost_per_invocation.units from the matched grant and consumes that many tokens from the spend bucket. Verified from planned_spend_units:
fn planned_spend_units(ctx: &GuardContext) -> Result<u64, KernelError> {
let grant_index = ctx.matched_grant_index.ok_or_else(|| {
KernelError::Internal(
"velocity guard spend limiting requires matched_grant_index".to_string(),
)
})?;
let grant = ctx.scope.grants.get(grant_index).ok_or_else(|| {
KernelError::Internal(format!(
"velocity guard could not resolve grant index {grant_index}"
))
})?;
grant
.max_cost_per_invocation
.as_ref()
.map(|amount| amount.units)
.ok_or_else(|| {
KernelError::Internal(
"velocity guard spend limiting requires max_cost_per_invocation \
on the matched grant".to_string(),
)
})
}
Three things must be true for spend limiting to fire: max_spend_per_window is Some, matched_grant_index is Some, and the matched grant carries max_cost_per_invocation. If max_spend_per_window is set but either of the other two is missing, the guard returns Err, which the kernel treats as a deny.
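The three preconditions can be exercised in isolation with a sketch like the one below. The Amount, Grant, and Ctx types are hypothetical stand-ins that model only the fields used here, the error type is simplified to String, and returning Ok(None) when spend limiting is switched off is an assumption about how the caller skips the check.

```rust
// Hypothetical stand-ins for the kernel types; only the fields used here are modeled.
struct Amount {
    units: u64,
}
struct Grant {
    max_cost_per_invocation: Option<Amount>,
}
struct Ctx {
    matched_grant_index: Option<usize>,
    grants: Vec<Grant>,
}

/// Ok(None): spend limiting is off (assumed behavior).
/// Ok(Some(units)): all preconditions hold; `units` would be consumed from the spend bucket.
/// Err: a precondition is missing, which a fail-closed caller treats as a deny.
fn spend_to_consume(max_spend_per_window: Option<u64>, ctx: &Ctx) -> Result<Option<u64>, String> {
    if max_spend_per_window.is_none() {
        return Ok(None);
    }
    let idx = ctx
        .matched_grant_index
        .ok_or_else(|| "missing matched_grant_index".to_string())?;
    let grant = ctx
        .grants
        .get(idx)
        .ok_or_else(|| format!("cannot resolve grant index {idx}"))?;
    let units = grant
        .max_cost_per_invocation
        .as_ref()
        .map(|amount| amount.units)
        .ok_or_else(|| "grant has no max_cost_per_invocation".to_string())?;
    Ok(Some(units))
}
```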
Key shape
let grant_index = ctx.matched_grant_index.unwrap_or(0);
let key = (ctx.request.capability.id.clone(), grant_index);
When the kernel does not pass a matched grant index, the guard uses index 0. Two capabilities with the same ID share buckets; two grants on the same capability do not.
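The consequences of this key shape can be seen in a small sketch. Budgets is a hypothetical type that uses a plain countdown instead of a token bucket; only the key construction, including the fallback to index 0, follows the excerpt above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-key state: a countdown instead of a token bucket.
#[derive(Default)]
struct Budgets {
    remaining: HashMap<(String, usize), u32>,
}

impl Budgets {
    /// Take one unit for (capability_id, grant_index), creating the entry on first use.
    fn try_take(&mut self, capability_id: &str, matched_grant_index: Option<usize>, limit: u32) -> bool {
        // A missing grant index falls back to 0, as in the excerpt.
        let key = (capability_id.to_string(), matched_grant_index.unwrap_or(0));
        let left = self.remaining.entry(key).or_insert(limit);
        if *left == 0 {
            return false;
        }
        *left -= 1;
        true
    }
}
```

With a limit of 1, grant 0 and grant 1 on the same capability each get one call, while a request with no grant index draws from grant 0's entry.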
Failure modes
- Mutex poisoning: Err(KernelError::Internal) with a message identifying which lock (invocation or spend) was poisoned. The kernel treats this as a deny.
- Missing grant index when spend limiting is on: deny via Err.
- Bucket underflow returns Ok(Verdict::Deny), not Err. Rate-limited requests are routine, not errors.
- Memory: a fresh bucket is created on first use of a key and never evicted. A long-lived process with many capability IDs grows the map unboundedly. If your tenancy churns capabilities, you are responsible for restarting the process or wrapping the guard with your own LRU.
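The distinction between Err and Ok(Verdict::Deny) in the list above can be sketched as follows. Verdict, GuardError, and SketchGuard are hypothetical stand-ins defined locally, and a countdown replaces the token bucket; the point is only the shape of the two return paths.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the kernel's verdict and error types.
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny,
}
#[derive(Debug, PartialEq)]
enum GuardError {
    Internal(String),
}

struct SketchGuard {
    // Remaining units per key; a countdown standing in for a token bucket.
    invocation_buckets: Mutex<HashMap<String, u32>>,
    limit: u32,
}

impl SketchGuard {
    fn evaluate(&self, key: &str) -> Result<Verdict, GuardError> {
        // A poisoned lock is an internal error, surfaced as Err.
        let mut buckets = self
            .invocation_buckets
            .lock()
            .map_err(|_| GuardError::Internal("invocation bucket lock poisoned".to_string()))?;
        // Entries are created on first use and never removed, so the map only grows.
        let left = buckets.entry(key.to_string()).or_insert(self.limit);
        if *left == 0 {
            // Exhausted budget is a routine outcome: a verdict, not an error.
            return Ok(Verdict::Deny);
        }
        *left -= 1;
        Ok(Verdict::Allow)
    }
}
```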
AgentVelocityGuard
Source: crates/chio-guards/src/agent_velocity.rs. Guard name: agent-velocity. Same bucket math, different keys.
Struct
#[derive(Clone, Debug)]
pub struct AgentVelocityConfig {
pub max_requests_per_agent: Option<u32>,
pub max_requests_per_session: Option<u32>,
pub window_secs: u64,
pub burst_factor: f64,
}
pub struct AgentVelocityGuard {
agent_buckets: Mutex<HashMap<String, TokenBucket>>,
session_buckets: Mutex<HashMap<(String, String), TokenBucket>>,
config: AgentVelocityConfig,
}
Defaults
| Knob | Type | Default | Purpose |
|---|---|---|---|
| max_requests_per_agent | Option<u32> | None | Per-agent ceiling. Keyed by agent_id. |
| max_requests_per_session | Option<u32> | None | Per-session ceiling. Keyed by (agent_id, capability_id). |
| window_secs | u64 | 60 | Window length. Floored at 1. |
| burst_factor | f64 | 1.0 | Capacity multiplier. |
Key shape
From evaluate:
let agent_id = ctx.agent_id.clone();
let cap_id = ctx.request.capability.id.clone();
// per-agent: agent_id
// per-session: (agent_id, cap_id)
The capability ID stands in for a session because the guard context does not surface a session ID directly. Two distinct capabilities under the same agent are treated as separate sessions.
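A sketch of how the two key shapes interact is shown below. AgentCounters is a hypothetical type using countdowns rather than token buckets, and checking both limits before consuming from either is an assumption; the document does not show the order in which the real guard consumes.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: countdowns instead of token buckets.
#[derive(Default)]
struct AgentCounters {
    per_agent: HashMap<String, u32>,
    per_session: HashMap<(String, String), u32>,
}

impl AgentCounters {
    /// Allow a request only if both the agent-wide and the per-session budget have room.
    fn try_take(&mut self, agent_id: &str, cap_id: &str, agent_limit: u32, session_limit: u32) -> bool {
        let session_key = (agent_id.to_string(), cap_id.to_string());
        let agent_left = *self.per_agent.entry(agent_id.to_string()).or_insert(agent_limit);
        let session_left = *self.per_session.entry(session_key.clone()).or_insert(session_limit);
        // Assumed ordering: check both budgets first, then consume from both.
        if agent_left == 0 || session_left == 0 {
            return false;
        }
        self.per_agent.insert(agent_id.to_string(), agent_left - 1);
        self.per_session.insert(session_key, session_left - 1);
        true
    }
}
```

With an agent limit of 3 and a session limit of 2, two calls on one capability exhaust that session, one more call on a second capability exhausts the agent budget, and further calls on either capability are refused.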
Comparison
| Property | VelocityGuard | AgentVelocityGuard |
|---|---|---|
| Bucket key | (capability_id, grant_index) | agent_id and (agent_id, capability_id) |
| Grain | Per-grant within a capability | Per-agent and per-session, cross-capability |
| Spend (monetary) limit | Yes, via grant's max_cost_per_invocation | No |
| Default window | 60 s | 60 s |
| Default burst factor | 1.0 | 1.0 |
| Bucket arithmetic | Integer milli-token, monotonic Instant | Integer milli-token, monotonic Instant |
| Mutex poisoning | Err ⇒ deny | Err ⇒ deny |
| Bucket eviction | None | None |
Composition
Run both guards in series when you want capability-grain ceilings (for billing) and agent-grain ceilings (for abuse defense). They use independent state, so a request that clears one can still be denied by the other.
use chio_guards::{
AgentVelocityConfig, AgentVelocityGuard, VelocityConfig, VelocityGuard,
};
let mut pipeline = chio_guards::GuardPipeline::new();
// Agent-grain throttle (cross-capability, broad)
pipeline.add(Box::new(AgentVelocityGuard::new(AgentVelocityConfig {
max_requests_per_agent: Some(600),
max_requests_per_session: Some(120),
window_secs: 60,
burst_factor: 1.0,
})));
// Capability-grain throttle (per-grant, narrow)
pipeline.add(Box::new(VelocityGuard::new(VelocityConfig {
max_invocations_per_window: Some(60),
max_spend_per_window: Some(1_000),
window_secs: 60,
burst_factor: 1.5,
})));
Where to enforce spend limits
Enforce spend limits in VelocityGuard, because the unit cost is grant-shaped (it lives on grant.max_cost_per_invocation). For a cross-capability dollar ceiling on a single agent, you need a custom guard that aggregates over the receipt store (or a meter outside the kernel). The session-aware data-flow guard does this for byte volume, not money.
Next Steps
- Session-Aware Guards: data-flow and behavioral-sequence checks that share the session journal.
- Fail-Closed Behavior: how the kernel treats Err from these guards.
- Budgets: pairing rate limits with monetary budgets at the receipt layer.