Reputation Scoring
The chio-reputation crate scores agents from a caller-provided local corpus of persisted receipts, capability-lineage snapshots, and budget-usage records. It is intentionally pure and storage-agnostic: no kernel dependency, no external service, no central registry. Two operators with the same corpus and config produce identical scores.
Why local and deterministic
Inputs
A scorecard is computed against a LocalReputationCorpus assembled by the caller from local stores:
pub struct LocalReputationCorpus {
    pub receipts: Vec<ChioReceipt>,
    pub capabilities: Vec<CapabilityLineageRecord>,
    pub budget_usage: Vec<BudgetUsageRecord>,
    pub incident_reports: Option<Vec<IncidentRecord>>,
}
- Receipts: every signed tool-invocation outcome the agent has produced in the window. Used for boundary pressure, reliability, specialization, and history.
- Capability lineage: snapshots of issued capabilities, including parent links, scope, delegation depth, and validity bounds. Drives least-privilege and delegation-hygiene metrics.
- Budget usage: per-grant invocation counters and totals charged. Feeds resource stewardship.
- Incident reports: optional list of timestamps (with optional receipt IDs) that the environment classifies as incidents. None means the incident metric is unavailable, not zero.
API Surface
The scoring entry point is compute_local_scorecard at chio-reputation/src/score.rs:3:
#[must_use]
pub fn compute_local_scorecard(
    subject_key: &str,
    now: u64,
    corpus: &LocalReputationCorpus,
    config: &ReputationConfig,
) -> LocalReputationScorecard {
The body filters the corpus into three subject-scoped slices, runs eight metric functions, then folds the results through contribute_metric. The composite is weighted_sum / effective_weight_sum when at least one metric returned a known value, otherwise MetricValue::Unknown (score.rs:105-110).
let composite_score = if effective_weight_sum > 0.0 {
    MetricValue::known(weighted_sum / effective_weight_sum)
} else {
    MetricValue::Unknown
};
There is no issuance-recommendation type. The crate stops at the scorecard and lets callers (the credit underwriter, the passport issuer, federation policy) translate a composite into a downstream action.
Default Constants
The four module-level defaults are file-private constants in lib.rs (the include macro pulls them into model.rs through Default impls):
const SECONDS_PER_DAY: u64 = 86_400;
const DEFAULT_HISTORY_RECEIPT_TARGET: u64 = 1_000;
const DEFAULT_HISTORY_DAY_TARGET: u64 = 30;
const DEFAULT_INCIDENT_PENALTY: f64 = 0.20;
The temporal_decay_half_life_days default is hard-coded inline at model.rs:143 as 30, and target_utilization is 0.75 at model.rs:141.
The Eight Metric Components
Every metric returns a MetricValue that is either Known(0.0..=1.0) or Unknown. Unknown means the metric had no observable signal; it is excluded from the composite rather than counted as zero.
The eight components and their default weights are wired in the Default impl on ReputationWeights at model.rs:111-124:
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
pub struct ReputationWeights {
    pub boundary_pressure: f64,
    pub resource_stewardship: f64,
    pub least_privilege: f64,
    pub history_depth: f64,
    pub tool_diversity: f64,
    pub delegation_hygiene: f64,
    pub reliability: f64,
    pub incident_correlation: f64,
}
impl Default for ReputationWeights {
    fn default() -> Self {
        Self {
            boundary_pressure: 0.20,
            resource_stewardship: 0.10,
            least_privilege: 0.15,
            history_depth: 0.10,
            tool_diversity: 0.05,
            delegation_hygiene: 0.15,
            reliability: 0.15,
            incident_correlation: 0.10,
        }
    }
}
| Metric | Default weight | What it measures | Source |
|---|---|---|---|
| boundary_pressure | 0.20 | Per-policy deny ratio, time-decayed by 2^(-age/half_life); the contribution is 1.0 - deny_ratio. | score.rs:127-163 |
| resource_stewardship | 0.10 | 1.0 - abs(average_utilization - target_utilization) across capped grants. Contributing field is fit_score. | score.rs:165-208 |
| least_privilege | 0.15 | used_tools / granted_tools times a constraint factor (0.5 + 0.5 * constrained_ratio) and an operation factor (0.5 + 0.5 * non_delegate_ratio). | score.rs:210-267 |
| history_depth | 0.10 | Average of receipt-count progress (against 1000), day-span progress (against 30), and active-day ratio. | compare.rs |
| tool_diversity | 0.05 | Normalized Shannon entropy over time-decayed tool usage; capped by diversity_cap. The struct field is specialization.score; the weight key is tool_diversity. | score.rs:77-85 |
| delegation_hygiene | 0.15 | Average of scope-reduction, TTL-reduction, and budget-reduction rates across delegations issued by the subject. | score.rs:29-33 (delegation slice), per-rate logic in compare.rs |
| reliability | 0.15 | Allow-weight share among Allow + Cancelled + Incomplete decisions; denies are excluded so a denied request never penalizes reliability twice. | score.rs:44 (call site) |
| incident_correlation | 0.10 | 1.0 - incident_penalty * weighted_incidents, clamped to [0, 1]. Returns Unknown when incident_reports = None. | score.rs:45 (call site) |
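To make the tool_diversity row concrete, here is a minimal sketch of normalized Shannon entropy over weighted tool usage. It is not the crate's code: the function name normalized_entropy, the normalization by ln(n) for n distinct tools, the single-tool special case, and applying diversity_cap as a plain minimum are all assumptions made for illustration.

```rust
/// Hypothetical helper: normalized Shannon entropy over per-tool weights.
/// Returns None when there is no usage at all (no observable signal).
fn normalized_entropy(tool_weights: &[f64], diversity_cap: f64) -> Option<f64> {
    // Keep only tools that were actually used (positive, non-NaN weight).
    let positive: Vec<f64> = tool_weights.iter().copied().filter(|w| *w > 0.0).collect();
    let total: f64 = positive.iter().sum();
    if positive.is_empty() || total <= 0.0 {
        return None;
    }
    if positive.len() == 1 {
        // A single tool has zero entropy; dividing by ln(1) = 0 must be avoided.
        return Some(0.0);
    }
    let entropy: f64 = positive
        .iter()
        .map(|w| {
            let p = w / total;
            -p * p.ln()
        })
        .sum();
    // Divide by the maximum possible entropy, ln(n), to land in [0, 1].
    let normalized = entropy / (positive.len() as f64).ln();
    // Assumption: the cap is applied as a simple upper bound.
    Some(normalized.min(diversity_cap).clamp(0.0, 1.0))
}
```

Under these assumptions, perfectly even usage across tools scores 1.0 before the cap, and usage concentrated on one tool scores 0.0.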
The fold at score.rs:50-103 is a sequence of eight contribute_metric calls, one per metric:
contribute_metric(
    boundary_pressure
        .deny_ratio
        .as_option()
        .map(|value| 1.0 - value),
    config.weights.boundary_pressure,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
contribute_metric(
    resource_stewardship.fit_score.as_option(),
    config.weights.resource_stewardship,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
// ...six more calls, one per metric.
MetricValue is defined at model.rs:78-97. Its constructor MetricValue::known clamps the input through clamp01, so a faulty component cannot push the composite outside [0, 1].
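The call sites above show the argument list of contribute_metric but not its body. The following sketch shows one plausible body and a small wrapper that reproduces the composite rule. The parameter order mirrors the call sites; the body, the Option<f64> stand-in for MetricValue, and the composite wrapper are assumptions, not the crate's implementation.

```rust
/// Sketch of the fold step. The signature mirrors the call sites above;
/// the body is an assumption.
fn contribute_metric(
    value: Option<f64>,
    weight: f64,
    weighted_sum: &mut f64,
    effective_weight_sum: &mut f64,
) {
    // An unknown metric (None) adds neither score nor weight,
    // so it is excluded instead of being counted as zero.
    if let Some(v) = value {
        *weighted_sum += v.clamp(0.0, 1.0) * weight;
        *effective_weight_sum += weight;
    }
}

/// Hypothetical wrapper: folds (value, weight) pairs into a composite.
/// Returns None when no metric contributed any weight.
fn composite(metrics: &[(Option<f64>, f64)]) -> Option<f64> {
    let (mut weighted_sum, mut effective_weight_sum) = (0.0, 0.0);
    for (value, weight) in metrics {
        contribute_metric(*value, *weight, &mut weighted_sum, &mut effective_weight_sum);
    }
    if effective_weight_sum > 0.0 {
        Some(weighted_sum / effective_weight_sum)
    } else {
        None
    }
}
```

Because the divisor is the sum of contributing weights only, dropping an unknown metric renormalizes the remaining weights instead of dragging the composite toward zero.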
Configuration Knobs
ReputationConfig exposes the tunable parameters that govern history sizing, time decay, target utilization, and incident penalty:
pub struct ReputationConfig {
    pub weights: ReputationWeights,
    pub target_utilization: f64, // default 0.75
    pub diversity_cap: f64, // default 1.0
    pub temporal_decay_half_life_days: u32, // default 30
    pub history_receipt_target: u64, // default 1_000
    pub history_day_target: u64, // default 30
    pub incident_penalty: f64, // default 0.20
}
- History window: history depth normalizes against history_receipt_target (default 1000 receipts) and history_day_target (default 30 days). Both must be reached for the metric to saturate.
- Temporal decay: per-receipt and per-capability weights follow 2^(-age / half_life) with a 30-day default half-life. A receipt 30 days old contributes half as much as a receipt issued today.
- Target utilization: the fit score is 1.0 - |observed - target|. With the default target of 0.75, an agent that uses 75% of its capped grants gets a perfect stewardship score; either underuse or overuse drags it down.
- Incident penalty: each time-weighted incident subtracts incident_penalty (default 0.20) from the incident-correlation score. Five recent incidents drive that metric to zero.
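The three formulas in the list above are small enough to sketch directly. The function names below are hypothetical, and measuring age in fractional days is an assumption; the crate may bucket ages differently. A zero half-life is treated as "no decay", matching the documented behavior.

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// Weight of an observation that is `age_secs` old: 2^(-age / half_life).
/// A zero half-life disables decay and yields weight 1.0.
fn decay_weight(age_secs: u64, half_life_days: u32) -> f64 {
    if half_life_days == 0 {
        return 1.0;
    }
    // Assumption: age is converted to fractional days.
    let age_days = age_secs as f64 / SECONDS_PER_DAY as f64;
    (-age_days / f64::from(half_life_days)).exp2()
}

/// Stewardship fit: 1.0 - |observed - target|, clamped to [0, 1].
fn fit_score(observed_utilization: f64, target_utilization: f64) -> f64 {
    (1.0 - (observed_utilization - target_utilization).abs()).clamp(0.0, 1.0)
}

/// Incident score: 1.0 - penalty * weighted_incidents, clamped to [0, 1].
fn incident_score(weighted_incidents: f64, incident_penalty: f64) -> f64 {
    (1.0 - incident_penalty * weighted_incidents).clamp(0.0, 1.0)
}
```

With the defaults, decay_weight(30 * SECONDS_PER_DAY, 30) is one half, and five incidents at full weight bring incident_score to zero.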
Half-life zero disables decay
temporal_decay_half_life_days = 0 assigns weight 1.0 to every observation regardless of age. Useful when you want a flat window and are clipping the corpus yourself.
Output Shape
Defined at model.rs:212-226:
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct LocalReputationScorecard {
    pub subject_key: String,
    pub computed_at: u64,
    pub boundary_pressure: BoundaryPressureMetrics,
    pub resource_stewardship: ResourceStewardshipMetrics,
    pub least_privilege: LeastPrivilegeMetrics,
    pub history_depth: HistoryDepthMetrics,
    pub specialization: SpecializationMetrics,
    pub delegation_hygiene: DelegationHygieneMetrics,
    pub reliability: ReliabilityMetrics,
    pub incident_correlation: IncidentCorrelationMetrics,
    pub composite_score: MetricValue,
    pub effective_weight_sum: f64,
}
Each per-metric struct carries observation counts alongside the score, so a downstream policy can reject an attestation when, for example, the reliability score is high but receipts_observed is small.
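A downstream check of that kind might look like the sketch below. ReliabilityView is a hypothetical, trimmed-down stand-in for the real metrics struct (only receipts_observed is named in the text above), Option<f64> stands in for MetricValue, and both thresholds are caller-chosen illustrations.

```rust
/// Hypothetical, trimmed-down view of a reliability metric.
struct ReliabilityView {
    /// None stands in for MetricValue::Unknown.
    score: Option<f64>,
    receipts_observed: u64,
}

/// Accept only when the score is known, high enough,
/// and backed by enough observed receipts.
fn reliability_is_trustworthy(
    metric: &ReliabilityView,
    min_score: f64,
    min_receipts: u64,
) -> bool {
    match metric.score {
        Some(score) => score >= min_score && metric.receipts_observed >= min_receipts,
        None => false,
    }
}
```

The point of the design is that a high score from a handful of receipts and a high score from hundreds of receipts are distinguishable by the consumer, even though the score field alone is identical.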
Behavioral Profile Inputs
The reputation scorecard is one of two signals the platform uses to characterize an agent. The other is the behavioral profile guard at chio-guards/src/behavioral_profile.rs (Phase 19.2). The guard implements chio_kernel::Guard; its evaluate method (lines 350-366) reads one bounded window of receipts per invocation, calls observe_sample, and returns Verdict::Allow regardless of the anomaly state. Anomaly evidence rides along for downstream scoring.
The four tracked metrics are the BehavioralMetric enum variants at behavioral_profile.rs:57-67:
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BehavioralMetric {
    /// Total receipts per window.
    CallRate,
    /// Denies per window.
    DenyRate,
    /// Distinct tool names per window.
    UniqueTools,
    /// Approximate parameter entropy per window.
    AvgParameterEntropy,
}
The defaults are public constants at behavioral_profile.rs:46-54:
pub const DEFAULT_EMA_ALPHA: f64 = 0.2;
pub const DEFAULT_SIGMA_THRESHOLD: f64 = 2.0;
pub const DEFAULT_WINDOW_SECS: u64 = 60;
pub const DEFAULT_BASELINE_MIN_WINDOWS: u64 = 3;
The flagging rule lives in observe_sample at behavioral_profile.rs:259-263: a window is flagged only when the baseline has at least baseline_min_windows prior samples AND |z_score| > sigma_threshold:
let z = robust_z_score(&entry.state, sample);
let seen_enough = entry.state.sample_count >= self.config.baseline_min_windows;
let anomaly = seen_enough
    && z.map(|z| z.abs() > self.config.sigma_threshold)
        .unwrap_or(false);
The z-score itself uses a Poisson floor (behavioral_profile.rs:337-348): for count metrics the effective standard deviation is max(measured_stddev, sqrt(max(mean, 1))) so that a 50x spike over a steady 10-per-window baseline is still flagged when the EWMA variance happens to be numerically zero. The unit test at behavioral_profile.rs:402-421 pins this behavior. See Guard Platform · Session for how guards plug into the request lifecycle.
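The Poisson floor and the warm-up rule can be sketched in isolation. The functions below are hypothetical stand-ins, not the guard's robust_z_score: they take the mean and standard deviation as plain numbers instead of reading them from the guard's internal state, and returning None for non-finite inputs is an assumption.

```rust
/// Sketch of a z-score with a Poisson floor for count metrics.
/// Returns None when any input is not finite (an assumption).
fn floored_z_score(mean: f64, measured_stddev: f64, sample: f64) -> Option<f64> {
    if !(mean.is_finite() && measured_stddev.is_finite() && sample.is_finite()) {
        return None;
    }
    // Effective stddev: max(measured_stddev, sqrt(max(mean, 1))).
    let poisson_floor = mean.max(1.0).sqrt();
    let stddev = measured_stddev.max(poisson_floor);
    Some((sample - mean) / stddev)
}

/// Flag only after the warm-up period and when |z| exceeds the threshold.
fn is_anomalous(z: Option<f64>, sample_count: u64, min_windows: u64, sigma: f64) -> bool {
    sample_count >= min_windows && z.map(|z| z.abs() > sigma).unwrap_or(false)
}
```

With a steady baseline of 10 calls per window and zero measured variance, the floor is sqrt(10), so a window of 500 calls still produces a large z-score, while a window of 12 calls stays under the default threshold of 2.0.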
Phase 19.1 Compliance Score
Sitting alongside the reputation scorecard is the Phase 19.1 compliance score at chio-kernel/src/compliance_score.rs. It produces an integer from 0 to 1000 that answers a different question: did the agent comply with its policies during the window? The factor weights are public constants at compliance_score.rs:33-41 and sum to COMPLIANCE_SCORE_MAX = 1000:
pub const WEIGHT_DENY_RATE: u32 = 300;
pub const WEIGHT_REVOCATION: u32 = 300;
pub const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
pub const WEIGHT_POLICY_COVERAGE: u32 = 150;
pub const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;
pub const DEFAULT_ATTESTATION_STALENESS_SECS: u64 = 7_776_000; // 90 days
| Factor | Max deduction | Rate driver | Source |
|---|---|---|---|
| Deny rate | 300 | deny_receipts / total_receipts (zero when total is zero) | compliance_score.rs:285-289 |
| Revocation | 300 | revoked_capabilities / observed_capabilities; floored to 1.0 when any_revoked is set | compliance_score.rs:292-308 |
| Velocity anomaly | 150 | anomalous_velocity_windows / velocity_windows | compliance_score.rs:311-315 |
| Policy coverage | 150 | 1 - avg(checkpoint_coverage, lineage_coverage); zero when no receipts observed | compliance_score.rs:318-343 |
| Attestation freshness | 100 | age_secs / attestation_staleness_secs (default 90 days) | compliance_score.rs:345-355 |
The compliance score has a hard rule the reputation score does not. The default config sets revocation_ceiling = 499 (compliance_score.rs:227-232), and the score function caps the output at that ceiling whenever any capability is revoked (compliance_score.rs:259-263):
let score = if inputs.any_revoked || inputs.revoked_capabilities > 0 {
    raw_score.min(config.revocation_ceiling)
} else {
    raw_score
};
The unit test revocation_flag_drives_score_below_500 at compliance_score.rs:420-436 pins this acceptance target.
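Putting the factor table and the ceiling together, the overall shape of the computation can be sketched as follows. The weight constants are the ones listed above; everything else is an assumption for illustration: the FactorRates input struct, precomputed rates in [0, 1], rounding each deduction to the nearest point, and using a non-zero revocation rate as a proxy for revoked_capabilities > 0.

```rust
const COMPLIANCE_SCORE_MAX: u32 = 1000;
const WEIGHT_DENY_RATE: u32 = 300;
const WEIGHT_REVOCATION: u32 = 300;
const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
const WEIGHT_POLICY_COVERAGE: u32 = 150;
const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;

/// Hypothetical input shape: each rate is precomputed and expected in [0, 1].
struct FactorRates {
    deny: f64,
    revocation: f64,
    velocity_anomaly: f64,
    policy_gap: f64,      // 1 - average coverage
    attestation_age: f64, // age / staleness window
    any_revoked: bool,
}

/// Points deducted for one factor. Rounding to nearest is an assumption.
fn deduction(weight: u32, rate: f64) -> u32 {
    (f64::from(weight) * rate.clamp(0.0, 1.0)).round() as u32
}

fn compliance_score(rates: &FactorRates, revocation_ceiling: u32) -> u32 {
    // The revocation rate is forced to 1.0 when any_revoked is set.
    let revocation = if rates.any_revoked { 1.0 } else { rates.revocation };
    let total = deduction(WEIGHT_DENY_RATE, rates.deny)
        + deduction(WEIGHT_REVOCATION, revocation)
        + deduction(WEIGHT_VELOCITY_ANOMALY, rates.velocity_anomaly)
        + deduction(WEIGHT_POLICY_COVERAGE, rates.policy_gap)
        + deduction(WEIGHT_ATTESTATION_FRESHNESS, rates.attestation_age);
    let raw = COMPLIANCE_SCORE_MAX.saturating_sub(total);
    // Any revocation caps the score at the ceiling, however clean the rest is.
    if rates.any_revoked || rates.revocation > 0.0 {
        raw.min(revocation_ceiling)
    } else {
        raw
    }
}
```

In this sketch an otherwise clean agent with any_revoked set loses the full 300 revocation points and is then capped at the ceiling, which is what keeps a revoked agent below 500 under the default config.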
Worked Example
Take a hypothetical agent subject-1 with a 14-day history, evaluated at now = 1_715_000_000 with the default config:
Receipts (180 total):
168 Allow
10 Deny (split across 2 policies)
2 Cancelled
All within the last 14 days
Capabilities issued to subject-1: 4
- 3 capped at max_invocations
- 1 with constraint expressions
- All operations exclude Delegate
Budget usage:
Average utilization across capped grants: 0.62
Delegations subject-1 issued: 2
- Both reduce scope vs parent
- 1 reduces TTL
- 1 reduces budget
Incident reports: None (incident_reports = None)
Component scores (rounded):
- boundary_pressure.deny_ratio ~ 0.06 across two policies, so 1 - 0.06 = 0.94
- resource_stewardship.fit_score = 1 - |0.62 - 0.75| = 0.87
- least_privilege.score ~ 0.70 (used < granted, with non-trivial constraint and non-delegate factors)
- history_depth.score = avg(180/1000, 14/30, activity_ratio) ~ 0.41
- specialization.score ~ 0.85 (entropy across used tools)
- delegation_hygiene.score = avg(1.0, 0.5, 0.5) = 0.67
- reliability.score = 168 / (168 + 2) ~ 0.99 (denies excluded)
- incident_correlation = Unknown (excluded from composite)
With incident correlation excluded, the effective weight sum is 0.90. The weighted composite is roughly:
composite = (0.20 * 0.94 + 0.10 * 0.87 + 0.15 * 0.70 + 0.10 * 0.41
+ 0.05 * 0.85 + 0.15 * 0.67 + 0.15 * 0.99) / 0.90
~ 0.79
That 0.79 is well above the 0.6 underwriting approval floor (see Credit & Underwriting), so the agent qualifies for a granted facility, with terms shaped by the dimensions in the credit scorecard.
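The arithmetic above can be re-run with a short sketch. It is not the crate's scoring path: it plugs the rounded component scores from this example straight into a weighted mean, with the function name worked_example_composite chosen for illustration and None standing in for the unknown incident metric.

```rust
/// Recomputes the composite from the rounded component scores of the example.
fn worked_example_composite() -> f64 {
    // (default weight, rounded score); None marks an Unknown metric.
    let metrics: [(f64, Option<f64>); 8] = [
        (0.20, Some(0.94)), // boundary_pressure, as 1 - deny_ratio
        (0.10, Some(0.87)), // resource_stewardship
        (0.15, Some(0.70)), // least_privilege
        (0.10, Some(0.41)), // history_depth
        (0.05, Some(0.85)), // tool_diversity (specialization.score)
        (0.15, Some(0.67)), // delegation_hygiene
        (0.15, Some(0.99)), // reliability
        (0.10, None),       // incident_correlation, excluded
    ];
    // Sum weight * score and weight over the known metrics only.
    let (weighted_sum, weight_sum) = metrics
        .iter()
        .filter_map(|(weight, score)| score.map(|s| (weight * s, *weight)))
        .fold((0.0, 0.0), |(ws, es), (v, w)| (ws + v, es + w));
    weighted_sum / weight_sum
}
```

Swapping the None for a known incident score shows how much a clean or poor incident record would move the composite once the full weight of 1.00 is in play.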
Privacy and Tenancy
Scores are tenant-scoped by construction. Org A computes scores from Org A's corpus; Org B computes scores from Org B's corpus. The function is identical, but the inputs differ, so the scores differ.
Cross-org sharing happens through Agent Passports: Org A signs a credential carrying its scorecard for an agent, the agent presents the credential to Org B, and Org B applies its own attenuation policy via build_imported_reputation_signal. The default attenuation factor is 0.50, halving any imported composite before it influences a local decision.
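The effect of attenuation on an imported composite is a single multiplication, sketched below. The helper name attenuate_imported_composite is hypothetical and is not the signature of build_imported_reputation_signal; clamping both inputs to [0, 1] is also an assumption.

```rust
/// Hypothetical helper: scales an imported composite by a local attenuation factor.
/// None (an unknown composite) stays unknown.
fn attenuate_imported_composite(imported: Option<f64>, attenuation: f64) -> Option<f64> {
    imported.map(|score| score.clamp(0.0, 1.0) * attenuation.clamp(0.0, 1.0))
}
```

With the default factor of 0.50, an imported composite of 0.80 would enter Org B's decision as 0.40.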
No global score
Related Reading
- Credit Scorecards · how the score plus settlement history collapses into a band and confidence level
- Agent Passports · how scorecards leave the issuing tenant as signed credentials
- Compliance Certificates · the per-session artifact that anchors the receipt-side evidence
- Guard Platform · Session · where the behavioral profile guard runs