
Reputation Scoring

The chio-reputation crate scores agents from a caller-provided local corpus of persisted receipts, capability-lineage snapshots, and budget-usage records. It is intentionally pure and storage-agnostic: no kernel dependency, no external service, no central registry. Two operators with the same corpus and config produce identical scores.

Why local and deterministic

Scoring is a function over already-persisted evidence. Keeping it kernel-free avoids a dependency cycle for future kernel-side issuance hooks, and keeping it deterministic means every score is reproducible from the corpus that produced it. There is no global score; each tenant computes its own view.

Inputs

A scorecard is computed against a LocalReputationCorpus assembled by the caller from local stores:

rust
pub struct LocalReputationCorpus {
    pub receipts: Vec<ChioReceipt>,
    pub capabilities: Vec<CapabilityLineageRecord>,
    pub budget_usage: Vec<BudgetUsageRecord>,
    pub incident_reports: Option<Vec<IncidentRecord>>,
}
  • Receipts: every signed tool-invocation outcome the agent has produced in the window. Used for boundary pressure, reliability, specialization, and history.
  • Capability lineage: snapshots of issued capabilities, including parent links, scope, delegation depth, and validity bounds. Drives least-privilege and delegation-hygiene metrics.
  • Budget usage: per-grant invocation counters and totals charged. Feeds resource stewardship.
  • Incident reports: optional list of timestamps (with optional receipt IDs) that the environment classifies as incidents. None means the incident metric is unavailable, not zero.

API Surface

The scoring entry point is compute_local_scorecard at chio-reputation/src/score.rs:3:

crates/chio-reputation/src/score.rs
#[must_use]
pub fn compute_local_scorecard(
    subject_key: &str,
    now: u64,
    corpus: &LocalReputationCorpus,
    config: &ReputationConfig,
) -> LocalReputationScorecard {
    // ... body elided
}

The body filters the corpus into three subject-scoped slices, runs eight metric functions, then folds the results through contribute_metric. The composite is weighted_sum / effective_weight_sum when at least one metric returned a known value, otherwise MetricValue::Unknown (score.rs:105-110).

crates/chio-reputation/src/score.rs
let composite_score = if effective_weight_sum > 0.0 {
    MetricValue::known(weighted_sum / effective_weight_sum)
} else {
    MetricValue::Unknown
};

There is no issuance-recommendation type. The crate stops at the scorecard and lets callers (the credit underwriter, the passport issuer, federation policy) translate a composite into a downstream action.


Default Constants

The four module-level defaults are file-private constants in lib.rs (the include macro pulls them into model.rs through Default impls):

crates/chio-reputation/src/lib.rs
const SECONDS_PER_DAY: u64 = 86_400;
const DEFAULT_HISTORY_RECEIPT_TARGET: u64 = 1_000;
const DEFAULT_HISTORY_DAY_TARGET: u64 = 30;
const DEFAULT_INCIDENT_PENALTY: f64 = 0.20;

The temporal_decay_half_life_days default is hard-coded inline at model.rs:143 as 30, and target_utilization is 0.75 at model.rs:141.


The Eight Metric Components

Every metric returns a MetricValue that is either Known(0.0..=1.0) or Unknown. Unknown means the metric had no observable signal; it is excluded from the composite rather than counted as zero.

The eight components and their default weights are wired in the Default impl on ReputationWeights at model.rs:111-124:

crates/chio-reputation/src/model.rs
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
pub struct ReputationWeights {
    pub boundary_pressure: f64,
    pub resource_stewardship: f64,
    pub least_privilege: f64,
    pub history_depth: f64,
    pub tool_diversity: f64,
    pub delegation_hygiene: f64,
    pub reliability: f64,
    pub incident_correlation: f64,
}

impl Default for ReputationWeights {
    fn default() -> Self {
        Self {
            boundary_pressure: 0.20,
            resource_stewardship: 0.10,
            least_privilege: 0.15,
            history_depth: 0.10,
            tool_diversity: 0.05,
            delegation_hygiene: 0.15,
            reliability: 0.15,
            incident_correlation: 0.10,
        }
    }
}

| Metric | Default weight | What it measures | Source |
| --- | --- | --- | --- |
| boundary_pressure | 0.20 | Per-policy deny ratio, time-decayed by 2^(-age/half_life); the contribution is 1.0 - deny_ratio. | score.rs:127-163 |
| resource_stewardship | 0.10 | 1.0 - abs(average_utilization - target_utilization) across capped grants. Contributing field is fit_score. | score.rs:165-208 |
| least_privilege | 0.15 | used_tools / granted_tools times a constraint factor (0.5 + 0.5 * constrained_ratio) and an operation factor (0.5 + 0.5 * non_delegate_ratio). | score.rs:210-267 |
| history_depth | 0.10 | Average of receipt-count progress (against 1000), day-span progress (against 30), and active-day ratio. | compare.rs |
| tool_diversity | 0.05 | Normalized Shannon entropy over time-decayed tool usage; capped by diversity_cap. The struct field is specialization.score; the weight key is tool_diversity. | score.rs:77-85 |
| delegation_hygiene | 0.15 | Average of scope-reduction, TTL-reduction, and budget-reduction rates across delegations issued by the subject. | score.rs:29-33 (delegation slice), per-rate logic in compare.rs |
| reliability | 0.15 | Allow-weight share among Allow + Cancelled + Incomplete decisions; denies are excluded so a denied request never penalizes reliability twice. | score.rs:44 (call site) |
| incident_correlation | 0.10 | 1.0 - incident_penalty * weighted_incidents, clamped to [0, 1]. Returns Unknown when incident_reports = None. | score.rs:45 (call site) |
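
Two of the formulas in the table can be written out directly. The following is a minimal sketch, not the crate's code: the function names and signatures are hypothetical, the handling of empty inputs (returning None as a stand-in for Unknown) is an assumption, and so is the reading of "capped by diversity_cap" as a simple upper bound.

```rust
/// Least-privilege score as described in the table: the usage ratio scaled by
/// a constraint factor and an operation factor, each in [0.5, 1.0].
/// Assumption: nothing granted maps to None (standing in for Unknown).
fn least_privilege_score(
    used_tools: usize,
    granted_tools: usize,
    constrained_ratio: f64,
    non_delegate_ratio: f64,
) -> Option<f64> {
    if granted_tools == 0 {
        return None;
    }
    let usage = used_tools as f64 / granted_tools as f64;
    let constraint_factor = 0.5 + 0.5 * constrained_ratio;
    let operation_factor = 0.5 + 0.5 * non_delegate_ratio;
    Some((usage * constraint_factor * operation_factor).clamp(0.0, 1.0))
}

/// Normalized Shannon entropy over per-tool weights that are assumed to be
/// time-decayed already. Assumption: the cap is applied as a plain minimum.
fn tool_diversity_score(tool_weights: &[f64], diversity_cap: f64) -> Option<f64> {
    // Ignore tools with no (or invalid) usage weight.
    let active: Vec<f64> = tool_weights
        .iter()
        .copied()
        .filter(|w| w.is_finite() && *w > 0.0)
        .collect();
    if active.is_empty() {
        return None;
    }
    if active.len() == 1 {
        // A single tool carries no diversity; this also avoids dividing by ln(1) = 0.
        return Some(0.0);
    }
    let total: f64 = active.iter().sum();
    let entropy: f64 = active
        .iter()
        .map(|w| {
            let p = w / total;
            -p * p.ln()
        })
        .sum();
    let normalized = entropy / (active.len() as f64).ln();
    Some(normalized.min(diversity_cap).clamp(0.0, 1.0))
}

fn main() {
    println!("{:?}", least_privilege_score(3, 4, 0.5, 1.0));
    println!("{:?}", tool_diversity_score(&[4.0, 3.0, 1.0], 1.0));
}
```

Dividing by ln(n) is what makes the entropy comparable across agents with different numbers of tools: perfectly even usage scores 1.0 whether the agent uses two tools or twenty.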

The fold loop at score.rs:50-103 feeds eight calls to contribute_metric:

crates/chio-reputation/src/score.rs
contribute_metric(
    boundary_pressure
        .deny_ratio
        .as_option()
        .map(|value| 1.0 - value),
    config.weights.boundary_pressure,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
contribute_metric(
    resource_stewardship.fit_score.as_option(),
    config.weights.resource_stewardship,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
// ...six more calls, one per metric.

MetricValue is defined at model.rs:78-97. Its constructor, MetricValue::known, clamps the input through clamp01, so a faulty component cannot push the composite outside [0, 1].
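
The fold can be approximated in a few lines. This is a hedged sketch rather than the crate's implementation: it uses Option<f64> in place of MetricValue, the clamp01 body is an assumed stand-in, and the composite helper is hypothetical. The point it illustrates is that an Unknown metric removes its weight from the denominator instead of contributing a zero.

```rust
/// Assumed stand-in for the crate's clamp01 helper.
fn clamp01(value: f64) -> f64 {
    value.clamp(0.0, 1.0)
}

/// Sketch of the fold step: a known metric adds its weighted value and its
/// weight; an unknown metric (None here) adds nothing to either accumulator.
fn contribute_metric(
    value: Option<f64>,
    weight: f64,
    weighted_sum: &mut f64,
    effective_weight_sum: &mut f64,
) {
    if let Some(v) = value {
        *weighted_sum += clamp01(v) * weight;
        *effective_weight_sum += weight;
    }
}

/// Hypothetical helper that folds (value, weight) pairs into a composite.
fn composite(metrics: &[(Option<f64>, f64)]) -> Option<f64> {
    let mut weighted_sum = 0.0_f64;
    let mut effective_weight_sum = 0.0_f64;
    for &(value, weight) in metrics {
        contribute_metric(value, weight, &mut weighted_sum, &mut effective_weight_sum);
    }
    if effective_weight_sum > 0.0 {
        Some(clamp01(weighted_sum / effective_weight_sum))
    } else {
        None
    }
}

fn main() {
    // Unknown is excluded: the composite equals the one known metric.
    println!("{:?}", composite(&[(Some(0.8), 0.5), (None, 0.5)]));
    // A known zero is counted: the composite is pulled down.
    println!("{:?}", composite(&[(Some(0.8), 0.5), (Some(0.0), 0.5)]));
}
```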


Configuration Knobs

ReputationConfig exposes the tunable parameters that govern history sizing, time decay, target utilization, and incident penalty:

rust
pub struct ReputationConfig {
    pub weights: ReputationWeights,
    pub target_utilization: f64,            // default 0.75
    pub diversity_cap: f64,                 // default 1.0
    pub temporal_decay_half_life_days: u32, // default 30
    pub history_receipt_target: u64,        // default 1_000
    pub history_day_target: u64,            // default 30
    pub incident_penalty: f64,              // default 0.20
}
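  • History window: history depth normalizes against history_receipt_target (default 1000 receipts) and history_day_target (default 30 days). Because the metric averages receipt-count progress, day-span progress, and the active-day ratio, both targets must be reached, with activity on every day of the span, for it to saturate at 1.0.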
  • Temporal decay: per-receipt and per-capability weights follow 2^(-age / half_life) with a 30-day default half-life. A receipt 30 days old contributes half as much as a receipt issued today.
  • Target utilization: the fit score is 1.0 - |observed - target|. With the default target of 0.75, an agent whose average utilization across capped grants is 75% gets a perfect stewardship score; both underuse and overuse lower it.
  • Incident penalty: each time-weighted incident subtracts incident_penalty (default 0.20) from the incident-correlation score. Five incidents at full weight (recent enough that decay is negligible) drive that metric to zero.
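
The last two knobs reduce to one-line formulas. A minimal sketch under stated assumptions: both function names are hypothetical, Option<f64> stands in for MetricValue, and the weighted incident count is taken as an already-computed input.

```rust
/// Stewardship fit: 1.0 - |observed - target|, clamped to [0, 1].
fn stewardship_fit(observed_utilization: f64, target_utilization: f64) -> f64 {
    (1.0 - (observed_utilization - target_utilization).abs()).clamp(0.0, 1.0)
}

/// Incident-correlation score: 1.0 - penalty * weighted_incidents, clamped.
/// None models incident_reports = None, which yields Unknown rather than a
/// perfect or zero score.
fn incident_score(weighted_incidents: Option<f64>, incident_penalty: f64) -> Option<f64> {
    weighted_incidents.map(|w| (1.0 - incident_penalty * w).clamp(0.0, 1.0))
}

fn main() {
    println!("fit at target:   {}", stewardship_fit(0.75, 0.75));
    println!("fit at 0.62:     {:.2}", stewardship_fit(0.62, 0.75));
    println!("five incidents:  {:?}", incident_score(Some(5.0), 0.20));
    println!("no incident feed: {:?}", incident_score(None, 0.20));
}
```

Note the difference between Some(0.0) and None: an environment that reports an empty incident list earns a full score, while one that reports nothing contributes no evidence either way.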

Half-life zero disables decay

Setting temporal_decay_half_life_days = 0 assigns weight 1.0 to every observation regardless of age. Useful when you want a flat window and are clipping the corpus yourself.
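
The decay rule, including the zero half-life case, could look roughly like this. It is a sketch only: decay_weight is a hypothetical name, timestamps are assumed to be Unix seconds, and treating an observation dated after now as age zero is an assumption.

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// Weight of an observation under exponential half-life decay:
/// 2^(-age / half_life). A half-life of zero disables decay.
fn decay_weight(now: u64, observed_at: u64, half_life_days: u32) -> f64 {
    if half_life_days == 0 {
        return 1.0;
    }
    // Assumption: observations from the future are treated as age zero.
    let age_secs = now.saturating_sub(observed_at) as f64;
    let half_life_secs = f64::from(half_life_days) * SECONDS_PER_DAY as f64;
    (-(age_secs / half_life_secs)).exp2()
}

fn main() {
    let now = 1_715_000_000_u64;
    let thirty_days_ago = now - 30 * SECONDS_PER_DAY;
    println!("today:        {}", decay_weight(now, now, 30));
    println!("30 days old:  {}", decay_weight(now, thirty_days_ago, 30));
    println!("decay off:    {}", decay_weight(now, thirty_days_ago, 0));
}
```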

Output Shape

Defined at model.rs:212-226:

crates/chio-reputation/src/model.rs
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct LocalReputationScorecard {
    pub subject_key: String,
    pub computed_at: u64,
    pub boundary_pressure: BoundaryPressureMetrics,
    pub resource_stewardship: ResourceStewardshipMetrics,
    pub least_privilege: LeastPrivilegeMetrics,
    pub history_depth: HistoryDepthMetrics,
    pub specialization: SpecializationMetrics,
    pub delegation_hygiene: DelegationHygieneMetrics,
    pub reliability: ReliabilityMetrics,
    pub incident_correlation: IncidentCorrelationMetrics,
    pub composite_score: MetricValue,
    pub effective_weight_sum: f64,
}

Each per-metric struct carries observation counts alongside the score, so a downstream policy can reject an attestation when, for example, the reliability score is high but receipts_observed is small.
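
A downstream check of that kind might look like the sketch below. ReliabilityView is a hypothetical, trimmed-down stand-in for the real per-metric struct (only receipts_observed is named in this page), and the thresholds are illustrative, not crate defaults.

```rust
/// Hypothetical view of a per-metric struct: the score plus the number of
/// observations that produced it. Option<f64> stands in for MetricValue.
#[derive(Debug, Clone, Copy)]
struct ReliabilityView {
    score: Option<f64>,
    receipts_observed: u64,
}

/// Accept the reliability signal only when the score is high enough AND it
/// rests on enough receipts. An unknown score is never accepted.
fn reliability_is_trustworthy(
    view: &ReliabilityView,
    min_score: f64,
    min_receipts: u64,
) -> bool {
    match view.score {
        Some(score) => score >= min_score && view.receipts_observed >= min_receipts,
        None => false,
    }
}

fn main() {
    let thin = ReliabilityView { score: Some(0.99), receipts_observed: 12 };
    let deep = ReliabilityView { score: Some(0.99), receipts_observed: 180 };
    println!("thin history accepted: {}", reliability_is_trustworthy(&thin, 0.9, 100));
    println!("deep history accepted: {}", reliability_is_trustworthy(&deep, 0.9, 100));
}
```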


Behavioral Profile Inputs

The reputation scorecard is one of two signals the platform uses to characterize an agent. The other is the behavioral profile guard at chio-guards/src/behavioral_profile.rs (Phase 19.2). The guard implements chio_kernel::Guard; its evaluate method (lines 350-366) reads one bounded window of receipts per invocation, calls observe_sample, and returns Verdict::Allow regardless of the anomaly state. Anomaly evidence rides along for downstream scoring.

The four tracked metrics are the BehavioralMetric enum variants at behavioral_profile.rs:57-67:

crates/chio-guards/src/behavioral_profile.rs
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BehavioralMetric {
    /// Total receipts per window.
    CallRate,
    /// Denies per window.
    DenyRate,
    /// Distinct tool names per window.
    UniqueTools,
    /// Approximate parameter entropy per window.
    AvgParameterEntropy,
}

The defaults are public constants at behavioral_profile.rs:46-54:

crates/chio-guards/src/behavioral_profile.rs
pub const DEFAULT_EMA_ALPHA: f64 = 0.2;
pub const DEFAULT_SIGMA_THRESHOLD: f64 = 2.0;
pub const DEFAULT_WINDOW_SECS: u64 = 60;
pub const DEFAULT_BASELINE_MIN_WINDOWS: u64 = 3;

The flagging rule lives in observe_sample at behavioral_profile.rs:259-263: a window is flagged only when the baseline has at least baseline_min_windows prior samples AND |z_score| > sigma_threshold:

crates/chio-guards/src/behavioral_profile.rs
let z = robust_z_score(&entry.state, sample);
let seen_enough = entry.state.sample_count >= self.config.baseline_min_windows;
let anomaly = seen_enough
    && z.map(|z| z.abs() > self.config.sigma_threshold)
        .unwrap_or(false);

The z-score itself uses a Poisson floor (behavioral_profile.rs:337-348): for count metrics the effective standard deviation is max(measured_stddev, sqrt(max(mean, 1))) so that a 50x spike over a steady 10-per-window baseline is still flagged when the EWMA variance happens to be numerically zero. The unit test at behavioral_profile.rs:402-421 pins this behavior. See Guard Platform · Session for how guards plug into the request lifecycle.
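
To make the floor concrete, here is a small sketch of the z-score and flagging rule as described above. It is not the guard's code: floored_z_score and is_anomalous are hypothetical names, and the baseline mean and standard deviation are passed in directly instead of being read from the guard's state.

```rust
/// z-score with a Poisson floor on the standard deviation:
/// effective_stddev = max(measured_stddev, sqrt(max(mean, 1))).
/// Returns None for non-finite inputs (an assumption of this sketch).
fn floored_z_score(mean: f64, measured_stddev: f64, sample: f64) -> Option<f64> {
    if !mean.is_finite() || !sample.is_finite() {
        return None;
    }
    let floor = mean.max(1.0).sqrt();
    // f64::max ignores a NaN operand, so a NaN stddev falls back to the floor.
    let effective_stddev = measured_stddev.max(floor);
    Some((sample - mean) / effective_stddev)
}

/// Flag only when the baseline is warm AND |z| exceeds the threshold.
fn is_anomalous(
    sample_count: u64,
    baseline_min_windows: u64,
    sigma_threshold: f64,
    z: Option<f64>,
) -> bool {
    sample_count >= baseline_min_windows
        && z.map(|z| z.abs() > sigma_threshold).unwrap_or(false)
}

fn main() {
    // Steady baseline of 10 calls per window with zero measured variance,
    // followed by a 50x spike to 500 calls.
    let z = floored_z_score(10.0, 0.0, 500.0);
    println!("z = {:?}", z);
    println!("warm baseline flagged: {}", is_anomalous(3, 3, 2.0, z));
    println!("cold baseline flagged: {}", is_anomalous(2, 3, 2.0, z));
}
```

Without the floor, a zero measured variance would make the division undefined and the spike would go unflagged; with it, the spike is measured against sqrt(10), the spread a Poisson process with that mean would show.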


Phase 19.1 Compliance Score

Alongside the reputation scorecard sits the Phase 19.1 compliance score at chio-kernel/src/compliance_score.rs. It produces an integer from 0 to 1000 that answers a different question: did the agent comply with its policies during the window? The factor weights are public constants at compliance_score.rs:33-41 and sum to COMPLIANCE_SCORE_MAX = 1000:

crates/chio-kernel/src/compliance_score.rs
pub const WEIGHT_DENY_RATE: u32 = 300;
pub const WEIGHT_REVOCATION: u32 = 300;
pub const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
pub const WEIGHT_POLICY_COVERAGE: u32 = 150;
pub const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;

pub const DEFAULT_ATTESTATION_STALENESS_SECS: u64 = 7_776_000; // 90 days

| Factor | Max deduction | Rate driver | Source |
| --- | --- | --- | --- |
| Deny rate | 300 | deny_receipts / total_receipts (zero when total is zero) | compliance_score.rs:285-289 |
| Revocation | 300 | revoked_capabilities / observed_capabilities; floored to 1.0 when any_revoked is set | compliance_score.rs:292-308 |
| Velocity anomaly | 150 | anomalous_velocity_windows / velocity_windows | compliance_score.rs:311-315 |
| Policy coverage | 150 | 1 - avg(checkpoint_coverage, lineage_coverage); zero when no receipts observed | compliance_score.rs:318-343 |
| Attestation freshness | 100 | age_secs / attestation_staleness_secs (default 90 days) | compliance_score.rs:345-355 |

The compliance score has a hard rule the reputation score does not. The default config sets revocation_ceiling = 499 (compliance_score.rs:227-232), and the score function caps the output at that ceiling whenever any capability is revoked (compliance_score.rs:259-263):

crates/chio-kernel/src/compliance_score.rs
let score = if inputs.any_revoked || inputs.revoked_capabilities > 0 {
    raw_score.min(config.revocation_ceiling)
} else {
    raw_score
};

The unit test revocation_flag_drives_score_below_500 at compliance_score.rs:420-436 pins this acceptance target.
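
Putting the factor table and the ceiling together, the overall shape of the computation can be sketched as follows. This is an illustration under assumptions, not the kernel's code: FactorRates and both functions are hypothetical, each rate is assumed to be pre-computed in [0, 1], and rounding each deduction to the nearest integer is a guess at behavior this page does not specify.

```rust
const COMPLIANCE_SCORE_MAX: u32 = 1000;
const WEIGHT_DENY_RATE: u32 = 300;
const WEIGHT_REVOCATION: u32 = 300;
const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
const WEIGHT_POLICY_COVERAGE: u32 = 150;
const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;

/// Hypothetical bundle of per-factor rates, each expected in [0, 1].
#[derive(Debug, Clone, Copy, Default)]
struct FactorRates {
    deny: f64,
    revocation: f64,
    velocity_anomaly: f64,
    policy_gap: f64,      // 1 - avg(checkpoint_coverage, lineage_coverage)
    attestation_age: f64, // age_secs / attestation_staleness_secs
}

/// Deduction for one factor. Assumption: rounded to the nearest integer.
fn deduction(weight: u32, rate: f64) -> u32 {
    (f64::from(weight) * rate.clamp(0.0, 1.0)).round() as u32
}

fn compliance_score(rates: FactorRates, any_revoked: bool, revocation_ceiling: u32) -> u32 {
    // The revocation rate is raised to 1.0 when the any_revoked flag is set.
    let revocation_rate = if any_revoked { 1.0 } else { rates.revocation };
    let total = deduction(WEIGHT_DENY_RATE, rates.deny)
        + deduction(WEIGHT_REVOCATION, revocation_rate)
        + deduction(WEIGHT_VELOCITY_ANOMALY, rates.velocity_anomaly)
        + deduction(WEIGHT_POLICY_COVERAGE, rates.policy_gap)
        + deduction(WEIGHT_ATTESTATION_FRESHNESS, rates.attestation_age);
    let raw = COMPLIANCE_SCORE_MAX.saturating_sub(total);
    // Any revocation caps the result at the ceiling.
    if any_revoked || rates.revocation > 0.0 {
        raw.min(revocation_ceiling)
    } else {
        raw
    }
}

fn main() {
    let clean = FactorRates::default();
    let some_denies = FactorRates { deny: 0.1, ..FactorRates::default() };
    println!("clean:        {}", compliance_score(clean, false, 499));
    println!("10% denies:   {}", compliance_score(some_denies, false, 499));
    println!("revoked flag: {}", compliance_score(clean, true, 499));
}
```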


Worked Example

Take a hypothetical agent subject-1 with a 14-day history, evaluated at now = 1_715_000_000 with the default config:

text
Receipts (180 total):
  168 Allow
  10 Deny (split across 2 policies)
  2 Cancelled
  All within the last 14 days

Capabilities issued to subject-1: 4
  - 3 capped at max_invocations
  - 1 with constraint expressions
  - All operations exclude Delegate

Budget usage:
  Average utilization across capped grants: 0.62

Delegations subject-1 issued: 2
  - Both reduce scope vs parent
  - 1 reduces TTL
  - 1 reduces budget

Incident reports: None (incident_reports = None)

Component scores (rounded):

  • boundary_pressure.deny_ratio ~ 0.06 across two policies, so 1 - 0.06 = 0.94
  • resource_stewardship.fit_score = 1 - |0.62 - 0.75| = 0.87
  • least_privilege.score ~ 0.70 (used < granted, with non-trivial constraint and non-delegate factors)
  • history_depth.score = avg(180/1000, 14/30, activity_ratio) ~ 0.41
  • specialization.score ~ 0.85 (entropy across used tools)
  • delegation_hygiene.score = avg(1.0, 0.5, 0.5) = 0.67
  • reliability.score = 168 / (168 + 2) ~ 0.99 (denies excluded)
  • incident_correlation = Unknown (excluded from composite)

With incident correlation excluded, the effective weight sum is 0.90. The weighted composite is roughly:

text
composite = (0.20 * 0.94 + 0.10 * 0.87 + 0.15 * 0.70 + 0.10 * 0.41
             + 0.05 * 0.85 + 0.15 * 0.67 + 0.15 * 0.99) / 0.90
          ~ 0.79

That 0.79 is well above the 0.6 underwriting approval floor (see Credit & Underwriting), so the agent qualifies for a granted facility, with terms shaped by the dimensions in the credit scorecard.
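
The arithmetic above can be reproduced with a few lines of Rust. This sketch feeds in the rounded component scores from the list and the default weights; the function name is hypothetical and Option<f64> stands in for MetricValue, so it checks the numbers rather than exercising the crate.

```rust
/// Returns (composite, effective_weight_sum) for the worked example.
fn worked_example_composite() -> (f64, f64) {
    // (metric, rounded score or None for Unknown, default weight)
    let metrics: [(&str, Option<f64>, f64); 8] = [
        ("boundary_pressure", Some(0.94), 0.20),
        ("resource_stewardship", Some(0.87), 0.10),
        ("least_privilege", Some(0.70), 0.15),
        ("history_depth", Some(0.41), 0.10),
        ("tool_diversity", Some(0.85), 0.05),
        ("delegation_hygiene", Some(0.67), 0.15),
        ("reliability", Some(0.99), 0.15),
        ("incident_correlation", None, 0.10),
    ];
    let mut weighted_sum = 0.0_f64;
    let mut effective_weight_sum = 0.0_f64;
    for &(_name, score, weight) in &metrics {
        // Unknown metrics drop out of both the numerator and the denominator.
        if let Some(score) = score {
            weighted_sum += score * weight;
            effective_weight_sum += weight;
        }
    }
    (weighted_sum / effective_weight_sum, effective_weight_sum)
}

fn main() {
    let (composite, effective_weight_sum) = worked_example_composite();
    println!("effective weight sum: {:.2}", effective_weight_sum);
    println!("composite:            {:.4}", composite);
}
```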


Privacy and Tenancy

Scores are tenant-scoped by construction. Org A computes scores from Org A's corpus; Org B computes scores from Org B's corpus. The function is identical, but the inputs differ, so the scores differ.

Cross-org sharing happens through Agent Passports: Org A signs a credential carrying its scorecard for an agent, the agent presents the credential to Org B, and Org B applies its own attenuation policy via build_imported_reputation_signal. The default attenuation factor is 0.50, halving any imported composite before it influences a local decision.
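
The effect of attenuation on an imported composite is simple to illustrate. The sketch below does not reproduce build_imported_reputation_signal, whose signature is not shown here; attenuate_imported_composite is a hypothetical helper, and multiplying by the factor is the reading suggested by "halving".

```rust
/// Default attenuation factor stated above.
const DEFAULT_ATTENUATION_FACTOR: f64 = 0.50;

/// Hypothetical helper: scale an imported composite by the relying party's
/// attenuation factor. An unknown composite (None) stays unknown.
fn attenuate_imported_composite(imported: Option<f64>, factor: f64) -> Option<f64> {
    let factor = factor.clamp(0.0, 1.0);
    imported.map(|score| (score * factor).clamp(0.0, 1.0))
}

fn main() {
    // Org A attests a composite of 0.79; Org B applies the default factor.
    let local_view = attenuate_imported_composite(Some(0.79), DEFAULT_ATTENUATION_FACTOR);
    println!("imported 0.79 -> {:?}", local_view);
}
```

Under this reading, an imported score can never outrank the same score earned locally, which keeps the relying party's own evidence dominant.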

No global score

There is no canonical "chio reputation". An agent does not have a score the way a person has a credit score. Every relying party computes (or imports and attenuates) its own.
