Reputation Scoring · Chio Docs

Local deterministic scoring

Scoring is a pure function over already-persisted evidence: no kernel dependency, no external service, no central registry, so the same corpus and config always reproduce the same score. See Reputation & Passports for reputation computed from evidence.

Inputs

A scorecard is computed against a LocalReputationCorpus assembled by the caller from local stores:

rust

pub struct LocalReputationCorpus {
    pub receipts: Vec<ChioReceipt>,
    pub capabilities: Vec<CapabilityLineageRecord>,
    pub budget_usage: Vec<BudgetUsageRecord>,
    pub incident_reports: Option<Vec<IncidentRecord>>,
}

Receipts: every signed tool-invocation outcome the agent has produced in the window. Used for boundary pressure, reliability, specialization, and history.
Capability lineage: snapshots of issued capabilities, including parent links, scope, delegation depth, and validity bounds. Drives least-privilege and delegation-hygiene metrics.
Budget usage: per-grant invocation counters and totals charged. Feeds resource stewardship.
Incident reports: optional list of timestamps (with optional receipt IDs) that the environment classifies as incidents. None means the incident metric is unavailable. It is not treated as zero.

API Surface

The scoring entry point is compute_local_scorecard at chio-reputation/src/score.rs:3:

crates/trust/chio-reputation/src/score.rs

#[must_use]
pub fn compute_local_scorecard(
    subject_key: &str,
    now: u64,
    corpus: &LocalReputationCorpus,
    config: &ReputationConfig,
) -> LocalReputationScorecard {
    // ...body elided; see the summary that follows.
}

The body filters the corpus into three subject-scoped slices, runs eight metric functions, then folds the results through contribute_metric. The composite is weighted_sum / effective_weight_sum when at least one metric returned a known value, otherwise MetricValue::Unknown (score.rs:105-110).

crates/trust/chio-reputation/src/score.rs

let composite_score = if effective_weight_sum > 0.0 {
    MetricValue::known(weighted_sum / effective_weight_sum)
} else {
    MetricValue::Unknown
};

There is no issuance-recommendation type. The crate stops at the scorecard and lets callers (the credit underwriter, the passport issuer, federation policy) translate a composite into a downstream action. The reputation issuance policy below is one such caller, shipped in the control plane.

Invoking the Scorer

An operator runs the scorer against a live receipt store from the CLI. chio reputation local computes the local scorecard for one subject; chio reputation compare evaluates a portable passport against the live local corpus and reports per-credential drift.

bash

# Local scorecard for one subject
chio --json --receipt-db receipts.sqlite3 reputation local \
    --subject-public-key 80f2b577472e6662f46ac2e029f4b2d1300f889bc767b3de1f7b63a4c562fd8f \
    [--since <unix>] [--until <unix>] [--policy policy.yaml]

# Compare a portable passport against live local state
chio --json --receipt-db receipts.sqlite3 reputation compare \
    --subject-public-key 80f2b577472e6662f46ac2e029f4b2d1300f889bc767b3de1f7b63a4c562fd8f \
    --passport passport.json [--verifier-policy verifier.yaml]

The same two reports are served over trust-control: GET /v1/reputation/local/:subject_key returns the local scorecard and POST /v1/reputation/compare/:subject_key runs the comparison. Both accept a bare-hex Ed25519 subject key.

Default Constants

The four module-level defaults are file-private constants in lib.rs (the include macro pulls them into model.rs through Default impls):

crates/trust/chio-reputation/src/lib.rs

const SECONDS_PER_DAY: u64 = 86_400;
const DEFAULT_HISTORY_RECEIPT_TARGET: u64 = 1_000;
const DEFAULT_HISTORY_DAY_TARGET: u64 = 30;
const DEFAULT_INCIDENT_PENALTY: f64 = 0.20;

The temporal_decay_half_life_days default is hard-coded inline at model.rs:143 as 30, and target_utilization is 0.75 at model.rs:141.

The Eight Metric Components

Every metric returns a MetricValue that is either Known(0.0..=1.0) or Unknown. Unknown means the metric had no observable signal; it is excluded from the composite and is not counted as zero.

The eight components and their default weights are wired in the Default impl on ReputationWeights at model.rs:111-124:

crates/trust/chio-reputation/src/model.rs

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
pub struct ReputationWeights {
    pub boundary_pressure: f64,
    pub resource_stewardship: f64,
    pub least_privilege: f64,
    pub history_depth: f64,
    pub tool_diversity: f64,
    pub delegation_hygiene: f64,
    pub reliability: f64,
    pub incident_correlation: f64,
}

impl Default for ReputationWeights {
    fn default() -> Self {
        Self {
            boundary_pressure: 0.20,
            resource_stewardship: 0.10,
            least_privilege: 0.15,
            history_depth: 0.10,
            tool_diversity: 0.05,
            delegation_hygiene: 0.15,
            reliability: 0.15,
            incident_correlation: 0.10,
        }
    }
}

Metric	Default weight	What it measures	Source
`boundary_pressure`	0.20	Per-policy deny ratio, time-decayed by `2^(-age/half_life)`; the contribution is `1.0 - deny_ratio`.	`score.rs:127-163`
`resource_stewardship`	0.10	`1.0 - |average_utilization - target_utilization|` across capped grants. Contributing field is `fit_score`.	`score.rs:165-208`
`least_privilege`	0.15	`used_tools / granted_tools` times a constraint factor (`0.5 + 0.5 * constrained_ratio`) and an operation factor (`0.5 + 0.5 * non_delegate_ratio`).	`score.rs:210-267`
`history_depth`	0.10	Average of receipt-count progress (against 1000), day-span progress (against 30), and active-day ratio.	`compare.rs`
`tool_diversity`	0.05	Normalized Shannon entropy over time-decayed tool usage; capped by `diversity_cap`. The struct field is `specialization.score`; the weight key is `tool_diversity`.	`score.rs:77-85`
`delegation_hygiene`	0.15	Average of scope-reduction, TTL-reduction, and budget-reduction rates across delegations issued by the subject.	`score.rs:29-33` (delegation slice), per-rate logic in `compare.rs`
`reliability`	0.15	Allow-weight share among Allow + Canceled + Incomplete decisions; denies are excluded so a denied request does not penalize reliability twice.	`score.rs:44` (call site)
`incident_correlation`	0.10	`1.0 - incident_penalty * weighted_incidents`, clamped to [0, 1]. Returns `Unknown` when `incident_reports = None`.	`score.rs:45` (call site)
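As an illustration of two of the formulas in the table, the sketch below computes a least-privilege score and a normalized-entropy diversity score. The function names, the argument shapes, the choice to return None when nothing is observed, and the normalization by ln(n) over n distinct tools are all assumptions made for this example; the crate's own code in score.rs may differ.

```rust
/// Least-privilege score: usage ratio scaled by the constraint and operation
/// factors from the table. Returning None for zero granted tools is an
/// assumption of this sketch.
fn least_privilege_score(
    used_tools: u32,
    granted_tools: u32,
    constrained_ratio: f64,
    non_delegate_ratio: f64,
) -> Option<f64> {
    if granted_tools == 0 {
        return None;
    }
    let usage = f64::from(used_tools) / f64::from(granted_tools);
    let constraint_factor = 0.5 + 0.5 * constrained_ratio;
    let operation_factor = 0.5 + 0.5 * non_delegate_ratio;
    Some((usage * constraint_factor * operation_factor).clamp(0.0, 1.0))
}

/// Shannon entropy over (already time-decayed) per-tool weights, divided by
/// ln(n) for n distinct tools (assumed normalization), then capped by
/// `diversity_cap`.
fn tool_diversity_score(tool_weights: &[f64], diversity_cap: f64) -> Option<f64> {
    let positive: Vec<f64> = tool_weights.iter().copied().filter(|w| *w > 0.0).collect();
    let total: f64 = positive.iter().sum();
    match positive.len() {
        0 => None,      // no usage observed: no signal
        1 => Some(0.0), // a single tool carries zero entropy
        n => {
            let entropy: f64 = positive
                .iter()
                .map(|w| {
                    let p = w / total;
                    -p * p.ln()
                })
                .sum();
            Some((entropy / (n as f64).ln()).min(diversity_cap).clamp(0.0, 1.0))
        }
    }
}
```

An agent that uses half of its granted tools under fully constrained, non-delegating grants scores 0.5 on least privilege in this sketch; perfectly even usage across two tools yields a diversity of 1.0 before the cap is applied.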

The fold loop at score.rs:50-103 feeds eight calls to contribute_metric:

crates/trust/chio-reputation/src/score.rs

contribute_metric(
    boundary_pressure
        .deny_ratio
        .as_option()
        .map(|value| 1.0 - value),
    config.weights.boundary_pressure,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
contribute_metric(
    resource_stewardship.fit_score.as_option(),
    config.weights.resource_stewardship,
    &mut weighted_sum,
    &mut effective_weight_sum,
);
// ...six more calls, one per metric.

MetricValue is defined at model.rs:78-97. Its constructor, MetricValue::known, clamps the input through clamp01, so a faulty component cannot push the composite outside [0, 1].
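The fold described above can be sketched in a few lines. The MetricValue below is a stand-in written for this example, not the type from model.rs, and composite is a hypothetical helper; only the names known, as_option, and contribute_metric and the weighted_sum / effective_weight_sum rule come from the text.

```rust
/// Stand-in for the crate's MetricValue (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq)]
enum MetricValue {
    Known(f64),
    Unknown,
}

impl MetricValue {
    /// Clamp into [0, 1] on construction, mirroring the clamp01 behavior.
    fn known(value: f64) -> Self {
        MetricValue::Known(value.clamp(0.0, 1.0))
    }

    fn as_option(self) -> Option<f64> {
        match self {
            MetricValue::Known(v) => Some(v),
            MetricValue::Unknown => None,
        }
    }
}

/// An unknown metric adds neither value nor weight.
fn contribute_metric(
    value: Option<f64>,
    weight: f64,
    weighted_sum: &mut f64,
    effective_weight_sum: &mut f64,
) {
    if let Some(v) = value {
        *weighted_sum += weight * v;
        *effective_weight_sum += weight;
    }
}

/// Hypothetical helper: fold (metric, weight) pairs into a composite.
fn composite(metrics: &[(MetricValue, f64)]) -> MetricValue {
    let (mut weighted_sum, mut effective_weight_sum) = (0.0, 0.0);
    for (value, weight) in metrics {
        contribute_metric(
            value.as_option(),
            *weight,
            &mut weighted_sum,
            &mut effective_weight_sum,
        );
    }
    if effective_weight_sum > 0.0 {
        MetricValue::known(weighted_sum / effective_weight_sum)
    } else {
        MetricValue::Unknown
    }
}
```

Because unknown metrics drop out of both sums, the remaining weights are renormalized rather than diluted by an implicit zero.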

Configuration Knobs

ReputationConfig exposes the tunable parameters that govern history sizing, time decay, target utilization, and incident penalty:

rust

pub struct ReputationConfig {
    pub weights: ReputationWeights,
    pub target_utilization: f64,            // default 0.75
    pub diversity_cap: f64,                 // default 1.0
    pub temporal_decay_half_life_days: u32, // default 30
    pub history_receipt_target: u64,        // default 1_000
    pub history_day_target: u64,            // default 30
    pub incident_penalty: f64,              // default 0.20
    pub trusted_kernel_keys: BTreeSet<String>, // default empty (fail-closed)
}

History window: history depth normalizes against history_receipt_target (default 1000 receipts) and history_day_target (default 30 days). Both targets must be reached, and the active-day ratio must be 1.0, for the metric to saturate.
Temporal decay: per-receipt and per-capability weights follow 2^(-age / half_life) with a 30-day default half-life. A receipt 30 days old contributes half as much as a receipt issued today.
Target utilization: the fit score is 1.0 - |observed - target|. With the default target of 0.75, an agent that uses 75% of its capped grants gets a perfect stewardship score; either underuse or overuse drags it down.
Incident penalty: each time-weighted incident subtracts incident_penalty (default 0.20) from the incident-correlation score. Five recent incidents drive that metric to zero.
Trusted kernel keys: trusted_kernel_keys is the hex-encoded allowlist of kernel signing keys whose receipts count toward a score. Integrity validation in receipt_integrity_valid rejects any receipt whose kernel_key is not in the set, so a receipt from an unrecognized signer is filtered before it reaches any metric.
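Two of the knobs above reduce to one-line formulas. The sketch below restates them as free functions; the function names and the use of Option<f64> for an unavailable incident list are assumptions for illustration, and the time weighting of incidents is taken as an already-computed input.

```rust
/// Stewardship fit: 1.0 at the target, falling off linearly on either side.
fn stewardship_fit_score(average_utilization: f64, target_utilization: f64) -> f64 {
    (1.0 - (average_utilization - target_utilization).abs()).clamp(0.0, 1.0)
}

/// Incident correlation: None in means "no incident data", which stays
/// unknown rather than being scored as zero incidents.
fn incident_correlation_score(
    weighted_incidents: Option<f64>,
    incident_penalty: f64,
) -> Option<f64> {
    weighted_incidents.map(|w| (1.0 - incident_penalty * w).clamp(0.0, 1.0))
}
```

With the default penalty of 0.20, an empty but present incident list scores 1.0, five fully weighted incidents score 0.0, and an absent list produces no score at all.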

trusted_kernel_keys is fail-closed

ReputationConfig::default() ships an empty trusted_kernel_keys set, and an empty set rejects every receipt: the composite collapses to MetricValue::Unknown for every subject. Populate the set with .with_trusted_kernel_keys([kernel_pubkey.to_hex()]) before scoring. The crate emits a tracing::warn! the first time integrity is checked against an empty trust set to surface the misconfiguration.
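The fail-closed behavior follows directly from an allowlist lookup. The sketch below uses a hypothetical Receipt struct with only a kernel_key field and hypothetical helper names; the real check lives in receipt_integrity_valid, whose signature is not shown here.

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal receipt: only the signer key matters here.
struct Receipt {
    kernel_key: String,
}

/// A receipt counts only if its signer is on the allowlist.
fn kernel_key_trusted(receipt: &Receipt, trusted: &BTreeSet<String>) -> bool {
    trusted.contains(&receipt.kernel_key)
}

/// Keep the receipts that pass the allowlist check.
fn filter_trusted<'a>(receipts: &'a [Receipt], trusted: &BTreeSet<String>) -> Vec<&'a Receipt> {
    receipts
        .iter()
        .filter(|r| kernel_key_trusted(r, trusted))
        .collect()
}
```

An empty set contains nothing, so every receipt is filtered out and no metric ever sees evidence.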

Half-life zero disables decay

Setting temporal_decay_half_life_days = 0 assigns weight 1.0 to every observation regardless of age. Useful when you want a flat window and are clipping the corpus yourself.
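A minimal sketch of the decay weight, assuming timestamps are Unix seconds and that an observation dated after now is treated as age zero. The function name decay_weight is hypothetical.

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// Weight = 2^(-age / half_life); a zero half-life disables decay.
fn decay_weight(now: u64, observed_at: u64, half_life_days: u32) -> f64 {
    if half_life_days == 0 {
        return 1.0;
    }
    // Assumption: future-dated observations are clamped to age zero.
    let age_secs = now.saturating_sub(observed_at) as f64;
    let half_life_secs = f64::from(half_life_days) * SECONDS_PER_DAY as f64;
    (-(age_secs / half_life_secs)).exp2()
}
```

With the default 30-day half-life, an observation exactly 30 days old weighs 0.5 and one 60 days old weighs 0.25.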

Output Shape

Defined at model.rs:212-226:

crates/trust/chio-reputation/src/model.rs

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct LocalReputationScorecard {
    pub subject_key: String,
    pub computed_at: u64,
    pub boundary_pressure: BoundaryPressureMetrics,
    pub resource_stewardship: ResourceStewardshipMetrics,
    pub least_privilege: LeastPrivilegeMetrics,
    pub history_depth: HistoryDepthMetrics,
    pub specialization: SpecializationMetrics,
    pub delegation_hygiene: DelegationHygieneMetrics,
    pub reliability: ReliabilityMetrics,
    pub incident_correlation: IncidentCorrelationMetrics,
    pub composite_score: MetricValue,
    pub effective_weight_sum: f64,
}

Each per-metric struct carries observation counts alongside the score, so a downstream policy can reject an attestation when, for example, the reliability score is high but receipts_observed is small.
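A downstream gate of that kind might look like the sketch below. ReliabilityView is a hypothetical projection of the reliability metrics, and the threshold parameters are illustrative; only the receipts_observed field name comes from the text.

```rust
/// Hypothetical projection of the reliability metrics.
struct ReliabilityView {
    score: Option<f64>,
    receipts_observed: u64,
}

/// Accept only a known score that is both high enough and backed by
/// enough observations.
fn accept_reliability(view: &ReliabilityView, min_score: f64, min_receipts: u64) -> bool {
    match view.score {
        Some(score) => score >= min_score && view.receipts_observed >= min_receipts,
        None => false,
    }
}
```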

Reputation Issuance Policy

The downstream caller is the capability authority. Its reputation issuance policy (enforce_reputation_policy, and the read-only inspect_local_reputation_with_read_context, in chio-control-plane/src/issuance/reputation.rs) calls compute_local_scorecard and then translates the composite into a capability-scope ceiling in four steps.

Compute a probationary flag: the subject is probationary while history_depth.receipt_count < probationary_receipt_count or history_depth.span_days < probationary_min_days (defaults 1000 receipts, 30 days).
Cap the composite at probationary_score_ceiling (default 0.60) while probationary; the uncapped composite passes through otherwise.
Resolve a named, operator-configured tier from the policy's tiers list by a stateless score_range lookup. The default HushSpec ladder is probationary, standard, trusted, elevated.
Enforce the tier's max_scope as the ceiling on the capability being issued via enforce_tier_scope. A requested scope that exceeds the ceiling is denied.

chio-control-plane/src/issuance/reputation.rs

let effective_score = scorecard.composite_score.as_option().unwrap_or(0.0);
let effective_score = ceiling
    .filter(|_| probationary)
    .map_or(effective_score, |limit| effective_score.min(limit));
let resolved_tier = issuance_policy
    .and_then(|policy| resolve_tier(policy, effective_score));
// ...
let tier_policy = ReputationTierPolicy {
    name: tier.name,
    score_range: tier.score_range,   // [f64; 2] band this tier claims
    max_scope: tier.max_scope,       // operations, max_invocations, ttl, ...
};
enforce_tier_scope(scope, ttl_seconds, &tier_policy)
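The first three steps can be sketched as pure functions. Everything below is illustrative: the Tier struct, the example_ladder bands, the inclusive range bounds, and the first-match rule are assumptions, since the real bands are operator-configured and the real resolve_tier takes a policy rather than a slice.

```rust
/// Hypothetical tier record: a name and the score band it claims.
#[derive(Debug, Clone, PartialEq)]
struct Tier {
    name: &'static str,
    score_range: [f64; 2],
}

/// Step 1: probationary while either history threshold is unmet.
fn is_probationary(receipt_count: u64, span_days: u64, min_receipts: u64, min_days: u64) -> bool {
    receipt_count < min_receipts || span_days < min_days
}

/// Step 2: an unknown composite counts as 0.0; the ceiling applies only
/// while probationary.
fn effective_score(composite: Option<f64>, probationary: bool, ceiling: Option<f64>) -> f64 {
    let score = composite.unwrap_or(0.0);
    match ceiling {
        Some(limit) if probationary => score.min(limit),
        _ => score,
    }
}

/// Step 3: stateless lookup; the first band containing the score wins
/// (inclusive bounds assumed).
fn resolve_tier(tiers: &[Tier], score: f64) -> Option<&Tier> {
    tiers
        .iter()
        .find(|t| score >= t.score_range[0] && score <= t.score_range[1])
}

/// Illustrative bands only; real bands come from operator policy.
fn example_ladder() -> Vec<Tier> {
    vec![
        Tier { name: "probationary", score_range: [0.0, 0.60] },
        Tier { name: "standard", score_range: [0.60, 0.75] },
        Tier { name: "trusted", score_range: [0.75, 0.90] },
        Tier { name: "elevated", score_range: [0.90, 1.0] },
    ]
}
```

Under these assumed bands, a subject with 180 receipts over 14 days is probationary, so a strong composite is capped at 0.60 and resolves to the lowest band until both history thresholds are met.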

Tier transitions in the wider system are asymmetric (slow promotion, fast demotion), but the score-to-scope translation here is a pure, stateless lookup: the tier is the band the effective composite falls into, and the ceiling is whatever max_scope the operator wrote for that band.

Reputation Feeds and Tiers

The local scorecard above is computed from one tenant's own persisted evidence. A separate composition layer, also in chio-reputation, serves a different, marketplace-facing API: the ReputationFeed / ReputationTier system composes discrete signed deltas, not raw scorecards, into marketplace-visibility tiers tier_0 through tier_3. A ReputationFeed is a pure, deterministic function from a caller-provided observation (for example an arena round or a cross-provider equality check) to a ScoreDelta clamped to [0.0, 1.0]; the kernel never invokes a feed directly. Below the top tier, composition takes the strongest single delta across feeds; tier_3 alone adds the independent, multi-feed requirement described below.

The threshold table is hard-coded in chio-reputation/src/tier.rs so the audit doc can record a stable tier distribution:

crates/trust/chio-reputation/src/tier.rs

pub const TIER_1_THRESHOLD: f64 = 0.50;
pub const TIER_2_THRESHOLD: f64 = 0.75;
pub const TIER_3_THRESHOLD: f64 = 0.90;

/// Every shipped feed must independently clear this value for the
/// publisher to reach the top tier. The Sybil-resistance mitigation.
pub const TIER_3_PER_FEED_THRESHOLD: f64 = 0.80;

Tier	Meaning
`tier_0`	Unproven or new. Default tier: no positive evidence required.
`tier_1`	Composed score clears 0.50 (single strongest feed).
`tier_2`	Composed score clears 0.75 (single strongest feed).
`tier_3`	Top marketplace visibility: multi-feed corroborated, Sybil-resistant by construction.

The top tier requires independent feed coverage. Reaching tier_3 requires clearing a per-feed threshold of 0.80 AND-gated across independent feeds, counted by distinct feed_id, not by how many observations were submitted. No single feed, however favorable, can promote an agent to the top tier on its own: a flood of strong deltas from one feed alone stays capped at tier_2.

See Reputation & Passports for how this tier sits alongside the portable Agent Passport.
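The composition rule can be sketched as follows. The ScoreDelta shape, the compose_tier function, the MIN_INDEPENDENT_FEEDS value of 2, the use of each feed's strongest delta for the per-feed check, and the reading of "clears" as greater-than-or-equal are all assumptions for illustration; tier.rs is the authority on the actual rule.

```rust
use std::collections::BTreeMap;

const TIER_1_THRESHOLD: f64 = 0.50;
const TIER_2_THRESHOLD: f64 = 0.75;
const TIER_3_THRESHOLD: f64 = 0.90;
const TIER_3_PER_FEED_THRESHOLD: f64 = 0.80;
/// Assumption for this sketch: at least two distinct feeds for tier_3.
const MIN_INDEPENDENT_FEEDS: usize = 2;

/// Hypothetical delta shape: which feed produced it and its value.
struct ScoreDelta {
    feed_id: String,
    value: f64,
}

fn compose_tier(deltas: &[ScoreDelta]) -> u8 {
    // Strongest delta per distinct feed_id; repeats from one feed collapse.
    let mut per_feed: BTreeMap<&str, f64> = BTreeMap::new();
    for delta in deltas {
        let value = delta.value.clamp(0.0, 1.0);
        let best = per_feed.entry(delta.feed_id.as_str()).or_insert(0.0);
        if value > *best {
            *best = value;
        }
    }
    // Composed score: strongest single delta across feeds.
    let composed = per_feed.values().copied().fold(0.0, f64::max);
    // AND gate: enough distinct feeds, each clearing the per-feed bar.
    let corroborated = per_feed.len() >= MIN_INDEPENDENT_FEEDS
        && per_feed.values().all(|v| *v >= TIER_3_PER_FEED_THRESHOLD);

    if composed >= TIER_3_THRESHOLD && corroborated {
        3
    } else if composed >= TIER_2_THRESHOLD {
        2
    } else if composed >= TIER_1_THRESHOLD {
        1
    } else {
        0
    }
}
```

Keying the map on feed_id is what makes observation count irrelevant: many strong deltas from one feed still occupy a single entry.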

Behavioral Profile Inputs

The reputation scorecard is one of two signals the platform uses to characterize an agent. The other is the behavioral profile guard at chio-guards/src/behavioral_profile.rs. The guard implements chio_kernel::Guard; its evaluate method (lines 350-366) reads one bounded window of receipts per invocation, calls observe_sample, and returns Verdict::Allow regardless of the anomaly state. Anomaly evidence rides along for downstream scoring.

The four tracked metrics are the BehavioralMetric enum variants at behavioral_profile.rs:57-67:

crates/guards/chio-guards/src/behavioral_profile.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BehavioralMetric {
    /// Total receipts per window.
    CallRate,
    /// Denies per window.
    DenyRate,
    /// Distinct tool names per window.
    UniqueTools,
    /// Approximate parameter entropy per window.
    AvgParameterEntropy,
}

The defaults are public constants at behavioral_profile.rs:46-54:

crates/guards/chio-guards/src/behavioral_profile.rs

pub const DEFAULT_EMA_ALPHA: f64 = 0.2;
pub const DEFAULT_SIGMA_THRESHOLD: f64 = 2.0;
pub const DEFAULT_WINDOW_SECS: u64 = 60;
pub const DEFAULT_BASELINE_MIN_WINDOWS: u64 = 3;

The flagging rule lives in observe_sample at behavioral_profile.rs:259-263: a window is flagged only when the baseline has at least baseline_min_windows prior samples AND |z_score| > sigma_threshold:

crates/guards/chio-guards/src/behavioral_profile.rs

let z = robust_z_score(&entry.state, sample);
let seen_enough = entry.state.sample_count >= self.config.baseline_min_windows;
let anomaly = seen_enough
    && z.map(|z| z.abs() > self.config.sigma_threshold)
        .unwrap_or(false);

The z-score itself uses a Poisson floor (behavioral_profile.rs:337-348): for count metrics the effective standard deviation is max(measured_stddev, sqrt(max(mean, 1))) so that a 50x spike over a steady 10-per-window baseline is still flagged when the EWMA variance happens to be numerically zero. The unit test at behavioral_profile.rs:402-421 pins this behavior. See Kernel · Session for how guards plug into the request lifecycle.
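The floor and the flagging rule can be sketched together. The function names below are hypothetical and take the baseline mean and standard deviation as plain arguments, whereas the real robust_z_score reads them from the profile state and returns an Option.

```rust
/// z-score with a Poisson floor on the standard deviation:
/// effective_stddev = max(measured_stddev, sqrt(max(mean, 1))).
fn poisson_floor_z(mean: f64, measured_stddev: f64, sample: f64) -> f64 {
    let floor = mean.max(1.0).sqrt();
    (sample - mean) / measured_stddev.max(floor)
}

/// Flag only once the baseline is established and |z| exceeds the threshold.
fn is_anomaly(sample_count: u64, baseline_min_windows: u64, z: f64, sigma_threshold: f64) -> bool {
    sample_count >= baseline_min_windows && z.abs() > sigma_threshold
}
```

For a steady baseline of 10 calls per window with zero measured variance, the floor is sqrt(10), roughly 3.16, so a window of 500 calls scores far above the default threshold of 2.0 while a window of 11 calls does not.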

Compliance Score

Sitting alongside the reputation scorecard is the compliance score at chio-kernel/src/compliance_score.rs. It produces an integer from 0 to 1000 that answers a different question: did the agent comply with its policies during the window? The factor weights are public constants at compliance_score.rs:33-41 and sum to COMPLIANCE_SCORE_MAX = 1000:

crates/kernel/chio-kernel/src/compliance_score.rs

pub const WEIGHT_DENY_RATE: u32 = 300;
pub const WEIGHT_REVOCATION: u32 = 300;
pub const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
pub const WEIGHT_POLICY_COVERAGE: u32 = 150;
pub const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;

pub const DEFAULT_ATTESTATION_STALENESS_SECS: u64 = 7_776_000; // 90 days

Factor	Max deduction	Rate driver	Source
Deny rate	300	`deny_receipts / total_receipts` (zero when total is zero)	`compliance_score.rs:285-289`
Revocation	300	`revoked_capabilities / observed_capabilities`; floored to 1.0 when `any_revoked` is set	`compliance_score.rs:292-308`
Velocity anomaly	150	`anomalous_velocity_windows / velocity_windows`	`compliance_score.rs:311-315`
Policy coverage	150	`1 - avg(checkpoint_coverage, lineage_coverage)`; zero when no receipts observed	`compliance_score.rs:318-343`
Attestation freshness	100	`age_secs / attestation_staleness_secs` (default 90 days)	`compliance_score.rs:345-355`

The compliance score has a hard rule the reputation score does not. The default config sets revocation_ceiling = 499 (compliance_score.rs:227-232), and the score function caps the output at that ceiling whenever any capability is revoked (compliance_score.rs:259-263):

crates/kernel/chio-kernel/src/compliance_score.rs

let score = if inputs.any_revoked || inputs.revoked_capabilities > 0 {
    raw_score.min(config.revocation_ceiling)
} else {
    raw_score
};

The unit test revocation_flag_drives_score_below_500 at compliance_score.rs:420-436 pins this acceptance target.
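Putting the factor table and the ceiling together, a deduction-based score can be sketched as below. The ComplianceRates struct, the helper names, and the round-to-nearest conversion of each deduction are assumptions for this example; the rates are taken as already computed from the drivers in the table.

```rust
const COMPLIANCE_SCORE_MAX: u32 = 1000;
const WEIGHT_DENY_RATE: u32 = 300;
const WEIGHT_REVOCATION: u32 = 300;
const WEIGHT_VELOCITY_ANOMALY: u32 = 150;
const WEIGHT_POLICY_COVERAGE: u32 = 150;
const WEIGHT_ATTESTATION_FRESHNESS: u32 = 100;

/// Hypothetical input: each rate is a fraction in [0, 1].
#[derive(Debug, Default, Clone, Copy)]
struct ComplianceRates {
    deny: f64,
    revocation: f64,
    velocity_anomaly: f64,
    coverage_gap: f64,
    attestation_age: f64,
}

/// Deduction for one factor (rounding mode is an assumption).
fn deduction(weight: u32, rate: f64) -> u32 {
    (f64::from(weight) * rate.clamp(0.0, 1.0)).round() as u32
}

fn compliance_score(rates: &ComplianceRates, any_revoked: bool, revocation_ceiling: u32) -> u32 {
    // The any_revoked flag forces the revocation rate to its maximum.
    let revocation_rate = if any_revoked { 1.0 } else { rates.revocation };
    let total = deduction(WEIGHT_DENY_RATE, rates.deny)
        + deduction(WEIGHT_REVOCATION, revocation_rate)
        + deduction(WEIGHT_VELOCITY_ANOMALY, rates.velocity_anomaly)
        + deduction(WEIGHT_POLICY_COVERAGE, rates.coverage_gap)
        + deduction(WEIGHT_ATTESTATION_FRESHNESS, rates.attestation_age);
    let raw = COMPLIANCE_SCORE_MAX.saturating_sub(total);
    // Any revocation caps the result at the configured ceiling.
    if any_revoked || rates.revocation > 0.0 {
        raw.min(revocation_ceiling)
    } else {
        raw
    }
}
```

In this sketch a clean record scores 1000, and a single revocation with everything else clean still lands at the ceiling of 499.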

Worked Example

Take a hypothetical agent subject-1 with a 14-day history, evaluated at now = 1_715_000_000 with the default config:

text

Receipts (180 total):
  168 Allow
  10 Deny (split across 2 policies)
  2 Canceled
  All within the last 14 days

Capabilities issued to subject-1: 4
  - 3 capped at max_invocations
  - 1 with constraint expressions
  - All operations exclude Delegate

Budget usage:
  Average utilization across capped grants: 0.62

Delegations subject-1 issued: 2
  - Both reduce scope vs parent
  - 1 reduces TTL
  - 1 reduces budget

Incident reports: None (incident_reports = None)

Component scores (rounded):

boundary_pressure.deny_ratio ~ 0.06 across two policies, so 1 - 0.06 = 0.94
resource_stewardship.fit_score = 1 - |0.62 - 0.75| = 0.87
least_privilege.score ~ 0.70 (used < granted, with non-trivial constraint and non-delegate factors)
history_depth.score = avg(180/1000, 14/30, activity_ratio) ~ 0.41
specialization.score ~ 0.85 (entropy across used tools)
delegation_hygiene.score = avg(1.0, 0.5, 0.5) = 0.67
reliability.score = 168 / (168 + 2) ~ 0.99 (denies excluded)
incident_correlation = Unknown (excluded from composite)

With incident correlation excluded, the effective weight sum is 0.90. The weighted composite is roughly:

text

composite = (0.20 * 0.94 + 0.10 * 0.87 + 0.15 * 0.70 + 0.10 * 0.41
             + 0.05 * 0.85 + 0.15 * 0.67 + 0.15 * 0.99) / 0.90
          ~ 0.79

That 0.79 is well above the 0.6 underwriting approval floor (see Credit & Underwriting), so the agent qualifies for a granted facility, with terms shaped by the dimensions in the credit scorecard.
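The reliability and delegation-hygiene figures in the example come straight from counts. The sketch below reproduces them with hypothetical helper functions; it uses raw counts in place of time-decayed weights, which is a simplification, and treats an empty denominator as no signal.

```rust
/// Allow share among Allow + Canceled + Incomplete; denies are not inputs.
fn reliability(allow: f64, canceled: f64, incomplete: f64) -> Option<f64> {
    let total = allow + canceled + incomplete;
    if total > 0.0 {
        Some(allow / total)
    } else {
        None
    }
}

/// Average of the scope-, TTL-, and budget-reduction rates.
fn delegation_hygiene(
    delegations: u32,
    scope_reduced: u32,
    ttl_reduced: u32,
    budget_reduced: u32,
) -> Option<f64> {
    if delegations == 0 {
        return None;
    }
    let n = f64::from(delegations);
    let rates = [
        f64::from(scope_reduced) / n,
        f64::from(ttl_reduced) / n,
        f64::from(budget_reduced) / n,
    ];
    Some(rates.iter().sum::<f64>() / 3.0)
}
```

Calling reliability(168.0, 2.0, 0.0) gives roughly 0.99, and delegation_hygiene(2, 2, 1, 1) gives roughly 0.67, matching the component list.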

Privacy and Tenancy

Scores are tenant-scoped by construction. Org A computes scores from Org A's corpus; Org B computes scores from Org B's corpus. The function is identical, but the inputs differ, so the scores differ.

Cross-org sharing happens through Agent Passports: Org A signs a credential carrying its scorecard for an agent, the agent presents the credential to Org B, and Org B applies its own attenuation policy via build_imported_reputation_signal. The default attenuation factor is 0.50, halving any imported composite before it influences a local decision.
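As a rough illustration of attenuation, the sketch below scales an imported composite by a factor before use. The function attenuate_imported is hypothetical and stands in for whatever build_imported_reputation_signal does internally; an absent imported score is assumed to stay absent.

```rust
const DEFAULT_ATTENUATION_FACTOR: f64 = 0.50;

/// Scale an imported composite; both inputs are clamped to [0, 1].
fn attenuate_imported(imported_composite: Option<f64>, factor: f64) -> Option<f64> {
    imported_composite.map(|score| score.clamp(0.0, 1.0) * factor.clamp(0.0, 1.0))
}
```

With the default factor, an imported composite of 0.80 influences the local decision as 0.40.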

No global score

Chio does not define a global reputation score. Each relying party computes, or imports and attenuates, its own score.
