Data-Layer Guards
Four guards in chio-data-guards sit between the kernel and a database tool server. Two pre-invocation gates check the call before it runs (SqlQueryGuard, VectorDbGuard); a third pre-invocation gate uses a dry-run estimate to cap warehouse cost (WarehouseCostGuard); and one post-invocation hook reshapes the response before it reaches the agent (QueryResultGuard). See Govern Database Queries for the worked example.
Stability
0.1.0. The pre-invocation guards are production paths (phase 7.1 / 7.2). The warehouse-cost and result-redaction surfaces ship in the same build, but operators should treat the dry-run extractor and PII redactor as scaffold until field paths and severities are pinned per deployment.
SqlQueryGuard
Source: crates/chio-data-guards/src/sql_guard.rs. Listens for ToolAction::DatabaseQuery and runs the SQL through sqlparser in the configured dialect. Six enforcement steps in order:
- Parse error: Deny.
- Operation not in operation_allowlist: Deny.
- Any referenced table not in table_allowlist: Deny.
- Any projected column outside the per-table column allowlist (when configured): Deny. SELECT * is denied if the referenced table has a column allowlist entry.
- Canonicalised WHERE clause matches a denylist regex: Deny.
- When require_where_for_mutations is set, UPDATE / DELETE without a WHERE clause: Deny.
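The ordering above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `ParsedQuery`, `Policy`, `Deny`, and `evaluate` are hypothetical names, the parsed summary stands in for the sqlparser AST walk, all names are assumed to be lower-cased already, tables without a column entry are assumed to be unrestricted, and the WHERE-clause denylist step is left as a comment because it needs a regex engine.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical summary of a parsed statement. The real guard walks the
/// sqlparser AST; names here are assumed to be lower-cased already.
pub struct ParsedQuery {
    pub operation: String,              // "select", "update", ...
    pub tables: Vec<String>,
    pub columns: Vec<(String, String)>, // (table, column); "*" means SELECT *
    pub has_where: bool,
}

/// Hypothetical policy holder mirroring the knobs described in this section.
pub struct Policy {
    pub operations: HashSet<String>,
    pub tables: HashSet<String>,
    pub columns: Option<HashMap<String, Vec<String>>>,
    pub require_where_for_mutations: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Deny {
    ParseError,
    OperationNotAllowed,
    TableNotAllowed,
    SelectStarDenied,
    ColumnNotAllowed,
    MissingWhereClause,
}

/// `parsed` is `None` when the statement failed to parse.
pub fn evaluate(parsed: Option<&ParsedQuery>, policy: &Policy) -> Result<(), Deny> {
    // Step 1: a parse failure denies before any allowlist is consulted.
    let q = parsed.ok_or(Deny::ParseError)?;
    // Step 2: operation allowlist (an empty set denies everything).
    if !policy.operations.contains(&q.operation) {
        return Err(Deny::OperationNotAllowed);
    }
    // Step 3: every referenced table must be allowlisted.
    if q.tables.iter().any(|t| !policy.tables.contains(t)) {
        return Err(Deny::TableNotAllowed);
    }
    // Step 4: per-table column allowlist, when configured.
    if let Some(allow) = &policy.columns {
        for (table, column) in &q.columns {
            // Assumption: tables without an entry are not column-restricted.
            if let Some(allowed) = allow.get(table) {
                if column == "*" {
                    return Err(Deny::SelectStarDenied);
                }
                if !allowed.contains(column) {
                    return Err(Deny::ColumnNotAllowed);
                }
            }
        }
    }
    // Step 5 (WHERE-clause denylist) would run here; omitted in this sketch.
    // Step 6: mutations must carry a WHERE clause when the knob is set.
    let mutation = q.operation == "update" || q.operation == "delete";
    if policy.require_where_for_mutations && mutation && !q.has_where {
        return Err(Deny::MissingWhereClause);
    }
    Ok(())
}
```

Returning on the first failing step keeps the deny reason unambiguous: a query that fails both the table and column checks reports only the table failure.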
Config
| Knob | Type | Default | Purpose |
|---|---|---|---|
| dialect | SqlDialect | Generic | Generic / Postgres / MySql / Sqlite / MsSql / Snowflake / BigQuery. |
| operation_allowlist | Vec<SqlOperation> | empty (denies) | Select / Insert / Update / Delete / Ddl / Other. |
| table_allowlist | Vec<String> | empty (denies) | Case-insensitive table names. |
| column_allowlist | Option<HashMap<String, Vec<String>>> | None | Per-table projection allowlist. * as a column entry allows any column on that table. |
| denylisted_predicates | Vec<String> | empty | Case-insensitive regex patterns. Compiled with capacity caps. |
| require_where_for_mutations | bool | true | Deny UPDATE / DELETE without a WHERE. |
| allow_all | bool | false | Escape hatch. Logs a warning on construction. Parse errors still deny. |
The denylist regex compiler enforces hard caps: 64 patterns, 512 chars per pattern, a complexity score of 96, and a per-regex DFA size limit of 1 MiB. A pattern that exceeds any of those is rejected at construction. Invalid regex collapses the guard into a deny-all fallback (with a tracing::warn! record) so a bad config does not silently disable enforcement.
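The two caps that do not depend on a regex engine, pattern count and pattern length, can be checked before any compilation happens. The following is a minimal sketch under stated assumptions: `check_caps` and `CapError` are hypothetical names, length is counted in Unicode scalar values (the crate may count bytes instead), and the complexity score and DFA size limit are not modelled.

```rust
/// Caps quoted in the text above.
const MAX_PATTERNS: usize = 64;
const MAX_PATTERN_CHARS: usize = 512;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    TooManyPatterns { count: usize },
    PatternTooLong { index: usize, chars: usize },
}

/// Rejects a denylist that exceeds the count or length cap.
/// Assumption: length is measured in chars, not bytes.
pub fn check_caps(patterns: &[String]) -> Result<(), CapError> {
    if patterns.len() > MAX_PATTERNS {
        return Err(CapError::TooManyPatterns { count: patterns.len() });
    }
    for (index, pattern) in patterns.iter().enumerate() {
        let chars = pattern.chars().count();
        if chars > MAX_PATTERN_CHARS {
            return Err(CapError::PatternTooLong { index, chars });
        }
    }
    Ok(())
}
```

Running the cheap checks first means an oversized config is refused without spending any compile time on it.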
Structured deny reasons
SqlGuardDenyReason exposes a stable code per failure: parse_error, operation_not_allowed, table_not_allowed, select_star_denied, column_not_allowed, predicate_denylisted, missing_where_clause, no_config. The kernel returns Verdict::Deny; the reason lands on the receipt and the structured log line.
Computed projections fail closed
A ? placeholder for opaque projections (e.g. SELECT lower(ssn) FROM users) triggers Deny. The guard cannot prove the expression stays inside the allowed set without evaluating it, so it does not try.
VectorDbGuard
Source: crates/chio-data-guards/src/vector_guard.rs. Inspects ToolAction::DatabaseQuery whose database identifier or tool name matches a vendor marker (Pinecone, Weaviate, Qdrant, Chroma, Milvus, or the generic vector sentinel) plus the memory-shaped variants. Four policy axes:
- Collection allowlist. A query whose collection / index / class / store is missing or outside the allowlist: Deny.
- Namespace scoping. Optional. When set, the call must declare a namespace inside the list. An empty allowlist on a namespaced grant denies every request.
- Operation class. Reuses SqlOperationClass so the same constraint covers SQL and vector grants. ReadOnly denies any verb in mutating_operations (upsert, insert, update, delete, write, index, reindex, drop, drop_index, create_collection, delete_collection).
- top_k ceiling. When the grant's Constraint::MaxRowsReturned is set, the call must declare a top_k / topK / k / limit. Missing field with a configured ceiling: Deny.
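The operation-class and top_k axes above can be sketched as two small functions. This is an illustration under assumptions, not the guard's code: `OperationClass` is a hypothetical stand-in for SqlOperationClass, `VectorDeny`, `check_operation`, and `check_top_k` are invented names, verbs are compared case-insensitively, arguments are modelled as a flat map of integers, and a declared value above the ceiling is assumed to deny. The denied_operations list and the verbs that ReadWrite itself rejects are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for SqlOperationClass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationClass {
    ReadOnly,
    ReadWrite,
    Admin,
}

#[derive(Debug, PartialEq, Eq)]
pub enum VectorDeny {
    MissingOperation,
    MutationUnderReadOnly,
    MissingTopK,
    TopKTooLarge,
}

/// Verbs listed under mutating_operations in the text above.
const MUTATING: [&str; 11] = [
    "upsert", "insert", "update", "delete", "write", "index", "reindex",
    "drop", "drop_index", "create_collection", "delete_collection",
];

/// Argument keys accepted for the result-count field.
const TOP_K_KEYS: [&str; 4] = ["top_k", "topK", "k", "limit"];

/// An absent verb denies under ReadOnly and ReadWrite; ReadOnly also
/// denies every mutating verb.
pub fn check_operation(class: OperationClass, verb: Option<&str>) -> Result<(), VectorDeny> {
    match (class, verb) {
        (OperationClass::Admin, _) => Ok(()),
        (_, None) => Err(VectorDeny::MissingOperation),
        (OperationClass::ReadOnly, Some(v))
            if MUTATING.iter().any(|m| m.eq_ignore_ascii_case(v)) =>
        {
            Err(VectorDeny::MutationUnderReadOnly)
        }
        _ => Ok(()),
    }
}

/// With a ceiling configured, the call must declare one of the accepted
/// keys and stay at or below the ceiling.
pub fn check_top_k(args: &HashMap<String, u64>, ceiling: Option<u64>) -> Result<(), VectorDeny> {
    let max = match ceiling {
        Some(m) => m,
        None => return Ok(()),
    };
    match TOP_K_KEYS.iter().find_map(|k| args.get(*k).copied()) {
        None => Err(VectorDeny::MissingTopK),
        Some(k) if k > max => Err(VectorDeny::TopKTooLarge),
        Some(_) => Ok(()),
    }
}
```

Treating a missing field as a deny rather than as "no limit requested" is what keeps an under-specified call from returning an unbounded result set.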
Config highlights
| Knob | Type | Default | Purpose |
|---|---|---|---|
| vendor_markers | Vec<String> | [vector, pinecone, weaviate, qdrant, chroma, milvus] | Substrings that flag a database / tool name as vector-shaped. |
| collection_allowlist | Vec<String> | empty (denies) | Case-insensitive collection names. |
| namespace_allowlist | Option<Vec<String>> | None | Optional namespace gate. |
| denied_operations | Vec<String> | empty | Hard denylist that wins over the operation class. |
| field_paths | VectorFieldPaths | collection / namespace / operation / top_k | Per-vendor argument key overrides. |
| allow_all | bool | false | Escape hatch. |
Read-only without a verb fails closed
A MemoryWrite-shaped call that omits the operation field would bypass the ReadOnly mutation gate unless the guard fails closed. It does: under SqlOperationClass::ReadOnly or ::ReadWrite, an absent verb denies. Any class stricter than Admin requires the operation key to be present.
QueryResultGuard
Source: crates/chio-data-guards/src/result_guard.rs. A post-invocation hook that reshapes the tool response before the agent sees it. Three transforms:
- Truncate row arrays to Constraint::MaxRowsReturned from the active scope.
- Redact column values whose names are listed in Constraint::ColumnDenylist. Bare names match any table; qualified table.column entries match only that table.
- Apply optional PII regex patterns from the guard config (redact_pii_patterns), replacing matches inside any string value with the configured marker.
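The first two transforms above can be illustrated without a regex engine. This is a minimal sketch, assuming rows arrive as string maps for a single known table: `Row`, `is_denied`, and `reshape` are hypothetical names, matching is exact and case-sensitive, and the PII pattern pass is not modelled.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape for this sketch: column name to string value.
pub type Row = BTreeMap<String, String>;

/// A bare entry matches the column on any table; a qualified
/// `table.column` entry matches only that table.
fn is_denied(entry: &str, table: &str, column: &str) -> bool {
    match entry.split_once('.') {
        Some((t, c)) => t == table && c == column,
        None => entry == column,
    }
}

/// Truncates to `max_rows` when set, then replaces every denylisted
/// column value with `marker`.
pub fn reshape(
    mut rows: Vec<Row>,
    table: &str,
    max_rows: Option<usize>,
    denylist: &[&str],
    marker: &str,
) -> Vec<Row> {
    if let Some(max) = max_rows {
        rows.truncate(max);
    }
    for row in &mut rows {
        for (column, value) in row.iter_mut() {
            if denylist.iter().any(|entry| is_denied(entry, table, column)) {
                *value = marker.to_string();
            }
        }
    }
    rows
}
```

Truncating before redacting avoids doing redaction work on rows the agent will never see.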
Config
| Knob | Type | Default | Purpose |
|---|---|---|---|
| redaction_marker | String | "[REDACTED]" | Replacement string for redacted values. |
| redact_pii_patterns | Vec<String> | empty | Case-insensitive regexes scanned over every string value. |
| rows_keys | Vec<String> | [rows, results, records, data] | Keys the guard treats as the row array on the response. |
The guard is fail-closed in spirit: a response that does not look like a row list is returned with every value in its data field replaced by the marker rather than passed through unredacted. Unknown column structures inside a row are also collapsed to the marker. PII regex compilation errors reject guard construction; a deployment cannot ship a guard with an unparseable pattern.
Pre-invocation registration is a no-op
QueryResultGuard implements Guard for symmetry with the standard pipeline, but its pre-invocation evaluate always returns Allow. The work happens in the PostInvocationHook impl. Operators wire it into PostInvocationPipeline; installing it pre-invocation is harmless and lets the guard ship through the same GuardPipeline::add call without branching the registry.
WarehouseCostGuard
Source: crates/chio-data-guards/src/warehouse_cost_guard.rs. Pre-execution cost ceiling for analytical warehouses (BigQuery, Snowflake, Redshift, Athena, Databricks, Presto, Trino). The guard does not estimate cost itself; it reads a dry-run estimate that the tool server (or an upstream dry-run gate) attaches to the arguments:
```json
{
  "query": "SELECT ...",
  "dry_run": {
    "bytes_scanned": 53687091200,
    "estimated_cost_usd": "0.25"
  }
}
```
Config
| Knob | Type | Default | Purpose |
|---|---|---|---|
| max_bytes_scanned | Option<u64> | None | Hard upper bound on reported scan volume. |
| max_cost_per_query_usd | Option<String> | None | Decimal-string cap on dry-run cost. |
| warehouse_markers | Vec<String> | [bigquery, snowflake, redshift, athena, databricks, presto, trino] | Substrings that flag a request as warehouse-shaped. |
| field_paths | WarehouseCostFieldPaths | dry_run.bytes_scanned and dry_run.estimated_cost_usd | Dot-paths to the dry-run fields. |
| allow_all | bool | false | Escape hatch. |
Failure modes
- Missing dry-run metadata at the configured path with a ceiling set: Deny.
- Non-decimal estimated_cost_usd or non-integer bytes_scanned: Deny.
- Negative byte or cost values: Deny (the parser rejects the minus sign).
- No ceilings configured and allow_all = false: Deny.
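The failure modes above can be sketched with integer arithmetic, so that a decimal-string cost is compared without going through floating point. This is an illustration under assumptions, not the crate's parser: `parse_usd_micros`, `parse_bytes`, `Ceilings`, `CostDeny`, and `check` are hypothetical names, both dry-run fields are passed in as raw text, costs are limited to six fractional digits, values equal to a ceiling pass, and allow_all is not modelled.

```rust
/// Parses a non-negative decimal string such as "0.25" into micro-dollars.
/// Signs, exponents, empty parts, and more than six fractional digits are
/// rejected (the six-digit limit is an assumption of this sketch).
pub fn parse_usd_micros(s: &str) -> Option<u128> {
    let (whole, frac) = match s.split_once('.') {
        Some((_, f)) if f.is_empty() => return None,
        Some((w, f)) => (w, f),
        None => (s, ""),
    };
    if whole.is_empty() || frac.len() > 6 {
        return None;
    }
    if !whole.bytes().chain(frac.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let whole: u128 = whole.parse().ok()?;
    // Right-pad the fraction with zeros so "25" becomes 250000 micro-dollars.
    let frac: u128 = format!("{:0<6}", frac).parse().ok()?;
    whole.checked_mul(1_000_000)?.checked_add(frac)
}

/// Accepts digits only, so a leading '-' or '+' is rejected.
fn parse_bytes(s: &str) -> Option<u64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

#[derive(Debug, PartialEq, Eq)]
pub enum CostDeny {
    NoCeilingConfigured,
    MissingDryRun,
    Malformed,
    BytesOverCeiling,
    CostOverCeiling,
}

/// Hypothetical holder for the two ceilings.
pub struct Ceilings<'a> {
    pub max_bytes: Option<u64>,
    pub max_cost_usd: Option<&'a str>,
}

/// `bytes` and `cost` are the raw dry-run fields, `None` when absent.
pub fn check(bytes: Option<&str>, cost: Option<&str>, c: &Ceilings<'_>) -> Result<(), CostDeny> {
    if c.max_bytes.is_none() && c.max_cost_usd.is_none() {
        return Err(CostDeny::NoCeilingConfigured);
    }
    if let Some(max) = c.max_bytes {
        let raw = bytes.ok_or(CostDeny::MissingDryRun)?;
        let seen = parse_bytes(raw).ok_or(CostDeny::Malformed)?;
        if seen > max {
            return Err(CostDeny::BytesOverCeiling);
        }
    }
    if let Some(cap) = c.max_cost_usd {
        let cap = parse_usd_micros(cap).ok_or(CostDeny::Malformed)?;
        let raw = cost.ok_or(CostDeny::MissingDryRun)?;
        let seen = parse_usd_micros(raw).ok_or(CostDeny::Malformed)?;
        if seen > cap {
            return Err(CostDeny::CostOverCeiling);
        }
    }
    Ok(())
}
```

Keeping the cost as a scaled integer means "1.00" and "1.0" compare equal and no rounding error can push a borderline query over or under the cap.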
On a permitted call the guard exposes record_cost, which builds a CostDimension::WarehouseQuery record with the reported bytes and cost. Emission is not automatic inside Guard::evaluate because the kernel does not thread mutable receipts through guards; the helper keeps the mapping in one place for the calling integration.
HushSpec snippet
```yaml
hushspec: "0.1.0"
guards:
  data_layer:
    sql_query:
      dialect: postgres
      operation_allowlist: [select]
      table_allowlist: [orders, customers]
      column_allowlist:
        orders: [id, total, currency, created_at]
        customers: [id, country]
      denylisted_predicates:
        - "(?i)\\bor\\s+1\\s*=\\s*1\\b"
      require_where_for_mutations: true
    vector_db:
      collection_allowlist: [memories, embeddings]
      namespace_allowlist: [tenant-a]
      denied_operations: [drop_index, delete_collection]
    query_result:
      redaction_marker: "[REDACTED]"
      redact_pii_patterns:
        - "[\\w.+-]+@[\\w-]+\\.[\\w.-]+"
        - "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    warehouse_cost:
      max_bytes_scanned: 53687091200 # 50 GiB
      max_cost_per_query_usd: "1.00"
```
Performance class
| Guard | Hot path | Class |
|---|---|---|
| SqlQueryGuard | SQL parse + AST walk + N regex sweeps on WHERE | O(query length) |
| VectorDbGuard | Set lookups + grant constraint scan | O(C) |
| QueryResultGuard | Row truncate + per-cell regex over response body | O(rows · cells · patterns) |
| WarehouseCostGuard | Two JSON-pointer reads + decimal compare | O(1) |
Next steps
- Govern Database Queries walks the SQL guard end to end with a worked HushSpec.
- External Guard Adapters for layering content-safety providers in front of warehouse output.
- Memory Governance for the vector path's caller-side caps.