Data-Layer Guards

Four guards in chio-data-guards sit between the kernel and a database tool server. Two pre-invocation gates check the call before it runs (SqlQueryGuard, VectorDbGuard); a third pre-invocation gate uses a dry-run estimate to cap warehouse cost (WarehouseCostGuard); one post-invocation hook reshapes the response before it reaches the agent (QueryResultGuard). See Govern Database Queries for the worked example.

Stability

The crate is at workspace version 0.1.0. The pre-invocation guards are production paths (phase 7.1 / 7.2). The warehouse-cost and result-redaction surfaces ship in the same build, but operators should treat the dry-run extractor and PII redactor as scaffolding until field paths and severities are pinned per deployment.

SqlQueryGuard

Source: crates/chio-data-guards/src/sql_guard.rs. Listens for ToolAction::DatabaseQuery and runs the SQL through sqlparser in the configured dialect. Six enforcement steps run in order; the first failing step denies the call:

  1. Parse error: Deny.
  2. Operation not in operation_allowlist: Deny.
  3. Any referenced table not in table_allowlist: Deny.
  4. Any projected column outside the per-table column allowlist (when configured): Deny. SELECT * is denied if the referenced table has a column allowlist entry.
  5. Canonicalised WHERE clause matches a denylist regex: Deny.
  6. When require_where_for_mutations is set, UPDATE / DELETE without a WHERE clause: Deny.
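
The ordering of the steps above can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the code in sql_guard.rs: ParsedQuery, Policy, Deny, and evaluate are hypothetical names, the query is assumed to be already parsed (None stands for a parse error), and the denylist regex is replaced by a lowercase substring match because the standard library has no regex engine.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical deny reasons, named after the stable codes in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Deny {
    ParseError,
    OperationNotAllowed,
    TableNotAllowed,
    SelectStarDenied,
    ColumnNotAllowed,
    PredicateDenylisted,
    MissingWhereClause,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Op { Select, Insert, Update, Delete }

/// Hypothetical stand-in for what a guard would extract from the parsed AST.
pub struct ParsedQuery {
    pub op: Op,
    pub tables: Vec<String>,
    /// (table, column) pairs; the column "*" stands for SELECT *.
    pub columns: Vec<(String, String)>,
    pub where_clause: Option<String>,
}

pub struct Policy {
    pub operations: HashSet<Op>,
    /// Lowercase table names.
    pub tables: HashSet<String>,
    /// Keyed by lowercase table name.
    pub columns: Option<HashMap<String, Vec<String>>>,
    /// Assumption: lowercase substrings instead of compiled regexes.
    pub denied_predicates: Vec<String>,
    pub require_where_for_mutations: bool,
}

pub fn evaluate(parsed: Option<&ParsedQuery>, policy: &Policy) -> Result<(), Deny> {
    // Step 1: a query that failed to parse is denied outright.
    let q = parsed.ok_or(Deny::ParseError)?;
    // Step 2: operation allowlist.
    if !policy.operations.contains(&q.op) {
        return Err(Deny::OperationNotAllowed);
    }
    // Step 3: every referenced table must be allowlisted (case-insensitive).
    for table in &q.tables {
        if !policy.tables.contains(&table.to_lowercase()) {
            return Err(Deny::TableNotAllowed);
        }
    }
    // Step 4: per-table column allowlist, only for tables that have an entry.
    if let Some(allow) = &policy.columns {
        for (table, col) in &q.columns {
            if let Some(cols) = allow.get(&table.to_lowercase()) {
                if col == "*" {
                    return Err(Deny::SelectStarDenied);
                }
                // A "*" entry allows any named column; comparison is exact here.
                if !cols.iter().any(|c| c == "*" || c == col) {
                    return Err(Deny::ColumnNotAllowed);
                }
            }
        }
    }
    // Step 5: denylisted predicates on the WHERE clause.
    if let Some(w) = &q.where_clause {
        let w = w.to_lowercase();
        if policy.denied_predicates.iter().any(|p| w.contains(p.as_str())) {
            return Err(Deny::PredicateDenylisted);
        }
    }
    // Step 6: mutations must carry a WHERE clause when the knob is set.
    let is_mutation = matches!(q.op, Op::Update | Op::Delete);
    if policy.require_where_for_mutations && is_mutation && q.where_clause.is_none() {
        return Err(Deny::MissingWhereClause);
    }
    Ok(())
}
```

Returning on the first failure keeps the reported reason deterministic: a query that violates several rules always surfaces the earliest one in the list.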

Config

| Knob | Type | Default | Purpose |
|---|---|---|---|
| dialect | SqlDialect | Generic | Generic / Postgres / MySql / Sqlite / MsSql / Snowflake / BigQuery. |
| operation_allowlist | Vec<SqlOperation> | empty (denies) | Select / Insert / Update / Delete / Ddl / Other. |
| table_allowlist | Vec<String> | empty (denies) | Case-insensitive table names. |
| column_allowlist | Option<HashMap<String, Vec<String>>> | None | Per-table projection allowlist. * as a column entry allows any column on that table. |
| denylisted_predicates | Vec<String> | empty | Case-insensitive regex patterns. Compiled with capacity caps. |
| require_where_for_mutations | bool | true | Deny UPDATE / DELETE without a WHERE. |
| allow_all | bool | false | Escape hatch. Logs a warning on construction. Parse errors still deny. |

The denylist regex compiler enforces a hard cap of 64 patterns, 512 chars per pattern, complexity score 96, and per-regex DFA size limit of 1 MiB. A pattern that exceeds any of those is rejected at construction. Invalid regex collapses the guard into a deny-all fallback (with a tracing::warn! record) so a bad config does not silently disable enforcement.
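
A minimal sketch of the two caps that can be checked without a regex engine (pattern count and pattern length) follows. The names check_pattern_caps and PatternError are hypothetical, and counting Unicode scalar values rather than bytes is an assumption; the document only says "chars". The complexity score and the DFA size limit are left out because they depend on the regex engine; the regex crate's RegexBuilder does expose size_limit and dfa_size_limit, which is one plausible place to enforce the latter.

```rust
/// Caps taken from the text above.
pub const MAX_PATTERNS: usize = 64;
pub const MAX_PATTERN_CHARS: usize = 512;

/// Hypothetical error type for a rejected denylist configuration.
#[derive(Debug, PartialEq, Eq)]
pub enum PatternError {
    TooManyPatterns { count: usize },
    PatternTooLong { index: usize, chars: usize },
}

/// Reject the whole pattern list at construction if any cap is exceeded.
pub fn check_pattern_caps(patterns: &[String]) -> Result<(), PatternError> {
    if patterns.len() > MAX_PATTERNS {
        return Err(PatternError::TooManyPatterns { count: patterns.len() });
    }
    for (index, pattern) in patterns.iter().enumerate() {
        // Assumption: the cap counts Unicode scalar values, not bytes.
        let chars = pattern.chars().count();
        if chars > MAX_PATTERN_CHARS {
            return Err(PatternError::PatternTooLong { index, chars });
        }
    }
    Ok(())
}
```

Checking the caps before compiling anything means an oversized configuration is rejected cheaply, before the regex engine spends memory on it.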

Structured deny reasons

SqlGuardDenyReason exposes a stable code per failure: parse_error, operation_not_allowed, table_not_allowed, select_star_denied, column_not_allowed, predicate_denylisted, missing_where_clause, no_config. The kernel returns Verdict::Deny; the reason lands on the receipt and the structured log line.

Computed projections fail closed

When a column allowlist is configured, the parser's ? placeholder for opaque projections (e.g. SELECT lower(ssn) FROM users) triggers Deny. The guard cannot prove the expression stays inside the allowed set without evaluating it, so it does not try.

VectorDbGuard

Source: crates/chio-data-guards/src/vector_guard.rs. Inspects ToolAction::DatabaseQuery whose database identifier or tool name matches a vendor marker (Pinecone, Weaviate, Qdrant, Chroma, Milvus, or the generic vector sentinel) plus the memory-shaped variants. Four policy axes:

  • Collection allowlist. A query whose collection / index / class / store is missing or outside the allowlist: Deny.
  • Namespace scoping. Optional. When set, the call must declare a namespace inside the list. An empty allowlist on a namespaced grant denies every request.
  • Operation class. Reuses SqlOperationClass so the same constraint covers SQL and vector grants. ReadOnly denies any verb in mutating_operations (upsert, insert, update, delete, write, index, reindex, drop, drop_index, create_collection, delete_collection).
  • top_k ceiling. When the grant's Constraint::MaxRowsReturned is set, the call must declare a top_k / topK / k / limit. Missing field with a configured ceiling: Deny.
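
The four axes above can be sketched as one function that denies on the first failing axis. Everything here is illustrative: VectorCall, VectorPolicy, OperationClass, and the reason strings are hypothetical, argument extraction from the tool call is assumed to have happened already, and the sketch also applies the absent-verb rule described under "Read-only without a verb fails closed". Treating a top_k above the ceiling as a deny is an assumption implied by the word "ceiling".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationClass { ReadOnly, ReadWrite, Admin }

/// Hypothetical view of the fields pulled from the call arguments.
#[derive(Debug, Clone, Copy)]
pub struct VectorCall<'a> {
    pub collection: Option<&'a str>,
    pub namespace: Option<&'a str>,
    pub operation: Option<&'a str>,
    pub top_k: Option<u64>,
}

pub struct VectorPolicy {
    pub collection_allowlist: Vec<String>,
    pub namespace_allowlist: Option<Vec<String>>,
    pub operation_class: OperationClass,
    /// Stands in for Constraint::MaxRowsReturned on the grant.
    pub max_rows_returned: Option<u64>,
}

/// Verbs listed in the document as mutating.
const MUTATING_OPERATIONS: &[&str] = &[
    "upsert", "insert", "update", "delete", "write", "index", "reindex",
    "drop", "drop_index", "create_collection", "delete_collection",
];

pub fn evaluate(call: &VectorCall, policy: &VectorPolicy) -> Result<(), &'static str> {
    // Axis 1: collection must be present and allowlisted (case-insensitive).
    match call.collection {
        Some(c) if policy.collection_allowlist.iter().any(|a| a.eq_ignore_ascii_case(c)) => {}
        _ => return Err("collection_not_allowed"),
    }
    // Axis 2: when a namespace allowlist exists, the call must name a member.
    // An empty list therefore denies every request.
    if let Some(allowed) = &policy.namespace_allowlist {
        match call.namespace {
            Some(ns) if allowed.iter().any(|a| a == ns) => {}
            _ => return Err("namespace_not_allowed"),
        }
    }
    // Axis 3: operation class. Below Admin, a missing verb fails closed.
    match (policy.operation_class, call.operation) {
        (OperationClass::Admin, _) => {}
        (_, None) => return Err("operation_missing"),
        (OperationClass::ReadOnly, Some(verb))
            if MUTATING_OPERATIONS.iter().any(|m| m.eq_ignore_ascii_case(verb)) =>
        {
            return Err("operation_not_allowed");
        }
        _ => {}
    }
    // Axis 4: with a ceiling configured, top_k must be declared and within it.
    if let Some(ceiling) = policy.max_rows_returned {
        match call.top_k {
            Some(k) if k <= ceiling => {}
            Some(_) => return Err("top_k_exceeds_ceiling"),
            None => return Err("top_k_missing"),
        }
    }
    Ok(())
}
```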

Config highlights

| Knob | Type | Default | Purpose |
|---|---|---|---|
| vendor_markers | Vec<String> | [vector, pinecone, weaviate, qdrant, chroma, milvus] | Substrings that flag a database / tool name as vector-shaped. |
| collection_allowlist | Vec<String> | empty (denies) | Case-insensitive collection names. |
| namespace_allowlist | Option<Vec<String>> | None | Optional namespace gate. |
| denied_operations | Vec<String> | empty | Hard denylist that wins over the operation class. |
| field_paths | VectorFieldPaths | collection / namespace / operation / top_k | Per-vendor argument key overrides. |
| allow_all | bool | false | Escape hatch. |

Read-only without a verb fails closed

A MemoryWrite-shaped call that omits the operation field would bypass the ReadOnly mutation gate if the guard did not fail closed. It does fail closed: under SqlOperationClass::ReadOnly or ::ReadWrite, an absent verb denies. Any class stricter than Admin requires the operation key to be present.

QueryResultGuard

Source: crates/chio-data-guards/src/result_guard.rs. A post-invocation hook that reshapes the tool response before the agent sees it. Three transforms:

  1. Truncate row arrays to Constraint::MaxRowsReturned from the active scope.
  2. Redact column values whose names are listed in Constraint::ColumnDenylist. Bare names match any table; qualified table.column entries match only that table.
  3. Apply optional PII regex patterns from the guard config (redact_pii_patterns), replacing matches inside any string value with the configured marker.

Config

| Knob | Type | Default | Purpose |
|---|---|---|---|
| redaction_marker | String | "[REDACTED]" | Replacement string for redacted values. |
| redact_pii_patterns | Vec<String> | empty | Case-insensitive regexes scanned over every string value. |
| rows_keys | Vec<String> | [rows, results, records, data] | Keys the guard treats as the row array on the response. |

The guard is fail-closed in spirit: a response that does not look like a row list is returned with every value in its data field replaced by the marker rather than passed through unredacted. Unknown column structures inside a row are also collapsed to the marker. PII regex compilation errors reject guard construction; a deployment cannot ship a guard with an unparseable pattern.
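
The first two transforms (row truncation and column redaction) can be sketched with std-only Rust as follows. This is a simplified illustration, not the code in result_guard.rs: rows are modelled as string maps rather than JSON values, the reshape function and its table parameter are hypothetical, column matching is exact, and the PII regex pass and the non-row fallback are omitted.

```rust
use std::collections::BTreeMap;

/// Simplified row model: column name to string value.
pub type Row = BTreeMap<String, String>;

/// Truncate `rows` to `max_rows`, then replace the value of every denied
/// column with `marker`. Denylist entries are either bare column names
/// (match on any table) or qualified `table.column` names (match only when
/// `table` equals the table the rows came from).
pub fn reshape(
    rows: &mut Vec<Row>,
    table: &str,
    max_rows: Option<usize>,
    column_denylist: &[&str],
    marker: &str,
) {
    // Transform 1: cap the number of rows returned to the agent.
    if let Some(max) = max_rows {
        rows.truncate(max);
    }
    // Transform 2: redact denied columns in the rows that remain.
    for row in rows.iter_mut() {
        for (name, value) in row.iter_mut() {
            let name = name.as_str();
            let denied = column_denylist.iter().any(|entry| match entry.split_once('.') {
                Some((t, c)) => t == table && c == name,
                None => *entry == name,
            });
            if denied {
                *value = marker.to_string();
            }
        }
    }
}
```

Truncating before redacting avoids doing redaction work on rows that will be dropped anyway.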

Pre-invocation registration is a no-op

QueryResultGuard implements Guard for symmetry with the standard pipeline, but its pre-invocation evaluate always returns Allow. The work happens in the PostInvocationHook impl. Operators wire it into PostInvocationPipeline; installing it pre-invocation is harmless and lets the guard ship through the same GuardPipeline::add call without branching the registry.

WarehouseCostGuard

Source: crates/chio-data-guards/src/warehouse_cost_guard.rs. Pre-execution cost ceiling for analytical warehouses (BigQuery, Snowflake, Redshift, Athena, Databricks, Presto, Trino). The guard does not estimate cost itself; it reads a dry-run estimate that the tool server (or an upstream dry-run gate) attaches to the arguments:

json
{
  "query": "SELECT ...",
  "dry_run": {
    "bytes_scanned": 53687091200,
    "estimated_cost_usd": "0.25"
  }
}

Config

| Knob | Type | Default | Purpose |
|---|---|---|---|
| max_bytes_scanned | Option<u64> | None | Hard upper bound on reported scan volume. |
| max_cost_per_query_usd | Option<String> | None | Decimal-string cap on dry-run cost. |
| warehouse_markers | Vec<String> | [bigquery, snowflake, redshift, athena, databricks, presto, trino] | Substrings that flag a request as warehouse-shaped. |
| field_paths | WarehouseCostFieldPaths | dry_run.bytes_scanned and dry_run.estimated_cost_usd | Dot-paths to the dry-run fields. |
| allow_all | bool | false | Escape hatch. |

Failure modes

  • Missing dry-run metadata at the configured path with a ceiling set: Deny.
  • Non-decimal estimated_cost_usd or non-integer bytes_scanned: Deny.
  • Negative byte or cost values: Deny (the parser rejects the minus sign).
  • No ceilings configured and allow_all = false: Deny.

On a permitted call the guard exposes record_cost, which builds a CostDimension::WarehouseQuery record with the reported bytes and cost. Emission is not automatic inside Guard::evaluate because the kernel does not thread mutable receipts through guards; the helper keeps the mapping in one place for the calling integration.
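
The failure modes listed above can be sketched with a float-free decimal comparison. The names below (parse_usd_micros, DryRun, CostPolicy, evaluate, and the reason strings) are hypothetical, the dry-run fields are assumed to be already extracted from the arguments, and three choices are assumptions rather than documented behaviour: costs are compared in micro-dollars with at most six fractional digits, a value equal to the ceiling is allowed, and allow_all short-circuits to allow.

```rust
/// Parse a non-negative decimal string such as "1.00" into micro-USD.
/// Signs, empty parts, non-digits, and more than 6 fractional digits
/// all yield None, which the caller treats as Deny.
pub fn parse_usd_micros(s: &str) -> Option<u64> {
    let (whole, frac) = s.split_once('.').unwrap_or((s, "0"));
    let digits_only = |part: &str| !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit());
    if !digits_only(whole) || !digits_only(frac) || frac.len() > 6 {
        return None;
    }
    // Scale the fractional part to exactly six digits.
    let mut micros: u64 = frac.parse().ok()?;
    for _ in frac.len()..6 {
        micros *= 10;
    }
    whole.parse::<u64>().ok()?.checked_mul(1_000_000)?.checked_add(micros)
}

/// Hypothetical view of the dry-run metadata attached to the arguments.
pub struct DryRun<'a> {
    pub bytes_scanned: Option<u64>,
    pub estimated_cost_usd: Option<&'a str>,
}

pub struct CostPolicy<'a> {
    pub max_bytes_scanned: Option<u64>,
    pub max_cost_per_query_usd: Option<&'a str>,
    pub allow_all: bool,
}

pub fn evaluate(dry_run: Option<&DryRun>, policy: &CostPolicy) -> Result<(), &'static str> {
    if policy.allow_all {
        return Ok(()); // Assumption: the escape hatch skips every check.
    }
    // No ceilings configured and allow_all = false: Deny.
    if policy.max_bytes_scanned.is_none() && policy.max_cost_per_query_usd.is_none() {
        return Err("no_ceiling_configured");
    }
    // Missing dry-run metadata with a ceiling set: Deny.
    let dry_run = dry_run.ok_or("missing_dry_run")?;
    if let Some(max) = policy.max_bytes_scanned {
        let bytes = dry_run.bytes_scanned.ok_or("missing_bytes_scanned")?;
        if bytes > max {
            return Err("bytes_ceiling_exceeded");
        }
    }
    if let Some(max) = policy.max_cost_per_query_usd {
        let max = parse_usd_micros(max).ok_or("invalid_cost_ceiling")?;
        let cost = dry_run.estimated_cost_usd.ok_or("missing_cost")?;
        // Non-decimal or negative cost strings fail to parse: Deny.
        let cost = parse_usd_micros(cost).ok_or("invalid_cost")?;
        if cost > max {
            return Err("cost_ceiling_exceeded");
        }
    }
    Ok(())
}
```

Comparing integer micro-dollars sidesteps floating-point rounding, which matters when the ceiling and the estimate differ by a fraction of a cent.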


HushSpec snippet

policy.yaml
hushspec: "0.1.0"
guards:
  data_layer:
    sql_query:
      dialect: postgres
      operation_allowlist: [select]
      table_allowlist: [orders, customers]
      column_allowlist:
        orders: [id, total, currency, created_at]
        customers: [id, country]
      denylisted_predicates:
        - "(?i)\\bor\\s+1\\s*=\\s*1\\b"
      require_where_for_mutations: true
    vector_db:
      collection_allowlist: [memories, embeddings]
      namespace_allowlist: [tenant-a]
      denied_operations: [drop_index, delete_collection]
    query_result:
      redaction_marker: "[REDACTED]"
      redact_pii_patterns:
        - "[\\w.+-]+@[\\w-]+\\.[\\w.-]+"
        - "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    warehouse_cost:
      max_bytes_scanned: 53687091200    # 50 GiB
      max_cost_per_query_usd: "1.00"

Performance class

| Guard | Hot path | Class |
|---|---|---|
| SqlQueryGuard | SQL parse + AST walk + N regex sweeps on WHERE | O(query length) |
| VectorDbGuard | Set lookups + grant constraint scan | O(C) |
| QueryResultGuard | Row truncate + per-cell regex over response body | O(rows · cells · patterns) |
| WarehouseCostGuard | Two JSON-pointer reads + decimal compare | O(1) |
