Govern Database Queries

Why Govern Database Access

Chio governs which tools an agent may invoke. Data guards also check what a tool asks the database to do. A valid query can still cause harm:

Prompt injection writing SQL. A poisoned document convinces the model to emit SELECT * FROM salaries instead of the intended query. The tool call and SQL are valid, but the result exposes the table.
Text-to-SQL emitting destructive writes. The model decides the fastest way to "clean up" is DELETE FROM users with no WHERE clause. The query can delete every row in the table.
Unbounded RAG retrieval. An agent queries a vector index with top_k=10000 against a collection outside its tenant, pulling every embedding into the session.
Warehouse cost bombs. A bad JOIN against a billion-row BigQuery table scans terabytes. The agent spends five dollars of budget on tokens and fifteen thousand dollars on cloud compute.
Graph traversal explosions. A naive MATCH (a)-[*]->(b) on a social graph returns millions of paths and stalls the cluster.
Cache pattern overreach. A session tool with keys scoped to one agent discovers it can run KEYS * and read every other agent's state.

Four guards ship in chio-data-guards today (sql-query, vector-db, warehouse-cost, and the post-invocation query-result guard), covering the SQL, vector, and warehouse risks above. A graph-traversal guard and a cache-key guard are specified in the guards design doc but are not yet implemented (see Graph and Cache Guards below). The three pre-invocation guards sit in the same pipeline as forbidden-path, egress-allowlist, and velocity, and emit structured deny reasons that your receipts capture verbatim.

How It Works

Chio intercepts at the kernel boundary. It never runs inside the database driver and never connects to your data store directly. Your tool server wraps the database, captures the query it is about to execute, and submits it to Chio as tool-call arguments. The guard pipeline parses and enforces the policy before the query runs. If the verdict is allow, the tool server executes. If the verdict is deny, the tool server returns the deny reason to the agent and does not contact the database.


Pre-invocation guards evaluate the submitted query; on allow the tool server executes; on return the post-invocation guard reshapes the result before the agent sees it.

Submit requests as JSON. Each data guard reads a defined set of argument keys and recognizes which requests are its concern from the tool name and the database identifier. Two details are easy to miss. First, the SQL dialect is not a per-call argument: the operator sets it once on the guard's config (SqlGuardConfig.dialect) when constructing the guard. Second, the SQL and warehouse guards see a call only if its tool_name is one the kernel classifies as a database call; the action extractor recognizes sql, query, postgres, mysql, sqlite, snowflake, bigquery, redshift, and a handful of other exact names. Here is a representative submission for a relational query:

json

{
  "tool_name": "sql",
  "arguments": {
    "database": "analytics",
    "query": "SELECT name, email FROM users WHERE tenant_id = 'acme' LIMIT 100"
  }
}

A warehouse submission carries a dry-run estimate so the warehouse-cost guard can price the query before it runs:

json

{
  "tool_name": "bigquery",
  "arguments": {
    "database": "my-project.analytics",
    "query": "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id",
    "dry_run": {
      "bytes_scanned": 52428800,
      "estimated_cost_usd": "0.25"
    }
  }
}

The warehouse-cost guard fires because a warehouse marker matches: here the tool name bigquery contains one. A tool named generically (query) works too, as long as the database value contains a marker such as bigquery or snowflake. If neither the tool name nor the database identifier carries a marker, the guard treats the call as none of its business and passes it through.
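The routing rule above can be sketched in a few lines of Python. This is an illustration only, not the shipped implementation: the function name `matches_marker` is hypothetical, and the case-insensitive substring comparison is an assumption, since the text says only that the tool name or database value "contains" a marker.

```python
# Hypothetical sketch of marker-based routing; not the shipped guard code.
DEFAULT_WAREHOUSE_MARKERS = (
    "bigquery", "snowflake", "redshift", "athena",
    "databricks", "presto", "trino",
)


def matches_marker(tool_name, database, markers=DEFAULT_WAREHOUSE_MARKERS):
    """Return True when the tool name or database identifier contains a marker."""
    # Assumption: matching is a case-insensitive substring test.
    haystacks = (tool_name.lower(), database.lower())
    return any(marker in haystack for marker in markers for haystack in haystacks)
```

With these defaults, `matches_marker("query", "snowflake-prod")` is true because the database value carries a marker, while `matches_marker("query", "analytics")` is false, which corresponds to the pass-through case.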

A vector submission carries the collection, namespace, operation, and top_k:

json

{
  "tool_name": "vector_search",
  "arguments": {
    "database": "pinecone-prod",
    "collection": "product-embeddings",
    "namespace": "production",
    "operation": "query",
    "top_k": 10
  }
}

Each guard reads its own fields, applies its constraints, and either passes the request through or returns a Verdict::Deny with a structured reason (for example TableNotAllowed, TopKExceedsLimit, BytesExceedsLimit). The receipt captures the reason code so operators and auditors can inspect exactly which constraint fired.

Why Chio does not talk to the warehouse directly

The kernel is the trusted mediator and tool servers are inside the sandbox. If Chio called BigQuery to estimate costs, it would need warehouse credentials, violate privilege separation, and perform async I/O on the hot path. The dry-run estimate pattern keeps all external I/O in the tool server, and the receipt log keeps the tool server's self-report auditable.

Pick Your Guard

The four guards split cleanly by engine type. Three run pre-invocation, one runs post-invocation. You typically register all four on the pipeline and each one short-circuits to allow when the request is not its shape (the vector-db guard passes through SQL calls, the sql-query guard passes through vector calls, and so on).

Config shapes, not HushSpec rule blocks

The data-layer guards are not yet wired into HushSpec's rule-block compiler, and neither chio check --policy nor chio run --policy compiles them. You construct each guard's config struct and register it on the pipeline in Rust, as shown in Composing with Other Guards. The YAML blocks below illustrate the shape of those Deserialize config structs (SqlGuardConfig, VectorGuardConfig, WarehouseCostGuardConfig, QueryResultGuardConfig), not policy-file keys.

Engine	Guard	Phase	Key constraints
Relational (Postgres, MySQL, SQL Server, SQLite)	`sql-query`	pre	operation allowlist, table allowlist, column allowlist, predicate denylist, WHERE for mutations
Vector (Pinecone, Qdrant, Weaviate, Milvus, Chroma)	`vector-db`	pre	collection allowlist, namespace allowlist, denied operations, top_k ceiling
Warehouse (BigQuery, Snowflake, Redshift, Athena, Databricks, Presto, Trino)	`warehouse-cost`	pre	max bytes scanned, max cost per query (USD), dry-run required
Any (applied to tool response)	`query-result`	post	row truncation via `MaxRowsReturned`, column redaction via `ColumnDenylist`, PII regex patterns

The post-invocation query-result guard is defense in depth. Pre-invocation guards reason about the query text; query-result reasons about what actually came back. A column the parser missed, a LIMIT the tool server ignored, a PII string that slipped through: the post-invocation pass truncates and redacts before the agent ever sees the response.

Relational SQL

SqlQueryGuard parses SQL with sqlparser and enforces four knobs: operation allowlist, table allowlist, per-table column allowlist, and a regex denylist against the canonicalized WHERE clause. A fifth knob, require_where_for_mutations, defaults to true and denies UPDATE and DELETE without a WHERE clause regardless of any other policy.

The guard is fail-closed. An empty config denies all queries. Parse errors deny, even when allow_all is enabled. SELECT * is denied whenever the referenced table has a column allowlist entry, because the guard cannot prove the expansion stays inside the allowed set.

A representative SqlGuardConfig. It derives Deserialize, so it can be loaded from YAML or built directly in Rust:

yaml

dialect: postgres
operation_allowlist:
  - select
table_allowlist:
  - users
  - orders
  - products
column_allowlist:
  users:
    - id
    - name
    - email
    - created_at
  orders:
    - id
    - user_id
    - total
    - status
  products:
    - "*"
denylisted_predicates:
  - '\bor\s+1\s*=\s*1\b'
  - '\bunion\s+select\b'
require_where_for_mutations: true
allow_all: false

These deny scenarios each correspond to a variant of SqlGuardDenyReason that shows up in the receipt:

sql

-- DENIED: TableNotAllowed
SELECT id, total FROM salaries;

-- DENIED: OperationNotAllowed (DELETE is not in operation_allowlist)
DELETE FROM users WHERE id = 42;

-- DENIED: ColumnNotAllowed (ssn is not on the users allowlist)
SELECT id, ssn FROM users WHERE tenant_id = 'acme';

-- DENIED: SelectStarDenied (users has a column allowlist entry)
SELECT * FROM users;

-- DENIED: PredicateDenylisted (matches OR 1=1 pattern)
SELECT id FROM orders WHERE user_id = 1 OR 1=1;

-- DENIED: MissingWhereClause (require_where_for_mutations = true)
DELETE FROM orders;

-- DENIED: OperationNotAllowed (DDL requires an explicit Ddl entry)
DROP TABLE users;

-- DENIED: ParseError (fail-closed on unparseable SQL)
SELEKT oops;

-- ALLOWED
SELECT id, name, email FROM users WHERE tenant_id = 'acme' LIMIT 100;
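To make the PredicateDenylisted case concrete, here is a minimal Python sketch of a regex denylist applied to a WHERE clause. It uses the two patterns from the config above. The canonical form is an assumption (lowercase with whitespace collapsed); the real guard canonicalizes the parsed WHERE clause, and its exact form is not specified here. The function names are hypothetical.

```python
import re

# Patterns copied from the denylisted_predicates example above.
DENYLISTED_PREDICATES = [
    r"\bor\s+1\s*=\s*1\b",
    r"\bunion\s+select\b",
]


def canonicalize(where_clause):
    # Assumed canonical form: lowercase, runs of whitespace collapsed to one space.
    return " ".join(where_clause.lower().split())


def first_denylisted_predicate(where_clause, patterns=DENYLISTED_PREDICATES):
    """Return the first pattern that matches the canonical WHERE text, else None."""
    canonical = canonicalize(where_clause)
    for pattern in patterns:
        if re.search(pattern, canonical):
            return pattern
    return None
```

Because the patterns are written in lowercase, they only work if the text they run against is lowercased first, which is why the sketch canonicalizes before matching.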

Column restrictions compose with the post-invocation guard. The sql-query guard rejects a SELECT that names a disallowed column. For cases the parser cannot see through (alias chains, computed expressions like lower(ssn), view expansions), the query-result guard runs after the tool returns and redacts any denied column that shows up in the response. The SELECT list is checked before execution and the returned data is redacted afterward.

DELETE and UPDATE without WHERE are always denied

With require_where_for_mutations = true (the default), any mutation that lacks a WHERE clause produces SqlGuardDenyReason::MissingWhereClause before the query reaches the database. The safety net is independent of the operation allowlist. Even a capability with delete on the allowlist cannot issue a table-wide DELETE.

Vector Databases

VectorDbGuard covers the four risks unique to vector databases: cross-collection access, cross-namespace access, write verbs under a read-only grant, and top_k overreach. It reads four fields from the arguments by configurable JSON paths, each independently overridable via field_paths. The default collection keys are collection, index, class, and store; the default namespace keys are namespace, tenant, and partition; the default operation keys are operation, op, and action; and the default top_k keys are top_k, topK, k, and limit.

The guard recognizes a request as vector-shaped when the database or tool name contains one of the configured vendor markers. The defaults cover vector, pinecone, weaviate, qdrant, chroma, and milvus. Non-vector traffic falls through to allow so the guard composes cleanly in a pipeline that also sees SQL and warehouse traffic.

The VectorGuardConfig shape:

yaml

vendor_markers:
  - pinecone
  - qdrant
  - weaviate
  - milvus
collection_allowlist:
  - product-embeddings
  - faq-embeddings
namespace_allowlist:
  - production
denied_operations:
  - drop_index
  - delete_collection
allow_all: false

The top_k ceiling and read-only enforcement are not part of VectorGuardConfig; the guard reads them from the active capability token's scope constraints. The top_k ceiling is read from the scope's MaxRowsReturned constraint. When a ceiling is configured and the call omits top_k, the guard fails closed with TopKExceedsLimit. A SqlOperationClass::ReadOnly constraint on the scope blocks write verbs (upsert, insert, update, delete, write, index, reindex) regardless of the denied_operations list.

json

// DENIED: CollectionNotAllowed
{ "collection": "internal-hr-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }

// DENIED: NamespaceNotAllowed
{ "collection": "product-embeddings", "namespace": "staging", "operation": "query", "top_k": 10 }

// DENIED: OperationNotAllowed (upsert under SqlOperationClass::ReadOnly)
{ "collection": "product-embeddings", "namespace": "production", "operation": "upsert", "top_k": 10 }

// DENIED: TopKExceedsLimit { requested: 500, max: 50 }
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 500 }

// DENIED: TopKExceedsLimit { requested: u64::MAX, max: 50 } (fail-closed when top_k missing)
{ "collection": "product-embeddings", "namespace": "production", "operation": "query" }

// ALLOWED
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }
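The top_k rule, including the fail-closed treatment of a missing value, can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the returned dict shape are hypothetical, the key order follows the defaults listed earlier, and only the "missing" case is handled (the sketch does not validate non-integer values).

```python
# Default keys the text lists for top_k, checked in order.
TOP_K_KEYS = ("top_k", "topK", "k", "limit")
U64_MAX = 2**64 - 1


def check_top_k(arguments, max_top_k):
    """Return None to allow, or a deny-reason dict when top_k exceeds the ceiling."""
    if max_top_k is None:
        # No MaxRowsReturned ceiling on the scope: nothing to enforce.
        return None
    # A missing top_k is treated as the largest possible request (fail closed).
    requested = next(
        (arguments[key] for key in TOP_K_KEYS if key in arguments),
        U64_MAX,
    )
    if requested > max_top_k:
        return {"code": "top_k_exceeds_limit", "requested": requested, "max": max_top_k}
    return None
```

Treating an absent value as `u64::MAX` means a tool server cannot bypass the ceiling by omitting the field and relying on the vendor's default result count.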

Block embedding exfiltration

Raw vectors enable reconstruction attacks: an attacker with embeddings and the model can recover approximate source text. Keep include_vectors: false on your tool server unless the grant explicitly needs vectors. The vector guard treats collection identity as the primary boundary; the tool server is responsible for stripping vectors from the response before the query-result guard even sees it.

Data Warehouses

Warehouses are especially cost-sensitive. A single bad JOIN can scan terabytes and cost thousands of dollars. WarehouseCostGuard enforces two ceilings, bytes scanned and USD per query, using a dry-run estimate the tool server attaches to every request.

The pattern is straightforward. Your tool server calls the warehouse's dry-run API (BigQuery and Snowflake both support it natively), reads back bytes scanned and estimated cost, and submits both to Chio as dry_run.bytes_scanned and dry_run.estimated_cost_usd. The guard compares the estimate to the configured limits and denies before the query is dispatched.

yaml

# WarehouseCostGuardConfig
max_bytes_scanned: 1073741824        # 1 GiB
max_cost_per_query_usd: "5.00"
warehouse_markers:
  - bigquery
  - snowflake
  - redshift
  - athena
  - databricks
  - presto
  - trino
field_paths:
  bytes_scanned: "dry_run.bytes_scanned"
  estimated_cost_usd: "dry_run.estimated_cost_usd"
allow_all: false

The dry-run flow:

text

1. Agent submits: "summarize orders from last month"
2. Tool server generates SQL:
     SELECT user_id, SUM(amount) FROM analytics.orders
     WHERE order_date >= '2026-03-01' GROUP BY user_id
3. Tool server calls BigQuery dry-run:
     bytes_scanned = 52_428_800      (50 MiB)
     estimated_cost = "0.25"         (25 cents)
4. Tool server submits to chio:
     tool_name = "bigquery"
     arguments = {
       "database": "my-project.analytics",
       "query": "...",
       "dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" }
     }
5. warehouse-cost guard:
     50 MiB < 1 GiB limit  -> OK
     $0.25 < $5.00 limit    -> OK
     verdict: allow
6. Tool server runs the approved query
7. Receipt records CostDimension::WarehouseQuery {
     bytes_scanned: 52428800,
     estimated_cost_usd: "0.25"
   }

Deny scenarios with structured reasons. Each one lands in the receipt as a stable code you can alert on:

json

// DENIED: BytesExceedsLimit { bytes_scanned: 53687091200, limit: 1073741824 }
{ "dry_run": { "bytes_scanned": 53687091200, "estimated_cost_usd": "0.25" } }

// DENIED: CostExceedsLimit { estimated_cost_usd: "25.00", limit_usd: "5.00" }
{ "dry_run": { "bytes_scanned": 5242880, "estimated_cost_usd": "25.00" } }

// DENIED: MissingEstimate { path: "dry_run.bytes_scanned" }
{ "query": "SELECT 1" }

// DENIED: ParseError { error: "dry_run.estimated_cost_usd is not a non-negative decimal string" }
{ "dry_run": { "bytes_scanned": 1024, "estimated_cost_usd": "-5.00" } }

// ALLOWED
{ "dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" } }
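The scenarios above can be reproduced with a short Python sketch. It is an illustration, not the guard's implementation: the function names, the returned dict shape, the `parse_error` code string, and the order of the checks are all assumptions. It uses `Decimal` so that the cost comparison is exact on decimal strings rather than subject to float rounding.

```python
from decimal import Decimal, InvalidOperation

BYTES_PATH = "dry_run.bytes_scanned"
COST_PATH = "dry_run.estimated_cost_usd"


def lookup(arguments, dotted_path):
    """Walk a dotted path through nested dicts; return None if any part is absent."""
    node = arguments
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_cost(arguments, max_bytes_scanned, max_cost_per_query_usd):
    """Return None to allow, or a deny-reason dict (hypothetical shape)."""
    for path in (BYTES_PATH, COST_PATH):
        if lookup(arguments, path) is None:
            return {"code": "missing_estimate", "path": path}
    bytes_scanned = lookup(arguments, BYTES_PATH)
    if not isinstance(bytes_scanned, int) or bytes_scanned < 0:
        return {"code": "parse_error", "path": BYTES_PATH}
    try:
        cost = Decimal(lookup(arguments, COST_PATH))
    except (InvalidOperation, TypeError, ValueError):
        return {"code": "parse_error", "path": COST_PATH}
    if not cost.is_finite() or cost < 0:
        return {"code": "parse_error", "path": COST_PATH}
    if bytes_scanned > max_bytes_scanned:
        return {"code": "bytes_exceeds_limit", "bytes_scanned": bytes_scanned}
    if cost > Decimal(max_cost_per_query_usd):
        return {"code": "cost_exceeds_limit", "estimated_cost_usd": str(cost)}
    return None
```

Called with the limits from the config above (`1073741824` bytes and `"5.00"` USD), the sketch returns a deny reason for each of the four denied payloads and `None` for the allowed one.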

Require a dry-run estimate

If your policy has any cost constraint, make the dry-run estimate mandatory in your tool server. A missing dry_run field denies the request fail-closed with MissingEstimate. Tool servers that cannot dry-run (some flavors of Athena, older Redshift) must refuse to submit queries without an estimate.

Graph and Cache Guards (Proposed)

Not yet implemented

A graph-traversal guard (for Neo4j and Neptune) and a cache-key guard (for Redis and Memcached) are specified in the data-layer guards design doc but do not ship in chio-data-guards today. The crate implements four guards: sql-query, vector-db, warehouse-cost, and the post-invocation query-result guard. The following paragraphs describe planned behavior, not configuration available today.

The design covers two more risk classes. Graphs allow unbounded traversals: a Cypher pattern like MATCH (a)-[*]->(b) can walk every reachable node from every starting point, which on a large graph is effectively a denial-of-service. The proposed graph-traversal guard caps traversal depth and constrains node labels and relationship types.

Caches are low risk per query but high risk for cross-tenant leakage: an agent scoped to session:agent-42:* should not be able to run KEYS * or read another agent's state. The proposed cache-key guard enforces a key-pattern allowlist and blocks administrative commands.

Post-Invocation: Redact and Truncate Results

QueryResultGuard is the one post-invocation guard in the set. It runs after the tool server returns, before the response reaches the agent. Three jobs:

Row truncation. If the active scope has any MaxRowsReturned constraint, the rows array is truncated to the strictest limit across grants. A tool server that ignored the pre-invocation LIMIT still cannot deliver more rows than the policy allows.
Column redaction. Every column on ColumnDenylist is replaced with [REDACTED] (configurable via redaction_marker). Qualified entries like users.email match both flat rows where email is the key and nested rows where users wraps the column.
PII pattern matching. The guard takes a list of regex patterns via redact_pii_patterns. Matches in any string value in the response are replaced with the redaction marker. Invalid patterns are rejected when the guard is constructed (QueryResultGuard::new returns an error), so a typo fails policy loading instead of silently narrowing redaction.

A representative config with common PII patterns:

yaml

# QueryResultGuardConfig
redaction_marker: "[REDACTED]"
rows_keys:
  - rows
  - results
  - records
  - data
redact_pii_patterns:
  - '\b\d{3}-\d{2}-\d{4}\b'                                   # US SSN
  - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'              # credit card
  - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'              # email
  - '\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}'         # US phone

Example input from the tool server:

json

{
  "rows": [
    { "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789" },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321" },
    { "id": 3, "name": "Cam", "email": "cam@example.com", "ssn": "555-12-3456" }
  ]
}

With a scope carrying MaxRowsReturned(2) and ColumnDenylist(["ssn"]), the agent receives:

json

{
  "rows": [
    { "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "[REDACTED]" },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "[REDACTED]" }
  ]
}

The guard looks for the row list under rows, results, records, or data, in that order. A top-level JSON array is treated as the rows list directly. If the response is constrained (a column denylist is active) but the guard cannot find a rows shape, it redacts the entire payload fail-closed.
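The three jobs can be sketched together in Python. This is a simplified illustration under several assumptions: the function name and parameters are hypothetical, only unqualified column names are handled (qualified entries such as users.email are not), and "redacts the entire payload" is modeled as returning the bare marker, which may not match the real output shape.

```python
import re

ROWS_KEYS = ("rows", "results", "records", "data")


def shape_response(response, max_rows=(), column_denylist=(), pii_patterns=(),
                   marker="[REDACTED]"):
    """Truncate rows, redact denied columns, and scrub PII matches in strings."""
    denied = set(column_denylist)
    compiled = [re.compile(p) for p in pii_patterns]
    # The strictest MaxRowsReturned across grants wins; None means no limit.
    limit = min(max_rows) if max_rows else None

    def scrub(value):
        if isinstance(value, str):
            for pattern in compiled:
                value = pattern.sub(lambda _match: marker, value)
            return value
        if isinstance(value, dict):
            return {k: marker if k in denied else scrub(v) for k, v in value.items()}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value

    if isinstance(response, list):
        # A top-level array is treated as the rows list directly.
        return scrub(response[:limit])
    rows_key = None
    if isinstance(response, dict):
        rows_key = next(
            (k for k in ROWS_KEYS if isinstance(response.get(k), list)), None
        )
    if rows_key is None:
        # Constrained but no recognizable rows shape: fail closed (assumed shape).
        return marker if denied else scrub(response)
    shaped = scrub({k: v for k, v in response.items() if k != rows_key})
    shaped[rows_key] = scrub(response[rows_key][:limit])
    return shaped
```

Calling `shape_response(response, max_rows=[2], column_denylist=["ssn"])` on the three-row example input yields the two-row redacted output shown above.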

Use post-invocation redaction

The pre-invocation sql-query guard checks the SELECT list, but table structures change, aliases mask column names, views expand transparently, and tool servers can add metadata columns the agent did not request. Install both guards. If the pre-invocation guard misses a case, the post-invocation guard redacts the response before the agent sees it.

Tool Server Submission Contract

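The guards depend on your tool server submitting structured arguments. Use this request shape so every data guard can read the fields it expects. Rows marked graph or cache describe fields for the proposed guards, which do not ship yet.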

Field	Required for	Purpose
`tool_name`	all	names the call so the kernel classifies it as a database query (`sql`, `query`, `postgres`, `bigquery`, ...) and so warehouse and vendor markers can match; the SQL dialect is a static guard config, not a submitted field
`database`	all	target database, schema, namespace, or cluster identifier
`query`	SQL, warehouse, graph	raw query text for parsing
`collection` / `namespace`	vector	collection path and namespace for scoping
`operation`	vector, cache	verb (`query`, `upsert`, `get`, `set`)
`top_k`	vector	result volume (also `topK`, `k`, `limit`)
`dry_run`	warehouse	`{ bytes_scanned, estimated_cost_usd }`
`key` / `pattern` / `command`	cache	Redis key, SCAN pattern, or command verb
`max_depth` / `node_labels`	graph	traversal depth and label list reported by the driver

A Python tool server that submits a SQL call:

tool_server/sql.py

from chio_sdk import KernelClient

kernel = KernelClient(sidecar_url="http://127.0.0.1:9090")

async def sql_query(tool_call, conn):
    query: str = tool_call.arguments["query"]

    # Submit to chio BEFORE executing. The kernel runs:
    #   velocity -> sql-query -> (vector-db, warehouse-cost, ...) pass through
    # tool_name must be a recognized database tool ("sql", "postgres", ...);
    # the parser dialect is fixed on SqlGuardConfig, not sent per call.
    verdict = await kernel.evaluate(
        tool_name="sql",
        arguments={
            "database": "analytics",
            "query": query,
        },
    )
    if verdict.denied:
        return {"error": verdict.reason, "code": verdict.code}

    rows = await conn.fetch(query)

    # Submit the response for post-invocation shaping.
    shaped = await kernel.inspect_response(
        tool_name="sql",
        response={
            "rows": [dict(r) for r in rows],
        },
    )
    return shaped.value

A TypeScript tool server that submits a warehouse call with dry-run:

tool_server/warehouse.ts

import { KernelClient } from "@chio-protocol/sdk";
import { BigQuery } from "@google-cloud/bigquery";

const kernel = new KernelClient({ sidecarUrl: "http://127.0.0.1:9090" });
const bq = new BigQuery();

export async function warehouseQuery(toolCall: ToolCall) {
  const query = toolCall.arguments.query as string;

  // Run the dry-run first. BigQuery charges zero for dry-run jobs.
  const [dryRunJob] = await bq.createQueryJob({ query, dryRun: true });
  const bytesScanned = Number(dryRunJob.metadata.statistics.totalBytesProcessed);
  // Example on-demand rate in USD per TiB; substitute your actual pricing.
  const usdPerTiB = 5.0;
  const dryRun = {
    bytes_scanned: bytesScanned,
    // Decimal string, as the warehouse-cost guard expects.
    estimated_cost_usd: ((bytesScanned / 1024 ** 4) * usdPerTiB).toFixed(6),
  };

  // tool_name "bigquery" is both a recognized database tool and carries the
  // warehouse marker, so the warehouse-cost guard sees and prices the call.
  const verdict = await kernel.evaluate({
    toolName: "bigquery",
    arguments: {
      database: "my-project.analytics",
      query,
      dry_run: dryRun,
    },
  });
  if (verdict.denied) {
    return { error: verdict.reason, code: verdict.code };
  }

  const [rows] = await bq.query({ query });

  return kernel.inspectResponse({
    toolName: "bigquery",
    response: { rows },
  });
}

Composing with Other Guards

Data-layer guards do not replace the existing guards; they layer on top. You typically register them in a pipeline that already has velocity-guard, data-flow-guard, and egress-allowlist-guard. Cheap guards run first so the pipeline short-circuits quickly on denial.

src/kernel_setup.rs

use chio_guards::GuardPipeline;
use chio_data_guards::{
    QueryResultGuard, QueryResultGuardConfig,
    SqlGuardConfig, SqlQueryGuard,
    VectorDbGuard, VectorGuardConfig,
    WarehouseCostGuard, WarehouseCostGuardConfig,
};

let mut pipeline = GuardPipeline::default_pipeline();

// Data-layer pre-invocation guards.
pipeline.add(Box::new(SqlQueryGuard::new(sql_config())));
pipeline.add(Box::new(VectorDbGuard::new(vector_config())));
pipeline.add(Box::new(WarehouseCostGuard::new(warehouse_config())));

kernel.add_guard(Box::new(pipeline));

// Post-invocation shaping runs separately. QueryResultGuard::new returns
// Result<Self, String>: it rejects invalid or over-broad PII regexes at
// construction so policy loading fails closed.
let result_guard = QueryResultGuard::new(QueryResultGuardConfig {
    redact_pii_patterns: vec![
        r"\b\d{3}-\d{2}-\d{4}\b".into(),
    ],
    ..Default::default()
})
.expect("PII patterns compile");
kernel.add_post_invocation(Box::new(result_guard));

Three compositions you should plan for:

Velocity. Rate-limit queries per minute with velocity-guard. Data-layer guards do comparatively expensive work (SQL parsing, regex matching), so velocity runs first and drops runaway loops before they spend parser time.
Data flow. Session-level bytes-read limits live on data-flow-guard. Data-layer guards enforce per-query ceilings (top_k, MaxBytesScanned); data-flow enforces the cumulative total across every query in the session. Both apply.
Egress allowlist. If your tool server talks to an external database over the network, egress-allowlist-guard ensures it reaches only the approved hosts. The data-layer guards check the query content; the egress guard checks the endpoint. Both apply.

A small HushSpec block that layers all three:

yaml

velocity:
  max_tool_calls_per_window:
    window_seconds: 60
    limit: 120
  max_spend_per_window:
    window_seconds: 3600
    limit_usd: "50.00"

data_flow:
  max_bytes_read_per_session: 104857600        # 100 MiB

egress_allowlist:
  hosts:
    - "bigquery.googleapis.com"
    - "db.internal:5432"
    - "pinecone.io"

The data-layer guard configs are not part of this HushSpec file. Attach them through the Rust pipeline wiring shown above.

Debugging Denials

Each data-guard deny reason is a structured enum with a short stable code. The receipt records the code so you can aggregate denials by reason without log-string parsing. A denied receipt looks like this:

json

{
  "receipt_id": "rcpt_01HXYZ...",
  "tool_name": "sql",
  "server_id": "analytics-db",
  "verdict": "deny",
  "guard": "sql-query",
  "deny_reason": {
    "code": "table_not_allowed",
    "message": "table 'salaries' is not in the allowlist",
    "detail": { "table": "salaries" }
  },
  "metadata": {
    "database": "analytics",
    "query_hash": "sha256:a1b2c3..."
  }
}

The codes to know when triaging a denial:

Guard	Code	What to fix
`sql-query`	`parse_error`	query is malformed or uses syntax the dialect parser rejects; check `dialect` setting
`sql-query`	`table_not_allowed`	add the table to `table_allowlist` or fix the query
`sql-query`	`column_not_allowed`	add the column to the per-table allowlist or remove it from the SELECT
`sql-query`	`select_star_denied`	enumerate columns explicitly; SELECT * is denied when the table has a column allowlist
`sql-query`	`missing_where_clause`	add a WHERE to the UPDATE or DELETE, or disable `require_where_for_mutations` (not recommended)
`sql-query`	`predicate_denylisted`	the WHERE matched a regex in `denylisted_predicates`; rewrite or prune the rule
`vector-db`	`collection_not_allowed`	add the collection to `collection_allowlist`
`vector-db`	`top_k_exceeds_limit`	reduce top_k or raise `MaxRowsReturned` on the grant
`warehouse-cost`	`missing_estimate`	tool server must attach `dry_run.bytes_scanned` and `dry_run.estimated_cost_usd`
`warehouse-cost`	`bytes_exceeds_limit`	query would scan too much; add filters, partition predicates, or raise `max_bytes_scanned`
`warehouse-cost`	`cost_exceeds_limit`	same fix path as bytes, or raise `max_cost_per_query_usd` with approval
all	`no_config`	the guard is registered but has no allowlists; either populate the policy or remove the guard

The query audit receipts guide walks through the query-specific receipt fields and shows how to aggregate denials by code for dashboards and alerting.
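As a small illustration of aggregating by code, the following Python sketch counts denials per guard and reason over a list of receipts shaped like the example above. The function name is hypothetical, and the sketch assumes receipts have already been loaded as dicts; how you export them from your receipt store is outside its scope.

```python
from collections import Counter


def count_denials(receipts):
    """Count denied receipts by (guard, deny_reason.code)."""
    counts = Counter()
    for receipt in receipts:
        if receipt.get("verdict") != "deny":
            continue
        guard = receipt.get("guard", "unknown")
        code = receipt.get("deny_reason", {}).get("code", "unknown")
        counts[(guard, code)] += 1
    return counts
```

`count_denials(receipts).most_common(5)` then gives the five most frequent deny reasons, which is a reasonable starting point for a dashboard panel or an alert threshold.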

Summary

Engine	Guard	Key constraint	Typical receipt field
Postgres, MySQL, SQL Server, SQLite	`sql-query`	`table_allowlist`, `column_allowlist`	`database`, `tables_accessed`, `query_hash`
Pinecone, Qdrant, Weaviate, Milvus	`vector-db`	`collection_allowlist`, `MaxRowsReturned`	`collection`, `namespace`, `top_k`
BigQuery, Snowflake, Redshift, Athena, Databricks, Presto, Trino	`warehouse-cost`	`max_bytes_scanned`, `max_cost_per_query_usd`	`CostDimension::WarehouseQuery` with `bytes_scanned` and `estimated_cost_usd`
any (post-invocation)	`query-result`	`MaxRowsReturned`, `ColumnDenylist`, `redact_pii_patterns`	`rows_returned`, `columns_redacted`

Next Steps

Guards · how the four data-layer guards fit into the broader guard model
Write a Policy · HushSpec syntax for constraints, allowlists, and grant scoping
Custom Guards · add organization-specific logic on top of the built-in four
Native Tool Server · build a tool server that submits the contract these guards expect
Receipts · signed audit records for database queries
Query Audit Receipts · aggregate database receipts for compliance and cost reporting
