Govern Database Queries
Every agent framework ships a database tool. LangChain has SQLDatabaseToolkit. LlamaIndex has NLSQLTableQueryEngine. Custom agents call Postgres, BigQuery, Pinecone, and Redis through tool wrappers. Without chio in the middle, a single text-to-SQL step can exfiltrate a table, mutate production data, or run a warehouse query costing thousands of dollars. This guide walks through the six guards that live in the chio-data-guards crate. They share one submission contract, compose with velocity and data-flow guards, and together give you enforceable policy over SQL, vector, warehouse, graph, and cache access.
Why Govern the Data Layer
Chio already governs which tools an agent may invoke. The data layer adds a second question: what does the tool do once it reaches the database? That question has sharp teeth because a tool server can pass a grammatically valid query that devastates a system:
- Prompt injection writing SQL. A poisoned document convinces the model to emit
SELECT * FROM salariesinstead of the intended query. The tool call is authorized, the SQL is valid, and the table walks out the door. - Text-to-SQL emitting destructive writes. The model decides the fastest way to "clean up" is
DELETE FROM userswith no WHERE clause. Production data vanishes in one step. - Unbounded RAG retrieval. An agent queries a vector index with
top_k=10000against a collection outside its tenant, pulling every embedding into the session. - Warehouse cost bombs. A bad JOIN against a billion-row BigQuery table scans terabytes. The agent spent five dollars of budget on tokens and fifteen thousand dollars on cloud compute.
- Graph traversal explosions. A naive
MATCH (a)-[*]->(b)on a social graph returns millions of paths and stalls the cluster. - Cache pattern overreach. A session tool with keys scoped to one agent discovers it can
KEYS *and read every other agent's state.
The six data-layer guards you get with chio handle each class of risk. They live in chio-data-guards, sit in the same pre-invocation pipeline as forbidden-path, egress-allowlist, and velocity, and emit structured deny reasons that your receipts capture verbatim.
How It Works
Chio intercepts at the kernel boundary. It never runs inside the database driver and never connects to your data store directly. Your tool server wraps the database, captures the query it is about to execute, and submits it to chio as tool-call arguments. The guard pipeline parses and enforces the policy before the query runs. If the verdict is allow, the tool server executes. If the verdict is deny, the tool server returns the deny reason to the agent and nothing hits the database.
Submissions are JSON. The shape you send depends on the engine, but every data-layer guard reads a small, predictable set of keys from the arguments object. Here is a representative submission for a relational query:
{
"tool_name": "sql_query",
"arguments": {
"engine": "postgres",
"database": "analytics",
"query": "SELECT name, email FROM users WHERE tenant_id = 'acme' LIMIT 100"
}
}A warehouse submission carries a dry-run estimate so the warehouse-cost guard can price the query before it runs:
{
"tool_name": "warehouse_query",
"arguments": {
"engine": "bigquery",
"database": "my-project.analytics",
"query": "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id",
"dry_run": {
"bytes_scanned": 52428800,
"estimated_cost_usd": "0.25"
}
}
}A vector submission carries the collection, namespace, operation, and top_k:
{
"tool_name": "vector_search",
"arguments": {
"database": "pinecone-prod",
"collection": "product-embeddings",
"namespace": "production",
"operation": "query",
"top_k": 10
}
}Each guard reads its own fields, applies its constraints, and either passes the request through or returns a Verdict::Deny with a structured reason (for example TableNotAllowed, TopKExceedsLimit, BytesExceedsLimit). The receipt captures the reason code so operators and auditors can inspect exactly which constraint fired.
Why chio does not talk to the warehouse directly
Pick Your Guard
The six guards split cleanly by engine type. Five run pre-invocation, one runs post-invocation. You typically register all six on the pipeline and each one short-circuits to allow when the request is not its shape (thevector-db guard passes through SQL calls, the sql-query guard passes through vector calls, and so on).
| Engine | Guard | Phase | Key constraints |
|---|---|---|---|
| Relational (Postgres, MySQL, SQL Server, SQLite) | sql-query | pre | operation allowlist, table allowlist, column allowlist, predicate denylist, WHERE for mutations |
| Vector (Pinecone, Qdrant, Weaviate, Milvus, Chroma) | vector-db | pre | collection allowlist, namespace allowlist, denied operations, top_k ceiling |
| Warehouse (BigQuery, Snowflake, Redshift, Athena, Databricks) | warehouse-cost | pre | max bytes scanned, max cost per query (USD), dry-run required |
| Graph (Neo4j, Neptune) | graph-traversal | pre | max traversal depth, label allowlist, relationship type allowlist |
| Cache (Redis, Memcached) | cache-key | pre | key pattern allowlist, dangerous command blocklist |
| Any (applied to tool response) | query-result | post | row truncation via MaxRowsReturned, column redaction via ColumnDenylist, PII regex patterns |
The post-invocation query-result guard is defense in depth. Pre-invocation guards reason about the query text; query-result reasons about what actually came back. A column the parser missed, a LIMIT the tool server ignored, a PII string that slipped through: the post-invocation pass truncates and redacts before the agent ever sees the response.
Relational SQL
SqlQueryGuard parses SQL with sqlparser and enforces four knobs: operation allowlist, table allowlist, per-table column allowlist, and a regex denylist against the canonicalized WHERE clause. A fifth knob, require_where_for_mutations, defaults to true and denies UPDATE and DELETE without a WHERE clause regardless of any other policy.
The guard is fail-closed. An empty config denies every query. Parse errors deny, even when allow_all is enabled. SELECT * is denied whenever the referenced table has a column allowlist entry, because the guard cannot prove the expansion stays inside the allowed set.
A typical policy in HushSpec YAML:
sql_query:
dialect: postgres
operation_allowlist:
- select
table_allowlist:
- users
- orders
- products
column_allowlist:
users:
- id
- name
- email
- created_at
orders:
- id
- user_id
- total
- status
products:
- "*"
denylisted_predicates:
- '\bor\s+1\s*=\s*1\b'
- '\bunion\s+select\b'
require_where_for_mutations: true
allow_all: falseHere are the canonical deny scenarios. Each one corresponds to a variant of SqlGuardDenyReason that shows up in the receipt:
-- DENIED: TableNotAllowed
SELECT id, total FROM salaries;
-- DENIED: OperationNotAllowed (DELETE is not in operation_allowlist)
DELETE FROM users WHERE id = 42;
-- DENIED: ColumnNotAllowed (ssn is not on the users allowlist)
SELECT id, ssn FROM users WHERE tenant_id = 'acme';
-- DENIED: SelectStarDenied (users has a column allowlist entry)
SELECT * FROM users;
-- DENIED: PredicateDenylisted (matches OR 1=1 pattern)
SELECT id FROM orders WHERE user_id = 1 OR 1=1;
-- DENIED: MissingWhereClause (require_where_for_mutations = true)
DELETE FROM orders;
-- DENIED: OperationNotAllowed (DDL requires an explicit Ddl entry)
DROP TABLE users;
-- DENIED: ParseError (fail-closed on unparseable SQL)
SELEKT oops;
-- ALLOWED
SELECT id, name, email FROM users WHERE tenant_id = 'acme' LIMIT 100;Column restrictions compose with the post-invocation guard. The sql-query guard rejects a SELECT that names a disallowed column. For cases the parser cannot see through (alias chains, computed expressions like lower(ssn), view expansions), the query-result guard runs after the tool returns and redacts any denied column that shows up in the response. Defense in depth: the SELECT list cannot hide a column from the parser, and even if it could, the post-invocation redaction still fires.
DELETE and UPDATE without WHERE are always denied
require_where_for_mutations = true (the default), any mutation that lacks a WHERE clause produces SqlGuardDenyReason::MissingWhereClause before the query reaches the database. The safety net is independent of the operation allowlist. Even a capability with delete on the allowlist cannot issue a table-wide DELETE.Vector Databases
VectorDbGuard covers the four risks unique to vector databases: cross-collection access, cross-namespace access, write verbs under a read-only grant, and top_k overreach. It reads four fields from the arguments by configurable JSON paths: the default collection keys are collection, index, class, and store; the default top_k keys are top_k, topK, k, and limit.
The guard recognizes a request as vector-shaped when the database or tool name contains one of the configured vendor markers. The defaults cover vector, pinecone, weaviate, qdrant, chroma, and milvus. Non-vector traffic falls through to allow so the guard composes cleanly in a pipeline that also sees SQL and graph traffic.
vector_db:
vendor_markers:
- pinecone
- qdrant
- weaviate
- milvus
collection_allowlist:
- product-embeddings
- faq-embeddings
namespace_allowlist:
- production
denied_operations:
- drop_index
- delete_collection
allow_all: false
grants:
- server_id: pinecone-prod
tool_name: vector_search
constraints:
- type: operation_class
value: read_only
- type: max_rows_returned
value: 50The top_k ceiling is read from the grant's MaxRowsReturned constraint. When a ceiling is configured and the call omits top_k, the guard fails closed with TopKExceedsLimit rather than assuming the caller meant "unlimited." The OperationClass::ReadOnly constraint on the grant blocks write verbs (upsert, insert, update, delete, write, index, reindex) regardless of the denied_operations list.
// DENIED: CollectionNotAllowed
{ "collection": "internal-hr-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }
// DENIED: NamespaceNotAllowed
{ "collection": "product-embeddings", "namespace": "staging", "operation": "query", "top_k": 10 }
// DENIED: OperationNotAllowed (upsert under OperationClass::ReadOnly)
{ "collection": "product-embeddings", "namespace": "production", "operation": "upsert", "top_k": 10 }
// DENIED: TopKExceedsLimit { requested: 500, max: 50 }
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 500 }
// DENIED: TopKExceedsLimit { requested: u64::MAX, max: 50 } (fail-closed when top_k missing)
{ "collection": "product-embeddings", "namespace": "production", "operation": "query" }
// ALLOWED
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }Block embedding exfiltration
include_vectors: false on your tool server unless the grant explicitly needs vectors. The vector guard treats collection identity as the primary boundary; the tool server is responsible for stripping vectors from the response before the query-result guard even sees it.Data Warehouses
Warehouses are the most cost-sensitive data surface. A single bad JOIN can scan terabytes and cost thousands of dollars. WarehouseCostGuard enforces two ceilings · bytes scanned and USD per query · using a dry-run estimate the tool server attaches to every request.
The pattern is straightforward. Your tool server calls the warehouse's dry-run API (BigQuery and Snowflake both support it natively), reads back bytes scanned and estimated cost, and submits both to chio as dry_run.bytes_scanned and dry_run.estimated_cost_usd. The guard compares the estimate to the configured limits and denies before the query runs for real.
warehouse_cost:
max_bytes_scanned: 1073741824 # 1 GiB
max_cost_per_query_usd: "5.00"
warehouse_markers:
- bigquery
- snowflake
- redshift
- athena
- databricks
field_paths:
bytes_scanned: "dry_run.bytes_scanned"
estimated_cost_usd: "dry_run.estimated_cost_usd"
allow_all: falseThe dry-run flow end to end:
1. Agent submits: "summarize orders from last month"
2. Tool server generates SQL:
SELECT user_id, SUM(amount) FROM analytics.orders
WHERE order_date >= '2026-03-01' GROUP BY user_id
3. Tool server calls BigQuery dry-run:
bytes_scanned = 52_428_800 (50 MiB)
estimated_cost = "0.25" (25 cents)
4. Tool server submits to chio:
{
"engine": "bigquery",
"query": "...",
"dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" }
}
5. warehouse-cost guard:
50 MiB < 1 GiB limit -> OK
$0.25 < $5.00 limit -> OK
verdict: allow
6. Tool server runs the real query
7. Receipt records CostDimension::WarehouseQuery {
bytes_scanned: 52428800,
estimated_cost_usd: "0.25"
}Deny scenarios with structured reasons. Each one lands in the receipt as a stable code you can alert on:
// DENIED: BytesExceedsLimit { bytes_scanned: 53687091200, limit: 1073741824 }
{ "dry_run": { "bytes_scanned": 53687091200, "estimated_cost_usd": "0.25" } }
// DENIED: CostExceedsLimit { estimated_cost_usd: "25.00", limit_usd: "5.00" }
{ "dry_run": { "bytes_scanned": 5242880, "estimated_cost_usd": "25.00" } }
// DENIED: MissingEstimate { path: "dry_run.bytes_scanned" }
{ "query": "SELECT 1" }
// DENIED: ParseError { error: "dry_run.estimated_cost_usd is not a non-negative decimal string" }
{ "dry_run": { "bytes_scanned": 1024, "estimated_cost_usd": "-5.00" } }
// ALLOWED
{ "dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" } }Dry-run is the gate, not a suggestion
dry_run field denies the request fail-closed with MissingEstimate. Tool servers that cannot dry-run (some flavors of Athena, older Redshift) must refuse to submit the query rather than passing it through without an estimate.Graph Databases
Graphs enable a class of query that relational databases cannot express as compactly: unbounded traversals. A Cypher pattern like MATCH (a)-[*]->(b) can return every reachable node from every starting point. On a graph with millions of edges this is effectively a denial-of-service, and the agent paid for it accidentally.
GraphTraversalGuard enforces a hard ceiling on traversal depth and an allowlist of node labels and relationship types. It reads structured arguments from the tool server (max_depth, node_labels, relationship_types) and performs a lightweight regex extraction on the Cypher text to catch explicit variable-length patterns like [*1..N].
graph_traversal:
max_traversal_depth: 3
label_allowlist:
- Person
- Organization
- Document
relationship_allowlist:
- KNOWS
- WORKS_AT
- AUTHORED
operation_class: read_onlyDeny scenarios. Every violation reaches the receipt as a structured reason:
DENIED: TraversalDepthExceeded
MATCH (a)-[*]->(b) RETURN b
Reason: unbounded traversal depth, grant permits max 3
DENIED: TraversalDepthExceeded
MATCH (a)-[*1..10]->(b) RETURN b
Reason: traversal depth 10 exceeds maximum of 3
DENIED: LabelNotAllowed
MATCH (a:Secret)-[:KNOWS]->(b) RETURN b
Reason: node label "Secret" not in allowlist
DENIED: RelationshipNotAllowed
MATCH (a)-[:OWNS]->(b) RETURN b
Reason: relationship type "OWNS" not in allowlist
DENIED: OperationNotAllowed
CREATE (n:Person { name: 'Eve' })
Reason: CREATE is Append, grant permits ReadOnly
ALLOWED
MATCH (p:Person)-[:KNOWS*1..3]->(friend) RETURN friend.nameThe Cypher parser is regex-lite by design
max_depth, node_labels, and relationship_types accurately. If neither signal is present, the guard denies. Deep parsing lives in the tool server where the Cypher driver already has the AST.Caches
Caches are lower risk per query (keys are small, no joins, no cost bombs) but high risk for cross-tenant leakage. An agent with scoped access to session:agent-42:* should not be able to run KEYS * or read session:agent-99:state. CacheKeyGuard enforces a key pattern allowlist and an always-denied list of administrative commands.
cache_key:
engine_markers:
- redis
- memcached
key_pattern_allowlist:
- "session:agent-42:*"
- "cache:public:*"
dangerous_commands:
- KEYS
- FLUSHDB
- FLUSHALL
- CONFIG
- DEBUG
- SHUTDOWN
- SLAVEOF
- REPLICAOF
- CLUSTER
- SCRIPT
operation_class: read_onlyThe guard reads three fields from the arguments: key for single-key commands, pattern for SCAN-style lookups, and operation or command for the Redis verb.
DENIED: KeyPatternNotAllowed
key = "session:agent-99:state"
Reason: does not match any allowlist pattern
DENIED: KeyPatternNotAllowed
key = "admin:config"
Reason: does not match any allowlist pattern
DENIED: OperationNotAllowed
operation = "SET", key = "session:agent-42:state"
Reason: SET is Append, grant permits ReadOnly
DENIED: DangerousCommandBlocked
command = "FLUSHDB"
Reason: FLUSHDB is always denied without Admin operation class
DENIED: DangerousCommandBlocked
command = "KEYS", pattern = "*"
Reason: KEYS * is always denied without Admin operation class
ALLOWED
operation = "GET", key = "session:agent-42:state"
ALLOWED
operation = "SCAN", pattern = "session:agent-42:*", count = 100Post-Invocation: Redact and Truncate Results
QueryResultGuard is the one post-invocation guard in the set. It runs after the tool server returns, before the response reaches the agent. Three jobs:
- Row truncation. If the active scope has any
MaxRowsReturnedconstraint, the rows array is truncated to the strictest limit across grants. A tool server that ignored the pre-invocation LIMIT still cannot deliver more rows than the policy allows. - Column redaction. Every column on
ColumnDenylistis replaced with[REDACTED](configurable viaredaction_marker). Qualified entries likeusers.emailmatch both flat rows whereemailis the key and nested rows whereuserswraps the column. - PII pattern matching. The guard takes a list of regex patterns via
redact_pii_patterns. Matches in any string value in the response are replaced with the redaction marker. Invalid patterns are logged and skipped so a typo cannot accidentally widen redaction.
A representative config with common PII patterns:
query_result:
redaction_marker: "[REDACTED]"
rows_keys:
- rows
- results
- records
- data
redact_pii_patterns:
- '\b\d{3}-\d{2}-\d{4}\b' # US SSN
- '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b' # credit card
- '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' # email
- '\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}' # US phoneA concrete before and after. Input from the tool server:
{
"rows": [
{ "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789" },
{ "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321" },
{ "id": 3, "name": "Cam", "email": "cam@example.com", "ssn": "555-12-3456" }
]
}With a scope carrying MaxRowsReturned(2) and ColumnDenylist(["ssn"]), the agent receives:
{
"rows": [
{ "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "[REDACTED]" },
{ "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "[REDACTED]" }
]
}The guard runs on rows, results, records, or data, in that order. A top-level JSON array is treated as the rows list directly. If the response is constrained (a column denylist is active) but the guard cannot find a rows shape, it redacts the entire payload fail-closed rather than passing it through unredacted.
Why post-invocation redaction matters
sql-query guard checks the SELECT list, but table structures change, aliases mask column names, views expand transparently, and tool servers can add metadata columns the agent never requested. The post-invocation guard is the last line of defense. Install both. If the pre-invocation guard misses a case, the post-invocation pass still redacts before the agent sees the response.Tool Server Submission Contract
The guards depend on your tool server submitting structured arguments. This is the contract. Get it right once and every data-layer guard reads the same shape.
| Field | Required for | Purpose |
|---|---|---|
engine | all SQL, warehouse | dialect selection (postgres, mysql, mssql, sqlite, bigquery, snowflake) |
database | all | target database, schema, namespace, or cluster identifier |
query | SQL, warehouse, graph | raw query text for parsing |
collection / namespace | vector | collection path and namespace for scoping |
operation | vector, cache | verb (query, upsert, get, set) |
top_k | vector | result volume (also topK, k, limit) |
dry_run | warehouse | { bytes_scanned, estimated_cost_usd } |
key / pattern / command | cache | Redis key, SCAN pattern, or command verb |
max_depth / node_labels | graph | traversal depth and label list reported by the driver |
A Python tool server that submits a SQL call:
from chio_sdk import KernelClient
kernel = KernelClient(sidecar_url="http://127.0.0.1:9090")
async def sql_query(tool_call, conn):
query: str = tool_call.arguments["query"]
# Submit to chio BEFORE executing. The kernel runs:
# velocity -> sql-query -> (vector-db, warehouse-cost, ...) pass through
verdict = await kernel.evaluate(
tool_name="sql_query",
arguments={
"engine": "postgres",
"database": "analytics",
"query": query,
},
)
if verdict.denied:
return {"error": verdict.reason, "code": verdict.code}
rows = await conn.fetch(query)
# Submit the response for post-invocation shaping.
shaped = await kernel.inspect_response(
tool_name="sql_query",
response={
"rows": [dict(r) for r in rows],
"engine": "postgres",
},
)
return shaped.valueA TypeScript tool server that submits a warehouse call with dry-run:
import { KernelClient } from "@chio-protocol/sdk";
import { BigQuery } from "@google-cloud/bigquery";
const kernel = new KernelClient({ sidecarUrl: "http://127.0.0.1:9090" });
const bq = new BigQuery();
export async function warehouseQuery(toolCall: ToolCall) {
const query = toolCall.arguments.query as string;
// Run the dry-run first. BigQuery charges zero for dry-run jobs.
const [dryRunJob] = await bq.createQueryJob({ query, dryRun: true });
const dryRun = {
bytes_scanned: Number(dryRunJob.metadata.statistics.totalBytesProcessed),
estimated_cost_usd: (
Number(dryRunJob.metadata.statistics.totalBytesProcessed) /
1024 ** 4 *
5.0
).toFixed(6),
};
const verdict = await kernel.evaluate({
toolName: "warehouse_query",
arguments: {
engine: "bigquery",
database: "my-project.analytics",
query,
dry_run: dryRun,
},
});
if (verdict.denied) {
return { error: verdict.reason, code: verdict.code };
}
const [rows] = await bq.query({ query });
return kernel.inspectResponse({
toolName: "warehouse_query",
response: { rows, engine: "bigquery" },
});
}Composing with Other Guards
Data-layer guards do not replace the existing guards; they layer on top. You typically register them in a pipeline that already has velocity-guard, data-flow-guard, and egress-allowlist-guard. Cheap guards run first so the pipeline short-circuits quickly on denial.
use chio_guards::GuardPipeline;
use chio_data_guards::{
QueryResultGuard, QueryResultGuardConfig,
SqlGuardConfig, SqlQueryGuard,
VectorDbGuard, VectorGuardConfig,
WarehouseCostGuard, WarehouseCostGuardConfig,
};
let mut pipeline = GuardPipeline::default_pipeline();
// Data-layer pre-invocation guards.
pipeline.add(Box::new(SqlQueryGuard::new(sql_config())));
pipeline.add(Box::new(VectorDbGuard::new(vector_config())));
pipeline.add(Box::new(WarehouseCostGuard::new(warehouse_config())));
// GraphTraversalGuard and CacheKeyGuard register the same way.
kernel.add_guard(Box::new(pipeline));
// Post-invocation shaping runs separately.
let result_guard = QueryResultGuard::new(QueryResultGuardConfig {
redact_pii_patterns: vec![
r"\b\d{3}-\d{2}-\d{4}\b".into(),
],
..Default::default()
});
kernel.add_post_invocation(Box::new(result_guard));Three compositions you should plan for:
- Velocity. Rate-limit queries per minute with
velocity-guard. Data-layer guards are latency-sensitive (SQL parsing, regex matching), so velocity runs first and drops runaway loops before they spend parser time. - Data flow. Session-level bytes-read limits live on
data-flow-guard. Data-layer guards enforce per-query ceilings (top_k, MaxBytesScanned); data-flow enforces the cumulative total across every query in the session. Both apply. - Egress allowlist. If your tool server talks to an external database over the network,
egress-allowlist-guardensures it reaches only the approved hosts. The data-layer guards check the query content; the egress guard checks the endpoint. Both apply.
A small HushSpec block that layers all three:
velocity:
max_tool_calls_per_window:
window_seconds: 60
limit: 120
max_spend_per_window:
window_seconds: 3600
limit_usd: "50.00"
data_flow:
max_bytes_read_per_session: 104857600 # 100 MiB
egress_allowlist:
hosts:
- "bigquery.googleapis.com"
- "db.internal:5432"
- "pinecone.io"
sql_query:
# ... as above
warehouse_cost:
# ... as above
vector_db:
# ... as aboveDebugging Denials
Every deny reason across the six guards is a structured enum with a short stable code. The receipt captures the code verbatim so you can aggregate denials by reason without log-string parsing. A denied receipt looks like this:
{
"receipt_id": "rcpt_01HXYZ...",
"tool_name": "sql_query",
"server_id": "analytics-db",
"verdict": "deny",
"guard": "sql-query",
"deny_reason": {
"code": "table_not_allowed",
"message": "table 'salaries' is not in the allowlist",
"detail": { "table": "salaries" }
},
"metadata": {
"engine": "postgres",
"database": "analytics",
"query_hash": "sha256:a1b2c3..."
}
}The codes to know when triaging a denial:
| Guard | Code | What to fix |
|---|---|---|
sql-query | parse_error | query is malformed or uses syntax the dialect parser rejects; check dialect setting |
sql-query | table_not_allowed | add the table to table_allowlist or fix the query |
sql-query | column_not_allowed | add the column to the per-table allowlist or remove it from the SELECT |
sql-query | select_star_denied | enumerate columns explicitly; SELECT * is denied when the table has a column allowlist |
sql-query | missing_where_clause | add a WHERE to the UPDATE or DELETE, or disable require_where_for_mutations (not recommended) |
sql-query | predicate_denylisted | the WHERE matched a regex in denylisted_predicates; rewrite or prune the rule |
vector-db | collection_not_allowed | add the collection to collection_allowlist |
vector-db | top_k_exceeds_limit | reduce top_k or raise MaxRowsReturned on the grant |
warehouse-cost | missing_estimate | tool server must attach dry_run.bytes_scanned and dry_run.estimated_cost_usd |
warehouse-cost | bytes_exceeds_limit | query would scan too much; add filters, partition predicates, or raise max_bytes_scanned |
warehouse-cost | cost_exceeds_limit | same fix path as bytes, or raise max_cost_per_query_usd with approval |
| all | no_config | the guard is registered but has no allowlists; either populate the policy or remove the guard |
The query audit receipts guide walks through the query-specific receipt fields and shows how to aggregate denials by code for dashboards and alerting.
Summary
| Engine | Guard | Key constraint | Typical receipt field |
|---|---|---|---|
| Postgres, MySQL, SQL Server, SQLite | sql-query | table_allowlist, column_allowlist | engine, tables_accessed, query_hash |
| Pinecone, Qdrant, Weaviate, Milvus | vector-db | collection_allowlist, MaxRowsReturned | collection, namespace, top_k |
| BigQuery, Snowflake, Redshift, Athena, Databricks | warehouse-cost | max_bytes_scanned, max_cost_per_query_usd | CostDimension::WarehouseQuery with bytes_scanned and estimated_cost_usd |
| Neo4j, Neptune | graph-traversal | max_traversal_depth, label_allowlist | engine, max_depth, labels |
| Redis, Memcached | cache-key | key_pattern_allowlist, dangerous_commands | command, key, pattern |
| any (post-invocation) | query-result | MaxRowsReturned, ColumnDenylist, redact_pii_patterns | rows_returned, columns_redacted |
Next Steps
- Guards · how the six data-layer guards fit into the broader guard model
- Write a Policy · HushSpec syntax for constraints, allowlists, and grant scoping
- Custom Guards · add organization-specific logic on top of the built-in six
- Native Tool Server · build a tool server that submits the contract these guards expect
- Receipts · the signed audit record every database query produces
- Query Audit Receipts · aggregate database receipts for compliance and cost reporting