Chio/Docs

Govern Database Queries

Every agent framework ships a database tool. LangChain has SQLDatabaseToolkit. LlamaIndex has NLSQLTableQueryEngine. Custom agents call Postgres, BigQuery, Pinecone, and Redis through tool wrappers. Without chio in the middle, a single text-to-SQL step can exfiltrate a table, mutate production data, or run a warehouse query costing thousands of dollars. This guide walks through the six guards that live in the chio-data-guards crate. They share one submission contract, compose with velocity and data-flow guards, and together give you enforceable policy over SQL, vector, warehouse, graph, and cache access.


Why Govern the Data Layer

Chio already governs which tools an agent may invoke. The data layer adds a second question: what does the tool do once it reaches the database? That question has sharp teeth because a tool server can pass a grammatically valid query that devastates a system:

  • Prompt injection writing SQL. A poisoned document convinces the model to emit SELECT * FROM salaries instead of the intended query. The tool call is authorized, the SQL is valid, and the table walks out the door.
  • Text-to-SQL emitting destructive writes. The model decides the fastest way to "clean up" is DELETE FROM users with no WHERE clause. Production data vanishes in one step.
  • Unbounded RAG retrieval. An agent queries a vector index with top_k=10000 against a collection outside its tenant, pulling every embedding into the session.
  • Warehouse cost bombs. A bad JOIN against a billion-row BigQuery table scans terabytes. The agent spent five dollars of budget on tokens and fifteen thousand dollars on cloud compute.
  • Graph traversal explosions. A naive MATCH (a)-[*]->(b) on a social graph returns millions of paths and stalls the cluster.
  • Cache pattern overreach. A session tool with keys scoped to one agent discovers it can KEYS * and read every other agent's state.

The six data-layer guards you get with chio handle each class of risk. They live in chio-data-guards, sit in the same pre-invocation pipeline as forbidden-path, egress-allowlist, and velocity, and emit structured deny reasons that your receipts capture verbatim.


How It Works

Chio intercepts at the kernel boundary. It never runs inside the database driver and never connects to your data store directly. Your tool server wraps the database, captures the query it is about to execute, and submits it to chio as tool-call arguments. The guard pipeline parses and enforces the policy before the query runs. If the verdict is allow, the tool server executes. If the verdict is deny, the tool server returns the deny reason to the agent and nothing hits the database.

rendering…
Pre-invocation guards evaluate the submitted query; on allow the tool server executes; on return the post-invocation guard reshapes the result before the agent sees it.

Submissions are JSON. The shape you send depends on the engine, but every data-layer guard reads a small, predictable set of keys from the arguments object. Here is a representative submission for a relational query:

json
{
  "tool_name": "sql_query",
  "arguments": {
    "engine": "postgres",
    "database": "analytics",
    "query": "SELECT name, email FROM users WHERE tenant_id = 'acme' LIMIT 100"
  }
}

A warehouse submission carries a dry-run estimate so the warehouse-cost guard can price the query before it runs:

json
{
  "tool_name": "warehouse_query",
  "arguments": {
    "engine": "bigquery",
    "database": "my-project.analytics",
    "query": "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id",
    "dry_run": {
      "bytes_scanned": 52428800,
      "estimated_cost_usd": "0.25"
    }
  }
}

A vector submission carries the collection, namespace, operation, and top_k:

json
{
  "tool_name": "vector_search",
  "arguments": {
    "database": "pinecone-prod",
    "collection": "product-embeddings",
    "namespace": "production",
    "operation": "query",
    "top_k": 10
  }
}

Each guard reads its own fields, applies its constraints, and either passes the request through or returns a Verdict::Deny with a structured reason (for example TableNotAllowed, TopKExceedsLimit, BytesExceedsLimit). The receipt captures the reason code so operators and auditors can inspect exactly which constraint fired.

Why chio does not talk to the warehouse directly

The kernel is the trusted mediator and tool servers are inside the sandbox. If chio called BigQuery to estimate costs it would need warehouse credentials, violate privilege separation, and perform async I/O on the hot path. The dry-run estimate pattern keeps all external I/O in the tool server and the receipt log keeps the tool server's self-report auditable.

Pick Your Guard

The six guards split cleanly by engine type. Five run pre-invocation, one runs post-invocation. You typically register all six on the pipeline and each one short-circuits to allow when the request is not its shape (thevector-db guard passes through SQL calls, the sql-query guard passes through vector calls, and so on).

EngineGuardPhaseKey constraints
Relational (Postgres, MySQL, SQL Server, SQLite)sql-querypreoperation allowlist, table allowlist, column allowlist, predicate denylist, WHERE for mutations
Vector (Pinecone, Qdrant, Weaviate, Milvus, Chroma)vector-dbprecollection allowlist, namespace allowlist, denied operations, top_k ceiling
Warehouse (BigQuery, Snowflake, Redshift, Athena, Databricks)warehouse-costpremax bytes scanned, max cost per query (USD), dry-run required
Graph (Neo4j, Neptune)graph-traversalpremax traversal depth, label allowlist, relationship type allowlist
Cache (Redis, Memcached)cache-keyprekey pattern allowlist, dangerous command blocklist
Any (applied to tool response)query-resultpostrow truncation via MaxRowsReturned, column redaction via ColumnDenylist, PII regex patterns

The post-invocation query-result guard is defense in depth. Pre-invocation guards reason about the query text; query-result reasons about what actually came back. A column the parser missed, a LIMIT the tool server ignored, a PII string that slipped through: the post-invocation pass truncates and redacts before the agent ever sees the response.


Relational SQL

SqlQueryGuard parses SQL with sqlparser and enforces four knobs: operation allowlist, table allowlist, per-table column allowlist, and a regex denylist against the canonicalized WHERE clause. A fifth knob, require_where_for_mutations, defaults to true and denies UPDATE and DELETE without a WHERE clause regardless of any other policy.

The guard is fail-closed. An empty config denies every query. Parse errors deny, even when allow_all is enabled. SELECT * is denied whenever the referenced table has a column allowlist entry, because the guard cannot prove the expansion stays inside the allowed set.

A typical policy in HushSpec YAML:

yaml
sql_query:
  dialect: postgres
  operation_allowlist:
    - select
  table_allowlist:
    - users
    - orders
    - products
  column_allowlist:
    users:
      - id
      - name
      - email
      - created_at
    orders:
      - id
      - user_id
      - total
      - status
    products:
      - "*"
  denylisted_predicates:
    - '\bor\s+1\s*=\s*1\b'
    - '\bunion\s+select\b'
  require_where_for_mutations: true
  allow_all: false

Here are the canonical deny scenarios. Each one corresponds to a variant of SqlGuardDenyReason that shows up in the receipt:

sql
-- DENIED: TableNotAllowed
SELECT id, total FROM salaries;

-- DENIED: OperationNotAllowed (DELETE is not in operation_allowlist)
DELETE FROM users WHERE id = 42;

-- DENIED: ColumnNotAllowed (ssn is not on the users allowlist)
SELECT id, ssn FROM users WHERE tenant_id = 'acme';

-- DENIED: SelectStarDenied (users has a column allowlist entry)
SELECT * FROM users;

-- DENIED: PredicateDenylisted (matches OR 1=1 pattern)
SELECT id FROM orders WHERE user_id = 1 OR 1=1;

-- DENIED: MissingWhereClause (require_where_for_mutations = true)
DELETE FROM orders;

-- DENIED: OperationNotAllowed (DDL requires an explicit Ddl entry)
DROP TABLE users;

-- DENIED: ParseError (fail-closed on unparseable SQL)
SELEKT oops;

-- ALLOWED
SELECT id, name, email FROM users WHERE tenant_id = 'acme' LIMIT 100;

Column restrictions compose with the post-invocation guard. The sql-query guard rejects a SELECT that names a disallowed column. For cases the parser cannot see through (alias chains, computed expressions like lower(ssn), view expansions), the query-result guard runs after the tool returns and redacts any denied column that shows up in the response. Defense in depth: the SELECT list cannot hide a column from the parser, and even if it could, the post-invocation redaction still fires.

DELETE and UPDATE without WHERE are always denied

With require_where_for_mutations = true (the default), any mutation that lacks a WHERE clause produces SqlGuardDenyReason::MissingWhereClause before the query reaches the database. The safety net is independent of the operation allowlist. Even a capability with delete on the allowlist cannot issue a table-wide DELETE.

Vector Databases

VectorDbGuard covers the four risks unique to vector databases: cross-collection access, cross-namespace access, write verbs under a read-only grant, and top_k overreach. It reads four fields from the arguments by configurable JSON paths: the default collection keys are collection, index, class, and store; the default top_k keys are top_k, topK, k, and limit.

The guard recognizes a request as vector-shaped when the database or tool name contains one of the configured vendor markers. The defaults cover vector, pinecone, weaviate, qdrant, chroma, and milvus. Non-vector traffic falls through to allow so the guard composes cleanly in a pipeline that also sees SQL and graph traffic.

yaml
vector_db:
  vendor_markers:
    - pinecone
    - qdrant
    - weaviate
    - milvus
  collection_allowlist:
    - product-embeddings
    - faq-embeddings
  namespace_allowlist:
    - production
  denied_operations:
    - drop_index
    - delete_collection
  allow_all: false

grants:
  - server_id: pinecone-prod
    tool_name: vector_search
    constraints:
      - type: operation_class
        value: read_only
      - type: max_rows_returned
        value: 50

The top_k ceiling is read from the grant's MaxRowsReturned constraint. When a ceiling is configured and the call omits top_k, the guard fails closed with TopKExceedsLimit rather than assuming the caller meant "unlimited." The OperationClass::ReadOnly constraint on the grant blocks write verbs (upsert, insert, update, delete, write, index, reindex) regardless of the denied_operations list.

json
// DENIED: CollectionNotAllowed
{ "collection": "internal-hr-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }

// DENIED: NamespaceNotAllowed
{ "collection": "product-embeddings", "namespace": "staging", "operation": "query", "top_k": 10 }

// DENIED: OperationNotAllowed (upsert under OperationClass::ReadOnly)
{ "collection": "product-embeddings", "namespace": "production", "operation": "upsert", "top_k": 10 }

// DENIED: TopKExceedsLimit { requested: 500, max: 50 }
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 500 }

// DENIED: TopKExceedsLimit { requested: u64::MAX, max: 50 } (fail-closed when top_k missing)
{ "collection": "product-embeddings", "namespace": "production", "operation": "query" }

// ALLOWED
{ "collection": "product-embeddings", "namespace": "production", "operation": "query", "top_k": 10 }

Block embedding exfiltration

Raw vectors enable reconstruction attacks: an attacker with embeddings and the model can recover approximate source text. Keep include_vectors: false on your tool server unless the grant explicitly needs vectors. The vector guard treats collection identity as the primary boundary; the tool server is responsible for stripping vectors from the response before the query-result guard even sees it.

Data Warehouses

Warehouses are the most cost-sensitive data surface. A single bad JOIN can scan terabytes and cost thousands of dollars. WarehouseCostGuard enforces two ceilings · bytes scanned and USD per query · using a dry-run estimate the tool server attaches to every request.

The pattern is straightforward. Your tool server calls the warehouse's dry-run API (BigQuery and Snowflake both support it natively), reads back bytes scanned and estimated cost, and submits both to chio as dry_run.bytes_scanned and dry_run.estimated_cost_usd. The guard compares the estimate to the configured limits and denies before the query runs for real.

yaml
warehouse_cost:
  max_bytes_scanned: 1073741824        # 1 GiB
  max_cost_per_query_usd: "5.00"
  warehouse_markers:
    - bigquery
    - snowflake
    - redshift
    - athena
    - databricks
  field_paths:
    bytes_scanned: "dry_run.bytes_scanned"
    estimated_cost_usd: "dry_run.estimated_cost_usd"
  allow_all: false

The dry-run flow end to end:

text
1. Agent submits: "summarize orders from last month"
2. Tool server generates SQL:
     SELECT user_id, SUM(amount) FROM analytics.orders
     WHERE order_date >= '2026-03-01' GROUP BY user_id
3. Tool server calls BigQuery dry-run:
     bytes_scanned = 52_428_800      (50 MiB)
     estimated_cost = "0.25"         (25 cents)
4. Tool server submits to chio:
     {
       "engine": "bigquery",
       "query": "...",
       "dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" }
     }
5. warehouse-cost guard:
     50 MiB < 1 GiB limit  -> OK
     $0.25 < $5.00 limit    -> OK
     verdict: allow
6. Tool server runs the real query
7. Receipt records CostDimension::WarehouseQuery {
     bytes_scanned: 52428800,
     estimated_cost_usd: "0.25"
   }

Deny scenarios with structured reasons. Each one lands in the receipt as a stable code you can alert on:

json
// DENIED: BytesExceedsLimit { bytes_scanned: 53687091200, limit: 1073741824 }
{ "dry_run": { "bytes_scanned": 53687091200, "estimated_cost_usd": "0.25" } }

// DENIED: CostExceedsLimit { estimated_cost_usd: "25.00", limit_usd: "5.00" }
{ "dry_run": { "bytes_scanned": 5242880, "estimated_cost_usd": "25.00" } }

// DENIED: MissingEstimate { path: "dry_run.bytes_scanned" }
{ "query": "SELECT 1" }

// DENIED: ParseError { error: "dry_run.estimated_cost_usd is not a non-negative decimal string" }
{ "dry_run": { "bytes_scanned": 1024, "estimated_cost_usd": "-5.00" } }

// ALLOWED
{ "dry_run": { "bytes_scanned": 52428800, "estimated_cost_usd": "0.25" } }

Dry-run is the gate, not a suggestion

If your policy has any cost constraint, make the dry-run estimate mandatory in your tool server. A missing dry_run field denies the request fail-closed with MissingEstimate. Tool servers that cannot dry-run (some flavors of Athena, older Redshift) must refuse to submit the query rather than passing it through without an estimate.

Graph Databases

Graphs enable a class of query that relational databases cannot express as compactly: unbounded traversals. A Cypher pattern like MATCH (a)-[*]->(b) can return every reachable node from every starting point. On a graph with millions of edges this is effectively a denial-of-service, and the agent paid for it accidentally.

GraphTraversalGuard enforces a hard ceiling on traversal depth and an allowlist of node labels and relationship types. It reads structured arguments from the tool server (max_depth, node_labels, relationship_types) and performs a lightweight regex extraction on the Cypher text to catch explicit variable-length patterns like [*1..N].

yaml
graph_traversal:
  max_traversal_depth: 3
  label_allowlist:
    - Person
    - Organization
    - Document
  relationship_allowlist:
    - KNOWS
    - WORKS_AT
    - AUTHORED
  operation_class: read_only

Deny scenarios. Every violation reaches the receipt as a structured reason:

text
DENIED: TraversalDepthExceeded
  MATCH (a)-[*]->(b) RETURN b
  Reason: unbounded traversal depth, grant permits max 3

DENIED: TraversalDepthExceeded
  MATCH (a)-[*1..10]->(b) RETURN b
  Reason: traversal depth 10 exceeds maximum of 3

DENIED: LabelNotAllowed
  MATCH (a:Secret)-[:KNOWS]->(b) RETURN b
  Reason: node label "Secret" not in allowlist

DENIED: RelationshipNotAllowed
  MATCH (a)-[:OWNS]->(b) RETURN b
  Reason: relationship type "OWNS" not in allowlist

DENIED: OperationNotAllowed
  CREATE (n:Person { name: 'Eve' })
  Reason: CREATE is Append, grant permits ReadOnly

ALLOWED
  MATCH (p:Person)-[:KNOWS*1..3]->(friend) RETURN friend.name

The Cypher parser is regex-lite by design

Full Cypher parsers in Rust are immature, and the kernel is latency sensitive. The guard inspects variable-length path patterns with a focused regex and trusts the tool server to report max_depth, node_labels, and relationship_types accurately. If neither signal is present, the guard denies. Deep parsing lives in the tool server where the Cypher driver already has the AST.

Caches

Caches are lower risk per query (keys are small, no joins, no cost bombs) but high risk for cross-tenant leakage. An agent with scoped access to session:agent-42:* should not be able to run KEYS * or read session:agent-99:state. CacheKeyGuard enforces a key pattern allowlist and an always-denied list of administrative commands.

yaml
cache_key:
  engine_markers:
    - redis
    - memcached
  key_pattern_allowlist:
    - "session:agent-42:*"
    - "cache:public:*"
  dangerous_commands:
    - KEYS
    - FLUSHDB
    - FLUSHALL
    - CONFIG
    - DEBUG
    - SHUTDOWN
    - SLAVEOF
    - REPLICAOF
    - CLUSTER
    - SCRIPT
  operation_class: read_only

The guard reads three fields from the arguments: key for single-key commands, pattern for SCAN-style lookups, and operation or command for the Redis verb.

text
DENIED: KeyPatternNotAllowed
  key = "session:agent-99:state"
  Reason: does not match any allowlist pattern

DENIED: KeyPatternNotAllowed
  key = "admin:config"
  Reason: does not match any allowlist pattern

DENIED: OperationNotAllowed
  operation = "SET", key = "session:agent-42:state"
  Reason: SET is Append, grant permits ReadOnly

DENIED: DangerousCommandBlocked
  command = "FLUSHDB"
  Reason: FLUSHDB is always denied without Admin operation class

DENIED: DangerousCommandBlocked
  command = "KEYS", pattern = "*"
  Reason: KEYS * is always denied without Admin operation class

ALLOWED
  operation = "GET", key = "session:agent-42:state"

ALLOWED
  operation = "SCAN", pattern = "session:agent-42:*", count = 100

Post-Invocation: Redact and Truncate Results

QueryResultGuard is the one post-invocation guard in the set. It runs after the tool server returns, before the response reaches the agent. Three jobs:

  • Row truncation. If the active scope has any MaxRowsReturned constraint, the rows array is truncated to the strictest limit across grants. A tool server that ignored the pre-invocation LIMIT still cannot deliver more rows than the policy allows.
  • Column redaction. Every column on ColumnDenylist is replaced with [REDACTED] (configurable via redaction_marker). Qualified entries like users.email match both flat rows where email is the key and nested rows where users wraps the column.
  • PII pattern matching. The guard takes a list of regex patterns via redact_pii_patterns. Matches in any string value in the response are replaced with the redaction marker. Invalid patterns are logged and skipped so a typo cannot accidentally widen redaction.

A representative config with common PII patterns:

yaml
query_result:
  redaction_marker: "[REDACTED]"
  rows_keys:
    - rows
    - results
    - records
    - data
  redact_pii_patterns:
    - '\b\d{3}-\d{2}-\d{4}\b'                                   # US SSN
    - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'              # credit card
    - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'              # email
    - '\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}'         # US phone

A concrete before and after. Input from the tool server:

json
{
  "rows": [
    { "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789" },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321" },
    { "id": 3, "name": "Cam", "email": "cam@example.com", "ssn": "555-12-3456" }
  ]
}

With a scope carrying MaxRowsReturned(2) and ColumnDenylist(["ssn"]), the agent receives:

json
{
  "rows": [
    { "id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "[REDACTED]" },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "ssn": "[REDACTED]" }
  ]
}

The guard runs on rows, results, records, or data, in that order. A top-level JSON array is treated as the rows list directly. If the response is constrained (a column denylist is active) but the guard cannot find a rows shape, it redacts the entire payload fail-closed rather than passing it through unredacted.

Why post-invocation redaction matters

The pre-invocation sql-query guard checks the SELECT list, but table structures change, aliases mask column names, views expand transparently, and tool servers can add metadata columns the agent never requested. The post-invocation guard is the last line of defense. Install both. If the pre-invocation guard misses a case, the post-invocation pass still redacts before the agent sees the response.

Tool Server Submission Contract

The guards depend on your tool server submitting structured arguments. This is the contract. Get it right once and every data-layer guard reads the same shape.

FieldRequired forPurpose
engineall SQL, warehousedialect selection (postgres, mysql, mssql, sqlite, bigquery, snowflake)
databasealltarget database, schema, namespace, or cluster identifier
querySQL, warehouse, graphraw query text for parsing
collection / namespacevectorcollection path and namespace for scoping
operationvector, cacheverb (query, upsert, get, set)
top_kvectorresult volume (also topK, k, limit)
dry_runwarehouse{ bytes_scanned, estimated_cost_usd }
key / pattern / commandcacheRedis key, SCAN pattern, or command verb
max_depth / node_labelsgraphtraversal depth and label list reported by the driver

A Python tool server that submits a SQL call:

tool_server/sql.py
from chio_sdk import KernelClient

kernel = KernelClient(sidecar_url="http://127.0.0.1:9090")

async def sql_query(tool_call, conn):
    query: str = tool_call.arguments["query"]

    # Submit to chio BEFORE executing. The kernel runs:
    #   velocity -> sql-query -> (vector-db, warehouse-cost, ...) pass through
    verdict = await kernel.evaluate(
        tool_name="sql_query",
        arguments={
            "engine": "postgres",
            "database": "analytics",
            "query": query,
        },
    )
    if verdict.denied:
        return {"error": verdict.reason, "code": verdict.code}

    rows = await conn.fetch(query)

    # Submit the response for post-invocation shaping.
    shaped = await kernel.inspect_response(
        tool_name="sql_query",
        response={
            "rows": [dict(r) for r in rows],
            "engine": "postgres",
        },
    )
    return shaped.value

A TypeScript tool server that submits a warehouse call with dry-run:

tool_server/warehouse.ts
import { KernelClient } from "@chio-protocol/sdk";
import { BigQuery } from "@google-cloud/bigquery";

const kernel = new KernelClient({ sidecarUrl: "http://127.0.0.1:9090" });
const bq = new BigQuery();

export async function warehouseQuery(toolCall: ToolCall) {
  const query = toolCall.arguments.query as string;

  // Run the dry-run first. BigQuery charges zero for dry-run jobs.
  const [dryRunJob] = await bq.createQueryJob({ query, dryRun: true });
  const dryRun = {
    bytes_scanned: Number(dryRunJob.metadata.statistics.totalBytesProcessed),
    estimated_cost_usd: (
      Number(dryRunJob.metadata.statistics.totalBytesProcessed) /
      1024 ** 4 *
      5.0
    ).toFixed(6),
  };

  const verdict = await kernel.evaluate({
    toolName: "warehouse_query",
    arguments: {
      engine: "bigquery",
      database: "my-project.analytics",
      query,
      dry_run: dryRun,
    },
  });
  if (verdict.denied) {
    return { error: verdict.reason, code: verdict.code };
  }

  const [rows] = await bq.query({ query });

  return kernel.inspectResponse({
    toolName: "warehouse_query",
    response: { rows, engine: "bigquery" },
  });
}

Composing with Other Guards

Data-layer guards do not replace the existing guards; they layer on top. You typically register them in a pipeline that already has velocity-guard, data-flow-guard, and egress-allowlist-guard. Cheap guards run first so the pipeline short-circuits quickly on denial.

src/kernel_setup.rs
use chio_guards::GuardPipeline;
use chio_data_guards::{
    QueryResultGuard, QueryResultGuardConfig,
    SqlGuardConfig, SqlQueryGuard,
    VectorDbGuard, VectorGuardConfig,
    WarehouseCostGuard, WarehouseCostGuardConfig,
};

let mut pipeline = GuardPipeline::default_pipeline();

// Data-layer pre-invocation guards.
pipeline.add(Box::new(SqlQueryGuard::new(sql_config())));
pipeline.add(Box::new(VectorDbGuard::new(vector_config())));
pipeline.add(Box::new(WarehouseCostGuard::new(warehouse_config())));
// GraphTraversalGuard and CacheKeyGuard register the same way.

kernel.add_guard(Box::new(pipeline));

// Post-invocation shaping runs separately.
let result_guard = QueryResultGuard::new(QueryResultGuardConfig {
    redact_pii_patterns: vec![
        r"\b\d{3}-\d{2}-\d{4}\b".into(),
    ],
    ..Default::default()
});
kernel.add_post_invocation(Box::new(result_guard));

Three compositions you should plan for:

  • Velocity. Rate-limit queries per minute with velocity-guard. Data-layer guards are latency-sensitive (SQL parsing, regex matching), so velocity runs first and drops runaway loops before they spend parser time.
  • Data flow. Session-level bytes-read limits live on data-flow-guard. Data-layer guards enforce per-query ceilings (top_k, MaxBytesScanned); data-flow enforces the cumulative total across every query in the session. Both apply.
  • Egress allowlist. If your tool server talks to an external database over the network, egress-allowlist-guard ensures it reaches only the approved hosts. The data-layer guards check the query content; the egress guard checks the endpoint. Both apply.

A small HushSpec block that layers all three:

yaml
velocity:
  max_tool_calls_per_window:
    window_seconds: 60
    limit: 120
  max_spend_per_window:
    window_seconds: 3600
    limit_usd: "50.00"

data_flow:
  max_bytes_read_per_session: 104857600        # 100 MiB

egress_allowlist:
  hosts:
    - "bigquery.googleapis.com"
    - "db.internal:5432"
    - "pinecone.io"

sql_query:
  # ... as above
warehouse_cost:
  # ... as above
vector_db:
  # ... as above

Debugging Denials

Every deny reason across the six guards is a structured enum with a short stable code. The receipt captures the code verbatim so you can aggregate denials by reason without log-string parsing. A denied receipt looks like this:

json
{
  "receipt_id": "rcpt_01HXYZ...",
  "tool_name": "sql_query",
  "server_id": "analytics-db",
  "verdict": "deny",
  "guard": "sql-query",
  "deny_reason": {
    "code": "table_not_allowed",
    "message": "table 'salaries' is not in the allowlist",
    "detail": { "table": "salaries" }
  },
  "metadata": {
    "engine": "postgres",
    "database": "analytics",
    "query_hash": "sha256:a1b2c3..."
  }
}

The codes to know when triaging a denial:

GuardCodeWhat to fix
sql-queryparse_errorquery is malformed or uses syntax the dialect parser rejects; check dialect setting
sql-querytable_not_allowedadd the table to table_allowlist or fix the query
sql-querycolumn_not_allowedadd the column to the per-table allowlist or remove it from the SELECT
sql-queryselect_star_deniedenumerate columns explicitly; SELECT * is denied when the table has a column allowlist
sql-querymissing_where_clauseadd a WHERE to the UPDATE or DELETE, or disable require_where_for_mutations (not recommended)
sql-querypredicate_denylistedthe WHERE matched a regex in denylisted_predicates; rewrite or prune the rule
vector-dbcollection_not_allowedadd the collection to collection_allowlist
vector-dbtop_k_exceeds_limitreduce top_k or raise MaxRowsReturned on the grant
warehouse-costmissing_estimatetool server must attach dry_run.bytes_scanned and dry_run.estimated_cost_usd
warehouse-costbytes_exceeds_limitquery would scan too much; add filters, partition predicates, or raise max_bytes_scanned
warehouse-costcost_exceeds_limitsame fix path as bytes, or raise max_cost_per_query_usd with approval
allno_configthe guard is registered but has no allowlists; either populate the policy or remove the guard

The query audit receipts guide walks through the query-specific receipt fields and shows how to aggregate denials by code for dashboards and alerting.


Summary

EngineGuardKey constraintTypical receipt field
Postgres, MySQL, SQL Server, SQLitesql-querytable_allowlist, column_allowlistengine, tables_accessed, query_hash
Pinecone, Qdrant, Weaviate, Milvusvector-dbcollection_allowlist, MaxRowsReturnedcollection, namespace, top_k
BigQuery, Snowflake, Redshift, Athena, Databrickswarehouse-costmax_bytes_scanned, max_cost_per_query_usdCostDimension::WarehouseQuery with bytes_scanned and estimated_cost_usd
Neo4j, Neptunegraph-traversalmax_traversal_depth, label_allowlistengine, max_depth, labels
Redis, Memcachedcache-keykey_pattern_allowlist, dangerous_commandscommand, key, pattern
any (post-invocation)query-resultMaxRowsReturned, ColumnDenylist, redact_pii_patternsrows_returned, columns_redacted

Next Steps

  • Guards · how the six data-layer guards fit into the broader guard model
  • Write a Policy · HushSpec syntax for constraints, allowlists, and grant scoping
  • Custom Guards · add organization-specific logic on top of the built-in six
  • Native Tool Server · build a tool server that submits the contract these guards expect
  • Receipts · the signed audit record every database query produces
  • Query Audit Receipts · aggregate database receipts for compliance and cost reporting
Govern Database Queries · Chio Docs