Configuration reference

airlock.yaml is the canonical config for one worker. It declares which tables to export per user, how the tables relate, and what to mask. The worker reads it once at boot; changes require a restart (no hot reload).

Database credentials are never read from YAML — set DATABASE_URL in the environment. The validator rejects database_url: in YAML explicitly so it can't be added by mistake.
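The rejection rule above can be pictured as a check on the parsed config. This is a minimal sketch, not Airlock's actual validator: the function name `check_no_credentials` and the error wording are hypothetical, and it assumes the YAML has already been parsed into a plain dict.

```python
def check_no_credentials(config: dict) -> list:
    """Return validation errors for credential keys found in a parsed airlock.yaml."""
    errors = []
    # Assumption: top-level YAML fields arrive as dict keys.
    if "database_url" in config:
        errors.append(
            "database_url must not be set in YAML; "
            "set DATABASE_URL in the environment instead"
        )
    return errors


print(check_no_credentials({"root_table": "users"}))
print(check_no_credentials({"database_url": "postgres://..."}))
```

Rejecting the key outright, rather than silently ignoring it, means a credential pasted into the file surfaces as a validation error.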

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| root_table | string | users | The table that defines a "snapshot owner". One row in this table = one snapshot. |
| root_filter_col | string | id | Column on root_table that the agent uses as the user identifier (typically external_id). |
| tables | list | [] | Tables to include in each snapshot, listed in topological order (parents before children). See Tables. |
| data_dir | string | /dev/shm/airlock | Where per-user .duckdb files are written. Default is RAM-backed tmpfs on Linux; override on macOS dev hosts. |
| snapshot_ttl_s | int | 300 | Per-snapshot freshness window in seconds. After this, the next MCP call re-exports synchronously. 0 = re-export every call. |
| hints | list | [] | Free-text domain hints surfaced to the LLM as part of get_schema. |
| sample_queries | list | [] | Named example SQL queries surfaced to the LLM. See Sample queries. |
| tenant_isolation | object | null | Multi-tenant safety guard. See Tenant isolation. |
| control_plane | object | null | Tunnel mode config. See Control plane. Without this block, the worker runs in direct FastMCP HTTP mode on localhost:8000. |
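To make the snapshot_ttl_s semantics concrete, here is a minimal sketch of a freshness check. The function name `needs_reexport` is hypothetical, and treating a snapshot as stale only once its age strictly exceeds the TTL is an assumption; the worker's real boundary handling is not documented here.

```python
import time
from typing import Optional


def needs_reexport(exported_at: float, ttl_s: int = 300,
                   now: Optional[float] = None) -> bool:
    """Return True when a snapshot should be re-exported before serving a call."""
    now = time.time() if now is None else now
    # ttl_s == 0 means "re-export on every call".
    if ttl_s == 0:
        return True
    # Assumption: stale once the age strictly exceeds the freshness window.
    return (now - exported_at) > ttl_s


print(needs_reexport(exported_at=0.0, ttl_s=300, now=100.0))
print(needs_reexport(exported_at=0.0, ttl_s=300, now=301.0))
```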

Tables

Each table entry declares one source-DB table that should appear in every per-user snapshot. The list is topologically sorted at boot so parents and via-targets are exported before their dependents.

tables:
  - name: users
    mask_columns:
      email: hash
      phone: redact
      ssn: "null"

  - name: accounts
    parent: users
    fk: user_id

  - name: transactions
    parent: accounts
    fk: account_id
    exclude_columns: [raw_payload]

  - name: merchants
    via: transactions
    via_fk: merchant_id
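The boot-time ordering described above can be sketched with the standard library's graphlib. This is an illustration under assumptions, not the worker's code: `export_order` is a hypothetical name, and it treats both parent and via as "must be exported first" dependencies.

```python
from graphlib import TopologicalSorter


def export_order(tables: list) -> list:
    """Order table names so every parent / via table precedes its dependents."""
    sorter = TopologicalSorter()
    for table in tables:
        deps = []
        if "parent" in table:
            deps.append(table["parent"])
        via = table.get("via", [])
        # via may be a single table name or a list of names.
        deps.extend([via] if isinstance(via, str) else via)
        sorter.add(table["name"], *deps)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(sorter.static_order())


tables = [
    {"name": "merchants", "via": "transactions", "via_fk": "merchant_id"},
    {"name": "transactions", "parent": "accounts", "fk": "account_id"},
    {"name": "accounts", "parent": "users", "fk": "user_id"},
    {"name": "users"},
]
print(export_order(tables))  # ['users', 'accounts', 'transactions', 'merchants']
```

The input is deliberately listed in reverse to show that the result depends on the declared edges, not on list position.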

Table fields

| Field | Type | Description |
| --- | --- | --- |
| name | string | Source-DB table name. Required. |
| pk | string | Primary key column name. Default: id. |
| parent | string | The table this is a forward-FK child of. Pair with fk. |
| fk | string | Column on this table that references parent.pk. Pair with parent. |
| via | string or list | The table whose rows pull this one in via reverse FK (e.g. merchants is pulled in by transactions.merchant_id). Pair with via_fk. |
| via_fk | string | Column on via that references this table's pk. |
| columns | list | Whitelist of columns to project. Empty = SELECT *. |
| exclude_columns | list | Columns to drop. Triggers a source-DESCRIBE to enumerate the rest. |
| mask_columns | object | column → policy. See Masking. |
| local_table | string | Override the local DuckDB table name (default: same as name). |
| source_table | string | Override the source-DB table name (default: same as name). Useful when local + source names differ. |

Edge types

There are two ways a non-root table joins the export graph:

  • Forward FK (parent + fk): rows on this table where fk IN (parent.pk). Use when this table is "many" relative to its parent.
  • Reverse FK (via + via_fk): rows on this table whose pk is referenced by via.via_fk. Use when the parent table holds the FK pointing at this one (e.g. transactions.merchant_id pulls in merchants).

If a table has neither, the export skips it and logs a warning.
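The two edge types can be expressed as row filters. This is a rough sketch only: the SQL shape, the `edge_filter` name, and the decision to handle a single via table are assumptions, and real code should quote identifiers rather than interpolate them directly.

```python
from typing import Optional


def edge_filter(table: dict, tables_by_name: dict) -> Optional[str]:
    """Return a WHERE-clause sketch for a non-root table, or None if it has no edge."""
    pk = table.get("pk", "id")
    if "parent" in table and "fk" in table:
        # Forward FK: keep rows whose fk points at an exported parent row.
        parent = tables_by_name[table["parent"]]
        parent_pk = parent.get("pk", "id")
        return f'{table["fk"]} IN (SELECT {parent_pk} FROM {parent["name"]})'
    if "via" in table and "via_fk" in table:
        # Reverse FK: keep rows whose pk is referenced by the via table.
        # Assumption: via is a single table name in this sketch.
        return f'{pk} IN (SELECT {table["via_fk"]} FROM {table["via"]})'
    # Neither edge type: the caller would skip the table and log a warning.
    return None


by_name = {"users": {"name": "users"}, "transactions": {"name": "transactions"}}
print(edge_filter({"name": "accounts", "parent": "users", "fk": "user_id"}, by_name))
print(edge_filter({"name": "merchants", "via": "transactions",
                   "via_fk": "merchant_id"}, by_name))
```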

Masking

mask_columns declares per-column PII policies. They run as UPDATE statements against the local DuckDB snapshot post-copy — your source database is never modified.

| Policy | Effect | Use for |
| --- | --- | --- |
| hash | sha256 hex of the value | Joinable PII (emails, names, account numbers) |
| redact | Replaced with "***REDACTED***" | Free-text PII (phone, address) |
| null | Replaced with NULL (must be quoted in YAML) | Things the agent should never see (SSN, tokens, password hashes) |

Quote "null" in YAML — unquoted null parses as None and the masker can't tell what you meant.
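The per-value effect of each policy can be mirrored in a few lines of Python. The real masker runs as UPDATE statements inside DuckDB, so this is only an illustration: `mask_value` is a hypothetical helper, and the UTF-8 encoding before hashing and the pass-through of existing NULLs are assumptions.

```python
import hashlib
from typing import Optional

REDACTED = "***REDACTED***"


def mask_value(value: Optional[str], policy: str) -> Optional[str]:
    """Apply one masking policy to a single column value."""
    if value is None:
        # Assumption: an existing NULL stays NULL under every policy.
        return None
    if policy == "hash":
        # Equal inputs hash to equal outputs, so hashed columns stay joinable.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if policy == "redact":
        return REDACTED
    if policy == "null":
        return None
    raise ValueError(f"unknown masking policy: {policy!r}")


print(mask_value("alice@example.com", "hash"))
print(mask_value("555-0100", "redact"))
print(mask_value("123-45-6789", "null"))
```

Raising on an unknown policy name matters for the quoting rule above: an unquoted null arrives as None rather than the string "null", so a strict masker fails loudly instead of guessing.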

Tenant isolation

For schemas where multiple tenants share a Postgres shard (e.g. one users table holds both Acme's and Beta's users), the tenant_isolation block prevents an export from accidentally reaching across tenants:

tenant_isolation:
  column: tenant_id
  required: true   # default; export refuses to run without --tenant-filter

When set with required: true, the airlock export CLI requires --tenant-filter <value> and injects it into root_filters for every per-user export. A user identifier alone can never match rows from another tenant.

The on-demand export path (per-MCP-call) reads tenant_id from the worker's control_plane.tenant_id so this guard is always in effect.
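A minimal sketch of how the tenant value might be combined with the user identifier when filters are built. The function `build_root_filters` and the dict shape of the filters are assumptions for illustration; only the field names column and required, and the --tenant-filter flag, come from the text above.

```python
from typing import Optional


def build_root_filters(root_filter_col: str, user_id: str,
                       tenant_isolation: Optional[dict] = None,
                       tenant_filter: Optional[str] = None) -> dict:
    """Combine the user identifier with the tenant guard into one filter set."""
    filters = {root_filter_col: user_id}
    if tenant_isolation is None:
        return filters
    if tenant_filter is None:
        # required defaults to true, so a missing tenant value is an error.
        if tenant_isolation.get("required", True):
            raise ValueError("tenant_isolation is required: pass --tenant-filter")
        return filters
    filters[tenant_isolation["column"]] = tenant_filter
    return filters


print(build_root_filters("id", "u_42", {"column": "tenant_id"}, "t_acme"))
```

Because both conditions land in the same filter set, a user identifier that happens to exist under another tenant cannot match on its own.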

Control plane

control_plane:
  cp_url: "wss://cp.airlocklabs.ai/v1/tunnel"
  cp_public_key_b64: "EL//8+W+Dy…"
  worker_id: "w_acme_prod"
  tenant_id: "t_acme"
  private_key_path: "/etc/airlock/worker_ed25519.pem"

| Field | Type | Description |
| --- | --- | --- |
| cp_url | string | The control plane's tunnel WebSocket URL. |
| cp_public_key_b64 | string | Base64-encoded Ed25519 public key of the CP. The worker pins this at install time so a compromised DNS doesn't let an imposter impersonate the CP. Same value across customers. |
| worker_id | string | Stable identifier for this worker. The CP uses it to route audit records and admin frames. |
| tenant_id | string | Stable identifier for your tenant. The CP uses it for routing; the worker uses it as the audience claim on every JWT it verifies. |
| private_key_path | string | Path to the worker's Ed25519 private key, generated locally by airlock register. Mode 0600 recommended. Never transmitted. |
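The mode recommendation for private_key_path can be checked before deployment with a short script. This is a sketch for POSIX hosts; `key_mode_ok` is a hypothetical helper and not part of the airlock CLI.

```python
import os
import stat


def key_mode_ok(path: str) -> bool:
    """Return True when the key file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 covers every group/other permission bit; 0600 leaves them all clear.
    return (mode & 0o077) == 0


# Usage (path taken from the example config above):
#   key_mode_ok("/etc/airlock/worker_ed25519.pem")
```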

Without this block the worker runs in direct FastMCP HTTP mode on AIRLOCK_HOST:AIRLOCK_PORT (default localhost:8000) — useful for local dev, never for production.

Sample queries and hints

hints:
  - Amounts are in cents (divide by 100 for dollars)
  - Use transacted_date for date filtering

sample_queries:
  - name: Monthly spending
    sql: |
      SELECT category, COUNT(*) AS txn_count, …

Both surface to the LLM as part of get_schema. Hints are free text; sample queries are paired name + SQL. Use them to encode domain knowledge that's hard to infer from column names alone (units, date conventions, denormalizations the agent shouldn't have to re-discover on every chat).

Playground prompt (per-tenant system prompt)

The console's playground builds the agent's system prompt from a generic body plus an optional tenant-specific override:

playground_prompt: |
  You are a clinical analyst reviewing one patient's chart. Surface
  abnormal labs, medication changes, and encounter trends. Cite the
  table + column you queried; never invent values.

The prompt reaches the playground via the manifest returned by get_schema. Set it and the operator playground frames the agent's persona to your vertical; leave it unset and the playground falls back to the generic "data-analyst" body. You can edit it live in the console at Config → Edit airlock.yaml; the schema-aware editor autocompletes the field and validates it on save.

Environment variables

These override or supplement the YAML at runtime:

| Env var | Default | Effect |
| --- | --- | --- |
| DATABASE_URL | (none) | Required for export. postgres://… is supported today. mysql://… is on the roadmap (see Source databases below). Never read from YAML. |
| AIRLOCK_DATA_DIR | /dev/shm/airlock | Default data_dir if YAML doesn't set one. macOS dev hosts must override (no /dev/shm). |
| AIRLOCK_QUERY_TIMEOUT | 5 | Seconds before a single SQL query is interrupted. |
| AIRLOCK_MAX_ROWS | 500 | Row cap on execute_sql results. |
| AIRLOCK_EXPORT_CONCURRENCY | 16 | Global semaphore on concurrent exports. Caps source-DB connection burn during cold-start storms. |
| AIRLOCK_METRICS_PORT | 9090 | Prometheus scrape port. 0 disables. |
| AIRLOCK_HOST | localhost | Direct-mode FastMCP bind host. |
| AIRLOCK_PORT | 8000 | Direct-mode FastMCP port. |
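The precedence between YAML and environment can be sketched as follows. The helper names are hypothetical and the worker's actual lookup code may differ; the defaults and the "YAML data_dir wins over AIRLOCK_DATA_DIR" rule are taken from the table above.

```python
import os
from typing import Mapping, Optional


def resolve_data_dir(yaml_cfg: Mapping, env: Optional[Mapping] = None) -> str:
    """YAML data_dir wins, then AIRLOCK_DATA_DIR, then the built-in default."""
    env = os.environ if env is None else env
    return (yaml_cfg.get("data_dir")
            or env.get("AIRLOCK_DATA_DIR")
            or "/dev/shm/airlock")


def int_env(name: str, default: int, env: Optional[Mapping] = None) -> int:
    """Read an integer setting such as AIRLOCK_MAX_ROWS, falling back to a default."""
    env = os.environ if env is None else env
    return int(env.get(name, default))


print(resolve_data_dir({}, {"AIRLOCK_DATA_DIR": "/tmp/airlock"}))
print(int_env("AIRLOCK_MAX_ROWS", 500, {}))
```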

Source databases

Airlock currently supports PostgreSQL as its only source database. Set DATABASE_URL to a postgres:// (or postgresql://) URL and the worker connects through psycopg2 using a read-only role.

export DATABASE_URL='postgresql://airlock_reader:...@db.internal:5432/prod'
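A pre-flight check of the URL scheme can catch an unsupported driver before the worker boots. This is a sketch only; `check_database_url` is a hypothetical helper, not an Airlock API, and the URLs in the usage lines are placeholders.

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"postgres", "postgresql"}


def check_database_url(url: str) -> str:
    """Return the URL scheme, or raise if it is not a supported source."""
    scheme = urlsplit(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported DATABASE_URL scheme: {scheme!r}")
    return scheme


print(check_database_url("postgresql://airlock_reader:secret@db.internal:5432/prod"))
```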

MySQL (on roadmap)

MySQL / MariaDB support is not shipped yet. The export driver infrastructure is in place via DuckDB's mysql extension, but the on-demand snapshot enrichment paths are postgres-only and we have not yet validated the MySQL flow against a real source. If you need MySQL, email hello@airlocklabs.ai so we can prioritize correctly.

The intended shape, once shipped, is identical to Postgres apart from the URL scheme:

# Roadmap — not supported today
export DATABASE_URL='mysql://airlock_reader:...@db.internal:3306/prod'

airlock.yaml itself stays the same: tables, mask_columns, tenant_isolation, etc. are driver-agnostic.

Validating a config

airlock validate --config airlock.yaml

On a valid file, the command exits 0 and emits the JSON {"ok": true, "errors": []}. On an invalid file, it exits 1 with structured errors. The console's Config page calls the same parse path before persisting any write, so the validator is the only gate between an operator typo and a worker that can't restart.
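In CI, the exit code and JSON can be consumed by a small wrapper. This sketch assumes the JSON is written to stdout, which the text above does not state; `config_is_valid` is a hypothetical helper that takes the full command as an argument.

```python
import json
import subprocess


def config_is_valid(argv: list) -> bool:
    """Run a validate command and report whether it accepted the config."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Exit 1 signals a bad config; structured errors are left to the caller.
        return False
    # Assumption: the success JSON is printed to stdout.
    return json.loads(proc.stdout).get("ok") is True


# Usage:
#   config_is_valid(["airlock", "validate", "--config", "airlock.yaml"])
```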

Editing config from the console

The Console's Config page round-trips airlock.yaml over the operator-token-gated admin channel. The worker validates each PUT by writing it to a temp file and parsing it, atomically renames the temp file into place on success, and logs config_dirty=true reload_required so you know to roll the deployment.
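The validate-then-rename pattern described above looks roughly like this in Python. It is a sketch of the general technique, not the worker's implementation: `write_config_atomically` is a hypothetical name and the validate callback stands in for the real parser.

```python
import os
import tempfile
from typing import Callable


def write_config_atomically(path: str, new_text: str,
                            validate: Callable[[str], bool]) -> bool:
    """Write new_text to a temp file, validate it, then rename it over path."""
    # The temp file lives in the same directory so the rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
            handle.flush()
            os.fsync(handle.fileno())
        if not validate(tmp_path):
            return False
        # os.replace is atomic on POSIX: readers see the old file or the new one.
        os.replace(tmp_path, path)
        return True
    finally:
        # Remove the temp file if validation failed or an error occurred.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

A rejected write leaves the existing file untouched, so a failed edit never leaves a half-written config on disk.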

Workers continue serving the previous config until restart. There is intentionally no hot reload — config writes that change masking policies are too consequential to apply silently.