1. Executive summary
Airlock is infrastructure that lets AI agents answer questions about customer data without ever touching the customer's production database. Agents connect to Airlock's control plane over MCP. The control plane routes tool calls through a secure tunnel to a worker that runs inside the customer's own VPC. The worker answers queries from ephemeral DuckDB snapshots — filtered, masked, per-user copies of the source data — not the live database.
What this means for a security reviewer:
- No agent ever holds database credentials. The worker does. The worker lives in your infrastructure.
- No agent query runs against production. Agents query the DuckDB snapshot, which the worker exports ahead of the query with column-level masking applied at export.
- No SQL the agent writes can reach the internet. An egress guard rejects DuckDB functions that read or write over HTTP/S3/etc. before DuckDB ever sees the query.
- No SQL text or result rows are persisted outside your VPC. Airlock's hosted control plane records routing metadata (tool name, snapshot id, latency, outcome) — not query content.
Airlock does not claim to replace your existing database access controls, secret management, or intrusion detection. It adds a specific, narrowly-scoped layer that makes agentic access safer than direct DB credentials.
2. Architecture

    ┌─────────────────────────────────┐        ┌────────────────────────────┐
    │  CUSTOMER VPC                   │        │  AIRLOCK INFRASTRUCTURE    │
    │                                 │        │                            │
    │  ┌─────────────┐                │        │                            │
    │  │ Production  │                │        │                            │
    │  │ Postgres    │◀───SELECT──────┼──┐     │                            │
    │  └─────────────┘                │  │     │                            │
    │                                 │  │     │                            │
    │  ┌──────────────────────────┐   │  │     │                            │
    │  │ Airlock Worker           │───┼──┘     │                            │
    │  │  ├─ read-only DB role    │   │        │                            │
    │  │  ├─ export pipeline      │   │        │                            │
    │  │  ├─ DuckDB snapshots     │───┼────┐   │                            │
    │  │  ├─ SQL egress guard     │   │    │   │                            │
    │  │  └─ Ed25519 identity     │───┼────┼──▶│  Control plane             │
    │  └──────────────────────────┘   │    │   │   ├─ MCP edge (JSON-RPC)   │
    │                                 │    │   │   ├─ Admin API             │◀── Operator
    │                                 │    │   │   ├─ Audit fan-out (SSE)   │    console
    │                                 │    │   │   └─ Tunnel (WSS, outbound)│
    │                                 │    │   └──────┬─────────────────────┘
    └─────────────────────────────────┘    │          │
                                           │          │ MCP tool calls
                                           │          │
    Agent (customer's LiteLLM, Cursor,     │          ▼
    Claude Code, custom) ──────MCP─────────┼────▶ bearer API key
                                           │
                                           ▼
                             DuckDB snapshot (query path)

Component inventory:
| Component | Runs where | What it holds |
|---|---|---|
| Production source DB | Customer VPC | Customer-owned data |
| Airlock worker | Customer VPC | Snapshot files, its own Ed25519 key, DATABASE_URL (env) |
| Airlock control plane (CP) | Airlock-hosted (Fly.io, iad region) | Tenant registry, API-key hashes (Argon2id), audit log (metadata only) |
| Airlock console | Airlock-hosted (Cloudflare Workers) | Dashboard UI; proxies admin traffic; holds no customer data |
| Customer's agent | Customer-controlled | Bearer API key (Argon2id-hashed server-side) |
Data flow — query path
- Agent issues MCP tool call to `https://cp.airlocklabs.ai/mcp/{tenant}` with `Authorization: Bearer ak_live_…`.
- CP validates bearer against Argon2id hash; checks `allowed_snapshots` scope; checks `allowed_tools`.
- CP mints a forwarded-identity JWT (Ed25519, 30s TTL) carrying `{tenant, api_key_id, snapshot_id, tool}`.
- CP forwards the request as a `REQUEST` frame over the persistent WebSocket tunnel to the customer's worker.
- Worker verifies the JWT against the CP's pinned public key; checks the JWT's `jti` against a replay cache.
- Worker runs the egress guard over the SQL (sqlglot AST walk). Blocked: `read_csv_auto`, `read_parquet`, `read_json_auto`, `httpfs`, `s3_scan`, `COPY ... TO`, URL-scheme literals, any non-SELECT/WITH.
- Worker opens the per-user DuckDB snapshot `read_only=True` with a 5-second query timeout and 500-row cap.
- Results return over the same tunnel as a `RESPONSE` frame.
- CP returns to the agent. CP audit records `{tenant, api_key_id, worker_id, tool, snapshot_id, outcome, latency_ms, trace_id}`, with no SQL text and no row content.
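
The `jti` replay check in the worker step above can be pictured as a small time-bounded cache. The following is a minimal sketch under assumptions, not Airlock's code: the class name `ReplayCache`, the method `seen_before`, and the injectable clock are hypothetical; only the 30s token lifetime comes from the text.

```python
import time
from typing import Callable, Dict


class ReplayCache:
    """Remembers each jti until its token would have expired anyway (hypothetical helper)."""

    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expiry_by_jti: Dict[str, float] = {}

    def seen_before(self, jti: str) -> bool:
        """Return True if jti was already presented within the TTL; otherwise record it."""
        now = self._clock()
        # Drop entries whose tokens have expired; an expired token fails the
        # TTL check on its own, so it no longer needs replay protection.
        self._expiry_by_jti = {j: t for j, t in self._expiry_by_jti.items() if t > now}
        if jti in self._expiry_by_jti:
            return True
        self._expiry_by_jti[jti] = now + self._ttl_s
        return False
```

Keeping an entry only for the token's lifetime bounds memory: once the JWT has expired, signature and expiry validation reject it without help from the cache.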
Data flow — export path
Default: ephemeral on-demand. The worker exports the moment an MCP tool call arrives for a user it doesn't have cached in tmpfs (or whose cache is older than snapshot_ttl_s, default 5 min). The export blocks the call for ~1–10s on the first hit; subsequent calls within the TTL window read the warm tmpfs file in sub-ms. Customers with predictable peaks can optionally pre-warm via airlock export --all from cron, but it isn't required.
The CLI supports --all, --user-id <id>, and --tenant-filter <id> for multi-tenant scoping; the foot-gun where a user identifier alone could match rows from another tenant on a shared shard is blocked at the CLI when tenant_isolation.required is set in the YAML.
- Worker reads its `airlock.yaml` (root table, FK graph, mask rules) and `DATABASE_URL` env var.
- Walks FK graph outward from the root table filtered to one snapshot owner.
- For each table in scope: `SELECT {approved columns} FROM ... WHERE ...`, copies to DuckDB.
- Runs mask UPDATEs on the DuckDB copy (hash / redact / null).
- Writes `/dev/shm/airlock/{snapshot_id}.duckdb` (RAM-backed tmpfs on Linux) atomically (`.tmp` → `os.replace`), plus a `.manifest.json` sidecar carrying `exported_at`, `row_counts`, `source_position`, and `config_hash`. A crash mid-export never leaves a half-written file at the canonical path.
- Background reaper deletes any snapshot whose manifest `exported_at + snapshot_ttl_s` is in the past. A worker process exit (crash, decommission) discards every snapshot atomically, because they live in RAM, not on persistent disk.
- Source DB is never written to. Excluded columns never land in the snapshot. Masked columns exist only in masked form.
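
The atomic write and the reaper's expiry condition from the steps above can be sketched as follows. This is an illustration under assumptions rather than the worker's actual code: the function names `publish_snapshot` and `is_expired` are hypothetical, and the sidecar filename pattern (`<snapshot file>.manifest.json`) is a guess at a naming the text does not spell out.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def publish_snapshot(tmp_path: Path, final_path: Path, manifest: dict) -> None:
    """Publish a fully written snapshot and its manifest without exposing partial files."""
    manifest_path = final_path.with_name(final_path.name + ".manifest.json")  # assumed naming
    manifest_tmp = manifest_path.with_name(manifest_path.name + ".tmp")
    manifest_tmp.write_text(json.dumps(manifest))
    # os.replace is atomic when source and destination are on the same filesystem,
    # so a reader sees either the old file or the complete new one.
    os.replace(manifest_tmp, manifest_path)
    os.replace(tmp_path, final_path)


def is_expired(manifest: dict, snapshot_ttl_s: float, now: Optional[float] = None) -> bool:
    """True once exported_at + snapshot_ttl_s is in the past (the reaper's condition)."""
    now = time.time() if now is None else now
    return manifest["exported_at"] + snapshot_ttl_s < now
```

Writing the manifest before the snapshot is a choice made for this sketch, so that any snapshot visible at its canonical path already has a manifest beside it; the source does not state which order the worker uses.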
3. Trust boundaries
| Boundary | Inside | Outside | What crosses |
|---|---|---|---|
| Customer VPC | Source DB, worker, DuckDB snapshots, DATABASE_URL, worker private key, raw row data | Everything else | Over outbound WSS to CP: tool request envelopes, tool response envelopes, heartbeats. Over TLS to source DB: SELECT queries from the worker. |
| Airlock control plane | Argon2id hashes, Ed25519 keys, audit metadata (no payload) | Agent-side, customer-side | Over HTTPS to agents: MCP JSON-RPC. Over WSS to workers: REQUEST/RESPONSE frames (LLM tool calls, signed with a tool-bound JWT) and ADMIN_REQUEST frames (operator-only config reads/writes, signed with an admin-bound JWT). The two paths share the WS but use distinct frame types and distinct JWT shapes — an LLM API key cannot reach the admin handler under any failure mode. |
| Airlock console | Operator sessions, admin actions | Customers' row data | Over HTTPS to CP: admin API calls. Browser never talks to CP directly — the console's Next.js server proxies with the operator token. The Config page round-trips a tenant's airlock.yaml through the admin channel, but the YAML never persists anywhere on the CP side. |
| Agent | Bearer API key | Customer data | MCP calls only; no direct DB access, no filesystem access. |
4. Data handling
| Data | Where it lives | Who can see it | Retention | Notes |
|---|---|---|---|---|
| Source DB credentials (DATABASE_URL) | Worker env var | Customer | Rotated per customer policy | Never transmitted; never in logs (sanitized to host:port/db) |
| Production row data | Source DB | Worker, via read-only role | Source DB policy | Worker uses SELECT only |
| Masked/excluded row data | DuckDB snapshot in RAM-backed tmpfs on the worker host (/dev/shm/airlock/{snapshot_id}.duckdb on Linux) | Worker process | Configurable TTL (snapshot_ttl_s, default 5 min) measured from export time; reaper deletes on expiry; a worker restart wipes everything | Snapshots are per-user; no cross-user data in a single file. Ephemeral by default: never written to persistent disk. |
| SQL query text (Mode A) | CP process memory during request | CP process only | Not persisted; not logged | Mode B (E2E encryption) on roadmap; CP will not see SQL in Mode B |
| Result rows (Mode A) | CP process memory during request | CP process only | Not persisted; not logged | Same Mode B guarantee |
| Audit metadata | CP disk (JSONL, append-only) + shipped to customer's SIEM | Airlock operators; customer via Console | Current default: unbounded append; per-tenant retention policy on roadmap | Fields: ts, tenant_id, api_key_id, worker_id, tool, snapshot_id, outcome, error_class, latency_ms, bytes_in/out, trace_id. Never: SQL, rows, schema content |
| Operator bearer token | CP env var (AIRLOCK_CP_OPERATOR_TOKEN) | Airlock operators only | Manual rotation | hmac.compare_digest verification; interim until Clerk/SSO |
| Per-tenant API keys | Argon2id hash in config.yaml / state.yaml on CP | Customer at creation (plaintext shown once); CP holds hash only | Soft-revoked on demand | Plaintext: ak_live_<tenant>_<random>; rotation API |
| Worker Ed25519 private key | Customer worker host | Customer only — airlock register generates the keypair locally on the worker host; only the public half is sent over the wire when redeeming a one-time enrollment token | Customer-managed | Public half is pinned in CP state.yaml / config.yaml after enrollment |
| CP Ed25519 private key | Fly persistent volume at /state/cp_ed25519.pem | Airlock operators | Manual rotation | Public half is pinned on every worker at install |
Data that never enters Airlock-hosted infrastructure:
- Source database credentials
- Source database row data (only exists in customer VPC)
- Worker private key material
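
The table above notes that `DATABASE_URL` appears in logs only as `host:port/db`. A minimal sketch of such a sanitizer, using only the standard library, might look like this; the function name `sanitize_database_url` is hypothetical and the worker's real implementation may differ.

```python
from urllib.parse import urlsplit


def sanitize_database_url(url: str) -> str:
    """Reduce a connection URL to host:port/db, dropping scheme, user, password, and query."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port is not None else ""
    database = parts.path.lstrip("/")
    return f"{host}{port}/{database}"
```

For example, `sanitize_database_url("postgresql://reader:s3cret@db.internal:5432/app?sslmode=require")` yields `db.internal:5432/app`, so neither the role name nor the password reaches a log line.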
5. Cryptography
| Use | Algorithm | Notes |
|---|---|---|
| TLS in transit (agent → CP) | TLS 1.2+ (1.3 preferred) | Terminated at Fly.io edge |
| TLS in transit (console) | TLS 1.3 | Terminated at Cloudflare edge |
| WSS tunnel (worker → CP) | TLS 1.2+ | Outbound-only; no inbound firewall rule on customer side |
| Tunnel handshake authenticity | Ed25519 signed nonce | Worker signs CP-issued nonce; CP validates against pinned public key registered at install |
| Per-request forwarded identity | JWT / Ed25519 / EdDSA | Signed by CP, 30s TTL, aud pinned to worker id, jti replay-cached |
| API-key storage | Argon2id (argon2-cffi defaults) | hmac.compare_digest for constant-time check |
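| DuckDB file at rest | None by default | Lives in RAM-backed tmpfs on the customer's worker host (see §4), not on persistent disk; tmpfs pages can be swapped out, so the customer applies swap or disk encryption per their policy |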
| E2E payload encryption (planned) | X25519 + ChaCha20-Poly1305 (Mode B) | Will encrypt SQL + results end-to-end between agent client and worker; CP becomes payload-opaque. Envelope shape is already deployed; key exchange + client-side shim pending. |
No cryptography is rolled in-house. All primitives come from cryptography (OpenSSL-backed in its default wheels), PyJWT, and argon2-cffi.
6. Access control
Customer-side (worker)
- Worker runs as non-root user in customer's VPC.
- Connects to source DB using a read-only role the customer provisions.
- Outbound network: only the CP's WebSocket URL + the source DB. Customer-side firewall / security group is expected to enforce this.
- No inbound ports are required for operation. An optional `:9090/metrics` endpoint (Prometheus scrape) can be enabled for ops; it exposes counts and latencies only (no SQL or row data) and can be disabled entirely with `AIRLOCK_METRICS_PORT=0`.
- Private key lives on disk at a customer-chosen path (default `/etc/airlock/worker_ed25519.pem`, mode 0600).
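
A customer who wants to verify the key-file mode mentioned above could use a check along these lines. This is a hypothetical helper for illustration (the name `key_file_is_private` is not from Airlock), and the source does not say whether the worker performs such a check itself.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """True if the file's permission bits are exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600
```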
Agent-side (per-tenant API keys)
Each API key has a scope document:

    allowed_snapshots:
      mode: list # or "all" | "regex" | "prefix" | "callback"
      ids: [u_42, u_88]
    allowed_tools: [execute_sql, get_schema, null_rates]

The CP enforces scope on every tools/call. A key scoped to u_42 that
attempts u_43 returns -32005 scope_denied. Revocation is soft (sets
revoked_at); the active-key lookup filters on revoked_at IS NULL.
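
The scope enforcement described above can be sketched as a pure function over the scope document. Several details here are assumptions made for illustration: the key names `prefix` and `pattern` for the "prefix" and "regex" modes, the choice to reuse `scope_denied` for a disallowed tool, and the function names. The "callback" mode is not sketched and fails closed.

```python
import re
from typing import Any, Dict, Optional

SCOPE_DENIED = {"code": -32005, "message": "scope_denied"}


def snapshot_allowed(scope: Dict[str, Any], snapshot_id: str) -> bool:
    """Evaluate the allowed_snapshots part of a scope document."""
    rule = scope["allowed_snapshots"]
    mode = rule["mode"]
    if mode == "all":
        return True
    if mode == "list":
        return snapshot_id in rule["ids"]
    if mode == "prefix":
        return snapshot_id.startswith(rule["prefix"])  # "prefix" key is assumed
    if mode == "regex":
        return re.fullmatch(rule["pattern"], snapshot_id) is not None  # "pattern" key is assumed
    return False  # unknown or unsketched modes (e.g. "callback") fail closed


def check_scope(scope: Dict[str, Any], tool: str, snapshot_id: str) -> Optional[dict]:
    """Return None when the call is in scope, else a JSON-RPC style error object."""
    if tool not in scope["allowed_tools"] or not snapshot_allowed(scope, snapshot_id):
        return SCOPE_DENIED
    return None
```

With the scope document shown earlier, `check_scope(scope, "execute_sql", "u_43")` returns the `-32005 scope_denied` error, while `u_42` passes.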
Operator-side (Airlock Console — auth model)
The operator UI sits in front of CP's /v1/admin/* API. Three layers
of auth, each with a clear role:
- OIDC sign-in (planned, in progress). Operators authenticate via Google or Okta SAML/OIDC handled by the console (`@auth/nextjs`). The console performs the OIDC handshake; CP itself never talks to identity providers. On success the console issues a session JWT signed with `AIRLOCK_SESSION_SECRET` (shared between console and CP) carrying the operator's org + role claims.
- Session JWT → CP. Every console-originated request to CP rides with the session JWT in `Authorization: Bearer …`. CP validates the signature, applies org-scope checks (an operator can only act on tenants their org owns), and rejects expired or invalid tokens. Sessions are short-lived; refresh happens through the OIDC flow.
- Operator token (god-mode, break-glass). A long-lived `AIRLOCK_CP_OPERATOR_TOKEN` env var on the CP host bypasses org-scope checks. Reserved for Terraform / CI / incident response. Constant-time compared via `hmac.compare_digest`. Rotation is manual; usage is logged in the audit log under `actor: "operator-token"` so any use is visible.
Today (pre-OIDC): the operator token is held in an httpOnly cookie
on the console; the server-side proxy forwards it to CP as
Authorization: Bearer …. The browser never sees the token
directly. Deployments must sit behind Cloudflare Access (or
equivalent) until OIDC ships.
Every admin action — login, tenant create/update, API-key mint or
revoke, config edit, break-glass token use — produces an audit event
on /state/audit.jsonl. Audit events are routing metadata only; SQL
text and row content are never recorded (see §7).
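
The constant-time operator-token check mentioned above relies on `hmac.compare_digest` from the Python standard library. A minimal sketch, assuming hypothetical helper names (`bearer_from_header`, `operator_token_matches`) that are not taken from the CP codebase:

```python
import hmac


def bearer_from_header(authorization: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization.partition(" ")
    return token if scheme.lower() == "bearer" else ""


def operator_token_matches(presented: str, expected: str) -> bool:
    """Compare a presented token with the configured one in constant time."""
    if not expected:
        # An unset token must never authenticate anyone, including empty input.
        return False
    # Compare bytes so non-ASCII input cannot raise TypeError in compare_digest.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` comparison can return earlier for inputs that differ in their first bytes, which leaks timing information; `compare_digest` is designed to avoid that.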
SQL egress guard (worker)
airlock/db.py:_validate_query runs a sqlglot AST walk:
- Blocks any call to DuckDB functions that reach the network or filesystem outside the snapshot: `read_csv`, `read_csv_auto`, `read_parquet`, `read_json`, `read_json_auto`, `read_ndjson`, `httpfs`, `s3_scan`, `gcs_scan`, `azure_scan`, `delta_scan`, `iceberg_scan`, `http_get`, `http_post`.
- Blocks any string literal matching URL-scheme regex: `https?://`, `s3://`, `gcs://`, `azure://`, `r2://`, `wasbs?://`, `abfss?://`.
- Blocks `COPY ... TO` unconditionally.
- Blocks any `DML` (INSERT/UPDATE/DELETE/MERGE) or `DDL` (CREATE/ALTER/DROP) node anywhere in the AST, including inside CTEs.
- Opens the DuckDB connection `read_only=True` as belt-and-suspenders.
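
As a rough illustration of the rules listed above, the sketch below applies the URL-scheme and blocked-function checks with regular expressions. It is deliberately a simplified stand-in and not Airlock's implementation: the real guard is described as a sqlglot AST walk, which can also catch DML or DDL nested inside CTEs, something a regex pass cannot do reliably. The function name `egress_violations` and the reason strings are hypothetical.

```python
import re
from typing import List

BLOCKED_FUNCTIONS = {
    "read_csv", "read_csv_auto", "read_parquet", "read_json", "read_json_auto",
    "read_ndjson", "httpfs", "s3_scan", "gcs_scan", "azure_scan", "delta_scan",
    "iceberg_scan", "http_get", "http_post",
}
URL_SCHEME = re.compile(r"(https?|s3|gcs|azure|r2|wasbs?|abfss?)://", re.IGNORECASE)
STRING_LITERAL = re.compile(r"'((?:[^']|'')*)'")
FUNCTION_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def egress_violations(sql: str) -> List[str]:
    """Return reasons to reject sql; an empty list means none of these rules fired."""
    reasons = []
    for literal in STRING_LITERAL.findall(sql):
        # search() flags a scheme anywhere in the literal, the stricter reading.
        if URL_SCHEME.search(literal):
            reasons.append(f"url literal: {literal}")
    # Blank out literals first so names inside strings are not misread as calls.
    without_literals = STRING_LITERAL.sub("''", sql)
    for name in FUNCTION_CALL.findall(without_literals):
        if name.lower() in BLOCKED_FUNCTIONS:
            reasons.append(f"blocked function: {name}")
    if not re.match(r"\s*(SELECT|WITH)\b", without_literals, re.IGNORECASE):
        reasons.append("statement is not SELECT/WITH")
    return reasons
```

For instance, `egress_violations("SELECT * FROM read_parquet('s3://bucket/x.parquet')")` reports both the URL literal and the blocked function, while a plain `SELECT id FROM orders WHERE total > 10` returns an empty list.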
7. Audit
Every tool call produces one audit record. Records are emitted from the CP after the tool call completes (success or failure). Fields are fixed:
{
"ts": 1745320000.123, // unix seconds
"trace_id": "…", // propagated from the agent's X-Airlock-Trace-Id
"tenant_id": "t_acme",
"api_key_id": "ak_…",
"worker_id": "w_…", // null if denied before tunnel
"tool": "execute_sql",
"snapshot_id": "u_42",
"outcome": "ok", // ok | denied | error
"error_class": null, // populated when outcome != ok (e.g. "egress_blocked", "scope_denied")
"latency_ms": 142,
"bytes_in": 0,
"bytes_out": 0
}
Records are:
- Append-only on the CP volume (`/state/audit.jsonl`).
- Streamed live over SSE to the Airlock Console (tenant-scoped via operator auth).
- Exportable to a customer's SIEM: S3 + Object Lock is the reference config; Splunk and Datadog are on the near-term roadmap.
Records do not contain:
- SQL text.
- Result rows.
- Schema names or column values.
- The source DB URL or any credential.
This is intentional. In Mode A the CP holds SQL text and result rows in memory only for the duration of the request (see §4) and discards them on response; nothing about the payload lands on disk or in the audit.
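
One way to picture the fixed-field guarantee is a record type that simply has no slot for payload data. The sketch below mirrors the field list shown earlier in this section; the class name `AuditRecord` and the helper `to_jsonl_line` are hypothetical and not taken from the CP codebase.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    ts: float
    trace_id: str
    tenant_id: str
    api_key_id: str
    worker_id: Optional[str]  # None if denied before the tunnel
    tool: str
    snapshot_id: str
    outcome: str  # "ok" | "denied" | "error"
    error_class: Optional[str]  # populated when outcome != "ok"
    latency_ms: int
    bytes_in: int
    bytes_out: int

    def __post_init__(self) -> None:
        if self.outcome not in ("ok", "denied", "error"):
            raise ValueError(f"unknown outcome: {self.outcome}")


def to_jsonl_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSONL line."""
    return json.dumps(asdict(record), separators=(",", ":")) + "\n"
```

Because the dataclass accepts only the declared fields, passing something like `sql="SELECT 1"` raises `TypeError` at construction time instead of silently landing in the log.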
8. Secure development
- Source code on GitHub (airlockai org, private).
- All merges via pull request; at least one approving review before merge to `main` once the team exceeds one engineer.
- Dependency updates tracked via Dependabot (planned) / `npm audit` + `pip-audit` locally.
- Tests run on every PR via GitHub Actions (in progress). Target: worker + CP + console unit tests green before merge.
- No customer credentials or production data in test fixtures.
- Pre-commit hooks reject committed secrets (planned: `gitleaks`).
9. Incident response
- Reporting: security@airlocklabs.ai. PGP key at `/.well-known/security.txt` (pending publication).
- Notification SLA: 24 hours from confirmed incident affecting customer data to initial customer notification.
- Status: status.airlocklabs.ai (planned).
- Containment runbook: documented in a private internal wiki; not published.
- Key rotation:
  - CP Ed25519 key: manual, performed by an Airlock operator. All workers must be re-bootstrapped with the new public key.
  - Per-tenant API keys: self-service via Console (`/api-keys` page → "Revoke").
  - Operator token: rotated via the CP host's secret store; setting a new value restarts the CP process and invalidates the previous token on the next request.
  - Worker keys: customer-managed; rotation policy is the customer's.
10. Sub-processors
Airlock uses the following third parties. Customers are notified of additions at least 30 days before they go live (roadmap; no changes since this document's publication).
| Sub-processor | Purpose | Location |
|---|---|---|
| Fly.io | Hosts the control plane | US (iad region) |
| Cloudflare | DNS, CDN for marketing site, Workers for Console, Access for operator SSO | Global edge, US data residency for logs |
| GitHub | Source hosting, CI | US |
| Anthropic | Optional — used only when a customer chooses Claude in the Playground | US |
| Google (Gemini API) | Optional — used only when a customer chooses Gemini in the Playground | US |
Sub-processors that touch only Airlock-owned metadata, never customer row data: Fly.io, Cloudflare, GitHub. Sub-processors that may touch customer data when the customer opts in: Anthropic, Google (only when the Playground is used, which calls the model with the tool response text).
No customer PII or source-database credentials are stored at any sub-processor.
11. Compliance
We hold no third-party attestations today. The controls described elsewhere in this document are what we run; nothing in the table below has been audited, and we have not signed a BAA or DPA with any customer to date. We'd rather state this plainly than imply readiness we haven't earned.
| Standard | Status |
|---|---|
| SOC 2 Type 1 | Not yet. Planning to scope once we have a stable design-partner cohort. |
| SOC 2 Type 2 | Not yet. Would follow Type 1 by ~12 months. |
| HIPAA | No formal attestation. No BAA signed with any customer to date. The technical controls in this document are designed with the Security Rule in mind, but that is a self-assessment, not a certification. |
| GDPR | No DPA template available yet. EU data-residency options are on the roadmap, not shipping. |
| PCI DSS | We do not intend to handle cardholder data. The egress guard and column-masking pipeline are designed to keep PANs out of snapshots. Treat this as an architectural descoping argument, not an attestation. |
| ISO 27001 | Not yet. |
If you have a specific questionnaire (SIG-Lite, CAIQ, VSA), email security@airlocklabs.ai and we'll answer what we can; we will mark anything we can't substantiate as such.
12. Known limitations
We publish these explicitly rather than downplaying them. If any of these would block your use of Airlock, tell us before you pilot.
- Mode A (CP-terminated TLS) is the default today. The control plane has SQL and result rows in memory while a request is in flight. Nothing is persisted, but a hypothetical CP compromise could observe in-flight traffic. Mode B (X25519/ChaCha20 end-to-end encryption with the CP as payload-opaque relay) is designed and the envelope shape is deployed; client shim + key exchange are the remaining work.
- Single-operator-token on the console. Until Clerk Organizations lands, the console relies on a shared bearer token and must be gated by Cloudflare Access or an equivalent in front of the URL.
- Audit retention is currently unbounded append to a local JSONL file. Per-tenant retention policies + automatic archival to S3 with Object Lock are in-flight.
- Single-region CP. No multi-region replication yet. RPO is best-effort (Fly volume snapshots, daily).
- No managed worker-side runtime upgrades. Customers run their own k8s rollouts; we don't push container updates into customer VPCs. Multi-worker HA is supported (the CP round-robins across all connected replicas for a tenant, capped at 8 by default), but coordinating zero-downtime upgrades across them is the operator's call.
- No audit-log signing. Records are trusted to the extent the volume is trusted. WORM / merkle-chained audit is on the roadmap for SOC 2 Type 2.
- Runtime state lives on a single Fly volume rather than Postgres. Fine for pilot; blocker for multi-region. Postgres migration is designed.
- No automated vulnerability scanning of the deployed artifact yet. Planned: Trivy on the worker Docker image in CI.
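
The multi-worker behaviour noted in the list above (round-robin across connected replicas, capped at 8 by default) can be sketched as below. This is an assumption-laden illustration: the class `ReplicaPool`, its method names, and the choice to refuse connections beyond the cap are hypothetical, since the source does not say how the CP enforces the limit.

```python
from typing import Dict, List, Optional


class ReplicaPool:
    """Tracks connected workers per tenant and hands them out in rotation (hypothetical)."""

    def __init__(self, max_replicas: int = 8):
        self._max_replicas = max_replicas
        self._workers: Dict[str, List[str]] = {}
        self._cursor: Dict[str, int] = {}

    def connect(self, tenant: str, worker_id: str) -> bool:
        """Register a replica; refuse it once the tenant is at the cap."""
        workers = self._workers.setdefault(tenant, [])
        if worker_id in workers:
            return True
        if len(workers) >= self._max_replicas:
            return False
        workers.append(worker_id)
        return True

    def disconnect(self, tenant: str, worker_id: str) -> None:
        """Remove a replica that has dropped its tunnel."""
        workers = self._workers.get(tenant, [])
        if worker_id in workers:
            workers.remove(worker_id)

    def pick(self, tenant: str) -> Optional[str]:
        """Return the next replica for this tenant, or None if none are connected."""
        workers = self._workers.get(tenant, [])
        if not workers:
            return None
        index = self._cursor.get(tenant, 0) % len(workers)
        self._cursor[tenant] = index + 1
        return workers[index]
```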
13. Contact
- Security: security@airlocklabs.ai
- Trust center: airlocklabs.ai/security (this page)
- Sub-processor changes: email security@airlocklabs.ai to be notified
- General: hello@airlocklabs.ai
Changelog
- 2026-05-01 — v0.1. Initial publication.
Security team evaluating Airlock?
Send a single email to security@airlocklabs.ai and we'll respond with a signed PDF of this document and answers to any questionnaire you use (SIG-Lite, CAIQ, VSA, or your own). A DPA template is not available yet (see §11).