Trust center

How Airlock protects your customer data when AI agents query it.

This page covers Airlock's architecture, trust boundaries, data handling, cryptography, audit, and compliance posture. Highlights come first; the full technical document follows.

  • Your production database is never queried by an agent. Agents only read ephemeral DuckDB snapshots inside your VPC.
  • PII is masked at export — SSN, DOB, names, emails are neutralized before any query can touch them.
  • Queries are read-only and egress-guarded. DuckDB functions and extensions that reach the network or external files (for example read_csv_auto and httpfs) are blocked before DuckDB ever sees the query.
  • Our control plane records routing metadata only — never SQL text or result rows. Your DB credentials never leave your VPC.
Last updated 2026-05-01 · v0.1

1. Executive summary

Airlock is infrastructure that lets AI agents answer questions about customer data without ever touching the customer's production database. Agents connect to Airlock's control plane over MCP. The control plane routes tool calls through a secure tunnel to a worker that runs inside the customer's own VPC. The worker answers queries from ephemeral DuckDB snapshots — filtered, masked, per-user copies of the source data — not the live database.

What this means for a security reviewer:

  • No agent ever holds database credentials. The worker does. The worker lives in your infrastructure.
  • No agent query runs against production rows. Agents query the DuckDB snapshot, which the worker's export pipeline builds, with column-level masking applied, before any agent query can touch it.
  • No SQL the agent writes can reach the internet. An egress guard rejects DuckDB functions that read or write over HTTP/S3/etc. before DuckDB ever sees the query.
  • No SQL text or result rows are persisted outside your VPC. Airlock's hosted control plane records routing metadata (tool name, snapshot id, latency, outcome) — not query content.

Airlock does not claim to replace your existing database access controls, secret management, or intrusion detection. It adds a specific, narrowly-scoped layer that makes agentic access safer than direct DB credentials.


2. Architecture

 ┌─────────────────────────────────┐        ┌────────────────────────────┐
 │ CUSTOMER VPC                    │        │ AIRLOCK INFRASTRUCTURE     │
 │                                 │        │                            │
 │  ┌─────────────┐                │        │                            │
 │  │ Production  │                │        │                            │
 │  │ Postgres    │◀───SELECT──────┼──┐     │                            │
 │  └─────────────┘                │  │     │                            │
 │                                 │  │     │                            │
 │  ┌──────────────────────────┐   │  │     │                            │
 │  │ Airlock Worker           │───┼──┘     │                            │
 │  │  ├─ read-only DB role    │   │        │                            │
 │  │  ├─ export pipeline      │   │        │                            │
 │  │  ├─ DuckDB snapshots     │───┼────┐   │                            │
 │  │  ├─ SQL egress guard     │   │    │   │                            │
 │  │  └─ Ed25519 identity     │───┼────┼──▶│ Control plane              │
 │  └──────────────────────────┘   │    │   │  ├─ MCP edge (JSON-RPC)    │
 │                                 │    │   │  ├─ Admin API              │◀── Operator
 │                                 │    │   │  ├─ Audit fan-out (SSE)    │    console
 │                                 │    │   │  └─ Tunnel (WSS, outbound) │
 │                                 │    │   └──────┬─────────────────────┘
 └─────────────────────────────────┘    │          │
                                        │          │ MCP tool calls
                                        │          │
  Agent (customer's LiteLLM, Cursor,    │          ▼
  Claude Code, custom) ──────MCP────────┼────▶ bearer API key
                                        │
                                        ▼
                                   DuckDB snapshot (query path)

Component inventory:

| Component | Runs where | What it holds |
|---|---|---|
| Production source DB | Customer VPC | Customer-owned data |
| Airlock worker | Customer VPC | Snapshot files, its own Ed25519 key, DATABASE_URL (env) |
| Airlock control plane (CP) | Airlock-hosted (Fly.io, iad region) | Tenant registry, API-key hashes (Argon2id), audit log (metadata only) |
| Airlock console | Airlock-hosted (Cloudflare Workers) | Dashboard UI; proxies admin traffic; holds no customer data |
| Customer's agent | Customer-controlled | Bearer API key (Argon2id-hashed server-side) |

Data flow — query path

  1. Agent issues MCP tool call to https://cp.airlocklabs.ai/mcp/{tenant} with Authorization: Bearer ak_live_….
  2. CP validates bearer against Argon2id hash; checks allowed_snapshots scope; checks allowed_tools.
  3. CP mints a forwarded-identity JWT (Ed25519, 30s TTL) carrying {tenant, api_key_id, snapshot_id, tool}.
  4. CP forwards the request as a REQUEST frame over the persistent WebSocket tunnel to the customer's worker.
  5. Worker verifies the JWT against the CP's pinned public key; checks the JWT's jti against a replay cache.
  6. Worker runs the egress guard over the SQL (sqlglot AST walk). Blocked (full list in §6): read_csv_auto, read_parquet, read_json_auto, httpfs, s3_scan, COPY ... TO, URL-scheme literals, and any statement other than SELECT/WITH.
  7. Worker opens the per-user DuckDB snapshot read_only=True with a 5-second query timeout and 500-row cap.
  8. Results return over the same tunnel as a RESPONSE frame.
  9. CP returns to the agent. CP audit records {tenant, api_key_id, worker_id, tool, snapshot_id, outcome, latency_ms, trace_id} — no SQL text, no row content.
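
The replay check in step 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Airlock's implementation: the class name JtiReplayCache, its method name, and the in-memory dict are hypothetical; only the jti claim and the 30s token TTL come from the steps above.

```python
import time


class JtiReplayCache:
    """Remember each JWT id (jti) until its token would have expired anyway.

    Hypothetical helper: the name and storage choice are assumptions.
    """

    def __init__(self, ttl_s: float = 30.0, clock=time.monotonic):
        self._ttl_s = ttl_s          # matches the 30s JWT TTL in step 3
        self._clock = clock          # injectable so the cache can be tested
        self._seen: dict[str, float] = {}  # jti -> time at which it can be forgotten

    def check_and_store(self, jti: str) -> bool:
        """Return True the first time a jti is seen, False on a replay."""
        now = self._clock()
        # Forget entries whose tokens have expired; an expired token is
        # rejected by the JWT's own exp check, so it cannot be replayed.
        self._seen = {j: exp for j, exp in self._seen.items() if exp > now}
        if jti in self._seen:
            return False
        self._seen[jti] = now + self._ttl_s
        return True
```

A verifier built this way would call check_and_store only after the signature and expiry checks pass, so the cache never grows from unauthenticated traffic.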

Data flow — export path

Default: ephemeral on-demand. The worker exports the moment an MCP tool call arrives for a user it doesn't have cached in tmpfs (or whose cache is older than snapshot_ttl_s, default 5 min). The export blocks the call for ~1–10s on the first hit; subsequent calls within the TTL window read the warm tmpfs file in sub-ms. Customers with predictable peaks can optionally pre-warm via airlock export --all from cron, but it isn't required.
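
The warm-or-export decision described above might look roughly like the sketch below. The function name, the cache layout (user id mapped to a path and export time), and the export callback are all hypothetical; only the 5-minute default for snapshot_ttl_s comes from the text.

```python
from typing import Callable, Dict, Tuple

SNAPSHOT_TTL_S = 300.0  # the documented default of 5 minutes


def get_snapshot_path(
    user_id: str,
    cache: Dict[str, Tuple[str, float]],   # user_id -> (path, exported_at)
    export: Callable[[str], str],          # hypothetical: runs the export, returns the file path
    now: float,
    ttl_s: float = SNAPSHOT_TTL_S,
) -> str:
    """Return a fresh snapshot path, exporting only on a miss or a stale entry."""
    entry = cache.get(user_id)
    if entry is not None and now - entry[1] <= ttl_s:
        return entry[0]                    # warm hit: no source DB access
    path = export(user_id)                 # cold path: this call blocks on the export
    cache[user_id] = (path, now)
    return path
```

Passing the clock value in as an argument keeps the freshness rule easy to test without sleeping.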

The CLI supports --all, --user-id <id>, and --tenant-filter <id> for multi-tenant scoping. One foot-gun is that a user identifier alone could match rows from another tenant on a shared shard; the CLI blocks this when tenant_isolation.required is set in the YAML.

  1. Worker reads its airlock.yaml (root table, FK graph, mask rules) and DATABASE_URL env var.
  2. Walks FK graph outward from the root table filtered to one snapshot owner.
  3. For each table in scope: SELECT {approved columns} FROM ... WHERE ..., copies to DuckDB.
  4. Runs mask UPDATEs on the DuckDB copy (hash / redact / null).
  5. Writes /dev/shm/airlock/{snapshot_id}.duckdb (RAM-backed tmpfs on Linux) atomically (write to a .tmp file, then os.replace), plus a .manifest.json sidecar carrying exported_at, row_counts, source_position, and config_hash. A crash mid-export never leaves a half-written file at the canonical path.
  6. Background reaper deletes any snapshot whose manifest exported_at + snapshot_ttl_s is in the past. A worker process exit (crash, decommission) discards every snapshot atomically — they live in RAM, not on persistent disk.
  7. Source DB is never written to. Excluded columns never land in the snapshot. Masked columns exist only in masked form.
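
Steps 5 and 6 above rely on two small mechanisms: a temp-file-then-rename publish and a TTL test against the manifest. The sketch below shows both using only the standard library. The function names, the exact sidecar filename, and exported_at being unix seconds are assumptions made for illustration.

```python
import json
import os
import tempfile


def _atomic_write(path: str, payload: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        # A failure leaves the canonical path untouched; remove the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def publish_snapshot(directory: str, snapshot_id: str, data: bytes, manifest: dict) -> str:
    """Publish the snapshot file and its manifest sidecar (sidecar name is assumed)."""
    os.makedirs(directory, exist_ok=True)
    db_path = os.path.join(directory, f"{snapshot_id}.duckdb")
    _atomic_write(db_path, data)
    manifest_path = os.path.join(directory, f"{snapshot_id}.manifest.json")
    _atomic_write(manifest_path, json.dumps(manifest).encode("utf-8"))
    return db_path


def is_expired(manifest: dict, ttl_s: float, now: float) -> bool:
    """Reaper test: exported_at + snapshot_ttl_s lies in the past."""
    return manifest["exported_at"] + ttl_s < now
```

Creating the temp file in the destination directory matters: os.replace is only atomic within one filesystem, so a temp file elsewhere could turn the rename into a copy.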

3. Trust boundaries

| Boundary | Inside | Outside | What crosses |
|---|---|---|---|
| Customer VPC | Source DB, worker, DuckDB snapshots, DATABASE_URL, worker private key, raw row data | Everything else | Over outbound WSS to CP: tool request envelopes, tool response envelopes, heartbeats. Over TLS to source DB: SELECT queries from the worker. |
| Airlock control plane | Argon2id hashes, Ed25519 keys, audit metadata (no payload) | Agent-side, customer-side | Over HTTPS to agents: MCP JSON-RPC. Over WSS to workers: REQUEST/RESPONSE frames (LLM tool calls, signed with a tool-bound JWT) and ADMIN_REQUEST frames (operator-only config reads/writes, signed with an admin-bound JWT). The two paths share the WS but use distinct frame types and distinct JWT shapes; an LLM API key cannot reach the admin handler under any failure mode. |
| Airlock console | Operator sessions, admin actions | Customers' row data | Over HTTPS to CP: admin API calls. Browser never talks to CP directly; the console's Next.js server proxies with the operator token. The Config page round-trips a tenant's airlock.yaml through the admin channel, but the YAML never persists anywhere on the CP side. |
| Agent | Bearer API key | Customer data | MCP calls only; no direct DB access, no filesystem access. |

4. Data handling

| Data | Where it lives | Who can see it | Retention | Notes |
|---|---|---|---|---|
| Source DB credentials (DATABASE_URL) | Worker env var | Customer | Rotated per customer policy | Never transmitted; never in logs (sanitized to host:port/db) |
| Production row data | Source DB | Worker, via read-only role | Source DB policy | Worker uses SELECT only |
| Masked row data (excluded columns are never exported) | DuckDB snapshot in RAM-backed tmpfs on the worker host (/dev/shm/airlock/{snapshot_id}.duckdb on Linux) | Worker process | Configurable TTL (default 5 min), measured from export; reaper deletes on expiry; a worker restart wipes everything | Snapshots are per-user; no cross-user data in a single file. Ephemeral by default: never written to persistent disk. |
| SQL query text (Mode A) | CP process memory during request | CP process only | Not persisted; not logged | Mode B (E2E encryption) on roadmap; CP will not see SQL in Mode B |
| Result rows (Mode A) | CP process memory during request | CP process only | Not persisted; not logged | Same Mode B guarantee |
| Audit metadata | CP disk (JSONL, append-only) + shipped to customer's SIEM | Airlock operators; customer via Console | Current default: unbounded append; per-tenant retention policy on roadmap | Fields: ts, tenant_id, api_key_id, worker_id, tool, snapshot_id, outcome, error_class, latency_ms, bytes_in/out, trace_id. Never: SQL, rows, schema content |
| Operator bearer token | CP env var (AIRLOCK_CP_OPERATOR_TOKEN) | Airlock operators only | Manual rotation | hmac.compare_digest verification; interim until Clerk/SSO |
| Per-tenant API keys | Argon2id hash in config.yaml / state.yaml on CP | Customer at creation (plaintext shown once); CP holds hash only | Soft-revoked on demand | Plaintext: ak_live_<tenant>_<random>; rotation API |
| Worker Ed25519 private key | Customer worker host | Customer only: airlock register generates the keypair locally on the worker host; only the public half is sent over the wire when redeeming a one-time enrollment token | Customer-managed | Public half is pinned in CP state.yaml / config.yaml after enrollment |
| CP Ed25519 private key | Fly persistent volume at /state/cp_ed25519.pem | Airlock operators | Manual rotation | Public half is pinned on every worker at install |
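
The table's note that DATABASE_URL is "sanitized to host:port/db" before logging can be illustrated with the standard library. This is a sketch of one way to do it, not the worker's actual code; the function name is hypothetical and the example URL is made up.

```python
from urllib.parse import urlsplit


def sanitize_database_url(url: str) -> str:
    """Reduce a connection URL to host:port/db, dropping user, password, and query."""
    parts = urlsplit(url)
    host = parts.hostname or "?"                       # "?" if the URL has no host
    port = f":{parts.port}" if parts.port is not None else ""
    db = parts.path.lstrip("/")
    return f"{host}{port}/{db}"
```

Building the log string from parsed components, rather than stripping the password with a regex, means anything unrecognised in the URL is dropped instead of leaked.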

Data that never enters Airlock-hosted infrastructure:

  • Source database credentials
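  • Unmasked source database row data (it exists only in the customer VPC; in Mode A, masked result rows from snapshots do pass through CP memory during a request, as the table above notes)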
  • Worker private key material

5. Cryptography

| Use | Algorithm | Notes |
|---|---|---|
| TLS in transit (agent → CP) | TLS 1.2+ (1.3 preferred) | Terminated at Fly.io edge |
| TLS in transit (console) | TLS 1.3 | Terminated at Cloudflare edge |
| WSS tunnel (worker → CP) | TLS 1.2+ | Outbound-only; no inbound firewall rule on customer side |
| Tunnel handshake authenticity | Ed25519 signed nonce | Worker signs CP-issued nonce; CP validates against pinned public key registered at install |
| Per-request forwarded identity | JWT / Ed25519 / EdDSA | Signed by CP, 30s TTL, aud pinned to worker id, jti replay-cached |
| API-key storage | Argon2id (argon2-cffi defaults) | hmac.compare_digest for constant-time check |
| DuckDB file at rest | None by default | Lives in RAM-backed tmpfs on the customer's worker host (see §4), not on persistent disk; any host-level encryption is applied by the customer per their policy |
| E2E payload encryption (planned) | X25519 + ChaCha20-Poly1305 (Mode B) | Will encrypt SQL + results end-to-end between agent client and worker; CP becomes payload-opaque. Envelope shape is already deployed; key exchange + client-side shim pending. |

No cryptography is rolled in-house. All primitives come from the Python cryptography package (whose published wheels are OpenSSL-backed) and PyJWT.


6. Access control

Customer-side (worker)

  • Worker runs as non-root user in customer's VPC.
  • Connects to source DB using a read-only role the customer provisions.
  • Outbound network: only the CP's WebSocket URL + the source DB. Customer-side firewall / security group is expected to enforce this.
  • No inbound ports are required for operation. Optional :9090/metrics (Prometheus scrape) can be enabled for ops; counts and latencies only — no SQL or row data — and disabled entirely with AIRLOCK_METRICS_PORT=0.
  • Private key lives on disk at a customer-chosen path (default /etc/airlock/worker_ed25519.pem, mode 0600).
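
A deployment check for the 0600 key-file mode in the last bullet could be as small as the sketch below. The function name is hypothetical, and whether the worker performs such a check itself is not stated in this document.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """True when the file grants no permissions to group or other (e.g. mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, key_file_is_private("/etc/airlock/worker_ed25519.pem") could gate worker start-up in an init script.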

Agent-side (per-tenant API keys)

Each API key has a scope document:

allowed_snapshots:
  mode: list          # or "all" | "regex" | "prefix" | "callback"
  ids: [u_42, u_88]
allowed_tools: [execute_sql, get_schema, null_rates]

The CP enforces scope on every tools/call. A key scoped to u_42 that attempts u_43 returns -32005 scope_denied. Revocation is soft (sets revoked_at); the active-key lookup filters on revoked_at IS NULL.
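
The scope check could be sketched as follows. Several details are assumptions: the "prefix" and "pattern" keys for the prefix and regex modes are hypothetical (the scope document above only shows the list mode), the callback mode is left out, and returning scope_denied for a disallowed tool is a guess, since the text only gives the code for a disallowed snapshot.

```python
import re

SCOPE_DENIED = -32005  # error code the text above gives for scope_denied


def snapshot_allowed(scope: dict, snapshot_id: str) -> bool:
    """Evaluate an allowed_snapshots block against one snapshot id."""
    mode = scope.get("mode")
    if mode == "all":
        return True
    if mode == "list":
        return snapshot_id in scope.get("ids", [])
    if mode == "prefix":
        return snapshot_id.startswith(scope["prefix"])                   # hypothetical key
    if mode == "regex":
        return re.fullmatch(scope["pattern"], snapshot_id) is not None   # hypothetical key
    return False  # unknown modes, including "callback" (not sketched), fail closed


def check_call(key_scope: dict, tool: str, snapshot_id: str):
    """Return None when the call is in scope, else a JSON-RPC style error dict."""
    tool_ok = tool in key_scope.get("allowed_tools", [])
    snapshot_ok = snapshot_allowed(key_scope.get("allowed_snapshots", {}), snapshot_id)
    if tool_ok and snapshot_ok:
        return None
    # Assumption: tool denials reuse the same code as snapshot denials.
    return {"code": SCOPE_DENIED, "message": "scope_denied"}
```

Failing closed on an unrecognised mode means a typo in a scope document denies access rather than widening it.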

Operator-side (Airlock Console — auth model)

The operator UI sits in front of CP's /v1/admin/* API. Three layers of auth, each with a clear role:

  • OIDC sign-in (planned, in progress). Operators authenticate via Google or Okta SAML/OIDC handled by the console (@auth/nextjs). The console performs the OIDC handshake; CP itself never talks to identity providers. On success the console issues a session JWT signed with AIRLOCK_SESSION_SECRET (shared between console and CP) carrying the operator's org + role claims.
  • Session JWT → CP. Every console-originated request to CP rides with the session JWT in Authorization: Bearer …. CP validates the signature, applies org-scope checks (an operator can only act on tenants their org owns), and rejects expired or invalid tokens. Sessions are short-lived; refresh happens through the OIDC flow.
  • Operator token (god-mode, break-glass). A long-lived AIRLOCK_CP_OPERATOR_TOKEN env var on the CP host bypasses org-scope checks. Reserved for Terraform / CI / incident response. Constant-time compared via hmac.compare_digest. Rotation is manual; usage is logged in the audit log under actor: "operator-token" so any use is visible.
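
The constant-time comparison named in the last bullet is a single standard-library call. The sketch below wraps it with header parsing; both function names are hypothetical, and rejecting an unset token is an assumption about sensible behaviour rather than something this document states.

```python
import hmac


def bearer_from_header(header: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, value = header.partition(" ")
    return value.strip() if scheme.lower() == "bearer" else ""


def operator_token_valid(presented: str, expected: str) -> bool:
    """Compare tokens without leaking, via timing, how many leading bytes matched."""
    if not expected:
        return False  # assumption: an unset operator token never authenticates
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

An ordinary == comparison can return as soon as it finds a differing byte, which is the timing signal hmac.compare_digest is designed to remove.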

Today (pre-OIDC): the operator token is held in an httpOnly cookie on the console; the server-side proxy forwards it to CP as Authorization: Bearer …. Because the cookie is httpOnly, page JavaScript cannot read the token. Deployments must sit behind Cloudflare Access (or equivalent) until OIDC ships.

Every admin action (login, tenant create/update, API-key mint or revoke, config edit, break-glass token use) produces an audit event in /state/audit.jsonl. Audit events are routing metadata only; SQL text and row content are never recorded (see §7).

SQL egress guard (worker)

airlock/db.py:_validate_query runs a sqlglot AST walk:

  • Blocks any call to DuckDB functions that reach the network or filesystem outside the snapshot: read_csv, read_csv_auto, read_parquet, read_json, read_json_auto, read_ndjson, httpfs, s3_scan, gcs_scan, azure_scan, delta_scan, iceberg_scan, http_get, http_post.
  • Blocks any string literal matching URL-scheme regex: https?://, s3://, gcs://, azure://, r2://, wasbs?://, abfss?://.
  • Blocks COPY ... TO unconditionally.
  • Blocks any DML (INSERT/UPDATE/DELETE/MERGE) or DDL (CREATE/ALTER/DROP) node anywhere in the AST, including inside CTEs.
  • Opens the DuckDB connection read_only=True as belt-and-suspenders.
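
To make the rules above concrete, here is a deliberately cruder stand-in that uses only the standard library. It is not the sqlglot AST walk the worker is described as running: it scans tokens with regular expressions, so it over-blocks (a column named update would be rejected, for instance) and should be read as an illustration of the rule set only. The function name and the reason strings are hypothetical.

```python
import re

BLOCKED_FUNCTIONS = {
    "read_csv", "read_csv_auto", "read_parquet", "read_json", "read_json_auto",
    "read_ndjson", "httpfs", "s3_scan", "gcs_scan", "azure_scan", "delta_scan",
    "iceberg_scan", "http_get", "http_post",
}
WRITE_KEYWORDS = {"insert", "update", "delete", "merge", "create", "alter", "drop", "copy"}
URL_SCHEME = re.compile(r"(https?|s3|gcs|azure|r2|wasbs?|abfss?)://", re.IGNORECASE)
STRING_LITERAL = re.compile(r"'((?:[^']|'')*)'")   # single-quoted SQL strings
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rejection_reason(sql: str):
    """Return a reason string if the query should be blocked, else None."""
    # Rule: no string literal may contain a URL scheme.
    if any(URL_SCHEME.search(lit) for lit in STRING_LITERAL.findall(sql)):
        return "url_literal"
    # Look at words outside string literals only.
    words = [w.lower() for w in IDENTIFIER.findall(STRING_LITERAL.sub("''", sql))]
    # Rule: the statement must start with SELECT or WITH.
    if not words or words[0] not in ("select", "with"):
        return "not_select"
    # Rule: no blocked function names anywhere.
    if any(w in BLOCKED_FUNCTIONS for w in words):
        return "blocked_function"
    # Rule: no DML/DDL/COPY keywords anywhere, including inside CTEs.
    if any(w in WRITE_KEYWORDS for w in words):
        return "write_statement"
    return None
```

An AST walk avoids the false positives of this token scan because it can tell a function call or statement node apart from an identifier that merely shares the name.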

7. Audit

Every tool call produces one audit record. Records are emitted from the CP after the tool call completes (success or failure). Fields are fixed:

{
  "ts": 1745320000.123,         // unix seconds
  "trace_id": "…",              // propagated from the agent's X-Airlock-Trace-Id
  "tenant_id": "t_acme",
  "api_key_id": "ak_…",
  "worker_id": "w_…",           // null if denied before tunnel
  "tool": "execute_sql",
  "snapshot_id": "u_42",
  "outcome": "ok",              // ok | denied | error
  "error_class": null,          // populated when outcome != ok (e.g. "egress_blocked", "scope_denied")
  "latency_ms": 142,
  "bytes_in": 0,
  "bytes_out": 0
}

Records are:

  • Append-only on the CP volume (/state/audit.jsonl).
  • Streamed live over SSE to the Airlock Console (tenant-scoped via operator auth).
  • Exportable to a customer's SIEM: S3 + Object Lock is the reference config; Splunk and Datadog are on the near-term roadmap.

Records do not contain:

  • SQL text.
  • Result rows.
  • Schema names or column values.
  • The source DB URL or any credential.

This is intentional. In Mode A the CP holds SQL text and result rows in memory only for the duration of the request and discards them on response; nothing about the payload lands on disk or in the audit log.
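
One way to keep payload out of the audit trail by construction is to build every record from a fixed field list and reject anything else. The sketch below assumes that approach; the function names are hypothetical, and the field list is copied from the example record above.

```python
import json
import time

AUDIT_FIELDS = (
    "ts", "trace_id", "tenant_id", "api_key_id", "worker_id", "tool",
    "snapshot_id", "outcome", "error_class", "latency_ms", "bytes_in", "bytes_out",
)
OUTCOMES = {"ok", "denied", "error"}


def build_audit_record(**fields) -> dict:
    """Build a record with exactly the fixed fields; unknown keys are rejected."""
    extra = set(fields) - set(AUDIT_FIELDS)
    if extra:
        raise ValueError(f"unexpected audit fields: {sorted(extra)}")
    record = {name: fields.get(name) for name in AUDIT_FIELDS}  # missing fields become None
    if record["ts"] is None:
        record["ts"] = time.time()  # unix seconds, as in the example record
    if record["outcome"] not in OUTCOMES:
        raise ValueError("outcome must be ok, denied, or error")
    return record


def append_jsonl(path: str, record: dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")
```

With this shape, a caller that tries to pass sql= or rows= gets a ValueError instead of silently widening what is logged.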


8. Secure development

  • Source code on GitHub (airlockai org, private).
  • All merges via pull request; at least one approving review before merge to main once the team exceeds one engineer.
  • Dependency updates tracked via Dependabot (planned) / npm audit + pip-audit locally.
  • Tests run on every PR via GitHub Actions (in progress). Target: worker + CP + console unit tests green before merge.
  • No customer credentials or production data in test fixtures.
  • Pre-commit hooks reject committed secrets (planned: gitleaks).

9. Incident response

  • Reporting: security@airlocklabs.ai. PGP key at /.well-known/security.txt (pending publication).
  • Notification SLA: 24 hours from confirmed incident affecting customer data to initial customer notification.
  • Status: status.airlocklabs.ai (planned).
  • Containment runbook: documented in a private internal wiki; not published.
  • Key rotation:
    • CP Ed25519 key: manual, performed by an Airlock operator. All workers must be re-bootstrapped with the new public key.
    • Per-tenant API keys: self-service via Console (/api-keys page → "Revoke").
    • Operator token: rotated via the CP host's secret store; setting a new value restarts the CP process and invalidates the previous token on the next request.
    • Worker keys: customer-managed; rotation policy is the customer's.

10. Sub-processors

Airlock uses the following third parties. Customers are notified of additions at least 30 days before they go live (roadmap; no changes since this document's publication).

| Sub-processor | Purpose | Location |
|---|---|---|
| Fly.io | Hosts the control plane | US (iad region) |
| Cloudflare | DNS, CDN for marketing site, Workers for Console, Access for operator SSO | Global edge, US data residency for logs |
| GitHub | Source hosting, CI | US |
| Anthropic | Optional; used only when a customer chooses Claude in the Playground | US |
| Google (Gemini API) | Optional; used only when a customer chooses Gemini in the Playground | US |

Sub-processors that touch only Airlock-owned metadata, never customer row data: Fly.io, Cloudflare, GitHub. Sub-processors that may touch customer data when the customer opts in: Anthropic, Google (only when the Playground is used, which calls the model with the tool response text).

No customer PII or source-database credentials are stored at any sub-processor.


11. Compliance

We hold no third-party attestations today. The controls described elsewhere in this document are what we run; nothing in the table below has been audited, and we have not signed a BAA or DPA with any customer to date. We'd rather state this plainly than imply readiness we haven't earned.

| Standard | Status |
|---|---|
| SOC 2 Type 1 | Not yet. Planning to scope once we have a stable design-partner cohort. |
| SOC 2 Type 2 | Not yet. Would follow Type 1 by ~12 months. |
| HIPAA | No formal attestation. No BAA signed with any customer to date. The technical controls in this document are designed with the Security Rule in mind, but that is a self-assessment, not a certification. |
| GDPR | No DPA template available yet. EU data-residency options are on the roadmap, not shipping. |
| PCI DSS | We do not intend to handle cardholder data. The egress guard and column-masking pipeline are designed to keep PANs out of snapshots. Treat this as an architectural descoping argument, not an attestation. |
| ISO 27001 | Not yet. |

If you have a specific questionnaire (SIG-Lite, CAIQ, VSA), email security@airlocklabs.ai and we'll answer what we can; we will mark anything we can't substantiate as such.


12. Known limitations

We publish these explicitly rather than downplaying them. If any of these would block your use of Airlock, tell us before you pilot.

  • Mode A (CP-terminated TLS) is the default today. The control plane has SQL and result rows in memory while a request is in flight. Nothing is persisted, but a hypothetical CP compromise could observe in-flight traffic. Mode B (X25519/ChaCha20 end-to-end encryption with the CP as payload-opaque relay) is designed and the envelope shape is deployed; client shim + key exchange are the remaining work.
  • Single-operator-token on the console. Until Clerk Organizations lands, the console relies on a shared bearer token and must be gated by Cloudflare Access or an equivalent in front of the URL.
  • Audit retention is currently unbounded append to a local JSONL file. Per-tenant retention policies + automatic archival to S3 with Object Lock are in-flight.
  • Single-region CP. No multi-region replication yet. RPO is best-effort (Fly volume snapshots, daily).
  • No managed worker-side runtime upgrades. Customers run their own k8s rollouts; we don't push container updates into customer VPCs. Multi-worker HA is supported (the CP round-robins across all connected replicas for a tenant, capped at 8 by default), but coordinating zero-downtime upgrades across them is the operator's call.
  • No audit-log signing. Records are trusted to the extent the volume is trusted. WORM / merkle-chained audit is on the roadmap for SOC 2 Type 2.
  • Runtime state lives on a single Fly volume rather than Postgres. Fine for pilot; blocker for multi-region. Postgres migration is designed.
  • No automated vulnerability scanning of the deployed artifact yet. Planned: Trivy on the worker Docker image in CI.

13. Contact

security@airlocklabs.ai (security reports, questionnaires, and requests for this document).

Changelog

  • 2026-05-01 — v0.1. Initial publication.
Have questions?

Security team evaluating Airlock?

Send a single email to security@airlocklabs.ai and we'll respond with a signed PDF of this document and answers to any questionnaire you use (SIG-Lite, CAIQ, VSA, or your own). A DPA template is not available yet (see §11).