Commit 2bcaa1e

feat(basilica): mint scoped operator key for tenant runtime traffic (#239)
Splits per-tenant proxy auth into two roles so a compromised tenant app no longer holds admin scope on its own LLMTrace pod:

- admin_key (bootstrap): retained by the caller. Used by the lifecycle layer to mint per-tenant keys and by the self-service / admin portal.
- api_key (operator): minted on the live proxy via POST /api/v1/auth/keys after readiness, returned in TenantInstances.api_key. This is the bearer the tenant's runtime apps use.

provision() now bootstraps the per-pod tenant row via POST /api/v1/tenants, mints the operator key, and injects LLMTRACE_AUTH_RUNTIME_KEY into the dashboard env (informational; dashboard wiring is a follow-up).

update(strategy="restart") rediscovers the tenant by label, lists keys, and re-mints only when the operator record is missing. update(strategy="recreate") always re-mints since the DB volume is destroyed.

cli.py emits admin_key alongside api_key. The tenant-lifecycle workflow masks BOTH keys via ::add-mask:: before any cat result.json and exposes both as step outputs.

Adds deployments/basilica/tests/ with 19 unit tests that exercise the real urllib admin-API client against an in-process http.server, plus a provision() integration test using a fake Basilica client.
1 parent 252650f commit 2bcaa1e

8 files changed

Lines changed: 1033 additions & 128 deletions

.github/workflows/tenant-lifecycle.yml

Lines changed: 16 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -259,16 +259,18 @@ jobs:
259259
code=$?
260260
set -e
261261
262-
# Mask the api_key BEFORE any log line (cat / step summary) touches
263-
# result.json. ::add-mask:: is per-job and applies to subsequent log
264-
# lines, so registering it here covers the cat below + the summary
265-
# step + any downstream consumers that print step outputs.
262+
# Mask BOTH api_key and admin_key BEFORE any log line (cat /
263+
# step summary) touches result.json. ::add-mask:: is per-job
264+
# and applies to subsequent log lines, so registering them
265+
# here covers the cat below + the summary step + any
266+
# downstream consumers that print step outputs.
266267
python3 - <<'PY'
267268
import json, pathlib
268269
data = json.loads(pathlib.Path("result.json").read_text() or "{}")
269-
key = data.get("api_key")
270-
if key:
271-
print(f"::add-mask::{key}")
270+
for field in ("api_key", "admin_key"):
271+
value = data.get(field)
272+
if value:
273+
print(f"::add-mask::{value}")
272274
PY
273275
274276
cat result.json
@@ -280,10 +282,14 @@ jobs:
280282
print(f"dashboard_instance_id={data.get('dashboard_instance_id') or ''}")
281283
print(f"proxy_url={data.get('proxy_url') or ''}")
282284
print(f"dashboard_url={data.get('dashboard_url') or ''}")
283-
# api_key is masked; emitting it as a step output is fine because
284-
# downstream consumers (the caller's app via the Actions API) need it
285-
# to give the tenant their bearer token. The value is opaque to logs.
285+
# api_key (operator-scoped, given to the tenant) and admin_key
286+
# (bootstrap-scoped, retained by the caller for self-service /
287+
# admin pages) are masked above. Emitting them as step outputs
288+
# is fine because downstream consumers (the caller's app via
289+
# the Actions API) need both: api_key for the tenant's bearer,
290+
# admin_key for the platform's own admin pages.
286291
print(f"api_key={data.get('api_key') or ''}")
292+
print(f"admin_key={data.get('admin_key') or ''}")
287293
PY
288294
if [[ "${code}" -ne 0 ]]; then
289295
exit "${code}"
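
The masking step in this hunk can be read as a small pure function. The sketch below is not part of the commit: `mask_commands` and `SECRET_FIELDS` are hypothetical names, and it only mirrors the inline heredoc above so the behaviour (one `::add-mask::` command per non-empty key, nothing for missing or null keys) can be checked in isolation.

```python
import json

# Fields the workflow treats as secrets (taken from the hunk above).
SECRET_FIELDS = ("api_key", "admin_key")


def mask_commands(result_json: str, fields=SECRET_FIELDS) -> list[str]:
    """Return one ::add-mask:: workflow command per non-empty secret field.

    Hypothetical helper: mirrors the inline script, but returns the lines
    instead of printing them so it can be unit-tested.
    """
    # An empty result.json is treated as an empty object, as in the workflow.
    data = json.loads(result_json or "{}")
    return [f"::add-mask::{data[field]}" for field in fields if data.get(field)]
```

In a workflow step the returned lines would be printed to stdout before any command that echoes `result.json`.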

deployments/basilica/README.md

Lines changed: 101 additions & 71 deletions
@@ -298,69 +298,100 @@ wins for that tenant.
298298

299299
Both demonstrate `${VAR}` substitution and the optional override pattern.
300300

301-
## Per-tenant API key auth (secure by default)
301+
## Per-tenant API key auth (two-tier, scoped runtime key)
302302

303303
LLMTrace's proxy has built-in API-key auth (`crates/llmtrace-proxy/src/auth.rs`).
304304
Without it, the public Basilica URL accepts any request — anyone with the
305305
URL can burn the tenant's upstream quota and pollute their traces. With it
306306
on, every non-`/health` request must carry `Authorization: Bearer llmt_<key>`
307307
or get a 401.
308308

309-
The lifecycle library enables this **by default** at provision time:
309+
The lifecycle library enables auth **by default** at provision time, and
310+
issues TWO keys with distinct scopes:
310311

311-
1. Resolves a key, in priority order:
312+
| Key | Role | Who holds it | What it can do |
313+
|---|---|---|---|
314+
| `admin_key` | bootstrap admin | the **caller** (your app / portal) | Mint / list / revoke per-tenant keys, manage tenants, read audit logs, change feature flags. Used by the lifecycle layer itself and by your self-service / admin portal pages |
315+
| `api_key` | operator | the **tenant**'s runtime apps | Proxy LLM calls, write traces, report agent actions. **No** key management, **no** audit-log access, **no** tenant CRUD |
316+
317+
### Bootstrap sequence (what `provision()` does internally)
318+
319+
1. Resolves a bootstrap admin key, priority order:
312320
- Explicit `spec.api_key` from the caller (or `api_key:` field in the
313-
YAML config), used for plan-recreates that must preserve the tenant's
314-
existing key
315-
- Existing `LLMTRACE_AUTH_ADMIN_KEY` in `proxy.env` (rare — caller wrote
316-
it themselves)
317-
- Auto-generated `llmt_<64-hex>` via `generate_api_key()` matching the
318-
format produced by the Rust proxy at `auth.rs:44`
319-
2. Injects `LLMTRACE_AUTH_ENABLED=true` + `LLMTRACE_AUTH_ADMIN_KEY=<key>`
320-
into the proxy's env, and the same `LLMTRACE_AUTH_ADMIN_KEY` into the
321-
dashboard's env (so the dashboard can authenticate to the proxy's admin
322-
endpoints)
323-
3. Returns the plaintext key in `TenantInstances.api_key` — **this is the
324-
only time it's exposed**. Persist it in your app DB and ship it to the
325-
tenant.
321+
YAML config) — pin this if you need the admin key stable across
322+
recreates.
323+
- Existing `LLMTRACE_AUTH_ADMIN_KEY` in `proxy.env` (rare — caller
324+
wrote it themselves).
325+
- Auto-generated `llmt_<64-hex>` via `generate_api_key()`.
326+
2. Injects `LLMTRACE_AUTH_ENABLED=true` + `LLMTRACE_AUTH_ADMIN_KEY=<admin_key>`
327+
into BOTH the proxy and dashboard envs. The dashboard keeps the admin
328+
key so its server-side handlers can call the proxy's admin endpoints
329+
on behalf of the operator/portal UI.
330+
3. Creates the proxy deployment and waits for it to become ready.
331+
4. Calls `POST /api/v1/tenants` on the live proxy URL (auth: admin key)
332+
to materialise the per-pod tenant row. Captures the assigned tenant
333+
UUID.
334+
5. Calls `POST /api/v1/auth/keys` (auth: admin key, scoped to that
335+
tenant UUID) with body `{name: "tenant-runtime", role: "operator",
336+
tenant_id: <uuid>}` and captures the plaintext operator key.
337+
6. Adds `LLMTRACE_AUTH_RUNTIME_KEY=<operator_key>` to the dashboard env
338+
(informational today; consumed by a follow-up dashboard wiring
339+
change) and creates the dashboard deployment.
340+
7. Returns `TenantInstances(api_key=<operator>, admin_key=<admin>)`.
341+
342+
Both plaintext keys are exposed only at this moment. Persist them
343+
immediately in your app's secret store.
326344

327345
```python
328346
result = lifecycle.provision(spec)
329-
tenant_record.api_key = result.api_key # llmt_xxxxxxxxxxxx... — only exposed here
347+
tenant_record.api_key = result.api_key # operator-scoped — ship to the tenant
348+
tenant_record.admin_key = result.admin_key # bootstrap-scoped — retain in your app
330349
tenant_record.proxy_url = result.proxy.url
331350
db.session.commit()
332351
```
333352

334-
The CLI emits it in the result JSON:
353+
The CLI emits both in the result JSON:
335354

336355
```json
337356
{
338357
"tenant_id": "acme",
339358
"proxy_url": "https://...basilica.ai",
340359
"dashboard_url": "https://...basilica.ai",
341-
"api_key": "llmt_d34c8a..."
360+
"api_key": "llmt_op...",
361+
"admin_key": "llmt_ad..."
342362
}
343363
```
344364

345-
The workflow registers `::add-mask::` for the key before any `cat`
346-
operation, so it never appears in run logs — only in step outputs (which
365+
The workflow registers `::add-mask::` for BOTH keys before any `cat`
366+
operation, so neither appears in run logs — only in step outputs (which
347367
the calling app fetches via the Actions API).
348368

349369
### Tenant-side usage
350370

351-
The tenant programs their downstream apps to send their API key on every
352-
request to their proxy URL:
371+
The tenant programs their downstream apps to send the OPERATOR key on
372+
every request to their proxy URL:
353373

354374
```bash
355375
curl -X POST https://<proxy_uuid>.deployments.basilica.ai/v1/chat/completions \
356-
-H "Authorization: Bearer llmt_d34c8a..." \
376+
-H "Authorization: Bearer llmt_op..." \
357377
-H "Content-Type: application/json" \
358378
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'
359379
```
360380

361-
A 401 means they used the wrong key. A 200 means LLMTrace authenticated
362-
them, then forwarded upstream (with whatever upstream auth was configured
363-
in `tenant_secrets`).
381+
A 401 means they used the wrong key. A 403 with "Insufficient permissions"
382+
means they tried an admin-only endpoint (`/api/v1/auth/keys`,
383+
`/api/v1/tenants`, etc.) with the operator key — by design.
384+
385+
### Update semantics
386+
387+
| Strategy | DB persistence | Operator key behaviour |
388+
|---|---|---|
389+
| `recreate` | DB volume is destroyed alongside the pods | Always re-minted. `result.api_key` carries the new operator key; caller must overwrite the stored value |
390+
| `restart` | Same DB volume — rows survive | The library lists `GET /api/v1/tenants` to rediscover the tenant by label, then lists `GET /api/v1/auth/keys` for that tenant. If a non-revoked `tenant-runtime` operator key is found, `result.api_key` is `None` (carry forward the previously-stored plaintext from your DB — the proxy stores only a hash, so re-fetching the plaintext is impossible). If the record is missing (e.g. DB wipe), a fresh operator key is minted and returned |
391+
392+
The admin key follows the same recreate / restart split: pin it via
393+
`spec.api_key` (or `api_key:` in YAML) if you need it stable across
394+
recreates; on restart it is simply re-derived from spec and not re-minted.
364395

365396
### Disabling auth
366397

@@ -371,43 +402,41 @@ mesh, a dev sandbox, etc.), turn auth off:
371402
enable_proxy_auth: false
372403
```
373404

374-
The library skips key generation, doesn't inject `LLMTRACE_AUTH_*`, and
375-
returns `api_key: null` in the result. The proxy URL is then wide open;
376-
own the consequences.
377-
378-
### Preserving the key across recreates
379-
380-
`update(..., strategy="recreate")` calls `provision()` under the hood,
381-
which will auto-generate a fresh key unless you pass the existing one.
382-
For plan upgrades where the tenant's apps shouldn't break, fetch the
383-
current key from your DB and put it in the spec:
384-
385-
```python
386-
spec = build_spec_from_tenant_record(tenant_record)
387-
spec = dataclasses.replace(spec, api_key=tenant_record.api_key) # preserve
388-
new_instances = lifecycle.update(spec, proxy_id=..., dashboard_id=..., strategy="recreate")
389-
# new_instances.api_key == tenant_record.api_key (carried forward)
390-
```
391-
392-
### Caveats
393-
394-
- **Admin-key-as-runtime-key is overpowered.** The auto-generated key
395-
takes the `auth.admin_key` slot (bootstrap admin role), meaning the
396-
tenant's apps technically have admin scope on their own instance.
397-
Hardening: after provision, POST a scoped non-admin key via the
398-
proxy's `/admin/keys` endpoint and hand THAT to the tenant. Tracked
399-
as a follow-up.
400-
- **Defence-in-depth** still worth doing separately: per-tenant rate
401-
limits (LLMTrace likely has them — configure), and DoS protection
402-
against CPU burn from ML detectors running before the upstream call.
403-
The proxy now bounds intra-pod ML detection concurrency via a tokio
404-
semaphore — tune the cap per-pod with the
405-
`LLMTRACE_ML_MAX_CONCURRENT` env var (default `8`). Excess requests
406-
receive `503 Service Unavailable` with `Retry-After: 1` instead of
407-
every concurrent request stalling on contended CPU; the counter
408-
`llmtrace_ml_rejected_total` and the gauge
409-
`llmtrace_ml_inflight_requests` are exposed on `/metrics` for
410-
alerting on sustained saturation.
405+
The library skips both the admin-key injection AND the operator-key
406+
bootstrap; `api_key` and `admin_key` are both `null` in the result. The
407+
proxy URL is then wide open; own the consequences.
408+
409+
### Why two keys
410+
411+
Before this PR the lifecycle layer handed the bootstrap admin key
412+
straight to the tenant. That key can mint more keys, view audit logs,
413+
manage feature flags, and create/delete tenants. A compromised tenant
414+
app would have taken the whole proxy with it. The operator role exists
415+
exactly to bound runtime traffic to "proxy LLM calls + report actions"
416+
without any control-plane scope. See
417+
`crates/llmtrace-core/src/lib.rs::ApiKeyRole` for the role definitions.
418+
419+
### Caveats and follow-ups
420+
421+
- **Dashboard wiring follow-up**: the Next.js dashboard
422+
(`dashboard/src/lib/api.ts`, `dashboard/src/lib/proxy-helpers.ts`)
423+
currently only reads `LLMTRACE_AUTH_ADMIN_KEY`. The lifecycle layer
424+
injects `LLMTRACE_AUTH_RUNTIME_KEY` informationally; a separate PR
425+
should switch the dashboard to use the runtime key for tenant-facing
426+
traffic and the admin key only for admin pages.
427+
- **Admin key rotation** is opt-in via `rotate_admin_after_bootstrap:
428+
true` in the tenant config — see the "Admin key rotation" section
429+
below. Without it the bootstrap admin key lives in the proxy env for
430+
the life of the deployment.
431+
- **Per-tenant rate limits** ship via the `rate_limit:` block in the
432+
tenant config (see "Tenant config format"); when set, the proxy
433+
honours `LLMTRACE_RATE_LIMIT_RPS` / `LLMTRACE_RATE_LIMIT_BURST`.
434+
- **DoS protection against CPU burn from ML detectors** is enforced by
435+
an intra-pod tokio semaphore — tune with `LLMTRACE_ML_MAX_CONCURRENT`
436+
(default `8`). Excess requests receive `503 Service Unavailable` with
437+
`Retry-After: 1` instead of stalling on contended CPU; the counter
438+
`llmtrace_ml_rejected_total` and gauge `llmtrace_ml_inflight_requests`
439+
are exposed on `/metrics` for alerting on sustained saturation.
411440

412441
## Per-tenant secret injection
413442

@@ -919,13 +948,14 @@ win).
919948

920949
| File | Purpose |
921950
|---|---|
922-
| `lifecycle.py:66` | `ComponentSpec` dataclass — one component's full deployment shape |
923-
| `lifecycle.py:91` | `TenantSpec` dataclass — tenant's pair (incl. `enable_proxy_auth` + `api_key`) |
924-
| `lifecycle.py:55` | `generate_api_key()` — `llmt_<64-hex>` matching the Rust proxy |
925-
| `lifecycle.py:343` | `provision(spec)` |
926-
| `lifecycle.py:387` | `update(spec, proxy_id, dashboard_id, strategy)` |
927-
| `lifecycle.py:438` | `deprovision(tenant_id, proxy_id?, dashboard_id?)` |
928-
| `lifecycle.py:452` | `status(tenant_id, proxy_id?, dashboard_id?)` |
951+
| `lifecycle.py` | `ComponentSpec`, `TenantSpec`, `TenantInstances` (`api_key` + `admin_key`) |
952+
| `lifecycle.py` | `generate_api_key()` — `llmt_<64-hex>` matching the Rust proxy |
953+
| `lifecycle.py` | `_bootstrap_tenant_in_proxy`, `_mint_operator_key`, `_find_tenant_by_label`, `_find_operator_key_record`, `_verify_or_remint_operator_key` — admin HTTP boundary against `/api/v1/tenants` + `/api/v1/auth/keys` |
954+
| `lifecycle.py` | `provision(spec)` — bootstrap admin key, deploy proxy, mint operator key, deploy dashboard |
955+
| `lifecycle.py` | `update(spec, proxy_id, dashboard_id, strategy)` — recreate re-mints operator; restart verifies + re-mints only if missing |
956+
| `lifecycle.py` | `deprovision(tenant_id, proxy_id?, dashboard_id?)` |
957+
| `lifecycle.py` | `status(tenant_id, proxy_id?, dashboard_id?)` |
958+
| `deployments/basilica/tests/test_operator_key_minting.py` | Unit tests for the admin HTTP boundary, `provision()` integration, CLI serialisation |
929959
| `cli.py:40` | `_substitute_env` — `${VAR}` resolver |
930960
| `cli.py:60` | `_load_config` — YAML/JSON loader |
931961
| `cli.py:113` | `_tenant_spec_from_config` — dict → `TenantSpec` |
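
Steps 4 and 5 of the bootstrap sequence in the README hunk describe an admin-authenticated `POST /api/v1/auth/keys` with a fixed body. The following is a minimal sketch of how such a request could be assembled with `urllib`, not the commit's actual `_mint_operator_key`. The function name and the example URL are hypothetical. The bearer header format and the body fields come from the README text; the response shape is not shown in this diff, so the sketch stops at building the request and does not parse a reply. Whether the proxy needs any extra tenant-scoping header is also not shown and is not assumed here.

```python
import json
import urllib.request


def build_mint_operator_key_request(
    proxy_url: str, admin_key: str, tenant_uuid: str
) -> urllib.request.Request:
    """Build (but do not send) the operator-key mint request.

    Hypothetical helper: body fields follow the README's step 5.
    """
    body = json.dumps(
        {"name": "tenant-runtime", "role": "operator", "tenant_id": tenant_uuid}
    ).encode("utf-8")
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/api/v1/auth/keys",
        data=body,
        method="POST",
        headers={
            # The admin (bootstrap) key authenticates the call, per the README.
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
    )
```

Keeping request construction separate from sending makes the payload easy to assert on without a live proxy, which is similar in spirit to the in-process `http.server` tests the commit message mentions.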

deployments/basilica/cli.py

Lines changed: 1 addition & 0 deletions
@@ -168,6 +168,7 @@ def view(info: lifecycle.InstanceInfo | None) -> dict[str, Any] | None:
168168
"proxy_url": instances.proxy.url if instances.proxy else None,
169169
"dashboard_url": instances.dashboard.url if instances.dashboard else None,
170170
"api_key": instances.api_key,
171+
"admin_key": instances.admin_key,
171172
}
172173

173174
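
Because the README states that `api_key` is `None` after a `restart` when the existing operator key record is still valid, a caller consuming this result dict has to decide whether to overwrite its stored key. A minimal sketch of that decision, assuming the caller keeps the previous plaintext in its own store; `resolve_operator_key` is a hypothetical name and not part of `cli.py`:

```python
from typing import Optional


def resolve_operator_key(
    stored_key: Optional[str], result_api_key: Optional[str]
) -> Optional[str]:
    """Pick the operator key to keep after update().

    - A non-empty result key means a fresh key was minted (recreate, or
      restart with a missing record): overwrite the stored value.
    - A null result key means the existing record was verified: carry the
      stored plaintext forward, since the proxy keeps only a hash.
    """
    if result_api_key:
        return result_api_key
    return stored_key
```

The same pattern would apply to the JSON emitted by the CLI, where the field arrives as `null` rather than `None`.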

deployments/basilica/configs/examples/pro.yaml

Lines changed: 7 additions & 3 deletions
@@ -1,9 +1,13 @@
11
# LLMTrace tenant config — pro tier example.
22
# Multi-replica proxy, persistent dashboard, debug logging, datamarking on.
33
#
4-
# Auth: per-tenant API key is auto-generated at provision time (or supply
5-
# `api_key:` at the top level to force a specific value). See starter.yaml
6-
# for the full auth contract.
4+
# Auth: two-tier key model. The lifecycle library bootstraps a tenant on
5+
# the live proxy after readiness, then mints a scoped Operator-role key
6+
# via POST /api/v1/auth/keys. That operator key is what the tenant uses
7+
# for runtime traffic. The bootstrap admin key (auto-generated or pinned
8+
# via `api_key:` below) is retained by the caller for self-service /
9+
# admin pages and NEVER handed to the tenant. See starter.yaml + the
10+
# README "Per-tenant API key auth" section for the full contract.
711

812
proxy:
913
image: "ghcr.io/techlab-innov/llmtrace-proxy:${LLMTRACE_VERSION:-latest}"

deployments/basilica/configs/examples/starter.yaml

Lines changed: 26 additions & 12 deletions
@@ -8,13 +8,22 @@
88
# from the process environment at deploy time. Use this to keep secrets out
99
# of the config file.
1010
#
11-
# Auth: `enable_proxy_auth` defaults to true. On provision, the lifecycle
12-
# library either uses a caller-supplied `api_key` (top-level) or auto-generates
13-
# an `llmt_<64-hex>` key, injects it into both proxy + dashboard envs as
14-
# LLMTRACE_AUTH_ADMIN_KEY, and returns it in the result JSON. The plaintext
15-
# key is exposed once on provision — your app must persist it and ship it to
16-
# the tenant as their `Authorization: Bearer llmt_<key>` header. Set
17-
# `enable_proxy_auth: false` to deploy an open proxy.
11+
# Auth: `enable_proxy_auth` defaults to true. The lifecycle library issues
12+
# TWO keys at provision time (see README "Per-tenant API key auth"):
13+
#
14+
# - admin_key — bootstrap admin scope. Used by the lifecycle layer to
15+
# mint per-tenant keys + retained by the caller for the
16+
# self-service / admin portal. NEVER given to the tenant.
17+
# Sourced from `api_key:` below (if set) or `LLMTRACE_AUTH_ADMIN_KEY`
18+
# in proxy.env, or auto-generated `llmt_<64-hex>`.
19+
# - api_key — operator scope. Minted on the live proxy via
20+
# POST /api/v1/auth/keys after the proxy is ready. This is
21+
# the key handed to the tenant for runtime traffic; it
22+
# cannot manage tenants/keys/audit logs.
23+
#
24+
# Both plaintext keys are exposed once in the result JSON (`api_key` and
25+
# `admin_key`). Persist them in your secret store immediately. Set
26+
# `enable_proxy_auth: false` to deploy an open proxy with no auth at all.
1827

1928
proxy:
2029
image: ghcr.io/techlab-innov/llmtrace-proxy:latest
@@ -48,18 +57,23 @@ dashboard:
4857
HOSTNAME: 0.0.0.0
4958
NODE_ENV: production
5059
# LLMTRACE_AUTH_ADMIN_KEY is set by the lifecycle library at provision
51-
# time to match the proxy's resolved key. If you set it here explicitly,
52-
# that value wins for the dashboard env (but the proxy's resolved key is
53-
# what the tenant must actually send — keep them consistent).
60+
# time to match the proxy's resolved admin key (used by dashboard
61+
# server-side handlers for admin endpoints). LLMTRACE_AUTH_RUNTIME_KEY
62+
# is set after the operator key is minted — dashboard wiring to
63+
# consume the runtime key for tenant-facing traffic is a follow-up.
64+
# If you set either var here explicitly, that value wins for the
65+
# dashboard env.
5466

5567
# Optional — override if your tenant naming scheme differs.
5668
# proxy_name_template: "llmtrace-proxy-{tenant_id}"
5769
# dashboard_name_template: "llmtrace-dashboard-{tenant_id}"
5870
# inject_proxy_url_into_dashboard: true
5971
# proxy_url_env_var: LLMTRACE_PROXY_URL
6072

61-
# Auth controls (defaults shown). Set explicit `api_key` to force a specific
62-
# value (e.g. when recreating to preserve the existing tenant key).
73+
# Auth controls (defaults shown). `api_key` here pins the BOOTSTRAP ADMIN
74+
# key (so the admin_key stays stable across recreates). The runtime
75+
# operator key is always re-minted on recreate (the proxy's DB is fresh)
76+
# and verified-or-re-minted on restart (same volume).
6377
# enable_proxy_auth: true
6478
# api_key: null
6579
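
The config comments above describe a three-step priority order for the bootstrap admin key: a pinned `api_key:`, then an existing `LLMTRACE_AUTH_ADMIN_KEY` in `proxy.env`, then an auto-generated `llmt_<64-hex>` value. The sketch below illustrates that order only. `resolve_admin_key` and `generate_api_key_sketch` are hypothetical names, and using `secrets.token_hex(32)` to produce the 64 hex characters is an assumption about how the real `generate_api_key()` might work, not something this diff shows.

```python
import secrets
from typing import Mapping, Optional


def generate_api_key_sketch() -> str:
    """Produce an llmt_<64-hex> key (assumed generation method)."""
    return "llmt_" + secrets.token_hex(32)  # 32 random bytes -> 64 hex chars


def resolve_admin_key(
    spec_api_key: Optional[str], proxy_env: Mapping[str, str]
) -> str:
    """Resolve the bootstrap admin key in the documented priority order."""
    # 1. Explicit `api_key:` from the spec / YAML config wins.
    if spec_api_key:
        return spec_api_key
    # 2. A key the caller already placed in proxy.env.
    existing = proxy_env.get("LLMTRACE_AUTH_ADMIN_KEY")
    if existing:
        return existing
    # 3. Otherwise generate a fresh key.
    return generate_api_key_sketch()
```

Pinning `api_key:` is what keeps the admin key stable across recreates; the operator key is re-minted on recreate regardless.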
