feat(basilica): per-tenant LLMTrace API key auth, secure by default #233

Merged: epappas merged 1 commit into main from feat/per-tenant-api-key on May 18, 2026

@epappas epappas commented May 18, 2026

Summary

Closes the wide-open-proxy-URL gap surfaced during today's review. LLMTrace's proxy already has bearer-token auth (crates/llmtrace-proxy/src/auth.rs); our tenant configs just didn't enable it. After this PR, every provisioned tenant gets a per-tenant llmt_<64-hex> API key auto-generated at create time, injected into the proxy env, and surfaced to the caller exactly once via TenantInstances.api_key / result JSON.
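As a rough illustration of the key format described above (`llmt_` followed by 64 hex characters), a minimal sketch might look like the following. The use of Python's `secrets` module and the `looks_like_api_key` helper are assumptions made for illustration; the actual implementation in lifecycle.py may differ.

```python
import re
import secrets

KEY_PREFIX = "llmt_"
# Assumed shape: prefix plus 64 lowercase hex characters (32 random bytes).
KEY_PATTERN = re.compile(r"llmt_[0-9a-f]{64}")


def generate_api_key() -> str:
    """Return `llmt_` followed by 32 random bytes, hex-encoded."""
    return KEY_PREFIX + secrets.token_hex(32)


def looks_like_api_key(value: str) -> bool:
    """Hypothetical helper: check that a string has the llmt_<64-hex> shape."""
    return KEY_PATTERN.fullmatch(value) is not None
```

A key produced this way is 69 characters long: the 5-character prefix plus 64 hex characters.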

What changes

| Layer | Change |
| --- | --- |
| `lifecycle.py` | `generate_api_key()`; `TenantSpec.enable_proxy_auth` (default true) + `TenantSpec.api_key` (explicit override); `TenantInstances.api_key`; `_resolve_api_key` (priority: explicit > env > auto); `_apply_proxy_auth` injects `LLMTRACE_AUTH_ENABLED=true` + `LLMTRACE_AUTH_ADMIN_KEY=<key>` into both proxy and dashboard envs |
| `cli.py` | Config loader accepts `enable_proxy_auth` / `api_key` top-level fields; result JSON includes `api_key` |
| `.github/workflows/tenant-lifecycle.yml` | Registers `::add-mask::` for the key before any `cat result.json` line, so it never appears in workflow logs; exposes the key as a step output |
| `configs/examples/{starter,pro}.yaml` | Comment + opt-out fields |
| `README.md` | New "Per-tenant API key auth (secure by default)" section above secret injection; covers resolution priority, tenant-side curl, recreate-with-preserved-key, opt-out, and caveats |
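The resolution priority and env injection listed for lifecycle.py could be sketched roughly as below. This is not the PR's code: the function signatures are hypothetical, "env" is assumed to mean the tenant's already-deployed proxy environment, and, following the lifecycle.py row above, both variables are written to both the proxy and the dashboard environments.

```python
import secrets
from typing import Dict, Optional, Tuple

ADMIN_KEY_VAR = "LLMTRACE_AUTH_ADMIN_KEY"
AUTH_ENABLED_VAR = "LLMTRACE_AUTH_ENABLED"


def resolve_api_key(explicit: Optional[str], existing_env: Dict[str, str]) -> str:
    """Pick the key by priority: explicit override > existing env > auto-generated."""
    if explicit:
        return explicit
    existing = existing_env.get(ADMIN_KEY_VAR)
    if existing:
        return existing
    return "llmt_" + secrets.token_hex(32)


def apply_proxy_auth(
    proxy_env: Dict[str, str], dashboard_env: Dict[str, str], api_key: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return copies of both envs with the auth variables injected."""
    proxy = dict(proxy_env)
    dashboard = dict(dashboard_env)
    for env in (proxy, dashboard):
        env[AUTH_ENABLED_VAR] = "true"
        env[ADMIN_KEY_VAR] = api_key
    return proxy, dashboard
```

Returning copies rather than mutating the inputs keeps the caller's spec untouched, which makes it easier to reuse the same spec when recreating a tenant with a preserved key.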

Live-validated end-to-end (2026-05-18, tenant auth-85675)

| Check | Result |
| --- | --- |
| provision returned a 69-character `api_key` (`llmt_` + 64 hex) | Matches Rust proxy format (auth.rs:44) |
| `/health`, no auth | 200 (probes still work) |
| `/`, no bearer | 401 |
| `/`, wrong bearer | 401 |
| `/`, correct bearer | 421 from OpenAI: LLMTrace authenticated and forwarded the request; the upstream rejected it for lack of upstream auth (proxy doing its job) |
| deprovision | clean |
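The checks above could be scripted along these lines. This is only a sketch: `check_auth_gate` and the injected `fetch` callable are hypothetical names, and the status returned for the correct bearer is reported rather than asserted, since it depends on what the upstream answers.

```python
from typing import Callable, List, Optional, Tuple

# fetch(path, bearer) -> HTTP status code; the caller supplies the transport.
Fetch = Callable[[str, Optional[str]], int]


def check_auth_gate(fetch: Fetch, good_key: str) -> Tuple[List[str], int]:
    """Run the auth checks and return (failures, status seen with the correct key)."""
    failures: List[str] = []
    wrong_key = good_key + "-wrong"  # any value that differs from the real key

    if fetch("/health", None) != 200:
        failures.append("/health should answer 200 without auth")
    if fetch("/", None) != 401:
        failures.append("/ without a bearer should answer 401")
    if fetch("/", wrong_key) != 401:
        failures.append("/ with a wrong bearer should answer 401")

    # With the correct key the proxy forwards the request, so the status
    # comes from the upstream and is not judged here.
    authed_status = fetch("/", good_key)
    return failures, authed_status
```

Passing the transport in as a callable keeps the check usable both against a live tenant URL and against a stub in unit tests.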

Breaking change call-out

Existing tenants provisioned before this PR keep working unchanged. New provisions now require the caller to:

  1. Persist TenantInstances.api_key in their DB
  2. Ship the plaintext to the tenant (one-time)
  3. Have the tenant send Authorization: Bearer llmt_<key> on every non-/health request

To preserve the prior wide-open behaviour for trusted-network deploys, set enable_proxy_auth: false in the tenant config. The README documents this escape hatch.
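On the tenant side, the required header could be attached as in this sketch. The `build_chat_request` helper and the placeholder host are illustrative assumptions; only the `Authorization: Bearer` header shape and the `/v1/chat/completions` path come from this PR.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the tenant's proxy URL."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req)`; without the header, the proxy is expected to answer 401 on every path except /health.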

Caveats (in the README, not blocking)

  • The auto-generated key takes the bootstrap admin slot — overpowered for runtime use. A follow-up could POST a scoped non-admin key via /admin/keys after provision and hand THAT to the tenant.
  • Defence in depth is still worth doing separately: per-tenant rate limits and CPU-burn protection (LLMTrace's ML detectors run before the upstream call, so auth alone doesn't stop a sustained POST flood from burning proxy CPU).

Test plan

  • CI green (no Rust changes; new Python lints clean per ruff check; lifecycle library unit-tested via the inline checks documented in the commit)
  • Post-merge: trigger gh workflow run tenant-lifecycle.yml ... and confirm the workflow's "Run lifecycle action" step prints ::add-mask::<value> BEFORE any cat result.json line that contains the key — verify in the run UI that the key shows as *** in cat output
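The ordering that the post-merge check verifies (mask first, print second) can be sketched as follows. The workflow itself may well do this in shell; this Python version and the `publish_result` name are assumptions. The `::add-mask::` workflow command and the `GITHUB_OUTPUT` file are standard GitHub Actions mechanisms.

```python
import json
import os
import sys
from typing import TextIO


def publish_result(result_path: str, out: TextIO = sys.stdout) -> None:
    """Mask the api_key in the job log, expose it as a step output, then print the JSON."""
    with open(result_path, encoding="utf-8") as fh:
        raw = fh.read()
    api_key = json.loads(raw).get("api_key")

    if api_key:
        # The mask must be registered before any line containing the key is written.
        out.write(f"::add-mask::{api_key}\n")
        out.flush()
        output_file = os.environ.get("GITHUB_OUTPUT")
        if output_file:
            with open(output_file, "a", encoding="utf-8") as gh:
                gh.write(f"api_key={api_key}\n")

    # Safe to echo now: the runner replaces the masked value with ***.
    out.write(raw)
```

Outside GitHub Actions the mask line is just ordinary output, so this ordering only protects the key when the script runs inside a workflow step.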

LLMTrace's proxy has built-in bearer-token auth (crates/llmtrace-proxy/
src/auth.rs) but our tenant configs didn't enable it. Result: provisioned
Basilica URLs were wide open — anyone could hit /v1/chat/completions and
burn the tenant's upstream quota.

This change closes the gap at the lifecycle layer.

lifecycle.py:
  * generate_api_key() — `llmt_` + 32 random bytes hex-encoded, matching
    the format produced by the Rust proxy at auth.rs:44
  * TenantSpec gains enable_proxy_auth (default true) + api_key (optional
    explicit override)
  * TenantInstances gains api_key field, populated only on provision()
    and on update(strategy=recreate)
  * _resolve_api_key resolves explicit > existing env > auto-generated
  * _apply_proxy_auth injects LLMTRACE_AUTH_ENABLED=true (default) and
    LLMTRACE_AUTH_ADMIN_KEY=<key> into proxy env; LLMTRACE_AUTH_ADMIN_KEY
    into dashboard env (so dashboard can talk to proxy's admin
    endpoints)

cli.py:
  * Config loader accepts top-level enable_proxy_auth + api_key fields
  * Result JSON includes api_key (plaintext)

.github/workflows/tenant-lifecycle.yml:
  * Registers ::add-mask:: for the api_key from result.json BEFORE the
    cat / step summary lines run, so the key never appears in public
    workflow logs
  * Emits api_key as a step output for caller consumption via Actions
    API (mask covers downstream log lines)

configs/examples/{starter,pro}.yaml:
  * Document the auto-generation behaviour + how to override

deployments/basilica/README.md:
  * New "Per-tenant API key auth (secure by default)" section above
    "Per-tenant secret injection", covering the resolution priority,
    tenant-side curl shape, recreate-with-preserved-key pattern,
    enable_proxy_auth=false escape hatch, and known caveats
    (admin-key-as-runtime is overpowered; rate-limiting + CPU-burn
    defence are separate concerns)

Validated end-to-end on 2026-05-18 against a live Basilica account
(tenant auth-85675):
  - provision returned a 69-char api_key (llmt_ + 64 hex) in result JSON
  - /health unauthenticated → 200 (probe path stays open)
  - / without bearer → 401
  - / with wrong bearer → 401
  - / with correct bearer → 421 from OpenAI (LLMTrace authed,
    forwarded; OpenAI rejected for no upstream auth — proxy doing
    its job)
  - deprovision cleaned up cleanly

BREAKING CHANGE (sort of): provision() now defaults to gating the proxy.
Existing tenants provisioned before this change continue to work as
before. New provisions need their callers (apps, workflow consumers) to
persist TenantInstances.api_key and ship it to the tenant. To preserve
the prior wide-open behaviour, set `enable_proxy_auth: false` in the
tenant config.
@epappas epappas merged commit dafed11 into main May 18, 2026
14 checks passed