feat(basilica): per-tenant LLMTrace API key auth, secure by default #233
Merged
LLMTrace's proxy has built-in bearer-token auth (crates/llmtrace-proxy/
src/auth.rs) but our tenant configs didn't enable it. Result: provisioned
Basilica URLs were wide open — anyone could hit /v1/chat/completions and
burn the tenant's upstream quota.
This change closes the gap at the lifecycle layer.
lifecycle.py:
* generate_api_key() — `llmt_` + 32 random bytes hex-encoded, matching
the format produced by the Rust proxy at auth.rs:44
* TenantSpec gains enable_proxy_auth (default true) + api_key (optional
explicit override)
* TenantInstances gains api_key field, populated only on provision()
and on update(strategy=recreate)
* _resolve_api_key resolves explicit > existing env > auto-generated
* _apply_proxy_auth injects LLMTRACE_AUTH_ENABLED=true (default) and
LLMTRACE_AUTH_ADMIN_KEY=<key> into proxy env; LLMTRACE_AUTH_ADMIN_KEY
into dashboard env (so dashboard can talk to proxy's admin
endpoints)
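The helpers listed above can be sketched roughly as follows. This is a minimal illustration, not the code in lifecycle.py: the function signatures, the assumption that an existing key is read back from `LLMTRACE_AUTH_ADMIN_KEY` in the current proxy env, and the choice to leave both envs untouched when auth is disabled are all guesses made for the example.

```python
import secrets
from typing import Dict, Mapping, Optional, Tuple

KEY_PREFIX = "llmt_"


def generate_api_key() -> str:
    """Return `llmt_` followed by 32 random bytes, hex-encoded (64 hex chars)."""
    return KEY_PREFIX + secrets.token_hex(32)


def resolve_api_key(explicit: Optional[str], existing_env: Mapping[str, str]) -> str:
    """Pick a key by priority: explicit override > key already in env > fresh key."""
    if explicit:
        return explicit
    # Assumption: a previously provisioned key lives in this env variable.
    existing = existing_env.get("LLMTRACE_AUTH_ADMIN_KEY")
    if existing:
        return existing
    return generate_api_key()


def apply_proxy_auth(
    proxy_env: Mapping[str, str],
    dashboard_env: Mapping[str, str],
    api_key: str,
    enabled: bool = True,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return copies of both envs with the auth variables injected."""
    proxy = dict(proxy_env)
    dashboard = dict(dashboard_env)
    if enabled:
        proxy["LLMTRACE_AUTH_ENABLED"] = "true"
        proxy["LLMTRACE_AUTH_ADMIN_KEY"] = api_key
        # The dashboard needs the key to call the proxy's admin endpoints.
        dashboard["LLMTRACE_AUTH_ADMIN_KEY"] = api_key
    # Assumption: with auth disabled, both envs are returned unchanged.
    return proxy, dashboard
```

Returning copies rather than mutating the inputs keeps the sketch easy to test; the real helpers may well mutate in place.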
cli.py:
* Config loader accepts top-level enable_proxy_auth + api_key fields
* Result JSON includes api_key (plaintext)
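As a rough illustration of the two top-level fields, a loader could read them from an already-parsed config mapping like this. The function name `read_auth_fields` and the validation rules are hypothetical; only the field names and the default of true come from this change.

```python
from typing import Any, Mapping, Optional, Tuple


def read_auth_fields(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Extract (enable_proxy_auth, api_key) from a parsed tenant config."""
    # Secure by default: a missing field means auth is on.
    enabled = config.get("enable_proxy_auth", True)
    if not isinstance(enabled, bool):
        raise ValueError("enable_proxy_auth must be a boolean")
    # None means "no explicit override"; the key is resolved later.
    api_key = config.get("api_key")
    if api_key is not None and not isinstance(api_key, str):
        raise ValueError("api_key must be a string when set")
    return enabled, api_key
```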
.github/workflows/tenant-lifecycle.yml:
* Registers ::add-mask:: for the api_key from result.json BEFORE the
cat / step summary lines run, so the key never appears in public
workflow logs
* Emits api_key as a step output for caller consumption via Actions
API (mask covers downstream log lines)
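The ordering requirement can be pictured with a small Python sketch of what such a step has to produce. It is not the workflow's actual script: `mask_and_output_lines` is a hypothetical helper, and the sketch only assumes that result.json carries a top-level `api_key` field. `::add-mask::` and the `name=value` output line are standard GitHub Actions mechanisms.

```python
import json
from typing import List


def mask_and_output_lines(result_json_text: str) -> List[str]:
    """Return the mask command and the step-output line, in the order to emit them."""
    key = json.loads(result_json_text).get("api_key")
    if not key:
        # No key (for example, auth disabled): nothing to mask or expose.
        return []
    return [
        # Printed to stdout first, so every later log line has the key replaced by ***.
        f"::add-mask::{key}",
        # Appended afterwards to the file named by $GITHUB_OUTPUT.
        f"api_key={key}",
    ]
```

The mask only applies to log lines written after it is registered, which is why it has to come before any step that echoes result.json.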
configs/examples/{starter,pro}.yaml:
* Document the auto-generation behaviour + how to override
deployments/basilica/README.md:
* New "Per-tenant API key auth (secure by default)" section above
"Per-tenant secret injection", covering the resolution priority,
tenant-side curl shape, recreate-with-preserved-key pattern,
enable_proxy_auth=false escape hatch, and known caveats
(admin-key-as-runtime is overpowered; rate-limiting + CPU-burn
defence are separate concerns)
Validated end-to-end on 2026-05-18 against a live Basilica account
(tenant auth-85675):
- provision returned a 69-character api_key (`llmt_` + 64 hex chars) in result JSON
- /health unauthenticated → 200 (probe path stays open)
- / without bearer → 401
- / with wrong bearer → 401
- / with correct bearer → 401 from OpenAI (LLMTrace authed,
forwarded; OpenAI rejected for no upstream auth — proxy doing
its job)
- deprovision cleaned up cleanly
BREAKING CHANGE (sort of): provision() now defaults to gating the proxy.
Existing tenants provisioned before this change continue to work as
before. New provisions need their callers (apps, workflow consumers) to
persist TenantInstances.api_key and ship it to the tenant. To preserve
the prior wide-open behaviour, set `enable_proxy_auth: false` in the
tenant config.
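On the tenant side, the change amounts to sending the key as a bearer token on every gated request. A minimal sketch using only the standard library is shown below; `build_chat_request` is a hypothetical helper, and the base URL and payload in the usage note are placeholders.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions carrying the tenant's bearer token."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example `build_chat_request("https://tenant.example.invalid", key, {"model": "...", "messages": []})`, with `key` read from wherever `TenantInstances.api_key` was persisted.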
Summary
Closes the wide-open-proxy-URL gap surfaced during today's review. LLMTrace's proxy already has bearer-token auth (`crates/llmtrace-proxy/src/auth.rs`); our tenant configs just didn't enable it. After this PR, every provisioned tenant gets a per-tenant `llmt_<64-hex>` API key auto-generated at create time, injected into the proxy env, and surfaced to the caller exactly once via `TenantInstances.api_key` / result JSON.
What changes
* `lifecycle.py`: `generate_api_key()`; `TenantSpec.enable_proxy_auth` (default true) + `TenantSpec.api_key` (explicit override); `TenantInstances.api_key`; `_resolve_api_key` (priority: explicit > env > auto); `_apply_proxy_auth` injects `LLMTRACE_AUTH_ENABLED=true` + `LLMTRACE_AUTH_ADMIN_KEY=<key>` into both proxy and dashboard envs
* `cli.py`: `enable_proxy_auth` / `api_key` top-level fields; result JSON includes `api_key`
* `.github/workflows/tenant-lifecycle.yml`: `::add-mask::` for the key before any `cat result.json` line, so it never appears in workflow logs; exposes as step output
* `configs/examples/{starter,pro}.yaml`
* `README.md`
Live-validated end-to-end (2026-05-18, tenant auth-85675)
* `provision`: returned a 69-character `api_key` (`llmt_` + 64 hex chars, the format at `auth.rs:44`)
* Requests exercised: `/health` with no auth, `/` with no bearer, `/` with wrong bearer, `/` with correct bearer
* `deprovision`
Breaking change call-out
Existing tenants provisioned before this PR keep working unchanged. New provisions now require the caller to:
* Persist `TenantInstances.api_key` in their DB
* Send `Authorization: Bearer llmt_<key>` on every non-`/health` request
To preserve the prior wide-open behaviour for trusted-network deploys, set `enable_proxy_auth: false` in the tenant config. The README documents this escape hatch.
Caveats (in the README, not blocking)
* [...] `/admin/keys` after provision and hand THAT to the tenant.
Test plan
* [...] `ruff check`; lifecycle library unit-tested via the inline checks documented in the commit)
* [...] `gh workflow run tenant-lifecycle.yml ...` and confirm the workflow's "Run lifecycle action" step prints `::add-mask::<value>` BEFORE any `cat result.json` line that contains the key; verify in the run UI that the key shows as `***` in `cat` output