feat(basilica): mint scoped operator key for tenant runtime traffic #239
Merged
Splits per-tenant proxy auth into two roles so a compromised tenant app no longer holds admin scope on its own LLMTrace pod:

- admin_key (bootstrap): retained by the caller. Used by the lifecycle layer to mint per-tenant keys and by the self-service / admin portal.
- api_key (operator): minted on the live proxy via POST /api/v1/auth/keys after readiness, returned in TenantInstances.api_key. This is the bearer the tenant's runtime apps use.

provision() now bootstraps the per-pod tenant row via POST /api/v1/tenants, mints the operator key, and injects LLMTRACE_AUTH_RUNTIME_KEY into the dashboard env (informational; dashboard wiring is a follow-up).

update(strategy="restart") rediscovers the tenant by label, lists keys, and re-mints only when the operator record is missing. update(strategy="recreate") always re-mints since the DB volume is destroyed.

cli.py emits admin_key alongside api_key. The tenant-lifecycle workflow masks BOTH keys via ::add-mask:: before any cat result.json and exposes both as step outputs.

Adds deployments/basilica/tests/ with 19 unit tests that exercise the real urllib admin-API client against an in-process http.server, plus a provision() integration test using a fake Basilica client.
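As a rough illustration of the key-minting call described above, the request could be assembled with urllib along the following lines. This is a minimal sketch, not the code in lifecycle.py: the function name build_mint_operator_key_request, the example URL, and the placeholder values are assumptions made for illustration. Only the endpoint, the Bearer header, and the payload fields (name, role, tenant_id) come from the PR description. The sketch builds the request without sending it, because the shape of the proxy's response is not shown here.

```python
import json
import urllib.request


def build_mint_operator_key_request(
    base_url: str, admin_key: str, tenant_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/auth/keys request.

    Hypothetical helper: the real lifecycle code may structure this differently.
    """
    payload = {
        "name": "tenant-runtime",  # key name given in the PR description
        "role": "operator",        # scoped role, not admin
        "tenant_id": tenant_id,    # UUID of the per-pod tenant row
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/auth/keys",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # The admin (bootstrap) key authorises the minting call.
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; nothing is sent over the network here.
req = build_mint_operator_key_request(
    "http://proxy.invalid:8080/", "admin-secret", "00000000-0000-0000-0000-000000000000"
)
```

Keeping request construction separate from sending makes the headers and body easy to assert on in a unit test without a live proxy.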
Summary
Splits per-tenant proxy auth into two scoped keys so a compromised tenant app no longer holds admin scope on its own LLMTrace pod (PR 2 of 4 in the Basilica security follow-up series).
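- admin_key (bootstrap): retained by the caller. Used by the lifecycle layer to mint per-tenant keys and by the self-service / admin portal.
- api_key (operator): minted on the live proxy via POST /api/v1/auth/keys after readiness. This is the bearer the tenant's runtime apps use. It cannot mint keys, manage tenants, read audit logs, or change feature flags.

Per-file changes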
deployments/basilica/lifecycle.py
- TenantInstances.admin_key: Optional[str] field.
- _admin_http_request: single urllib-based HTTP boundary against the proxy admin API. No new pip deps.
- _bootstrap_tenant_in_proxy: POST /api/v1/tenants to materialise the per-pod tenant row.
- _mint_operator_key: POST /api/v1/auth/keys with {name: "tenant-runtime", role: "operator", tenant_id: <uuid>}.
- _find_tenant_by_label: GET /api/v1/tenants matched by name (used on restart to rediscover the tenant UUID).
- _find_operator_key_record: GET /api/v1/auth/keys filtered to non-revoked tenant-runtime / operator.
- _verify_or_remint_operator_key: restart-strategy entry point.
- _inject_runtime_key_into_dashboard: adds LLMTRACE_AUTH_RUNTIME_KEY to the dashboard env.
- provision() rewritten: bootstrap admin key → deploy proxy → wait ready → create tenant row → mint operator key → deploy dashboard with both keys.
- update(strategy="recreate") delegates to provision() so the operator key is always re-minted (DB is gone).
- update(strategy="restart") rediscovers the tenant and verifies the operator key; re-mints only if missing. When the key persists, api_key is None (the caller carries the existing key forward, since the proxy stores only a hash).
- _apply_proxy_auth: renamed param from api_key to admin_key for clarity; behaviour unchanged (admin key injected into both envs as LLMTRACE_AUTH_ADMIN_KEY).

deployments/basilica/cli.py
- _serialise() emits admin_key alongside api_key.

.github/workflows/tenant-lifecycle.yml
- ::add-mask:: now masks BOTH api_key and admin_key before any cat result.json operation.

deployments/basilica/configs/examples/{starter,pro}.yaml

deployments/basilica/README.md

deployments/basilica/tests/ (new)
- conftest.py: installs a basilica SDK stub so unit tests run without the upstream SDK.
- test_operator_key_minting.py: 19 tests covering the admin HTTP boundary, error paths, restart-flow helpers, dashboard env injection, the TenantInstances shape, the CLI's _serialise(), and a provision() integration test using a real http.server and a fake Basilica client.

Validation evidence
Both provision() and update() integration paths run against a real in-process HTTP server with hand-crafted proxy responses, so the test suite exercises the urllib client's headers (Authorization: Bearer, X-LLMTrace-Tenant-ID), JSON encoding, error decoding, list-vs-dict response unwrapping, and the lifecycle layer's overall sequencing.

The workflow YAML was syntax-checked with yaml.safe_load.

What is NOT live-validated
Live Basilica end-to-end validation is pending and will be run by the maintainer after merge. This worktree has no Basilica credentials and no live proxy URL to provision against. The HTTP boundary contract is exercised against a fake proxy that mirrors the real proxy's documented response shapes (auth.rs::CreateApiKeyResponse, tenant_api.rs::CreateTenantResponse), but the actual handshake against a freshly deployed proxy pod has not been observed.

Suggested post-merge validation:
1. Run tenant-lifecycle.yml with action=provision for a test tenant.
2. Confirm the result has api_key (operator-scoped) and admin_key (admin-scoped) populated and both ::add-mask::'d in the run log.
3. Call the proxy with api_key and expect 200 on /v1/chat/completions and 403 on POST /api/v1/auth/keys (no admin scope).
4. Call the proxy with admin_key and expect 200 on both.
5. Run action=update strategy=restart and confirm api_key is null in the result (existing key carried forward).
6. Run action=update strategy=recreate and confirm a fresh api_key is returned.

Trade-offs
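One trade-off in this section concerns how the restart path rediscovers the tenant by label. A minimal sketch of that lookup, under stated assumptions, is shown here. The function names, the "id" field, and the "data" wrapper key are hypothetical and chosen only for illustration; the PR description states only that tenants are listed via GET /api/v1/tenants, matched by name, and that responses may arrive as either a list or a dict.

```python
from typing import Any, Optional


def unwrap_list(body: Any, wrapper_key: str = "data") -> list:
    """Accept either a bare JSON list or a dict that wraps the list.

    The wrapper key name is an assumption, not the proxy's documented shape.
    """
    if isinstance(body, list):
        return body
    if isinstance(body, dict) and isinstance(body.get(wrapper_key), list):
        return body[wrapper_key]
    raise ValueError("unexpected tenant list shape")


def find_tenant_by_label(body: Any, label: str) -> Optional[dict]:
    """Linear scan over the listed tenants, matching on the 'name' field."""
    matches = [t for t in unwrap_list(body) if t.get("name") == label]
    if len(matches) > 1:
        # The label is assumed unique within the pod; fail loudly if it is not.
        raise ValueError(f"label {label!r} matched {len(matches)} tenants")
    return matches[0] if matches else None


# Decoded response bodies as they might look after json.loads (illustrative).
as_list = [{"id": "u-1", "name": "acme"}, {"id": "u-2", "name": "globex"}]
as_dict = {"data": as_list}
found = find_tenant_by_label(as_dict, "acme")
```

Raising on duplicate labels makes the uniqueness assumption explicit, so a multi-tenant pod would surface as an error rather than a silently wrong match.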
- Restart rediscovers the tenant via GET /api/v1/tenants and matches by name == spec.tenant_id. This is O(n) in proxy-side tenants per restart and assumes the label is unique within the pod. Since the proxy is single-tenant-per-pod in this deployment model, that holds. If the pod ever holds multiple tenants we'd need to either propagate the UUID through the caller's DB or store it client-side.
- The admin client uses urllib, so there are no new pip deps (basilica-sdk + PyYAML). The trade-off is more boilerplate in _admin_http_request, but the API surface is small (5 calls).
- LLMTRACE_AUTH_RUNTIME_KEY is informational. The Next.js dashboard only reads LLMTRACE_AUTH_ADMIN_KEY today (dashboard/src/lib/api.ts, dashboard/src/lib/proxy-helpers.ts). The runtime key is set so the dashboard wiring follow-up becomes a pure dashboard change.

Follow-ups (out of scope, tracked separately)
- Dashboard wiring: use LLMTRACE_AUTH_RUNTIME_KEY for tenant-facing traffic and reserve LLMTRACE_AUTH_ADMIN_KEY for the admin pages.

Test plan
- python3 -c "from deployments.basilica import lifecycle, cli; print('ok')"
- python3 -m pytest deployments/basilica/tests/ -v (19 pass)
- Workflow YAML syntax-checked (yaml.safe_load)
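For readers unfamiliar with the testing approach named in this PR (a real urllib client against an in-process http.server), a minimal self-contained sketch follows. It is not taken from test_operator_key_minting.py. FakeProxyHandler, post_json, run_demo, and the {"key": ...} response body are hypothetical names and shapes used only to show the pattern. The proxy's real response shape is not reproduced here.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeProxyHandler(BaseHTTPRequestHandler):
    """Hand-crafted stand-in for the proxy admin API (illustrative only)."""

    seen = []  # (path, Authorization header, decoded JSON body) per request

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        FakeProxyHandler.seen.append(
            (self.path, self.headers.get("Authorization"), body)
        )
        reply = json.dumps({"key": "fake-operator-key"}).encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep test output quiet


def post_json(url, token, payload):
    """Send a JSON POST with a Bearer token and return (status, decoded body)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    # An empty ProxyHandler bypasses any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return resp.status, json.loads(resp.read())


def run_demo():
    # Port 0 asks the OS for a free port, so parallel test runs do not collide.
    server = HTTPServer(("127.0.0.1", 0), FakeProxyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/api/v1/auth/keys"
        payload = {"name": "tenant-runtime", "role": "operator", "tenant_id": "t-1"}
        return post_json(url, "admin-secret", payload)
    finally:
        server.shutdown()
        server.server_close()


status, body = run_demo()
```

Because the server records what it received, a test can assert on the exact path, headers, and JSON body the client sent, which is the kind of boundary check the validation notes describe.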