Commit ff0e97f

feat(basilica): opt-in admin key rotation after bootstrap (#238)
Add `rotate_admin_key()` primitive + `rotate_admin_after_bootstrap` TenantSpec field (default off) so callers can invalidate the bootstrap admin key as soon as `provision()` returns. Wire it through the CLI as the `rotate-admin-key` subcommand and the tenant-lifecycle workflow as a new `action` choice. The workflow `::add-mask::`s `admin_key` before any `cat result.json` line, same pattern as the existing `api_key` mask.

Mechanism: the Basilica SDK exposes no env-patch primitive (create/delete/restart only). Rotation therefore deletes the existing proxy UUID and creates a fresh one with the rotated env. As a consequence, `proxy_instance_id` and `proxy.url` change; the result JSON returns the post-rotation values which the caller must persist.

Trade-off: opt-in because the rotation adds one proxy re-roll (~30s) to provisioning time. Recommended for production tenants where the bootstrap admin key must be invalidated; safe to leave off for dev / sandbox tenants.

Tests: 9 unit tests under deployments/basilica/tests/ exercising the rotation logic with the Basilica SDK HTTP boundary mocked (create_deployment / delete_deployment / get_deployment). Rotation logic itself runs unmodified.
1 parent 2bcaa1e commit ff0e97f
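
The delete-then-create mechanism described in the commit message can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `FakeDeployments` is an in-memory stand-in, not the Basilica SDK, and `rotate_sketch` is not the commit's `rotate_admin_key()` (whose body is not shown on this page). Only the method names `create_deployment` / `get_deployment` / `delete_deployment`, the `LLMTRACE_AUTH_ADMIN_KEY` variable, and the `llmt_<64-hex>` key shape come from the commit.

```python
import secrets
import uuid


class FakeDeployments:
    """In-memory stand-in for a create/get/delete deployment API (hypothetical)."""

    def __init__(self) -> None:
        self.store: dict[str, dict[str, str]] = {}

    def create_deployment(self, env: dict[str, str]) -> str:
        instance_id = str(uuid.uuid4())
        self.store[instance_id] = dict(env)
        return instance_id

    def get_deployment(self, instance_id: str) -> dict[str, str]:
        return self.store[instance_id]

    def delete_deployment(self, instance_id: str) -> None:
        del self.store[instance_id]


def generate_admin_key() -> str:
    # 32 random bytes render as 64 hex characters: llmt_<64-hex>.
    return "llmt_" + secrets.token_hex(32)


def rotate_sketch(client, old_instance_id, base_env, new_key=None):
    """Delete the old proxy, then create a new one carrying the rotated key."""
    key = new_key or generate_admin_key()
    env = {**base_env, "LLMTRACE_AUTH_ADMIN_KEY": key}
    client.get_deployment(old_instance_id)  # raises KeyError if the ID is unknown
    client.delete_deployment(old_instance_id)
    new_instance_id = client.create_deployment(env)
    return new_instance_id, key
```

Because there is no in-place env patch, the instance ID returned by this sketch always differs from the one passed in, which is why the caller has to store the post-rotation ID and URL.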

8 files changed

Lines changed: 518 additions & 2 deletions

.github/workflows/tenant-lifecycle.yml

Lines changed: 37 additions & 0 deletions
@@ -16,6 +16,7 @@ on:
           - update
           - status
           - deprovision
+          - rotate-admin-key
       config_path:
         description: "Path in repo to a YAML/JSON tenant config (provision/update only)"
         required: false
@@ -44,6 +45,11 @@ on:
         options:
           - recreate
           - restart
+      new_admin_key:
+        description: "Optional explicit new admin key for rotate-admin-key (else auto-generated)"
+        required: false
+        type: string
+        default: ""
       tenant_secrets_json:
         description: >-
           JSON object of per-tenant secrets to inject as env vars before the
@@ -106,6 +112,7 @@ jobs:
           WF_PROXY_ID: ${{ inputs.proxy_instance_id }}
           WF_DASHBOARD_ID: ${{ inputs.dashboard_instance_id }}
           WF_STRATEGY: ${{ inputs.strategy }}
+          WF_NEW_ADMIN_KEY: ${{ inputs.new_admin_key }}
           WF_SECRETS: ${{ inputs.tenant_secrets_json }}
           RD_PAYLOAD: ${{ toJson(github.event.client_payload) }}
         run: |
@@ -121,6 +128,7 @@ jobs:
               proxy_id = os.environ.get("WF_PROXY_ID", "")
               dash_id = os.environ.get("WF_DASHBOARD_ID", "")
               strategy = os.environ.get("WF_STRATEGY", "")
+              new_admin_key = os.environ.get("WF_NEW_ADMIN_KEY", "")
               secrets_json = os.environ.get("WF_SECRETS", "{}") or "{}"
               try:
                   secrets = json.loads(secrets_json)
@@ -137,13 +145,20 @@ jobs:
               proxy_id = payload.get("proxy_instance_id", "")
               dash_id = payload.get("dashboard_instance_id", "")
               strategy = payload.get("strategy", "recreate")
+              new_admin_key = payload.get("new_admin_key", "") or ""
               secrets = payload.get("tenant_secrets", {}) or {}
           else:
               raise SystemExit(f"::error::unsupported event {event!r}")

           if not isinstance(secrets, dict):
               raise SystemExit("::error::tenant_secrets must be a JSON object")

+          # Mask the optional explicit new admin key from the workflow input
+          # surface — it ends up as a CLI arg downstream and must not show up
+          # in subsequent log lines.
+          if new_admin_key:
+              print(f"::add-mask::{new_admin_key}")
+
           env_path = pathlib.Path(os.environ["GITHUB_ENV"])
           with env_path.open("a") as fh:
               fh.write(f"TENANT_ID={tenant_id}\n")
@@ -152,6 +167,7 @@ jobs:
               fh.write(f"PROXY_ID={proxy_id}\n")
               fh.write(f"DASHBOARD_ID={dash_id}\n")
               fh.write(f"STRATEGY={strategy}\n")
+              fh.write(f"NEW_ADMIN_KEY={new_admin_key}\n")
               # config_json may contain newlines; use heredoc form
               if cfg_json:
                   fh.write(f"CFG_JSON<<__LIFECYCLE_EOF__\n{cfg_json}\n__LIFECYCLE_EOF__\n")
@@ -219,6 +235,16 @@ jobs:
             status|deprovision)
               :
               ;;
+            rotate-admin-key)
+              if [[ -z "${CFG_PATH:-}" && -z "${CFG_JSON:-}" ]]; then
+                echo "::error::action=rotate-admin-key requires config_path or config_json"
+                exit 2
+              fi
+              if [[ -z "${PROXY_ID:-}" ]]; then
+                echo "::error::action=rotate-admin-key requires proxy_instance_id"
+                exit 2
+              fi
+              ;;
             *)
               echo "::error::unknown or missing action: ${ACTION:-<empty>}"
               exit 2
@@ -252,6 +278,17 @@ jobs:
               [[ -n "${PROXY_ID:-}" ]] && args+=("--proxy-instance-id" "${PROXY_ID}")
               [[ -n "${DASHBOARD_ID:-}" ]] && args+=("--dashboard-instance-id" "${DASHBOARD_ID}")
               ;;
+            rotate-admin-key)
+              args+=("--proxy-instance-id" "${PROXY_ID}")
+              if [[ -n "${CFG_JSON:-}" ]]; then
+                args+=("--config-json" "${CFG_JSON}")
+              else
+                args+=("--config" "${CFG_PATH}")
+              fi
+              if [[ -n "${NEW_ADMIN_KEY:-}" ]]; then
+                args+=("--new-key" "${NEW_ADMIN_KEY}")
+              fi
+              ;;
           esac

           set +e
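
The mask-before-print ordering used in this workflow can be sketched in isolation. This is an illustrative helper under stated assumptions, not code from the commit: `emit_masks` and `log_result` are hypothetical names, and the field names `admin_key` / `api_key` are taken from the commit message. The `::add-mask::` workflow command only has an effect when a GitHub Actions runner reads the output; elsewhere it is just a printed line.

```python
import json
import sys


def emit_masks(result, secret_fields=("admin_key", "api_key"), out=None):
    """Print an ::add-mask:: workflow command for each secret field present."""
    if out is None:
        out = sys.stdout
    masked = []
    for name in secret_fields:
        value = result.get(name)
        if value:
            print(f"::add-mask::{value}", file=out)
            masked.append(name)
    return masked


def log_result(result, out=None):
    """Register masks first, then print the result document."""
    if out is None:
        out = sys.stdout
    emit_masks(result, out=out)
    print(json.dumps(result, sort_keys=True), file=out)
```

The point of the ordering is that the runner must already know the secret value by the time any line containing it is written to the log.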

deployments/basilica/README.md

Lines changed: 78 additions & 0 deletions
@@ -438,6 +438,84 @@ without any control-plane scope. See
 `llmtrace_ml_rejected_total` and gauge `llmtrace_ml_inflight_requests`
 are exposed on `/metrics` for alerting on sustained saturation.

+### Admin key rotation
+
+After `provision`, the bootstrap admin key lives in the proxy pod env
+indefinitely. The key was returned once on the workflow run output and
+flows through the calling app's secret-handling path before reaching the
+tenant. If you treat the workflow output / app log surface as a higher
+exposure boundary than the live Basilica pod env, you can **rotate the
+admin key after bootstrap** so the value returned at provision time is
+no longer the live key.
+
+**When to enable.** Production tenants where you want the bootstrap admin
+key invalidated as soon as the tenant has it (or you don't trust the
+single-channel return path). Skip for dev / sandbox tenants — the
+rotation is opt-in because it costs an extra proxy re-roll.
+
+**Trade-off.** Rotation deletes the bootstrap proxy and creates a fresh
+one with the rotated `LLMTRACE_AUTH_ADMIN_KEY`. That adds ~30s to
+provisioning time, and **the proxy's `instance_id` and `url` change** as
+a result (Basilica's SDK has no env-patch primitive — see *Lifecycle
+operations in detail* below). The caller MUST persist the post-rotation
+`proxy_instance_id` and `proxy_url` over the bootstrap values; the
+result JSON returns the post-rotation UUID/URL.
+
+**Enable via YAML config:**
+
+```yaml
+rotate_admin_after_bootstrap: true
+```
+
+With the flag set, `provision` runs as usual, then internally calls
+`rotate_admin_key(proxy_instance_id=...)` against the just-created proxy,
+regenerates a fresh `llmt_<64-hex>` key, and returns that as the
+`api_key` field in the result JSON. The bootstrap key never leaves the
+library boundary.
+
+**Invoke the rotation independently** (e.g. for an already-running
+tenant whose admin key you want to roll on a schedule):
+
+```bash
+python -m deployments.basilica.cli rotate-admin-key \
+  --tenant-id acme \
+  --config deployments/basilica/configs/examples/starter.yaml \
+  --proxy-instance-id <current_proxy_uuid>
+  # optional: --new-key llmt_... to force a specific value
+```
+
+Result JSON shape:
+
+```json
+{
+  "tenant_id": "acme",
+  "proxy_instance_id": "<new_uuid>",
+  "proxy_url": "https://<new_uuid>.deployments.basilica.ai",
+  "admin_key": "llmt_..."
+}
+```
+
+**Dispatch via the workflow:**
+
+```bash
+gh api -X POST /repos/techlab-innov/llmtrace/dispatches \
+  --raw-field event_type=tenant-lifecycle \
+  --raw-field 'client_payload={
+    "tenant_id":"acme","action":"rotate-admin-key",
+    "config_path":"deployments/basilica/configs/examples/starter.yaml",
+    "proxy_instance_id":"<current_proxy_uuid>"
+  }'
+```
+
+The workflow `::add-mask::`s the returned `admin_key` before any
+`cat result.json` line, same as it does for `api_key` on provision. The
+key is then available as a step output (`steps.run.outputs.admin_key`)
+for the dispatching app to consume via the Actions API.
+
+**Idempotency.** Re-running rotation on an already-rotated tenant
+generates a fresh key (it's not a no-op). The caller must serialise
+rotation per tenant to avoid concurrent delete+create races.
+
 ## Per-tenant secret injection

 Two trigger paths, pick by intended audience:
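
On the **Idempotency** paragraph in the README hunk above: one way a caller might serialise rotations per tenant, and persist the post-rotation ID and URL in the same critical section, is a per-tenant lock. The sketch below is an assumption-laden illustration, not part of the commit: `tenant_lock`, `rotate_serialised`, the `registry` dict, and the `rotate_fn` callable are all hypothetical, and a `threading.Lock` only protects callers inside one process (separate workers or hosts would need a shared lock such as a database row lock).

```python
import threading

_registry_guard = threading.Lock()
_tenant_locks: dict[str, threading.Lock] = {}


def tenant_lock(tenant_id: str) -> threading.Lock:
    """Return the single lock shared by all rotations of this tenant."""
    with _registry_guard:
        return _tenant_locks.setdefault(tenant_id, threading.Lock())


def rotate_serialised(tenant_id, rotate_fn, registry):
    """Run one rotation at a time per tenant and persist the new ID/URL.

    rotate_fn(current_proxy_id) is assumed to return a dict shaped like the
    result JSON: proxy_instance_id, proxy_url, admin_key.
    """
    with tenant_lock(tenant_id):
        # Read the current ID inside the lock so a second caller sees the
        # value written by the first, not the already-deleted proxy.
        current_id = registry[tenant_id]["proxy_instance_id"]
        result = rotate_fn(current_id)
        registry[tenant_id] = {
            "proxy_instance_id": result["proxy_instance_id"],
            "proxy_url": result["proxy_url"],
        }
        return result
```

Reading the stored ID inside the lock matters: a second rotation that started from the pre-rotation ID would try to delete a proxy that no longer exists.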

deployments/basilica/cli.py

Lines changed: 50 additions & 2 deletions
@@ -142,6 +142,7 @@ def _tenant_spec_from_config(tenant_id: str, cfg: dict[str, Any]) -> lifecycle.T
         "proxy_url_env_var",
         "enable_proxy_auth",
         "api_key",
+        "rotate_admin_after_bootstrap",
     ):
         if key in cfg:
             kwargs[key] = cfg[key]
@@ -172,6 +173,16 @@ def view(info: lifecycle.InstanceInfo | None) -> dict[str, Any] | None:
     }


+def _serialise_rotation(result: lifecycle.RotationResult) -> dict[str, Any]:
+    return {
+        "tenant_id": result.tenant_id,
+        "proxy": dataclasses.asdict(result.proxy),
+        "proxy_instance_id": result.proxy.instance_id,
+        "proxy_url": result.proxy.url,
+        "admin_key": result.admin_key,
+    }
+
+
 def _build_parser() -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(
         prog="deployments.basilica.cli",
@@ -229,10 +240,33 @@ def add_ids(p: argparse.ArgumentParser, required: bool) -> None:
     add_tenant(p_dep)
     add_ids(p_dep, required=False)

+    p_rot = sub.add_parser(
+        "rotate-admin-key",
+        help=(
+            "Rotate the proxy admin key on a live deployment. "
+            "Delete+recreate under the hood — proxy_instance_id and URL change."
+        ),
+    )
+    add_tenant(p_rot)
+    add_config(p_rot)
+    p_rot.add_argument(
+        "--proxy-instance-id",
+        required=True,
+        help="Basilica UUID for the existing proxy deployment",
+    )
+    p_rot.add_argument(
+        "--new-key",
+        required=False,
+        default=None,
+        help="Optional explicit new admin key (else a fresh llmt_<64-hex> is generated)",
+    )
+
     return parser


-def _dispatch(args: argparse.Namespace) -> lifecycle.TenantInstances:
+def _dispatch(
+    args: argparse.Namespace,
+) -> lifecycle.TenantInstances | lifecycle.RotationResult:
     if args.action == "provision":
         cfg = _load_config(args.config, args.config_json)
         spec = _tenant_spec_from_config(args.tenant_id, cfg)
@@ -258,6 +292,16 @@ def _dispatch(args: argparse.Namespace) -> lifecycle.TenantInstances:
             proxy_instance_id=args.proxy_instance_id,
             dashboard_instance_id=args.dashboard_instance_id,
         )
+    if args.action == "rotate-admin-key":
+        cfg = _load_config(args.config, args.config_json)
+        spec = _tenant_spec_from_config(args.tenant_id, cfg)
+        return lifecycle.rotate_admin_key(
+            tenant_id=args.tenant_id,
+            proxy_instance_id=args.proxy_instance_id,
+            proxy_spec=spec.proxy,
+            new_key=args.new_key,
+            proxy_name_template=spec.proxy_name_template,
+        )
     raise SystemExit(f"unknown action {args.action!r}")

@@ -276,7 +320,11 @@ def main(argv: list[str] | None = None) -> int:
         json.dump({"error": str(exc), "action": args.action}, sys.stdout)
         sys.stdout.write("\n")
         return 3
-    json.dump(_serialise(result), sys.stdout, indent=2, sort_keys=True)
+    if isinstance(result, lifecycle.RotationResult):
+        payload = _serialise_rotation(result)
+    else:
+        payload = _serialise(result)
+    json.dump(payload, sys.stdout, indent=2, sort_keys=True)
     sys.stdout.write("\n")
     return 0
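
For a caller consuming this CLI's stdout, the two output shapes visible in the diff (the rotation payload from `_serialise_rotation`, and the `{"error": ..., "action": ...}` document written before `return 3`) could be handled along these lines. This is a hedged sketch: `Rotation`, `RotationFailed`, and `parse_rotation_output` are hypothetical names introduced for illustration and do not exist in the repository.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("tenant_id", "proxy_instance_id", "proxy_url", "admin_key")


class RotationFailed(RuntimeError):
    """Raised when the CLI reported an error or the payload is incomplete."""


@dataclass(frozen=True)
class Rotation:
    tenant_id: str
    proxy_instance_id: str
    proxy_url: str
    admin_key: str


def parse_rotation_output(stdout_text: str) -> Rotation:
    """Turn the CLI's JSON stdout into a typed record, or raise."""
    doc = json.loads(stdout_text)
    if "error" in doc:
        raise RotationFailed(str(doc["error"]))
    missing = [name for name in REQUIRED_FIELDS if not doc.get(name)]
    if missing:
        raise RotationFailed(f"missing fields: {', '.join(missing)}")
    # Extra keys such as the nested "proxy" object are ignored here.
    return Rotation(**{name: doc[name] for name in REQUIRED_FIELDS})
```

Failing loudly on a missing `proxy_instance_id` or `proxy_url` keeps a caller from silently retaining the pre-rotation values, which point at a deleted proxy.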

deployments/basilica/configs/examples/pro.yaml

Lines changed: 4 additions & 0 deletions
@@ -52,3 +52,7 @@ dashboard:
 # rate_limit:
 # requests_per_second: 500
 # burst_size: 1000
+
+# Rotate the admin key once the proxy is up. Adds ~30s and changes the
+# proxy instance_id + url; recommended for production. Opt-in.
+# rotate_admin_after_bootstrap: true

deployments/basilica/configs/examples/starter.yaml

Lines changed: 7 additions & 0 deletions
@@ -84,3 +84,10 @@ dashboard:
 # rate_limit:
 # requests_per_second: 50
 # burst_size: 100
+
+# rotate_admin_after_bootstrap: false
+# ↑ Set to `true` for production tenants where the bootstrap admin key
+# must be invalidated immediately after provision. Adds ~30s to
+# provisioning (extra proxy re-roll) and changes the proxy
+# instance_id + url — caller must persist the post-rotation values.
+# See README.md → "Admin key rotation".
