Runtime LLM Control
A persistent operator override that lets you flip the LLM backend, switch providers, or detach the LLM entirely — without restarting the agent. The override survives reboots and is honoured by every layer that asks for an LLM provider, including the brain pipeline, the build codegen, the review service, and the release-readiness probe.
Code: agent/control/llm_runtime.py. Tests: tests/test_llm_runtime.py.
Why this exists. The agent runs as a long-lived daemon. Restarting it to change the model means losing in-memory caches, breaking long-poll Telegram connections, and waiting for the watchdog to register a clean state. Operators need to switch from CLI to API in seconds, not minutes — for example, when the host CLI is in maintenance, when the API has a credit issue, or when a programming task needs a stronger model than the local CLI offers.
```
.env → LLM_BACKEND, LLM_PROVIDER (defaults)
        │
        ▼
resolve_llm_runtime_state()
        │
        ├─ load llm_runtime.json (operator override)
        │
        └─ produce effective state:
             enabled             (bool)
             effective_backend   ("cli" | "api")
             effective_provider  ("anthropic" | "openai" | "local")
             follows_env         (bool)
             override_active     (bool)
             updated_at, updated_by, note
        │
        ▼
get_provider() in agent/core/llm_provider.py
        │
        ├─ if not enabled → DetachedLLMProvider (every call returns "detached by operator")
        ├─ if backend=cli → ClaudeCliProvider
        └─ if backend=api → AnthropicProvider | OpenAiProvider
        │
        ▼
Brain, build codegen, review service, release-readiness probe
```
The persisted state lives in `<AGENT_DATA_DIR>/control/llm_runtime.json`. It's a small JSON file with six fields:

```json
{
  "enabled": true,
  "backend_override": "api",
  "provider_override": "openai",
  "updated_at": "2026-04-08T22:13:47.123456+00:00",
  "updated_by": "operator",
  "note": "switched to o3 for build pipeline"
}
```

If `backend_override` and `provider_override` are both empty, the resolver "follows env": the values from `.env` apply directly. If `enabled` is `false`, every LLM call gets a `DetachedLLMProvider` that returns a deterministic "detached by operator" error without touching the network.
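A minimal sketch of reading and writing that file, assuming nothing about the real `agent/control/llm_runtime.py` beyond the six field names above. `LlmRuntimeState`, `load_state`, and `save_state` are hypothetical names; treating a missing file as "enabled, follow env" and writing through a temp file plus `os.replace` are choices made for this sketch, not documented behaviour.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class LlmRuntimeState:
    """Hypothetical in-memory mirror of llm_runtime.json (field names match the file)."""
    enabled: bool = True
    backend_override: str = ""   # "" = no override, follow .env
    provider_override: str = ""  # "" = no override, follow .env
    updated_at: str = ""
    updated_by: str = ""
    note: str = ""


def state_path(data_dir: str) -> Path:
    return Path(data_dir) / "control" / "llm_runtime.json"


def load_state(data_dir: str) -> LlmRuntimeState:
    """Read the override; a missing file means 'enabled, follow env' (assumed default)."""
    path = state_path(data_dir)
    if not path.exists():
        return LlmRuntimeState()
    raw = json.loads(path.read_text(encoding="utf-8"))
    known = {f.name for f in fields(LlmRuntimeState)}
    # Ignore unknown keys so an older reader tolerates a newer file.
    return LlmRuntimeState(**{k: v for k, v in raw.items() if k in known})


def save_state(data_dir: str, state: LlmRuntimeState) -> None:
    """Stamp updated_at (UTC, ISO 8601) and swap the file in with a single rename."""
    path = state_path(data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    state.updated_at = datetime.now(timezone.utc).isoformat()
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(asdict(state), fh, indent=2)
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing to a sibling temp file first means a crash mid-write cannot leave a truncated `llm_runtime.json` that the resolver would then fail to parse.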
You can flip the override from any of:

| Surface | Use case |
|---|---|
| CLI (`python -m agent --llm-runtime-*`) | Scripts, systemd unit files, ad-hoc shell |
| HTTP (`POST /api/operator/llm`) | Dashboard, automation, agent-to-agent |
| Dashboard (`/dashboard` LLM Runtime panel) | Click-driven, with audit log |
All three call into the same LlmRuntimeControlService.update_state(...) method, which persists the change, clears the LLM provider cache, and records a TraceRecord(kind=CONFIGURATION) in the control plane.
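The three-step contract of `update_state` (persist, clear the provider cache, record a trace) can be sketched with injected collaborators. This is not the real `LlmRuntimeControlService`: the class name, the keyword arguments, and the validation sets (taken from the values listed in the state diagram) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

VALID_BACKENDS = {"", "cli", "api"}                     # "" = follow .env
VALID_PROVIDERS = {"", "anthropic", "openai", "local"}  # "" = follow .env


@dataclass
class TraceRecord:
    kind: str
    title: str
    summary: dict


@dataclass
class SketchLlmRuntimeControlService:
    """Hypothetical stand-in for LlmRuntimeControlService; collaborators are injected."""
    persist: Callable[[dict], None]           # e.g. writes llm_runtime.json
    clear_provider_cache: Callable[[], None]  # drops cached provider instances
    traces: list = field(default_factory=list)

    def update_state(self, *, enabled: bool, backend: str = "", provider: str = "",
                     note: str = "", updated_by: str = "operator") -> dict:
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        if provider not in VALID_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        new_state = {
            "enabled": enabled,
            "backend_override": backend,
            "provider_override": provider,
            "updated_by": updated_by,
            "note": note,
        }
        # Persist first, so a provider rebuilt after the cache is cleared
        # resolves against the new state rather than the old one.
        self.persist(new_state)
        self.clear_provider_cache()
        self.traces.append(TraceRecord(kind="CONFIGURATION",
                                       title="LLM runtime control updated",
                                       summary=new_state))
        return new_state
```

Funnelling every surface through one method is what keeps the CLI, HTTP, and dashboard paths from drifting apart: validation, persistence, cache invalidation, and auditing happen in exactly one place.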
```bash
# Inspect current state
.venv/bin/python -m agent --llm-runtime-status

# Detach the LLM entirely (every LLM call fails closed)
.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "maintenance"

# Switch back to CLI backend
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli \
  --llm-runtime-note "back to Claude CLI"

# Switch to Anthropic API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
  --llm-runtime-provider anthropic

# Switch to OpenAI (or any OpenAI-compatible endpoint like Ollama / vLLM / LM Studio)
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
  --llm-runtime-provider openai

# Drop both overrides and follow whatever .env says
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable
```

```bash
# GET: current state
curl -H "Authorization: Bearer $AGENT_API_KEY" \
  http://localhost:8420/api/operator/llm

# POST: update
curl -X POST \
  -H "Authorization: Bearer $AGENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "enabled": true,
        "backend": "api",
        "provider": "openai",
        "note": "switching to o3 for the next build pipeline"
      }' \
  http://localhost:8420/api/operator/llm
```

The response is the new effective state (the same shape `--llm-runtime-status` prints).
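For automation written in Python rather than shell, the same POST can be issued with the standard library. This is a sketch mirroring the curl example: the helper names are hypothetical, and it omits `backend`/`provider` when they are not given because the curl example does not show how the server treats absent keys.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8420"  # host/port from the curl examples above


def build_llm_update_request(api_key: str, *, enabled: bool, backend: str = "",
                             provider: str = "", note: str = "",
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /api/operator/llm with a Bearer token."""
    body = {"enabled": enabled, "note": note}
    if backend:
        body["backend"] = backend
    if provider:
        body["provider"] = provider
    return urllib.request.Request(
        f"{base_url}/api/operator/llm",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def update_llm_runtime(**kwargs) -> dict:
    """Send the update and return the new effective state from the response body."""
    req = build_llm_update_request(os.environ["AGENT_API_KEY"], **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage: `update_llm_runtime(enabled=True, backend="api", provider="openai", note="o3 for the build")`. Splitting request construction from sending keeps the payload testable without a running agent.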
`/dashboard` has an "LLM Runtime" panel with three buttons:

- **Detach**: sets `enabled=false`. Every LLM call fails closed.
- **CLI**: sets `enabled=true, backend=cli`.
- **API** (anthropic | openai | local): sets `enabled=true, backend=api, provider=<x>`.

Each click POSTs to `/api/operator/llm` with the operator's Bearer token, refreshes, and shows the new state in the panel header.
Inputs:

- `.env` values: `LLM_BACKEND`, `LLM_PROVIDER`, `AGENT_DATA_DIR`
- persisted state: `enabled`, `backend_override`, `provider_override`

```python
def resolve_llm_runtime_state(environ, state):
    env_backend = environ.get("LLM_BACKEND", "cli")          # default = cli
    env_provider = environ.get("LLM_PROVIDER", "anthropic")  # default = anthropic

    effective_backend = state.backend_override or env_backend
    if effective_backend == "cli":
        effective_provider = "anthropic"                     # CLI = Claude
    else:
        effective_provider = state.provider_override or env_provider or "anthropic"

    follows_env = not state.backend_override and not state.provider_override

    return {
        "enabled": state.enabled,
        "env_backend": env_backend,
        "env_provider": env_provider,
        "backend_override": state.backend_override,
        "provider_override": state.provider_override,
        "effective_backend": effective_backend,
        "effective_provider": effective_provider,
        "follows_env": follows_env,
        "override_active": (not state.enabled) or (not follows_env),
        # ... (further fields elided in this excerpt)
    }
```
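To see the precedence rules in action, here is a self-contained copy of the resolver logic above, trimmed to the fields that matter. `State` and `resolve` are hypothetical stand-ins, not the real module's types.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical minimal stand-in for the persisted override."""
    enabled: bool = True
    backend_override: str = ""
    provider_override: str = ""


def resolve(environ: dict, state: State) -> dict:
    env_backend = environ.get("LLM_BACKEND", "cli")
    env_provider = environ.get("LLM_PROVIDER", "anthropic")
    effective_backend = state.backend_override or env_backend
    if effective_backend == "cli":
        effective_provider = "anthropic"  # the CLI backend is always Claude
    else:
        effective_provider = state.provider_override or env_provider or "anthropic"
    follows_env = not state.backend_override and not state.provider_override
    return {
        "enabled": state.enabled,
        "effective_backend": effective_backend,
        "effective_provider": effective_provider,
        "follows_env": follows_env,
        "override_active": (not state.enabled) or (not follows_env),
    }


env = {"LLM_BACKEND": "cli", "LLM_PROVIDER": "openai"}
print(resolve(env, State()))                        # cli/anthropic, follows env
print(resolve(env, State(backend_override="api")))  # api/openai (provider from .env)
print(resolve({}, State(provider_override="openai")))
```

Two consequences fall out of the rules. With no overrides, `LLM_PROVIDER=openai` is ignored while the backend is `cli`, because the CLI pins the provider to Anthropic. And a provider override on its own is inert under the CLI backend, yet it still sets `follows_env=False`, so `override_active` reports `True`.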
The brain (agent/core/brain.py::_process_inner) reads effective_backend from this resolver — not from os.environ["LLM_BACKEND"]. That was the second HIGH bug Codex caught: the legacy code went around the resolver and silently ignored every operator override.
Tested in tests/test_llm_runtime.py::TestBrainHonoursRuntimeBackendOverride.
`get_provider()` (`agent/core/llm_provider.py`) is a small factory:

```python
runtime = resolve_llm_runtime_state(environ=os.environ)
backend = backend_arg or runtime["effective_backend"]
provider_name = provider_arg or runtime["effective_provider"]

# Cache key includes ALL kwargs so different base_url / api_key
# configurations get separate provider instances.
kwargs_sig = ",".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
cache_key = f"{int(bool(runtime['enabled']))}:{backend}:{provider_name}:{kwargs_sig}"
if cache_key in _provider_cache:
    return _provider_cache[cache_key]

if not runtime["enabled"]:
    instance = DetachedLLMProvider(reason="LLM runtime is detached by operator. ...")
elif backend == "cli":
    instance = ClaudeCliProvider(**kwargs)
elif backend == "api" and provider_name == "anthropic":
    instance = AnthropicProvider(**kwargs)
elif backend == "api" and provider_name in ("openai", "local"):
    instance = OpenAiProvider(**kwargs)
else:
    raise ValueError(f"Unknown backend/provider combo: {backend}/{provider_name}")

_provider_cache[cache_key] = instance
return instance
```

`LlmRuntimeControlService.update_state` calls `clear_provider_cache()` after persisting, so the next `get_provider()` call after an override picks up the new wiring immediately.
`DetachedLLMProvider.generate(...)` returns:

```json
{
  "success": false,
  "text": "",
  "error": "LLM runtime is detached by operator. Re-enable it via CLI or /api/operator/llm before running LLM-backed tasks.",
  "input_tokens": 0,
  "output_tokens": 0,
  "cost_usd": 0.0,
  "latency_ms": 0
}
```

It is fail-closed, free, and instantaneous. Every layer that tries to call the LLM gets the error and surfaces it to the operator. The agent itself keeps running: internal dispatch (status, health, tasks, budget, identity) and conversational fallbacks still work, since they don't hit the LLM.
This is the kill switch you want when:
- You spotted a runaway model loop and want to stop spending tokens.
- You're rotating API keys and want a quiet 30-second window.
- You're debugging a non-LLM bug and want to remove the LLM as a variable.
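A sketch of the detached provider and of a caller that surfaces its error instead of crashing. The payload matches the JSON above; `DetachedProviderSketch` and `run_llm_task` are hypothetical names, and the real `generate(...)` signature is not shown in this page, so the sketch accepts any arguments.

```python
from dataclasses import dataclass

DETACHED_MESSAGE = (
    "LLM runtime is detached by operator. Re-enable it via CLI or "
    "/api/operator/llm before running LLM-backed tasks."
)


@dataclass
class DetachedProviderSketch:
    reason: str = DETACHED_MESSAGE

    def generate(self, *args, **kwargs) -> dict:
        # No network, no tokens: the same failure payload on every call.
        return {
            "success": False,
            "text": "",
            "error": self.reason,
            "input_tokens": 0,
            "output_tokens": 0,
            "cost_usd": 0.0,
            "latency_ms": 0,
        }


def run_llm_task(provider, prompt: str) -> str:
    """Hypothetical caller: turn a failed result into an operator-facing message."""
    result = provider.generate(prompt)
    if not result["success"]:
        return f"LLM unavailable: {result['error']}"
    return result["text"]
```

Returning a normal result with `success: false`, rather than raising, lets every caller reuse the error path it already has for ordinary API failures.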
Every override is recorded:

| Where | What |
|---|---|
| `llm_runtime.json` on disk | Latest persisted state with `updated_at`, `updated_by`, `note` |
| Long-tier log (`agent-long.log`) | Structlog event `llm_provider_created` (with backend + provider) on every cache miss |
| Control plane traces (`control.db`) | `TraceRecord(kind=CONFIGURATION, title="LLM runtime control updated")` with the full summary |
| Operator dashboard | Live state display with timestamp + note |
If you're investigating "why did the model change two days ago", grep TraceRecord ... CONFIGURATION in the trace store and you'll see the operator who made the change, when, and why.
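If the trace store is SQLite, the same question can be answered with a query. The schema here is entirely hypothetical: the table name `traces` and the columns `kind`, `title`, `created_at`, and `summary` (a JSON string) are assumptions, so adjust them to whatever `control.db` actually contains.

```python
import json
import sqlite3
from typing import List, Tuple


def llm_runtime_changes(conn: sqlite3.Connection, limit: int = 20) -> List[Tuple[str, dict]]:
    """Newest-first (created_at, summary) pairs for LLM runtime overrides.

    ASSUMED schema: traces(kind TEXT, title TEXT, created_at TEXT, summary TEXT).
    ISO 8601 timestamps with a common UTC offset sort correctly as text.
    """
    rows = conn.execute(
        "SELECT created_at, summary FROM traces "
        "WHERE kind = ? AND title = ? ORDER BY created_at DESC LIMIT ?",
        ("CONFIGURATION", "LLM runtime control updated", limit),
    ).fetchall()
    return [(created_at, json.loads(summary)) for created_at, summary in rows]
```

Opening the database read-only (`sqlite3.connect("file:control.db?mode=ro", uri=True)`, with the path adjusted to wherever `control.db` lives) avoids any risk of an investigation writing to the live store.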
```bash
.venv/bin/python -m agent --llm-runtime-enable \
  --llm-runtime-backend api \
  --llm-runtime-provider openai \
  --llm-runtime-note "trying o3 for tonight's build run"

# Run your build...
.venv/bin/python -m agent --build-repo . --build-description "..."

# Switch back when done
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable
```

```bash
.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "rotating anthropic key"
# rotate the key in 1Password / Bitwarden, paste into .env
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli   # or api
```

```bash
# Check what backend it thinks it's using
.venv/bin/python -m agent --llm-runtime-status

# If the answer is "cli" and you're on a server, that's the bug: flip to API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api --llm-runtime-provider anthropic
```

See also the Telegram + CLI deny guard, which catches this exact case at the brain layer and returns a deterministic operator message instead of hanging.
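The "cli on a server" check can be scripted against the effective state that `--llm-runtime-status` reports. This is a heuristic sketch: it assumes the Claude CLI binary is named `claude` and that "not on `PATH`" is a good proxy for "this host cannot use the CLI backend". `diagnose_backend` is a hypothetical helper.

```python
import shutil


def diagnose_backend(runtime: dict, cli_binary: str = "claude") -> str:
    """Heuristic health hint from an effective-state dict (assumed field names)."""
    if not runtime.get("enabled", True):
        return "detached: re-enable with --llm-runtime-enable"
    if runtime.get("effective_backend") == "cli" and shutil.which(cli_binary) is None:
        # ASSUMPTION: no CLI binary on PATH means this host cannot serve the cli backend.
        return f"backend is cli but {cli_binary!r} is not on PATH: flip to api"
    return "ok"
```

A `PATH` check cannot tell whether the CLI is logged in, so an `"ok"` here only rules out the most common misconfiguration.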
CI sets AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1 so the release-readiness gate skips the live probe. The runtime LLM control isn't involved on CI runners; tests use mocked providers.
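How a readiness gate might honour that variable, as a sketch only: `llm_probe_check` is a hypothetical name, the probe is injected as a callable returning the same result shape as `generate(...)`, and treating exactly `"1"` as "skip" is an assumption about the real gate.

```python
from typing import Callable, Mapping, Tuple

SKIP_VAR = "AGENT_RELEASE_READINESS_SKIP_LLM_PROBE"


def llm_probe_check(environ: Mapping[str, str],
                    probe: Callable[[], dict]) -> Tuple[bool, str]:
    """Return (ready, reason) for the LLM part of a readiness check."""
    if environ.get(SKIP_VAR) == "1":
        return True, "LLM probe skipped"  # the probe is never invoked
    result = probe()
    if not result.get("success"):
        # A detached or misconfigured runtime fails here instead of at first use.
        return False, f"LLM probe failed: {result.get('error') or 'unknown error'}"
    return True, "LLM probe ok"
```

Injecting the probe keeps the gate testable on CI without a live model, which is the same reason CI sets the skip variable.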
- It does not rotate the master key, the API key, the OAuth token, or any credential. It only flips backend/provider selection. Credentials still come from `.env` (or the vault for CLI OAuth).
- It does not survive `AGENT_DATA_DIR` deletion. The override lives at `<data_dir>/control/llm_runtime.json`: wipe the data dir, lose the override.
- It does not broadcast to other agents. Each agent has its own `llm_runtime.json`. There is no shared state.
- It does not retry. If you flip to a backend that's misconfigured, the next LLM call fails. Setup doctor will tell you what's wrong; the release-readiness gate will refuse to mark the agent ready.
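Since a switch is not validated until the next LLM call, a quick pre-flight before flipping can save a failed call. The environment variable names below are the conventional SDK names and are an assumption; check the agent's own `.env` template for the names it actually reads. `preflight` and `REQUIRED_ENV` are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

# ASSUMPTION: conventional SDK variable names, not confirmed for this agent.
REQUIRED_ENV: Dict[Tuple[str, str], List[str]] = {
    ("api", "anthropic"): ["ANTHROPIC_API_KEY"],
    ("api", "openai"): ["OPENAI_API_KEY"],
    ("api", "local"): [],      # assumed: a local endpoint may need no key
    ("cli", "anthropic"): [],  # CLI auth comes from the vault (OAuth), not .env
}


def preflight(backend: str, provider: str, environ: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means 'looks switchable'."""
    if backend == "cli":
        provider = "anthropic"  # mirrors the resolver: the CLI is always Claude
    key = (backend, provider)
    if key not in REQUIRED_ENV:
        return [f"unknown backend/provider combo: {backend}/{provider}"]
    return [f"missing {name}" for name in REQUIRED_ENV[key] if not environ.get(name)]
```

An empty result only means the expected variables are present, not that the key is valid; the release-readiness probe remains the real end-to-end check.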