Runtime LLM Control

Daniel Babjak edited this page Apr 8, 2026 · 1 revision

A persistent operator override that lets you flip the LLM backend, switch providers, or detach the LLM entirely — without restarting the agent. The override survives reboots and is honoured by every layer that asks for an LLM provider, including the brain pipeline, the build codegen, the review service, and the release-readiness probe.

Code: agent/control/llm_runtime.py. Tests: tests/test_llm_runtime.py.

Why this exists. The agent runs as a long-lived daemon. Restarting it to change the model means losing in-memory caches, breaking long-poll Telegram connections, and waiting for the watchdog to register a clean state. Operators need to switch from CLI to API in seconds, not minutes — for example, when the host CLI is in maintenance, when the API has a credit issue, or when a programming task needs a stronger model than the local CLI offers.


Mental model

.env → LLM_BACKEND, LLM_PROVIDER (defaults)
                   │
                   ▼
       resolve_llm_runtime_state()
                   │
                   ├─ load llm_runtime.json (operator override)
                   │
                   └─ produce effective state:
                       enabled            (bool)
                       effective_backend  ("cli" | "api")
                       effective_provider ("anthropic" | "openai" | "local")
                       follows_env        (bool)
                       override_active    (bool)
                       updated_at, updated_by, note
                   │
                   ▼
       get_provider() in agent/core/llm_provider.py
                   │
                   ├─ if not enabled  → DetachedLLMProvider (every call returns "detached by operator")
                   ├─ if backend=cli  → ClaudeCliProvider
                   └─ if backend=api  → AnthropicProvider | OpenAiProvider
                   │
                   ▼
       Brain, build codegen, review service, release-readiness probe

The persisted state lives in <AGENT_DATA_DIR>/control/llm_runtime.json. It's a small JSON file with six fields:

{
  "enabled": true,
  "backend_override": "api",
  "provider_override": "openai",
  "updated_at": "2026-04-08T22:13:47.123456+00:00",
  "updated_by": "operator",
  "note": "switched to o3 for build pipeline"
}

If backend_override and provider_override are both empty, the resolver "follows env" — meaning the values from .env apply directly. If enabled is false, every LLM call gets a DetachedLLMProvider that returns a deterministic "detached by operator" error without calling the network.
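As a rough illustration of the rules above, here is a minimal sketch of reading the state file and checking whether it follows env. The helper names (load_runtime_state, follows_env) and the defaults applied when the file is missing or unreadable are assumptions made for this example, not the actual code in agent/control/llm_runtime.py.

```python
import json
from pathlib import Path

# Assumed defaults for this sketch: enabled, with no overrides set.
DEFAULT_STATE = {
    "enabled": True,
    "backend_override": "",
    "provider_override": "",
    "updated_at": "",
    "updated_by": "",
    "note": "",
}


def load_runtime_state(data_dir: str) -> dict:
    """Read <data_dir>/control/llm_runtime.json, falling back to defaults."""
    path = Path(data_dir) / "control" / "llm_runtime.json"
    state = dict(DEFAULT_STATE)
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or corrupt file: behave as if no override exists.
        return state
    if isinstance(raw, dict):
        # Copy only the six known fields and ignore anything else.
        state.update({k: raw[k] for k in DEFAULT_STATE if k in raw})
    return state


def follows_env(state: dict) -> bool:
    """True when both overrides are empty, so the .env values apply."""
    return not state["backend_override"] and not state["provider_override"]
```

With the example file shown earlier, follows_env(...) would be False because both overrides are set.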


Three control surfaces

You can flip the override from any of:

Surface                                   | Use case
CLI (python -m agent --llm-runtime-*)     | Scripts, systemd unit files, ad-hoc shell
HTTP (POST /api/operator/llm)             | Dashboard, automation, agent-to-agent
Dashboard (/dashboard LLM Runtime panel)  | Click-driven, with audit log

All three call into the same LlmRuntimeControlService.update_state(...) method, which persists the change, clears the LLM provider cache, and records a TraceRecord(kind=CONFIGURATION) in the control plane.

CLI

# Inspect current state
.venv/bin/python -m agent --llm-runtime-status

# Detach the LLM entirely (every LLM call fails closed)
.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "maintenance"

# Switch back to CLI backend
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli \
    --llm-runtime-note "back to Claude CLI"

# Switch to Anthropic API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
    --llm-runtime-provider anthropic

# Switch to OpenAI (or any OpenAI-compatible endpoint like Ollama / vLLM / LM Studio)
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
    --llm-runtime-provider openai

# Drop both overrides and follow whatever .env says
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable

HTTP

# GET — current state
curl -H "Authorization: Bearer $AGENT_API_KEY" \
     http://localhost:8420/api/operator/llm

# POST — update
curl -X POST \
     -H "Authorization: Bearer $AGENT_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "enabled": true,
       "backend": "api",
       "provider": "openai",
       "note": "switching to o3 for the next build pipeline"
     }' \
     http://localhost:8420/api/operator/llm

The response is the new effective state (the same shape --llm-runtime-status prints).
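For automation that prefers Python over curl, the same POST can be sketched with the standard library. The function names below are hypothetical, and whether the endpoint accepts a body that omits backend, provider, or note is an assumption. The URL, headers, and field names are taken from the curl example above.

```python
import json
import urllib.request


def build_llm_update_request(base_url, api_key, *, enabled,
                             backend=None, provider=None, note=None):
    """Build (but do not send) a POST to /api/operator/llm."""
    payload = {"enabled": enabled}
    # Only include optional fields the caller actually set (assumed to be allowed).
    for key, value in (("backend", backend), ("provider", provider), ("note", note)):
        if value is not None:
            payload[key] = value
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/operator/llm",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_llm_update(request, timeout=10.0):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would typically read the key from the environment, for example build_llm_update_request("http://localhost:8420", os.environ["AGENT_API_KEY"], enabled=False, note="maintenance"), and pass the result to send_llm_update.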

Dashboard

/dashboard has an "LLM Runtime" panel with three buttons:

  • Detach — sets enabled=false. Every LLM call fails closed.
  • CLI — sets enabled=true, backend=cli.
  • API (anthropic | openai | local) — sets enabled=true, backend=api, provider=<x>.

Each click POSTs to /api/operator/llm with the operator's Bearer token, refreshes, and shows the new state in the panel header.


Effective backend resolution

input:
  - .env values: LLM_BACKEND, LLM_PROVIDER, AGENT_DATA_DIR
  - persisted state: enabled, backend_override, provider_override

resolve_llm_runtime_state(environ, state):
    env_backend  = environ.get("LLM_BACKEND",  "cli")     # default = cli
    env_provider = environ.get("LLM_PROVIDER", "anthropic") # default = anthropic

    effective_backend = state.backend_override OR env_backend
    if effective_backend == "cli":
        effective_provider = "anthropic"   # CLI = Claude
    else:
        effective_provider = state.provider_override OR env_provider OR "anthropic"

    follows_env = (no backend_override AND no provider_override)

    return {
      "enabled": state.enabled,
      "env_backend": env_backend,
      "env_provider": env_provider,
      "backend_override": state.backend_override,
      "provider_override": state.provider_override,
      "effective_backend": effective_backend,
      "effective_provider": effective_provider,
      "follows_env": follows_env,
      "override_active": (not state.enabled) or (not follows_env),
      ...
    }

The brain (agent/core/brain.py::_process_inner) reads effective_backend from this resolver, not from os.environ["LLM_BACKEND"]. This fixes the second HIGH-severity bug that Codex caught: the legacy code bypassed the resolver and silently ignored every operator override.

Tested in tests/test_llm_runtime.py::TestBrainHonoursRuntimeBackendOverride.
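The resolution rules in the pseudocode above can be written as a small runnable function. This is a sketch under stated assumptions: the name resolve_state_sketch is hypothetical, and it takes the persisted state as a plain dict rather than loading it from disk as the real resolver presumably does.

```python
def resolve_state_sketch(environ: dict, state: dict) -> dict:
    """Combine .env defaults with the persisted override (illustrative only)."""
    env_backend = environ.get("LLM_BACKEND", "cli")
    env_provider = environ.get("LLM_PROVIDER", "anthropic")

    # Treat missing or null overrides as empty strings.
    backend_override = state.get("backend_override") or ""
    provider_override = state.get("provider_override") or ""
    enabled = bool(state.get("enabled", True))

    effective_backend = backend_override or env_backend
    if effective_backend == "cli":
        # The CLI backend always means Claude, whatever the provider override says.
        effective_provider = "anthropic"
    else:
        effective_provider = provider_override or env_provider or "anthropic"

    follows_env = not backend_override and not provider_override
    return {
        "enabled": enabled,
        "env_backend": env_backend,
        "env_provider": env_provider,
        "backend_override": backend_override,
        "provider_override": provider_override,
        "effective_backend": effective_backend,
        "effective_provider": effective_provider,
        "follows_env": follows_env,
        # A detach counts as an active override even when no backend is pinned.
        "override_active": (not enabled) or (not follows_env),
    }
```

Note the two edge cases this makes visible: a provider override is ignored while the effective backend is cli, and a detached state reports override_active even though follows_env is still true.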


Provider creation + cache

get_provider() (agent/core/llm_provider.py) is a small factory:

runtime = resolve_llm_runtime_state(environ=os.environ)
backend = backend_arg or runtime["effective_backend"]
provider_name = provider_arg or runtime["effective_provider"]

# Cache key includes ALL kwargs so different base_url / api_key
# configurations get separate provider instances.
kwargs_sig = ",".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
cache_key = f"{int(bool(runtime['enabled']))}:{backend}:{provider_name}:{kwargs_sig}"

if cache_key in _provider_cache:
    return _provider_cache[cache_key]

if not runtime["enabled"]:
    instance = DetachedLLMProvider(reason="LLM runtime is detached by operator. ...")
elif backend == "cli":
    instance = ClaudeCliProvider(**kwargs)
elif backend == "api" and provider_name == "anthropic":
    instance = AnthropicProvider(**kwargs)
elif backend == "api" and provider_name in ("openai", "local"):
    instance = OpenAiProvider(**kwargs)
else:
    raise ValueError(f"Unknown backend/provider combo: {backend}/{provider_name}")

_provider_cache[cache_key] = instance
return instance

LlmRuntimeControlService.update_state calls clear_provider_cache() after persisting, so the next get_provider() call after an override picks up the new wiring immediately.
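To make the cache behaviour concrete, here is a minimal stand-alone sketch of the key construction and the clear step. Only the key format and the clear_provider_cache name come from the text above. The helper names make_cache_key and get_cached, and the body of clear_provider_cache, are assumptions for illustration.

```python
_provider_cache = {}


def make_cache_key(enabled, backend, provider_name, **kwargs):
    """Build the key the same way the excerpt above does."""
    kwargs_sig = ",".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
    return f"{int(bool(enabled))}:{backend}:{provider_name}:{kwargs_sig}"


def get_cached(key, factory):
    """Return the cached instance for key, creating it on a cache miss."""
    if key not in _provider_cache:
        _provider_cache[key] = factory()
    return _provider_cache[key]


def clear_provider_cache():
    """Drop every cached instance so the next lookup rebuilds it."""
    _provider_cache.clear()


key = make_cache_key(True, "api", "openai", base_url="http://localhost:11434/v1")
first = get_cached(key, object)    # cache miss: creates an instance
second = get_cached(key, object)   # cache hit: same instance
clear_provider_cache()
third = get_cached(key, object)    # rebuilt after the clear
```

Here first and second are the same object, while third is a fresh one. That is the effect an operator update relies on.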


What "detached" actually does

DetachedLLMProvider.generate(...) returns:

{
  "success": false,
  "text": "",
  "error": "LLM runtime is detached by operator. Re-enable it via CLI or /api/operator/llm before running LLM-backed tasks.",
  "input_tokens": 0,
  "output_tokens": 0,
  "cost_usd": 0.0,
  "latency_ms": 0
}

It is fail-closed, free, and instantaneous. Every layer that tries to call the LLM gets the error and surfaces it to the operator. The agent itself keeps running — internal dispatch (status, health, tasks, budget, identity) and conversational fallbacks still work, since they don't hit the LLM.
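A provider with this behaviour needs very little code. The sketch below is an assumption about its shape, not a copy of the real class: the name DetachedProviderSketch and the generate(*args, **kwargs) signature are hypothetical, while the returned fields match the JSON shown above.

```python
DETACHED_REASON = (
    "LLM runtime is detached by operator. Re-enable it via CLI or "
    "/api/operator/llm before running LLM-backed tasks."
)


class DetachedProviderSketch:
    """Fail-closed stand-in: never touches the network, never costs tokens."""

    def __init__(self, reason: str = DETACHED_REASON):
        self.reason = reason

    def generate(self, *args, **kwargs) -> dict:
        # Arguments are ignored; the same failure result comes back every time.
        return {
            "success": False,
            "text": "",
            "error": self.reason,
            "input_tokens": 0,
            "output_tokens": 0,
            "cost_usd": 0.0,
            "latency_ms": 0,
        }
```

Because the result has the same keys as a normal response, callers that already check success need no special handling for the detached case.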

This is the kill switch you want when:

  • You spotted a runaway model loop and want to stop spending tokens.
  • You're rotating API keys and want a quiet 30-second window.
  • You're debugging a non-LLM bug and want to remove the LLM as a variable.

Audit trail

Every override is recorded:

Where                              | What
llm_runtime.json on disk           | Latest persisted state with updated_at, updated_by, note
Long-tier log (agent-long.log)     | Structlog event llm_provider_created (with backend + provider) on every cache miss
Control plane traces (control.db)  | TraceRecord(kind=CONFIGURATION, title="LLM runtime control updated") with the full summary
Operator dashboard                 | Live state display with timestamp + note

If you're investigating "why did the model change two days ago", search the trace store for TraceRecord entries with kind=CONFIGURATION and you'll see which operator made the change, when, and why.


Common scenarios

"I want to test the build pipeline against o3 instead of Sonnet"

.venv/bin/python -m agent --llm-runtime-enable \
    --llm-runtime-backend api \
    --llm-runtime-provider openai \
    --llm-runtime-note "trying o3 for tonight's build run"

# Run your build...
.venv/bin/python -m agent --build-repo . --build-description "..."

# Switch back when done
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable

"I'm rotating Anthropic keys, give me 30 seconds of quiet"

.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "rotating anthropic key"
# rotate the key in 1Password / Bitwarden, paste into .env
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli  # or api

"The agent is hung on a Telegram programming task and I need it back"

# Check what backend it thinks it's using
.venv/bin/python -m agent --llm-runtime-status

# If the answer is "cli" and you're on a server, that's the bug — flip to API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api --llm-runtime-provider anthropic

See also the Telegram + CLI deny guard, which catches this exact case at the brain layer and returns a deterministic operator message instead of hanging.

"I want CI to never hit the live LLM"

CI sets AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1 so the release-readiness gate skips the live probe. The runtime LLM control isn't involved on CI runners; tests use mocked providers.


Things it does NOT do

  • It does not rotate the master key, the API key, the OAuth token, or any credential. It only flips backend/provider selection. Credentials still come from .env (or the vault for CLI OAuth).
  • It does not survive AGENT_DATA_DIR deletion. The override lives at <data_dir>/control/llm_runtime.json — wipe the data dir, lose the override.
  • It does not broadcast to other agents. Each agent has its own llm_runtime.json. There is no shared state.
  • It does not retry. If you flip to a misconfigured backend, the next LLM call fails. The setup doctor will tell you what's wrong, and the release-readiness gate will refuse to mark the agent ready.
