Runtime LLM Control

A persistent operator override that lets you flip the LLM backend, switch providers, or detach the LLM entirely — without restarting the agent. The override survives reboots and is honoured by every layer that asks for an LLM provider, including the brain pipeline, the build codegen, the review service, and the release-readiness probe.

Code: agent/control/llm_runtime.py. Tests: tests/test_llm_runtime.py.

Why this exists. The agent runs as a long-lived daemon. Restarting it to change the model means losing in-memory caches, breaking long-poll Telegram connections, and waiting for the watchdog to register a clean state. Operators need to switch from CLI to API in seconds, not minutes — for example, when the host CLI is in maintenance, when the API has a credit issue, or when a programming task needs a stronger model than the local CLI offers.

Mental model

.env → LLM_BACKEND, LLM_PROVIDER (defaults)
                   │
                   ▼
       resolve_llm_runtime_state()
                   │
                   ├─ load llm_runtime.json (operator override)
                   │
                   └─ produce effective state:
                       enabled            (bool)
                       effective_backend  ("cli" | "api")
                       effective_provider ("anthropic" | "openai" | "local")
                       follows_env        (bool)
                       override_active    (bool)
                       updated_at, updated_by, note
                   │
                   ▼
       get_provider() in agent/core/llm_provider.py
                   │
                   ├─ if not enabled  → DetachedLLMProvider (every call returns "detached by operator")
                   ├─ if backend=cli  → ClaudeCliProvider
                   └─ if backend=api  → AnthropicProvider | OpenAiProvider
                   │
                   ▼
       Brain, build codegen, review service, release-readiness probe

The persisted state lives in <AGENT_DATA_DIR>/control/llm_runtime.json. It's a small JSON file with six fields:

{
  "enabled": true,
  "backend_override": "api",
  "provider_override": "openai",
  "updated_at": "2026-04-08T22:13:47.123456+00:00",
  "updated_by": "operator",
  "note": "switched to o3 for build pipeline"
}

If backend_override and provider_override are both empty, the resolver "follows env" — meaning the values from .env apply directly. If enabled is false, every LLM call gets a DetachedLLMProvider that returns a deterministic "detached by operator" error without calling the network.
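One way to read and write this file is sketched below, assuming a hypothetical `RuntimeState` dataclass and `load_state`/`save_state` helpers (none of these names come from agent/control/llm_runtime.py). A missing file yields "enabled, no overrides", which resolves to the .env values, and empty override strings are normalised to None. The temp-file-plus-`os.replace` write is this sketch's own choice for avoiding half-written JSON, not a documented property of the real service.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RuntimeState:
    # Field names mirror llm_runtime.json; the class itself is hypothetical.
    enabled: bool = True
    backend_override: Optional[str] = None
    provider_override: Optional[str] = None
    updated_at: Optional[str] = None
    updated_by: Optional[str] = None
    note: Optional[str] = None


def state_path(data_dir: str) -> Path:
    return Path(data_dir) / "control" / "llm_runtime.json"


def load_state(data_dir: str) -> RuntimeState:
    path = state_path(data_dir)
    if not path.exists():
        # No override on disk: enabled, follows .env.
        return RuntimeState()
    raw = json.loads(path.read_text(encoding="utf-8"))
    # Empty strings count as "no override", so normalise them to None.
    return RuntimeState(
        enabled=bool(raw.get("enabled", True)),
        backend_override=raw.get("backend_override") or None,
        provider_override=raw.get("provider_override") or None,
        updated_at=raw.get("updated_at"),
        updated_by=raw.get("updated_by"),
        note=raw.get("note"),
    )


def save_state(data_dir: str, state: RuntimeState) -> None:
    path = state_path(data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(asdict(state), fh, indent=2)
    os.replace(tmp, path)
```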

Three control surfaces

You can flip the override from any of:

Surface	Use case
CLI (`python -m agent --llm-runtime-*`)	Scripts, systemd unit files, ad-hoc shell
HTTP (`POST /api/operator/llm`)	Dashboard, automation, agent-to-agent
Dashboard (`/dashboard` LLM Runtime panel)	Click-driven, with audit log

All three call into the same LlmRuntimeControlService.update_state(...) method, which persists the change, clears the LLM provider cache, and records a TraceRecord(kind=CONFIGURATION) in the control plane.

CLI

# Inspect current state
.venv/bin/python -m agent --llm-runtime-status

# Detach the LLM entirely (every LLM call fails closed)
.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "maintenance"

# Switch back to CLI backend
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli \
    --llm-runtime-note "back to Claude CLI"

# Switch to Anthropic API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
    --llm-runtime-provider anthropic

# Switch to OpenAI (or any OpenAI-compatible endpoint like Ollama / vLLM / LM Studio)
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api \
    --llm-runtime-provider openai

# Drop both overrides and follow whatever .env says
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable

HTTP

# GET — current state
curl -H "Authorization: Bearer $AGENT_API_KEY" \
     http://localhost:8420/api/operator/llm

# POST — update
curl -X POST \
     -H "Authorization: Bearer $AGENT_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "enabled": true,
       "backend": "api",
       "provider": "openai",
       "note": "switching to o3 for the next build pipeline"
     }' \
     http://localhost:8420/api/operator/llm

The response is the new effective state (the same shape --llm-runtime-status prints).
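For automation written in Python, the same POST can be built with the standard library. This is a sketch: `build_llm_update_request` and `send` are hypothetical helper names, and leaving `backend`/`provider` out of the body when they are unset is an assumption about how the endpoint treats missing keys.

```python
import json
import urllib.request
from typing import Optional


def build_llm_update_request(
    base_url: str,
    api_key: str,
    *,
    enabled: bool,
    backend: Optional[str] = None,
    provider: Optional[str] = None,
    note: str = "",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/operator/llm."""
    payload = {"enabled": enabled, "note": note}
    # Only include the override fields the caller actually set (an assumption
    # about the endpoint; check the handler before relying on it).
    if backend is not None:
        payload["backend"] = backend
    if provider is not None:
        payload["provider"] = provider
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/operator/llm",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    # Parse the JSON response body (the new effective state).
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_llm_update_request("http://localhost:8420", os.environ["AGENT_API_KEY"], enabled=False, note="maintenance"))` would request a detach, assuming the endpoint accepts a body without backend and provider.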

Dashboard

/dashboard has an "LLM Runtime" panel with three buttons:

Detach — sets enabled=false. Every LLM call fails closed.
CLI — sets enabled=true, backend=cli.
API (anthropic | openai | local) — sets enabled=true, backend=api, provider=<x>.

Each click POSTs to /api/operator/llm with the operator's Bearer token, then refreshes the panel and shows the new state in its header.

Effective backend resolution

input:
  - .env values: LLM_BACKEND, LLM_PROVIDER, AGENT_DATA_DIR
  - persisted state: enabled, backend_override, provider_override

resolve_llm_runtime_state(environ, state):
    env_backend  = environ.get("LLM_BACKEND",  "cli")     # default = cli
    env_provider = environ.get("LLM_PROVIDER", "anthropic") # default = anthropic

    effective_backend = state.backend_override OR env_backend
    if effective_backend == "cli":
        effective_provider = "anthropic"   # CLI = Claude
    else:
        effective_provider = state.provider_override OR env_provider OR "anthropic"

    follows_env = (no backend_override AND no provider_override)

    return {
      "enabled": state.enabled,
      "env_backend": env_backend,
      "env_provider": env_provider,
      "backend_override": state.backend_override,
      "provider_override": state.provider_override,
      "effective_backend": effective_backend,
      "effective_provider": effective_provider,
      "follows_env": follows_env,
      "override_active": (not state.enabled) or (not follows_env),
      ...
    }

The brain (agent/core/brain.py::_process_inner) reads effective_backend from this resolver, not from os.environ["LLM_BACKEND"]. Reading the environment directly was the second HIGH-severity bug Codex caught: the legacy code bypassed the resolver and silently ignored every operator override.

Tested in tests/test_llm_runtime.py::TestBrainHonoursRuntimeBackendOverride.
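A runnable transcription of the resolution pseudocode earlier in this section can help when reasoning about edge cases. `resolve_effective` is a hypothetical name; it takes the persisted fields as plain arguments and returns only the derived keys (no env_* echoes, timestamps, or note).

```python
from typing import Any, Dict, Mapping, Optional


def resolve_effective(
    environ: Mapping[str, str],
    enabled: bool = True,
    backend_override: Optional[str] = None,
    provider_override: Optional[str] = None,
) -> Dict[str, Any]:
    # .env defaults, as in the pseudocode.
    env_backend = environ.get("LLM_BACKEND", "cli")
    env_provider = environ.get("LLM_PROVIDER", "anthropic")

    effective_backend = backend_override or env_backend
    if effective_backend == "cli":
        # The CLI backend is Claude, so the provider is pinned.
        effective_provider = "anthropic"
    else:
        effective_provider = provider_override or env_provider or "anthropic"

    follows_env = not backend_override and not provider_override
    return {
        "enabled": enabled,
        "effective_backend": effective_backend,
        "effective_provider": effective_provider,
        "follows_env": follows_env,
        "override_active": (not enabled) or (not follows_env),
    }
```

One edge case worth noticing: a provider override combined with the CLI backend still counts as an active override (follows_env is false), even though the effective provider stays anthropic.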

Provider creation + cache

get_provider() (agent/core/llm_provider.py) is a small factory:

runtime = resolve_llm_runtime_state(environ=os.environ)
backend = backend_arg or runtime["effective_backend"]
provider_name = provider_arg or runtime["effective_provider"]

# Cache key includes ALL kwargs so different base_url / api_key
# configurations get separate provider instances.
kwargs_sig = ",".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
cache_key = f"{int(bool(runtime['enabled']))}:{backend}:{provider_name}:{kwargs_sig}"

if cache_key in _provider_cache:
    return _provider_cache[cache_key]

if not runtime["enabled"]:
    instance = DetachedLLMProvider(reason="LLM runtime is detached by operator. ...")
elif backend == "cli":
    instance = ClaudeCliProvider(**kwargs)
elif backend == "api" and provider_name == "anthropic":
    instance = AnthropicProvider(**kwargs)
elif backend == "api" and provider_name in ("openai", "local"):
    instance = OpenAiProvider(**kwargs)
else:
    raise ValueError(f"Unknown backend/provider combo: {backend}/{provider_name}")

_provider_cache[cache_key] = instance
return instance

LlmRuntimeControlService.update_state calls clear_provider_cache() after persisting, so the next get_provider() call after an override picks up the new wiring immediately.
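The caching behaviour can be exercised in isolation. In this sketch, `cached_provider` takes the runtime flags and a provider factory as arguments instead of calling the resolver (a hypothetical simplification); the key format is copied from the snippet above.

```python
from typing import Any, Callable, Dict

_provider_cache: Dict[str, Any] = {}


def cached_provider(
    enabled: bool,
    backend: str,
    provider_name: str,
    factory: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    # Same key shape as get_provider(): enabled flag, backend, provider, sorted kwargs.
    kwargs_sig = ",".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
    cache_key = f"{int(bool(enabled))}:{backend}:{provider_name}:{kwargs_sig}"
    if cache_key not in _provider_cache:
        _provider_cache[cache_key] = factory(**kwargs)
    return _provider_cache[cache_key]


def clear_provider_cache() -> None:
    _provider_cache.clear()
```

Because enabled, backend, and provider are all in the key, a changed override already maps to a new key. The explicit clear still matters when the key does not change, for example re-enabling the same backend after a key rotation: without it, the instance cached before the detach would be reused as-is.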

What "detached" actually does

DetachedLLMProvider.generate(...) returns:

{
  "success": false,
  "text": "",
  "error": "LLM runtime is detached by operator. Re-enable it via CLI or /api/operator/llm before running LLM-backed tasks.",
  "input_tokens": 0,
  "output_tokens": 0,
  "cost_usd": 0.0,
  "latency_ms": 0
}

It is fail-closed, free, and instantaneous. Every layer that tries to call the LLM gets the error and surfaces it to the operator. The agent itself keeps running — internal dispatch (status, health, tasks, budget, identity) and conversational fallbacks still work, since they don't hit the LLM.

This is the kill switch you want when:

You spotted a runaway model loop and want to stop spending tokens.
You're rotating API keys and want a quiet 30-second window.
You're debugging a non-LLM bug and want to remove the LLM as a variable.
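To see how callers are expected to treat the fail-closed result shown earlier, here is a sketch with a stand-in provider. `DetachedProviderSketch` and `answer_or_explain` are hypothetical; the result dictionary and error text are copied from this page, and since the real generate() signature is not shown here, the stub accepts any arguments.

```python
from typing import Any, Dict

DETACHED_ERROR = (
    "LLM runtime is detached by operator. Re-enable it via CLI or "
    "/api/operator/llm before running LLM-backed tasks."
)


class DetachedProviderSketch:
    """Stand-in with the same result shape as DetachedLLMProvider.generate()."""

    def generate(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        # No network call: return the fail-closed result immediately.
        return {
            "success": False,
            "text": "",
            "error": DETACHED_ERROR,
            "input_tokens": 0,
            "output_tokens": 0,
            "cost_usd": 0.0,
            "latency_ms": 0,
        }


def answer_or_explain(provider: Any, prompt: str) -> str:
    # Hypothetical caller: surface the provider's error instead of raising.
    result = provider.generate(prompt)
    if not result["success"]:
        return f"LLM unavailable: {result['error']}"
    return result["text"]
```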

Audit trail

Every override is recorded:

Where	What
`llm_runtime.json` on disk	Latest persisted state with `updated_at`, `updated_by`, `note`
Long-tier log (`agent-long.log`)	Structlog event `llm_provider_created` (with backend + provider) on every cache miss
Control plane traces (`control.db`)	`TraceRecord(kind=CONFIGURATION, title="LLM runtime control updated")` with the full summary
Operator dashboard	Live state display with timestamp + note

If you're investigating "why did the model change two days ago", look for CONFIGURATION traces titled "LLM runtime control updated" in the control-plane trace store (control.db); each one shows the operator who made the change, when, and why.

Common scenarios

"I want to test the build pipeline against o3 instead of Sonnet"

.venv/bin/python -m agent --llm-runtime-enable \
    --llm-runtime-backend api \
    --llm-runtime-provider openai \
    --llm-runtime-note "trying o3 for tonight's build run"

# Run your build...
.venv/bin/python -m agent --build-repo . --build-description "..."

# Switch back when done
.venv/bin/python -m agent --llm-runtime-follow-env --llm-runtime-enable

"I'm rotating Anthropic keys, give me 30 seconds of quiet"

.venv/bin/python -m agent --llm-runtime-disable --llm-runtime-note "rotating anthropic key"
# rotate the key in 1Password / Bitwarden, paste into .env
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend cli  # or api

"The agent is hung on a Telegram programming task and I need it back"

# Check what backend it thinks it's using
.venv/bin/python -m agent --llm-runtime-status

# If the answer is "cli" and you're on a server, that's the bug — flip to API
.venv/bin/python -m agent --llm-runtime-enable --llm-runtime-backend api --llm-runtime-provider anthropic

See also the Telegram + CLI deny guard, which catches this exact case at the brain layer and returns a deterministic operator message instead of hanging.

"I want CI to never hit the live LLM"

CI sets AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1 so the release-readiness gate skips the live probe. The runtime LLM control isn't involved on CI runners; tests use mocked providers.
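The CI skip reduces to an environment check in front of the live probe. A sketch under assumptions: `run_llm_probe` is a hypothetical helper, and treating only the exact value "1" as "skip" is this sketch's choice; the real gate may accept other truthy values.

```python
import os
from typing import Callable, Mapping, Optional


def run_llm_probe(
    probe: Callable[[], bool],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[bool]:
    """Return the probe result, or None when the probe is skipped."""
    env = os.environ if environ is None else environ
    # CI sets AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1 to skip the live call.
    if env.get("AGENT_RELEASE_READINESS_SKIP_LLM_PROBE") == "1":
        return None
    return probe()
```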

Things it does NOT do

It does not rotate the master key, the API key, the OAuth token, or any credential. It only flips backend/provider selection. Credentials still come from .env (or the vault for CLI OAuth).
It does not survive AGENT_DATA_DIR deletion. The override lives at <data_dir>/control/llm_runtime.json — wipe the data dir, lose the override.
It does not broadcast to other agents. Each agent has its own llm_runtime.json. There is no shared state.
It does not retry. If you flip to a backend that's misconfigured, the next LLM call fails. The setup doctor will tell you what's wrong, and the release-readiness gate will refuse to mark the agent ready.

Repo · CHANGELOG · Releases · Issues · MIT License

Agent Life Space

v1.35.0 · Latest Release
