A hybrid VM—extend with prompts, stabilize with code.
LLM reasoning and Python code share a unified execution model. Call an agent (LLM) or a tool (Python) with the same convention. Move computation freely between neural and symbolic—stabilize patterns to code when they emerge, soften rigid code back to LLM when edge cases multiply.
[LLM ⟷ Code ⟷ LLM ⟷ Code] → output
The boundary is movable. What's neural today can be symbolic tomorrow—and vice versa.
On top of the VM sits a harness—an imperative orchestration layer where your code owns control flow. Think syscalls, not state machines.
| Aspect | Graph DSLs | llm-do Harness |
|---|---|---|
| Orchestration | Declarative: define Node A → Node B | Imperative: Agent A calls Agent B as a function |
| State | Global context passed through graph | Local scope—each agent receives only its arguments |
| Approvals | Checkpoints: serialize graph state, resume after input | Interception: blocking "syscall" at the tool level |
| Refactoring | Redraw edges, update graph definitions | Change code—extract functions, inline agents |
| Control flow | DSL constructs (branches, loops) | Native Python: if, for, try/except |
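As a minimal sketch of what "imperative orchestration" looks like in practice, here is a Python tool that drives two agents with ordinary control flow. The agent names (analyzer, formatter) are illustrative; the calling convention (ctx.deps.call_agent, ToolsetSpec) is the one shown in the tool examples later in this README.

```python
# tools.py (sketch): the harness is plain Python, not a graph definition.
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import ToolsetSpec, CallContext


def build_tools():
    tools = FunctionToolset()

    @tools.tool
    async def review_documents(ctx: RunContext[CallContext], docs: list[str]) -> str:
        """Analyze each document, then format the combined result."""
        findings = []
        for doc in docs:                      # native Python loop, no graph edges
            analysis = await ctx.deps.call_agent("analyzer", {"input": doc})
            if "SKIP" in analysis:            # native branching
                continue
            findings.append(analysis)
        return await ctx.deps.call_agent("formatter", {"input": "\n".join(findings)})

    return tools


tools = ToolsetSpec(factory=build_tools)
```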
For the theoretical foundation, see docs/theory.md. For implementation details, see docs/architecture.md.
This is the Unix philosophy for agents: agents are defined in .agent files, dangerous operations are gated syscalls, composition happens through code—not a DSL.
We use uv for development. Install it via curl -LsSf https://astral.sh/uv/install.sh | sh or see the installation docs.
```bash
# Install
uv pip install -e . # or: pip install -e .
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..." # or OPENAI_API_KEY
# Set the default model (recommended approach—see Model Configuration)
export LLM_DO_MODEL="anthropic:claude-haiku-4-5"
# Run a project via manifest
llm-do examples/greeter/project.json "Tell me a joke"
```

llm-do reads project.json, links the listed files, and runs the selected entry.
Declare the entry in the manifest (entry.agent or entry.function) to pick an agent or a Python function.
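As a hypothetical sketch of the function-entry variant (the exact shape of the entry.function field is an assumption, not taken from the docs; compare it with the agent-entry manifest example below):

```json
{
  "version": 1,
  "entry": {
    "function": "run_pipeline",
    "args": { "input": "Hello!" }
  },
  "python_files": ["tools.py"]
}
```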
See examples/ for more.
Example agent file (main.agent):
```
---
name: main
---
You are a friendly greeter. Respond to the user with a warm, personalized greeting.
Keep your responses brief and cheerful.
```

Example manifest:
```json
{
  "version": 1,
  "runtime": {
    "approval_mode": "prompt",
    "max_depth": 5
  },
  "entry": {
    "agent": "main",
    "args": { "input": "Hello!" }
  },
  "agent_files": ["main.agent"],
  "python_files": ["tools.py"]
}
```

The VM executes two kinds of operations:
| Operation Type | Implementation | Characteristics |
|---|---|---|
| Neural | Agents (.agent files) | Stochastic, flexible, handles ambiguity |
| Symbolic | Python tools | Deterministic, fast, cheap, testable |
Orchestration uses ctx.deps.call_agent(...) to delegate between agents; the LLM sees both tools and agents as callable functions:
```
Agent ──calls──▶ Tool ──calls──▶ Agent ──calls──▶ Tool ...
neural           symbolic        neural           symbolic
```
This is neuro-symbolic computation: interleaved LLM reasoning and deterministic code, with the boundary between them movable.
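To make the movable boundary concrete, here is a minimal softening sketch: a tool that handles the common case deterministically and delegates ambiguous cases back to an agent. The agent name date_normalizer is illustrative; the calling convention is the one documented in the custom-tools section below.

```python
from datetime import datetime

from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import ToolsetSpec, CallContext


def build_tools():
    tools = FunctionToolset()

    @tools.tool
    async def normalize_date(ctx: RunContext[CallContext], raw: str) -> str:
        """Return an ISO date: symbolic fast path, neural fallback."""
        try:
            return datetime.strptime(raw.strip(), "%Y-%m-%d").date().isoformat()
        except ValueError:
            # Edge cases ("next Tuesday", "3rd of May") go back to the LLM.
            return await ctx.deps.call_agent("date_normalizer", {"input": raw})

    return tools


tools = ToolsetSpec(factory=build_tools)
```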
Projects grow organically from simple to complex:
Minimal — just an agent:
```
my-project/
└── orchestrator.agent
```
With helpers — orchestrator delegates to focused agents:
```
my-project/
├── orchestrator.agent   # Entry point
├── analyzer.agent       # Focused agent
└── formatter.agent      # Another focused agent
```
With stabilized operations — extract reliable logic to Python:
```
my-project/
├── orchestrator.agent
├── analyzer.agent
├── tools.py             # Shared Python tools
├── input/
└── output/
```
This progression reflects moving computation within the VM: initially you might prompt the LLM to "rename the file to remove special characters". Once you see it works, extract that to a Python function—deterministic, testable, no LLM variability. The operation migrates from neural to symbolic without changing how callers invoke it. See the pitchdeck examples for a concrete progression: pitchdeck_eval (all LLM) → pitchdeck_eval_stabilized (extracted tools) → pitchdeck_eval_code_entry (Python orchestration).
Recommended approach: Set the LLM_DO_MODEL environment variable as your project-wide default:
```bash
export LLM_DO_MODEL="anthropic:claude-haiku-4-5"
```

This keeps model configuration external to your agent definitions, making it easy to switch models across your entire project or between environments (dev/prod).
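For example, you might export a cheaper default locally and a stronger one in production (the model names here are simply the ones used elsewhere in this README):

```bash
# dev shell / .env
export LLM_DO_MODEL="anthropic:claude-haiku-4-5"

# prod environment
export LLM_DO_MODEL="anthropic:claude-sonnet-4-20250514"
```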
Per-agent override: Only specify model in an .agent file when that agent genuinely requires a different model than the project default:
```
---
name: complex_analyzer
model: anthropic:claude-sonnet-4-20250514  # Needs stronger reasoning
---
You analyze complex documents...
```

Resolution order:
- Agent's explicit model field (highest priority)
- LLM_DO_MODEL environment variable (recommended default)
- Error if neither is set
Model format: Model names follow PydanticAI conventions—provider:model_name (e.g., anthropic:claude-haiku-4-5, openai:gpt-4o-mini).
When constructing AgentSpec in Python, pass a resolved Model instance (use
resolve_model("provider:model")).
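A minimal sketch, assuming AgentSpec accepts keyword fields along these lines (the import location and field names are illustrative assumptions, not the definitive signature):

```python
from llm_do.runtime import AgentSpec, resolve_model  # import path assumed

spec = AgentSpec(
    name="summarizer",                                   # hypothetical fields
    instructions="Summarize the input in three bullets.",
    model=resolve_model("anthropic:claude-haiku-4-5"),   # resolved Model, not a string
)
```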
Add custom tools by creating tools.py in your project root. Toolsets are
defined as factories via ToolsetSpec, so each call gets its own instance:
```python
# tools.py
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import ToolsetSpec


def build_tools():
    tools = FunctionToolset()

    @tools.tool
    def sanitize_filename(name: str) -> str:
        """Remove special characters from filename."""
        return "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)

    return tools


tools = ToolsetSpec(factory=build_tools)
```

Functions become LLM-callable tools. Reference the toolset name in your agent's toolsets config and list tools.py in project.json under python_files.
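For example, an agent file might reference the toolset above roughly like this (the toolsets field name and reference format are assumptions; see docs/reference.md for the exact syntax):

```
---
name: organizer
toolsets: [tools]   # assumed syntax: refers to the ToolsetSpec named `tools` in tools.py
---
You organize files. Use the sanitize_filename tool before renaming anything.
```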
To access runtime context (for calling other tools/agents), accept a RunContext and use ctx.deps:
```python
# tools.py
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import ToolsetSpec, CallContext


def build_tools():
    tools = FunctionToolset()

    @tools.tool
    async def analyze_config(ctx: RunContext[CallContext], raw: str) -> str:
        """Delegate parsing to another agent."""
        return await ctx.deps.call_agent("config_parser", {"input": raw})

    return tools


tools = ToolsetSpec(factory=build_tools)
```

You can also use:
- Server-side tools — Provider-executed capabilities like web search and code execution
```bash
# Run a project via manifest
llm-do project.json "input message"
# Use manifest default input (entry.args)
llm-do project.json
# Provide JSON input
llm-do project.json --input-json '{"input":"Hello"}'
```

Common flags: --headless, --tui, --chat, -v/-vv/-vvv, --input-json, --debug. See docs/cli.md for full reference.
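For instance, a non-interactive run with extra logging might look like this (assuming the flags combine with the positional input as shown above):

```bash
llm-do project.json --headless -vv "Summarize input/report.md"
```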
If you're orchestrating from Python, link a single entry from files and run it:
```python
import asyncio
from pathlib import Path

from llm_do.runtime import (
    EntryConfig,
    RunApprovalPolicy,
    Runtime,
    build_registry,
    resolve_entry,
)

project_root = Path(".").resolve()

registry = build_registry(
    ["main.agent"],
    ["tools.py"],
    project_root=project_root,
)

entry = resolve_entry(
    EntryConfig(agent="main"),
    registry,
    python_files=["tools.py"],
    base_path=project_root,
)

runtime = Runtime(
    run_approval_policy=RunApprovalPolicy(mode="approve_all"),
    project_root=project_root,
)
runtime.register_agents(registry.agents)


async def main() -> None:
    result, _ctx = await runtime.run_entry(
        entry,
        {"input": "Analyze this data"},
    )
    print(result)


asyncio.run(main())
```

build_registry() requires an explicit project_root; pass the same root to Runtime to keep filesystem toolsets and attachment resolution aligned.
| Example | Demonstrates |
|---|---|
| greeter/ | Minimal project structure |
| pitchdeck_eval/ | Multi-agent orchestration for pitch decks |
| pitchdeck_eval_stabilized/ | Progressive stabilizing: extracted Python tools |
| pitchdeck_eval_code_entry/ | Full stabilizing: Python orchestration, tool entry point |
| calculator/ | Custom Python tools |
| approvals_demo/ | Write approval for file operations |
| file_organizer/ | Stabilizing pattern: LLM semantic decisions + Python cleanup |
| code_analyzer/ | Shell commands with approval rules |
| web_searcher/ | Server-side tools (web search) |
Some experiments include standalone Python entry points. Run them from the repo root so imports resolve:
```bash
uv run experiments/inv/v2_direct/run.py
uv run -m experiments.inv.v2_direct.run
```

Further reading:

- docs/theory.md — Theoretical foundation: probabilistic programs, stabilizing/softening, tradeoffs
- docs/architecture.md — Internal structure: unified calling, harness layer, runtime scopes
- docs/reference.md — API reference: workflows, toolsets, agent format
- docs/cli.md — CLI reference
- docs/notes/ — Working design notes and explorations
Experimental — Built on PydanticAI. APIs may change.
Working: Agent resolution, agent delegation, approvals, custom tools.
TUI: The interactive terminal UI (Textual-based) is experimental. Use --chat to keep it open for multi-turn input, or --headless for non-interactive mode.
llm-do excels at normal-code control flow and progressive stabilizing. It's not a durable workflow engine—no built-in checkpointing or replay. For that, use llm-do as a component within Temporal, Prefect, or similar.
Tool approvals reduce risk but aren't guarantees. Prompt injection can trick LLMs into misusing granted tools. Treat approvals as one layer of defense.
For real isolation, run llm-do in a container or VM.
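A minimal sketch, assuming you build a local image with llm-do installed (the image name and Dockerfile are hypothetical):

```bash
# Build a sandbox image, then run with only the project directory mounted.
docker build -t llm-do-sandbox .
docker run --rm \
  -e ANTHROPIC_API_KEY \
  -e LLM_DO_MODEL \
  -v "$PWD/my-project:/work" \
  llm-do-sandbox \
  llm-do /work/project.json "input message"
```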
PRs welcome! Run uv run pytest before committing. See AGENTS.md.