llm-do

Skills with their own runtime. Like Claude Code skills, but each worker runs as a separate agent that can delegate to other workers.

Why llm-do?

The way to build useful non-deterministic systems more complex than chat is to make them deterministic at key spots.

Delegation. Workers call other workers like function calls. A summarizer delegates to an analyzer; an orchestrator coordinates specialists. Each runs with its own tools and model.

Unified function space. Workers and Python tools are the same abstraction—they call each other freely. LLM reasoning and deterministic code interleave; which is which becomes an implementation detail.

Tight context. Each worker does one thing well. No bloated multi-purpose prompts that try to handle everything. Task executors receive only relevant history—no conversation baggage from parent agents.

Guardrails by construction. Attachment policies cap resources; tool approvals gate dangerous operations. Guards are enforced in code, not in prompt instructions.

Progressive hardening. Start with prompts for flexibility. As patterns stabilize, extract deterministic logic to tested Python code. Or go the other direction—soften rigid code into prompts when edge cases multiply.

Quick Start

# Install
pip install -e .

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."  # or OPENAI_API_KEY

# Run a worker
cd examples/greeter
llm-do "Tell me a joke" --model anthropic:claude-haiku-4-5

The CLI runs main.worker in the current directory. Use --worker to run a specific worker. See examples/ for more.

Core Concepts

Workers are .worker files: YAML front matter (config) + body (instructions). Workers and Python tools form a unified function space—each is exposed as a callable tool, taking input and returning results. LLM reasoning and deterministic code call each other freely (nested calls capped at depth 5):

Worker ──calls──▶ Tool ──calls──▶ Worker ──calls──▶ Tool ...
        reason          execute          reason

This dual recursion lets each component play to its strengths: LLMs handle ambiguity and context; Python handles precision and speed. See docs/concept.md for the full design philosophy.
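
As a concrete illustration, a small .worker file could look roughly like the sketch below. The front-matter keys shown (model, toolsets) are assumptions made for this sketch, not the canonical schema; see docs/concept.md for the actual format.

---
# Illustrative sketch only: the key names below are assumptions,
# not necessarily the documented .worker schema.
model: anthropic:claude-haiku-4-5
toolsets:
  - tools.py
---
You are a focused analyzer. Read the input you are given, extract the
key points, and return them as a short bulleted list.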

Why "workers" not "agents"? llm-do is built on PydanticAI, which uses "agent" for its LLM orchestration primitive. We use "worker" to distinguish our composable, constrained prompt units from the underlying PydanticAI agents that execute them. A worker defines what to do; the PydanticAI agent executes it.

Project Structure

Projects grow organically from simple to complex:

Minimal — just a worker:

my-project/
└── orchestrator.worker

With helpers — orchestrator delegates to focused workers:

my-project/
├── orchestrator.worker   # Entry point
├── analyzer.worker       # Focused worker
└── formatter.worker      # Another focused worker

With hardened operations — extract reliable logic to Python:

my-project/
├── orchestrator.worker
├── analyzer.worker
├── tools.py              # Shared Python tools
├── templates/            # Shared Jinja templates
├── input/
└── output/

This progression reflects progressive hardening: initially you might prompt the LLM to "rename the file to remove special characters". Once you see it works, extract that to a Python function—deterministic, testable, no LLM variability. See examples/pitchdeck_eval_hardened for a concrete before/after comparison.

Custom Tools

Add custom tools by creating tools.py in your project root:

# tools.py
def sanitize_filename(name: str) -> str:
    """Remove special characters from filename."""
    return "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)

Functions become LLM-callable tools. Reference them in your worker's toolsets config.
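
Because hardened logic lives in ordinary Python functions, it can be unit tested without calling a model. A minimal pytest check for the tool above might look like this (file name and cases are illustrative):

# test_tools.py: illustrative tests for the hardened helper above
from tools import sanitize_filename

def test_replaces_special_characters():
    assert sanitize_filename("pitch deck (final).pdf") == "pitch_deck__final_.pdf"

def test_keeps_safe_characters():
    assert sanitize_filename("report-2024_v1.txt") == "report-2024_v1.txt"

This is the payoff of progressive hardening: the renaming behavior becomes deterministic and test-covered instead of being re-verified through prompts.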

You can also use:

  • Jinja2 templates — Compose prompts from reusable templates in templates/
  • Server-side tools — Provider-executed capabilities like web search and code execution

CLI Reference

# Run main.worker in current directory
cd my-project
llm-do "input message" --model anthropic:claude-haiku-4-5

# Run specific worker
llm-do --worker orchestrator "input" --model anthropic:claude-haiku-4-5

# Run from a different directory
llm-do --dir /path/to/project "input" --model anthropic:claude-haiku-4-5

# Override config at runtime
llm-do "input" --model anthropic:claude-sonnet-4 --set locked=true

# Create a new project
llm-do init my-project

Model names follow PydanticAI conventions (e.g., anthropic:claude-sonnet-4-20250514, openai:gpt-4o-mini).

See docs/cli.md for full reference.

Examples

Example            Demonstrates
greeter/           Minimal project structure
pitchdeck_eval/    Multi-worker orchestration, PDF attachments
calculator/        Custom Python tools
approvals_demo/    Write approval for file operations
code_analyzer/     Shell commands with approval rules
web_searcher/      Server-side tools (web search)

Documentation

  • docs/concept.md: full design philosophy
  • docs/cli.md: full CLI reference

Status

Experimental — Built on PydanticAI. APIs may change.

Working: Worker resolution, worker delegation, approvals, custom tools, templates.

TUI: The interactive terminal UI (Textual-based) is experimental. Use --headless for non-interactive mode.
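
For scripted or CI runs, the documented flags combine in the obvious way (flag ordering here is illustrative):

# Non-interactive run, skipping the TUI
llm-do "input message" --model anthropic:claude-haiku-4-5 --headless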

Security

Tool approvals reduce risk but aren't guarantees. Prompt injection can trick LLMs into misusing granted tools. Treat approvals as one layer of defense.

For real isolation, run llm-do in a container or VM.
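
One possible setup, sketched with placeholder paths, is to mount a checkout of the repo into a disposable container and run the CLI there:

# Sketch: disposable container with the repo mounted (paths are placeholders)
docker run --rm -it \
  -v /path/to/llm-do:/repo -w /repo/examples/greeter \
  -e ANTHROPIC_API_KEY \
  python:3.12 \
  bash -c "pip install -e /repo && llm-do 'Tell me a joke' --model anthropic:claude-haiku-4-5"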

Contributing

PRs welcome! Run uv run pytest before committing. See AGENTS.md.
