# Hierocode

Hierarchical coding with cost-aware model delegation. Frontier models plan and review; local models write the code. Cost per task is $0 incremental on a bring-your-own-subscription setup.
You bring: (1) Python ≥ 3.10, (2) a local Ollama server with at least one model pulled, and (3) one of — a Claude Pro / ChatGPT Plus subscription, an Anthropic API key, or (degraded) a second local model for planning. Details in *Before you install* below.
## Why

Frontier-model APIs charge per token, and code generation is token-heavy. A single non-trivial refactor can consume thousands of output tokens across multiple retries, quickly driving API costs into the dollars-per-task range. Subscription plans (Claude Pro, ChatGPT Plus) are cheaper on a per-message basis but have soft quotas — burning them on code generation leaves nothing for the rest of your workday.
Hierocode splits the work by role. The planner (your Claude Pro or ChatGPT Plus subscription, or an API key) turns your task into a structured JSON work breakdown and QAs the result. The drafter (a local Ollama model) does the actual code writing for each bounded unit. The reviewer defaults to the planner tier. Planner calls are short — 5–20 k input tokens, 1–3 k output — so two or three of them barely dent a subscription quota. Drafter calls run entirely on your own hardware at $0.
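To make that concrete, here is a hypothetical sketch of the kind of structured plan the planner returns. The field names are illustrative assumptions, not hierocode's actual schema:

```python
# Illustrative shape of a planner response (assumed field names).
plan = {
    "task": "add input validation to the login endpoint",
    "units": [
        {
            "id": 1,
            "target_file": "app/auth/login.py",   # hypothetical path
            "instruction": "reject malformed emails and short passwords "
                           "before the user lookup",
            "acceptance_criteria": [
                "malformed input returns 400",
                "existing tests still pass",
            ],
        },
    ],
}
```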
## Before you install

Hierocode needs three things on your machine. `pip install hierocode` is the last step, not the first.
### 1. Python ≥ 3.10

```sh
python3 --version
```

If older: `brew install python@3.12` (macOS), `sudo apt install python3.12` (Ubuntu/Debian), or https://www.python.org/downloads/ (Windows).
### 2. A local Ollama server (the drafter)

This is where the actual code gets written. Runs entirely on your hardware, $0 per call.
```sh
# Install Ollama (one-time):
#   macOS / Windows: https://ollama.com/download (GUI installer)
#   Linux: curl -fsSL https://ollama.com/install.sh | sh

# Pull a drafter model (~2 GB for llama3.2:3b):
ollama pull llama3.2:3b

# Verify the Ollama server is running (most installers auto-start it):
curl -fsS http://localhost:11434/ >/dev/null && echo "ollama: ok"

# If not running: `ollama serve` (macOS/Linux) or start the tray app (Windows).
```

The wizard picks a drafter model size based on your RAM:
| RAM | Recommended drafter |
|---|---|
| < 12 GB | `llama3.2:1b` |
| 12–24 GB | `llama3.2:3b` |
| 24–48 GB | `qwen2.5-coder:7b` |
| > 48 GB | `qwen2.5-coder:14b` |
### 3. A planner path

The planner breaks your task into sized units and QAs the drafter's output. You need exactly one of these paths configured:
| Path | Cost per task | Setup |
|---|---|---|
| Claude Pro ($17/mo) | $0 incremental | Install Claude Code, run `claude /login` |
| ChatGPT Plus ($20/mo) | $0 incremental | Install Codex CLI, run `codex login` |
| Anthropic API key | ~$0.01–0.15/task | `export ANTHROPIC_API_KEY=sk-...` |
No subscription or API key? Hierocode falls back to using Ollama as the planner too. It works, but plan quality drops noticeably on non-trivial tasks. The wizard warns when this happens. Get a subscription or API key for any real use.
## Quickstart: the $17 journey (Claude Pro)

Assumes the three prereqs above are satisfied and `claude /login` has been run.
```sh
pip install hierocode

# Detect your environment and write a tailored config.
hierocode init --wizard

# Confirm roles and providers were detected correctly.
hierocode models show

# Sanity-check that providers are reachable.
hierocode doctor

# Estimate cost and message quota before committing.
hierocode run -t "add input validation to the login endpoint" --estimate

# Run the full plan → draft → QA pipeline.
hierocode run -t "add input validation to the login endpoint"
```

The wizard detects `claude` on PATH, picks it as the planner, reads your RAM to choose the right Ollama drafter model, and writes `~/.hierocode.yaml`. Nothing is sent to an API until you run `hierocode run` (or `hierocode plan`).
### API-key variant

If you have `ANTHROPIC_API_KEY` set and no `claude` or `codex` CLI installed, the wizard still works: `hierocode init --wizard` detects the env var and routes the planner through the Anthropic Messages API using `claude-haiku-4-5` by default. Run `hierocode models set planner claude-sonnet-4-6` afterward if you want a stronger planner. All other steps are identical to the $17 journey above.
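Condensed, the API-key journey is (every command appears above; the key value is elided):

```sh
export ANTHROPIC_API_KEY=sk-...                  # your key
pip install hierocode
hierocode init --wizard                          # detects the env var, routes planner via the API
hierocode models set planner claude-sonnet-4-6   # optional: stronger planner
hierocode run -t "add input validation to the login endpoint" --estimate
hierocode run -t "add input validation to the login endpoint"
```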
## How it works

```text
hierocode run
 |
 +-- skeleton.py    AST symbol index of the repo (Python) / file list (other)
 |
 +-- capacity.py    Probe drafter model: num_ctx, RAM, GPU, tier
 |
 +-- Planner (claude_code_cli / codex_cli / anthropic / ollama)
 |     Receives: task + skeleton + capacity profile
 |     Returns:  JSON Plan — list of TaskUnits
 |     Cache:    plan_cache keyed on (task, skeleton, planner_model, drafter_model)
 |
 +-- Dispatcher      Iterates TaskUnits
 |
 +-- Budget          Pack target + context files into drafter context window
 |
 +-- Drafter (Ollama) — writes code for one bounded unit
 |     Returns: unified diff
 |
 +-- QA (Planner tier) — reviews diff against acceptance criteria
       verdict:
         accept   -> emit diff to stdout
         revise   -> loop back to Drafter (cap: max_revisions_per_unit)
         split    -> re-plan the unit into sub-units
         escalate -> planner drafts the unit directly (cap: max_escalations_per_task)
```

Diffs are printed to stdout; they are not applied automatically.
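To illustrate how the verdicts drive the loop, here is a minimal Python sketch. Every name in it is hypothetical, and it omits details the real dispatcher handles (the per-task escalation cap, parallel workers):

```python
def process_unit(unit, drafter, planner, policy):
    """Hypothetical sketch of the per-unit draft/QA loop (illustrative names)."""
    for _ in range(policy.max_revisions_per_unit + 1):
        diff = drafter.draft(unit)                # local Ollama call, $0
        verdict = planner.review(unit, diff)      # planner-tier QA call
        if verdict.kind == "accept":
            return [diff]                         # printed to stdout, never applied
        if verdict.kind == "split":
            # Re-plan the unit into sub-units and process each one.
            return [d for sub in planner.replan(unit)
                    for d in process_unit(sub, drafter, planner, policy)]
        if verdict.kind == "escalate":
            return [planner.draft(unit)]          # planner drafts directly (capped per task)
        unit = unit.with_feedback(verdict.notes)  # "revise": loop back to the drafter
    return [planner.draft(unit)]                  # revision cap hit -> escalate
```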
## Providers

| Type | Auth required | Typical role |
|---|---|---|
| `claude_code_cli` | `claude` binary, logged in to Pro | Planner, Reviewer |
| `codex_cli` | `codex` binary, logged in to Plus | Planner, Reviewer |
| `anthropic` | `ANTHROPIC_API_KEY` env var | Planner, Reviewer |
| `ollama` | none | Drafter (and fallback Planner) |
| `openai_compatible` | bearer token env var | Planner or Drafter |
| `lmstudio` | none | Drafter |
| `transformers_local` | none | Drafter |
Model IDs in each provider:

- `claude_code_cli` / `anthropic`: `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`
- `codex_cli`: `gpt-5`, `gpt-5-codex`, `o4-mini`
- `ollama`: whatever you have pulled; the wizard recommends based on RAM
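As an example, an `openai_compatible` provider entry might look like the following. It mirrors the schema of the generated config shown under Configuration below; the entry name, URL, and env var here are placeholders, not a documented example:

```yaml
providers:
  my_gateway:                            # placeholder name
    type: openai_compatible
    base_url: http://localhost:8000/v1   # placeholder endpoint
    auth:
      type: bearer_env
      env_var: MY_GATEWAY_API_KEY        # placeholder env var
```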
## Cost per task

The drafter is always local ($0). Only planner calls have a cost.
| Planner path | Cost per task | Notes |
|---|---|---|
| `claude_code_cli` (Pro, $17/mo) | $0 incremental | Counts against subscription quota |
| `codex_cli` (Plus, $20/mo) | $0 incremental | Counts against subscription quota |
| `anthropic` Haiku API | ~$0.01–0.03 | 4 calls × ~10 k input + ~2 k output |
| `anthropic` Sonnet API | ~$0.05–0.15 | 4 calls × ~10 k input + ~2 k output |
The 4-call estimate breaks down as:

- Plan cache hit: 0 planning calls (the cached plan is reused).
- Plan cache miss + a typical 3 QA rounds: 1 planning call + 3 QA calls = 4 calls.
Token counts use a char/4 heuristic (not tiktoken) with a 20% safety margin.
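A minimal sketch of that heuristic, assuming the budget module applies the margin multiplicatively:

```python
def estimate_tokens(text: str, safety_margin: float = 0.20) -> int:
    """char/4 token heuristic with a safety margin (no tiktoken)."""
    return int(len(text) / 4 * (1 + safety_margin))

# A 40,000-character prompt budgets as ~12,000 tokens instead of ~10,000.
```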
Pricing can be overridden — drop `~/.hierocode/pricing.yaml` with your own model prices and subscription limits. Useful when Anthropic ships a new model before hierocode does, or when you want to track quota math for a different subscription tier. See `plan.md` or `broker/pricing.py` for the schema.
Advanced: active exploration — set `routing.planner.exploration: active` in your config to let Claude Code / Codex use their Read/Grep/Glob tools during planning. Plans get noticeably better; token cost goes up (still 1 message against your Pro/Plus quota, but input tokens compound within that message). Off by default to preserve the predictable cost story.
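In `~/.hierocode.yaml`, that toggle sits under the planner's routing entry (surrounding keys as in the generated config below):

```yaml
routing:
  planner:
    provider: claude_pro
    model: claude-sonnet-4-6
    exploration: active   # off by default
```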
## CLI reference

```sh
hierocode init             # Write default ~/.hierocode.yaml
hierocode init --wizard    # Auto-detect environment, write tailored config
hierocode init --force     # Overwrite existing config
hierocode doctor           # Validate config and ping all providers
```

```sh
hierocode providers list   # List configured providers and their types
hierocode models list      # Query each provider for available model names
hierocode models show      # Show planner / drafter / reviewer role bindings
hierocode resources        # Print CPU count and RAM (total / available)
hierocode workers suggest  # Suggest safe / balanced / aggressive worker counts
```

```sh
hierocode plan -t "..."                        # Generate and print a JSON plan (no drafting)
hierocode plan -t "..." --planner-model <m>    # Override planner model
hierocode run -t "..."                         # Full pipeline: plan -> draft -> QA
hierocode run -t "..." --estimate              # Print cost/quota estimate, then exit
hierocode run -t "..." --drafter-model <m>     # Override drafter model
hierocode run -t "..." --repo ./path           # Set repo root for skeleton builder
hierocode draft -t "..." --file path/to/file   # Draft a patch for a single file
hierocode review -t "..." --file path/to/file  # Review a single file
```

```sh
hierocode models set planner claude-sonnet-4-6               # Set planner model
hierocode models set drafter llama3.2:3b                     # Set drafter model
hierocode models set planner claude-haiku-4-5 -p claude_pro  # With explicit provider
hierocode cache clear                                        # Delete all cached plans
hierocode version                                            # Print installed version
```

## TUI

Run `hierocode` with no arguments to enter a persistent REPL (requires a config to exist; run `hierocode init --wizard` first). The TUI provides slash commands for every one-shot subcommand above plus a few extras:
```text
/run <task>                    Full plan → draft → QA pipeline, with live
                               progress panel (plan units, elapsed time,
                               token counts, quota bar).
/plan <task>                   Plan only; /plan show re-displays the last plan.
/estimate <task>               Cost/quota preview without executing.
/draft <task> <file>           Single-file escape hatch.
/review <task> <file>          Single-file review.
/diff                          Re-display the last diff.
/apply                         Apply the last diff to disk with per-file
                               confirmation (y / yes-all / skip / abort).
/usage                         Per-role token/message usage for the session,
                               with a quota bar if on a subscription path.
/models                        Show current role bindings.
/models set <role> <model>     Update routing, persists to YAML.
/cache clear                   Wipe cached plans.
/config edit                   Open ~/.hierocode.yaml in $EDITOR.
/doctor /resources /repo       Diagnostics.
/task save <name> <desc>       Save a named task alias (persisted).
/task <name>                   Run a saved alias.
/task list / /task delete <n>  Alias CRUD.
/history /clear /help /exit
```
Plain-text input (no leading `/`) can either prompt before running or dispatch directly to `/run` — set `tui.interaction_mode` in your config to `prompt` (default, safer) or `immediate`.
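For example, in `~/.hierocode.yaml`:

```yaml
tui:
  interaction_mode: prompt   # or: immediate
```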
## Configuration

`hierocode init --wizard` writes `~/.hierocode.yaml`. A representative example for a Claude Pro user with 16 GB RAM:
```yaml
# Generated by hierocode init --wizard
default_provider: local_ollama

providers:
  local_ollama:
    type: ollama
    base_url: http://localhost:11434
    auth:
      type: none
  claude_pro:
    type: claude_code_cli
    auth:
      type: none
  codex_plus:
    type: codex_cli
    auth:
      type: none
  anthropic_api:
    type: anthropic
    auth:
      type: bearer_env
      env_var: ANTHROPIC_API_KEY

routing:
  planner:
    provider: claude_pro
    model: claude-sonnet-4-6
  drafter:
    provider: local_ollama
    model: llama3.2:3b
  reviewer:
    provider: claude_pro
    model: claude-sonnet-4-6

policy:
  max_revisions_per_unit: 2
  max_escalations_per_task: 3
  warn_before_escalation: true

parallelization:
  default_strategy: balanced
  max_local_workers: 4
  max_remote_workers: 8
```

`hierocode init` (no wizard) writes a minimal config that points every role at `local_ollama` with `llama3.2:3b`. That works out of the box for pure-local setups but runs in degraded mode (Ollama-as-planner). Use `hierocode init --wizard` for any real setup so the three prereqs above are actually wired up.
v0.1 YAML files still parse — legacy keys like `default_model`, `small_model`, `routing.draft_model`, and `routing.review_model` are silently ignored. Run `hierocode init --wizard --force` once to upgrade cleanly.
## Limitations

- Not an autonomous agent. Hierocode does not apply patches, run tests, or iterate on failures without user intervention. Every diff must be reviewed and applied manually.
- Diffs are printed, not applied. The `run` and `draft` commands emit unified diffs to stdout. There is no `--apply` flag in v0.2.
- AST skeleton is Python-only. Non-Python files are listed by name and size; their symbols are not extracted. The planner sees less context for non-Python repos.
- Token counting uses a char/4 heuristic. The budget module does not use tiktoken for Ollama-side estimation; a 20% safety margin compensates but is not exact.
- Single-file `draft` and `review` commands do not use the QA loop. They call the provider once and return raw output. Use `hierocode run` for the full pipeline.
## Contributing

See CONTRIBUTING.md for development setup, linting, and test instructions.

Built with the help of Claude Opus 4.7 (Anthropic).

## License

MIT