Orchestrator — Auto Mode Protocol

Loaded by any skill when the user appends auto to their invocation. Example: /implementation auto inventory-app, /requirements auto my-project

This is NOT a skill. It is a shared protocol that modifies how existing skills behave. Without auto, every skill works exactly as before — no behavior changes.

When Auto Mode Activates

User says	Auto mode?
`/implementation auto X`	Yes — chain after implementation completes
`/requirements auto X`	Yes — chain full pipeline from requirements
"build X and commit when ready"	Yes — infer auto from intent
"auto", "hands-off", "I'll be away"	Yes
`/implementation X` (no auto flag)	No — normal interactive mode, this file is not read

Core Principles

Never stop to ask between skills. Auto mode means AUTO. Chain skills without pausing for confirmation. Do not ask "Ready to continue?" or "Should I proceed?" — just proceed. The only reason to pause is ambiguity or failure, not ceremony.
Plan before code. No file is touched until a concrete code change plan exists and passes eval.
Evidence-first (G-AUTO-1). Every change cites its source: requirement ID, test result, code grep, or research output. Never assume.
Minimum tokens. Use the cheapest model that can do the job. Plan with Opus, implement with Sonnet/Haiku.
All gates still apply. Auto mode removes user wait time between passing steps — it does NOT lower quality bars.
Stop ONLY on: ambiguity that can't be resolved from docs, same failure twice after auto-fix, eval < 70%. NOT on skill transitions.
Backward compatible. Without auto flag, nothing changes. Every skill works exactly as before.

What "Auto" Means — No Stopping

CRITICAL: When auto is set, the agent MUST NOT:

Ask "Ready to continue to /implementation?" — just run it
Ask "Should I proceed with the next slab?" — just build it
Ask "Want me to commit?" — if precommit passes, commit
Present a summary and wait — present and immediately continue
Ask for confirmation between any two skills in the pipeline

The ONLY acceptable pauses:

A requirement is genuinely ambiguous (not just complex)
The same fix has failed twice
Eval score is below 70%
A decision affects cost, security, or external services

Everything else: keep going.

Model Selection

Phase	Model	Why
Requirements gathering	Opus	Judgment, ambiguity detection
Architecture decisions	Opus	Trade-off analysis, evidence weighing
Code change planning	Opus	File-level design, dependency analysis
Implementation	Sonnet/Haiku	Mechanical — plan specifies what to write
Test writing	Sonnet/Haiku	Follows patterns from plan
Precommit checks	Sonnet	Pattern matching against rules
Evaluation	Opus	Quality judgment, nuanced scoring
README/Cleanup	Sonnet	Mechanical consolidation

User can override: /implementation auto model=opus X forces Opus throughout.

Auto Pipeline

1. RESUME?    — read HANDOFF.md if it exists (resume, don't restart)
2. RESEARCH   — auto-trigger relevant agents for context
3. PLAN       — concrete code change plan (file-by-file, function-by-function)
4. EVAL PLAN  — plan must score ≥ 95% completeness before implementation starts
5. BUILD      — /implementation per slab (Sonnet/Haiku)
6. VERIFY     — /verify (output quality, skip user-confirms)
7. GATE       — /precommit full mode (tests, G-IMPL-6, standards, README)
8. EVAL       — /evaluate quick on slab (must score ≥ 95%)
9. COMMIT     — auto-commit with descriptive message
10. NEXT      — loop to step 5 for next slab, or FINAL QUALITY
11. FINAL     — /evaluate full, readme-validator fix, guardrail audit
12. CLEANUP   — archive artifacts, README = source of truth

Step 1: Resume Check

On startup, check for HANDOFF.md in project root.

If exists: This is a resumed session. Read handoff, skip to the step indicated. Do NOT re-plan or re-research what's already done.
If not exists: Fresh start. Proceed to step 2.

Also read project-state.md if it exists — check decisions, feature status, warnings.

Step 2: Auto-Research

CRITICAL: When user input is sparse (one-liner like "build inventory app"), research is MANDATORY before building. Don't assume what the app needs — investigate what every app in that category has. The one-line input defines the domain. Research fills in the details.

Ask: "What does every app in this category have?" — not "What did the user literally say?"

Why this rule exists: The orchestrator once assumed a basic CRUD app for "inventory app", missed barcode scanning (core to any real inventory workflow), missed pricing/sales tracking, and built a non-interactive dashboard. All table-stakes features that 5 minutes of research would have caught.

Trigger research agents:

Signal	Agent	Scope
Sparse input / new domain	functional-researcher	ALWAYS for one-liners. Top 2-3 products, extract table-stakes features, common workflows, domain-specific patterns
Tech choice needed	tech-stack-advisor	Options with trade-offs, don't decide
Design pattern question	pattern-advisor	2-3 patterns with when-to-use
Scale mentioned	scale-estimator	Back-of-envelope numbers

Rules:

Sparse input = functional-researcher is MANDATORY, not optional
Research output becomes evidence for decisions (cited as [functional-researcher], [tech-stack-advisor])
Research stays scoped — don't exhaustively survey, get enough to decide
If requirements/architecture docs already exist and are sufficient, skip research
After research, draft requirements that include domain table-stakes — not just what the user said

Step 3: Code Change Plan

Mandatory before any code is written. Produced by Opus.

## Code Change Plan: [feature/slab]

### New files
- `path/file.ext` — purpose, key functions/classes, ~line count estimate

### Modified files
- `path/file.ext` — what changes (specific: "add router" not "update file")

### Tests
- `path/test_file.ext` — N tests: [list what each tests]

### Dependencies
- New: [list] or None
- Modified: [list] or None

### Evidence
- [D-REQ-N] requirement text (copied, not summarized)
- [D-ARCH-N] architecture decision with evidence source
- [D-IMPL-N] pattern choice with rationale

Step 4: Eval Plan

Quick evaluation of the plan before implementation:

Does every file map to a requirement? (no orphan files)
Does every requirement map to a file? (no missing implementation)
Are dependencies declared? (no surprise installs during implementation)
Is the plan small enough for one slab? (if not, split)

If plan scores < 95% → revise plan. Do not proceed to code.

Step 5: Build (per slab)

Run /implementation for the current slab using the cheaper model (Sonnet/Haiku).

Follow all existing implementation rules (TDD, slab discipline, mock-first)
Session limits still apply
The plan constrains what gets built — Sonnet executes, doesn't redesign

Auto-fix: If a test fails, attempt fix (max 2 attempts). Still failing → PAUSE.

Step 6: Verify

Run /verify in auto mode:

Session health check (Step 1) — still applies
Output quality check (Step 3) — still applies
Skip "user confirms" (Step 5) — eval gate replaces human judgment

Auto-judge heuristics (since user isn't watching):

Does output match format described in requirements?
Is it curated (top 3-5 items) or raw dump (20+ items)?
Are there unexplained numbers without context?
Does it answer the user's question or just return data?

If any heuristic fails → PAUSE with specific concern.

Step 7: Gate (Precommit)

Run /precommit full mode. All steps mandatory:

Step 1: Instruction compliance
Step 2: Test quality audit
Step 2b: Test suite execution (if runner exists)
Step 3: Code standards + G-IMPL-6 (no easy way out)
Step 5: Project rules compliance
Step 5b: README validation+fix (readme-validator in fix mode)

Auto-fix minor issues: missing imports, naming violations, missing .env.example entries. PAUSE on: unaddressed instructions, architectural decisions, ambiguous choices, test failures that aren't obvious.

Step 8: Eval Gate

Run /evaluate quick mode on the current slab (Opus).

Score	Action
≥ 95%	Proceed to commit
70-94%	Auto-fix mechanical issues (naming, missing tests, formatting). Re-eval ONCE. If still < 95% → PAUSE
< 70%	PAUSE immediately — something fundamental is wrong

Commit requires: precommit passing AND eval ≥ 95%. Both gates, not either.

Step 9: Auto-Commit

If all gates pass:

Stage specific changed files (never git add -A)
Commit with descriptive message following G12
Format: [slab N/M] <what this slab does> (<test count> tests)
Push only if user previously authorized pushing

Step 10: Next Slab or Final

If more slabs remain → loop to Step 5 (build). If all slabs complete → proceed to Final Quality.

Step 11: Final Quality (Opus)

/evaluate full — all dimensions, all slabs, overall score
readme-validator in fix mode — validate+fix entire README line-by-line
Guardrail audit — scan all committed code for violations:
- G-IMPL-6 (no shortcuts)
- G-PUSH-1 (precommit ran)
- G-PC-1-5 (test quality, instructions)
- G-AUTO-1 (evidence citations)

If any violation → fix and re-commit. If unfixable → report to user.

Step 12: Cleanup (Sonnet)

Consolidate key decisions from requirements/, architecture/, reports/ into README "Architecture Decisions" section
Archive artifacts: archive/<date>/requirements/, archive/<date>/reports/
Update project-state.md: mark all features as verified
Delete HANDOFF.md (work is complete)
README becomes source of truth — must contain: what, install, run, test, debug, env vars, decisions

Token Tracking

Estimate context usage: ~4 characters per token.

Checkpoint	Threshold	Action
After each phase	~20K tokens (~80K chars)	Continue
Approaching limit	~25K tokens (~100K chars)	Pre-compute next slab plan, prepare for handoff
At limit	~32K tokens (~128K chars)	Generate HANDOFF.md, stop

Handoff Protocol

When token limit approached:

Commit all work done so far
Pre-compute code change plan for next slab (so new session can execute immediately)
Generate HANDOFF.md in project root (see project-state-template.md for format)
Update project-state.md with full status
Report: "Context limit approaching. Handoff file created. Start new session to continue."

Pause Protocol

When auto mode pauses:

AUTO PAUSED at [slab N/M], [step name]

Reason: [specific issue — not generic]
What I need: [specific question or decision]
What's done: [slabs committed, tests passing]
What's committed: [commit hashes]
Evidence reviewed: [what was checked before pausing]

Reply to continue, or /status for full picture.

What Auto Mode Never Does

Guess at ambiguous requirements
Loop more than twice on the same failure
Commit code scoring below 95% eval
Skip precommit or any quality gate
Push without prior authorization
Make architectural decisions without evidence
Delete user code or existing tests
Install packages not in the plan
Make monetary, infrastructure, or security-sensitive decisions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Orchestrator — Auto Mode Protocol

When Auto Mode Activates

Core Principles

What "Auto" Means — No Stopping

Model Selection

Auto Pipeline

Step 1: Resume Check

Step 2: Auto-Research

Step 3: Code Change Plan

Step 4: Eval Plan

Step 5: Build (per slab)

Step 6: Verify

Step 7: Gate (Precommit)

Step 8: Eval Gate

Step 9: Auto-Commit

Step 10: Next Slab or Final

Step 11: Final Quality (Opus)

Step 12: Cleanup (Sonnet)

Token Tracking

Handoff Protocol

Pause Protocol

What Auto Mode Never Does

FilesExpand file tree

orchestrator.md

Latest commit

History

orchestrator.md

File metadata and controls

Orchestrator — Auto Mode Protocol

When Auto Mode Activates

Core Principles

What "Auto" Means — No Stopping

Model Selection

Auto Pipeline

Step 1: Resume Check

Step 2: Auto-Research

Step 3: Code Change Plan

Step 4: Eval Plan

Step 5: Build (per slab)

Step 6: Verify

Step 7: Gate (Precommit)

Step 8: Eval Gate

Step 9: Auto-Commit

Step 10: Next Slab or Final

Step 11: Final Quality (Opus)

Step 12: Cleanup (Sonnet)

Token Tracking

Handoff Protocol

Pause Protocol

What Auto Mode Never Does