All notable changes to the Elves skill are documented here.
- Merge Conflicts: What to do when
git pushfails due to a diverged remote (fetch and merge, never rebase, resolve or Hard Stop). Added to both SKILL.md and AGENTS.md.
- Scout Mode: Added prioritization guidance (risk-reducing first, then quality, then leave ambiguous items), validation gate requirement, commit format (
[branch · Scout]), and when-to-stop rules. Applied to both SKILL.md and AGENTS.md. - Entropy Check: Added cadence scaling guidance (check after batch 2-3 for short plans, every 3 for long, stretch to 4-5 if reviews are clean).
- AGENTS.md Planning phase: Added full planning section with interactive and autonomous modes, architecture survey, and references to plan-template.md and kickoff-prompt-template.md. Was the biggest content gap between the two files.
- Contract step: "Build on" section explicitly marked as required (was only shown in the example).
- Compaction Recovery: Added note about restoring files from git history if compaction happens during Final Completion cleanup.
- AGENTS.md: Added
python3 (no jq)explanation, compaction recovery cleanup note, config.json reference already added in v1.3.1.
- Wired the Judge into the Core Loop as step 8. The legality check was described in a standalone section but had no step number — an agent following steps literally would never run it. Now step 8 sits between Review (7) and Document (9).
- Renumbered all Core Loop steps 1-15. Eliminated the "11a" hack — PR Loop is now step 13.
- Fixed heading levels. Batch Decomposition and Time Allocation were peers of Core Loop when they should have been children.
- Unified Orient and Compaction Recovery reading orders. Added
.elves-session.jsonto Orient step. - Clarified proof scope in Completion Contract. "Relevant tests" → "Touched-surface tests" with note that broad regression runs at entropy checks and Readiness Gate.
- Added legality check to Completion Contract (now 14 items).
- Added
ghAPI failure/retry guidance to step 13 (PR Loop). - Added references to
review-subagent.md,plan-template.md,kickoff-prompt-template.md.
- AGENTS.md: renumbered steps 1-15, added step 8 (Judge), added
.elves-session.jsonto Orient, added legality check to Completion Contract, addedghAPI failure guidance, added Persistent Preferences section. - README: changed "v0" to "still early", updated file structure diagram, removed placeholder URL.
- TODO.md: marked stale PR #5 items as done.
- "Don't wait to open the PR" — explicit instruction to open the PR after the first pushed commit, not delay until the branch is nearly done. Keep using the same PR throughout the run.
- PR Loop (step 13): After every push (including mid-implementation), poll PR comments, inline reviews, and check status before starting new work. Lightweight scan that defers to step 7 for full review cycles.
- Three quality layers made explicit: correctness (validation gates), plan compliance (review step), legality (the judge). Each asks a different question.
- The gaming problem: Explains why agent-authored tests have a ceiling — agents can satisfy every deterministic test while missing the point. The constitution breaks through by providing success criteria the agent didn't author.
- The constitution:
docs/constitution.mdorCONSTITUTION.mdcontains deal-breaker behaviors (flows with mermaid diagrams, business logic, invariants). Read during every Orient step and compaction recovery. - The judge: Read-only legality check producing PASS/WARN/FAIL/UNCHANGED verdicts per intention. Runs after each batch passes validation and review. FAIL blocks the batch.
- The flywheel: Constitution grows via planning (propose new intentions), mistakes (regressions become safeguards), and incidents. Agent drafts, human owns.
- Readiness Gate: 7-point branch-level checklist before declaring review-ready (local proof on current tip, preview proof, artifact inspection, PR comments polled, legality check clean, git status clean, execution log current). Distinct from the per-batch Completion Contract.
- Touched-surface vs broad regression proof. Default to touched-surface per batch; run broad regression at entropy check intervals and before readiness.
- Re-earn proof after each push — don't inherit proof from prior commits after review fixes.
- Artifact inspection — inspect actual downloaded output for export/download changes.
- Four-category triage unified across step 7, step 13, and judge: fix now, defer, intentional design, false positive. Replaces the previous three-category scheme.
- Subagent capacity: If pool is full, reuse/close idle agents or do work directly. Never skip review.
- Process warnings: Stop and clean up if session/process-count warnings appear.
- Updated AGENTS.md (Codex variant) with all v1.3.0 changes.
- Updated CHANGELOG.md and README.md.
- Philosophy now informs planning, contracts, and implementation — not just review. Previously the 9 principles were enforced at review time. Now they're threaded through the entire lifecycle:
- Planning: New architecture survey step before batch decomposition. Batch ordering is architecture-aware — shared utilities go in early batches, pattern-setting batches come before pattern-following ones.
- Contract (step 4): New Build on section identifies specific existing patterns, utilities, and conventions the batch should extend. Gives the implementing agent a concrete target and the reviewer something specific to verify against.
- Implementation (step 5): New pre-implementation survey — search for relevant utilities, patterns, and conventions before writing code. Logged in the execution log so the reviewer can check whether the agent used what it found.
- Review (step 7): Reviewer now checks implementation against the Build on section and pre-implementation survey. Creating a duplicate of something identified in the survey is a blocking finding.
- Added Time Allocation guidance to the core loop. Default is equal thirds (implement, validate, review); configurable in survival guide. Agents naturally rush validation and review — this makes the expected balance explicit and trackable.
- Added Entropy Check step (step 12): every 3 batches, the agent performs a cross-batch quality scan to catch accumulated drift — duplicated utilities, naming inconsistencies, diverging patterns — that individual batch reviews miss. Cadence is configurable via survival guide.
- Added Principle #9: Favor boring technology. Agents should prefer well-known, stable, composable libraries over novel ones. "Boring" technology has stable APIs and broad training-data representation, making agents more reliable. Sometimes reimplementing a small utility is cheaper than pulling in an opaque dependency.
- Added Architectural Boundaries section to survival guide template: optional section for defining layered architecture, dependency direction, module ownership, and enforcement mechanisms (structural tests, lint rules). Helps agents respect boundaries in larger codebases.
- Expanded Prior art and convergence section in README. Elves, Anthropic, OpenAI, and Factory AI independently converged on the same core patterns for autonomous agent orchestration — plan approval before execution, persistent state across context boundaries, iterative self-correction, quality enforcement, and codebase conditioning for agent performance. Added Factory AI Missions and their Agent Readiness framework as a third independent convergence point alongside Anthropic and OpenAI.
- Core loop steps renumbered: Continue or Stop is now step 13 (was 12).
- Added future ideas to TODO.md: process self-improvement across sessions, multi-model routing, secret redaction, codebase context indexing.
- Added Verify Green step (step 2): agents confirm the project is in a working state before starting each batch
- Added Contract step (step 4): agents define testable acceptance criteria before writing code (generator/evaluator pattern)
- Two-stage validation now explicit in core loop: local gates then preview deployment
- Structured session data (
.elves-session.json) tracksreview_commentsdispositions (fixed,dismissed,deferred) for compaction recovery - Strengthened review loop with commit-message-as-communication-channel guidance
- Consistent philosophy principles (1-8) across SKILL.md, AGENTS.md, and review-subagent.md
- Cross-references between AGENTS.md and reference docs (validation-guide.md, verification-patterns.md)
- Browser verification language clarified: "strongly recommended" (not blocking for non-UI projects)
- Generalized browser automation references to "Playwright, Cypress, or similar" (not Playwright MCP-specific)
- Dependency installation examples added to SKILL.md Verify Green step
- Batch sizing examples clarified as overrides of the stated default
- Configurable thresholds (5-modification, 3-cycle) noted as overridable via survival guide
- Interactive planning phase: the agent and user collaborate on scope, batches, and configuration before any code is written
- Multi-batch execution with user-defined sprint sizing (default: 4 developers x 2-week sprint)
- Core loop: Orient, Tag, Implement, Validate, Review, Document, Update, Push, Re-read, Continue
- Three-document compaction survival system (Plan, Survival Guide, Execution Log)
- Subagent delegation for long runs (Claude Code): implementer, validator, reviewer, scout
- Scout mode for bonus improvements after planned batches are done
- Time-aware pacing with session budgets
- Rollback safety with
elves/pre-batch-Ngit tags - Structured session data in
.elves-session.json - Persistent preferences via
config.json - Skill memory: execution logs improve over time
- Forbidden commands:
git reset --hard,git checkout .,git clean -fd,git push --force,git rebaseon shared branches - Test integrity: never modify a test to make it pass. Fix the code, not the test
- Non-interactive operation with
CI=trueand comprehensive env var hardening - Mid-run check-in protocol: answer concisely, keep going
- PreToolUse hook example for deterministic enforcement of forbidden commands
- Survey and popup suppression guidance for Claude Code, Codex, and Cursor
- Two-stage validation: local gates then preview deployment
- Auto-discovery for Node.js, Python, Go, Rust, and Makefile projects
- Zero accumulated debt philosophy: every batch must be production-ready
- Verification patterns: headless browser drivers, video recording, smoke testing, state assertions, custom scripts
- Built-in review subagent reads PR comments, bot reviews, and CI status (zero config)
- Adversarial review pattern: fresh-eyes subagent with no implementation context
- Custom review API support (opt-in)
- Additional checks: smoke tests, visual review, doc review, custom scripts
- Finding triage: genuine issues, intentional design choices, false positives
- README with origin story, Ralph Loop, Human Sandwich, and honest v0 framing
- "What to expect" section with real ROI numbers (6-9 months of work in 3-4 hours of human time)
- "Riding along" guidance: say "do not stop" in every message
- "What can go wrong" failure modes table with mitigations
- Preventing sleep/shutdown guide (caffeinate, systemd-inhibit, tmux/screen)
- Monitoring with GitKraken and Slack notifications
- Claude Code hooks: SessionStart for auto-loading context, PreToolUse for forbidden commands
- Installation guide: global and per-project for Claude Code, Codex, Claude.ai
- "Making it your own" customization guidance
- Daily briefing and Friday planning cadence
- Survival guide template with non-negotiables, tool config, and safe rollback procedures
- Execution log template with timing breakdown and Ralph Loop framing
- Plan template with worked example (auth system refactor) and Human Sandwich framing
- Kickoff prompt template (minimal and full versions) with daily briefing guidance
- Tool configuration examples for Node, Python, Go, Rust, monorepos, and custom APIs
preflight.sh: comprehensive pre-run checklist (git, auth, project detection, sleep prevention, gate dry-runs, notification checks, non-interactive env guidance, branch staleness)notify.sh: Slack webhook, custom command, or PR comment notification helper. Returns proper exit codes in --test mode for preflight validation
- Claude Code (SKILL.md): full feature set with subagents and hooks
- Codex (AGENTS.md): direct execution, concise format
- Claude.ai: zip upload
- Any Agent Skills compatible platform (open standard)
- Passes
agentskills validate