| title | Development Methodologies Reference | ||||
|---|---|---|---|---|---|
| description | Quick reference for 15 structured AI-assisted development methodologies including TDD, SDD, and BDD | ||||
Confidence: Tier 2 — Validated by multiple production reports and official documentation.
Last updated: February 2026
This is a quick reference for 15 structured development methodologies that have emerged for AI-assisted development in 2025-2026. For hands-on practical workflows, see workflows/.
- Decision Tree
- The 15 Methodologies
- SDD Tools Reference
- Writing Effective Specs
- Combination Patterns
- Sources
```
┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
│
├─ "I want to spec before code" ─────→ workflows/spec-first.md
│
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
│
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
│
└─ "I need methodology theory" ──────→ Continue reading below
```
Where each methodology sits on two axes: Spec-First vs Code-First (Y) and Lean/Solo vs Enterprise/Governed (X).
```
                     SPEC / PLANNING FIRST
                               ▲
      ── lean · spec ──        │     ── governed · spec ──
                               │
    [Doc-Driven]  [SDD]        │  [BDD]  [ATDD]  [Req-Driven]
    [GSD]  [Plan-First]        │  [CDD]  [ADR-Driven]  [DDD]  [BMAD]
                               │
LEAN ──────────────────────────┼──────────────────────────────────────► ENTERPRISE
                               │
      ── lean · code ──        │     ── governed · code ──
                               │
    [Context Eng.]  [TDD]      │  [Multi-Agent]
    [Prompt Eng.]  [Iterative] │  [Eval-Driven]  [FDD]
    [Ralph Loop]               │  [JiTTesting]
                               │
                        CODE / EMERGENT
```
How to read it:
- Top-left (spec-first, lean): `SDD`, `Doc-Driven`, `Plan-First`. Natural entry point for solo devs and small teams moving away from "code first".
- Top-right (spec-first, governed): `BMAD`, `Req-Driven`, `ATDD`, `DDD`. Real governance, but costly to set up. ROI is driven by project complexity and requirement stability, not headcount alone.
- Bottom-left (code-first, lean): the natural Claude Code terrain. `TDD` + `Ralph Loop` + `Iterative` = core solo workflow.
- Bottom-right (code-first at scale): `Multi-Agent`, `Eval-Driven`, `JiTTesting` (Meta, 100M+ LoC). Emerging patterns for high-volume teams.
- On the axis: `Plan-First`, `CDD`, `ADR-Driven`, `GSD`. Hybrid approaches that adapt to any context.
Organized in a 6-tier pyramid from strategic orchestration down to optimization techniques.
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| BMAD | Multi-agent governance with constitution as guardrail | High-complexity projects with stable requirements, compliance or governance needs | ⭐⭐ Niche but powerful |
| GSD | Meta-prompting 6-phase workflow with fresh contexts per task | Solo devs, Claude Code CLI | ⭐⭐ Similar to patterns in guide |
BMAD (Breakthrough Method for Agile AI-Driven Development) inverts the traditional paradigm: documentation becomes the source of truth, not code. Uses specialized agents (Analyst, PM, Architect, Developer, QA) orchestrated with strict governance. Note: BMAD's role-based agent naming reflects their methodology; see §9.17 Agent Anti-Patterns for scope-focused alternatives.
- Key concept: Constitution.md as strategic guardrail
- When to use: Complex enterprise projects needing governance
- When to avoid: MVPs, rapid prototyping, evolving requirements — BMAD is brittle when specs change mid-project
GSD (Get Shit Done) addresses context rot through a systematic 6-phase workflow (Initialize → Discuss → Plan → Execute → Verify → Complete) with a fresh 200k-token context per task. Its core concepts (multi-agent orchestration, fresh context management) overlap significantly with existing patterns such as Ralph Loop, Gas Town, and BMAD. See resource evaluation for a detailed comparison.
Emerging: Ralph Inferno implements autonomous multi-persona workflows (Analyst→PM→UX→Architect→Business) with VM-based execution and self-correcting E2E loops. Experimental but interesting for "vibe coding at scale".
"Once the plan is good, the code is good." — Boris Cherny, creator of Claude Code
Not just a feature (/plan command) — a systematic discipline.
Context Engineering: Thoughtworks designates this broader approach "Context Engineering" in their Technology Radar (Nov 2025; see footnote 1), defined as the systematic design of information provided to LLMs during inference. Three core techniques: context setup (minimal system prompts, few-shot examples), context management for long-horizon tasks (summarization, external memories, sub-agent architectures), and dynamic information retrieval (JIT context loading). Related patterns in Claude Code: AGENTS.md, MCP Context7, Plan Mode.
The Mental Model:
Planning isn't optional for complex tasks. It's the difference between:
- ❌ 8 iterations of "try → fix → retry → fix again"
- ✅ 1 iteration of "plan → validate → execute cleanly"
When to plan first:
| Task Complexity | Plan First? | Why |
|---|---|---|
| >3 files modified | ✅ Yes | Cross-file dependencies need architecture |
| >50 lines changed | ✅ Yes | Enough complexity for mistakes |
| Architectural changes | ✅ Yes | Impact analysis required |
| Unfamiliar codebase | ✅ Yes | Need exploration before action |
| Typo/obvious fix | ❌ No | Planning overhead > task time |
| Single-line change | ❌ No | Just do it |
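The table above reads as a simple decision rule. The following is a minimal sketch of that rule, assuming the thresholds listed in the table; `Task` and `should_plan_first` are hypothetical names used only for illustration and are not part of Claude Code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a change; field names are illustrative."""
    files_modified: int = 1
    lines_changed: int = 1
    architectural: bool = False
    unfamiliar_codebase: bool = False


def should_plan_first(task: Task) -> bool:
    """Return True when any plan-first trigger from the table applies."""
    return (
        task.files_modified > 3
        or task.lines_changed > 50
        or task.architectural
        or task.unfamiliar_codebase
    )


print(should_plan_first(Task(files_modified=5)))  # multi-file change
print(should_plan_first(Task(lines_changed=1)))   # single-line fix
```

Treat the thresholds as starting points: a team would likely tune them and record the result in its own planning policy.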
How plan-first works:
1. Exploration phase (Plan Mode via `Shift+Tab`):
   - Claude reads files, explores architecture
   - No edits allowed → forces thinking before action
   - Proposes approach with trade-offs
2. Validation phase (you review):
   - Plan exposes assumptions and gaps
   - Easier to correct direction now vs after 100 lines written
   - Plan becomes contract for execution
3. Execution phase (toggle back to Normal Mode with `Shift+Tab`):
   - Plan → code becomes mechanical translation
   - Fewer surprises, cleaner implementation
   - Faster overall despite "slower" start
Boris Cherny workflow:
"I run many sessions, start in plan mode, then switch into execution once the plan looks right. The signature upgrade is verification—giving Claude a way to test and confirm its own output."
Benefits over "just start coding":
- Fewer correction iterations: Plan catches issues before they become code
- Better architecture: Forced to think about structure first
- Clearer communication: Plan is shared understanding with team/Claude
- Reduced cost: One clean iteration < multiple messy iterations (even if plan phase costs tokens)
Integration with CLAUDE.md:
Document your team's plan-first triggers:
```markdown
## Planning Policy
- ALWAYS plan first: API changes, database migrations, new features
- OPTIONAL planning: Bug fixes <10 lines, test additions
- NEVER skip: Changes affecting >2 modules
```

See also: Plan Mode documentation for /plan command usage.
Advanced pattern: For an iterative annotation-based approach to plan-driven development, see Custom Markdown Plans (Boris Tane Pattern).
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| SDD | Specs before code | APIs, contracts | ⭐⭐⭐ Core pattern |
| Doc-Driven | Docs = source of truth | Cross-team alignment | ⭐⭐⭐ CLAUDE.md native |
| Req-Driven | Rich artifact context (20+ artifacts) | Complex requirements | ⭐⭐ Heavy setup |
| DDD | Domain language first | Business logic | ⭐⭐ Design-time |
SDD (Spec-Driven Development) — Specifications BEFORE code. One well-structured iteration equals 8 unstructured ones. CLAUDE.md IS your spec file.
Doc-Driven Development — Living documentation versioned in git becomes the single source of truth. Changes to specs trigger implementation.
Requirements-Driven Development — Uses CLAUDE.md as comprehensive implementation guide with 20+ structured artifacts.
DDD (Domain-Driven Design) — Aligns software with business language through:
- Ubiquitous Language: Shared vocabulary in code
- Bounded Contexts: Isolated domain boundaries
- Domain Distillation: Core vs Support vs Generic domains
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| BDD | Given-When-Then scenarios | Stakeholder collaboration | ⭐⭐⭐ Tests & specs |
| ATDD | Acceptance criteria first | Compliance, regulated | ⭐⭐ Process-heavy |
| CDD | API contracts as interface | Microservices | ⭐⭐⭐ OpenAPI native |
BDD (Behavior-Driven Development) — Beyond testing: a collaboration process.
- Discovery: Involve devs and business experts
- Formulation: Write Given-When-Then examples
- Automation: Convert to executable tests (Gherkin/Cucumber)
```gherkin
Feature: Order Management
  Scenario: Cannot buy without stock
    Given product with 0 stock
    When customer attempts purchase
    Then system refuses with error message
```

ATDD (Acceptance Test-Driven Development): Acceptance criteria defined BEFORE coding, collaboratively ("Three Amigos": Business, Dev, Test).
In agentic development, ATDD is particularly effective because agents need unambiguous success conditions. The flow maps cleanly to agent tasks:
- Define acceptance criteria in Gherkin (human-readable, machine-executable)
- Agent writes failing tests based on scenarios (not implementation)
- Agent implements until tests pass
```gherkin
Feature: Password Reset
  Scenario: User resets via email
    Given a registered user with email "user@example.com"
    When they request a password reset
    Then they receive a reset email within 60 seconds
    And the reset link expires after 24 hours
```

This Gherkin scenario is the contract between intent and implementation. The agent cannot misinterpret scope because done is defined before a line of code is written.
Applied to agents: Pass the Gherkin file to Claude Code before implementing. "Write failing tests for this feature file, then implement until they pass." The scenario writer role (human or agent) forces explicit scope before execution starts.
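To make the "tests before implementation" step concrete, here is a sketch of how the password-reset scenario might translate into a plain Python test. `PasswordResetService` is a hypothetical in-memory stand-in written for this example; in a real ATDD flow the test would be written first and fail until the actual service exists.

```python
from datetime import datetime, timedelta


class PasswordResetService:
    """Hypothetical in-memory stand-in for the system under test."""

    LINK_TTL = timedelta(hours=24)

    def __init__(self, registered_emails):
        self.registered = set(registered_emails)
        self.outbox = []  # (email, sent_at, expires_at) tuples

    def request_reset(self, email, now):
        if email not in self.registered:
            raise ValueError("unknown user")
        # Sending is immediate here; a real system would queue the email.
        self.outbox.append((email, now, now + self.LINK_TTL))

    def link_is_valid(self, email, at):
        return any(e == email and at < expires for e, _, expires in self.outbox)


def test_user_resets_via_email():
    # Given a registered user with email "user@example.com"
    service = PasswordResetService(["user@example.com"])
    requested_at = datetime(2026, 1, 1, 12, 0, 0)

    # When they request a password reset
    service.request_reset("user@example.com", now=requested_at)

    # Then they receive a reset email within 60 seconds
    email, sent_at, _ = service.outbox[-1]
    assert email == "user@example.com"
    assert sent_at - requested_at <= timedelta(seconds=60)

    # And the reset link expires after 24 hours
    assert service.link_is_valid(email, requested_at + timedelta(hours=23))
    assert not service.link_is_valid(email, requested_at + timedelta(hours=24))


test_user_resets_via_email()
```

Each comment mirrors one Gherkin step, so a reviewer can check the test against the scenario line by line.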
CDD (Contract-Driven Development) — API contracts (OpenAPI specs) as executable interface between teams. Patterns: Contract as Test, Contract as Stub.
JiTTesting (Just-in-Time Testing) — Tests generated on-the-fly at PR submission, designed to fail, then discarded after merge. No maintenance cost, no test suite growth.
TDD/BDD/ATDD all assume the developer controls the pace of code authoring. Agentic development breaks that assumption: an agent can generate 200 lines per hour, faster than any human test-writing workflow can keep up with. JiTTests are the industrial response to that mismatch.
The mechanism: at PR time, an LLM infers the intent of the diff, generates code mutants (deliberately broken variants), writes tests that catch those mutants, runs ensemble rule-based and LLM assessors to filter false positives, and surfaces only real regressions to the engineer. The tests never land in the codebase.
Meta deployed this at scale (100M+ LoC): 4x improvement in catching regressions over traditional hardening tests, 70% reduction in human review load, 4 serious production failures prevented from 41 candidates reviewed.
No open-source implementation exists yet. You can approximate this today: before merging any agent-generated PR, prompt Claude with "generate tests that would catch regressions introduced by this diff specifically — I'll run them locally and discard them after the PR closes." The ephemeral framing focuses test generation on what actually changed rather than general coverage.
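The mutant-based filtering idea can be shown on a toy scale. This sketch is not Meta's system and involves no LLM: the function, the mutants, and the candidate tests are all hand-written assumptions, and the only thing demonstrated is the selection rule (keep a test if it passes on the real code and fails on at least one mutant).

```python
def apply_discount(price, percent):
    """Code under review (stands in for the changed function in a diff)."""
    return price * (100 - percent) / 100


# Deliberately broken variants of the function above ("mutants").
MUTANTS = [
    lambda price, percent: price * (100 + percent) / 100,  # sign flipped
    lambda price, percent: price * (100 - percent),        # division dropped
    lambda price, percent: price,                          # discount ignored
]

# Candidate tests: each takes an implementation and returns True on pass.
CANDIDATE_TESTS = {
    "ten_percent_off": lambda f: f(200, 10) == 180,
    "zero_percent": lambda f: f(50, 0) == 50,
    "returns_number": lambda f: isinstance(f(1, 1), (int, float)),
}


def catching_tests(original, mutants, tests):
    """Keep tests that pass on the original and fail on at least one mutant."""
    kept = {}
    for name, test in tests.items():
        if not test(original):
            continue  # a test that fails on the real code is a false positive
        killed = sum(1 for mutant in mutants if not test(mutant))
        if killed:
            kept[name] = killed
    return kept


print(catching_tests(apply_discount, MUTANTS, CANDIDATE_TESTS))
# {'ten_percent_off': 3, 'zero_percent': 1}
```

The `returns_number` test is dropped because it cannot tell any mutant from the original, which is the kind of low-signal test the filtering step is meant to remove.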
Reference: Just-in-Time Catching Test Generation at Meta — Harman, 2026.
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| FDD | Feature-by-feature delivery | Feature teams with parallel delivery | ⭐⭐ Structure |
| Context Eng. | Context as first-class design | Long sessions | ⭐⭐⭐ Fundamental |
FDD (Feature-Driven Development) — Five processes:
- Develop Overall Model
- Build Features List
- Plan by Feature
- Design by Feature
- Build by Feature
Strict iteration: 2 weeks max per feature.
Context Engineering — Treat context as design element:
- Progressive Disclosure: Let agent discover incrementally
- Memory Management: Conversation vs persistent memory
- Dynamic Refresh: Rewrite TODO list before response
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| TDD | Red-Green-Refactor | Quality code | ⭐⭐⭐ Core workflow |
| Eval-Driven | Evals for LLM outputs | AI products | ⭐⭐⭐ Agents |
| Multi-Agent | Orchestrate sub-agents | Complex tasks | ⭐⭐⭐ Task tool |
TDD (Test-Driven Development) — The classic cycle:
- Red: Write failing test
- Green: Minimal code to pass
- Refactor: Clean up, tests stay green
With Claude: Be explicit. "Write FAILING tests that don't exist yet."
Verification Loops — A formalized pattern for autonomous iteration (broader than TDD):
Core principle: Give Claude a mechanism to verify its own output.
```
Code generated → Verification tool → Feedback loop → Improvement
```

Why it works (Boris Cherny): "An agent that can 'see' what it has done produces better results."
Verification mechanisms by domain:
| Domain | Verification Tool | What Claude "Sees" |
|---|---|---|
| Frontend | Browser preview (live reload) | Visual rendering, layout, interactions |
| Backend | Tests (unit/integration) | Pass/fail status, error messages |
| Types | TypeScript compiler | Type errors, incompatibilities |
| Style | Linters (ESLint, Prettier) | Style violations, formatting issues |
| Performance | Profilers, benchmarks | Execution time, memory usage |
| Accessibility | axe-core, screen readers | WCAG violations, navigation issues |
| Security | Static analyzers (Semgrep) | Vulnerability patterns |
| UX | User testing, recordings | Usability problems, confusion points |

TDD as canonical example:
- Claude writes tests for the feature
- Claude iterates code until tests pass
- Continue until explicit completion criteria met
Official guidance: "Tell Claude to keep going until all tests pass. It will usually take a few iterations." — Anthropic Best Practices
Implementation patterns:
- Hooks: PostToolUse hook runs verification after each edit
- Browser extension: Claude in Chrome sees rendered output
- Test watchers: Jest/Vitest watch mode provides instant feedback
- CI/CD gates: GitHub Actions runs full validation suite
- Multi-Claude verification: One Claude codes, another reviews
Anti-pattern: Blind iteration without feedback. Without verification mechanism, Claude can't converge toward correct solution—it guesses.
For the implementation-side failure mode this prevents, see The Verification Gap in the TDD workflow.
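The generate, verify, feed-back cycle described in this section can be sketched as a small loop. The `generate` and `verify` callables are assumptions standing in for a model call and a verification tool; the toy versions below exist only so the loop can run end to end.

```python
from typing import Callable, Optional, Tuple


def verification_loop(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate, verify, and feed the failure message back until it passes."""
    feedback = None
    for attempt in range(1, max_iterations + 1):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate, attempt
    return None, max_iterations  # did not converge within the budget


# Toy stand-ins: the "generator" fixes its output once it sees feedback.
def toy_generate(feedback):
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_verify(source):
    namespace = {}
    exec(source, namespace)  # acceptable for a toy example only
    passed = namespace["add"](2, 3) == 5
    message = "" if passed else "add(2, 3) returned the wrong value"
    return passed, message


print(verification_loop(toy_generate, toy_verify))
```

The iteration cap matters: without it, a generator that never satisfies the verifier would loop forever, which is the blind-iteration failure in a different form.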
Eval-Driven Development — TDD for LLMs. Test agent behaviors via evals:
- Code-based: `output == golden_answer`
- LLM-based: Another Claude evaluates
- Human grading: Reference, slow
Eval Harness — The infrastructure that runs evaluations end-to-end: providing instructions and tools, running tasks concurrently, recording steps, grading outputs, and aggregating results.
See Anthropic's comprehensive guide: Demystifying Evals for AI Agents
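As an illustration of the harness responsibilities listed above (run tasks concurrently, grade outputs, aggregate results), here is a minimal sketch using a code-based grader. `EvalCase`, `run_evals`, and `toy_agent` are hypothetical names; the agent is a lookup table standing in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    golden_answer: str


def exact_match(output: str, golden_answer: str) -> bool:
    """Code-based grader: normalized string equality."""
    return output.strip().lower() == golden_answer.strip().lower()


def run_evals(agent, cases, grader=exact_match, workers=4):
    """Run every case concurrently, grade outputs, and aggregate a pass rate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda case: agent(case.prompt), cases))
    results = [
        {"prompt": c.prompt, "output": o, "passed": grader(o, c.golden_answer)}
        for c, o in zip(cases, outputs)
    ]
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return {"pass_rate": pass_rate, "results": results}


# Hypothetical agent: a lookup table standing in for a model call.
def toy_agent(prompt):
    return {"2 + 2": "4", "capital of France": "paris"}.get(prompt, "unknown")


CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("largest planet", "Jupiter"),
]

report = run_evals(toy_agent, CASES)
print(f"pass rate: {report['pass_rate']:.2f}")  # pass rate: 0.67
```

Swapping `exact_match` for an LLM-based grader would only change the `grader` argument; the rest of the harness stays the same.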
Multi-Agent Orchestration — From single assistant to orchestrated team:
```
Meta-Agent (Orchestrator)
├── Analyst (requirements)
├── Architect (design)
├── Developer (code)
└── Reviewer (validation)
```
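The orchestrator shape above can be sketched as a sequential pipeline in which each sub-agent sees the task plus the artifacts produced so far. Everything here is an assumption for illustration: the agents are stub functions, and a real setup would delegate to a model or a sub-agent tool and might run some roles in parallel.

```python
from typing import Callable, Dict, List, Tuple

# Each sub-agent is modeled as a function from shared state to an artifact.
Agent = Callable[[Dict[str, str]], str]


def orchestrate(task: str, pipeline: List[Tuple[str, Agent]]) -> Dict[str, str]:
    """Run sub-agents in order; each sees the task plus earlier artifacts."""
    state = {"task": task}
    for role, agent in pipeline:
        # Pass a copy so an agent cannot mutate the shared state directly.
        state[role] = agent(dict(state))
    return state


# Hypothetical stub agents; real ones would call a model or a sub-agent tool.
PIPELINE = [
    ("analyst", lambda s: f"requirements for: {s['task']}"),
    ("architect", lambda s: f"design based on: {s['analyst']}"),
    ("developer", lambda s: f"code implementing: {s['architect']}"),
    ("reviewer", lambda s: "approved" if "code" in s["developer"] else "rejected"),
]

result = orchestrate("add CSV export", PIPELINE)
print(result["reviewer"])  # approved
```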
Pattern: Write plain English ADRs → Feed to implement-adr skill → Execute natively
Architecture Decision Records (ADRs) combined with Claude Code skills create a workflow where architectural decisions drive implementation directly.
Workflow Steps:
- Document decision in ADR format (context, decision, consequences)
- Create implementation skill (generic or `implement-adr` specialized)
- Feed ADR as prompt to skill with clear acceptance criteria
- Claude executes based on architectural guidance in ADR
Example ADR Template:
```markdown
# ADR-001: Database Migration Strategy
## Context
Legacy MySQL schema needs migration to PostgreSQL for better JSON support.
## Decision
Use incremental dual-write pattern with feature flags.
## Consequences
- Positive: Zero-downtime migration
- Negative: Temporary code complexity during transition
```
Implementation Workflow:
```bash
# 1. Write ADR (plain English)
vim docs/adr/001-database-migration.md
# 2. Feed to implementation skill (run inside a Claude Code session, not the shell)
/implement-adr docs/adr/001-database-migration.md
# 3. Claude executes based on ADR guidance
# → Creates migration scripts
# → Updates ORM configuration
# → Adds feature flags
# → Implements dual-write logic
```

Benefits:
- ✅ Documentation-driven: Architecture and code stay synchronized
- ✅ Native execution: No external frameworks needed
- ✅ Traceable decisions: Clear audit trail from decision to implementation
- ✅ Team alignment: ADRs communicate intent to both humans and AI
Source: Gur Sannikov embedded engineering workflow
| Name | What | Best For | Claude Fit |
|---|---|---|---|
| Iterative Loops | Autonomous refinement | Optimization | ⭐⭐⭐ Core |
| Fresh Context | Reset per task, state in files | Long autonomous sessions | ⭐⭐⭐ Power users |
| Prompt Engineering | Technique foundation | Everything | ⭐⭐⭐ Prerequisite |
Iterative Refinement Loops — Autonomous convergence:
- Execute prompt
- Observe result
- If result ≠ "DONE" → refine and repeat
Prompt Engineering — Foundations for ALL Claude usage:
- Zero-Shot Chain of Thought: "Think step by step"
- Few-Shot Learning: 2-3 examples of expected pattern
- Structured Prompts: XML tags for organization
- Position Matters: For long docs, place question at end
Fresh Context Pattern (Ralph Loop) — Solves context rot by spawning fresh agent instances per task. State persists in git + progress files, not chat history. Ideal for long autonomous sessions (migrations, overnight runs). See Ultimate Guide - Fresh Context Pattern for implementation.
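A minimal sketch of the fresh-context idea follows, assuming state is checkpointed to a JSON progress file (the git side is omitted for brevity). `run_fresh_context_loop` and the `run_agent` callable are hypothetical; the point is that each task receives only persisted state, never prior chat history.

```python
import json
import tempfile
from pathlib import Path


def run_fresh_context_loop(tasks, run_agent, progress_file: Path):
    """Run each task with no shared history; state lives only in progress_file."""
    done = json.loads(progress_file.read_text()) if progress_file.exists() else []
    for task in tasks:
        if task in done:
            continue  # already completed in an earlier run
        # The agent receives only the task and the persisted progress,
        # never the conversation from previous tasks.
        run_agent(task, list(done))
        done.append(task)
        progress_file.write_text(json.dumps(done))  # checkpoint after each task
    return done


# Hypothetical agent stub that records what context it was given.
calls = []


def toy_agent(task, completed):
    calls.append((task, completed))


with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "progress.json"
    run_fresh_context_loop(["migrate users"], toy_agent, progress)
    # A second run resumes from the file and skips finished work.
    finished = run_fresh_context_loop(
        ["migrate users", "migrate orders"], toy_agent, progress
    )

print(finished)  # ['migrate users', 'migrate orders']
print(calls)     # [('migrate users', []), ('migrate orders', ['migrate users'])]
```

Because progress is written after every task, an interrupted overnight run can restart from the last checkpoint instead of from scratch.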
Four tools have emerged to formalize Spec-Driven Development:
| Tool | Use Case | Official Docs | Claude Integration |
|---|---|---|---|
| Spec Kit | Greenfield, governance | github.blog/spec-kit | /speckit.constitution, /speckit.specify, /speckit.plan |
| OpenSpec | Brownfield, changes | github.com/Fission-AI/OpenSpec | /openspec:proposal, /openspec:apply, /openspec:archive |
| Specmatic | API contract testing | specmatic.io | MCP agent available |
| Spec-to-Code Factory | Greenfield, tool-backed enforcement | github.com/SylvainChabaud/spec-to-code-factory | Multi-agent reference implementation (BREAK→MODEL→ACT→DEBRIEF) |
5-phase workflow:
- Constitution: `/speckit.constitution` → guardrails
- Specify: `/speckit.specify` → requirements
- Plan: `/speckit.plan` → architecture
- Tasks: `/speckit.tasks` → decomposition
- Implement: `/speckit.implement` → code
Two-folder architecture:
```
openspec/
├── specs/      ← Current truth (stable)
└── changes/    ← Proposals (temporary)
```
Workflow: Proposal → Review → Apply → Archive
- Contract as Test: Auto-generates 1000s of tests from OpenAPI spec
- Contract as Stub: Mock server for parallel development
- Backward Compatibility: Detects breaking changes
Based on analysis of 2,500+ agent configuration files. Source: Addy Osmani
| Component | What to Include | Example |
|---|---|---|
| Commands | Executable with flags | npm test -- --coverage |
| Testing | Framework, coverage, locations | vitest, 80%, tests/ |
| Project structure | Explicit directories | src/, lib/, tests/ |
| Code style | One example > paragraphs | Show a real function |
| Git workflow | Branch, commit, PR format | feat/name, conventional commits |
| Boundaries | Permission tiers | See below |
| Tier | Symbol | Use For |
|---|---|---|
| Always do | ✅ | Safe actions, no approval (lint, format) |
| Ask first | ⚠️ | High-impact changes (delete, publish) |
| Never do | 🚫 | Hard stops (commit secrets, force push main) |
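The three tiers can be expressed as a small policy check. This is a sketch under stated assumptions: the action names are hypothetical labels for the table's examples, and treating unknown actions as "ask first" is a design choice made here, not something the source prescribes.

```python
from enum import Enum


class Tier(Enum):
    ALWAYS = "always do"
    ASK_FIRST = "ask first"
    NEVER = "never do"


# Hypothetical action names; the tier assignments follow the table's examples.
POLICY = {
    "lint": Tier.ALWAYS,
    "format": Tier.ALWAYS,
    "delete": Tier.ASK_FIRST,
    "publish": Tier.ASK_FIRST,
    "commit_secrets": Tier.NEVER,
    "force_push_main": Tier.NEVER,
}


def check_action(action: str, approved: bool = False) -> bool:
    """Return True if the action may proceed. Unknown actions need approval."""
    tier = POLICY.get(action, Tier.ASK_FIRST)
    if tier is Tier.NEVER:
        return False  # hard stop, approval does not override it
    if tier is Tier.ASK_FIRST:
        return approved
    return True


print(check_action("lint"))                            # True
print(check_action("publish"))                         # False
print(check_action("publish", approved=True))          # True
print(check_action("force_push_main", approved=True))  # False
```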
⚠️ Research shows more instructions = worse adherence to each one.

Solution: Feed only relevant spec sections per task, not the entire document.
| Project Size | Approach |
|---|---|
| Small (<10 files) | Single spec file |
| Medium (10-50 files) | Sectioned spec, feed per task |
| Large (50+ files) | Sub-agent routing by domain |
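For the "sectioned spec, feed per task" approach, one possible sketch is to split the spec on level-2 headings and keep only sections whose heading shares a word with the task description. The keyword overlap is a deliberately naive assumption; `split_sections` and `relevant_sections` are hypothetical helpers.

```python
import re


def split_sections(spec: str) -> dict:
    """Split a markdown spec into {heading: body} on level-2 headings."""
    sections = {}
    current = None
    for line in spec.splitlines():
        match = re.match(r"^## +(.+)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def relevant_sections(spec: str, task: str) -> str:
    """Keep only sections whose heading shares a word with the task."""
    task_words = set(re.findall(r"\w+", task.lower()))
    picked = [
        f"## {name}\n{body}"
        for name, body in split_sections(spec).items()
        if task_words & set(re.findall(r"\w+", name.lower()))
    ]
    return "\n\n".join(picked)


SPEC = """## Testing
Use vitest, 80% coverage.
## Git workflow
Branch as feat/name.
## Code style
Prefer small functions.
"""

print(relevant_sections(SPEC, "add testing for the parser"))
```

Here only the Testing section is passed along; a production version would probably use tags or embeddings rather than heading words.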
Recommended stacks by situation:
| Situation | Recommended Stack | Notes |
|---|---|---|
| Solo MVP | SDD + TDD | Minimal overhead, quality focus |
| Team 5-10, greenfield | Spec Kit + TDD + BDD | Governance + quality + collaboration |
| Microservices | CDD + Specmatic | Contract-first, parallel dev |
| Existing SaaS (100+ features) | OpenSpec + BDD | Change tracking, no spec drift |
| High-complexity / compliance | BMAD + Spec Kit + Specmatic | Full governance + contracts |
| LLM-native product | Eval-Driven + Multi-Agent | Self-improving systems |
| Methodology | Level | Primary Focus | Best Context | Learning Curve |
|---|---|---|---|---|
| BMAD | Orchestration | Governance | High complexity, stable requirements | High |
| SDD | Specification | Contracts | Any | Medium |
| Doc-Driven | Specification | Alignment | Any | Low |
| Req-Driven | Specification | Context | Complex requirements, many artifacts | Medium |
| DDD | Specification | Domain | Complex business domain | Very High |
| BDD | Behavior | Collaboration | Multi-role stakeholder involvement | Medium |
| ATDD | Behavior | Compliance | Regulated, explicit acceptance criteria | Medium |
| CDD | Behavior | APIs | Service boundaries, parallel teams | Medium |
| FDD | Delivery | Features | Feature teams, parallel delivery | Medium |
| Context Eng. | Delivery | AI sessions | Any | Low |
| TDD | Implementation | Quality | Any | Low |
| Eval-Driven | Implementation | AI outputs | Any | Medium |
| Multi-Agent | Implementation | Complexity | Any | Medium |
| Iterative | Optimization | Refinement | Any | Low |
| Prompt Eng. | Optimization | Foundation | Any | Very Low |
- Anthropic: Claude Code Best Practices
- Anthropic: Effective Context Engineering for AI Agents
- Anthropic: Demystifying Evals for AI Agents
- GitHub: Spec-Driven Development Toolkit
- Microsoft: Spec-Driven Development with Spec Kit
SDD & Spec-First
- Addy Osmani: How to Write Good Specs for AI Agents
- Addy Osmani: My AI Coding Workflow in 2026 — End-to-end workflow: spec-first, context packing, TDD, git checkpoints
- Martin Fowler: SDD Tools Analysis
- InfoQ: Spec-Driven Development
- Kinde: Beyond TDD - Why SDD is the Next Step
- Tessl.io: Spec-Driven Dev with Claude Code
BMAD
- GMO Recruit: The BMAD Method
- Benny Cheung: BMAD - Reclaiming Control in AI Dev
- GitHub: BMAD-AT-CLAUDE
TDD with AI
- Steve Kinney: TDD with Claude
- Nathan Fox: Taming GenAI Agents
- Alex Op: Custom TDD Workflow Claude Code
BDD & DDD
- Alex Soyes: BDD Behavior-Driven Development
- Alex Soyes: DDD Domain-Driven Design
- Inflectra: Behavior-Driven Development
Context Engineering
- Intuition Labs: What is Context Engineering
- Manus.im: Context Engineering for AI Agents
Eval-Driven & Multi-Agent
- Fireworks AI: Eval-Driven Development with Claude Code
- Brandon Casci: Transform into a Dev Team using Claude Code Agents
- The Unwind AI: Claude Code's Multi-Agent Orchestration
- OpenSpec: github.com/Fission-AI/OpenSpec
- Spec Kit: github.com/github/spec-kit
- Specmatic: specmatic.io
- Specmatic Article: Spec-Driven Development with GitHub Spec Kit and Specmatic MCP
- Talent500: Claude Code TDD Guide
- Testlio: Acceptance Test-Driven Development
- Monday.com: Feature-Driven Development
- Paddo.dev: Ralph Wiggum Autonomous Loops
- Walturn: Prompt Engineering for Claude
- AWS: Prompt Engineering with Claude on Bedrock
- workflows/tdd-with-claude.md — Practical TDD guide
- workflows/spec-first.md — Spec-first development
- workflows/plan-driven.md — Using /plan mode
- workflows/iterative-refinement.md — Refinement loops
- ultimate-guide.md#912 — Section 9.12 summary
Footnotes
1. Thoughtworks Technology Radar Vol 33, Nov 2025. PDF. See also: Macro trends blog post.