Read this file after completing Phase 7 (Synthesis).
You have just finished orchestrating a multi-agent debate. Before closing out, you must perform a systematic review and make improvements to the debate system for future runs.
Read through these files from the debate you just completed:
- `05_debate_log.md` - The actual debate transcript
- `06_reflections.md` - Agent reflections
- `07_synthesis.md` - Your synthesis report
- `_state/quality_assessment.json` - Final quality metrics
- `_state/attack_registry.json` - Attack/response tracking
- `_state/contribution_tracker.json` - Contribution counts
Answer these questions honestly:
- Did agents quote each other directly and rebut specific claims?
- Did agents use emotional, confrontational language (not diplomatic)?
- Did agents avoid forbidden phrases ("I understand your point...", etc.)?
- Did agents defend themselves when attacked?
- Did agents stay in character throughout?
- Were the personas distinct and authentic?
- Were there genuine back-and-forth exchanges (3+ turns on same point)?
- Did the debate feel heated, not like a polite panel?
- Were there surprising moments (unexpected alliances, concessions)?
- Did agents actually change positions or make real concessions in reflections?
- Was the debate substantive (5-8 sentences per contribution)?
- Did you use sequential waves (2-3 agents at a time)?
- Did you update state files after each wave?
- Did you use the attack registry to force responses?
- Did you check quality gates before advancing phases?
- Did you use Opus subagents (not Haiku/Sonnet)?
- Did agents read their persona cards and the debate rules?
- Does the synthesis report capture the key disagreements?
- Are the policy recommendations grounded in the debate?
- Did you identify the core cruxes (empirical and value-based)?
For each "No" answer above, write a brief note:
- What specifically happened?
- Why did it happen?
- What could have prevented it?
Common failure patterns to look for:
- Diplomatic collapse: Agents being too nice to each other
- Parallel monologues: Agents not reacting to each other
- Shallow takes: Short, superficial contributions
- Role drift: Agents breaking character
- Missed attacks: Unresolved attacks in the registry
- Template ignorance: Agents not reading persona cards or rules
- Premature advancement: Moving phases before quality gates passed
Based on your review, you MUST now edit the system files to prevent these issues in future debates.
- `debate_initialisation_prompt.md` - Main orchestrator instructions
  - Add warnings about specific failure modes you observed
  - Clarify instructions that were ambiguous
  - Add new quality checks if needed
  - Improve agent invocation templates
- `quick_start.md` - Quick onboarding guide
  - Add lessons learned
  - Update common scenarios section
  - Clarify any confusing instructions
- `_master_templates/debate_rules.json` - Debate behavior rules
  - Add forbidden phrases you observed being used
  - Add required phrases that worked well
  - Adjust response length requirements if needed
- `_master_templates/moderator_prompts.json` - Moderator interventions
  - Add new escalation prompts that would have helped
  - Add new opening clash templates
  - Improve existing prompts based on what worked
- `_master_templates/persona_card_template.json` - Agent persona structure
  - Add fields that would have been useful
  - Improve the example if it was unclear
- `_master_templates/state_schemas.json` - State file documentation
  - Add clarity if state files were misused
DO:
- Add specific warnings based on real failures ("WARNING: In testing, agents often did X...")
- Add new forbidden/required phrases you discovered
- Improve clarity of ambiguous instructions
- Add new moderator prompts that would have helped
- Document lessons learned prominently
DON'T:
- Remove existing instructions without good reason
- Make changes that aren't grounded in observed problems
- Add complexity without clear benefit
- Change file formats or folder structure without very strong justification
After making edits, add an entry to this file:
## [DATE] - [DEBATE TOPIC]
### Triggered By:
[Name the specific debate that revealed this issue, e.g., "OpenAI 2026 Leadership (8 Participants)"]
### Issues Observed:
- [Issue 1]
- [Issue 2]
### Changes Made:
- [File]: [What was changed and why]
- [File]: [What was changed and why]
### Expected Impact:
- [How this should improve future debates]

If you have a strong, well-reasoned idea for a structural improvement, you may:
- Add new template files to `_master_templates/`
- Create new state tracking mechanisms
- Restructure the phase workflow
Requirements for structural changes:
- Write a clear rationale (what problem does this solve?)
- Ensure backward compatibility (don't break existing debates)
- Update all documentation to reflect the change
- Keep it simple - complexity should be justified
Add your entries below this line:
- State files never updated (CRITICAL): After 5 waves of debate, `attack_registry.json`, `contribution_tracker.json`, and `quality_assessment.json` were all still at their initial default values. The orchestrator completely skipped the MODERATOR UPDATE ROUTINE.
- Quality gates never formally checked: The orchestrator advanced phases without verifying that `ready_for_next_phase == true` in quality_assessment.json.
- No PDF output generated: The debate concluded without generating a PDF synthesis report because no instructions existed for this.
- Agents may not have read debate_rules.json consistently: While agents behaved appropriately (likely from persona prompting), there was no verification they actually read the rules file.
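The quality-gate check named in the issues above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function name `gate_passed` is hypothetical, and it assumes `quality_assessment.json` lives in the `_state/` directory and holds a top-level boolean `ready_for_next_phase`.

```python
import json
from pathlib import Path


def gate_passed(state_dir):
    """Return True only if quality_assessment.json explicitly marks the phase ready.

    Assumption: the file sits in state_dir and has a top-level boolean
    field named ready_for_next_phase.
    """
    path = Path(state_dir) / "quality_assessment.json"
    try:
        assessment = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or unreadable file counts as "gate not passed".
        return False
    if not isinstance(assessment, dict):
        return False
    # Require the literal boolean True, not a truthy string such as "true".
    return assessment.get("ready_for_next_phase") is True
```

Treating a missing file as a failed gate is a deliberate choice in this sketch: it makes "never updated" and "not ready" indistinguishable, which is the safe direction for a phase transition.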
- Agents quoted each other directly and made substantive rebuttals
- Strong emotional, confrontational language maintained throughout
- Genuine position shifts occurred (TECHNO conceding Weber's point)
- Sequential wave execution worked correctly (2-3 agents at a time)
- Opus subagents produced high-quality roleplay
- Synthesis outputs were comprehensive and insightful
debate_initialisation_prompt.md:
- Added prominent WARNING box before MODERATOR UPDATE ROUTINE emphasizing state file updates are mandatory
- Added POST-WAVE CHECKLIST (7 items) that must be completed after every wave
- Added new Phase 7.5 — PDF GENERATION with full XeLaTeX template and compilation instructions
- Added new failure modes to appendix: "Skipped State Updates", "Missing PDF Output", "No Formal Quality Gates"
- Added quality signals for state file updates and PDF generation
quick_start.md:
- Added new critical lesson: "The #6 Mistake: Skipping State File Updates (MOST COMMON!)"
- Added new critical lesson: "The #7 Mistake: No PDF Output"
- Added common scenario: "I forgot to update state files for several waves"
- Added common scenario: "XeLaTeX isn't installed"
- Updated TL;DR checklist to emphasize state file updates and include Phase 7.5 and Phase 8
_master_templates/moderator_prompts.json:
- Added new "post_wave_checklist" section with 7 mandatory items and verification command
_master_templates/debate_rules.json:
- Added new "moderator_reminders" section with after_every_wave and before_phase_transition checklists
- Included explicit warning about the common failure observed in testing
- State file compliance: The prominent warnings, mandatory checklists, and explicit failure documentation should make it nearly impossible for future orchestrators to skip state file updates.
- PDF output: Every debate will now produce a consistently styled PDF as a final deliverable.
- Quality gates: Explicit reminders to check quality_assessment.json before phase transitions should prevent premature advancement.
- Self-documenting failures: By documenting the exact failure mode in multiple places, future orchestrators will recognize when they're about to make the same mistake.
The original system was hardcoded for political debates with outputs like "policy bundles" and "stakeholder impacts". This doesn't make sense for technical debates (architecture decisions), strategic debates (business decisions), research debates, etc.
debate_initialisation_prompt.md:
- Added new section "DEBATE TYPES AND DELIVERABLES" after user inputs
- Defined 7 debate types: policy, technical, strategic, ethical, research, risk, general
- Each type has trigger keywords for automatic detection
- Defined MANDATORY deliverables (all types): debate_log, reflections, cruxes, matrix.json, PDF
- Defined TYPE-SPECIFIC deliverables for each type (e.g., technical debates get decision_matrix, recommendation; risk debates get risk_register, mitigation_strategies)
- Added PERSONA ADAPTATION table matching agent archetypes to debate types
- Updated Phase 0 to include debate type detection and deliverable selection
- Updated Phase 7 to produce type-appropriate outputs dynamically
- Updated PDF template to have type-specific section structures
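Keyword-based type detection of the kind described above might look like the sketch below. The seven type names come from this entry; the keyword lists and the function name `detect_debate_type` are hypothetical placeholders, since the real trigger keywords live in `debate_initialisation_prompt.md`.

```python
import re

# Hypothetical trigger keywords, for illustration only. The real lists are
# defined in debate_initialisation_prompt.md and are not reproduced here.
TRIGGERS = {
    "policy": ("policy", "regulation", "legislation"),
    "technical": ("architecture", "framework", "database"),
    "strategic": ("strategy", "market", "acquisition"),
    "ethical": ("ethics", "moral", "ethical"),
    "research": ("hypothesis", "study", "evidence"),
    "risk": ("risk", "threat", "mitigation"),
}


def detect_debate_type(topic):
    """Pick the type whose keywords match the topic most often.

    Falls back to "general" when nothing matches. Ties go to whichever
    type appears first in TRIGGERS.
    """
    words = set(re.findall(r"[a-z]+", topic.lower()))
    scores = {
        debate_type: sum(1 for keyword in keywords if keyword in words)
        for debate_type, keywords in TRIGGERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

Whole-word matching keeps a keyword like "risk" from firing inside unrelated words, at the cost of missing plurals and multi-word phrases; a production version would likely need something more forgiving.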
quick_start.md:
- Added "Debate Types and Dynamic Deliverables" section with summary table
- Shows what triggers each type and what outputs are produced
_master_templates/persona_card_template.json:
- Updated to version 3.0
- Made spectrum_position flexible with examples for each debate type (policy, technical, strategic, ethical, research, risk)
- Added "background" field for professional context
- Added "key_argument" and "what_would_change_mind" to on_this_topic
- Replaced single example with multiple examples for different debate types (policy, technical, strategic, risk)
- Flexibility: System now handles technical debates, business strategy debates, risk assessments, etc. - not just political debates
- Appropriate outputs: Technical debates produce decision matrices, not policy bundles
- Better personas: Agents can be architects, CFOs, security leads - not just political positions
- Same rigor: Mandatory outputs (cruxes, reflections, PDF) ensure every debate produces comparable analysis
Debates work better when tailored to user needs. Open-ended topics need scoping, users may want specific outputs, and context matters. Rather than guessing, the orchestrator should ask a few targeted questions before starting.
quick_start.md:
- Added new STEP 4: "Clarify with the User Before Starting"
- Includes question categories: scope, outputs, context, depth
- Provides example format for asking questions
- Explains how to handle "just proceed" responses
- Renumbered all subsequent steps (STEP 5-11)
- Updated TL;DR checklist to include clarification step
debate_initialisation_prompt.md:
- Added "ASK CLARIFYING QUESTIONS" section after Phase 0 planning
- Lists question templates by category
- Emphasizes: don't over-ask, use judgment, offer to proceed with defaults
- More relevant debates: Topics get properly scoped before starting
- Better outputs: User can specify what deliverables they actually need
- Context-aware: Orchestrator learns about constraints, audience, existing decisions
- User control: Users can skip questions and proceed with defaults if they prefer
- Simple questions: Questions are quick to answer if user knows what they want
Debates were producing generic arguments from training data rather than evidence-backed positions with current statistics and recent developments. Without research, agents make assertions without citations.
debate_initialisation_prompt.md:
- Added "RESEARCH THE TOPIC FIRST" section in Phase 0
- Orchestrator must do 5-10 web searches before planning
- Research: current debate state, recent developments, stakeholder positions, key statistics
- Example searches provided
- Added "SUBAGENTS MUST DO RESEARCH" section in Phase 3
- Each agent must do 2-4 searches when writing initial position
- Must find statistics, studies, recent events, specific examples
- Provided good/bad examples of evidence-backed vs generic claims
- Updated AGENT INVOCATION TEMPLATE
- Added STEP 5: RESEARCH IF NEEDED
- Agents search during debate when making factual claims or countering opponent data
- Added "Cite evidence" to critical reminders
quick_start.md:
- Added "The #0 Mistake: No Research" as first critical lesson
- Explains orchestrator and subagent research requirements
- Good/bad examples of evidence quality
- Updated STEP 6 agent invocation template with research step
- Updated TL;DR checklist to include research step
- Evidence-backed debates: Arguments cite specific statistics, studies, and recent events
- Current information: Research pulls in 2024-2025 developments, not just training data
- Stronger arguments: Agents can cite sources when challenged
- Better synthesis: Final reports based on actual evidence, not generic claims
- Fact-checkable: Specific citations allow readers to verify claims
- State files never updated (CRITICAL - AGAIN!): Despite existing warnings about this being the #1 failure mode, after 5 waves of debate all state files were still at defaults:
  - `attack_registry.json`: `{"unresolved": [], "resolved": []}`
  - `contribution_tracker.json`: `{"target_per_agent": 3, "agents": {}}`
  - `quality_assessment.json`: `current_wave: 0`
- No enforcement mechanism: The warnings existed but nothing STOPPED the orchestrator from proceeding without updating. The checklist was ignored.
- PDF generation forgotten: Phase 7.5 was skipped until explicitly requested.
- Post-debate review (Phase 8) forgotten: Had to be prompted to do it.
- No formal quality gates checked: Phases were advanced based on "feels done" not actual gate verification.
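The default values quoted above are specific enough to detect mechanically. The following sketch flags state files that still look untouched; the function name `stale_state_files` is hypothetical, and the checks assume only the three default shapes listed in this entry.

```python
import json
from pathlib import Path


def stale_state_files(state_dir):
    """Return the names of state files that are missing, unreadable, or at defaults."""

    def load(name):
        try:
            data = json.loads((Path(state_dir) / name).read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            return None
        return data if isinstance(data, dict) else None

    # Each check mirrors one default value quoted in the issue list.
    checks = {
        "attack_registry.json": lambda d: not d.get("unresolved") and not d.get("resolved"),
        "contribution_tracker.json": lambda d: not d.get("agents"),
        "quality_assessment.json": lambda d: d.get("current_wave", 0) == 0,
    }
    stale = []
    for name, is_default in checks.items():
        data = load(name)
        if data is None or is_default(data):
            stale.append(name)
    return stale
```

An empty attack registry is only suspicious after at least one wave has run, so a check like this belongs after Wave 1, not at initialisation.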
- Agents quoted each other extensively and made substantive rebuttals
- Genuine confrontational energy maintained (no diplomatic collapse)
- Strong persona differentiation (AURORA passionate, BUDGET numbers-driven, LOCAL skeptical)
- Position shifts occurred in reflections (AURORA conceded BUDGET's math)
- Sequential wave execution worked correctly
- Opus subagents produced high-quality roleplay
- Final synthesis was actionable and well-structured
debate_initialisation_prompt.md:
- Added HUGE warning box at the very top making state file updates the #1 documented failure
- Added "5. STATE FILES ARE NOT OPTIONAL" to the numbered critical lessons
- Added "=== ENFORCEMENT: VERIFY BEFORE NEXT WAVE ===" section with mandatory read step
- Explained WHY state tracking matters (not just that it's required)
quick_start.md:
- Upgraded #6 Mistake header to "HAPPENS EVERY TIME!" with emoji
- Added explanation of WHY orchestrators skip it (caught up in agent launching)
- Added THE ENFORCEMENT FIX with specific verification step
- Added step 6 to the checklist: "READ the state files back to verify"
- Added consequence statement: "If you skip this, your debate is useless"
The orchestrator got caught up in the "interesting" work (launching agents, reading responses) and treated state file updates as optional bookkeeping. The existing warnings were:
- Prominent but easily scrolled past
- Not enforced by any mechanism
- Disconnected from the "main" workflow
- Earlier detection: The enforcement step (read quality_assessment.json before next wave) creates a checkpoint that reveals skipped updates immediately
- Psychological reframing: Calling it "the moderator's core job" rather than "administrative overhead" may help
- Consequence awareness: Explicit statement that "your debate is useless" without tracking may motivate compliance
- Pattern recognition: Multiple entries documenting this SAME failure should make future orchestrators recognize it
This is now the THIRD documented instance of this failure. The warnings are extensive. If the next orchestrator STILL skips state updates, consider:
- Making state file updates part of the agent invocation template (do them immediately after agents complete, before any other action)
- Creating a "wave wrapper" that bundles agent launch + state update as one unit
- Adding a pre-flight check at the START of each wave that reads state files
Despite extensive warnings about state file updates, PDF generation, and research requirements, orchestrators consistently forgot these steps. Text warnings get read once at the start and then buried under debate context. We needed an active reminder system.
Created three Claude Code hooks that fire automatically when specific files are written:
| Hook | Triggers On | Reminds About |
|---|---|---|
| `debate-state-update.sh` | `05_debate_log.md` | Update attack_registry, contribution_tracker, quality_assessment |
| `debate-final-phases.sh` | `07_synthesis.md` | Phase 7.5 (PDF) and Phase 8 (review) are mandatory |
| `debate-research-reminder.sh` | `03_positions.md` | Do web research before writing positions |

- `.claude/hooks/debate-state-update.sh`
- `.claude/hooks/debate-final-phases.sh`
- `.claude/hooks/debate-research-reminder.sh`
- `.claude/settings.json` (hook configuration)
- `_master_templates/completion_checklist.json` (phase tracking)

- `debate_initialisation_prompt.md`: Added checklist copy to Phase 0.5 init, added hooks notification
- `quick_start.md`: Added checklist copy, added common scenario for hooks
- `post_debate_review.md`: This entry
The hooks are PostToolUse hooks that match on the Write tool. Each hook script:
- Receives JSON input via stdin with the file_path
- Checks if the file_path matches its target pattern (regex)
- Outputs a reminder box if matched, otherwise silent
- Always exits 0 (allows flow to continue, just provides reminder)
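The four behaviours listed above can be illustrated with a short Python sketch of the same logic. The real hooks are shell scripts, so this is an equivalent illustration, not their source. It assumes the stdin payload nests the path under `tool_input.file_path`, and the reminder wording is made up for the example.

```python
import json
import re
import sys

# Target pattern for the state-update hook: any path ending in 05_debate_log.md.
TARGET = re.compile(r"05_debate_log\.md$")


def reminder_for(payload):
    """Return reminder text if the written file matches the target, else ''."""
    if not isinstance(payload, dict):
        return ""
    # Assumption: the file path is nested under tool_input.file_path.
    file_path = (payload.get("tool_input") or {}).get("file_path", "")
    if TARGET.search(file_path):
        return "REMINDER: update attack_registry, contribution_tracker, quality_assessment"
    return ""


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read the hook payload, print a reminder if relevant, and always return 0."""
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        payload = {}  # Malformed input: stay silent and do not block.
    message = reminder_for(payload)
    if message:
        print(message, file=stdout)
    return 0  # Exit status 0 lets the flow continue; the hook only reminds.
```

Run as a script, the entry point would be `sys.exit(main())`. Returning 0 unconditionally matches the stated design: the hook can remind but never block.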
- Just-in-time reminders: Reminders appear exactly when relevant, not buried at the start
- Cannot be ignored by reading: The reminder appears after the action, forcing acknowledgment
- Cumulative with warnings: Hooks supplement (not replace) documentation warnings
- Reduces cognitive load: Orchestrator doesn't need to remember everything upfront
- Hooks remind but don't enforce - orchestrator can still ignore
- Only fires on Write tool, not on reading or other operations
- Requires hooks to be installed in user's claude config
- May produce redundant reminders if instructions were already followed
A. Agent Behavior ✅ ALL PASSED
- Agents quoted each other directly and rebutted specific claims (14+ attacks registered)
- Agents used emotional, confrontational language ("spectacular self-refutation", "your God is morphine", "bad faith dressed in robes")
- Agents avoided forbidden phrases (no "I understand your point..." observed)
- Agents defended themselves when attacked (NIHILIST vs THEIST on self-refutation, etc.)
- Agents stayed in character throughout (10 distinct philosophical voices)
- Personas were distinct and authentic (BUDDHIST's questioning, SUFI's poetry, NIHILIST's bleakness)
B. Debate Dynamics ✅ ALL PASSED
- Genuine back-and-forth exchanges (self-refutation debate spanned 3+ waves)
- Debate felt heated ("spectacular self-refutation", "philosophical suicide", "morphine not medicine")
- Surprising moments (NIHILIST conceding self-refutation tension, ABSURDIST acknowledging overlap with HUMANIST)
- Real position shifts in reflections (8 documented shifts including NIHILIST, NATURALIST, EXISTENTIALIST)
- Substantive contributions (5-8 sentences consistently, rich philosophical content)
C. System Compliance ✅ ALL PASSED
- Sequential waves (2-4 agents per wave, 6 waves total)
- State files updated after each wave (quality_assessment shows current_wave: 6)
- Attack registry used (14 attacks tracked)
- Quality gates checked before advancing (ready_for_next_phase verified)
- Opus subagents used throughout
- Agents read persona cards and debate rules (included in prompts)
D. Output Quality ✅ ALL PASSED
- Synthesis captures key disagreements (5 axes, 8 position shifts)
- Cruxes identified (4 empirical, 5 value conflicts)
- PDF generated successfully (31KB, 7 pages)
- Minor: Attack registry resolution tracking incomplete - 14 attacks registered as unresolved but 0 moved to resolved, even though many were addressed in the debate. Should track resolutions more diligently.
- Minor: Contribution tracker counts not fully accurate - Most agents show 1-3 contributions but NIHILIST shows 3 (satisfied) despite multiple appearances. Tracking was maintained but not perfectly precise.
- STATE FILE TRACKING WORKED - The four-step wave definition successfully ensured state tracking through all 6 waves
- Exceptional philosophical depth - Agents engaged with genuine philosophical arguments (self-refutation, anatta, absurdist/existentialist distinction)
- Strong reflections with real intellectual movement:
  - NIHILIST: "I cannot coherently privilege my 'clear-eyed' nihilism without smuggling in values I claim to reject"
  - NATURALIST: "I cannot coherently value truth-seeking while claiming values are illusions"
  - ABSURDIST: "The distinction between revolt and meaning-creation may be more aesthetic than philosophical"
- Research integration - Agents cited philosophers (Camus, Sartre, Epicurus, Marcus Aurelius), traditions (Theravada, Sufi), and made sophisticated arguments
- Cruxes were substantive - Both empirical (self-refutation problem, hard problem of consciousness) and value-based (truth vs wellbeing, permanence vs presence)
- PDF generated successfully - Professional 7-page report with XeLaTeX
None required - the system worked as designed for philosophical debates.
This successful run validates that:
- The four-step wave definition continues to prevent state tracking failures
- The system handles philosophical/ethical debates as well as political/technical debates
- 10 agents can maintain distinct voices through 6 waves
- Opus subagents produce sophisticated philosophical engagement
A. Agent Behavior ✅ ALL PASSED
- Agents quoted each other directly and rebutted specific claims (20+ quote-rebut exchanges)
- Agents used emotional, confrontational language ("That's simply wrong!", "absurd", "smoke and mirrors")
- Agents avoided forbidden phrases (no "I understand your point..." type phrases)
- Agents defended themselves when attacked (GOOGLE_BULL vs ANALYST, ANTHROPIC_FAN vs OPEN_SOURCE)
- Agents stayed in character throughout (8 distinct personas maintained)
- Personas were distinct and authentic (BEAR_OAI financial focus vs SAFETY_HAWK ethical focus)
B. Debate Dynamics ✅ ALL PASSED
- Genuine back-and-forth exchanges (BEAR_OAI ↔ INVESTOR had 3+ turns on WeWork analogy)
- Debate felt heated, not like polite panel (multiple "outrageous", "absurd" exchanges)
- Surprising moments occurred (INVESTOR conceding WeWork parallel "has merit")
- Real position shifts in reflections (SAFETY_HAWK admitted no coordination mechanism)
- Substantive contributions (5-8 sentences consistently)
C. System Compliance ✅ ALL PASSED
- Sequential waves (2-4 agents per wave, 5 waves total)
- State files updated after each wave (quality_assessment shows current_wave: 5)
- Attack registry used (15 attacks tracked: 11 unresolved, 4 resolved)
- Quality gates checked before advancing (ready_for_next_phase verified)
- Opus subagents used throughout
- Agents read persona cards and debate rules (included in prompts)
D. Output Quality ✅ ALL PASSED
- Synthesis captures key disagreements (6 cruxes identified)
- Recommendations grounded in debate (predictions tied to positions)
- Core cruxes identified (Platform vs Commodity, Unit Economics, Safety as Moat, etc.)
- Minor: Attack registry had more unresolved than resolved - 11 unresolved vs 4 resolved attacks. Some attacks were responded to but not formally tracked as resolved. Future improvement: be more diligent about moving attacks to "resolved" when responses occur.
- Minor: Wave 4-5 state updates less detailed - Earlier waves had more detailed tracking. By Wave 4-5, updates were correct but briefer. Not a problem, but shows slight fatigue.
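Moving an attack to "resolved" as soon as a response lands is a small operation, sketched below. The `unresolved`/`resolved` keys match the registry defaults documented in this file; the per-attack `id` and `resolved_by` fields and the function name `resolve_attack` are hypothetical, because the entry schema is not shown here.

```python
def resolve_attack(registry, attack_id, response_ref):
    """Move one attack from 'unresolved' to 'resolved'.

    Returns True if the attack was found and moved, False otherwise.
    Assumption: each attack entry is a dict with a unique "id" field.
    """
    for index, attack in enumerate(registry["unresolved"]):
        if attack.get("id") == attack_id:
            # Copy the entry and record which contribution answered it.
            resolved = dict(registry["unresolved"].pop(index), resolved_by=response_ref)
            registry["resolved"].append(resolved)
            return True
    return False
```

Calling this during Step C for every response that quotes a registered attack would keep the unresolved count meaningful when the registry is used to force responses.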
- STATE FILE TRACKING WORKED - For the first time in documented history, the four-step wave definition resulted in proper state tracking through all 5 waves.
- Research integration excellent - Agents cited specific statistics (80.9% SWE-bench, $207B funding gap, 2B AI Overviews users) from actual web research.
- Strong reflections with real concessions - Every agent identified something they learned:
  - BULL_OAI: "$207B funding gap is genuine structural risk"
  - BEAR_OAI: "OpenAI has real technology unlike WeWork"
  - SAFETY_HAWK: "I offered no realistic mechanism for coordinated slowdown"
  - INVESTOR: "My smart money argument echoed WeWork bull rhetoric"
- Cruxes were empirically testable - All 6 cruxes have falsification conditions that 2026 will answer.
- PDF generated successfully - 6-page professional report with LaTeX.
None required - the system worked as designed!
This successful run validates the "Wave Redefinition" change made earlier today. The four-step wave definition (A: Launch, B: Analyze, C: Update State, D: Plan) successfully prevented the state tracking failures that plagued all previous debates.
Keep the current system design. The key insight that worked: reframing state updates as integral to wave completion rather than post-wave administrative work.
- HOOKS DID NOT TRIGGER (CRITICAL): Despite hooks being configured in `.claude/settings.json`, no reminder messages appeared after writing to `05_debate_log.md`, `03_positions.md`, or `07_synthesis.md`. Root cause: the hook commands used relative paths (`.claude/hooks/script.sh`) which don't work if Claude is invoked from a different working directory. Fixed by using `$CLAUDE_PROJECT_DIR` environment variable.
- State file tracking stopped at Wave 3: Despite running 5 waves, `quality_assessment.json` shows `current_wave: 3` and `contribution_tracker.json` doesn't reflect Wave 4 and 5 contributions. The orchestrator stopped updating state files midway through the debate.
- Ran unnecessary `rm` command: The orchestrator ran `rm -f *.aux *.log...` to clean up LaTeX files, even though:
  - The documentation explicitly says "no rm needed - latexmk has built-in cleanup"
  - The command template already includes `latexmk -c` which cleans auxiliary files
  - The rm command failed because the aux files were in a different directory
- PDF naming inconsistent: Generated `debate_report.pdf` instead of the documented `08_final_report_eu_ai_competitiveness_2025.pdf` format.
- Post-debate review forgotten: Had to be explicitly prompted to run Phase 8.
- Excellent agent behavior: All 11 agents quoted opponents directly, used confrontational language, stayed in character, and made substantive arguments (5-8 sentences)
- Genuine back-and-forth: Multiple quote-and-rebut exchanges occurred (15+ by Wave 3 count)
- Research integration: Agents did web searches and cited specific statistics (e.g., "$109B US investment", "23% startup relocation", "60% PhD exodus")
- Real position shifts: Reflections showed genuine learning (RACER admitted ACADEMIC "demolished" his argument)
- Opus quality: Using Opus subagents produced sophisticated, nuanced political roleplay
- Sequential execution: Waves were properly sequential (2-4 agents at a time), not parallel
- Synthesis quality: Final synthesis identified unexpected convergences and actionable recommendations
.claude/settings.json:
- Changed hook paths from relative (`.claude/hooks/...`) to use `$CLAUDE_PROJECT_DIR` environment variable
- Increased timeout from 3 to 5 seconds for hook execution
- Example: `"command": "$CLAUDE_PROJECT_DIR/.claude/hooks/debate-state-update.sh"`
quick_start.md (to be updated):
- Add troubleshooting section for hooks not triggering
post_debate_review.md:
- This entry documenting the issues
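The `.claude/settings.json` change described above can be sketched by generating the hook configuration from Python. The `command` value and the 5-second timeout come from this entry. The surrounding structure (`hooks` → `PostToolUse` → `matcher` / `hooks` → `type`, `command`, `timeout`) is an assumption about the settings schema and should be checked against the current Claude Code documentation before use; `hook_entry` is a hypothetical helper.

```python
import json

HOOK_SCRIPTS = (
    "debate-state-update.sh",
    "debate-final-phases.sh",
    "debate-research-reminder.sh",
)


def hook_entry(script_name, timeout_seconds=5):
    """Build one PostToolUse entry that runs a hook script after a Write.

    Assumption: this nesting matches the settings.json hook schema.
    """
    return {
        "matcher": "Write",
        "hooks": [
            {
                "type": "command",
                # $CLAUDE_PROJECT_DIR keeps the path valid from any working directory.
                "command": f"$CLAUDE_PROJECT_DIR/.claude/hooks/{script_name}",
                "timeout": timeout_seconds,
            }
        ],
    }


settings = {"hooks": {"PostToolUse": [hook_entry(name) for name in HOOK_SCRIPTS]}}
print(json.dumps(settings, indent=2))
```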
Why state tracking stopped: The orchestrator got caught up in the "interesting" work of launching agents and reading their responses. State file updates feel like "administrative overhead" rather than core work. Despite MULTIPLE warnings in documentation, this pattern repeats because:
- The reminder to update state files comes AFTER launching agents
- By then, the orchestrator is already planning the next wave
- There's no enforcement mechanism - hooks were supposed to provide reminders but didn't trigger
- Hook paths fixed: Using `$CLAUDE_PROJECT_DIR` ensures hooks work regardless of working directory
- Pattern documented: This is now the FOURTH documented instance of state tracking failure

- Redefine what a "wave" is: Make state updates an integral part of the wave definition, not an afterthought
- Add pre-wave verification: Before launching Wave N, verify `current_wave == N-1` in quality_assessment.json - abort if stale
- PDF naming enforcement: Add the topic slug to a variable early and reference it consistently
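The PDF naming recommendation amounts to deriving the slug once and building every filename from it. A minimal sketch follows, assuming the documented pattern is `08_final_report_<slug>.pdf` with a lowercase, underscore-separated slug (inferred from the `08_final_report_eu_ai_competitiveness_2025.pdf` example); both function names are hypothetical.

```python
import re


def topic_slug(topic):
    """Lowercase the topic and collapse every non-alphanumeric run into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")


def report_filename(topic):
    """Build the final report name from the topic slug."""
    return f"08_final_report_{topic_slug(topic)}.pdf"


print(report_filename("EU AI Competitiveness 2025"))
# prints 08_final_report_eu_ai_competitiveness_2025.pdf
```

Computing the slug in Phase 0 and storing it alongside the other state would let the PDF step reuse it instead of inventing a new name at the end of the debate.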
State file tracking has failed in EVERY documented debate (4 instances). Previous approaches that didn't work:
- Prominent warnings in documentation (read once, then ignored)
- Post-wave checklists (treated as optional)
- "Verify before next wave" instructions (not followed)
- Hooks to remind (didn't trigger due to approval requirements)
The root problem: Orchestrators mentally model "wave" as "launch agents" and treat state updates as separate administrative overhead that can be skipped.
Redefined what a "wave" is. A wave is no longer "launch agents" - it's a four-step cycle:
STEP A: Launch 2-4 agents → wait for responses
STEP B: Read and analyze new contributions
STEP C: Update ALL state files
STEP D: Evaluate gates and plan next wave
A wave is NOT complete until Step D is done. Skipping Steps B-D means you haven't completed a wave - you've just launched agents into the void.
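One way to make the four steps inseparable is to bundle them in a single function, so that launching agents without the follow-up steps is not an available operation. The sketch below assumes the orchestrator supplies four callables; `run_wave` and its parameter names are hypothetical stand-ins for whatever actually launches agents and writes state.

```python
def run_wave(launch, analyze, update_state, plan_next):
    """Run Steps A to D in order and return the plan for the next wave.

    The wave counts as complete only if this function returns; an
    exception in any step leaves the wave incomplete.
    """
    responses = launch()           # STEP A: launch 2-4 agents, wait for responses
    analysis = analyze(responses)  # STEP B: read and analyze new contributions
    update_state(analysis)         # STEP C: update ALL state files
    return plan_next(analysis)     # STEP D: evaluate gates and plan next wave


# Usage with trivial stand-ins that record the order of execution.
calls = []
plan = run_wave(
    launch=lambda: calls.append("A") or ["response"],
    analyze=lambda responses: calls.append("B") or {"count": len(responses)},
    update_state=lambda analysis: calls.append("C"),
    plan_next=lambda analysis: calls.append("D") or "next wave",
)
print(calls, plan)
```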
Added pre-wave verification as a HARD GATE:
Before launching Wave N, read quality_assessment.json and verify current_wave == N-1. If not, ABORT - you have incomplete waves.
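The hard gate can be expressed as a check that raises instead of warning. This sketch assumes `quality_assessment.json` sits in the state directory with an integer `current_wave` field, as the entries in this file describe; `StaleStateError` and `verify_ready_for_wave` are hypothetical names.

```python
import json
from pathlib import Path


class StaleStateError(RuntimeError):
    """Raised when state files do not reflect the previous wave."""


def verify_ready_for_wave(state_dir, wave_number):
    """Abort unless quality_assessment.json records wave_number - 1 as current.

    A missing file raises FileNotFoundError, which also blocks the launch.
    """
    path = Path(state_dir) / "quality_assessment.json"
    recorded = json.loads(path.read_text(encoding="utf-8")).get("current_wave")
    expected = wave_number - 1
    if recorded != expected:
        raise StaleStateError(
            f"Cannot launch Wave {wave_number}: current_wave is {recorded!r}, "
            f"expected {expected}. Finish Steps B-D of the previous wave first."
        )
    return recorded
```

Raising an exception is the point of the design: unlike a reminder, it cannot be scrolled past, and the message names the steps that were skipped.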
debate_initialisation_prompt.md:
- Removed hooks notification section (user-targeted instructions don't belong here)
- Added "WHAT IS A WAVE?" box defining the four-step cycle
- Updated sequential wave execution to reference Steps A→B→C→D
- Renamed "MODERATOR UPDATE ROUTINE" to "WAVE STEPS B-C-D" (reframing as integral, not afterthought)
- Replaced post-wave checklist with "Wave Completion Check"
- Added "PRE-WAVE VERIFICATION (ABORT ON STALE STATE)" with hard gate
- Added "WHY THIS MATTERS" section explaining consequences
quick_start.md:
- Renamed #6 mistake to "Treating State Updates as Optional" (from "Skipping State File Updates")
- Added WRONG vs CORRECT mental model comparison
- Added four-step wave definition
- Updated STEP 7 from "After Each Wave - Update State" to "Understanding Waves (The Four-Step Cycle)"
- Replaced hook troubleshooting sections with "I forgot to do state updates—what now?"
- Updated TL;DR checklist with four-step wave and hard gate verification
post_debate_review.md:
- Removed user-targeted instructions (like "run /hooks")
- Updated recommendations to reflect implemented changes
- Mental model shift: By redefining "wave" to include state updates, orchestrators can't think of them as optional
- Hard gate enforcement: Pre-wave verification creates an actual checkpoint that reveals skipped updates
- Clearer consequences: "You haven't completed a wave" is more compelling than "you skipped administrative work"
- Self-documenting: The four-step structure is repeated multiple times in multiple files
Hooks remain in the codebase as a backup reminder system. They work when approved, but the documentation no longer relies on them. The four-step wave definition and pre-wave hard gate should be sufficient even without hooks triggering.