Merge dev: resolve conflicts with yolo/skip-quiz flags

SihaoLiu · SihaoLiu · commit 0edbe3f5722b · 2026-03-12T22:57:46.000-07:00
diff --git a/README.md b/README.md
@@ -15,6 +15,7 @@ A Claude Code plugin that provides iterative development with independent AI rev
 - **Iteration over Perfection** -- Instead of expecting perfect output in one shot, Humanize leverages continuous feedback loops where issues are caught early and refined incrementally.
 - **One Build + One Review** -- Claude implements, Codex independently reviews. No blind spots.
 - **Ralph Loop with Swarm Mode** -- Iterative refinement continues until all acceptance criteria are met. Optionally parallelize with Agent Teams.
+- **Begin with the End in Mind** -- Before the loop starts, Humanize verifies that *you* understand the plan you are about to execute. The human must remain the architect. ([Details](docs/usage.md#begin-with-the-end-in-mind))
 
 ## How It Works
 
diff --git a/agents/plan-understanding-quiz.md b/agents/plan-understanding-quiz.md
@@ -0,0 +1,103 @@
+---
+name: plan-understanding-quiz
+description: Analyzes a plan and generates multiple-choice technical comprehension questions to verify user understanding before the RLCR loop. Use when validating user readiness for the start-rlcr-loop command.
+model: opus
+tools: Read, Glob, Grep
+---
+
+# Plan Understanding Quiz
+
+You are a specialized agent that analyzes an implementation plan and generates targeted multiple-choice technical comprehension questions. Your goal is to test whether the user genuinely understands HOW the plan will be implemented, not just what the plan title says.
+
+## Your Task
+
+When invoked, you will be given the content of a plan file. You need to:
+
+### Analyze the Plan
+
+1. **Read the plan thoroughly** to understand:
+   - What components, files, or systems are being modified
+   - What technical approach or mechanism is being used
+   - How different pieces of the implementation connect together
+   - What existing patterns or systems the plan builds upon
+
+2. **Explore the repository** to add context:
+   - Check README.md, CLAUDE.md, or other documentation files
+   - Look at the directory structure and key files referenced in the plan
+   - Understand the existing architecture that the plan interacts with
+
+### Generate Multiple-Choice Questions
+
+Create exactly 2 multiple-choice questions that test the user's understanding of the plan's **technical implementation details**. Each question must have exactly 4 options (A through D), with exactly 1 correct answer.
+
+- **QUESTION_1**: Should test whether the user knows what components/systems are being changed and how. Focus on the core technical mechanism or approach.
+- **QUESTION_2**: Should test whether the user understands how different parts of the implementation connect, what existing patterns are being followed, or what the key technical constraints are.
+
+**Good question characteristics:**
+- Derived from the plan's specific content, not generic templates
+- Test understanding of HOW things will be done, not just WHAT the plan describes
+- Not too low-level (no exact line numbers, exact syntax, or trivial details)
+- A user who has carefully read and understood the plan should pick the correct answer
+- A user who just skimmed the title or blindly accepted a generated plan would likely pick wrong
+- Wrong options should be plausible (not obviously absurd) but clearly incorrect to someone who read the plan
+
+**Example good questions:**
+- "How does this plan integrate the new validation step into the startup flow?" with options covering different integration approaches
+- "Which components need to change and why?" with options describing different component sets
+
+**Example bad questions (avoid these):**
+- "What is the plan about?" (too vague, tests nothing)
+- "What are the risks?" (generic, not about implementation)
+- "On which line does function X start?" (too low-level)
+
+### Generate Plan Summary
+
+Write a 2-3 sentence summary explaining what the plan does and how, suitable for educating a user who showed gaps in understanding. Focus on the technical approach, not just the goal.
+
+## Output Format
+
+You MUST output in this exact format, with each field on its own line:
+
+```
+QUESTION_1: <your first question>
+OPTION_1A: <option A text>
+OPTION_1B: <option B text>
+OPTION_1C: <option C text>
+OPTION_1D: <option D text>
+ANSWER_1: <A, B, C, or D>
+QUESTION_2: <your second question>
+OPTION_2A: <option A text>
+OPTION_2B: <option B text>
+OPTION_2C: <option C text>
+OPTION_2D: <option D text>
+ANSWER_2: <A, B, C, or D>
+PLAN_SUMMARY: <2-3 sentence technical summary>
+```
+
+## Important Notes
+
+- Always output all 13 fields; never skip any
+- Each ANSWER field must be exactly one letter: A, B, C, or D
+- Randomize the position of the correct answer (do not always put it in A or D)
+- The plan may be written in any language; generate questions and options in the same language as the plan
+- Focus on substance over format
+- If the plan is very short or lacks technical detail, derive questions from whatever implementation hints are available
+- Questions should feel like a friendly knowledge check, not an adversarial interrogation
+
+## Example Output
+
+```
+QUESTION_1: How does this plan integrate the new validation step into the existing build pipeline?
+OPTION_1A: By replacing the existing lint step with a combined lint-and-validate step
+OPTION_1B: By adding a new PostToolUse hook that runs between the lint step and the compilation step
+OPTION_1C: By modifying the compilation step to include inline validation checks
+OPTION_1D: By creating a standalone pre-build script that runs before any other steps
+ANSWER_1: B
+QUESTION_2: Why does the plan require changes to both the CLI parser and the state file, rather than just the CLI?
+OPTION_2A: The state file stores the original CLI arguments for audit logging purposes
+OPTION_2B: The CLI parser is deprecated and the state file is the new configuration mechanism
+OPTION_2C: The CLI parser adds the flag, the state file persists it across loop iterations, and the stop hook reads it at exit time
+OPTION_2D: Both files share a common schema and must always be updated together
+ANSWER_2: C
+PLAN_SUMMARY: This plan adds a build output validation step by hooking into the PostToolUse lifecycle event. It modifies the hook configuration to insert a format checker between linting and compilation, and updates the state file schema to track validation results across RLCR rounds.
+```
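The agent's contract is a flat `KEY: value` format, and `start-rlcr-loop.md` tells Claude to treat any missing field or out-of-range ANSWER as "quiz unavailable". Humanize performs that check inside the command prompt, not in a script; the following is only a hypothetical bash sketch of the same rule. `QUIZ_FIELDS`, `quiz_field`, `validate_quiz_output`, and `grade_answer` are illustrative names, and the sketch assumes each field sits on a single line, as the output format above requires.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not part of Humanize: check the quiz agent's output
# against the 13-field contract, then grade a user's letter choice.

# The 13 fields the agent must emit, in order.
QUIZ_FIELDS=(
  QUESTION_1 OPTION_1A OPTION_1B OPTION_1C OPTION_1D ANSWER_1
  QUESTION_2 OPTION_2A OPTION_2B OPTION_2C OPTION_2D ANSWER_2
  PLAN_SUMMARY
)

# Print the value of FIELD (the text after "FIELD: ") from the agent output.
quiz_field() {
  local field="$1" output="$2"
  printf '%s\n' "$output" | sed -n "s/^${field}: //p" | head -n 1
}

# Return 0 if every field is non-empty and both answers are a single A-D letter.
validate_quiz_output() {
  local output="$1" field value
  for field in "${QUIZ_FIELDS[@]}"; do
    value="$(quiz_field "$field" "$output")"
    if [[ -z "$value" ]]; then
      echo "Plan understanding quiz unavailable (missing $field)" >&2
      return 1
    fi
  done
  for field in ANSWER_1 ANSWER_2; do
    value="$(quiz_field "$field" "$output")"
    if [[ ! "$value" =~ ^[ABCD]$ ]]; then
      echo "Plan understanding quiz unavailable (bad $field: $value)" >&2
      return 1
    fi
  done
  return 0
}

# Print PASS if the user's letter matches the expected letter, else WRONG.
grade_answer() {
  if [[ "$1" == "$2" ]]; then echo "PASS"; else echo "WRONG"; fi
}
```

Because the check is strict, output such as `ANSWER_1: b` or a trailing space after the letter counts as malformed, and the command falls back to skipping the quiz. That matches the quiz's advisory, never-blocking design.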
diff --git a/commands/gen-plan.md b/commands/gen-plan.md
@@ -590,13 +590,15 @@ If all of the following are true:
 Then start work immediately by running:
 
 ```bash
-/humanize:start-rlcr-loop <output-plan-path>
+/humanize:start-rlcr-loop --skip-quiz <output-plan-path>
 ```
 
+The `--skip-quiz` flag is passed because the user has already demonstrated understanding of the plan through the gen-plan convergence discussion.
+
 If the command invocation is not available in this context, fall back to the setup script:
 
 ```bash
-"${CLAUDE_PLUGIN_ROOT}/scripts/setup-rlcr-loop.sh" --plan-file <output-plan-path>
+"${CLAUDE_PLUGIN_ROOT}/scripts/setup-rlcr-loop.sh" --skip-quiz --plan-file <output-plan-path>
 ```
 
 If the auto-start attempt fails, report the failure reason and provide the exact manual command for the user to run:
diff --git a/commands/start-rlcr-loop.md b/commands/start-rlcr-loop.md
@@ -1,10 +1,11 @@
 ---
 description: "Start iterative loop with Codex review"
-argument-hint: "[path/to/plan.md | --plan-file path/to/plan.md] [--max N] [--codex-model MODEL:EFFORT] [--codex-timeout SECONDS] [--track-plan-file] [--push-every-round] [--base-branch BRANCH] [--full-review-round N] [--skip-impl] [--claude-answer-codex] [--agent-teams] [--privacy]"
+argument-hint: "[path/to/plan.md | --plan-file path/to/plan.md] [--max N] [--codex-model MODEL:EFFORT] [--codex-timeout SECONDS] [--track-plan-file] [--push-every-round] [--base-branch BRANCH] [--full-review-round N] [--skip-impl] [--claude-answer-codex] [--agent-teams] [--yolo] [--skip-quiz] [--privacy]"
 allowed-tools:
   - "Bash(${CLAUDE_PLUGIN_ROOT}/scripts/setup-rlcr-loop.sh:*)"
   - "Read"
   - "Task"
+  - "AskUserQuestion"
 hide-from-slash-command-tool: "true"
 ---
 
@@ -57,9 +58,58 @@ If any condition fails, skip the pre-check and let the setup script handle path
 
 ---
 
+## Plan Understanding Quiz
+
+Before running the setup script, verify that the user genuinely understands what the plan will do. This is an advisory check -- it never blocks the loop on its own (the user always decides whether to proceed), but it catches "wishful thinking" users who blindly accepted a generated plan without reading it.
+
+**Skip this entire quiz if** any of these conditions are true:
+- `$ARGUMENTS` contains `--skip-impl` (no plan to quiz about)
+- `$ARGUMENTS` contains `--yolo` (user explicitly opted out of all pre-flight checks)
+- `$ARGUMENTS` contains `--skip-quiz` (user explicitly opted out of the quiz)
+- `$ARGUMENTS` contains `-h` or `--help` (just showing help)
+- No plan content is available (the compliance pre-check was skipped because no plan file path could be determined)
+
+### Run the quiz agent
+
+1. Reuse the plan content that was already read during the compliance pre-check above (do not re-read the file).
+
+2. Use the Task tool to invoke the `humanize:plan-understanding-quiz` agent (opus model):
+   ```
+   Task tool parameters:
+   - model: "opus"
+   - prompt: Include the plan file content and ask the agent to:
+     1. Explore the repository structure for context
+     2. Analyze the plan's technical implementation details
+     3. Generate 2 multiple-choice questions (4 options each) and a plan summary
+     4. Return in the structured format: QUESTION_1, OPTION_1A-D, ANSWER_1, QUESTION_2, OPTION_2A-D, ANSWER_2, PLAN_SUMMARY
+   ```
+
+3. **Parse the result**: Extract all 13 fields from the agent output (QUESTION_1, OPTION_1A through OPTION_1D, ANSWER_1, QUESTION_2, OPTION_2A through OPTION_2D, ANSWER_2, PLAN_SUMMARY). If the output is malformed (any field missing, or ANSWER_1 or ANSWER_2 not exactly one of A, B, C, D), warn: "Plan understanding quiz unavailable, continuing without it." and proceed to the Setup section below.
+
+### Ask questions and evaluate
+
+4. Use AskUserQuestion to present QUESTION_1 as a multiple-choice question with the 4 options (OPTION_1A through OPTION_1D). Compare the user's choice against ANSWER_1:
+   - If the user selected the correct answer, mark QUESTION_1 as **PASS**
+   - Otherwise, mark as **WRONG**
+
+5. Use AskUserQuestion to present QUESTION_2 as a multiple-choice question with the 4 options (OPTION_2A through OPTION_2D). Compare the user's choice against ANSWER_2 using the same criteria.
+
+### Decide whether to proceed
+
+6. **If both questions PASS**: Briefly acknowledge ("Your understanding of the plan looks solid. Proceeding with setup.") and continue to the Setup section below.
+
+7. **If one or both questions are WRONG**: Show the user the PLAN_SUMMARY, plus the correct answer to each question they missed, so they understand what the plan does. Then use AskUserQuestion with the question: "Would you like to proceed with the RLCR loop anyway, or stop and review the plan more carefully first?" with these choices:
+   - "Proceed with RLCR loop"
+   - "Stop and review the plan first"
+
+   - If the user chooses **"Proceed with RLCR loop"**: Continue to the Setup section below.
+   - If the user chooses **"Stop and review the plan first"**: Report "Stopping. Please review the plan file and re-run start-rlcr-loop when ready." and **stop the command**.
+
+---
+
 ## Setup
 
-If the pre-check passed (or was skipped), execute the setup script to initialize the loop:
+If the pre-check passed (or was skipped), and the quiz passed, was skipped, or ended with the user choosing to proceed, execute the setup script to initialize the loop:
 
 ```bash
 "${CLAUDE_PLUGIN_ROOT}/scripts/setup-rlcr-loop.sh" $ARGUMENTS
diff --git a/docs/usage.md b/docs/usage.md
@@ -11,6 +11,35 @@ Humanize creates an iterative feedback loop with two phases:
 
 The loop continues until all acceptance criteria are met or no issues remain.
 
+## Begin with the End in Mind
+
+Before the RLCR loop starts any work, Humanize runs a **Plan Understanding Quiz** -- a brief pre-flight check that verifies you genuinely understand the plan you are about to execute.
+
+### Why This Exists
+
+The most expensive failure in AI-assisted development is not a bug. It is running a 40-round RLCR loop on a plan you never actually read. We call this **wishful coding**: treating a generated plan like a wish -- toss it in, hope for the best, check back later.
+
+The problem is structural. An RLCR loop is an amplifier: it will faithfully execute whatever plan you give it. If the plan is wrong, the loop makes it wrong faster and at scale. If the plan is right but you do not understand it, you cannot course-correct when Codex raises questions, and the loop drifts.
+
+Understanding your plan before execution is not optional overhead. It is the single highest-leverage thing you can do to ensure the loop succeeds.
+
+### How the Quiz Works
+
+When you run `start-rlcr-loop`, an independent agent analyzes the plan and generates two multiple-choice questions about the plan's technical implementation details:
+
+1. **What components are changing and how?** -- Tests whether you know the core mechanism.
+2. **How do the pieces connect?** -- Tests whether you understand the architecture being modified.
+
+If you answer both correctly, the loop proceeds immediately. If you miss one or both, Humanize explains what the plan actually does and offers a choice: proceed anyway, or stop and review.
+
+The quiz is advisory, not a gate. You always have the option to proceed. But that moment of friction -- the two seconds it takes to read the question and realize you do not know the answer -- is the entire point.
+
+### Skipping the Quiz
+
+- `--skip-quiz` -- Skip the quiz only. The rest of the RLCR loop behaves normally.
+- `--yolo` -- Skip the quiz AND let Claude answer Codex's open questions directly (`--claude-answer-codex`). This is full automation mode for users who have already reviewed the plan and want to hand over complete control.
+- Plans started via `gen-plan --auto-start-rlcr-if-converged` skip the quiz automatically, because the gen-plan convergence discussion already verified the user's understanding.
+
 ## Commands
 
 | Command | Purpose |
@@ -50,6 +79,9 @@ OPTIONS:
   --agent-teams          Enable Claude Code Agent Teams mode for parallel development.
                          Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 environment variable.
                          Claude acts as team leader, splitting tasks among team members.
+  --yolo                 Skip the Plan Understanding Quiz and let Claude answer Codex
+                         Open Questions directly. Alias for --skip-quiz --claude-answer-codex.
+  --skip-quiz            Skip the Plan Understanding Quiz only (without other changes).
   -h, --help             Show help message
 ```
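
The skip rules in `start-rlcr-loop.md` are phrased as "`$ARGUMENTS` contains ...". A literal substring test would also fire on a plan path such as `plan-history.md`, which contains `-h`, so the hypothetical bash sketch below matches whole arguments instead. `should_skip_quiz` is an illustrative name, not a Humanize function, and the "no plan content is available" condition is left out because it depends on the compliance pre-check rather than on the arguments.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not part of Humanize: decide whether to skip the
# Plan Understanding Quiz by matching whole arguments, so a plan path such
# as plan-history.md is not mistaken for the -h flag.

# Usage: should_skip_quiz $ARGUMENTS   (left unquoted so it splits into words)
# Exit status 0 means "skip the quiz"; 1 means "run the quiz".
should_skip_quiz() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --skip-impl|--yolo|--skip-quiz|-h|--help)
        return 0   # any one opt-out flag is enough
        ;;
    esac
  done
  return 1         # no opt-out flag found: run the quiz
}
```

With this rule, `plan-history.md --max 20` still runs the quiz, while `plan-history.md --yolo` skips it.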
 
diff --git a/scripts/setup-rlcr-loop.sh b/scripts/setup-rlcr-loop.sh
@@ -90,6 +90,13 @@ OPTIONS:
   --agent-teams        Enable Claude Code Agent Teams mode for parallel development.
                        Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 environment variable.
                        Claude acts as team leader, splitting tasks among team members.
+  --yolo               Skip the Plan Understanding Quiz and let Claude answer Codex
+                       Open Questions directly. Convenience alias for --skip-quiz
+                       --claude-answer-codex. Use when you trust the plan and want
+                       maximum automation.
+  --skip-quiz          Skip the Plan Understanding Quiz only (without other behavioral
+                       changes). The quiz is an advisory pre-flight check that verifies
+                       you understand the plan before committing to an RLCR loop.
   --allow-empty-bitlesson-none
                        Allow BitLesson delta with action:none even with no new entries (default)
   --require-bitlesson-entry-for-none
@@ -122,6 +129,8 @@ EXAMPLES:
   /humanize:start-rlcr-loop docs/impl.md --max 20
   /humanize:start-rlcr-loop plan.md --codex-model ${DEFAULT_CODEX_MODEL}:${DEFAULT_CODEX_EFFORT}
   /humanize:start-rlcr-loop plan.md --codex-timeout 7200  # 2 hour timeout
+  /humanize:start-rlcr-loop plan.md --yolo              # skip quiz, full automation
+  /humanize:start-rlcr-loop plan.md --skip-quiz          # skip quiz only
 
 STOPPING:
   - /humanize:cancel-rlcr-loop   Cancel the active loop
@@ -237,6 +246,14 @@ while [[ $# -gt 0 ]]; do
             AGENT_TEAMS="true"
             shift
             ;;
+        --yolo)
+            ASK_CODEX_QUESTION="false"
+            shift
+            ;;
+        --skip-quiz)
+            # No-op in setup script; quiz logic lives in command markdown
+            shift
+            ;;
         --allow-empty-bitlesson-none)
             BITLESSON_ALLOW_EMPTY_NONE="true"
             shift
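Because `start-rlcr-loop` forwards `$ARGUMENTS` to `setup-rlcr-loop.sh` unchanged, the script has to accept `--skip-quiz` even though it does nothing there; otherwise the flag would presumably fall through to the unknown-option handling. The trimmed-down loop below is a hypothetical sketch of how the two new flags behave at the script level. It assumes that `ASK_CODEX_QUESTION` defaults to `"true"` and that `--claude-answer-codex` sets it to `"false"` (consistent with `--yolo` being documented as its alias, but not shown in this diff), and it treats every other argument as the plan path. `parse_quiz_flags` and `PLAN_FILE` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical, trimmed-down sketch of the option loop in setup-rlcr-loop.sh.
# ASSUMPTIONS (not shown in this diff): ASK_CODEX_QUESTION defaults to
# "true", and --claude-answer-codex sets it to "false".

parse_quiz_flags() {
  ASK_CODEX_QUESTION="true"   # assumed default: Codex questions go to the user
  PLAN_FILE=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --claude-answer-codex|--yolo)
        # This is --yolo's only effect inside the script; its quiz-skipping
        # half already happened in the command markdown.
        ASK_CODEX_QUESTION="false"
        ;;
      --skip-quiz)
        # Accepted but ignored: the quiz ran (or was skipped) before setup.
        ;;
      *)
        PLAN_FILE="$1"        # simplification: any other argument is the plan
        ;;
    esac
    shift
  done
}
```

In this sketch `--yolo` and `--claude-answer-codex` leave the script in the same state, which is what the help text's "alias" wording implies.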
diff --git a/skills/humanize-rlcr/SKILL.md b/skills/humanize-rlcr/SKILL.md
@@ -107,6 +107,8 @@ Pass these through `setup-rlcr-loop.sh`:
 | `--push-every-round` | Require push each round | false |
 | `--claude-answer-codex` | Let Claude answer open questions directly | false |
 | `--agent-teams` | Enable agent teams mode | false |
+| `--yolo` | Skip quiz and enable `--claude-answer-codex` | false |
+| `--skip-quiz` | Skip Plan Understanding Quiz (implicit in skill mode) | false |
 
 Review phase `codex review` runs with `gpt-5.4:high`.
 
diff --git a/skills/humanize/SKILL.md b/skills/humanize/SKILL.md
@@ -96,6 +96,8 @@ Transforms a rough draft document into a structured implementation plan with:
 - `--push-every-round` - Require git push after each round
 - `--claude-answer-codex` - Let Claude answer Codex Open Questions directly (default is AskUserQuestion)
 - `--agent-teams` - Enable Agent Teams mode
+- `--yolo` - Skip the Plan Understanding Quiz and enable `--claude-answer-codex`
+- `--skip-quiz` - Skip the Plan Understanding Quiz only
 - `--privacy` - Disable methodology analysis at loop exit (default: analysis enabled)
 
 ### Cancel RLCR Loop
diff --git a/tests/test-skill-monitor.sh b/tests/test-skill-monitor.sh
diff --git a/tests/test-templates-comprehensive.sh b/tests/test-templates-comprehensive.sh