feat(M293): wire PHASE6_MAX_CONSECUTIVE_TEXT_TURNS env var into phase-6-bench.sh #261
Merged
…-6-bench.sh

Exposes M292's `--max-consecutive-text-turns` CLI flag at the operator script surface.

Default 0 (disabled) preserves all historical evidence comparisons (M270/M280/M287/M291). Turning on the detector changes outcome distributions, so it must stay off by default to keep existing baselines apples-to-apples.

Future bench operators can enable it with:

    PHASE6_MAX_CONSECUTIVE_TEXT_TURNS=5 bash scripts/phase-6-bench.sh

to short-circuit text-only-loop fixtures at turn 5 instead of paying the full 20-turn × ~72s/turn ≈ 24min/fixture cost. Stopping at turn 5 costs 5 × ~72s ≈ 6min/fixture, so on a 20-fixture corpus the bench wall time drops from ~8hr to ~2hr: ~6hr saved per future V1_004 follow-up dispatch where the model is text-only-locked.

Companion to aprender's V1_004 chain:
- aprender#1849 few-shot prompt
- aprender#1852 EOS stop_token + clean_chat_output wire-up
- aprender#1853 (in flight) clean_chat_output start-of-string strip
- M292 (just merged) ArenaOutcome::AgentTextLoop variant + opt-in detector

No M-counter bump: Phase 6 is in active bench-run state; surface bumps wait for V1_004 discharge or final pattern conclusion.

Refs:
- evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
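The env-var-to-flag mapping described in this PR can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `scripts/phase-6-bench.sh`: the helper name `text_turn_flag` and the input validation are assumptions, and the real script may build its argument list differently. Only the env var name, the flag name, the default of 0, and the "pass the flag when N>0" rule come from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the wiring; NOT the real scripts/phase-6-bench.sh.
set -eu

# Print the extra CLI argument implied by PHASE6_MAX_CONSECUTIVE_TEXT_TURNS.
# Unset, empty, or 0 means "detector disabled": nothing is printed, so the
# bench invocation stays identical to the historical baselines.
text_turn_flag() {
  n="${PHASE6_MAX_CONSECUTIVE_TEXT_TURNS:-0}"

  # Assumed validation step: reject anything that is not a
  # non-negative integer.
  case "$n" in
    ''|*[!0-9]*)
      echo "PHASE6_MAX_CONSECUTIVE_TEXT_TURNS must be a non-negative integer, got: $n" >&2
      return 1
      ;;
  esac

  # Only emit the flag when the operator opted in with N > 0.
  if [ "$n" -gt 0 ]; then
    printf '%s\n' "--max-consecutive-text-turns=$n"
  fi
}

# Dry run: show which extra argument a bench invocation would receive.
flag="$(text_turn_flag)"
echo "extra args: ${flag:-<none>}"
```

Running the sketch with `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS=5` set prints the flag with `N=5`; running it with the variable unset prints `<none>`, which mirrors the default-off behavior that keeps earlier bench evidence comparable.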
noahgift added a commit that referenced this pull request on May 22, 2026
Adds:
- book/: mdBook source for paiml.github.io/claude-code-parity-apr
- .github/workflows/book.yml: CI build + GitHub Pages auto-deploy
- README.md restructured for professional landing (badges row, book callout, empirical highlight section, deep-links to book chapters)
- .gitignore: book/book/ (generated artifact)

Book structure (28 chapters):
- Introduction
- Overview: what is CCPA, methodology, two paths, architecture
- Static path: trace schema, differ, fixtures, bidirectional sensitivity
- Arena: overview, phase 5, phase 6, outcome variants
- Falsification gates: 20 gates, source-of-truth, behavioral parity, status flow
- Empirical findings: V1_004 chain (M286, M287, M291, M292, M294)
- Reference: CLI, trace schema, contract YAML, gate IDs
- Appendix: academic basis, milestone history, glossary

Build locally: mdbook build book/ -> book/book/index.html

Deploy: GitHub Pages auto-deploys on push to main when book/ changes.

Doc-drift detector: 17/17 drift classes pass.

Refs:
- evidence/phase-6/v1004-*.md (all sourced into book chapters)
- CCPA#259 M291, #260 M292, #261 M293, #262 M294 scope

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS` env var (default `0` = disabled) that surfaces M292's `--max-consecutive-text-turns` CLI flag at the operator script.
- `--max-consecutive-text-turns=N` when N>0.

Why
M292 added `ArenaOutcome::AgentTextLoop` + an opt-in detector at the `ArenaSession` level. This PR is the operator-facing surface: it enables `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS=5 bash scripts/phase-6-bench.sh` to short-circuit text-only-loop fixtures at turn 5 instead of paying ~24min/fixture × 20 fixtures ≈ 8hr of wall time.

What this does NOT do
Test plan
- `bash -n scripts/phase-6-bench.sh`: syntax clean
- `cargo test -p ccpa-arena --lib`: 146 lib tests still pass (no code touched, just bench script)
- `bash scripts/check-doc-drift.sh`: 17/17 drift classes

Cross-references
- `evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md`
- `evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md`

🤖 Generated with Claude Code