feat(M292): Agent-Text-Loop detector — closes M291 Gap 3 by noahgift · Pull Request #260 · paiml/claude-code-parity-apr

noahgift · 2026-05-21T01:11:10Z

Summary

- New `ArenaOutcome::AgentTextLoop { consecutive_text_turns, last_text_excerpt }` variant: captures the "talking but not acting" failure class separately from `OracleFailedAfterMaxTurns`.
- New `ArenaSession::with_max_consecutive_text_turns(cap)` builder and `--max-consecutive-text-turns` CLI flag. Opt-in (default 0 = disabled, preserving the M287/M291 baseline).
- New `AgentTextLoopState` rolling counter, parallel to `ComplianceTrapState`.
- 7 new tests in `session::tests` (state machine, plus integration via `MockDriver`).
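The counter semantics described in this PR can be sketched as a small state machine: a text invocation increments the streak, a non-text invocation resets it, reaching the cap yields `AgentTextLoop`, and cap 0 disables the detector. This is a minimal illustration, not the `ccpa-arena` source. `InvocationKind`, `new`, `observe`, and the 80-character excerpt length are hypothetical; only `AgentTextLoopState` and the variant's field names come from this PR.

```rust
/// Per-turn invocation kind (hypothetical stand-in for the trace's
/// `invocation.kind`: "text" vs. a tool call).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InvocationKind {
    /// Prose-only turn; carries the emitted text.
    Text(String),
    /// Any turn that invoked a tool.
    ToolCall,
}

/// Only the variant added by this PR is modeled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArenaOutcome {
    AgentTextLoop {
        consecutive_text_turns: u32,
        last_text_excerpt: String,
    },
}

/// Maximum characters kept from the last text turn (hypothetical value).
const EXCERPT_CHARS: usize = 80;

/// Rolling counter of consecutive text-only turns.
/// `Default` gives cap 0, i.e. the detector is disabled.
#[derive(Debug, Default)]
pub struct AgentTextLoopState {
    cap: u32,
    consecutive_text_turns: u32,
}

impl AgentTextLoopState {
    pub fn new(cap: u32) -> Self {
        Self { cap, consecutive_text_turns: 0 }
    }

    /// Records one turn. Returns `Some(AgentTextLoop)` once the number of
    /// consecutive text turns reaches the cap; otherwise `None`.
    pub fn observe(&mut self, kind: &InvocationKind) -> Option<ArenaOutcome> {
        if self.cap == 0 {
            return None; // cap 0: detector disabled, baseline behavior
        }
        match kind {
            InvocationKind::Text(body) => {
                self.consecutive_text_turns += 1;
                if self.consecutive_text_turns >= self.cap {
                    // Keep a short, char-boundary-safe excerpt for the report.
                    let last_text_excerpt: String = body.chars().take(EXCERPT_CHARS).collect();
                    Some(ArenaOutcome::AgentTextLoop {
                        consecutive_text_turns: self.consecutive_text_turns,
                        last_text_excerpt,
                    })
                } else {
                    None
                }
            }
            InvocationKind::ToolCall => {
                self.consecutive_text_turns = 0; // any tool use breaks the streak
                None
            }
        }
    }
}
```

With cap 3, the sequence text, text, tool call, text, text, text yields the outcome only on the final turn, because the tool call resets the streak.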

Why

In M291, V1_004 sub-bench B fixture 1 recorded 20 consecutive text-only turns on Qwen3-Coder-30B: every turn had `invocation.kind = "text"` and `result.kind = "skipped"`. The agent emitted prose and Markdown blocks for the entire 20-turn budget without ever touching the file system.

Without M292, bench operators spend ~8 hr of wall-clock bench time discovering a pattern that could be diagnosed at turn 5 (~2 hr, a 4× speedup). The post-hoc `OracleFailedAfterMaxTurns` outcome also conflates "the agent worked but produced wrong output" with "the agent never engaged the toolchain." M292 separates the two.
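The ~2 hr and 4× figures follow from the ~8 hr, 20-turn run if every turn costs about the same wall time. That uniform-cost assumption is ours; real turn latency varies. A sketch with a hypothetical `estimated_hours` helper:

```rust
/// Estimated wall-clock hours if a run stops after `stop_turn` of
/// `budget_turns`, ASSUMING every turn costs the same share of `total_hours`.
fn estimated_hours(total_hours: f64, budget_turns: u32, stop_turn: u32) -> f64 {
    // Multiply before dividing so that 8 * 5 / 20 stays exact in f64.
    total_hours * f64::from(stop_turn) / f64::from(budget_turns)
}
```

Under the same assumption, stopping at turn 5 saves ~6 hr per fixture, or ~120 hr if all 20 fixtures hit the cap at turn 5.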

What this does NOT do

- NOT auto-enable in `scripts/phase-6-bench.sh`: the operator decides per run.
- NOT change `compliance_cost_ratio` / `recovery_rate` semantics: `AgentTextLoop` is a new variant, and aggregates treat it as not `oracle_passed`.
- NOT discharge V1_004: `student_pass_rate > 0` is still the bar.
- NOT bump the M-counter: Phase 6 is in an active bench-run state.
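How aggregates can treat the new variant as "not oracle_passed" is sketched below, assuming a pass rate defined as passed outcomes over all outcomes. `OraclePassed`, `oracle_passed()`, and `student_pass_rate()` as written here are hypothetical stand-ins; the real aggregate code in `ccpa-arena` may differ.

```rust
/// Simplified outcome enum. `OraclePassed` is a hypothetical name for the
/// success case; the other two variants are named in this PR.
#[allow(dead_code)] // fields are illustrative only
#[derive(Debug, Clone, PartialEq, Eq)]
enum ArenaOutcome {
    OraclePassed,
    OracleFailedAfterMaxTurns,
    AgentTextLoop {
        consecutive_text_turns: u32,
        last_text_excerpt: String,
    },
}

impl ArenaOutcome {
    /// Only an explicit oracle pass counts; `AgentTextLoop` is "not oracle_passed".
    fn oracle_passed(&self) -> bool {
        matches!(self, ArenaOutcome::OraclePassed)
    }
}

/// Fraction of runs whose oracle passed (assumed definition).
fn student_pass_rate(outcomes: &[ArenaOutcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let passed = outcomes.iter().filter(|o| o.oracle_passed()).count();
    passed as f64 / outcomes.len() as f64
}
```

In this sketch an `AgentTextLoop` run stays in the denominator, so it lowers the rate exactly as `OracleFailedAfterMaxTurns` does, and the aggregate needs no new case.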

Test plan

- `cargo test -p ccpa-arena --lib agent_text_loop`: 7/7 new tests pass
- `cargo test -p ccpa-arena --lib`: all 146 lib tests pass
- `cargo clippy -p ccpa-arena --lib --tests --bins -- -D warnings`: clean
- `cargo fmt --all -- --check`: clean
- `bash scripts/check-doc-drift.sh`: 17/17 drift classes pass
- CI green

Cross-references

- M291 evidence (motivation): `evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md`
- M292 evidence (this PR): `evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md`
- aprender#1853 (M291 Gap 1 fix; in flight)

🤖 Generated with Claude Code

…opt-in cap

Closes M291 Gap 3 (arena driver doesn't recover from skipped turns).

Motivated by the V1_004 sub-bench B fixture-1 pattern on Qwen3-Coder-30B-A3B: 20 consecutive text-only turns (every turn `invocation.kind = "text"`, every `result.kind = "skipped"`). The agent never invoked any tool.

This PR adds:

1. `ArenaOutcome::AgentTextLoop { consecutive_text_turns, last_text_excerpt }` variant: captures the "talking but not acting" failure class distinctly from `OracleFailedAfterMaxTurns`.
2. `ArenaSession::with_max_consecutive_text_turns(cap)` builder. cap=0 (default) disables the detector, preserving the M287/M291 baseline.
3. `AgentTextLoopState` rolling counter (parallel to `ComplianceTrapState`): a text invocation increments it, a non-text invocation resets it, and reaching the cap triggers `AgentTextLoop`.
4. `--max-consecutive-text-turns` CLI flag on `ccpa-arena-bench` (default 0).
5. 7 new tests in `session::tests`.

Opt-in by design: enabling it by default would shift outcome distributions for existing evidence comparisons. The operator decides per run whether to trade early-bailout savings (~6 hr × 20 fixtures for future V1_004 runs) against the uniform 20-turn dispatch baseline.

What this does NOT do:
- Auto-enable in `scripts/phase-6-bench.sh` (operator-coordinated decision)
- Change `compliance_cost_ratio` / `recovery_rate` aggregate semantics
- Discharge V1_004 (still requires `student_pass_rate > 0`)
- Bump the M-counter on cross-reference surfaces (Phase 6 in active bench run)

All 146 ccpa-arena lib tests pass. Doc-drift detector: 17/17.

Refs:
- M291 evidence: `evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md`
- aprender#1853 (M291 Gap 1 fix; in flight)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
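The opt-in wiring from CLI flag to builder might look like the following. This is a stripped-down, hypothetical `ArenaSession` holding only the cap, plus a hand-rolled parser over plain `String` arguments; the real `ccpa-arena-bench` binary may use a different argument library and carries far more session state. Only the flag name, the builder name, and the default of 0 come from this PR.

```rust
/// Hypothetical, minimal session: only the detector cap is modeled.
#[derive(Debug, Default)]
pub struct ArenaSession {
    /// 0 (the `Default`) = detector disabled.
    max_consecutive_text_turns: u32,
}

impl ArenaSession {
    /// Consuming builder: returns the session with the cap set.
    pub fn with_max_consecutive_text_turns(mut self, cap: u32) -> Self {
        self.max_consecutive_text_turns = cap;
        self
    }
}

/// Hypothetical parser for `--max-consecutive-text-turns N`.
/// Returns 0 (disabled) when the flag is absent.
fn parse_max_consecutive_text_turns(args: &[String]) -> Result<u32, String> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "--max-consecutive-text-turns" {
            let value = iter
                .next()
                .ok_or("missing value for --max-consecutive-text-turns")?;
            return value
                .parse::<u32>()
                .map_err(|e| format!("invalid cap {value:?}: {e}"));
        }
    }
    Ok(0)
}
```

Because the default cap is 0, `ArenaSession::default()` keeps the detector off unless the operator passes the flag, which matches the opt-in design above.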

Adds:
- `book/`: mdBook source for paiml.github.io/claude-code-parity-apr
- `.github/workflows/book.yml`: CI build + GitHub Pages auto-deploy
- `README.md` restructured as a professional landing page (badges row, book callout, empirical highlight section, deep links to book chapters)
- `.gitignore`: `book/book/` (generated artifact)

Book structure (28 chapters):
- Introduction
- Overview: what is CCPA, methodology, two paths, architecture
- Static path: trace schema, differ, fixtures, bidirectional sensitivity
- Arena: overview, phase 5, phase 6, outcome variants
- Falsification gates: 20 gates, source-of-truth, behavioral parity, status flow
- Empirical findings: V1_004 chain (M286, M287, M291, M292, M294)
- Reference: CLI, trace schema, contract YAML, gate IDs
- Appendix: academic basis, milestone history, glossary

Build locally: `mdbook build book/` -> `book/book/index.html`

Deploy: GitHub Pages auto-deploys on push to main when `book/` changes.

Doc-drift detector: 17/17 drift classes pass.

Refs:
- `evidence/phase-6/v1004-*.md` (all sourced into book chapters)
- CCPA#259 M291, #260 M292, #261 M293, #262 M294 scope

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 7d4c050 into main May 21, 2026
1 check failed

noahgift deleted the m292-agent-text-loop-detection branch May 21, 2026 01:13

noahgift mentioned this pull request May 22, 2026

docs(M295): professional README + mdBook companion at paiml.github.io/claude-code-parity-apr #263

Merged

6 tasks

noahgift merged 1 commit into main from m292-agent-text-loop-detection
