docs(M296): three-month-break closeout + M-counter M280→M296 by noahgift · Pull Request #264 · paiml/claude-code-parity-apr

noahgift · 2026-05-22T07:56:35Z

Summary

Operator-directed three-month break begins. V1_004 still open but empirically narrowed.

Headline finding from this session (M286-M295)

Through 12 PRs across the V1_004 chain, four candidate variables were tested as the load-bearing one behind 0%-tool_call emission on Phase 6 fixtures:

Variable	Test	Result
Inference stack quality	M286 KV cache + 3-knob + EOS + clean_chat_output	Necessary fix; not sufficient
Active params count	3B (30B-A3B-MoE) vs 7B (dense Qwen2.5-Coder-7B-Instruct)	Both 0 tool_calls — refuted
MoE vs dense	qwen3_moe vs qwen2 dense	Same pattern — refuted
Few-shot prompt	3 concrete `<tool_call>` examples	No shift — refuted
Qwen-Coder finetune family	Smoke against Qwen3-30B-A3B-Instruct-2507 (non-Coder)	Bare tool_call JSON in 20 tokens — confirmed at smoke level

Bench-level partial refutation

F1 of the non-Coder Instruct bench: driver_error at turn 8, tool_use_count=0, 8 Markdown turns. The smoke-vs-bench divergence surfaces a second-order constraint: apr code's multi-turn prompt context (rendered history with previous turn's Markdown + "### Continue:" suffix) self-recursively reinforces the Markdown distribution even on a finetune that emits tool_call JSON in 1-shot smoke.

Three resumption paths

Scoped in evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md:

(a) Investigate apr code's render_history + per-turn prompt construction
(b) Post-decode Markdown→tool_call parser in apr code (unlocks Qwen-Coder family for V1_004 as written)
(c) V1_005 against different model class on Lambda Labs GPU (Llama-3.3-70B, DeepSeek-V3, Qwen3-32B-Instruct dense)

What this PR ships

evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md — comprehensive resumption playbook
Partial bench archives preserved in-repo for historical record (5 dirs, ~33MB total)
M-counter M280→M296 bumped on all 5 surfaces (README, CONTRIBUTING, top spec, status-snapshots, milestones)
M296 row + M286-M295 rollup row added to milestones-m101-m111.md
.gitignore — exclude .claude/ runtime artifacts

Project handoff state

✅ No in-flight benches (all background processes killed)
✅ No orphan apr serve / apr code processes
✅ Book live at https://paiml.github.io/claude-code-parity-apr/
✅ Doc-drift detector: 17/17 classes clean
⚠️ V1_004 still open (operator-coordinated decisions remaining)

Test plan

bash scripts/check-doc-drift.sh — 17/17 drift classes clean
All 5 M-counter surfaces synchronized to M296
No orphan processes verified via pgrep
CI green

Cross-references

M286 evidence/phase-6/m32d-shipped-2026-05-20.md
M287 evidence/phase-6/m32d-bench-pattern-2026-05-20.md
M291 evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md
M292 evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md
M295 README + book PR (docs(M295): professional README + mdBook companion at paiml.github.io/claude-code-parity-apr #263)
aprender#1832, #1837, #1842, #1844, #1846, #1849, #1852, #1853 (all merged)

🤖 Generated with Claude Code

…unter M280 -> M296 Operator-directed three-month break begins. V1_004 still open but empirically narrowed; bench-side smoke confirms non-Coder Qwen3-30B-A3B- Instruct-2507 emits clean tool_call JSON (categorically different from Coder family), but bench-level partial refutation: F1 emits 8 Markdown turns with tool_use_count=0. Second-order constraint identified — apr code's multi-turn prompt context (rendered history + "### Continue:") self-recursively reinforces Markdown pattern. Adds: - evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md — the comprehensive resumption-playbook closeout (~3000 LOC). - evidence/under-contract-30b-instruct-2507-partial-2026-05-22/ — F1 captures from the M294 dispatch (driver_error at turn 8, tool_use=0). - evidence/under-contract-7b-coder-partial-2026-05-22/ — Qwen2.5-Coder-7B partial (17/20, 0 oracle_passed, hypothesis-3 refutation evidence). - evidence/under-contract-30b-sub-bench-b-partial-2026-05-21/ — M291 sub-bench B partial (9 fixtures, pattern-shift evidence). - evidence/under-contract-30b-instruct-tainted-2026-05-22/ — artifacts from killed smoke-test interruptions (3 driver_errors due to my kills). - Various M270/M280-era partial archives backfilled into the repo. Bumps M-counter M280 -> M296 on all 5 surfaces: - README.md (At-a-glance table) - CONTRIBUTING.md (status line) - docs/specifications/claude-code-parity-apr-poc.md (Status + Completeness header) - docs/specifications/completeness-assessment.md (H-level stamps) - docs/specifications/status-snapshots.md (Status snapshot blockquote + Run 1 row) Adds M296 row + M286-M295 rollup row to docs/specifications/milestones-m101-m111.md. Three resumption paths scoped in the closeout evidence doc: (a) investigate render_history + per-turn prompt construction (b) post-decode Markdown -> tool_call parser in apr code (c) V1_005 against different model class on Lambda Labs GPU Project state: no in-flight benches, no orphan processes, book live at paiml.github.io/claude-code-parity-apr. V1_004 still open. Doc-drift detector: 17/17 classes clean. Refs: - M286 evidence/phase-6/m32d-shipped-2026-05-20.md - M287 evidence/phase-6/m32d-bench-pattern-2026-05-20.md - M288-M290 v1004-3knob-{dispatch-recipe,plumbing-shipped}-2026-05-20.md + v1004-followup-snapshot - M291 v1004-sub-bench-b-pattern-shift-2026-05-21.md - M292 v1004-agent-text-loop-detector-2026-05-21.md - M293 + M294 + M295 PR trail (CCPA#259-263) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 00e41c1 into main May 22, 2026
1 check failed

noahgift deleted the m296-three-month-break-closeout branch May 22, 2026 07:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(M296): three-month-break closeout + M-counter M280→M296#264

docs(M296): three-month-break closeout + M-counter M280→M296#264
noahgift merged 1 commit into
mainfrom
m296-three-month-break-closeout

noahgift commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 22, 2026

Summary

Headline finding from this session (M286-M295)

Bench-level partial refutation

Three resumption paths

What this PR ships

Project handoff state

Test plan

Cross-references

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant