docs(M296): three-month-break closeout + M-counter M280→M296 #264

Merged: noahgift merged 1 commit into main from m296-three-month-break-closeout on May 22, 2026

@noahgift

Summary

Operator-directed three-month break begins. V1_004 still open but empirically narrowed.

Headline finding from this session (M286-M295)

Through 12 PRs across the V1_004 chain, five candidate variables were tested as the load-bearing one behind 0%-tool_call emission on Phase 6 fixtures:

| Variable | Test | Result |
|---|---|---|
| Inference stack quality | M286 KV cache + 3-knob + EOS + clean_chat_output | Necessary fix; not sufficient |
| Active params count | 3B (30B-A3B-MoE) vs 7B (dense Qwen2.5-Coder-7B-Instruct) | Both 0 tool_calls; refuted |
| MoE vs dense | qwen3_moe vs qwen2 dense | Same pattern; refuted |
| Few-shot prompt | 3 concrete <tool_call> examples | No shift; refuted |
| Qwen-Coder finetune family | Smoke against Qwen3-30B-A3B-Instruct-2507 (non-Coder) | Bare tool_call JSON in 20 tokens; confirmed at smoke level |

Bench-level partial refutation

F1 of the non-Coder Instruct bench ended in driver_error at turn 8 with tool_use_count=0 after 8 Markdown turns. The smoke-vs-bench divergence surfaces a second-order constraint: apr code's multi-turn prompt context (rendered history containing the previous turn's Markdown, plus a "### Continue:" suffix) feeds the model's own Markdown back to it and so reinforces the Markdown distribution, even on a finetune that emits tool_call JSON in a 1-shot smoke test.
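
The feedback loop described above can be illustrated with a toy model. This is a minimal sketch, not apr code's actual implementation: `render_prompt` and `markdown_share` are hypothetical names, and the assumption that earlier assistant turns are replayed verbatim before the "### Continue:" suffix is inferred from the description, not confirmed by the source.

```python
CONTINUE_SUFFIX = "### Continue:"  # suffix quoted in the PR description


def render_prompt(task: str, prior_turns: list[str]) -> str:
    """Hypothetical stand-in for per-turn prompt construction (not apr code's real code)."""
    # Assumption: earlier assistant output is replayed verbatim, then the suffix.
    return "\n\n".join([task, *prior_turns, CONTINUE_SUFFIX])


def markdown_share(prompt: str) -> float:
    """Fraction of non-empty lines that start like a Markdown heading or bullet."""
    lines = [ln.strip() for ln in prompt.splitlines() if ln.strip()]
    marked = [ln for ln in lines if ln.startswith(("#", "- ", "* "))]
    return len(marked) / len(lines)


markdown_turn = "## Plan\n- read the file\n- edit the function"
history: list[str] = []
shares = []
for _ in range(4):
    shares.append(markdown_share(render_prompt("Fix the failing test.", history)))
    history.append(markdown_turn)  # the model answers in Markdown again
```

In this toy setup the Markdown share of the prompt rises on every turn, and the suffix is itself a Markdown heading, so each Markdown answer makes the next prompt look more like a Markdown document.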

Three resumption paths

Scoped in evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md:

  • (a) Investigate apr code's render_history + per-turn prompt construction
  • (b) Post-decode Markdown→tool_call parser in apr code (unlocks Qwen-Coder family for V1_004 as written)
  • (c) V1_005 against different model class on Lambda Labs GPU (Llama-3.3-70B, DeepSeek-V3, Qwen3-32B-Instruct dense)
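
Path (b) could look roughly like the sketch below. It is only an illustration under stated assumptions: the source does not show what the Markdown turns contain, so this assumes the model places a JSON object with `name` and `arguments` keys inside a fenced block, and `extract_tool_calls` and `to_tool_call_text` are hypothetical helpers, not part of apr code.

```python
import json
import re

# Assumption: the call appears as a JSON object inside a Markdown code fence.
FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*?)\n```", re.DOTALL)


def extract_tool_calls(markdown: str) -> list[dict]:
    """Return every fenced JSON object that has a string name and dict arguments."""
    calls = []
    for body in FENCE_RE.findall(markdown):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            continue  # not JSON, so treat it as ordinary prose or code
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            calls.append({"name": obj["name"], "arguments": obj["arguments"]})
    return calls


def to_tool_call_text(call: dict) -> str:
    """Wrap one recovered call in <tool_call> tags (the tag named in the PR table)."""
    return "<tool_call>\n" + json.dumps(call) + "\n</tool_call>"
```

A parser like this runs after decoding, so it leaves the prompt and the model untouched; the trade-off is that it only recovers calls the model happened to write out in a parseable shape.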

What this PR ships

  • evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md — comprehensive resumption playbook
  • Partial bench archives preserved in-repo for historical record (5 dirs, ~33MB total)
  • M-counter M280→M296 bumped on all 5 surfaces (README, CONTRIBUTING, top spec, status-snapshots, milestones)
  • M296 row + M286-M295 rollup row added to milestones-m101-m111.md
  • .gitignore — exclude .claude/ runtime artifacts

Project handoff state

  • ✅ No in-flight benches (all background processes killed)
  • ✅ No orphan apr serve / apr code processes
  • ✅ Book live at https://paiml.github.io/claude-code-parity-apr/
  • ✅ Doc-drift detector: 17/17 classes clean
  • ⚠️ V1_004 still open (operator-coordinated decisions remaining)

Test plan

  • bash scripts/check-doc-drift.sh — 17/17 drift classes clean
  • All 5 M-counter surfaces synchronized to M296
  • No orphan processes verified via pgrep
  • CI green
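
The pgrep check in the test plan can be scripted. The following is a minimal sketch, assuming the processes of interest are the ones whose command line contains "apr serve" or "apr code"; `matching_pids` and `assert_no_orphans` are hypothetical helper names, and the `run` parameter exists only so the check can be exercised without a real pgrep.

```python
import subprocess


def matching_pids(pattern: str, run=subprocess.run) -> list[int]:
    """Return PIDs whose full command line matches `pattern`, using pgrep -f."""
    result = run(["pgrep", "-f", pattern], capture_output=True, text=True)
    if result.returncode == 1:  # pgrep exits 1 when nothing matches
        return []
    if result.returncode != 0:  # any other non-zero code is a pgrep error
        raise RuntimeError(f"pgrep failed: {result.stderr.strip()}")
    return [int(line) for line in result.stdout.split()]


def assert_no_orphans(patterns=("apr serve", "apr code"), run=subprocess.run) -> None:
    """Exit with a message if any process matching one of the patterns is alive."""
    leftovers = {p: matching_pids(p, run) for p in patterns}
    leftovers = {p: pids for p, pids in leftovers.items() if pids}
    if leftovers:
        raise SystemExit(f"orphan processes still running: {leftovers}")
```

Calling `assert_no_orphans()` before handoff returns silently when nothing matches and exits with the offending PIDs otherwise.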

Cross-references

🤖 Generated with Claude Code

…unter M280 -> M296

Operator-directed three-month break begins. V1_004 still open but
empirically narrowed; bench-side smoke confirms non-Coder Qwen3-30B-A3B-
Instruct-2507 emits clean tool_call JSON (categorically different from
Coder family), but bench-level partial refutation: F1 emits 8 Markdown
turns with tool_use_count=0. Second-order constraint identified — apr
code's multi-turn prompt context (rendered history + "### Continue:")
self-recursively reinforces Markdown pattern.

Adds:
- evidence/phase-6/m296-three-month-break-closeout-2026-05-22.md — the
  comprehensive resumption-playbook closeout (~3000 LOC).
- evidence/under-contract-30b-instruct-2507-partial-2026-05-22/ — F1
  captures from the M294 dispatch (driver_error at turn 8, tool_use=0).
- evidence/under-contract-7b-coder-partial-2026-05-22/ — Qwen2.5-Coder-7B
  partial (17/20, 0 oracle_passed, hypothesis-3 refutation evidence).
- evidence/under-contract-30b-sub-bench-b-partial-2026-05-21/ — M291
  sub-bench B partial (9 fixtures, pattern-shift evidence).
- evidence/under-contract-30b-instruct-tainted-2026-05-22/ — artifacts
  from killed smoke-test interruptions (3 driver_errors due to my kills).
- Various M270/M280-era partial archives backfilled into the repo.

Bumps M-counter M280 -> M296 on all 5 surfaces:
- README.md (At-a-glance table)
- CONTRIBUTING.md (status line)
- docs/specifications/claude-code-parity-apr-poc.md (Status + Completeness header)
- docs/specifications/completeness-assessment.md (H-level stamps)
- docs/specifications/status-snapshots.md (Status snapshot blockquote + Run 1 row)

Adds M296 row + M286-M295 rollup row to docs/specifications/milestones-m101-m111.md.

Three resumption paths scoped in the closeout evidence doc:
(a) investigate render_history + per-turn prompt construction
(b) post-decode Markdown -> tool_call parser in apr code
(c) V1_005 against different model class on Lambda Labs GPU

Project state: no in-flight benches, no orphan processes, book live at
paiml.github.io/claude-code-parity-apr. V1_004 still open.

Doc-drift detector: 17/17 classes clean.

Refs:
- M286 evidence/phase-6/m32d-shipped-2026-05-20.md
- M287 evidence/phase-6/m32d-bench-pattern-2026-05-20.md
- M288-M290 v1004-3knob-{dispatch-recipe,plumbing-shipped}-2026-05-20.md + v1004-followup-snapshot
- M291 v1004-sub-bench-b-pattern-shift-2026-05-21.md
- M292 v1004-agent-text-loop-detector-2026-05-21.md
- M293 + M294 + M295 PR trail (CCPA#259-263)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 00e41c1 into main May 22, 2026
1 check failed
@noahgift noahgift deleted the m296-three-month-break-closeout branch May 22, 2026 07:56