docs(M291): V1_004 sub-bench B empirical pattern shift + aprender#1853 fix #259

Merged
noahgift merged 1 commit into main from m291-v1004-sub-bench-b-pattern-shift
May 21, 2026
@noahgift

Summary

  • New evidence doc evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (~105 LOC) characterizing the M287 → M291 categorical pattern shift on Qwen3-Coder-30B Phase 6 sub-bench B.
  • M287 (greedy baseline) showed a uniform outcome=driver_error: the model looped on "Human:" indefinitely until it hit the timeout. Sub-bench B, with 3-knob sampling, the #1849 few-shot prompt, and the #1852 EOS stop_token, instead shows outcome=oracle_failed_after_max_turns turns=20 on fixture 1.
  • Three root-cause gaps diagnosed; Gap 1 (clean_chat_output start-of-string leak) FIXED via aprender#1853 (open for review).
  • No M-counter surface bump — Phase 6 still in active bench-run state (1/20 fixtures complete); surface bumps wait for V1_004 discharge or final pattern conclusion.
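
To make Gap 1 concrete, here is a minimal Python sketch of what a start-of-string strip could look like. It is an illustration only: the actual `clean_chat_output` lives in aprender and is not shown in this PR, and the marker list, the function body, and the reading of "leak" as a role marker surviving at position 0 are all assumptions.

```python
# Hypothetical role markers; the real set used by aprender is not shown here.
ROLE_MARKERS = ("Human:", "Assistant:")


def clean_chat_output(text: str) -> str:
    """Illustrative cleanup: drop leading role markers, cut at later ones."""
    cleaned = text.lstrip()

    # Start-of-string case (the assumed Gap 1 scenario): strip any role
    # marker sitting at position 0, repeating in case markers are stacked.
    stripped = True
    while stripped:
        stripped = False
        for marker in ROLE_MARKERS:
            if cleaned.startswith(marker):
                cleaned = cleaned[len(marker):].lstrip()
                stripped = True

    # Mid-string case: truncate at the first marker that opens a new line,
    # so a hallucinated follow-up turn is not returned to the caller.
    for marker in ROLE_MARKERS:
        idx = cleaned.find("\n" + marker)
        if idx != -1:
            cleaned = cleaned[:idx]

    return cleaned.rstrip()
```

Handling the leading marker separately from the mid-string cut keeps the two cases independently testable, which is roughly the role pin tests play for a fix like this.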

Cross-references

  • aprender#1853 — Gap 1 fix: clean_chat_output start-of-string strip + 6 pin tests
  • aprender#1852 — EOS stop_token + clean_chat_output in MoE chat path
  • aprender#1849 — few-shot <tool_call> examples in CODE_SYSTEM_PROMPT
  • evidence/phase-6/v1004-followup-snapshot-2026-05-20.md — M290 5-PR snapshot
  • evidence/under-contract-30b-greedy-2026-05-21/ — M287 greedy baseline archive

Test plan

  • bash scripts/check-doc-drift.sh — 17/17 drift classes
  • bash scripts/test-doc-drift.sh — detector live-test passes
  • CI green
  • Sub-bench B continues running (fixtures 2-20); future M-row will close out the empirical chapter

🤖 Generated with Claude Code

…error → oracle_failed_after_max_turns

The M287 greedy baseline pattern (uniform driver_error from an infinite
"Human:" loop) no longer holds: sub-bench B with 3-knob sampling (temp=0.3,
top_k=50, top_p=0.95, rep_penalty=1.2, repeat_last_n=64) + #1849 few-shot
prompt + #1852 EOS stop_token yields `oracle_failed_after_max_turns turns=20`
on fixture 1.
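
As a rough illustration of how the sampling values listed above interact, the following Python sketch applies a repetition penalty, temperature scaling, top-k, and top-p in sequence. It is not aprender's sampler: the function name, the order of operations, and the divide-or-multiply form of the penalty are assumptions, and real implementations differ on all three.

```python
import math
import random


def sample_token(logits, recent_tokens, temp=0.3, top_k=50, top_p=0.95,
                 rep_penalty=1.2, repeat_last_n=64, rng=random):
    """Pick one token id from raw logits (a list indexed by token id)."""
    logits = list(logits)

    # Repetition penalty over the last `repeat_last_n` generated tokens:
    # positive logits are divided, negative logits multiplied (assumed form).
    window = recent_tokens[-repeat_last_n:] if repeat_last_n > 0 else []
    for tok in set(window):
        if logits[tok] > 0:
            logits[tok] /= rep_penalty
        else:
            logits[tok] *= rep_penalty

    # Temperature: values below 1.0 sharpen the distribution.
    scaled = [value / temp for value in logits]

    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)

    # Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, weight in zip(ranked, weights):
        prob = weight / total
        kept_ids.append(token_id)
        kept_probs.append(prob)
        cumulative += prob
        if cumulative >= top_p:
            break

    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

With greedy decoding a repeated continuation such as "Human:" can stay the top choice indefinitely. A penalty over recent tokens plus a small amount of randomness is one plausible way such a loop gets interrupted.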

Diagnosis of three independent gaps:
- Gap 1 (clean_chat_output start-of-string leak): FIXED via aprender#1853
- Gap 2 (few-shot prompt insufficient to override Markdown distribution)
- Gap 3 (arena doesn't recover from skipped turns)

V1_004 status: partially discharged (pattern shift confirmed), not fully
discharged (no oracle_passed yet). Bench continues fixtures 2-20.
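
The outcome labels used above can be read against a simple turn loop. The sketch below is hypothetical: `run_fixture`, `driver_step`, and `oracle` are illustrative names, and the treatment of a skipped turn as consuming turn budget is an assumption about Gap 3, not a description of the actual arena.

```python
from typing import Callable, Optional, Tuple

MAX_TURNS = 20  # matches the turns=20 reported for fixture 1


def run_fixture(driver_step: Callable[[int], Optional[str]],
                oracle: Callable[[], bool],
                max_turns: int = MAX_TURNS) -> Tuple[str, int]:
    """Classify one fixture run as (outcome, turns_used)."""
    for turn in range(1, max_turns + 1):
        try:
            # One model turn; None stands for a turn with no usable action.
            action = driver_step(turn)
        except Exception:
            # Covers timeouts, e.g. a generation that never stops.
            return ("driver_error", turn)

        if action is None:
            # Assumed Gap 3 behaviour: the skipped turn still counts against
            # the budget and nothing is done to get the run back on track.
            continue

        if oracle():
            return ("oracle_passed", turn)

    return ("oracle_failed_after_max_turns", max_turns)
```

Under this reading, moving from driver_error to oracle_failed_after_max_turns means the driver now completes every turn without raising, while the oracle has not yet accepted a result.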

No M-counter bump on surfaces — Phase 6 in active bench-run state;
surface bumps wait for V1_004 discharge or pattern conclusion.

Refs:
- aprender#1853 (Gap 1 fix; clean_chat_output strip)
- aprender#1852 (EOS stop_token in MoE)
- aprender#1849 (few-shot prompt)
- evidence/under-contract-30b-greedy-2026-05-21/ (M287 baseline archive)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 1216ffa into main May 21, 2026
1 check failed
@noahgift noahgift deleted the m291-v1004-sub-bench-b-pattern-shift branch May 21, 2026 01:00