spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope by noahgift · Pull Request #262 · paiml/claude-code-parity-apr

noahgift · 2026-05-22T05:57:15Z

Summary

Scope doc for the next V1_004 dispatch — tests the finetune-distribution hypothesis isolated from architecture, size, and inference stack.

What we know empirically

Through M286-M293 + 7B-Coder follow-on (17/20 fixtures, 0 oracle_passed), we isolated the load-bearing variable behind 0% tool_call emission: Qwen-Coder finetune family. Both Qwen3-Coder-30B-A3B (MoE, 3B active) and Qwen2.5-Coder-7B-Instruct (dense, 7B) emit Markdown rust blocks instead of <tool_call> JSON, regardless of:

Size (7B vs 30B)
Active params (7B-dense vs 3B-active-MoE)
Architecture (qwen2 dense vs qwen3_moe)
Inference fixes (3-knob sampling + EOS + clean_chat_output + few-shot prompt all merged + verified)

Next experiment (this PR's scope)

Qwen/Qwen3-30B-A3B-Instruct-2507 (non-Coder, official July 2025 Instruct variant). Same architecture as the Coder variant we tested → M32d KV cache, 3-knob toolkit, EOS, clean_chat_output all work without modification. Only the finetune changes — clean A/B.

Success criteria

Primary: any outcome.kind = "oracle_passed" → V1_004 discharge path opens (contract amendment / V1_005)
Diagnostic: any tool_use_count > 0 → Coder-finetune-distribution hypothesis confirmed
Pattern shift: outcome distribution shifts away from 100%-text-turn

What this PR does NOT do

NOT a V1_004 contract amendment (gate names Qwen3-Coder-30B-A3B-Instruct specifically)
NOT a GPU dispatch (local box is compute_mode:cpu; Lambda Labs reserved for follow-on)
NOT bumping M-counter (Phase 6 in active bench-run state)

Test plan

bash scripts/check-doc-drift.sh — 17/17
Download Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf via hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
Smoke test: clean EOS, no Human leak
Dispatch bench against fresh evidence/under-contract/
Inspect first fixture's tool_use_count for hypothesis discharge

Cross-references

evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (M291)
evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292)
evidence/under-contract-7b-coder-partial-2026-05-22/ (7B partial, 17/20)

🤖 Generated with Claude Code

…-Coder) A/B The M286-M293 chain isolated the load-bearing variable behind 0% tool_call emission across Phase 6 fixtures: Qwen-Coder finetune distribution. Sub-bench B on 30B-A3B-Coder showed uniform text-turn pattern; 7B-dense-Coder followup confirmed (17/20 fixtures, 0 tool_calls). Architecture identity (Qwen3-Coder-30B-A3B vs Qwen3-30B-A3B-Instruct-2507) means M32d KV cache, 3-knob sampling, EOS stop_token, clean_chat_output start-of-string strip all work without modification — only finetune changes. Success criteria: - Primary: any oracle_passed -> V1_004 discharge path opens - Diagnostic: any tool_use_count > 0 -> Coder-finetune hypothesis confirmed - Pattern shift away from 100%-text-turn distribution Not a contract amendment (V1_004 pins to Qwen3-Coder-30B-A3B specifically; success enables V1_005 proposal or V1_004 amendment via M22 ritual). CPU-bound local dispatch (~20-30hr wall) for cleanest A/B; Lambda Labs GPU follow-on operator-coordinated. Refs: - evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (M291) - evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292) - evidence/under-contract-7b-coder-partial-2026-05-22/ (7B partial, 17/20) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds: - book/ — mdBook source for paiml.github.io/claude-code-parity-apr - .github/workflows/book.yml — CI build + GitHub Pages auto-deploy - README.md restructured for professional landing (badges row, book callout, empirical highlight section, deep-links to book chapters) - .gitignore — book/book/ (generated artifact) Book structure (28 chapters): - Introduction - Overview: what is CCPA, methodology, two paths, architecture - Static path: trace schema, differ, fixtures, bidirectional sensitivity - Arena: overview, phase 5, phase 6, outcome variants - Falsification gates: 20 gates, source-of-truth, behavioral parity, status flow - Empirical findings: V1_004 chain (M286, M287, M291, M292, M294) - Reference: CLI, trace schema, contract YAML, gate IDs - Appendix: academic basis, milestone history, glossary Build locally: mdbook build book/ -> book/book/index.html Deploy: GitHub Pages auto-deploys on push to main when book/ changes. Doc-drift detector: 17/17 drift classes pass. Refs: - evidence/phase-6/v1004-*.md (all sourced into book chapters) - CCPA#259 M291, #260 M292, #261 M293, #262 M294 scope Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift mentioned this pull request May 22, 2026

docs(M295): professional README + mdBook companion at paiml.github.io/claude-code-parity-apr #263

Merged

6 tasks

noahgift merged commit 47ca282 into main May 22, 2026
1 check failed

noahgift deleted the m294-non-coder-instruct-scope branch May 22, 2026 06:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope#262

spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope#262
noahgift merged 1 commit into
mainfrom
m294-non-coder-instruct-scope

noahgift commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 22, 2026

Summary

What we know empirically

Next experiment (this PR's scope)

Success criteria

What this PR does NOT do

Test plan

Cross-references

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant