spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope#262
Merged
…-Coder) A/B

The M286-M293 chain isolated the load-bearing variable behind 0% tool_call emission across Phase 6 fixtures: Qwen-Coder finetune distribution. Sub-bench B on 30B-A3B-Coder showed a uniform text-turn pattern; the 7B-dense-Coder follow-up confirmed it (17/20 fixtures, 0 tool_calls).

Architecture identity (Qwen3-Coder-30B-A3B vs Qwen3-30B-A3B-Instruct-2507) means M32d KV cache, 3-knob sampling, EOS stop_token, and clean_chat_output start-of-string strip all work without modification; only the finetune changes.

Success criteria:
- Primary: any oracle_passed -> V1_004 discharge path opens
- Diagnostic: any tool_use_count > 0 -> Coder-finetune hypothesis confirmed
- Pattern shift away from 100%-text-turn distribution

Not a contract amendment (V1_004 pins to Qwen3-Coder-30B-A3B specifically; success enables V1_005 proposal or V1_004 amendment via M22 ritual). CPU-bound local dispatch (~20-30hr wall) for cleanest A/B; Lambda Labs GPU follow-on operator-coordinated.

Refs:
- evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (M291)
- evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292)
- evidence/under-contract-7b-coder-partial-2026-05-22/ (7B partial, 17/20)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 22, 2026
Adds:
- book/: mdBook source for paiml.github.io/claude-code-parity-apr
- .github/workflows/book.yml: CI build + GitHub Pages auto-deploy
- README.md restructured for professional landing (badges row, book callout, empirical highlight section, deep-links to book chapters)
- .gitignore: book/book/ (generated artifact)

Book structure (28 chapters):
- Introduction
- Overview: what is CCPA, methodology, two paths, architecture
- Static path: trace schema, differ, fixtures, bidirectional sensitivity
- Arena: overview, phase 5, phase 6, outcome variants
- Falsification gates: 20 gates, source-of-truth, behavioral parity, status flow
- Empirical findings: V1_004 chain (M286, M287, M291, M292, M294)
- Reference: CLI, trace schema, contract YAML, gate IDs
- Appendix: academic basis, milestone history, glossary

Build locally: mdbook build book/ -> book/book/index.html
Deploy: GitHub Pages auto-deploys on push to main when book/ changes.
Doc-drift detector: 17/17 drift classes pass.

Refs:
- evidence/phase-6/v1004-*.md (all sourced into book chapters)
- CCPA#259 M291, #260 M292, #261 M293, #262 M294 scope

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Scope doc for the next V1_004 dispatch — tests the finetune-distribution hypothesis isolated from architecture, size, and inference stack.
What we know empirically
Through M286-M293 + 7B-Coder follow-on (17/20 fixtures, 0 oracle_passed), we isolated the load-bearing variable behind 0% tool_call emission: the Qwen-Coder finetune family. Both Qwen3-Coder-30B-A3B (MoE, 3B active) and Qwen2.5-Coder-7B-Instruct (dense, 7B) emit Markdown `rust` blocks instead of `<tool_call>` JSON, regardless of architecture (MoE vs. dense) or size.
Next experiment (this PR's scope)
Qwen/Qwen3-30B-A3B-Instruct-2507 (non-Coder, official July 2025 Instruct variant). Same architecture as the Coder variant we tested → M32d KV cache, 3-knob toolkit, EOS, clean_chat_output all work without modification. Only the finetune changes — clean A/B.
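The pattern this A/B is meant to shift, a Markdown `rust` block versus `<tool_call>` JSON, can be sketched as a small per-turn classifier. This is a minimal illustration, not the project's detector: the function name `classify_turn`, the three labels, and the assumption that a tool call is a JSON object with a `name` key wrapped in `<tool_call>…</tool_call>` are all hypothetical.

```python
import json
import re

# Built indirectly so this sketch can itself live inside a Markdown fence.
FENCE = "`" * 3

# Assumption: a tool call is wrapped as <tool_call>{...JSON...}</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
# A Markdown fence opened with the "rust" info string at the start of a line.
RUST_FENCE_RE = re.compile(r"^" + re.escape(FENCE) + r"rust\b", re.MULTILINE)


def classify_turn(text: str) -> str:
    """Return "tool_call", "markdown_rust", or "other_text" for one model turn."""
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # wrapper present, but the body is not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            return "tool_call"
    if RUST_FENCE_RE.search(text):
        return "markdown_rust"
    return "other_text"


if __name__ == "__main__":
    rust_turn = "Here is the fix:\n" + FENCE + "rust\nfn main() {}\n" + FENCE
    print(classify_turn(rust_turn))
```

A valid tool call takes precedence over a `rust` fence in the same turn, since the diagnostic question is whether any tool call is emitted at all.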
Success criteria
- `outcome.kind = "oracle_passed"` → V1_004 discharge path opens (contract amendment / V1_005)
- `tool_use_count > 0` → Coder-finetune-distribution hypothesis confirmed
What this PR does NOT do
compute_mode:cpu; Lambda Labs reserved for follow-on)
Test plan
- `bash scripts/check-doc-drift.sh`: 17/17
- `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF`
- `evidence/under-contract/`
- `tool_use_count` for hypothesis discharge
Cross-references
- `evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md` (M291)
- `evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md` (M292)
- `evidence/under-contract-7b-coder-partial-2026-05-22/` (7B partial, 17/20)

🤖 Generated with Claude Code
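The two success criteria above reduce to "any fixture" checks over per-fixture results. The sketch below shows that reduction under stated assumptions: the `FixtureResult` record, the `evaluate` function, the fixture names, and the `"text_turn"` outcome label are hypothetical placeholders; only `oracle_passed` and `tool_use_count` come from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FixtureResult:
    """Hypothetical per-fixture record; the real evidence layout may differ."""
    fixture: str
    outcome_kind: str      # e.g. "oracle_passed"
    tool_use_count: int


def evaluate(results: Iterable[FixtureResult]) -> Dict[str, bool]:
    """Apply the primary and diagnostic criteria across all fixtures."""
    results = list(results)
    return {
        # Primary: any oracle_passed opens the V1_004 discharge path.
        "discharge_path_opens": any(
            r.outcome_kind == "oracle_passed" for r in results
        ),
        # Diagnostic: any tool use at all supports the finetune hypothesis.
        "coder_finetune_hypothesis_confirmed": any(
            r.tool_use_count > 0 for r in results
        ),
    }


if __name__ == "__main__":
    # Illustrative data only, not measured results.
    sample = [
        FixtureResult("fixture-01", "text_turn", 0),
        FixtureResult("fixture-02", "text_turn", 2),
    ]
    print(evaluate(sample))
```

The two flags are reported separately because the diagnostic criterion can be met without any fixture passing its oracle.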