Skip to content

spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope#262

Merged
noahgift merged 1 commit into
mainfrom
m294-non-coder-instruct-scope
May 22, 2026
Merged

spec(M294): V1_004 follow-up — Qwen3-30B-A3B-Instruct-2507 (non-Coder) A/B scope#262
noahgift merged 1 commit into
mainfrom
m294-non-coder-instruct-scope

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

Summary

Scope doc for the next V1_004 dispatch — tests the finetune-distribution hypothesis isolated from architecture, size, and inference stack.

What we know empirically

Through M286-M293 + 7B-Coder follow-on (17/20 fixtures, 0 oracle_passed), we isolated the load-bearing variable behind 0% tool_call emission: Qwen-Coder finetune family. Both Qwen3-Coder-30B-A3B (MoE, 3B active) and Qwen2.5-Coder-7B-Instruct (dense, 7B) emit Markdown rust blocks instead of <tool_call> JSON, regardless of:

  • Size (7B vs 30B)
  • Active params (7B-dense vs 3B-active-MoE)
  • Architecture (qwen2 dense vs qwen3_moe)
  • Inference fixes (3-knob sampling + EOS + clean_chat_output + few-shot prompt all merged + verified)

Next experiment (this PR's scope)

Qwen/Qwen3-30B-A3B-Instruct-2507 (non-Coder, official July 2025 Instruct variant). Same architecture as the Coder variant we tested → M32d KV cache, 3-knob toolkit, EOS, clean_chat_output all work without modification. Only the finetune changes — clean A/B.

Success criteria

  • Primary: any outcome.kind = "oracle_passed" → V1_004 discharge path opens (contract amendment / V1_005)
  • Diagnostic: any tool_use_count > 0 → Coder-finetune-distribution hypothesis confirmed
  • Pattern shift: outcome distribution shifts away from 100%-text-turn

What this PR does NOT do

  • NOT a V1_004 contract amendment (gate names Qwen3-Coder-30B-A3B-Instruct specifically)
  • NOT a GPU dispatch (local box is compute_mode:cpu; Lambda Labs reserved for follow-on)
  • NOT bumping M-counter (Phase 6 in active bench-run state)

Test plan

  • bash scripts/check-doc-drift.sh — 17/17
  • Download Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf via hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
  • Smoke test: clean EOS, no Human leak
  • Dispatch bench against fresh evidence/under-contract/
  • Inspect first fixture's tool_use_count for hypothesis discharge

Cross-references

  • evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (M291)
  • evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292)
  • evidence/under-contract-7b-coder-partial-2026-05-22/ (7B partial, 17/20)

🤖 Generated with Claude Code

…-Coder) A/B

The M286-M293 chain isolated the load-bearing variable behind 0% tool_call
emission across Phase 6 fixtures: Qwen-Coder finetune distribution.
Sub-bench B on 30B-A3B-Coder showed uniform text-turn pattern; 7B-dense-Coder
followup confirmed (17/20 fixtures, 0 tool_calls).

Architecture identity (Qwen3-Coder-30B-A3B vs Qwen3-30B-A3B-Instruct-2507)
means M32d KV cache, 3-knob sampling, EOS stop_token, clean_chat_output
start-of-string strip all work without modification — only finetune changes.

Success criteria:
- Primary: any oracle_passed -> V1_004 discharge path opens
- Diagnostic: any tool_use_count > 0 -> Coder-finetune hypothesis confirmed
- Pattern shift away from 100%-text-turn distribution

Not a contract amendment (V1_004 pins to Qwen3-Coder-30B-A3B specifically;
success enables V1_005 proposal or V1_004 amendment via M22 ritual).

CPU-bound local dispatch (~20-30hr wall) for cleanest A/B; Lambda Labs
GPU follow-on operator-coordinated.

Refs:
- evidence/phase-6/v1004-sub-bench-b-pattern-shift-2026-05-21.md (M291)
- evidence/phase-6/v1004-agent-text-loop-detector-2026-05-21.md (M292)
- evidence/under-contract-7b-coder-partial-2026-05-22/ (7B partial, 17/20)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 47ca282 into main May 22, 2026
1 check failed
@noahgift noahgift deleted the m294-non-coder-instruct-scope branch May 22, 2026 06:43
noahgift added a commit that referenced this pull request May 22, 2026
Adds:
- book/ — mdBook source for paiml.github.io/claude-code-parity-apr
- .github/workflows/book.yml — CI build + GitHub Pages auto-deploy
- README.md restructured for professional landing (badges row, book
  callout, empirical highlight section, deep-links to book chapters)
- .gitignore — book/book/ (generated artifact)

Book structure (28 chapters):
- Introduction
- Overview: what is CCPA, methodology, two paths, architecture
- Static path: trace schema, differ, fixtures, bidirectional sensitivity
- Arena: overview, phase 5, phase 6, outcome variants
- Falsification gates: 20 gates, source-of-truth, behavioral parity, status flow
- Empirical findings: V1_004 chain (M286, M287, M291, M292, M294)
- Reference: CLI, trace schema, contract YAML, gate IDs
- Appendix: academic basis, milestone history, glossary

Build locally: mdbook build book/ -> book/book/index.html
Deploy: GitHub Pages auto-deploys on push to main when book/ changes.

Doc-drift detector: 17/17 drift classes pass.

Refs:
- evidence/phase-6/v1004-*.md (all sourced into book chapters)
- CCPA#259 M291, #260 M292, #261 M293, #262 M294 scope

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant