POC scaffold (M0–M6), corpus expansion (M11–M19), drift-detector cascade (M37–M44), regression corpus broadening (M47), monorepo scope clarification (M31), numerical-parity setup (M32a–M32d.3), audit-trail bumps (M33/M35/M36), companion-repo bookkeeping. Forward-chronological per M-ID where possible.
| ID | Deliverable | Source-of-truth gates / Notes | PR / Commit |
|---|---|---|---|
| M0 | Spec + top-level contract claude-code-parity-apr-v1.yaml (DRAFT). Companion-repo invariants 009–012 online from the empty-scaffold PR forward. | 009, 010, 011, 012 | DONE (PRs #1, #2) |
| M1 | Companion repo scaffold (empty crates that compile + 100 % line cov), ccpa-trace crate, schema-roundtrip test. Spec + contract relocate from aprender to companion repo as canonical. | + 001 | DONE (PRs #3, #4) |
| M2 | ccpa record (Anthropic Messages-API parser → trace records). Note: M2.3 HTTPS proxy is OOS post-rescope ("we will not call api, we will assume claude code"). | + (still 001) | DONE (PRs #5, #6) at parser-only scope |
| M3 | ccpa replay (LlmDriver trait + RecordedDriver) — algorithm-level. apr code LlmDriver adapter pending PMAT-CODE-LLM-DRIVER-PUBLIC-001 in upstream aprender. (apr code invocations succeeded at M150 with locally-built apr. Upstream surface is aprender#1638 feature-flag removal.) | + 002, 003 | DONE (PRs #7, #8) at algorithm scope |
| M4 | ccpa diff semantic differ — per-tool equivalence rules, file-mutation snapshots, parity score. | + 004, 005 | DONE (PRs #9, #10) |
| M5 | Sovereignty gate (no api.anthropic.com on replay) + corpus growth + parity-matrix coverage walk. | + 006, 007 | DONE (PRs #11, #13–#21) |
| M6 | Promote contract DRAFT → ACTIVE; integrate into make tier3 and pv lint; close epic. | + 008 | DONE (PR #12) |
| M11 | First runtime measured_parity over 5 paired canonical fixtures; contract DRAFT → ACTIVE_RUNTIME (FALSIFY-CCPA-013 discharged) | aggregate_score 1.0000 over 5 fixtures, parity-matrix coverage 1/17 | #13 |
| M12 | Corpus 5 → 8; coverage 1/17 → 4/17 (subagent-spawn, mcp-client, slash-commands added) | 1.0000 / 8 fixtures | #14 |
| M13 | Corpus 8 → 11; coverage 4/17 → 7/17 (claude-md-memory, permission-modes, builtin-tools-web added) | 1.0000 / 11 fixtures | #17 |
| M13.5 | Bidirectional sensitivity: regression corpus added; meter must FAIL on deliberate drift | regression corpus aggregate=0.5, exits 1 (drift detected) | #14 |
| M14 | Corpus 14 → 17; coverage 10/17 → 13/17 (worktree-isolation, configuration-ladder, managed-org-policy added) | 1.0000 / 17 fixtures | #19 |
| M15 | Trace schema v1 → v2 (additive HookEvent + SkillInvocation record kinds); differ extension (7 new DriftCategory variants); coverage 13/17 → 15/17 | 1.0000 / 19 fixtures; contract v1.2.0 → v1.3.0 | #20 |
| M16 | FALSIFY-CCPA-007 informational → HARD-BLOCKING; OOS exclusion mechanism (--oos-rows) shipped for keyboard-shortcuts + status-line | 15/15 reachable, gate PASS; contract v1.3.0 → v1.4.0 | #21 |
| M17 | Spec milestone table refreshed to reflect M0–M16; contract v1.4.0 → v1.5.0 | doc-only | #22 |
| M18 | Corpus depth 19 → 24; 5 schema-v2 surface variants (Bash multiline, Edit replace_all, HookDecision::Block, SkillSource::UserInvoked, StopReason::MaxTokens) | 1.0000 / 24 fixtures; contract v1.5.0 → v1.6.0 | #23 |
| M19 | Corpus complete 24 → 30 (spec ≥30 target met); multi-tool sequences + multi-turn correction + StopReason::StopSequence | 1.0000 / 30 fixtures; contract v1.6.0 → v1.7.0 | #24 |
| M20 | README truth-up — badges (v1.2.0 → v1.7.0), behavioral-gates table flipped from "planned" to ✅ ACTIVE, architecture diagram revised post-rescope | doc-only; contract v1.7.0 → v1.8.0 | #25 |
| M21 | Aprender-side mirror sync v1.2.0 → v1.8.0 (6 revisions of drift cleared); first round-trip closure | byte-identical sha256 across both repos; contract v1.8.0 → v1.9.0 | #27 (squash inc. M21) |
| M22 | pin-check-roundtrip.sh CI guard installed — fails any companion bump unpaired with aprender mirror | drift class mechanically prevented; contract v1.9.0 → v1.10.0 | #27 |
| M23 | CONTRIBUTING.md authored — source-of-truth split, fixture-authoring workflow, 4-step contract-bump ritual, gate→remediation lookup, anti-patterns | doc-only; contract v1.10.0 → v1.11.0 | #28 |
| M24 | 100% mutation coverage on ccpa-differ (gate kernel) — 5 kill-tests close arm-deletion + && → \|\| gaps | 122 caught + 8 unviable + 0 missed across 130 mutants; contract v1.11.0 → v1.12.0 | #29 |
| M25 | 100% mutation coverage workspace-wide (remaining 4 crates) — 3 kill-tests on ccpa-cli close main exit-code propagation + uncovered/OOS print branches | 193 caught + 31 unviable + 0 missed across 224 mutants; contract v1.12.0 → v1.13.0 | #30 |
| M26 | ccpa measure AUTHORED → MEASURED bridge subcommand — drives live apr code -p against teacher's user_prompt, builds synthetic student trace, scores via compute_parity_score; refuses tool_use teachers (text-only path; tool dispatch waits on M28) | text-only score-1.0 vacuous; tool-dispatch path requires apr code --emit-trace | #31 |
| M27 | Spec-table refresh — sub-milestones table extended M19 → M26; contract v1.14.0 → v1.15.0 | doc-only | #32 |
| M28 | Cross-repo: apr code --emit-trace <path> flag upstream + Qwen3-Coder-30B-A3B-Instruct as default model + qwen3-coder short-name alias. Companion bookkeeping records the launch. | aprender PRs landed via #1102; companion contract v1.15.0 → v1.16.0 | companion #33 |
| M29 | Five-whys + provable-contract — Qwen3-Coder GGUF load fail (Tensor 'blk.0.ffn_up.weight' not found) traced to GGUF tensor naming being arch-agnostic. Fix: tensor-names-v1 v1.0.0 → v1.1.0 with qwen3_moe arch-key + 4 new MoE layer roles + F-TNV-002 falsifier validated against the real 17.3 GB Qwen3-Coder GGUF byte inventory. | aprender#1103 merged at 15d504cfe; companion contract v1.16.0 → v1.17.0 | companion #34 |
| M30 | Spec-table refresh — extends through M29; contract v1.17.0 → v1.18.0; closes the spec-side audit trail | doc-only | #35 |
| M31 | Monorepo scope clarification — aprender and claude-code-parity-apr live in the same monorepo; "upstream / out of scope / not a CCPA POC item" framing removed from spec + contract status_history; future inference-engine work (M32 MoE forward pass) treated as in-scope companion-repo deliverable | doc-only; contract v1.18.0 → v1.19.0 | direct main commit 1f06ac0 |
| M32a | First slice of MoE forward chain. Authored cross-repo kernel contract qwen3-moe-forward-v1.yaml (DRAFT, SCAFFOLD) composing tensor-names-v1 v1.1.0 + moe-router-v1 + moe-expert-dispatch-v1 + qwen3moe-shapes-v1 + swiglu/silu/rmsnorm/rope. 5 acceptance criteria + 4 staged steps (M32a/b/c/d) + 4 falsification tests. Anchors Qwen3-Coder-30B-A3B-Instruct shape algebra (L=48, d=2048, d_ff=6144, N_experts=128, k=8). FALSIFY-QW3-MOE-FORWARD-001 reproduced on lambda-vector RTX 4090. | aprender #1104 merged at 78101494c | this PR |
| M32b | Architecture-aware FFN load — both QuantizedGGUFTransformer::from_gguf and GGUFTransformer::from_gguf short-circuit qwen3_moe with structured RealizarError::UnsupportedOperation referencing the M32a contract id. Cryptic Tensor 'blk.0.ffn_up.weight' not found replaced with audit-named error. 2 falsifier tests (synthetic + live 17.3 GB GGUF) discharge FALSIFY-QW3-MOE-FORWARD-002. | aprender #1106 merged at 90cc293a7 | direct-merged commit 883a838 |
| M32c.1 | Qwen3MoeQuantizedLayer struct + load_qwen3_moe_layer() loader using the M29 contract namespace (blk.{L}.ffn_gate_inp/ffn_gate_exps/ffn_up_exps/ffn_down_exps.weight). Live verification: 4 MoE tensors per layer × 48 layers = 192 expert-tensor descriptors loaded from the cached 17.3 GB GGUF; total expert bytes 17.5 GB matches file size. | aprender #1116 merged at ced9fe32b | (companion bookkeeping in this PR) |
| M32c.2 | QuantizedGGUFTransformer::from_gguf_for_moe constructor — qwen3_moe-aware sibling of from_gguf. Adds moe_layers: Vec<Option<Qwen3MoeQuantizedLayer>> field parallel to layers. Loads non-FFN portion via load_quantized_layer_moe_skeleton; dense FFN fields stub as zero-element placeholders; moe_layers[i] = Some(...) for every L. | aprender #1117 merged at ffd0b246f | (companion bookkeeping in this PR) |
| M32c.2.1 | Flip from_gguf dispatch to from_gguf_for_moe for arch == qwen3_moe; replace M32b's load-time UnsupportedOperation with a forward-time UnsupportedOperation { operation: "moe_forward_dispatch" } at gguf_gpu_generate.rs. Live apr run against 17.3 GB GGUF now reaches inference attempt; error reports load succeeded, only forward dispatch unwired. M32b test updated. | aprender #1118 merged at 97c808e29 | this PR |
| M32c.2.2 | Contract amendment recording the M32c.2.2 implementation strategy: LAZY-FUSED-MATVEC (per-token forward keeps the 4 MoE expert tensors in their on-disk Q4_K/Q6_K/F32 form and dequantizes inline through fused_q4k_parallel_matvec / fused_q6k_parallel_matvec row-major matvec kernels per CLAUDE.md LAYOUT-002 row-major mandate). Decision rationale: preserves the 8× memory-bandwidth advantage; avoids materializing 18 GB of dense FP32 expert tensors. Bumps qwen3-moe-forward-v1 v1.0.0 → v1.1.0. | aprender #1119 merged at 590b8d6aa | this PR |
| M32c.2.2.0 | expert_byte_slice adapter — given (layer_idx, expert_idx, role), returns the &[u8] slice into the mmapped GGUF for that expert's quantized tensor. Reuses the M32c.1 Qwen3MoeQuantizedLayer descriptor (offsets, qtype, byte sizes). Per-expert sizes vary by qtype: Q4_K [768, 2048] = 884,736 bytes; Q6_K [2048, 768] = 1,290,240 bytes. | aprender #1120 merged at db3436da9 | this PR |
| M32c.2.2.1 | expert_swiglu_quantized — per-expert SwiGLU FFN dispatch using LAZY-FUSED-MATVEC: gate_proj + up_proj via fused_q4k_parallel_matvec, SwiGLU activation, down_proj via fused_q6k_parallel_matvec, all reading from expert_byte_slice. Returns Vec<f32> of shape [hidden_dim]. Pure-CPU row-major. | aprender #1121 merged at 4dd9ec21e | this PR |
| M32c.2.2.2.0 | moe_ffn_forward_layer — single-layer MoE FFN dispatch. Composes router (softmax + top-k=8) + per-token expert routing + expert_swiglu_quantized per-expert call + weighted aggregation. Shape: [hidden_dim] in → [hidden_dim] out. Replaces dense FFN block at one layer index. | aprender #1122 merged at 1ab8e7fc5 | this PR |
| M32c.2.2.2.1 | Contract amendment recording the M32c.2.2.2.1 integration architecture decision. Three approaches compared: (A) field-add to OwnedQuantizedModel across 99 sites, (B) parallel run_qwen3_moe_generate function, (C) wrapper struct. Chose hybrid: method forward_qwen3_moe on OwnedQuantizedModel taking MoE descriptors as parameters (zero field-add, zero attention/RoPE duplication) + parallel run_qwen3_moe_generate autoregressive loop (zero touch on dense path). Bumps qwen3-moe-forward-v1 v1.1.0 → v1.2.0. | aprender #1123 merged at bd0871803 | this PR |
| M32c.2.2.2.1.1 | OwnedQuantizedModel::forward_qwen3_moe — single-token forward method. Mirrors forward() step-for-step except FFN site calls moe_ffn_forward_layer. Reuses existing &self methods for qkv_matmul, apply_rope, causal_attention, fused_matmul, lm_head — zero duplication. Test f_qw3_moe_c22211_001 exercises end-to-end against cached 17.3 GB GGUF: logits.len() == 151936, all finite, argmax in vocab range. | aprender #1124 merged at 10c74c400 | this PR |
| M32c.2.2.2.1.2 | run_qwen3_moe_generate — autoregressive generation loop. Reads MoE config (num_experts, k, intermediate) from GGUF metadata via new expert_count() / expert_used_count() / expert_feed_forward_length() accessors on GGUFModel. Loads per-layer Qwen3MoeQuantizedLayer descriptors once, then full-prefill-per-token loop with greedy argmax sampling. No KV cache (M32d follow-up). Sibling of run_gguf_generate for qwen3_moe arch. | aprender #1125 merged at 16dcfe765 | this PR |
| M32c.2.2.2.1.3 | Dispatch flip in inference_result.rs routing qwen3_moe arch to run_qwen3_moe_generate instead of run_gguf_generate. Plus Q4_K_M qtype-aware dispatch (matvec_for_qtype helper) — Q4_K_M GGUF mixes Q4_K (qtype=12) and Q6_K (qtype=14) within and across layers, so per-expert matmul must dispatch on tensor.qtype at runtime instead of hardcoding kernel by role. FALSIFY-QW3-MOE-FORWARD-003 LIVE DISCHARGE on lambda-vector RTX 4090: apr run against cached 17.3 GB GGUF emits "aaaaaaaa" / "." (any non-whitespace) end-to-end. | aprender #1126 merged at a902eea93 | #38 |
| M32c.2.2.2.1.4 | Live apr run falsifier in aprender-serve/tests/qwen3_moe_apr_run_live.rs pinning FALSIFY-QW3-MOE-FORWARD-003 as a regression test against the cached 17.3 GB Qwen3-Coder GGUF. Subprocess invocation via Command::new(apr).args(["run", "--prompt", "Hi", "--max-tokens", "4"]); assertions: exit 0, stdout matches /\\S/, stderr does not contain "Tensor 'blk.0.ffn_up.weight' not found". Skipped when GGUF absent (fixture-absent ≠ defect). Locks the M32c.2.2.2.1.3 discharge surface; any regression now fails CI. | aprender #1127 merged at 0392b1843 | this PR |
| M32d.0 | qwen3-moe-forward-v1 contract amendment v1.2.0 → v1.3.0 — encodes the parity strategy for the upcoming numerical-correctness work: cosine ≥0.99 vs llama.cpp Q4_K reference logits AND cosine ≥0.99 vs Hugging Face FP16 reference logits at the LM-head, with two new falsifiers F-QW3-MOE-PARITY-001 (HF FP16 cosine) and F-QW3-MOE-PARITY-002 (llama.cpp argmax sanity). Status remains DRAFT; flips to ACTIVE_RUNTIME at M32d discharge. | aprender #1128 merged at 2682132f7 | M33 audit-trail bookkeeping |
| M32d.1 | scripts/generate_qwen3_moe_fp16_logits.py — Hugging Face FP16 reference logits fixture generator. Pure Python via transformers + torch; downloads Qwen/Qwen3-Coder-30B-A3B-Instruct once (~60 GB), runs a single forward pass on a fixed prompt, dumps [batch, seq, vocab] logits to JSON for the M32d.2 cosine gate to consume. Multi-device offload via device_map="auto". Operator-confirm to run because of the download size and the ~30 min runtime on a 30B-A3B model. | aprender #1129 merged at 87a2a61c1 | M33 audit-trail bookkeeping |
| M32d.2 | crates/aprender-serve/tests/qwen3_moe_parity.rs — f_qw3_moe_parity_001 cosine gate against the M32d.1 HF FP16 fixture. Marked #[ignore] until the fixture file lands (does not exist on disk yet — Step 1 of M34 FAST PATH). When run with --include-ignored, computes cosine of [hidden]→logits between APR forward and HF reference; asserts ≥0.99. F-QW3-MOE-PARITY-001 falsifier wired. | aprender #1130 merged at ce6ca4bb4 | M33 audit-trail bookkeeping |
| M32d.3 | crates/aprender-serve/tests/qwen3_moe_argmax_parity.rs — f_qw3_moe_argmax_parity_002 llama.cpp argmax sanity check. Independent of the HF fixture: runs APR apr run --prompt "Once upon a time" and llama-cli --prompt "Once upon a time" --n-predict 1 against the same Qwen3-Coder GGUF and asserts that both pick the same top-1 token id. F-QW3-MOE-PARITY-002 falsifier wired. Skipped when llama.cpp binary or GGUF absent. | aprender #1131 merged at 9f93d02d9 | M33 audit-trail bookkeeping |
| M33 | Companion-only audit-trail bump — pin.lock refreshed from aprender commit a8623f650 → 3ea8114c8 with note recording the M32c.2.2.2.1.4 + M32d.0/.1/.2/.3 set. Companion contract bumped v1.20.0 → v1.21.0; M22 paired aprender mirror push at byte-identical sha256. No code change. Closed the bookkeeping lag between aprender main and the companion-side spec snapshot. | direct main commit 4ddae99 | this PR |
| M34 | Companion-only spec amendment — adds the section "M32d FAST PATH — five-whys + concrete next 6–13 PRs" embedding a five-whys analysis of the gibberish-output symptom and an ordered, falsifiable 6-step plan to discharge M32d (measure → wire trace → bisect layer → sub-bisect component → fix → discharge), with component priors (LAYOUT 30% / Q4_K_M scales 20% / per-head Q-K norm 15% / RoPE θ 10% / router softmax 10% / embedding 10% / other 5%) and cost estimate (4–6 PRs lucky / 8–10 realistic / 12–15 pessimistic). Companion contract v1.21.0 → v1.22.0; aprender mirror at cf5c7875c. No code change. Converts open M32d work from "iterate on output" to "produce concrete cosine numbers and bisect". | direct main commit 7200d2b | this PR |
| M32d-Step2 | forward_qwen3_moe_traced — diagnostic-surface sibling of forward_qwen3_moe that emits per-layer std-dev + L2-norm at 5 probe points (post-embed, post-attn, post-MoE, post-residual, post-RMS-final) without altering production forward semantics. Drives the M34 FAST PATH bisection: per-layer std growth signature was the rank-3 Q/K norm tell. Q/K-norm absence produced 40× std growth at attention output by layer 8. | aprender #1222 merged | (companion bookkeeping in M35) |
| M32d-Step2-JSON | apr trace --json --payload — JSON output for the trace surface so Step 2 std-dev numbers are machine-readable rather than eyeballed from stderr. handle_special_modes_with_json + run_traced_inference_json route at the apr-cli boundary; output shape is the falsifier-test exit-criterion shape from the M34 plan (per-layer {layer, std, l2} array). | aprender #1401 merged | (companion bookkeeping in M35) |
| M32d-Step5+5b+6+7 | THE BUNDLE — root cause fix for %%%%%%%% → coherent output transition. Squashes 4 fixes: (Step 5) per-head Q/K RMSNorm in forward_qwen3_moe between bias-add and RoPE — discharges rank-3 prior (15%); std at attention output drops from 40× to 1.0×. (Step 5b) rope_theta default 10K → 1M for qwen3_moe/qwen3 arches in gguf/config.rs — discharges rank-4 prior (10%); long-context positional encoding correctly Qwen3-tuned. (Step 6) chat_template_helpers.rs routes qwen3_moe/qwen3moe to plain ChatML (no <think> injection) BEFORE the generic qwen3 → Qwen3NoThink rule. (Step 7) Sync forward_qwen3_moe_traced with Step 5 Q/K norm so traced and production paths stay byte-equivalent. F-QW3-MOE-STEP5-001 regression test wired. Output transition timeline: %%%%%%%% → "Human: What is 2+" (Step 5) → "Human: What is 2+2?" (Step 5b) → "2 + 2 = 4" (Step 6). Multi-domain: math/geography/translation/code all coherent. M34 FAST PATH lucky-case bound: 5 PRs / ~6 hours wall vs 4–6 PRs / 2–3 days lucky / 8–10 PRs / 4–6 days realistic estimate. | aprender #1228 merged at squash 5235aaeb9 | (companion bookkeeping in M35) |
| M32d-RUSTSEC-unblock | Companion CI unblocker — RUSTSEC-2026-0114 transitive advisory deny-by-default landed in main; bumped affected dep to advisory-clean version. No M32d behavioural change; cleared the path for #1228 to merge through workspace-test on the same self-hosted fleet. | aprender #1242 merged | (companion bookkeeping in M35) |
| M35 | Companion-only audit-trail bump recording M32d functional discharge — contract v1.22.0 → v1.23.0 with full status_history entry cross-referencing aprender PRs #1222 / #1226 (squashed) / #1228 / #1242 / #1401, embedded live evidence (4 prompts × multi-domain output verification on lambda-vector RTX 4090 against cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf), output transition timeline, and cost-vs-estimate analysis (5 PRs / ~6 hours actual = lucky-case bound). pin.lock refresh aprender_commit cf5c7875c → 16f25af06, sha256 12f4bcb74110...→7818bd73a545..., M22 paired-mirror push at byte-identical sha256. NOT discharged: cosine ≥0.99 vs HF FP16 (operator-confirm pending ~60 GB download); GPU MoE path; sub-FFN MoE breakdown in apr trace (Step 3 + 4 work bypassed because the rank-3 + rank-4 fix was sufficient). | direct main commit ca75ed0 | this PR |
| M36 | Companion-only post-discharge drift sweep — applies the M22 5-step ritual extension (step 4: refresh human-readable roll-up views) to M32d's discharge surface. Updates: README badge v1.22.0 → v1.23.0 + status block M0-M34 → M0-M35 + M32d "open work" → "FUNCTIONALLY DISCHARGED" with output transition narrative; CONTRIBUTING status footer parallel; spec status snapshot qwen3-moe-forward-v1 v1.3.0 DRAFT → v1.4.0 ACTIVE_ALGORITHM_LEVEL; R9 risk struck through as DISCHARGED with cross-reference to PR #1228. Same drift class the M22 5-step ritual was extended to address — closing the README/CONTRIBUTING/spec lag against ground-truth. Does NOT modify the original "M34 FAST PATH" spec section (## M32d FAST PATH — five-whys + concrete next 6–13 PRs) — preserved as historical reference for retrospective comparison; component-prior table (rank-3 Q/K norm 15% + rank-4 RoPE θ 10%) empirically confirmed load-bearing. | direct main commit 3fd90d0 | this PR |
| M37 | Companion-only sub-milestones backfill — adds milestone-table rows for the M32d implementation PRs (#1222, #1228, #1242, #1401) plus M35/M36 audit-trail+drift-sweep companion commits. Closes the gap between status snapshot ("M0–M36 SHIPPED") and the sub-milestones table tail (which previously stopped at M34). No contract bump; spec markdown only. Same drift class M22 step 4 addresses. | direct main commit 24c1801 | this PR |
| M38 | Companion-only mechanical doc-drift detector — scripts/check-doc-drift.sh (asserts spec header / status snapshot / README / CONTRIBUTING M-counts all match the sub-milestones table tail; asserts stated gate count matches FALSIFY-CCPA-NNN row marker count). Wired into make tier3 (between pin-check and the build steps), CI workflow .github/workflows/ci.yml (between pin-check-roundtrip and cargo fmt --check), and the pre-commit hook installed by scripts/install-hooks.sh. Codifies M22 step 4's drift-class backstop ("These are NOT mechanically guarded by pin-check; a kaizen sweep is the backstop") into authoring time. M37 alone produced 6 drift-fix commits this script's asserts would have caught. Same drift class as the M22 5-step ritual; the M22 mechanical guard now extends to step 4 too. | direct main commit 7f66c57 | this PR |
| M39 | Companion-only check-doc-drift.sh extension — adds 4 new asserts cross-referencing the contract YAML's metadata.version against (a) README badge contract-vX.Y.Z-green.svg, (b) README status block Contract at vX.Y.Z, (c) CONTRIBUTING status footer Status as of vX.Y.Z, (d) spec status snapshot claude-code-parity-apr-v1 vX.Y.Z. Drift-class addressed: M22 step 1 bumps the YAML; step 4 must refresh each of these mentions. Pre-M39 the only mechanical guard was pin-check (sha256 of the YAML bytes), which catches the pin.lock lag but not the human-readable version mentions. M36's narrative drift (badge stayed at v1.22.0 while spec was at v1.23.0 for ~30 minutes pre-fix) is exactly this class. Detector run output now reports contract YAML version: vX.Y.Z (matches all 4 cross-references) on success. | direct main commit 74d9906 | this PR |
| M40 | Companion-only check-doc-drift.sh extension — adds 2 new asserts cross-referencing the corpus fixture count: (a) measured-parity.json fixture_count equals count of NNNN-* dirs in fixtures/canonical/, (b) README corpus badge corpus-N%20%2F%2030 matches that same N. Drift class addressed: when fixtures are added/removed (rare since corpus is at the spec-prescribed 30/30, but possible) the measured-parity JSON meta + README badge can lag silently. Adding/removing a fixture without re-running the parity meter would have leaked through pre-M40. Detector run output now reports corpus fixture count: N (matches measured-parity.json + README badge) on success. Concurrent with #1409 (qwen3-moe-forward-v1 v1.3.0 → v1.4.0 ACTIVE_ALGORITHM_LEVEL squash 3a2f2705b) landing on aprender main 2026-05-02T14:57Z — M40 is companion-side bookkeeping; pin.lock requires no change because it points at the mirror branch (#1078 OPEN), not aprender main HEAD. | direct main commit 54ccbc8 | this PR |
| M41 | Companion-only check-doc-drift.sh extension — adds 1 more assert cross-referencing the spec's Falsification run history latest "Run N" revision-range end-M against the sub-milestones table tail. Drift class addressed: when a new milestone row is added (M37/M38/M39/M40 cycle), the Run history's "M1–MX" range stayed stale and was caught manually in commit 9ec1ef3 + the same class repeated this turn (Run 1 said "M1–M37" while tail was M40). Detector now catches this 9th cross-reference at authoring time. | direct main commit bcb892e | this PR |
| M42 | Companion-only check-doc-drift.sh extension — adds 1 more assert cross-referencing the README's status-XXX-green.svg shields.io badge against the contract YAML's top-level status: field (with shields.io __ → _ un-escape). Drift class addressed: contract bumped DRAFT → ACTIVE_RUNTIME (or any equivalent flip) but README status badge stayed at old value. The badge is the second hit a casual reader sees on the repo home page (right after the CI build badge); silent drift here misleads downstream consumers. Detector now catches this 10th cross-reference. | direct main commit b8ed239 | this PR |
| M43 | Companion-only check-doc-drift.sh extension — adds 1 more assert cross-referencing the README gates badge gates-N%2FT%20discharged denominator T against the spec's stated gate count (N gates total). Drift class addressed: a new gate gets added (the spec's gate count bumps from 13 to 14), but the README gates-X/Y badge denominator stays at 13. The CCPA-013 introduction in M11 is a confirmed instance of this drift class — the spec's 12 gates total header lagged behind the addition for ~25 milestones until M37 fixed it manually. Detector now catches this 11th cross-reference at authoring time. | direct main commit b9bb7b9 | this PR |
| M44 | Companion-only check-doc-drift.sh extension — adds 1 more assert cross-referencing the README parity badge measured%20parity-X.XXXX-brightgreen.svg against the measured-parity.json aggregate_score field (with 1e-4 tolerance for badge-rounding to 4 decimals via awk float comparison). Drift class addressed: parity meter is re-run on a refreshed fixture set (e.g., new fixture added or behavior changed), aggregate score moves, but the README badge stays at the old value. Currently both at 1.0000. Detector now catches this 12th cross-reference. | direct main commit 1ad2a76 | this PR |
| M45 | Companion-only scripts/smoke-m32d.sh — codifies the M32d dogfood loop into a repeatable smoke test (3 prompts × 3 domains: math 5+7=12, geography Capital of France: Paris, translation Hello world → ¡Hola mundo!). Each prompt's output gated by 2 assertions: (a) ≥5 non-whitespace chars, (b) does NOT contain the M32d-pre-fix gibberish marker %%%%%%%. Operator-opt-in via make smoke-m32d (NOT in make tier3 because requires 17.3 GB cached GGUF + ~3min runtime). Drift class addressed: a future aprender PR re-introduces gibberish output (per-head Q/K RMSNorm regression, rope_theta wrong default, chat template change, etc.) — currently only manual ad-hoc dogfood would catch this; M45 makes it 1 command. Codifies what the M37/M38/M40/M41/M42/M43/M44 cycles ran manually each iteration (math + geography + translation prompts dogfooded ~10 times across this session). | direct main commit 059de8a | this PR |
| M46 | Companion-only scripts/test-doc-drift.sh — meta-test for the M38–M44 drift detector itself. The detector can silently regress (a refactor breaks an assert, a regex stops matching, a typo disables a check) — at which point the drift class it was guarding is no longer mechanically prevented. M46 systematically corrupts each drift class (10 classes total at M44), runs the detector, asserts exit 1 with the expected message, restores. Wired into make tier3 (between check-doc-drift and the build steps) AND CI workflow. Pre/post-flight: detector must be clean on live repo before AND after the corruption sweep (catches files left dirty). | direct main commit 151f850 | this PR |
| M47 | Companion-only regression-corpus broadening — extended fixtures/regression/ from 3 fixtures (3 of 13 DriftCategory variants) to 11 fixtures (12 of 12 trace-stream variants). Added: 0004 MismatchedToolName, 0005 MissingHookEvent, 0006 ExtraHookEvent, 0007 MismatchedHookEvent, 0008 MissingSkillInvocation, 0009 ExtraSkillInvocation, 0010 MismatchedSkillInvocation, 0011 MismatchedActionKind. Aggregate score moved 0.5000 → 0.3182 across the 11 fixtures (still well below the 0.95 passes_gate threshold; meter sensitivity strengthened). The 13th variant (MismatchedFileState) is documented in fixtures/regression/README.md as out-of-scope for trace-only regression — it comes from FileState snapshots compared at session boundaries (FALSIFY-CCPA-005 crates/ccpa-differ/tests/falsify_ccpa_005_file_mutation.rs, 15 unit tests) not from action-stream extraction. Drift class addressed: a future regression breaking any specific DriftCategory detector now degrades the regression-corpus aggregate at fixture level, not just the unit-test level. | direct main commits b0c9210/54290f6/5787535/bf05c87/61b894f/7c07fcf | this PR |
| M48 | Companion-only spec scope expansion + arXiv citations — formalizes the M32 numerical-parity work as in-scope POC deliverable (was implicit since M31 monorepo clarification, now explicit). Adds new "Scope extensions (post-M32 numerical-parity work)" subsection under § Goal enumerating 3 sub-extensions: (1) quantized-LLM forward correctness — M32d FUNCTIONALLY DISCHARGED 2026-05-02; (2) GPU MoE forward path — multi-week aprender follow-up; (3) sub-FFN MoE diagnostic surface — M32d Step 4 bypassed but useful for future MoE work. Adds 8 new arXiv citations to § Academic basis under "Scope-extension citations" subsection: arXiv:1701.06538 (Shazeer MoE), 2101.03961 (Fedus Switch Transformers), 2202.09368 (Zoph ST-MoE router stability), 1910.07467 (Zhang & Sennrich RMSNorm — M32d Step 5 rank-3 prior 15 %), 2104.09864 (Su RoPE — M32d Step 5b rank-4 prior 10 %), 2210.17323 (Frantar GPTQ — cosine framework), 2305.18398 (Dao FlashAttention-2 — fused-kernel parity), 2305.05176 (Aminabadi DeepSpeed-MoE — GPU MoE precedent). Each citation maps to the specific contract / sub-extension / M-step it supports; the two M32d-load-bearing citations (1910.07467, 2104.09864) explicitly note their empirical role from the FAST PATH discharge. | direct main commit fd9b1fc | this PR |
| M49 | Companion-only GPU MoE priority elevation — M48 listed sub-extension 2 (GPU MoE forward path) as a generic "multi-week aprender follow-up". Operator escalation 2026-05-04: this is now P0 / HIGHEST PRIORITY — the rate-limit on consuming the M32d discharge at production cadence. Spec changes: § Scope extensions reordered to put sub-extension 2 first with explicit P0/P1/P2 labels; sub-extension 2 expanded with rationale (~30 tok/s CPU baseline vs ~225–440 tok/s dense GPU = ~10× gap, makes Qwen3-Coder-30B-A3B-Instruct-Q4_K_M default model production-infeasible at ~30 tok/s) + 5 required deliverables (kernel contract qwen3-moe-forward-gpu-v1, CUDA forward_qwen3_moe_gpu, wgpu fallback, CPU↔GPU cosine ≥0.99 parity gate FALSIFY-QW3-MOE-GPU-PARITY-001, throughput target ≥150 tok/s) + estimate (2–4 weeks aprender-side). Adds new R10 risk row: GPU MoE forward path missing, marked as the P0 blocker on production-grade apr code consumption of the spec-prescribed default model. Recording cadence note added at end of § Scope extensions: until sub-extension 2 ships, every dogfood loop assumes the ~30 tok/s baseline. | direct main commit a502e8e | this PR |
| M50 | Cross-repo M-GPU-MOE-0 SHIPPED — first concrete step toward retiring R10. New aprender kernel contract qwen3-moe-forward-gpu-v1 v1.0.0 DRAFT scaffold landed on aprender main 2026-05-04T04:56:35Z (PR #1453 squash cf08e910f). 7 proof obligations + 7 falsification tests + 2 kani harnesses + qa_gate; pv validate 0/0. Defines what GPU MoE means: cosine ≥0.99 vs CPU LAZY-FUSED-MATVEC reference (FALSIFY-QW3-MOE-GPU-PARITY-001), cosine ≥0.99 vs HF FP16 reference (PARITY-002, inherits from v1), router-weight invariants (INVARIANTS-001), determinism on same-seed reruns (DETERMINISM-001), throughput ≥150 tok/s on RTX 4090 (THROUGHPUT-001), VRAM utilization ≤95% (MEMORY-001). Implementation stages: M-GPU-MOE-0 (scaffold) SHIPPED; M-GPU-MOE-1 (CUDA kernel + cosine-vs-CPU parity gate) PENDING; M-GPU-MOE-2 (wgpu fallback) PENDING; M-GPU-MOE-3 (throughput + memory) PENDING. Per CLAUDE.md "NEVER write code before writing a provable contract" — this is the contract-first delivery. Companion-side this is bookkeeping (no companion contract bump because the companion contract pins claude-code-parity-apr-v1.yaml, not aprender kernel contracts). | aprender #1453 merged at cf08e910f | this PR |
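
The per-expert byte sizes quoted in the M32c.2.2.0 row can be re-derived from the k-quant block layout. The sketch below assumes the llama.cpp super-block sizes (256 elements per block; 144 bytes for Q4_K, 210 bytes for Q6_K). The function name `quantized_tensor_bytes` is hypothetical and is not part of aprender.

```python
# Assumed llama.cpp k-quant layout: 256 elements per super-block.
QK_K = 256
# Assumed bytes per super-block for the two qtypes named in the table.
BLOCK_BYTES = {"Q4_K": 144, "Q6_K": 210}


def quantized_tensor_bytes(shape, qtype):
    """Return the on-disk byte size of a k-quant tensor with the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    if n_elements % QK_K != 0:
        raise ValueError("element count must be a multiple of the super-block size")
    return (n_elements // QK_K) * BLOCK_BYTES[qtype]


print(quantized_tensor_bytes((768, 2048), "Q4_K"))  # 884736
print(quantized_tensor_bytes((2048, 768), "Q6_K"))  # 1290240
```

Both tensors hold 1,572,864 elements (6,144 super-blocks), so the size difference comes entirely from the per-block byte count of the qtype.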
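
The M32c.2.2.2.0 row describes a single-layer MoE FFN as router softmax, top-k selection, per-expert calls, and weighted aggregation. A minimal pure-Python sketch of that composition follows. It is illustrative only: the names are hypothetical, experts are modelled as plain callables rather than quantized SwiGLU kernels, and renormalizing the selected probabilities to sum to 1 is an assumption that the table does not state.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_ffn_forward(hidden, router_logits, experts, k):
    """Route one token through its top-k experts and mix their outputs.

    hidden: list of floats, length hidden_dim
    router_logits: one logit per expert
    experts: list of callables mapping hidden -> list of floats (hidden_dim)
    """
    probs = softmax(router_logits)
    # Indices of the k highest-probability experts (ties keep index order).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: selected probabilities are renormalized to sum to 1.
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(hidden)
    for i in top:
        weight = probs[i] / norm
        expert_out = experts[i](hidden)
        for j, value in enumerate(expert_out):
            out[j] += weight * value
    return out


# Toy usage: 4 experts that scale the input by 1, 2, 3, 4; top-2 routing.
toy_experts = [lambda h, c=c: [c * x for x in h] for c in (1.0, 2.0, 3.0, 4.0)]
mixed = moe_ffn_forward([1.0, 2.0], [0.0, 0.0, 5.0, 5.0], toy_experts, k=2)
```

In the toy call, experts 2 and 3 tie for the highest router probability, so each contributes with weight 0.5 and the output is the input scaled by 3.5.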
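
Several rows (M32d.0, M32d.2, M50) gate on cosine ≥0.99 between candidate and reference logits. The following is a sketch of such a gate, assuming both logit vectors are already loaded as flat lists of floats. The function names and the default threshold argument are illustrative and do not mirror the actual test in qwen3_moe_parity.rs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_parity_gate(candidate, reference, threshold=0.99):
    """True when the candidate logits are within the cosine threshold."""
    return cosine_similarity(candidate, reference) >= threshold


# Toy usage: a slightly perturbed vector passes, an orthogonal one fails.
print(passes_parity_gate([1.0, 2.0, 3.0], [1.01, 2.0, 2.99]))
print(passes_parity_gate([1.0, 0.0], [0.0, 1.0]))
```

Cosine similarity ignores overall scale, so a gate of this kind checks the direction of the logit vector rather than its magnitude, which is why the table pairs it with a separate argmax sanity check.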