Milestones M0–M50

Top spec: claude-code-parity-apr-poc.md | M51–M100 | M101–M111

This page covers the POC scaffold (M0–M6), corpus expansion (M11–M19), monorepo scope clarification (M31), numerical-parity setup (M32a–M32d.3), audit-trail bumps (M33/M35/M36), the drift-detector cascade (M37–M44), regression-corpus broadening (M47), and companion-repo bookkeeping. Rows are in forward-chronological order by M-ID where possible.

ID	Deliverable	Source-of-truth gates / Notes	PR / Commit
M0	Spec + top-level contract `claude-code-parity-apr-v1.yaml` (DRAFT). Companion-repo invariants 009–012 online from the empty-scaffold PR forward.	009, 010, 011, 012	DONE (PRs #1, #2)
M1	Companion repo scaffold (empty crates that compile + 100 % line cov), `ccpa-trace` crate, schema-roundtrip test. Spec + contract relocate from aprender to companion repo as canonical.	+ 001	DONE (PRs #3, #4)
M2	`ccpa record` (Anthropic Messages-API parser → trace records). Note: M2.3 HTTPS proxy is OOS post-rescope ("we will not call api, we will assume claude code").	+ (still 001)	DONE (PRs #5, #6) at parser-only scope
M3	`ccpa replay` (LlmDriver trait + RecordedDriver) — algorithm-level. ~~Real `apr code` LlmDriver adapter pending PMAT-CODE-LLM-DRIVER-PUBLIC-001 in upstream aprender.~~ (M150 finding: PMAT-CODE-LLM-DRIVER-PUBLIC-001 turned out to not gate the work; real `apr code` invocations succeeded at M150 with locally-built apr. Upstream surface is aprender#1638 feature-flag removal.)	+ 002, 003	DONE (PRs #7, #8) at algorithm scope
M4	`ccpa diff` semantic differ — per-tool equivalence rules, file-mutation snapshots, parity score.	+ 004, 005	DONE (PRs #9, #10)
M5	Sovereignty gate (no `api.anthropic.com` on replay) + corpus growth + parity-matrix coverage walk.	+ 006, 007	DONE (PRs #11, #13–#21)
M6	Promote contract DRAFT → ACTIVE; integrate into `make tier3` and `pv lint`; close epic.	+ 008	DONE (PR #12)
M11	First runtime measured_parity over 5 paired canonical fixtures; contract DRAFT → ACTIVE_RUNTIME (FALSIFY-CCPA-013 discharged)	aggregate_score 1.0000 over 5 fixtures, parity-matrix coverage 1/17	#13
M12	Corpus 5 → 8; coverage 1/17 → 4/17 (subagent-spawn, mcp-client, slash-commands added)	1.0000 / 8 fixtures	#14
M13	Corpus 8 → 11; coverage 4/17 → 7/17 (claude-md-memory, permission-modes, builtin-tools-web added)	1.0000 / 11 fixtures	#17
M13.5	Bidirectional sensitivity: regression corpus added; meter must FAIL on deliberate drift	regression corpus aggregate=0.5, exits 1 (drift detected)	#14
M14	Corpus 14 → 17; coverage 10/17 → 13/17 (worktree-isolation, configuration-ladder, managed-org-policy added)	1.0000 / 17 fixtures	#19
M15	Trace schema v1 → v2 (additive `HookEvent` + `SkillInvocation` record kinds); differ extension (7 new DriftCategory variants); coverage 13/17 → 15/17	1.0000 / 19 fixtures; contract v1.2.0 → v1.3.0	#20
M16	FALSIFY-CCPA-007 informational → HARD-BLOCKING; OOS exclusion mechanism (`--oos-rows`) shipped for `keyboard-shortcuts` + `status-line`	15/15 reachable, gate PASS; contract v1.3.0 → v1.4.0	#21
M17	Spec milestone table refreshed to reflect M0–M16; contract v1.4.0 → v1.5.0	doc-only	#22
M18	Corpus depth 19 → 24; 5 schema-v2 surface variants (Bash multiline, Edit replace_all, HookDecision::Block, SkillSource::UserInvoked, StopReason::MaxTokens)	1.0000 / 24 fixtures; contract v1.5.0 → v1.6.0	#23
M19	Corpus complete 24 → 30 (spec ≥30 target met); multi-tool sequences + multi-turn correction + StopReason::StopSequence	1.0000 / 30 fixtures; contract v1.6.0 → v1.7.0	#24
M20	README truth-up — badges (v1.2.0 → v1.7.0), behavioral-gates table flipped from "planned" to ✅ ACTIVE, architecture diagram revised post-rescope	doc-only; contract v1.7.0 → v1.8.0	#25
M21	Aprender-side mirror sync v1.2.0 → v1.8.0 (6 revisions of drift cleared); first round-trip closure	byte-identical sha256 across both repos; contract v1.8.0 → v1.9.0	#27 (squash inc. M21)
M22	`pin-check-roundtrip.sh` CI guard installed — fails any companion bump unpaired with aprender mirror	drift class mechanically prevented; contract v1.9.0 → v1.10.0	#27
M23	`CONTRIBUTING.md` authored — source-of-truth split, fixture-authoring workflow, 4-step contract-bump ritual, gate→remediation lookup, anti-patterns	doc-only; contract v1.10.0 → v1.11.0	#28
M24	100% mutation coverage on `ccpa-differ` (gate kernel) — 5 kill-tests close arm-deletion + && → \|\| gaps	122 caught + 8 unviable + 0 missed across 130 mutants; contract v1.11.0 → v1.12.0	#29
M25	100% mutation coverage workspace-wide (remaining 4 crates) — 3 kill-tests on `ccpa-cli` close `main` exit-code propagation + uncovered/OOS print branches	193 caught + 31 unviable + 0 missed across 224 mutants; contract v1.12.0 → v1.13.0	#30
M26	`ccpa measure` AUTHORED → MEASURED bridge subcommand — drives live `apr code -p` against teacher's user_prompt, builds synthetic student trace, scores via compute_parity_score; refuses tool_use teachers (text-only path; tool dispatch waits on M28)	text-only score-1.0 vacuous; tool-dispatch path requires `apr code --emit-trace`	#31
M27	Spec-table refresh — sub-milestones table extended M19 → M26; contract v1.14.0 → v1.15.0	doc-only	#32
M28	Cross-repo: `apr code --emit-trace <path>` flag upstream + Qwen3-Coder-30B-A3B-Instruct as default model + `qwen3-coder` short-name alias. Companion bookkeeping records the launch.	aprender PRs landed via #1102; companion contract v1.15.0 → v1.16.0	companion #33
M29	Five-whys + provable-contract — Qwen3-Coder GGUF load fail (`Tensor 'blk.0.ffn_up.weight' not found`) traced to GGUF tensor naming being arch-agnostic. Fix: `tensor-names-v1` v1.0.0 → v1.1.0 with `qwen3_moe` arch-key + 4 new MoE layer roles + F-TNV-002 falsifier validated against the real 17.3 GB Qwen3-Coder GGUF byte inventory.	aprender#1103 merged at 15d504cfe; companion contract v1.16.0 → v1.17.0	companion #34
M30	Spec-table refresh — extends through M29; contract v1.17.0 → v1.18.0; closes the spec-side audit trail	doc-only	#35
M31	Monorepo scope clarification — aprender and claude-code-parity-apr live in the same monorepo; "upstream / out of scope / not a CCPA POC item" framing removed from spec + contract status_history; future inference-engine work (M32 MoE forward pass) treated as in-scope companion-repo deliverable	doc-only; contract v1.18.0 → v1.19.0	direct main commit `1f06ac0`
M32a	First slice of MoE forward chain. Authored cross-repo kernel contract `qwen3-moe-forward-v1.yaml` (DRAFT, SCAFFOLD) composing `tensor-names-v1` v1.1.0 + `moe-router-v1` + `moe-expert-dispatch-v1` + `qwen3moe-shapes-v1` + swiglu/silu/rmsnorm/rope. 5 acceptance criteria + 4 staged steps (M32a/b/c/d) + 4 falsification tests. Anchors Qwen3-Coder-30B-A3B-Instruct shape algebra (L=48, d=2048, d_ff=6144, N_experts=128, k=8). FALSIFY-QW3-MOE-FORWARD-001 reproduced on lambda-vector RTX 4090.	aprender #1104 merged at 78101494c	this PR
M32b	Architecture-aware FFN load — both `QuantizedGGUFTransformer::from_gguf` and `GGUFTransformer::from_gguf` short-circuit `qwen3_moe` with structured `RealizarError::UnsupportedOperation` referencing the M32a contract id. Cryptic `Tensor 'blk.0.ffn_up.weight' not found` replaced with audit-named error. 2 falsifier tests (synthetic + live 17.3 GB GGUF) discharge FALSIFY-QW3-MOE-FORWARD-002.	aprender #1106 merged at 90cc293a7	direct-merged commit `883a838`
M32c.1	`Qwen3MoeQuantizedLayer` struct + `load_qwen3_moe_layer()` loader using the M29 contract namespace (`blk.{L}.ffn_gate_inp/ffn_gate_exps/ffn_up_exps/ffn_down_exps.weight`). Live verification: 4 MoE tensors per layer × 48 layers = 192 expert-tensor descriptors loaded from the cached 17.3 GB GGUF; total expert bytes 17.5 GB matches file size.	aprender #1116 merged at ced9fe32b	(companion bookkeeping in this PR)
M32c.2	`QuantizedGGUFTransformer::from_gguf_for_moe` constructor — qwen3_moe-aware sibling of `from_gguf`. Adds `moe_layers: Vec<Option<Qwen3MoeQuantizedLayer>>` field parallel to `layers`. Loads non-FFN portion via `load_quantized_layer_moe_skeleton`; dense FFN fields stub as zero-element placeholders; `moe_layers[i] = Some(...)` for every L.	aprender #1117 merged at ffd0b246f	(companion bookkeeping in this PR)
M32c.2.1	Flip `from_gguf` dispatch to `from_gguf_for_moe` for arch == qwen3_moe; replace M32b's load-time `UnsupportedOperation` with a forward-time `UnsupportedOperation { operation: "moe_forward_dispatch" }` at `gguf_gpu_generate.rs`. Live `apr run` against 17.3 GB GGUF now reaches inference attempt; error reports load succeeded, only forward dispatch unwired. M32b test updated.	aprender #1118 merged at 97c808e29	this PR
M32c.2.2	Contract amendment recording the M32c.2.2 implementation strategy: LAZY-FUSED-MATVEC (per-token forward keeps the 4 MoE expert tensors in their on-disk Q4_K/Q6_K/F32 form and dequantizes inline through `fused_q4k_parallel_matvec` / `fused_q6k_parallel_matvec` row-major matvec kernels per CLAUDE.md LAYOUT-002 row-major mandate). Decision rationale: preserves the 8× memory-bandwidth advantage; avoids materializing 18 GB of dense FP32 expert tensors. Bumps `qwen3-moe-forward-v1` v1.0.0 → v1.1.0.	aprender #1119 merged at 590b8d6aa	this PR
M32c.2.2.0	`expert_byte_slice` adapter — given `(layer_idx, expert_idx, role)`, returns the `&[u8]` slice into the mmapped GGUF for that expert's quantized tensor. Reuses the M32c.1 `Qwen3MoeQuantizedLayer` descriptor (offsets, qtype, byte sizes). Per-expert sizes vary by qtype: Q4_K [768, 2048] = 884,736 bytes; Q6_K [2048, 768] = 1,290,240 bytes.	aprender #1120 merged at db3436da9	this PR
M32c.2.2.1	`expert_swiglu_quantized` — per-expert SwiGLU FFN dispatch using LAZY-FUSED-MATVEC: gate_proj + up_proj via `fused_q4k_parallel_matvec`, SwiGLU activation, down_proj via `fused_q6k_parallel_matvec`, all reading from `expert_byte_slice`. Returns `Vec<f32>` of shape `[hidden_dim]`. Pure-CPU row-major.	aprender #1121 merged at 4dd9ec21e	this PR
M32c.2.2.2.0	`moe_ffn_forward_layer` — single-layer MoE FFN dispatch. Composes router (softmax + top-k=8) + per-token expert routing + `expert_swiglu_quantized` per-expert call + weighted aggregation. Shape: `[hidden_dim]` in → `[hidden_dim]` out. Replaces dense FFN block at one layer index.	aprender #1122 merged at 1ab8e7fc5	this PR
M32c.2.2.2.1	Contract amendment recording the M32c.2.2.2.1 integration architecture decision. Three approaches compared: (A) field-add to `OwnedQuantizedModel` across 99 sites, (B) parallel `run_qwen3_moe_generate` function, (C) wrapper struct. Chose hybrid: method `forward_qwen3_moe` on `OwnedQuantizedModel` taking MoE descriptors as parameters (zero field-add, zero attention/RoPE duplication) + parallel `run_qwen3_moe_generate` autoregressive loop (zero touch on dense path). Bumps `qwen3-moe-forward-v1` v1.1.0 → v1.2.0.	aprender #1123 merged at bd0871803	this PR
M32c.2.2.2.1.1	`OwnedQuantizedModel::forward_qwen3_moe` — single-token forward method. Mirrors `forward()` step-for-step except FFN site calls `moe_ffn_forward_layer`. Reuses existing `&self` methods for qkv_matmul, apply_rope, causal_attention, fused_matmul, lm_head — zero duplication. Test `f_qw3_moe_c22211_001` exercises end-to-end against cached 17.3 GB GGUF: logits.len() == 151936, all finite, argmax in vocab range.	aprender #1124 merged at 10c74c400	this PR
M32c.2.2.2.1.2	`run_qwen3_moe_generate` — autoregressive generation loop. Reads MoE config (num_experts, k, intermediate) from GGUF metadata via new `expert_count()` / `expert_used_count()` / `expert_feed_forward_length()` accessors on `GGUFModel`. Loads per-layer `Qwen3MoeQuantizedLayer` descriptors once, then full-prefill-per-token loop with greedy argmax sampling. No KV cache (M32d follow-up). Sibling of `run_gguf_generate` for qwen3_moe arch.	aprender #1125 merged at 16dcfe765	this PR
M32c.2.2.2.1.3	Dispatch flip in `inference_result.rs` routing `qwen3_moe` arch to `run_qwen3_moe_generate` instead of `run_gguf_generate`. Plus Q4_K_M qtype-aware dispatch (`matvec_for_qtype` helper) — Q4_K_M GGUF mixes Q4_K (qtype=12) and Q6_K (qtype=14) within and across layers, so per-expert matmul must dispatch on `tensor.qtype` at runtime instead of hardcoding kernel by role. FALSIFY-QW3-MOE-FORWARD-003 LIVE DISCHARGE on lambda-vector RTX 4090: `apr run` against cached 17.3 GB GGUF emits "aaaaaaaa" / "." (any non-whitespace) end-to-end.	aprender #1126 merged at a902eea93	#38
M32c.2.2.2.1.4	Live `apr run` falsifier in `aprender-serve/tests/qwen3_moe_apr_run_live.rs` pinning FALSIFY-QW3-MOE-FORWARD-003 as a regression test against the cached 17.3 GB Qwen3-Coder GGUF. Subprocess invocation via `Command::new(apr).args(["run", "--prompt", "Hi", "--max-tokens", "4"])`; assertions: exit 0, stdout matches `/\\S/`, stderr does not contain "Tensor 'blk.0.ffn_up.weight' not found". Skipped when GGUF absent (fixture-absent ≠ defect). Locks the M32c.2.2.2.1.3 discharge surface; any regression now fails CI.	aprender #1127 merged at 0392b1843	this PR
M32d.0	`qwen3-moe-forward-v1` contract amendment v1.2.0 → v1.3.0 — encodes the parity strategy for the upcoming numerical-correctness work: cosine ≥0.99 vs llama.cpp Q4_K reference logits AND cosine ≥0.99 vs Hugging Face FP16 reference logits at the LM-head, with two new falsifiers `F-QW3-MOE-PARITY-001` (HF FP16 cosine) and `F-QW3-MOE-PARITY-002` (llama.cpp argmax sanity). Status remains DRAFT; flips to ACTIVE_RUNTIME at M32d discharge.	aprender #1128 merged at 2682132f7	M33 audit-trail bookkeeping
M32d.1	`scripts/generate_qwen3_moe_fp16_logits.py` — Hugging Face FP16 reference logits fixture generator. Pure Python via `transformers` + `torch`; downloads `Qwen/Qwen3-Coder-30B-A3B-Instruct` once (~60 GB), runs a single forward pass on a fixed prompt, dumps `[batch, seq, vocab]` logits to JSON for the M32d.2 cosine gate to consume. Multi-device offload via `device_map="auto"`. Operator-confirm to run because of the download size and the ~30 min runtime on a 30B-A3B model.	aprender #1129 merged at 87a2a61c1	M33 audit-trail bookkeeping
M32d.2	`crates/aprender-serve/tests/qwen3_moe_parity.rs` — `f_qw3_moe_parity_001` cosine gate against the M32d.1 HF FP16 fixture. Marked `#[ignore]` until the fixture file lands (does not exist on disk yet — Step 1 of M34 FAST PATH). When run with `--include-ignored`, computes cosine of `[hidden]→logits` between APR forward and HF reference; asserts ≥0.99. F-QW3-MOE-PARITY-001 falsifier wired.	aprender #1130 merged at ce6ca4bb4	M33 audit-trail bookkeeping
M32d.3	`crates/aprender-serve/tests/qwen3_moe_argmax_parity.rs` — `f_qw3_moe_argmax_parity_002` llama.cpp argmax sanity check. Independent of the HF fixture: runs APR `apr run --prompt "Once upon a time"` and `llama-cli --prompt "Once upon a time" --n-predict 1` against the same Qwen3-Coder GGUF and asserts that both pick the same top-1 token id. F-QW3-MOE-PARITY-002 falsifier wired. Skipped when llama.cpp binary or GGUF absent.	aprender #1131 merged at 9f93d02d9	M33 audit-trail bookkeeping
M33	Companion-only audit-trail bump — pin.lock refreshed from aprender commit a8623f650 → 3ea8114c8 with note recording the M32c.2.2.2.1.4 + M32d.0/.1/.2/.3 set. Companion contract bumped v1.20.0 → v1.21.0; M22 paired aprender mirror push at byte-identical sha256. No code change. Closed the bookkeeping lag between aprender main and the companion-side spec snapshot.	direct main commit `4ddae99`	this PR
M34	Companion-only spec amendment — adds the section "M32d FAST PATH — five-whys + concrete next 6–13 PRs" embedding a five-whys analysis of the gibberish-output symptom and an ordered, falsifiable 6-step plan to discharge M32d (measure → wire trace → bisect layer → sub-bisect component → fix → discharge), with component priors (LAYOUT 30% / Q4_K_M scales 20% / per-head Q-K norm 15% / RoPE θ 10% / router softmax 10% / embedding 10% / other 5%) and cost estimate (4–6 PRs lucky / 8–10 realistic / 12–15 pessimistic). Companion contract v1.21.0 → v1.22.0; aprender mirror at cf5c7875c. No code change. Converts open M32d work from "iterate on output" to "produce concrete cosine numbers and bisect".	direct main commit `7200d2b`	this PR
M32d-Step2	`forward_qwen3_moe_traced` — diagnostic-surface sibling of `forward_qwen3_moe` that emits per-layer std-dev + L2-norm at 5 probe points (post-embed, post-attn, post-MoE, post-residual, post-RMS-final) without altering production forward semantics. Drives the M34 FAST PATH bisection: per-layer std growth signature was the rank-3 Q/K norm tell. Q/K-norm absence produced 40× std growth at attention output by layer 8.	aprender #1222 merged	(companion bookkeeping in M35)
M32d-Step2-JSON	`apr trace --json --payload` — JSON output for the trace surface so Step 2 std-dev numbers are machine-readable rather than eyeballed from stderr. `handle_special_modes_with_json` + `run_traced_inference_json` route at the apr-cli boundary; output shape is the falsifier-test exit-criterion shape from the M34 plan (per-layer `{layer, std, l2}` array).	aprender #1401 merged	(companion bookkeeping in M35)
M32d-Step5+5b+6+7	THE BUNDLE — root cause fix for `%%%%%%%%` → coherent output transition. Squashes 4 fixes: (Step 5) per-head Q/K RMSNorm in `forward_qwen3_moe` between bias-add and RoPE — discharges rank-3 prior (15%); std at attention output drops from 40× to 1.0×. (Step 5b) `rope_theta` default 10K → 1M for `qwen3_moe`/`qwen3` arches in `gguf/config.rs` — discharges rank-4 prior (10%); long-context positional encoding correctly Qwen3-tuned. (Step 6) `chat_template_helpers.rs` routes `qwen3_moe`/`qwen3moe` to plain ChatML (no `<think>` injection) BEFORE the generic qwen3 → Qwen3NoThink rule. (Step 7) Sync `forward_qwen3_moe_traced` with Step 5 Q/K norm so traced and production paths stay byte-equivalent. F-QW3-MOE-STEP5-001 regression test wired. Output transition timeline: `%%%%%%%%` → "Human: What is 2+" (Step 5) → "Human: What is 2+2?" (Step 5b) → "2 + 2 = 4" (Step 6). Multi-domain: math/geography/translation/code all coherent. Actual cost landed at the M34 FAST PATH lucky-case bound: 5 PRs / ~6 hours wall-clock, against estimates of 4–6 PRs / 2–3 days (lucky) and 8–10 PRs / 4–6 days (realistic).	aprender #1228 merged at squash 5235aaeb9	(companion bookkeeping in M35)
M32d-RUSTSEC-unblock	Companion CI unblocker — RUSTSEC-2026-0114 transitive advisory deny-by-default landed in main; bumped affected dep to advisory-clean version. No M32d behavioural change; cleared the path for #1228 to merge through workspace-test on the same self-hosted fleet.	aprender #1242 merged	(companion bookkeeping in M35)
M35	Companion-only audit-trail bump recording M32d functional discharge — contract v1.22.0 → v1.23.0 with full `status_history` entry cross-referencing aprender PRs #1222 / #1226 (squashed) / #1228 / #1242 / #1401, embedded live evidence (4 prompts × multi-domain output verification on lambda-vector RTX 4090 against cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf), output transition timeline, and cost-vs-estimate analysis (5 PRs / ~6 hours actual = lucky-case bound). pin.lock refresh `aprender_commit cf5c7875c → 16f25af06`, `sha256 12f4bcb74110...→7818bd73a545...`, M22 paired-mirror push at byte-identical sha256. NOT discharged: cosine ≥0.99 vs HF FP16 (operator-confirm pending ~60 GB download); GPU MoE path; sub-FFN MoE breakdown in `apr trace` (Step 3 + 4 work bypassed because the rank-3 + rank-4 fix was sufficient).	direct main commit `ca75ed0`	this PR
M36	Companion-only post-discharge drift sweep — applies the M22 5-step ritual extension (step 4: refresh human-readable roll-up views) to M32d's discharge surface. Updates: README badge `v1.22.0 → v1.23.0` + status block `M0-M34 → M0-M35` + M32d "open work" → "FUNCTIONALLY DISCHARGED" with output transition narrative; CONTRIBUTING status footer parallel; spec status snapshot `qwen3-moe-forward-v1 v1.3.0 DRAFT → v1.4.0 ACTIVE_ALGORITHM_LEVEL`; R9 risk struck through as DISCHARGED with cross-reference to PR #1228. Same drift class the M22 5-step ritual was extended to address — closing the README/CONTRIBUTING/spec lag against ground-truth. Does NOT modify the original "M34 FAST PATH" spec section (`## M32d FAST PATH — five-whys + concrete next 6–13 PRs`) — preserved as historical reference for retrospective comparison; component-prior table (rank-3 Q/K norm 15% + rank-4 RoPE θ 10%) empirically confirmed load-bearing.	direct main commit `3fd90d0`	this PR
M37	Companion-only sub-milestones backfill — adds milestone-table rows for the M32d implementation PRs (#1222, #1228, #1242, #1401) plus M35/M36 audit-trail+drift-sweep companion commits. Closes the gap between status snapshot ("M0–M36 SHIPPED") and the sub-milestones table tail (which previously stopped at M34). No contract bump; spec markdown only. Same drift class M22 step 4 addresses.	direct main commit `24c1801`	this PR
M38	Companion-only mechanical doc-drift detector — `scripts/check-doc-drift.sh` (asserts spec header / status snapshot / README / CONTRIBUTING M-counts all match the sub-milestones table tail; asserts stated gate count matches FALSIFY-CCPA-NNN row marker count). Wired into `make tier3` (between `pin-check` and the build steps), CI workflow `.github/workflows/ci.yml` (between `pin-check-roundtrip` and `cargo fmt --check`), and the pre-commit hook installed by `scripts/install-hooks.sh`. Codifies M22 step 4's drift-class backstop ("These are NOT mechanically guarded by pin-check; a kaizen sweep is the backstop") into authoring time. M37 alone produced 6 drift-fix commits this script's asserts would have caught. Same drift class as the M22 5-step ritual; the M22 mechanical guard now extends to step 4 too.	direct main commit `7f66c57`	this PR
M39	Companion-only `check-doc-drift.sh` extension — adds 4 new asserts cross-referencing the contract YAML's `metadata.version` against (a) README badge `contract-vX.Y.Z-green.svg`, (b) README status block `Contract at vX.Y.Z`, (c) CONTRIBUTING status footer `Status as of vX.Y.Z`, (d) spec status snapshot `claude-code-parity-apr-v1 vX.Y.Z`. Drift-class addressed: M22 step 1 bumps the YAML; step 4 must refresh each of these mentions. Pre-M39 the only mechanical guard was pin-check (sha256 of the YAML bytes), which catches the `pin.lock` lag but not the human-readable version mentions. M36's narrative drift (badge stayed at v1.22.0 while spec was at v1.23.0 for ~30 minutes pre-fix) is exactly this class. Detector run output now reports `contract YAML version: vX.Y.Z (matches all 4 cross-references)` on success.	direct main commit `74d9906`	this PR
M40	Companion-only `check-doc-drift.sh` extension — adds 2 new asserts cross-referencing the corpus fixture count: (a) `measured-parity.json` `fixture_count` equals count of `NNNN-*` dirs in `fixtures/canonical/`, (b) README corpus badge `corpus-N%20%2F%2030` matches that same N. Drift class addressed: when fixtures are added/removed (rare since corpus is at the spec-prescribed 30/30, but possible) the measured-parity JSON meta + README badge can lag silently. Adding/removing a fixture without re-running the parity meter would have leaked through pre-M40. Detector run output now reports `corpus fixture count: N (matches measured-parity.json + README badge)` on success. Concurrent with #1409 (qwen3-moe-forward-v1 v1.3.0 → v1.4.0 ACTIVE_ALGORITHM_LEVEL squash 3a2f2705b) landing on aprender main 2026-05-02T14:57Z — M40 is companion-side bookkeeping; pin.lock requires no change because it points at the mirror branch (#1078 OPEN), not aprender main HEAD.	direct main commit `54ccbc8`	this PR
M41	Companion-only `check-doc-drift.sh` extension — adds 1 more assert cross-referencing the spec's Falsification run history latest "Run N" revision-range end-M against the sub-milestones table tail. Drift class addressed: when a new milestone row is added (M37/M38/M39/M40 cycle), the Run history's "M1–MX" range stayed stale and was caught manually in commit 9ec1ef3 + the same class repeated this turn (Run 1 said "M1–M37" while tail was M40). Detector now catches this 9th cross-reference at authoring time.	direct main commit `bcb892e`	this PR
M42	Companion-only `check-doc-drift.sh` extension — adds 1 more assert cross-referencing the README's `status-XXX-green.svg` shields.io badge against the contract YAML's top-level `status:` field (with shields.io `__` → `_` un-escape). Drift class addressed: contract bumped DRAFT → ACTIVE_RUNTIME (or any equivalent flip) but README status badge stayed at old value. The badge is the second hit a casual reader sees on the repo home page (right after the CI build badge); silent drift here misleads downstream consumers. Detector now catches this 10th cross-reference.	direct main commit `b8ed239`	this PR
M43	Companion-only `check-doc-drift.sh` extension — adds 1 more assert cross-referencing the README gates badge `gates-N%2FT%20discharged` denominator T against the spec's stated gate count `(N gates total)`. Drift class addressed: a new gate gets added (the spec's gate count bumps from 13 to 14), but the README gates-X/Y badge denominator stays at 13. The CCPA-013 introduction in M11 is a confirmed instance of this drift class — the spec's `12 gates total` header lagged behind the addition for ~25 milestones until M37 fixed it manually. Detector now catches this 11th cross-reference at authoring time.	direct main commit `b9bb7b9`	this PR
M44	Companion-only `check-doc-drift.sh` extension — adds 1 more assert cross-referencing the README parity badge `measured%20parity-X.XXXX-brightgreen.svg` against the `measured-parity.json` `aggregate_score` field (with 1e-4 tolerance for badge-rounding to 4 decimals via `awk` float comparison). Drift class addressed: parity meter is re-run on a refreshed fixture set (e.g., new fixture added or behavior changed), aggregate score moves, but the README badge stays at the old value. Currently both at `1.0000`. Detector now catches this 12th cross-reference.	direct main commit `1ad2a76`	this PR
M45	Companion-only `scripts/smoke-m32d.sh` — codifies the M32d dogfood loop into a repeatable smoke test (3 prompts × 3 domains: math `5+7=12`, geography `Capital of France: Paris`, translation `Hello world → ¡Hola mundo!`). Each prompt's output gated by 2 assertions: (a) ≥5 non-whitespace chars, (b) does NOT contain the M32d-pre-fix gibberish marker `%%%%%%%`. Operator-opt-in via `make smoke-m32d` (NOT in `make tier3` because requires 17.3 GB cached GGUF + ~3min runtime). Drift class addressed: a future aprender PR re-introduces gibberish output (per-head Q/K RMSNorm regression, rope_theta wrong default, chat template change, etc.) — currently only manual ad-hoc dogfood would catch this; M45 makes it 1 command. Codifies what the M37/M38/M40/M41/M42/M43/M44 cycles ran manually each iteration (math + geography + translation prompts dogfooded ~10 times across this session).	direct main commit `059de8a`	this PR
M46	Companion-only `scripts/test-doc-drift.sh` — meta-test for the M38–M44 drift detector itself. The detector can silently regress (a refactor breaks an assert, a regex stops matching, a typo disables a check) — at which point the drift class it was guarding is no longer mechanically prevented. M46 systematically corrupts each drift class (10 classes total at M44), runs the detector, asserts exit 1 with the expected message, restores. Wired into `make tier3` (between `check-doc-drift` and the build steps) AND CI workflow. Pre/post-flight: detector must be clean on live repo before AND after the corruption sweep (catches files left dirty).	direct main commit `151f850`	this PR
M47	Companion-only regression-corpus broadening — extended `fixtures/regression/` from 3 fixtures (3 of 13 `DriftCategory` variants) to 11 fixtures (12 of 12 trace-stream variants). Added: 0004 `MismatchedToolName`, 0005 `MissingHookEvent`, 0006 `ExtraHookEvent`, 0007 `MismatchedHookEvent`, 0008 `MissingSkillInvocation`, 0009 `ExtraSkillInvocation`, 0010 `MismatchedSkillInvocation`, 0011 `MismatchedActionKind`. Aggregate score moved 0.5000 → 0.3182 across the 11 fixtures (still well below the 0.95 `passes_gate` threshold; meter sensitivity strengthened). The 13th variant (`MismatchedFileState`) is documented in `fixtures/regression/README.md` as out-of-scope for trace-only regression — it comes from `FileState` snapshots compared at session boundaries (FALSIFY-CCPA-005 `crates/ccpa-differ/tests/falsify_ccpa_005_file_mutation.rs`, 15 unit tests) not from action-stream extraction. Drift class addressed: a future regression breaking any specific `DriftCategory` detector now degrades the regression-corpus aggregate at fixture level, not just the unit-test level.	direct main commits `b0c9210`/`54290f6`/`5787535`/`bf05c87`/`61b894f`/`7c07fcf`	this PR
M48	Companion-only spec scope expansion + arXiv citations — formalizes the M32 numerical-parity work as in-scope POC deliverable (was implicit since M31 monorepo clarification, now explicit). Adds new "Scope extensions (post-M32 numerical-parity work)" subsection under § Goal enumerating 3 sub-extensions: (1) quantized-LLM forward correctness — M32d FUNCTIONALLY DISCHARGED 2026-05-02; (2) GPU MoE forward path — multi-week aprender follow-up; (3) sub-FFN MoE diagnostic surface — M32d Step 4 bypassed but useful for future MoE work. Adds 8 new arXiv citations to § Academic basis under "Scope-extension citations" subsection: arXiv:1701.06538 (Shazeer MoE), 2101.03961 (Fedus Switch Transformers), 2202.09368 (Zoph ST-MoE router stability), 1910.07467 (Zhang & Sennrich RMSNorm — M32d Step 5 rank-3 prior 15 %), 2104.09864 (Su RoPE — M32d Step 5b rank-4 prior 10 %), 2210.17323 (Frantar GPTQ — cosine framework), 2305.18398 (Dao FlashAttention-2 — fused-kernel parity), 2305.05176 (Aminabadi DeepSpeed-MoE — GPU MoE precedent). Each citation maps to the specific contract / sub-extension / M-step it supports; the two M32d-load-bearing citations (1910.07467, 2104.09864) explicitly note their empirical role from the FAST PATH discharge.	direct main commit `fd9b1fc`	this PR
M49	Companion-only GPU MoE priority elevation — M48 listed sub-extension 2 (GPU MoE forward path) as a generic "multi-week aprender follow-up". Operator escalation 2026-05-04: this is now P0 / HIGHEST PRIORITY — the rate-limit on consuming the M32d discharge at production cadence. Spec changes: § Scope extensions reordered to put sub-extension 2 first with explicit P0/P1/P2 labels; sub-extension 2 expanded with rationale (~30 tok/s CPU baseline vs ~225–440 tok/s dense GPU = ~10× gap, makes Qwen3-Coder-30B-A3B-Instruct-Q4_K_M default model production-infeasible at ~30 tok/s) + 5 required deliverables (kernel contract `qwen3-moe-forward-gpu-v1`, CUDA `forward_qwen3_moe_gpu`, wgpu fallback, CPU↔GPU cosine ≥0.99 parity gate FALSIFY-QW3-MOE-GPU-PARITY-001, throughput target ≥150 tok/s) + estimate (2–4 weeks aprender-side). Adds new R10 risk row: GPU MoE forward path missing, marked as the P0 blocker on production-grade `apr code` consumption of the spec-prescribed default model. Recording cadence note added at end of § Scope extensions: until sub-extension 2 ships, every dogfood loop assumes the ~30 tok/s baseline.	direct main commit `a502e8e`	this PR
M50	Cross-repo M-GPU-MOE-0 SHIPPED — first concrete step toward retiring R10. New aprender kernel contract `qwen3-moe-forward-gpu-v1` v1.0.0 DRAFT scaffold landed on aprender main 2026-05-04T04:56:35Z (PR #1453 squash `cf08e910f`). 7 proof obligations + 7 falsification tests + 2 kani harnesses + qa_gate; `pv validate` 0/0. Defines what GPU MoE means: cosine ≥0.99 vs CPU LAZY-FUSED-MATVEC reference (FALSIFY-QW3-MOE-GPU-PARITY-001), cosine ≥0.99 vs HF FP16 reference (PARITY-002, inherits from v1), router-weight invariants (INVARIANTS-001), determinism on same-seed reruns (DETERMINISM-001), throughput ≥150 tok/s on RTX 4090 (THROUGHPUT-001), VRAM utilization ≤95% (MEMORY-001). Implementation stages: M-GPU-MOE-0 (scaffold) SHIPPED; M-GPU-MOE-1 (CUDA kernel + cosine-vs-CPU parity gate) PENDING; M-GPU-MOE-2 (wgpu fallback) PENDING; M-GPU-MOE-3 (throughput + memory) PENDING. Per CLAUDE.md "NEVER write code before writing a provable contract" — this is the contract-first delivery. Companion-side this is bookkeeping (no companion contract bump because the companion contract pins `claude-code-parity-apr-v1.yaml`, not aprender kernel contracts).	aprender #1453 merged at `cf08e910f`	this PR
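
Two numeric claims in the table can be checked with a few lines of arithmetic: the per-expert byte sizes quoted in the M32c.2.2.0 row, and the cosine ≥0.99 threshold used by the parity gates (M32d.0, M32d.2, M50). The sketch below is illustrative only and is not the aprender implementation. The function names `quantized_bytes`, `cosine_similarity`, and `passes_parity_gate` are hypothetical. The block sizes (144 bytes per 256-weight super-block for Q4_K, 210 bytes for Q6_K) are the standard GGML k-quant layouts; the table itself does not state them.

```python
import math

# GGML k-quants pack weights into super-blocks of 256 elements.
# Bytes per super-block (assumption taken from the GGML layouts, not from the table):
#   Q4_K: 2 x fp16 scale/min (4) + 12 packed scales + 128 nibble bytes = 144
#   Q6_K: 128 low bits + 64 high bits + 16 scales + 1 x fp16 scale (2)  = 210
QK_K = 256
BYTES_PER_BLOCK = {"Q4_K": 144, "Q6_K": 210}


def quantized_bytes(shape, qtype):
    """Return the byte size of a k-quant tensor with the given shape."""
    n_elements = math.prod(shape)
    if n_elements % QK_K != 0:
        raise ValueError("element count must be a multiple of 256")
    return (n_elements // QK_K) * BYTES_PER_BLOCK[qtype]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_parity_gate(candidate, reference, threshold=0.99):
    """True when the candidate logits meet the cosine threshold."""
    return cosine_similarity(candidate, reference) >= threshold


if __name__ == "__main__":
    # Per-expert sizes quoted in the M32c.2.2.0 row.
    print(quantized_bytes((768, 2048), "Q4_K"))   # 884736
    print(quantized_bytes((2048, 768), "Q6_K"))   # 1290240
    # Orthogonal vectors have cosine 0 and fail the gate.
    print(passes_parity_gate([1.0, 0.0], [0.0, 1.0]))  # False
```

Both byte counts reproduce the figures in the table (884,736 and 1,290,240). A real parity gate would compare the full vocabulary-sized logit vector from each engine; the two-element vectors here only demonstrate the threshold logic.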
