|
2 | 2 |
|
3 | 3 | **Version**: 1.23.0 |
4 | 4 | **Date**: 2026-05-02 |
5 | | -**Status**: ACTIVE_RUNTIME — M0–M78 SHIPPED; **GPU MoE forward path P0 / HIGHEST PRIORITY** (M49 elevation, R10) with M-GPU-MOE-0 contract scaffold SHIPPED 2026-05-04 (M50, aprender PR #1453 squash `cf08e910f`); M-GPU-MOE-1.0 → 1.1.2 cascade ALL MERGED 2026-05-04 (M51, aprender PRs #1460 + #1462 + #1464 + #1469 + #1477 squash `dc6f94d3b`); M-GPU-MOE-1.2 SHIPPED + 2.0 + 2.3 stacked OPEN (M52 + M53, aprender PR #1484 squash 8cbb7b51e + #1485 OPEN 3-commit + #1487 + #1488 stacked-merged); M-MOE-SUB-1 SaveTensorStage extension SHIPPED 2026-05-05 (M60, aprender PR #1499 squash `b51986641`); trace-moe-gpu-sub-stages-v1 v1.0.0 → v1.1.0 traced-target clarification SHIPPED 2026-05-05 (M64, aprender PR #1503 squash `8c4c6d5c7`); **M-MOE-SUB-2 step (c) helper `moe_ffn_forward_layer_with_router` SHIPPED 2026-05-05 (M68, aprender PR #1507 squash `0f22c7841`)**; **M-MOE-SUB-2 step (a) `forward_qwen3_moe_traced_with_plan` SHIPPED 2026-05-05 (M74, aprender PR #1516 squash `3138d134d`) — wires MoeRouter + MoeFfnOut emit into CPU traced**; **`pv lint --strict-test-binding` mechanical drift-class prevention SHIPPED 2026-05-05 (M71, aprender PR #1511 squash `ff2e0b634`) — closes the entire M65/M66/M67/M70 manual-fix class via PV-VER-002**; §50.4 cascade §55 polymorphic preflight relaxation + §56 5g.1 LIVE smoke + ship-two-models §57 drift sweep + apr-pretrain-arch-polymorphic-v1 v1.4 + v1.5 + v1.6 + apr-pretrain-from-init-v1 v1.2 + apr-cli-tokenize-import-hf-v1 v1.1 SHIPPED 2026-05-05 (M61–M63 + M65–M67 + M69 + M70, aprender PRs #1500/#1501/#1502/#1504/#1505/#1506/#1508/#1509); M32d numerical-parity FUNCTIONALLY DISCHARGED 2026-05-02 (aprender PR #1228 squash 5235aaeb9); qwen3-moe-forward-v1 v1.3.0 DRAFT → v1.4.0 ACTIVE_ALGORITHM_LEVEL flipped on aprender main 2026-05-02T14:57Z (PR #1409 squash 3a2f2705b) |
| 5 | +**Status**: ACTIVE_RUNTIME — M0–M79 SHIPPED; **GPU MoE forward path P0 / HIGHEST PRIORITY** (M49 elevation, R10) with M-GPU-MOE-0 contract scaffold SHIPPED 2026-05-04 (M50, aprender PR #1453 squash `cf08e910f`); M-GPU-MOE-1.0 → 1.1.2 cascade ALL MERGED 2026-05-04 (M51, aprender PRs #1460 + #1462 + #1464 + #1469 + #1477 squash `dc6f94d3b`); M-GPU-MOE-1.2 SHIPPED + 2.0 + 2.3 stacked OPEN (M52 + M53, aprender PR #1484 squash 8cbb7b51e + #1485 OPEN 3-commit + #1487 + #1488 stacked-merged); M-MOE-SUB-1 SaveTensorStage extension SHIPPED 2026-05-05 (M60, aprender PR #1499 squash `b51986641`); trace-moe-gpu-sub-stages-v1 v1.0.0 → v1.1.0 traced-target clarification SHIPPED 2026-05-05 (M64, aprender PR #1503 squash `8c4c6d5c7`); **M-MOE-SUB-2 step (c) helper `moe_ffn_forward_layer_with_router` SHIPPED 2026-05-05 (M68, aprender PR #1507 squash `0f22c7841`)**; **M-MOE-SUB-2 step (a) `forward_qwen3_moe_traced_with_plan` SHIPPED 2026-05-05 (M74, aprender PR #1516 squash `3138d134d`) — wires MoeRouter + MoeFfnOut emit into CPU traced**; **`pv lint --strict-test-binding` mechanical drift-class prevention SHIPPED 2026-05-05 (M71, aprender PR #1511 squash `ff2e0b634`) — closes the entire M65/M66/M67/M70 manual-fix class via PV-VER-002**; §50.4 cascade §55 polymorphic preflight relaxation + §56 5g.1 LIVE smoke + ship-two-models §57 drift sweep + apr-pretrain-arch-polymorphic-v1 v1.4 + v1.5 + v1.6 + apr-pretrain-from-init-v1 v1.2 + apr-cli-tokenize-import-hf-v1 v1.1 SHIPPED 2026-05-05 (M61–M63 + M65–M67 + M69 + M70, aprender PRs #1500/#1501/#1502/#1504/#1505/#1506/#1508/#1509); M32d numerical-parity FUNCTIONALLY DISCHARGED 2026-05-02 (aprender PR #1228 squash 5235aaeb9); qwen3-moe-forward-v1 v1.3.0 DRAFT → v1.4.0 ACTIVE_ALGORITHM_LEVEL flipped on aprender main 2026-05-02T14:57Z (PR #1409 squash 3a2f2705b) |
6 | 6 | **Source of truth**: https://github.com/paiml/claude-code-parity-apr (canonical for enforcement; aprender mirrors only the contract YAML byte-for-byte via `pin.lock`) |
7 | 7 | **Companion-repo invariants** (must be green on every PR — see § Companion-repo source-of-truth invariants): |
8 | 8 | 1. GitHub Actions `ci/gate` green (required status check) → **FALSIFY-CCPA-009** |
@@ -308,7 +308,7 @@ The teacher's *fixtures* are immutable per-revision; the student (`apr code` orc |
308 | 308 |
|
309 | 309 | ## Phases / Milestones |
310 | 310 |
|
311 | | -> **Status snapshot (2026-05-06)**: M0–M78 SHIPPED. M32d |
| 311 | +> **Status snapshot (2026-05-06)**: M0–M79 SHIPPED. M32d |
312 | 312 | > **FUNCTIONALLY DISCHARGED** 2026-05-02 via aprender PR #1228 squash |
313 | 313 | > 5235aaeb9 (Step 5 + 5b + 6 + 7 fix bundle). Output transition on |
314 | 314 | > lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B- |
@@ -493,6 +493,7 @@ in `contracts/claude-code-parity-apr-v1.yaml § status_history`: |
493 | 493 | | **M76** | Cross-repo **v0.32.0 publish-cascade SHIPPED on aprender main** as 2 squashes: `0bb94d5d3` (2026-05-05, aprender PR #1518) + `cb20a3648` (2026-05-05, aprender PR #1519). Combined record covering the final two PRs of the v0.32.0 publish cascade (aprender#1514). **#1518 fix**: `apr-cli/src/commands/aliases.rs:13` had `include_str!("../../../../configs/aliases.yaml")` referencing the workspace-root file; `cargo publish` excludes files outside the crate dir, breaking the publish step. Fix copies `aliases.yaml` into the crate dir + updates `include_str!` path. **#1519 chore**: CHANGELOG.md gains a `## [0.32.0] - 2026-05-05` section under `## [Unreleased]` documenting the breaking aprender-rag lib rename (#1510, #1512), the cascade publish (#1514) at v0.32.0 across 15 user-facing crates, and the mechanical-drift kaizen wins (M71 + M73 publish-hygiene cascade). Together M72 + #1515 + M75 + M76 (covering #1518 + #1519) close the v0.32.0 release cycle: lib-rename → dep-cycle break → clean-room compat → publish-include-path → CHANGELOG. **Publish-cascade lessons** (kaizen): (1) cargo publish excludes files outside crate-root → require all `include_str!` paths under `crates/<name>/`; (2) clean-room sed strips `path` but needs a `version` fallback → use `{ version = "*", path = "..." }` dev-dep form; (3) APR-MONO consolidation gaps (lib-name harmonization swept most crates but missed aprender-rag) ripple at publish time only — would benefit from a CI gate that runs `cargo publish --dry-run` for every workspace member. | aprender [#1518](https://github.com/paiml/aprender/pull/1518) `0bb94d5d3` + [#1519](https://github.com/paiml/aprender/pull/1519) `cb20a3648` | this PR | |
494 | 494 | | **M77** | Cross-repo **§58 ship-two-models v3.02.0 → v3.03.0 + M-MOE-SUB-2 step (a) CLI completion SHIPPED on aprender main** as 2 squashes: `8525008f6` (2026-05-06, aprender PR #1520) + `c63a8dd61` (2026-05-06, aprender PR #1521). **#1520 §58**: ship-two-models spec v3.02.0 → v3.03.0 records the v0.32.0 cascade publish (Issue #1514 CLOSED) and the four release-engineering defects the cascade surfaced+closed (publish-include-path, dep-cycle, clean-room sed, CHANGELOG) — third hygiene amendment in ship-two-models, mirroring M65/M66/M67/M70 pretrain-contract drift fixes on the spec side. **#1521 M-MOE-SUB-2 step (a) CLI completion**: connects the `--save-tensor` / `--save-tensor-layers` / `--save-tensor-dir` clap surface (PR-A #1405) through to `forward_qwen3_moe_traced_with_plan` (M74) for `.gguf` qwen3_moe models. New pub fn `run_save_tensor_gguf_moe(path, stages, dir, layers)` in `crates/apr-cli/src/commands/trace_save_tensor.rs` mirrors the existing `run_save_tensor_apr` for APR models — loads via `MappedGGUFModel`/`OwnedQuantizedModel`, validates qwen3_moe arch, reads MoE config from GGUF metadata, dispatches to `forward_qwen3_moe_traced_with_plan` with the plan derived from CLI args. Dispatch wireup in `dispatch.rs::dispatch_diagnostic_commands` routes `.gguf` to the new function (`.apr` continues to use existing dense path; `.safetensors` still stub). **Operationally unblocks M-MOE-SUB-3 live bisection on lambda-vector RTX 4090** (CPU-traced side): `apr trace --save-tensor moe_router,moe_ffn_out --save-tensor-layers 0..48 --save-tensor-dir <dir> <qwen3_moe_gguf>` now produces per-layer MoeRouter + MoeFfnOut tensor files on disk, ready for diff vs the GPU sibling output once M-MOE-SUB-2 step (b) ships. Production hot paths (`forward_qwen3_moe`, `forward_qwen3_moe_cuda`) byte-unchanged; `forward_qwen3_moe_traced` (no-plan) public API unchanged via the M74 delegate pattern. | aprender [#1520](https://github.com/paiml/aprender/pull/1520) `8525008f6` + [#1521](https://github.com/paiml/aprender/pull/1521) `c63a8dd61` | this PR | |
495 | 495 | | **M78** | Cross-repo **`moe_ffn_forward_layer_cuda_with_router` GPU helper SHIPPED on aprender main** as squash `7e2091967` (2026-05-06, aprender PR #1522). **GPU parallel of M-MOE-SUB-2 step (c)** — adds the sibling helper that returns both the FFN output AND the post-renormalize top-k router weights from `OwnedQuantizedModelCuda::moe_ffn_forward_layer_cuda`. Where M68 added `moe_ffn_forward_layer_with_router` for the CPU side (used by M74's `forward_qwen3_moe_traced_with_plan`), M78 adds the GPU mirror needed by the upcoming `forward_qwen3_moe_cuda_traced` (M-MOE-SUB-2 step (b), next PR). The helper enables capturing `MoeRouter` for the last token without (a) recomputing the router from scratch (drift risk between production and traced), or (b) modifying the production `moe_ffn_forward_layer_cuda` hot path (additive-purity invariant pinned in v1.1.0). With M78 + M68 + M77 (CLI wireup), the M-MOE-SUB-2 cascade now has: CPU helper (M68) ✓ + CPU traced wireup (M74) ✓ + CLI dispatch (M77) ✓ + GPU helper (M78) ✓ — only step (b) `forward_qwen3_moe_cuda_traced` GPU sibling remains before M-MOE-SUB-3 live bisection on RTX 4090 can compare CPU vs GPU traced outputs to find the first NaN-emitting stage. | aprender [#1522 MERGED](https://github.com/paiml/aprender/pull/1522) `7e2091967` | this PR | |
| 496 | +| **M79** | Cross-repo **`forward_qwen3_moe_cuda_traced` SHIPPED on aprender main** as squash `690a835c4` (2026-05-06, aprender PR #1523). **M-MOE-SUB-2 step (b) — GPU traced sibling, completing the M-MOE-SUB-2 cascade end-to-end.** Mirrors the CPU traced sibling `forward_qwen3_moe_traced_with_plan` (M74) but routes per-layer MoE FFN through the GPU dispatch (`moe_ffn_forward_layer_cuda_with_router` from M78) so `apr trace --gpu --json --payload --save-tensor` can run **the same SaveTensorPlan** against both CPU and GPU forward paths, capture per-stage activations at MoeRouter + MoeFfnOut, and bisect the M-GPU-MOE-1.4 NaN/Inf source. Production `forward_qwen3_moe_cuda` hot path byte-unchanged (additive-purity invariant pinned in v1.1.0). **M-MOE-SUB-2 cascade now 5/5 COMPLETE**: CPU helper (M68) + CPU traced (M74) + CLI dispatch (M77) + GPU helper (M78) + GPU traced (M79). **Operationally unblocks M-MOE-SUB-3 live bisection on lambda-vector RTX 4090**: an operator can now run `apr trace --gpu --save-tensor moe_router,moe_ffn_out --save-tensor-layers 0..48 --save-tensor-dir <gpu_dir> <qwen3_moe_gguf>` (GPU side) + `apr trace --save-tensor moe_router,moe_ffn_out --save-tensor-layers 0..48 --save-tensor-dir <cpu_dir> <qwen3_moe_gguf>` (CPU side) on the cached 17.3 GB Qwen3-Coder GGUF, then `apr diff --values <cpu_dir>/layer-N/moe_ffn_out.bin <gpu_dir>/layer-N/moe_ffn_out.bin` per layer to find the first stage where GPU produces NaN/Inf. Once bisected, M-GPU-MOE-1.4 fix lands at the bisected stage. | aprender [#1523 MERGED](https://github.com/paiml/aprender/pull/1523) `690a835c4` | this PR | |
496 | 497 |
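The M79 row describes bisecting the first NaN/Inf stage by comparing per-layer dumps under `<cpu_dir>/layer-N/` and `<gpu_dir>/layer-N/`. The sketch below illustrates that scan only; it is not the `apr diff --values` implementation. It assumes each `.bin` file is a flat array of little-endian f32 values, which the spec does not state, and the helper names `decode_f32_le`, `load_stage`, and `first_non_finite_layer` are hypothetical.

```python
import math
import struct
from pathlib import Path
from typing import List, Optional, Sequence


def decode_f32_le(raw: bytes) -> List[float]:
    """Decode bytes as little-endian f32 values (ASSUMED dump layout)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))


def load_stage(root: Path, stage: str, num_layers: int) -> List[List[float]]:
    """Read <root>/layer-N/<stage>.bin for N in 0..num_layers (path shape from the M79 row)."""
    return [
        decode_f32_le((root / f"layer-{n}" / f"{stage}.bin").read_bytes())
        for n in range(num_layers)
    ]


def first_non_finite_layer(layers: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the first layer containing NaN or Inf, or None if all are finite."""
    for idx, values in enumerate(layers):
        if any(not math.isfinite(v) for v in values):
            return idx
    return None
```

Under these assumptions, an operator could call `first_non_finite_layer(load_stage(Path(gpu_dir), "moe_ffn_out", 48))` on the GPU dump and then inspect the same layer in the CPU dump; checking `moe_router` at that layer as well would indicate whether the router weights or the expert FFN output goes non-finite first.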
|
497 | 498 | ## Falsification conditions (13 gates total) |
498 | 499 |
|
@@ -747,7 +748,7 @@ inverts the schedule for everything after. |
747 | 748 | | Run | Date | Revision | Verdict | Notes | |
748 | 749 | |-----|------|----------|---------|-------| |
749 | 750 | | Run 0 | 2026-04-26 | original spec PR | **NOT YET RUN** (historical) | Spec authored; companion repo not yet scaffolded; gates 009–012 not yet wired. | |
750 | | -| Run 1 | 2026-04-26 → 2026-05-06 | M1–M78 (every merge to companion main) | **PASS** on every commit | Gates 009–012 (ci/gate green, pmat comply 100%, line coverage ≥99%, pv validate clean) have been hard-blocking on every PR since M1's empty-scaffold landed (FALSIFY-CCPA-009 enforces branch protection from that PR forward). M32d FUNCTIONALLY DISCHARGED 2026-05-02 (M35 audit-trail bump records aprender PR #1228 squash 5235aaeb9); subsequent days (5-03 through 5-06) show ongoing post-discharge verification — `make smoke-m32d` PASS on every check, drift detector + meta-test green at M47. Per-run audit trail lives in `contracts/claude-code-parity-apr-v1.yaml § status_history` (one entry per minor-version bump). | |
| 751 | +| Run 1 | 2026-04-26 → 2026-05-06 | M1–M79 (every merge to companion main) | **PASS** on every commit | Gates 009–012 (ci/gate green, pmat comply 100%, line coverage ≥99%, pv validate clean) have been hard-blocking on every PR since M1's empty-scaffold landed (FALSIFY-CCPA-009 enforces branch protection from that PR forward). M32d FUNCTIONALLY DISCHARGED 2026-05-02 (M35 audit-trail bump records aprender PR #1228 squash 5235aaeb9); subsequent days (5-03 through 5-06) show ongoing post-discharge verification — `make smoke-m32d` PASS on every check, drift detector + meta-test green at M47. Per-run audit trail lives in `contracts/claude-code-parity-apr-v1.yaml § status_history` (one entry per minor-version bump). | |
751 | 752 |
|
752 | 753 | (Subsequent runs append below in the apr-cli-qa-spec.md format: gate / status / evidence per row. The status_history block in the contract YAML is the byte-precise audit; this table is the human roll-up.) |
753 | 754 |
|
|