Top spec: claude-code-parity-apr-poc.md | Academic basis
| R10 | GPU MoE forward path missing — P0 / HIGHEST PRIORITY (M49 elevation, 2026-05-04; M51 + M52 substantive cascade SHIPPED 2026-05-04). forward_qwen3_moe is CPU-only (LAZY-FUSED-MATVEC at ~30 tok/s). The dense GPU path on this stack runs at 225–440 tok/s (cuBLAS Q4_K on RTX 4090); MoE inference is ~10× slower, making the spec-prescribed Qwen3-Coder-30B-A3B-Instruct-Q4_K_M default model production-infeasible. Without GPU MoE, the M32d discharge is correct but cannot be exercised at production cadence — every apr code invocation hits the 30 tok/s wall. Mitigation status (2026-05-04 post-M52): (a) qwen3-moe-forward-gpu-v1 kernel contract — SHIPPED v1.0.0 (M50, aprender PR #1453) → AMENDED v1.1.0 (M51, aprender PR #1462 option D) → v1.2.0 amendment OPEN (M52, aprender PR #1485 option I wgpu architecture); (b) CUDA forward_qwen3_moe_cuda on OwnedQuantizedModelCuda with sparse expert dispatch via per-layer moe_ffn_forward_layer_cuda → per-expert expert_swiglu_cuda → CudaExecutor::q4k_matvec/q6k_gemv — FULL FORWARD INTEGRATION SHIPPED 2026-05-04 (M51, aprender PR #1477 squash dc6f94d3b); (c) wgpu fallback (CLAUDE.md backend-agnostic mandate) — M-GPU-MOE-2.0 stub SHIPPED into v1.2.0 contract branch (M52, aprender PR #1487 squash a5827f60c stacked into PR #1485); helpers + integration + parity test PENDING M-GPU-MOE-2.1/2.2/2.3; (d) cosine-equivalence parity gate vs CPU LAZY-FUSED-MATVEC (≥0.99) — test scaffold authored (M52, aprender PR #1484 OPEN, M-GPU-MOE-1.2), heavy --include-ignored assertion runs on lambda-vector RTX 4090 once #1484 merges; (e) throughput ≥150 tok/s + VRAM ≤95% on RTX 4090 — PENDING M-GPU-MOE-3. Until M-GPU-MOE-1.2 PASSES on lambda-vector AND M-GPU-MOE-3 lands, every dogfood loop in this spec still assumes the ~30 tok/s CPU MoE baseline; however the SHIPPED #1477 integration removes the "GPU code does not exist" baseline failure (FALSIFY-QW3-MOE-GPU-001) — only PARITY-001 + THROUGHPUT-001 + MEMORY-001 remain to flip DRAFT → ACTIVE_RUNTIME. 
| (a)+(b) integration SHIPPED at M51; (c) stub SHIPPED at M52; M-GPU-MOE-1.x cascade CLOSED at M85-M87 (qwen3-moe-forward-gpu-v1 v1.7.0 ACTIVE_ALGORITHM_LEVEL); remaining: M-GPU-MOE-2.x wgpu (tracked at aprender#1582 since M108) + M-GPU-MOE-3 throughput + fp-order alignment (tracked at aprender#1583 since M108) | FALSIFY-QW3-MOE-GPU-PARITY-001 (CPU↔GPU cosine ≥0.99) AND FALSIFY-QW3-MOE-GPU-THROUGHPUT-001 (≥150 tok/s on RTX 4090) on qwen3-moe-forward-gpu-v1 — both gates required to flip ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME |
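
The R10 gates above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the contract's actual harness: `cosine_similarity`, `GateInputs`, `ContractStatus`, and `evaluate` are hypothetical names, and only the thresholds (cosine ≥ 0.99, ≥ 150 tok/s, VRAM ≤ 95%) come from the row itself. The memory check follows mitigation item (e); the falsifier column names only the parity and throughput gates, so treat its inclusion here as an assumption.

```rust
/// Cosine similarity between two equal-length logit vectors (e.g. CPU vs GPU output).
/// Returns None when lengths differ, the input is empty, or either norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Accumulate in f64 so the comparison itself adds little rounding error.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Hypothetical status enum mirroring the two states named in the R10 row.
#[derive(Debug, PartialEq)]
enum ContractStatus {
    ActiveAlgorithmLevel,
    ActiveRuntime,
}

/// Hypothetical bundle of measurements taken on the target GPU.
struct GateInputs {
    cosine: Option<f64>, // CPU vs GPU parity; None if the comparison was undefined
    tok_per_s: f64,      // measured decode throughput
    vram_fraction: f64,  // peak VRAM use as a fraction of capacity (0.0 to 1.0)
}

/// The contract flips to ActiveRuntime only when every gate passes.
fn evaluate(g: &GateInputs) -> ContractStatus {
    let parity = g.cosine.map_or(false, |c| c >= 0.99);
    let throughput = g.tok_per_s >= 150.0;
    let memory = g.vram_fraction <= 0.95;
    if parity && throughput && memory {
        ContractStatus::ActiveRuntime
    } else {
        ContractStatus::ActiveAlgorithmLevel
    }
}
```

With these definitions, a run measuring cosine 0.995 at 30 tok/s stays at `ActiveAlgorithmLevel`, because the throughput gate fails even though parity passes. That matches the row's point that GPU code existing and being numerically correct is not sufficient on its own.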
- `contracts/claude-code-parity-apr-v1.yaml`: top-level falsifiable parity contract (this PR)
- `contracts/apr-code-parity-v1.yaml`: sibling static-feature matrix
- `contracts/apr-claude-proxy-v1.yaml`: sibling Messages-API shape contract
- `crates/aprender-orchestrate/contracts/batuta/apr-code-v1.yaml`: agent-loop ground truth
- `docs/specifications/apr-mcp-server-spec.md`: feature-by-feature parity matrix prose
- `docs/specifications/apr-cli-qa-spec.md`: template for falsification-phase + arXiv layout
- `CLAUDE.md` § "Contract Validation: DOGFOOD pv, NEVER bash": harness policy
- `CLAUDE.md` § "Realizar-First Architecture": why student is `apr code`, not direct `aprender::models`
- Memory: `feedback_monorepo_single_source_of_truth.md`: aprender vs companion-repo split
- Memory: `feedback_pv_not_bash_for_contracts.md`: every gate flows through `pv`
- Memory: `project_apr_code_parity_matrix.md`: the static-matrix epic this POC complements
- Anthropic Messages API: https://docs.anthropic.com/en/api/messages
- Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531
- Segura, S., Towey, D., Zhou, Z., & Chen, T. (2018). METTLE — Metamorphic Testing of Deep Learning Systems. arXiv:1807.10453
- Differential Testing of Deep Learning Frameworks. arXiv:2207.11976
- Jimenez, C. et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv:2310.06770
- Chaos Engineering for LLM Systems. arXiv:2505.03096
- deepclaude (M118 prior-art, 2026-05-10): https://github.com/aattaran/deepclaude. Open-source `ANTHROPIC_BASE_URL`-intercepting proxy at `localhost:3200`. Confirms Claude Code respects `ANTHROPIC_BASE_URL` + the model-routing env-vars; provides reference implementation for the M0 Phase 1 RECORD design. Cited from risks.md (R2 discharge) and axis-2-closure-plan.md (idea (1) cost re-estimate).
- ProgramBench (M159 prior-art, 2026-05-12): Yang, Lieret, Ma, Thakkar et al., Meta FAIR + Stanford + Harvard (2026). ProgramBench: Can Language Models Rebuild Programs From Scratch? arXiv:2605.03546. Project-scale outcome-parity benchmark: 200 real-world programs (FFmpeg / SQLite / PHP interpreter / etc.) where LMs receive only executable + documentation and must rebuild a behaviorally-equivalent codebase. Tests are agent-generated via coverage-guided fuzzing. Headline empirical finding: 0% of 200 tasks fully resolved; best model passed ≥95% tests on only 3% of tasks. Validates the M157 outcome-parity-results.md § "what this does NOT prove" caveats: function-level POC parity (1.0 on 5 HumanEval) does NOT extrapolate to project-scale. Cited from academic-basis.md (CCPA-016 + future CCPA-017 grounding), outcome-parity-plan.md (P3.6 future-work), outcome-parity-results.md (limitations cross-ref).