References

Top spec: claude-code-parity-apr-poc.md | Academic basis

| R10 | GPU MoE forward path missing — P0 / HIGHEST PRIORITY (M49 elevation, 2026-05-04; M51 + M52 substantive cascade SHIPPED 2026-05-04). forward_qwen3_moe is CPU-only (LAZY-FUSED-MATVEC at ~30 tok/s). The dense GPU path on this stack runs at 225–440 tok/s (cuBLAS Q4_K on RTX 4090); MoE inference is ~10× slower, making the spec-prescribed Qwen3-Coder-30B-A3B-Instruct-Q4_K_M default model production-infeasible. Without GPU MoE, the M32d discharge is correct but cannot be exercised at production cadence — every apr code invocation hits the 30 tok/s wall. Mitigation status (2026-05-04 post-M52): (a) qwen3-moe-forward-gpu-v1 kernel contract — SHIPPED v1.0.0 (M50, aprender PR #1453) → AMENDED v1.1.0 (M51, aprender PR #1462 option D) → v1.2.0 amendment OPEN (M52, aprender PR #1485 option I wgpu architecture); (b) CUDA forward_qwen3_moe_cuda on OwnedQuantizedModelCuda with sparse expert dispatch via per-layer moe_ffn_forward_layer_cuda → per-expert expert_swiglu_cuda → CudaExecutor::q4k_matvec/q6k_gemv — FULL FORWARD INTEGRATION SHIPPED 2026-05-04 (M51, aprender PR #1477 squash dc6f94d3b); (c) wgpu fallback (CLAUDE.md backend-agnostic mandate) — M-GPU-MOE-2.0 stub SHIPPED into v1.2.0 contract branch (M52, aprender PR #1487 squash a5827f60c stacked into PR #1485); helpers + integration + parity test PENDING M-GPU-MOE-2.1/2.2/2.3; (d) cosine-equivalence parity gate vs CPU LAZY-FUSED-MATVEC (≥0.99) — test scaffold authored (M52, aprender PR #1484 OPEN, M-GPU-MOE-1.2), heavy --include-ignored assertion runs on lambda-vector RTX 4090 once #1484 merges; (e) throughput ≥150 tok/s + VRAM ≤95% on RTX 4090 — PENDING M-GPU-MOE-3. Until M-GPU-MOE-1.2 PASSES on lambda-vector AND M-GPU-MOE-3 lands, every dogfood loop in this spec still assumes the ~30 tok/s CPU MoE baseline; however the SHIPPED #1477 integration removes the "GPU code does not exist" baseline failure (FALSIFY-QW3-MOE-GPU-001) — only PARITY-001 + THROUGHPUT-001 + MEMORY-001 remain to flip DRAFT → ACTIVE_RUNTIME. 
| (a)+(b) integration SHIPPED at M51; (c) stub SHIPPED at M52; M-GPU-MOE-1.x cascade CLOSED at M85-M87 (qwen3-moe-forward-gpu-v1 v1.7.0 ACTIVE_ALGORITHM_LEVEL); remaining: M-GPU-MOE-2.x wgpu (tracked at aprender#1582 since M108) + M-GPU-MOE-3 throughput + fp-order alignment (tracked at aprender#1583 since M108) | FALSIFY-QW3-MOE-GPU-PARITY-001 (CPU↔GPU cosine ≥0.99) AND FALSIFY-QW3-MOE-GPU-THROUGHPUT-001 (≥150 tok/s on RTX 4090) on qwen3-moe-forward-gpu-v1 — both gates required to flip ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME |
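
The CPU↔GPU cosine gate named in the R10 row (≥0.99) can be illustrated with a minimal sketch. It assumes the gate compares two equal-length `f32` logit vectors from the same prompt. The function names `cosine_similarity` and `parity_gate_passes` are hypothetical, chosen for illustration only, and are not taken from the aprender test suite or the `qwen3-moe-forward-gpu-v1` contract.

```rust
/// Cosine similarity between two equal-length logit vectors.
/// Returns None for a length mismatch, empty input, or a zero-norm vector.
/// Accumulates in f64 so the comparison itself adds little rounding error.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Hypothetical gate helper: passes only when the similarity is defined
/// and meets the threshold (0.99 in the R10 row).
fn parity_gate_passes(cpu: &[f32], gpu: &[f32], threshold: f64) -> bool {
    matches!(cosine_similarity(cpu, gpu), Some(c) if c >= threshold)
}

fn main() {
    // Illustrative values only, not measured outputs from either backend.
    let cpu_logits = [0.10f32, -1.25, 3.40, 0.75];
    let gpu_logits = [0.11f32, -1.24, 3.38, 0.76];
    let cos = cosine_similarity(&cpu_logits, &gpu_logits)
        .expect("non-empty, non-zero vectors of equal length");
    println!(
        "cosine = {cos:.6}, pass = {}",
        parity_gate_passes(&cpu_logits, &gpu_logits, 0.99)
    );
}
```

Treating an undefined similarity (mismatched lengths or a zero vector) as a failure rather than a pass keeps the gate falsifiable: a backend that emits no logits cannot satisfy it by accident.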

References

contracts/claude-code-parity-apr-v1.yaml — top-level falsifiable parity contract (this PR)
contracts/apr-code-parity-v1.yaml — sibling static-feature matrix
contracts/apr-claude-proxy-v1.yaml — sibling Messages-API shape contract
crates/aprender-orchestrate/contracts/batuta/apr-code-v1.yaml — agent-loop ground truth
docs/specifications/apr-mcp-server-spec.md — feature-by-feature parity matrix prose
docs/specifications/apr-cli-qa-spec.md — template for falsification-phase + arXiv layout
CLAUDE.md § "Contract Validation: DOGFOOD pv, NEVER bash" — harness policy
CLAUDE.md § "Realizar-First Architecture" — why student is apr code, not direct aprender::models
Memory: feedback_monorepo_single_source_of_truth.md — aprender vs companion-repo split
Memory: feedback_pv_not_bash_for_contracts.md — every gate flows through pv
Memory: project_apr_code_parity_matrix.md — the static-matrix epic this POC complements
Anthropic Messages API — https://docs.anthropic.com/en/api/messages
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531
Segura, S., Towey, D., Zhou, Z., & Chen, T. (2018). METTLE — Metamorphic Testing of Deep Learning Systems. arXiv:1807.10453
Differential Testing of Deep Learning Frameworks. arXiv:2207.11976
Jimenez, C. et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv:2310.06770
Chaos Engineering for LLM Systems. arXiv:2505.03096
deepclaude (M118 prior-art, 2026-05-10) — https://github.com/aattaran/deepclaude — open-source ANTHROPIC_BASE_URL-intercepting proxy at localhost:3200. Confirms Claude Code respects ANTHROPIC_BASE_URL + the model-routing env-vars; provides reference implementation for the M0 Phase 1 RECORD design. Cited from risks.md (R2 discharge) and axis-2-closure-plan.md (idea (1) cost re-estimate).
ProgramBench (M159 prior-art, 2026-05-12) — Yang, Lieret, Ma, Thakkar et al., Meta FAIR + Stanford + Harvard (2026), ProgramBench: Can Language Models Rebuild Programs From Scratch? arXiv:2605.03546. Project-scale outcome-parity benchmark: 200 real-world programs (FFmpeg / SQLite / PHP interpreter / etc.) where LMs receive only executable + documentation and must rebuild a behaviorally-equivalent codebase. Tests are agent-generated via coverage-guided fuzzing. Headline empirical finding: 0% of 200 tasks fully resolved; best model passed ≥95% tests on only 3% of tasks. Validates the M157 outcome-parity-results.md § "what this does NOT prove" caveats — function-level POC parity (1.0 on 5 HumanEval) does NOT extrapolate to project-scale. Cited from academic-basis.md (CCPA-016 + future CCPA-017 grounding), outcome-parity-plan.md (P3.6 future-work), outcome-parity-results.md (limitations cross-ref).
