References

Top spec: claude-code-parity-apr-poc.md | Academic basis

| R10 | GPU MoE forward path missing: P0 / HIGHEST PRIORITY (M49 elevation, 2026-05-04; M51 + M52 substantive cascade SHIPPED 2026-05-04). forward_qwen3_moe is CPU-only (LAZY-FUSED-MATVEC at ~30 tok/s). The dense GPU path on this stack runs at 225–440 tok/s (cuBLAS Q4_K on RTX 4090), so MoE inference is ~10× slower, which makes the spec-prescribed default model Qwen3-Coder-30B-A3B-Instruct-Q4_K_M production-infeasible. Without GPU MoE, the M32d discharge is correct but cannot be exercised at production cadence: every apr code invocation hits the 30 tok/s wall. Mitigation status (2026-05-04, post-M52): (a) qwen3-moe-forward-gpu-v1 kernel contract: SHIPPED v1.0.0 (M50, aprender PR #1453) → AMENDED v1.1.0 (M51, aprender PR #1462, option D) → v1.2.0 amendment OPEN (M52, aprender PR #1485, option I wgpu architecture); (b) CUDA forward_qwen3_moe_cuda on OwnedQuantizedModelCuda with sparse expert dispatch via per-layer moe_ffn_forward_layer_cuda → per-expert expert_swiglu_cuda → CudaExecutor::q4k_matvec/q6k_gemv: FULL FORWARD INTEGRATION SHIPPED 2026-05-04 (M51, aprender PR #1477, squash dc6f94d3b); (c) wgpu fallback (CLAUDE.md backend-agnostic mandate): M-GPU-MOE-2.0 stub SHIPPED into the v1.2.0 contract branch (M52, aprender PR #1487, squash a5827f60c, stacked into PR #1485); helpers + integration + parity test PENDING M-GPU-MOE-2.1/2.2/2.3; (d) cosine-equivalence parity gate vs CPU LAZY-FUSED-MATVEC (≥0.99): test scaffold authored (M52, aprender PR #1484 OPEN, M-GPU-MOE-1.2), heavy --include-ignored assertion runs on lambda-vector RTX 4090 once #1484 merges; (e) throughput ≥150 tok/s + VRAM ≤95% on RTX 4090: PENDING M-GPU-MOE-3. Until M-GPU-MOE-1.2 PASSES on lambda-vector AND M-GPU-MOE-3 lands, every dogfood loop in this spec still assumes the ~30 tok/s CPU MoE baseline. However, the SHIPPED #1477 integration removes the "GPU code does not exist" baseline failure (FALSIFY-QW3-MOE-GPU-001); only PARITY-001 + THROUGHPUT-001 + MEMORY-001 remain to flip DRAFT → ACTIVE_RUNTIME. | (a)+(b) integration SHIPPED at M51; (c) stub SHIPPED at M52; M-GPU-MOE-1.x cascade CLOSED at M85-M87 (qwen3-moe-forward-gpu-v1 v1.7.0 ACTIVE_ALGORITHM_LEVEL); remaining: M-GPU-MOE-2.x wgpu (tracked at aprender#1582 since M108) + M-GPU-MOE-3 throughput + fp-order alignment (tracked at aprender#1583 since M108) | FALSIFY-QW3-MOE-GPU-PARITY-001 (CPU↔GPU cosine ≥0.99) AND FALSIFY-QW3-MOE-GPU-THROUGHPUT-001 (≥150 tok/s on RTX 4090) on qwen3-moe-forward-gpu-v1: both gates required to flip ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME |
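
The three numeric gates named in the R10 row (cosine ≥0.99 between CPU and GPU outputs, ≥150 tok/s, VRAM ≤95%) can be illustrated with a minimal Rust sketch. This is not the aprender test scaffold from PR #1484: `cosine_similarity`, `GateReport`, and `evaluate_gates` are hypothetical names, and the sketch assumes the comparison is made on one flat logit vector per backend. Only the thresholds come from the row above.

```rust
// Hypothetical sketch of the R10 gates; names are illustrative, thresholds
// are taken from the R10 row (cosine >= 0.99, >= 150 tok/s, VRAM <= 95%).

const MIN_COSINE: f64 = 0.99;
const MIN_TOK_PER_SEC: f64 = 150.0;
const MAX_VRAM_FRACTION: f64 = 0.95;

/// Cosine similarity of two equally sized vectors, accumulated in f64.
/// Returns None for empty input, a length mismatch, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Outcome of each gate, kept separate so a failure is attributable.
#[derive(Debug, PartialEq)]
struct GateReport {
    parity: bool,
    throughput: bool,
    memory: bool,
}

impl GateReport {
    fn all_pass(&self) -> bool {
        self.parity && self.throughput && self.memory
    }
}

/// Evaluates the parity, throughput, and memory gates from raw measurements.
fn evaluate_gates(
    cpu_logits: &[f32],
    gpu_logits: &[f32],
    tokens: u64,
    elapsed_secs: f64,
    vram_used_bytes: u64,
    vram_total_bytes: u64,
) -> GateReport {
    let parity = cosine_similarity(cpu_logits, gpu_logits)
        .map_or(false, |c| c >= MIN_COSINE);
    let throughput =
        elapsed_secs > 0.0 && (tokens as f64 / elapsed_secs) >= MIN_TOK_PER_SEC;
    let memory = vram_total_bytes > 0
        && (vram_used_bytes as f64 / vram_total_bytes as f64) <= MAX_VRAM_FRACTION;
    GateReport { parity, throughput, memory }
}

fn main() {
    // Made-up measurements, for illustration only.
    let cpu = [0.10f32, 2.50, -1.20, 0.70];
    let gpu = [0.11f32, 2.48, -1.19, 0.71];
    let report = evaluate_gates(&cpu, &gpu, 1_800, 10.0, 20u64 << 30, 24u64 << 30);
    println!("{report:?} all_pass={}", report.all_pass());
}
```

Reporting the three booleans separately rather than a single pass/fail mirrors the way the row tracks PARITY-001, THROUGHPUT-001, and MEMORY-001 as distinct falsifiers.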

References

  • contracts/claude-code-parity-apr-v1.yaml — top-level falsifiable parity contract (this PR)
  • contracts/apr-code-parity-v1.yaml — sibling static-feature matrix
  • contracts/apr-claude-proxy-v1.yaml — sibling Messages-API shape contract
  • crates/aprender-orchestrate/contracts/batuta/apr-code-v1.yaml — agent-loop ground truth
  • docs/specifications/apr-mcp-server-spec.md — feature-by-feature parity matrix prose
  • docs/specifications/apr-cli-qa-spec.md — template for falsification-phase + arXiv layout
  • CLAUDE.md § "Contract Validation: DOGFOOD pv, NEVER bash" — harness policy
  • CLAUDE.md § "Realizar-First Architecture" — why student is apr code, not direct aprender::models
  • Memory: feedback_monorepo_single_source_of_truth.md — aprender vs companion-repo split
  • Memory: feedback_pv_not_bash_for_contracts.md — every gate flows through pv
  • Memory: project_apr_code_parity_matrix.md — the static-matrix epic this POC complements
  • Anthropic Messages API — https://docs.anthropic.com/en/api/messages
  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531
  • Segura, S., Towey, D., Zhou, Z., & Chen, T. (2018). METTLE — Metamorphic Testing of Deep Learning Systems. arXiv:1807.10453
  • Differential Testing of Deep Learning Frameworks. arXiv:2207.11976
  • Jimenez, C. et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv:2310.06770
  • Chaos Engineering for LLM Systems. arXiv:2505.03096
  • deepclaude (M118 prior-art, 2026-05-10) — https://github.com/aattaran/deepclaude — open-source ANTHROPIC_BASE_URL-intercepting proxy at localhost:3200. Confirms Claude Code respects ANTHROPIC_BASE_URL + the model-routing env-vars; provides reference implementation for the M0 Phase 1 RECORD design. Cited from risks.md (R2 discharge) and axis-2-closure-plan.md (idea (1) cost re-estimate).
  • ProgramBench (M159 prior-art, 2026-05-12) — Yang, Lieret, Ma, Thakkar et al., Meta FAIR + Stanford + Harvard (2026), ProgramBench: Can Language Models Rebuild Programs From Scratch? arXiv:2605.03546. Project-scale outcome-parity benchmark: 200 real-world programs (FFmpeg / SQLite / PHP interpreter / etc.) where LMs receive only executable + documentation and must rebuild a behaviorally-equivalent codebase. Tests are agent-generated via coverage-guided fuzzing. Headline empirical finding: 0% of 200 tasks fully resolved; best model passed ≥95% tests on only 3% of tasks. Validates the M157 outcome-parity-results.md § "what this does NOT prove" caveats — function-level POC parity (1.0 on 5 HumanEval) does NOT extrapolate to project-scale. Cited from academic-basis.md (CCPA-016 + future CCPA-017 grounding), outcome-parity-plan.md (P3.6 future-work), outcome-parity-results.md (limitations cross-ref).
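
As a small illustration of the ANTHROPIC_BASE_URL interception noted in the deepclaude entry, the following Rust sketch resolves the Messages endpoint a client would call when the variable is set or unset. It is an assumption-laden sketch, not code from deepclaude or apr: `messages_endpoint` is a hypothetical helper, and it assumes the proxy serves the same `/v1/messages` path as the upstream API.

```rust
use std::env;

// Upstream default and the Messages API path.
const DEFAULT_BASE_URL: &str = "https://api.anthropic.com";
const MESSAGES_PATH: &str = "/v1/messages";

/// Hypothetical helper: builds the Messages endpoint from an optional base
/// URL override. Blank overrides fall back to the upstream default, and a
/// trailing slash on the base is dropped to avoid a double slash.
fn messages_endpoint(base_override: Option<&str>) -> String {
    let base = base_override
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .unwrap_or(DEFAULT_BASE_URL);
    format!("{}{}", base.trim_end_matches('/'), MESSAGES_PATH)
}

fn main() {
    // With ANTHROPIC_BASE_URL=http://localhost:3200 this resolves to the
    // local proxy; with the variable unset it resolves to the upstream API.
    let from_env = env::var("ANTHROPIC_BASE_URL").ok();
    println!("{}", messages_endpoint(from_env.as_deref()));
}
```

Passing the override as a parameter instead of reading the environment inside the helper keeps the resolution logic testable without mutating process state.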