Top spec: claude-code-parity-apr-poc.md | Completeness assessment | Axis-2 closure plan | Design audit
R1–R11. R1/R2 are OBSOLETE post-M2.3 rescope; R9 FULLY DISCHARGED at M109; R11 raised at M111 to foreground the M2.3 gap. R3–R8, R10, and R11 remain live.

M113 amendment: the R11 closure path is now scoped at axis-2-closure-plan.md, a 5-idea brainstorm with a recommended (2)→(3) sequence (CLI subprocess instrumentation + SWE-bench differential evaluation) to move Axis 2 from ~30% to ~70%; idea (1), the HTTPS proxy, stays the gold standard but is upstream-blocked.

M118 amendment: R2's technical premise ("Claude Code may pin its own Anthropic auth, refuse ANTHROPIC_BASE_URL override") is independently DISCHARGED by deepclaude, an open-source intercepting proxy at localhost:3200 that routes Claude Code traffic to DeepSeek/OpenRouter/Fireworks via the env var. Idea (1)'s historical blockers, "(a) revisit M2.3 rescope; (b) LlmDriver-public upstream", reduce to just (b) on the technical-feasibility axis; the rescope was operational, not technical.

M192 amendment (operator-authored design-audit.md): introduces a meta-risk, namely that the static, mock-driven RecordedDriver replay infrastructure may not predict live apr code performance on real multi-step engineering tasks. Popperian falsifier: if static fixtures score ≥0.95 (FALSIFY-CCPA-008) AND live ProgramBench scores ~0 (FALSIFY-CCPA-017), the static-fixture approach is FALSIFIED as a convergence predictor. Three tactical shifts are proposed: soft-deprecate FALSIFY-CCPA-014 (OS-event parity); pivot to a live Arena runner; prioritize error recovery over zero-shot determinism.
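
The M192 falsifier is a conjunction of two score conditions, which can be sketched as a small predicate. This is a minimal illustration, not code from the companion repo: the names `Verdict`, `static_fixture_verdict`, and both constants are hypothetical, and the text only says the live score is "~0", so the 0.05 cutoff below is an assumed stand-in for that.

```rust
/// Outcome of the M192 Popperian check on the static-fixture approach.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Static fixtures pass while live runs score near zero.
    Falsified,
    /// At least one of the two conditions does not hold.
    NotFalsified,
}

/// From the text: static fixtures must score >= 0.95 (FALSIFY-CCPA-008).
pub const STATIC_PASS_THRESHOLD: f64 = 0.95;

/// ASSUMPTION: the text says the live ProgramBench score is "~0"
/// (FALSIFY-CCPA-017) without giving a number; 0.05 is illustrative only.
pub const LIVE_NEAR_ZERO_CUTOFF: f64 = 0.05;

/// Returns `Falsified` only when BOTH conditions hold: high static score
/// AND near-zero live score. Any other combination leaves the
/// static-fixture approach unfalsified by this particular test.
pub fn static_fixture_verdict(static_score: f64, live_score: f64) -> Verdict {
    let static_passes = static_score >= STATIC_PASS_THRESHOLD;
    let live_near_zero = live_score <= LIVE_NEAR_ZERO_CUTOFF;
    if static_passes && live_near_zero {
        Verdict::Falsified
    } else {
        Verdict::NotFalsified
    }
}
```

Under these assumptions, `static_fixture_verdict(0.97, 0.0)` returns `Verdict::Falsified`, while a low static score or a healthy live score returns `Verdict::NotFalsified`. Requiring both conditions keeps the check from firing when the static fixtures themselves are failing, which would be a different problem.
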
| # | Risk / question | Mitigation | Falsifiable by |
|---|---|---|---|
| R1 | fixtures/canonical/. | n/a (risk no longer applies) | n/a |
| R2 | Claude Code may pin its own Anthropic auth, refuse ANTHROPIC_BASE_URL override. DISCHARGED by deepclaude, an ANTHROPIC_BASE_URL-intercepting proxy that routes Claude Code traffic to alternate backends (DeepSeek/OpenRouter/Fireworks): concrete proof that Claude Code does not pin Anthropic auth. Documented overridable env-vars: ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL, CLAUDE_CODE_SUBAGENT_MODEL. Known non-overridable: the remote-control bridge (bridge.claudeusercontent.com), a hardcoded WebSocket. | n/a (risk no longer applies; technical premise positively DISCHARGED) | n/a |
| R3 | Tool-call equivalence for Edit/Write is non-trivial | Per-tool equivalence rules in ccpa-differ, contracted in YAML | FALSIFY-CCPA-004 (directly) |
| R4 | Claude Code roadmap may add tools we don't have in apr code | New tools surface as OrchestrationDrift::UnknownToolName | FALSIFY-CCPA-004 (directly; gate FAILs until apr-code-parity-v1.yaml flips a row) |
| R5 | New repo conflicts with monorepo single-source-of-truth | Companion repo is canonical for enforcement; aprender stays canonical for contract text. pin.lock pins the authoritative commit hash | FALSIFY-CCPA-012 (pre-commit hook rejects stale pins) |
| R6 | apr code's LlmDriver trait may not be public-stable enough for an external repo […] b61b76b4); cargo install apr-cli ships apr code in default build. Empirically discharged earlier at M150 via bilateral bench (agreement = 1.0000 on 5/5 MultiPL-E-Rust HumanEval) using locally-built apr. The PMAT-CODE-LLM-DRIVER-PUBLIC-001 ticket (LlmDriver visibility) was a red herring; LlmDriver was already pub. | PMAT-CODE-LLM-DRIVER-PUBLIC-001 (turned out not to gate the work); aprender#1638 MERGED 2026-05-13 | M3.1 functional equivalent achieved at M150; aprender#1638 formalized shipping at M162 |
| R7 | 100% line coverage may produce test-for-coverage's-sake noise on a tiny POC | Tradeoff accepted: POC is small (~5 crates), 100% is achievable. If a function genuinely cannot be covered, the function is unjustified and should be deleted. | FALSIFY-CCPA-011 (directly) |
| R8 | pmat comply check --strict may reject patterns aprender itself uses | Companion repo is greenfield; we author to comply. If we hit a genuine pmat comply bug, the fix is upstream pmat, not a --allow flag | FALSIFY-CCPA-010 (directly) |
| R9 | %%%%%%%% → 2 + 2 = 4 + multi-domain coherent answers. M34 FAST PATH plan delivered at lucky-case bound (5 PRs / ~6 hours). M109 closed the remaining "formal cosine flip" gap by discovering the FP16 weights had been on disk at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB) for ~7 days; the spec's "60 GB HF download" claim was stale. The qwen3-moe-forward-v1 v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME amendment is empirically valid; the aprender-side PR follows from this discharge. | M34 plan executed; M35 audit-trail recorded the discharge; M108 filed aprender#1584; M109 LIVE-DISCHARGED aprender#1584 on 2026-05-09 (issue CLOSED 2026-05-09T21:19:41Z once aprender PR #1597 squash 3fb04ef86 landed, flipping qwen3-moe-forward-v1 v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). | FALSIFY-QW3-MOE-PARITY-001 (HF FP16 cosine ≥ 0.99) DISCHARGED at M109 (cos 0.9954); FALSIFY-QW3-MOE-PARITY-002 (llama.cpp argmax sanity) deferred: a transitive sibling, no longer load-bearing because PARITY-001 directly proved apr_argmax = hf_argmax |