Risks & open questions

This register tracks R1–R11. R1 and R2 are OBSOLETE after the M2.3 rescope. R6 was FULLY DISCHARGED at M162 and R9 at M109. R11 was raised at M111 to foreground the M2.3 gap. R3–R5, R7, R8, R10, and R11 remain live.

M113 amendment: the R11 closure path is now scoped in axis-2-closure-plan.md as a 5-idea brainstorm. It recommends a (2)→(3) sequence, CLI subprocess instrumentation followed by SWE-bench differential evaluation, to move Axis 2 from ~30% to ~70%. Idea (1), an HTTPS proxy, remains the gold standard but is upstream-blocked.

M118 amendment: R2's technical premise ("Claude Code may pin its own Anthropic auth, refuse ANTHROPIC_BASE_URL override") is independently DISCHARGED by deepclaude. deepclaude is an open-source intercepting proxy at localhost:3200 that uses that env var to route Claude Code traffic to DeepSeek/OpenRouter/Fireworks. Idea (1) historically had two blockers: "(a) revisit M2.3 rescope; (b) LlmDriver-public upstream". On the technical-feasibility axis these now reduce to (b) alone, because the rescope was an operational decision, not a technical one.

M192 amendment (operator-authored design-audit.md): introduces a meta-risk. The static, mock-driven RecordedDriver replay infrastructure may not predict live `apr code` performance on real multi-step engineering tasks. Popperian falsifier: if static fixtures score ≥0.95 (FALSIFY-CCPA-008) AND live ProgramBench scores ~0 (FALSIFY-CCPA-017), the static-fixture approach is FALSIFIED as a convergence predictor. The audit proposes three tactical shifts: soft-deprecate FALSIFY-CCPA-014 (OS-event parity), pivot to a live Arena runner, and prioritize error recovery over zero-shot determinism.
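The M192 falsifier is a conjunction of two gate readings. The following is a minimal Rust sketch that assumes both scores are fractions in [0, 1]. The audit gives no number for "~0", so the 0.05 cutoff is a hypothetical choice, and the function name is illustrative only.

```rust
/// Threshold from FALSIFY-CCPA-008: static fixtures "score ≥0.95".
const STATIC_FIXTURE_PASS: f64 = 0.95;
/// Hypothetical cutoff for "live ProgramBench scores ~0"; the design audit
/// does not define a number, so 0.05 is an assumption.
const LIVE_NEAR_ZERO: f64 = 0.05;

/// True when the M192 falsifier fires: static fixtures look converged
/// (FALSIFY-CCPA-008) while live ProgramBench performance is ~0
/// (FALSIFY-CCPA-017), i.e. the fixtures do not predict live behavior.
fn static_fixture_predictor_falsified(static_score: f64, live_score: f64) -> bool {
    static_score >= STATIC_FIXTURE_PASS && live_score < LIVE_NEAR_ZERO
}

fn main() {
    // Fixtures pass but the live run collapses: prints true.
    println!("{}", static_fixture_predictor_falsified(0.97, 0.01));
    // Fixtures pass and live scores are non-trivial: prints false.
    println!("{}", static_fixture_predictor_falsified(0.97, 0.60));
}
```

A static score below 0.95 leaves the falsifier silent. The test can refute the static-fixture predictor, but it can never confirm it.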

| # | Risk / question | Mitigation | Falsifiable by |
|---|---|---|---|
| R1 | ~~Recording the live Anthropic API costs $$ per fixture~~ OBSOLETE post-M2.3 rescope ("we will not call api, we will assume claude code"). Fixtures are now AUTHORED canonical references in `fixtures/canonical/`. | n/a (risk no longer applies) | n/a |
| R2 | ~~Claude Code may pin its own Anthropic auth, refuse `ANTHROPIC_BASE_URL` override~~ OBSOLETE post-M2.3 rescope: the recording proxy is out of scope (OOS). M118 prior-art DISCHARGE: deepclaude is a working open-source `ANTHROPIC_BASE_URL`-intercepting proxy that routes Claude Code traffic to alternate backends (DeepSeek/OpenRouter/Fireworks). This is concrete proof that Claude Code does not pin Anthropic auth. Documented overridable env vars: `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL`, `CLAUDE_CODE_SUBAGENT_MODEL`. Known non-overridable: the remote-control bridge (`bridge.claudeusercontent.com`), a hardcoded WebSocket. | n/a (risk no longer applies; technical premise positively DISCHARGED) | n/a |
| R3 | Tool-call equivalence for Edit/Write is non-trivial | Per-tool equivalence rules in `ccpa-differ`, contracted in YAML | FALSIFY-CCPA-004 (directly) |
| R4 | The Claude Code roadmap may add tools we don't have in `apr code` | New tools surface as `OrchestrationDrift::UnknownToolName` | FALSIFY-CCPA-004 (directly; the gate FAILs until `apr-code-parity-v1.yaml` flips a row) |
| R5 | The new companion repo may conflict with the monorepo's single source of truth | The companion repo is canonical for enforcement; aprender stays canonical for contract text. `pin.lock` pins the authoritative commit hash. | FALSIFY-CCPA-012 (a pre-commit hook rejects stale pins) |
| R6 | ~~`apr code`'s `LlmDriver` trait may not be public-stable enough for an external repo~~ FULLY DISCHARGED at M162 (2026-05-13): aprender#1638 MERGED (squash `b61b76b4`), and `cargo install apr-cli` ships `apr code` in the default build. It was empirically discharged earlier, at M150, via a bilateral bench using a locally built apr (agreement = 1.0000 on 5/5 MultiPL-E-Rust HumanEval). The PMAT-CODE-LLM-DRIVER-PUBLIC-001 ticket (LlmDriver visibility) was a red herring: `LlmDriver` was already `pub`. | PMAT-CODE-LLM-DRIVER-PUBLIC-001 (turned out not to gate the work); aprender#1638 MERGED 2026-05-13 | M3.1 functional equivalent achieved at M150; aprender#1638 formalized shipping at M162 |
| R7 | 100% line coverage may produce test-for-coverage's-sake noise on a tiny POC | Tradeoff accepted: the POC is small (~5 crates), so 100% is achievable. If a function genuinely cannot be covered, the function is unjustified; delete it. | FALSIFY-CCPA-011 (directly) |
| R8 | `pmat comply check --strict` may reject patterns aprender itself uses | The companion repo is greenfield, so we author it to comply. If we hit a genuine `pmat comply` bug, the fix goes upstream into pmat, not into an `--allow` flag. | FALSIFY-CCPA-010 (directly) |
| R9 | ~~M32d numerical-correctness blocker~~ FULLY DISCHARGED 2026-05-09 at M109. The formal cosine ≥ 0.99 check against HF FP16 PASSED at cos_sim 0.995384 (lambda-vector RTX 4090; apr forward 555 ms; apr_argmax = hf_argmax = 3555 " What"). M32d was FUNCTIONALLY DISCHARGED 2026-05-02 via aprender PR #1228, squash 5235aaeb9 (Step 5 + 5b + 6 + 7 fix bundle: per-head Q/K RMSNorm + rope_theta default 1M + chat template no-think + traced sync). Output went from `%%%%%%%%` to `2 + 2 = 4`, with coherent answers across multiple domains. The M34 FAST PATH plan was delivered at its lucky-case bound (5 PRs / ~6 hours). M109 closed the remaining "formal cosine flip" gap after discovering that the FP16 weights had been on disk at `/mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/` (57 GB) for ~7 days, which made the spec's "60 GB HF download" claim stale. The `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME amendment is empirically valid, and the aprender-side PR follows from this discharge. | M34 plan executed; the M35 audit trail recorded the discharge; M108 filed aprender#1584; M109 LIVE-DISCHARGED aprender#1584 on 2026-05-09 (issue CLOSED 2026-05-09T21:19:41Z once aprender PR #1597, squash `3fb04ef86`, landed and flipped `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). | FALSIFY-QW3-MOE-PARITY-001 (HF FP16 cosine ≥ 0.99) DISCHARGED at M109 (cos 0.9954). FALSIFY-QW3-MOE-PARITY-002 (llama.cpp argmax sanity) is deferred. It is a transitive sibling and no longer load-bearing, because PARITY-001 directly proved apr_argmax = hf_argmax. |
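Row R2's discharge rests on Claude Code honoring `ANTHROPIC_BASE_URL`. The sketch below shows how to launch Claude Code behind a local intercepting proxy such as deepclaude. It assumes the executable is invoked as `claude` and that the proxy listens on `http://localhost:3200`, as stated in the M118 amendment. It only builds a `std::process::Command` and reads back its environment; it never spawns the process. The function names are illustrative.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Build, but do not spawn, a Claude Code invocation whose API traffic goes
/// to `proxy_url` instead of the default Anthropic endpoint.
/// Assumption: the executable is named `claude` and is on PATH.
fn proxied_claude(proxy_url: &str) -> Command {
    let mut cmd = Command::new("claude");
    cmd.env("ANTHROPIC_BASE_URL", proxy_url);
    cmd
}

/// Read back an environment override recorded on `cmd`
/// (None if the key was never set or was explicitly removed).
fn env_override<'a>(cmd: &'a Command, key: &str) -> Option<&'a OsStr> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v)
}

fn main() {
    let cmd = proxied_claude("http://localhost:3200");
    println!("{:?}", env_override(&cmd, "ANTHROPIC_BASE_URL"));
}
```

Running the command for real would use `Command::spawn` or `Command::status`, which needs both Claude Code and the proxy installed. The override travels only in the child's environment, so Claude Code itself needs no patching. Row R2 notes one exception: the hardcoded remote-control bridge WebSocket is not overridable.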
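Row R3 calls Edit/Write equivalence non-trivial. One reason is that two differently phrased edits can leave the same file. The sketch below illustrates one plausible rule: compare the file each call would produce, not the raw arguments. The `ToolCall` shape, the exactly-once matching rule for Edit, and cross-tool Edit/Write equivalence are all assumptions for illustration. The actual `ccpa-differ` rules are contracted in YAML and are not reproduced here.

```rust
/// Hypothetical tool-call shapes for illustration; not the real `ccpa-differ` types.
#[derive(Debug, Clone)]
enum ToolCall {
    Write { path: String, content: String },
    Edit { path: String, old_string: String, new_string: String },
}

/// Apply an Edit to `base`. Assumption: the edit succeeds only if `old` is
/// non-empty and occurs exactly once; otherwise it fails (None).
fn apply_edit(base: &str, old: &str, new: &str) -> Option<String> {
    if old.is_empty() || base.matches(old).count() != 1 {
        return None;
    }
    Some(base.replacen(old, new, 1))
}

/// The (path, resulting contents) a call would leave, given the file's
/// contents before the call.
fn resulting_file<'a>(call: &'a ToolCall, base: &str) -> Option<(&'a str, String)> {
    match call {
        ToolCall::Write { path, content } => Some((path.as_str(), content.clone())),
        ToolCall::Edit { path, old_string, new_string } => {
            apply_edit(base, old_string, new_string).map(|c| (path.as_str(), c))
        }
    }
}

/// Two calls are equivalent if both succeed and leave identical contents at
/// the same path, however the edit was phrased.
fn equivalent(a: &ToolCall, b: &ToolCall, base: &str) -> bool {
    match (resulting_file(a, base), resulting_file(b, base)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    let base = "fn main() { let x = 1; }";
    let narrow = ToolCall::Edit {
        path: "src/main.rs".into(),
        old_string: "x = 1".into(),
        new_string: "x = 2".into(),
    };
    let wide = ToolCall::Edit {
        path: "src/main.rs".into(),
        old_string: "let x = 1;".into(),
        new_string: "let x = 2;".into(),
    };
    let rewrite = ToolCall::Write {
        path: "src/main.rs".into(),
        content: "fn main() { let x = 2; }".into(),
    };
    println!("{}", equivalent(&narrow, &wide, base)); // prints true
    println!("{}", equivalent(&narrow, &rewrite, base)); // prints true
}
```

Treating an Edit and a Write as equivalent when they yield the same file is a deliberate choice in this sketch. A stricter contract could instead require the same tool name.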
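Row R4 says unfamiliar tools surface as `OrchestrationDrift::UnknownToolName` and fail the gate until the parity contract gains a row. Only that variant name comes from this page. The rest of the enum, the function names, and the example tool list below are hypothetical. In the real gate the known set would come from `apr-code-parity-v1.yaml`; here it is passed in directly.

```rust
use std::collections::BTreeSet;

/// Hypothetical shape: only `UnknownToolName` is named in the spec.
#[derive(Debug, PartialEq, Eq)]
enum OrchestrationDrift {
    UnknownToolName(String),
}

/// Report each distinct tool name in `observed` that has no row in the
/// parity contract (`known`), in first-seen order.
fn detect_unknown_tools(known: &BTreeSet<&str>, observed: &[&str]) -> Vec<OrchestrationDrift> {
    let mut reported = BTreeSet::new();
    observed
        .iter()
        .copied()
        .filter(|name| !known.contains(name) && reported.insert(*name))
        .map(|name| OrchestrationDrift::UnknownToolName(name.to_string()))
        .collect()
}

/// The gate passes only when no drift is reported.
fn gate_passes(known: &BTreeSet<&str>, observed: &[&str]) -> bool {
    detect_unknown_tools(known, observed).is_empty()
}

fn main() {
    let known = BTreeSet::from(["Read", "Edit", "Write", "Bash"]);
    let trace = ["Read", "NewTool", "Bash", "NewTool"];
    println!("{:?}", detect_unknown_tools(&known, &trace));
    println!("gate passes: {}", gate_passes(&known, &trace));
}
```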
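Row R5 relies on a pre-commit hook that rejects stale pins. The `pin.lock` format is not documented on this page. The sketch below therefore assumes the file holds a single full 40-hex-digit commit hash, and it treats "stale" as "differs from the authoritative upstream commit the hook was given". How the hook fetches that upstream commit is out of scope. `PinCheck` and `check_pin` are hypothetical names.

```rust
/// Outcome of the hypothetical pre-commit pin check.
#[derive(Debug, PartialEq, Eq)]
enum PinCheck {
    Fresh,
    Stale { pinned: String, upstream: String },
    Malformed,
}

/// Compare the hash recorded in pin.lock with the authoritative upstream commit.
/// Assumptions: pin.lock holds exactly one full 40-hex-digit hash, and any
/// mismatch with `upstream_head` counts as stale.
fn check_pin(lock_contents: &str, upstream_head: &str) -> PinCheck {
    let pinned = lock_contents.trim();
    let upstream = upstream_head.trim();
    let is_full_sha = |s: &str| s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit());
    if !is_full_sha(pinned) || !is_full_sha(upstream) {
        return PinCheck::Malformed;
    }
    if pinned.eq_ignore_ascii_case(upstream) {
        PinCheck::Fresh
    } else {
        PinCheck::Stale { pinned: pinned.to_string(), upstream: upstream.to_string() }
    }
}

fn main() {
    let pinned = "0123456789abcdef0123456789abcdef01234567";
    let upstream = "89abcdef0123456789abcdef0123456789abcdef";
    let result = check_pin(pinned, upstream);
    // A hook would reject the commit (non-zero exit) on anything but Fresh.
    let exit_code = if result == PinCheck::Fresh { 0 } else { 1 };
    println!("{result:?} -> hook exit code {exit_code}");
}
```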
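Row R9's discharge criterion is FALSIFY-QW3-MOE-PARITY-001: cosine similarity ≥ 0.99 between apr and the HF FP16 reference, with the argmax match reported alongside. The sketch below assumes the comparison runs on a pair of equal-length output vectors such as final-position logits. The page does not say exactly which tensors were compared. `ParityReport` and `parity_001` are hypothetical names, and the sample values in `main` are made up.

```rust
/// Summary of one parity comparison between apr and the HF FP16 reference.
#[derive(Debug)]
struct ParityReport {
    cos_sim: f64,
    apr_argmax: Option<usize>,
    hf_argmax: Option<usize>,
    passed: bool,
}

/// Cosine similarity, accumulated in f64 to limit rounding error over a large
/// vocabulary. A zero vector yields NaN, which fails the gate below.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Index of the largest value (the greedy next token when applied to logits).
fn argmax(v: &[f32]) -> Option<usize> {
    v.iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Gate modeled on PARITY-001: pass iff cosine >= 0.99; argmax is reported, not gated.
fn parity_001(apr: &[f32], hf: &[f32]) -> ParityReport {
    let cos_sim = cosine(apr, hf);
    ParityReport {
        cos_sim,
        apr_argmax: argmax(apr),
        hf_argmax: argmax(hf),
        passed: cos_sim >= 0.99,
    }
}

fn main() {
    let apr_logits = [0.1_f32, 2.0, 7.5, -1.0];
    let hf_logits = [0.2_f32, 2.1, 7.4, -0.9];
    println!("{:?}", parity_001(&apr_logits, &hf_logits));
}
```

Row R9 treats PARITY-002 (llama.cpp argmax sanity) as no longer load-bearing, and this sketch follows the same logic. Once the reference comparison already shows matching argmax, a second argmax-only check adds little.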
