docs(M288): V1_004 3-knob dispatch recipe — post-M32d toolkit by noahgift · Pull Request #256 · paiml/claude-code-parity-apr

noahgift · 2026-05-20T10:44:55Z

Summary

Recipe doc for the next-step V1_004 bench dispatches. Documents the 3-knob toolkit shipped via paiml/aprender M32d follow-ups:

Knob 1: temperature/top_k/top_p (aprender#1842)
Knob 2: repetition penalty (aprender#1844, supersedes #1843)
Knob 3: streaming SSE (aprender#1835 contract; impl pending)
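
To make Knobs 1 and 2 concrete, here is a minimal pure-Python sketch of how temperature, top_k, top_p and a repetition penalty typically combine when picking one token. It is an illustration under stated assumptions, not aprender's implementation: the function names are hypothetical, the penalty uses the common divide-positive / multiply-negative convention, and the caller is assumed to pass only the most recent window of tokens.

```python
import math
import random


def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Dampen logits of recently emitted tokens (assumed convention:
    divide positive logits by the penalty, multiply negative ones)."""
    out = list(logits)
    for tok in set(recent_tokens):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: prob} after temperature, top-k, then top-p filtering."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:  # smallest prefix covering top_p mass
            break
    norm = sum(prob for prob, _ in kept)
    return {tok: prob / norm for prob, tok in kept}


def sample(logits, recent_tokens=(), temperature=1.0, top_k=0, top_p=1.0,
           repeat_penalty=1.0, rng=random):
    """Pick one token id: penalize repeats first, then filter, then draw."""
    penalized = apply_repeat_penalty(logits, recent_tokens, repeat_penalty)
    dist = filtered_distribution(penalized, temperature, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With `temperature=0` the sketch reduces to greedy decoding, which corresponds to the baseline the sub-benches are compared against. A penalty above 1.0 is what lets a different token win once the top token has already been emitted.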

What this doc captures

3-knob table with tuning intuitions
4 sub-bench scripts (A: sampling alone, B: sampling+penalty, C: penalty alone, D: all 3 + small max_tokens)
Outcome interpretation framework (4 axes: pass rate, turn distribution, error class shift, compliance_cost_ratio)
Companion-side prerequisite: env-var plumbing for 5 new knobs through bench script → ccpa-arena-bench → apr code → `QuantizedGenerateConfig`
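
The four sub-benches can be sketched as an environment-variable matrix. This assumes the variable names given in the commit message expand to `APR_AGENT_TEMPERATURE`, `APR_AGENT_TOP_K`, `APR_AGENT_TOP_P`, `APR_AGENT_REPEAT_PENALTY` and `APR_AGENT_REPEAT_LAST_N`. The values are placeholders rather than recommended settings, `env_for` is a hypothetical helper, and the plumbing itself is not yet implemented. Sub-bench D also calls for streaming and a small max_tokens, whose variable names are not given here, so they are left out.

```python
import os

# Placeholder values only; the recipe doc, not this sketch, defines real settings.
SAMPLING = {
    "APR_AGENT_TEMPERATURE": "0.7",
    "APR_AGENT_TOP_K": "40",
    "APR_AGENT_TOP_P": "0.9",
}
PENALTY = {
    "APR_AGENT_REPEAT_PENALTY": "1.1",
    "APR_AGENT_REPEAT_LAST_N": "64",
}

SUB_BENCHES = {
    "A": {**SAMPLING},             # sampling alone
    "B": {**SAMPLING, **PENALTY},  # sampling + penalty
    "C": {**PENALTY},              # penalty alone
    "D": {**SAMPLING, **PENALTY},  # all knobs; streaming and max_tokens not modeled
}


def env_for(sub_bench, base_env=None):
    """Build the environment for one sub-bench (hypothetical helper).

    Stale knob variables are dropped first so each sub-bench sets only
    the knobs it names and everything else falls back to the default.
    """
    env = dict(os.environ if base_env is None else base_env)
    for key in (*SAMPLING, *PENALTY):
        env.pop(key, None)
    env.update(SUB_BENCHES[sub_bench])
    return env
```

Clearing the knob variables before applying a sub-bench keeps a leftover value from an earlier run from leaking into, say, sub-bench C, which is meant to test the penalty in isolation.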

Why now

M287 evidence locked in the greedy baseline: 6/20 fixtures all `driver_error turns_before_error=4`. M32d works (4× turn throughput vs pre-M32d 0), but the 30B-MoE is stuck in textual loops. Each of the 3 knobs addresses a different theory of WHY it is stuck. This doc gives the operator a structured sequence for testing each theory.
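
Comparing a sub-bench against the greedy baseline could look like the sketch below, which covers three of the interpretation axes (pass rate, turn distribution, error class shift). The per-fixture record shape (`passed`, `turns`, `error_class`) is an assumption made for illustration, not the ccpa-arena-bench output format, and compliance_cost_ratio is left out because its definition is not given here.

```python
from collections import Counter


def summarize(results):
    """Summarize one bench run.

    `results` is a list of dicts with hypothetical keys:
    'passed' (bool), 'turns' (int), 'error_class' (str or None).
    """
    count = len(results)
    return {
        "pass_rate": sum(r["passed"] for r in results) / count if count else 0.0,
        "turn_distribution": Counter(r["turns"] for r in results),
        "error_classes": Counter(
            r["error_class"] for r in results if r["error_class"]
        ),
    }


def error_class_shift(baseline, candidate):
    """Per-class change in error counts, candidate minus baseline."""
    before = summarize(baseline)["error_classes"]
    after = summarize(candidate)["error_classes"]
    return {k: after.get(k, 0) - before.get(k, 0)
            for k in sorted(set(before) | set(after))}
```

A negative entry for `driver_error` together with a wider turn distribution would suggest a knob is breaking the loop. A shift into a different error class with no gain in pass rate would suggest it only moved the failure.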

What this is NOT

NOT a V1_004 discharge — sub-benches not yet dispatched
NOT a binding sequence — operator may pick any subset
NOT companion-side code — env-var plumbing is a separate follow-up

Mechanical recipe doc. M-counter NOT bumped.

Test plan

`bash scripts/test-doc-drift.sh` (if reachable)
CI gate + workspace-test (pre-existing inherited failure; admin bypass)

🤖 Generated with Claude Code

Recipe doc for the next-step V1_004 bench dispatches. Documents the 3-knob toolkit shipped via paiml/aprender M32d follow-ups:

- Knob 1: temperature/top_k/top_p (aprender#1842)
- Knob 2: repetition penalty (aprender#1844, supersedes #1843)
- Knob 3: streaming SSE (aprender#1835 contract; impl pending)

## What this doc captures

- 3-knob table with tuning intuitions
- 4 sub-bench scripts (A: sampling alone, B: sampling+penalty, C: penalty alone, D: all 3 + small max_tokens)
- Outcome interpretation framework (4 axes: pass rate, turn distribution, error class shift, compliance_cost_ratio)
- Companion-side prerequisite: env-var plumbing for the 5 new knobs (APR_AGENT_TEMPERATURE/TOP_K/TOP_P/REPEAT_PENALTY/REPEAT_LAST_N) through bench script → ccpa-arena-bench → apr code → QuantizedGenerateConfig

## Why now

M287 evidence locked in the greedy baseline: 6/20 fixtures all driver_error turns_before_error=4. M32d works (4× turn throughput vs pre-M32d 0) but the model is stuck in textual loops. The 3 knobs each address a different theory of WHY it's stuck. This doc gives the operator a structured way to test each theory.

## What this is NOT

- NOT a V1_004 discharge: none of these sub-benches have been dispatched yet
- NOT a binding sequence: operator may choose any subset
- NOT companion-side code: env-var plumbing is a separate PR (M289 follow-up)

Mechanical recipe doc. M-counter NOT bumped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 2b47a22 into main May 20, 2026
1 check failed

noahgift deleted the m288-v1004-3knob-recipe branch May 20, 2026 10:45

