docs(M288): V1_004 3-knob dispatch recipe (post-M32d toolkit) #256
Merged
Recipe doc for the next-step V1_004 bench dispatches. Documents the 3-knob toolkit shipped via paiml/aprender M32d follow-ups:

- Knob 1: temperature/top_k/top_p (aprender#1842)
- Knob 2: repetition penalty (aprender#1844, supersedes #1843)
- Knob 3: streaming SSE (aprender#1835 contract; implementation pending)

## What this doc captures

- 3-knob table with tuning intuitions
- 4 sub-bench scripts (A: sampling alone, B: sampling + penalty, C: penalty alone, D: all 3 + small `max_tokens`)
- Outcome interpretation framework (4 axes: pass rate, turn distribution, error-class shift, `compliance_cost_ratio`)
- Companion-side prerequisite: env-var plumbing for the 5 new knobs (`APR_AGENT_TEMPERATURE`/`TOP_K`/`TOP_P`/`REPEAT_PENALTY`/`REPEAT_LAST_N`) through bench script → ccpa-arena-bench → `apr code` → `QuantizedGenerateConfig`

## Why now

M287 evidence locked in the greedy baseline: 6/20 fixtures all `driver_error turns_before_error=4`. M32d works (4 turns before error, vs. 0 pre-M32d), but the model is stuck in textual loops. The 3 knobs each address a different theory of WHY it is stuck, and this doc gives the operator a structured way to test each theory.

## What this is NOT

- NOT a V1_004 discharge: none of these sub-benches have been dispatched yet
- NOT a binding sequence: the operator may choose any subset
- NOT companion-side code: the env-var plumbing is a separate PR (M289 follow-up)

Mechanical recipe doc. M-counter NOT bumped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
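The sketch below shows only the last hop of the companion-side plumbing, where environment variables become sampling settings. It assumes each `APR_AGENT_*` variable is optional. An unset or unparsable value stays `None`, so whatever default `QuantizedGenerateConfig` already has would apply. `SamplingKnobs`, its field names, and `from_lookup` are hypothetical and are not the real `QuantizedGenerateConfig` API. The lookup closure exists only so the parsing can be exercised without mutating the process environment.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the five knobs; field names are illustrative,
/// not the actual `QuantizedGenerateConfig` fields.
#[derive(Debug, Default, PartialEq)]
pub struct SamplingKnobs {
    pub temperature: Option<f32>,
    pub top_k: Option<usize>,
    pub top_p: Option<f32>,
    pub repeat_penalty: Option<f32>,
    pub repeat_last_n: Option<usize>,
}

impl SamplingKnobs {
    /// Reads the `APR_AGENT_*` variables through `lookup`. An unset or
    /// unparsable variable stays `None` so the downstream default applies.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        fn parse<T: std::str::FromStr>(v: Option<String>) -> Option<T> {
            v.and_then(|s| s.trim().parse().ok())
        }
        Self {
            temperature: parse(lookup("APR_AGENT_TEMPERATURE")),
            top_k: parse(lookup("APR_AGENT_TOP_K")),
            top_p: parse(lookup("APR_AGENT_TOP_P")),
            repeat_penalty: parse(lookup("APR_AGENT_REPEAT_PENALTY")),
            repeat_last_n: parse(lookup("APR_AGENT_REPEAT_LAST_N")),
        }
    }

    /// Same parsing, but against the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|k| std::env::var(k).ok())
    }
}

fn main() {
    // Illustrative values only; the recipe's actual settings are not reproduced here.
    let mut fake_env = HashMap::new();
    fake_env.insert("APR_AGENT_TEMPERATURE", "0.7");
    fake_env.insert("APR_AGENT_TOP_K", "40");
    let knobs = SamplingKnobs::from_lookup(|k| fake_env.get(k).map(|v| v.to_string()));
    println!("{:?}", knobs);
    println!("{:?}", SamplingKnobs::from_env());
}
```

Because unset knobs stay `None`, a bench run that exports none of the variables would keep the existing greedy behaviour.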
Summary
Recipe doc for the next-step V1_004 bench dispatches. Documents the 3-knob toolkit shipped via paiml/aprender M32d follow-ups: sampling (temperature/top_k/top_p), repetition penalty, and streaming SSE.
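For orientation, here is a sketch of what Knob 1 conventionally does to one decoding step. It applies a temperature-scaled softmax, then top-k truncation, then top-p (nucleus) truncation, and finally an inverse-CDF draw. This is the textbook formulation, assumed rather than taken from aprender#1842, and the function names are hypothetical. The sketch treats `temperature <= 0` as greedy (matching the M287 baseline) and `top_k == 0` as no limit. It also assumes the logits are finite and non-empty. The caller supplies the uniform sample `u`, so no RNG crate is needed.

```rust
/// Temperature + top-k + top-p filtering. Returns a probability vector in
/// which filtered-out tokens have probability 0.
fn sampling_probs(logits: &[f32], temperature: f32, top_k: usize, top_p: f32) -> Vec<f32> {
    let n = logits.len();
    // Greedy special case: all mass on the argmax.
    if temperature <= 0.0 {
        let mut p = vec![0.0; n];
        p[argmax(logits)] = 1.0;
        return p;
    }
    // Softmax of logits / temperature, subtracting the max for stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut probs: Vec<f32> = logits.iter().map(|&l| ((l - max) / temperature).exp()).collect();
    let sum: f32 = probs.iter().sum();
    probs.iter_mut().for_each(|p| *p /= sum);

    // Token indices sorted by descending probability.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // Keep at most top_k tokens, stopping early once cumulative mass reaches top_p.
    let k = if top_k == 0 { n } else { top_k.min(n) };
    let mut keep = vec![false; n];
    let mut cum = 0.0;
    for &i in order.iter().take(k) {
        keep[i] = true;
        cum += probs[i];
        if cum >= top_p {
            break;
        }
    }
    // Zero the dropped tokens and renormalise the survivors.
    for (p, &kept) in probs.iter_mut().zip(&keep) {
        if !kept {
            *p = 0.0;
        }
    }
    let total: f32 = probs.iter().sum();
    probs.iter_mut().for_each(|p| *p /= total);
    probs
}

fn argmax(xs: &[f32]) -> usize {
    xs.iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

/// Inverse-CDF draw; `u` is a uniform sample in [0, 1) supplied by the caller.
fn sample(probs: &[f32], u: f32) -> usize {
    let mut cum = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cum += p;
        if u < cum {
            return i;
        }
    }
    // Rounding fallback: last token with non-zero probability.
    probs.iter().rposition(|&p| p > 0.0).unwrap()
}

fn main() {
    // Illustrative logits and knob values, not taken from the recipe.
    let logits = [2.0f32, 1.0, 0.5, -1.0];
    let probs = sampling_probs(&logits, 0.7, 40, 0.9);
    println!("{:?} -> token {}", probs, sample(&probs, 0.42));
}
```

This sketch uses the order temperature, top-k, top-p, which is one common convention. Other stacks order these steps differently, and this PR does not state which order aprender uses.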
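What this doc captures
A 3-knob table with tuning intuitions, four sub-bench scripts (A: sampling alone, B: sampling + penalty, C: penalty alone, D: all 3 + small `max_tokens`), an outcome interpretation framework (pass rate, turn distribution, error-class shift, `compliance_cost_ratio`), and the companion-side env-var plumbing prerequisite.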
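One way to read the four sub-benches (A: sampling alone, B: sampling + penalty, C: penalty alone, D: all 3 + small `max_tokens`) is as a knob matrix against the M287 greedy baseline. The sketch encodes only what the recipe states: which knobs each script turns on, as booleans, with no tuning values. It then reports which knobs differ between two runs. The type and function names are hypothetical.

```rust
/// The M287 greedy baseline plus the four recipe sub-benches.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SubBench { Baseline, A, B, C, D }

/// Which knobs a sub-bench turns on (booleans only; actual values live in the recipe).
#[derive(Debug, Clone, Copy, PartialEq)]
struct KnobSet { sampling: bool, repeat_penalty: bool, streaming: bool, small_max_tokens: bool }

impl SubBench {
    fn knobs(self) -> KnobSet {
        let (sampling, repeat_penalty, streaming, small_max_tokens) = match self {
            SubBench::Baseline => (false, false, false, false), // greedy
            SubBench::A => (true, false, false, false),         // sampling alone
            SubBench::B => (true, true, false, false),          // sampling + penalty
            SubBench::C => (false, true, false, false),         // penalty alone
            SubBench::D => (true, true, true, true),            // all 3 + small max_tokens
        };
        KnobSet { sampling, repeat_penalty, streaming, small_max_tokens }
    }
}

/// Lists the knobs that differ between two runs; a single entry means the
/// comparison isolates that one knob.
fn differing_knobs(a: SubBench, b: SubBench) -> Vec<&'static str> {
    let (x, y) = (a.knobs(), b.knobs());
    let mut out = Vec::new();
    if x.sampling != y.sampling { out.push("sampling"); }
    if x.repeat_penalty != y.repeat_penalty { out.push("repeat_penalty"); }
    if x.streaming != y.streaming { out.push("streaming"); }
    if x.small_max_tokens != y.small_max_tokens { out.push("small_max_tokens"); }
    out
}

fn main() {
    use SubBench::*;
    for (a, b) in [(Baseline, A), (Baseline, C), (A, B), (C, B), (B, D)] {
        println!("{:?} vs {:?}: {:?}", a, b, differing_knobs(a, b));
    }
}
```

Under this encoding, A vs. B isolates the penalty and C vs. B isolates sampling. D differs from B in both streaming and `max_tokens`, so a D result alone cannot be attributed to one knob. D also depends on Knob 3, whose implementation is still pending.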
Why now
M287 evidence locked in the greedy baseline: 6/20 fixtures all `driver_error turns_before_error=4`. M32d works (4 turns before error, vs. 0 pre-M32d), but the 30B-MoE is stuck in textual loops. The 3 knobs each address a different theory of WHY, and this doc gives the operator a structured sequence to test each one.
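The repetition-penalty theory targets textual loops directly. The sketch below assumes aprender#1844 follows the common CTRL-style rule (Keskar et al., 2019), which is also the rule behind llama.cpp's `repeat_penalty`/`repeat_last_n` options. Under that rule, each distinct token in the last `repeat_last_n` context positions has its logit divided by the penalty if it is non-negative, or multiplied by the penalty if it is negative. This formulation is an assumption, not something the PR states, and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// CTRL-style repetition penalty over the last `last_n` context tokens.
/// Each distinct recent token is penalised once; with `penalty > 1.0` this
/// always lowers that token's logit.
fn apply_repeat_penalty(logits: &mut [f32], context: &[u32], penalty: f32, last_n: usize) {
    if penalty == 1.0 || last_n == 0 {
        return; // no-op settings
    }
    let start = context.len().saturating_sub(last_n);
    let recent: HashSet<u32> = context[start..].iter().copied().collect();
    for tok in recent {
        if let Some(l) = logits.get_mut(tok as usize) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    // Illustrative values: token 3 falls outside the 3-token window.
    let mut logits = vec![2.0f32, -1.0, 0.5, 3.0];
    apply_repeat_penalty(&mut logits, &[3, 0, 1, 1], 2.0, 3);
    println!("{:?}", logits);
}
```

The window matters for loop-breaking. A short `repeat_last_n` only discourages immediate repeats. A longer one also penalises tokens reused across a longer window, including legitimately repeated identifiers.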
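What this is NOT
Not a V1_004 discharge (none of the sub-benches have been dispatched yet), not a binding sequence (the operator may choose any subset), and not companion-side code (the env-var plumbing is a separate PR, M289 follow-up).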
Mechanical recipe doc. M-counter NOT bumped.
Test plan
🤖 Generated with Claude Code