
docs(M288): V1_004 3-knob dispatch recipe — post-M32d toolkit#256

Merged: noahgift merged 1 commit into `main` from `m288-v1004-3knob-recipe` on May 20, 2026.


@noahgift (Contributor)

Summary

Recipe doc for the next-step V1_004 bench dispatches. Documents the 3-knob toolkit shipped via paiml/aprender M32d follow-ups:

  • Knob 1: temperature/top_k/top_p (aprender#1842)
  • Knob 2: repetition penalty (aprender#1844, supersedes #1843)
  • Knob 3: streaming SSE (aprender#1835 contract; impl pending)
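Knob 1 can be pictured with a minimal Rust sketch. It assumes the common ordering of temperature scaling, then top-k, then top-p (nucleus) filtering. `sample_token` and its signature are hypothetical, not aprender's API. The caller passes a uniform draw `u` in [0, 1), so the sketch needs no RNG crate, and a non-positive temperature falls back to greedy argmax, the regime of the M287 baseline.

```rust
/// Hypothetical sketch of knob 1 (not aprender's API): temperature scaling,
/// then top-k, then top-p (nucleus) filtering over raw logits.
/// `u` is a caller-supplied uniform draw in [0, 1); `logits` must be non-empty.
pub fn sample_token(logits: &[f32], temperature: f32, top_k: usize, top_p: f32, u: f32) -> usize {
    assert!(!logits.is_empty(), "logits must be non-empty");
    // Non-positive temperature: plain greedy argmax (the M287 baseline regime).
    if temperature <= 0.0 {
        return argmax(logits);
    }
    // Candidate token ids sorted by descending logit (stable for ties).
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    // Top-k: keep at most k candidates; k = 0 disables the filter.
    if top_k > 0 && top_k < idx.len() {
        idx.truncate(top_k);
    }
    // Softmax of logits / temperature over the survivors (max-subtracted for stability).
    let max = logits[idx[0]] / temperature;
    let mut probs: Vec<f32> = idx.iter().map(|&i| (logits[i] / temperature - max).exp()).collect();
    let total: f32 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= total;
    }
    // Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    let mut cum = 0.0_f32;
    let mut keep = probs.len();
    for (n, &p) in probs.iter().enumerate() {
        cum += p;
        if cum >= top_p {
            keep = n + 1;
            break;
        }
    }
    idx.truncate(keep);
    probs.truncate(keep);
    // Inverse-CDF draw over the surviving mass (scaling u renormalizes implicitly).
    let target = u * probs.iter().sum::<f32>();
    let mut acc = 0.0_f32;
    for (n, &p) in probs.iter().enumerate() {
        acc += p;
        if target < acc {
            return idx[n];
        }
    }
    idx[keep - 1] // guard against float rounding at the top of the CDF
}

/// Index of the largest logit.
fn argmax(xs: &[f32]) -> usize {
    xs.iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```

Note that `top_k = 1`, or a `top_p` below the top token's probability, collapses the sketch back to greedy decoding.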

What this doc captures

  • 3-knob table with tuning intuitions
  • 4 sub-bench scripts (A: sampling alone, B: sampling+penalty, C: penalty alone, D: all 3 + small max_tokens)
  • Outcome interpretation framework (4 axes)
  • Companion-side prerequisite: env-var plumbing for the 5 new knob parameters through bench script → ccpa-arena-bench → apr code → `QuantizedGenerateConfig`
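The four sub-benches form a small ablation over the knobs. The Rust sketch below is purely illustrative: `SubBench`, `KnobSet`, `GREEDY`, and `differs_from` are hypothetical names. The concrete `max_tokens` cap for D is not specified here, so it is modeled only as a flag. `differs_from` lists which knobs change between two runs, which is the comparison each sub-bench is meant to isolate.

```rust
/// Hypothetical encoding of the four sub-benches (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SubBench {
    A, // sampling alone
    B, // sampling + repetition penalty
    C, // repetition penalty alone
    D, // all 3 knobs + small max_tokens
}

/// Which knobs a run turns on. The concrete max_tokens cap for D is not
/// specified in the recipe, so it is modeled only as a flag.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KnobSet {
    pub sampling: bool,           // knob 1: temperature/top_k/top_p
    pub repetition_penalty: bool, // knob 2
    pub streaming: bool,          // knob 3: SSE (impl pending)
    pub small_max_tokens: bool,
}

/// Hypothetical greedy baseline (M287): no knobs on.
pub const GREEDY: KnobSet = KnobSet {
    sampling: false,
    repetition_penalty: false,
    streaming: false,
    small_max_tokens: false,
};

impl SubBench {
    pub fn knobs(self) -> KnobSet {
        let (sampling, repetition_penalty, streaming, small_max_tokens) = match self {
            SubBench::A => (true, false, false, false),
            SubBench::B => (true, true, false, false),
            SubBench::C => (false, true, false, false),
            SubBench::D => (true, true, true, true),
        };
        KnobSet { sampling, repetition_penalty, streaming, small_max_tokens }
    }
}

impl KnobSet {
    /// Names of the knobs that differ between two runs.
    pub fn differs_from(self, other: KnobSet) -> Vec<&'static str> {
        let pairs = [
            ("sampling", self.sampling, other.sampling),
            ("repetition_penalty", self.repetition_penalty, other.repetition_penalty),
            ("streaming", self.streaming, other.streaming),
            ("small_max_tokens", self.small_max_tokens, other.small_max_tokens),
        ];
        pairs.iter().filter(|(_, a, b)| a != b).map(|(name, _, _)| *name).collect()
    }
}
```

Under this encoding, A vs the greedy baseline isolates sampling, C vs the baseline isolates the penalty, and B vs A isolates the penalty on top of sampling.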

Why now

M287 evidence locked in the greedy baseline: 6/20 fixtures all `driver_error turns_before_error=4`. M32d works (turns before the error rise from 0 pre-M32d to 4), but the 30B-MoE is stuck in textual loops. The 3 knobs each address a different theory of WHY it is stuck. This doc gives the operator a structured sequence to test each theory.

What this is NOT

  • NOT a V1_004 discharge — sub-benches not yet dispatched
  • NOT a binding sequence — operator may pick any subset
  • NOT companion-side code — env-var plumbing is a separate follow-up

Mechanical recipe doc. M-counter NOT bumped.

Test plan

  • `bash scripts/test-doc-drift.sh` (if reachable)
  • CI gate + workspace-test (pre-existing inherited failure; admin bypass)

🤖 Generated with Claude Code

Recipe doc for the next-step V1_004 bench dispatches. Documents the
3-knob toolkit shipped via paiml/aprender M32d follow-ups:

- Knob 1: temperature/top_k/top_p (aprender#1842)
- Knob 2: repetition penalty (aprender#1844, supersedes #1843)
- Knob 3: streaming SSE (aprender#1835 contract; impl pending)
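Knob 2 can be sketched as below, assuming the CTRL-style rule popularized by llama.cpp's `repeat_penalty`/`repeat_last_n`: positive logits of recently seen tokens are divided by the penalty and negative ones are multiplied by it, so both move toward "less likely". Whether aprender#1844 uses exactly this rule is an assumption, and `apply_repeat_penalty` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of knob 2: CTRL-style repetition penalty applied in
/// place to `logits`, over the last `repeat_last_n` tokens of `history`.
pub fn apply_repeat_penalty(logits: &mut [f32], history: &[usize], penalty: f32, repeat_last_n: usize) {
    // penalty == 1.0 is a no-op; in this sketch repeat_last_n == 0 disables the window.
    if penalty == 1.0 || repeat_last_n == 0 {
        return;
    }
    let start = history.len().saturating_sub(repeat_last_n);
    // Penalize each distinct recent token once, even if it repeated several times.
    let recent: HashSet<usize> = history[start..].iter().copied().collect();
    for tok in recent {
        if let Some(l) = logits.get_mut(tok) {
            // Divide positive logits, multiply negative ones: both lower the token's odds.
            *l = if *l > 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}
```

With `penalty = 1.0` the logits are untouched, so that value reproduces the unpenalized run.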

## What this doc captures

- 3-knob table with tuning intuitions
- 4 sub-bench scripts (A: sampling alone, B: sampling+penalty,
  C: penalty alone, D: all 3 + small max_tokens)
- Outcome interpretation framework (4 axes: pass rate, turn
  distribution, error class shift, compliance_cost_ratio)
- Companion-side prerequisite: env-var plumbing for the 5 new
  knob parameters (APR_AGENT_TEMPERATURE/TOP_K/TOP_P/REPEAT_PENALTY/REPEAT_LAST_N)
  through bench script → ccpa-arena-bench → apr code →
  QuantizedGenerateConfig
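The companion-side env-var plumbing could start with a parser like the Rust sketch below. It assumes the slash list expands to `APR_AGENT_TEMPERATURE`, `APR_AGENT_TOP_K`, `APR_AGENT_TOP_P`, `APR_AGENT_REPEAT_PENALTY`, and `APR_AGENT_REPEAT_LAST_N`. `AgentKnobs` and `parse_var` are hypothetical; the real `QuantizedGenerateConfig` fields are not shown in this PR. Unset variables stay `None` so the downstream default is kept, and a malformed value is an error rather than a silent fallback.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the five knob fields; the real
/// `QuantizedGenerateConfig` field names are not given in the recipe.
#[derive(Debug, Default, PartialEq)]
pub struct AgentKnobs {
    pub temperature: Option<f32>,
    pub top_k: Option<usize>,
    pub top_p: Option<f32>,
    pub repeat_penalty: Option<f32>,
    pub repeat_last_n: Option<usize>,
}

/// Parse one variable: unset -> Ok(None), malformed -> Err.
fn parse_var<T: FromStr, F: Fn(&str) -> Option<String>>(get: &F, key: &str) -> Result<Option<T>, String> {
    match get(key) {
        None => Ok(None),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map(Some)
            .map_err(|_| format!("{key}: cannot parse {raw:?}")),
    }
}

impl AgentKnobs {
    /// Read the knobs from the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k: &str| std::env::var(k).ok())
    }

    /// Read the knobs through any lookup (testable without mutating the env).
    /// Variable names assume the slash list expands with the APR_AGENT_ prefix.
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(get: F) -> Result<Self, String> {
        Ok(AgentKnobs {
            temperature: parse_var(&get, "APR_AGENT_TEMPERATURE")?,
            top_k: parse_var(&get, "APR_AGENT_TOP_K")?,
            top_p: parse_var(&get, "APR_AGENT_TOP_P")?,
            repeat_penalty: parse_var(&get, "APR_AGENT_REPEAT_PENALTY")?,
            repeat_last_n: parse_var(&get, "APR_AGENT_REPEAT_LAST_N")?,
        })
    }
}
```

Taking a lookup closure instead of calling `std::env::var` directly keeps the parser testable without touching process-wide state.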

## Why now

M287 evidence locked in the greedy baseline: 6/20 fixtures all
driver_error turns_before_error=4. M32d works (turns before the
error rise from 0 pre-M32d to 4), but the model is stuck in textual
loops. The 3 knobs each address a different theory of WHY it's
stuck. This doc gives the operator a structured way to test each
theory.
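Three of the four outcome axes can be tallied mechanically from per-fixture results, as in the Rust sketch below. `FixtureResult`, `Summary`, `summarize`, and `error_class_shift` are hypothetical. `compliance_cost_ratio` is left out because its definition is not given here, and computing the error-class shift as a per-class count delta (candidate minus baseline) is one possible reading of that axis, not the recipe's definition.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-fixture record; field names are illustrative.
#[derive(Debug, Clone)]
pub struct FixtureResult {
    pub passed: bool,
    pub turns: u32,
    pub error_class: Option<String>, // e.g. "driver_error"
}

/// Hypothetical per-run summary over three of the four axes.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub pass_rate: f64,
    pub turn_histogram: BTreeMap<u32, usize>, // turns -> fixture count
    pub error_classes: BTreeMap<String, usize>, // class -> fixture count
}

pub fn summarize(results: &[FixtureResult]) -> Summary {
    let passed = results.iter().filter(|r| r.passed).count();
    let mut turn_histogram = BTreeMap::new();
    let mut error_classes = BTreeMap::new();
    for r in results {
        *turn_histogram.entry(r.turns).or_insert(0) += 1;
        if let Some(e) = &r.error_class {
            *error_classes.entry(e.clone()).or_insert(0) += 1;
        }
    }
    Summary {
        pass_rate: if results.is_empty() { 0.0 } else { passed as f64 / results.len() as f64 },
        turn_histogram,
        error_classes,
    }
}

/// Error-class shift read as a per-class count delta: candidate minus baseline.
pub fn error_class_shift(baseline: &Summary, candidate: &Summary) -> BTreeMap<String, i64> {
    let mut shift = BTreeMap::new();
    for (k, v) in &baseline.error_classes {
        *shift.entry(k.clone()).or_insert(0) -= *v as i64;
    }
    for (k, v) in &candidate.error_classes {
        *shift.entry(k.clone()).or_insert(0) += *v as i64;
    }
    shift
}
```

A negative `driver_error` delta paired with a rising pass rate would point toward a knob that helps; a shift into a different error class with no pass-rate change would not.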

## What this is NOT

- NOT a V1_004 discharge — none of these sub-benches have been
  dispatched yet
- NOT a binding sequence — operator may choose any subset
- NOT companion-side code — env-var plumbing is a separate PR
  (M289 follow-up)

Mechanical recipe doc. M-counter NOT bumped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 2b47a22 into main on May 20, 2026 (1 check failed).
noahgift deleted the m288-v1004-3knob-recipe branch on May 20, 2026 at 10:45.