Add Levanter TPU RL rollout handoff by dlwh · Pull Request #6214 · marin-community/marin

dlwh · 2026-06-06T06:45:34Z

Refresh the durable Levanter TPU RL rollout handoff so it records the current evidence and next steps from the v6e/v5p follow-ups. The handoff links the rollout epic and child issues, records the original v6e prefill-heavy target failure, captures the corrected diagnostic attribution, tracks #6185 head d63d1edf, and clarifies the v5p state: the first mixed run failed before measurement, the TP=8 startup-diagnostics retry failed because only 4 devices were visible, and the bounded TP=4 retry succeeded but is a throughput target failure.

The handoff also records the latest post-split v6e prefill-heavy backend=both row /dlwh/qwen3-v6e8-prefillpostsplit-20260607-0419, launched from #6185 d63d1edf: vLLM measured 1264.03 decode tok/s and 21488.50 total tok/s, while Levanter measured 957.72 decode tok/s and 16281.24 total tok/s, ratio 0.758, target fail. The new split fields show hot prefill chunks 4096,4096,4096,4096, prefill drain 0.171s for 6 tokens, generation 0.629s for 1016 tokens, generation host 0.006s, derived generation throughput 1614.816 tok/s, decode-iteration throughput 1277.749 tok/s, and decode-device throughput 1639.896 tok/s. #6229 now needs an optimize-vs-deprioritize decision rather than another evidence row.
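The split fields can be cross-checked with a little arithmetic. This is a back-of-envelope sketch, not the benchmark's actual accounting. It assumes that decode-iteration throughput divides the prefill-drain tokens plus the generation tokens (6 + 1016) by drain plus generation time. It also uses the rounded seconds shown above, so small discrepancies are expected.

```python
# Back-of-envelope consistency check for the v6e prefill-heavy split fields.
# Assumption (not confirmed by the run): decode-iteration throughput counts the
# prefill-drain tokens plus the generation tokens over drain + generation time.

def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput for a token count over a wall-clock interval."""
    return tokens / seconds

# Values copied from the reported row; seconds are rounded to 3 decimals.
drain_tokens, drain_s = 6, 0.171
gen_tokens, gen_s = 1016, 0.629

generation_tps = tokens_per_second(gen_tokens, gen_s)
iteration_tps = tokens_per_second(drain_tokens + gen_tokens, drain_s + gen_s)

print(f"generation ~{generation_tps:.1f} tok/s (reported 1614.816)")
print(f"drain+generation ~{iteration_tps:.1f} tok/s (reported decode-iteration 1277.749)")
```

Under that assumption, both derived rates land within about 0.5 tok/s of the reported values. This suggests the split fields are internally consistent.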

The v5p TP=4 retry /dlwh/qwen3-mixed-v5p-startupdiag-tp4-i512-o512-20260607-0724, launched from #6240 32505d1ca stacked on #6185 d63d1edf, succeeded and produced benchmark rows. vLLM measured 5215.67 decode tok/s and 10431.34 total tok/s; Levanter measured 3131.59 decode tok/s and 6263.19 total tok/s, ratio 0.600, target fail. Levanter produced all 16384 completion tokens with four prefill admissions/chunks (4096,4096,4096,4096), first prefill drain 0.257s, and 3553.034 generation tok/s. #6230 now tracks the v5p throughput gap rather than startup evidence.
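The v5p row's token counts can be reconciled under two assumptions. First, every request uses exactly 512 prompt and 512 completion tokens, taken from the i512-o512 shape. Second, each prefill chunk is capped at 4096 tokens, which is inferred from the reported chunk list rather than from the Levanter code. `PREFILL_CHUNK` and the chunking loop are hypothetical illustrations.

```python
# Token accounting for the v5p TP=4 i512/o512 row.
# Assumptions: every request uses exactly 512 prompt and 512 completion tokens,
# and the 4096-token chunk size is read off the reported chunk list.
INPUT_LEN, OUTPUT_LEN = 512, 512
completion_tokens = 16384
num_requests = completion_tokens // OUTPUT_LEN   # 32 requests
prompt_tokens = num_requests * INPUT_LEN         # 16384 prompt tokens

PREFILL_CHUNK = 4096  # hypothetical cap inferred from "4096,4096,4096,4096"
chunks = [min(PREFILL_CHUNK, prompt_tokens - start)
          for start in range(0, prompt_tokens, PREFILL_CHUNK)]

decode_ratio = 3131.59 / 5215.67  # Levanter / vLLM decode tok/s
print(num_requests, chunks, round(decode_ratio, 3))
```

With 32 requests of 512 prompt tokens each, the 16384 prompt tokens split into exactly four 4096-token chunks, which matches the reported admissions. Because prompt and completion lengths are equal, total tok/s is also about twice decode tok/s for both backends.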

The surrounding rollout stack remains ready for review:

- #6176 (104bf901) carries the dense Qwen3-8B matrix harness/config and is green/skipped.
- #6185 (d63d1edf) carries multi-prefill admission, prefill-drain scheduling, corrected timing attribution, vLLM startup-log improvements, bounded request logging, the prefill-drain/generation timing split, and the docs-hygiene move into .agents/ops/. All visible checks are green/skipped after one external HF/cache rerun.
- #6186 (c4ba37b) carries the token-native rollout API/data-plane slice and is green/skipped.
- #6240 (32505d1c) stacks on #6185 and adds bounded vLLM startup runtime/package/env snapshots for future v5p failures. It is ready for review, and its post-rebase checks, including levanter-tpu-tests, are terminal green/skipped.

Tracker decisions captured in the epic:

- #6231 documents that #6186 should not be blocked on converting PrimeIntellectEnv to token-native. For this stack, PrimeIntellectEnv stays OpenAI/verifiers-compatible while preserving attached response token IDs and tokenizer/policy replay identity.
- #6228 gates dense-matrix reruns on #6185 landing or on explicit maintainer authorization, and it prioritizes canonical v6e decode/mixed rows. It now records the v5p mixed TP=4 row as a measured target failure rather than a startup-evidence gap.

This handoff's head is ebbe9a69d, following a docs-only correction that clarifies the review/landing order and the checked-in current-state head. It is terminal green/skipped, including marin-integration, marin-lint, CodeQL, ReadTheDocs, and the relevant unit/change lanes. #6214 is non-draft and requires review.

Part of #6227

dlwh · 2026-06-07T00:38:43Z

🤖 Handoff update: /dlwh/qwen3-v6e8-prefillcorr-20260607-0023 reached JOB_STATE_SUCCEEDED with terminal benchmark results. No resubmit was needed and no Iris cluster mutation was performed.

Config: lib/iris/config/marin.yaml
PR/head under test: #6185 commit 91f6ec06abb46a0d87e94da8fee4a8b4f9802bc4
Shape: Qwen/Qwen3-8B, dense_qwen3_8b_v2, prefill_b8_i2048_o128_n1, backend both, v6e-8, TP=8, --max-pages 512, 2 warmups, 1 measured round.

| backend | decode tok/s | total tok/s | decode/vLLM | total/vLLM | target |
|---|---|---|---|---|---|
| vllm-tpu | 1212.15 | 20606.54 | 1.000 | 1.000 | n/a |
| levanter:auto | 898.94 | 15282.00 | 0.742 | 0.742 | fail |
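The total tok/s column appears to count prompt plus completion tokens over the same wall time as decode tok/s. That reading is an assumption, but it fits the numbers. It implies total = decode × (2048 + 128) / 128 = decode × 17 for this shape. It would also explain why the decode/vLLM and total/vLLM columns are identical, since the factor of 17 cancels in the ratio. A minimal check:

```python
# Assumption: "total tok/s" counts prompt + completion tokens over the same
# wall time as "decode tok/s", so total = decode * (input_len + output_len) / output_len.
INPUT_LEN, OUTPUT_LEN = 2048, 128  # from the prefill_b8_i2048_o128_n1 shape

def implied_total(decode_tps: float) -> float:
    """Total tok/s implied by a decode tok/s value under the assumption above."""
    return decode_tps * (INPUT_LEN + OUTPUT_LEN) / OUTPUT_LEN

rows = {"vllm-tpu": (1212.15, 20606.54), "levanter:auto": (898.94, 15282.00)}
for name, (decode, total) in rows.items():
    print(name, round(implied_total(decode), 2), "reported", total)
```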

Corrected Levanter timing fields: decode_iteration_tokens_per_second=1200.223, decode_device_tokens_per_second=1509.350, prefill chunks 4096,4096,4096,4096, decode iteration/device/host/submit seconds 0.852 / 0.677 / 0.174 / 0.002. vLLM detailed decode timing fields were not reported by this run.
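The corrected timing fields can be read more closely under two assumptions, neither of which is stated by the run. The first is that iteration time is approximately device + host + submit time. The second is that both throughput fields divide the same token count by their matching seconds field. On that reading, the implied token count is about 1022 either way. Host time (0.174 s) is then roughly 20% of the 0.852 s iteration, which accounts for nearly all of the gap between device and iteration throughput.

```python
# Hypothetical decomposition: assumes decode iteration time is roughly
# device + host + submit time, and that the *_tokens_per_second fields divide
# one shared token count by the matching seconds field.
iteration_s, device_s, host_s, submit_s = 0.852, 0.677, 0.174, 0.002
iteration_tps, device_tps = 1200.223, 1509.350

parts_s = device_s + host_s + submit_s
tokens_from_iteration = iteration_tps * iteration_s
tokens_from_device = device_tps * device_s
host_share = host_s / iteration_s

print(f"device+host+submit = {parts_s:.3f}s vs iteration {iteration_s:.3f}s")
print(f"implied tokens: {tokens_from_iteration:.1f} (iteration), {tokens_from_device:.1f} (device)")
print(f"host share of iteration time: {host_share:.1%}")
```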

dlwh · 2026-06-07T02:20:36Z

🤖 PR description refreshed with the latest tracker decisions from #6227: #6231 keeps PrimeIntellectEnv OpenAI/verifiers-compatible for the current #6186 stack while preserving attached token IDs and replay identity, and #6228 gates dense-matrix reruns on #6185 f55913d1 landing or explicit maintainer authorization. No branch changes, code changes, TPU runs, or Iris mutations were made.

dlwh · 2026-06-07T03:29:16Z

🤖 #6240 post-rebase CI is now terminal green/skipped at head 32505d1ca. The final pending lane, levanter-tpu-tests, passed at 2026-06-07T03:28:03Z; levanter-unit, levanter-torch, marin-integration, marin-lint, CodeQL, ReadTheDocs, and the relevant change/unit lanes are also green or skipped. This confirms the diagnostics-hardening PR is ready for review; it does not prove vLLM starts on v5p. No TPU runs or Iris mutations were made.

dlwh · 2026-06-07T03:34:20Z

🤖 Handoff PR #6214 advanced to docs-only head 2c43f0d91 to update the checked-in handoff artifact with the live stack state: #6185 d63d1edf and #6240 32505d1ca terminal green/skipped. GitHub checks restarted after the push; previous #6214 head dbc3bf595 was terminal green/skipped. No TPU runs or Iris mutations were made.

dlwh · 2026-06-07T03:40:27Z

🤖 CI/status update: #6214 head 2c43f0d91 is now terminal green/skipped after the docs-only checked-in handoff correction. marin-integration passed in 5m31s; marin-lint, CodeQL, ReadTheDocs, and the relevant change/unit lanes are also green or skipped. I updated the #6214 PR body and #6227 epic body from 'checks restarted' to this verified terminal state. No TPU runs or Iris mutations were made.

claude · 2026-06-07T03:43:55Z

Claude finished @dlwh's task in 5m 9s.

Code review

No issues found. Checked for bugs and CLAUDE.md/AGENTS.md compliance.

Notes from the multi-agent pass:

- The file is a .agents/projects/ tracking artifact, which matches the root AGENTS.md guidance ("capture a plan in .agents/projects/ before pausing"). No documentation rules in AGENTS.md apply unambiguously to this kind of internal handoff.
- All cited tok/s ratios are arithmetically consistent with the underlying numbers (1006.83 / 1262.90 = 0.797, 898.94 / 1212.15 = 0.742, 15282.00 / 20606.54 = 0.742, 3197.58 / 3408.09 = 0.938, 4084.24 / 4985.21 = 0.819).
- Short-form SHAs in the PR description match the long-form heads recorded in the doc for #6176 104bf901, #6185 d63d1edf, #6186 c4ba37b, and #6240 32505d1ca. The prior handoff head dbc3bf595 is also recorded internally.
- Dates (2026-06-06 / 2026-06-07) and the prior vs. corrected prefill-heavy rows (0.797 vs. 0.742) are kept as separate, clearly labeled jobs; no internal contradictions surfaced.
- No unclosed code fences or malformed tables.

Branch: codex/levanter-rl-rollout-handoff
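The reviewer's arithmetic check can be reproduced with a short script. This is an illustrative sketch, and the `mismatches` helper is hypothetical. It recomputes each cited ratio, rounds it to three places, and reports any that disagree with the cited value.

```python
# Recomputes the ratios cited in the review notes as (numerator, denominator, cited).
cited = [
    (1006.83, 1262.90, 0.797),
    (898.94, 1212.15, 0.742),
    (15282.00, 20606.54, 0.742),
    (3197.58, 3408.09, 0.938),
    (4084.24, 4985.21, 0.819),
]

def mismatches(rows, places=3):
    """Return the rows whose recomputed, rounded ratio differs from the cited value."""
    return [(n, d, c) for n, d, c in rows if round(n / d, places) != c]

print(mismatches(cited))
```

An empty list means every cited ratio matches its inputs at three decimal places.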

dlwh · 2026-06-07T03:50:50Z

🤖 Review-status update: #6214 is now non-draft and ready for maintainer review at head 67e9500fc. The head is terminal green/skipped: marin-integration passed in 5m34s, and marin-lint, CodeQL, ReadTheDocs, and the relevant unit/change lanes are green or skipped. This was a docs-only status correction; no TPU runs or Iris mutations were made.

dlwh · 2026-06-07T03:58:44Z

🤖 Status update: #6214 head ebbe9a69d is terminal green/skipped after the docs-only review-order correction. marin-integration passed in 5m4s; marin-lint, CodeQL, ReadTheDocs, and relevant unit/change lanes are green or skipped. No TPU runs or Iris mutations were made.

dlwh added 2 commits June 5, 2026 23:42

Add Levanter TPU RL rollout handoff

1320cdf

Record v5p follow-up heartbeat

558d5a4

dlwh added the agent-generated label (Created by automation/agent) Jun 6, 2026

dlwh added 5 commits June 5, 2026 23:46

Record handoff PR

24ca815

Record v5p capacity status

6f60f1f

Clarify mixed prefill-drain status

6da92d1

Record TPU rollout follow-up results

824b131

Record v6e prefill diagnostic artifact

4063405

dlwh mentioned this pull request Jun 6, 2026

[Epic] Levanter TPU RL rollout backend parity #6227 (Open, 5 tasks)

claude Bot mentioned this pull request Jun 6, 2026

[levanter] Complete dense Qwen3-8B TPU parity matrix #6228 (Open, 5 tasks)

dlwh added 2 commits June 6, 2026 15:56

Record corrected v6e prefill diagnostic

b667d49

Correct prefill diagnostic interpretation

b865dfd

dlwh mentioned this pull request Jun 6, 2026

[levanter] Optimize v6e prefill-heavy decode gap #6229 (Open, 5 tasks)

dlwh added 5 commits June 6, 2026 16:09

Refresh RL rollout handoff facts

edd869f

Record v5p startup failure reporting

b70b7da

Refresh RL rollout handoff status

16338c3

Update RL rollout handoff green state

0364f79

Refresh RL rollout handoff with v5p diagnostics

8834255

dlwh mentioned this pull request Jun 7, 2026

[levanter] Stabilize v5p Qwen3-8B TPU benchmark startup #6230 (Open)

Record corrected v6e prefill-heavy rerun

1fe859e

dlwh added 5 commits June 6, 2026 17:46

Record bounded inference request logging

e0dfd11

Record prefill drain timing split

cac28f4

Refresh rollout handoff PR state

f2e66ff

Clarify prefill-heavy follow-up state

0182e93

Record terminal handoff CI state

b44cdc5

Update rollout handoff PR state

dbc3bf5

dlwh mentioned this pull request Jun 7, 2026

[levanter] Add multi-prefill admission for serving #6185 (Open)

Update rollout handoff stack state

2c43f0d

dlwh marked this pull request as ready for review June 7, 2026 03:43

Update rollout handoff review status

67e9500

Clarify rollout stack review order

ebbe9a6

dlwh wants to merge 24 commits into main from codex/levanter-rl-rollout-handoff

dlwh commented Jun 6, 2026 •

edited

Loading

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

claude Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dlwh commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

claude Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code review

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

dlwh commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dlwh commented Jun 6, 2026 •

edited

Loading

claude Bot commented Jun 7, 2026 •

edited

Loading