
Add Levanter TPU RL rollout handoff #6214

Open
dlwh wants to merge 24 commits into main from codex/levanter-rl-rollout-handoff

dlwh (Member) commented Jun 6, 2026

Refresh the durable Levanter TPU RL rollout handoff so it records the current evidence and next steps from the v6e/v5p follow-ups. The handoff links the rollout epic and its child issues, records the original v6e prefill-heavy target failure, captures the corrected diagnostic attribution, and tracks #6185 at head d63d1edf. It also clarifies the v5p state: the first mixed run failed before measurement; the TP=8 startup-diagnostics retry failed because only 4 devices were visible; and the bounded TP=4 retry succeeded but is a throughput target failure.

The handoff also records the latest post-split v6e prefill-heavy backend=both row, /dlwh/qwen3-v6e8-prefillpostsplit-20260607-0419, launched from #6185 d63d1edf. vLLM measured 1264.03 decode tok/s and 21488.50 total tok/s; Levanter measured 957.72 decode tok/s and 16281.24 total tok/s, a ratio of 0.758 and a target fail. The new split fields show hot prefill chunks 4096,4096,4096,4096; prefill drain 0.171s for 6 tokens; generation 0.629s for 1016 tokens; generation host time 0.006s; derived generation throughput 1614.816 tok/s; decode-iteration throughput 1277.749 tok/s; and decode-device throughput 1639.896 tok/s. #6229 now needs an optimize-vs-deprioritize decision rather than another evidence row.
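The derived throughput fields are token counts divided by elapsed seconds. Because the seconds fields are printed to millisecond precision, recomputing a rate from the printed values only matches to within a rounding window. A minimal sketch, assuming the derived generation throughput is simply generation tokens over generation seconds; `throughput` and `throughput_bounds` are hypothetical helpers, not part of the harness:

```python
def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second for one timed span."""
    return tokens / seconds


def throughput_bounds(tokens: int, seconds_rounded: float, decimals: int = 3) -> tuple[float, float]:
    """Range of rates consistent with a duration rounded to `decimals` places.

    Hypothetical helper: the true duration lies within half a unit of the last
    printed digit, so the true rate lies between tokens/(s+h) and tokens/(s-h).
    """
    half = 0.5 * 10 ** (-decimals)
    return tokens / (seconds_rounded + half), tokens / (seconds_rounded - half)


# Post-split v6e row: generation 0.629s for 1016 tokens, reported 1614.816 tok/s.
low, high = throughput_bounds(1016, 0.629)
print(f"naive: {throughput(1016, 0.629):.3f} tok/s")
print(f"consistent range: [{low:.3f}, {high:.3f}] tok/s")
print("reported value within range:", low <= 1614.816 <= high)
```

The naive recomputation lands slightly above the reported value, which is what you would expect if the harness divides by the unrounded duration.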

The v5p TP=4 retry /dlwh/qwen3-mixed-v5p-startupdiag-tp4-i512-o512-20260607-0724, launched from #6240 32505d1ca (stacked on #6185 d63d1edf), succeeded and produced benchmark rows. vLLM measured 5215.67 decode tok/s and 10431.34 total tok/s; Levanter measured 3131.59 decode tok/s and 6263.19 total tok/s, a ratio of 0.600 and a target fail. Levanter produced all 16384 completion tokens with four prefill admissions/chunks (4096,4096,4096,4096), a first prefill drain of 0.257s, and 3553.034 generation tok/s. #6230 now tracks the v5p throughput gap rather than startup evidence.
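For fixed-length shapes, the reported total and decode rates differ by a constant factor set by the shape. A minimal sketch, assuming (inferred from the reported numbers, not from a documented harness definition) that both rates share one timing window, that total tok/s counts prompt plus completion tokens, and that decode tok/s counts completion tokens only. Per-request lengths are read from the shape and job names (prefill_b8_i2048_o128 as 8 requests of 2048 in / 128 out; i512-o512 as 512 / 512, with 32 requests inferred from 16384 completion tokens / 512). The 4096-token chunk budget is likewise inferred from the reported chunk sizes and is hypothetical:

```python
def total_from_decode(decode_tps: float, prompt_len: int, completion_len: int) -> float:
    """Predict total tok/s from decode tok/s for a fixed-length shape.

    Assumption: both rates share one timing window; total counts
    prompt+completion tokens and decode counts completion tokens only.
    """
    return decode_tps * (prompt_len + completion_len) / completion_len


def prefill_chunks(total_prompt_tokens: int, chunk_budget: int = 4096) -> list[int]:
    """Split a prompt-token total into chunks no larger than a hypothetical budget."""
    full, rest = divmod(total_prompt_tokens, chunk_budget)
    return [chunk_budget] * full + ([rest] if rest else [])


# (label, decode tok/s, reported total tok/s, prompt_len, completion_len)
rows = [
    ("v6e prefill-heavy vLLM", 1264.03, 21488.50, 2048, 128),
    ("v6e prefill-heavy Levanter", 957.72, 16281.24, 2048, 128),
    ("v5p mixed vLLM", 5215.67, 10431.34, 512, 512),
    ("v5p mixed Levanter", 3131.59, 6263.19, 512, 512),
]
for label, decode, total, p, c in rows:
    predicted = total_from_decode(decode, p, c)
    print(f"{label}: predicted {predicted:.2f}, reported {total:.2f}")

# 8 requests x 2048 prompt tokens and 32 x 512 (inferred) both total 16384 tokens.
print(prefill_chunks(8 * 2048), prefill_chunks(32 * 512))
```

Under these assumptions the factor is 17 for the prefill-heavy shape and 2 for the mixed shape, so the decode and total ratios against vLLM must coincide, as they do in both rows.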

The surrounding rollout stack remains ready for review:

  • #6176 (104bf901) carries the dense Qwen3-8B matrix harness/config and is green/skipped.
  • #6185 (d63d1edf) carries multi-prefill admission, prefill-drain scheduling, corrected timing attribution, vLLM startup-log improvements, bounded request logging, the prefill-drain/generation timing split, and the docs-hygiene move into .agents/ops/. All visible checks are green/skipped after one external HF/cache rerun.
  • #6186 (c4ba37b) carries the token-native rollout API/data-plane slice and is green/skipped.
  • #6240 (32505d1ca) is ready for review, stacks on #6185, and adds bounded vLLM startup runtime/package/env snapshots for future v5p failures. Post-rebase checks are terminal green/skipped, including levanter-tpu-tests.

Tracker decisions captured in the epic: #6231 documents that #6186 should not be blocked on converting PrimeIntellectEnv to token-native; for this stack, PrimeIntellectEnv stays OpenAI/verifiers-compatible while preserving attached response token IDs and tokenizer/policy replay identity. #6228 gates dense-matrix reruns on #6185 landing or explicit maintainer authorization, prioritizes canonical v6e decode/mixed rows, and now records the v5p mixed TP=4 row as a measured target failure rather than a startup-evidence gap.

The handoff head is ebbe9a69d, following a docs-only correction that clarifies the review/landing order and the checked-in current-state head. It is terminal green/skipped, including marin-integration, marin-lint, CodeQL, ReadTheDocs, and the relevant unit/change lanes. #6214 is non-draft and requires review.

Part of #6227

dlwh added the agent-generated label (Created by automation/agent) on Jun 6, 2026
dlwh (Member, Author) commented Jun 7, 2026

🤖 Handoff update: /dlwh/qwen3-v6e8-prefillcorr-20260607-0023 reached JOB_STATE_SUCCEEDED with terminal benchmark results. No resubmit was needed and no Iris cluster mutation was performed.

Config: lib/iris/config/marin.yaml
PR/head under test: #6185 commit 91f6ec06abb46a0d87e94da8fee4a8b4f9802bc4
Shape: Qwen/Qwen3-8B, dense_qwen3_8b_v2, prefill_b8_i2048_o128_n1, backend both, v6e-8, TP=8, --max-pages 512, 2 warmups, 1 measured round.

| backend | decode tok/s | total tok/s | decode/vLLM | total/vLLM | target |
|---|---|---|---|---|---|
| vllm-tpu | 1212.15 | 20606.54 | 1.000 | 1.000 | n/a |
| levanter:auto | 898.94 | 15282.00 | 0.742 | 0.742 | fail |

Corrected Levanter timing fields: decode_iteration_tokens_per_second=1200.223, decode_device_tokens_per_second=1509.350, prefill chunks 4096,4096,4096,4096, decode iteration/device/host/submit seconds 0.852 / 0.677 / 0.174 / 0.002. vLLM detailed decode timing fields were not reported by this run.
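The corrected attribution splits each decode iteration into device, host, and submit time. A minimal sketch for sanity-checking such a split, assuming the three components are meant to sum to the iteration time (the printed values agree to within millisecond rounding) and that both per-second rates are computed over the same decode-token count. `rounding_slack` and `check_decode_split` are hypothetical helpers:

```python
def rounding_slack(n_terms: int, decimals: int = 3) -> float:
    """Largest total error from rounding n_terms values to `decimals` places."""
    return n_terms * 0.5 * 10 ** (-decimals)


def check_decode_split(iteration_s: float, device_s: float, host_s: float,
                       submit_s: float, decimals: int = 3) -> tuple[float, float, float]:
    """Return (gap, slack, host_share) for an iteration = device + host + submit split."""
    gap = iteration_s - (device_s + host_s + submit_s)
    slack = rounding_slack(4, decimals)  # four rounded numbers enter the comparison
    return gap, slack, host_s / iteration_s


gap, slack, host_share = check_decode_split(0.852, 0.677, 0.174, 0.002)
print(f"unattributed: {gap:+.3f}s (rounding slack {slack:.3f}s)")
print(f"host share of decode iteration: {host_share:.1%}")

# Implied decode-token counts from the two reported rates should roughly agree.
tokens_from_iteration = 1200.223 * 0.852
tokens_from_device = 1509.350 * 0.677
print(f"{tokens_from_iteration:.1f} vs {tokens_from_device:.1f} tokens")
```

On these numbers, host time is roughly a fifth of each decode iteration, which is the gap between the iteration and device rates.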

dlwh (Member, Author) commented Jun 7, 2026

🤖 PR description refreshed with the latest tracker decisions from #6227: #6231 keeps PrimeIntellectEnv OpenAI/verifiers-compatible for the current #6186 stack while preserving attached token IDs and replay identity, and #6228 gates dense-matrix reruns on #6185 f55913d1 landing or explicit maintainer authorization. No branch changes, code changes, TPU runs, or Iris mutations were made.

dlwh (Member, Author) commented Jun 7, 2026

🤖 #6240 post-rebase CI is now terminal green/skipped at head 32505d1ca. The final pending lane, levanter-tpu-tests, passed at 2026-06-07T03:28:03Z; levanter-unit, levanter-torch, marin-integration, marin-lint, CodeQL, ReadTheDocs, and the relevant change/unit lanes are also green or skipped. This confirms the diagnostics-hardening PR is ready for review; it does not prove vLLM starts on v5p. No TPU runs or Iris mutations were made.

dlwh (Member, Author) commented Jun 7, 2026

🤖 Handoff PR #6214 advanced to docs-only head 2c43f0d91 to update the checked-in handoff artifact with the live stack state: #6185 d63d1edf and #6240 32505d1ca terminal green/skipped. GitHub checks restarted after the push; previous #6214 head dbc3bf595 was terminal green/skipped. No TPU runs or Iris mutations were made.

dlwh (Member, Author) commented Jun 7, 2026

🤖 CI/status update: #6214 head 2c43f0d91 is now terminal green/skipped after the docs-only checked-in handoff correction. marin-integration passed in 5m31s; marin-lint, CodeQL, ReadTheDocs, and the relevant change/unit lanes are also green or skipped. I updated the #6214 PR body and #6227 epic body from 'checks restarted' to this verified terminal state. No TPU runs or Iris mutations were made.

dlwh marked this pull request as ready for review on Jun 7, 2026 03:43
claude (Bot, Contributor) commented Jun 7, 2026

Claude finished @dlwh's task in 5m 9s.


Code review

No issues found. Checked for bugs and CLAUDE.md/AGENTS.md compliance.

Notes from the multi-agent pass:

  • The file is a .agents/projects/ tracking artifact, which matches the root AGENTS.md guidance ("capture a plan in .agents/projects/ before pausing"). No documentation rules in AGENTS.md apply unambiguously to this kind of internal handoff.
  • All cited tok/s ratios are arithmetically consistent with the underlying numbers (1006.83 / 1262.90 = 0.797, 898.94 / 1212.15 = 0.742, 15282.00 / 20606.54 = 0.742, 3197.58 / 3408.09 = 0.938, 4084.24 / 4985.21 = 0.819).
  • Short-form SHAs in the PR description match the long-form heads recorded in the doc for #6176 104bf901, #6185 d63d1edf, #6186 c4ba37b, #6240 32505d1ca. Prior handoff head dbc3bf595 is recorded internally as well.
  • Dates (2026-06-06 / 2026-06-07) and the prior vs. corrected prefill-heavy rows (0.797 vs. 0.742) are kept as separate, clearly labeled jobs; no internal contradictions surfaced.
  • No unclosed code fences or malformed tables.

Branch: codex/levanter-rl-rollout-handoff
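The arithmetic consistency check in this review can be reproduced mechanically: each ratio is the Levanter value divided by the vLLM value, rounded to three decimals. A minimal sketch using the numerator/denominator pairs quoted in the review; `ratio_matches` is a hypothetical helper:

```python
# (Levanter value, vLLM value, ratio quoted in the handoff)
quoted = [
    (1006.83, 1262.90, 0.797),
    (898.94, 1212.15, 0.742),
    (15282.00, 20606.54, 0.742),
    (3197.58, 3408.09, 0.938),
    (4084.24, 4985.21, 0.819),
]


def ratio_matches(levanter: float, vllm: float, quoted_ratio: float, decimals: int = 3) -> bool:
    """True if levanter/vllm rounds to the quoted ratio."""
    return round(levanter / vllm, decimals) == quoted_ratio


mismatches = [row for row in quoted if not ratio_matches(*row)]
print("all ratios consistent" if not mismatches else f"mismatches: {mismatches}")
```

The same check applies to later rows, such as the post-split v6e (957.72 / 1264.03) and v5p TP=4 (3131.59 / 5215.67) ratios.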

dlwh (Member, Author) commented Jun 7, 2026

🤖 Review-status update: #6214 is now non-draft and ready for maintainer review at head 67e9500fc. The head is terminal green/skipped: marin-integration passed in 5m34s, and marin-lint, CodeQL, ReadTheDocs, and the relevant unit/change lanes are green or skipped. This was a docs-only status correction; no TPU runs or Iris mutations were made.

dlwh (Member, Author) commented Jun 7, 2026

🤖 Status update: #6214 head ebbe9a69d is terminal green/skipped after the docs-only review-order correction. marin-integration passed in 5m4s; marin-lint, CodeQL, ReadTheDocs, and relevant unit/change lanes are green or skipped. No TPU runs or Iris mutations were made.
