🤖 Follow-up to the Ray deletion PRs (#5028, #5031). Non-code files still reference modules that those PRs deleted. A mechanical sed pass would produce misleading docs, because scripts/iris/dev_tpu.py is not a drop-in replacement for ray_run.py: it is a persistent-session tool (allocate → execute → release), not a one-shot submitter. The flags also diverge: --no_wait, --extra, --cluster, --entrypoint-num-*, --auto-stop, and --submission-id have no equivalent, and --env_vars KEY VALUE becomes -e KEY=VALUE, which is accepted only on subcommands. The Levanter launch_on_ray references are similarly workflow-specific, so they cannot be replaced mechanically either.
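Only one piece of the flag mapping above translates mechanically. The sketch below is a hypothetical helper (translate_flags is not part of the repo) that converts --env_vars KEY VALUE pairs into -e KEY=VALUE and reports every flag listed as having no dev_tpu.py equivalent, so each hit gets a human decision instead of a silent rewrite. It deliberately does not build a full dev_tpu.py command, since the session workflow has no one-shot analogue.

```python
"""Hypothetical helper (translate_flags is not part of the repo).

Encodes only what this issue states: --env_vars KEY VALUE maps to
-e KEY=VALUE (valid on dev_tpu.py subcommands), and the flags below have
no equivalent. It does not emit a full dev_tpu.py command.
"""

# Flags with no dev_tpu.py equivalent, per the issue text.
NO_EQUIVALENT = {"--no_wait", "--extra", "--cluster", "--auto-stop", "--submission-id"}
NO_EQUIVALENT_PREFIXES = ("--entrypoint-num-",)  # covers --entrypoint-num-*


def translate_flags(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split old ray_run.py flags into (-e KEY=VALUE args, flags needing a human)."""
    env_args: list[str] = []
    needs_review: list[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--env_vars":
            # Old form takes two values: --env_vars KEY VALUE.
            if i + 2 >= len(argv):
                raise ValueError("--env_vars expects KEY VALUE")
            key, value = argv[i + 1], argv[i + 2]
            env_args += ["-e", f"{key}={value}"]
            i += 3
            continue
        name = arg.split("=", 1)[0]  # also handle --flag=value spellings
        if name in NO_EQUIVALENT or name.startswith(NO_EQUIVALENT_PREFIXES):
            needs_review.append(name)
        # Values of unsupported flags and all other arguments are ignored here.
        i += 1
    return env_args, needs_review
```

For the add_scaling_heuristic recipe, for example, this would surface --cluster for owner input rather than dropping it quietly.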
Files to update (owners in parens)
From #5028 (ray_run.py / marin.cluster.ray references)
Tutorials (docs team)
docs/explanations/executor.md:108
docs/tutorials/train-dpo.md:143
docs/tutorials/train-an-lm.md:152
docs/tutorials/tpu-cluster-setup.md:101-109 (may be deleted wholesale with the operator-tooling cleanup)
Recipes
docs/recipes/add_scaling_heuristic.md:45,179 (uses --cluster marin-us-central2; needs owner input on the target cluster)
Skills / planning
.agents/skills/ferries/SKILL.md:127,141
.agents/skills/architecture/SKILL.md:16,27
.agents/projects/ferry_framework.md:277
Runbooks / experiment READMEs
experiments/tootsie/BABYSITTING.md:15,62,69,76,84 (tootsie operators)
experiments/grug/README.md:43
experiments/README_sft.md:12,44,47
Docstring / header Usage: lines
experiments/tutorials/exp1077_reproduce_dclm_1b1x.py:14
experiments/tutorials/exp1078_reproduce_dclm_7b1x.py:14
experiments/rollout_data/{synthetic1,swe_rebench_openhands,principia,nemotron_terminal,gpt_oss_rollouts,superior_reasoning,coderforge}.py (7 files, all Usage: at ~line 7)
experiments/ferries/daily.py:14 (prose reference)
From #5031 (launch_on_ray / ray_tpu references)
Levanter docs
lib/levanter/docs/Getting-Started-TPU-VM.md — 5 references to launch_on_ray (feature description, caveats, usage example, deprecation note)
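To re-check the list above, or to catch stragglers after the sweep, a small scanner can search for the deleted entry points. This is a hypothetical sketch, not an existing repo script: the patterns cover only the names mentioned in #5028 and #5031, and it scans .md files plus .py files (the latter for Usage: docstrings and prose references).

```python
"""Hypothetical sketch: find lines in docs and docstrings that still mention
entry points deleted in #5028 / #5031. Not an existing repo script."""
import re
from pathlib import Path

# Names from the two PRs; \b keeps e.g. "fray_tpu" from matching "ray_tpu".
STALE = re.compile(r"\b(?:ray_run\.py|marin\.cluster\.ray|launch_on_ray|ray_tpu)\b")

# Assumption: .md covers tutorials/READMEs/skills; .py covers Usage: docstrings.
SUFFIXES = {".md", ".py"}


def find_stale_refs(root: str) -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, stripped_line) for every stale mention."""
    root_path = Path(root)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root_path.rglob("*")):
        if ".git" in path.parts or not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((path.relative_to(root_path).as_posix(), lineno, line.strip()))
    return hits
```

Calling find_stale_refs(".") from the repo root yields the same path:line form the file list above uses.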
Questions to resolve before sweeping
- Canonical one-shot launcher for executor-driven experiments in the Iris era: is it just python experiments/foo.py, assuming executor_main routes via fray → Iris?
- What replaces --no_wait for detached/long-running launches (ferries, tootsie)?
- docs/tutorials/tpu-cluster-setup.md: rewrite, or delete with the operator-tooling cleanup (scripts/ray/*, 18 × infra/marin-*.yaml)?
- lib/levanter/docs/Getting-Started-TPU-VM.md: rewrite the launch_on_ray sections to point at the fray/Iris TPU path, or delete them entirely if that workflow is deprecated?