Docs sweep: replace ray_run.py / launch_on_ray / ray_tpu references after deletion (#5028, #5031) #5029

🤖 Follow-up to the Ray deletion PRs (#5028, #5031). Non-code files still reference modules that have been deleted. A mechanical sed pass would produce misleading docs because `scripts/iris/dev_tpu.py` is **not** a drop-in replacement for `ray_run.py`: it is a persistent-session tool (`allocate` → `execute` → `release`), not a one-shot submitter. The flags also diverge: `--no_wait`, `--extra`, `--cluster`, `--entrypoint-num-*`, `--auto-stop`, and `--submission-id` have no equivalent, and `--env_vars KEY VALUE` becomes `-e KEY=VALUE`, which is accepted only on subcommands. The Levanter `launch_on_ray` references are similarly workflow-specific, so they cannot be swapped mechanically either.
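
To make the flag divergence concrete, here is a minimal sketch of a helper that could assist a manual rewrite. It only rewrites the `--env_vars KEY VALUE` spelling and reports flags the list above says have no equivalent. The function names and the sample command (`WANDB_MODE offline`) are hypothetical, and the helper does not restructure a command into `allocate` / `execute` / `release`; that step still needs a human.

```python
import re

# Flags listed in this issue as having no dev_tpu.py equivalent.
# A trailing "-" marks a prefix match (the issue writes it as --entrypoint-num-*).
NO_EQUIVALENT = (
    "--no_wait",
    "--extra",
    "--cluster",
    "--entrypoint-num-",
    "--auto-stop",
    "--submission-id",
)

# Matches the old two-argument form: --env_vars KEY VALUE
ENV_VARS = re.compile(r"--env_vars\s+(\S+)\s+(\S+)")


def translate_env_vars(command: str) -> str:
    """Rewrite every `--env_vars KEY VALUE` into `-e KEY=VALUE`."""
    return ENV_VARS.sub(r"-e \1=\2", command)


def unsupported_flags(command: str) -> list[str]:
    """Return the no-equivalent flags present in `command`, in order of appearance."""
    found: list[str] = []
    for token in command.split():
        name = token.split("=", 1)[0]  # tolerate the --flag=value spelling
        for flag in NO_EQUIVALENT:
            matches = name.startswith(flag) if flag.endswith("-") else name == flag
            if matches and flag not in found:
                found.append(flag)
    return found


# Hypothetical old-style command line, for illustration only.
old = "ray_run.py --no_wait --env_vars WANDB_MODE offline"
print(translate_env_vars(old))   # ray_run.py --no_wait -e WANDB_MODE=offline
print(unsupported_flags(old))    # ['--no_wait']
```

Any command for which `unsupported_flags` returns a non-empty list is a case where a sed-style substitution would leave a doc that looks valid but does not run.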

## Files to update (owners in parens)

### From #5028 (`ray_run.py` / `marin.cluster.ray` references)

**Tutorials (docs team)**
- `docs/explanations/executor.md:108`
- `docs/tutorials/train-dpo.md:143`
- `docs/tutorials/train-an-lm.md:152`
- `docs/tutorials/tpu-cluster-setup.md:101-109` (may be deleted wholesale with the operator-tooling cleanup)

**Recipes**
- `docs/recipes/add_scaling_heuristic.md:45,179` (uses `--cluster marin-us-central2` — needs owner input on target cluster)

**Skills / planning**
- `.agents/skills/ferries/SKILL.md:127,141`
- `.agents/skills/architecture/SKILL.md:16,27`
- `.agents/projects/ferry_framework.md:277`

**Runbooks / experiment READMEs**
- `experiments/tootsie/BABYSITTING.md:15,62,69,76,84` (tootsie operators)
- `experiments/grug/README.md:43`
- `experiments/README_sft.md:12,44,47`

**Docstring / header `Usage:` lines**
- `experiments/tutorials/exp1077_reproduce_dclm_1b1x.py:14`
- `experiments/tutorials/exp1078_reproduce_dclm_7b1x.py:14`
- `experiments/rollout_data/{synthetic1,swe_rebench_openhands,principia,nemotron_terminal,gpt_oss_rollouts,superior_reasoning,coderforge}.py` (7 files, all `Usage:` at ~line 7)
- `experiments/ferries/daily.py:14` (prose reference)

### From #5031 (`launch_on_ray` / `ray_tpu` references)

**Levanter docs**
- `lib/levanter/docs/Getting-Started-TPU-VM.md` — 5 references to `launch_on_ray` (feature description, caveats, usage example, deprecation note)

## Questions to resolve before sweeping

1. **Canonical one-shot launcher for executor-driven experiments in the Iris era** — is it just `python experiments/foo.py`, assuming `executor_main` routes via fray → Iris?
2. **`--no_wait` equivalent** for detached/long-running launches (ferries, tootsie).
3. **`docs/tutorials/tpu-cluster-setup.md`** — rewrite or delete with the operator-tooling cleanup (`scripts/ray/*`, 18 × `infra/marin-*.yaml`)?
4. **`lib/levanter/docs/Getting-Started-TPU-VM.md`** — rewrite the `launch_on_ray` sections to point at the fray/Iris TPU path, or delete them entirely if that workflow is deprecated?
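
Once the questions above are settled, progress on the sweep could be tracked with a small scan for leftover references. The sketch below is one possible approach, not an existing repo tool: the function name is hypothetical, the search patterns are the four identifiers named in this issue, and limiting the scan to `.md` and `.py` files is an assumption based on the file list above.

```python
import re
from pathlib import Path

# The identifiers this issue asks to sweep.
STALE = re.compile(r"ray_run\.py|launch_on_ray|ray_tpu|marin\.cluster\.ray")

# Assumption: the affected files listed in this issue are all Markdown or Python.
SUFFIXES = {".md", ".py"}


def find_stale_references(root):
    """Return (relative_path, line_number, line_text) for each stale reference under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

Calling `find_stale_references(".")` from the repository root and printing each tuple as `path:line` gives a list in the same shape as the one in this issue, so the two can be compared as files are fixed.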
