Commit 7a56e01
No more Ray in Marin (#5138)
## Summary
Stage 3g of the Ray removal plan (#4453). Deletes `fray.v2.ray_backend`
(~3.1k LOC) and the Ray auto-detect branch in
`fray.v2.client.current_client`. `fray.v2` is now Iris-only, with
`LocalClient` as the fallback for tests/dev.
Builds on stage 3f (#5137), which deleted `fray.v1`.
## What's deleted
- `lib/fray/src/fray/v2/ray_backend/`: all 10 modules (`backend.py`,
`tpu.py`, `dashboard.py`, `dashboard_proxy.py`, `deps.py`, `fn_thunk.py`,
`resources.py`, `context.py`, `auth.py`, `__init__.py`).
- Ray auto-detect branch in `fray.v2.client.current_client` (the
`ray.is_initialized()` / `FRAY_CLUSTER_SPEC=ray` path and the
`RayClient` import). Resolution order is now: explicit client → Iris
auto-detect → `LocalClient` fallback.
- Ray-flavored cases in `lib/fray/tests/test_v2_current_client.py`
(`test_ray_auto_detection`,
`test_ray_not_detected_when_not_initialized`,
`test_iris_takes_priority_over_ray`) and the residual
`patch("ray.is_initialized", ...)` calls.
- `ray = ["ray==2.54.0"]` optional dep and the `ray[default]` /
`pip # ray requires pip` lines from the `fray_test` group in
`lib/fray/pyproject.toml`.
- `marin-fray[ray]` in `lib/zephyr/pyproject.toml` becomes plain
`marin-fray` (zephyr has no direct `import ray`, so the extra was
vestigial).
- `ray==2.54.0` from `lib/marin/pyproject.toml` and
`ray[default]==2.54.0` from `lib/levanter/pyproject.toml`. Both were
dead direct deps (`rg '^import ray$|^from ray' lib/marin lib/levanter
experiments` returns empty). Stale "avoid 7+ due to ray" comment on
levanter's `protobuf>=6,<7` pin trimmed to just the TB/XProf reason.
- Stale ``(Ray)``/``Ray's ...`` mentions and TODOs in `actor.py`,
`types.py`, `local_backend.py`, and `lib/fray/AGENTS.md`.
- `ray` rows in `uv.lock` (fray package + all three dependency groups +
zephyr's transitive entry + marin/levanter direct edges).
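
The new resolution order (explicit client, then Iris auto-detect, then `LocalClient`) can be sketched as below. This is an illustrative sketch only, not the actual body of `fray.v2.client.current_client`: `resolve_client`, its `detect_iris` callback, and the stand-in `LocalClient` class are hypothetical names introduced for this example.

```python
from typing import Callable, Optional


class LocalClient:
    """Hypothetical stand-in for fray's LocalClient (tests/dev fallback)."""


def resolve_client(
    explicit: Optional[object],
    detect_iris: Callable[[], Optional[object]],
) -> object:
    """Pick a client using the post-Ray resolution order.

    `explicit` models a client the caller set on purpose; `detect_iris`
    models Iris auto-detection and returns None when no Iris context exists.
    Both are assumptions made for illustration.
    """
    # 1. An explicitly provided client always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise use Iris if it can be auto-detected.
    iris_client = detect_iris()
    if iris_client is not None:
        return iris_client
    # 3. No Ray branch remains: fall back to the local client.
    return LocalClient()
```

With the Ray branch gone there are only three outcomes, so a caller that previously relied on `ray.is_initialized()` being picked up would now land on the `LocalClient` fallback unless Iris is detected or a client is set explicitly.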
## Post-audit cleanup
Follow-up commit after a post-#5138 grep audit flagged five more stale
Ray references:
- `lib/levanter/docker/tpu/Dockerfile.cluster`: dropped the
`ray[default,gcp]==2.34.0` install + `dlwh/ray` fork patch (HACK for
ray-project/ray#47769), `RAY_USAGE_STATS_ENABLED`, and the stale
"using Ray to manage TPU slices" header comment. File is still
referenced by `.github/workflows/docker-images.yaml` and
`lib/levanter/infra/cluster/push_cluster_docker.sh`, so kept.
- `lib/levanter/docker/tpu/Dockerfile.incremental`: dropped
`RAY_USAGE_STATS_ENABLED`.
- `infra/README.md`: replaced the "## Ray" section (claiming Ray is
Marin's cluster infra) with a terse pointer to Iris + fray + zephyr.
- `docs/dev-guide/contributing.md`: dropped the obsolete
"unset RAY_ADDRESS" guardrail for running unit tests.
- `tests/test_dry_run.py`: dropped `os.environ["RAY_LOCAL_CLUSTER"] = "1"`
(confirmed no remaining readers repo-wide post-#5138) and the
now-unused `os` import.
## What's left
- `fray.v2` Iris-only: `FrayIrisClient` / `IrisActorHandle` /
`IrisActorGroup` unchanged.
- `fray.cluster/__init__.py` (v2 re-export shim) untouched — has ~60
external call sites and its API is load-bearing.
- No changes to `fray.v2` subpackage structure: rename to root is stage
3i.
- `ray` survives in `uv.lock` as a transitive dep of `vllm-tpu` under
marin's `vllm` extra. That's intentional: vllm-tpu pins ray itself,
and we no longer pin it on our side.
## Verification
- [x] `./infra/pre-commit.py --all-files --fix` — OK.
- [x] `uvx pyrefly@0.61.0 check --baseline .pyrefly-baseline.json` —
0 errors; baseline untouched (no `ray_backend` entries existed).
- [x] `uv run pytest lib/fray/tests -x --timeout=60` — 57 passed.
- [x] `uv lock` — clean re-resolve; `ray` removed from fray extras,
from the zephyr transitive edge, and from the marin/levanter
direct edges. Remains only as a vllm-tpu transitive.
- [x] Repo-wide grep
`ray_backend|fray\.v2\.ray|RayClient|FRAY_CLUSTER_SPEC`
returns only archived `.agents/projects/*` design docs.
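
The repo-wide grep in the last item can also be expressed as a small script, for example to run as a regression check. This is a sketch rather than tooling that exists in the repo: `find_stale`, the file-suffix filter, and the allow-list argument are assumptions, while the regex and the `.agents/projects/` exemption come from the check above.

```python
import re
from pathlib import Path

# Same alternation as the grep used for verification.
STALE = re.compile(r"ray_backend|fray\.v2\.ray|RayClient|FRAY_CLUSTER_SPEC")


def find_stale(
    root,
    pattern=STALE,
    allowed_prefixes=(".agents/projects/",),
    suffixes=(".py", ".md", ".toml", ".yaml"),
):
    """Return (relative_path, line_number, line) for each stale reference.

    Files under `allowed_prefixes` (archived design docs) are skipped.
    The suffix filter is an arbitrary choice for this example.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        if rel.startswith(allowed_prefixes):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits
```

An empty result from `find_stale(".")` corresponds to the grep returning only the archived design docs.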
## Next steps
- **Stage 3i**: rename `fray.v2.*` → `fray.*` (drop the `v2`
subpackage). Tracking issue pending; unblocked by this PR.
- **GCP §2 (marin_cluster* artifact-registry digests)** and **§3 (RAY_*
secrets)** remain parked on `marin-big-run` Ray cluster retirement.
- Once §2/§3 land, we can close the parent ticket **#4453**.
---------
Co-authored-by: Romain Yon <1596570+yonromai@users.noreply.github.com>
1 parent 5c22792 commit 7a56e01
26 files changed
Lines changed: 54 additions & 3184 deletions
File tree
- docs/dev-guide
- infra
- lib
  - fray
    - src/fray/v2
      - ray_backend
    - tests
  - levanter
    - docker/tpu
  - marin
  - zephyr
- tests