Commit 196a605

fray: rename fray.v2.* to fray.* (#4453) (#5140)
## Summary

Stage 3i (final code-side stage) of the Ray removal (parent #4453). With v1 gone (#5137) and the Ray backend gone (#5138), `fray.v2` is the only surviving version, so drop the `v2` suffix and move the modules up to the `fray` root.

## What this PR does

- `git mv lib/fray/src/fray/v2/{actor,client,device_flops,iris_backend,local_backend,types}.py` → `lib/fray/src/fray/` (6 files, history preserved)
- Collapse the `fray.v2/__init__.py` content into `fray/__init__.py`; delete the v2 shim
- Rewrite `fray.v2.X` → `fray.X` across 76 files (code, tests, docs, `AGENTS.md`, `.pyrefly-baseline.json`)
- Rename `lib/fray/tests/test_v2_*.py` → `lib/fray/tests/test_*.py` (4 files)
- No runtime behavior changes: this is a pure rename

## What stays unchanged

- `fray.cluster` public API: `from fray.cluster import ResourceConfig` still works for the ~60 external call sites; the shim now re-exports from `fray.types` instead of `fray.v2.types`
- Top-level `fray` exports (already the canonical public interface before this PR)
- `lib/fray/pyproject.toml` `[tool.hatch.build.targets.wheel]` (already `packages = ["src/fray"]`)

## Post-rename doc + CI cleanup

Follow-up commit on this PR scrubs stale Ray references that the three prior stage-3 PRs left behind and simplifies the levanter CI matrix. No runtime behavior changes.

Comment/docstring updates (6):

1. `lib/levanter/src/levanter/infra/docker.py`: "(if set by ray runtime env vars)" → "(set by the orchestrator, e.g. Iris/Fray)"
2. `lib/levanter/src/levanter/main/train_lm.py`: "(as happens w/ ray)" → "(as happens under Iris/Fray)"
3. `lib/levanter/src/levanter/main/eval_lm.py`: drop ray-specific framing from the manual-`finish()` comment
4. `lib/levanter/src/levanter/data/sharded_datasource.py`: `num_cpus`/`num_gpus` docstring no longer says "passed to ray" (params remain, still wired through `_BatchMapTransform`); nearby comment about HF dataset multiprocessing rephrased to drop Ray mention
5. `lib/marin/src/marin/processing/classification/deduplication/connected_components.py`: `zephyr/ray/pyarrow` → `zephyr/pyarrow`
6. `lib/marin/src/marin/training/training.py`: `run_levanter_train_lm` docstring rewritten to describe Fray submission (was describing `auto-ray-start`/`auto-worker-start` flags that no longer exist)

Function rename (1):

7. `lib/marin/src/marin/tokenize/slice_cache.py`: `_slice_cache_in_ray` → `_slice_cache_entrypoint` (only caller is the `ExecutorStep.fn=` reference in the same module)

CI simplification (1):

8. `.github/workflows/levanter-tests.yaml`: drop the `levanter-ray-tests` job and the `-m "not ray"` filter on the main + TPU levanter jobs; remove the `ray` pytest marker registration in `lib/levanter/pyproject.toml` and the two `@pytest.mark.ray` decorators. Only one live ray-marked test remained (`test_chat_dataset_build_and_pack`, ~9s, not slow, so folded into the main suite). The other (`test_hf_audio_ray_pipeline`) is `@pytest.mark.skip` anyway. Also scrubs leftover `not ray` filters from `lib/levanter/AGENTS.md` and `lib/levanter/scripts/launch_tpu_tests.sh`.

## Verification

- [x] `./infra/pre-commit.py --all-files --fix` → OK
- [x] `uv run --with pyrefly pyrefly check` → 150 errors / 152 suppressed (identical to `origin/main`, no regressions; the two `fray/types.py` entries are already baselined at the new path)
- [x] `uv run --directory lib/fray --group fray_test pytest tests -x --timeout=60` → 57 passed (matches post-3g count)
- [x] `uv lock --check` → clean
- [x] `test_chat_dataset_build_and_pack` passes standalone in ~9s (confirming the CI fold-in is cheap)
- [x] No remaining `fray.v2` references outside `.agents/projects/` (historical design docs; left untouched, matching stage 3f's handling)

## Scope

- 11 file renames + 76 content edits = 87 files total. 155 insertions / 155 deletions on content plus 73 deletions from the collapsed v2 shim.
- Cleanup commit: 13 files, +19/-51.

## Next steps

Per parent #4453:

- §2 (RAY_* secrets) + §3 (`marin_cluster*` digest cleanup) are parked on the `marin-big-run` Ray cluster retirement
- Close #4453 when those two land

---------

Co-authored-by: Romain Yon <1596570+yonromai@users.noreply.github.com>
1 parent e828084 commit 196a605
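
The `fray.v2.X` → `fray.X` rewrite and the "no remaining `fray.v2` references outside `.agents/projects/`" check from the commit message can be sketched as a small text transformation. This is only an illustration under assumptions: the commit does not say which tooling performed the rewrite, and the names `rewrite_v2_references` and `find_stale_references` are hypothetical.

```python
import re

# Dotted module references such as `fray.v2` or `fray.v2.types`.
# The lookbehind keeps names like `myfray.v2` untouched, and `\b` keeps
# names like `fray.v2x` untouched.
_V2_MODULE = re.compile(r"(?<![\w.])fray\.v2\b")

# Filesystem-style references such as `lib/fray/src/fray/v2/types.py`.
_V2_PATH = re.compile(r"\bfray/v2/")


def rewrite_v2_references(text: str) -> str:
    """Return `text` with `fray.v2` module and path references moved to the `fray` root."""
    text = _V2_MODULE.sub("fray", text)
    return _V2_PATH.sub("fray/", text)


def find_stale_references(files: dict[str, str],
                          skip_prefix: str = ".agents/projects/") -> list[str]:
    """List paths whose contents still mention `fray.v2`, ignoring `skip_prefix`.

    `files` maps a repository-relative path to that file's text (a hypothetical
    in-memory stand-in for walking the working tree).
    """
    return sorted(
        path
        for path, body in files.items()
        if not path.startswith(skip_prefix)
        and (_V2_MODULE.search(body) or _V2_PATH.search(body))
    )
```

Applied to the import lines in this commit, `rewrite_v2_references("from fray.v2.types import ResourceConfig")` yields the `fray.types` form, while `from fray.cluster import ResourceConfig` passes through unchanged.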

94 files changed

Lines changed: 196 additions & 301 deletions

.github/workflows/levanter-tests.yaml

Lines changed: 2 additions & 31 deletions
@@ -58,36 +58,7 @@ jobs:
       - name: Test with pytest
         run: |
           # Test with specific JAX version, excluding TPU tests
-          PYTHONPATH=tests:src:. uv run --package marin-levanter --frozen --with "jax[cpu]==0.8.0" pytest tests -m "not entry and not slow and not ray and not tpu" --durations=20
-
-  levanter-ray-tests:
-    needs: changes
-    if: needs.changes.outputs.should_run == 'true'
-    runs-on: ubuntu-latest
-    defaults:
-      run:
-        working-directory: lib/levanter
-
-    steps:
-      - uses: actions/checkout@v5
-      - name: Install uv and Python
-        uses: astral-sh/setup-uv@v6
-        with:
-          version: "0.7.20"
-          python-version: "3.11"
-          enable-cache: true
-          working-directory: lib/levanter
-      - name: Set up Node.js
-        uses: actions/setup-node@v4
-        with:
-          node-version: "22"
-      - name: Set up Python
-        run: uv python install
-      - name: Install dependencies
-        run: uv sync --package marin-levanter --dev --group test --frozen
-      - name: Test with pytest
-        run: |
-          PYTHONPATH=tests:src:. uv run --package marin-levanter --frozen pytest tests -m "ray" --durations=20
+          PYTHONPATH=tests:src:. uv run --package marin-levanter --frozen --with "jax[cpu]==0.8.0" pytest tests -m "not entry and not slow and not tpu" --durations=20
 
   levanter-entry-tests:
     needs: changes
@@ -241,7 +212,7 @@ jobs:
           timeout --kill-after=5 --signal=TERM 890 \
             uv run --package marin-levanter --frozen --group test --with 'jax[tpu]==$JAX_VERSION' \
             pytest -n 0 lib/levanter/tests \
-            -m 'not entry and not ray and not slow and not torch' \
+            -m 'not entry and not slow and not torch' \
             --ignore=lib/levanter/tests/test_audio.py \
             --ignore=lib/levanter/tests/test_new_cache.py \
             --ignore=lib/levanter/tests/test_hf_checkpoints.py \

.pyrefly-baseline.json

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
 "column": 9,
 "stop_line": 508,
 "stop_column": 22,
-"path": "lib/fray/src/fray/v2/types.py",
+"path": "lib/fray/src/fray/types.py",
 "code": -2,
 "name": "invalid-annotation",
 "description": "`Self` cannot be used in a static method",
@@ -17,7 +17,7 @@
 "column": 9,
 "stop_line": 516,
 "stop_column": 20,
-"path": "lib/fray/src/fray/v2/types.py",
+"path": "lib/fray/src/fray/types.py",
 "code": -2,
 "name": "invalid-annotation",
 "description": "`Self` cannot be used in a static method",

docs/debug-log-zephyr-coordinator-thread-shutdown.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ The real `ZephyrContext.execute()` teardown path leaves the coordinator actor's
 ## Changes to make
 
 - Inspect `_run_coordinator_job()` in `lib/zephyr/src/zephyr/execution.py`
-- Inspect `LocalClient.host_actor()` in `lib/fray/src/fray/v2/local_backend.py`
+- Inspect `LocalClient.host_actor()` in `lib/fray/src/fray/local_backend.py`
 - Add a regression test in `lib/zephyr/tests/test_execution.py` that exercises `execute()` and waits for `zephyr-coordinator-loop` to disappear
 
 ## Future Work

docs/references/resource-config.md

Lines changed: 4 additions & 4 deletions
@@ -21,20 +21,20 @@ gpu_auto = ResourceConfig.with_gpu() # auto-detect GPU type
 cpu_config = ResourceConfig.with_cpu()
 ```
 
-::: fray.v2.types.ResourceConfig
+::: fray.types.ResourceConfig
 
 ## Device Configurations
 
 These are the underlying device types wrapped by `ResourceConfig`:
 
 ### CPU
 
-::: fray.v2.types.CpuConfig
+::: fray.types.CpuConfig
 
 ### GPU
 
-::: fray.v2.types.GpuConfig
+::: fray.types.GpuConfig
 
 ### TPU
 
-::: fray.v2.types.TpuConfig
+::: fray.types.TpuConfig

experiments/dedup/poc_nemotron.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 import os
 from typing import TypeVar
 
-from fray.v2 import ResourceConfig
+from fray import ResourceConfig
 from rigging.filesystem import marin_temp_bucket, region_from_metadata, check_path_in_region
 
 from marin.datakit.normalize import NormalizedData, normalize_step

experiments/defaults.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 from typing import Any
 
 import jmp
-from fray.v2 import ResourceConfig
+from fray import ResourceConfig
 from marin.execution.remote import remote
 from haliax.partitioning import ResourceAxis
 from haliax.quantization import QuantizationConfig

experiments/exp_model_perplexity_gap_fineweb2_multilingual.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Copyright The Marin Authors
 # SPDX-License-Identifier: Apache-2.0
 
-from fray.v2.types import ResourceConfig
+from fray.types import ResourceConfig
 
 from experiments.defaults import default_raw_validation_sets
 from experiments.evals.fineweb2_multilingual import fineweb2_multilingual_raw_validation_sets

experiments/exp_model_perplexity_gap_long_tail_runnable.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 See https://github.com/marin-community/marin/issues/5005.
 """
 
-from fray.v2.types import ResourceConfig
+from fray.types import ResourceConfig
 
 from experiments.evals.long_tail_ppl_runnable import runnable_long_tail_raw_validation_sets
 from marin.evaluation.perplexity_gap import GapFinderModelConfig, default_model_perplexity_gap

experiments/exp_model_perplexity_gap_marin_vs_llama.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Copyright The Marin Authors
 # SPDX-License-Identifier: Apache-2.0
 
-from fray.v2.types import ResourceConfig
+from fray.types import ResourceConfig
 
 from experiments.defaults import default_raw_validation_sets
 from marin.evaluation.perplexity_gap import (

experiments/grug/dispatch.py

Lines changed: 3 additions & 3 deletions
@@ -9,9 +9,9 @@
 from typing import TypeVar
 
 from fray.cluster import ResourceConfig
-from fray.v2.client import current_client
-from fray.v2.types import Entrypoint, JobRequest, create_environment
-from fray.v2.types import GpuConfig, TpuConfig
+from fray.client import current_client
+from fray.types import Entrypoint, JobRequest, create_environment
+from fray.types import GpuConfig, TpuConfig
 
 logger = logging.getLogger(__name__)
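
The `from fray.cluster import ResourceConfig` line in `experiments/grug/dispatch.py` is left untouched because, per the commit message, `fray.cluster` is a shim that re-exports from `fray.types`. A rough sketch of that re-export pattern follows. It does not use fray itself: the `_shim_demo` package, its modules, and the `cpu` field are hypothetical stand-ins built in memory so the example runs on its own.

```python
import sys
import types
from dataclasses import dataclass


@dataclass
class ResourceConfig:
    # Illustrative field only; this is not fray's real schema.
    cpu: int = 1


# Hypothetical stand-in for the canonical module (the role `fray.types` plays).
canonical = types.ModuleType("_shim_demo.types")
canonical.ResourceConfig = ResourceConfig

# Hypothetical stand-in for the compatibility shim (the role `fray.cluster` plays):
# it defines nothing itself and only re-exports the canonical name.
shim = types.ModuleType("_shim_demo.cluster")
shim.ResourceConfig = canonical.ResourceConfig
shim.__all__ = ["ResourceConfig"]

# Register an in-memory package so ordinary import statements resolve.
package = types.ModuleType("_shim_demo")
package.__path__ = []  # marks the module as a package
package.types = canonical
package.cluster = shim
sys.modules.update(
    {"_shim_demo": package, "_shim_demo.types": canonical, "_shim_demo.cluster": shim}
)

from _shim_demo.cluster import ResourceConfig as FromShim
from _shim_demo.types import ResourceConfig as FromCanonical

# Both import paths hand back the very same class object.
same_object = FromShim is FromCanonical
```

Because the shim binds the same object rather than a copy, moving the canonical definition only requires changing where the shim imports from, and `isinstance` checks keep working across both import paths.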
