This migrates our various accelerator & resource configurations to use Fray.
Of course, we had at least 3 ways to specify these in the past. This doesn't remove all of the duplication, but it gets us most of the way there. Execution is still handled via Ray as usual; this just replaces our various `V6_TPU_STRICT_PACK`-style constants with Fray versions.
Now all training & evaluation jobs use a Fray `ResourceConfig` to specify the accelerator type, number of slices, etc. We also port the FLOPs calculation over to Fray, since it was duplicated in a few places.
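For context, a shared FLOPs helper like the one being consolidated typically uses the standard 6·N·D approximation for dense transformer training (2× params per token for the forward pass, 4× for the backward pass). The function name below is hypothetical; only the formula is standard, and the actual Fray helper may account for more detail (attention, vocab, etc.):

```python
def estimate_train_flops(num_params: int, num_tokens: int) -> float:
    """Rough dense-transformer training cost: ~6 * parameters * tokens.

    The factor of 6 is 2 FLOPs per parameter per token for the forward
    pass plus 4 for the backward pass. Hypothetical sketch, not the
    actual Fray implementation.
    """
    return 6.0 * num_params * num_tokens


# e.g. a 1B-parameter model trained on 20B tokens:
total = estimate_train_flops(1_000_000_000, 20_000_000_000)
```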
As before, for training jobs, we rely on the usual ray_tpu logic to pack
TPU workers properly.
For evaluation jobs, instead of the various Ray-specific helpers, we inject `strict_pack` via the `scheduling_strategy` helper when launching the evaluation jobs themselves.
docs/tutorials/first-experiment.md (+7 −7)
@@ -78,16 +78,16 @@ For this tutorial, we will use the `SimpleTrainConfig` class from `experiments.s
 This class defines basic training configuration that is sufficient for most experiments.

 !!! info "Training Configuration for Different Accelerators"
-    You need to provide the appropriate resource configuration based on your hardware setup. Marin supports different accelerator types through various [resource configurations](../references/resource-config.md). The `CpuOnlyConfig` is one such resource configuration that requests a certain number of CPUs. Other resource configurations include `GpuConfig` for requesting GPUs and `TpuPodConfig` for requesting TPUs.
+    You need to provide the appropriate resource configuration based on your hardware setup. Marin supports different accelerator types through [`ResourceConfig`](../references/resource-config.md) factory methods.

 === "CPU"
     ```python
-    from marin.resources import CpuOnlyConfig
+    from fray.cluster import ResourceConfig
     from experiments.simple_train_config import SimpleTrainConfig

     nano_train_config = SimpleTrainConfig(
         # Here we define the hardware resources we need.
-        resources=CpuOnlyConfig(num_cpus=1),
+        resources=ResourceConfig.with_cpu(),
         train_batch_size=4,
         num_train_steps=100,
         # set hyperparameters
@@ -100,12 +100,12 @@ This class defines basic training configuration that is sufficient for most expe
 === "GPU"
     ```python
-    from marin.resources import GpuConfig
+    from fray.cluster import ResourceConfig
     from experiments.simple_train_config import SimpleTrainConfig

     nano_train_config = SimpleTrainConfig(
         # Here we define the hardware resources we need.
-        resources=GpuConfig(gpu_count=1),
+        resources=ResourceConfig.with_gpu(count=1),
         train_batch_size=32,
         num_train_steps=100,
         # set hyperparameters
@@ -116,12 +116,12 @@ This class defines basic training configuration that is sufficient for most expe
 === "TPU"
     ```python
-    from marin.resources import TpuPodConfig
+    from fray.cluster import ResourceConfig
     from experiments.simple_train_config import SimpleTrainConfig