docs/launching.md
12 additions & 2 deletions
@@ -254,7 +254,7 @@ This section provides a detailed explanation of the configuration parameters ava
#### Model Configuration (`model_config`)

-These parameters define the base model, where to download it from, and how to shard it across TPUs/GPUs. Note that `actor_model_config`, `reference_model_config`, and `rollout_model_config` typically inherit from this base configuration.
+These parameters define the base model, where to download it from, and how to shard it across TPUs/GPUs. Note that `actor_model_config`, `reference_model_config`, and `rollout_model_config` typically inherit from this base configuration.

*   **`model_name`**: The unique full name identifier of the model. This
    corresponds to the full name and should match exactly with the model name
@@ -287,6 +287,16 @@ These parameters define the base model, where to download it from, and how to sh
*   **`mesh`**: Defines the hardware mesh layout for distributed training.
    *   `axis_names`: Names for mesh axes, often used for parallelism strategies (e.g., `"('fsdp','tp')"` for Fully Sharded Data Parallelism and Tensor Parallelism).
+*   **`colocate_with`**: Optional role-local placement override for
+    `actor_model_config`, `reference_model_config`, `rollout_model_config`, and
+    other RL roles.
+    * If unset, a role owns its own device slice when it has an explicit
+      `mesh`, or shares the actor mesh by default when it does not.
+    * If set to a role name such as `"actor"`, the role reuses that role's
+      device slice but may still define its own `mesh.shape` and
+      `mesh.axis_names`.
+    * This is different from exact mesh sharing: two roles can be colocated on
+      the same devices while using different mesh layouts.
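
The `colocate_with` rules added above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the project's implementation: `RoleConfig`, `Placement`, and `resolve_placement` are hypothetical names introduced only for illustration, and the sketch assumes that `colocate_with` names a role directly (no chained lookups).

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional


@dataclass
class RoleConfig:
    # Hypothetical stand-in for one role's config; not the real schema.
    has_explicit_mesh: bool = False
    colocate_with: Optional[str] = None


class Placement(NamedTuple):
    device_slice_of: str   # role whose device slice is used
    own_mesh_layout: bool  # True if the role defines its own mesh layout


def resolve_placement(role: str, roles: Dict[str, RoleConfig]) -> Placement:
    """Apply the documented rules for a single role (illustrative only)."""
    cfg = roles[role]
    if cfg.colocate_with is not None:
        if cfg.colocate_with not in roles:
            raise ValueError(
                f"{role!r} colocates with unknown role {cfg.colocate_with!r}"
            )
        # Set: reuse the named role's devices; the mesh layout may differ.
        return Placement(cfg.colocate_with, cfg.has_explicit_mesh)
    if cfg.has_explicit_mesh:
        # Unset with an explicit mesh: the role owns its own device slice.
        return Placement(role, True)
    # Unset without a mesh: share the actor mesh by default.
    return Placement("actor", False)


roles = {
    "actor": RoleConfig(has_explicit_mesh=True),
    "reference": RoleConfig(),
    "rollout": RoleConfig(has_explicit_mesh=True, colocate_with="actor"),
}

for name in roles:
    print(name, resolve_placement(name, roles))
```

In this sketch `reference` and `rollout` both land on the actor's devices, but only `rollout` keeps its own layout, which is the distinction the last bullet draws between colocation and exact mesh sharing.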
#### Tokenizer Configuration (`tokenizer_config`)
@@ -338,7 +348,7 @@ General settings for the training loop, logging, and checkpointing.
* **`eval_every_n_steps`**: Frequency of running evaluation steps.

-*   **`gradient_accumulation_steps`**: Number of steps to accumulate gradients
+* **`gradient_accumulation_steps`**: Number of steps to accumulate gradients
    before performing a parameter update (simulates larger batch sizes).
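
To illustrate why accumulating gradients "simulates larger batch sizes", here is a minimal pure-Python sketch on a toy one-parameter model. It assumes equal-sized micro-batches whose gradients are averaged before the update; the source does not say whether the training loop averages or sums them, and the helper names `grad_mse` and `accumulated_grad` are hypothetical.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, gradient_accumulation_steps):
    """Average per-micro-batch gradients, as done before a single update."""
    n = len(xs)
    if n % gradient_accumulation_steps != 0:
        raise ValueError("batch must split evenly into micro-batches")
    micro = n // gradient_accumulation_steps
    total = 0.0
    for i in range(gradient_accumulation_steps):
        lo, hi = i * micro, (i + 1) * micro
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    return total / gradient_accumulation_steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full = grad_mse(0.5, xs, ys)                # one batch of 4 examples
accum = accumulated_grad(0.5, xs, ys, 2)    # two micro-batches of 2
print(full, accum)
```

Under these assumptions the two gradients agree, so two accumulation steps over micro-batches of 2 produce the same update as one step over a batch of 4.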