
Commit f7b77be: update tutorial order
Parent: ba598b5
1 file changed: 25 additions & 27 deletions

File: docs/Pretraining.md
````diff
@@ -118,7 +118,7 @@ python scripts/official/base.py
 
 **Example 1: Single-GPU Training for Debugging**
 ```bash
-python scripts/official/nano.py train_single my_debug_run local \
+torchrun scripts/official/nano.py train_ my_debug_run local \
   --dataset.h5py_dir=/path/to/data \
   --data_loader.global_batch_size=64
 ```
@@ -321,7 +321,30 @@ torchrun --nproc_per_node=8 scripts/official/base.py train base_run local \
 
 The experiment framework uses a builder pattern with override capabilities. Launch scripts can be edited to change the configuration, or you can override any configuration parameter via CLI arguments using dotted notation.
 
-### Common Overrides
+### Example: Custom Training Run with Multiple Overrides
+
+```bash
+torchrun --nproc_per_node=8 scripts/official/base.py train custom_experiment local \
+  --data_loader.global_batch_size=256 \
+  --data_loader.num_workers=8 \
+  --train_module.rank_microbatch_size=8 \
+  --train_module.optim_config.lr=0.0002 \
+  --train_module.scheduler.warmup_steps=5000 \
+  --trainer.max_duration.epochs=100
+# Optionally: --dataset.h5py_dir=/your/path/to/data
+```
+
+### Example: Single-GPU Debug Setup
+
+```bash
+torchrun scripts/official/base.py train custom_experiment local \
+  --data_loader.global_batch_size=64 \
+  --data_loader.num_workers=4 \
+  --train_module.rank_microbatch_size=16 \
+  --trainer.callbacks.wandb.enabled=False
+# Optionally: --dataset.h5py_dir=/your/path/to/data
+```
+---
 
 #### Dataset Configuration
 
@@ -369,31 +392,6 @@ Override model architecture (requires understanding the model config structure):
   --model.decoder_config.depth=8
 ```
 
-### Example: Custom Training Run with Multiple Overrides
-
-```bash
-torchrun --nproc_per_node=8 scripts/official/base.py train custom_experiment local \
-  --dataset.h5py_dir=/your/path/to/data \
-  --data_loader.global_batch_size=256 \
-  --data_loader.num_workers=8 \
-  --train_module.rank_microbatch_size=8 \
-  --train_module.optim_config.lr=0.0002 \
-  --train_module.scheduler.warmup_steps=5000 \
-  --trainer.max_duration.epochs=100
-```
-
-### Example Single GPU debug Setup
-
-```bash
-torchrun scripts/official/base.py train custom_experiment local \
-  --data_loader.global_batch_size=64 \
-  --data_loader.num_workers=4 \
-  --train_module.rank_microbatch_size=16 \
-  --trainer.callbacks.wandb.enabled=False
-# Optionally --dataset.h5py_dir=/your/path/to/data \
-```
----
-
 
 ## Helpful Files for Understanding
 
````
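The tutorial text touched by this commit relies on dotted-notation CLI overrides (e.g. `--data_loader.global_batch_size=64`) resolving against a nested configuration. As a rough illustration of how such overrides typically work, here is a minimal sketch; `apply_overrides` and its value-parsing rules are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of dotted-notation overrides applied to a nested config
# dict. This is an illustrative assumption about how '--a.b.c=value'
# arguments resolve, not the experiment framework's real implementation.
def apply_overrides(config: dict, args: list[str]) -> dict:
    """Apply '--a.b.c=value' style overrides in place and return config."""
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            continue  # ignore anything that is not a '--key=value' override
        key, _, raw = arg[2:].partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing levels
        # Best-effort literal parsing: int, then float, then bool, else str.
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = {"True": True, "False": False}.get(raw, raw)
        node[leaf] = value
    return config


cfg = {"data_loader": {"global_batch_size": 128}}
apply_overrides(cfg, [
    "--data_loader.global_batch_size=64",
    "--trainer.callbacks.wandb.enabled=False",
])
```

After the call, `cfg["data_loader"]["global_batch_size"]` is `64` and the nested `trainer.callbacks.wandb.enabled` path has been created and set to `False`, mirroring how the debug-setup example above disables the wandb callback.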