Commit afcc2b0

updated pretraining.md
1 parent 526157e commit afcc2b0

1 file changed

Lines changed: 15 additions & 4 deletions

docs/Pretraining.md

@@ -39,7 +39,7 @@ This section covers:
 
 ### Prerequisites
 
-- Python 3.12+
+- Python 3.11+
 - CUDA-capable GPU (recommended: 40GB+ VRAM)
 - Linux/macOS environment
 
@@ -63,6 +63,14 @@ This section covers:
 pre-commit install
 ```
 
+### Running on Docker
+
+We run our training scripts using the `olmo-core-tch271cu128-2025-09-15` Docker image published by [ai2-olmo-core](https://github.com/allenai/OLMo-core/blob/main/README.md).
+
+**Important Notes:**
+- The code from this repository is **not included** in the Docker image, which keeps active development easier. The code is mounted or copied at runtime.
+- This Docker image may not work on your own cluster if you have different hardware or driver/CUDA versions. The image is built for CUDA 12.8 with PyTorch 2.7.1.
+- **For adaptation:** See our [Dockerfile](../Dockerfile) to understand how to build an image compatible with your hardware and CUDA setup.
 
 ## Launching Scripts
 
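Note on the hunk above: the added Docker notes say the image targets CUDA 12.8 with PyTorch 2.7.1, and the first hunk lowers the Python requirement to 3.11+. The following is a minimal sketch of how a user might compare a local environment against those three values before trying the image. The function name `check_environment` and the constants are hypothetical and not part of this repository; only `sys.version_info`, `torch.__version__`, and `torch.version.cuda` are real attributes.

```python
import sys

# Values copied from the documentation in this commit; adjust them if you
# build your own image. The names below are illustrative only.
MIN_PYTHON = (3, 11)
EXPECTED_TORCH = "2.7.1"
EXPECTED_CUDA = "12.8"


def check_environment(python_version, torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.11")
    # torch.__version__ may carry a local build suffix such as "2.7.1+cu128".
    if torch_version.split("+")[0] != EXPECTED_TORCH:
        problems.append(f"PyTorch {torch_version} differs from {EXPECTED_TORCH}")
    # torch.version.cuda is None on CPU-only builds.
    if cuda_version != EXPECTED_CUDA:
        problems.append(f"CUDA {cuda_version} differs from {EXPECTED_CUDA}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot compare versions.")
    else:
        for problem in check_environment(
            sys.version_info, torch.__version__, torch.version.cuda
        ):
            print("mismatch:", problem)
```

A mismatch does not necessarily mean training will fail. It only flags a difference from the setup the documentation describes, in which case the referenced Dockerfile is the place to start adapting.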
@@ -200,14 +208,16 @@ Evaluation datasets have default paths set in [`olmoearth_pretrain/evals/dataset
 
 1. Download/prepare the evaluation datasets locally
 2. Set environment variables (see [Environment Variables](#environment-variables))
-3. Or disable evaluations you don't have by adding the following override to your command:
+3. If not using all evaluations, enable only the ones you have set up by adding an override:
+
+For example, to run only the mados and pastis_sentinel2 evals, add the following override:
 ```bash
---trainer.callbacks.downstream_evaluator.tasks_to_run=[mados,pastis_sentinel2]
+--trainer.callbacks.downstream_evaluator.tasks_to_run=\[mados,pastis_sentinel2\]
 ```
 The task names correspond to the user-chosen names specified in the training configuration.
 
 ---
-### Main Training Scripts
+### Official Training Scripts
 > **🏢 AI2 Researchers - Choose Your Launch Method:**
 >
 > **For Beaker Batch Jobs (Pre-emptible):**
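Note on the hunk above: the commit changes the override from `[mados,pastis_sentinel2]` to `\[mados,pastis_sentinel2\]`. The commit message does not say why. A likely reason (an assumption here) is that some shells, such as zsh with its default settings, treat unquoted square brackets as a glob pattern and abort with "no matches found". The sketch below uses Python's standard `shlex` module, which emulates POSIX word splitting but not globbing, to show that the escaped form and a single-quoted form both deliver the same literal argument to the program.

```python
import shlex

FLAG = "--trainer.callbacks.downstream_evaluator.tasks_to_run"

# Backslash-escaped brackets, as written in the updated documentation.
escaped = shlex.split(FLAG + r"=\[mados,pastis_sentinel2\]")

# Alternative: wrap the whole argument in single quotes.
quoted = shlex.split("'" + FLAG + "=[mados,pastis_sentinel2]'")

# Both forms reach the program as one argument with plain brackets.
print(escaped == quoted)
print(escaped[0])
```

Either form should be safe to paste into bash or zsh. The escaped form is the one the documentation uses.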
@@ -230,6 +240,7 @@ Evaluation datasets have default paths set in [`olmoearth_pretrain/evals/dataset
 >
 > See [Setup-Internal.md](Setup-Internal.md#launch-methods) for more details.
 
+All official release scripts can be found at [`scripts/official/`](../scripts/official/).
 Below is a table demonstrating how to launch various model sizes using `torchrun` (for external users and AI2 sessions). Adjust the dataset path and configuration overrides as needed for your setup.
 
 | Model Size | Script | Hardware | Example Command | Notes |
