Merged
45 commits
3e6cd65
:q
Hgherzog Oct 20, 2025
096a75f
Able to hit the dataset not here error
Hgherzog Oct 21, 2025
73595ac
training works
Hgherzog Oct 21, 2025
f5687db
add in the other files
Hgherzog Oct 21, 2025
d92bda1
path to have pretraining work outside beaker but still requires a bea…
Hgherzog Oct 21, 2025
c1d5f88
move paths out to a seperate file that loads as env vars
Hgherzog Oct 21, 2025
0ffbd91
more clean ups
Hgherzog Oct 21, 2025
b37822a
split out sickle processor
Hgherzog Oct 21, 2025
2cb7702
cull imports
Hgherzog Oct 21, 2025
a5d5975
training runs decoupled from evaluation
Hgherzog Oct 21, 2025
8c0fbc0
official scripts ready
Hgherzog Oct 21, 2025
2985c8c
add docs example
Hgherzog Oct 22, 2025
526157e
updated docs still need some more work
Hgherzog Oct 22, 2025
afcc2b0
updated pretraining.md
Hgherzog Oct 23, 2025
eb70de6
pre-training docs
Hgherzog Oct 23, 2025
cacbc8d
works on a beaker session
Oct 23, 2025
ba598b5
update official scripts
Oct 23, 2025
f7b77be
update tutorial order
Hgherzog Oct 23, 2025
deded88
add priority note
Hgherzog Oct 23, 2025
5965db0
spelling
Hgherzog Oct 23, 2025
86605fa
actually enable torchrun
Hgherzog Oct 23, 2025
e23d572
simplify as we are required to have it for all
Hgherzog Oct 23, 2025
cd150e2
formatting changes
Hgherzog Oct 23, 2025
1d9a8a6
linting fixes
Hgherzog Oct 23, 2025
41fa6fc
fix mor elints
Hgherzog Oct 23, 2025
cc77728
move the beaker launch config back
Hgherzog Oct 24, 2025
0801bc4
rename and fix lints
Hgherzog Oct 24, 2025
99057ab
use goebench library directly
Hgherzog Oct 24, 2025
ba697c4
none checking
Hgherzog Oct 24, 2025
05eb0a6
pretrain tutorial updated
Hgherzog Oct 24, 2025
3aefa8d
Merge branch 'main' into henryh/pre-train-tutorial
Hgherzog Oct 25, 2025
c220f9b
adress initial comments
Hgherzog Oct 26, 2025
970e718
clean ups
Hgherzog Oct 26, 2025
de619e1
fix orig size default
Hgherzog Oct 27, 2025
5fc7277
add more info about ablations
Hgherzog Oct 27, 2025
da0e329
Merge branch 'main' into henryh/pre-train-tutorial
Hgherzog Oct 27, 2025
2e36660
Merge branch 'main' into henryh/pre-train-tutorial
Hgherzog Oct 27, 2025
5fe66df
update uv to specify python version and point to it in the pretrainin…
Hgherzog Oct 27, 2025
56da4c8
add dataset extraction
Oct 27, 2025
1471471
better tar opening instructions
Oct 27, 2025
4e52495
add link to hugging face page
Oct 27, 2025
f867d24
fix ai2 and remove notes
Hgherzog Oct 28, 2025
4c8f66a
remove a bunch of the overide examples
Hgherzog Oct 28, 2025
40d03b4
shorten explanation
Hgherzog Oct 28, 2025
91b01c0
remove reference
Hgherzog Oct 28, 2025
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -70,6 +70,8 @@ repos:
-p,
-vv,
olmoearth_pretrain,
--exclude,
olmoearth_pretrain/evals/models/*,
--fail-under=80,
]
- repo: https://github.com/astral-sh/ruff-pre-commit
63 changes: 3 additions & 60 deletions README.md
@@ -7,70 +7,13 @@ Earth system foundation model: data, training, and evaluation
launching training runs on beaker
## General Setup

**Requirements:** Python 3.11 or higher (Python 3.12 recommended)

1. Install uv: `curl -LsSf https://astral.sh/uv/install.sh | sh` (other ways to do it are documented [here](https://docs.astral.sh/uv/getting-started/installation/))
2. Navigate to root directory of this repo and run `uv sync --locked --all-groups`
2. Navigate to root directory of this repo and run `uv sync --locked --all-groups --python 3.12`
3. Install the pre-commit tool: `uv tool install pre-commit --with pre-commit-uv --force-reinstall`
4. uv installs everything into a venv, so to keep using `python` commands you can activate uv's venv: `source .venv/bin/activate`. Otherwise, swap to `uv run python`.

## Training Setup
1. Create a GitHub token that can clone this repo on Beaker. You can generate a token [here](https://github.com/settings/tokens). The following permissions are sufficient:
- repo
- read:packages
- read:org
- write:org
- read:project

Authorize this token for the allenai org by clicking the Configure SSO dropdown [here](https://github.com/settings/tokens) for the token you created.
2. Set your default Beaker workspace and budget:
`beaker config set default_workspace ai2/earth-systems`
`beaker workspace set-budget ai2/earth-systems ai2/d5`
3. Set the following Beaker Secrets:
- `beaker secret write <your_beaker_username>_WANDB_API_KEY <your_key>`
- `beaker secret write <your_beaker_username>_BEAKER_TOKEN <your_token>`
- `beaker secret write <your_beaker_username>_GITHUB_TOKEN <your_key>`

4. Create a script based on scripts/latent_mim.py and configure your experiment (you can override specific settings).
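The three secrets in step 3 share a `<your_beaker_username>_` prefix. As a minimal sketch, the loop below previews the exact commands before you run them (the username `jdoe` is a placeholder, not a real account):

```shell
# Preview the three `beaker secret write` commands before running them.
# BEAKER_USER is a placeholder -- substitute your actual Beaker username,
# and replace <value> with the corresponding key or token when you run
# the real commands.
BEAKER_USER="jdoe"
for SECRET in WANDB_API_KEY BEAKER_TOKEN GITHUB_TOKEN; do
  echo "beaker secret write ${BEAKER_USER}_${SECRET} <value>"
done
```

Keeping the username in one variable makes it harder to mistype the prefix on one of the three secrets.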


## Launch

### Pre-emptible Jobs

To launch pre-emptible jobs, we will use the main entrypoint in [olmoearth_pretrain/internal/experiment.py](olmoearth_pretrain/internal/experiment.py) and write Python configuration files that use it, like [scripts/latent_mim.py](scripts/latent_mim.py). Depending on your experiment, it might make sense to write a new script with different builders, or simply to override an existing one as needed.
Before launching your script, **MAKE SURE YOUR CODE IS COMMITTED AND PUSHED**, as we clone the code on top of a Docker image when we launch the job.

We can launch a script as follows:

`python3 scripts/base_debug_scripts/latent_mim.py launch test_run ai2/saturn-cirrascale`

This will launch a Beaker job and stream the logs to your console until you cancel.
Add additional overrides as needed.
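What an override-augmented launch might look like, as a sketch. `RUN_NAME` and `CLUSTER` are placeholders, and the dotted `key=value` override syntax is an assumption; check your script's help output for the real form. The command is echoed rather than executed so the placeholders stay obvious:

```shell
# Sketch of a launch command with an extra override appended.
# RUN_NAME and CLUSTER are placeholders, and the trailing
# trainer.max_duration override is hypothetical -- consult your
# script's help output for the supported override syntax.
RUN_NAME="test_run"
CLUSTER="ai2/saturn-cirrascale"
echo "python3 scripts/base_debug_scripts/latent_mim.py launch ${RUN_NAME} ${CLUSTER} trainer.max_duration=5000"
```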

### Sessions

[VSCODE/Cursor workflow setup](https://docs.google.com/document/d/1ydiCqIn45xlbrIcfPi8bILn_y00adTAHhIY1MPh9szE/edit?tab=t.0#heading=h.wua78h35aq1n) \
Be sure your session creation has included the following args
- ` --secret-env WANDB_API_KEY=<your_beaker_username>_WANDB_API_KEY
--secret-env BEAKER_TOKEN=<your_beaker_username>_BEAKER_TOKEN `

Note: In order to use flash attention in a session, use `"beaker://petew/olmo-core-tch270cu128"` as your base beaker image.
Then, set up a conda environment so you can use the flash attention code saved in the base image.
1. `conda init`
2. `exec bash`
3. `conda shell.bash activate base`
4. `pip install -e '.[all]'`

When launching runs in Sessions for debugging, use the following command:

`torchrun scripts/base_debug_scripts/latent_mim.py train test_run local`

Add additional overrides as needed.
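For multi-GPU debugging inside a session, torchrun's standard `--nproc-per-node` flag sets the number of worker processes; the count of 2 below is an arbitrary example, and the command is echoed as a sketch rather than executed:

```shell
# Multi-GPU variant of the session debug command. --nproc-per-node is
# torchrun's standard flag for worker processes per node; NGPUS=2 is an
# arbitrary example value, not a recommendation.
NGPUS=2
echo "torchrun --nproc-per-node=${NGPUS} scripts/base_debug_scripts/latent_mim.py train test_run local"
```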

## Beaker Information

budget: `ai2/es-platform` \
workspace: `ai2/earth-systems` \
weka: `weka://dfive-default`

## OlmoEarth Pretrain Dataset

37 changes: 0 additions & 37 deletions beaker_config_example.yaml

This file was deleted.
