31 commits
ac17273
bump olmo core version and add g cloud compute to reqs
pjreddie Oct 20, 2025
3e6cd65
:q
Hgherzog Oct 20, 2025
096a75f
Able to hit the dataset not here error
Hgherzog Oct 21, 2025
73595ac
training works
Hgherzog Oct 21, 2025
f5687db
add in the other files
Hgherzog Oct 21, 2025
d92bda1
path to have pretraining work outside beaker but still requires a bea…
Hgherzog Oct 21, 2025
c1d5f88
move paths out to a seperate file that loads as env vars
Hgherzog Oct 21, 2025
0ffbd91
more clean ups
Hgherzog Oct 21, 2025
b37822a
split out sickle processor
Hgherzog Oct 21, 2025
2cb7702
cull imports
Hgherzog Oct 21, 2025
a5d5975
training runs decoupled from evaluation
Hgherzog Oct 21, 2025
8c0fbc0
official scripts ready
Hgherzog Oct 21, 2025
2985c8c
add docs example
Hgherzog Oct 22, 2025
526157e
updated docs still need some more work
Hgherzog Oct 22, 2025
afcc2b0
updated pretraining.md
Hgherzog Oct 23, 2025
eb70de6
pre-training docs
Hgherzog Oct 23, 2025
cacbc8d
works on a beaker session
Oct 23, 2025
ba598b5
update official scripts
Oct 23, 2025
f7b77be
update tutorial order
Hgherzog Oct 23, 2025
deded88
add priority note
Hgherzog Oct 23, 2025
5965db0
spelling
Hgherzog Oct 23, 2025
86605fa
actually enable torchrun
Hgherzog Oct 23, 2025
e23d572
simplify as we are required to have it for all
Hgherzog Oct 23, 2025
cd150e2
formatting changes
Hgherzog Oct 23, 2025
1d9a8a6
linting fixes
Hgherzog Oct 23, 2025
41fa6fc
fix mor elints
Hgherzog Oct 23, 2025
2efc7d9
Merge remote-tracking branch 'origin/henryh/pre-train-tutorial' into …
yawenzzzz Oct 24, 2025
a67a34a
eval doc
yawenzzzz Oct 24, 2025
753ee2e
update experiment paths
yawenzzzz Oct 24, 2025
7118f7b
update the models that are supported
yawenzzzz Oct 24, 2025
c6a9206
eval changes
yawenzzzz Oct 25, 2025
59 changes: 0 additions & 59 deletions README.md
@@ -13,65 +13,6 @@ launching training runs on beaker
4. Run `pip install pre-commit`
5. Run `pre-commit install`

## Training Setup
1. Create a GitHub token that can clone this repo on Beaker. You can generate a token [here](https://github.com/settings/tokens). The following permissions are sufficient:
- repo
- read:packages
- read:org
- write:org
- read:project

Authorize this token for the allenai org by clicking the Configure SSO dropdown [here](https://github.com/settings/tokens) for the token you created.
2. Set your default Beaker workspace and budget:
`beaker config set default_workspace ai2/earth-systems`
`beaker workspace set-budget ai2/earth-systems ai2/d5`
3. Set the following Beaker Secrets:
- `beaker secret write <your_beaker_username>_WANDB_API_KEY <your_key>`
- `beaker secret write <your_beaker_username>_BEAKER_TOKEN <your_token>`
- `beaker secret write <your_beaker_username>_GITHUB_TOKEN <your_key>`

4. Create a script based on scripts/latent_mim.py and configure your experiment (you can override specific settings).


## Launch

### Pre-emptible Jobs

To launch pre-emptible jobs, we use the main entrypoint in [olmoearth_pretrain/internal/experiment.py](olmoearth_pretrain/internal/experiment.py) and write Python configuration files that use it, like [scripts/latent_mim.py](scripts/latent_mim.py). Depending on your experiment, it might make sense to write a new script with different builders or to override settings on an existing one as needed.
Before launching your script, **MAKE SURE YOUR CODE IS COMMITTED AND PUSHED**, as we clone the code on top of a Docker image when we launch the job.

We can launch a script as follows:

`python3 scripts/base_debug_scripts/latent_mim.py launch test_run ai2/saturn-cirrascale`

This will launch a Beaker job and stream the logs to your console until you cancel.
Add additional overrides as needed.

### Sessions

[VSCODE/Cursor workflow setup](https://docs.google.com/document/d/1ydiCqIn45xlbrIcfPi8bILn_y00adTAHhIY1MPh9szE/edit?tab=t.0#heading=h.wua78h35aq1n) \
Be sure your session creation has included the following args
- ` --secret-env WANDB_API_KEY=<your_beaker_username>_WANDB_API_KEY
--secret-env BEAKER_TOKEN=<your_beaker_username>_BEAKER_TOKEN `

Note: In order to use flash attention in a session, use `"beaker://petew/olmo-core-tch270cu128"` as your base beaker image.
Then, set up a conda environment so you can use the flash attention code saved in the base image.
1. `conda init`
2. `exec bash`
3. `conda shell.bash activate base`
4. `pip install -e '.[all]'`

When launching runs in Sessions for debugging, use the following command:

`torchrun scripts/base_debug_scripts/latent_mim.py train test_run local`

Add additional overrides as needed.

## Beaker Information

budget: `ai2/es-platform` \
workspace: `ai2/earth-systems` \
weka: `weka://dfive-default`

## OlmoEarth Pretrain Dataset

37 changes: 0 additions & 37 deletions beaker_config_example.yaml

This file was deleted.

185 changes: 185 additions & 0 deletions docs/Evaluation.md
@@ -0,0 +1,185 @@
# OlmoEarth Evaluation Guide

This guide explains how we launch evaluation sweeps for OlmoEarth checkpoints and baseline models, including KNN, linear probing, and finetuning jobs.

---

## Choose Your Evaluation Path

> **🏢 AI2 Researchers (Internal):**
> You have access to Beaker/Weka clusters and shared checkpoints. Skim [Setup-Internal.md](Setup-Internal.md) for environment details, then follow the launch instructions below.

> **🌍 External Users:**
> You can run these workflows on local/cloud GPUs. You will need the datasets referenced in [Dataset Setup](Pretraining.md#dataset-setup).

---

## Table of Contents

1. [Evaluation Overview](#evaluation-overview)
2. [Quick Start](#quick-start)
3. [KNN / Linear Probing](#knn--linear-probing)
4. [Finetune](#finetune-sweep)
5. [Monitoring & Outputs](#monitoring--outputs)
6. [Helpful Files](#helpful-files)

---

## Evaluation Overview

We run evaluations through the same `olmoearth_pretrain/internal/experiment.py` entrypoint used for pretraining. The helper scripts below build the underlying launch commands and fan out the learning rate, normalization, and pooling sweeps we used in the paper.

- `olmoearth_pretrain/internal/full_eval_sweep.py` launches KNN (for classification) and linear probing (for segmentation) against an OlmoEarth checkpoint or a supported baseline model.
- `olmoearth_pretrain/internal/full_eval_sweep_finetune.py` launches finetuning evaluations, including optional sweeps over pretrained and dataset normalizers.

Both scripts rely on:
- [`olmoearth_pretrain/internal/all_evals.py`](../olmoearth_pretrain/internal/all_evals.py) for the task registry.
- [`olmoearth_pretrain/evals`](../olmoearth_pretrain/evals) for dataset/model wrappers.

### Prerequisites

- Python environment configured as described in [Pretraining.md](Pretraining.md#environment-setup).
- Access to evaluation datasets (see [`evals/datasets/paths.py`](../olmoearth_pretrain/evals/datasets/paths.py) for expected locations).
- W&B API key (`WANDB_API_KEY`) if you want metrics to stream automatically.
- For AI2 infra: valid Beaker cluster name (`ai2/saturn`, `ai2/titan`, etc.).

### Supported Models

- **OlmoEarth checkpoints:** Any checkpoint compatible with the evaluation `experiment.py` entrypoint.
- **Baseline presets:** `dino_v3`, `panopticon`, `galileo`, `satlas`, `croma`, `copernicusfm`, `presto`, `anysat`, `tessera`, `prithvi_v2`, `terramind`, `clay`. Multi-size variants (e.g. `croma_large`, `galileo_large`, `terramind_large`) are handled automatically by the sweep scripts when requested.

---

## Quick Start

### 1. Activate your environment

```bash
source .venv-olmoearth_pretrain/bin/activate
```

### 2. Run a dry run to inspect the planned commands

```bash
python -m olmoearth_pretrain.internal.full_eval_sweep \
--cluster=local \
--checkpoint_path=/path/to/checkpoint/step123000 \
--defaults_only \
--dry_run
```

This prints the exact `torchrun`/`python3` command that will be executed for each task and hyperparameter combination.

### 3. Launch for real

Remove `--dry_run` once the command looks correct. On local GPUs the helper scripts will call `torchrun`; on Beaker they call `python3` with the launch module defined by `EVAL_LAUNCH_PATH`.
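A minimal sketch of that dispatch, purely for illustration (the function name and the fallback path are made up, not the repo's actual code):

```python
import os

def build_launch_prefix(cluster):
    # Illustrative sketch: local runs go through torchrun, everything else
    # through python3 plus the module named by EVAL_LAUNCH_PATH.
    # The fallback path below is a made-up placeholder, not a real default.
    if cluster == "local":
        return ["torchrun"]
    launch_module = os.environ.get("EVAL_LAUNCH_PATH", "path/to/launch_module.py")
    return ["python3", launch_module]

local_cmd = build_launch_prefix("local")
beaker_cmd = build_launch_prefix("ai2/saturn-cirrascale")
```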

---

## KNN / Linear Probing

Use this script for KNN and linear probing evaluations. Invoke it either through `python -m olmoearth_pretrain.internal.full_eval_sweep` or by running the file directly.

### Required flags

- `--cluster`: Cluster identifier (`local` for on-box runs).
- Exactly one of:
- `--checkpoint_path=/path/to/checkpoint/stepXXXX`: Evaluate an OlmoEarth checkpoint.
- `--model=<baseline_name>` or `--model=all`: Evaluate published baseline models defined in [`evals/models`](../olmoearth_pretrain/evals/models).

### Common optional flags

- `--module_path`: Override the launch module (defaults to the model-specific launcher).
- `--project_name`: W&B project (defaults to `EVAL_WANDB_PROJECT`).
- `--defaults_only`: Run a single command using the default lr / normalization / pooling.
- `--lr_only`: Sweep learning rates but keep normalization + pooling at defaults.
- `--all_sizes` or `--size=<variant>`: Evaluate every published size for multi-size baselines.
- `--model-skip-names=a,b`: Skip a subset when using `--model=all`.
- `--select_best_val`: Uses validation MIoU to pick the best epoch before reporting test metrics.
- `--dry_run`: Print commands without launching.
- Extra CLI arguments (e.g. `--trainer.max_duration.unit=epochs`) are forwarded to the underlying train module.
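The pass-through of extra CLI arguments can be sketched with `argparse.parse_known_args` (a hedged sketch of the behavior, not the sweep script's real flag set):

```python
import argparse

def parse_sweep_args(argv):
    # Hedged sketch: flags the sweep script knows about are consumed, and
    # anything else (dotted trainer overrides, etc.) is returned untouched
    # so it can be appended to every launch command.
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster", required=True)
    parser.add_argument("--dry_run", action="store_true")
    known, extra = parser.parse_known_args(argv)
    return known, extra

known, extra = parse_sweep_args(
    ["--cluster=local", "--dry_run", "--trainer.max_duration.unit=epochs"]
)
```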

### Example: Launch OlmoEarth evaluation against a checkpoint (local debug)

```bash
python -m olmoearth_pretrain.internal.full_eval_sweep \
--cluster=local \
--checkpoint_path=/data/checkpoints/phase2_base/step667200 \
--module_path=scripts/2025_10_02_phase2/base.py \
--defaults_only
```

### Example: Launch baseline sweep on Beaker

```bash
python -m olmoearth_pretrain.internal.full_eval_sweep \
--cluster=ai2/saturn-cirrascale \
--model=dino_v3 \
--project_name=2025_10_eval_comparison \
--lr_only
```

When `--model=all`, the script automatically switches to the correct launcher for each model and constructs run names like `<checkpoint>_lr1e-3_norm_dataset_pool_mean`.
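The run-name pattern quoted above can be reconstructed roughly like this (a hypothetical helper; the real scripts may assemble the name differently):

```python
def make_run_name(base, lr, norm, pool):
    # Hypothetical reconstruction of the <checkpoint>_lr..._norm_..._pool_...
    # naming pattern described in the text.
    return f"{base}_lr{lr}_norm_{norm}_pool_{pool}"

name = make_run_name("step667200", "1e-3", "dataset", "mean")
```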

---

## Finetune Sweep

Use `olmoearth_pretrain/internal/full_eval_sweep_finetune.py` for downstream fine-tuning tasks. It shares many flags with the KNN / linear probing sweep but adds fine-tuning-specific options.

### Required flags

- `--cluster`: Cluster identifier.
- One of:
- `--checkpoint_path=/path/to/olmoearth/stepXXXX`: Fine-tune an OlmoEarth checkpoint.
- `--model=<preset_key>`: Use a baseline preset (choices listed in the script’s help).

### Fine-tune specific flags

- `--defaults_only`: Run only the first learning rate in `FT_LRS`.
- `--sweep_normalizer`: For models with pretrained normalizers, run both dataset stats and pretrained normalizer variants.
- `--module_path`: Override the launch script (defaults to the preset’s launcher).
- Extra CLI arguments append to every command (e.g. `--trainer.max_duration.value=50000`).
- `--dry_run`: Preview commands.

The script sets `FINETUNE=1` in the environment before launching so downstream code enables fine-tuning heads automatically.
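Setting an environment flag for a child process can be done like this (a generic sketch of the pattern, not the sweep script's actual launch code):

```python
import os
import subprocess
import sys

def launch_with_finetune_env(cmd):
    # Sketch: thread FINETUNE=1 into the child process environment without
    # mutating the parent's environment.
    env = dict(os.environ)
    env["FINETUNE"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Demo: the child process reads the flag back out.
result = launch_with_finetune_env(
    [sys.executable, "-c", "import os; print(os.environ['FINETUNE'])"]
)
```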

### Example: OlmoEarth checkpoint fine-tune sweep (Beaker)

```bash
python -m olmoearth_pretrain.internal.full_eval_sweep_finetune \
--cluster=ai2/titan \
--checkpoint_path=/weka/.../phase2.0_base_lr0.0001_wd0.02/step667200 \
--project_name=2025_10_08_phase2_finetune \
--defaults_only
```

### Example: Baseline fine-tune with normalizer sweep

```bash
python -m olmoearth_pretrain.internal.full_eval_sweep_finetune \
--cluster=ai2/saturn-cirrascale \
--model=galileo \
--sweep_normalizer
```

---

## Monitoring & Outputs

- **W&B logging:** Both scripts default to `EVAL_WANDB_PROJECT`. Override with `--project_name` or disable W&B via `--trainer.callbacks.wandb.enabled=False`.
- **Checkpoints:** Evaluation launches set `--trainer.no_checkpoints=True` for baseline models so runs do not write new checkpoints. OlmoEarth checkpoints keep checkpointing enabled by default.
- **Run names:** Generated from the checkpoint directory (`<run>/<step>`) or baseline name plus the swept hyperparameters to simplify aggregation.
- **Inspecting results:** Use [`scripts/get_max_eval_metrics_from_wandb.py`](../scripts/get_max_eval_metrics_from_wandb.py) to pull the best MIoU/accuracy per task across runs.
- **Dry run safety:** Always start with `--dry_run` when editing sweeps or passing overrides; command strings can be long, and the dry run verifies the generated arguments.
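The "best metric per task" aggregation that the W&B helper script performs boils down to something like the following toy stand-in (illustrative data and function name; the real script pulls rows from the W&B API):

```python
def best_metric_per_task(rows):
    # Toy stand-in for the aggregation in
    # scripts/get_max_eval_metrics_from_wandb.py: keep the highest metric
    # per task along with the run that produced it.
    best = {}
    for task, run, metric in rows:
        if task not in best or metric > best[task][1]:
            best[task] = (run, metric)
    return best

rows = [
    ("cropseg", "lr1e-3_pool_mean", 0.61),
    ("cropseg", "lr1e-4_pool_mean", 0.64),
    ("landcover", "lr1e-3_pool_cls", 0.72),
]
best = best_metric_per_task(rows)
```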

---

## Helpful Files

- [`internal/all_evals.py`](../olmoearth_pretrain/internal/all_evals.py): Lists frozen and fine-tune tasks, feature extractor settings, and metric names.
- [`evals/models`](../olmoearth_pretrain/evals/models): Launcher modules and wrappers for baseline models.
- [`evals/datasets/configs.py`](../olmoearth_pretrain/evals/datasets/configs.py): Dataset configs used when constructing evaluation commands.
- [`docs/Pretraining.md`](Pretraining.md): Shared environment setup; refer back if you need to rebuild Docker images or install dependencies.

Happy evaluating! Let the team know in `#olmoearth` if new baselines or tasks need presets added to the sweep scripts.