The H5 dataset follows a hierarchical directory structure (see [`set_h5py_dir` in convert_to_h5py.py](../olmoearth_pretrain/dataset/convert_to_h5py.py)):
- **`timestamps`**: Integer array `[T, 3]`, where `T` is the number of time steps and the columns are `[day, month, year]`
- **Modality datasets**: Named by modality (e.g., `"sentinel2"`, `"era5_10"`, `"naip"`, `"landsat"`; see all available modalities in [`constants.py`](../olmoearth_pretrain/data/constants.py))
  - Spatial modalities: shape `[H, W, T, C]` or `[H, W, C]`, depending on temporal variation
  - Non-spatial modalities: shape `[T, C]`
- **`missing_timesteps_masks/`** group: Boolean masks per modality (shape `[T]`) indicating which timestamps from the longest timestamp array are present for that specific modality (see [`_create_missing_timesteps_masks` in convert_to_h5py.py](../olmoearth_pretrain/dataset/convert_to_h5py.py))

2. **`sample_metadata.csv`** - CSV with columns `sample_index, <modality1>, <modality2>...`, where values are 1 (present) or 0 (absent), tracking which modalities exist in each sample (see [`save_sample_metadata` in convert_to_h5py.py](../olmoearth_pretrain/dataset/convert_to_h5py.py))

3. **`latlon_distribution.npy`** - NumPy array `[N, 2]` of all sample lat/lons, used for dataset statistics (see [`save_latlon_distribution` in convert_to_h5py.py](../olmoearth_pretrain/dataset/convert_to_h5py.py))

4. **`compression_settings.json`** - Stores the compression algorithm, compression level options, and shuffle filter settings used for all H5 files (see [`save_compression_settings` in convert_to_h5py.py](../olmoearth_pretrain/dataset/convert_to_h5py.py))

**Key Invariant:** All H5 files follow the same schema with `latlon`, `timestamps`, modality datasets, and `missing_timesteps_masks` group structure, ensuring consistency across the entire dataset.
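As a sketch, the schema above can be written and read back with `h5py`. The filename, coordinates, and the choice of `sentinel2` as the example modality are illustrative, not taken from the conversion code:

```python
import h5py
import numpy as np

T, H, W, C = 4, 8, 8, 3  # illustrative sizes

# Write a sample following the schema described above
with h5py.File("sample_000000.h5", "w") as f:
    f.create_dataset("latlon", data=np.array([47.6, -122.3]))
    # [T, 3] timestamps with columns [day, month, year]
    f.create_dataset("timestamps", data=np.array([[15, m, 2023] for m in range(1, T + 1)]))
    # Temporal spatial modality: [H, W, T, C]
    f.create_dataset("sentinel2", data=np.zeros((H, W, T, C), dtype=np.float32))
    masks = f.create_group("missing_timesteps_masks")
    # Boolean mask of shape [T]: which timesteps exist for this modality
    masks.create_dataset("sentinel2", data=np.array([True, True, False, True]))

# Read it back and keep only the timesteps present for this modality
with h5py.File("sample_000000.h5", "r") as f:
    mask = f["missing_timesteps_masks/sentinel2"][:].astype(bool)
    s2 = f["sentinel2"][:][:, :, mask, :]

print(s2.shape)  # (8, 8, 3, 3): one missing timestep dropped
```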
### Evaluation Datasets
Evaluation datasets have default paths set in [`olmoearth_pretrain/evals/datasets/paths.py`](../olmoearth_pretrain/evals/datasets/paths.py).
2. Set environment variables (see [Environment Variables](#environment-variables))
3. If not using all evaluations, enable only the ones you have set up by adding an override:

For example, to run only the mados and pastis_sentinel2 evals, add the following override:
> When using `local` as the cluster argument, checkpoints are automatically saved to `./local_output`. You can override this location with `--common.save_folder=path/to/savefolder`.
## Overrides and Experiments
### How Overrides Work
The experiment framework uses a builder pattern with override capabilities. You can edit the launch scripts to change the configuration, or override any configuration parameter via CLI arguments using dotted notation.
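A minimal sketch of how dotted-notation overrides map onto a nested configuration, assuming a plain nested-dict config. The real framework's builder handles type coercion and validation; this only illustrates the path traversal:

```python
def apply_override(config: dict, dotted: str) -> None:
    """Apply a CLI-style '--a.b.c=value' override to a nested dict."""
    key, _, value = dotted.lstrip("-").partition("=")
    parts = key.split(".")
    node = config
    # Walk down to the parent of the target field, creating levels as needed
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value

cfg = {"trainer": {"max_duration": {"epochs": 10}}}
apply_override(cfg, "--trainer.max_duration.epochs=100")
print(cfg["trainer"]["max_duration"]["epochs"])  # '100' (a string; real parsers coerce types)
```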
### Common Overrides
```
torchrun --nproc_per_node=8 scripts/official/base.py train custom_experiment local \
    --trainer.max_duration.epochs=100
```

For more override patterns and examples, see the [Reference Guide](Reference.md#override-patterns).
---
## Gotchas and Troubleshooting
When adapting the training setup to your hardware, the following parameters commonly require adjustment:
- **Batch size** (`--data_loader.global_batch_size` and `--train_module.rank_microbatch_size`): Reduce these if you encounter out-of-memory errors
- **Number of workers** (`--data_loader.num_workers`): Adjust based on available CPU cores for data loading
- **Number of GPUs** (`--nproc_per_node` in torchrun): Set to match your available GPU count

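For instance, on a single-GPU machine with limited memory, these parameters might be combined as follows; the specific values are illustrative and depend on your hardware:

```
torchrun --nproc_per_node=1 scripts/official/base.py train custom_experiment local \
    --data_loader.global_batch_size=32 \
    --train_module.rank_microbatch_size=4 \
    --data_loader.num_workers=4
```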
For detailed troubleshooting guidance, consult the [Reference Guide](Reference.md#troubleshooting-guide).