README.md: 1 addition & 72 deletions
@@ -98,7 +98,7 @@ You will need a [CDS account](https://cds.climate.copernicus.eu/how-to-api) to d
 
 Run `uv run imp datasets create` to download datasets.
 
-N.b. For very large datasets, use `load_in_parts` instead (see [Downloading large datasets](#downloading-large-datasets) below).
+We make use of the fact that Anemoi datasets keep track of which groups of dates have been loaded to ensure that an interrupted download can be resumed simply by rerunning the `datasets create` command.
 
 ### Inspect
 
@@ -199,74 +199,3 @@ There are various demonstrator Jupyter notebooks in the `notebooks` folder.
 You can run these with `uv run --group notebooks jupyter notebook`.
 
 A good one to start with is `notebooks/demo_pipeline.ipynb` which gives a more detailed overview of the pipeline.
-
-## Downloading large datasets
-For particularly large datasets, e.g. the full ERA5 dataset, it may be necessary to download the data in parts.
-
-### Automated approach (recommended)
-
-The `load_in_parts` command automates the process of downloading datasets in parts, tracking progress, and allowing you to resume interrupted downloads:
-
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml
-```
-
-This command will:
-- Automatically initialise the dataset if it doesn't exist
-- Load all parts sequentially, tracking progress in a part tracker file
-- Skip already completed parts if the process is interrupted and restarted
-- Handle errors gracefully (by default, continues to the next part on error)
-
-You will then need to finalise the dataset when done.
-
-```bash
-uv run imp datasets finalise --config-name <your config>.yaml
-```
-
-#### Options
-
-- `--continue-on-error` / `--no-continue-on-error` (default: `--continue-on-error`): Continue to next part on error
-- `--force-reset`: Clear existing progress tracker and start from part 1. Anemoi will check whether you have the data already and continue.
-- `--dataset <name>`: Run only a single dataset by name (useful when you have multiple datasets in your config). Make sure you use the dataset name and not the name of the config.
-- `--total-parts <n>`: Override the computed total number of parts (useful if you want more / fewer parts than the default 10)
-- `--overwrite`: Delete the dataset directory before loading (use with caution!)
-
-#### Examples
-
-Load all parts for all datasets, resuming from where you left off:
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml
-```
-
-Load a specific dataset with a custom number of parts:
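
Note on the resume behaviour relied on by the line added in the first hunk: the sketch below illustrates, in general terms, how recording which groups have been loaded lets a rerun pick up where an interrupted run stopped. It is not Anemoi's implementation. The function `load_resumably`, the `load_group` callback, and the JSON tracker file are all hypothetical names chosen for illustration; where and how Anemoi actually stores this state is not shown in this diff.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_resumably(
    groups: Iterable[str],
    load_group: Callable[[str], None],
    tracker_path: Path,
) -> list[str]:
    """Load each group once, recording successes so that a rerun skips them.

    Hypothetical illustration only: returns the groups loaded by this call.
    """
    done: set[str] = set()
    if tracker_path.exists():
        # Groups recorded by an earlier (possibly interrupted) run.
        done = set(json.loads(tracker_path.read_text()))

    loaded_now: list[str] = []
    for group in groups:
        if group in done:
            continue  # already loaded, nothing to do
        load_group(group)  # may raise; the group is then not recorded
        done.add(group)
        # Persist after every success so an interruption loses at most one group.
        tracker_path.write_text(json.dumps(sorted(done)))
        loaded_now.append(group)
    return loaded_now
```

Under these assumptions, calling the function a second time with the same arguments is the whole recovery procedure, which mirrors the "simply rerun the command" wording in the added README line.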