Commit 520666f

✏️ Update the README for the changes made in #230
1 parent 7bc80a7 commit 520666f

1 file changed

Lines changed: 1 addition & 72 deletions

README.md

@@ -98,7 +98,7 @@ You will need a [CDS account](https://cds.climate.copernicus.eu/how-to-api) to d
 
 Run `uv run imp datasets create` to download datasets.
 
-N.b. For very large datasets, use `load_in_parts` instead (see [Downloading large datasets](#downloading-large-datasets) below).
+We make use of the fact that Anemoi datasets keep track of which groups of dates have been loaded to ensure that an interrupted download can be resumed simply by rerunning the `datasets create` command.
 
 ### Inspect
 
@@ -199,74 +199,3 @@ There are various demonstrator Jupyter notebooks in the `notebooks` folder.
 You can run these with `uv run --group notebooks jupyter notebook`.
 
 A good one to start with is `notebooks/demo_pipeline.ipynb` which gives a more detailed overview of the pipeline.
-
-## Downloading large datasets
-For particularly large datasets, e.g. the full ERA5 dataset, it may be necessary to download the data in parts.
-
-### Automated approach (recommended)
-
-The `load_in_parts` command automates the process of downloading datasets in parts, tracking progress, and allowing you to resume interrupted downloads:
-
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml
-```
-
-This command will:
-- Automatically initialise the dataset if it doesn't exist
-- Load all parts sequentially, tracking progress in a part tracker file
-- Skip already completed parts if the process is interrupted and restarted
-- Handle errors gracefully (by default, continues to the next part on error)
-
-You will then need to finalise the dataset when done.
-
-```bash
-uv run imp datasets finalise --config-name <your config>.yaml
-```
-
-#### Options
-
-- `--continue-on-error` / `--no-continue-on-error` (default: `--continue-on-error`): Continue to next part on error
-- `--force-reset`: Clear existing progress tracker and start from part 1. Anemoi will check whether you have the data already and continue.
-- `--dataset <name>`: Run only a single dataset by name (useful when you have multiple datasets in your config). Make sure you use the dataset name and not the name of the config.
-- `--total-parts <n>`: Override the computed total number of parts (useful if you want more / fewer parts than the default 10)
-- `--overwrite`: Delete the dataset directory before loading (use with caution!)
-
-#### Examples
-
-Load all parts for all datasets, resuming from where you left off:
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml
-```
-
-Load a specific dataset with a custom number of parts:
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml --dataset my_dataset --total-parts 25
-```
-
-Start fresh, clearing any previous progress (doesn't delete any data):
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml --force-reset
-```
-Start and destroy any previously saved data (careful):
-```bash
-uv run imp datasets load_in_parts --config-name <your config>.yaml --overwrite
-```
-
-### Manual approach (advanced)
-
-If you need more control, you can manually manage the download process:
-
-1. First initialise the dataset:
-```bash
-uv run imp datasets init --config-name <your config>.yaml
-```
-
-2. Then load each part `i` of the total `n` in turn:
-```bash
-uv run imp datasets load --config-name <your config>.yaml --parts i/n
-```
-
-3. When all the parts are loaded, finalise the dataset:
-```bash
-uv run imp datasets finalise --config-name <your config>.yaml
-```
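
The line added to the README in this commit relies on the dataset remembering which groups of dates are already loaded, so that rerunning the same command resumes the download. The general idea can be sketched in plain Python as below. This is only an illustration under stated assumptions, not how Anemoi or `imp` implement it: `load_groups`, the `fetch` callback, and the JSON tracker file are all hypothetical names introduced for the example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_groups(
    groups: Iterable[str],
    fetch: Callable[[str], None],
    tracker_path: Path,
) -> list[str]:
    """Fetch every group not yet recorded as done; return the groups fetched in this run."""
    # Hypothetical tracker: a JSON list of the group names completed so far.
    if tracker_path.exists():
        done = set(json.loads(tracker_path.read_text()))
    else:
        done = set()

    fetched = []
    for group in groups:
        if group in done:
            continue  # loaded in an earlier run, so skip it
        fetch(group)  # may raise; the tracker then still lists only completed groups
        done.add(group)
        # Persist progress after every group so an interruption loses at most one group.
        tracker_path.write_text(json.dumps(sorted(done)))
        fetched.append(group)
    return fetched
```

Calling `load_groups` again with the same arguments after a failure skips the groups already recorded in the tracker, which is the property that makes a separate "load in parts" workflow unnecessary in this sketch.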
