Commit 4a37903

Clean notebooks and refine documentation
1 parent aea755e commit 4a37903

File tree: 6 files changed, +132 −971 lines changed

README.md

Lines changed: 12 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 # Machine Learning Tropical Cyclones Detection
 
 ## Overview
-The repository provides a Machine Learning (ML) library to set up training and validation of a Tropical Cyclones (TCs) Detection model. ERA5 reanalysis and the International Best Track Archive for Climate Stewardship (IBTrACS) data are used as input and the target, respectively. Input-Output data pairs are provided as Zarr data stores.
+The repository provides a Machine Learning (ML) library to set up training and validation of a Tropical Cyclones (TCs) Detection model and run the tracking. ERA5 reanalysis and the International Best Track Archive for Climate Stewardship (IBTrACS) data are used as input and the target, respectively. Input-Output data pairs are provided as Zarr data stores.
 
 The model can use the following input drivers:
 - 10m wind gust [$\frac{m}{s}$]
```
```diff
@@ -32,7 +32,9 @@ The _train.py_ script takes advantage of the Command Line Interface (CLI) to pas
 - `--devices` argument defines the number of GPU devices per node to run the training on.
 - `--num_nodes` argument defines the total number of nodes that will be used.
 
-The total number of GPUs used during the training can be obtained by simply multiplying `devices * num_nodes`.
+The total number of GPUs used during the training can be obtained by simply multiplying `devices * num_nodes`.
+
+A bash script for the training, _train.sh_, is also provided under the same folder.
 
 With regards to the configuration file, it must be prepared in toml format. The configuration file is structured as follows:
```
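The devices-times-nodes arithmetic described above can be sketched with a minimal CLI stub. The argument names `--devices` and `--num_nodes` come from the README; everything else (defaults, helper names) is illustrative, not the actual _train.py_ code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Minimal sketch of the two distribution-related CLI arguments
    # documented above; the real train.py parser has more options.
    parser = argparse.ArgumentParser(description="TC detection training (sketch)")
    parser.add_argument("--devices", type=int, default=1,
                        help="number of GPU devices per node")
    parser.add_argument("--num_nodes", type=int, default=1,
                        help="total number of nodes")
    return parser

def total_gpus(devices: int, num_nodes: int) -> int:
    # Total GPUs used during training = devices per node * number of nodes.
    return devices * num_nodes

args = build_parser().parse_args(["--devices", "4", "--num_nodes", "2"])
print(total_gpus(args.devices, args.num_nodes))  # 8
```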

```diff
@@ -84,6 +86,10 @@ With regards to the configuration file, it must be prepared in toml format. The
 - drop_remainder: whether or not to drop the last batch if the number of dataset elements is not divisible by the batch size
 - accumulation_steps: number of gradient accumulation steps before calling backward propagation
 
+### Pre-processing workflow
+
+A workflow based on PyOphidia for preparing CMIP6 data for TC detection is provided under the `workflows` folder.
+
 ## How to
 
 ### Download IBTrACS
```
```diff
@@ -99,6 +105,10 @@ Since the TC Detection case study relies on IBTrACS dataset, it must be download
 
 To download ERA5 data you need a CDS account and the set of IBTrACS for which the related ERA5 data is gathered. The script `era5_gathering.py` under `src/dataset` can be used for this purpose.
 
+## Example notebooks
+
+Example notebooks for executing and evaluating a trained ML model are provided under the `notebooks` folder.
+
 ## Python3 Environment
 The code has been tested on Python 3.11.2 with the following dependencies:
 
```
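As a rough illustration of pairing IBTrACS positions with the ERA5 fields gathered around them, here is a hypothetical helper computing a bounding box (in CDS-style N/W/S/E order) around a storm centre. The actual logic lives in `era5_gathering.py` and may well differ; the helper name and the 5-degree half-width are assumptions:

```python
def storm_bbox(lat: float, lon: float, half_width: float = 5.0):
    # Hypothetical helper: a half_width-degree box centred on an
    # IBTrACS storm position, clipped to valid latitudes and with
    # longitudes wrapped into [-180, 180).
    north = min(lat + half_width, 90.0)
    south = max(lat - half_width, -90.0)
    west = (lon - half_width + 180.0) % 360.0 - 180.0
    east = (lon + half_width + 180.0) % 360.0 - 180.0
    return north, west, south, east  # CDS-style N/W/S/E ordering

print(storm_bbox(20.0, 175.0))  # (25.0, 170.0, 15.0, -180.0)
```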

notebook/README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,4 +1,4 @@
-### Notebook 1 – inference_noteook.ipynb
+### Notebook 1 – inference_notebook.ipynb
 
 This notebook is designed to perform inference using the trained model for TCs detection and to apply a tracking algorithm to identify the trajectories of the detected systems. It can be used both on historical and projection data.
 
```

```diff
@@ -8,7 +8,7 @@ Workflow
 First, the user specifies:
 - `main_dir`: root directory of the project.
 - `dataset_dir`: path to the climate dataset to be analyzed (e.g., CMIP6, NICAM, ERA5).
-- `model_dir`: path to the pre-trained model to be used for inference.
+- `run_name`: name of the pre-trained model on MLflow to be used for inference.
 - `ibtracs_src`: path to the **IBTrACS** file used as ground truth for validation.
 - `year`: the year on which inference will be performed.
 - `device`: compute device (`cpu`, `cuda`, `mps`, etc.).
```
```diff
@@ -17,7 +17,7 @@ Workflow
 2. Model and dataset loading
 Standard cells are provided to:
 - Load the pre-trained model
-- Load the input dataset
+- Select and load the input dataset. For CMIP6 data the climate model and time period can be selected.
 - Prepare the data for inference
 
 3. Inference
```

notebook/inference_notebook.ipynb

Lines changed: 11 additions & 4 deletions
```diff
@@ -55,7 +55,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Select the model by specfying the run name from the MLFlow and download model, scaler and provenance document"
+"Select the model by specifying the `run_name` from MLflow and download the model, scaler and provenance document"
 ]
 },
 {
```
```diff
@@ -97,7 +97,7 @@
 "source": [
 "## Inference workflow on historical data\n",
 "\n",
-"Let's get the data on a given time frame (year and month) for the evaluation"
+"Let's select the historical data (ERA5) on a given time frame (year and month) for the evaluation"
 ]
 },
 {
```
```diff
@@ -146,7 +146,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can now detect and localize the TC centers with the ML model and load also the observed TCs"
+"We can now detect and localize the TC centers with the ML model and load the observed TCs from IBTrACS data"
 ]
 },
 {
```
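Comparing detected TC centers with the observed ones is typically done with a distance threshold. A hedged sketch of such a matching step (the helper names and the 300 km threshold are assumptions, not the notebook's actual code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in kilometres.
    R = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def match_center(detected, observed, max_km=300.0):
    # Hypothetical matching rule: a detected centre counts as a hit
    # if the nearest observed centre lies within max_km of it.
    best = min(observed, key=lambda o: haversine_km(*detected, *o))
    return best if haversine_km(*detected, *best) <= max_km else None

obs = [(18.2, 132.5), (25.0, -75.0)]
print(match_center((18.0, 132.0), obs))  # (18.2, 132.5)
```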
```diff
@@ -206,6 +206,13 @@
 "### Compare with observations"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Compute POD and FAR of the detected track"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
```
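POD (Probability of Detection) and FAR (False Alarm Ratio) are standard contingency-table scores. A minimal sketch from hit/miss/false-alarm counts; the notebook's own implementation may differ:

```python
def pod(hits: int, misses: int) -> float:
    # Probability of Detection: fraction of observed TC points
    # that the model actually detected.
    return hits / (hits + misses) if hits + misses else 0.0

def far(hits: int, false_alarms: int) -> float:
    # False Alarm Ratio: fraction of detections that do not
    # correspond to any observed TC point.
    return false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0

print(pod(8, 2), far(8, 2))  # 0.8 0.2
```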
```diff
@@ -248,7 +255,7 @@
 "source": [
 "## Inference workflow on projection data\n",
 "\n",
-"Let's get CMIP6 data on a given time frame (year and month) for the evaluation"
+"Let's select CMIP6 data on a given time frame (year and month) for the evaluation"
 ]
 },
 {
```

notebook/inference_notebook_test.ipynb

Lines changed: 61 additions & 80 deletions
Large diffs are not rendered by default.
