**README.md** (12 additions, 2 deletions)
@@ -1,7 +1,7 @@
 # Machine Learning Tropical Cyclones Detection
 
 ## Overview
-The repository provides a Machine Learning (ML) library to setup training and validation of a Tropical Cyclones (TCs) Detection model. ERA5 reanalysis and the International Best Track Archive for Climate Stewardship (IBTrACS) data are used as input and the target, respectively. Input-Output data pairs are provided as Zarr data stores.
+The repository provides a Machine Learning (ML) library to set up training and validation of a Tropical Cyclones (TCs) Detection model and to run the tracking. ERA5 reanalysis and the International Best Track Archive for Climate Stewardship (IBTrACS) data are used as the input and the target, respectively. Input-output data pairs are provided as Zarr data stores.
 
 The model can use the following input drivers:
 - 10m wind gust [ $\frac{m}{s}$ ]
@@ -32,7 +32,9 @@ The _train.py_ script takes advantage of the Command Line Interface (CLI) to pass
 - `--devices` argument defines the number of GPU devices per node to run the training on.
 - `--num_nodes` argument defines the total number of nodes that will be used.
 
-The total number of GPUs used during the training can be evinced by simply multiplying `devices * num_nodes`.
+The total number of GPUs used during the training can be obtained by simply multiplying `devices * num_nodes`.
+
+A bash script for the training, _train.sh_, is also provided under the same folder.
 
 With regard to the configuration file, it must be prepared in TOML format. The configuration file is structured as follows:
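As a quick sanity check of the `devices * num_nodes` multiplication above (the values are example choices, not defaults):

```python
# Example only: 4 GPUs per node across 2 nodes.
devices = 4       # GPUs per node, as passed via --devices
num_nodes = 2     # number of nodes, as passed via --num_nodes
total_gpus = devices * num_nodes
print(total_gpus)  # 8
```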
@@ -84,6 +86,10 @@ With regard to the configuration file, it must be prepared in TOML format.
 - drop_remainder: whether or not to drop the last batch if the number of dataset elements is not divisible by the batch size
 - accumulation_steps: number of gradient accumulation steps before calling backward propagation
 
+### Pre-processing workflow
+
+A workflow based on PyOphidia for preparing CMIP6 data for TC detection is provided under the `workflows` folder.
+
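A minimal sketch of such a TOML fragment follows. Only `drop_remainder` and `accumulation_steps` are named in this README; the section name and the other key are assumptions, not the library's actual schema:

```toml
# Illustrative fragment only: the [training] section name and batch_size
# are hypothetical; drop_remainder and accumulation_steps come from the README.
[training]
batch_size = 32            # hypothetical key
drop_remainder = true      # drop the last incomplete batch
accumulation_steps = 4     # gradient accumulation steps before backward propagation
```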
 ## How to
 
 ### Download IBTrACS
@@ -99,6 +105,10 @@ Since the TC Detection case study relies on the IBTrACS dataset, it must be downloaded
 
 To download ERA5 data you need a CDS account and the set of IBTrACS records for which the related ERA5 data is gathered. The script `era5_gathering.py` under `src/dataset` can be used for this purpose.
 
+## Example notebooks
+
+Example notebooks for executing and evaluating a trained ML model are provided under the `notebooks` folder.
+
 ## Python3 Environment
 
 The code has been tested on Python 3.11.2 with the following dependencies:
**notebook/README.md** (3 additions, 3 deletions)
@@ -1,4 +1,4 @@
-### Notebook 1 – inference_noteook.ipynb
+### Notebook 1 – inference_notebook.ipynb
 
 This notebook is designed to perform inference using the trained model for TCs detection and to apply a tracking algorithm to identify the trajectories of the detected systems. It can be used on both historical and projection data.
@@ -8,7 +8,7 @@ Workflow
 First, the user specifies:
 - `main_dir`: root directory of the project.
 - `dataset_dir`: path to the climate dataset to be analyzed (e.g., CMIP6, NICAM, ERA5).
-- `model_dir`: path to the pre-trained model to be used for inference.
+- `run_name`: name of the pre-trained model on MLflow to be used for inference.
 - `ibtracs_src`: path to the **IBTrACS** file used as ground truth for validation.
 - `year`: the year on which inference will be performed.
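The settings above could be collected in a notebook cell along these lines; every value shown is a hypothetical placeholder, not a path or run name taken from the repository:

```python
# Placeholder values for the notebook's user-specified parameters.
main_dir = "/home/user/tc-detection"          # root directory of the project
dataset_dir = "/data/CMIP6/historical"        # climate dataset to analyze
run_name = "tc_detection_run_v1"              # pre-trained model name on MLflow
ibtracs_src = "/data/ibtracs/ibtracs_all.csv" # IBTrACS file used as ground truth
year = 2005                                   # year on which inference runs

config = {
    "main_dir": main_dir,
    "dataset_dir": dataset_dir,
    "run_name": run_name,
    "ibtracs_src": ibtracs_src,
    "year": year,
}
print(sorted(config))
```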