Commit 01c0738

add details on data description
1 parent e0ce69f commit 01c0738

3 files changed: +33 -58 lines changed

docs/source/AITraining/index.rst

Lines changed: 10 additions & 26 deletions
@@ -3,38 +3,22 @@ AI Training Datasets
 
 The E3SM project has developed several datasets specifically for AI and machine learning applications. These datasets have been processed by AI2 to make them publicly accessible and easier to use for research purposes.
 
-If you use data from these datasets, please cite the relevant overview manuscripts listed below.
+Dataset Details
+***************
 
-**Available Datasets:**
+- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs: 42 years training, 10 years validation, 10 years test. More details see: `Duncan et al. 2024 <https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2024JH000136>`_
 
-* E3SMv2 - Energy Exascale Earth System Model version 2 training data
-* E3SMv3 - Energy Exascale Earth System Model version 3 training data
-* SCREAMv1 - Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon)
+- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and global warming trend. More details see: `Wu et al. 2025 <https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2025JH000774>`_
 
-**Citations:**
+- **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon)
 
-* `Duncan et al. 2024 <https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2024JH000136>`_
-* `Wu et al. 2025 <https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2025JH000774>`_
+.. tip::
+   Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need.
 
-**Using the Data:**
-
-These datasets have been specifically processed and formatted for machine learning applications. They provide:
-
-- Preprocessed climate simulation outputs in ML-ready formats
-- Standardized variable naming and units
-- Quality-controlled data with documented preprocessing steps
-- Compatible file formats for common ML frameworks
-
-**Data Access:**
-
-The datasets are available through standard data repositories and can be accessed programmatically. Detailed access information and usage examples are provided in the dataset-specific documentation.
-
-**Future Developments:**
-
-Additional datasets from SCREAM and v3 simulations are planned for future releases. The SCREAMv1 dataset will be made available once the associated paper is published.
+Table of AI training datasets
+******************************
 
 .. toctree::
    :maxdepth: 2
-   :caption: Contents:
 
-   simulation_data/index
+   simulation_data/simulation_table

docs/source/AITraining/simulation_data/index.rst

Lines changed: 6 additions & 15 deletions
@@ -4,26 +4,17 @@ Simulation Data
 
 
 
-Instructions
-************
-
-The AI Training datasets are available through standard data repositories and have been specifically processed by AI2 for machine learning applications.
-
-**Data Access:**
-
-These datasets provide preprocessed climate simulation outputs in ML-ready formats with:
+Dataset Details
+***************
 
-- Standardized variable naming and units
-- Quality-controlled data with documented preprocessing steps
-- Compatible file formats for common ML frameworks
+- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs: 42 years training, 10 years validation, 10 years test.
 
-**Available Datasets:**
+- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and global warming trend.
 
-- **E3SMv2**: Energy Exascale Earth System Model version 2 training data
-- **E3SMv3**: Energy Exascale Earth System Model version 3 training data
 - **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon)
 
-Please refer to the dataset-specific documentation for detailed access information and usage examples.
+.. tip::
+   Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need.
 
 Table of AI training datasets
 ******************************
Lines changed: 17 additions & 17 deletions
@@ -1,19 +1,19 @@
-**********************************
+********************************************
 AI Training Datasets simulation table
-**********************************
+********************************************
 
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| Dataset | Status | Data Size | HPSS Path | HPSS URL |
-+===================================================================+=================+===========================================================================+===============================================================================+===============================================================================================+
-| **E3SMv2** |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| E3SMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link <https://portal.nersc.gov/archive/home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian>`_ |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| **E3SMv3** |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link <https://portal.nersc.gov/archive/home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian>`_ |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| **SCREAMv1** |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
-| SCREAMv1 AI Training Dataset | Coming Soon | TBD | TBD | TBD |
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| Dataset | Status | Data Size | HPSS Path | HPSS URL |
++===================================================================+=================+===========================================================================+===============================================================================+=====================================================================================================================+
+| **E3SMv2** | | | | |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| E3SMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link <https://portal.nersc.gov/archive/home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian>`_ |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| **E3SMv3** | | | | |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link <https://portal.nersc.gov/archive/home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian>`_ |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| **SCREAMv1** | | | | |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
+| SCREAMv1 AI Training Dataset | Coming Soon | TBD | TBD | TBD |
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
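
The tip added in this commit suggests checking the ``archive_content`` text file before downloading. As a rough illustration of what "selectively download" could look like in practice, the sketch below filters a listing by a glob pattern. The layout of ``archive_content`` is not described in the commit, so the one-path-per-line format, the helper name ``select_entries``, and the example file names are all assumptions made for illustration only.

```python
from fnmatch import fnmatch


def select_entries(listing_text: str, pattern: str) -> list[str]:
    """Return listed paths whose file name matches a glob pattern.

    Assumes one entry per line (hypothetical format). If a line has
    several whitespace-separated fields, the last one is taken as the path.
    """
    selected = []
    for raw_line in listing_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment-style lines
        path = line.split()[-1]
        name = path.rsplit("/", 1)[-1]  # match on the file name only
        if fnmatch(name, pattern):
            selected.append(path)
    return selected


# Hypothetical listing; the real archive_content file may look different.
example_listing = """\
train/part_000.nc
train/part_001.nc
validation/part_000.nc
README.txt
"""

wanted = select_entries(example_listing, "*.nc")
```

The resulting list could then be used to decide which archives or members to fetch from the HPSS URL in the table, rather than downloading everything.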
