From e0ce69f8ce80607bdb9c7976a1a9c1951549fe52 Mon Sep 17 00:00:00 2001 From: ChengzhuZhang Date: Wed, 8 Oct 2025 14:52:22 -0700 Subject: [PATCH 1/4] add doc for AI training data --- docs/source/AITraining/index.rst | 40 +++++++++++++++++++ .../AITraining/simulation_data/index.rst | 34 ++++++++++++++++ .../simulation_data/simulation_table.rst | 19 +++++++++ docs/source/index.rst | 1 + 4 files changed, 94 insertions(+) create mode 100644 docs/source/AITraining/index.rst create mode 100644 docs/source/AITraining/simulation_data/index.rst create mode 100644 docs/source/AITraining/simulation_data/simulation_table.rst diff --git a/docs/source/AITraining/index.rst b/docs/source/AITraining/index.rst new file mode 100644 index 0000000..123798d --- /dev/null +++ b/docs/source/AITraining/index.rst @@ -0,0 +1,40 @@ +AI Training Datasets +==================== + +The E3SM project has developed several datasets specifically for AI and machine learning applications. These datasets have been processed by AI2 to make them publicly accessible and easier to use for research purposes. + +If you use data from these datasets, please cite the relevant overview manuscripts listed below. + +**Available Datasets:** + +* E3SMv2 - Energy Exascale Earth System Model version 2 training data +* E3SMv3 - Energy Exascale Earth System Model version 3 training data +* SCREAMv1 - Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) + +**Citations:** + +* `Duncan et al. 2024 `_ +* `Wu et al. 2025 `_ + +**Using the Data:** + +These datasets have been specifically processed and formatted for machine learning applications. They provide: + +- Preprocessed climate simulation outputs in ML-ready formats +- Standardized variable naming and units +- Quality-controlled data with documented preprocessing steps +- Compatible file formats for common ML frameworks + +**Data Access:** + +The datasets are available through standard data repositories and can be accessed programmatically. 
Detailed access information and usage examples are provided in the dataset-specific documentation. + +**Future Developments:** + +Additional datasets from SCREAM and v3 simulations are planned for future releases. The SCREAMv1 dataset will be made available once the associated paper is published. + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + simulation_data/index \ No newline at end of file diff --git a/docs/source/AITraining/simulation_data/index.rst b/docs/source/AITraining/simulation_data/index.rst new file mode 100644 index 0000000..976870e --- /dev/null +++ b/docs/source/AITraining/simulation_data/index.rst @@ -0,0 +1,34 @@ +*************** +Simulation Data +*************** + + + +Instructions +************ + +The AI Training datasets are available through standard data repositories and have been specifically processed by AI2 for machine learning applications. + +**Data Access:** + +These datasets provide preprocessed climate simulation outputs in ML-ready formats with: + +- Standardized variable naming and units +- Quality-controlled data with documented preprocessing steps +- Compatible file formats for common ML frameworks + +**Available Datasets:** + +- **E3SMv2**: Energy Exascale Earth System Model version 2 training data +- **E3SMv3**: Energy Exascale Earth System Model version 3 training data +- **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) + +Please refer to the dataset-specific documentation for detailed access information and usage examples. + +Table of AI training datasets +****************************** + +.. 
toctree:: + :maxdepth: 2 + + simulation_table \ No newline at end of file diff --git a/docs/source/AITraining/simulation_data/simulation_table.rst b/docs/source/AITraining/simulation_data/simulation_table.rst new file mode 100644 index 0000000..d0bf6ee --- /dev/null +++ b/docs/source/AITraining/simulation_data/simulation_table.rst @@ -0,0 +1,19 @@ +********************************** +AI Training Datasets simulation table +********************************** + ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| Dataset | Status | Data Size | HPSS Path | HPSS URL | ++===================================================================+=================+===========================================================================+===============================================================================+===============================================================================================+ +| **E3SMv2** | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| E3SMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link `_ | 
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| **E3SMv3** | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link `_ | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| **SCREAMv1** | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ +| SCREAMv1 AI Training Dataset | Coming Soon | TBD | TBD | TBD | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ \ No newline at end of file diff --git 
a/docs/source/index.rst b/docs/source/index.rst index 51de892..ba47ee6 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -21,6 +21,7 @@ simulations. v3/index SCREAMv0/index SCREAMv1/index + AITraining/index From 01c07385fc223135eacda66c8c237ca35fa0dd6e Mon Sep 17 00:00:00 2001 From: chengzhuzhang Date: Wed, 8 Oct 2025 16:51:20 -0700 Subject: [PATCH 2/4] add details on data description --- docs/source/AITraining/index.rst | 36 ++++++------------- .../AITraining/simulation_data/index.rst | 21 ++++------- .../simulation_data/simulation_table.rst | 34 +++++++++--------- 3 files changed, 33 insertions(+), 58 deletions(-) diff --git a/docs/source/AITraining/index.rst b/docs/source/AITraining/index.rst index 123798d..7f1f92e 100644 --- a/docs/source/AITraining/index.rst +++ b/docs/source/AITraining/index.rst @@ -3,38 +3,22 @@ AI Training Datasets The E3SM project has developed several datasets specifically for AI and machine learning applications. These datasets have been processed by AI2 to make them publicly accessible and easier to use for research purposes. -If you use data from these datasets, please cite the relevant overview manuscripts listed below. +Dataset Details +*************** -**Available Datasets:** +- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs: 42 years training, 10 years validation, 10 years test. More details see: `Duncan et al. 2024 `_ -* E3SMv2 - Energy Exascale Earth System Model version 2 training data -* E3SMv3 - Energy Exascale Earth System Model version 3 training data -* SCREAMv1 - Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) +- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and global warming trend. More details see: `Wu et al. 
2025 `_ -**Citations:** +- **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) -* `Duncan et al. 2024 `_ -* `Wu et al. 2025 `_ +.. tip:: + Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need. -**Using the Data:** - -These datasets have been specifically processed and formatted for machine learning applications. They provide: - -- Preprocessed climate simulation outputs in ML-ready formats -- Standardized variable naming and units -- Quality-controlled data with documented preprocessing steps -- Compatible file formats for common ML frameworks - -**Data Access:** - -The datasets are available through standard data repositories and can be accessed programmatically. Detailed access information and usage examples are provided in the dataset-specific documentation. - -**Future Developments:** - -Additional datasets from SCREAM and v3 simulations are planned for future releases. The SCREAMv1 dataset will be made available once the associated paper is published. +Table of AI training datasets +****************************** .. toctree:: :maxdepth: 2 - :caption: Contents: - simulation_data/index \ No newline at end of file + simulation_data/simulation_table \ No newline at end of file diff --git a/docs/source/AITraining/simulation_data/index.rst b/docs/source/AITraining/simulation_data/index.rst index 976870e..fc50fe1 100644 --- a/docs/source/AITraining/simulation_data/index.rst +++ b/docs/source/AITraining/simulation_data/index.rst @@ -4,26 +4,17 @@ Simulation Data -Instructions -************ - -The AI Training datasets are available through standard data repositories and have been specifically processed by AI2 for machine learning applications. 
- -**Data Access:** - -These datasets provide preprocessed climate simulation outputs in ML-ready formats with: +Dataset Details +*************** -- Standardized variable naming and units -- Quality-controlled data with documented preprocessing steps -- Compatible file formats for common ML frameworks +- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs: 42 years training, 10 years validation, 10 years test. -**Available Datasets:** +- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and global warming trend. -- **E3SMv2**: Energy Exascale Earth System Model version 2 training data -- **E3SMv3**: Energy Exascale Earth System Model version 3 training data - **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) -Please refer to the dataset-specific documentation for detailed access information and usage examples. +.. tip:: + Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need. 
Table of AI training datasets ****************************** diff --git a/docs/source/AITraining/simulation_data/simulation_table.rst b/docs/source/AITraining/simulation_data/simulation_table.rst index d0bf6ee..3f6d668 100644 --- a/docs/source/AITraining/simulation_data/simulation_table.rst +++ b/docs/source/AITraining/simulation_data/simulation_table.rst @@ -1,19 +1,19 @@ -********************************** +******************************************** AI Training Datasets simulation table -********************************** +******************************************** -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| Dataset | Status | Data Size | HPSS Path | HPSS URL | -+===================================================================+=================+===========================================================================+===============================================================================+===============================================================================================+ -| **E3SMv2** | -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| E3SMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link `_ | 
-+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| **E3SMv3** | -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link `_ | -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| **SCREAMv1** | -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ -| SCREAMv1 AI Training Dataset | Coming Soon | TBD | TBD | TBD | -+-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ \ No newline at end of file 
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| Dataset | Status | Data Size | HPSS Path | HPSS URL | ++===================================================================+=================+===========================================================================+===============================================================================+=====================================================================================================================+ +| **E3SMv2** | | | | | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| E3SMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link `_ | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| **E3SMv3** | | | | | 
++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link `_ | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| **SCREAMv1** | | | | | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| SCREAMv1 AI Training Dataset | Coming Soon | TBD | TBD | TBD | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ From cb81ca7400221e005e15f76de7c9243ab5f67eda Mon Sep 17 00:00:00 2001 From: chengzhuzhang Date: Thu, 9 Oct 2025 09:51:40 -0700 Subject: [PATCH 3/4] refinement --- docs/source/AITraining/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/AITraining/index.rst 
b/docs/source/AITraining/index.rst index 7f1f92e..c8f4ebb 100644 --- a/docs/source/AITraining/index.rst +++ b/docs/source/AITraining/index.rst @@ -1,7 +1,7 @@ AI Training Datasets ==================== -The E3SM project has developed several datasets specifically for AI and machine learning applications. These datasets have been processed by AI2 to make them publicly accessible and easier to use for research purposes. +The E3SM project and `Allen Institute for AI (Ai2) `_ work together to develop datasets for AI and machine learning applications. E3SMv2 and E3SMv3 have been processed by Ai2 to make them publicly accessible and easier to use for research purposes. Dataset Details *************** From accc5d98aaa92891863c31f42aa65bb8fcc3b231 Mon Sep 17 00:00:00 2001 From: chengzhuzhang Date: Fri, 17 Oct 2025 15:37:15 -0700 Subject: [PATCH 4/4] address review --- docs/source/AITraining/index.rst | 14 ++++++++------ .../AITraining/simulation_data/index.rst | 19 +------------------ .../simulation_data/simulation_table.rst | 10 +++++++--- 3 files changed, 16 insertions(+), 27 deletions(-) diff --git a/docs/source/AITraining/index.rst b/docs/source/AITraining/index.rst index c8f4ebb..f42b9b1 100644 --- a/docs/source/AITraining/index.rst +++ b/docs/source/AITraining/index.rst @@ -1,22 +1,24 @@ AI Training Datasets ==================== -The E3SM project and `Allen Institute for AI (Ai2) `_ work together to develop datasets for AI and machine learning applications. E3SMv2 and E3SMv3 have been processed by Ai2 to make them publicly accessible and easier to use for research purposes. +The E3SM project and `Allen Institute for AI (Ai2) `_ have developed several datasets for AI and machine learning applications. These datasets have been postprocessed for ingestion by the `ACE `_/`FourCastNet `_ emulator. Dataset Details *************** -- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 
6-hourly outputs: 42 years training, 10 years validation, 10 years test. More details see: `Duncan et al. 2024 `_ +- **EAMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs. For more details, see `Duncan et al. 2024 `_ -- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and global warming trend. More details see: `Wu et al. 2025 `_ +- **EAMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). Includes multiple ENSO cycles and a global warming trend. For more details, see `Wu et al. 2025 `_ + +- **E3SMv3**: Coupled pre-industrial and historical training data (coming soon) - **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) .. tip:: - Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need. + Check the ``archive_contents`` text file to see which files each tar archive contains, so that you can download only the files you need. -Table of AI training datasets -****************************** +Data Access +*********** .. toctree:: :maxdepth: 2 diff --git a/docs/source/AITraining/simulation_data/index.rst b/docs/source/AITraining/simulation_data/index.rst index fc50fe1..be7a57d 100644 --- a/docs/source/AITraining/simulation_data/index.rst +++ b/docs/source/AITraining/simulation_data/index.rst @@ -1,24 +1,7 @@ *************** -Simulation Data +Simulation Data *************** - - -Dataset Details -*************** - -- **E3SMv2**: 73-year EAMv2 simulation (F2010, perpetual 2010 forcing, repeating annual SST cycle from 2005-2014 average). 6-hourly outputs: 42 years training, 10 years validation, 10 years test. - -- **E3SMv3**: 51-year EAMv3 AMIP-style simulation (1970-2020, F2010 with AMIP SSTs, constant 2010 CO2). 
Includes multiple ENSO cycles and global warming trend. - -- **SCREAMv1**: Simple Cloud-Resolving E3SM Atmosphere Model version 1 training data (coming soon) - -.. tip:: - Check the ``archive_content`` text file to see files included in each tar archive. You can selectively download the files you need. - -Table of AI training datasets -****************************** - .. toctree:: :maxdepth: 2 diff --git a/docs/source/AITraining/simulation_data/simulation_table.rst b/docs/source/AITraining/simulation_data/simulation_table.rst index 3f6d668..787a8c7 100644 --- a/docs/source/AITraining/simulation_data/simulation_table.rst +++ b/docs/source/AITraining/simulation_data/simulation_table.rst @@ -5,13 +5,17 @@ AI Training Datasets simulation table +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ | Dataset | Status | Data Size | HPSS Path | HPSS URL | +===================================================================+=================+===========================================================================+===============================================================================+=====================================================================================================================+ -| **E3SMv2** | | | | | +| **EAMv2** | | | | | +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| E3SMv2 AI Training Dataset | Available | 1.2T | 
/home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link `_ | +| EAMv2 AI Training Dataset | Available | 1.2T | /home/projects/e3sm/www/AI_training_data/e3sm-v2-climsst-180x360-gaussian | `Link `_ | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| **EAMv3** | | | | | ++-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ +| EAMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link `_ | +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ | **E3SMv3** | | | | | +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| E3SMv3 AI Training Dataset | Available | 1.3T | /home/projects/e3sm/www/AI_training_data/e3sm-v3-amip-180x360-gaussian | `Link `_ | +| E3SMv3 
Coupled AI Training Dataset | Coming Soon | TBD | TBD | TBD | +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ | **SCREAMv1** | | | | | +-------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+