ESS 569 Final Project
Authors: Sky Gale, Joey Rotondo, and Geraint Webb
This project aims to:
- Produce a working physics-informed neural network (PINN) that uses only previous observational data of SIC and SST to predict present-day SIC
- Demonstrate that PINNs can be used as simple climate emulators
- Compare the predicted SIC output from the PINN to observed values as ground truth and model validation
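The comparison in the last goal can be sketched as a masked error metric. This is a minimal sketch, not the project's evaluation code: the function name sic_rmse, the toy arrays, and the use of NaN to mark cells without valid data are assumptions made for illustration.

```python
import numpy as np


def sic_rmse(predicted, observed):
    """Root-mean-square error between predicted and observed SIC.

    Cells where either array is non-finite (for example, NaN over land)
    are excluded from the average.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = np.isfinite(predicted) & np.isfinite(observed)
    if not valid.any():
        raise ValueError("no valid grid cells to compare")
    diff = predicted[valid] - observed[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy 2x2 grid; NaN marks a hypothetical cell with no observation.
obs = np.array([[1.0, 0.5], [np.nan, 0.0]])
pred = np.array([[0.8, 0.5], [0.3, 0.1]])
print("RMSE over valid cells:", sic_rmse(pred, obs))
```

RMSE is only one possible choice here; other metrics could be swapped in with the same masking logic.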
The data sources for this project include Northern Hemisphere (NH) sea ice concentration (SIC) and sea surface temperature (SST) data from ERA5, a reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).
Our data is geospatial data in NetCDF format. It contains 45 Septembers' worth of SST data from 50N and of total Arctic SIC on the same grid.
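One way to picture this layout is as arrays with dimensions (year, latitude, longitude), where an earlier September supplies the inputs and a later September's SIC is the target. The sketch below uses synthetic NumPy arrays rather than the real ERA5 file. The grid size, the one-year lag, and the function name make_lagged_pairs are assumptions, not the project's actual configuration.

```python
import numpy as np


def make_lagged_pairs(sic, sst, lag=1):
    """Pair SIC and SST from year t - lag (inputs) with SIC from year t (target).

    sic, sst: arrays of shape (n_years, n_lat, n_lon) on the same grid.
    Returns inputs of shape (n_years - lag, 2, n_lat, n_lon) and
    targets of shape (n_years - lag, n_lat, n_lon).
    """
    if sic.shape != sst.shape:
        raise ValueError("SIC and SST must share the same grid and time axis")
    if not 0 < lag < sic.shape[0]:
        raise ValueError("lag must be between 1 and n_years - 1")
    # Channel 0 is SIC, channel 1 is SST, both taken from the earlier year.
    inputs = np.stack([sic[:-lag], sst[:-lag]], axis=1)
    targets = sic[lag:]
    return inputs, targets


rng = np.random.default_rng(0)
n_years, n_lat, n_lon = 45, 4, 8  # 45 Septembers; tiny hypothetical grid
sic = rng.uniform(0.0, 1.0, size=(n_years, n_lat, n_lon))
sst = rng.uniform(271.0, 285.0, size=(n_years, n_lat, n_lon))

inputs, targets = make_lagged_pairs(sic, sst)
print(inputs.shape, targets.shape)
```

With real data, the synthetic arrays would be replaced by the SIC and SST variables read from the .nc file.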
Data Modalities and Formats
This project uses the following data modalities and formats for model inference in sea ice prediction:
Data Modalities
- Type: Reanalysis
- Variable(s): Sea Ice Concentration (SIC), Sea Surface Temperature (SST)
- Source: ERA5 reanalysis datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF)
Data Formats
NetCDF (.nc): The primary format used for storing multidimensional scientific data, applicable to both satellite and reanalysis datasets.
ERA5 Reanalysis:
- Description: Provides detailed and comprehensive reanalysis data, including Sea Surface Temperature and other climate variables.
- Size: Multiple petabytes of climate data, available on a global scale with hourly resolution.
- Data Access: DOI URL
A .yml environment file, with the environment name pinn_sea_ice, has been uploaded to the main repository directory. This file serves as the basis for setting up the Python environment needed to run the physics-informed neural network (PINN) used in this project.
Notes:
- If you are in a remote login environment, you may need to run module load python first to initialize an environment with conda.
- If it is not already installed, you may need to run conda install -c conda-forge ipykernel to install ipykernel before adding the environment to the kernel list.
- If you run into trouble installing the physics-informed-neural-networks package, run pip install --no-cache-dir physics-informed-neural-networks to install it.
Make sure you are in the repository directory where environment.yml is saved (the root directory of your git clone). Then open your terminal and run the following commands:
# Create the Conda environment from the YAML file
conda env create -f environment.yml
# Activate the environment
conda activate pinn_sea_ice
# Install ipykernel in the environment
conda install -n pinn_sea_ice ipykernel
# Add the environment to Jupyter
python -m ipykernel install --user --name=pinn_sea_ice --display-name "pinn_sea_ice"
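After setup, a quick check from inside the activated environment can confirm that key packages are importable. This is a minimal sketch: the package list below is hypothetical and should be adjusted to match what environment.yml actually installs, and the helper name missing_packages is an illustration only.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of top-level import names; edit to match environment.yml.
required = ["numpy", "xarray", "ipykernel"]
print("Missing packages:", missing_packages(required))
```

An empty list suggests the environment was created and activated as expected.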
Notebooks
get_sst_data.ipynb: Designed for downloading and preprocessing sea surface temperature (SST) data from the ERA5 reanalysis dataset. It outlines steps to retrieve the data, clean it, and structure it for use in subsequent analyses. The notebook ensures that the SST data is ready for integration with other datasets, such as sea ice concentration, for modeling purposes.
process_sic_sst_data.ipynb: Contains the code to download, combine, and clean the ERA5 SIC and SST data in a single pass. It loads the raw data, removes missing or invalid values, aggregates the data over specific time periods, and formats it for compatibility with machine learning models. It also prepares the cleaned data for training by reshaping the dataset, normalizing values, and possibly splitting the data into training and testing sets.
Note: This notebook fetches, cleans, and prepares the SIC and SST data in one place because saving the data at each step overwhelmed the GitHub repository. Only the final product is output as a .nc file.
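The cleaning, normalizing, and splitting steps described for process_sic_sst_data.ipynb might look roughly like the following. This is a sketch on synthetic data, not the notebook's code: the function name prepare_splits, the 80/20 chronological split, and the z-score normalization are all assumptions.

```python
import numpy as np


def prepare_splits(data, train_fraction=0.8):
    """Mask invalid values, split along the time axis, and z-score normalize.

    data: array of shape (n_years, n_lat, n_lon); non-finite entries are
    treated as missing and kept as NaN.
    Returns (train, test, (mean, std)).
    """
    data = np.asarray(data, dtype=float)
    cleaned = np.where(np.isfinite(data), data, np.nan)

    n_train = int(cleaned.shape[0] * train_fraction)
    if not 0 < n_train < cleaned.shape[0]:
        raise ValueError("train_fraction leaves an empty train or test set")
    # Chronological split: earlier years for training, later years for testing.
    train, test = cleaned[:n_train], cleaned[n_train:]

    # Statistics come from the training years only, so that no information
    # from the test years leaks into the normalization.
    mean = float(np.nanmean(train))
    std = float(np.nanstd(train))
    if std == 0.0:
        raise ValueError("training data has zero variance")
    return (train - mean) / std, (test - mean) / std, (mean, std)


rng = np.random.default_rng(1)
raw = rng.normal(loc=280.0, scale=3.0, size=(10, 2, 2))  # synthetic SST-like values
raw[3, 0, 1] = np.inf  # simulate an invalid value

train_n, test_n, (mean, std) = prepare_splits(raw)
print(train_n.shape, test_n.shape)
```

Splitting by year rather than at random keeps the test set strictly later in time than the training set, which matches the goal of predicting present-day SIC from earlier observations.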
We ran into several difficulties because our data files were too large. To manage this, we first moved our data to GitHub's Large File Storage (LFS) system. This was still not enough, so we eventually moved our data to Google Drive.