UW-MLGEO/MLGEO2024_SeaIcePrediction


Sea Ice Concentration (SIC) Prediction using a Simple PINN as an Emulator

ESS 569 Final Project

Authors: Sky Gale, Joey Rotondo, and Geraint Webb

Project Objectives

This project aims to:

  • Produce a working physics-informed neural network (PINN) that uses only previous observational SIC and SST data to predict present-day SIC
  • Demonstrate that PINNs can be used as simple climate emulators
  • Validate the model by comparing the SIC predicted by the PINN against observed values, which serve as ground truth
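The third objective, comparing predicted SIC against observations, can be illustrated with a small sketch. The metrics (root-mean-square error and mean bias), the function name `sic_errors`, and the toy values are assumptions chosen for illustration; the project itself may use different metrics.

```python
import numpy as np


def sic_errors(predicted, observed):
    """Return (rmse, bias) over grid cells where both fields are finite."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Skip masked cells (for example land), assumed here to be stored as NaN.
    valid = np.isfinite(predicted) & np.isfinite(observed)
    diff = predicted[valid] - observed[valid]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    bias = float(np.mean(diff))
    return rmse, bias


# Toy 2x3 fields with SIC as a fraction in [0, 1]; NaN marks a masked cell.
obs = np.array([[0.9, 0.5, np.nan], [0.0, 0.2, 1.0]])
pred = np.array([[0.8, 0.6, np.nan], [0.1, 0.2, 0.9]])
rmse, bias = sic_errors(pred, obs)
print(f"RMSE = {rmse:.4f}, bias = {bias:.4f}")
```

RMSE summarizes the typical size of the error, while the bias shows whether the model systematically over- or under-predicts ice cover.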

Data Sources

The data sources for this project are Northern Hemisphere (NH) sea ice concentration (SIC) and sea surface temperature (SST) from ERA5, a reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).

Our data is geospatial data in NetCDF format. It contains 45 Septembers' worth of SST from 50°N poleward and of total Arctic SIC on the same grid.
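To make the data layout concrete, here is a minimal sketch of how a monthly field on a (time, latitude, longitude) grid could be reduced to September values at or north of 50°N. The grid, the years, and the variable names are made up for illustration, and the project's notebooks may perform this selection differently (for example, at download time).

```python
import numpy as np

# Hypothetical monthly time axis covering two years (24 months).
times = np.arange("1979-01", "1981-01", dtype="datetime64[M]")
# Hypothetical coarse grid; the real ERA5 grid is much finer.
lat = np.array([80.0, 65.0, 50.0, 35.0, 20.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])

# Synthetic field with shape (time, lat, lon).
field = np.zeros((times.size, lat.size, lon.size))

# datetime64[M] values count months since 1970-01, so modulo 12 gives the month.
months = times.astype(int) % 12 + 1
is_september = months == 9
is_north = lat >= 50.0  # assumes the 50N boundary itself is included

subset = field[is_september][:, is_north, :]
print(subset.shape)  # (number of Septembers, latitudes kept, longitudes)
```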

Data Modalities and Formats

This project uses the following data modalities and formats for model inference in sea ice prediction:

Data Modalities

  • Type: Reanalysis
  • Variable(s): Sea Ice Concentration (SIC), Sea Surface Temperature (SST)
  • Source: ERA5 reanalysis datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF)

Data Formats

NetCDF (.nc): The primary format used for storing multidimensional scientific data, applicable to both satellite and reanalysis datasets.

ERA5 Reanalysis:

  • Description: Provides detailed and comprehensive reanalysis data, including Sea Surface Temperature and other climate variables.
  • Size: Multiple petabytes of climate data, available on a global scale with hourly resolution.
  • Data Access: DOI URL

Instructions for setting up the environment

An environment file, environment.yml, has been uploaded to the main repository directory. It defines a Conda environment named pinn_sea_ice and serves as the basis for setting up the Python environment needed to run the physics-informed neural network (PINN) used in this project.

Notes:

  • If you are in a remote login environment, you may need to run module load python first to initialize an environment with conda.
  • If ipykernel is not already installed, you may need to run conda install -c conda-forge ipykernel before the environment can be added to the kernel list.
  • If you run into trouble installing the physics-informed-neural-networks package, run pip install --no-cache-dir physics-informed-neural-networks to install it.

Make sure you are in the repository directory where environment.yml is saved (the root directory of your git clone). Then, open your terminal and run the following commands:

# Create the Conda environment from the YAML file
conda env create -f environment.yml

# Activate the environment
conda activate pinn_sea_ice

# Install ipykernel in the environment
conda install -n pinn_sea_ice ipykernel

# Add the environment to Jupyter
python -m ipykernel install --user --name=pinn_sea_ice --display-name "pinn_sea_ice"
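
After running these commands, a quick check from inside the activated environment can confirm that the expected packages are importable. This is a sketch only: the package list below is a hypothetical example, not the contents of environment.yml, and should be replaced with the packages that file actually lists.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list; replace with the packages listed in environment.yml.
required = ["numpy", "xarray", "ipykernel"]

print("Python executable:", sys.executable)
print("Missing packages:", missing_packages(required) or "none")
```

If the printed executable path does not point inside the pinn_sea_ice environment, the environment is probably not active in the current shell or notebook kernel.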

Script and Notebook Descriptions

Notebooks

get_sst_data.ipynb: Designed for downloading and preprocessing sea surface temperature (SST) data from the ERA5 reanalysis dataset. It outlines steps to retrieve the data, clean it, and structure it for use in subsequent analyses. The notebook ensures that the SST data is ready for integration with other datasets, such as sea ice concentration, for modeling purposes.

process_sic_sst_data.ipynb: Contains the code to download, combine, and clean the ERA5 SIC and SST data in a single pass. It loads the raw data, removes missing or invalid values, and aggregates the data over specific time periods. It then prepares the cleaned data for machine learning by reshaping the dataset, normalizing values, and possibly splitting the data into training and testing sets. The goal is a dataset that is clean and structured appropriately for training the models used in this project.

Note: This notebook performs the fetching, cleaning, and preparation of the SIC and SST data in one place because saving the data at each intermediate step exceeded what the GitHub repository could hold. Only the final product is written out, as a .nc file.
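
The reshaping, normalizing, and splitting steps described for process_sic_sst_data.ipynb can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the function name `prepare`, the 80/20 chronological split, the single global mean and standard deviation, and the synthetic input are all made up for the example.

```python
import numpy as np


def prepare(fields, train_fraction=0.8):
    """Flatten (time, lat, lon) fields, split by time, and standardize.

    The statistics come from the training portion only, so no information
    from the test years leaks into the normalization.
    """
    fields = np.asarray(fields, dtype=float)
    n_time = fields.shape[0]
    n_train = int(n_time * train_fraction)

    # One row per time step, one column per grid cell.
    flat = fields.reshape(n_time, -1)
    train, test = flat[:n_train], flat[n_train:]

    mean = np.nanmean(train)
    std = np.nanstd(train)
    return (train - mean) / std, (test - mean) / std, (mean, std)


# Synthetic stand-in for 45 yearly fields on a tiny 4x6 grid (values in kelvin).
rng = np.random.default_rng(0)
sst = rng.normal(280.0, 5.0, size=(45, 4, 6))
train, test, (mean, std) = prepare(sst)
print(train.shape, test.shape)
```

A chronological split is used here instead of a random one because the stated aim is to predict later SIC from earlier observations; a random split would mix future years into the training set.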

Difficulties/DISCLAIMER

We ran into several difficulties because our data files were too large. To manage this, we first moved our data to GitHub's Large File Storage (LFS) system. This was still not enough, so we eventually moved our data to Google Drive.
