Anna Hall (ESS 569), Je-Yun Chun (ESS 569), Katherine Mifsud (ESS 569), and Vlad Munteanu (ESS 469)
The objective of this geoscience machine learning project is to develop a machine learning model and pipeline to predict total cloud cover over the Atlantic Ocean using satellite and reanalysis data. Current global climate models (GCMs) struggle to accurately predict cloud cover (CC) because of the complex dynamics and interactions involved in cloud formation, persistence, and dissipation. Cloud cover is also an important control on climate: clouds influence the total energy budget by trapping outgoing radiation and reflecting incoming radiation. Accurately modeling cloud cover is therefore crucial for constraining future climate projections. We use reanalysis estimates of cloud cover as the ground truth to train a machine learning model to predict cloud cover from other, better-constrained variables.
The premise of this project could be replicated with various data sources. For this project, our satellite data come from the geostationary weather satellite GOES-16. Specifically, we use three products from the GOES-16 series: a cloud and moisture imagery product, a cloud optical depth product, and a reflected shortwave radiation product. In addition, we use ERA5 reanalysis data for a variety of meteorological variables, including winds, temperature, humidity, and heat fluxes, to further the model's ability to predict cloud cover. The data are confined to a section of the Atlantic Ocean near Bermuda, where cloud formation occurs regularly and there is no rough topography to break up clouds. To avoid biases from seasonal and diurnal variation, the data are further restricted to April 2020 at 15:00:00 UTC each day. All data are re-gridded to one-quarter-degree latitude/longitude resolution.
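As a rough illustration of the subsetting and re-gridding described above, the sketch below uses xarray to keep the 15:00 UTC time steps, select a box near Bermuda, and interpolate onto a 0.25° grid. The filename, coordinate names, and bounding box are placeholders, not the exact values used in our notebooks.

```python
import numpy as np
import xarray as xr

# Placeholder filename and bounding box -- the actual files and the exact box
# near Bermuda used in the notebooks may differ.
ds = xr.open_dataset("era5_april2020_15z.nc")

# Keep only the 15:00 UTC time steps to limit diurnal variability.
ds_15z = ds.where(ds["time"].dt.hour == 15, drop=True)

# Subset to a box over the open Atlantic near Bermuda (coordinate names and
# bounds are assumptions; ERA5 stores latitude descending, hence slice(35, 25)),
# then sort latitude ascending before interpolation.
box = ds_15z.sel(latitude=slice(35, 25), longitude=slice(-70, -60)).sortby("latitude")

# Interpolate onto a common quarter-degree grid shared by all data sources.
target_lat = np.arange(25.0, 35.25, 0.25)
target_lon = np.arange(-70.0, -59.75, 0.25)
regridded = box.interp(latitude=target_lat, longitude=target_lon)
regridded.to_netcdf("era5_april2020_15z_quarter_degree.nc")
```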
A variety of machine learning methods are applied, including classic machine learning (CML) models and several deep learning models. Preliminary analysis is conducted with the CML models to highlight imperfections and trends in the training data in preparation for the deep learning models.
These notebooks and scripts are written in Python and developed in VS Code. Before running any scripts or notebooks, import the necessary packages listed below; if a package is not available in your environment, you can try `pip install <package name>`.
Relevant packages and libraries to install include:
from goes2go import GOES
import pandas as pd
from datetime import datetime
import xarray as xr
import subprocess
from netCDF4 import Dataset
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np

There are a couple of packages that may be unfamiliar to the everyday Python user: goes2go and subprocess.
The goes2go package, developed by Brian Blaylock, provides an easy and efficient way to access GOES satellite data on the AWS server. Instructions on installing and using goes2go can be found on its GitHub page (linked under Brian Blaylock's Resources below).
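For orientation, here is a minimal goes2go sketch in the spirit of our download workflow. The product, domain, and timestamp are illustrative assumptions; see the DownloadGOES notebook for the products we actually pull (cloud and moisture imagery, cloud optical depth, reflected shortwave radiation).

```python
from goes2go import GOES

# GOES-16 Cloud and Moisture Imagery, full-disk domain (illustrative choice).
G = GOES(satellite=16, product="ABI-L2-MCMIP", domain="F")

# Download the file nearest to 15:00 UTC on 1 April 2020 and open it as an
# xarray Dataset.
ds = G.nearesttime("2020-04-01 15:00")
print(ds)
```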
The next package that may be unfamiliar is subprocess. We use the subprocess package to access the GOES data stored on the AWS server.
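As an illustration of how subprocess can drive the AWS CLI once it is installed (see the links below), the snippet lists one hour of GOES-16 files in NOAA's public noaa-goes16 bucket. The product prefix shown is an example, not necessarily the exact path used in our notebooks.

```python
import subprocess

# List GOES-16 cloud optical depth files for 15 UTC on 1 April 2020 (day 092).
# --no-sign-request lets you read the public bucket without AWS credentials.
result = subprocess.run(
    ["aws", "s3", "ls", "s3://noaa-goes16/ABI-L2-CODF/2020/092/15/", "--no-sign-request"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```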
To set up the AWS CLI on your own, please follow the links under GOES on AWS:

- Brian Blaylock's Resources: for accessing GOES satellite data and goes2go details
- GOES on AWS
- GOES Data
- Further Reading: Basemap documentation (I did not use this, however, Brian does) (https://matplotlib.org/basemap/stable/users/geography.html)
**High-level descriptions of each notebook and script in this repository will be updated frequently as new files are added.**
The data are too large to host in this repository; please navigate to this Google Drive folder for the data: https://drive.google.com/drive/folders/1PMfDE_NcJCksiyA4KFzyt387RP2d4bcR?usp=sharing
How to use this repository:
- clone the repository
git clone https://github.com/UW-MLGEO/Cloud_fraction_Atlantic2024.git
cd Cloud_fraction_Atlantic2024
- create the environment
conda env create -f cloud_fraction_prediction.yml
- activate the environment
conda activate cloud_fraction_prediction
- navigate to the notebooks folder and download the GOES data following this notebook: DownloadGOES
- next, navigate to the DownloadERA5 notebook to download the supplementary meteorological fields (a minimal example of an ERA5 request is sketched after this list): DownloadERA5
- to process each individual GOES variable, navigate to the data folder and then to ai_ready; this folder contains a script for each variable, and the notebooks take the raw data and make it AI-ready: Process GOES data
- with the newly saved NetCDF files, all other notebooks in the notebooks folder should be completely reproducible
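For reference, a minimal cdsapi request along the lines of what the DownloadERA5 notebook does might look like the sketch below. It assumes a CDS API key configured in ~/.cdsapirc; the variable list, area, and output filename are illustrative, not the exact set used for training.

```python
import cdsapi

c = cdsapi.Client()

# ERA5 single-level fields for April 2020 at 15:00 UTC over a box in the
# Atlantic near Bermuda (area = [North, West, South, East]).
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": [
            "total_cloud_cover",
            "2m_temperature",
            "10m_u_component_of_wind",
            "10m_v_component_of_wind",
            "surface_latent_heat_flux",
            "surface_sensible_heat_flux",
        ],
        "year": "2020",
        "month": "04",
        "day": [f"{d:02d}" for d in range(1, 31)],
        "time": "15:00",
        "area": [35, -70, 25, -60],
        "format": "netcdf",  # newer CDS API versions may expect "data_format" instead
    },
    "era5_april2020_15z.nc",
)
```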