
For the UW-MLGEO 2024 final project. This project will focus on using a CNN to classify images of ice crystals sampled in snowstorms.


UW-MLGEO/MLGEO2024_Snowflake_Classification


MLGEO2024_Snowflake_Classification Repository

Contributors: Valeria Garcia (ESS 569) and Carlos Palma Bernal (ESS 469)

Project Overview

The "MLGEO2024_Snowflake_Classification" project aims to develop a robust AI algorithm that uses Convolutional Neural Networks (CNNs) to identify high-resolution ice crystal images and classify them by snowflake type. This initiative seeks to automate the classification process, which is traditionally done by hand: a time-consuming method that is prone to human error and lacks reproducibility.

Importance of Snowflake Classification

Snowflake classification is vital for understanding the microphysical processes occurring within winter storms, which have implications for snowfall accumulation. By providing a reliable, objective classification system, this project aims to enhance the scientific community's ability to study and interpret snow microphysics, which plays a crucial role in winter weather systems.

Objectives

The primary objectives of this project include:

  • Automating Classification: Replace manual snowflake classification with a machine learning-based algorithm to improve efficiency and accuracy.
  • Data-Driven Insights: Leverage statistical analyses and visualizations to gain insights into the characteristics and distributions of different ice crystal types.

Data Source:

This project uses ice crystal images (in PNG format) photographed by the Particle Habit Imaging and Polar Scattering Probe (PHIPS) during the Investigation of Microphysics and Precipitation for Atlantic Coast-Threatening Snowstorms (IMPACTS) for both the training and testing datasets. IMPACTS was a NASA field campaign that studied winter storms during the 2020, 2022, and 2023 winter seasons (January-February), using aircraft to collect microphysical and remote sensing observations in winter storms across the northeastern and midwestern United States. In addition to the ice crystal images, meteorological and navigation data from the P-3 aircraft (which carried the PHIPS) are used to provide temperature and spatial/geographic information for each ice crystal image.
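Because both the PHIPS images and the P-3 met/nav data are described as 1 Hz time series, attaching temperature and location to each image amounts to a nearest-timestamp match. The sketch below shows one way to do this with the standard library. It assumes image timestamps are already available as `datetime` objects and that the met/nav records are sorted by time; the function name `match_to_navigation`, the 1 s tolerance, and the record layout are hypothetical, not taken from the project's notebooks.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_to_navigation(image_times, nav_times, nav_records, max_gap_s=1.0):
    """Attach the nearest-in-time met/nav record to each image timestamp.

    image_times: datetimes of the PHIPS images (hypothetical input)
    nav_times:   sorted datetimes of the 1 Hz P-3 met/nav records
    nav_records: one dict per nav time, e.g. {"temperature": -12.0}
    max_gap_s:   images farther than this from every nav record get None
    """
    if not nav_times:
        return [None] * len(image_times)
    matched = []
    for t in image_times:
        i = bisect_left(nav_times, t)  # index of the first nav time >= t
        # The nearest record is either just before t or at/after t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(nav_times)]
        best = min(candidates, key=lambda j: abs((nav_times[j] - t).total_seconds()))
        gap = abs((nav_times[best] - t).total_seconds())
        matched.append(nav_records[best] if gap <= max_gap_s else None)
    return matched


# Example with synthetic times (not real IMPACTS data)
t0 = datetime(2023, 2, 14, 12, 0, 0)
nav_times = [t0 + timedelta(seconds=s) for s in range(5)]
nav_records = [{"temperature": -10.0 - s} for s in range(5)]
images = [t0 + timedelta(seconds=2.4), t0 + timedelta(seconds=10)]
print(match_to_navigation(images, nav_times, nav_records))
# → [{'temperature': -12.0}, None]
```

The binary search keeps the matching at O(log n) per image, which matters when a single flight produces thousands of 1 Hz records; the tolerance prevents images taken outside the met/nav coverage from silently inheriting a distant record.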

[Images: IMPACTS logo; PHIPS instrument]

To curate the training dataset for this project, ice crystal images from the following flight dates were selected: 2020-02-07 (sample area: Pennsylvania), 2022-02-17 (sample area: Illinois), 2023-01-23 (sample area: Maine), and 2023-02-14 (sample area: Minnesota). These dates were chosen for the distinct storm structures sampled. In-cloud sampling of these storms occurred at multiple temperature levels, yielding a diverse assortment of ice crystal images, which is necessary for a training dataset that represents many snowflake types.

Data Modalities and Formats:

  • PHIPS ice crystal dataset
      • Consists of ice crystal images in .PNG format, imaged in-cloud
      • Sampling rate: 1 Hz for the duration of each flight
      • Note: Due to storage constraints in this repository, the raw PHIPS image files are not included in data/raw/
  • P-3 aircraft meteorological/navigation dataset
      • Contains time-series numerical data, including geographic (e.g., latitude and longitude), meteorological (e.g., temperature, dewpoint, wind speed), and aircraft altitude and attitude (e.g., altitude, pitch, roll, yaw) data reported by the P-3 aircraft during each flight
      • Sampling rate: 1 Hz for the duration of each flight
      • Format: ICARTT (.ict)
      • ICARTT files: ICARTT is a text-based format for atmospheric research data, developed to ensure consistency in data sharing. Each file contains tabular data plus header lines describing the measurements, units, instruments, and data quality. Detailed documentation on the ICARTT file format standard can be found here
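Because ICARTT is plain text, a basic reader needs only the standard library. This minimal sketch assumes the common FFI 1001 layout: the first line gives the number of header lines, the last header line lists the column names, and data rows follow as comma-separated values. It ignores the header's scale factors and per-variable missing-value flags and instead treats any value in `missing_flags` (here -9999, an assumed fill value) as missing. It also converts the independent time variable, assumed to be seconds from midnight UTC on the flight date, into timestamps. This is not the IMPACTS read routine used in cleaning_data_P3_nav.ipynb; `read_icartt` and `seconds_to_utc` are hypothetical names.

```python
import csv
import math
from datetime import datetime, timedelta, timezone


def read_icartt(path, missing_flags=(-9999.0,)):
    """Minimal reader for an ICARTT FFI 1001 file; returns {column: [values]}.

    Assumes line 1 is "NLHEAD, FFI" and the last header line holds the
    column names. Scale factors and per-variable missing flags from the
    header are NOT parsed; values in `missing_flags` become NaN.
    """
    with open(path, newline="") as f:
        nlhead = int(f.readline().split(",")[0])
        # Skip header lines 2 .. NLHEAD-1; line NLHEAD holds the column names
        for _ in range(nlhead - 2):
            f.readline()
        names = [n.strip() for n in f.readline().split(",")]
        columns = {n: [] for n in names}
        for row in csv.reader(f):
            if not row:
                continue  # tolerate blank trailing lines
            for name, value in zip(names, row):
                x = float(value)
                columns[name].append(math.nan if x in missing_flags else x)
    return columns


def seconds_to_utc(flight_date, seconds_from_midnight):
    """Convert 'seconds from midnight UTC' values into timezone-aware datetimes."""
    midnight = datetime(flight_date.year, flight_date.month, flight_date.day,
                        tzinfo=timezone.utc)
    return [midnight + timedelta(seconds=s) for s in seconds_from_midnight]
```

For example, `cols = read_icartt(path)` followed by `seconds_to_utc(date(2023, 2, 14), cols["Time_Start"])` would timestamp a file whose time column is named `Time_Start`; check the actual column name in each file's header. Values past 86400 s (flights that cross midnight) roll over to the next day.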

Getting Started

To get started with the "MLGEO2024_Snowflake_Classification" project, follow these steps:

Prerequisites:

  • Python 3.9.12 or later
  • Conda for managing dependencies (recommended)
  • Git (for cloning the repository)
  • A NASA Earthdata Login profile (for downloading data)

1. Clone the repository

Start by cloning the repository to your local machine:

git clone https://github.com/UW-MLGEO/MLGEO2024_Snowflake_Classification.git
cd MLGEO2024_Snowflake_Classification

2. Set Up the Environment

Create a conda environment with the necessary dependencies using the snow_classification_env.yml file:

conda env create -n snow_classification_env -f snow_classification_env.yml
conda activate snow_classification_env

This will install all required packages, including libraries for data processing, machine learning, and visualization.

3. Register for a NASA Earthdata Login Profile

To download the IMPACTS P-3 met/nav data from the NASA Global Hydrometeorology Resource Center (GHRC) website, an Earthdata Login profile is needed. Registration is free and can be done here
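Command-line download tools such as `wget` and `curl` commonly read Earthdata Login credentials from a `~/.netrc` entry for the host `urs.earthdata.nasa.gov`. Assuming that setup (the download notebook may use a different mechanism), the hypothetical helper below uses Python's standard `netrc` module to check that credentials are in place before starting a large download.

```python
import netrc
import os

# Earthdata Login host that .netrc entries are keyed on (assumed setup)
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def earthdata_credentials(netrc_path=None):
    """Return (login, password) for Earthdata Login from a .netrc file, or None."""
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return None  # no .netrc file at all
    auth = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    if auth is None:
        return None  # file exists but has no Earthdata entry
    login, _account, password = auth
    return login, password
```

Calling `earthdata_credentials()` with no argument checks `~/.netrc` and returns `None` if the file or the Earthdata entry is missing. Keep the file readable only by your user (e.g. `chmod 600 ~/.netrc`).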

Descriptions of the Python notebooks in this repository

  • download_data_NASA_GHRC.ipynb: Instructions for downloading, from the NASA GHRC website to a local machine, the folders containing the raw PHIPS images (in .PNG format) and the .ict files for the P-3 aircraft meteorological/navigation datasets. Because the raw PHIPS image data are large, it is recommended to run the download commands from the command line.

  • cleaning_data_P3_nav.ipynb: For cleaning the P-3 aircraft met/nav .ict data files using a read routine adapted from the IMPACTS GitHub repository. Cleaned data are saved as NetCDF files. The raw PHIPS images had to be inspected and selected manually, so no cleaning notebook is provided for the image dataset. Please refer to the README.md under data/clean/, which describes the manual image selection process in detail.

  • prepare_AI_ready_data_PHIPS.ipynb: For reading in the cleaned PHIPS image data and the P-3 aircraft met/nav data and constructing an xarray.Dataset in which the images are converted to NumPy arrays and the appropriate metadata attributes from the P-3 met/nav dataset are included. This AI-ready dataset is saved as a NetCDF file and manually uploaded to Google Drive (file name: PHIPS_CrystalHabitAI_Dataset.nc) rather than GitHub because of repository storage constraints. Please refer to the README.md in data/ai_ready/ for instructions on downloading the dataset.

  • EDA_PHIPS.ipynb: For exploring the structure and key statistical characteristics of the PHIPS_CrystalHabitAI_Dataset.nc AI-ready dataset. Basic statistics of the pixel intensities across all the PHIPS images are inspected, along with a correlation matrix that explores linear relationships between image intensity and other data attributes such as habit type and temperature.

  • dimensionality_reduction_PHIPS.ipynb: For analyzing the dimensions of the PHIPS_CrystalHabitAI_Dataset.nc AI-ready dataset and applying linear (Principal Component Analysis, PCA) and non-linear (t-Distributed Stochastic Neighbor Embedding, t-SNE) dimensionality reduction techniques to tease out linear and non-linear relationships, respectively, in the data.
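The notebook steps above (stacking images into a single array, basic intensity statistics and correlations, and PCA) can be sketched with NumPy alone. This is a minimal illustration, not the notebooks' code: the 256 x 256 target size, the center-crop/zero-pad strategy, integer-encoded habit types, and all function names are hypothetical. Reading a PNG into a grayscale array could be done with, e.g., Pillow (`np.asarray(Image.open(path).convert("L"))`), and the resulting stack could be wrapped in an `xarray.Dataset` and written with `to_netcdf`; neither step is shown here.

```python
import numpy as np


def to_fixed_size(img, shape=(256, 256), fill=0):
    """Center-crop and/or pad a 2-D grayscale image to `shape`."""
    H, W = shape
    h, w = img.shape
    out = np.full(shape, fill, dtype=img.dtype)
    ch, cw = min(h, H), min(w, W)                         # size of copied region
    sy, sx = max((h - H) // 2, 0), max((w - W) // 2, 0)  # crop offset in source
    dy, dx = max((H - h) // 2, 0), max((W - w) // 2, 0)  # pad offset in output
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out


def stack_images(images, shape=(256, 256)):
    """Stack 2-D images of varying size into one (n_images, H, W) array."""
    return np.stack([to_fixed_size(im, shape) for im in images])


def intensity_summary(stack):
    """Basic pixel-intensity statistics over all images in the stack."""
    px = stack.astype(np.float64).ravel()
    return {"mean": px.mean(), "std": px.std(), "median": np.median(px),
            "min": px.min(), "max": px.max()}


def intensity_correlation(stack, temperature, habit_code):
    """3x3 Pearson correlation of [mean image intensity, temperature, habit code]."""
    mean_intensity = stack.reshape(len(stack), -1).mean(axis=1)
    return np.corrcoef(np.vstack([mean_intensity, temperature, habit_code]))


def pca(stack, n_components=2):
    """PCA of flattened images via SVD; returns (scores, explained_variance_ratio)."""
    X = stack.reshape(len(stack), -1).astype(np.float64)
    X -= X.mean(axis=0)                                 # center each pixel column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]     # projections onto the PCs
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

Two caveats apply. Pearson correlation with integer habit codes depends on the arbitrary ordering of the labels, so comparing intensity distributions per habit type may be more meaningful. For t-SNE, scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(X)` is one common option, often applied to the leading PCA scores rather than to raw pixels.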
