Humpback Whale Song Classification

Haley Egan

This project was conducted in collaboration with Nan Hauser at the Center for Cetacean Research & Conservation, who generously shared a large collection of humpback whale audio recordings, in the hope of gaining a deeper understanding of whale vocalization behavior through machine learning.

Project Overview

This project implements a deep learning pipeline to classify humpback whale vocalizations by geographic location using Convolutional Neural Networks (CNNs). The approach transforms raw audio recordings into spectrogram images, which are then processed by a CNN in TensorFlow for location-based classification.

Pipeline Architecture

Audio → Spectrogram → CNN → Location Classification → Model and Prediction Evaluation

Pipeline Components:

  1. Audio Preprocessing: Raw whale audio files are loaded and preprocessed (cleaned, cropped, etc.)
  2. Spectrogram Generation: Audio signals are converted to time-frequency representations (spectrograms); see the sketch after this list
  3. CNN Classification: Convolutional Neural Network analyzes spectrograms to find patterns based on whale location
  4. Output: Location classification for each audio sample, and prediction of location for new audio samples
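To make steps 1–2 concrete, here is a minimal sketch of the audio-to-spectrogram step using librosa; the file name, sample rate, and mel-band count are illustrative assumptions, not the notebooks' exact settings.

```python
import numpy as np
import librosa

def audio_to_mel_spectrogram(path, sr=22050, n_mels=128):
    """Load a recording and convert it to a log-scaled mel spectrogram."""
    y, sr = librosa.load(path, sr=sr, mono=True)                     # resample and downmix to mono
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)  # time-frequency representation
    return librosa.power_to_db(mel, ref=np.max)                      # power -> decibels for the CNN

# Hypothetical usage:
# spec = audio_to_mel_spectrogram("whale_clip_01.wav")   # shape: (n_mels, time_frames)
```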

Classification Approach

Location-Based Classification: Multi-class classification framework

  • Classify each audio sample into one of multiple geographic locations
  • Each location represents a distinct class in the classification problem
  • Model outputs a probability distribution across all possible locations (see the sketch below)
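As an illustration of the multi-class setup, here is a minimal TensorFlow/Keras sketch with a softmax output over locations; the input shape, layer sizes, and number of classes are assumptions for illustration, not the notebook's exact architecture.

```python
import tensorflow as tf

NUM_LOCATIONS = 4  # e.g. Bermuda, Cook Islands, Hawaii, Monterey

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 1)),                      # spectrogram treated as an image
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_LOCATIONS, activation="softmax"),      # probability over locations
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```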

Alternative Approaches:

  • Multi-class: One model predicting among all locations (Location A, B, C, D...)
  • Binary per location: Multiple binary models (one per location) asking "is this location X or not?"
  • Hierarchical: Group locations by region, then classify within regions

Technical Resources

Core Tutorials:

Advanced Techniques:

Dataset Information

Source: Recordings from Bermuda and the Cook Islands provided by the Center for Cetacean Research and Conservation (Nan Hauser's dataset), plus open-source recordings from Hawaii and Monterey.

Audio Format Specifications:

  • Channels: Stereo audio (2 channels)
  • Structure: Left and right audio channels are interleaved in single files (alternating left/right channel samples)
  • Implication: Requires channel separation during preprocessing to access individual left/right audio streams

Data Preprocessing Considerations:

  • Channel separation may be needed to analyze left vs. right audio independently (see the sketch after this list)
  • Stereo format could provide spatial audio information useful for classification
  • File format and sample rate specifications should be documented for consistent processing
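A minimal sketch of the channel-separation step, assuming the files are standard stereo WAVs readable with the soundfile package (the actual files may differ); the file name is hypothetical.

```python
import soundfile as sf

def split_channels(path):
    """Return (left, right, sample_rate) for a stereo recording."""
    data, sr = sf.read(path)              # stereo files load as an array of shape (num_frames, 2)
    if data.ndim == 1:                    # already mono: reuse the single channel
        return data, data, sr
    return data[:, 0], data[:, 1], sr     # column 0 = left channel, column 1 = right channel

# Hypothetical usage:
# left, right, sr = split_channels("whale_recording_01.wav")
```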

Notebooks

The classification pipeline was tested on the original full-length audio files, as well as on shorter 75-second and 30-second clips. The 30-second clips proved to be as effective as, and occasionally better than, longer clips at predicting location, and were significantly less computationally expensive, so 30-second clips were used for analysis and development of the pipeline. Further experimentation with audio file lengths is encouraged.
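For reference, a minimal sketch of cutting a long recording into fixed-length clips (30 seconds here), assuming librosa for loading; clip length and sample rate are tunable parameters, and the file name is hypothetical.

```python
import librosa

def split_into_clips(path, clip_seconds=30, sr=22050):
    """Yield successive non-overlapping fixed-length clips from one recording."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    samples_per_clip = clip_seconds * sr
    for start in range(0, len(y) - samples_per_clip + 1, samples_per_clip):
        yield y[start:start + samples_per_clip]

# Hypothetical usage:
# clips = list(split_into_clips("full_length_recording.wav", clip_seconds=30))
```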

Initial Results

The results and visuals below can be seen in the notebook HumpbackWhale_SpectrogramCNN_30SecAudioClips.ipynb. Further model evaluation metrics can be found in the notebook, including precision, recall, F1-score, accuracy, and loss.
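As a rough sketch of how such metrics can be computed with scikit-learn (assuming a trained Keras model and held-out arrays X_test / y_test with integer location labels; the names are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_prob = model.predict(X_test)                 # per-sample probability over locations
y_pred = np.argmax(y_prob, axis=1)             # most likely location index

print(classification_report(y_test, y_pred))   # precision, recall, F1-score per location
print(confusion_matrix(y_test, y_pred))        # rows: true location, columns: predicted location
```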


Sample of Waveforms from Humpback Whale Audio Segments by Location (Humpback Waveforms.png)

Example of a Waveform and Corresponding Spectrogram for an Audio Segment (Example Waveform and Spectrogram.png)

Sample of Spectrograms from Humpback Whale Audio Segments by Location (Humpback Spectrograms.png)

Confusion Matrix of CNN Classification Results (Model Confusion Matrix.png)

Example of Model Prediction on a New (never seen) Audio File (Example Model Prediction on New Audio File.png)

The class distribution in this notebook is imbalanced, with Bermuda having the least data. This is visible in the results, where Bermuda has the highest number of misclassifications. This is something that can be adjusted and experimented with in the future. Ideally, all locations would have significantly more data, spanning many years, different types of recording equipment, and various recording locations within each region. Data quantity and diversity were constraints for this project, but with more data and expanded modeling techniques, the future possibilities are endless!
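One possible way to experiment with the imbalance is per-class weighting during training; here is a minimal sketch assuming integer labels in y_train and the Keras model from earlier (the variable names and epoch count are illustrative assumptions).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Under-represented locations (e.g. Bermuda) receive proportionally larger weights.
# model.fit(X_train, y_train, epochs=20,
#           class_weight=dict(zip(classes, weights)),
#           validation_data=(X_val, y_val))
```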
