Haley Egan
This project was conducted in collaboration with Nan Hauser at the Center for Cetacean Research & Conservation, who generously shared a wealth of humpback whale audio data, in the hope of gaining a deeper understanding of whale language behaviors through machine learning.
This project implements a deep learning pipeline to classify humpback whale vocalizations by geographic location using Convolutional Neural Networks (CNNs). The approach transforms raw audio recordings into spectrogram images, which are then processed by a CNN in TensorFlow for location-based classification.
Audio → Spectrogram → CNN → Location Classification → Model and Prediction Evaluation
- Audio Preprocessing: Raw whale audio files are loaded and preprocessed, e.g. cleaned and cropped (see the loading sketch after this list)
- Spectrogram Generation: Audio signals are converted to time-frequency representations (spectrograms)
- CNN Classification: A Convolutional Neural Network analyzes the spectrograms and learns patterns that distinguish whale locations
- Output: Location classification for each audio sample, and prediction of location for new audio samples
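As a rough illustration of the first step, here is a minimal sketch of loading and cropping a WAV recording with TensorFlow. The helper name, file path, and clip length are placeholders rather than the notebooks' actual code:

```python
import tensorflow as tf

def load_and_crop(path, clip_seconds=30):
    """Load a WAV file and crop it to a fixed-length clip (hypothetical helper)."""
    audio_bytes = tf.io.read_file(path)
    # decode_wav returns float32 samples in [-1, 1] plus the sample rate
    waveform, sample_rate = tf.audio.decode_wav(audio_bytes, desired_channels=1)
    waveform = tf.squeeze(waveform, axis=-1)                   # (samples, 1) -> (samples,)
    clip_len = clip_seconds * tf.cast(sample_rate, tf.int32)   # number of samples in one clip
    return waveform[:clip_len], sample_rate
```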
Location-Based Classification: Multi-class classification framework
- Classify each audio sample into one of multiple geographic locations
- Each location represents a distinct class in the classification problem
- Model outputs a probability distribution across all possible locations (see the sketch after this list)
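A minimal sketch of this multi-class setup in Keras follows. The layer sizes, input shape, and optimizer are illustrative assumptions, not the exact architecture used in the notebooks; the location names come from the dataset described below:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

LOCATIONS = ["Bermuda", "Cook Islands", "Hawaii", "Monterey"]  # class labels

# Small CNN over spectrogram "images"; the final softmax layer produces a
# probability distribution across all candidate locations.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),       # (freq bins, time frames, 1) - illustrative
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(len(LOCATIONS), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```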
Alternative Approaches:
- Multi-class: One model predicting among all locations (Location A, B, C, D...)
- Binary per location: Multiple binary models (one per location) asking "is this location X or not?" (see the sketch after this list)
- Hierarchical: Group locations by region, then classify within regions
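For contrast, a hedged sketch of the binary-per-location alternative, in which one sigmoid-output model per location answers "is this location X or not?"; shapes and layer choices are again illustrative:

```python
import tensorflow as tf

LOCATIONS = ["Bermuda", "Cook Islands", "Hawaii", "Monterey"]

def build_binary_model(input_shape=(128, 128, 1)):
    """One-vs-rest CNN for a single location (illustrative layers only)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(clip is from location X)
    ])

# One binary model per location, each trained on "X vs. everything else" labels.
binary_models = {loc: build_binary_model() for loc in LOCATIONS}
```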
- TensorFlow Audio Classification Tutorial - Official TensorFlow guide for audio processing
- CNNs for Audio Classification - Theory and implementation of CNNs for audio
- MNIST Audio Classification with Spectrograms - Practical Keras implementation example
- Custom Audio Classification with TensorFlow - Building custom audio classification models
- Audio Echo Processing - Audio augmentation and noise reduction techniques
Source: Center for Cetacean Research and Conservation (Nan Hauser's dataset) from Bermuda and the Cook Islands, and open source audio recordings from Hawaii and Monterey.
- Channels: Stereo audio (2 channels)
- Structure: Left and right audio channels are interleaved in single files (alternating left/right channel samples)
- Implication: Requires channel separation during preprocessing to access individual left/right audio streams
- Channel separation may be needed to analyze left vs. right audio independently (see the sketch after this list)
- Stereo format could provide spatial audio information useful for classification
- File format and sample rate specifications should be documented for consistent processing
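If the interleaved samples are handled as a flat array, channel separation can be done by striding. A minimal NumPy sketch is below; note that most audio libraries (including tf.audio.decode_wav) already return the channels as separate columns:

```python
import numpy as np

def split_interleaved_stereo(samples: np.ndarray):
    """Separate an interleaved [L, R, L, R, ...] sample array into two mono streams.

    Assumes `samples` is a flat 1-D array of alternating left/right samples.
    """
    left = samples[0::2]   # even-indexed samples -> left channel
    right = samples[1::2]  # odd-indexed samples  -> right channel
    return left, right
```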
The classification pipeline was tested on the original full-length audio files, as well as on shorter 75-second and 30-second clips. The 30-second clips proved as effective as, and occasionally better than, the longer clips at predicting location, and were significantly less computationally expensive, so 30-second clips were used for analysis and development of the pipeline. Further experimentation with audio file lengths is encouraged.
The process for segmenting the audio files can be found in the SplitAudio_30sec.ipynb and SplitAudio_75sec.ipynb notebooks.
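For illustration only (not the notebooks' exact code), a sketch of splitting one long recording into consecutive 30-second clips, assuming librosa and soundfile are available; the output file prefix is a placeholder:

```python
import librosa
import soundfile as sf

def split_into_clips(path, clip_seconds=30, out_prefix="clip"):
    """Split one long recording into consecutive fixed-length clips (illustrative)."""
    waveform, sr = librosa.load(path, sr=None, mono=True)   # keep the original sample rate
    samples_per_clip = int(clip_seconds * sr)
    for i in range(len(waveform) // samples_per_clip):
        clip = waveform[i * samples_per_clip:(i + 1) * samples_per_clip]
        sf.write(f"{out_prefix}_{i:03d}.wav", clip, sr)      # the final partial clip is dropped
```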
A verbose walkthrough of converting humpback audio files to spectrograms can be found in Audio_to_Specrogram.ipynb. The simplified version of the process is included in the main notebook, HumpbackWhale_SpectrogramCNN_30SecAudioClips.ipynb.
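A minimal sketch of the waveform-to-spectrogram conversion using tf.signal.stft; the frame length and step are illustrative parameters, not necessarily those used in the notebooks:

```python
import tensorflow as tf

def waveform_to_spectrogram(waveform, frame_length=1024, frame_step=512):
    """Convert a 1-D waveform tensor into a magnitude spectrogram (illustrative parameters)."""
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    spectrogram = tf.abs(stft)                 # magnitude of each time/frequency bin
    # Add a trailing channel axis so the spectrogram can be fed to Conv2D layers.
    return spectrogram[..., tf.newaxis]
```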
Testing the CNN on the full audio files can be seen in Spectrogram_to_CNN_FullSong.ipynb.
The full classification notebook with model evaluation and predictions on 30-second audio segments can be found in the notebook HumpbackWhale_SpectrogramCNN_30SecAudioClips.ipynb.
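As a hypothetical end-to-end example of predicting the location of a single new clip, the sketch below reuses the helpers and model sketched earlier (load_and_crop, waveform_to_spectrogram, model, LOCATIONS); the file name and the resize to the model's input shape are assumptions:

```python
import numpy as np
import tensorflow as tf

waveform, _ = load_and_crop("new_recording.wav", clip_seconds=30)   # placeholder path
spectrogram = waveform_to_spectrogram(waveform)
spectrogram = tf.image.resize(spectrogram, [128, 128])              # match the model's input shape
probs = model.predict(spectrogram[tf.newaxis, ...])[0]              # add a batch dimension
predicted = LOCATIONS[int(np.argmax(probs))]
print(f"Predicted location: {predicted}", dict(zip(LOCATIONS, np.round(probs, 3))))
```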
Interactive Map of Pacific Ocean Humpback Whale Migration routes
The results and visuals below can be seen in the notebook HumpbackWhale_SpectrogramCNN_30SecAudioClips.ipynb. Further model evaluation metrics, including precision, recall, F1-score, accuracy, and loss, can also be found in the notebook.
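These metrics can be computed from the model's predictions with scikit-learn; the sketch below assumes placeholder test arrays x_test (spectrograms) and y_test (integer location labels), and is not the notebook's exact code:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_prob = model.predict(x_test)            # probability distribution per clip
y_pred = np.argmax(y_prob, axis=1)        # most likely location index per clip
print(classification_report(y_test, y_pred, target_names=LOCATIONS))
print(confusion_matrix(y_test, y_pred))
```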
Sample of Waveforms from Humpback Whale Audio Segments by Location

Example of a Waveform and Corresponding Spectrogram for an Audio Segment

Sample of Spectrograms from Humpback Whale Audio Segments by Location

Confusion Matrix of CNN Classification Results

Example of Model Prediction on New (never seen) Audio File

The class distribution in this notebook is imbalanced, with Bermuda having the least data. This is visible in the results: Bermuda has the highest number of misclassifications. This is something that can be adjusted and experimented with in the future. Ideally, all locations would have significantly more data, spanning many years, different types of recording equipment, and various recording locations within each region. Data quantity and diversity were constraints for this project, but with more data and expanded modeling techniques, the future possibilities are endless!
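One possible way to experiment with the imbalance is to weight each class inversely to its frequency during training, so that under-represented locations such as Bermuda count more in the loss. A sketch assuming placeholder training arrays x_train and y_train (integer labels 0..N-1):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))   # {class index: weight}
model.fit(x_train, y_train, epochs=20, class_weight=class_weight)
```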