Spectrogram Attention for Acoustic Bird Species Recognition

This repository contains the official implementation of Spectrogram Attention for Acoustic Bird Species Recognition. The proposed deep learning approach introduces Spectrogram Attention (SA), a novel mechanism for jointly modeling fine-grained spectro-temporal patterns in log-mel spectrograms using feature maps extracted from a pretrained convolutional neural network. The model is pretrained on a large-scale corpus of 9,735 bird species from the Xeno-Canto dataset and subsequently fine-tuned on eight BirdSet soundscape corpora under three training regimes.
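The README does not detail the SA mechanism itself. Purely as a point of reference, the sketch below shows plain single-head scaled dot-product self-attention over the frequency x time positions of a CNN feature map of shape (channels, frequency, time), written with NumPy. It is not the SA module implemented in models/; the function name and the random projection matrices (standing in for learned weights) are hypothetical.

```python
import numpy as np


def spectro_temporal_self_attention(feature_map: np.ndarray, seed: int = 0) -> np.ndarray:
    """Generic single-head self-attention over the (frequency, time) grid of a CNN feature map.

    Illustrative only: standard scaled dot-product attention, NOT this repository's SA module.
    The projection matrices are random placeholders instead of learned weights.
    """
    channels, n_freq, n_time = feature_map.shape
    # Flatten the frequency x time grid into F*T tokens with C features each.
    tokens = feature_map.reshape(channels, n_freq * n_time).T  # (F*T, C)

    rng = np.random.default_rng(seed)
    w_q, w_k, w_v = (rng.standard_normal((channels, channels)) / np.sqrt(channels) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

    # Every spectro-temporal position attends to every other position.
    scores = q @ k.T / np.sqrt(channels)  # (F*T, F*T)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over key positions

    attended = weights @ v  # (F*T, C)
    # Undo the flattening to restore the (C, F, T) layout.
    return attended.T.reshape(channels, n_freq, n_time)


# Example: a feature map with 8 channels, 4 frequency bins and 10 time frames.
fmap = np.random.default_rng(1).standard_normal((8, 4, 10))
out = spectro_temporal_self_attention(fmap)
print(out.shape)
```

Because attention here spans the full F*T grid, the cost grows quadratically with the number of positions, which is one reason such mechanisms are typically applied to downsampled CNN feature maps rather than raw spectrogram bins.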

Project Structure

sa4birds/
│
├── additional_data/          # Auxiliary data used by the project
├── ckpts/                    # Pretrained model checkpoints
├── figures/                  # Architecture and documentation figures
│
├── models/                   # Model architectures and implementations
├── train/                    # Training scripts and utilities
│
├── notebooks/                # Example notebooks
│   ├── model_demo.ipynb                  # Demonstrates model usage
│   ├── evaluation_birdset.ipynb          # Demonstrates evaluation on all BirdSet downstream tasks
│   ├── evaluation_ablation_study.ipynb   # Demonstrates the ablation study experiments
│   └── evaluation_beans.ipynb            # Demonstrates transfer learning experiments on the BEANS benchmark
│
├── validate_birdset.py       # Evaluation entry point
├── prepare_checkpoints.py    # Script to download / prepare checkpoints for testing
│
├── requirements.txt          # Python dependencies
└── README.md                 # Project documentation

Note:
For all notebooks in the notebooks/ directory, we also include the saved cell outputs with the expected results. This allows users to inspect the expected outputs without rerunning the full experiments, which can require significant computational resources and large datasets.

Requirements

This project requires Python 3+ and a CUDA-capable GPU with >12 GB of VRAM when evaluating on BirdSet, due to the size of the trained models. Running a single model on a single sample on a CPU is possible (see Model demo), but GPU execution is strongly recommended for evaluation or repeated inference to avoid extremely long runtimes.

System Requirements

Python: > 3.0 (recommended: Python 3.9+)
GPU (recommended): NVIDIA GPU with CUDA support
NVIDIA Drivers: Required for CUDA support
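Before downloading large datasets, it can help to confirm the interpreter and GPU match the requirements above. The sketch below is a hypothetical helper (not part of this repository) that reports the Python version and, if PyTorch is installed, whether CUDA is available and whether the first GPU has more than the 12 GB of VRAM suggested for BirdSet evaluation. It only inspects GPU 0, which is a simplifying assumption for multi-GPU machines.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9), min_vram_gb=12):
    """Summarize the Python version and, if PyTorch is installed, the first CUDA GPU."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": tuple(sys.version_info[:2]) >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "gpu_memory_gb": None,
        "vram_ok": False,
    }
    if report["torch_installed"]:
        import torch  # imported only when it is actually installed

        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)  # first visible GPU only
            report["cuda_available"] = True
            report["gpu_memory_gb"] = round(props.total_memory / 1e9, 1)  # decimal GB
            report["vram_ok"] = report["gpu_memory_gb"] > min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```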

Install Python Dependencies

Clone the repository and navigate into the project directory:

git clone git@github.com:umr-ds/sa4birds.git
cd sa4birds

The project relies on the following main packages (see requirements.txt for the complete list):

datasets
hydra-core
librosa
numpy
scikit-learn
soundfile
timm
torch
torchaudio
torchmetrics
torchvision
transformers

Create a Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

On Windows, run the following commands in PowerShell instead:

python -m venv venv
.\venv\Scripts\Activate.ps1

Install the required Python packages listed in requirements.txt:

pip install -r requirements.txt
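After installation, a quick way to confirm the main packages listed above are importable is sketched below. This is a hypothetical check, not a script shipped with the repository; note that a few pip distribution names differ from their import names (hydra-core imports as hydra, scikit-learn as sklearn).

```python
import importlib.util

# pip distribution name -> importable module name (they differ for a few packages).
REQUIRED_PACKAGES = {
    "datasets": "datasets",
    "hydra-core": "hydra",
    "librosa": "librosa",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "soundfile": "soundfile",
    "timm": "timm",
    "torch": "torch",
    "torchaudio": "torchaudio",
    "torchmetrics": "torchmetrics",
    "torchvision": "torchvision",
    "transformers": "transformers",
}


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All main packages found." if not missing else f"Missing: {', '.join(missing)}")
```

find_spec only checks that a module can be located, not that its version matches requirements.txt; pip remains the authority for versions.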

Storage requirements

Evaluation on BirdSet (see Datasets) across all downstream tasks requires approximately 160 GB of disk space to download and store the datasets. In addition, about 8.5 GB of storage is needed for the trained model checkpoints across all regimes.

Evaluation on BEANS (see Datasets) across all downstream tasks requires approximately 320 GB of disk space to download and store the datasets.
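In total, a full BirdSet evaluation therefore needs roughly 160 GB + 8.5 GB = 168.5 GB. The sketch below checks free space with the standard library before starting a download; the constants mirror the approximate figures above, and the helper name is hypothetical. Datasets go to the HuggingFace cache while checkpoints go to ckpts/, so if these live on different disks, check each location separately.

```python
import shutil
from pathlib import Path

GB = 10**9  # decimal gigabytes
# Approximate storage figures stated in this README.
BIRDSET_DATA_GB = 160
CHECKPOINTS_GB = 8.5
BEANS_DATA_GB = 320


def has_free_space(path, required_gb):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free >= required_gb * GB


if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface"
    target = cache if cache.exists() else Path.home()
    needed = BIRDSET_DATA_GB + CHECKPOINTS_GB  # datasets + checkpoints for all regimes
    print(f"BirdSet + checkpoints ({needed} GB): {has_free_space(target, needed)}")
    print(f"BEANS ({BEANS_DATA_GB} GB): {has_free_space(target, BEANS_DATA_GB)}")
```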

Datasets

BirdSet

Training and evaluation primarily rely on the BirdSet benchmark.

BirdSet contains eight downstream tasks, each consisting of:

Training data: weakly labeled recordings from Xeno-Canto
Test data: strongly annotated regional soundscapes

For details, see the BirdSet paper.

Datasets are automatically downloaded via the HuggingFace datasets library.

Example:

import datasets

down_task = "HSN"  # one of the eight BirdSet downstream tasks
dataset = datasets.load_dataset("DBD-research-group/BirdSet", down_task)

Cached datasets are stored in:

~/.cache/huggingface/
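Given the storage needed, you may want to keep the data on a larger disk. A minimal sketch, assuming you want all eight tasks (the same ones listed in the Checkpoints table) in a custom location: it passes the cache_dir argument of datasets.load_dataset; setting the HF_DATASETS_CACHE environment variable is an alternative. The helper name is hypothetical, and depending on the installed datasets version the BirdSet loader may need extra arguments (for example trust_remote_code=True).

```python
# The eight BirdSet downstream tasks.
BIRDSET_TASKS = ["HSN", "POW", "SNE", "PER", "NES", "UHH", "NBP", "SSW"]


def load_birdset_tasks(tasks=BIRDSET_TASKS, cache_dir=None):
    """Download (or reuse from cache) the requested BirdSet tasks.

    cache_dir=None keeps the default HuggingFace cache; pass a path on a larger
    disk to store the data elsewhere.
    """
    import datasets  # imported lazily so the task list is usable without the library

    return {
        task: datasets.load_dataset("DBD-research-group/BirdSet", task, cache_dir=cache_dir)
        for task in tasks
    }


# Example (downloads data on first call; the path is illustrative):
# birdset = load_birdset_tasks(["HSN"], cache_dir="/path/to/large/disk/hf_cache")
```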

Additional datasets:

The following datasets are used as no-call samples during training:

In addition, we used a subset of insect and frog sounds collected from iNaturalist as no-call samples. For more details, see the training guide.

BEANS

For transfer learning experiments, we use the BEANS benchmark, which consists of several downstream tasks covering a variety of animal sounds (e.g., bat calls). For more details, see BEANS.

Checkpoints

Our pretrained checkpoints for BirdSet are available for three training regimes:

Regime	Description
DT	Dedicated training (task-specific models)
MT	Medium training
LT	Large training

Download the model checkpoints and place them in the ckpts directory, organized by training regime (DT, MT, or LT).

Training Regime	Task	URL
Dedicated	HSN	Download
Dedicated	POW	Download
Dedicated	SNE	Download
Dedicated	PER	Download
Dedicated	NES	Download
Dedicated	UHH	Download
Dedicated	NBP	Download
Dedicated	SSW	Download
Medium	All tasks	Download
Large	All tasks	Download

Download checkpoints manually or run:

python prepare_checkpoints.py

This downloads the main checkpoints trained on BirdSet and organizes them as follows:

sa4birds/
│
├── ckpts/                         # pretrained model checkpoints
│   ├── DT/
│   │   └── HSN/                         # downstream task name
│   │       ├── HSN_eca_nfnet_l1_2025-10-20_112131/   # DT HSN first model checkpoint
│   │       └── ...
│   │
│   ├── MT/
│   │   ├── MT_eca_nfnet_l1_2025-11-25_151907/        # MT first model checkpoint
│   │   └── ...
│   │
│   └── LT/
│       ├── LT_eca_nfnet_l1_2025-11-24_180849/        # LT first model checkpoint
│       └── ...
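Run directories follow the pattern <PREFIX>_<backbone>_<YYYY-MM-DD>_<HHMMSS>, where the prefix is the task name for DT and the regime name for MT/LT. The hypothetical helpers below (not part of the repository) locate and parse these directories, assuming exactly the layout shown above; for example, they can be used to pick the most recent run of a regime.

```python
import re
from datetime import datetime
from pathlib import Path

# e.g. "HSN_eca_nfnet_l1_2025-10-20_112131" -> ("HSN", "eca_nfnet_l1", 2025-10-20 11:21:31)
RUN_DIR = re.compile(
    r"^(?P<prefix>[A-Z]+)_(?P<backbone>.+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{6})$"
)


def parse_run_dir(name):
    """Split a run directory name into (prefix, backbone, timestamp); None if it does not match."""
    match = RUN_DIR.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(f"{match['date']}_{match['time']}", "%Y-%m-%d_%H%M%S")
    return match["prefix"], match["backbone"], stamp


def list_checkpoints(ckpt_root, regime, task=None):
    """List run directories of a regime; DT runs sit one level deeper, under the task name."""
    base = Path(ckpt_root) / regime
    if regime == "DT":
        if task is None:
            raise ValueError("DT checkpoints are organized per task, e.g. task='HSN'")
        base = base / task
    if not base.is_dir():
        return []
    return sorted(p for p in base.iterdir() if p.is_dir() and parse_run_dir(p.name))


if __name__ == "__main__":
    for run in list_checkpoints("ckpts", "DT", task="HSN"):
        prefix, backbone, stamp = parse_run_dir(run.name)
        print(run.name, backbone, stamp.isoformat())
```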

Model Demo

A demonstration of how to run one of the trained BirdSet models is provided in the notebook:

model_demo.ipynb

This notebook shows how to:

load the trained model
run inference with the model
inspect the outputs
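When inspecting outputs, one common approach for multi-label bird recognition is to apply an independent sigmoid to each class logit and rank species by probability. The sketch below assumes the model returns one raw logit per species and that you have a matching label list; both, as well as the helper name and the optional threshold, are assumptions rather than the notebook's actual code.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def top_k_species(logits, labels, k=5, threshold=None):
    """Rank species by sigmoid probability (multi-label: classes are scored independently)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = [_sigmoid(z) for z in logits]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)[:k]
    if threshold is not None:
        ranked = [(label, p) for label, p in ranked if p >= threshold]
    return ranked


if __name__ == "__main__":
    # Hypothetical logits for three placeholder species.
    print(top_k_species([2.0, -1.0, 0.0], ["species_a", "species_b", "species_c"], k=2))
```

A sigmoid rather than a softmax is used because several species can vocalize in the same window, so probabilities need not sum to one.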

To run the demo notebook, install Jupyter:

pip install jupyterlab

Launching Jupyter

Start the Jupyter notebook server from the project directory:

jupyter lab

Your browser will open automatically. Then open:

notebooks/model_demo.ipynb

and run the cells to see the model in action.

Validation

BirdSet:

After installing the dependencies listed in requirements.txt and downloading the checkpoints (see Checkpoints), you can run the evaluation on BirdSet using validate_birdset.py.

For example, to evaluate HSN using the DT regime, run:

python validate_birdset.py mode=DT downtask=HSN

To evaluate all BirdSet downstream tasks:

python validate_birdset.py mode=DT downtask=ALL
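To evaluate several regimes in one go, the command above can be scripted. The sketch below assumes that mode=MT and mode=LT are accepted in the same way as mode=DT (only DT is shown explicitly here); the helper names are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_command(mode, downtask="ALL", script="validate_birdset.py"):
    """Build one evaluation command using Hydra-style key=value overrides."""
    return [sys.executable, script, f"mode={mode}", f"downtask={downtask}"]


def evaluate_regimes(modes=("DT", "MT", "LT"), downtask="ALL", dry_run=True):
    """Print (and, unless dry_run, execute) one evaluation run per training regime."""
    for mode in modes:
        cmd = build_command(mode, downtask)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    evaluate_regimes(dry_run=True)
```

Using sys.executable ensures the runs use the interpreter of the active virtual environment.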

A demonstration of the evaluation on BirdSet is provided in the notebook:

notebooks/evaluation_birdset.ipynb

The ablation study experiments are provided in the notebook:

notebooks/evaluation_ablation_study.ipynb

BEANS:

To rerun all tests for our trained models on the BEANS benchmark, please use the following notebook:

notebooks/evaluation_beans.ipynb
