This repository contains all the code I used for the BirdCLEF+ 2025 competition on Kaggle.
My final submission placed 925th, with a 0.824 ROC-AUC score on the private test set.
The BirdCLEF+ 2025 competition focused on detecting species (birds, amphibians, mammals, and insects) from 1-minute soundscape recordings collected in El Silencio Natural Reserve, Colombia. The goal was to help ecologists monitor biodiversity using acoustic monitoring, which enables large-scale and frequent data collection. Participants were asked to train machine learning models that can identify which species are calling in short audio segments, using a small labeled dataset and a larger set of unlabeled recordings. Some of the challenges of this competition were limited compute (the final notebook had to run exclusively on CPU), extreme class imbalance, and very noisy training data drawn from a different distribution (a different recording location) than the test set.
The evaluation metric was a version of macro-averaged ROC-AUC, calculated per species and ignoring classes without any true positive examples.
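As I understand the metric, it can be sketched in plain NumPy as below. This is an illustrative re-implementation, not the official scorer; it uses the rank-based formulation of ROC-AUC and assumes no tied scores within a class:

```python
import numpy as np

def macro_roc_auc_ignore_empty(y_true, y_score):
    """Macro-averaged ROC-AUC, skipping classes with no positive labels.

    y_true:  (n_samples, n_classes) binary labels.
    y_score: (n_samples, n_classes) predicted scores.
    """
    aucs = []
    for c in range(y_true.shape[1]):
        labels = y_true[:, c]
        n_pos = int(labels.sum())
        if n_pos == 0:  # no true positives: this class is ignored
            continue
        scores = y_score[:, c]
        # Rank-based AUC (assumes no ties): rank every score, then compare
        # the rank-sum of positives against its minimum possible value.
        order = scores.argsort()
        ranks = np.empty(len(scores))
        ranks[order] = np.arange(1, len(scores) + 1)
        n_neg = len(labels) - n_pos
        auc = (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
        aucs.append(auc)
    return float(np.mean(aucs))
```

In practice `sklearn.metrics.roc_auc_score` with per-class filtering gives the same result and handles ties properly; the sketch just makes the "ignore empty classes" rule explicit.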
My final approach was based on training an EfficientNet model on mel-spectrograms. During inference, I applied several tricks, such as adjusting the power of low-rank columns to improve robustness. Many participants had similar approaches; however, I only settled on this method after thorough experiments with many alternatives.
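The mel-spectrogram front end can be sketched in plain NumPy. In the actual pipeline I would use a library such as librosa or torchaudio; the parameter values below (sample rate, FFT size, hop, mel count) are illustrative assumptions, not the ones from my training config:

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels, fmin=20.0, fmax=None):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Evenly spaced points on the mel scale, mapped back to FFT bin indices.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wave, sr=32000, n_fft=1024, hop=512, n_mels=128):
    """Log-mel spectrogram: windowed STFT power -> mel filterbank -> log."""
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-8).T  # (n_mels, n_frames), image-like for the CNN
```

The resulting 2-D array is treated as a one-channel image and fed to the EfficientNet backbone.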
Before settling on the final solution, I experimented with several other ideas, each of which is available in a separate branch:

- Black Box Shift Estimation (`Black-Box-Shift-Estimation` branch): I tried to estimate the distribution of classes in the test set using the unlabeled soundscapes.
- Domain Adversarial Training (`DANN` branch): I used a domain classifier to help the model generalize better to the distribution of the unlabeled/test data. Based on this paper.
- Fine-Grained Recognition (`Fine-Grained` branch): I trained a CNN to pick up on subtle differences in spectrograms to better distinguish between similar species. Based on this paper.
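The core of the Black Box Shift Estimation idea from the first branch can be sketched as follows: estimate the test-set label distribution by solving a linear system that relates the classifier's validation confusion matrix to its average predictions on unlabeled data. This is an illustrative re-implementation of the technique, not the code from the branch:

```python
import numpy as np

def bbse_estimate(conf_matrix, unlabeled_pred_dist):
    """Estimate the target label distribution q(y) via BBSE.

    conf_matrix[i, j]      = P(classifier predicts i | true class j),
                             estimated on held-out labeled validation data.
    unlabeled_pred_dist[i] = fraction of unlabeled examples predicted as i.
    Under label shift, conf_matrix @ q = unlabeled_pred_dist, so we solve
    for q in the least-squares sense and project back onto the simplex.
    """
    q, *_ = np.linalg.lstsq(conf_matrix, unlabeled_pred_dist, rcond=None)
    q = np.clip(q, 0.0, None)  # a distribution can't have negative entries
    return q / q.sum()
```

The estimated `q` can then be used to reweight the training loss or to rescale predicted class probabilities by the ratio of target to source priors.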
To run my final solution, do the following:

- Install all the dependencies using `requirements.txt`.
- Place the whole dataset from Kaggle in the folder named `data`.
- Run `python3 src/preprocess.py`.
- Then run `python3 src/train.py`.
- The script to run inference can be found in `src/submit.py`.