Classify short audio clips (e.g., dog bark, bird chirp, siren, rain) with a ResNet-style CNN trained on Mel Spectrograms. The project includes a full training pipeline (PyTorch), FastAPI inference service, serverless GPU inference with Modal, and an interactive Next.js + React dashboard for uploads, real-time predictions, and feature‑map visualization.
- 🧠 Deep Audio CNN for sound classification
- 🧱 ResNet-style architecture with residual blocks
- 🎼 Mel Spectrogram audio-to-image conversion
- 🎛️ Data augmentation: Mixup + SpecAugment (Time/Freq masking)
- ⚡ Serverless GPU inference with Modal
- 📊 Interactive Next.js & React dashboard (Tailwind + shadcn/ui)
- 📈 Real-time classification with confidence scores
- 🌊 Waveform & Spectrogram visualization
- 🚀 FastAPI inference endpoint
- 📈 TensorBoard integration for training analysis
- ✅ Pydantic validation for robust API requests
- Why Mel Spectrograms? They convert raw audio into a perceptual time–frequency image that CNNs handle well (first sketch below).
- Why ResNet? Residual connections ease optimization of deeper models and boost accuracy (second sketch).
- Why Mixup/SpecAugment? They provide strong regularization for robustness against noise and domain shift (third sketch).
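A minimal sketch of the audio-to-image step with torchaudio (the file name and parameter values such as `n_mels=128` are illustrative, not the project's exact settings):

```python
import torchaudio
import torchaudio.transforms as T

# Load a clip and resample to a fixed rate.
waveform, sr = torchaudio.load("dog_bark.wav")  # hypothetical example file
waveform = T.Resample(orig_freq=sr, new_freq=22050)(waveform)

# Log-scaled Mel spectrogram: the 2D "image" the CNN consumes.
mel = T.MelSpectrogram(sample_rate=22050, n_fft=1024, hop_length=512, n_mels=128)(waveform)
log_mel = T.AmplitudeToDB()(mel)  # shape: (channels, n_mels, time_frames)
```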
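A bare-bones residual block in the same spirit as the model here (channel counts and the exact block layout are assumptions, not the repo's definition):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, plus a skip connection."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 projection so the skip path matches shape when stride/channels change.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))
```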
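SpecAugment masking ships with torchaudio, and Mixup is a few lines on top of a batch; the shapes, mask sizes, and Beta parameter below are illustrative:

```python
import torch
import torchaudio.transforms as T

# Dummy batch of log-Mel spectrograms and one-hot labels (shapes are illustrative).
specs = torch.randn(8, 128, 256)                      # (batch, n_mels, time_frames)
labels = torch.eye(10)[torch.randint(0, 10, (8,))]    # (batch, num_classes)

# SpecAugment: randomly zero out frequency bands and time spans.
spec_augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=24),
    T.TimeMasking(time_mask_param=48),
)
augmented = spec_augment(specs)

# Mixup: blend pairs of examples and their labels with a Beta-sampled weight.
def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

mixed_x, mixed_y = mixup(augmented, labels)
```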
Server:

```bash
cd server
conda create -n audio-cnn python=3.11 -y
conda activate audio-cnn
pip install -r requirements.txt
```

Client:

```bash
cd client
npm install
npm run dev
```

Create a `.env` file in your client root and point it at your deployed inference endpoint:

```
NEXT_PUBLIC_MODAL_API_ENDPOINT="<your Modal endpoint URL>"
```
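The dashboard posts audio to the endpoint configured above. A minimal sketch of what such a FastAPI endpoint with Pydantic validation can look like (the route, field names, and base64 payload format are assumptions for illustration, not the project's exact contract):

```python
import base64
import io

import torchaudio
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    audio_data: str  # base64-encoded audio bytes (assumed payload format)

class Prediction(BaseModel):
    label: str
    confidence: float

@app.post("/predict", response_model=list[Prediction])
def predict(req: InferenceRequest) -> list[Prediction]:
    # Decode the clip, build the log-Mel input, and run the pre-loaded model.
    waveform, sr = torchaudio.load(io.BytesIO(base64.b64decode(req.audio_data)))
    # ... preprocess to a log-Mel spectrogram and call the model here ...
    return [Prediction(label="dog_bark", confidence=0.93)]  # placeholder output
```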
- Torchaudio backend errors: ensure `ffmpeg`/`libsndfile` are installed.
- Noisy predictions: increase clip length, tweak the Mixup `alpha`, reduce masks.
- Overfitting: stronger Mixup/SpecAugment, Dropout in the classifier, early stopping.
- Underfitting: deeper ResNet, higher `base_channels`, longer training, lower weight decay.
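A hedged sketch of where some of these knobs sit in a typical training loop (the stand-in model, placeholder losses, and all values are assumptions, not the project's settings):

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

# Stand-in classifier head; in the real project this sits on top of the ResNet-style CNN.
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(128 * 256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
writer = SummaryWriter("runs/audio-cnn")

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    val_loss = torch.rand(1).item()  # placeholder; compute the real validation loss here
    writer.add_scalar("loss/val", val_loss, epoch)
    # Early stopping: quit once validation loss stops improving for `patience` epochs.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
writer.close()
```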
Feel free to contact me on LinkedIn.