This project explores deep learning techniques for classifying Indian classical ragas. Unlike Western genre classification, raga classification presents unique challenges due to its modal nature, improvisational aspects, and limited dataset availability. This study aims to overcome these challenges by employing specialized feature extraction and deep learning models.
- 📂 Dataset: Curated raga dataset with vocal-only and vocal + instrumental segments.
- 🧠 Feature Extraction:
- MFCC
- Chroma CENS
- Chroma STFT
- RMSE
- Spectral Centroid
- Spectral Bandwidth
- Spectral Rolloff
- Zero Crossing Rate (ZCR)
- Pitch Mean
- Magnitude Mean
- 🤖 Deep Learning Models:
- LSTM Model
- ANN Model with NPZ Data Storage
- BERT-Based Transformer Model
- Final ANN Model without Feature Averaging
- 🛠 Tools Used:
- 🎵
librosa
for feature extraction - 🎙
Demucs
for vocal isolation - 🤖
TensorFlow/Keras
for deep learning models - 📊
NumPy
andPandas
for data handling - 📁
JSON
andNPZ
for efficient data storage
- 🎵
- 📥 Data Collection: Raga audio recordings were segmented into meaningful phrases.
- 🎙 Vocal Isolation: Used Demucs to separate vocal components.
- 📊 Feature Extraction: Stored extracted features in JSON and NPZ formats.
🏗 Model | 🎯 Accuracy |
---|---|
LSTM | 34.80% |
ANN with NPZ | 30% |
BERT-Based Transformer | 33.1% |
Final ANN (Vocal + Instrument) | 53% |
Final ANN (Vocal-Only) | 73% |
- ❌ Traditional Western classification methods yielded poor accuracy (e.g., 22% using only MFCCs).
- ✅ Expanded feature extraction improved model performance.
- 🎤 Vocal-only datasets led to the highest accuracy (73%).
- 🏆 ANN models performed better than LSTMs and Transformers.
- 🐍 Python 3.8+
- 🤖 TensorFlow/Keras
- 🎵 Librosa
- 📊 NumPy, Pandas
- 🎙 Demucs (for vocal isolation)
- 📚 Expanding the dataset with more ragas and labeled samples.
- 🏗️ Exploring hybrid deep learning models.
- 🎼 Incorporating musicological insights to refine classification techniques.
- *Karthick_M ([[email protected]])