IITApurba/Robust-Speech-Command-Recognition

Robust Speech Command Recognition with PNCC + CNN-GRU

Speech command recognition built on Power-Normalized Cepstral Coefficients (PNCC) and a hybrid CNN-GRU architecture, designed for robustness under noisy conditions.

  • Trains on TensorFlow/Keras with deterministic seeds for reproducibility.
  • Includes pretrained baseline weights under saved/model/.
  • Generates accuracy curves, confusion matrices, and classification reports for each fold.

Project Overview

  • Goal: Recognize 10 spoken command words from the Speech Command Dataset V2 under clean and noisy conditions.
  • Features: PNCC (default) or MFCC pickled features plus labels stored under dataset/pickle/.
  • Model: Convolutional stack for spatial cues + GRU for temporal cues; configurable via code/train_param.py.
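Loading the pickled features for training might look like the sketch below. The file names, array shapes, and helper name are assumptions for illustration; the real paths are configured in code/train_param.py.

```python
import pickle

import numpy as np

def load_features(feature_path, label_path):
    """Load pickled feature and label arrays.

    Paths are hypothetical examples; the actual ones are set in
    code/train_param.py. Assumes features are stored as an array of
    shape (samples, time, ceps)."""
    with open(feature_path, "rb") as f:
        X = np.asarray(pickle.load(f), dtype=np.float32)
    with open(label_path, "rb") as f:
        y = np.asarray(pickle.load(f))
    # Conv2D layers expect a trailing channel axis: (samples, time, ceps, 1).
    if X.ndim == 3:
        X = X[..., np.newaxis]
    return X, y
```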

Repository Structure

  • code/train.py — K-fold training loop, artifact saving.
  • code/train_param.py — Hyperparameters and dataset/paths configuration.
  • code/build_CNN_model.py — CNN/GRU model variants.
  • code/utils.py — Plotting and reporting utilities.
  • dataset/ — Expected location for feature and label pickles.
  • saved/ — Default output root for models, plots, and reports.

Architecture

Hybrid CNN-GRU Block Diagram

  • CNN extracts spatial information; GRU captures temporal structure (e.g., ACF cues: center width, zero-lag peak, loudness, pitch).
  • Spatial cues are represented by IACF features such as apparent source width and subjective diffusion.
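The exact layer stack lives in code/build_CNN_model.py; as a rough sketch, assuming a 98-frame x 13-coefficient input (both numbers are illustrative), the hybrid could be built like this:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru(input_shape=(98, 13, 1), num_classes=10):
    """Sketch of a hybrid CNN-GRU: convolutional blocks for spatial cues,
    then a GRU over the time axis for temporal structure. The real model
    in code/build_CNN_model.py may use different filter counts and depths."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the (freq, channel) axes so the GRU sees (time, features).
    x = layers.Reshape((x.shape[1], -1))(x)
    x = layers.GRU(64)(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```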

Results

Average 3-fold cross-validation on Speech Command Dataset V2 with 10 classes:

Model     Condition   Training Acc (%)   Testing Acc (%)
CNN       Clean       87.91              92.34
CNN       Noisy       83.89              84.43
GRU       Clean       96.90              95.95
GRU       Noisy       92.96              89.26
CNN-GRU   Clean       97.80              96.48
CNN-GRU   Noisy       95.28              89.16

ROC Curves

Notes:

  • Models were trained on a mix of clean audio and audio with additive noise at random SNRs between 0 dB and 20 dB.
  • The hybrid CNN-GRU generally outperforms individual CNN or GRU models; mild overfitting remains on the noisy split.
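The additive-noise step described above can be sketched as follows; the function name and details are illustrative, not the repository's actual augmentation code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested
    signal-to-noise ratio in dB. Illustrative sketch only."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    return clean + np.sqrt(target_noise_power / noise_power) * noise

# Training used random SNRs in [0, 20] dB, e.g.:
# snr_db = np.random.uniform(0.0, 20.0)
```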

Setup Guide

  1. Environment: Python 3.8+ recommended. Create and activate a virtualenv.
    python -m venv .venv
    .venv\Scripts\activate        (Windows)
    source .venv/bin/activate     (Linux/macOS)
  2. Install dependencies (versions tested):
    pip install tensorflow==2.8.0 librosa==0.9.2 spafe==0.3.1 scikit-learn==1.0.2 numpy==1.21.6 matplotlib pandas
  3. Prepare data:
    • Place feature pickles (PNCC or MFCC) and label pickles under dataset/pickle/ (see examples in code/train_param.py).
    • Update dataset and label paths in code/train_param.py to point to your chosen pickles.
  4. Configure training:
    • Adjust epochs, batch_size, lr, and num_folds in code/train_param.py as needed.
    • The default output root is saved/; ensure the path is writable.
  5. Run training:
    cd code
    python train.py
    This will create models, accuracy plots, confusion matrices, classification reports, and history JSON files under saved/single/CNN/.
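For reference, a 3-fold split equivalent to what the training loop performs can be sketched with scikit-learn; the repository's exact splitting logic and seed value (set in code/train_param.py) may differ.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def kfold_splits(y, num_folds=3, seed=42):
    """Return class-balanced (train_idx, test_idx) pairs for each fold.
    `seed` mirrors the deterministic seeding mentioned above; the actual
    value used by the repository is an assumption here."""
    skf = StratifiedKFold(n_splits=num_folds, shuffle=True, random_state=seed)
    return list(skf.split(np.zeros(len(y)), y))
```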

Generated Artifacts

  • Models: saved/single/CNN/model/<title>_k*.h5
  • Accuracy curves: saved/single/CNN/fig/accuracy_<title>_k*.png
  • Confusion matrices: saved/single/CNN/fig/cm_<title>_k*.png
  • Training history: saved/single/CNN/history/<title>_k*.json
  • Classification reports: saved/single/CNN/text/clf_report_<title>_k*.csv
  • Metrics summary: saved/single/CNN/text/<title>_eval_metrics_k*.txt
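The history JSON files can be inspected directly. Assuming they store the Keras History.history dict (keys like "accuracy" and "val_accuracy"; this layout is an assumption), a quick summary might be:

```python
import json

def best_val_accuracy(path):
    """Return the best validation accuracy recorded in a history JSON file.
    Assumes the file holds a Keras History.history dict; falls back to the
    older 'val_acc' key if 'val_accuracy' is absent."""
    with open(path) as f:
        hist = json.load(f)
    val = hist.get("val_accuracy") or hist.get("val_acc") or []
    return max(val) if val else None
```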

Made by Apurba Kumar Show, IIT Kharagpur.
