Speech command recognition based on Power-Normalized Cepstral Coefficients (PNCC), built with a hybrid CNN-GRU architecture and aimed at robustness in noisy conditions.
- Trains on TensorFlow/Keras with deterministic seeds for reproducibility.
- Includes pretrained baseline weights under `saved/model/`.
- Generates accuracy curves, confusion matrices, and classification reports for each fold.
- Goal: Recognize 10-word commands from the Speech Command Dataset V2 under clean and noisy conditions.
- Features: PNCC (default) or MFCC pickled features plus labels stored under `dataset/pickle/`.
- Model: Convolutional stack for spatial cues + GRU for temporal cues; configurable via `code/train_param.py`.
- `code/train.py` — K-fold training loop, artifact saving.
- `code/train_param.py` — Hyperparameters and dataset/paths configuration.
- `code/build_CNN_model.py` — CNN/GRU model variants.
- `code/utils.py` — Plotting and reporting utilities.
- `dataset/` — Expected location for feature and label pickles.
- `saved/` — Default output root for models, plots, and reports.
- CNN extracts spatial information; GRU captures temporal structure (e.g., ACF cues: center width, zero-lag peak, loudness, pitch).
- Spatial cues are represented by IACF features such as apparent source width and subjective diffusion.
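A minimal Keras sketch of the hybrid described above, assuming an input of 98 frames by 13 PNCC coefficients and 10 classes; the layer widths and depths are illustrative, not the exact configuration in `code/build_CNN_model.py`.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru(input_shape=(98, 13, 1), num_classes=10):
    inp = layers.Input(shape=input_shape)
    # Convolutional stack: local spatial cues in the time-coefficient plane
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the feature maps into a frame sequence for the recurrent layer
    frames, coeffs, chans = x.shape[1], x.shape[2], x.shape[3]
    x = layers.Reshape((frames, coeffs * chans))(x)
    # GRU: temporal structure across frames
    x = layers.GRU(64)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_cnn_gru()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```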
Average 3-fold cross-validation on Speech Command Dataset V2 with 10 classes:
| Model | Condition | Training Acc (%) | Testing Acc (%) |
|---|---|---|---|
| CNN | Clean | 87.91 | 92.34 |
| CNN | Noisy | 83.89 | 84.43 |
| GRU | Clean | 96.90 | 95.95 |
| GRU | Noisy | 92.96 | 89.26 |
| CNN-GRU | Clean | 97.80 | 96.48 |
| CNN-GRU | Noisy | 95.28 | 89.16 |
Notes:
- Models were trained on a mix of clean audio and audio with additive noise at random SNRs between 0 dB and 20 dB.
- The hybrid CNN-GRU generally outperforms individual CNN or GRU models; mild overfitting remains on the noisy split.
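The noise augmentation described in the notes can be sketched as follows; this is an illustrative NumPy mixing routine, not the repository's exact augmentation code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals `snr_db`, then add."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # 1 s of pseudo-audio at 16 kHz
noise = rng.standard_normal(16000)
snr_db = rng.uniform(0, 20)          # random SNR in [0, 20] dB, as in training
noisy = mix_at_snr(clean, noise, snr_db)
```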
- Environment: Python 3.8+ recommended. Create and activate a virtualenv.
```
python -m venv .venv
.venv\Scripts\activate
```
- Install dependencies (versions tested):
```
pip install tensorflow==2.8.0 librosa==0.9.2 spafe==0.3.1 scikit-learn==1.0.2 numpy==1.21.6 matplotlib pandas
```
- Prepare data:
  - Place feature pickles (PNCC or MFCC) and label pickles under `dataset/pickle/` (see examples in `code/train_param.py`).
  - Update the `dataset` and `label` paths in `code/train_param.py` to point to your chosen pickles.
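The expected pickle layout can be sketched like this; the file names and array shapes below are illustrative assumptions, so match them to whatever paths you configure in `code/train_param.py`.

```python
import pickle
import numpy as np
from pathlib import Path

out_dir = Path("dataset/pickle")
out_dir.mkdir(parents=True, exist_ok=True)

# Placeholder data: (samples, frames, coefficients) features and 10-class labels
features = np.random.rand(100, 98, 13).astype(np.float32)
labels = np.random.randint(0, 10, size=100)

with open(out_dir / "pncc_features.pkl", "wb") as f:
    pickle.dump(features, f)
with open(out_dir / "labels.pkl", "wb") as f:
    pickle.dump(labels, f)

# Loading them back, as the training script would:
with open(out_dir / "pncc_features.pkl", "rb") as f:
    X = pickle.load(f)
with open(out_dir / "labels.pkl", "rb") as f:
    y = pickle.load(f)
```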
- Configure training:
  - Adjust `epochs`, `batch_size`, `lr`, and `num_folds` in `code/train_param.py` as needed.
  - The default output root is `saved/`; ensure the path is writable.
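A `train_param.py`-style fragment might look like the following; the parameter names follow those listed above, while the values and pickle paths are illustrative assumptions.

```python
# Hypothetical training configuration; adjust values to your setup.
epochs = 50
batch_size = 32
lr = 1e-3
num_folds = 3  # matches the 3-fold cross-validation reported above

# Paths to the pickled features and labels prepared earlier (assumed names)
dataset = "dataset/pickle/pncc_features.pkl"
label = "dataset/pickle/labels.pkl"
```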
- Run training:
```
cd code
python train.py
```
  This will create models, accuracy plots, confusion matrices, classification reports, and history JSON files under `saved/single/CNN/`.
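The k-fold mechanics inside the training loop can be sketched with plain NumPy; the actual script may split differently (e.g. via scikit-learn's `KFold`), and the fixed seed here simply mirrors the deterministic-seed setup mentioned above.

```python
import numpy as np

def kfold_indices(n_samples, num_folds, seed=42):
    """Shuffle sample indices deterministically and split into num_folds parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, num_folds)

folds = kfold_indices(90, 3)
for k, test_idx in enumerate(folds, start=1):
    # All other folds form the training set for fold k
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k - 1])
    # Train on train_idx, evaluate on test_idx, then save the fold's model,
    # plots, and reports under saved/single/CNN/
```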
- Models: `saved/single/CNN/model/<title>_k*.h5`
- Accuracy curves: `saved/single/CNN/fig/accuracy_<title>_k*.png`
- Confusion matrices: `saved/single/CNN/fig/cm_<title>_k*.png`
- Training history: `saved/single/CNN/history/<title>_k*.json`
- Classification reports: `saved/single/CNN/text/clf_report_<title>_k*.csv`
- Metrics summary: `saved/single/CNN/text/<title>_eval_metrics_k*.txt`
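The per-fold history JSON files above can be aggregated after a run, for example to average the final test accuracy across folds. This is a hypothetical post-processing snippet: it assumes the histories contain a `val_accuracy` list, which may differ from the actual key names.

```python
import json
from pathlib import Path
from statistics import mean

history_dir = Path("saved/single/CNN/history")
accs = []
# One *_k*.json file per fold, following the naming pattern above
for path in sorted(history_dir.glob("*_k*.json")):
    with open(path) as f:
        hist = json.load(f)
    accs.append(hist["val_accuracy"][-1])  # final-epoch test accuracy (assumed key)

if accs:
    print(f"mean test accuracy over {len(accs)} folds: {mean(accs):.4f}")
```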
Made by Apurba Kumar Show, IIT Kharagpur.

