Speech command recognition based on Power-Normalized Cepstral Coefficients (PNCC), built with a hybrid CNN-GRU architecture and aimed at robustness in noisy conditions.
- Trains on TensorFlow/Keras with deterministic seeds for reproducibility.
- Includes pretrained baseline weights under `saved/model/`.
- Generates accuracy curves, confusion matrices, and classification reports for each fold.
- Goal: Recognize 10-word commands from the Speech Command Dataset V2 under clean and noisy conditions.
- Features: PNCC (default) or MFCC pickled features plus labels stored under `dataset/pickle/`.
- Model: Convolutional stack for spatial cues + GRU for temporal cues; configurable via `code/train_param.py`.
- `code/train.py` — K-fold training loop, artifact saving.
- `code/train_param.py` — Hyperparameters and dataset/paths configuration.
- `code/build_CNN_model.py` — CNN/GRU model variants.
- `code/utils.py` — Plotting and reporting utilities.
- `dataset/` — Expected location for feature and label pickles.
- `saved/` — Default output root for models, plots, and reports.
- CNN extracts spatial information; GRU captures temporal structure (e.g., ACF cues: center width, zero-lag peak, loudness, pitch).
- Spatial cues are represented by IACF features such as apparent source width and subjective diffusion.
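A minimal Keras sketch of the hybrid described above, assuming an input of 98 frames by 13 PNCC coefficients and 10 classes; the layer widths and depths are illustrative, not the exact configuration in `code/build_CNN_model.py`.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru(input_shape=(98, 13, 1), num_classes=10):
    inp = layers.Input(shape=input_shape)
    # Convolutional stack: local spatial cues in the time-coefficient plane
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the feature maps into a frame sequence for the recurrent layer
    frames, coeffs, chans = x.shape[1], x.shape[2], x.shape[3]
    x = layers.Reshape((frames, coeffs * chans))(x)
    # GRU: temporal structure across frames
    x = layers.GRU(64)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_cnn_gru()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```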
Average 3-fold cross-validation on Speech Command Dataset V2 with 10 classes:
| Model | Condition | Training Acc (%) | Testing Acc (%) |
|---|---|---|---|
| CNN | Clean | 87.91 | 92.34 |
| CNN | Noisy | 83.89 | 84.43 |
| GRU | Clean | 96.90 | 95.95 |
| GRU | Noisy | 92.96 | 89.26 |
| CNN-GRU | Clean | 97.80 | 96.48 |
| CNN-GRU | Noisy | 95.28 | 89.16 |
Notes:
- Models were trained on a mix of clean audio and audio with additive noise at random SNRs between 0 dB and 20 dB.
- The hybrid CNN-GRU generally outperforms individual CNN or GRU models; mild overfitting remains on the noisy split.
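The noise augmentation described in the notes can be sketched as follows; this is an illustrative NumPy mixing routine, not the repository's exact augmentation code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals `snr_db`, then add."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # 1 s of pseudo-audio at 16 kHz
noise = rng.standard_normal(16000)
snr_db = rng.uniform(0, 20)          # random SNR in [0, 20] dB, as in training
noisy = mix_at_snr(clean, noise, snr_db)
```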
- Environment: Python 3.8+ recommended. Create and activate a virtualenv.
```
python -m venv .venv
.venv\Scripts\activate
```
- Install dependencies (versions tested):
```
pip install tensorflow==2.8.0 librosa==0.9.2 spafe==0.3.1 scikit-learn==1.0.2 numpy==1.21.6 matplotlib pandas
```
- Prepare data:
  - Place feature pickles (PNCC or MFCC) and label pickles under `dataset/pickle/` (see examples in `code/train_param.py`).
  - Update the `dataset` and `label` paths in `code/train_param.py` to point to your chosen pickles.
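The expected pickle layout can be sketched like this; the file names and array shapes below are illustrative assumptions, so match them to whatever paths you configure in `code/train_param.py`.

```python
import pickle
import numpy as np
from pathlib import Path

out_dir = Path("dataset/pickle")
out_dir.mkdir(parents=True, exist_ok=True)

# Placeholder data: (samples, frames, coefficients) features and 10-class labels
features = np.random.rand(100, 98, 13).astype(np.float32)
labels = np.random.randint(0, 10, size=100)

with open(out_dir / "pncc_features.pkl", "wb") as f:
    pickle.dump(features, f)
with open(out_dir / "labels.pkl", "wb") as f:
    pickle.dump(labels, f)

# Loading them back, as the training script would:
with open(out_dir / "pncc_features.pkl", "rb") as f:
    X = pickle.load(f)
with open(out_dir / "labels.pkl", "rb") as f:
    y = pickle.load(f)
```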
- Configure training:
  - Adjust `epochs`, `batch_size`, `lr`, and `num_folds` in `code/train_param.py` as needed.
  - The default output root is `saved/`; ensure the path is writable.
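A `train_param.py`-style fragment might look like the following; the parameter names follow those listed above, while the values and pickle paths are illustrative assumptions.

```python
# Hypothetical training configuration; adjust values to your setup.
epochs = 50
batch_size = 32
lr = 1e-3
num_folds = 3  # matches the 3-fold cross-validation reported above

# Paths to the pickled features and labels prepared earlier (assumed names)
dataset = "dataset/pickle/pncc_features.pkl"
label = "dataset/pickle/labels.pkl"
```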
- Run training:
```
cd code
python train.py
```
  This will create models, accuracy plots, confusion matrices, classification reports, and history JSON files under `saved/single/CNN/`.
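The k-fold mechanics inside the training loop can be sketched with plain NumPy; the actual script may split differently (e.g. via scikit-learn's `KFold`), and the fixed seed here simply mirrors the deterministic-seed setup mentioned above.

```python
import numpy as np

def kfold_indices(n_samples, num_folds, seed=42):
    """Shuffle sample indices deterministically and split into num_folds parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, num_folds)

folds = kfold_indices(90, 3)
for k, test_idx in enumerate(folds, start=1):
    # All other folds form the training set for fold k
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k - 1])
    # Train on train_idx, evaluate on test_idx, then save the fold's model,
    # plots, and reports under saved/single/CNN/
```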
- Models: `saved/single/CNN/model/<title>_k*.h5`
- Accuracy curves: `saved/single/CNN/fig/accuracy_<title>_k*.png`
- Confusion matrices: `saved/single/CNN/fig/cm_<title>_k*.png`
- Training history: `saved/single/CNN/history/<title>_k*.json`
- Classification reports: `saved/single/CNN/text/clf_report_<title>_k*.csv`
- Metrics summary: `saved/single/CNN/text/<title>_eval_metrics_k*.txt`
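The per-fold history JSON files above can be aggregated after a run, for example to average the final test accuracy across folds. This is a hypothetical post-processing snippet: it assumes the histories contain a `val_accuracy` list, which may differ from the actual key names.

```python
import json
from pathlib import Path
from statistics import mean

history_dir = Path("saved/single/CNN/history")
accs = []
# One *_k*.json file per fold, following the naming pattern above
for path in sorted(history_dir.glob("*_k*.json")):
    with open(path) as f:
        hist = json.load(f)
    accs.append(hist["val_accuracy"][-1])  # final-epoch test accuracy (assumed key)

if accs:
    print(f"mean test accuracy over {len(accs)} folds: {mean(accs):.4f}")
```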
Made by Apurba Kumar Show, IIT Kharagpur.

