TinyVoice

TinyVoice Keyword Spotting - CENG 481 Artificial Neural Networks Term Project

A voice-based keyword spotting (KWS) system optimized for microcontrollers (Edge AI). This project trains and compares four deep learning models on the Google Speech Commands dataset and prepares the best one (DS-CNN) for deployment on microcontrollers such as the ESP32.

Team Members

| Name | GitHub |
| --- | --- |
| Umut Eray Açıkgöz | @ackgz0 |
| Arda Yıldız | @29ardayildiz |
| Barkın Sarıkartal | @barkinsarikartal |

Model Comparison

| Model | Description | Parameters |
| --- | --- | --- |
| MLP | Baseline Multi-Layer Perceptron | High |
| CNN | Standard Convolutional Neural Network | Medium |
| DS-CNN | Depthwise Separable CNN (Best) | Low |
| CRNN | Convolutional Recurrent Neural Network | Medium |

DS-CNN is the standard architecture for keyword spotting on microcontrollers: it achieves performance similar to a standard CNN while using significantly fewer parameters.
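The parameter savings come from factoring each convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mix. A quick back-of-the-envelope comparison (the layer size below is illustrative, not taken from the project's model_builder.py):

```python
def conv_params(k, c_in, c_out):
    # standard conv: every output channel gets a full k x k x c_in kernel (+ bias)
    return k * k * c_in * c_out + c_out

def ds_conv_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel (+ bias)
    # pointwise: 1 x 1 conv that mixes channels (+ bias)
    return (k * k * c_in + c_in) + (c_in * c_out + c_out)

# example: a 3x3 conv layer with 64 input and 64 output channels
std = conv_params(3, 64, 64)     # 36928
ds = ds_conv_params(3, 64, 64)   # 4800
print(std, ds, round(std / ds, 1))  # roughly a 7.7x reduction
```

The saving grows with channel count, which is why stacking several such blocks keeps the DS-CNN's total parameter count low.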

Project Structure

tinyVoiceKWS/
├── download_data.py          # Dataset download script
├── requirements.txt          # Python dependencies
├── src/
│   ├── prepare_dataset.py    # Dataset preprocessing (MFCC extraction)
│   ├── model_builder.py      # 4 model architecture definitions
│   ├── train_comparison.py   # Comparative model training
│   ├── inference.py          # Model inference/prediction
│   ├── convert_to_tflite.py  # TFLite & C header conversion
│   ├── test_robustness.py    # Noise robustness tests
│   ├── visualize_saliency.py # Saliency map visualization
│   └── visualize_tsne.py     # t-SNE analysis
├── notebooks/
│   ├── 1_Data_Exploration.ipynb
│   └── 2_Model_Training.ipynb
├── MICROPHONE_ESP32/         # ESP32 microphone module code
└── ACTUATOR_ESP32/           # ESP32 actuator module code

Usage Guide

Step 1: Install Dependencies

pip install -r requirements.txt

Step 2: Download Dataset

Download the Google Speech Commands dataset:

python download_data.py

Step 3: Prepare Dataset

Extract MFCC features from audio files:

python src/prepare_dataset.py
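For reference, the classic MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT-II) can be sketched in plain NumPy. The parameter values below (16 kHz audio, 512-point FFT, 160-sample hop, 40 mel bands, 10 coefficients) are illustrative assumptions, not necessarily the settings used in prepare_dataset.py:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=10):
    """Minimal MFCC: frame -> window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    # slice the waveform into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # triangular mel filterbank, equally spaced on the mel scale
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the filterbank energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T

# one second of dummy audio at 16 kHz -> (frames, n_mfcc) feature matrix
feats = mfcc(np.random.randn(16000))
print(feats.shape)  # (97, 10)
```

Each one-second clip thus becomes a small 2-D feature matrix, which is what the models in the next step consume.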

Step 4: Train Models

Train all 4 models (MLP, CNN, DS-CNN, CRNN) with comparative analysis:

python src/train_comparison.py

After training, the best model (DS-CNN) will be saved to the models/ directory.

Step 5: Inference

To make predictions with the trained model, edit the model path in src/inference.py:

# Change the model path in inference.py to your desired model
MODEL_PATH = "models/dscnn_model.h5"  # or mlp, cnn, crnn

Then run:

python src/inference.py
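Whichever model you select, the prediction step reduces to: run the model, softmax the raw scores, and accept the top keyword only if it is confident enough. A minimal post-processing sketch (the label list and threshold here are illustrative assumptions, not taken from inference.py):

```python
import numpy as np

# hypothetical label set -- the real list comes from the dataset preparation step
LABELS = ["yes", "no", "up", "down", "left", "right", "_unknown_"]

def decode_prediction(logits, labels=LABELS, threshold=0.6):
    """Softmax the raw model output; fall back to _unknown_ below the confidence threshold."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return "_unknown_", float(probs[best])
    return labels[best], float(probs[best])

word, conf = decode_prediction(np.array([0.1, 4.0, 0.2, 0.0, 0.3, 0.1, 0.2]))
print(word, round(conf, 2))  # "no" wins with high confidence
```

A rejection threshold like this matters on-device, where most audio frames contain no keyword at all.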

Step 6: Convert to TFLite for Edge AI (Optional)

Convert the model to TensorFlow Lite format and C header file for microcontroller deployment (ESP32, etc.):

python src/convert_to_tflite.py

Note: This step is experimental and may encounter issues in some cases.

Include the generated .h header file in your ESP32 project to run the model on the microcontroller.
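The TFLite-to-C-header step is essentially what `xxd -i` does: dump the flatbuffer bytes as a C byte array that the firmware can compile in. A hypothetical stand-alone version (the array name and formatting are assumptions; convert_to_tflite.py may differ):

```python
def tflite_to_c_header(model_bytes, var_name="g_kws_model"):
    """Render raw .tflite bytes as a C header, xxd-style, for tflite-micro projects."""
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(model_bytes), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in model_bytes[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(model_bytes)};")
    return "\n".join(lines)

# usage: header = tflite_to_c_header(open("models/dscnn_model.tflite", "rb").read())
print(tflite_to_c_header(b"\x1c\x00TFL3", "g_demo"))
```

The resulting array lives in flash on the ESP32, so the model never needs a filesystem.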

Analysis & Visualization

# Noise robustness test
python src/test_robustness.py

# Saliency map visualization
python src/visualize_saliency.py

# t-SNE feature analysis
python src/visualize_tsne.py
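A noise-robustness test typically mixes recorded noise into clean clips at controlled signal-to-noise ratios and re-measures accuracy at each SNR. A sketch of the SNR mixing step (the SNR value and noise source are assumptions; see test_robustness.py for the project's actual protocol):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-vs-noise power ratio equals `snr_db`, then mix."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
noisy = mix_at_snr(clean, rng.standard_normal(16000), snr_db=10)
# verify: the measured SNR of the mix should match the requested 10 dB
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 1))
```

Sweeping `snr_db` from high to low (e.g. 20 dB down to 0 dB) then traces out how gracefully each model degrades.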

ESP32 Integration

  • MICROPHONE_ESP32/: ESP32 code for the microphone module
  • ACTUATOR_ESP32/: ESP32 code for actuator control

Results

Model Comparison

Model Architectures
