TinyVoice Keyword Spotting - CENG 481 Artificial Neural Networks Term Project
A voice-based keyword spotting (KWS) system optimized for microcontrollers (Edge AI). This project trains and compares four deep learning models on the Google Speech Commands dataset and prepares the best-performing model (DS-CNN) for deployment on microcontrollers such as the ESP32.
| Name | GitHub |
|---|---|
| Umut Eray Açıkgöz | @ackgz0 |
| Arda Yıldız | @29ardayildiz |
| Barkın Sarıkartal | @barkinsarikartal |
| Model | Description | Parameters |
|---|---|---|
| MLP | Baseline Multi-Layer Perceptron | High |
| CNN | Standard Convolutional Neural Network | Medium |
| DS-CNN | Depthwise Separable CNN (Best) | Low |
| CRNN | Convolutional Recurrent Neural Network | Medium |
DS-CNN is the standard architecture for keyword spotting on microcontrollers. It achieves similar performance to standard CNN while using significantly fewer parameters.
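The savings come from factoring each convolution into a depthwise step and a pointwise (1×1) step. A back-of-the-envelope comparison of per-layer parameter counts (bias terms ignored; the channel sizes are illustrative, not taken from this project's models):

```python
def conv_params(k, c_in, c_out):
    # standard 2-D convolution: one k x k x c_in kernel per output channel
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    # depthwise separable: one k x k kernel per input channel,
    # plus a 1x1 pointwise convolution to mix channels
    return k * k * c_in + c_in * c_out

# example: a 3x3 layer mapping 64 channels to 64 channels
std = conv_params(3, 64, 64)    # 36864 parameters
ds = ds_conv_params(3, 64, 64)  # 4672 parameters (~7.9x fewer)
```

The ratio grows with kernel size and channel count, which is why the savings are largest in the deeper layers of the network.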
```
tinyVoiceKWS/
├── download_data.py          # Dataset download script
├── requirements.txt          # Python dependencies
├── src/
│   ├── prepare_dataset.py    # Dataset preprocessing (MFCC extraction)
│   ├── model_builder.py      # 4 model architecture definitions
│   ├── train_comparison.py   # Comparative model training
│   ├── inference.py          # Model inference/prediction
│   ├── convert_to_tflite.py  # TFLite & C header conversion
│   ├── test_robustness.py    # Noise robustness tests
│   ├── visualize_saliency.py # Saliency map visualization
│   └── visualize_tsne.py     # t-SNE analysis
├── notebooks/
│   ├── 1_Data_Exploration.ipynb
│   └── 2_Model_Training.ipynb
├── MICROPHONE_ESP32/         # ESP32 microphone module code
└── ACTUATOR_ESP32/           # ESP32 actuator module code
```
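The preprocessing step (`src/prepare_dataset.py`) converts each one-second clip into an MFCC feature matrix. A minimal NumPy-only sketch of that transform follows; all parameter values (sample rate, FFT size, hop, mel filter and coefficient counts) are illustrative assumptions, not necessarily what the script uses:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=10):
    # split the signal into overlapping, windowed frames
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2       # (frames, n_fft//2+1)
    logmel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # DCT-II over the mel axis, keeping the first n_mfcc coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5)
                   * np.arange(n_mfcc)[:, None])          # (n_mfcc, n_mels)
    return logmel @ basis.T                               # (frames, n_mfcc)
```

With these assumed parameters, a one-second 16 kHz clip yields a 97×10 feature matrix, which is the kind of compact 2-D input the convolutional models consume.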
```bash
pip install -r requirements.txt
```

Download the Google Speech Commands dataset:
```bash
python download_data.py
```

Extract MFCC features from the audio files:
```bash
python src/prepare_dataset.py
```

Train all four models (MLP, CNN, DS-CNN, CRNN) with comparative analysis:
```bash
python src/train_comparison.py
```

After training, the best model (DS-CNN) is saved to the `models/` directory.
To make predictions with the trained model, edit the model path in `src/inference.py`:

```python
# Change the model path in inference.py to your desired model
MODEL_PATH = "models/dscnn_model.h5"  # or the mlp, cnn, or crnn model
```

Then run:
```bash
python src/inference.py
```

Convert the model to TensorFlow Lite format and a C header file for microcontroller deployment (ESP32, etc.):
```bash
python src/convert_to_tflite.py
```

Note: this step is experimental and may encounter issues in some cases.
Include the generated `.h` header file in your ESP32 project to run the model on the microcontroller.
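Packing the `.tflite` flatbuffer into a C header is the classic `xxd -i` trick: the model bytes are emitted as a C array that the ESP32 firmware can compile in. A minimal sketch of that step; the function and variable names here are assumptions, not the actual API of `convert_to_tflite.py`:

```python
def bytes_to_c_header(data: bytes, var_name: str = "g_kws_model") -> str:
    # emit the model bytes as a C array, 12 bytes per line
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(data)};")
    return "\n".join(lines)

# usage (hypothetical path):
# header = bytes_to_c_header(open("models/dscnn_model.tflite", "rb").read())
# open("model_data.h", "w").write(header)
```

The `_len` variable matters on the microcontroller side: the TFLite Micro interpreter is handed the raw byte array and needs its size alongside it.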
```bash
# Noise robustness test
python src/test_robustness.py

# Saliency map visualization
python src/visualize_saliency.py

# t-SNE feature analysis
python src/visualize_tsne.py
```

- `MICROPHONE_ESP32/`: ESP32 code for the microphone module
- `ACTUATOR_ESP32/`: ESP32 code for actuator control
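A noise robustness test typically mixes noise into clean clips at controlled signal-to-noise ratios and measures how accuracy degrades. A sketch of SNR-controlled mixing; this is an assumed approach, not necessarily how `test_robustness.py` implements it:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    # scale the noise so the mixture hits the requested signal-to-noise ratio
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    return clean + noise * np.sqrt(target_noise_power / (noise_power + 1e-12))
```

Sweeping `snr_db` from clean (e.g. 20 dB) down to very noisy (0 dB or below) gives the degradation curve used to compare how gracefully each model fails.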

