HumAware-VAD: Humming-Aware Voice Activity Detection

Overview

HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which produces inaccurate speech segments. HumAware-VAD addresses this by fine-tuning on a custom dataset, HumSpeechBlend, so that humming is less likely to be reported as speech.

Demo

demo2.mp4

🎯 Purpose

The primary goal of HumAware-VAD is to:

  • Reduce false positives where humming is mistakenly detected as speech.
  • Enhance speech segmentation accuracy in real-world applications.
  • Improve VAD performance for tasks involving music, background noise, and vocal sounds.
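
The first goal can be made concrete with a simple frame-level metric: of all frames that contain humming, what fraction did the detector label as speech? The sketch below is illustrative only. The function name `humming_false_positive_rate` and the per-frame labels are hypothetical toy values, not measurements of HumAware-VAD or Silero-VAD.

```python
def humming_false_positive_rate(predicted_speech, is_humming):
    """Fraction of humming frames that a VAD labelled as speech."""
    if len(predicted_speech) != len(is_humming):
        raise ValueError("inputs must have one entry per frame")
    humming_total = sum(is_humming)
    if humming_total == 0:
        return 0.0  # no humming frames, so no humming false positives
    false_positives = sum(
        1 for pred, hum in zip(predicted_speech, is_humming) if pred and hum
    )
    return false_positives / humming_total


# Toy labels for six frames (made up for illustration, not real results).
is_humming = [True, True, True, True, False, False]
detector_a = [True, True, True, False, True, True]    # fires on most humming frames
detector_b = [False, False, True, False, True, True]  # fires on one humming frame

rate_a = humming_false_positive_rate(detector_a, is_humming)
rate_b = humming_false_positive_rate(detector_b, is_humming)
```

A lower value means fewer humming frames are mistaken for speech. The metric says nothing about missed speech, so it should be read alongside a recall figure on real speech frames.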

🗂️ Model Details

  • Base Model: Silero-VAD
  • Fine-tuning Dataset: HumSpeechBlend
  • Format: JIT (TorchScript)
  • Framework: PyTorch
  • Inference Speed: Real-time
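
Because the model ships as a TorchScript file, it can in principle be loaded with `torch.jit.load` without the original training code. The following is a minimal sketch under several assumptions that this README does not confirm: that the checkpoint keeps the upstream Silero-VAD call convention `model(chunk, sample_rate)`, that it expects 16 kHz mono audio in 512-sample windows, and that `model_path` points to a local copy of the checkpoint. The helper names are hypothetical.

```python
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, as with upstream Silero-VAD
CHUNK_SAMPLES = 512   # assumption: window size of recent Silero-VAD releases at 16 kHz


def load_vad(model_path):
    """Load a TorchScript VAD checkpoint from a (hypothetical) local path."""
    import torch  # imported lazily so iter_chunks works without PyTorch

    model = torch.jit.load(str(model_path), map_location="cpu")
    model.eval()
    return model


def iter_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Yield consecutive fixed-size windows, dropping a trailing partial window."""
    for start in range(0, len(samples) - chunk_samples + 1, chunk_samples):
        yield samples[start:start + chunk_samples]


def speech_probabilities(model, waveform):
    """Return one speech probability per window of a 1-D float tensor.

    Assumes the model returns a single-element tensor per call, as the
    upstream Silero-VAD TorchScript model does.
    """
    import torch

    with torch.no_grad():
        return [model(chunk, SAMPLE_RATE).item() for chunk in iter_chunks(waveform)]
```

Given a waveform tensor, `speech_probabilities(load_vad(model_path), waveform)` would yield one score per 32 ms window under these assumptions; thresholding those scores is left to the caller.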

🚀 Using HumAware-VAD with FastRTC

You can integrate HumAware-VAD with FastRTC for real-time voice activity detection in streaming applications.

Installation

pip install humaware-vad

Clone this repository

git clone https://github.com/CuriousMonkey7/HumAwareVad.git
cd HumAwareVad

Run the script:

python app.py

⚠️ Limitations

  • The model may fail to detect speech if the user speaks too softly.
  • Works best for detecting "mhm" humming sounds.
  • May also work for sounds like "la la la" or "da da da", but with varying accuracy.

📄 Citation

If you use this model, please cite it accordingly.

@misc{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}