HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, leading to inaccurate speech segmentation. HumAware-VAD improves upon this by leveraging a custom dataset (HumSpeechBlend) to enhance speech detection accuracy in the presence of humming.
The primary goals of HumAware-VAD are to:
- Reduce false positives where humming is mistakenly detected as speech.
- Enhance speech segmentation accuracy in real-world applications.
- Improve VAD performance for tasks involving music, background noise, and vocal sounds.
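
The first goal can be made concrete with a frame-level metric. The sketch below is illustrative only: it assumes each audio frame carries a hypothetical ground-truth label (`"speech"`, `"hum"`, or `"silence"`) and a binary VAD decision, and it reports the fraction of hum frames flagged as speech. The label names and the `hum_false_positive_rate` helper are assumptions made for this example, not part of HumAware-VAD or HumSpeechBlend.

```python
from typing import Sequence


def hum_false_positive_rate(labels: Sequence[str], decisions: Sequence[bool]) -> float:
    """Fraction of frames labeled "hum" that the VAD flagged as speech.

    `labels` holds hypothetical ground-truth tags per frame and `decisions`
    holds the VAD output per frame (True means "speech detected").
    """
    if len(labels) != len(decisions):
        raise ValueError("labels and decisions must have the same length")
    hum_decisions = [d for label, d in zip(labels, decisions) if label == "hum"]
    if not hum_decisions:
        return 0.0  # no hum frames, so nothing to misclassify
    return sum(hum_decisions) / len(hum_decisions)


# Toy data, made up for illustration (not measured results).
labels = ["speech", "hum", "hum", "silence", "hum", "hum"]
baseline = [True, True, True, False, True, False]     # fires on most hum frames
hum_aware = [True, False, False, False, True, False]  # rejects most hum frames

print(hum_false_positive_rate(labels, baseline))   # 0.75
print(hum_false_positive_rate(labels, hum_aware))  # 0.25
```

A lower value on hum frames, with speech recall held steady, is the behavior the fine-tuning aims for.
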

Model details:
- Base Model: Silero-VAD
- Fine-tuning Dataset: HumSpeechBlend
- Format: JIT (TorchScript)
- Framework: PyTorch
- Inference Speed: Real-time
You can integrate HumAware-VAD with FastRTC for real-time voice activity detection in streaming applications.
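
To illustrate why humming matters in a streaming setup, here is a minimal sketch of pause detection over per-chunk speech probabilities. It is not FastRTC's implementation; `detect_turn_end`, the 0.5 threshold, and the chunk counts are hypothetical choices for illustration. A turn ends once speech has started and is followed by a run of non-speech chunks, so a VAD that scores humming as speech would keep postponing that point.

```python
from typing import Iterable, Optional


def detect_turn_end(
    probs: Iterable[float],
    threshold: float = 0.5,
    min_speech_chunks: int = 2,
    pause_chunks: int = 3,
) -> Optional[int]:
    """Return the index of the chunk at which the turn is considered over.

    The turn ends when at least `min_speech_chunks` chunks have scored at or
    above `threshold` and `pause_chunks` consecutive chunks then fall below it.
    Returns None if no such pause occurs.
    """
    speech_count = 0
    silence_run = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            speech_count += 1
            silence_run = 0  # any speech chunk resets the pause counter
        else:
            silence_run += 1
            if speech_count >= min_speech_chunks and silence_run >= pause_chunks:
                return i
    return None


# Toy probabilities, made up for illustration (not model output).
speech_then_pause = [0.1, 0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.1]
speech_then_hum_as_speech = [0.1, 0.9, 0.8, 0.9, 0.7, 0.8, 0.7, 0.1]

print(detect_turn_end(speech_then_pause))          # 6
print(detect_turn_end(speech_then_hum_as_speech))  # None
```

In the second sequence the trailing hum is scored as speech, so the pause is never registered; lower hum scores would let the turn end as in the first sequence.
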
Install the package:

    pip install humaware-vad

Clone the repository and change into it:

    git clone https://github.com/CuriousMonkey7/HumAwareVad.git
    cd HumAwareVad

Run the script:

    python app.py

Limitations:
- The model may miss speech if the user speaks too softly.
- Works best for detecting "mhm" humming sounds.
- May also work for sounds like "la la la" or "da da da", but with varying accuracy.
If you use this model, please cite it accordingly.
@model{HumAwareVAD2025,
author = {Sourabh Saini},
title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}