MinimalSileroVad

Overview

MinimalSileroVad is a .NET implementation for Voice Activity Detection (VAD) and speech segmentation. It uses the Silero VAD AI model to determine if audio input contains speech, providing a lightweight pipeline for detecting and segmenting speech in audio streams or files via ONNX inference. This project is designed for developers needing efficient, real-time voice detection in applications like telephony, voice assistants, or audio processing tools.

Key highlights:

- Minimalist Design: Focuses on core VAD functionality with minimal dependencies.
- AI-Powered Detection: Leverages the Silero VAD neural network model for accurate speech identification.
- ONNX-Based Inference: Utilizes the Silero VAD model exported to ONNX for cross-platform compatibility.
- Extensible: Easy to integrate into larger audio processing pipelines.

This project is ideal for building speech detection components in automated systems, transcription services, or interactive voice applications.

Features

- Voice Activity Detection: Accurately identifies speech segments in audio inputs using AI.
- Speech Segmentation: Breaks down audio into speech and non-speech parts with timestamps.
- Real-Time Processing: Supports streaming audio for live detection.
- Model Compatibility: Uses the pre-trained Silero VAD model via ONNX.
- Customizable Thresholds: Adjust sensitivity for speech detection.
- Logging Support: Includes basic logging for debugging and monitoring.
- Cross-Platform: Runs on .NET environments with GPU/CPU support.
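
To make the segmentation feature concrete, here is a minimal sketch of how per-frame speech probabilities could be turned into timestamped segments. It is not taken from this project's code: the `SpeechSegment` and `SegmentSketch` names are hypothetical, and the frame size of 512 samples at 16 kHz is an assumption that should be checked against the model version you use.

```csharp
using System;
using System.Collections.Generic;

// Demo input: hypothetical per-frame speech probabilities, one value per frame.
float[] probabilities = { 0.1f, 0.2f, 0.8f, 0.9f, 0.7f, 0.3f, 0.1f, 0.6f, 0.9f };

foreach (var segment in SegmentSketch.FromProbabilities(probabilities, threshold: 0.5f))
{
    Console.WriteLine(
        $"speech {segment.Start.TotalMilliseconds:F0} ms to {segment.End.TotalMilliseconds:F0} ms");
}

// A detected stretch of speech (illustrative type, not part of the project).
public readonly record struct SpeechSegment(TimeSpan Start, TimeSpan End);

public static class SegmentSketch
{
    // Groups consecutive frames whose probability is at or above the threshold
    // into segments and converts frame indices to timestamps.
    public static List<SpeechSegment> FromProbabilities(
        IReadOnlyList<float> probabilities,
        float threshold,
        int frameSamples = 512,  // assumption: samples per inference frame
        int sampleRate = 16000)  // assumption: 16 kHz mono input
    {
        var segments = new List<SpeechSegment>();
        double frameSeconds = (double)frameSamples / sampleRate;
        int startFrame = -1; // -1 means "not currently inside speech"

        for (int i = 0; i < probabilities.Count; i++)
        {
            bool isSpeech = probabilities[i] >= threshold;
            if (isSpeech && startFrame < 0)
            {
                startFrame = i; // speech begins at this frame
            }
            else if (!isSpeech && startFrame >= 0)
            {
                // Speech ended at the start of the current (non-speech) frame.
                segments.Add(new SpeechSegment(
                    TimeSpan.FromSeconds(startFrame * frameSeconds),
                    TimeSpan.FromSeconds(i * frameSeconds)));
                startFrame = -1;
            }
        }

        // Close a segment that is still open when the input ends.
        if (startFrame >= 0)
        {
            segments.Add(new SpeechSegment(
                TimeSpan.FromSeconds(startFrame * frameSeconds),
                TimeSpan.FromSeconds(probabilities.Count * frameSeconds)));
        }

        return segments;
    }
}
```

With the assumed 32 ms frames, the demo input yields two segments, covering 64 ms to 160 ms and 224 ms to 288 ms. Raising the threshold makes detection stricter; lowering it makes it more sensitive.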

Prerequisites

- .NET SDK (version 8.0 or higher recommended)
- ONNX Runtime (for model inference)
- Optional: CUDA Toolkit (for GPU support; use a version compatible with your ONNX Runtime build)
- Optional: cuDNN (needed alongside CUDA for GPU acceleration)
- Optional: NAudio (for microphone input in test projects)

Installation

Clone the repository:

git clone https://github.com/calebtt/MinimalSileroVad.git
cd MinimalSileroVad

Restore NuGet packages:

dotnet restore

Configure settings:
- Download the Silero VAD ONNX model if not included (e.g., from the official Silero repository).
- Place the model file (e.g., silero_vad.onnx) in the appropriate directory or update paths in code.

Build the project:

dotnet build

Usage

Run the application:

dotnet run

Process audio:
- Provide an audio file or stream as input.
- The tool will output detected speech segments with start/end timestamps.

For advanced customization:

- Modify detection thresholds in the code (e.g., the probability threshold for speech).
- Integrate into your application by calling the VAD functions.
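
As an illustration of threshold tuning in a streaming setting, the sketch below wraps per-frame probabilities in a small state machine with separate start and end thresholds plus a short hangover. Everything here is hypothetical (`SpeechGate`, `GateEvent`, and the parameter names are not the project's API); it only shows one common way such thresholds can interact.

```csharp
using System;

// Demo: feed hypothetical per-frame probabilities one at a time, as a live stream would.
var gate = new SpeechGate(startThreshold: 0.6f, endThreshold: 0.4f, hangoverFrames: 2);
float[] probabilities = { 0.2f, 0.7f, 0.5f, 0.3f, 0.65f, 0.3f, 0.2f, 0.1f };

for (int i = 0; i < probabilities.Length; i++)
{
    GateEvent e = gate.Push(probabilities[i]);
    if (e != GateEvent.None)
    {
        Console.WriteLine($"frame {i}: {e}");
    }
}

public enum GateEvent { None, SpeechStarted, SpeechEnded }

// Illustrative streaming gate (not part of the project).
public sealed class SpeechGate
{
    private readonly float _startThreshold;
    private readonly float _endThreshold;
    private readonly int _hangoverFrames;
    private bool _inSpeech;
    private int _quietFrames;

    public SpeechGate(float startThreshold, float endThreshold, int hangoverFrames)
    {
        if (endThreshold > startThreshold)
            throw new ArgumentException("endThreshold must not exceed startThreshold.");
        if (hangoverFrames < 1)
            throw new ArgumentOutOfRangeException(nameof(hangoverFrames));

        _startThreshold = startThreshold;
        _endThreshold = endThreshold;
        _hangoverFrames = hangoverFrames;
    }

    // Consumes one frame probability and reports a state change, if any.
    public GateEvent Push(float probability)
    {
        if (!_inSpeech)
        {
            // Enter speech only when the probability reaches the higher threshold.
            if (probability >= _startThreshold)
            {
                _inSpeech = true;
                _quietFrames = 0;
                return GateEvent.SpeechStarted;
            }
            return GateEvent.None;
        }

        if (probability < _endThreshold)
        {
            // Leave speech only after enough consecutive quiet frames.
            _quietFrames++;
            if (_quietFrames >= _hangoverFrames)
            {
                _inSpeech = false;
                _quietFrames = 0;
                return GateEvent.SpeechEnded;
            }
        }
        else
        {
            _quietFrames = 0; // any non-quiet frame resets the hangover counter
        }
        return GateEvent.None;
    }
}
```

In the demo, speech starts at frame 1 and ends at frame 6: the single quiet frame at index 3 is bridged because the hangover requires two consecutive frames below the end threshold. A lower start threshold makes the gate more sensitive, while a longer hangover avoids splitting speech at short pauses.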

Contributing

Contributions are welcome! Please follow these steps:

- Fork the repository.
- Create a feature branch (git checkout -b feature/YourFeature).
- Commit your changes (git commit -m 'Add YourFeature').
- Push to the branch (git push origin feature/YourFeature).
- Open a Pull Request.

Adhere to modern best practices: Use meaningful commit messages, include unit tests, and follow C# coding standards (e.g., async/await for I/O operations).

License

MIT (see LICENSE.md).

Acknowledgments

- Based on the Silero VAD model.
- Utilizes ONNX Runtime for inference.

For questions or issues, open a GitHub issue or reach out via discussions.
