🚀 Feature
I am working on Intel (x64 CPU) devices and would like to run the VAD model with the OpenVINO Toolkit. Although the repository provides an ONNX model, I was unable to load and run it with OpenVINO, and I could not convert it to IR format with the OpenVINO Model Optimizer. I am requesting either a native OpenVINO IR (XML + BIN) version of the model or a documented conversion path for Intel deployment.
Motivation
The repo clearly states that the model supports ONNX Runtime: “If you are planning to run the VAD using solely the onnx-runtime, it will run on any other system architectures where onnx-runtime is supported.” I tested the ONNX file and got it working via ONNX Runtime, but both converting it with Model Optimizer and loading the ONNX model directly with the OpenVINO runtime failed.
Many deployment scenarios on Intel CPUs (or integrated GPUs) use OpenVINO for optimized inference, and having a first-class IR model would reduce friction for users targeting that ecosystem.
Providing the IR (or conversion instructions) would broaden the model's usability for edge/embedded use cases on Intel hardware, without users having to debug low-level compatibility issues.
Pitch
Provide a ready-to-use OpenVINO IR model (XML + BIN) for the current checkpoint of Silero VAD.
Alternatively (or additionally), provide instructions or a script for converting the ONNX model to OpenVINO IR:
- Specify which ONNX opset version the model uses (the wiki says ONNX supports opset 15/16)
- Provide any conversion flags or patches (e.g., unsupported ops, subgraphs, recurrent states) needed for OpenVINO compatibility.
- Provide a small sample code snippet for loading the IR with OpenVINO’s Inference Engine / OpenVINO Runtime.
- Optionally, mark in the release/tag that the IR is available (so users don’t have to manually convert).
Alternatives
Use ONNX Runtime on Intel devices (which works already).
Use the original PyTorch model or its JIT (TorchScript) version.
Manually perform the ONNX→IR conversion and debug it, but this takes additional time, can be brittle across version changes, and may limit reproducibility of results.