Tested versions
pyannote/speaker-diarization-3.1
System information
Ubuntu 22.04.5
Issue description
While testing embedding extraction with the pyannote embedding model, I found that whenever the audio contains only one or two spoken words (roughly 2 seconds), no embedding can be formed for it. The diarization model works properly and VAD detects the segment as speech, but I get this error: Found speech but no valid (non-NaN) speaker embeddings could be generated.
What are the possible reasons for this? Is it a limitation of the model, or can it be mitigated?
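A minimal sketch of one mitigation that could be tried, assuming (not confirmed in this issue) that the NaN embeddings come from segments shorter than the embedding model's minimum input length: tile a short waveform up to a minimum duration before computing its embedding, and drop any embedding rows that still contain NaN. The helper names `pad_to_min_duration` and `valid_embeddings` and the 3-second `MIN_DURATION_S` threshold are hypothetical and are not part of the pyannote API.

```python
import numpy as np

# Hypothetical threshold: the real minimum duration required by the
# embedding model is not stated in this issue and is an assumption here.
MIN_DURATION_S = 3.0


def pad_to_min_duration(waveform: np.ndarray, sample_rate: int,
                        min_duration_s: float = MIN_DURATION_S) -> np.ndarray:
    """Tile a waveform along its last (time) axis until it lasts min_duration_s.

    Works for mono (time,) and multi-channel (channel, time) arrays.
    Waveforms that are already long enough are returned unchanged.
    """
    num_samples = waveform.shape[-1]
    if num_samples == 0:
        raise ValueError("cannot pad an empty waveform")
    min_samples = int(round(min_duration_s * sample_rate))
    if num_samples >= min_samples:
        return waveform
    repeats = -(-min_samples // num_samples)  # ceiling division
    # np.tile with a scalar repeats along the last axis, then trim to length.
    return np.tile(waveform, repeats)[..., :min_samples]


def valid_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Boolean mask of embedding rows that contain no NaN values."""
    return ~np.isnan(embeddings).any(axis=-1)
```

Repeating the same audio changes its statistics, so embeddings computed from tiled segments may be less reliable than embeddings from genuinely longer speech; this is an experiment to try, not a confirmed fix.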
Minimal reproduction example (MRE)
No reproduction file is available; the issue was not saved as a file.