
Error forming embeddings for short-duration audio: Found speech but no valid (non-NaN) speaker embeddings could be generated. #1961

@Vihang257

Description


Tested versions

pyannote/speaker-diarization-3.1

System information

Ubuntu 22.04.5

Issue description

While testing embedding extraction with the pyannote embedding model, I found that whenever the audio contains only one or two spoken words (approximately 2 seconds), no embedding can be formed for it. The diarization model works properly and the VAD detects speech in the audio, but I get this error: Found speech but no valid (non-NaN) speaker embeddings could be generated.
What are the possible reasons for this? Is it a limitation of the model, or can it be mitigated?
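The issue does not include the code that raises this message. As a hedged illustration only, the condition it describes (speech was detected, yet every extracted embedding contains NaN) might be checked as in this stdlib-only sketch. The names `valid_embeddings` and `check_embeddings` are hypothetical and are not part of the pyannote API; embeddings are modeled as plain lists of floats.

```python
import math
from typing import List, Sequence


def valid_embeddings(embeddings: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only embeddings that contain no NaN values (hypothetical helper)."""
    return [e for e in embeddings if not any(math.isnan(x) for x in e)]


def check_embeddings(
    speech_found: bool, embeddings: Sequence[Sequence[float]]
) -> List[Sequence[float]]:
    """Hypothetical check mirroring the error reported in this issue.

    Raises ValueError when speech was detected but every embedding is NaN;
    otherwise returns the embeddings that survived the NaN filter.
    """
    kept = valid_embeddings(embeddings)
    if speech_found and not kept:
        raise ValueError(
            "Found speech but no valid (non-NaN) speaker embeddings could be generated."
        )
    return kept


if __name__ == "__main__":
    nan = float("nan")
    # A ~2 s clip for which every embedding came back as NaN triggers the error.
    try:
        check_embeddings(True, [[nan, nan, nan]])
    except ValueError as err:
        print(err)
```

Separating the NaN filter from the check makes it easy to log how many segments were dropped before deciding whether the clip is unusable.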

Minimal reproduction example (MRE)

No reproduction file is available; the issue was not stored in the form of a file.
