Error forming embeddings for small duration audio: Found speech but no valid (non-NaN) speaker embeddings could be generated. #1961

### Tested versions

pyannote/speaker-diarization-3.1

### System information

Ubuntu 22.04.5

### Issue description

While testing embedding extraction with the pyannote embedding model, I found that whenever I speak only one or two words (approximately 2 seconds of audio), the model is unable to form embeddings for it. The diarization model works properly, and the VAD also detects speech in the audio, but I get this error: Found speech but no valid (non-NaN) speaker embeddings could be generated.
What are the possible reasons for this? Is it a limitation of the model, or can it be mitigated?
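
To help narrow this down, here is a minimal diagnostic sketch in plain Python. It does not call pyannote at all: it assumes you have already collected the detected speech segments as `(start, end)` pairs in seconds and the embedding as a flat sequence of floats. The helper names and the `MIN_SPEECH_SECONDS` threshold are hypothetical and chosen only for illustration; they are not documented pyannote values. One possible explanation, not confirmed here, is that the actual speech inside a 2-second clip is much shorter than the clip itself, and the sketch only makes that easy to check.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical threshold in seconds; an assumption for illustration,
# not a value taken from pyannote.
MIN_SPEECH_SECONDS = 1.0


def total_speech_duration(segments: List[Tuple[float, float]]) -> float:
    """Sum the durations of (start, end) speech segments, in seconds."""
    return sum(max(0.0, end - start) for start, end in segments)


def has_valid_embedding(embedding: Iterable[float]) -> bool:
    """Return True if the embedding is non-empty and contains no NaN."""
    values = [float(v) for v in embedding]
    return len(values) > 0 and not any(math.isnan(v) for v in values)


def diagnose(
    segments: List[Tuple[float, float]],
    embedding: Iterable[float],
    min_speech: float = MIN_SPEECH_SECONDS,
) -> str:
    """Classify the outcome of one embedding attempt."""
    if has_valid_embedding(embedding):
        return "ok"
    if total_speech_duration(segments) < min_speech:
        # Speech was detected, but there may be too little of it.
        return "speech too short"
    return "invalid embedding"


if __name__ == "__main__":
    # Two short detected speech regions inside a ~2 s recording.
    segments = [(0.2, 0.5), (1.0, 1.4)]
    embedding = [float("nan")] * 4
    print(total_speech_duration(segments))
    print(diagnose(segments, embedding))
```

If the result is "speech too short" for the failing clips and "ok" for longer ones, that would suggest the amount of detected speech, rather than the clip length, is the relevant factor.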

### Minimal reproduction example (MRE)

No reproduction file is available for this issue.
