Audio clustering - how to aggregate embeddings from NNFP #2639
Unanswered · claudiolaas asked this question in Q&A
Replies: 1 comment
You can just concatenate the 10 consecutive segment embeddings to make one larger clip embedding (dim 1280). Or you can try other operators that generate a single embedding per audio input:
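A minimal sketch of the concatenation idea, with mean-pooling as an alternative for clips shorter than 10 segments. The `(10, 128)` array is a random stand-in for real NNFP output (NNFP's per-segment embedding size is assumed to be 128, consistent with 10 × 128 = 1280):

```python
import numpy as np

# Stand-in for NNFP output: one 128-d embedding per 1-second segment,
# so a 10 s clip yields a (10, 128) array.
rng = np.random.default_rng(0)
segment_embeddings = rng.standard_normal((10, 128))

# Option 1: concatenate the 10 segment embeddings into one 1280-d vector.
clip_embedding_concat = segment_embeddings.reshape(-1)
assert clip_embedding_concat.shape == (1280,)

# Option 2: mean-pool across segments, which also gives a fixed-size
# embedding when clips have fewer than 10 segments.
clip_embedding_mean = segment_embeddings.mean(axis=0)
assert clip_embedding_mean.shape == (128,)
```

Concatenation preserves temporal order but requires equal segment counts per clip; pooling trades that order information for length-invariance.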
Hi all,
I have audio clips of variable length (< 10 s). I want to create embeddings and use some form of unsupervised clustering to group the clips into buckets. I have had decent success with resemblyzer + k-means, but I thought a different approach might yield better results. I also tried speaker diarization from pyannote.audio, but that took too long.
NNFP (https://towhee.io/audio-embedding/nnfp) does give me embeddings for my clips, but it splits each clip into 1-second segments and produces an embedding for each segment. How do I aggregate those per-segment embeddings into a single embedding per clip?
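For context, here is roughly the pipeline I have in mind once the per-segment embeddings are aggregated (the embedding arrays below are random placeholders for real NNFP output, mean-pooled per clip before k-means; the cluster count is arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for variable-length per-clip segment embeddings, as NNFP
# would produce: one 128-d row per 1 s segment, clip lengths under 10 s.
rng = np.random.default_rng(0)
per_clip_segments = [
    rng.standard_normal((rng.integers(1, 10), 128)) for _ in range(20)
]

# Mean-pool each clip's segments into one fixed-size embedding per clip.
clip_embeddings = np.stack([seg.mean(axis=0) for seg in per_clip_segments])

# Cluster the pooled clip embeddings into buckets.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(clip_embeddings)
assert labels.shape == (20,)
```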
cheers