Audio clustering - how to aggregate embeddings from NNFP #2639
Unanswered · claudiolaas asked this question in Q&A
Replies: 1 comment
You can just concatenate the 10 consecutive segment embeddings to make one larger clip embedding (dim 1280). Or you can try other operators that generate a single embedding per audio input:
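A minimal sketch of the concatenation idea, with mean-pooling as an alternative for clips shorter than 10 segments. The `(10, 128)` array is a random stand-in for real NNFP output (NNFP's per-segment embedding size is assumed to be 128, consistent with 10 × 128 = 1280):

```python
import numpy as np

# Stand-in for NNFP output: one 128-d embedding per 1-second segment,
# so a 10 s clip yields a (10, 128) array.
rng = np.random.default_rng(0)
segment_embeddings = rng.standard_normal((10, 128))

# Option 1: concatenate the 10 segment embeddings into one 1280-d vector.
clip_embedding_concat = segment_embeddings.reshape(-1)
assert clip_embedding_concat.shape == (1280,)

# Option 2: mean-pool across segments, which also gives a fixed-size
# embedding when clips have fewer than 10 segments.
clip_embedding_mean = segment_embeddings.mean(axis=0)
assert clip_embedding_mean.shape == (128,)
```

Concatenation preserves temporal order but requires equal segment counts per clip; pooling trades that order information for length-invariance.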
Hi all,
I have audio clips of variable length (< 10 s). I want to create embeddings and use some form of unsupervised clustering to group the clips into buckets. I have had decent success with resemblyzer + k-means, but I thought a different approach might yield better results. I also tried speaker diarization from pyannote.audio, but that took too long.
NNFP (https://towhee.io/audio-embedding/nnfp) does give me embeddings for my clips, but it splits each clip into 1-second segments and produces an embedding for each segment. How do I aggregate those per-segment embeddings into a single embedding per clip?
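For context, here is roughly the pipeline I have in mind once the per-segment embeddings are aggregated (the embedding arrays below are random placeholders for real NNFP output, mean-pooled per clip before k-means; the cluster count is arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for variable-length per-clip segment embeddings, as NNFP
# would produce: one 128-d row per 1 s segment, clip lengths under 10 s.
rng = np.random.default_rng(0)
per_clip_segments = [
    rng.standard_normal((rng.integers(1, 10), 128)) for _ in range(20)
]

# Mean-pool each clip's segments into one fixed-size embedding per clip.
clip_embeddings = np.stack([seg.mean(axis=0) for seg in per_clip_segments])

# Cluster the pooled clip embeddings into buckets.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(clip_embeddings)
assert labels.shape == (20,)
```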
cheers