Expose speaker centroid embeddings on DiarizationResult by leecrossley · Pull Request #463 · argmaxinc/argmax-oss-swift

leecrossley · 2026-04-19T19:36:40Z

Summary

Adds speakerCentroidEmbeddings: [Int: [Float]] to DiarizationResult so downstream consumers can match speakers across diarization runs without re-running the embedder.

Centroids are computed inside the clusterer, not in postProcess. VBxClustering.cluster(...) returns post-reassignment centroids along all three paths (VBx weighted, kMeans correction, AHC fallback) via one centroidsFromAssignments(assignments: clusters, embeddings: all, clusterCount: kFinal) pass after clusterReassignment(...), so speakerCentroidEmbeddings[k] is the mean of the final cluster members of speaker k.
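To make the "mean of the final cluster members" definition concrete, here is a minimal sketch of a mean-pool over final assignments. It is an illustration only, not the SpeakerKit implementation: the function name `meanCentroids` is hypothetical, and the choice to omit clusters with no members from the result is an assumption rather than something the PR states.

```swift
// Hypothetical sketch of a post-reassignment mean-pool; not the SpeakerKit code.
// assignments[i] is the final cluster id of embeddings[i], expected in 0..<clusterCount.
func meanCentroids(
    assignments: [Int],
    embeddings: [[Float]],
    clusterCount: Int
) -> [Int: [Float]] {
    precondition(assignments.count == embeddings.count, "one assignment per embedding")
    guard let dimension = embeddings.first?.count else { return [:] }

    var sums = [[Float]](
        repeating: [Float](repeating: 0, count: dimension),
        count: clusterCount
    )
    var counts = [Int](repeating: 0, count: clusterCount)

    // Single O(N x D) pass: accumulate each embedding into its cluster's running sum.
    for (clusterId, embedding) in zip(assignments, embeddings) {
        guard clusterId >= 0, clusterId < clusterCount,
              embedding.count == dimension else { continue }
        for d in 0..<dimension {
            sums[clusterId][d] += embedding[d]
        }
        counts[clusterId] += 1
    }

    // Assumption: clusters with no members are left out of the map.
    var centroids: [Int: [Float]] = [:]
    for clusterId in 0..<clusterCount where counts[clusterId] > 0 {
        let n = Float(counts[clusterId])
        centroids[clusterId] = sums[clusterId].map { $0 / n }
    }
    return centroids
}
```

Because the pass runs on the final assignments, each key in the returned map lines up with a speaker id that survived reassignment.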

Closes #457.

Motivation

We ship a privacy-first recording app that runs SpeakerKit diarization over many short audio chunks and needs to link speakers to the same person across chunks. Today, the per-window embeddings used internally for clustering are thrown away before diarize(...) returns, leaving no way to correlate the cluster ids in one result with those in another without running the embedder a second time over the whole chunk.

Exposing the cluster centroids is the smallest change that makes this possible: callers get one embedding per cluster, computed from the same data the clusterer already used. https://over.show

Changes

Sources/SpeakerKit/Pyannote/SpeakerClustering.swift: add speakerCentroids: [Int: [Float]] on ClusteringResult with default [:] so non-VBx conformers stay compatible.
Sources/SpeakerKit/Pyannote/VBxClustering.swift: cluster(...) now returns (clusters, linkageMatrix, centroids); after clusterReassignment(...) it runs one extra centroidsFromAssignments(...) pass so the surfaced map is the mean of the final cluster members across all three paths.
Sources/SpeakerKit/Pyannote/PyannoteDiarizer.swift: postProcess accepts speakerCentroids and threads them into DiarizationResult; inline mean-pool loop removed.
Sources/SpeakerKit/DiarizationResult.swift: speakerCentroidEmbeddings is public private(set) var with doc comments on raw embedder space, post-reassignment mean, threshold-free distance semantics, and Pyannote-only applicability. Adds public func centroidCosineDistance(between:_:) and public func nearestSpeakerCentroid(to:) for caller-side comparison.
Sources/SpeakerKit/Pyannote/SpeakerEmbedderModel.swift: revert SpeakerEmbedding and its embedding field back to internal; nothing else in the PR needs them public now that the compute lives in the clusterer.
Tests/SpeakerKitTests/SpeakerCentroidEmbeddingsTests.swift: unit tests across both centroid producers (calculateCentroids + centroidsFromAssignments) and integration tests on VADAudio/VBxClustering, including testCentroidValuesMatchFinalAssignmentMean which pins the surfaced value equals the mean of the final cluster members.
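
For readers who want to see the distance semantics spelled out, the following is a rough sketch of a cosine distance with the behaviour described elsewhere in this thread: nil for empty or mismatched vectors, a result clamped to [0, 2], and a sentinel of 1.0 for zero-magnitude input. The free function `cosineDistance` below is a stand-in written for illustration; it is not `MathOps.cosineDistance` and does not use vDSP.

```swift
// Illustrative stand-in, not MathOps.cosineDistance.
// Returns nil when either vector is empty or the dimensions differ.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float? {
    guard !a.isEmpty, a.count == b.count else { return nil }

    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }

    let denominator = normA.squareRoot() * normB.squareRoot()
    // Zero-magnitude input has no direction; fall back to the 1.0 sentinel.
    guard denominator > 0 else { return 1.0 }

    // Clamp so rounding error never produces a value outside [0, 2].
    return min(max(1 - dot / denominator, 0), 2)
}
```

Identical directions give 0, orthogonal vectors give 1, and opposite directions give 2, so a smaller value means a closer match.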

Cost

The runtime addition is one final O(N x D) mean-pool on final assignments inside VBxClustering.cluster(...) (N embeddings, D=192). No performance harness is included in this PR.

Test plan

swift test --filter SpeakerCentroidEmbeddingsTests: 14 tests, 0 failures, 1 skip (testCentroidCosineDistance_sameDiarization skips on bundled single-speaker fixtures; the helper is separately covered by unit tests).
swift test --filter SpeakerKitTests: 109 tests, 0 failures, 1 skip.
git diff --check: clean.

make SpeakerEmbedding and its embedding field public, compute cluster centroid vectors in postProcess() before discarding raw embeddings, and surface them via DiarizationResult.speakerCentroidEmbeddings.

ZachNagengast · 2026-04-20T19:34:11Z

Thanks for the PR. Could you please include tests for this? I'm particularly interested in any latency overhead this adds as well. Also, the link to your downstream consumer may point to a private repo; it resolves to a 404.

a2they

Please add a unit test to make sure the correct centroid is properly set.

- revert public exposure of SpeakerEmbedding; nothing in the PR needs it now that centroid compute lives in the clusterer.
- surface speakerCentroids on ClusteringResult.
- VBxClustering.cluster(...) returns post-reassignment centroids via one centroidsFromAssignments(assignments: clusters, embeddings: all, k: kFinal) pass run after clusterReassignment(...), unifying all three paths (VBx weighted, kMeans correction, AHC fallback) on a single "mean of the final cluster members" definition. one extra O(N x D) mean-pool on the final assignments.
- PyannoteDiarizer.postProcess threads ClusteringResult.speakerCentroids into DiarizationResult; removes the inline mean-pool loop.
- DiarizationResult.speakerCentroidEmbeddings is now public private(set) var with doc comments covering embedding space (raw embedder output, unnormalised, pre-PLDA), post-reassignment mean, suggested comparison, and pyannote-only applicability.
- add DiarizationResult.centroidCosineDistance(between:_:) delegating to MathOps.cosineDistance(_:_:) so numerics match MathOps.cosineDistanceMatrix used by clusterReassignment (vDSP, clamped to [0, 2]). no Accelerate import in DiarizationResult.swift.
- make calculateCentroids / centroidsFromAssignments internal so they can be exercised by @testable tests.
- new SpeakerCentroidEmbeddingsTests.swift: 3 unit tests on calculateCentroids (VBx weighted path), 3 on centroidsFromAssignments (kMeans correction + AHC fallback + empty cluster), 5 integration tests on VADAudio/VBxClustering including testCentroidValuesMatchFinalAssignmentMean which pins the surfaced value equals the mean of the final members after reassignment.
- new DiarizationPipelinePerformanceTests.swift with XCTClockMetric, preload + warmup outside measure{}. uses only pre-existing public API so it compiles on main for baseline comparison.

leecrossley · 2026-04-23T01:27:05Z

thanks both, addressed + pushed.

re 404: overshow is a private commercial repo. usage:

let diarization = try await speakerKit.diarize(audioArray: audio)
let centroids = diarization.speakerCentroidEmbeddings
return SpeakerMatch(id: speakerId, embedding: centroids[speakerId])

we persist the centroid with each transcribed segment and cosine-match across chunks to keep speaker ids stable without re-running the embedder. https://over.show
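
As a rough illustration of the cross-chunk flow described above, the sketch below keeps a registry of stable speaker ids and maps each chunk-local id onto the nearest stored centroid. Everything in it is hypothetical: `SpeakerRegistry`, `resolve(chunkCentroids:)`, and especially `maxDistance` are illustration names, and the PR deliberately does not define a same-speaker cutoff, so the threshold value is the caller's own assumption. The local `cosineDistance` is a plain-Swift stand-in, not the SpeakerKit helper.

```swift
// Plain-Swift stand-in for a cosine distance; nil on empty or mismatched input.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float? {
    guard !a.isEmpty, a.count == b.count else { return nil }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? 1 - dot / denominator : 1
}

// Hypothetical caller-side registry; not part of SpeakerKit.
struct SpeakerRegistry {
    private(set) var known: [Int: [Float]] = [:]  // stable id -> stored centroid
    private var nextStableId = 0
    let maxDistance: Float                        // caller-chosen cutoff (assumption)

    init(maxDistance: Float) { self.maxDistance = maxDistance }

    // Maps each chunk-local speaker id to a stable id, registering new speakers.
    mutating func resolve(chunkCentroids: [Int: [Float]]) -> [Int: Int] {
        var mapping: [Int: Int] = [:]
        for localId in chunkCentroids.keys.sorted() {
            guard let centroid = chunkCentroids[localId] else { continue }

            var best: (stableId: Int, distance: Float)?
            for stableId in known.keys.sorted() {
                guard let stored = known[stableId],
                      let distance = cosineDistance(centroid, stored) else { continue }
                if let current = best, current.distance <= distance { continue }
                best = (stableId: stableId, distance: distance)
            }

            if let match = best, match.distance <= maxDistance {
                mapping[localId] = match.stableId
            } else {
                known[nextStableId] = centroid
                mapping[localId] = nextStableId
                nextStableId += 1
            }
        }
        return mapping
    }
}
```

This simple version never updates a stored centroid and does not stop two local speakers in one chunk from mapping to the same stable id; a real consumer would likely want to handle both.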

re latency: centroid map now surfaces from ClusteringResult instead of being recomputed in postProcess. VBxClustering.cluster(...) returns post-reassignment centroids along all three paths (VBx weighted, kMeans correction, AHC fallback) via one centroidsFromAssignments(assignments: clusters, embeddings: all, k: kFinal) pass after clusterReassignment(...). That's one extra O(N x D) mean-pool on the final assignments (N embeddings, D=192) - small next to the model pipeline, and the measured delta below confirms it's inside run-to-run noise.

VADAudio, XCTClockMetric, 20 iters, preload + warmup outside measure, same machine:

branch	mean (ms)	RSD
main	305.7	0.995%
pr	306.3	0.772%
delta	+0.6 ms (+0.2%)

inside run-to-run noise (delta smaller than either rsd).
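
The arithmetic behind that statement can be checked directly from the table. The snippet below is only a worked calculation on the reported figures; it converts each RSD into milliseconds (roughly 3.0 ms on main and 2.4 ms on the PR branch) and compares the 0.6 ms delta against them. The variable names are illustrative.

```swift
// Figures copied from the table above.
let mainMeanMs = 305.7
let prMeanMs = 306.3
let mainRSDPercent = 0.995
let prRSDPercent = 0.772

// Absolute and relative difference between the two branch means.
let deltaMs = prMeanMs - mainMeanMs
let deltaPercent = deltaMs / mainMeanMs * 100

// RSD is the standard deviation as a percentage of the mean, so convert back to ms.
let mainStdDevMs = mainMeanMs * mainRSDPercent / 100
let prStdDevMs = prMeanMs * prRSDPercent / 100

// The delta counts as "inside noise" here if it is below both standard deviations.
let insideNoise = abs(deltaMs) < min(mainStdDevMs, prStdDevMs)
print(deltaMs, deltaPercent, mainStdDevMs, prStdDevMs, insideNoise)
```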

re tests: new SpeakerCentroidEmbeddingsTests.swift - 6 unit across both centroid producers, 5 integration (VADAudio + VBxClustering) incl. a post-reassignment centroid-value regression that pins the surfaced value equals the mean of the final cluster members. full make test green on the branch after make download-speakerkit-models.

happy to rebase/squash.

leecrossley · 2026-04-23T01:31:10Z

Please add a unit test to make sure the correct centroid is properly set.

Added. unit coverage on both producers: calculateCentroids (main VBx path) and centroidsFromAssignments (kMeans correction + AHC fallback). integration testCentroidKeysSurviveClusterReassignment pins that every speaker id visible in final segments has a centroid after reassignment, and testCentroidValuesMatchFinalAssignmentMean pins the centroid VALUE against the mean of final members - so "correct centroid is properly set" holds post-reassignment, not just pre-reassignment.

leecrossley · 2026-04-24T19:19:57Z

Thanks again for the careful review. My project (Overshow) is a private desktop app; in the Swift helper we run SpeakerKit locally alongside WhisperKit. Per chunk: one diarise pass, each transcript segment gets the best time-overlap speaker, and the corresponding centroid from speakerCentroidEmbeddings is persisted alongside the segment. Cosine distance on those centroids is used downstream (outside this helper) for cross-chunk and cross-session speaker reuse. We need to ensure that the returned centroid matches the final post-reassignment speakerId.
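
The "best time-overlap speaker" step might look something like the sketch below. It is a guess at the shape of the logic, not code from Overshow or SpeakerKit: `SpeakerSegment` and `bestOverlapSpeaker` are hypothetical names, times are assumed to be in seconds, and resolving ties to the lowest speaker id is an assumption added for determinism.

```swift
// Hypothetical diarization segment; times assumed to be in seconds.
struct SpeakerSegment {
    let speakerId: Int
    let start: Double
    let end: Double
}

// Returns the speaker whose segments overlap [start, end] the most, or nil if none do.
func bestOverlapSpeaker(
    start: Double,
    end: Double,
    in segments: [SpeakerSegment]
) -> Int? {
    // Total overlap per speaker, since one speaker may have several segments.
    var overlapBySpeaker: [Int: Double] = [:]
    for segment in segments {
        let overlap = min(end, segment.end) - max(start, segment.start)
        if overlap > 0 {
            overlapBySpeaker[segment.speakerId, default: 0] += overlap
        }
    }

    // Walk ids in ascending order so ties resolve to the lowest speaker id.
    var best: (speakerId: Int, overlap: Double)?
    for speakerId in overlapBySpeaker.keys.sorted() {
        guard let overlap = overlapBySpeaker[speakerId] else { continue }
        if let current = best, current.overlap >= overlap { continue }
        best = (speakerId: speakerId, overlap: overlap)
    }
    return best?.speakerId
}
```

The returned id is then the key used to look up the centroid that gets persisted with the transcript segment.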

I pushed the last review pass as one batch: dropped the unrelated perf test from the diff, added public init parity, renamed k to clusterCount, hoisted the repeated embedding map, tightened the comments/tests, clarified distance semantics w/out inventing a same-speaker cutoff, and added nearestSpeakerCentroid(to:) as the threshold-free lookup helper.

I also updated the PR body and resolved the addressed review threads after replying where useful. I'm happy to adjust anything else or squash if helpful.

a2they

Thanks for addressing the previous PR feedback. Last few notes on the changes.

a2they · 2026-04-24T21:51:34Z

+    ///   ``speakerCentroidEmbeddings``, the centroids have different dimensions, or either
+    ///   vector is empty. Zero-magnitude centroids (unreachable in real diarization runs)
+    ///   yield `MathOps.cosineDistance`'s sentinel of `1.0`.
+    public func centroidCosineDistance(between a: Int, _ b: Int) -> Float? {


nit: the unlabeled second parameter reads awkwardly at call sites (e.g. centroidCosineDistance(between: 0, 1)). Suggest renaming to between:and: (the public surface is easier to fix now).

Suggested change

-    public func centroidCosineDistance(between a: Int, _ b: Int) -> Float? {
+    public func centroidCosineDistance(between a: Int, and b: Int) -> Float? {

a2they · 2026-04-24T21:53:21Z

+    ///
+    /// - Returns: The nearest compatible centroid, or `nil` when `embedding` is empty, no
+    ///   centroid exists, or all stored centroids have different dimensions.
+    public func nearestSpeakerCentroid(to embedding: [Float]) -> (speakerId: Int, distance: Float)? {


Tie-breaking here depends on [Int: [Float]] iteration order, which isn't defined. Two centroids at the same distance can return a different speakerId between runs. For a cross-session matching helper this should be deterministic. Suggest iterating speakerCentroidEmbeddings.keys.sorted() and documenting that ties resolve to the lowest speakerId.
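
A minimal sketch of the suggested deterministic lookup, assuming a free function over a plain dictionary rather than the actual method on DiarizationResult. `nearestCentroid(to:in:)` and the local `cosineDistance` are illustration names; only the sorted-keys iteration and the lowest-id tie rule come from the comment above.

```swift
// Plain-Swift stand-in for a cosine distance; nil on empty or mismatched input.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float? {
    guard !a.isEmpty, a.count == b.count else { return nil }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? 1 - dot / denominator : 1
}

// Hypothetical free-function version of a nearest-centroid lookup.
func nearestCentroid(
    to embedding: [Float],
    in centroids: [Int: [Float]]
) -> (speakerId: Int, distance: Float)? {
    guard !embedding.isEmpty else { return nil }

    var best: (speakerId: Int, distance: Float)?
    // Sorted keys make the scan order independent of dictionary hashing.
    for speakerId in centroids.keys.sorted() {
        guard let centroid = centroids[speakerId],
              let distance = cosineDistance(embedding, centroid) else { continue }
        // Strictly-smaller wins, so an equal distance keeps the earlier (lower) id.
        if let current = best, current.distance <= distance { continue }
        best = (speakerId: speakerId, distance: distance)
    }
    return best
}
```

With the strict comparison and ascending key order, two centroids at exactly the same distance always resolve to the lower speakerId, regardless of how the dictionary happens to be laid out in a given process.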

a2they · 2026-04-24T22:09:24Z

+    ///
+    /// This field is populated by the Pyannote backend (`PyannoteDiarizer`). Other backends
+    /// conforming to `Diarizer` may leave it as `[:]` if they do not expose per-cluster centroids.
+    public private(set) var speakerCentroidEmbeddings: [Int: [Float]]


One thing I wanted to flag on the centroid calculation, curious what you think.

The surfaced speakerCentroidEmbeddings is computed over all embeddings, but every centroid used internally in VBxClustering.cluster(...) is computed over the trainable subset only (the nonOverlappedFrameRatio > minActiveRatio filter). All three seed paths (VBx, kMeans, AHC fallback) use embeddingsFloats (trainable). Only the new surfaced centroid uses allEmbeddingsFloats.

So the centroid we return isn't quite the same kind of mean the pipeline itself uses. It folds in the overlap-flagged windows that the embedder is least confident on, which tends to pull the centroid toward the mixed-speaker region of the embedder's output space. Downstream consumers doing cosine matching end up with a noisier reference point than the one clustering already trusted.

Would it be worth adding an option on PyannoteDiarizationOptions, something like centroidSource: .finalAssignment | .trainableOnly, so callers can opt into the trainable-only centroid that matches the pipeline's internal convention? What was your testing like with trainable only vs all embeddings?
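
To make the difference between the two candidate definitions easier to discuss, here is a small sketch that computes both from the same windows. It is purely illustrative: `CentroidSource`, `Window`, and `centroids(for:source:minActiveRatio:)` are hypothetical names modelled on the proposal above, not existing SpeakerKit API, and the filter simply mirrors the nonOverlappedFrameRatio > minActiveRatio condition mentioned in the comment.

```swift
// Hypothetical option mirroring the proposed centroidSource setting.
enum CentroidSource {
    case finalAssignment  // mean over every window assigned to the cluster
    case trainableOnly    // mean over windows passing the overlap filter
}

// Hypothetical per-window record; not a SpeakerKit type.
struct Window {
    let embedding: [Float]
    let clusterId: Int
    let nonOverlappedFrameRatio: Float
}

func centroids(
    for windows: [Window],
    source: CentroidSource,
    minActiveRatio: Float
) -> [Int: [Float]] {
    var sums: [Int: [Float]] = [:]
    var counts: [Int: Int] = [:]

    for window in windows {
        // Keep only windows with nonOverlappedFrameRatio > minActiveRatio when asked.
        if source == .trainableOnly,
           window.nonOverlappedFrameRatio <= minActiveRatio { continue }

        if var sum = sums[window.clusterId] {
            guard sum.count == window.embedding.count else { continue }
            for d in sum.indices { sum[d] += window.embedding[d] }
            sums[window.clusterId] = sum
        } else {
            sums[window.clusterId] = window.embedding
        }
        counts[window.clusterId, default: 0] += 1
    }

    var result: [Int: [Float]] = [:]
    for (clusterId, sum) in sums {
        let n = Float(counts[clusterId] ?? 1)
        result[clusterId] = sum.map { $0 / n }
    }
    return result
}
```

One consequence worth noting in this sketch: a speaker whose windows are all overlap-flagged has no centroid under `.trainableOnly`, so a trainable-only option would need a fallback if every speaker id in the final segments is expected to have an entry.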

expose speaker centroid embeddings in DiarizationResult

8edef1a

a2they requested changes Apr 22, 2026

Comment thread Sources/SpeakerKit/Pyannote/PyannoteDiarizer.swift Outdated

Comment thread Sources/SpeakerKit/Pyannote/SpeakerEmbedderModel.swift Outdated

Comment thread Sources/SpeakerKit/DiarizationResult.swift Outdated

leecrossley requested a review from a2they April 23, 2026 01:34

leecrossley mentioned this pull request Apr 23, 2026

Expose per-speaker embeddings in DiarizationResult #457

Open

a2they reviewed Apr 23, 2026

address centroid review nits

bb20ed8

leecrossley requested a review from a2they April 24, 2026 19:20

a2they reviewed Apr 24, 2026
