When k-means++ initialisation selects data points as initial centroids,
points at those locations have upper_bound=0 in Hamerly's algorithm,
causing them to be incorrectly pruned from reassignment checks. As a
result, the algorithm could declare convergence on the first iteration
without ever computing true cluster centroids.
This fix updates centroids to be cluster means immediately after the initial
assignment, before entering the main convergence loop. This ensures
Hamerly bounds are computed against true centroids rather than the
k-means++ selected data points.
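
A minimal Python sketch of the idea, not the code from this commit: the
helper names (assign, cluster_means) are hypothetical, and it assumes the
upper bound is simply the distance from a point to its assigned centroid.
It shows that seed points start with a bound of 0, and that recomputing
the centroids as cluster means before the main loop gives bounds measured
against true centroids.

```python
import math

def assign(points, centroids):
    """Return each point's nearest-centroid index and that distance (its upper bound)."""
    labels, upper = [], []
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        j = min(range(len(centroids)), key=d.__getitem__)
        labels.append(j)
        upper.append(d[j])
    return labels, upper

def cluster_means(points, labels, centroids):
    """Mean of each cluster; an empty cluster keeps its previous centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, j in zip(points, labels):
        counts[j] += 1
        for k in range(dim):
            sums[j][k] += p[k]
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centroids[j])
        for j in range(len(centroids))
    ]

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
seeds = [points[0], points[2]]  # k-means++ picks actual data points

# Initial assignment against the seeds: the seed points get upper bound 0.
labels, seeded_upper = assign(points, seeds)

# Recentre to cluster means before the main loop, then recompute the bounds.
centroids = cluster_means(points, labels, seeds)
labels, upper = assign(points, centroids)
```

Here the two seed points have a bound of 0 after the first assignment, while
after recentring every point sits at distance 1 from its centroid.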
Document k-means local minima and use fixed seed in test
- Add documentation explaining that k-means converges to local minima and
may produce suboptimal results depending on initialisation
- Update test_kmeans_three_clusters to use a fixed seed (42) for
deterministic testing, which should avoid intermittent failures caused by
unlucky k-means++ initialisation
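
To illustrate why a fixed seed makes the test deterministic, here is a small
Python sketch of k-means++ seeding. It is an assumption-laden illustration:
kmeans_pp_init and its rng parameter are hypothetical and do not reflect how
the library under test accepts a seed.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: choose data points with probability proportional to D^2."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Three well-separated groups of points (made-up data for illustration).
points = [
    (0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
    (10.0, 10.0), (10.4, 9.8), (9.7, 10.3),
    (20.0, 0.0), (20.3, 0.5), (19.6, 0.2),
]

# The same seed yields the same initial centroids on every run.
first = kmeans_pp_init(points, 3, random.Random(42))
second = kmeans_pp_init(points, 3, random.Random(42))
```

With an unseeded generator the initial centroids vary between runs, so an
unlucky draw can lead to a different local minimum and a failing assertion.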
Signed-off-by: Stephan Hügel <[email protected]>