Fix Kmeans cluster updates issue by CanYing0913 · Pull Request #13 · jokofa/torch_kmeans

CanYing0913 · 2025-03-03T08:38:45Z

As this Stack Overflow answer suggested, the current group_by_label_mean function does not handle clusters that have zero data points assigned to them. Entire rows of M can then be 0, which leads to NaN values when calling F.normalize(), and these propagate to all centers.
Fixed by creating masks for those empty clusters. The current solution keeps those centers at their values from before the current iteration. We can also set them to 0s if that is more mathematically sound.
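
The masking idea can be sketched roughly as follows. This is not the code from this PR: the function name masked_center_update, its signature, and the unbatched (n, d) input shape are assumptions made for illustration, whereas the library function works on batched tensors. Empty clusters are detected from the per-cluster counts and keep their previous centers, so no division by a zero count reaches the result.

```python
import torch
import torch.nn.functional as F


def masked_center_update(x, labels, old_centers):
    """Hypothetical helper: per-cluster means, with empty clusters left unchanged.

    x:           (n, d) float tensor of data points
    labels:      (n,) int64 tensor of cluster assignments in [0, k)
    old_centers: (k, d) float tensor of centers from the previous iteration
    """
    k = old_centers.shape[0]
    # One-hot assignment matrix, shape (n, k).
    one_hot = F.one_hot(labels, num_classes=k).to(x.dtype)
    counts = one_hot.sum(dim=0)   # (k,) number of points per cluster
    sums = one_hot.T @ x          # (k, d) sum of points per cluster
    empty = counts == 0           # (k,) mask of clusters with no points
    # Clamp the counts so empty clusters divide by 1 instead of 0.
    means = sums / counts.clamp(min=1).unsqueeze(1)
    # Keep the previous center wherever the cluster is empty.
    return torch.where(empty.unsqueeze(1), old_centers, means)


x = torch.tensor([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
labels = torch.tensor([0, 0, 2])  # cluster 1 receives no points
old_centers = torch.tensor([[9.0, 9.0], [5.0, 5.0], [7.0, 7.0]])
new_centers = masked_center_update(x, labels, old_centers)
```

In this toy input, cluster 1 is empty and keeps its old center, while clusters 0 and 2 move to the mean of their assigned points. Replacing old_centers with torch.zeros_like(old_centers) in the final torch.where call would give the "set them to 0s" alternative mentioned above.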


Fix Kmeans cluster updates when any cluster has no datapoint assigned during iteration

9b0990c

CanYing0913 mentioned this pull request Mar 3, 2025

Bad initialization/no zero cluster handling will lead to low utilization and eventually low accuracy [KMeans seems to be broken] #12

Closed

CanYing0913 wants to merge 1 commit into jokofa:master from CanYing0913:master