k-means++: The advantages of careful seeding

Contributions

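A new initialization (seeding) method for k-means that improves both the accuracy and the speed of the algorithm. The paper also proves that this seeding is O(log k)-competitive with the optimal clustering in expectation.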

Algorithm (Initialization of k-means)

Choose an initial center uniformly at random from the dataset X.
Choose the next center c = x' from X with probability $D(x')^2 / \sum_{x \in X} D(x)^2$, where D(x) is the distance between x and the closest center that has already been chosen.
Repeat the previous step until a total of k centers have been chosen.
Continue with the standard k-means algorithm.
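
The seeding steps above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the names `squared_distance` and `kmeans_pp_seed` are hypothetical, points are assumed to be tuples of floats, and the uniform fallback for the degenerate case where every point coincides with a chosen center is an assumption added here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k initial centers from `points` using D^2-weighted sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    # First center: chosen uniformly at random from the dataset.
    centers = [rng.choice(points)]
    # d2[i] holds D(x_i)^2, the squared distance to the nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if all points coincide with a center, sample uniformly.
            new_center = rng.choice(points)
        else:
            # Sample x' with probability D(x')^2 / sum over x of D(x)^2.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # Refresh each D(x)^2 against the newly added center.
        d2 = [min(old, squared_distance(p, new_center))
              for p, old in zip(points, d2)]
    return centers
```

A call such as `kmeans_pp_seed(points, 3, random.Random(0))` returns three points from the dataset, which would then serve as the starting centers for the usual assign-and-update iterations. Keeping a running list of squared distances means each new center costs one pass over the data.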

TL;DR

The above initialization algorithm makes intuitive sense. Points that are far from their closest already-chosen center are more likely to be picked as the next center, so the k initial centers tend to be spread out and distant from each other. As a result, k-means tends to converge faster and is more likely to stop at a better local minimum.
scikit-learn has a good implementation of this algorithm; it is the default initialization (`init="k-means++"`) of `sklearn.cluster.KMeans`.
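
As a usage sketch for the scikit-learn implementation, the snippet below fits `KMeans` with k-means++ seeding on synthetic data. The two Gaussian blobs, the seed values, and `n_init=10` are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs (illustrative data only).
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# init="k-means++" requests D^2-weighted seeding; it is also the default.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # one row per fitted center
print(km.inertia_)          # sum of squared distances to the closest center
```

Passing `init="random"` instead gives plain uniform seeding, which makes it easy to compare `inertia_` and `n_iter_` between the two initializations on the same data.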

Reference

Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007: 1027-1035.
