A Nearly Tight Analysis of Greedy k-means++
Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt
Dokumenter
- Fulltext
Indsendt manuskript, 870 KB, PDF-dokument
The famous k-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the k-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random and each of the following k — 1 centers is then always sampled proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The k-means++ algorithm is known to return Θ(log k) approximate solution in expectation.
In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for its following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in e.g. the popular Scikit-learn library [Pedregosa et al.; JMLR 2011].
We present nearly matching lower and upper bounds for the greedy k-means++: We prove that it is an O(ℓ3 log3 k)-approximation algorithm. On the other hand, we prove a lower bound of Ω(ℓ3 log3 k/ log2 (ℓ log k)). Previously, only an Ω(ℓ log k) lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020] and there was no known upper bound.
In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for its following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in e.g. the popular Scikit-learn library [Pedregosa et al.; JMLR 2011].
We present nearly matching lower and upper bounds for the greedy k-means++: We prove that it is an O(ℓ3 log3 k)-approximation algorithm. On the other hand, we prove a lower bound of Ω(ℓ3 log3 k/ log2 (ℓ log k)). Previously, only an Ω(ℓ log k) lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020] and there was no known upper bound.
Originalsprog | Engelsk |
---|---|
Titel | Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) |
Redaktører | Nikhil Bansal, Viswanath Nagarajan |
Forlag | Society for Industrial and Applied Mathematics |
Publikationsdato | 2023 |
Sider | 1012-1070 |
ISBN (Elektronisk) | 978-1-61197-755-4 |
DOI | |
Status | Udgivet - 2023 |
Begivenhed | 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA23) - Florence, Italien Varighed: 22 jan. 2023 → 25 jan. 2023 |
Konference
Konference | 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA23) |
---|---|
Land | Italien |
By | Florence |
Periode | 22/01/2023 → 25/01/2023 |
ID: 382690810