Graduation Year
2025
Document Type
Open Access Senior Thesis
Degree Name
Bachelor of Arts
Department
Computer Science
Reader 1
Michael Izbicki
Reader 2
Winston Ou
Abstract
The k-means clustering algorithm is one of the most widely used clustering techniques in data analysis and machine learning, yet its exact computational complexity remains subject to ongoing theoretical investiga- tion. This work establishes the NP-completeness of k-means by proving (1) it is NP-hard and (2) it lies in NP. To demonstrate NP-hardness, we construct a series of polynomial-time reductions from well-known NP-complete problems. Specifically, we reduce 3sat to Vertex Cover, and then reduce Vertex Cover to k-means, thereby establishing the computational hardness of the k-means clustering problem. We then prove k-means is in NP, and thus conclude it is NP-complete. Furthermore, this work discusses the inap- proximability of k-means, showing no efficient algorithm exists for approximating k-means even within small constant factors. These findings underscore the practical limitations of heuristic approaches and reinforce the theoretical intractability of the k-means clustering algorithm.
Recommended Citation
Feinberg, Brooke C., "A Proof of NP-Completeness for the K-Means Clustering Algorithm" (2025). Scripps Senior Theses. 2680.
https://scholarship.claremont.edu/scripps_theses/2680