Graduation Year

2025

Document Type

Open Access Senior Thesis

Degree Name

Bachelor of Arts

Department

Computer Science

Reader 1

Michael Izbicki

Reader 2

Winston Ou

Abstract

The k-means clustering algorithm is one of the most widely used clustering techniques in data analysis and machine learning, yet its exact computational complexity remains subject to ongoing theoretical investiga- tion. This work establishes the NP-completeness of k-means by proving (1) it is NP-hard and (2) it lies in NP. To demonstrate NP-hardness, we construct a series of polynomial-time reductions from well-known NP-complete problems. Specifically, we reduce 3sat to Vertex Cover, and then reduce Vertex Cover to k-means, thereby establishing the computational hardness of the k-means clustering problem. We then prove k-means is in NP, and thus conclude it is NP-complete. Furthermore, this work discusses the inap- proximability of k-means, showing no efficient algorithm exists for approximating k-means even within small constant factors. These findings underscore the practical limitations of heuristic approaches and reinforce the theoretical intractability of the k-means clustering algorithm.

Share

COinS