Graduation Year
2013
Document Type
Open Access Senior Thesis
Degree Name
Bachelor of Arts
Department
Mathematics
Reader 1
Deanna Needell
Reader 2
Winston Ou
Rights Information
© 2013 Morgan Mayer-Jochimsen
Abstract
Clustering is a mathematical method of data analysis that identifies trends in data by efficiently separating the data into a specified number of clusters; it is therefore widely applicable to questions about how data are interrelated. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of each cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is posed as an eigenvector problem whose solution is given by the eigenvectors of the normalized graph Laplacian. Spectral clustering forms groups so that the similarity between points in the same cluster is stronger than the similarity between points in different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Applied to 3294 individuals and 22 health-related attributes, the two methods produced different results. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while the spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We offer some possible explanations for this difference, observe characteristics of the two clustering methods, and comment on the viability of spectral clustering for healthcare data.
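To make the contrast between the two methods concrete, here is a minimal NumPy sketch, not the thesis's own code. The k-means part is Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points), seeded with a farthest-first initialization chosen here for determinism. The spectral part assumes a fully connected graph with Gaussian weights, where dissimilar points receive near-zero weight rather than being cut outright, and the Ng-Jordan-Weiss variant: form the normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}, take its k smallest eigenvectors, row-normalize them, and run k-means on the rows. The function names, the bandwidth `sigma`, and the initialization are all illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with a farthest-first initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    # Farthest-first init: one random point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep it in place if no points were assigned).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) on a Gaussian graph."""
    # Fully connected similarity graph; sigma is an assumed bandwidth.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Normalized graph Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order: keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]

    # Row-normalize the embedding, then cluster its rows with k-means.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels, _ = kmeans(U, k, seed=seed)
    return labels
```

On two well-separated synthetic blobs, for example `np.vstack([A, B])` with `A` and `B` drawn around different centers, both functions should put each blob in its own cluster. The methods diverge on data whose groups are connected but not compact around a center, which is one reason the two can disagree on a dataset like the one described above.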
Recommended Citation
Mayer-Jochimsen, Morgan, "Clustering Methods and Their Applications to Adolescent Healthcare Data" (2013). Scripps Senior Theses. 297.
https://scholarship.claremont.edu/scripps_theses/297