Pomona Faculty Publications and Research

Some Computational Issues in Cluster Analysis with No A Priori Metric

Dan Coleman, Hyseq Inc.
Xiaopeng Dong, Cadence Design Systems, Inc.
Johanna S. Hardin, Pomona CollegeFollow
David M. Rocke, University of California - Davis
David L. Woodruff, University of California - Davis

Document Type

Article

Department

Mathematics (Pomona)

Publication Date

1999

Keywords

Cluster analysis, Data mining, EM algorithm, Mixture models

Abstract

Recent interest in data mining and knowledge discovery underscores the need for methods by which patterns can be discovered in data without any prior knowledge of their existence. In this paper, we explore computational methods of finding clusters of multivariate data points when there is no metric given a priori. We are given a sample, X, of n points in R^p that come from g distinct multivariate normal populations with unknown parameters each of which contributes in excess of p points. Based on the assumption that we are given the number of groups, g, and a computational budget of T seconds of computer time, the paper reviews choices for cluster finding that have been described in the literature and introduces a new method that is a structured combination of two of them. We investigate these algorithms on some real data sets and describe simulation experiments. A principal conclusion is strong support for the contention that a two-stage algorithm based on a combinatorial search followed by the EM algorithm is the best way to find clusters.

Rights Information

Terms of Use & License Information

DOI

10.1016/S0167-9473(99)00009-2

Recommended Citation

Dan Coleman, Xioapeng Dong, Johanna Hardin, David M. Rocke, David L. Woodruff, Some computational issues in cluster analysis with no a priori metric, Computational Statistics & Data Analysis, Volume 31, Issue 1, 28 July 1999, Pages 1-11, ISSN 0167-9473, http://dx.doi.org/10.1016/S0167-9473(99)00009-2. (http://www.sciencedirect.com/science/article/pii/S0167947399000092)

Link to Full Text

Find in your library

COinS

Pomona Faculty Publications and Research

Some Computational Issues in Cluster Analysis with No A Priori Metric

Document Type

Department

Publication Date

Keywords

Abstract

Rights Information

Terms of Use & License Information

DOI

Recommended Citation

Search

Browse

Author Corner

Useful Links

Pomona Faculty Publications and Research

Some Computational Issues in Cluster Analysis with No A Priori Metric

Authors

Document Type

Department

Publication Date

Keywords

Abstract

Rights Information

Terms of Use & License Information

DOI

Recommended Citation

Share

Search

Browse

Author Corner

Useful Links