Monday, 31 October 2005 - 1:30 PM
58d

Hierarchical K-Means Clustering Using Principal Components to Solve the Unsupervised Multi-Class Classification Problem

Syed B. Mohiddin (1), James Rathman (1), and Chihae Yang (2). (1) The Ohio State University, Koffolt Lab, 140 W19th Avenue, Columbus, OH 43210, (2) Leadscope Inc., 1393 Dublin Rd., Columbus, OH 43215

Current classification techniques can be grouped as either supervised or unsupervised. In a supervised method, each observation in the training dataset is pre-assigned to a class based on prior knowledge, whereas an unsupervised method uses no prior knowledge of class membership. Numerous supervised techniques have been shown to work well for binary classification, and a few of these are reasonably good at supervised multi-class prediction. However, techniques for unsupervised binary and multi-class prediction have not been fully developed. In this work, we present an analysis technique based on hierarchical K-means clustering using differentially weighted principal component analysis to address unsupervised classification for both binary and multi-class problems. This methodology has already been demonstrated on biological datasets (e.g., microarray gene expression data); here it is extended to chemical datasets, with the objectives of predicting class membership and identifying the non-redundant features most responsible for differentiating the observed classes.
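The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the principal components are weighted or how the hierarchy is built, so the sketch assumes that each component score is scaled by its share of the retained variance, that the components are computed once rather than at every level, and that the hierarchy is built by recursive 2-means splits. All function names, parameters, and the toy data are hypothetical. Only the Python standard library is used, so that the example runs without extra packages.

```python
import math
import random


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def principal_components(rows, n_components, n_iter=500, seed=0):
    """Top eigenpairs of the covariance matrix via power iteration with deflation."""
    X = center(rows)
    n, d = len(X), len(X[0])
    cov = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return X, pairs


def weighted_scores(rows, n_components=2):
    """Project onto the PCs; ASSUMED weighting: each score times its variance share."""
    X, pairs = principal_components(rows, n_components)
    total = sum(lam for lam, _ in pairs) or 1.0
    return [[(lam / total) * sum(a * b for a, b in zip(r, v)) for lam, v in pairs]
            for r in X]


def two_means(points, n_iter=100):
    """Plain 2-means; ASSUMED init: the extreme points along the first coordinate."""
    cents = [list(min(points, key=lambda p: p[0])), list(max(points, key=lambda p: p[0]))]
    labels = None
    for _ in range(n_iter):
        new = [min((0, 1), key=lambda c: math.dist(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hierarchical_kmeans(points, indices=None, depth=2, min_size=2):
    """Recursively bisect the data; returns leaf clusters as lists of row indices."""
    if indices is None:
        indices = list(range(len(points)))
    if depth == 0 or len(indices) < 2 * min_size:
        return [indices]
    labels = two_means([points[i] for i in indices])
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    if not left or not right:
        return [indices]
    return (hierarchical_kmeans(points, left, depth - 1, min_size)
            + hierarchical_kmeans(points, right, depth - 1, min_size))


# Synthetic toy data (not from the paper): three well-separated groups in 3-D.
toy_rows = [
    [0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.1, 0.3, 0.1], [0.0, 0.1, 0.2],
    [10.0, 10.0, 0.0], [10.2, 9.9, 0.1], [9.9, 10.1, 0.0], [10.1, 10.0, 0.2],
    [20.0, 0.0, 10.0], [20.1, 0.2, 9.9], [19.9, 0.1, 10.1], [20.0, 0.0, 10.2],
]
leaves = hierarchical_kmeans(weighted_scores(toy_rows, 2), depth=2, min_size=3)
print(leaves)
```

In this sketch no class labels are supplied at any point; the groups emerge only from the recursive splits. The `depth` and `min_size` parameters are illustrative stopping rules, and a real analysis would need a principled criterion for when to stop splitting. Identifying the features most responsible for separating the classes, the second objective named above, is not covered here.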

Session #58 - Data Analysis: Design, Algorithms & Applications (10C06)
Computing and Systems Technology Division
The 2005 Annual Meeting (Cincinnati, OH)