Analyzing microarray data using cluster analysis

William Shannon, Robert Culverhouse, Jill Duncan

Research output: Contribution to journal › Review article › peer-review

134 Scopus citations


As pharmacogenetics researchers gather more detailed and complex data on gene polymorphisms that affect drug-metabolizing enzymes, drug target receptors and drug transporters, they will need access to advanced statistical tools to mine those data. These tools include approaches from classical biostatistics, such as logistic regression or linear discriminant analysis, and supervised learning methods from computer science, such as support vector machines and artificial neural networks. In this review, we present an overview of another class of models, cluster analysis, which will likely be less familiar to pharmacogenetics researchers. Cluster analysis is used to analyze data that are not known a priori to contain any specific subgroups. The goal is to use the data themselves to identify meaningful or informative subgroups. Specifically, we focus on demonstrating the use of distance-based methods of hierarchical clustering to analyze gene expression data.
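
As a rough illustration of what distance-based hierarchical clustering involves, the following minimal sketch groups a few gene expression profiles by repeatedly merging the two closest clusters. It is not taken from the review: the choice of Euclidean distance and average linkage, the function names, and the toy expression values are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, profiles):
    """Mean Euclidean distance over all cross-cluster pairs of genes."""
    total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hierarchical_clusters(profiles, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [(name,) for name in profiles]  # start with one gene per cluster
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        # Replace the two chosen clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(tuple(sorted(c)) for c in clusters)


# Hypothetical expression profiles (4 genes x 3 samples), invented for illustration.
toy_profiles = {
    "geneA": (1.0, 1.1, 0.9),
    "geneB": (1.2, 1.0, 1.0),
    "geneC": (5.0, 5.2, 4.9),
    "geneD": (5.1, 4.8, 5.0),
}

result = hierarchical_clusters(toy_profiles, 2)
print(result)
```

No subgroup labels are supplied here; the grouping comes only from the pairwise distances, which is the sense in which cluster analysis lets the data identify the subgroups. In practice the distance measure and linkage rule are analysis choices that can change the resulting clusters.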

Original language: English
Pages (from-to): 41-52
Number of pages: 12
Issue number: 1
State: Published - Jan 1 2003


  • Consensus methods
  • Distance calculations
  • Heat maps
  • Hierarchical clustering
  • K-means clustering
  • Mantel statistics
  • Microarrays
  • Unsupervised learning

