Accounting for noise when clustering biological data
Abstract
Clustering is a powerful and commonly used technique that organizes and elucidates the structure of biological data. Clustering data from gene expression, metabolomics and proteomics experiments has proven useful for deriving a variety of insights, such as the shared regulation or function of biochemical components within networks. However, experimental measurements of biological processes are subject to substantial noise, stemming from both technical and biological variability, and most clustering algorithms are sensitive to this noise. In this article, we explore several methods of accounting for noise when analyzing biological data sets through clustering. Using a toy data set and two different case studies (gene expression and protein phosphorylation), we demonstrate the sensitivity of clustering algorithms to noise. Several methods of accounting for this noise can be used to establish when clustering results can be trusted. These methods span a range of assumptions about the statistical properties of the noise and can therefore be applied to virtually any biological data source.
| Original language | English |
|---|---|
| Article number | bbs057 |
| Pages (from-to) | 423-436 |
| Number of pages | 14 |
| Journal | Briefings in Bioinformatics |
| Volume | 14 |
| Issue number | 4 |
| State | Published - Jul 2013 |
Keywords
- Cluster ensemble
- Clustering
- Measurement variability
- Noise
- Random effects
- Unsupervised learning