Take a walk and cluster genes: A TSP-based approach to optimal rearrangement clustering

Sharlee Climer, Weixiong Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

19 Scopus citations

Abstract

Cluster analysis is a fundamental problem and technique in many areas related to machine learning. In this paper, we consider rearrangement clustering, which is the problem of finding sets of objects that share common or similar features by arranging the rows (objects) of a matrix (specifying object features) in such a way that adjacent objects are similar to each other (based on a similarity measure of the features) so as to maximize the overall similarity. Based on formulating this problem as the Traveling Salesman Problem (TSP), we develop a new TSP-based optimal clustering algorithm called TSPCluster. We overcome a flaw that is inherent in previous approaches by relaxing restrictions on dissimilarities between clusters. Our new algorithm has three important features: finding the optimal k clusters for a given k, automatically detecting cluster borders, and ascertaining a set of most viable clustering results that make good balances among maximizing the overall similarity within clusters and dissimilarity between clusters. We apply TSPCluster to cluster and display ∼500 genes of flowering plant Arabidopsis which are regulated under various abiotic stress conditions. We compare TSPCluster to the bond energy algorithm and two existing clustering algorithms. Our TSPCluster code is available at (Climer & Zhang, 2004).

Original languageEnglish
Title of host publicationProceedings, Twenty-First International Conference on Machine Learning, ICML 2004
EditorsR. Greiner, D. Schuurmans
Pages169-176
Number of pages8
StatePublished - 2004
EventProceedings, Twenty-First International Conference on Machine Learning, ICML 2004 - Banff, Alta, Canada
Duration: Jul 4 2004Jul 8 2004

Publication series

NameProceedings, Twenty-First International Conference on Machine Learning, ICML 2004

Conference

ConferenceProceedings, Twenty-First International Conference on Machine Learning, ICML 2004
Country/TerritoryCanada
CityBanff, Alta
Period07/4/0407/8/04

Fingerprint

Dive into the research topics of 'Take a walk and cluster genes: A TSP-based approach to optimal rearrangement clustering'. Together they form a unique fingerprint.

Cite this