Stream data clustering based on grid density and attraction

  • Li Tu
  • , Yixin Chen

Research output: Contribution to journalArticlepeer-review

120 Scopus citations

Abstract

Clustering real-time stream data is an important and challenging problem. Existing algorithms such as CluStream are based on the k-means algorithm. These clustering algorithms have difficulties finding clusters of arbitrary shapes and handling outliers. Further, they require the knowledge of k and user-specified time window. To address these issues, this article proposes D-Stream, a framework for clustering stream data using a density-based approach. Our algorithm uses an online component that maps each input data record into a grid and an offline component that computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream and a attraction-based mechanism to accurately generate cluster boundaries. Exploiting the intricate relationships among the decay factor, attraction, data density, and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.

Original languageEnglish
Article number12
JournalACM Transactions on Knowledge Discovery from Data
Volume3
Issue number3
DOIs
StatePublished - Jul 1 2009

Keywords

  • Clustering
  • Data mining
  • Density-based algorithms
  • Stream data

Fingerprint

Dive into the research topics of 'Stream data clustering based on grid density and attraction'. Together they form a unique fingerprint.

Cite this