Abstract

Mining patterns in temporal sequence data is an important problem across many disciplines. With appropriate preprocessing, a structured temporal sequence can be represented as a probability measure or as a time series, and each representation can reveal distinctive characteristics of its temporal pattern. In this paper, we propose a nested two-stage clustering method that integrates optimal transport and dynamic time warping distances to learn distributional dissimilarity and dynamic shape-based dissimilarity at the respective stages. The proposed clustering algorithm preserves both the distribution and the shape patterns present in the data, which are critical for datasets composed of structured temporal sequences. We test the effectiveness of the method against existing agglomerative and K-shape-based clustering algorithms on synthetic datasets generated by Monte Carlo simulation, and compare performance through various cluster validation metrics. Furthermore, we apply the method to real-world datasets from three domains: temporal dietary records, online retail sales, and smart meter energy profiles. The expressiveness of the resulting cluster and subcluster centroid patterns demonstrates the promise of our method for mining structured temporal sequence data.
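The nested two-stage idea described above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. It assumes each sequence is a nonnegative series over a shared grid of equally spaced time bins, uses the closed-form one-dimensional Wasserstein-1 distance as the optimal transport stage, classic dynamic time warping (DTW) as the shape stage, and plain average-linkage agglomerative clustering at both stages. It also assumes the distributional stage runs first and the DTW stage then splits each first-stage cluster into subclusters. The helper names (`wasserstein_1d`, `dtw`, `agglomerate`, `nested_two_stage`) and the toy data are hypothetical.

```python
from itertools import accumulate

# Hypothetical sketch: the stage order, the distances, and the linkage rule are
# illustrative assumptions, not the paper's exact method.


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-spaced grid.

    Assumes nonnegative entries with a positive total. Each histogram is
    normalized to a probability mass function; in 1-D, W1 equals the sum of
    absolute differences between the two cumulative distribution functions.
    """
    sp, sq = sum(p), sum(q)
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def dtw(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # d[i][j] = cost of the best warping path aligning x[:i] with y[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def agglomerate(items, dist, k):
    """Average-linkage agglomerative clustering into k >= 1 lists of indices."""
    n = len(items)
    pair = {(i, j): dist(items[i], items[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(a, b):
        # Mean pairwise distance between the members of clusters a and b
        total = sum(pair[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Merge the closest pair of clusters (ties go to the first pair found)
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def nested_two_stage(series, k_outer, k_inner):
    """Stage 1: cluster by distribution (W1). Stage 2: split each cluster by DTW."""
    result = []
    for group in agglomerate(series, wasserstein_1d, k_outer):
        members = [series[i] for i in group]
        inner = agglomerate(members, dtw, min(k_inner, len(members)))
        # Map subcluster indices back to positions in the original list
        result.append([[group[i] for i in sub] for sub in inner])
    return result


data = [
    [5, 1, 0, 0, 0, 0],   # 0: mass early in the day
    [10, 2, 0, 0, 0, 0],  # 1: same normalized distribution as 0, larger amplitude
    [6, 1, 0, 0, 0, 0],   # 2
    [0, 0, 0, 0, 1, 5],   # 3: mass late in the day
    [0, 0, 0, 0, 2, 10],  # 4
    [0, 0, 0, 0, 1, 6],   # 5
]
print(nested_two_stage(data, k_outer=2, k_inner=2))
# [[[0, 2], [1]], [[3, 5], [4]]]
```

In the toy data, series 0 and 1 share the same normalized distribution (W1 = 0), so the distributional stage cannot separate them; only the DTW stage, which sees the raw amplitudes, places them in different subclusters.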

Original language: English
Pages (from-to): 1627-1662
Number of pages: 36
Journal: Knowledge and Information Systems
Volume: 63
Issue number: 7
State: Published - Jul 2021

Keywords

  • Clustering
  • Dynamic time warping
  • Optimal transport
  • Structured temporal sequence

Title: A Nested Two-Stage Clustering Method for Structured Temporal Sequence Data