TY - JOUR
T1 - A Nested Two-Stage Clustering Method for Structured Temporal Sequence Data
AU - Wang, Liang
AU - Narayanan, Vignesh
AU - Yu, Yao Chi
AU - Park, Yikyung
AU - Li, Jr Shin
N1 - Funding Information:
This work was supported in part by the National Science Foundation under the Awards ECCS-1509342, CMMI-1763070, and CMMI-1933976, and by the NIH Grant R01CA226937A1.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2021/7
Y1 - 2021/7
N2 - Mining patterns of temporal sequence data is an important problem across many disciplines. Under appropriate preprocessing procedures, a structured temporal sequence can be organized into a probability measure or a time series representation, which offers the potential to reveal distinctive temporal pattern characteristics. In this paper, we propose a nested two-stage clustering method that integrates optimal transport and dynamic time warping distances to learn distributional and dynamic shape-based dissimilarities at the respective stages. The proposed clustering algorithm preserves both the distribution and shape patterns present in the data, both of which are critical for datasets composed of structured temporal sequences. The effectiveness of the method is tested against existing agglomerative and K-shape-based clustering algorithms on Monte Carlo-simulated synthetic datasets, and the performance is compared through various cluster validation metrics. Furthermore, we apply the developed method to real-world datasets from three domains: temporal dietary records, online retail sales, and smart meter energy profiles. The expressiveness of the cluster and subcluster centroid patterns shows the significant promise of our method for structured temporal sequence data mining.
KW - Clustering
KW - Dynamic time warping
KW - Optimal transport
KW - Structured temporal sequence
UR - http://www.scopus.com/inward/record.url?scp=85107298563&partnerID=8YFLogxK
U2 - 10.1007/s10115-021-01578-0
DO - 10.1007/s10115-021-01578-0
M3 - Article
AN - SCOPUS:85107298563
SN - 0219-1377
VL - 63
SP - 1627
EP - 1662
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
IS - 7
ER -