TY - GEN
T1 - Video-based Contrastive Learning on Decision Trees
T2 - 14th ACM Multimedia Systems Conference, MMSys 2023
AU - Ruan, Mindi
AU - Yu, Xiangxu
AU - Zhang, Na
AU - Hu, Chuanbo
AU - Wang, Shuo
AU - Li, Xin
N1 - Publisher Copyright:
© 2023 Owner/Author(s).
PY - 2023/6/7
Y1 - 2023/6/7
N2 - How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision tree-based classification of actions, including human-human interactions (HHI) and human-object interactions (HOI). The key idea is to translate the original multi-class action recognition into a series of binary classification tasks on a pre-constructed decision tree. Under the new framework of contrastive learning, we present the design of an interaction adjacency matrix (IAM) with skeleton graphs as the backbone for modeling various action-related attributes such as periodicity and symmetry. Through the construction of various pretext tasks, we obtain a series of binary classification nodes on the decision tree that can be combined to support higher-level recognition tasks. Experimental results justify the potential of our approach in real-world applications ranging from interaction recognition to symmetry detection. In particular, we demonstrate the promising performance of video-based autism spectrum disorder (ASD) diagnosis on the CalTech interview video database.
AB - How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision tree-based classification of actions, including human-human interactions (HHI) and human-object interactions (HOI). The key idea is to translate the original multi-class action recognition into a series of binary classification tasks on a pre-constructed decision tree. Under the new framework of contrastive learning, we present the design of an interaction adjacency matrix (IAM) with skeleton graphs as the backbone for modeling various action-related attributes such as periodicity and symmetry. Through the construction of various pretext tasks, we obtain a series of binary classification nodes on the decision tree that can be combined to support higher-level recognition tasks. Experimental results justify the potential of our approach in real-world applications ranging from interaction recognition to symmetry detection. In particular, we demonstrate the promising performance of video-based autism spectrum disorder (ASD) diagnosis on the CalTech interview video database.
KW - autism diagnosis
KW - decision trees
KW - graph convolutional network
KW - interaction adjacency matrix
KW - interaction modeling
KW - skeleton graphs
UR - http://www.scopus.com/inward/record.url?scp=85163577239&partnerID=8YFLogxK
U2 - 10.1145/3587819.3590988
DO - 10.1145/3587819.3590988
M3 - Conference contribution
AN - SCOPUS:85163577239
T3 - MMSys 2023 - Proceedings of the 14th ACM Multimedia Systems Conference
SP - 289
EP - 300
BT - MMSys 2023 - Proceedings of the 14th ACM Multimedia Systems Conference
PB - Association for Computing Machinery, Inc
Y2 - 7 June 2023 through 10 June 2023
ER -