TY - JOUR
T1 - Discriminative Few Shot Learning of Facial Dynamics in Interview Videos for Autism Trait Classification
AU - Zhang, Na
AU - Ruan, Mindi
AU - Wang, Shuo
AU - Paul, Lynn
AU - Li, Xin
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2023/4/1
Y1 - 2023/4/1
AB - Autism is a prevalent neurodevelopmental disorder characterized by impairments in social and communicative behaviors. Possible connections between autism and facial expression recognition have recently been studied in the literature. However, most works are based on facial images or short videos. Few works target Autism Diagnostic Observation Schedule (ADOS) videos due to their complexity (e.g., interaction between interviewer and interviewee) and length (e.g., usually lasting hours). In this paper, we attempt to fill this gap by developing a novel discriminative few-shot learning method to analyze hour-long video data and exploring the fusion of facial dynamics for the trait classification of ASD. Leveraging well-established computer vision tools, from spatio-temporal feature extraction and marginal Fisher analysis to few-shot learning and scene-level fusion, we have constructed a three-category system to classify an individual as Autism, Autism Spectrum, or Non-Spectrum. For the first time, we have shown that certain interview scenes carry more discriminative information for ASD trait classification than others. Experimental results demonstrate the potential of the proposed automatic ASD trait classification system (achieving 91.72% accuracy on the Caltech ADOS video dataset) and, through extensive ablation studies, the benefits of the few-shot learning and scene-level fusion strategies.
KW - Autism Spectrum Disorder (ASD)
KW - autism trait classification
KW - facial dynamic features
KW - few-shot learning (FSL)
KW - marginal Fisher analysis (MFA)
KW - scene-level fusion
UR - http://www.scopus.com/inward/record.url?scp=85131737020&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2022.3178946
DO - 10.1109/TAFFC.2022.3178946
M3 - Article
AN - SCOPUS:85131737020
SN - 1949-3045
VL - 14
SP - 1110
EP - 1124
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 2
ER -