Examining spoken words and acoustic features of therapy sessions to understand family caregivers’ anxiety and quality of life

George Demiris, Debra Parker Oliver, Karla T. Washington, Chad Chadwick, Jeffrey D. Voigt, Sam Brotherton, Mary D. Naylor

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


Background: Speech and language cues are considered significant data sources that can reveal insights into one's behavior and well-being. The goal of this study is to evaluate how different machine learning (ML) classifiers, trained on both the spoken words and the acoustic features of live conversations between family caregivers and a therapist, correlate with anxiety and quality of life (QoL) as assessed by validated instruments.

Methods: The dataset comprised 124 audio-recorded and professionally transcribed discussions between family caregivers of hospice patients and a therapist about the challenges they faced in their caregiving role, together with standardized assessments of self-reported QoL and anxiety. We custom-built and trained an Automated Speech Recognition (ASR) system on older adult voices and created a logistic regression-based classifier that incorporated audio-based features. The classification process automated the QoL scoring and the display of the score in real time, replacing hand-coding for self-reported assessments with a machine learning-identified classifier.

Findings: Of the 124 audio files and their transcripts, 87 transcripts (70%) were selected to serve as the training set, and the remaining 30% of the data were held out for evaluation. For anxiety, the classifier that added acoustic features and automated speech-to-text transcription outperformed the prior classifier trained only on human-rendered transcriptions. Specifically, precision improved from 86% to 92%, accuracy from 81% to 89%, and recall from 78% to 88%.

Interpretation: ML techniques can be used to develop classifiers that indicate improvements in QoL measures with a reasonable degree of accuracy. Examining the content, the sound of the voice, and the context of the conversation provides insights into additional factors affecting anxiety and QoL that could be addressed in tailored therapy and in the design of conversational agents serving as therapy chatbots.
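The 70/30 split and the three metrics reported in the Findings can be illustrated with a minimal sketch. It assumes a simple shuffled split and a binary (positive/negative) label. The function names, the random seed, and the toy labels are hypothetical and are not taken from the study, whose actual splitting procedure and classifier code are not described here.

```python
import random


def split_indices(n_items, train_fraction=0.7, seed=0):
    """Shuffle item indices and split them into training and held-out sets.

    Assumption: a plain shuffled split; the study's procedure may differ.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# 124 recordings split 70/30, matching the counts given in the abstract.
train_idx, test_idx = split_indices(124)

# Toy labels for illustration only; these are not study data.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)
```

With 124 items and a 0.7 training fraction, the split yields 87 training items and 37 held-out items. Precision is the share of predicted positives that are correct, recall is the share of actual positives that are found, and accuracy is the share of all predictions that are correct.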

Original language: English
Article number: 104716
Journal: International Journal of Medical Informatics
State: Published - Apr 2022


  • Caregiver
  • Chatbot
  • Communication
  • Machine learning
  • Quality of life

