TY - JOUR
T1 - Hey Siri
T2 - How Effective are Common Voice Recognition Systems at Recognizing Dysphonic Voices?
AU - Rohlfing, Matthew L.
AU - Buckley, Daniel P.
AU - Piraquive, Jacquelyn
AU - Stepp, Cara E.
AU - Tracy, Lauren F.
N1 - Funding Information:
This work was supported in part by the National Institutes of Health through grants R01DC015570 (C.E.S.)
Publisher Copyright:
© 2020 American Laryngological, Rhinological and Otological Society Inc, "The Triological Society" and American Laryngological Association (ALA)
PY - 2021/7
Y1 - 2021/7
N2 - Objectives/Hypothesis: Interaction with voice recognition systems, such as Siri™ and Alexa™, is an increasingly important part of everyday life. Patients with voice disorders may have difficulty with this technology, leading to frustration and reduction in quality of life. This study evaluates the ability of common voice recognition systems to transcribe dysphonic voices. Study Design: Retrospective evaluation of “Rainbow Passage” voice samples from patients with and without voice disorders. Methods: Participants with (n = 30) and without (n = 23) voice disorders were recorded reading the “Rainbow Passage”. Recordings were played at a standardized intensity and distance to dictation programs on Apple iPhone 6S™, Apple iPhone 11 Pro™, and Google Voice™. Word recognition scores were calculated as the proportion of correctly transcribed words. Word recognition scores were compared to auditory–perceptual and acoustic measures. Results: Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% for Apple iPhone 6S™ (P < .001), 71.2% and 93.7% for Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% for Google Voice™ (P < .001). There were strong, approximately linear associations between CAPE-V ratings of overall severity of dysphonia and word recognition score, with coefficients of determination (R²) of 0.609 (iPhone 6S™), 0.670 (iPhone 11 Pro™), and 0.619 (Google Voice™). These relationships persisted when controlling for diagnosis, age, gender, fundamental frequency, and speech rate (P < .001 for all systems). Conclusion: Common voice recognition systems function well with nondysphonic voices but are poor at accurately transcribing dysphonic voices. There was a strong negative correlation between word recognition scores and perceptual voice evaluation. As our society increasingly interfaces with automated voice recognition technology, the needs of patients with voice disorders should be considered.
Level of Evidence: 4 Laryngoscope, 131:1599–1607, 2021.
AB - Objectives/Hypothesis: Interaction with voice recognition systems, such as Siri™ and Alexa™, is an increasingly important part of everyday life. Patients with voice disorders may have difficulty with this technology, leading to frustration and reduction in quality of life. This study evaluates the ability of common voice recognition systems to transcribe dysphonic voices. Study Design: Retrospective evaluation of “Rainbow Passage” voice samples from patients with and without voice disorders. Methods: Participants with (n = 30) and without (n = 23) voice disorders were recorded reading the “Rainbow Passage”. Recordings were played at a standardized intensity and distance to dictation programs on Apple iPhone 6S™, Apple iPhone 11 Pro™, and Google Voice™. Word recognition scores were calculated as the proportion of correctly transcribed words. Word recognition scores were compared to auditory–perceptual and acoustic measures. Results: Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% for Apple iPhone 6S™ (P < .001), 71.2% and 93.7% for Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% for Google Voice™ (P < .001). There were strong, approximately linear associations between CAPE-V ratings of overall severity of dysphonia and word recognition score, with coefficients of determination (R²) of 0.609 (iPhone 6S™), 0.670 (iPhone 11 Pro™), and 0.619 (Google Voice™). These relationships persisted when controlling for diagnosis, age, gender, fundamental frequency, and speech rate (P < .001 for all systems). Conclusion: Common voice recognition systems function well with nondysphonic voices but are poor at accurately transcribing dysphonic voices. There was a strong negative correlation between word recognition scores and perceptual voice evaluation. As our society increasingly interfaces with automated voice recognition technology, the needs of patients with voice disorders should be considered.
Level of Evidence: 4 Laryngoscope, 131:1599–1607, 2021.
KW - Dysphonia
KW - hoarseness
KW - mobile phone
KW - technology
KW - voice recognition
UR - http://www.scopus.com/inward/record.url?scp=85091259894&partnerID=8YFLogxK
U2 - 10.1002/lary.29082
DO - 10.1002/lary.29082
M3 - Article
C2 - 32949415
AN - SCOPUS:85091259894
SN - 0023-852X
VL - 131
SP - 1599
EP - 1607
JO - Laryngoscope
JF - Laryngoscope
IS - 7
ER -