TY - JOUR
T1 - Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing
AU - Oh, Inez Y.
AU - Schindler, Suzanne E.
AU - Ghoshal, Nupur
AU - Lai, Albert M.
AU - Payne, Philip R.O.
AU - Gupta, Aditi
N1 - Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2023/4/1
Y1 - 2023/4/1
N2 - Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
AB - Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
KW - Alzheimer's disease
KW - electronic health records
KW - information retrieval
KW - natural language processing
KW - routinely collected health data
UR - http://www.scopus.com/inward/record.url?scp=85160635676&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooad014
DO - 10.1093/jamiaopen/ooad014
M3 - Article
C2 - 36844369
AN - SCOPUS:85160635676
SN - 2574-2531
VL - 6
JO - JAMIA Open
JF - JAMIA Open
IS - 1
M1 - ooad014
ER -