TY - JOUR
T1 - Rules-based natural language processing to extract features of large vessel occlusion and cerebral edema from radiology reports in stroke patients
AU - Siddiqui, Zohair
AU - Bhatia, Kunal
AU - Corbin, Aaron
AU - Dhar, Rajat
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2023/6
Y1 - 2023/6
AB - Background: Large vessel occlusion (LVO) stroke research is limited regarding high-risk patient groups for complications including cerebral edema. Large, well-phenotyped cohorts hold potential insights, but identifying cohorts and manually extracting outcomes is impractical. Natural language processing (NLP) software has previously extracted stroke characteristics from radiology reports, but there has not been an integrated extraction of both LVO classification and acute stroke outcomes. Methods: We constructed a rules-based NLP pipeline that extracted presence/location of arterial occlusion and core/penumbral volumes from multimodal CT reports, along with presence of edema and midline shift on follow-up CTs. The algorithm flagged inconsistent reports for manual adjudication. We validated performance over two cohorts and analyzed the associations between NLP-extracted variables and clinical edema outcomes. Results: The algorithm identified occlusions in the development (n=577) and test cohorts (n=442) with 94% and 85% recall, increasing to 97% and 93% after review of flagged reports. It could distinguish proximal ICA/M1 from distal occlusions with 96% recall and correctly extracted 98% of core/penumbral volumes. NLP recall was 93% and 86% for identifying edema and midline shift from follow-up reports of 213 patients with ICA/MCA occlusions. NLP-extracted radiographic edema captured 89% of those who developed clinical cerebral edema, which was more likely in those with NLP-identified proximal vs distal occlusions and associated with significantly higher core/penumbral volumes. Conclusion: A rules-based NLP pipeline can accurately identify and phenotype an LVO cohort, yielding clinical associations with stroke research implications.
KW - Acute stroke
KW - Cerebral edema
KW - Large vessel occlusion
KW - Midline shift
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85191983842&partnerID=8YFLogxK
U2 - 10.1016/j.neuri.2023.100129
DO - 10.1016/j.neuri.2023.100129
M3 - Review article
AN - SCOPUS:85191983842
SN - 2772-5286
VL - 3
JO - Neuroscience Informatics
JF - Neuroscience Informatics
IS - 2
M1 - 100129
ER -