TY - JOUR
T1 - Comparison of rule- and large language model-based phenotype extraction from clinical notes for neurofibromatosis type 1
AU - Kaster, Levi
AU - Hillis, Ethan
AU - Oh, Inez Y.
AU - Cordell, Elizabeth C.
AU - Foraker, Randi E.
AU - Lai, Albert M.
AU - Morris, Stephanie M.
AU - Gutmann, David H.
AU - Payne, Philip R.O.
AU - Gupta, Aditi
N1 - Publisher Copyright:
© 2025 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2025/11/1
Y1 - 2025/11/1
N2 - Introduction: Neurofibromatosis type 1 (NF1) is a rare genetic disorder affecting multiple organ systems with significant clinical heterogeneity. Managing individuals with NF1 is challenging because disease progression and outcomes vary widely and early risk assessment tools are limited. Objective: This study aims to develop an effective, generalizable, user-friendly clinical entity extraction pipeline for identifying NF1-related phenotypes in unstructured clinical notes to support research and risk-modeling efforts. We compare the benefits of rule-based natural language processing (NLP) and large language models (LLMs) for this purpose. Materials and Methods: Four phenotype extraction pipelines (3 LLM-based and 1 rule-based) were developed to automatically extract selected NF1-relevant phenotypes. Subject matter experts manually reviewed clinical notes to create a gold-standard annotation dataset for evaluation. In Phase 1, notes authored by a single NF1 physician were used to guide pipeline development and refinement. In Phase 2, notes from a second NF1 physician were used to assess pipeline generalizability, followed by further refinement to accommodate differences in physician terminology. Results: With refinement, the rule-based model achieved higher F1 score distributions than the LLMs in both Phase 1 and Phase 2. However, the LLMs generalized better between physicians: without refinement, their performance decreased by only 4.4%-5.1% from Phase 1 to Phase 2, compared with an 8.8% decrease for the rule-based model. Conclusion: We highlight trade-offs between the effectiveness of rule-based NLP and the generalizability and ease of implementation of LLMs for clinical entity extraction, with implications for pipeline portability across providers and institutions.
KW - information extraction
KW - large language models
KW - natural language processing
KW - neurofibromatosis type 1
KW - phenotyping
UR - https://www.scopus.com/pages/publications/105022135733
U2 - 10.1093/jamia/ocaf155
DO - 10.1093/jamia/ocaf155
M3 - Article
C2 - 40966762
AN - SCOPUS:105022135733
SN - 1067-5027
VL - 32
SP - 1663
EP - 1673
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 11
ER -