TY - JOUR
T1 - CeRTS: certainty retrieval token search in large language model clinical information extraction
T2 - Journal of Biomedical Informatics
AU - Schimmelpfennig, Lars E.
AU - Bhattarai, Kriti
AU - Oh, Inez Y.
AU - Lever, Jake
AU - Griffith, Obi
AU - Griffith, Malachi
AU - Lai, Albert
AU - Abrams, Zachary B.
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/10
Y1 - 2025/10
N2 - Objective: Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence. Here, by leveraging the constraints of JSON output structure, we instead consider all likely sequences and their respective probabilities to obtain a more robust measure of model confidence. We develop Certainty Retrieval Token Search (CeRTS), a new uncertainty estimator for structured information extraction. Methods: We evaluated CeRTS against a previous gold-standard uncertainty estimator when extracting clinical features from lung cancer discharge summaries across eight open-source LLMs. Calibration (Brier score) and discrimination (AUROC) were used to quantify performance. Results: CeRTS surpassed the previous gold-standard estimator in discriminatory power across every model and achieved better calibration in most cases. CeRTS had the strongest agreement between model confidence and accuracy with Qwen-2.5. Conclusion: CeRTS enhances LLM-based information extraction from unstructured clinical text by assigning well-calibrated confidence scores to each extracted item, providing medical researchers with a quantitative measure of reliability at minimal additional cost. Although its performance was generally robust, CeRTS struggled with DeepSeek-R1, which we attribute to the model's Chain-of-Thought reasoning steps. Our evaluation focused on clinical data, but CeRTS can be applied to any domain requiring reliable uncertainty estimation.
KW - Clinical information extraction
KW - Large language models
KW - Uncertainty estimators
UR - https://www.scopus.com/pages/publications/105014918577
U2 - 10.1016/j.jbi.2025.104900
DO - 10.1016/j.jbi.2025.104900
M3 - Article
C2 - 40854447
AN - SCOPUS:105014918577
SN - 1532-0464
VL - 170
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104900
ER -