Abstract

Objective: Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence. Here, by leveraging the constraints of JSON output structure, we instead consider all likely sequences and their respective probabilities to obtain a more robust measure of model confidence. We develop Certainty Retrieval Token Search (CeRTS), a new uncertainty estimator for structured information extraction. Methods: We evaluated CeRTS against a previous gold-standard uncertainty estimator when extracting clinical features from lung cancer discharge summaries across eight open-source LLMs. Calibration (Brier score) and discrimination (AUROC) were used to quantify performance. Results: CeRTS surpassed the previous gold-standard estimator in discriminatory power across every model and achieved better calibration in most cases. CeRTS had the strongest agreement between model confidence and accuracy with Qwen-2.5. Conclusion: CeRTS enhances LLM-based information extraction from unstructured clinical text by assigning well-calibrated confidence scores to each extracted item, providing medical researchers with a quantitative measure of reliability at minimal additional cost. Although its performance was generally robust, CeRTS struggled with DeepSeek-R1, which we attribute to the model's Chain-of-Thought reasoning steps. Our evaluation focused on clinical data, but CeRTS can be applied to any domain requiring reliable uncertainty estimation.

Original languageEnglish
Article number104900
JournalJournal of Biomedical Informatics
Volume170
DOIs
StatePublished - Oct 2025

Keywords

  • Clinical information extraction
  • Large language models
  • Uncertainty estimators

Fingerprint

Dive into the research topics of 'CeRTS: certainty retrieval token search in large language model clinical information extraction'. Together they form a unique fingerprint.

Cite this