TY - JOUR
T1 - Evaluating GPT Models for Automated Literature Screening in Wastewater-Based Epidemiology
AU - Chibwe, Kaseba
AU - Mantilla-Calderon, David
AU - Ling, Fangqiong
N1 - Publisher Copyright:
© 2024 The Authors. Published by American Chemical Society.
PY - 2025/1/15
Y1 - 2025/1/15
N2 - Methods to quantitatively synthesize findings across multiple studies are an emerging need in wastewater-based epidemiology (WBE), where disease tracking through wastewater analysis is performed across broad geographical locations using various techniques to facilitate public health responses. Meta-analysis provides a rigorous statistical procedure for research synthesis, yet the manual process of screening large volumes of literature remains a hurdle for its application in timely evidence-based public health responses. Here, we evaluated the performance of GPT-3, GPT-3.5, and GPT-4 models in the automated screening of publications for meta-analysis in the WBE literature. We show that the chat completion model in GPT-4 accurately differentiates papers that contain original data from those that do not, using abstract text as input, at a precision of 0.96 and a recall of 1.00, exceeding current quality standards for manual screening (recall = 0.95) while costing less than $0.01 per paper. GPT models performed less accurately in detecting studies reporting relevant sampling locations, highlighting the value of maintaining human intervention in AI-assisted literature screening. Importantly, we show that certain formulations and model choices generated nonsensical answers to the screening tasks while others did not, urging attention to robustness when employing AI-assisted literature screening. This study provides novel performance evaluation data on GPT models for document screening as a step in meta-analysis, suggesting AI-assisted literature screening as a useful complementary technique to speed up research synthesis in WBE.
KW - GPT-4
KW - fine-tuning
KW - meta-analysis
KW - systematic review
KW - wastewater-based epidemiology
UR - http://www.scopus.com/inward/record.url?scp=85211022024&partnerID=8YFLogxK
U2 - 10.1021/acsenvironau.4c00042
DO - 10.1021/acsenvironau.4c00042
M3 - Article
AN - SCOPUS:85211022024
SN - 2694-2518
VL - 5
SP - 61
EP - 68
JO - ACS Environmental Au
JF - ACS Environmental Au
IS - 1
ER -