TY - JOUR
T1 - Large Language Models to Help Appeal Denied Radiotherapy Services
AU - Kiser, Kendall J.
AU - Waters, Michael
AU - Reckford, Jocelyn
AU - Lundeberg, Christopher
AU - Abraham, Christopher D.
N1 - Publisher Copyright:
© 2024 by American Society of Clinical Oncology.
PY - 2024/9/1
Y1 - 2024/9/1
AB - PURPOSE: Large language model (LLM) artificial intelligences may help physicians appeal insurer denials of prescribed medical services, a task that delays patient care and contributes to burnout. We evaluated LLM performance at this task for denials of radiotherapy services. METHODS: We evaluated generative pretrained transformer 3.5 (GPT-3.5; OpenAI, San Francisco, CA), GPT-4, GPT-4 with internet search functionality (GPT-4web), and GPT-3.5ft. The latter was developed by fine-tuning GPT-3.5 via an OpenAI application programming interface with 53 examples of appeal letters written by radiation oncologists. Twenty test prompts with simulated patient histories were programmatically presented to the LLMs, and output appeal letters were scored by three blinded radiation oncologists for language representation, clinical detail inclusion, clinical reasoning validity, literature citations, and overall readiness for insurer submission. RESULTS: Interobserver agreement between radiation oncologists' scores was moderate or better for all domains (Cohen's kappa coefficients: 0.41-0.91). GPT-3.5, GPT-4, and GPT-4web wrote letters that were on average linguistically clear, summarized provided clinical histories without confabulation, reasoned appropriately, and were scored useful to expedite the insurance appeal process. GPT-4 and GPT-4web letters demonstrated superior clinical reasoning and were readier for submission than GPT-3.5 letters (P < .001). Fine-tuning increased GPT-3.5ft confabulation and compromised performance compared with other LLMs across all domains (P < .001). All LLMs, including GPT-4web, were poor at supporting clinical assertions with existing, relevant, and appropriately cited primary literature. CONCLUSION: When prompted appropriately, three commercially available LLMs drafted letters that physicians deemed would expedite appealing insurer denials of radiotherapy services. LLMs may decrease this task's clerical workload on providers. However, LLM performance worsened when fine-tuned with a task-specific, small training data set.
UR - http://www.scopus.com/inward/record.url?scp=85203732573&partnerID=8YFLogxK
U2 - 10.1200/CCI.24.00129
DO - 10.1200/CCI.24.00129
M3 - Article
C2 - 39250740
AN - SCOPUS:85203732573
SN - 2473-4276
VL - 8
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
M1 - e2400129
ER -