TY - JOUR
T1 - Evaluating Generative AI’s Ability to Identify Cancer Subtypes in Publicly Available Structured Genetic Datasets
AU - Hillis, Ethan
AU - Bhattarai, Kriti
AU - Abrams, Zachary
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/10
Y1 - 2024/10
AB - Background: Genetic data play a crucial role in diagnosing and treating various diseases, reflecting a growing imperative to integrate these data into clinical care. However, significant barriers such as the structure of electronic health records (EHRs), insurance costs for genetic testing, and the interpretability of genetic results impede this integration. Methods: This paper explores solutions to these challenges by combining recent technological advances with informatics and data science, focusing on the diagnostic potential of artificial intelligence (AI) in cancer research. AI has historically been applied in medical research with limited success, but recent developments have led to the emergence of large language models (LLMs). These transformer-based generative AI models, trained on vast datasets, offer significant potential for genetic and genomic analyses. However, their effectiveness is constrained by their training on predominantly human-written text rather than comprehensive, structured genetic datasets. Results: This study reevaluates the capabilities of LLMs, specifically GPT models, in performing supervised prediction tasks using structured gene expression data. By comparing GPT models with traditional machine learning approaches, we assess their effectiveness in predicting cancer subtypes, demonstrating the potential of AI models to analyze real-world genetic data for generating real-world evidence.
KW - artificial intelligence
KW - gene expression
KW - large language models
KW - machine learning
KW - predictive modeling
UR - http://www.scopus.com/inward/record.url?scp=85207734171&partnerID=8YFLogxK
U2 - 10.3390/jpm14101022
DO - 10.3390/jpm14101022
M3 - Article
C2 - 39452530
AN - SCOPUS:85207734171
SN - 2075-4426
VL - 14
JO - Journal of Personalized Medicine
JF - Journal of Personalized Medicine
IS - 10
M1 - 1022
ER -