TY - JOUR
T1 - Leveraging Artificial Intelligence and Synthetic Data Derivatives for Spine Surgery Research
AU - Greenberg, Jacob K.
AU - Landman, Joshua M.
AU - Kelly, Michael P.
AU - Pennicooke, Brenton H.
AU - Molina, Camilo A.
AU - Foraker, Randi E.
AU - Ray, Wilson Z.
N1 - Publisher Copyright: © The Author(s) 2022.
PY - 2023/10
Y1 - 2023/10
N2 - Study Design: Retrospective cohort study. Objectives: Leveraging electronic health records (EHRs) for spine surgery research is impeded by concerns regarding patient privacy and data ownership. Synthetic data derivatives may help overcome these limitations. This study’s objective was to validate the use of synthetic data for spine surgery research. Methods: Data came from the EHRs of 15 hospitals. Patients who underwent anterior cervical or posterior lumbar fusion (2010-2020) were included. Real data were obtained from the EHR. Synthetic data were generated to simulate the properties of the real data, without maintaining a one-to-one correspondence with real patients. Within each cohort, the ability to predict 30-day readmissions and 30-day complications was evaluated using logistic regression and extreme gradient boosting machines (XGBoost). Results: We identified 9,072 real and 9,088 synthetic cervical fusion patients. Descriptive characteristics were nearly identical between the 2 datasets. When predicting readmission, models built using real and synthetic data both had c-statistics of .69-.71 using logistic regression and XGBoost. Among 12,111 real and 12,126 synthetic lumbar fusion patients, descriptive characteristics were nearly the same for most variables. Using logistic regression and XGBoost to predict readmission, discrimination was similar with models built using real and synthetic data (c-statistics of .66-.69). When predicting complications, models derived using real and synthetic data showed similar discrimination in both cohorts. Despite some differences, the most influential predictors were similar in the real and synthetic datasets. Conclusion: Synthetic data replicate most descriptive and predictive properties of real data, and therefore may expand EHR research in spine surgery.
AB - Study Design: Retrospective cohort study. Objectives: Leveraging electronic health records (EHRs) for spine surgery research is impeded by concerns regarding patient privacy and data ownership. Synthetic data derivatives may help overcome these limitations. This study’s objective was to validate the use of synthetic data for spine surgery research. Methods: Data came from the EHRs of 15 hospitals. Patients who underwent anterior cervical or posterior lumbar fusion (2010-2020) were included. Real data were obtained from the EHR. Synthetic data were generated to simulate the properties of the real data, without maintaining a one-to-one correspondence with real patients. Within each cohort, the ability to predict 30-day readmissions and 30-day complications was evaluated using logistic regression and extreme gradient boosting machines (XGBoost). Results: We identified 9,072 real and 9,088 synthetic cervical fusion patients. Descriptive characteristics were nearly identical between the 2 datasets. When predicting readmission, models built using real and synthetic data both had c-statistics of .69-.71 using logistic regression and XGBoost. Among 12,111 real and 12,126 synthetic lumbar fusion patients, descriptive characteristics were nearly the same for most variables. Using logistic regression and XGBoost to predict readmission, discrimination was similar with models built using real and synthetic data (c-statistics of .66-.69). When predicting complications, models derived using real and synthetic data showed similar discrimination in both cohorts. Despite some differences, the most influential predictors were similar in the real and synthetic datasets. Conclusion: Synthetic data replicate most descriptive and predictive properties of real data, and therefore may expand EHR research in spine surgery.
KW - artificial intelligence
KW - electronic health records
KW - machine learning
KW - medical informatics
KW - spine surgery
KW - synthetic data derivatives
KW - treatment outcome
UR - http://www.scopus.com/inward/record.url?scp=85129138332&partnerID=8YFLogxK
U2 - 10.1177/21925682221085535
DO - 10.1177/21925682221085535
M3 - Article
C2 - 35373623
AN - SCOPUS:85129138332
SN - 2192-5682
VL - 13
SP - 2409
EP - 2421
JO - Global Spine Journal
JF - Global Spine Journal
IS - 8
ER -