TY - JOUR
T1 - Machine Learning for Benchmarking Adolescent Idiopathic Scoliosis Surgery Outcomes
AU - Gupta, Aditi
AU - Oh, Inez Y.
AU - Kim, Seunghwan
AU - Marks, Michelle C.
AU - Payne, Philip R.O.
AU - Ames, Christopher P.
AU - Pellise, Ferran
AU - Pahys, Joshua M.
AU - Fletcher, Nicholas D.
AU - Newton, Peter O.
AU - Kelly, Michael P.
N1 - Publisher Copyright: © 2023 Lippincott Williams and Wilkins. All rights reserved.
PY - 2023/8/15
Y1 - 2023/8/15
N2 - Study Design. Retrospective cohort. Objective. The aim of this study was to design a risk-stratified benchmarking tool for adolescent idiopathic scoliosis (AIS) surgeries. Summary of Background Data. Machine learning (ML) is an emerging method for prediction modeling in orthopedic surgery. Benchmarking is an established method of process improvement and is an area of opportunity for ML methods. Current surgical benchmark tools often use ranks, and no "gold standards" for comparisons exist. Materials and Methods. Data from 6076 AIS surgeries were collected from a multicenter registry and divided into three datasets encompassing surgeries performed (1) during the entire registry, (2) during the past 10 years, and (3) during the last 5 years of the registry. We trained three ML regression models (baseline linear regression, gradient boosting, and eXtreme gradient boosting) on each data subset to predict each of five outcome variables: length of stay (LOS), estimated blood loss (EBL), operative time, Scoliosis Research Society (SRS)-Pain, and SRS-Self-Image. Performance was categorized as "below expected" if more than 1 standard deviation (SD) worse than the mean, "as expected" if within 1 SD, and "better than expected" if more than 1 SD better than the mean. Results. Ensemble ML methods classified performance better than traditional regression techniques for LOS, EBL, and operative time. The best performing models for predicting LOS and EBL were trained on data collected in the last 5 years, while operative time used the entire 10-year dataset. No models were able to predict SRS-Pain or SRS-Self-Image in any useful manner. Point-precise estimates for continuous variables were subject to high average errors. Conclusions. Classification of benchmark outcomes is improved with ensemble ML techniques and may provide much-needed case adjustment for a surgeon performance program. Precise estimates of health-related quality-of-life scores and continuous variables were not possible, suggesting that performance classification is a better method of performance evaluation.
KW - adolescent idiopathic scoliosis
KW - artificial intelligence
KW - benchmarking
KW - machine learning
KW - surgeon performance
UR - http://www.scopus.com/inward/record.url?scp=85166364106&partnerID=8YFLogxK
U2 - 10.1097/BRS.0000000000004734
DO - 10.1097/BRS.0000000000004734
M3 - Article
C2 - 37249385
AN - SCOPUS:85166364106
SN - 0362-2436
VL - 48
SP - 1138
EP - 1147
JO - Spine
JF - Spine
IS - 16
ER -