Background: The American College of Surgeons (ACS) NSQIP Surgical Risk Calculator has been widely adopted as a decision aid and informed consent tool by surgeons and patients. Previous evaluations showed excellent discrimination, as well as excellent combined discrimination and calibration, but model calibration alone, and the potential benefits of recalibration, were not explored. Because lack of calibration can lead to systematic errors in assessing surgical risk, our objective was to assess calibration and determine whether spline-based adjustments could improve it.

Study Design: We evaluated Surgical Risk Calculator model calibration, as well as discrimination, for each of 11 outcomes modeled from nearly 3 million patients (2010 to 2014). Using independent random subsets of data, we evaluated model performance for the Development (60% of records), Validation (20%), and Test (20%) datasets, where prediction equations from the Development dataset were recalibrated using restricted cubic splines estimated from the Validation dataset. We also evaluated performance on data subsets composed of higher-risk operations.

Results: The nonrecalibrated Surgical Risk Calculator performed well, but there was a slight tendency for predicted risk to be overestimated for lowest- and highest-risk patients and underestimated for moderate-risk patients. After recalibration, this distortion was eliminated, and p values for miscalibration were most often nonsignificant. Calibration was also excellent for subsets of higher-risk operations, though observed calibration was reduced because of instability associated with smaller sample sizes.

Conclusions: Performance of NSQIP Surgical Risk Calculator models was shown to be excellent and improved with recalibration. Surgeons and patients can rely on the calculator to provide accurate estimates of surgical risk.
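
The study design above (a random 60/20/20 split, a restricted cubic spline recalibration step, and a comparison of predicted against observed risk) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the seed, the knot locations, and the use of equal-size risk bins are all hypothetical choices made for illustration, and the spline basis follows the commonly used truncated-power form of a restricted cubic spline rather than anything stated in the abstract. The logistic regression that would be fitted to the Validation data is omitted.

```python
import random


def split_60_20_20(records, seed=0):
    """Randomly split records into Development, Validation, and Test subsets."""
    rng = random.Random(seed)          # seed is an illustrative choice
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_dev = n * 3 // 5                 # 60% of records
    n_val = n // 5                     # 20% of records
    dev = shuffled[:n_dev]
    val = shuffled[n_dev:n_dev + n_val]
    test = shuffled[n_dev + n_val:]    # remaining records (about 20%)
    return dev, val, test


def _pos_cube(u):
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x: one linear term plus k-2 nonlinear
    terms for k knots. The basis is linear beyond the outermost knots."""
    t = sorted(knots)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2        # normalization keeps terms on a similar scale
    gap = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        term = (_pos_cube(x - t[j])
                - _pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
                + _pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / gap)
        terms.append(term / scale)
    return terms


def calibration_by_bin(predicted, observed, n_bins=10):
    """Sort patients by predicted risk, cut them into equal-size bins, and
    return (mean predicted risk, observed event rate, bin size) per bin."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            obs_rate = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_pred, obs_rate, len(chunk)))
    return rows
```

In a workflow of this kind, one plausible approach (an assumption here, not a detail given in the abstract) is to evaluate `rcs_basis` on the logit of each original predicted risk, fit a logistic regression of the outcome on those terms using the Validation subset, and then compare `calibration_by_bin` output on the Test subset before and after applying the fitted adjustment. Well-calibrated predictions give mean predicted risk close to the observed event rate in every bin.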