Purpose Treatment planning factors are known to affect the risk of severe acute esophagitis during thoracic radiation therapy. We tested a previously published model for predicting the risk of severe acute esophagitis on an independent data set.
Methods and materials The data set comprised patients who had recoverable treatment plans and received definitive radiation therapy for non–small cell carcinoma of the lung at a single institution between November 2004 and January 2010. Complete esophagus dose-volume data and available clinical information were extracted using our in-house software. The previously published model was a logistic function combining mean esophageal dose and use of concurrent chemotherapy. In addition to testing the previous model, we used a novel, machine learning-based method to build a maximally predictive model.
Results Ninety-four patients (81.7%) developed Common Terminology Criteria for Adverse Events, Version 4, Grade 2 or more severe esophagitis (Grade 2: n = 79; Grade 3: n = 15). Univariate analysis revealed that the most statistically significant dose-volume parameters were the percentage of esophagus volume receiving ≥40 to 60 Gy, the minimum dose to the highest 20% of esophagus volume (D20) to D35, and mean dose. Other significant predictors were concurrent chemotherapy and patient age. The previously published model predicted risk effectively, with a Spearman's rank correlation coefficient (rs) of 0.43 (P < .001) and good calibration (Hosmer-Lemeshow goodness of fit: P = .537). A new model built from the current data set selected the same variables, yielding an rs of 0.43 (P < .001) with a logistic function of 0.0853 × mean esophageal dose [Gy] + 1.49 × concurrent chemotherapy [1/0] − 1.75 and Hosmer-Lemeshow P = .659. A novel preconditioned least absolute shrinkage and selection operator method yielded an average rs of 0.38 on 100 bootstrapped data sets.
Conclusions The previously published model was validated on an independent data set and determined to be nearly as predictive as the best possible two-parameter logistic model even though it overpredicted risk systematically. A novel, machine learning-based model using a bootstrapping approach showed reasonable predictive power.
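The refit logistic model reported in the Results can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' software. The three coefficients come from the abstract. The function name, the argument names, and the standard logistic form p = 1 / (1 + exp(−x)) are assumptions made for illustration, as is reading the output as the probability of the Grade 2 or worse esophagitis counted in the Results.

```python
import math

# Coefficients as reported in the Results for the model refit on the current data set.
COEF_MEAN_DOSE_PER_GY = 0.0853   # per Gy of mean esophageal dose
COEF_CONCURRENT_CHEMO = 1.49     # applied when concurrent chemotherapy was used (1/0)
INTERCEPT = -1.75


def esophagitis_risk(mean_esophageal_dose_gy: float, concurrent_chemo: bool) -> float:
    """Return the modeled probability of acute esophagitis.

    Assumption: the reported expression is the linear predictor x of a
    standard logistic function, p = 1 / (1 + exp(-x)).
    """
    x = (
        COEF_MEAN_DOSE_PER_GY * mean_esophageal_dose_gy
        + COEF_CONCURRENT_CHEMO * (1.0 if concurrent_chemo else 0.0)
        + INTERCEPT
    )
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the call pattern.
    for dose in (10.0, 20.0, 30.0):
        without = esophagitis_risk(dose, concurrent_chemo=False)
        with_chemo = esophagitis_risk(dose, concurrent_chemo=True)
        print(f"mean dose {dose:4.1f} Gy: no chemo {without:.3f}, chemo {with_chemo:.3f}")
```

Under these assumptions, the predicted risk without concurrent chemotherapy reaches 50% where the linear predictor equals zero, at a mean esophageal dose of 1.75 / 0.0853, or about 20.5 Gy. Concurrent chemotherapy shifts the curve toward lower doses.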