Detecting Glaucoma in the Ocular Hypertension Study Using Deep Learning

Rui Fan, Christopher Bowd, Mark Christopher, Nicole Brye, James A. Proudfoot, Jasmin Rezapour, Akram Belghith, Michael H. Goldbaum, Benton Chuter, Christopher A. Girkin, Massimo A. Fazio, Jeffrey M. Liebmann, Robert N. Weinreb, Mae O. Gordon, Michael A. Kass, David Kriegman, Linda M. Zangwill

Research output: Contribution to journal (Article, peer-reviewed)

19 Scopus citations


Importance: Automated deep learning (DL) analyses of fundus photographs can potentially reduce the cost and improve the efficiency of reading center assessment of end points in clinical trials.

Objective: To investigate the diagnostic accuracy of DL algorithms trained on fundus photographs from the Ocular Hypertension Treatment Study (OHTS) to detect primary open-angle glaucoma (POAG).

Design, Setting, and Participants: This diagnostic study included 1636 OHTS participants from 22 sites with a mean (range) follow-up of 10.7 (0-14.3) years. A total of 66715 photographs from 3272 eyes were used to train and test a ResNet-50 model to detect the OHTS Endpoint Committee POAG determination based on optic disc (287 eyes, 3502 photographs) and/or visual field (198 eyes, 2300 visual fields) changes. Three independent test sets were used to evaluate the generalizability of the model.

Main Outcomes and Measures: Areas under the receiver operating characteristic curve (AUROC) and sensitivities at fixed specificities were calculated to compare model performance. Evaluation of false-positive rates was used to determine whether the DL model detected POAG before the OHTS Endpoint Committee POAG determination.

Results: A total of 1147 participants were included in the training set (661 [57.6%] female; mean age, 57.2 years; 95% CI, 56.6-57.8), 167 in the validation set (97 [58.1%] female; mean age, 57.1 years; 95% CI, 55.6-58.7), and 322 in the test set (173 [53.7%] female; mean age, 57.2 years; 95% CI, 56.1-58.2). The DL model achieved an AUROC of 0.88 (95% CI, 0.82-0.92) for the OHTS Endpoint Committee determination of optic disc or visual field changes. For the OHTS end points based on optic disc changes or visual field changes, AUROCs were 0.91 (95% CI, 0.88-0.94) and 0.86 (95% CI, 0.76-0.93), respectively. False-positive rates (at 90% specificity) were higher in photographs of eyes that later developed POAG by disc or visual field (27.5% [56 of 204]) than in eyes that did not develop POAG (11.4% [50 of 440]) during follow-up. The diagnostic accuracy of the DL model developed on the optic disc end point was lower when applied to 3 independent data sets, with AUROCs ranging from 0.74 (95% CI, 0.70-0.77) to 0.79 (95% CI, 0.78-0.81).

Conclusions and Relevance: The model's high diagnostic accuracy using OHTS photographs suggests that DL has the potential to standardize and automate POAG determination for clinical trials and management. In addition, the higher false-positive rate in early photographs of eyes that later developed POAG suggests that DL models detected POAG in some eyes earlier than the OHTS Endpoint Committee, reflecting the OHTS design that emphasized a high specificity for POAG determination by requiring a clinically significant change from baseline.

© 2022 American Medical Association. All rights reserved.
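The outcome measures named in the abstract (AUROC, sensitivity at a fixed specificity, and false-positive rates) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auroc`, `sensitivity_at_specificity`, and `percent` are hypothetical, the toy scores and labels are invented purely for illustration, and only the counts 56 of 204 and 50 of 440 come from the abstract itself.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (label 1) and negative (label 0) cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    return pos, neg


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], target_specificity: float
) -> Tuple[float, float]:
    """Find the lowest threshold whose specificity reaches the target and
    return (sensitivity, achieved specificity). A case is called positive
    when its score is strictly greater than the threshold."""
    pos, neg = _split(scores, labels)
    for threshold in [float("-inf")] + sorted(set(scores)):
        specificity = sum(n <= threshold for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p > threshold for p in pos) / len(pos)
            return sensitivity, specificity
    raise ValueError("target specificity must be at most 1")


def percent(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Invented toy data (NOT from OHTS): model scores and ground-truth labels.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

toy_auroc = auroc(toy_scores, toy_labels)
toy_sens, toy_spec = sensitivity_at_specificity(toy_scores, toy_labels, 0.75)

# False-positive rates recomputed from the counts reported in the abstract.
fpr_later_poag = percent(56, 204)  # eyes that later developed POAG -> 27.5
fpr_no_poag = percent(50, 440)     # eyes that did not develop POAG -> 11.4
```

The pairwise AUROC above is quadratic in the number of cases, which is fine for a sketch; a study of this size would more plausibly rely on a rank-based or library implementation and on resampling for the confidence intervals, neither of which is shown here.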

Original language: English
Pages (from-to): 383-391
Number of pages: 9
Journal: JAMA Ophthalmology
Issue number: 4
State: Published - Apr 2022

