Management of thyroid nodules seen on us images: Deep learning may match performance of radiologists

Mateusz Buda, Benjamin Wildman-Tobriner, Jenny K. Hoang, David Thayer, Franklin N. Tessler, William D. Middleton, MacIej A. Mazurowski

Research output: Contribution to journalArticlepeer-review

131 Scopus citations


Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with the performance of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved 13 of 15 (87%: 95% confidence interval [CI]: 67%, 100%) sensitivity, the same as expert consensus (P .99) and higher than five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists was 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: Sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations was similar to that of expert radiologists who used American College of Radiology Thyroid Imaging and Reporting Data System guidelines.

Original languageEnglish
Pages (from-to)695-701
Number of pages7
Issue number3
StatePublished - 2019


Dive into the research topics of 'Management of thyroid nodules seen on us images: Deep learning may match performance of radiologists'. Together they form a unique fingerprint.

Cite this