TY - JOUR
T1 - A multisite study of a breast density deep learning model for full-field digital mammography and synthetic mammography
AU - Matthews, Thomas P.
AU - Singh, Sadanand
AU - Mombourquette, Brent
AU - Su, Jason
AU - Shah, Meet P.
AU - Pedemonte, Stefano
AU - Long, Aaron
AU - Maffit, David
AU - Gurney, Jenny
AU - Hoil, Rodrigo Morales
AU - Ghare, Nikita
AU - Smith, Douglas
AU - Moore, Steve
AU - Marks, Susan C.
AU - Wahl, Richard L.
N1 - Funding Information:
This retrospective study was approved by an institutional review board for each of the two sites where data were collected (site 1, internal institutional review board; and site 2, Western Institutional Review Board, Puyallup, Wash). Informed consent was waived, and all data were handled according to the Health Insurance Portability and Accountability Act. This work was supported in part by funding from Whiterabbit. Washington University has equity interests in Whiterabbit and may receive royalty income and milestone payments from a collaboration and license agreement with Whiterabbit to develop a technology evaluated in this research.
Publisher Copyright:
© 2021, Radiological Society of North America Inc. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multisite setting for synthetic two-dimensional mammographic (SM) images derived from digital breast tomosynthesis examinations by using full-field digital mammographic (FFDM) images and limited SM data. Materials and Methods: A DL model was trained to predict BI-RADS breast density by using FFDM images acquired from 2008 to 2017 (site 1: 57 492 patients, 187 627 examinations, 750 752 images) for this retrospective study. The FFDM model was evaluated by using SM datasets from two institutions (site 1: 3842 patients, 3866 examinations, 14 472 images, acquired from 2016 to 2017; site 2: 7557 patients, 16 283 examinations, 63 973 images, acquired from 2015 to 2019). Each of the three datasets was then split into training, validation, and test sets. Adaptation methods were investigated to improve performance on the SM datasets, and the effect of dataset size on each adaptation method was considered. Statistical significance was assessed by using CIs, which were estimated by bootstrapping. Results: Without adaptation, the model demonstrated substantial agreement with the original reporting radiologists for all three datasets (site 1 FFDM: linearly weighted Cohen κ [κw] = 0.75 [95% CI: 0.74, 0.76]; site 1 SM: κw = 0.71 [95% CI: 0.64, 0.78]; site 2 SM: κw = 0.72 [95% CI: 0.70, 0.75]). With adaptation, performance improved for site 2 (site 1: κw = 0.72 [95% CI: 0.66, 0.79], 0.71 vs 0.72, P = .80; site 2: κw = 0.79 [95% CI: 0.76, 0.81], 0.72 vs 0.79, P < .001) by using only 500 SM images from that site. Conclusion: A BI-RADS breast density DL model demonstrated strong performance on FFDM and SM images from two institutions without training on SM images and improved by using few SM images.
AB - Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multisite setting for synthetic two-dimensional mammographic (SM) images derived from digital breast tomosynthesis examinations by using full-field digital mammographic (FFDM) images and limited SM data. Materials and Methods: A DL model was trained to predict BI-RADS breast density by using FFDM images acquired from 2008 to 2017 (site 1: 57 492 patients, 187 627 examinations, 750 752 images) for this retrospective study. The FFDM model was evaluated by using SM datasets from two institutions (site 1: 3842 patients, 3866 examinations, 14 472 images, acquired from 2016 to 2017; site 2: 7557 patients, 16 283 examinations, 63 973 images, acquired from 2015 to 2019). Each of the three datasets was then split into training, validation, and test sets. Adaptation methods were investigated to improve performance on the SM datasets, and the effect of dataset size on each adaptation method was considered. Statistical significance was assessed by using CIs, which were estimated by bootstrapping. Results: Without adaptation, the model demonstrated substantial agreement with the original reporting radiologists for all three datasets (site 1 FFDM: linearly weighted Cohen κ [κw] = 0.75 [95% CI: 0.74, 0.76]; site 1 SM: κw = 0.71 [95% CI: 0.64, 0.78]; site 2 SM: κw = 0.72 [95% CI: 0.70, 0.75]). With adaptation, performance improved for site 2 (site 1: κw = 0.72 [95% CI: 0.66, 0.79], 0.71 vs 0.72, P = .80; site 2: κw = 0.79 [95% CI: 0.76, 0.81], 0.72 vs 0.79, P < .001) by using only 500 SM images from that site. Conclusion: A BI-RADS breast density DL model demonstrated strong performance on FFDM and SM images from two institutions without training on SM images and improved by using few SM images.
UR - http://www.scopus.com/inward/record.url?scp=85113784848&partnerID=8YFLogxK
U2 - 10.1148/ryai.2020200015
DO - 10.1148/ryai.2020200015
M3 - Article
C2 - 33937850
AN - SCOPUS:85113784848
SN - 2638-6100
VL - 3
JO - Radiology: Artificial Intelligence
JF - Radiology: Artificial Intelligence
IS - 1
M1 - e200015
ER -