TY - JOUR
T1 - A Semiautonomous Deep Learning System to Reduce False Positives in Screening Mammography
AU - Pedemonte, Stefano
AU - Tsue, Trevor
AU - Mombourquette, Brent
AU - Vu, Yen Nhi Truong
AU - Matthews, Thomas
AU - Hoil, Rodrigo Morales
AU - Shah, Meet
AU - Ghare, Nikita
AU - Zingman-Daniels, Naomi
AU - Holley, Susan
AU - Appleton, Catherine M.
AU - Su, Jason
AU - Wahl, Richard L.
N1 - Publisher Copyright: © 2024, Radiological Society of North America Inc. All rights reserved.
PY - 2024/5
Y1 - 2024/5
N2 - Purpose: To evaluate the ability of a semiautonomous artificial intelligence (AI) model to identify screening mammograms not suspicious for breast cancer and reduce the number of false-positive examinations. Materials and Methods: The deep learning algorithm was trained using 123 248 two-dimensional digital mammograms (6161 cancers) and a retrospective study was performed on three nonoverlapping datasets of 14 831 screening mammography examinations (1026 cancers) from two U.S. institutions and one U.K. institution (2008–2017). The stand-alone performance of humans and AI was compared. Human plus AI performance was simulated to examine reductions in the cancer detection rate, number of examinations, false-positive callbacks, and benign biopsies. Metrics were adjusted to mimic the natural distribution of a screening population, and bootstrapped CIs and P values were calculated. Results: Retrospective evaluation on all datasets showed minimal changes to the cancer detection rate with use of the AI device (noninferiority margin of 0.25 cancers per 1000 examinations: U.S. dataset 1, P = .02; U.S. dataset 2, P < .001; U.K. dataset, P < .001). On U.S. dataset 1 (11 592 mammograms; 101 cancers; 3810 female patients; mean age, 57.3 years ± 10.0 [SD]), the device reduced screening examinations requiring radiologist interpretation by 41.6% (95% CI: 40.6%, 42.4%; P < .001), diagnostic examination callbacks by 31.1% (95% CI: 28.7%, 33.4%; P < .001), and benign needle biopsies by 7.4% (95% CI: 4.1%, 12.4%; P < .001). On U.S. dataset 2 (1362 mammograms; 330 cancers; 1293 female patients; mean age, 55.4 years ± 10.5), the corresponding reductions were 19.5% (95% CI: 16.9%, 22.1%; P < .001), 11.9% (95% CI: 8.6%, 15.7%; P < .001), and 6.5% (95% CI: 0.0%, 19.0%; P = .08), respectively. On the U.K. dataset (1877 mammograms; 595 cancers; 1491 female patients; mean age, 63.5 years ± 7.1), the corresponding reductions were 36.8% (95% CI: 34.4%, 39.7%; P < .001), 17.1% (95% CI: 5.9%, 30.1%; P < .001), and 5.9% (95% CI: 2.9%, 11.5%; P < .001), respectively. Conclusion: This work demonstrates the potential of a semiautonomous breast cancer screening system to reduce false positives, unnecessary procedures, patient anxiety, and medical expenses.
AB - Purpose: To evaluate the ability of a semiautonomous artificial intelligence (AI) model to identify screening mammograms not suspicious for breast cancer and reduce the number of false-positive examinations. Materials and Methods: The deep learning algorithm was trained using 123 248 two-dimensional digital mammograms (6161 cancers) and a retrospective study was performed on three nonoverlapping datasets of 14 831 screening mammography examinations (1026 cancers) from two U.S. institutions and one U.K. institution (2008–2017). The stand-alone performance of humans and AI was compared. Human plus AI performance was simulated to examine reductions in the cancer detection rate, number of examinations, false-positive callbacks, and benign biopsies. Metrics were adjusted to mimic the natural distribution of a screening population, and bootstrapped CIs and P values were calculated. Results: Retrospective evaluation on all datasets showed minimal changes to the cancer detection rate with use of the AI device (noninferiority margin of 0.25 cancers per 1000 examinations: U.S. dataset 1, P = .02; U.S. dataset 2, P < .001; U.K. dataset, P < .001). On U.S. dataset 1 (11 592 mammograms; 101 cancers; 3810 female patients; mean age, 57.3 years ± 10.0 [SD]), the device reduced screening examinations requiring radiologist interpretation by 41.6% (95% CI: 40.6%, 42.4%; P < .001), diagnostic examination callbacks by 31.1% (95% CI: 28.7%, 33.4%; P < .001), and benign needle biopsies by 7.4% (95% CI: 4.1%, 12.4%; P < .001). On U.S. dataset 2 (1362 mammograms; 330 cancers; 1293 female patients; mean age, 55.4 years ± 10.5), the corresponding reductions were 19.5% (95% CI: 16.9%, 22.1%; P < .001), 11.9% (95% CI: 8.6%, 15.7%; P < .001), and 6.5% (95% CI: 0.0%, 19.0%; P = .08), respectively. On the U.K. dataset (1877 mammograms; 595 cancers; 1491 female patients; mean age, 63.5 years ± 7.1), the corresponding reductions were 36.8% (95% CI: 34.4%, 39.7%; P < .001), 17.1% (95% CI: 5.9%, 30.1%; P < .001), and 5.9% (95% CI: 2.9%, 11.5%; P < .001), respectively. Conclusion: This work demonstrates the potential of a semiautonomous breast cancer screening system to reduce false positives, unnecessary procedures, patient anxiety, and medical expenses.
KW - Artificial Intelligence
KW - Breast Cancer
KW - Screening Mammography
KW - Semiautonomous Deep Learning
UR - https://www.scopus.com/pages/publications/85196389495
U2 - 10.1148/ryai.230033
DO - 10.1148/ryai.230033
M3 - Article
C2 - 38597785
AN - SCOPUS:85196389495
SN - 2638-6100
VL - 6
JO - Radiology: Artificial Intelligence
JF - Radiology: Artificial Intelligence
IS - 3
M1 - e230033
ER -