TY - JOUR
T1 - Accuracy and Variability of Prostate Multiparametric Magnetic Resonance Imaging Interpretation Using the Prostate Imaging Reporting and Data System
T2 - A Blinded Comparison of Radiologists
AU - Pickersgill, Nicholas A.
AU - Vetter, Joel M.
AU - Andriole, Gerald L.
AU - Shetty, Anup S.
AU - Fowler, Kathryn J.
AU - Mintz, Aaron J.
AU - Siegel, Cary L.
AU - Kim, Eric H.
N1 - Publisher Copyright:
© 2018 European Association of Urology
PY - 2020/3/15
Y1 - 2020/3/15
N2 - Background: Multiparametric (mp) magnetic resonance imaging (MRI) has become an important tool for the detection of clinically significant prostate cancer. However, diagnostic accuracy is affected by variability between radiologists. Objective: To determine the accuracy and variability in prostate mpMRI interpretation among radiologists, both individually and in teams, in a blinded fashion. Design, setting, and participants: A study cohort (n = 32) was created from our prospective registry of patients who received prostate mpMRI with subsequent biopsy. The cohort was then independently reviewed by four radiologists of varying levels of experience, who assigned a Prostate Imaging Reporting and Data System (PI-RADS) classification, blinded to all clinical information. Consensus interpretation by teams of two radiologists was evaluated after a 12-week washout period. Interpretive accuracy was calculated with various cutoffs for PI-RADS classification and Gleason score. Variability among individual radiologists and teams was calculated using the Fleiss kappa and intraclass correlation coefficient (ICC). Results and limitations: Using PI-RADS 3+/Gleason 7+ (p < 0.01) and PI-RADS 4+/Gleason 6+ (p = 0.02) as cutoffs, significant differences in accuracy among the four radiologists were noted. At no cutoff for PI-RADS classification or Gleason score did a team read achieve higher accuracy than the most accurate radiologist. The kappa and ICC ranged from 0.22 to 0.29 for the individuals and from 0.16 to 0.21 for the teams (poor agreement). A larger sample size may be needed to adequately power the detection of differences in accuracy among individual radiologists. Conclusions: At various cutoffs for PI-RADS classification and Gleason score, we found significant differences in individual radiologist accuracy, as well as poor agreement among individual radiologists. Consensus interpretations by teams of two radiologists did not improve accuracy or reduce variability.
Patient summary: This study investigated radiologist variability and differences in accuracy using multiparametric magnetic resonance imaging for the diagnosis of prostate cancer. Despite attempts to standardize interpretation within the field, we found substantial variability and significant differences in accuracy among individual radiologists.
KW - Accuracy
KW - Biopsy
KW - Diagnosis
KW - Magnetic resonance imaging
KW - Prostate cancer
KW - Radiologist variability
UR - http://www.scopus.com/inward/record.url?scp=85054614249&partnerID=8YFLogxK
U2 - 10.1016/j.euf.2018.10.008
DO - 10.1016/j.euf.2018.10.008
M3 - Article
C2 - 30327280
AN - SCOPUS:85054614249
SN - 2405-4569
VL - 6
SP - 267
EP - 272
JO - European Urology Focus
JF - European Urology Focus
IS - 2
ER -