TY - JOUR
T1 - Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores
AU - Goldstein, Seth D.
AU - Lindeman, Brenessa
AU - Colbert-Getz, Jorie
AU - Arbella, Trisha
AU - Dudas, Robert
AU - Lidor, Anne
AU - Sacks, Bethany
PY - 2014/2
Y1 - 2014/2
N2 - Background: The clinical knowledge of medical students on a surgery clerkship is routinely assessed via subjective evaluations from faculty members and residents. Interpretation of these ratings should ideally be valid and reliable. However, prior literature has questioned the correlation between subjective and objective components when assessing students' clinical knowledge. Methods: Retrospective cross-sectional data were collected from medical student records at The Johns Hopkins University School of Medicine from July 2009 through June 2011. Surgical faculty members and residents rated students' clinical knowledge on a 5-point, Likert-type scale. Interrater reliability was assessed using intraclass correlation coefficients for students with ≥4 attending surgeon evaluations (n = 216) and ≥4 resident evaluations (n = 207). Convergent validity was assessed by correlating average evaluation ratings with scores on the National Board of Medical Examiners (NBME) clinical subject examination for surgery. Average resident and attending surgeon ratings were also compared by NBME quartile using analysis of variance. Results: There were high degrees of reliability for resident ratings (intraclass correlation coefficient, .81) and attending surgeon ratings (intraclass correlation coefficient, .76). Resident and attending surgeon ratings shared a moderate degree of variance (19%). However, average resident ratings and average attending surgeon ratings shared only a small degree of variance with NBME surgery examination scores (ρ² ≤ .09). When ratings were compared among NBME quartile groups, the only significant difference was between residents' ratings of students in the lowest 25th percentile of scores and those in the top 25th percentile (P = .007). Conclusions: Although high interrater reliability suggests that attending surgeons and residents rate students consistently, the lack of convergent validity suggests that these ratings may not reflect actual clinical knowledge. Both faculty members and residents may benefit from training in knowledge assessment, which would likely increase opportunities to recognize deficiencies and make student evaluation a more valuable tool.
KW - Assessment
KW - Medical student education
KW - Surgery clerkship
UR - http://www.scopus.com/inward/record.url?scp=84893672911&partnerID=8YFLogxK
U2 - 10.1016/j.amjsurg.2013.10.008
DO - 10.1016/j.amjsurg.2013.10.008
M3 - Article
C2 - 24239528
AN - SCOPUS:84893672911
SN - 0002-9610
VL - 207
SP - 231
EP - 235
JO - American Journal of Surgery
JF - American Journal of Surgery
IS - 2
ER -