Purpose: To assess the correlation between quantitative measures of contour variability and physicians' qualitative assessment of the clinical usefulness of auto‐segmentation in prostate cancer radiotherapy.
Methods: The study used three serial CT images (one planning and two under‐treatment image sets) for each of five prostate cancer patients. On each CT image, the bladder, prostate, and rectum were manually contoured by three experienced physicians. Deformable image registration (ITK Demons) was used to register each under‐treatment CT image to the planning CT image. The resulting displacement vector fields were used to automatically segment the planning CT organs by deformably mapping the manual contours from the treatment CTs onto the planning CT. For qualitative assessment of the automatic and manual contours, a trial was conducted with four radiation oncology residents. Each resident was shown sets of randomly chosen manual or automatic bladder, prostate, and rectum contours overlaid on the planning CT image in Pinnacle (Philips TPS), for a total of 135 contours. Residents were asked to accept or reject each contour based on its clinical usability. Quantitatively, surface distances and the Dice coefficient were computed between inter‐observer manual contours (manual/manual) and between each automatic contour and its corresponding manual contour (auto/manual).
Results: No statistically significant differences in mean surface distance were found between manual/manual and auto/manual contours for the bladder and rectum, whereas manual/manual contour distances were significantly smaller for the prostate. The distributions of Dice values for manual/manual and auto/manual contours were also similar. Qualitatively, acceptance rates for manual contours were significantly higher than those for automatic contours.
Conclusion: No correspondence was found between the qualitative and quantitative measures for manual and automatic contours of the rectum and bladder, whereas the two measures appear to be related for the prostate. This study suggests that quantitative measures used to evaluate auto‐segmentation without a qualitative calibration might not always be predictive of its clinical usefulness.
This work was supported by NIH Grant P01CA166602. E. Weiss and J. Williamson have grants from Varian Medical Systems and Philips Radiation Oncology Systems.
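As an illustration of the two quantitative measures named in the Methods, the following minimal sketch computes a Dice coefficient (DICE) and a mean surface distance for a pair of contours. It is not the study's implementation: the abstract does not state which surface-distance definition was used, so a symmetric mean of nearest-neighbour distances is assumed here, and the function names `dice` and `mean_surface_distance` and the toy inputs are hypothetical.

```python
import math


def dice(mask_a, mask_b):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_surface_distance(surface_a, surface_b):
    """Symmetric mean of nearest-neighbour distances between two surface point sets.

    Assumption: the abstract does not define its surface distance, so this
    averages the directed distances A->B and B->A over all surface points.
    """
    def directed(src, dst):
        # For each source point, distance to the closest destination point.
        return [min(math.dist(p, q) for q in dst) for p in src]

    d_ab = directed(surface_a, surface_b)
    d_ba = directed(surface_b, surface_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Toy example: a 4x4 "manual" region and an "automatic" region shifted by one voxel.
manual = {(x, y) for x in range(4) for y in range(4)}
auto = {(x, y) for x in range(1, 5) for y in range(4)}
overlap_score = dice(manual, auto)

# Toy surfaces: two parallel point pairs separated by 3 units.
distance_score = mean_surface_distance([(0, 0), (0, 1)], [(3, 0), (3, 1)])
```

In a comparison like the one described, such functions would be applied once per manual/manual pair and once per auto/manual pair, and the two resulting distributions compared per organ.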