Background: The field of psychiatric epidemiology continues to employ self-report instruments, but the low agreement between diagnoses obtained with these instruments and those made by psychiatrists in the clinical modality threatens the credibility of the results.

Methods: In the Baltimore Epidemiologic Catchment Area follow-up, 349 individuals who had a Diagnostic Interview Schedule (DIS) interview were blindly examined by psychiatrists using the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). Comparisons were made at the level of diagnosis, syndrome, and DSM-IV symptom group. Indexes of agreement were computed and characteristics of discrepant cases were identified.

Results: Agreement on diagnosis of major depressive disorder was only fair (κ = 0.20), with the DIS missing many cases judged to meet criteria for diagnosis using the SCAN (29% sensitivity). A major source of discrepancy was respondents with false-negative diagnoses who repeatedly failed to report DIS symptoms attributed to life crises or medical conditions. In logistic regression analysis, older age, male sex, and lower impairment were associated with underdetection by the DIS. In spite of the diagnostic discrepancy, there was substantial correlation in numbers of symptom groups in the 2 modalities (r = 0.49). Agreement was highest (about 55% sensitivity and 90% specificity) when both the SCAN and DIS thresholds were set at the level of depression syndrome instead of diagnosis.

Conclusions: Weak agreement at the level of diagnosis continues to threaten the credibility of estimates of prevalence of specific disorders. A bias toward underreporting, as well as stronger agreement at the level of the depression syndrome and on ordinal measures of depressive symptoms, suggests that associations with risk factors are conservative.
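
The indexes of agreement reported above (sensitivity, specificity, and Cohen's κ) can all be derived from a 2 × 2 cross-classification of DIS results against SCAN results, with the SCAN treated as the criterion. The following is a minimal sketch of the standard formulas only. The function name `agreement_indexes` and the cell counts in the usage lines are hypothetical illustrations, not the study's data or the authors' code.

```python
def agreement_indexes(tp, fp, fn, tn):
    """Return (sensitivity, specificity, kappa) for a 2 x 2 table.

    tp: positive on both instruments
    fp: positive on the test instrument, negative on the criterion
    fn: negative on the test instrument, positive on the criterion
    tn: negative on both instruments
    """
    n = tp + fp + fn + tn
    # Share of criterion-positive cases that the test instrument detects.
    sensitivity = tp / (tp + fn)
    # Share of criterion-negative cases that the test instrument clears.
    specificity = tn / (tn + fp)
    # Observed proportion of agreement.
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    # Cohen's kappa: agreement beyond chance, scaled by its maximum.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return sensitivity, specificity, kappa


# Hypothetical counts chosen only to illustrate the arithmetic.
sens, spec, kappa = agreement_indexes(tp=20, fp=10, fn=30, tn=140)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

Because κ corrects for chance agreement, it can remain low even when specificity is high, which is the pattern to expect when a disorder is uncommon in the sample and many criterion-positive cases are missed.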