Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce in distorted results. We found that a nal̈ve approach approximated the gold standard, but errors ona minority of cases shifted mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.