Background Efforts to improve healthcare quality involve profiling hospitals and providers. Whether cancer-specific measures can be used reliably for profiling purposes has not been reported.
Study Design Hospitals and surgeons were profiled using hierarchical models on 3 measures of lymphadenectomy adequacy: for colon cancer, at least 12 regional lymph nodes removed and pathologically examined for resected colon cancer (12RLN); for gastric cancer, at least 15 regional lymph nodes removed and pathologically examined for resected gastric cancer (G15RLN); and for non-small cell lung cancer, at least 10 regional lymph nodes removed and pathologically examined for American Joint Committee on Cancer stage IA, IB, IIA, and IIB resected non-small cell lung cancer (10RLN). National Cancer Data Base cases from 2010 to 2013 were included if they met measure eligibility. Reliability estimates for hospital and surgeon performance were calculated across cumulative years of data (2013, 2012 to 2013, 2011 to 2013, and 2010 to 2013), with and without risk adjustment. The minimum surgeon caseloads needed to achieve reliabilities of 0.40 and 0.70 were projected.
Results Reliability estimates tended to increase with longer periods of data collection, but at rates that differed by measure, level of aggregation, and performance outlier status. For 12RLN, 2 years of data yielded a median hospital-level reliability of 0.72 (interquartile range [IQR] 0.55 to 0.83), whereas even 4 years of data yielded a median surgeon-level reliability of only 0.31 (IQR 0.14 to 0.54). The G15RLN measure performed poorly overall. The 10RLN measure had high reliability at both the hospital (0.74; IQR 0.50 to 0.86) and surgeon (0.61; IQR 0.34 to 0.80) levels using 1 year of data, but the literature questions this measure's validity. Few surgeons could achieve appropriate levels of reliability, even with longer data collection periods.
Conclusions Profiling hospitals with measures such as these can, but does not always, achieve acceptable reliability within reasonable timeframes. To profile surgeons with these measures, either lower levels of reliability should be accepted or longer timeframes should be used.
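The caseload projection mentioned in the Study Design can be illustrated with a minimal sketch. The abstract states only that hierarchical models were used, so the formula below is an assumption: it is the common signal-to-noise definition of reliability, with between-provider variance as the signal and the logit-scale sampling variance of a provider's pass rate as the noise. The function names and all input values (`between_var`, `pass_rate`) are hypothetical and are not taken from the study.

```python
import math


def reliability(n_cases, between_var, pass_rate):
    """Signal-to-noise reliability for one provider (assumed definition).

    between_var: variance of provider effects on the logit scale (signal).
    pass_rate:   proportion of cases meeting the measure (hypothetical input).
    """
    # Approximate sampling variance of a logit-scale proportion from n cases.
    noise_var = 1.0 / (n_cases * pass_rate * (1.0 - pass_rate))
    return between_var / (between_var + noise_var)


def min_caseload(target, between_var, pass_rate):
    """Smallest caseload whose reliability reaches the target.

    Obtained by solving target = between_var / (between_var + noise_var) for n.
    """
    n = target / ((1.0 - target) * between_var * pass_rate * (1.0 - pass_rate))
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative values only: between-surgeon variance 0.5, pass rate 80%.
    for target in (0.40, 0.70):
        n = min_caseload(target, between_var=0.5, pass_rate=0.8)
        print(f"target reliability {target:.2f}: at least {n} cases")
```

Under this definition the required caseload grows as the target rises and as between-provider variance shrinks. That is consistent with surgeons, who have smaller caseloads than hospitals, needing longer data collection periods to reach the same reliability.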