Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and the connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine (BACE) and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low-population states that may just be noise) gives better results.
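To illustrate the entropy criterion mentioned above, the following is a minimal sketch that computes the Shannon entropy of the macrostate populations produced by a coarse-graining. It assumes the entropy is taken over the macrostate populations with the natural logarithm; the function name `coarse_grained_entropy`, the microstate populations, and the two assignments are hypothetical and chosen only for illustration, not taken from the study.

```python
import math
from collections import defaultdict


def coarse_grained_entropy(micro_populations, assignments):
    """Shannon entropy (in nats) of the macrostate populations.

    micro_populations: equilibrium population of each microstate.
    assignments: macrostate label for each microstate (same length).
    Both the name and the signature are illustrative assumptions.
    """
    total = sum(micro_populations)
    macro = defaultdict(float)
    # Sum the normalized microstate populations within each macrostate.
    for pop, label in zip(micro_populations, assignments):
        macro[label] += pop / total
    # Empty macrostates contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in macro.values() if p > 0)


# Hypothetical five-microstate model (made-up populations).
micro = [0.4, 0.1, 0.3, 0.15, 0.05]

# Two alternative two-state coarse-grainings of the same microstates.
balanced = [0, 0, 1, 1, 1]   # macrostate populations 0.5 and 0.5
lopsided = [0, 0, 0, 0, 1]   # macrostate populations 0.95 and 0.05

h_balanced = coarse_grained_entropy(micro, balanced)
h_lopsided = coarse_grained_entropy(micro, lopsided)
print(h_balanced, h_lopsided)
```

In this toy case the balanced coarse-graining has the higher entropy, while the one that isolates a single low-population microstate has the lower entropy, which is the kind of lumping the correlation above suggests avoiding.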