TY - JOUR
T1 - Choice of Adaptive Sampling Strategy Impacts State Discovery, Transition Probabilities, and the Apparent Mechanism of Conformational Changes
AU - Zimmerman, Maxwell I.
AU - Porter, Justin R.
AU - Sun, Xianqiang
AU - Silva, Roseane R.
AU - Bowman, Gregory R.
N1 - Funding Information:
*E-mail: bowman@biochem.wustl.edu. ORCID Maxwell I. Zimmerman: 0000-0003-0721-0652 Funding This work was funded by NSF CAREER Award MCB-1552471 and NIH R01GM12400701. Notes The authors declare no competing financial interest.
Funding Information:
G.R.B. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and a Packard Fellowship for Science and Engineering from The David and Lucile Packard Foundation. M.I.Z. holds a Monsanto Graduate Fellowship and a Center for Biological Systems Engineering Fellowship.
Publisher Copyright:
© 2018 American Chemical Society.
PY - 2018/11/13
Y1 - 2018/11/13
N2 - Interest in atomically detailed simulations has grown significantly with recent advances in computational hardware and Markov state modeling (MSM) methods, yet outstanding questions remain that hinder their widespread adoption. Namely, how do alternative sampling strategies explore conformational space and how might this influence predictions generated from the data? Here, we seek to answer these questions for four commonly used sampling methods: (1) a single long simulation, (2) many short simulations run in parallel, (3) adaptive sampling, and (4) our recently developed goal-oriented sampling algorithm, FAST. We first develop a theoretical framework for analytically calculating the probability of discovering select states on simple landscapes, where we uncover the drastic effects of varying the number and length of simulations. We then use kinetic Monte Carlo simulations on a variety of physically inspired landscapes to characterize the probability of discovering particular states and transition pathways for each of the four methods. Consistently, we find that FAST simulations discover each target state with the highest probability, while traversing realistic pathways. Furthermore, we uncover the potential pathology that short parallel simulations sometimes predict an incorrect transition pathway by crossing large energy barriers that long simulations would typically circumnavigate. We refer to this pathology as "pathway tunneling". To protect against this phenomenon when using adaptive-sampling and FAST simulations, we introduce the FAST-string method. This method enhances sampling along the highest-flux transition paths to refine an MSMs transition probabilities and discriminate between competing pathways. Additionally, we compare the performance of a variety of MSM estimators in describing accurate thermodynamics and kinetics. For adaptive sampling, we recommend simply normalizing the transition counts out of each state after adding small pseudocounts to avoid creating sources or sinks. Lastly, we evaluate whether our insights from simple landscapes hold for all-atom molecular dynamics simulations of the folding of the -repressor protein. Remarkably, we find that FAST-contacts predicts the same folding pathway as a set of long simulations but with orders of magnitude less simulation time.
AB - Interest in atomically detailed simulations has grown significantly with recent advances in computational hardware and Markov state modeling (MSM) methods, yet outstanding questions remain that hinder their widespread adoption. Namely, how do alternative sampling strategies explore conformational space and how might this influence predictions generated from the data? Here, we seek to answer these questions for four commonly used sampling methods: (1) a single long simulation, (2) many short simulations run in parallel, (3) adaptive sampling, and (4) our recently developed goal-oriented sampling algorithm, FAST. We first develop a theoretical framework for analytically calculating the probability of discovering select states on simple landscapes, where we uncover the drastic effects of varying the number and length of simulations. We then use kinetic Monte Carlo simulations on a variety of physically inspired landscapes to characterize the probability of discovering particular states and transition pathways for each of the four methods. Consistently, we find that FAST simulations discover each target state with the highest probability, while traversing realistic pathways. Furthermore, we uncover the potential pathology that short parallel simulations sometimes predict an incorrect transition pathway by crossing large energy barriers that long simulations would typically circumnavigate. We refer to this pathology as "pathway tunneling". To protect against this phenomenon when using adaptive-sampling and FAST simulations, we introduce the FAST-string method. This method enhances sampling along the highest-flux transition paths to refine an MSMs transition probabilities and discriminate between competing pathways. Additionally, we compare the performance of a variety of MSM estimators in describing accurate thermodynamics and kinetics. For adaptive sampling, we recommend simply normalizing the transition counts out of each state after adding small pseudocounts to avoid creating sources or sinks. Lastly, we evaluate whether our insights from simple landscapes hold for all-atom molecular dynamics simulations of the folding of the -repressor protein. Remarkably, we find that FAST-contacts predicts the same folding pathway as a set of long simulations but with orders of magnitude less simulation time.
UR - http://www.scopus.com/inward/record.url?scp=85050908164&partnerID=8YFLogxK
U2 - 10.1021/acs.jctc.8b00500
DO - 10.1021/acs.jctc.8b00500
M3 - Article
C2 - 30240203
AN - SCOPUS:85050908164
VL - 14
SP - 5459
EP - 5475
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
SN - 1549-9618
IS - 11
ER -