TY - JOUR
T1 - Deep learning the structural determinants of protein biochemical properties by comparing structural ensembles with DiffNets
AU - Ward, Michael D.
AU - Zimmerman, Maxwell I.
AU - Meller, Artur
AU - Chung, Moses
AU - Swamidass, S. J.
AU - Bowman, Gregory R.
N1 - Funding Information:
This work was funded by NSF CAREER Award MCB-1552471 and NIH grants R01 GM124007 and RF1AG067194 (G.R.B.). We would like to thank AMD for the donation of critical hardware and support resources from its HPC Fund that enabled the computations for this work. G.R.B. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and a Packard Fellowship for Science and Engineering from The David & Lucile Packard Foundation. M.D.W. was supported by a MolSSI COVID-19 seed software fellowship and would like to thank Sina Mostafanejad and Doaa Altarawy for their guidance in developing DiffNets as a software package.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
AB - Understanding the structural determinants of a protein’s biochemical properties, such as activity and stability, is a major challenge in biology and medicine. Comparing computer simulations of protein variants with different biochemical properties is an increasingly powerful means to drive progress. However, success often hinges on dimensionality reduction algorithms for simplifying the complex ensemble of structures each variant adopts. Unfortunately, common algorithms rely on potentially misleading assumptions about what structural features are important, such as emphasizing larger geometric changes over smaller ones. Here we present DiffNets, self-supervised autoencoders that avoid such assumptions, and automatically identify the relevant features, by requiring that the low-dimensional representations they learn are sufficient to predict the biochemical differences between protein variants. For example, DiffNets automatically identify subtle structural signatures that predict the relative stabilities of β-lactamase variants and duty ratios of myosin isoforms. DiffNets should also be applicable to understanding other perturbations, such as ligand binding.
UR - http://www.scopus.com/inward/record.url?scp=85106735725&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-23246-1
DO - 10.1038/s41467-021-23246-1
M3 - Article
C2 - 34021153
AN - SCOPUS:85106735725
VL - 12
JO - Nature Communications
JF - Nature Communications
SN - 2041-1723
IS - 1
M1 - 3023
ER -