TY - JOUR
T1 - Protein sequence models for prediction and comparative analysis of the SARS-CoV-2–human interactome
AU - Kshirsagar, Meghana
AU - Tasnina, Nure
AU - Ward, Michael D.
AU - Law, Jeffrey N.
AU - Murali, T. M.
AU - Lavista Ferres, Juan M.
AU - Bowman, Gregory R.
AU - Klein-Seetharaman, Judith
N1 - Publisher Copyright: © 2020 The Authors.
PY - 2021
Y1 - 2021
N2 - Viruses such as the novel coronavirus SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of the human host cells. Relatively small changes in sequence, such as those between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins to human proteins and compare it to that of the other viruses named above. We build supervised machine learning models, using Generalized Additive Models, to predict interactions based on sequence features and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, the vif protein from HIV-1, vp24 from Ebola, and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a’s interactions are more diverged than orf7b’s interactions when comparing SARS-CoV and SARS-CoV-2.
KW - SARS-CoV
KW - SARS-CoV-2
KW - generalized additive models
KW - protein interaction prediction
KW - protein sequence
UR - http://www.scopus.com/inward/record.url?scp=85102842367&partnerID=8YFLogxK
M3 - Conference article
C2 - 33691013
AN - SCOPUS:85102842367
SN - 2335-6928
SP - 154
EP - 165
JO - Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing
T2 - 2021 Pacific Symposium on Biocomputing, PSB 2021
Y2 - 5 January 2021 through 7 January 2021
ER -