Protein sequence models for prediction and comparative analysis of the SARS-CoV-2-human interactome

Meghana Kshirsagar, Nure Tasnina, Michael D. Ward, Jeffrey N. Law, T. M. Murali, Juan M. Lavista Ferres, Gregory R. Bowman, Judith Klein-Seetharaman

Research output: Contribution to journal, Conference article, peer-review

3 Scopus citations


Viruses such as the novel coronavirus SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of human host cells. Relatively small changes in sequence, such as those between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV overlap in function. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 proteins to human proteins and compare it to that of the other viruses mentioned above. We build supervised machine learning models, using Generalized Additive Models, to predict interactions based on sequence features and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, the vif protein from HIV-1, vp24 from Ebola, and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a’s interactions have diverged more than orf7b’s interactions when comparing SARS-CoV and SARS-CoV-2.
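The AUC-PR of 0.65 at a class skew of 1:10 reported in the abstract is easier to interpret against the chance level that such a skew implies. The following is a minimal Python sketch, not the authors' evaluation code: the labels are synthetic, and `average_precision` is a hypothetical helper that computes a common step-wise estimate of the area under the precision-recall curve. It illustrates that a scorer with no signal lands near the positive prevalence (1/11, roughly 0.09), while a perfect ranking reaches 1.0.

```python
import random


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive example.

    Illustrative helper (an assumption, not from the paper). Tied scores are
    kept in input order, so ties are broken arbitrarily.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Synthetic labels with a 1:10 positive-to-negative skew (not the paper's data).
n_pos, n_neg = 1000, 10000
labels = [1] * n_pos + [0] * n_neg

rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_scores = [rng.random() for _ in labels]   # scorer with no signal
perfect_scores = [float(y) for y in labels]      # scorer that ranks all positives first

prevalence = n_pos / (n_pos + n_neg)
ap_random = average_precision(labels, random_scores)
ap_perfect = average_precision(labels, perfect_scores)

print(f"positive prevalence: {prevalence:.3f}")
print(f"AP, random scores:   {ap_random:.3f}")
print(f"AP, perfect scores:  {ap_perfect:.3f}")
```

Under these assumptions, the chance-level score tracks the prevalence rather than 0.5, which is why precision-recall metrics are usually read relative to the class skew.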

Original language: English
Pages (from-to): 154-165
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - 2021
Event: 2021 Pacific Symposium on Biocomputing, PSB 2021 - Virtual, Online
Duration: Jan 5 2021 to Jan 7 2021


Keywords

  • SARS-CoV
  • SARS-CoV-2
  • generalized additive models
  • protein interaction prediction
  • protein sequence

