TY - JOUR
T1 - Spotlite: Web application and augmented algorithms for predicting co-complexed proteins from affinity purification-mass spectrometry data
AU - Goldfarb, Dennis
AU - Hast, Bridgid E.
AU - Wang, Wei
AU - Major, Michael B.
N1 - Publisher Copyright: © 2014 American Chemical Society.
PY - 2014/12/5
Y1 - 2014/12/5
N2 - Protein-protein interactions defined by affinity purification and mass spectrometry (APMS) suffer from high false discovery rates. Consequently, lists of potential interactions must be pruned of contaminants before network construction and interpretation, historically an expensive, time-intensive, and error-prone task. In recent years, numerous computational methods have been developed to distinguish genuine interactions from among hundreds of candidates. Here, a comparative analysis of three popular algorithms, HGSCore, CompPASS, and SAINT, revealed complementary classification accuracies, consistent with their divergent scoring strategies. By integrating a variety of indirect data known to correlate with established protein-protein interactions, including mRNA coexpression, gene ontologies, domain-domain binding affinities, and homologous protein interactions, we increased the area under the receiver operating characteristic (ROC) curve of each algorithm by an average of 16%. Each APMS scoring approach was incorporated into a separate logistic regression model along with the indirect features; the resulting three classifiers demonstrate improved performance on five diverse APMS data sets. To facilitate APMS data scoring within the scientific community, we created Spotlite, a user-friendly and fast web application. Within Spotlite, data can be scored with the augmented classifiers, annotated, and visualized (http://cancer.unc.edu/majorlab/software.php). The utility of the Spotlite platform for revealing physical, functional, and disease-relevant characteristics within APMS data is demonstrated through a focused analysis of the KEAP1 E3 ubiquitin ligase.
KW - KEAP1
KW - affinity purification mass spectrometry
KW - bioinformatics
KW - machine learning
KW - protein-protein interactions
UR - http://www.scopus.com/inward/record.url?scp=84915798945&partnerID=8YFLogxK
U2 - 10.1021/pr5008416
DO - 10.1021/pr5008416
M3 - Article
C2 - 25300367
AN - SCOPUS:84915798945
SN - 1535-3893
VL - 13
SP - 5944
EP - 5955
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 12
ER -