TY - JOUR
T1 - Accurate and efficient target prediction using a potency-sensitive influence-relevance voter
AU - Lusci, Alessandro
AU - Browning, Michael
AU - Fooshee, David
AU - Swamidass, Joshua
AU - Baldi, Pierre
N1 - Funding Information:
AL, DF, and PB’s research was supported by Grants NSF IIS-0513376, NIH LM010235, and NIH NLM T15 LM07443 and a Google Faculty Research award to PB. We acknowledge OpenEye Scientific Software for its academic software license, NVIDIA for a hardware donation, and Yuzo Kanomata for computing support.
Publisher Copyright:
© 2015 Lusci et al.
PY - 2015/12/29
Y1 - 2015/12/29
AB - Background: A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows. Results: Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database. Conclusions: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/.
KW - Fingerprints
KW - Influence-relevance voter
KW - Large-scale
KW - Molecular potency
KW - Random inactive molecules
KW - Target-prediction
UR - http://www.scopus.com/inward/record.url?scp=84952304768&partnerID=8YFLogxK
U2 - 10.1186/s13321-015-0110-6
DO - 10.1186/s13321-015-0110-6
M3 - Article
C2 - 26719774
AN - SCOPUS:84952304768
VL - 7
JO - Journal of Cheminformatics
JF - Journal of Cheminformatics
SN - 1758-2946
IS - 1
M1 - 63
ER -