TY - JOUR
T1 - Graph kernels for chemical informatics
AU - Ralaivola, Liva
AU - Swamidass, Sanjay J.
AU - Saigo, Hiroto
AU - Baldi, Pierre
N1 - Funding Information: Work supported by a Laurel Wilkening Faculty Innovation award, an NIH Biomedical Informatics Training grant (LM-07443-01), an NSF MRI grant (EIA-0321390), a Sun Microsystems award, a grant from the University of California Systemwide Biotechnology Research and Education Program to P.B., and an MD/PhD Harvey Fellowship to S.J.S. We would also like to acknowledge OpenEye Scientific Software for their free academic software license.
PY - 2005/10
Y1 - 2005/10
N2 - Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available datasets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
KW - Activity
KW - Chemical informatics
KW - Computational chemistry
KW - Convolution kernels
KW - Drug design
KW - Graph kernels
KW - Kernel methods
KW - Recursive neural networks
KW - Spectral kernels
KW - Toxicity
UR - http://www.scopus.com/inward/record.url?scp=23844480138&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2005.07.009
DO - 10.1016/j.neunet.2005.07.009
M3 - Article
C2 - 16157471
AN - SCOPUS:23844480138
SN - 0893-6080
VL - 18
SP - 1093
EP - 1110
JO - Neural Networks
JF - Neural Networks
IS - 8
ER -