TY - JOUR
T1 - Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity
AU - Swamidass, S. Joshua
AU - Chen, Jonathan
AU - Bruand, Jocelyne
AU - Phung, Peter
AU - Ralaivola, Liva
AU - Baldi, Pierre
N1 - Funding Information:
Work supported by NIH (LM-07443-01) and NSF (EIA-0321390) grants to P.B., by the UCI Medical Scientist Training Program, and by a Harvey Fellowship to S.J.S.
PY - 2005/6
Y1 - 2005/6
AB - Motivation: Small molecules play a fundamental role in organic chemistry and biology. They can be used to probe biological systems and to discover new drugs and other useful compounds. As increasing numbers of large datasets of small molecules become available, it is necessary to develop computational methods that can deal with molecules of variable size and structure and predict their physical, chemical and biological properties. Results: Here we develop several new classes of kernels for small molecules using their 1D, 2D and 3D representations. In 1D, we consider string kernels based on SMILES strings. In 2D, we introduce several similarity kernels based on conventional or generalized fingerprints. Generalized fingerprints are derived by counting in different ways subpaths contained in the graph of bonds, using depth-first searches. In 3D, we consider similarity measures between histograms of pairwise distances between atom classes. These kernels can be computed efficiently and are applied to problems of classification and prediction of mutagenicity, toxicity and anti-cancer activity on three publicly available datasets. The results derived using cross-validation methods are state-of-the-art. Tradeoffs between various kernels are briefly discussed.
UR - http://www.scopus.com/inward/record.url?scp=26944486424&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bti1055
DO - 10.1093/bioinformatics/bti1055
M3 - Article
C2 - 15961479
AN - SCOPUS:26944486424
SN - 1367-4803
VL - 21
SP - i359-i368
JO - Bioinformatics
JF - Bioinformatics
IS - SUPPL. 1
ER -