TY - JOUR
T1 - Probabilistic substructure mining from small-molecule screens
AU - Ranu, Sayan
AU - Calhoun, Bradley T.
AU - Singh, Ambuj K.
AU - Swamidass, S. Joshua
PY - 2011/9
Y1 - 2011/9
AB - Identifying the overrepresented substructures from a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic, requiring the activity of all mined molecules to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner, which effectively mines structures from noisy data, where many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high-throughput screens, finding that it can more effectively identify overrepresented structures than a deterministic structure miner.
UR - http://www.scopus.com/inward/record.url?scp=80052886632&partnerID=8YFLogxK
U2 - 10.1002/minf.201100058
DO - 10.1002/minf.201100058
M3 - Article
C2 - 27467413
AN - SCOPUS:80052886632
SN - 1868-1743
VL - 30
SP - 809
EP - 815
JO - Molecular Informatics
JF - Molecular Informatics
IS - 9
ER -