Statistical distribution of chemical fingerprints

S. Joshua Swamidass, Pierre Baldi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

Binary fingerprints are binary vectors used to represent chemical molecules by recording the presence or absence of particular substructures, such as labeled paths in the 2D graph of bonds. Complete fingerprints are often reduced to a compressed format, of typical dimension n = 512 or n = 1024, by using a simple congruence operation. The statistical properties of complete or compressed fingerprint representations are important, since fingerprints are used to rapidly search large databases and to develop statistical machine learning methods in chemoinformatics. Here we present an empirical and mathematical analysis of the distribution of complete and compressed fingerprints. In particular, we derive formulas that provide good approximations for the expected number of bits set to one in a compressed fingerprint, given its uncompressed version, and vice versa.
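
As an illustration of the compression step and of the kind of estimate described, the following is a minimal sketch, not the paper's own derivation. It assumes that the congruence operation is an index-modulo-n fold, and that the on-bits of the complete fingerprint land uniformly and independently among the n compressed positions (a standard occupancy approximation). The function names are hypothetical, and the formulas derived in the paper may differ.

```python
import math


def fold_fingerprint(on_bits, n):
    """Compress a sparse fingerprint (a set of on-bit indices) to n positions.

    Assumption: the congruence operation is index modulo n; bits that
    collide on the same position are merged (logical OR).
    """
    return {i % n for i in on_bits}


def expected_compressed_bits(m, n):
    """Expected number of on-bits after folding m on-bits into n positions.

    Assumption: each on-bit lands uniformly and independently, so a given
    position stays empty with probability (1 - 1/n) ** m.
    """
    return n * (1.0 - (1.0 - 1.0 / n) ** m)


def estimated_uncompressed_bits(k, n):
    """Invert the approximation: estimate m from k on-bits observed in n positions."""
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    return math.log(1.0 - k / n) / math.log(1.0 - 1.0 / n)


if __name__ == "__main__":
    # Indices 1, 513 and 1025 all collide on position 1 when n = 512.
    print(sorted(fold_fingerprint({1, 7, 513, 1025}, 512)))
    print(expected_compressed_bits(300, 512))
    print(estimated_uncompressed_bits(expected_compressed_bits(300, 512), 512))
```

Under these assumptions the two estimates are exact inverses of each other, which mirrors the "and vice versa" direction mentioned in the abstract. Real fingerprints have correlated bits, so the uniform model should be read only as a first approximation.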

Original language: English
Title of host publication: Fuzzy Logic and Applications - 6th International Workshop, WILF 2005, Revised Selected Papers
Pages: 11-18
Number of pages: 8
State: Published - Jun 23 2006
Event: 6th International Workshop - Fuzzy Logic and Applications - Crema, Italy
Duration: Sep 15 2005 → Sep 17 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3849 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Workshop - Fuzzy Logic and Applications
Country/Territory: Italy
City: Crema
Period: 09/15/05 → 09/17/05
