Abstract

The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of sequences sharing a common short "seed," or pattern of matching residues. Some of these changes raise the possibility of improving search performance by probing sequence pairs with several distinct seeds, any one of which is sufficient for a seed match. However, designing a set of seeds to maximize their combined sensitivity to biologically meaningful sequence alignments is computationally difficult, even given recent advances [16, 6] in designing single seeds. This work describes algorithmic improvements to seed design that address the problem of designing a set of n seeds to be used simultaneously. We give a new local search method to optimize the sensitivity of seed sets. The method relies on efficient incremental computation of the probability that an alignment contains a match to a seed π, given that it has already failed to match any of the seeds in a set Π. We demonstrate experimentally that multi-seed designs, even with relatively few seeds, can be significantly more sensitive than optimized single-seed designs.
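The quantities named in the abstract can be illustrated with a small sketch. It assumes a simple model that is not stated above: an ungapped alignment of length L is a string of independent positions, each a match with probability p. A seed is written as a string of '1' (position must match) and '0' (don't care). Under that model, the probability that π matches given that no seed in Π matched follows from the sensitivity S of seed sets as (S(Π ∪ {π}) − S(Π)) / (1 − S(Π)). The function names below are hypothetical. The code recomputes S from scratch with a dynamic program over recent alignment history, so it is not the paper's efficient incremental method. The greedy builder is likewise a simplified stand-in, not the paper's local search.

```python
def seed_hits_at_end(window, seed):
    """True if `seed` matches the alignment bits ending at the last bit of `window`."""
    span = len(seed)
    if len(window) < span:
        return False
    tail = window[-span:]
    # Only '1' positions in the seed require a match (bit 1) in the alignment.
    return all(b == 1 for s, b in zip(seed, tail) if s == "1")


def sensitivity(seeds, length, p):
    """P(at least one seed in `seeds` hits a random length-`length` alignment).

    Assumes i.i.d. positions, each a match with probability p.
    State = last (w - 1) bits seen so far, where w is the longest seed span.
    """
    w = max(len(s) for s in seeds)
    states = {(): 1.0}  # history -> probability of reaching it with no hit yet
    hit = 0.0
    for _ in range(length):
        nxt = {}
        for hist, prob in states.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = hist + (bit,)
                mass = prob * q
                if any(seed_hits_at_end(window, s) for s in seeds):
                    hit += mass  # first hit: absorb this probability mass
                else:
                    key = window[-(w - 1):] if w > 1 else ()
                    nxt[key] = nxt.get(key, 0.0) + mass
        states = nxt
    return hit


def conditional_hit_prob(pi_set, pi, length, p):
    """P(seed `pi` hits | no seed in `pi_set` hits), recomputed naively."""
    base = sensitivity(pi_set, length, p) if pi_set else 0.0
    if base >= 1.0:
        return 0.0  # Π always hits; the condition has probability zero
    joint = sensitivity(list(pi_set) + [pi], length, p)
    return (joint - base) / (1.0 - base)


def greedy_seed_set(candidates, n, length, p):
    """Hypothetical greedy builder: repeatedly add the candidate with the
    highest conditional hit probability given the seeds chosen so far."""
    chosen = []
    for _ in range(n):
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: conditional_hit_prob(chosen, c, length, p),
        )
        chosen.append(best)
    return chosen
```

For example, `sensitivity(["11", "101"], 3, 0.5)` counts the length-3 alignments 011, 110, 111 and 101, giving 4/8. Picking seeds by conditional probability, rather than by individual sensitivity, favors seeds that cover alignments the current set misses. This is the motivation the abstract gives for conditioning on Π.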

Original language: English
Pages: 76-84
Number of pages: 9
State: Published - 2004
Event: RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology - San Diego, CA, United States
Duration: Mar 27 2004 to Mar 31 2004

Conference

Conference: RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology
Country/Territory: United States
City: San Diego, CA
Period: 03/27/04 to 03/31/04

Keywords

  • Biosequence comparison
  • Genomic DNA
  • Mandala
  • Seed design
  • Similarity search
