Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments

Lon R. Cardon, Gary D. Stormo

Research output: Contribution to journal › Article

107 Scopus citations

Abstract

An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 Escherichia coli DNA fragments known to contain promoters with variable spacings between their consensus regions. Maximum-likelihood tests of the differences between the spacing classes indicate that the consensus regions of the spacing classes are not distinct. Further tests suggest that several positions within the spacing region may contribute to promoter specificity.
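The abstract describes EM over unaligned fragments where each site has two conserved blocks separated by a spacer of variable length. Below is a minimal sketch of that idea. It is not the authors' implementation, and it rests on several assumptions: each fragment contains exactly one site; both blocks have a fixed width `w`; the spacer length comes from a small candidate set; spacer bases are scored as background; the background is the overall base composition; and start positions get a uniform prior. The function and parameter names (`em_variable_spacer`, `spacers`, `pseudo`, etc.) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {b: i for i, b in enumerate(ALPHABET)}


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def site_bases(seq, start, w, spacer):
    """Bases of a two-block site: w bases, a skipped spacer, then w more bases."""
    right = start + w + spacer
    return seq[start:start + w] + seq[right:right + w]


def em_variable_spacer(seqs, w, spacers, n_iter=50, pseudo=0.5, seed=0):
    """Hypothetical EM sketch: one site per sequence, spacer length from `spacers`.

    Every sequence must be at least 2*w + max(spacers) long. Returns
    (pwm, spacer_prior, best): pwm[j][b] = P(base b at site column j),
    spacer_prior[s] = estimated spacer frequency, best[i] = most probable
    (start, spacer) for sequence i under the parameters of the last E-step.
    """
    rng = random.Random(seed)
    # Background model: overall base composition of the input (assumption).
    bg_counts = [pseudo] * 4
    for seq in seqs:
        for b in seq:
            bg_counts[IDX[b]] += 1
    bg = normalize(bg_counts)
    # Random, roughly uniform starting PWM over the 2*w site columns.
    pwm = [normalize([rng.random() + 0.5 for _ in range(4)]) for _ in range(2 * w)]
    prior = {s: 1.0 / len(spacers) for s in spacers}

    for _ in range(n_iter):
        col_counts = [[pseudo] * 4 for _ in range(2 * w)]
        spacer_mass = {s: 0.0 for s in spacers}
        best = []
        for seq in seqs:
            # E-step: log posterior (up to a constant) of each (start, spacer).
            configs, logs = [], []
            for sp in spacers:
                n_starts = len(seq) - (2 * w + sp) + 1
                for start in range(n_starts):
                    # Prior: P(spacer) times a uniform prior over valid starts.
                    ll = math.log(prior[sp]) - math.log(n_starts)
                    for j, b in enumerate(site_bases(seq, start, w, sp)):
                        ll += math.log(pwm[j][IDX[b]] / bg[IDX[b]])
                    configs.append((start, sp))
                    logs.append(ll)
            top = max(logs)
            post = normalize([math.exp(x - top) for x in logs])
            best.append(configs[post.index(max(post))])
            # Accumulate posterior-weighted (expected) counts.
            for (start, sp), p in zip(configs, post):
                spacer_mass[sp] += p
                for j, b in enumerate(site_bases(seq, start, w, sp)):
                    col_counts[j][IDX[b]] += p
        # M-step: re-estimate parameters from pseudocount-smoothed expected counts.
        pwm = [normalize(row) for row in col_counts]
        denom = len(seqs) + pseudo * len(spacers)
        prior = {s: (spacer_mass[s] + pseudo) / denom for s in spacers}
    return pwm, prior, best
```

For example, `em_variable_spacer(seqs, w=6, spacers=[16, 17, 18])` fits a hypothetical site made of two 6-base blocks. The returned `best` list gives a spacer length for each fragment, and `prior` gives the spacer distribution across the whole set. EM only reaches a local optimum, so in practice you would run it from several seeds and keep the highest-likelihood fit. Because this sketch scores spacer bases as background, it cannot detect informative positions inside the spacer. Capturing those would require extra columns in the model.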

Original language: English
Pages (from-to): 159-170
Number of pages: 12
Journal: Journal of Molecular Biology
Volume: 223
Issue number: 1
State: Published, Jan 5 1992
Externally published: Yes

Keywords

  • DNA-protein
  • Expectation Maximization
  • consensus sequences
  • multiple alignment
  • promoters

