Abstract

Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple-species, multiple-gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6-20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs.

Original language: English
Pages (from-to): 139-147
Number of pages: 9
Journal: Journal of Computational Biology
Volume: 19
Issue number: 2
State: Published - Feb 1 2012

Keywords

  • ChIP analysis
  • cis-regulatory elements
  • eukaryotic motif-finding
  • fast motif-finding
  • genome-wide motif-finding
  • motif redundancy
  • motif-expression association
  • transcription factor binding site discovery
