Build a dictionary, learn a grammar, decipher stegoscripts, and discover genomic regulatory elements

Guandong Wang, Weixiong Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

It has been a challenge to discover transcription factor (TF) binding motifs (TFBMs), which are short cis-regulatory DNA sequences playing essential roles in transcriptional regulation. We approach the problem of discovering TFBMs from a steganographic perspective. We view the regulatory regions of a genome as if they constituted a stegoscript with conserved words (i.e., TFBMs) being embedded in a covertext, and model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. Subsequently, we select biologically meaningful motifs based on a motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle related genes of S. cerevisiae, our method is able to identify all known cell-cycle related TFBMs among its top ranking motifs. Our method can also be directly applied to discriminative motif finding. By utilizing the ChIP-chip data of Lee et al., we predicted potential binding motifs of 113 known transcription factors of budding yeast.
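The abstract does not say how a motif's specificity to the gene set of interest is measured. As a minimal sketch, assuming specificity is scored by a hypergeometric enrichment test (a common choice, not confirmed as the authors' method), the idea can be illustrated as follows. The promoter sequences, gene names, and function names are hypothetical toy values; ACGCGT is used only as an example word.

```python
from math import comb

def genes_with_motif(promoters, motif):
    """Return the set of gene names whose promoter contains the motif exactly."""
    return {gene for gene, seq in promoters.items() if motif in seq}

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, of which K contain the motif."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical toy data: six promoters, three of them in the set of interest.
promoters = {
    "g1": "TTACGCGTAA",
    "g2": "GGACGCGTCC",
    "g3": "ACGCGTTTTT",
    "g4": "TTTTTTTTTT",
    "g5": "GGGGCCCCAA",
    "g6": "AAACGCGTGG",
}
genes_of_interest = {"g1", "g2", "g3"}
motif = "ACGCGT"

carriers = genes_with_motif(promoters, motif)
k = len(carriers & genes_of_interest)  # carriers inside the set of interest
p_value = hypergeom_tail(k, len(genes_of_interest), len(carriers), len(promoters))
print(f"{motif}: {k} of {len(genes_of_interest)} genes of interest, p = {p_value:.3f}")
```

A smaller p-value means the motif is more concentrated in the genes of interest than chance would predict. Ranking candidate words by such a score is one plausible way to pick out biologically meaningful motifs from a learned dictionary.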

Original language: English
Title of host publication: Systems Biology and Regulatory Genomics - Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics, Revised Selected Papers
Pages: 80-94
Number of pages: 15
State: Published - 2007
Event: Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics - San Diego, CA, United States
Duration: Dec 2 2005 to Dec 4 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4023 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics
Country/Territory: United States
City: San Diego, CA
Period: 12/2/05 to 12/4/05
