TY - GEN
T1 - Build a dictionary, learn a grammar, decipher stegoscripts, and discover genomic regulatory elements
AU - Wang, Guandong
AU - Zhang, Weixiong
PY - 2007
Y1 - 2007
N2 - Discovering transcription factor (TF) binding motifs (TFBMs), the short cis-regulatory DNA sequences that play essential roles in transcriptional regulation, remains a challenge. We approach the problem of discovering TFBMs from a steganographic perspective. We view the regulatory regions of a genome as if they constituted a stegoscript, with conserved words (i.e., TFBMs) embedded in a covertext, and model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. Subsequently, we select biologically meaningful motifs based on a motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle-related genes of S. cerevisiae, our method identifies all known cell-cycle-related TFBMs among its top-ranking motifs. Our method can also be applied directly to discriminative motif finding. Using the ChIP-chip data of Lee et al., we predict potential binding motifs for 113 known transcription factors of budding yeast.
AB - Discovering transcription factor (TF) binding motifs (TFBMs), the short cis-regulatory DNA sequences that play essential roles in transcriptional regulation, remains a challenge. We approach the problem of discovering TFBMs from a steganographic perspective. We view the regulatory regions of a genome as if they constituted a stegoscript, with conserved words (i.e., TFBMs) embedded in a covertext, and model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. Subsequently, we select biologically meaningful motifs based on a motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle-related genes of S. cerevisiae, our method identifies all known cell-cycle-related TFBMs among its top-ranking motifs. Our method can also be applied directly to discriminative motif finding. Using the ChIP-chip data of Lee et al., we predict potential binding motifs for 113 known transcription factors of budding yeast.
UR - http://www.scopus.com/inward/record.url?scp=34547474250&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:34547474250
SN - 3540482938
SN - 9783540482932
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 80
EP - 94
BT - Systems Biology and Regulatory Genomics - Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics, Revised Selected Papers
T2 - Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics
Y2 - 2 December 2005 through 4 December 2005
ER -