TY - JOUR
T1 - Identifying the conserved network of cis-regulatory sites of a eukaryotic genome
AU - Wang, Ting
AU - Stormo, Gary D.
PY - 2005/11/29
Y1 - 2005/11/29
AB - A major focus of genome research has been to decipher the cis-regulatory code that governs complex transcriptional regulation. We report a computational approach for identifying conserved regulatory motifs of an organism directly from whole genome sequences of several related species without reliance on additional information. We first construct phylogenetic profiles for each promoter, then use a BLAST-like algorithm to efficiently search through the entire profile space of all of the promoters in the genome to identify conserved motifs and the promoters that contain them. Statistical significance is estimated by modified Karlin-Altschul statistics. We applied this approach to the analysis of 3,524 Saccharomyces cerevisiae promoters and identified a highly organized regulatory network involving 3,315 promoters and 296 motifs. This network includes nearly all of the currently known motifs and covers >90% of known transcription factor binding sites. Most of the predicted coregulated gene clusters in the network have additional supporting evidence. Theoretical analysis suggests that our algorithm should be applicable to much larger genomes, such as the human genome, without reaching its statistical limitation.
KW - Comparative genomics
KW - Motif discovery
KW - Regulatory network
UR - http://www.scopus.com/inward/record.url?scp=28444484981&partnerID=8YFLogxK
U2 - 10.1073/pnas.0505147102
DO - 10.1073/pnas.0505147102
M3 - Article
C2 - 16301543
AN - SCOPUS:28444484981
SN - 0027-8424
VL - 102
SP - 17400
EP - 17405
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 48
ER -