TY - GEN
T1 - cis-Regulatory element prediction in mammalian genomes
AU - Siddiqui, Asim
AU - Robertson, Gordon
AU - Bilenky, Misha
AU - Astakhova, Tamara
AU - Griffith, Obi L.
AU - Hassel, Maik
AU - Lin, Keven
AU - Montgomery, Stephen
AU - Oveisi, Mehrdad
AU - Pleasance, Erin
AU - Robertson, Neil
AU - Sleumer, Monica C.
AU - Teague, Kevin
AU - Varhol, Richard
AU - Zhang, Maggie
AU - Jones, Steven
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
AB - The identification of cis-regulatory elements and modules is an important step in understanding the regulation of genes. We have developed a pipeline capable of running multiple motif prediction methods on a whole-genome scale. Using gene expression datasets to identify co-expressed genes and the Ensembl Compara database to identify orthologues, we assemble input sequence sets comprising the upstream regions of a target gene, its orthologues and its co-expressed genes, on the premise that such genes will share promoters by evolution (orthologues) or share regulatory control mechanisms (co-expressed genes). Co-expressed genes are identified by an approach that combines Pearson distances from multiple gene expression datasets derived from multiple experimental approaches and calibrated against the GO database. Our pipeline runs a number of established motif detection algorithms with a range of parameter settings on the input dataset. We integrate the diverse result sets by scoring motifs with a method-independent function. For each target gene, we assign p-values to the motif scores by running the discovery pipeline on multiple sets of input sequences containing the target gene, non-coexpressed genes and "fake" orthologues generated by neutral numerical evolution. We have predicted 30,636 motif binding sites in human for 4,182 genes and an initial set of 472 motif binding sites in mouse for 92 genes with p < 0.001. The positive predictive value against a library of biologically confirmed regulatory sites approaches 0.4 at the highest p-value threshold. Predicted regulatory elements and other resources from the project are available at www.cisred.org.
UR - http://www.scopus.com/inward/record.url?scp=33749043451&partnerID=8YFLogxK
U2 - 10.1109/CSBW.2005.35
DO - 10.1109/CSBW.2005.35
M3 - Conference contribution
AN - SCOPUS:33749043451
SN - 0769524427
SN - 9780769524429
T3 - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts
SP - 203
EP - 206
BT - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts
T2 - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts
Y2 - 8 August 2005 through 11 August 2005
ER -