TY - JOUR
T1 - ScerTF
T2 - A comprehensive database of benchmarked position weight matrices for Saccharomyces species
AU - Spivak, Aaron T.
AU - Stormo, Gary D.
N1 - Funding Information:
Funding for open access charge: National Institutes of Health (R01 GM078222 and HG00249).
PY - 2012/1
Y1 - 2012/1
N2 - Saccharomyces cerevisiae is a primary model for studies of transcriptional control, and the specificities of most yeast transcription factors (TFs) have been determined by multiple methods. However, it is unclear which position weight matrices (PWMs) are most useful; for the roughly 200 TFs in yeast, there are over 1200 PWMs in the literature. To address this issue, we created ScerTF, a comprehensive database of 1226 motifs from 11 different sources. We identified a single matrix for each TF that best predicts in vivo data by benchmarking matrices against chromatin immunoprecipitation and TF deletion experiments. We also used in vivo data to optimize thresholds for identifying regulatory sites with each matrix. To correct for biases from different methods, we developed a strategy to combine matrices. These aligned matrices outperform the best available matrix for several TFs. We used the matrices to predict co-occurring regulatory elements in the genome and identified many known TF combinations. In addition, we predict new combinations and provide evidence of combinatorial regulation from gene expression data. The database is available through a web interface at http://ural.wustl.edu/ScerTF. The site allows users to search the database with a regulatory site or matrix to identify the TFs most likely to bind the input sequence.
AB - Saccharomyces cerevisiae is a primary model for studies of transcriptional control, and the specificities of most yeast transcription factors (TFs) have been determined by multiple methods. However, it is unclear which position weight matrices (PWMs) are most useful; for the roughly 200 TFs in yeast, there are over 1200 PWMs in the literature. To address this issue, we created ScerTF, a comprehensive database of 1226 motifs from 11 different sources. We identified a single matrix for each TF that best predicts in vivo data by benchmarking matrices against chromatin immunoprecipitation and TF deletion experiments. We also used in vivo data to optimize thresholds for identifying regulatory sites with each matrix. To correct for biases from different methods, we developed a strategy to combine matrices. These aligned matrices outperform the best available matrix for several TFs. We used the matrices to predict co-occurring regulatory elements in the genome and identified many known TF combinations. In addition, we predict new combinations and provide evidence of combinatorial regulation from gene expression data. The database is available through a web interface at http://ural.wustl.edu/ScerTF. The site allows users to search the database with a regulatory site or matrix to identify the TFs most likely to bind the input sequence.
UR - http://www.scopus.com/inward/record.url?scp=84862214168&partnerID=8YFLogxK
U2 - 10.1093/nar/gkr1180
DO - 10.1093/nar/gkr1180
M3 - Article
C2 - 22140105
AN - SCOPUS:84862214168
SN - 0305-1048
VL - 40
SP - D162-D168
JO - Nucleic acids research
JF - Nucleic acids research
IS - D1
ER -