TY - JOUR
T1 - Systematic analysis of binding of transcription factors to noncoding variants
AU - Yan, Jian
AU - Qiu, Yunjiang
AU - Ribeiro dos Santos, André M.
AU - Yin, Yimeng
AU - Li, Yang E.
AU - Vinckier, Nick
AU - Nariai, Naoki
AU - Benaglio, Paola
AU - Raman, Anugraha
AU - Li, Xiaoyu
AU - Fan, Shicai
AU - Chiou, Joshua
AU - Chen, Fulin
AU - Frazer, Kelly A.
AU - Gaulton, Kyle J.
AU - Sander, Maike
AU - Taipale, Jussi
AU - Ren, Bing
N1 - Publisher Copyright: © 2021, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2021/3/4
Y1 - 2021/3/4
AB - Many sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein–DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor–DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.
UR - http://www.scopus.com/inward/record.url?scp=85099935952&partnerID=8YFLogxK
U2 - 10.1038/s41586-021-03211-0
DO - 10.1038/s41586-021-03211-0
M3 - Article
C2 - 33505025
AN - SCOPUS:85099935952
SN - 0028-0836
VL - 591
SP - 147
EP - 151
JO - Nature
JF - Nature
IS - 7848
ER -