TY - JOUR
T1 - Inherent limitations of probabilistic models for protein-DNA binding specificity
AU - Ruan, Shuxiang
AU - Stormo, Gary D.
N1 - Publisher Copyright:
© 2017 Ruan, Stormo.
PY - 2017/7
Y1 - 2017/7
N2 - The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
AB - The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
UR - http://www.scopus.com/inward/record.url?scp=85026652264&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1005638
DO - 10.1371/journal.pcbi.1005638
M3 - Article
C2 - 28686588
AN - SCOPUS:85026652264
SN - 1553-734X
VL - 13
JO - PLoS computational biology
JF - PLoS computational biology
IS - 7
M1 - e1005638
ER -