TY - GEN
T1 - Attaining the 2nd Chargaff Rule by Tandem Duplications
AU - Jain, Siddharth
AU - Raviv, Netanel
AU - Bruck, Jehoshua
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/8/15
Y1 - 2018/8/15
N2 - In 1950, Erwin Chargaff made the experimental observation that, in DNA, the count of A equals the count of T and the count of C equals the count of G. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, the same symmetry was also observed in single-stranded DNA, a phenomenon termed the 2nd Chargaff Rule. This symmetry has been verified experimentally in the genomes of several different species, not only for mononucleotides but also for reverse complement pairs of larger lengths, up to a small error. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the function and source of the symmetry in single-stranded DNA remain a mystery. In this work, we define a sequence generation model based on reverse complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small compared to the length of the sequences. We also provide estimates of the number of generations needed by this model to generate sequences that satisfy the 2nd Chargaff Rule. We provide theoretical bounds on the disruption in symmetry for different values of duplication lengths under this model. Moreover, we experimentally compare the disruption in symmetry incurred by our model with what is observed in human genome data.
AB - In 1950, Erwin Chargaff made the experimental observation that, in DNA, the count of A equals the count of T and the count of C equals the count of G. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, the same symmetry was also observed in single-stranded DNA, a phenomenon termed the 2nd Chargaff Rule. This symmetry has been verified experimentally in the genomes of several different species, not only for mononucleotides but also for reverse complement pairs of larger lengths, up to a small error. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the function and source of the symmetry in single-stranded DNA remain a mystery. In this work, we define a sequence generation model based on reverse complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small compared to the length of the sequences. We also provide estimates of the number of generations needed by this model to generate sequences that satisfy the 2nd Chargaff Rule. We provide theoretical bounds on the disruption in symmetry for different values of duplication lengths under this model. Moreover, we experimentally compare the disruption in symmetry incurred by our model with what is observed in human genome data.
KW - Balanced and unbalanced sequences
KW - Duplications
KW - Inversion symmetry
KW - Reverse complement
UR - https://www.scopus.com/pages/publications/85052457352
U2 - 10.1109/ISIT.2018.8437526
DO - 10.1109/ISIT.2018.8437526
M3 - Conference contribution
AN - SCOPUS:85052457352
SN - 9781538647806
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 2241
EP - 2245
BT - 2018 IEEE International Symposium on Information Theory, ISIT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Symposium on Information Theory, ISIT 2018
Y2 - 17 June 2018 through 22 June 2018
ER -