TY - JOUR
T1 - The Impact of Patterns in Linkage Disequilibrium and Sequencing Quality on the Imprint of Balancing Selection
AU - Hayeck, Tristan J.
AU - Li, Yang
AU - Mosbruger, Timothy L.
AU - Bradfield, Jonathan P.
AU - Gleason, Adam G.
AU - Damianos, George
AU - Shaw, Grace Tzun Wen
AU - Duke, Jamie L.
AU - Conlin, Laura K.
AU - Turner, Tychele N.
AU - Fernández-Viña, Marcelo A.
AU - Sarmady, Mahdi
AU - Monos, Dimitri S.
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2024/2/1
Y1 - 2024/2/1
AB - Regions under balancing selection are characterized by dense polymorphisms and multiple persistent haplotypes, along with other sequence complexities. Successful identification of these patterns depends on both the statistical approach and the quality of sequencing. To address this challenge, a new statistical method called LD-ABF was first developed, employing efficient Bayesian techniques to test for balancing selection. LD-ABF demonstrated the most robust detection of selection in a variety of simulation scenarios, compared against a range of existing tests/tools (Tajima's D, HKA, Dng, BetaScan, and BalLerMix). Furthermore, the impact of sequencing quality on detection of balancing selection was explored using: (i) SNP genotyping and exome data, (ii) targeted high-resolution HLA genotyping (IHIW), and (iii) whole-genome long-read sequencing data (Pangenome). In the analysis of SNP genotyping and exome data, we identified known targets and 38 new selection signatures in genes not previously linked to balancing selection. To further investigate the impact of sequencing quality on detection of balancing selection, a detailed investigation of the MHC was performed with high-resolution HLA typing data. Higher-quality sequencing revealed that the HLA-DQ genes consistently demonstrated strong selection signatures otherwise not observed from the sparser SNP array and exome data. The HLA-DQ selection signature was also replicated in the Pangenome samples using considerably fewer samples but with high-quality long-read sequence data. The improved statistical method, coupled with higher-quality sequencing, leads to more consistent identification of selection and enhanced localization of variants under selection, particularly in complex regions.
KW - Bayesian
KW - balancing selection
KW - human leukocyte antigen genes
KW - linkage disequilibrium
KW - population genetics
KW - sequencing platform
KW - statistical genetics
UR - http://www.scopus.com/inward/record.url?scp=85184822841&partnerID=8YFLogxK
U2 - 10.1093/gbe/evae009
DO - 10.1093/gbe/evae009
M3 - Article
C2 - 38302106
AN - SCOPUS:85184822841
SN - 1759-6653
VL - 16
JO - Genome Biology and Evolution
JF - Genome Biology and Evolution
IS - 2
M1 - evae009
ER -