TY - JOUR
T1 - A probabilistic method for identifying rare variants underlying complex traits
AU - Wang, Jiayin
AU - Zhao, Zhongmeng
AU - Cao, Zhi
AU - Yang, Aiyuan
AU - Zhang, Jin
N1 - Funding Information:
This work was supported by the National Science Foundation [IIS-0803440], [CCF-1116175] and [IIS-0953563] and the Ph.D. Programs Foundation of the Ministry of Education of China [20100201110063]. The authors thank Professor Sean Tavtigian and Professor Georgia Chenevix-Trench for sharing the ATM datasets. The authors also thank Professor Chun-Xia Yan, M.D., and Feng Zhu, M.D., for discussing the elevated region and the background region from a medical and clinical perspective.
Funding Information:
The publication costs for this article were funded by Xi’an Jiaotong University. This article has been published as part of BMC Genomics Volume 14 Supplement 1, 2013: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S1.
Publisher Copyright:
© 2013 Wang et al.
PY - 2013/1/21
Y1 - 2013/1/21
N2 - Background: Identifying the genetic variants that contribute to disease susceptibility is important both for developing methodologies and for studying complex diseases in molecular biology. It has been demonstrated that the minor allele frequencies (MAFs) of risk variants range from common to rare. Although association studies are shifting to incorporate rare variants (RVs) affecting complex traits, existing approaches have had limited success, and further work is needed. Results: In this article, we focus on detecting associations between multiple rare variants and traits. Similar to RareCover, a widely used approach, we assume that variants located close to each other tend to have similar impacts on traits. We therefore introduce elevated regions and background regions, where the elevated regions are considered to have a higher chance of harboring causal variants. We propose a hidden Markov random field (HMRF) model to select a set of rare variants that potentially underlie the phenotype, and a statistical test is then applied to the selected set. Thus, the association analysis can be achieved without pre-selection by experts. In our model, each variant has two hidden states that represent the causal/non-causal status and the region status. In addition, two Bayesian processes are used to compare and estimate the genotype, phenotype and model parameters. We compare our approach to three current methods on different types of datasets; in these simulation experiments, our approach has higher statistical power than the other methods. The software package, RareProb, and the simulation datasets are available at: http://www.engr.uconn.edu/~jiw09003.
UR - http://www.scopus.com/inward/record.url?scp=84881298161&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-14-S1-S11
DO - 10.1186/1471-2164-14-S1-S11
M3 - Article
C2 - 23369113
AN - SCOPUS:84881298161
SN - 1471-2164
VL - 14
JO - BMC Genomics
JF - BMC Genomics
M1 - S11
ER -