TY - JOUR
T1 - Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics
AU - Chun, Sung
AU - Imakaev, Maxim
AU - Hui, Daniel
AU - Patsopoulos, Nikolaos A.
AU - Neale, Benjamin M.
AU - Kathiresan, Sekar
AU - Stitziel, Nathan O.
AU - Sunyaev, Shamil R.
N1 - Publisher Copyright: © 2020 American Society of Human Genetics
PY - 2020/7/2
Y1 - 2020/7/2
AB - In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.
KW - genome-wide association study
KW - linkage disequilibrium
KW - non-parametric prediction
KW - phenotype prediction
KW - polygenic score
KW - prognosis
KW - summary statistics
UR - http://www.scopus.com/inward/record.url?scp=85086113873&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2020.05.004
DO - 10.1016/j.ajhg.2020.05.004
M3 - Article
C2 - 32470373
AN - SCOPUS:85086113873
SN - 0002-9297
VL - 107
SP - 46
EP - 59
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 1
ER -