@article{28faaf132e43466e8d0bf8106c474a8f,
title = "Imputation across genotyping arrays for genome-wide association studies: Assessment of bias and a correction strategy",
abstract = "A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2\% of SNPs: {\textasciitilde}2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30\% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.",
author = "Johnson, {Eric O.} and Hancock, {Dana B.} and Levy, {Joshua L.} and Gaddis, {Nathan C.} and Saccone, {Nancy L.} and Bierut, {Laura J.} and Page, {Grier P.}",
note = "Funding Information: This work was supported by National Institute on Drug Abuse grant nos. R33DA027486 and R01DA026141 (E.O. Johnson PI), as well as R01DA025888 (L.J. Bierut \& E.O. Johnson Co-PIs). Funding support for SAGE was provided through the NIH Genes, Environment and Health Initiative [GEI] (U01 HG004422). SAGE is one of the GWAS funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (COGA; U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (COGEND; P01 CA089392), and the Family Study of Cocaine Dependence (FSCD; R01 DA013423). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract {\textquoteleft}{\textquoteleft}High throughput genotyping for studying the genetic contributions to human disease{\textquoteright}{\textquoteright} (HHSN268200782096C). The SAGE dataset used for the analyses described in this manuscript was obtained from dbGaP at http://www.ncbi.nlm.nih.gov/projects/gap/, through accession number phs000092.v1.p1. The CGEMS (http://cgems.cancer.gov/) PanScan study was derived from 12 cohorts, as outlined by Amundadottir et al. (2009). The PanScan dataset used for the analyses described in this manuscript was obtained from dbGaP through accession number phs000206.v3.p2. The CGEMS breast cancer GWAS was derived from the Nurses{\textquoteright} Health Study, which was supported by NIH grants CA65725, CA87969, CA49449, CA67262, CA50385, and 5UO1 CA098233. The CGEMS dataset used for the analyses described in this manuscript was obtained from dbGaP through accession number phs000147.v1.p1. Funding support for the GWAS of Schizophrenia was provided by the National Institute of Mental Health (R01 MH67257, R01 MH59588, R01 MH59571, R01 MH59565, R01 MH59587, R01 MH60870, R01 MH59566, R01 MH59586, R01 MH61675, R01 MH60879, R01 MH81800, U01 MH46276, U01 MH46289, U01 MH46318, U01 MH79469, and U01 MH79470), and the genotyping of samples was provided through GAIN. The datasets used for the analyses described in this manuscript were obtained from dbGaP through accession number phs000021.v3.p2. Samples and associated phenotype data for the GWAS of Schizophrenia were provided by the Molecular Genetics of Schizophrenia Collaboration (PI: Pablo V. Gejman, Evanston Northwestern Healthcare (ENH) and Northwestern University, Evanston, IL, USA).",
year = "2013",
month = may,
doi = "10.1007/s00439-013-1266-7",
language = "English",
volume = "132",
pages = "509--522",
journal = "Human Genetics",
issn = "0340-6717",
number = "5",
}