TY - JOUR
T1 - GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing
AU - NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium
AU - Mathur, Ravi
AU - Fang, Fang
AU - Gaddis, Nathan
AU - Hancock, Dana B.
AU - Cho, Michael H.
AU - Hokanson, John E.
AU - Bierut, Laura J.
AU - Lutz, Sharon M.
AU - Young, Kendra
AU - Smith, Albert V.
AU - Silverman, Edwin K.
AU - Page, Grier P.
AU - Johnson, Eric O.
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Genome-wide association studies (GWAS) have made impactful discoveries for complex diseases, often by amassing very large sample sizes. Yet, GWAS of many diseases remain underpowered, especially for non-European ancestries. One cost-effective approach to increase sample size is to combine existing cohorts, which may have limited sample size or be case-only, with public controls, but this approach is limited by the need for a large overlap in variants across genotyping arrays and the scarcity of non-European controls. We developed and validated a protocol, Genotyping Array-WGS Merge (GAWMerge), for combining genotypes from arrays and whole-genome sequencing, ensuring complete variant overlap, and allowing for diverse samples like Trans-Omics for Precision Medicine to be used. Our protocol involves phasing, imputation, and filtering. We illustrated its ability to control technology-driven artifacts and type-I error, as well as recover known disease-associated signals across technologies, independent datasets, and ancestries in smoking-related cohorts. GAWMerge enables genetic studies to leverage existing cohorts to validly increase sample size and enhance discovery for understudied traits and ancestries.
UR - http://www.scopus.com/inward/record.url?scp=85135869453&partnerID=8YFLogxK
U2 - 10.1038/s42003-022-03738-6
DO - 10.1038/s42003-022-03738-6
M3 - Article
C2 - 35953715
AN - SCOPUS:85135869453
SN - 2399-3642
VL - 5
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 806
ER -