A comparison of genetic imputation methods using Long Life Family Study genotypes and sequence data with the 1000 Genome reference panel

Aldi T. Kraja, E. Warwick Daw, Petra Lenzini, Lihua Wang, Shiow J. Lin, Christine A. Williams, Alan B. Wells, Kathryn L. Lunetta, Joanne M. Murabito, Paola Sebastiani, Giuseppe Tosto, Sandra Barral, Ryan L. Minster, Anatoly Yashin, Thomas Perls, Michael A. Province

Research output: Contribution to journal › Article › peer-review

Abstract

This study compares methods of imputing genetic markers, given a typed GWAS scaffold from the Long Life Family Study (LLFS) and the latest 1000 Genomes reference panel. We examined two programs for haplotype pre-phasing (MACH and SHAPEIT2) and two for imputation (MINIMAC and IMPUTE2). SHAPEIT2 is advantageous for haplotype pre-phasing. MINIMAC and IMPUTE2 produced similar imputation quality. We used a 4 Mb region on chromosome 2 of LLFS; in the Supplement, we compared methods using chromosome 19 data from the Genetic Analysis Workshop-19. IMPUTE2 had the advantage of using two references: 1000G and sequence data for a subset of subjects. SHAPEIT2 and IMPUTE2 were used to finalise the full LLFS autosome imputation. In LLFS, 44% of ~80M autosomal imputed variants showed good imputation quality (info ≥ 0.30). Low imputation quality was associated predominantly with low allele frequency in 1000 Genomes. Emerging large-scale sequence data and enhanced imputation methodologies will further improve imputation quality.
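
The quality filter quoted in the abstract (info ≥ 0.30) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a space-delimited per-variant table with a header row containing an `info` column, in the style of an IMPUTE2 info file; the function name `fraction_well_imputed` and the toy records are hypothetical and invented purely for illustration.

```python
import csv
import io

INFO_THRESHOLD = 0.30  # quality cut-off quoted in the abstract


def fraction_well_imputed(lines, threshold=INFO_THRESHOLD):
    """Return (kept, total, fraction) for variants with info >= threshold.

    `lines` is any iterable of text lines: a header row naming an `info`
    column, followed by one space-delimited row per variant (assumed layout).
    """
    reader = csv.DictReader(lines, delimiter=" ")
    total = 0
    kept = 0
    for row in reader:
        total += 1
        if float(row["info"]) >= threshold:
            kept += 1
    fraction = kept / total if total else 0.0
    return kept, total, fraction


# Toy records, invented for illustration only (not LLFS data).
toy = io.StringIO(
    "snp_id rs_id position a0 a1 exp_freq_a1 info\n"
    "--- rs1 1000 A G 0.250 0.95\n"
    "--- rs2 2000 C T 0.002 0.12\n"
    "--- rs3 3000 G A 0.040 0.30\n"
    "--- rs4 4000 T C 0.001 0.05\n"
)
kept, total, frac = fraction_well_imputed(toy)
print(f"{kept} of {total} variants pass info >= {INFO_THRESHOLD} ({frac:.0%})")
```

In the toy table the two rarest variants fall below the threshold, mirroring the abstract's observation that low imputation quality goes mainly with low allele frequency in the reference panel.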

Original language: English
Pages (from-to): 59-84
Number of pages: 26
Journal: International Journal of Bioinformatics Research and Applications
Volume: 16
Issue number: 1
State: Published - 2020

Keywords

  • 1000 Genomes reference
  • FCGENE software
  • Genetic imputation
  • IMPUTE2 software
  • LLFS
  • Long life family study
  • MACH software
  • MINIMAC software
  • SHAPEIT2 software
  • Sequence reference
