Accuracy and computational efficiency of a graphical modeling approach to linkage disequilibrium estimation

Haley J. Abel, Alun Thomas

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

We extend recent work on using graphical models for linkage disequilibrium to provide efficient programs for model fitting, phasing, and imputation of missing data in large data sets. Two features contribute most to the computational efficiency: the separation of model fitting and phasing-imputation into different programs, and holding in memory only the data within a moving window of loci during model fitting. Optimal parameter values were chosen by cross-validation to maximize the probability of correctly imputing masked genotypes. The best accuracy obtained is slightly below that of the Beagle program of Browning and Browning, and our fitting program is slower. However, for large data sets, it uses less storage. For a reference set of n individuals genotyped at m markers, the time and storage required to fit a graphical model are approximately O(nm) and O(n+m), respectively. Imputing the phases and missing data for n individuals using an already fitted graphical model requires O(nm) time and O(m) storage. Although fitting and imputation both take O(nm) time, imputation is considerably faster in practice because its constant factor is much smaller; thus, once a model has been estimated from a reference data set, the marginal cost of phasing and imputing further samples is very low.
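The cross-validation idea described above (hide some known genotypes, impute them, score the fraction recovered, and keep the parameter value that scores best) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the imputer is a hypothetical nearest-neighbour stand-in rather than a graphical model, genotypes are assumed to be coded 0/1/2 with -1 for missing, and the names `impute_one`, `masked_accuracy`, `choose_window` and the tuned parameter `w` (a window half-width in loci) are all invented for the example.

```python
import random

MISSING = -1  # assumed sentinel for an unobserved genotype (0/1/2 = minor-allele count)


def impute_one(geno, i, j, w):
    """Guess geno[i][j] by copying the genotype at locus j from the donor
    individual that agrees with individual i on the most observed loci
    within w positions of j. Hypothetical stand-in for a real imputer."""
    m = len(geno[0])
    window = [loc for loc in range(max(0, j - w), min(m, j + w + 1)) if loc != j]
    best_k, best_score = None, -1
    for k, row in enumerate(geno):
        if k == i or row[j] == MISSING:
            continue  # a donor must be another individual observed at locus j
        score = sum(1 for loc in window
                    if geno[i][loc] != MISSING and row[loc] == geno[i][loc])
        if score > best_score:  # strict '>' keeps the lowest-index donor on ties
            best_k, best_score = k, score
    # Arbitrary fallback (heterozygote) when nobody else is observed at locus j
    return geno[best_k][j] if best_k is not None else 1


def masked_accuracy(geno, w, frac=0.1, seed=0):
    """Mask a random fraction of the observed genotypes, impute each one
    with window half-width w, and return the fraction recovered exactly."""
    observed = [(i, j) for i, row in enumerate(geno)
                for j, g in enumerate(row) if g != MISSING]
    if not observed:
        raise ValueError("no observed genotypes to mask")
    rng = random.Random(seed)
    n_mask = max(1, int(frac * len(observed)))
    masked = rng.sample(observed, n_mask)
    work = [row[:] for row in geno]  # copy so the caller's data is untouched
    for i, j in masked:
        work[i][j] = MISSING
    correct = sum(impute_one(work, i, j, w) == geno[i][j] for i, j in masked)
    return correct / n_mask


def choose_window(geno, candidates, frac=0.1, seed=0):
    """Return the candidate w with the highest masked-genotype accuracy.
    A fixed seed gives every candidate the same mask."""
    return max(candidates, key=lambda w: masked_accuracy(geno, w, frac, seed))


if __name__ == "__main__":
    # Toy panel: each genotype is the sum of two haplotypes drawn from a few
    # founder haplotypes, so genotypes at different loci are correlated.
    rng = random.Random(1)
    founders = [[rng.randint(0, 1) for _ in range(50)] for _ in range(4)]
    panel = []
    for _ in range(40):
        h1, h2 = rng.choice(founders), rng.choice(founders)
        panel.append([a + b for a, b in zip(h1, h2)])
    print("chosen window half-width:", choose_window(panel, [1, 2, 4, 8]))
```

Because the seed is fixed, every candidate value of `w` is scored on the same held-out genotypes, so differences in accuracy reflect the parameter rather than the luck of the mask.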

Original language: English
Article number: 5
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 10
Issue number: 1
State: Published - 2011

Keywords

  • SNP genotype assays
  • cross validation
  • phasing-imputation
