Haplotype inference from short sequence reads using a population genealogical history model

Jin Zhang, Yufeng Wu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

High-throughput sequencing is currently a major transforming technology in biology. In this paper, we study a population genomics problem motivated by the newly available short-read data from high-throughput sequencing. In this problem, we are given short reads collected from individuals in a population, and the objective is to infer haplotypes from the given reads. We first formulate the computational problem of haplotype inference with short reads. Based on a simple probabilistic model of short reads, we present a new approach for inferring haplotypes directly from the given reads (i.e., without first calling genotypes). Our method finds the most likely haplotypes whose local genealogical history can be approximately modeled as a perfect phylogeny. We show that, when there is no recombination, the optimal haplotypes under this objective can be found using integer linear programming for many modest-sized data sets. We then develop a related heuristic method that works with larger data and also allows recombination. Simulations show that the performance of our method is competitive with alternative approaches.
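
The two ingredients named in the abstract, a probabilistic model of reads and a perfect-phylogeny constraint on haplotypes, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The per-base error model, the `error_rate` default, the `(start, alleles)` read encoding, and both function names are assumptions made here for illustration. The compatibility check is the standard four-gamete test, which holds for binary sites with no recombination.

```python
from itertools import combinations
from math import log


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on binary haplotypes (equal-length strings of '0'/'1').

    A set of binary haplotypes fits a perfect phylogeny (unknown root)
    exactly when no pair of sites shows all four gametes 00, 01, 10, 11.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def read_log_likelihood(haplotype, reads, error_rate=0.01):
    """Log-likelihood of reads given one haplotype.

    Assumed model (hypothetical, for illustration only): each read is a
    (start, alleles) pair, and each observed allele independently
    disagrees with the haplotype with probability `error_rate`.
    """
    total = 0.0
    for start, alleles in reads:
        for offset, allele in enumerate(alleles):
            if haplotype[start + offset] == allele:
                total += log(1.0 - error_rate)
            else:
                total += log(error_rate)
    return total


if __name__ == "__main__":
    compatible = ["00", "01", "10"]
    incompatible = ["00", "01", "10", "11"]
    print(admits_perfect_phylogeny(compatible))
    print(admits_perfect_phylogeny(incompatible))

    reads = [(0, "01"), (2, "01")]
    print(read_log_likelihood("0101", reads))
    print(read_log_likelihood("0100", reads))
```

Under these assumptions, a search procedure would prefer haplotype sets that pass the four-gamete test and give the reads a high log-likelihood. The abstract describes doing this optimally with integer linear programming on modest-sized data and heuristically on larger data.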

Original language: English
Title of host publication: Pacific Symposium on Biocomputing 2011, PSB 2011
Pages: 288-299
Number of pages: 12
State: Published - Dec 1 2011
Event: 16th Pacific Symposium on Biocomputing, PSB 2011 - Kohala Coast, HI, United States
Duration: Jan 3 2011 to Jan 7 2011

Publication series

Name: Pacific Symposium on Biocomputing 2011, PSB 2011

Conference

Conference: 16th Pacific Symposium on Biocomputing, PSB 2011
Country/Territory: United States
City: Kohala Coast, HI
Period: 01/3/11 to 01/7/11

Keywords

  • High-throughput sequencing
  • bioinformatics algorithms
  • haplotype inference
  • population genomics
