TY - GEN
T1 - Haplotype inference from short sequence reads using a population genealogical history model
AU - Zhang, Jin
AU - Wu, Yufeng
PY - 2011/12/1
Y1 - 2011/12/1
N2 - High-throughput sequencing is currently a major transforming technology in biology. In this paper, we study a population genomics problem motivated by the newly available short-read data from high-throughput sequencing. In this problem, we are given short reads collected from individuals in a population. The objective is to infer haplotypes from the given reads. We first formulate the computational problem of haplotype inference with short reads. Based on a simple probabilistic model of short reads, we present a new approach for inferring haplotypes directly from the given reads (i.e., without first calling genotypes). Our method finds the most likely haplotypes whose local genealogical history can be approximately modeled as a perfect phylogeny. We show that, when there is no recombination, the optimal haplotypes under this objective can be found using integer linear programming for many modest-sized data sets. We then develop a related heuristic method that can work with larger data sets and also allows recombination. Simulation shows that the performance of our method is competitive with alternative approaches.
KW - High-throughput sequencing
KW - bioinformatics algorithms
KW - haplotype inference
KW - population genomics
UR - http://www.scopus.com/inward/record.url?scp=84872941210&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 21121056
AN - SCOPUS:84872941210
SN - 9814335053
SN - 9789814335058
T3 - Pacific Symposium on Biocomputing 2011, PSB 2011
SP - 288
EP - 299
BT - Pacific Symposium on Biocomputing 2011, PSB 2011
T2 - 16th Pacific Symposium on Biocomputing, PSB 2011
Y2 - 3 January 2011 through 7 January 2011
ER -