TY - JOUR
T1 - How frugal is mother nature with haplotypes?
AU - Climer, Sharlee
AU - Jäger, Gerold
AU - Templeton, Alan R.
AU - Zhang, Weixiong
N1 - Funding Information:
Funding: Olin Fellowship (to S.C., in part); two National Institutes of Health grants (P50-GM65509 and 2RO1 GM02871924A2 to A.T.); Alzheimer’s Association; two National Science Foundation grants (IIS-0535257 and DBI-0743797 to W.Z.).
PY - 2009/1
Y1 - 2009/1
AB - Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical step, is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution. Results: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the least possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, most of which are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
UR - https://www.scopus.com/pages/publications/58049194188
U2 - 10.1093/bioinformatics/btn572
DO - 10.1093/bioinformatics/btn572
M3 - Article
C2 - 18987010
AN - SCOPUS:58049194188
SN - 1367-4803
VL - 25
SP - 68
EP - 74
JO - Bioinformatics
JF - Bioinformatics
IS - 1
ER -