Abstract
Parametric model fitting of unprocessed sequencing-gel trace data and a least-squares optimization algorithm provide a method for accurately determining allele frequencies of single nucleotide substitutions in a population. The method uses trace data from two homozygous individuals and from either a heterozygous individual or a mixed population of templates. A parametric model is fit to each of the traces to estimate the amount of each of the four fluorescent dyes present at each site. The parameters estimated from each trace are then normalized to account for scalar variations caused by differences in the amount of sample loaded. The parameters estimated from the trace of the heterozygous individual or of the mixture are treated as a weighted sum of the parameters estimated from the traces of the homozygous individuals. The weights, or allele frequencies, are estimated by minimizing the sum of squared errors between the linear combination of homozygous traces and the mixed trace. A comparison of the allele frequencies estimated by our method with known frequencies at polymorphic sites in three pools of CEPH individuals shows that our method is accurate. Our method is automatic and much less labor-intensive than previous approaches.
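As a rough illustration of the final estimation step only, the sketch below assumes that the parametric fitting has already produced, for each trace, a flat list of non-negative dye amounts (four per site) over the same set of sites. Normalizing each trace to unit sum and constraining the two weights to sum to one are assumptions of this sketch, not details taken from the paper, and the function names `normalize` and `estimate_allele_frequency` are hypothetical. Under those assumptions the two-allele least-squares problem has a closed-form solution.

```python
from typing import List, Sequence


def normalize(params: Sequence[float]) -> List[float]:
    """Scale one trace's dye-amount parameters so they sum to 1.

    Assumption (not from the paper): a single multiplicative factor per
    trace captures differences in the amount of sample loaded.
    """
    total = sum(params)
    if total <= 0:
        raise ValueError("trace parameters must have a positive sum")
    return [x / total for x in params]


def estimate_allele_frequency(hom_a: Sequence[float],
                              hom_b: Sequence[float],
                              mixed: Sequence[float]) -> float:
    """Hypothetical helper: least-squares weight p with mixed ~ p*a + (1-p)*b.

    Minimizes sum_i (m_i - p*a_i - (1 - p)*b_i)^2 over normalized traces.
    With d = a - b and r = m - b the objective becomes sum_i (r_i - p*d_i)^2,
    minimized at p = <d, r> / <d, d>; the estimate is clipped to [0, 1].
    """
    if not (len(hom_a) == len(hom_b) == len(mixed)):
        raise ValueError("all traces must have the same number of parameters")
    a, b, m = normalize(hom_a), normalize(hom_b), normalize(mixed)
    d = [ai - bi for ai, bi in zip(a, b)]  # difference between homozygotes
    r = [mi - bi for mi, bi in zip(m, b)]  # mixture relative to allele B
    dd = sum(di * di for di in d)
    if dd == 0:
        raise ValueError("homozygous traces are indistinguishable")
    p = sum(di * ri for di, ri in zip(d, r)) / dd
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Toy data: two sites x four dyes (A, C, G, T). Site 1 is an A/G
    # substitution; site 2 is a shared C. The mixture is 30% allele A,
    # loaded at a different total amount than the homozygous samples.
    hom_a = [5, 0, 0, 0, 0, 5, 0, 0]
    hom_b = [0, 0, 5, 0, 0, 5, 0, 0]
    mixed = [3, 0, 7, 0, 0, 10, 0, 0]
    print(estimate_allele_frequency(hom_a, hom_b, mixed))  # approximately 0.3
```

With more than two alleles, or without the sum-to-one assumption, the same idea becomes a general (possibly bound-constrained) linear least-squares problem rather than this one-parameter closed form.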
Original language | English |
---|---|
Pages | 202-206 |
Number of pages | 5 |
State | Published - 1998 |
Event | Proceedings of the 1998 2nd Annual International Conference on Computational Molecular Biology - New York, NY, USA Duration: Mar 22 1998 → Mar 25 1998 |