TY - JOUR
T1 - Aspects of coverage in medical DNA sequencing
AU - Wendl, Michael C.
AU - Wilson, Richard K.
N1 - Funding Information:
The germ of this work evolved during the "Monday-morning cancer discussions" organized by Richard Wilson and Elaine Mardis of the Genome Sequencing Center and Timothy Ley of the Department of Medicine, Washington University. The authors wish especially to thank Elaine Mardis, Timothy Ley, and Li Ding for their input and critical comments and Brian Dunford-Shore for analyzing the C. elegans data and generating the information for plotting the empirical results in Fig. 1. They also appreciate general discussions of genome coverage involving Ken Chen, Jarret Glasscock, Michael McLellan II, Ryan Richt, and Todd Wylie. This work was partially supported by grant HG003079 from the National Human Genome Research Institute (Richard K. Wilson, PI).
PY - 2008/5/16
Y1 - 2008/5/16
AB - Background: DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results: We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage and its generalization for aneuploidy are derived, and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e., for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion: Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
UR - http://www.scopus.com/inward/record.url?scp=45749155981&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-9-239
DO - 10.1186/1471-2105-9-239
M3 - Article
C2 - 18485222
AN - SCOPUS:45749155981
SN - 1471-2105
VL - 9
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 239
ER -