Classical results for the random covering of a one-dimensional domain are generalized to multiple domains. The density function for the number of gaps is derived in terms of Bell polynomials, and its limiting forms are determined as well. The multiple-domain configuration is a good model for DNA sequencing scenarios in which the target is fragmented, e.g., filtered DNA libraries and macronuclear genomes. Large-scale sequencing efforts are now starting to focus on such projects. Fragmentation effects are most prominent for small targets but vanish for very large targets, where the current model converges with classical theory. Pyrosequencing has been suggested as a viable, much cheaper alternative for large filtered projects. However, our model indicates that a recently demonstrated microscale Sanger reaction will likely be far more effective.
- Probabilistic modeling
- Sequence redundancy