Abstract
Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped postpass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-Alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.
| Original language | English |
|---|---|
| Pages (from-to) | 2503-2505 |
| Number of pages | 3 |
| Journal | Bioinformatics |
| Volume | 30 |
| Issue number | 17 |
| DOIs | |
| State | Published - Sep 1 2014 |
Fingerprint
Dive into the research topics of 'SAMBLASTER: Fast duplicate marking and structural variant read extraction'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver