TY - JOUR
T1 - Population-based structural variation discovery with Hydra-Multi
AU - Lindberg, Michael R.
AU - Hall, Ira M.
AU - Quinlan, Aaron R.
N1 - Publisher Copyright: © 2014 The Author.
PY - 2015/4/15
Y1 - 2015/4/15
N2 - Summary: Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA). Availability and implementation: Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra.
UR - http://www.scopus.com/inward/record.url?scp=84927720616&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btu771
DO - 10.1093/bioinformatics/btu771
M3 - Article
C2 - 25527832
AN - SCOPUS:84927720616
SN - 1367-4803
VL - 31
SP - 1286
EP - 1289
JO - Bioinformatics
JF - Bioinformatics
IS - 8
ER -