TY - JOUR
T1 - Characterizing the Major Structural Variant Alleles of the Human Genome
AU - Audano, Peter A.
AU - Sulovari, Arvis
AU - Graves-Lindsay, Tina A.
AU - Cantsilieris, Stuart
AU - Sorensen, Melanie
AU - Welch, Anne Marie E.
AU - Dougherty, Max L.
AU - Nelson, Bradley J.
AU - Shah, Ankeeta
AU - Dutcher, Susan K.
AU - Warren, Wesley C.
AU - Magrini, Vincent
AU - McGrath, Sean D.
AU - Li, Yang I.
AU - Wilson, Richard K.
AU - Eichler, Evan E.
N1 - Publisher Copyright: © 2018 Elsevier Inc.
PY - 2019/1/24
Y1 - 2019/1/24
AB - In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
KW - gap closure
KW - human reference genome
KW - major allele
KW - single-molecule, real-time (SMRT) sequencing
KW - structural variation
KW - whole-genome sequence and assembly
UR - http://www.scopus.com/inward/record.url?scp=85060093572&partnerID=8YFLogxK
U2 - 10.1016/j.cell.2018.12.019
DO - 10.1016/j.cell.2018.12.019
M3 - Article
C2 - 30661756
AN - SCOPUS:85060093572
SN - 0092-8674
VL - 176
SP - 663
EP - 675.e19
JO - Cell
JF - Cell
IS - 3
ER -