Long-read sequence and assembly of segmental duplications

  • Mitchell R. Vollger
  • Philip C. Dishuck
  • Melanie Sorensen
  • Anne Marie E. Welch
  • Vy Dang
  • Max L. Dougherty
  • Tina A. Graves-Lindsay
  • Richard K. Wilson
  • Mark J.P. Chaisson
  • Evan E. Eichler

Research output: Contribution to journal › Article › peer-review

Abstract

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
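The graph idea in the abstract can be illustrated with a toy sketch. This is not SDA's implementation: the read encoding (a dict from a hypothetical PSV identifier to whether the read carries that variant), the function names, and the clustering step are all assumptions made for illustration. In particular, the clustering here is a plain connected-components pass over net-positive edges, swapped in as a stand-in for whatever partitioning the actual tool performs.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Return signed edge weights between PSVs (hypothetical encoding).

    Each read is a dict {psv_id: bool}. A key means the read spans that
    site; True means the read carries the paralogous sequence variant.
    """
    weights = defaultdict(int)
    for calls in reads:
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                weights[(a, b)] += 1  # attraction: both PSVs on one read
            elif calls[a] != calls[b]:
                weights[(a, b)] -= 1  # repulsion: read carries only one
    return dict(weights)


def cluster_psvs(weights):
    """Group PSVs linked by net-positive edges (simplified stand-in)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        ra, rb = find(a), find(b)  # registers both nodes
        if w > 0 and ra != rb:
            parent[rb] = ra
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


def partition_reads(reads, clusters):
    """Assign each read to the cluster whose PSVs it carries most often."""
    bins = defaultdict(list)
    for idx, calls in enumerate(reads):
        carried = {p for p, has in calls.items() if has}
        scores = [len(carried & set(c)) for c in clusters]
        if scores and max(scores) > 0:
            bins[scores.index(max(scores))].append(idx)
    return dict(bins)


# Made-up example: five reads over four PSV sites from two paralogs.
example_reads = [
    {"p1": True, "p2": True, "p3": False},
    {"p1": True, "p2": True},
    {"p1": False, "p3": True, "p4": True},
    {"p3": True, "p4": True},
    {"p2": False, "p4": True},
]
example_weights = build_psv_graph(example_reads)
example_clusters = cluster_psvs(example_weights)
example_bins = partition_reads(example_reads, example_clusters)
```

With this made-up input, the PSVs fall into two groups (`p1`, `p2` and `p3`, `p4`) and the reads split into bins `[0, 1]` and `[2, 3, 4]`; each bin would then be assembled separately as one paralog. Real long reads are noisy, so a practical version would need error-tolerant edge weights and a clustering step that also uses the repulsion edges.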

Original language: English
Pages (from-to): 88-94
Number of pages: 7
Journal: Nature Methods
Volume: 16
Issue number: 1
State: Published - Jan 1 2019
