Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 312-324 |
| Number of pages | 13 |
| Journal | Nature |
| Volume | 617 |
| Issue number | 7960 |
| DOIs | 10.1038/s41586-023-05896-x |
| State | Published (May 11, 2023) |
In: Nature, Vol. 617, No. 7960, 11.05.2023, p. 312-324.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - A draft human pangenome reference
AU - Liao, Wen Wei
AU - Asri, Mobin
AU - Ebler, Jana
AU - Doerr, Daniel
AU - Haukness, Marina
AU - Hickey, Glenn
AU - Lu, Shuangjia
AU - Lucas, Julian K.
AU - Monlong, Jean
AU - Abel, Haley J.
AU - Buonaiuto, Silvia
AU - Chang, Xian H.
AU - Cheng, Haoyu
AU - Chu, Justin
AU - Colonna, Vincenza
AU - Eizenga, Jordan M.
AU - Feng, Xiaowen
AU - Fischer, Christian
AU - Fulton, Robert S.
AU - Garg, Shilpa
AU - Groza, Cristian
AU - Guarracino, Andrea
AU - Harvey, William T.
AU - Heumos, Simon
AU - Howe, Kerstin
AU - Jain, Miten
AU - Lu, Tsung Yu
AU - Markello, Charles
AU - Martin, Fergal J.
AU - Mitchell, Matthew W.
AU - Munson, Katherine M.
AU - Mwaniki, Moses Njagi
AU - Novak, Adam M.
AU - Olsen, Hugh E.
AU - Pesout, Trevor
AU - Porubsky, David
AU - Prins, Pjotr
AU - Sibbesen, Jonas A.
AU - Sirén, Jouni
AU - Tomlinson, Chad
AU - Villani, Flavia
AU - Vollger, Mitchell R.
AU - Antonacci-Fulton, Lucinda L.
AU - Baid, Gunjan
AU - Baker, Carl A.
AU - Belyaeva, Anastasiya
AU - Billis, Konstantinos
AU - Carroll, Andrew
AU - Chang, Pi Chuan
AU - Cody, Sarah
AU - Cook, Daniel E.
AU - Cook-Deegan, Robert M.
AU - Cornejo, Omar E.
AU - Diekhans, Mark
AU - Ebert, Peter
AU - Fairley, Susan
AU - Fedrigo, Olivier
AU - Felsenfeld, Adam L.
AU - Formenti, Giulio
AU - Frankish, Adam
AU - Gao, Yan
AU - Garrison, Nanibaa’ A.
AU - Giron, Carlos Garcia
AU - Green, Richard E.
AU - Haggerty, Leanne
AU - Hoekzema, Kendra
AU - Hourlier, Thibaut
AU - Ji, Hanlee P.
AU - Kenny, Eimear E.
AU - Koenig, Barbara A.
AU - Kolesnikov, Alexey
AU - Korbel, Jan O.
AU - Kordosky, Jennifer
AU - Koren, Sergey
AU - Lee, Ho Joon
AU - Lewis, Alexandra P.
AU - Magalhães, Hugo
AU - Marco-Sola, Santiago
AU - Marijon, Pierre
AU - McCartney, Ann
AU - McDaniel, Jennifer
AU - Mountcastle, Jacquelyn
AU - Nattestad, Maria
AU - Nurk, Sergey
AU - Olson, Nathan D.
AU - Popejoy, Alice B.
AU - Puiu, Daniela
AU - Rautiainen, Mikko
AU - Regier, Allison A.
AU - Rhie, Arang
AU - Sacco, Samuel
AU - Sanders, Ashley D.
AU - Schneider, Valerie A.
AU - Schultz, Baergen I.
AU - Shafin, Kishwar
AU - Smith, Michael W.
AU - Sofia, Heidi J.
AU - Abou Tayoun, Ahmad N.
AU - Thibaud-Nissen, Françoise
AU - Tricomi, Francesca Floriana
AU - Wagner, Justin
AU - Walenz, Brian
AU - Wood, Jonathan M.D.
AU - Zimin, Aleksey V.
AU - Bourque, Guillaume
AU - Chaisson, Mark J.P.
AU - Flicek, Paul
AU - Phillippy, Adam M.
AU - Zook, Justin M.
AU - Eichler, Evan E.
AU - Haussler, David
AU - Wang, Ting
AU - Jarvis, Erich D.
AU - Miga, Karen H.
AU - Garrison, Erik
AU - Marschall, Tobias
AU - Hall, Ira M.
AU - Li, Heng
AU - Paten, Benedict
N1 - Funding Information: We would like to acknowledge S. Bidwell and other members of the GenBank staff at the National Center for Biotechnology Information (NCBI; NLM/NIH) for their work to release the assemblies into GenBank. Certain commercial equipment, instruments or materials are identified to adequately specify experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. Computational infrastructure and support were provided by the Centre for Information and Media Technology at Heinrich Heine University Düsseldorf. This work was funded in part by the National Human Genome Research Institute of the National Institutes of Health under award numbers U41HG010972, 1U01HG010973, U41HG007234, 1R01HG011274, R01HG010485, U24HG010262, U01HG010963, U24HG007497 and R01HG011649. This work was funded in part by the National Institutes of Health under award numbers U01HG010961, OT2OD033761, U24HG011853, R01-HG006677, R35-GM130151, R01HG002385, R01HG010169, U01HG01973, 5U01HG010971, R01GM123489, U24HG009081 and 1ZIAHG200398. This work was funded in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. The work of F.T.-N. and V.A.S. was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health. This work was funded in part by the USDA National Institute of Food and Agriculture, grant number 2018-67015-28199, and the National Science Foundation (NSF), grant IOS-1744309, and NSF PPoSS award number 2118709 (E.G. and P.P.). This work was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). G. Bourque is supported by a Canada Research Chair Tier 1 award, a FRQ-S Distinguished Research Scholar award and by the World Premier International Research Center Initiative (WPI), MEXT, Japan. J.S. was supported by the Carlsberg Foundation. This work was funded in part by intramural funding at the National Institute of Standards and Technology. E.E.E., D.H. and E.D.J. are investigators of the Howard Hughes Medical Institute. This work was funded in part by an Oxford Nanopore Research grant (SC20130149) awarded to M. Akeson, University of California Santa Cruz. This work was funded in part by Wellcome Trust award numbers WT104947/Z/14/Z, WT222155/Z/20/Z and WT108749/Z/15/Z. This work was funded in part by a Juan de la Cierva fellowship grant (IJC2020-045916-I) funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. This work was funded in part by the Novo Nordisk Foundation (NNF21OC0069089). S.H. acknowledges funding from the Central Innovation Programme (ZIM) for SMEs of the Federal Ministry for Economic Affairs and Energy of Germany. This work was supported by the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D and 031A538A). This work was funded in part by the German Federal Ministry of Education and Research (BMBF) (031L0184A) and the European Commission, Innovative Training Network (ITN) (956229). W.-W.L. was supported in part by the Government Scholarship to Study Abroad (GSSA) from the Ministry of Education of Taiwan. Publisher Copyright: © 2023, The Author(s).
PY - 2023/5/11
Y1 - 2023/5/11
AB - Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
UR - http://www.scopus.com/inward/record.url?scp=85158007304&partnerID=8YFLogxK
U2 - 10.1038/s41586-023-05896-x
DO - 10.1038/s41586-023-05896-x
M3 - Article
C2 - 37165242
AN - SCOPUS:85158007304
SN - 0028-0836
VL - 617
SP - 312
EP - 324
JO - Nature
JF - Nature
IS - 7960
ER -