A catalog of reference genomes from the human microbiome

Karen E. Nelson, George M. Weinstock, Sarah K. Highlander, Kim C. Worley, Heather Huot Creasy, Jennifer Russo Wortman, Douglas B. Rusch, Makedonka Mitreva, Erica Sodergren, Asif T. Chinwalla, Michael Feldgarden, Dirk Gevers, Brian J. Haas, Ramana Madupu, Doyle V. Ward, Bruce W. Birren, Richard A. Gibbs, Barbara Methe, Joseph F. Petrosino, Robert L. Strausberg, Granger G. Sutton, Owen R. White, Richard K. Wilson, Scott Durkin, Michelle Gwinn Giglio, Sharvari Gujja, Clint Howarth, Chinnappa D. Kodira, Nikos Kyrpides, Teena Mehta, Donna M. Muzny, Matthew Pearson, Kymberlie Pepin, Amrita Pati, Xiang Qin, Chandri Yandava, Qiandong Zeng, Lan Zhang, Aaron M. Berlin, Lei Chen, Theresa A. Hepburn, Justin Johnson, Jamison McCorrison, Jason Miller, Pat Minx, Chad Nusbaum, Carsten Russ, Sean M. Sykes, Chad M. Tomlinson, Sarah Young, Wesley C. Warren, Jonathan Badger, Jonathan Crabtree, Victor M. Markowitz, Joshua Orvis, Andrew Cree, Steve Ferriera, Lucinda L. Fulton, Robert S. Fulton, Marcus Gillis, Lisa D. Hemphill, Vandita Joshi, Christie Kovar, Manolito Torralba, Kris A. Wetterstrand, Amr Abouellleil, Aye M. Wollam, Christian J. Buhay, Yan Ding, Shannon Dugan, Michael G. FitzGerald, Mike Holder, Jessica Hostetler, Sandra W. Clifton, Emma Allen-Vercoe, Ashlee M. Earl, Candace N. Farmer, Konstantinos Liolios, Michael G. Surette, Jennifer Russo Wortman, Qiang Xu, Craig Pohl, Katarzyna Wilczek-Boney, Dianhui Zhu

Research output: Contribution to journal › Article › peer-review

502 Scopus citations


The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (∼97%) were unique. In addition, this set of microbial genomes allows for ∼40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.
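The selection rule stated in the abstract (unmasked length greater than 100 amino acids, and no BLASTP match to any nonreference entry) can be illustrated with a minimal Python sketch. The record type, field names, and toy data below are hypothetical and are not taken from the authors' pipeline; the BLASTP search itself is assumed to have been run already and is represented only by a boolean flag. The final lines recompute the percentages implied by the counts reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polypeptide:
    """Hypothetical record; the field names are illustrative assumptions."""
    identifier: str
    unmasked_length: int        # amino acids remaining after sequence masking
    has_nonreference_hit: bool  # True if BLASTP matched a nonreference entry


# Threshold from the abstract: "greater than 100 amino acids".
MIN_UNMASKED_LENGTH = 100


def is_novel(p: Polypeptide) -> bool:
    """Apply both criteria from the abstract: long enough, and no match."""
    return p.unmasked_length > MIN_UNMASKED_LENGTH and not p.has_nonreference_hit


def select_novel(polypeptides):
    """Return the polypeptides that satisfy the novelty criteria."""
    return [p for p in polypeptides if is_novel(p)]


# Toy input (invented for illustration, not real data).
toy = [
    Polypeptide("pp1", 250, False),  # long, no match      -> novel
    Polypeptide("pp2", 250, True),   # long, but matched   -> not novel
    Polypeptide("pp3", 100, False),  # not greater than 100 -> not novel
    Polypeptide("pp4", 101, False),  # just over threshold -> novel
]
novel_ids = [p.identifier for p in select_novel(toy)]

# Arithmetic on the counts reported in the abstract.
unique_fraction = 29_987 / 30_867   # unique share of the novel set
novel_fraction = 30_867 / 547_968   # novel share of all predicted polypeptides

print(novel_ids)
print(f"unique: {unique_fraction:.1%}, novel: {novel_fraction:.1%}")
```

Note that the threshold is strict: a polypeptide of exactly 100 unmasked residues is excluded under this reading of "greater than 100".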

Original language: English
Pages (from-to): 994-999
Number of pages: 6
Issue number: 5981
State: Published - May 21 2010

