TY - JOUR
T1 - The functional spectrum of low-frequency coding variation
AU - Marth, Gabor T.
AU - Yu, Fuli
AU - Indap, Amit R.
AU - Garimella, Kiran
AU - Gravel, Simon
AU - Leong, Wen F.
AU - Tyler-Smith, Chris
AU - Bainbridge, Matthew
AU - Blackwell, Tom
AU - Zheng-Bradley, Xiangqun
AU - Chen, Yuan
AU - Challis, Danny
AU - Clarke, Laura
AU - Ball, Edward V.
AU - Cibulskis, Kristian
AU - Cooper, David N.
AU - Fulton, Bob
AU - Hartl, Chris
AU - Koboldt, Dan
AU - Muzny, Donna
AU - Smith, Richard
AU - Sougnez, Carrie
AU - Stewart, Chip
AU - Ward, Alistair
AU - Yu, Jin
AU - Xue, Yali
AU - Altshuler, David
AU - Bustamante, Carlos D.
AU - Clark, Andrew G.
AU - Daly, Mark
AU - DePristo, Mark
AU - Flicek, Paul
AU - Gabriel, Stacey
AU - Mardis, Elaine
AU - Palotie, Aarno
AU - Gibbs, Richard
N1 - Funding Information:
This research was supported by the National Institutes of Health grants R01 HG004719 and RC2 HG005552 (GTM) and R01 HG003229 (AGC and CDB). CTS, YC and YX were supported by The Wellcome Trust (WT 077009). FY, DC, JY, and RG were supported by the National Human Genome Research Institute, National Institutes of Health, under grants 5U54HG003273 and 1U01HG005211-0109. A list of members of the pilot phase of the 1000 Genomes Project is provided in Additional file 2.
PY - 2011/9/14
Y1 - 2011/9/14
N2 - Background: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. Results: The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. Conclusions: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
AB - Background: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. Results: The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. Conclusions: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
UR - http://www.scopus.com/inward/record.url?scp=80052825195&partnerID=8YFLogxK
U2 - 10.1186/gb-2011-12-9-r84
DO - 10.1186/gb-2011-12-9-r84
M3 - Article
C2 - 21917140
AN - SCOPUS:80052825195
SN - 1474-7596
VL - 12
JO - Genome biology
JF - Genome biology
IS - 9
M1 - R84
ER -