TY - JOUR
T1 - Minimum sample sizes for population genomics
T2 - an empirical study from an Amazonian plant species
AU - Nazareno, Alison G.
AU - Bemmels, Jordan B.
AU - Dick, Christopher W.
AU - Lohmann, Lúcia G.
N1 - Publisher Copyright:
© 2017 John Wiley & Sons Ltd
PY - 2017/11
Y1 - 2017/11
N2 - High-throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double-digest restriction-associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations, when 1000 SNPs or higher are used. Our results also show that even at a very small sample size (i.e. two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
AB - High-throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double-digest restriction-associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations, when 1000 SNPs or higher are used. Our results also show that even at a very small sample size (i.e. two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
KW - Amphirrhox longifolia
KW - ddRADseq
KW - genetic diversity
KW - single nucleotide polymorphism
KW - Violaceae
UR - https://www.scopus.com/pages/publications/85017365031
U2 - 10.1111/1755-0998.12654
DO - 10.1111/1755-0998.12654
M3 - Article
C2 - 28078808
AN - SCOPUS:85017365031
SN - 1755-098X
VL - 17
SP - 1136
EP - 1147
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 6
ER -