TY - JOUR
T1 - A new method for handling missing species in diversification analysis applicable to randomly or nonrandomly sampled phylogenies
AU - Cusimano, Natalie
AU - Stadler, Tanja
AU - Renner, Susanne S.
N1 - Funding Information: Supported by the German Research Council (grant RE 603/7-1).
PY - 2012/10
Y1 - 2012/10
AB - Chronograms from molecular dating are increasingly being used to infer rates of diversification and their change over time. A major limitation in such analyses is incomplete species sampling that moreover is usually nonrandom. While the widely used γ statistic with the Monte Carlo constant-rates test or the birth-death likelihood analysis with the ΔAICrc test statistic are appropriate for comparing the fit of different diversification models in phylogenies with random species sampling, no objective automated method has been developed for fitting diversification models to nonrandomly sampled phylogenies. Here, we introduce a novel approach, CorSiM, which involves simulating missing splits under a constant rate birth-death model and allows the user to specify whether species sampling in the phylogeny being analyzed is random or nonrandom. The completed trees can be used in subsequent model-fitting analyses. This is fundamentally different from previous diversification rate estimation methods, which were based on null distributions derived from the incomplete trees. CorSiM is automated in an R package and can easily be applied to large data sets. We illustrate the approach in two Araceae clades, one with a random species sampling of 52% and one with a nonrandom sampling of 55%. In the latter clade, the CorSiM approach detects and quantifies an increase in diversification rate, whereas classic approaches prefer a constant rate model; in the former clade, results do not differ among methods (as indeed expected since the classic approaches are valid only for randomly sampled phylogenies). The CorSiM method greatly reduces the type I error in diversification analysis, but type II error remains a methodological problem.
KW - Birth-death likelihood analysis
KW - diversification rates
KW - missing-species problem
KW - model fitting
KW - nonrandom species sampling
KW - γ statistic
UR - http://www.scopus.com/inward/record.url?scp=84861212667&partnerID=8YFLogxK
U2 - 10.1093/sysbio/sys031
DO - 10.1093/sysbio/sys031
M3 - Article
C2 - 22334344
AN - SCOPUS:84861212667
SN - 1063-5157
VL - 61
SP - 785
EP - 792
JO - Systematic Biology
JF - Systematic Biology
IS - 5
ER -