The tropical Andes of South America, the world's richest biodiversity hotspot, are home to many rapid radiations. While geological, climatic, and ecological processes collectively explain such radiations, their relative contributions are seldom examined within a single clade. We explore the contribution of these factors by applying a series of diversification models that incorporate mountain building, climate change, and trait evolution to the first dated phylogeny of Andean bellflowers (Campanulaceae: Lobelioideae). Our framework is novel for its direct incorporation of geological data on Andean uplift into a macroevolutionary model. We show that speciation and extinction are differentially influenced by abiotic factors: speciation rates rose concurrently with Andean elevation, while extinction rates decreased during global cooling. Pollination syndrome and fruit type, both biotic traits known to facilitate mutualisms, played an additional role in driving diversification. These abiotic and biotic factors resulted in one of the fastest radiations reported to date: the centropogonids, whose 550 species arose in the last 5 million yr. Our study represents a significant advance in our understanding of plant evolution in Andean cloud forests. It further highlights the power of combining phylogenetic and Earth science models to explore the interplay of geology, climate, and ecology in generating the world's biodiversity.
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact on species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of missing data that are randomly distributed require exhaustive levels of gene sampling, likely exceeding those of most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, a problem that is further exacerbated by high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era.