Zhenxiang Xi, Liang Liu, and Charles C. Davis. 11/15/2015. “The Impact of Missing Data on Species Tree Estimation.” Molecular Biology and Evolution, 33, 3, Pp. 838-60.
Abstract
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and that a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of missing data that are randomly distributed require exhaustive levels of gene sampling, likely exceeding most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, which becomes further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era.
Zhenxiang Xi, Liang Liu, and Charles C. Davis. 11/2015. “Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.” Molecular Phylogenetics and Evolution, 92, Pp. 63-71.
Abstract
The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).
Charles C. Davis, Charles G. Willis, Bryan Connolly, Courtland Kelly, and Aaron M. Ellison. 10/2015. “Herbarium records are reliable sources of phenological change driven by climate and provide novel insights into species' phenological cueing mechanisms.” American Journal of Botany, 102, 10, Pp. 1599-609.
Abstract
PREMISE OF THE STUDY: Climate change has resulted in major changes in the phenology of some species but not others. Long-term field observational records provide the best assessment of these changes, but geographic and taxonomic biases limit their utility. Plant specimens in herbaria have been hypothesized to provide a wealth of additional data for studying phenological responses to climatic change. However, no study to our knowledge has comprehensively addressed whether herbarium data are accurate measures of phenological response and thus applicable to addressing such questions. METHODS: We compared flowering phenology determined from field observations (years 1852-1858, 1875, 1878-1908, 2003-2006, 2011-2013) and herbarium records (1852-2013) of 20 species from New England, United States. KEY RESULTS: Earliest flowering date estimated from herbarium records faithfully reflected field observations of first flowering date and substantially increased the sampling range across climatic conditions. Additionally, although most species demonstrated a response to interannual temperature variation, long-term temporal changes in phenological response were not detectable. CONCLUSIONS: Our findings support the use of herbarium records for understanding plant phenological responses to changes in temperature, and also importantly establish a new use of herbarium collections: inferring primary phenological cueing mechanisms of individual species (e.g., temperature, winter chilling, photoperiod). These latter data are lacking from most investigations of phenological change, but are vital for understanding differential responses of individual species to ongoing climate change.
Carl Veller, Martin A. Nowak, and Charles C. Davis. 7/2015. “Extended flowering intervals of bamboos evolved by discrete multiplication.” Ecology Letters, 18, Pp. 653-9.
Abstract
Numerous bamboo species collectively flower and seed at dramatically extended, regular intervals - some as long as 120 years. These collective seed releases, termed 'masts', are thought to be a strategy to overwhelm seed predators or to maximise pollination rates. But why are the intervals so long, and how did they evolve? We propose a simple mathematical model that supports their evolution as a two-step process: First, an initial phase in which a mostly annually flowering population synchronises onto a small multi-year interval. Second, a phase of successive small multiplications of the initial synchronisation interval, resulting in the extraordinary intervals seen today. A prediction of the hypothesis is that mast intervals observed today should factorise into small prime numbers. Using a historical data set of bamboo flowering observations, we find strong evidence in favour of this prediction. Our hypothesis provides the first theoretical explanation for the mechanism underlying this remarkable phenomenon.
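The prediction that mast intervals factorise into small primes can be illustrated with a minimal sketch. This is not the authors' analysis: the function name `small_prime_factorisation` and the cutoff `max_prime=5` for what counts as "small" are assumptions made here for illustration. Only the 120-year interval comes from the abstract.

```python
def small_prime_factorisation(interval, max_prime=5):
    """Return {prime: exponent} if `interval` factorises using only primes
    <= max_prime, otherwise None.

    Assumption: max_prime=5 is an illustrative threshold for "small";
    the abstract does not state one.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    factors = {}
    remaining = interval
    # Dividing out candidates in ascending order means only primes can
    # ever divide `remaining`, so no explicit primality test is needed.
    for p in range(2, max_prime + 1):
        while remaining % p == 0:
            factors[p] = factors.get(p, 0) + 1
            remaining //= p
    # Anything left over contains a prime factor larger than max_prime.
    return factors if remaining == 1 else None


# The 120-year interval mentioned in the abstract: 120 = 2^3 * 3 * 5.
print(small_prime_factorisation(120))
# A hypothetical 34-year interval (2 * 17) would not fit the prediction.
print(small_prime_factorisation(34))
```

Under this sketch, an interval reached by repeated small multiplications of a short initial cycle returns a factor table, while an interval containing a large prime returns `None`.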
Charles C. Davis and Zhenxiang Xi. 6/3/2015. “Horizontal gene transfer in parasitic plants.” Current Opinion in Plant Biology, 26, Pp. 14-19.
Abstract
Horizontal gene transfer (HGT) between species has been a major focus of plant evolutionary research during the past decade. Parasitic plants, which establish a direct connection with their hosts, have provided excellent examples of how these transfers are facilitated via the intimacy of this symbiosis. In particular, phylogenetic studies from diverse clades indicate that parasitic plants represent a rich system for studying this phenomenon. Here, HGT has been shown to be astonishingly high in the mitochondrial genome, and appreciable in the nuclear genome. Although explicit tests remain to be performed, some transgenes have been hypothesized to be functional in their recipient species, thus providing a new perspective on the evolution of novelty in parasitic plants.
Charlie G. Willis and Charles C. Davis. 5/15/2015. “Rethinking migration.” Science, 348, 6236, Pp. 766.
Liang Liu, Zhenxiang Xi, Shaoyuan Wu, Charles C. Davis, and Scott V. Edwards. 4/15/2015. “Estimating phylogenetic trees from genome-scale data.” Annals of the New York Academy of Sciences, 1360, Pp. 36-53.
Abstract
The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data.
Goia M. Lyra, Emmanuelle de S. Costa, Priscilla de Jesus, Joao Carlos de Matos, Taiara A. Caires, Mariana C. Oliveira, Eurico C. Oliveira, Zhenxiang Xi, Jose Marcos de C. Nunes, and Charles C. Davis. 2015. “Phylogeny of Gracilariaceae (Rhodophyta): evidence from plastid and mitochondrial nucleotide sequences.” Journal of Phycology, 51, Pp. 356-66.
Abstract
Gracilariaceae are mostly pantropical red algae and include ~230 species in seven genera. Infrafamilial classification of the group has long been based on reproductive characters, but previous phylogenies have shown that traditionally circumscribed groups are not monophyletic. We performed phylogenetic analyses using two plastid (universal plastid amplicon and rbcL) and one mitochondrial (cox1) loci from a greatly expanded number of taxa to better assess generic relationships and understand patterns of character distributions. Our analyses produce the most well-supported phylogeny of the family to date, and indicate that key characteristics of spermatangia and cystocarp type do not delineate genera as commonly suggested. Our results further indicate that Hydropuntia is not monophyletic. Given their morphological overlap with closely related members of Gracilaria, we propose that Hydropuntia be synonymized with the former. Our results additionally expand the known ranges of several Gracilariaceae species to include Brazil. Lastly, we demonstrate that the recently described Gracilaria yoneshigueana should be synonymized as G. domingensis based on morphological and molecular characters. These results demonstrate the utility of DNA barcoding for understanding poorly known and fragmentary materials of cryptic red algae.