Evolving in isolation: Genetic tests reject recent connections of Amazonian savannas with the central Cerrado

The effects of past climatic shifts remain enigmatic for the Amazon region, especially for islands of savanna within the tropical forest known as “Amazonian savannas” (AS). These disjunct savanna areas share many plant and animal species with the Cerrado biome in central Brazil (the CC), fuelling debate over historical connections. We evaluate hypothesized corridors between the CC and the AS, and specifically investigate whether a history of isolation versus recent connections is supported by genetic tests.


| INTRODUCTION
Climate change has induced historical shifts in landscapes, including the fragmentation of once widespread biomes into relatively isolated patches. The persistence of such populations and the evolutionary dynamics shaping their current genetic structure are commonly considered in studies of the Northern hemisphere following the glacial retreat of the Pleistocene (Hewitt, 2004;Knowles & Massatti, 2017;Pielou, 1992). However, the impact of past climatic shifts is not unique to these areas. The effects of climatic extremes are worldwide, with documented shifts of biomes leaving behind relict populations (e.g., Bonatelli et al., 2014;Migliore et al., 2013;Ornelas, Ruiz-Sanchez, & Sosa, 2010). However, tropical regions remain critically understudied relative to their northern counterparts. The evolutionary history of many tropical biomes is also enigmatic because of particularly sparse palynological or fossil evidence (e.g., Jaramillo et al., 2010) and limited or inconsistent support for a range of different hypotheses regarding the magnitude of climate-induced distributional shifts.
Such uncertainty is exemplified by debates over the evolutionary history of the central Cerrado (CC) and Amazonian savannas (AS) of Brazil (Figure 1). The CC is a hyper-diverse, yet relatively understudied savanna biome that covers over 2 million km². Many plant and animal taxa (including over 70 woody species) are present in the CC and AS, with some AS displaying higher floristic similarity to locations within the CC than to geographically proximate AS (Ratter, Bridgewater, & Ribeiro, 2003), suggesting past connections between the CC and AS (Prance, 1996; Silva, 1995; Silva & Bates, 2002), rather than independent long-distance dispersal events (see Pennington, Lewis, & Ratter, 2006). However, several hypotheses describe how the Cerrado retracted from its former maximum extent, including past connections (that is, corridors) between the Cerrado and areas where AS persist today. Where such corridors might have existed, and which geographic areas they might have connected, are still debated. For example, three different corridors between the CC and AS have been proposed: a coastal corridor, a central Amazonian corridor, and an Andes corridor (Haffer, 1967, 1974; Webb, 1991). Depending on the study, support for hypothesized corridors differs, as does the purported timing of past connections between the CC and AS (e.g., Bueno et al., 2017; Quijada-Mascareñas et al., 2007; Savit & Bates, 2015; Vargas-Ramírez, Maran, & Fritz, 2010; Werneck, Nogueira, Colli, Sites, & Costa, 2012).
That is, the uncertainty over the geographic location of corridors is paralleled by debate over when such connections might have occurred (e.g., during the Miocene and Pliocene, Pascual & Jaureguizar, 1990; versus the Pleistocene, Haffer, 1969;Prance, 1982;van der Hammen, 1991), including whether such connections might have been forged during the drier climate of the Last Glacial Maximum, LGM, especially given the lack of support for such late Pleistocene expansion based on palynological evidence (Colinvaux, Irion, Rasanen, Bush, & de Mello, 2001;Kastner & Goni, 2003;Mayle, Burn, Power, & Urrego, 2009).
Here, we address the extent to which the AS have evolved in isolation from the CC by quantifying population genetic structure of two widely distributed tree species that are common in both the CC and AS: Byrsonima coccolobifolia Kunth and Byrsonima crassifolia (L.) Kunth (Ratter et al., 2003). Specifically, we test the degree to which Cerrado populations are genetically distinct from the AS, as opposed to exhibiting parallel geographic structuring of genetic variation within the CC and among AS, as expected if multiple corridors provided regional connections between different areas of the CC and different subsets of AS. We conducted this test using genomic data (i.e., more than 7,000 and 4,500 loci sequenced in 86 and 68 individuals of B. coccolobifolia and B. crassifolia, respectively), as well as assays of the geographic structure of chloroplast DNA (cpDNA) across an even broader sampling of populations. In addition to the individual histories, we consider the degree to which the taxa show concordant patterns of genetic variation. Because the taxa are ecologically similar, dominant and co-distributed, concordance would lend support to common factors structuring the history of constituent taxa in this diverse biome (Avise, 2004), thereby overriding stochastic processes associated with the biome's dynamic history (Behling & Hooghiemstra, 2001; Ledru, 2002; but see Massatti & Knowles, 2014). Lastly, we estimate divergence times between the CC and AS to determine how long the AS may have been evolving independently of the CC.

| Study species
Byrsonima Rich. ex Kunth is a common genus, with most of its diversity represented by South American savanna taxa, many of which co-occur (Anderson, Anderson, & Davis, 2006; Ratter et al., 2003).
Byrsonima coccolobifolia and B. crassifolia are the most common species from the genus in the Cerrado and in the Amazonian savannas (Ratter et al., 2003), with the range of B. crassifolia extending into the savanna woodlands of Central America and Mexico (Anderson, 1981). Its fleshy fruits are bird-dispersed (Anderson, 1983) and flowers are pollinated by oil-collecting bees, especially Centris species (Benezar & Pessoni, 2006;Vinson, Williams, Frankie, & Shrum, 1997).

| Sampling and DNA extraction
Population sampling of B. coccolobifolia and B. crassifolia covered both species' ranges across the CC and AS (Figure 1; for details see Table S1.1, Appendix S1 in Supporting Information) and was informed by occurrence data from NeoTropTree (Oliveira-Filho, 2017). DNA was extracted using a CTAB protocol (Novaes, Rodrigues, & Lovato, 2009) from silica-gel dried leaves that were stored at −20°C until DNA extraction. DNA quality was evaluated with Nanodrop® (Thermo Scientific, Waltham, USA) and quantified with Qubit® (Thermo Scientific).

| Genomic dataset
Two genomic libraries were prepared (one for each species) following the double-digest restriction site-associated DNA sequencing (ddRADseq) protocol of Peterson, Weber, Kay, Fisher, and Hoekstra (2012 quantile) and SNPs from the two last nucleotides (Figure S1.1 in Appendix S1) to guard against sequencing and assembly errors.
Following this step, the software PLINK 1.07 (Purcell et al., 2007) was used to identify SNPs with a maximum of 20% of missing  Table S1.2 in Appendix S1).

| Characterizations of genomic variation and structure
Genetic structure was investigated using two different strategies: principal components analysis (PCA), which does not require any assumptions about the underlying genetic model (Jombart, Pontier, & Dufour, 2009), and Bayesian clustering, which applies a coalescent model for inferences about genetic structure. The packages "adegenet" 2.0 (Jombart, 2008; Jombart & Ahmed, 2011) and "ade4" 1.7-2 (Dray & Dufour, 2007) were used to perform a PCA in R; missing data were replaced by the mean frequency of the most frequent allele. The robustness of PCA results was evaluated using datasets with different levels of missing data (5 and 20%; see Huang & Knowles, 2016) and with an additional minimum stack depth per individual of 10. Because these results were qualitatively similar (Figure S3.1 in Appendix S3), they are not discussed further. Bayesian clustering was performed with the software STRUCTURE 2.3.4 (Pritchard, Stephens, & Donnelly, 2000), with only one SNP per locus. These analyses allowed admixture among populations and correlated allele frequencies, with 1 to 10 genetic clusters (K) tested. Ten independent runs were performed for each K-value, with 100,000 burn-in and 300,000 MCMC iterations (the numbers of burn-in and MCMC iterations were increased when necessary to reach convergence). The most probable number of clusters was identified with STRUCTURE HARVESTER (Earl & vonHoldt, 2012), and the posterior probability of individual assignment to each cluster was permuted across different runs and visually displayed with CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015). A hierarchical analysis with subsets of populations from each inferred genetic cluster was used to test for additional structure within the initial clusters identified by STRUCTURE (e.g., Massatti & Knowles, 2014; Papadopoulou & Knowles, 2015).
Hierarchical analyses were performed with the same parameter settings described above, with K-values ranging from 1 to the maximum number of populations in each sequential analysis. Note that analyses of genetic structure in B. coccolobifolia suggested the presence of a cryptic taxon (i.e., the PCA revealed that the individuals of three populations were quite divergent, and distinct, from all the other populations; see Figure 2). Because inclusion of these populations (specifically, coAGN, coNAT and coFOR) would confound comparisons of CC to AS (e.g., compare PCA with and without these individuals; Figure 2), they were removed and are not included in the geographic structure results.
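The PCA step described above can be illustrated with a minimal sketch. It is not the adegenet/ade4 workflow used in the study: it assumes genotypes coded as 0/1/2 allele counts in a NumPy array, treats missing calls as NaN, and uses per-SNP mean imputation as a stand-in for the replacement rule described in the text. The function name `pca_with_mean_imputation` is hypothetical.

```python
import numpy as np

def pca_with_mean_imputation(genotypes, n_components=2):
    """PCA on an individuals x SNPs matrix of allele counts (NaN = missing).

    Assumption: every SNP has at least one non-missing genotype.
    """
    g = np.asarray(genotypes, dtype=float)
    # Replace each missing genotype with the mean of its SNP column.
    col_means = np.nanmean(g, axis=0)
    filled = np.where(np.isnan(g), col_means, g)
    # Centre each SNP, then obtain principal components via SVD.
    centered = filled - filled.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Proportion of total variance carried by each retained axis.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy data: two individuals from each of two strongly differentiated groups.
toy = np.array([
    [0, 0, 0, np.nan],
    [0, 0, 0, 0],
    [2, 2, 2, 2],
    [2, 2, np.nan, 2],
])
scores, explained = pca_with_mean_imputation(toy)
```

Because imputed values sit at the column mean, individuals with many missing genotypes are pulled toward the origin of the PCA, which is one reason to repeat the analysis at different missing-data thresholds, as done above.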
Tests of the association between geography and genetic structure were performed using two approaches in each species. Isolation-by-distance (IBD) was tested by evaluating whether there was a significant correlation between geographic distance and genetic dis- Slatkin, 1995) based on 100,000 permutations with the package "vegan" 2.3-1 in R (Oksanen et al., 2017). Additionally, a Procrustes analysis, which retains the relative longitudinal and latitudinal position of populations to test for an association between genetic variation and geography, was used (for additional details see Appendix S3), with the significance of the association, t0 (Wang, Zöllner, & Rosenberg, 2012), evaluated by 10,000 permutations (package "vegan"). The robustness of the association between genes and geography was assessed using a sequential population drop-out procedure (see Knowles & Massatti, 2017). Geographic structuring of genetic variation was also assessed with additional Procrustes analyses conducted on the CC and AS separately.
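A permutation test of the correlation between two distance matrices (a Mantel-type test, the usual way to test IBD) can be sketched as follows. This is an illustrative NumPy re-implementation under stated assumptions, not the "vegan" routine used in the study: the function name `mantel_test`, the one-sided alternative, and the toy distances are all assumptions made for the example.

```python
import numpy as np

def mantel_test(dist_a, dist_b, n_perm=9999, seed=0):
    """One-sided permutation test of the correlation between two
    symmetric distance matrices (e.g., genetic vs. geographic)."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    iu = np.triu_indices_from(a, k=1)          # upper triangle, no diagonal
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute rows and columns of one matrix together, so that
        # population labels are shuffled rather than single distances.
        r = np.corrcoef(a[p][:, p][iu], b[iu])[0, 1]
        if r >= r_obs:
            count += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (count + 1) / (n_perm + 1)
    return r_obs, p_value

# Toy example: eight populations along a transect, with genetic
# distance exactly proportional to geographic distance.
coords = np.arange(8, dtype=float)
geo = np.abs(coords[:, None] - coords[None, :])
gen = 0.1 * geo
r_obs, p_value = mantel_test(gen, geo, n_perm=999)
```

Permuting whole rows and columns is the design choice that matters here: pairwise distances are not independent observations, so shuffling individual matrix entries would give an invalid null distribution.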
Lastly, levels of genetic diversity were characterized for each population using the dataset with all SNPs (i.e., not the dataset with only a single SNP per locus). These include estimation of standard population genetics statistics such as nucleotide diversity (π), expected
TABLE 1 Number of individuals sampled, N, and estimates of genetic diversity per population of Byrsonima coccolobifolia and B. crassifolia (see Figure 1 for distributional map of sampled populations)

| Estimates of divergence times
Divergence times were estimated between the CC and AS using a composite-likelihood method based on the site frequency spectrum the population size of CC was calculated from the nucleotide diversity, π, of fixed and variable sites using a nuclear genomic mutation rate of 7 × 10⁻⁹ subs/site/generation (Ossowski et al., 2010). This mutation rate was estimated based on spontaneous mutations of Arabidopsis thaliana, a herbaceous annual plant, and therefore divergence times estimated here will tend to be relatively more recent Analyses were run with 15 individuals selected from each of the sampled populations that had the smallest amount of missing data (see Table S3.1 in Appendix S3). Since B. coccolobifolia displayed some admixture between the CC and AS, we estimated divergence times with and without the populations that displayed admixture (i.e., populations A-coHTA, coCHG and coVHA). We used a Python script to calculate the folded joint SFS based on the vcf file from POPULATIONS (script is available on https://github.com/KnowlesLab; Papadopoulou & Knowles, 2015). Only loci with a minimum coverage of 10 that were present in all selected individuals were used to calculate the SFS.
FIGURE 2 Principal Components Analysis (PCA) of Byrsonima coccolobifolia including (a) and excluding (b) populations that revealed cryptic genetic diversity indicative of potentially different species (i.e., the three divergent sampled populations: coAGN, coNAT, and coFOR). The amount of variation explained by each axis is given in parentheses and colours indicate population identity
Divergence times were estimated excluding monomorphic sites (i.e., using the "removeZeroSFS" option in FASTSIMCOAL2) and assuming no migration between the CC and AS (this assumption is corroborated by other analyses; see below); note that any violation of this assumption would result in underestimated divergence times (i.e., this assumption is conservative with respect to evaluating whether the AS have had a relatively short history of isolation from the CC).
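Two of the inputs described in this section can be sketched in a few lines. The code below is not the authors' script (which is hosted at the repository cited above); it is a minimal illustration that assumes complete data, alternate-allele counts already tallied per population, the common convention of splitting ties evenly when folding, and the standard diploid relationship π = 4Nμ. The function names are hypothetical.

```python
import numpy as np

def folded_joint_sfs(alt1, alt2, n1, n2):
    """Folded joint site frequency spectrum for two populations.

    alt1, alt2: alternate-allele counts per SNP in each population.
    n1, n2: number of sampled chromosomes per population (no missing data).
    The minor allele is defined across both populations combined.
    """
    sfs = np.zeros((n1 + 1, n2 + 1))
    for c1, c2 in zip(alt1, alt2):
        total = c1 + c2
        if 2 * total < n1 + n2:        # alternate allele is the minor allele
            sfs[c1, c2] += 1
        elif 2 * total > n1 + n2:      # reference allele is the minor allele
            sfs[n1 - c1, n2 - c2] += 1
        else:                          # tie: split the count between both entries
            sfs[c1, c2] += 0.5
            sfs[n1 - c1, n2 - c2] += 0.5
    return sfs

def effective_size_from_pi(pi, mu):
    """Diploid effective size implied by pi = 4 * N * mu (an assumption here)."""
    return pi / (4.0 * mu)

# Toy usage: three SNPs, four chromosomes sampled in each population.
sfs = folded_joint_sfs([1, 4, 2], [0, 3, 2], n1=4, n2=4)
# Hypothetical pi, combined with the mutation rate stated in the text.
n_cc = effective_size_from_pi(pi=0.0028, mu=7e-9)
```

Fixing one population size from π in this way reduces the number of free parameters, so the remaining parameters (including the divergence time) are estimated on an absolute rather than a relative scale.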

| RESULTS
Measures of genomic diversity were generally similar across populations in both species (Table 1), whereas cpDNA diversity varied somewhat between taxa and among populations (Table S2.1 in Appendix S2), including the fixation of a single cpDNA haplotype in some populations (Figure 3), which contrasts with genomic diversity estimates (see Table 1 These results clearly do not support an LGM divergence, even though the shortest possible generation time of 3 years was used. We also note that the confidence intervals surrounding the parameter estimate for t do not overlap with the LGM. This conclusion is also robust to inclusion of admixed populations of B. coccolobifolia (Table 2). Even allowing for potential errors in the mutation rate, the rate would have to be six to twelve times faster than the one applied here to accommodate a divergence time consistent with the LGM.
However, as noted in the methods, mutation rates in woody plants are thought to be slower, not faster, than the one applied here, so an LGM divergence is extremely unlikely.
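The reasoning behind the "six to twelve times faster" statement can be made explicit with a short sketch. It assumes that an estimated divergence time in generations scales inversely with the assumed mutation rate, and it takes roughly 21,000 years as the age of the LGM; both are assumptions of this example. The divergence time used below is a hypothetical placeholder chosen only to show the arithmetic, not a value from Table 2.

```python
def generations_to_years(t_generations, generation_time_years):
    """Convert a divergence time in generations to years."""
    return t_generations * generation_time_years

def required_rate_multiplier(t_years, target_years):
    """Factor by which the mutation rate would need to increase for the
    estimate to shrink to target_years, assuming time scales as 1 / rate."""
    return t_years / target_years

LGM_YEARS = 21_000            # assumed approximate age of the LGM
t_gen = 50_000                # hypothetical estimate, NOT from Table 2
t_years = generations_to_years(t_gen, generation_time_years=3)
fold = required_rate_multiplier(t_years, LGM_YEARS)
```

With these placeholder inputs the estimate is 150,000 years and the rate would need to be about sevenfold faster; a longer generation time only pushes the estimate further from the LGM, which is why using the shortest plausible generation time is the conservative choice.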

| DISCUSSION
The genomic distinctiveness of the CC populations from the disjunct AS and lack of any regional structure that group populations from given the general lack of phylogeographic studies of the AS, we reflect on the relevance of our results on the processes contributing to savanna species diversity, as well as to future conservation efforts.

Amazonian savannas
To explain the similarity between CC and AS, three regional connections or corridors were proposed: the coastal, the central Amazonian, and the Andean corridor (Haffer, 1967, 1974; Silva & Bates, 2002; Webb, 1991). These corridors are hypothesized to have connected the CC and AS during waves of Pleistocene savanna expansions (Haffer, 1969; Silva & Bates, 2002), possibly as recently as the Holocene (see de Freitas et al., 2001). However, our genomic data did not provide strong support for the existence of such corridors in either species. Instead, analyses suggest a history of restricted gene flow between the CC and AS (Figure 4), with the AS evolving in relative isolation from the CC over a history of divergence that predates the LGM (Table 2). This is corroborated by palynological evidence that draws into question any recent large savanna expansions that might have served as connections between the CC and AS (Colinvaux, Oliveira, Moreno, Miller, & Bush, 1996; Colinvaux et al., 2001; Kastner & Goni, 2003; Mayle et al., 2009). The only exception is the admixture detected in one south-western Amazon population (see suggests a more ancient common history, rather than the maintenance by corridors per se. Moreover, it implies that the differences in species composition between the AS and CC might reflect the cumulative loss of species in the AS (community relaxation; Connor & McCoy, 1979), rather than differences in the maintenance of diversity through successful/unsuccessful utilization of corridors.
Additional circumstantial evidence of localized extinctions rests in the observation that few Cerrado taxa are found across all AS populations (Ratter et al., 2003). Alternatively, with many taxa restricted in distribution to the CC, there might have been historical restrictions to expansion for many taxa such that they were never part of the AS, even when Cerrado reached its broadest historical distribution. Additional tests will be needed to evaluate this hypothesis.
These might include testing for evidence of environmental filtering or differences in the dispersal capabilities of exclusively CC taxa compared with those distributed across the AS, although no significant difference in seed dispersal syndromes for species present in CC and AS has been suggested in past studies (see Vieira, Aquino, Brito, Fernandes-Bulhão, & Henriques, 2002).

| Conflicting support for connections of the CC and AS
When comparing our results to past studies purported to support hypothesized connections between the CC and AS, several non-mutually exclusive explanations might account for such contrasting support of the corridor hypothesis. These include: (a) differences in the resolution of genetic markers, (b) relying solely upon applications of distributional or ecological-niche models, and (c) differences among taxa in access to the corridors due to historical contingencies or differences in the taxa themselves (i.e., species-specific traits). Below, we consider each of these explanations in turn with reference to results from our analyses of B. coccolobifolia and B. crassifolia.
The genetic marker applied to test a phylogeographic hypothesis can impact the likelihood that a study might find support for or refute a particular hypothesis (Knowles, 2009). In particular, tests that rely upon genetic structure as evidence of isolation (e.g., when support for putative corridors is based on the lack of genetic differ- In other words, if chloroplast data by themselves are going to be used to refute a hypothesis of isolation, it is important to test whether the data may be consistent with a history of isolation, which can be evaluated using computer simulations (see Knowles & Maddison, 2002). Alternatively, and as we apply here, additional markers can be used to model the divergence history of the species (as opposed to the history of a single locus; see Knowles, 2009). Here, the parameterized divergence models support a long history of isolation between CC and AS regions that predates the LGM (Table 2).
For the Cerrado, evidence for the existence of corridors connecting areas north and south of the Amazon comes primarily from distributional data (e.g., Ávila-Pires, 1995; Nogueira & Rodrigues, 2006; Silva & Bates, 2002), with some support for distinct routes of movement suggested by a few phylogeographic studies (e.g., Buzatti et al., 2017; Quijada-Mascareñas et al., 2007; Savit & Bates, 2015). As with concerns regarding inferences based on a single locus, inferences based on ENMs alone might also be misleading (as opposed to Assuming that such corridors existed, it is possible that some species, just by chance, found themselves in the right place at the right time to have access to a corridor, whereas others did not. On the other hand, the lack of consistent support for corridors could also reflect deterministic processes related to species-specific differences (Massatti & Knowles, 2014; Papadopoulou & Knowles, 2016 crassifolia would not have utilized corridors (if they existed). First, they are very common species and widely distributed (Figure 1), so unlike rare or patchily distributed species, they most probably would have had access to any putative corridor. Second, these attributes also make it less likely that any species-specific traits would have restricted their movement (i.e., they obviously can readily disperse to occupy vast areas of the Cerrado biome).

| Scale-specific effects of climate-induced distributional shifts
As possible remnants of a dynamic historical past, the tropical Amazonian savannas are similar to relict populations in northern latitudes (Pielou, 1991). However, this dynamic history, with cycles of climate-induced distributional shifts, contributes to the enigmatic nature of tropical relicts and debate over their role as drivers of divergence (e.g., Capurucho et al., 2013). By rejecting hypothesized periods of connectivity between CC and AS through putative expansions during glacial-interglacial periods (Prance, 1996; Silva & Bates, 2002), our study raises some intriguing questions about divergence of Cerrado species. Here, we make the argument that connections forged during cycles of expansion, while not extensive enough to support corridors between the CC and AS, may have played an important role in divergence within the CC and within the AS. In other words, geographic scale determines whether climatic oscillations promote connections.
Likewise, we note that the existence of regional structure within both the CC and AS suggests a limit on the level of connectivity among populations in the past (otherwise, the regional structure would have been lost, and the only structure would be the population-level structure that was also observed; see Figure 5).
What might limit the role of climate-induced distributional shifts at the larger scale; that is, why were distributional shifts not associated with connections between the CC and the AS?
The most obvious answer is that the extent of savanna expansion (or conversely forest contraction) may have been more limited than previously proposed. For example, suggestions of a fairly stable forest during the LGM, especially for the western part of the Amazon (e.g., Bush, Silman, & Urrego, 2004; Cheng et al., 2013; Colinvaux, Oliveira, & Bush, 2000), offer an alternative to Haffer's (1969) scenario of forest fragmentation during glacial periods. In addition, recent isotopic data sampled from the Amazon dry corridor (i.e., an area of current lower precipitation within the Amazon, Haffer, 1969) suggest forest physiognomies during the LGM consistent with the maintenance of rain forest (Wang et al., 2017), and/or its replacement by dry-forest habitats, instead of savanna (Bush, 2017; Pennington, Prado, & Pendry, 2000).  (Capurucho et al., 2013; Matos et al., 2016), suggesting that the connections we propose among AS populations based on regional structuring of genetic variation may not be an anomaly. It is clear that future research will be charting new directions about the drivers of divergence within the CC and AS as the focus shifts from one built on a history of corridor connections, to the independent evolutionary trajectories of the CC and AS.

| Conservation of Cerrado and Amazonian savannas
Despite its high endemism and species diversity, the Cerrado is rapidly being lost (less than 20% remains undisturbed; Strassburg et al., 2017), especially with the expansion of agriculture, cattle ranching, and charcoal production, and conservation of the Cerrado biome has received little attention. Although rates of loss have decreased over the last several years (i.e., since 2010), we are nevertheless losing Cerrado faster than Amazon Forest (Françoso et al., 2015).
Given the extent of the biome, covering 2 million km², assessments of genetic diversity and population structure arguably could provide important guidance in conservation efforts. Yet, with relatively sparse geographic sampling, and limited genomic study, such information is rarely considered in conserving this highly threatened biome. Analyses of broadly distributed taxa in particular, like B. coccolobifolia and B. crassifolia, could be used to devise conservation strategies that protect not only the constituent taxa, but also preserve diversity-generating processes (see Moritz, 2002). For example, our study revealed an unexpected cryptic species in B. coccolobifolia from the central and northern areas of the CC, an area reportedly of high species richness (Ratter et al., 2003).  Figure 3 and Table 1). However, their apparent genetic isolation does place them at substantial risk (Figure 4). Moreover, these populations arguably should be considered as unique Cerrado environments in conservation efforts of the biome, given their relatively long isolated history from the CC (see Table 2). Even though most AS display much less species diversity than the CC (but see Ratter et al., 2003 for exceptions), some AS contain more than 250 plant taxa (Miranda, Absy, & Rebelo, 2003; Sanaiotti, 1997), in addition to vulnerable and endemic species of birds, reptiles, amphibians, and plants (Barbosa, Campos, Pinto, & Fearnside, 2007; Carvalho, 1997; França, Mesquita, & Colli, 2006; Rocha & Miranda, 2014). It is also important to note that the number of species in the AS is most likely larger, considering that these areas are highly understudied (Carvalho & Mustin, 2017). Lastly, AS are under particularly high anthropogenic disturbance because they are misleadingly considered as natural pastures in an environment largely dominated by forest (Miranda et al., 2003), making immediate attention as conservation units an imperative (Carvalho & Mustin, 2017).

| CONCLUSIONS
Our results show independent evolution of the CC and AS populations of both broadly distributed tree species studied here (B. coccolobifolia and B. crassifolia), casting doubt on the importance of corridors in structuring Cerrado plant communities. In the context of understanding the evolutionary history of AS populations in particular, it is possible that climatic change in the tropics, and/or differences in the traits of the species themselves, might make certain corridors more or less accessible during different geologic periods (Silva & Bates, 2002;Wüster et al., 2005), but careful consideration of this hypothesis will require expanding the dataset to other broadly distributed taxa.
Specifically, our genomic data suggest a relatively long history of isolation between the CC and AS regions that predates the LGM, as well as population structuring of genetic variation within regions in both species. The contrast between genetic structure of genomic versus chloroplast datasets also highlights the need for cautious interpretation of what constitutes evidence for the corridor hypothesis. Our findings suggest that methodology, not biology, may contribute to some of the differences in support for the corridor hypothesis reported across studies. Lastly, because the Cerrado is a biodiversity hotspot, these results have direct implications for understanding diversification in the Cerrado, as well as for its conservation, especially given extensive and ongoing habitat destruction (Carvalho & Mustin, 2017; Mittermeier et al., 2004).

ACKNOWLEDGEMENTS
The authors thank the Instituto Chico Mendes de Conservação da