“relative rank” the same as its “parent” node. However, if the node led to another node, the “child” node was given a rank one integer higher than the “parent” node. This procedure was applied to all subsequent nodes and taxa until every taxon in the tree had been assigned a relative rank.

To assign a “final rank” to each taxon, we entered the taxa into a spreadsheet, sorted them by relative rank from lowest to highest, and numbered them in order from 1 to n. If only one taxon was associated with a given relative rank, the ordered number of that taxon became its final rank. If two or three taxa shared a relative rank, the lowest of their ordered numbers became their final rank. If four or more taxa shared a relative rank, the average of their ordered numbers became their final rank.

To assign age ranks, we obtained first appearance dates from the PBDB, ordered them, and assigned ranks beginning with the oldest. Taxa with no date listed in the PBDB were eliminated from the calculations. In this way, the lowest ranks correspond to the basalmost branches of the tree and to the earliest appearances in the fossil record.

B. Correlating the Ranks

We used Spearman’s Rank Correlation (SRC) to measure the fit between the clade ranks and the age ranks. SRC measures monotonic association, that is, agreement in order rather than linearity. In macroevolutionary theory, we would expect taxa to make their first fossil appearances in the same order that those taxa evolved, all other things being equal. If clade ranks and age ranks are completely congruent, a plot of one set of ranks against the other is expected to form a straight line with a slope of 1, regardless of whether the underlying relationship is strictly linear. This would mean that taxa that are believed to have evolved first always appear earlier in the fossil record than taxa that are believed to have evolved later. Negative SRC values are also possible and indicate the opposite trend.
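The final-rank rules and the rank correlation described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names (`final_ranks`, `average_ranks`, `spearman_rho`) and the example taxa are hypothetical, and the sketch assumes the standard definition of Spearman's rho as the Pearson correlation of average ranks, which may differ from how ties were handled in the original analysis.

```python
from collections import defaultdict
from math import sqrt


def final_ranks(relative_ranks):
    """Map {taxon: relative_rank} to {taxon: final_rank} using the tie rules in the text."""
    # Sort by relative rank; the name is a secondary key only to make the order deterministic.
    ordered = sorted(relative_ranks, key=lambda t: (relative_ranks[t], t))
    groups = defaultdict(list)
    for number, taxon in enumerate(ordered, start=1):  # ordered numbers 1..n
        groups[relative_ranks[taxon]].append((taxon, number))
    result = {}
    for members in groups.values():
        numbers = [n for _, n in members]
        if len(numbers) <= 3:
            rank = min(numbers)                  # one to three taxa: lowest ordered number
        else:
            rank = sum(numbers) / len(numbers)   # four or more taxa: average ordered number
        for taxon, _ in members:
            result[taxon] = rank
    return result


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y.

    Assumes both inputs contain at least two distinct values (nonzero variance).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical tree with seven taxa and their relative ranks.
clade = final_ranks({"A": 1, "B": 2, "C": 2, "D": 3, "E": 3, "F": 3, "G": 3})
# Hypothetical age ranks (1 = oldest first appearance), in the same taxon order.
age = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7}
taxa = sorted(clade)
rho = spearman_rho([clade[t] for t in taxa], [age[t] for t in taxa])
```

In this example, taxa B and C share a relative rank and both receive the lower ordered number (2), whereas the four taxa D to G receive the average of their ordered numbers (5.5).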
We calculated the SRC coefficient (rho) for each tree based on our age and clade ranks using Python. All additional analyses were conducted in R.

C. Simulations

To simulate the correlation of large sets of points, we began with the empirical distribution of taxon sample sizes from our database of phylogenies. For each real phylogeny, we created a set of points consisting of exact rank matches, e.g. (1,1), (2,2), (3,3)...(n,n), where n is the number of taxa in that phylogeny. We then scrambled a set percentage of those points, so that only the remaining unscrambled points were perfectly correlated. We repeated this procedure for every phylogeny in our database, then calculated Spearman’s rho and p-values for each of these sets of points. In this way, we could simulate correlations with the precise taxon sample sizes of our real phylogenies while evaluating different fractions of perfectly correlated points. Following this procedure, we produced four separate sets of simulated correlations, one each for 60%, 50%, 40%, and 30% randomized points, giving expected correlations of approximately 0.4, 0.5, 0.6, and 0.7, respectively. These fractions of randomized points were chosen specifically to match the spread of correlations among our real phylogenies.

III. RESULTS

Our full dataset consists of 2,721 phylogenies that were ranked by clade and age. The phylogenies are almost entirely Phanerozoic. Only three phylogenies included Proterozoic taxa; all three were Porifera phylogenies that spanned the Precambrian to the Cenozoic. Slightly more than half of the phylogenies had their first appearance in the Mesozoic (1,401, 51%). Only 132 phylogenies (4.9%) span the entire Phanerozoic, from the Paleozoic to the Cenozoic (Table 1).

For convenience in future studies, we assigned one of 38 higher-order taxonomic “groups” to every phylogeny, based on classifications from Graeme Lloyd’s website. The traditional Linnean-type rank of these groups varied from mammal orders to entire phyla (e.g.
echinoderms or hemichordates). The five largest groups were the dinosaurs (655 phylogenies, 24.0%), the non-dinosaurian archosauromorphs (351, 12.9%), non-mammalian synapsids (164, 6.0%), Mesozoic mammals (158, 5.8%), and cetaceans (148, 5.4%). Collectively, these five groups account for 1,476 phylogenies (54.2%). The mean number of phylogenies per group is 71.6, while the median is 30.5. The number of phylogenies for each group is shown in Fig. 2 and listed in Table 2.

We selected phylogenies for our study so that each had a unique taxon sample. For papers where multiple tree topologies were published for a single set of taxa, we calculated clade ranks and correlations for all trees, then retained only the tree with the best correlation in our dataset. Although this strategy reduced the “absolute” redundancy of phylogenies using the exact same taxon sample, our dataset nevertheless contains a great deal of “relative” redundancy. For example,

Stratigraphically        Stratigraphically Highest Taxon
Lowest Taxon             Cenozoic        Mesozoic        Paleozoic
Cenozoic                 543 (20.0%)
Mesozoic                 710 (26.1%)     691 (25.4%)
Paleozoic                132 (4.9%)      435 (16.0%)     207 (7.6%)

Table 1. Breakdown of Phanerozoic phylogenies by stratigraphic coverage.

MCGUIRE et al. Testing the order of the fossil record 2023 ICC 480