Evolution is a powerful thing. In the span of generations it turns scuttling reptiles into towering sauropods and soaring birds, and it has made and unmade more living things than humanity will ever know. Understanding the relationships between the lineages of living things is one of the grander ways in which humans understand our place in the infinite assemblage of life, and it also tells us an enormous amount about how everything is related to everything else. For the right ultra-specific kind of nerd, it’s also barrels of fun. Fortunately, we have just such a nerd in attendance.
So I made cladograms for all my pets and plants.
Ancestors and Descendants
Cladograms differ from other classification diagrams in that they explicitly present and are organized in terms of evolutionary relationships. On a cladogram, organisms are placed based on their ancestry, not how different they appear to be from each other. Groups, in turn, are defined as a specific common ancestor and all of its descendants, again regardless of apparent differences. This contrasts cladograms with older classification systems that rely specifically on grouping things based on their appearance. In older diagrams, “Amphibians,” “Reptiles,” “Mammals,” and “Birds” coexisted as the four kinds of four-legged animals, with no deeper insight presented into how they might be related to each other. Each group was recognizable and that was enough. The primary value proposition of a cladistic system is to illuminate and demonstrate those relationships, and they are often not what people expect.
One of the most common surprises that cladograms reveal is that some highly recognizable, highly specialized groups of animals, such as whales, snakes, and birds, are relatively recent representatives of older lineages they no longer strongly resemble. In the older model, they were so distinctive that they seemed to warrant top-level billing as whole separate categories, but cladistic investigation shows that, while they are indeed definable groups in their own right, they are groups within other groups and those other groups cannot be fully understood without their inclusion. As a result, when cladistic definitions are in play, snakes are lizards, whales are even-toed ungulates, and birds are the last living dinosaurs, among other curiosities. For more on how cladistics works, have a look at this.
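For readers who think in code, the definition of a group as "a common ancestor and all of its descendants" can be sketched as a small check. This is only an illustration: the tree below is heavily simplified (turtles and many other lineages are left out), and the names CHILDREN, tips_under, and is_clade are made up for this sketch.

```python
# Simplified, illustrative tree: each named group maps to its immediate branches.
# Many lineages (turtles, for one) are omitted to keep the sketch small.
CHILDREN = {
    "Amniota": ["mammals", "Sauropsida"],
    "Sauropsida": ["Lepidosauria", "Archosauria"],
    "Lepidosauria": ["tuatara", "lizards and snakes"],
    "Archosauria": ["crocodilians", "birds"],
}


def tips_under(node):
    """Return the set of tip labels descended from node (a tip returns itself)."""
    if node not in CHILDREN:
        return {node}
    tips = set()
    for child in CHILDREN[node]:
        tips |= tips_under(child)
    return tips


def all_nodes():
    """Collect every label in the tree, named groups and tips alike."""
    nodes = set(CHILDREN)
    for branches in CHILDREN.values():
        nodes.update(branches)
    return nodes


def is_clade(group):
    """True if some node's complete set of descendants equals the group."""
    target = set(group)
    return any(tips_under(node) == target for node in all_nodes())


# Crocodilians plus birds is everything under Archosauria, so it is a clade.
print(is_clade({"crocodilians", "birds"}))
# "Reptiles" without birds leaves out some descendants, so it is not.
print(is_clade({"tuatara", "lizards and snakes", "crocodilians"}))
```

Adding birds back to the second group makes the check pass, which is the cladistic point in miniature: the older "reptile" category only becomes a proper group once birds are included.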
For pet owners whose animal friends are common domestic animals, a cladogram holds little appeal. Such people usually keep few species, so the relationships between them might be interesting but are not complex enough to make the effort of diagramming them feel quite as momentous as mine did. That is the delight of being an aquarist, and one that reptile and arthropod collectors can also understand. It is routine for aquarists to keep numerous species of fish in their aquaria, often from many parts of the world, and the diversity of shapes and behaviors they provide is part of their charm. In turn, because bony fish are by far the most diverse lineage of vertebrates, containing more species than all the other lineages combined, their evolutionary relationships can be much more involved than those of mammals or birds.
Houseplant collectors, likewise, often have many species in their homes and, like aquarists, often choose their green charges for a diversity of appearances. Plant evolutionary relationships are often even more involved than animal relationships thanks to the sheer abundance of plant species, the complexity of plant genetics, and the flexibility of plants’ general body plan.
And I am not only an aquarist, not only a houseplant collector, but also a planted aquarist, specializing in lavish aquatic gardens that often have more plant than fish species in them. I also have cats, so, I’m all of the above.
Seriously, Though, Why?
But also, using the concrete example of the organisms in my care, rather than the somewhat more abstract-seeming diversity of all of the planet’s life, places practical bounds on what would otherwise be a nigh-infinite project. We are not seeking to enumerate every connection between every living species on this teeming planet, but merely between the tens of species in my collection. This is a manageable task that nevertheless illuminates something about life on Earth, and that makes it worth doing.
Next Question: How?
There is a lot of software for making cladograms, and a shocking amount of it is free. Scientists often want to help and be helped by other scientists, and free open-source software is a way to enable numerous researchers, each with a little relevant programming skill, to continuously improve a specialty program such as this. Programs such as PhyML are a testament to their efforts. They are also difficult to use for this kind of project.
Most cladogram software works by comparing specific gene sequences fed into it and building cladograms for those genes. This is immediately useful for, say, researchers tracking the divergence of genes after gene duplication events (vital for understanding certain medical conditions and embryological milestones), but it is also potentially useful for situations like mine. I know my way around GenBank and the other public repositories of sequenced genes, and I used one of these programs (based on PhyML) to create a cladogram as part of my doctoral thesis, but here, it did not work out quite so easily.
These gene-based cladogram programs can only work with the data they are given, and that means that they are only as good as the available gene sequences. For a cladogram like this, it is necessary to pick a gene whose sequence is available for all species, and one whose evolution is conservative enough that it mostly tracks the time since their divergence from one another rather than recent selective pressure. Common choices for this sort of thing are genes involved in protein synthesis or mitochondrial function, which govern the most basic workings of metabolism. The need to avoid genes that geologically recent habitat or lifestyle changes might have altered limits the options that can be considered, and the need to find one gene whose versions in all species to be included are available in public repositories makes finding a good candidate gene exponentially more challenging with each additional species. In some cases, a species might be obscure enough to researchers to be missing data altogether. Atop these challenges, modern published cladograms that purport to relate entire species, not just specific genes, are generally the result of even more complex algorithms that synthesize the relationships suggested by multiple genes, and these programs are not as easily accessible. The Tree of Life Project provides a program like this as a web service, but charges for its use for groups of 10 or more species.
As is often the case for problems I am solving, the solution that worked best for me was a combination of online reading, Microsoft Excel, and graph paper.
Wikipedia, the Tree of Life Project, and various other sources provide extensive information on the current classification consensus, especially when combined. Using this information without specialized cladogram software costs one access to things like maximum-likelihood values that suggest how much confidence a researcher should have in each part of the cladogram, but I did not need those for what I was doing. The consensus information itself could be enough, especially with me rarely keeping large numbers of closely related organisms whose specific relationships might be difficult to suss out from such public information alone.
Information in hand, the next step was organizing it. This is where Microsoft Excel came in handy. With major clades generally having names, I could write out those names in a sequence of columns with a row for each species, adding extra columns as needed, and then use the alphabetization function to find species that shared higher-order groups. This sorted what was previously 19 animals and 30 plants by their consensus relationships, doing most of the work of organizing the cladogram automatically.
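The same column-and-sort trick can be sketched outside of Excel. The snippet below is a rough Python equivalent, not the spreadsheet itself: the five rows and their simplified lineages are illustrative stand-ins, and sort_by_lineage and shared_depth are hypothetical helper names.

```python
# One row per species: (name, clade names from broadest to narrowest).
# The lineages are simplified stand-ins, not the full spreadsheet.
ROWS = [
    ("snail", ["Animalia", "Mollusca"]),
    ("frog", ["Animalia", "Chordata", "Tetrapoda", "Amphibia"]),
    ("cat", ["Animalia", "Chordata", "Tetrapoda", "Amniota", "Mammalia"]),
    ("Amano shrimp", ["Animalia", "Arthropoda"]),
    ("turtle", ["Animalia", "Chordata", "Tetrapoda", "Amniota", "Reptilia"]),
]


def sort_by_lineage(rows):
    """Sort rows column by column, like alphabetizing on each clade column in turn."""
    return sorted(rows, key=lambda row: row[1])


def shared_depth(lineage_a, lineage_b):
    """Count how many leading clade names two lineages have in common."""
    depth = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        depth += 1
    return depth


ordered = sort_by_lineage(ROWS)
for (name, lineage), (next_name, next_lineage) in zip(ordered, ordered[1:]):
    # Neighbours that share more leading columns sit in a more recent common group.
    print(name, "|", next_name, "| shared clades:", shared_depth(lineage, next_lineage))
```

Because rows that share their leading columns end up adjacent after sorting, the species that belong together on the diagram fall next to each other without any extra effort.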
Then came turning data into art. I translated my Excel work into crude sketches on scrap paper to help make the mental switch and further organize my process, but the real work involved graph paper and math. Graph paper is useful for this kind of diagram because it enables visually accurate distances in two dimensions and reasonably sharp corners. I chose a minimum node length of four squares and aligned all species in a single column on the right for visual clarity. Four squares was also the maximum I could fit on a single sheet of graph paper on its short axis given that my nodes were seven layers deep. With that, I started sketching.
Something I figured out early was that I had to work from the most recent groups (pairs of relatively closely aligned species) toward the deeper, older links, not the reverse. The style of diagram depends on similar spacing throughout to maintain aesthetic cohesion (rather than, as more scientific versions do, deciding on spacing based on measures of genetic relatedness or palaeontological distance), so to get it right I had to work from the evolutionarily newest to the oldest splits. That meant marking related pairs first, then their connections to adjacent species or pairs, and then the next layer older, onward through all seven layers that applied. I performed this separately for animals and plants, since all 49 species would not fit on a single page. As a bonus, that makes it trivial to upload the two sections as separate images and avoid issues with file sizes.
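The newest-to-oldest approach can be sketched in code as well. This is a loose approximation of the hand-drawn process rather than a record of it: it assumes every tip sits in one right-hand column, every junction sits four squares to the left of its nearest branch, and each junction is centered vertically between its branches. The centering rule, the row spacing, the toy five-species tree, and the name layout are all assumptions made for illustration.

```python
def layout(tree, tip_x, step=4, row_gap=2):
    """Place tips first, then each junction to the left of the branches it joins.

    tree is a nested tuple whose leaves are species names (strings).
    Returns a list of (label, x, y) in graph-paper squares.
    """
    positions = []
    next_row = 0

    def place(node):
        nonlocal next_row
        if isinstance(node, str):
            # Tips share one column on the right and take the next free row.
            y = next_row * row_gap
            next_row += 1
            positions.append((node, tip_x, y))
            return tip_x, y
        # Branches are placed before the junction that joins them,
        # so the newest splits are always settled before older ones.
        points = [place(child) for child in node]
        x = min(px for px, _ in points) - step
        y = sum(py for _, py in points) / len(points)
        positions.append(("(junction)", x, y))
        return x, y

    place(tree)
    return positions


# Toy tree: ((cat, turtle), frog) on one side, (Amano shrimp, snail) on the other.
toy_tree = ((("cat", "turtle"), "frog"), ("Amano shrimp", "snail"))

# Seven layers of four squares each would need 28 squares of width.
for label, x, y in layout(toy_tree, tip_x=28):
    print(label, x, y)
```

Working in this order means no junction is drawn until everything it connects already has a position, which is why starting from the oldest split does not work with uniform spacing.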
Once I had a good sketch on graph paper for each cladogram, I scanned them, added text labels, and overlaid each hand-drawn line with an AutoShape line in PowerPoint. This added boldness and clarity to all the lines, and importantly, also effectively digitized both cladograms. In this form, they could be altered without me going back to graph paper and scanning. Which was good, because I would indeed need to alter them: my inventory changed over the course of this exercise, adding some species and removing others. I don’t think I got the spacing quite right on the additions, but I’m satisfied.
The 19 species of animals on my animal cladogram were about what I expected, but showing them like this led to some amusing surprises. My collection is mostly bony fish, so the relationships outside of the bony fish (tetrapods and invertebrates) all look much closer than they are. This is an artifact of the decisions I made while crafting this cladogram, which prioritized aesthetics over making the inter-node distances reflect evolutionary distance. This proved particularly funny for the invertebrates, where extremely distantly related arthropods (Amano shrimp) and mollusks (three species of snails) look as close to each other as some fish lineages are to each other. With a diverse enough assemblage of creatures, the difference evens out, but not here. It was similarly charming to have my cats nestled next to my turtle and to the aquatic frogs in my paludarium as the whole of the tetrapods.
Most aquarium fish fall into one of two deep bony-fish branches, the Ostariophysi and the Percomorpha, and those two branches are well represented here. Ostariophysi is almost exclusively freshwater and its constituent groups are very well known to tropical aquarists and anglers: cypriniforms (including carps, barbs, loaches, and suckers), characins (including tetras and piranhas), catfish, and knifefish (including electric eels). I have members of three of these lineages in my collection. What may surprise people familiar with the shapes of these fish is that the catfish are closer to the characins than they are to cyprinids, given that cyprinids also famously have fleshy barbels on their faces. The details here are dense biochemical and anatomical tidbits that will be left out for brevity, but indeed, the cypriniforms are not catfishes’ closest relative.
Plants were a much more involved story. Not only did I turn out to have far more species of plants on hand than animals, but plant relationships are also in an intense state of flux at present, being reevaluated extensively as new genetic information comes to light and is processed. A lot of old ideas about how plants are related to each other are being reconsidered now, and the field is constantly updating. The result was a cladogram with some familiar ideas, some expected oddities, and some dramatic surprises.
One reality I expected that this cladogram bears out is that the habitat and form of plants are both mostly irrelevant when it comes to their evolutionary connections. Aquatic plants are the majority of my collection and are found all over the cladogram, though they often cluster together in smaller branches. Similarly, whether a plant is a low rosette, a climbing vine, a woody shrub, an aquatic stem, or some other general shape is almost completely invisible on this cladogram. The Asparagales branch is an especially apt illustration: the tall, woody Dracaena has as its closest relatives in my collection the low succulent Haworthiopsis and Phalaenopsis, an epiphytic orchid.
There were two main surprises. The first was that a group with five representatives in my collection, the Lamiales, is in such flux at present that the relationships within it were not resolved in any of the sources I checked, leading to a five-way unresolved junction I have marked with a dashed line. The second was how large a fraction of my collection, in and out of water, belongs to the single plant family Araceae, the arums. With my collection being as eclectic as it is, having one in six of them turn out to be arums was unexpected, especially with several of these not strongly resembling one another: Cryptocoryne has long grass-like leaves, Epipremnum grows as long trailing vines, and Zamioculcas is a woody shrub resembling a cycad. The arum family is, for the record, most famous for the calla lily, titan corpse flower, and jack-in-the-pulpit, all characterized by an inflorescence that takes the form of a flower-covered spike with a sheath around it.
People with a little knowledge about plant classification might be startled to see the water lily Nymphaea off to one side and the main split in my flowering plants being between monocots and eudicots. Older texts divide flowering plants into monocots and dicots based on a variety of characteristics, including leaf venation, number of embryonic leaves (cotyledons, providing the names), and characteristic growth patterns, with the monocots presumed to be a primitive branch. In the 2000s, DNA evidence revealed that most “dicots” were actually more closely related to monocots than they were to a small set of other dicot branches, including Amborella, magnolias and their kin, and water lilies like Nymphaea. Monocots’ distinctive anatomy is a specialized characteristic, not a primitive one.
Life Evolves Onward
I fully anticipate updating these cladograms as my inventory changes. I have one more addition to make to my inventory of animals, as well as any changes that might unfold in the future as fish age, perish, and make way for new acquisitions. Much more likely are changes to my plant collection; even now, I am testing some additions to my paludarium’s land area that might add some species. This is enough fun for me that, like my computer and electrical diagrams, keeping them updated is its own reward. With luck, these diagrams have also been interesting for you, dear readers.