Maximum Genetic Diversity (MGD)
A scientific hypothesis explaining why mutation rates are not consistent across species, or regions of the genome.
Charting MGD's effect over time: The top chart illustrates the hypothesis that macroevolutionarily, neutral non-coding regions of the genome become functionally coding as time passes. The bottom chart illustrates how MGD is hypothesized to be reached microevolutionarily once the entire neutral region of a population's genome has had time to experience and preserve mutations.
Pardon the formatting in the document below, as the links to the WayBack Machine demonstrate - this was originally on Wikipedia circa 2019 after I worked with the brilliant Dr. Shi Huang to get it written up as well as I could, being that I haven’t formally studied biology since AP Bio over 20 years ago. And this page stayed up, until my dad and I started trying to get our peer reviewed look at COVID-19’s likely laboratory origins linked onto Wikipedia, since passing peer review is inarguably their gold standard. Then, in true CCP censorship fashion, they went back and saw I had originally written and created it - so they deleted it. Well motherfuckers, the internet don’t forget.
Maximum Genetic Diversity (MGD) is a scientific hypothesis relating to molecular evolution, which is the study of how and why populations of organisms experience genetic changes over time.
MGD starts with the observation that some regions of the genome are more likely to preserve mutations into the next generation than others. This difference in the observed rate of mutation means some regions of the genome appear to mutate faster than others, and is theorized to relate to balancing the preservation of vital information relating to a species' function against its ability to mutate and adapt to new environmental niches. According to MGD, these regions of the genome eventually drift into two rough categories: faster-mutating sections tuned to respond quickly to environmental pressures and allow adaptive radiation, as well as slower-mutating sections involved in an organism's most fundamental instructions.
Because MGD asserts that only slow-mutating genes accurately reflect shared evolutionary history, relationships between species can alternatively be calculated by their "maximum genetic diversity," which is determined by measuring the frequency of mutations in specific corresponding regions of orthologous genes instead of using raw overall genetic similarity.
Using calculations based on mutations in these slow-mutating genes provides a chart of genetic ancestry that lines up with the fossil record - measurements based on raw genetic similarity yield results that clash with the fossil record. Also due to this grouping into fast and slow, MGD hypothesizes that over time complex organisms become genetically fragile and less tolerant to mutation as their MGD decreases, since an increasing proportion of their genome will have become slow-mutating over time.
MGD asserts that this is because increased organismal and social complexity means more of the genome is needed to preserve the expanding instructional manual necessary for complex behavior and function, and so more of an organism's genome must become slow-mutating as the organism increases in complexity, since being slow-mutating preserves and protects those vital instructions.
MGD seeks to reconcile the inconsistencies observed around the neutral theory of molecular evolution, whose "original lines of evidence... are now falsified" according to a paper published in Oxford's Molecular Biology and Evolution in 2018. One example of this is that supposedly consistent and neutral mutation rates from proteins across a wide range of species were demonstrably neither neutral nor consistent. Another study published in Nature in December 2019 noted that "defining the evolutionary time scales according to the molecular clock is intrinsically biased, especially for proteins of complex organisms." Although a number of other arguments have been proposed against the neutral theory in recent years, there is not yet a consensus that the neutral theory is entirely falsified, and counter-arguments against the role of selection do exist.
Furthermore, beyond the fact that MGD is still relatively unknown, it also contradicts the current paradigm in molecular evolution, since the neutral theory's fundamental premises are still nearly ubiquitously utilized in genetic analysis and admixture studies. Additionally, some of the phenomena explained by MGD could theoretically be accounted for by other processes such as gene conversion or concerted evolution. Lastly, even if the neutral theory is disproved, it does not necessarily validate MGD, as alternative theories have been proposed that also incorporate the effects of selection on the genome.
And so MGD will have to be more rigorously tested against any alternative theories before becoming widely adopted. However, to date MGD has not been contradicted in peer-reviewed literature, and its assumptions and framework have been confirmed by several lines of work: an examination of the ratio of brain-specific proteins in a range of mammals; the classification and timing of the evolutionary genetic structure of organisms ranging from yeast to primates; an evaluation of genetic fitness in yeast, which become more genetically fragile as they become more fit; a genetic model that seeks to more accurately capture not only the location of mutations but the rate at which they occur; and the observation that vital slow-mutating genes are more protected by "transcriptional scanning" in mammalian testes than fast-evolving genes involved with responding quickly to environmental challenges.
Under MGD, modern evolutionary theory becomes an interplay between short-term microevolution which follows the neutral theory's expectation of random but predictable rate of linear change, and longer-term macroevolution that cannot be timed with the same clock as microevolution since diversification can flow in punctuated fits and starts over long periods of time when a complex species disperses into an array of diverse environmental niches.
As this occurs, MGD anticipates that each population will protect from mutation the slow-mutating section of its genome, which holds its most fundamental instructions, but quickly preserve mutations at sites that provide greater environmental fitness depending on the pressures of each unique niche. Support for this supposition is provided by a genetic model that seeks to resolve the inconsistencies between the way regions of proteins seem to mutate in unison and the speed at which that happens. Using MGD's modeling to create a more precise model of evolutionary change, it observed that mutations seem to occur in "avalanches" that are confined not only to specific regions of the genome, as commonly assumed, but also to short periods of time. Further support is provided by the fact that in mammals, highly diverse regions of the genome involved with responding to environmental pressures are less preserved by enzymatic processes in the testes than more vital regions, and so those more vital regions appear to evolve more slowly than regions that need to change quickly to respond to environmental challenges.
The fact that some genomic regions preserve mutations at different rates than others can be demonstrated when any three species separated by significant evolutionary time are compared two at a time: each pair can have aligned overlapping genomic positions in orthologous proteins where mutations get preserved at a far higher rate than the neutral theory's random drift statistically allows.
Additionally, these overlapping regions not only have higher mutation rates, they are also less likely to be involved with an organism's fundamental instructions within MGD's framework. This is supported by the observation that the higher the percentage of active-coding exons a species has, the lower its percentage of overlapping sites: species with more of their genome designated for active-coding exons - stretches of nucleotides that are used by RNA to create proteins - have a lower proportion of high-mutation overlapping sites, and vice versa.
Since an organism's fundamental genomic structure can be traced by the maximum possible diversity of observed mutations in orthologous genes – the MGD – shared with a given sister species, measuring evolutionary and genomic distance is not done universally under this theory: instead the unit of measurement for each pairing is set by the MGD of an orthologous gene shared by the species. The simpler of any two organisms then sets the MGD, which will always be independent from both mutation rates and time.
Calculations derived from the neutral theory result in the false conclusion that fish are equidistant to every single evolutionarily ascending species: the same distance from Homo sapiens as from snake, ox, rabbit, rat, pig, tiger, and boar. Calculations using MGD yield results that capture the branching and punctuated nature of speciation, preserve its gradual increasing fractal complexity over time, and are consistent with patterns of speciation deduced from the fossil record.
Since the 1960s when the term was popularized by Richard Lewontin, the amount of genetic diversity in populations was accepted to tick steadily at a rate timed by a molecular clock set by the mutation of a hemoglobin protein in most vertebrates, which was first calculated by Emanuel Margoliash. This conclusion, that genetic diversity would accumulate within a population indefinitely over time, was reached because it was assumed that every population's genome would continually accumulate mutations as time passed - and so the more mutations that were observed the more basal and older a population was assumed to be since there was thought to be no upper limit as to how many mutations could accumulate.
And so timing this presumably stable and universal rate of mutation and hence diversity using the molecular clock was first theorized by Motoo Kimura, but popularized by Émile Zuckerkandl and Linus Pauling, and assumed to regulate all genetic variation both within and between species. Subsequently, the neutral theory and molecular clock were used in a variety of settings, most notably in phylogenetics, or the study of how different species change and pass on traits over time. Many measurements that are nearly ubiquitous in population genetics, such as the fixation index, are also based on the molecular clock.
However, since its inception there have been points raised against the neutral theory and its molecular clock's fundamental assumptions, such as evidence that they may be affected by natural selection. Despite this, the molecular clock was assumed to regulate all orthologous genes inherited from a common ancestor, and used to set the historic rate of speciation across the animal kingdom, as well as answer questions around the evolutionary and genetic relationships between species.
Despite its widespread use, the molecular clock can encounter a ten-fold rate of error depending on whether dates are being assessed within or before the past two million years, and a twenty-fold rate of error within the same time-frame depending on exactly which way the molecular clock is applied. Additionally, a contemporary review of four different papers meant to address these and other issues with the neutral theory's molecular clock noted that none of the possible ways explored to fix them appeared tenable.
Additionally, it has been argued that there is no independent evidence to support the molecular clock's premise that all species have similar mutation rates, and the neutral theory fails to note and explain the common occurrence of overlapping mutations: where mutations in independently evolving species occur at orthologous overlapping protein positions at a rate too high to be neutral. MGD provides a framework for the genetic equidistance phenomenon, and it appears to align findings derived from predictable genomic patterns that have been observed in simple organisms like yeast all the way up to the most complex, Homo sapiens.
Contrary to Emanuel Margoliash's original assumption that genetic distance could be universally determined by the rate of mutation in a blood protein, and then calculated for all life on earth by time alone - meaning that all mutations on earth were set by that protein and had a biologically-universal rate that is constant and steady - the genetic equidistance phenomenon could also be explained by MGD's assumption that mutation-rates are specific to each gene and might vary across species and within populations.
In 2008, cancer researcher Shi Huang, then a faculty member at the Sanford Burnham Prebys Medical Research Institute in La Jolla, California, and since 2009 at Central South University in China, independently discovered the genetic equidistance phenomenon and first published a preprint describing the Maximum Genetic Diversity theory, which was published as a peer-reviewed book chapter later that year.
In the following years, there has been an increasing amount of evidence giving validity to MGD, and it has been used to answer questions about genetic diversity, the ratios of brain-specific genes for rodents and primates, and the phylogenetic relationships between a variety of different species - from primitive worms to fruit bugs, snails and crayfish, all the way up to primates. MGD has also been used in biomedical contexts like cancer research. The reception of the theory in the fields of genetics and evolutionary biology has not been evaluated so far, though the theory has been explored in a number of publications. In 2016, MGD provided the framework for the "Increasing Functional Variance" (IFV) hypothesis, which posits that MGD is correct but incomplete in that the genome needs to be further sorted into more categories beyond only fast and slow, and that increasing species complexity might not decrease as a rule due to the increased number of categories.
Although in 2010, MGD was described in the Handbook of Developmental Science, Behavior, and Genetics as resolving "all the paradoxes in molecular evolution," additional avenues of exploration remain open since MGD was not applied to original research until 2016, and has been utilized by just a handful of outside studies and papers since then.
Mutating, Fast and Slow
Because simpler organisms are less likely to be affected at all by any one single-base mutation in their exons, or functionally-active coding stretches of their genome, MGD considers them to be more genetically robust than more complex organisms whose genomes are less tolerant to mutation and so are thought to be more fragile. MGD theorizes that in complex organisms that depend on myriad interconnected networks of proteins and regulation, there is far less margin for error since the odds that a substitution will create an erroneous and detrimental base-change in crucial stretches of the genome increase as more DNA sequences and their subsequent proteins are needed in additional fine-tuned cell-types and epigenetic functions.
Like interspersed series of genetic capstones, these stretches of code become relatively less tolerant to mutation and will appear to be mutating more slowly when compared to faster-mutating less-fundamental regions of the genome according to MGD. One example of this is the observation that mtDNA is not selectively neutral - meaning that certain versions of its alleles are much more beneficial than others - since its average diversity holds across all animal phyla, capturing mtDNA's comparatively slower rate of mutation and intrinsic functional importance. And support for the assertion that genomic fragility comes hand-in-hand with evolutionary fitness was confirmed by a study on the effects hundreds of different mutations had on fitness in yeast, which found that "more-fit strains are less robust."
Comparing paper and real airplanes further demonstrates the effect of differing mutation rates: many different materials and fold-geometries can create a paper airplane that flies for 10 feet. Compare that to a plane that can take off and land under its own power carrying hundreds of souls, which has several orders of magnitude more complexity and much less tolerance for error since it performs far more complex functions than simply gliding after being pushed. While the materials do not need to be perfect to accomplish the first goal, small inconsistencies in the materials or their configuration can easily lead to failure for the second – there is far more margin for error building an 8"x11" paper airplane than a Boeing 747. Changing the 747's paint color or the material covering its seats could have nearly infinite permutations, but the tolerances for its guidance system - depending on an extraordinarily delicate interplay between impossibly delicate hardware and thousands of lines of coded software - are almost nonexistent.
MGD asserts that as organismal and social complexity grows, the need to preserve the fundamental structure of an organism – the most basal directions involved in the most basic development and functions which mutate more slowly – becomes balanced against the need to be able to adapt to increasing numbers of environmental pressures and challenges - done by faster-mutating regions that respond adaptively to environmental pressures. MGD posits that the variability in the rate of change causes evolutionary selective pressures to sort alleles into two rough groups: slow-mutating ones involved with an organism's most basic structure and function, and fast-mutating ones that respond quickly in order to increase the odds a beneficial mutation occurs and is preserved.
As time passes and a species increases in complexity, MGD theorizes that a greater proportion of its genome becomes slow-mutating as a greater amount of information becomes needed to preserve a more complex organism's fundamental development and behavior. And with a larger proportion of its genome dedicated to intrinsic instructions, MGD now considers the species more genetically fragile.
Intertwined evolutionary effects
Under MGD, as organisms increase in complexity, population-wide genetic diversity is regulated by the need to maintain a harmonious balance between those two broad categories: fast-mutating alleles that adapt quickly to the pressure of a given environment, and slow-mutating ones that preserve the most fundamental and basal instructions for the organism. Maintaining this balance means that simpler organisms will have a higher percentage of their genome able to tolerate mutational change, since simple means less-complex biological and epigenetic processes that are more tolerant of change than those of more complex and genomically delicate organisms according to MGD. As organismal complexity increases, the margin for genomic error narrows and tolerance for new mutations shrinks, since within MGD's framework higher-order life means more complex cellular mechanisms and more fragile biological processes. This balance can also be seen when a cross-species comparison is made between the substitution rates of brain-specific genes and more widely expressed ones: the more complex an organism is, the more delicate and less tolerant to mutation its complex brain-specific genes will be, and the more brain-specific genes it will have compared to more widely expressed ones.
In more general terms: a jellyfish would have some chance of reproducing if it was born with 10 stingers instead of 11, while a mammal would have none with a missing limb. And more specifically: cockroaches can survive and function for weeks without their heads, whereas a human is at risk of debilitating injury and even death if they trip and hit their head.
MGD suggests that maximum population-wide genetic diversity can increase up to a point that is set by the physiological and epigenetic complexity of the organism and its environmental interactions, but past that point fitness is decreased because the level of mutation becomes maladaptive by deleteriously altering an organism's fundamental instructions. Having less than the maximum and ideal level of genetic diversity means poor adaptive capacity to respond to changing environmental pressures under MGD, and having more than the ideal maximum means damage to the basic physiology of the organism because its most basal instructions become damaged.
MGD theorizes that a population will have the most evolutionary success when its diversity level is properly tuned to its environment, and when the levels of its slow- and fast-mutating alleles are optimally balanced.
The slow clock
MGD calculates the time and amount of genetic divergence between species by first randomly picking a statistically significant set of orthologous genes shared between any three macroevolutionarily distant species. Genes are sorted as either fast or slow after the orthologous genes of the two more closely related species are aligned alongside those of the third, less-related species. An amino-acid position overlaps when the compared species share a mutation at that position; if no positions overlap, the gene is assigned a score of 0. For genes with any overlapping amino-acid positions at all, the higher the count of overlapping positions, the faster-mutating the gene is considered to be. Since there is no hard ratio necessary to measure MGD, the genes are then sorted into roughly half slow-mutating and half fast-mutating.
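The sorting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published MGD pipeline: the function names count_overlaps and split_slow_fast are hypothetical, the sequences are assumed to be pre-aligned to equal length, and a position is assumed to count as overlapping when all three aligned residues differ from one another. The toy sequences are made up for the example.

```python
def count_overlaps(seq_a, seq_b, seq_c):
    """Count aligned positions where all three residues differ.

    Assumption: this "all three differ" rule stands in for the
    article's notion of an overlapping position.
    """
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    overlaps = 0
    for a, b, c in zip(seq_a, seq_b, seq_c):
        if "-" in (a, b, c):  # skip alignment gaps
            continue
        if a != b and a != c and b != c:
            overlaps += 1
    return overlaps


def split_slow_fast(alignments):
    """Rank genes by overlap count and split them roughly in half.

    alignments maps a gene name to a tuple (seq_a, seq_b, seq_c).
    Returns (slow_genes, fast_genes, scores).
    """
    scores = {gene: count_overlaps(*seqs) for gene, seqs in alignments.items()}
    # Lowest overlap count first; gene name breaks ties deterministically.
    ranked = sorted(scores, key=lambda gene: (scores[gene], gene))
    half = len(ranked) // 2
    return ranked[:half], ranked[half:], scores


# Made-up toy alignments, for illustration only.
toy = {
    "gene1": ("MKTA", "MKTA", "MKTA"),
    "gene2": ("MKTA", "MRTA", "MQTA"),
    "gene3": ("ACDE", "GCHE", "TCKE"),
    "gene4": ("MKTA", "MKTA", "MRTA"),
}
slow, fast, scores = split_slow_fast(toy)
print(slow, fast, scores)
```

Ties are broken by gene name only to keep the toy output stable, and with an odd number of genes the extra gene lands in the fast group, which is one arbitrary way of reading "roughly half."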
Divergence time is then calculated by first using the accepted date from the fossil record as a multiplier in an equation that divides the distances of any slow-mutating gene shared by the species against each other. Using MGD accurately matches genetic dates of divergence with those established by the fossil record for comparisons such as humans and octopuses versus cockles, as well as snakes, humans, and birds - covering a range of tens of millions of years. No matter what species was compared with humans - ranging from yeast to louse to orangutan - the more complex species always had a greater sequence similarity to humans in slow-evolving genes, as predicted by MGD.
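One way to read the divergence-time calculation is as a simple scaling: a fossil-calibrated date multiplied by the ratio of two slow-gene distances. The sketch below assumes that reading. The names p_distance and divergence_time are hypothetical, distance is assumed to be the plain fraction of differing aligned positions, and the sequences and the 100-million-year calibration date are invented toy values, not data from any study.

```python
def p_distance(seq_x, seq_y):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_x, seq_y) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_time(fossil_date_mya, calibration_distance, query_distance):
    """Scale a fossil calibration date by the ratio of two slow-gene distances.

    Assumption: time = fossil date * (query distance / calibration distance).
    """
    if calibration_distance <= 0:
        raise ValueError("calibration distance must be positive")
    return fossil_date_mya * query_distance / calibration_distance


# Toy slow-gene sequences (invented): a calibration pair with a
# hypothetical fossil date of 100 million years, and a query pair.
calibration = p_distance("MKTAYLQE", "MRTSYIQD")  # 4 of 8 positions differ
query = p_distance("MKTAYLQE", "MKTAYIQE")        # 1 of 8 positions differ
estimate_mya = divergence_time(100.0, calibration, query)
print(estimate_mya)
```

The calibration pair carries the fossil date, so every other estimate is only as good as that date and the choice of slow gene.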
In any group of species A, B, and C - with A the most complex and C the least - just because A and B might appear closer to each other as far as their entire genomes are considered, that does not necessarily mean they can be grouped together, and C considered an outgroup. To chart their genetic relationships, only the distances between slow-evolving genes can be used under MGD. Only when both A and B are the same distance away from C measured by their slow-evolving genes can they be considered a separate clade.
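The clade rule above amounts to an equidistance check on slow-evolving genes. The sketch below is a minimal illustration with assumptions of its own: the function name forms_clade is hypothetical, per-gene distances are assumed to be already computed, and the tolerance parameter is an invented allowance for measurement noise, since the text itself only says "the same distance." The numbers are toy values.

```python
from statistics import fmean


def forms_clade(slow_gene_distances, tolerance=0.01):
    """Return True when A and B are about equally distant from C.

    slow_gene_distances is a list of (d_ac, d_bc) tuples, one per
    slow-evolving gene. The tolerance is an illustrative assumption.
    """
    if not slow_gene_distances:
        raise ValueError("at least one slow-evolving gene is required")
    mean_ac = fmean(d_ac for d_ac, _ in slow_gene_distances)
    mean_bc = fmean(d_bc for _, d_bc in slow_gene_distances)
    return abs(mean_ac - mean_bc) <= tolerance


# Toy distances (invented): in the first set A and B sit about equally
# far from C; in the second, B is clearly farther from C than A is.
equidistant = [(0.30, 0.31), (0.22, 0.21)]
unequal = [(0.30, 0.45), (0.22, 0.40)]
print(forms_clade(equidistant), forms_clade(unequal))
```

Fast-evolving genes are deliberately left out of the input, mirroring the text's point that whole-genome similarity alone does not settle the grouping.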
The "slow clock" of MGD is based on reports that using the molecular clock derived from the neutral theory to time species divergence can be off by up to a factor of twenty, depending on whether a single unchanging species is being used or an inter-species comparison is being made. MGD holds that the molecular clock can still be used to accurately measure genetic diversity in relatively short time scales among similar species, while its accuracy fades when it is applied to windows over hundreds of thousands of years, and when applied to species with diverse phenotypic expression – demonstrated by the fact that results derived from the molecular clock do not always seem to align with the fossil record.
MGD's slow clock also provides parsimony for estimating phylogenetic events such as the marsupials' split from the rest of mammalia about 150 million years ago, the biological explosion at the K-T boundary approximately 66 million years ago, as well as the divergence of the genus Homo from Pan several million years ago.
Overlapping origin stories
Starting from the fact that some regions of the genome preserve mutations faster than others, MGD builds an explanatory framework for the fact that population-wide genetic diversity does not always increase from small to large without an upper limit, something that must always happen under the neutral theory and its infinite site models, which posit that a population will continue to accumulate mutations at a steady rate indefinitely.
MGD posits that when two populations have different genetic diversity levels, it does not necessarily mean that the population with lower genetic diversity is descended from the one with higher genetic diversity as implied by the neutral theory. Under the neutral theory's molecular clock, the most basal or older populations will always have the highest rate of diversity because existing first means more mutations would have had time to accumulate in their genome. However, under MGD, higher overall genomic diversity may simply be due to having more fast-mutating alleles needed to deal with a wider array of environmental challenges, but since genetic distance can only be measured by slow-mutating genes, raw overall diversity rates alone should not be used to derive genetic relationships under MGD since slow-mutating genes may make up a minority of the genome.
The fact that most broad phenotypic traits are regulated by multiple loci is also incompatible with the neutral theory, since it would be statistically unlikely for enough linkage disequilibrium to form across the genome if mutations were occurring randomly. MGD accounts for this, since phenotypically-linked fast-mutating SNPs are recognized to respond to selective pressures more rapidly than the slow-mutating more basal SNPs. MGD also explains why raw genetic diversity does not flow temporally from basal to more modern as a concrete rule.
MGD may be related to existing genealogical phenomena such as the unexpectedly high rate of de novo genes, and X:A ratios, which can vary widely within populations of a species since some populations require more autosomal diversity to cope with environmental stressors, while sex-chromosome diversity remains fixed. MGD contradicts the fixation index, which assumes the neutral theory applies across the entire genome and only considers fast-mutating autosomal DNA in population genetics analyses. The fixation index subsequently contradicts the data available from cladological analysis of mtDNA and Y-DNA trees which have far hardier well-preserved roots that are less likely to preserve a mutation than flighty autosomal DNA, which incorporates environmentally-linked beneficial mutations quickly and readily.
Once again, bravo! A very fun interesting read.
Um yes, it seems kind of silly that scientists think that this would be true for all species, just from observing nature: "the molecular clock's premise that all species have similar mutation rates"
"Contrary to Emanuel Margoliash's original assumption that genetic distance could be universally determined by the rate of mutation in a blood protein, and then calculated for all life on earth by time alone - meaning that ALL MUTATIONS ON EARTH WERE SET BY THAT PROTEIN AND had a biologically-universal rate that is constant and steady - the genetic equidistance phenomenon could also be explained by MGD's assumption that mutation-rates are specific to each gene and might vary across species and within populations."
What if they haven't found all species on Earth yet, what if one species doesn't contain that blood protein yet is "alive?" What if they don't really understand or calculate time accurately yet?
Also, the MGD theory seems to better reflect the infinite complexity that is life on Earth, and in our universe. To simply lump it all together in a universal rate doesn't seem to reflect reality.
I think of a Star Trek episode in which there were two crews on the same ship, but they experienced time differently so one crew was in a hyper speed dimension in comparison to the other crew, yet both experienced time in their dimension as natural. So what is "normal" for one species with respect to time and space might be entirely different for another species. It seems more logical that their genetic evolutions would be different. They are often under vastly different time/space/environmental pressures, some of which we cannot accurately measure yet, like dark matter and energy and even gravity. Thank you for these articles, they are immensely thought-provoking!