(Duplodnaviria) All viruses of this realm share homologous MCPs (HK97-fold), large and small terminase subunits, prohead maturation proteases and portal proteins, indicating that their morphogenetic modules are monophyletic.
Euryarchaeota and Thaumarchaeota.
Furthermore, the observation that many bacterial members of the Duplodnaviria encode archaeal-like genome replication modules, which are not homologous to the bacterial functional counterparts, also argues in favour of the origin of this virus group antedating the archaeal–bacterial divide. The second realm of dsDNA viruses, Varidnaviria, is represented in prokaryotes by four families of bacterial viruses (Tectiviridae, Corticoviridae, Autolykiviridae and Finnlakeviridae), one family of archaeal viruses (Turriviridae) and the family Sphaerolipoviridae, in which different genera include viruses infecting either bacteria or archaea. However, mining metagenomic data for homologues of the DJR MCP using sensitive computational methods resulted in the discovery of a vast diversity of previously unknown viruses of this realm that, in all likelihood, infect prokaryotes. Actual host assignments are still pending, but some of these virus genomes were found in geothermal habitats, strongly suggesting archaeal hosts.
proviruses encoding DJR MCPs, which has substantially expanded the reach of Varidnaviria in both prokaryotic domains. Phylogenetic analysis of the concatenated DJR MCP and genome packaging ATPases of archaeal varidnaviruses suggested coevolution of this group of viruses with the major archaeal lineages rather than recent horizontal transfer from bacteria. Thus, most likely, the LUCA virome also included multiple groups of dsDNA viruses with vertical (both single and double) jelly-roll MCPs. Furthermore, reconstruction of DJR MCP evolution sheds light on the pre-LUCA stages of virus evolution. Among the ssDNA viruses (realm Monodnaviria), only members of a single order, Tubulavirales (until recently known as the family Inoviridae), consisting of filamentous or rod-shaped viruses, appear to be hosted by both bacteria and archaea.
However, whereas tubulaviruses are ubiquitous in bacteria, their association with archaea was inferred from putative proviruses present in several archaeal lineages, namely methanogens and aenigmarchaea. Such distribution has been judged best compatible with horizontal virus transfer from bacteria to archaea. Given their ubiquity in bacteria, the origin of filamentous bacteriophages concomitantly or soon after the emergence of the last bacterial common ancestor (LBCA) appears likely, whereas their presence in LUCA cannot be ruled out either. Similarly, microviruses with icosahedral capsids and circular ssDNA genomes are nearly ubiquitous in the environment and are genetically highly diverse. Although for the vast majority of these viruses the hosts are unknown, the few known isolates infect broadly diverse bacteria from five different phyla. It is thus likely that microviruses have a long-standing evolutionary history in bacteria, which probably dates back at least to the LBCA.
In the extant biosphere, RNA viruses dominate the eukaryotic virome but are rare in bacteria (compared with DNA viruses) and unknown in archaea. Bacterial RNA viruses are represented by two families, the positive-sense RNA Leviviridae and dsRNA Cystoviridae. The host range of experimentally identified members of both families is limited to a narrow range of bacteria (almost exclusively Proteobacteria). However, recent metagenomics efforts have drastically expanded the known diversity of leviviruses, indicating that their share in the prokaryotic virome had been substantially under-appreciated. Reverse-transcribing viruses are conspicuously confined to eukaryotes although prokaryotes carry a substantial diversity of non-packaging (that is, non-viral) retroelements, for example, group II introns. The extant distribution of the viruses of the realm Riboviria, with its drastic display of eukaryotic over prokaryotic host ranges, might appear paradoxical given the broadly accepted RNA world concept of the origin of life, implying the early origin of RdRP and reverse transcriptase (RT) and, as a consequence, the primordial status of RNA viruses. The origin of leviviruses within bacteria is best compatible with their currently characterized distribution and is a distinct possibility.
Furthermore, unlike the LUCA, for which most evolutionary reconstructions suggest a mesophilic or a moderately thermophilic lifestyle, the last common ancestors of bacteria and archaea are inferred to have been thermophiles or hyperthermophiles. Extremely high temperatures might be restrictive for the propagation of RNA viruses and thus could represent a bottleneck associated with the demise of the ancestral RNA virome (and potentially explain why RNA viruses are unknown in archaea). Thus, of the realm Riboviria, positive-sense RNA viruses are a putative component of the LUCA virome. The ancestral status of many archaea-specific virus groups is difficult to ascertain. However, some monophyletic virus assemblages, such as those with spindle-shaped virions, infect hosts from all major archaeal lineages and thus can be traced to the last archaeal common ancestor. Therefore, their presence in the LUCA virome, with subsequent loss in the bacterial lineage, cannot be ruled out either.
Virus evolution before the LUCA
The reconstruction of the evolutionary paths from ancestral host proteins to viral capsids sheds light on the early stages of evolution of both realms of dsDNA viruses. The DJR MCP of the Varidnaviria appears to be a unique virus feature, with no potential cellular ancestors detected. By contrast, the SJR MCP of numerous RNA viruses, which was also acquired by ssDNA viruses through recombination, can be traced to ancestral cellular carbohydrate-binding proteins, with several probable points of entry into the virus world. Thus, the DJR MCP in all likelihood evolved from the SJR MCP early in the evolution of viruses. Remarkably, apparent evolutionary intermediates are detectable in two virus families. Viruses in the family Sphaerolipoviridae encode two ‘vertically’ oriented SJR MCPs that are likely to represent the ancestral duplication preceding the fusion that gave rise to the DJR MCP. The recently discovered archaeal dsDNA viruses in the family Portogloboviridae contain one SJR MCP and thus appear to represent an even earlier evolutionary intermediate. Indeed, structural comparisons of the SJR MCPs from RNA and DNA viruses show that the portoglobovirus MCP is most closely related to the MCPs of sphaerolipoviruses. Combined with the inferred presence in the LUCA virome of multiple groups of Varidnaviria, the discovery of the intermediate MCP forms in capsids of extant viruses implies extensive evolution of varidnaviruses predating the LUCA. The families Portogloboviridae and Sphaerolipoviridae appear to be relics of the pre-LUCA evolution of varidnaviruses and, accordingly, must have been part of the LUCA virome. For the members of the second realm of dsDNA viruses, Duplodnaviria, no cellular ancestor was detected in the dedicated comparative analyses of the sequences and structures of virion proteins.
However, a recent structural comparison has shown that the main scaffold of the HK97-like MCP belongs to the strand-helix-strand-strand (SHS2) fold (with the insertion of an additional, uncharacterized domain of the DUF1884 (PF08967) family) and appears to be specifically related to the dodecin family of the SHS2-fold proteins. Dodecins are widespread proteins in bacteria and archaea that form dodecameric compartments involved in flavin sequestration and storage and are thus plausible ancestors for the HK97-fold MCP. Although, in this case, there are no detectable evolutionary intermediates among viruses, the inferred presence of multiple groups of duplodnaviruses in the LUCA virome implies that the recruitment of dodecin and the insertion of DUF1884 are ancient events. Consistently, viruses with short tails (podovirus morphology), long non-contractile tails (siphovirus morphology) and long contractile tails (myovirus morphology) are all found in both bacteria and archaea, indicating that the morphogenetic toolkit of viruses with HK97-fold MCPs attained considerable versatility in the pre-LUCA era.
Virus replication modules
Each virus genome includes two major functional modules, one for virion formation (morphogenetic module) and one for genome replication. The two modules rarely display congruent histories over long evolutionary spans and are instead exchanged horizontally between different groups of viruses through recombination, continuously producing new virus lineages. The morphogenetic modules including the vertical jelly-roll and HK97-fold MCPs can be traced to the LUCA virome. One of the most widespread replication modules in the virosphere is the rolling circle replication endonuclease (RCRE) of the HUH superfamily. Homologous RCREs are encoded by viruses with SJR and DJR MCPs, HK97-like MCPs and morphologically diverse ssDNA viruses and are also found in many families of bacterial and archaeal plasmids and transposons. Thus, RCRE can be confidently assigned to the LUCA virome or mobilome (that is, all the MGEs of the LUCA). Protein-primed family B DNA polymerases (pPolBs) represent another replication module with a broad distribution spanning several families of viruses and non-viral MGEs. pPolB is present in bacteria-infecting members of the realms Duplodnaviria (phi29-like podoviruses) and Varidnaviria (Tectiviridae, Autolykiviridae and diverse varidnavirus genomes identified in metagenomic data) as well as in several families of archaeal viruses (Halspiviridae, Thaspiviridae, Ovaliviridae and Pleolipoviridae). In phylogenetic analyses, pPolBs split into two separate clades corresponding to bacterial and archaeal viruses, strongly suggesting that they have coevolved with bacterial and archaeal lineages ever since their divergence from the LUCA. Two other key replication proteins that are among the most common in bacterial and archaeal viruses and MGEs are primases of the archaeo-eukaryotic primase (AEP) superfamily and superfamily 3 helicases (S3H).
Whereas S3H are exclusive to viruses and MGEs, the viral AEP form specific families that are not closely related to the cellular homologues. Notably, bacteria do not employ AEP for primer synthesis, and thus bacterial viruses could not have recruited this protein from their hosts. Thus, AEP and S3H, along with RCRE and pPolB, appear to represent major components of the replication modules of the LUCA virome. More generally, contemporary duplodnaviruses display a remarkable diversity of genome replication modules, from minimalist initiators that recruit cellular DNA replisomes for viral genome replication to near-complete virus-encoded DNA replication machineries. In many cases, these DNA replication proteins do not have close cellular homologues, suggesting a long evolutionary history within the virus world. Notably, some of the phage proteins, such as helicase loaders, have replaced their cellular counterparts at the onset of certain bacterial lineages for the replication of cellular chromosomes. Although some tailed bacterial dsDNA viruses encode replication factors of apparent bacterial origin, in archaeal duplodnaviruses, the proteins involved in informational processes, including components of the genome replication machinery, DNA repair and RNA metabolism, are of archaeal type, with none of the known archaeal viruses encoding components of the bacterial-type replication machinery. Finally, tailed archaeal viruses carry archaeal or eukaryotic-like promoters, consistent with the fact that none of the known archaeal viruses encode RNA polymerases, further pointing to long-term coevolution with the hosts. These considerations argue against (recent) horizontal transfers of duplodnaviruses between bacteria and archaea accounting for the observed distribution of these viruses, even though some such transfers might have occurred. 
Thus, analyses of duplodnavirus and varidnavirus genome replication modules complement those of the morphogenetic modules and suggest extensive divergence of both groups of viruses in the pre-LUCA era.
Conclusions
The informal reconstructions attempted here suggest a remarkably diverse, complex LUCA virome. This ancestral virome was likely dominated by dsDNA viruses from the realms Duplodnaviria and Varidnaviria. In addition, two groups of ssDNA viruses (realm Monodnaviria), namely Microviridae and Tubulavirales, can be traced to the LBCA, whereas spindle-shaped viruses most likely infected the last archaeal common ancestor. The possibility that these virus groups were present in the LUCA virome but were subsequently lost in one of the two primary domains cannot be dismissed. The point of origin of the extant bacterial positive-sense RNA viruses (realm Riboviria) remains uncertain, with both bacterial and primordial origins remaining viable scenarios. Further virus prospecting efforts could shed light on the history of these viruses. Although the inferred LUCA virome in all likelihood did not include members of many extant groups of viruses of prokaryotes, its apparent complexity seems to exceed the typical complexity of well-characterized viromes of bacterial or archaeal species. These observations imply that the LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defence systems.
According to the ‘chimeric’ scenario of virus origins, different groups of viruses evolved through recruitment of cellular proteins as virion components. Here, we present evidence that (contingent on our mapping of both duplodnaviruses and varidnaviruses to the LUCA virome) several such events occurred in the earliest phase of the evolution of life, from the primordial pool of replicators to the LUCA. Moreover, virus evolution during that early era went through multiple, distinct stages as demonstrated by the reconstructed histories of the capsid proteins of the two realms of dsDNA viruses. The cellular SJR-containing carbohydrate-binding or nucleoplasmin-like proteins (the ancestors of the varidnavirus DJR MCPs) and the dodecins (the ancestors of the duplodnavirus MCPs) belong to expansive protein families that have already undergone substantial diversifying evolution prior to the origins of the two realms of viruses. The respective protein families do not belong to the universal core of cellular life, so their apparent pre-LUCA diversification further emphasizes the substantial pangenomic, organizational and functional complexity of the LUCA. This conclusion is indeed compatible with the previous inferences on the LUCA made from the analysis of coalescence in different families of ancient genes, namely that a common ancestor containing all the genes shared by the three domains of life has never existed. Straightforward thinking on the LUCA virome might have envisaged it as a domain of RNA viruses descending from the primordial RNA world. However, the reconstructions suggest otherwise, indicating that the LUCA was similar to the extant prokaryotes with respect to the repertoire of viruses it hosted. These findings do not defy the RNA world scenario but mesh well with the conclusion that DNA viruses have evolved and diversified extensively already in the pre-LUCA era.
The RNA viruses, after all, might have been the first to emerge but, by the time the LUCA lived, they had already been largely supplanted by the more efficient DNA virosphere. 8
Aude Bernheim (2019): For a microorganism to be protected against a wide variety of viruses, it should encode a broad defense arsenal that can overcome the multiple types of viruses that can infect it. Owing to the selective advantage that defense systems provide, they are frequently gained by bacteria and archaea through horizontal gene transfer (HGT). Faced with viruses that encode counter-defense mechanisms, bacteria and archaea cannot rely on a single defense system and thus need to present several lines of defense as a bet-hedging strategy of survival. Given their selective advantage in the arms race against viruses, one might expect that defense systems, once acquired (either through direct evolution or via HGT), would accumulate in prokaryotic genomes and be selected for. Surprisingly, this is not the case as defense systems are known to be frequently lost from microbial genomes over short evolutionary time scales, suggesting that they can impose selective disadvantages in the absence of infection pressure. Competition studies between strains encoding defense systems, such as CRISPR–Cas or Lit Abi, and cognate defense-lacking strains have demonstrated the existence of a fitness cost in the absence of phage infection. Access to a diverse set of defense mechanisms is essential in order to combat the enormous genetic and functional diversity of viruses. None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. 
When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population. 5
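The "immune potential" idea in the passage above can be sketched as a set union. This is only an illustration: the strain names and the assignment of systems to strains are hypothetical, and restriction-modification is added as a third example system that the quoted text does not name.

```python
# Hypothetical strains and the defense systems each one encodes (illustrative only).
strains = {
    "strain_1": {"CRISPR-Cas"},
    "strain_2": {"Lit Abi"},
    "strain_3": {"restriction-modification", "CRISPR-Cas"},
}

def pan_immune_repertoire(population):
    """Union of all defense systems found anywhere in the population."""
    return set().union(*population.values())

def survivors(population, required_system):
    """Names of strains that already encode the system needed against a virus."""
    return sorted(
        name for name, systems in population.items() if required_system in systems
    )

repertoire = pan_immune_repertoire(strains)

# No single strain carries the whole repertoire, yet every system is present
# somewhere in the population and could in principle be acquired by HGT.
nobody_has_everything = all(systems != repertoire for systems in strains.values())
```

Calling `survivors(strains, "Lit Abi")` returns only the strains that would withstand a virus countered by that system; in the quoted scenario these are the members that carry the population forward.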
Felix Broecker (2019): Cellular organisms have co-evolved with various mobile genetic elements (MGEs), including transposable elements (TEs), retroelements, and viruses, many of which can integrate into the host DNA. MGEs constitute ∼50% of mammalian genomes, >70% of some plant genomes, and up to 30% of bacterial genomes. The recruitment of transposable elements (TEs), viral sequences, and other MGEs for antiviral defense mechanisms has been a major driving force in the evolution of cellular life. 6
Muller's Ratchet: Another hurdle in the hypothetical origin of life scenarios
E. V. Koonin (2017): Both the emergence of parasites in simple replicator systems and their persistence in evolving life forms are inevitable because the putative parasite-free states are evolutionarily unstable. 3
E. V. Koonin (2016): In the absence of recombination, finite populations are subject to irreversible deterioration through the accumulation of deleterious mutations, a process known as Muller’s ratchet, that eventually leads to the collapse of a population via mutational meltdown. 2
Dana K Howe (2008): The theory of Muller's Ratchet predicts that small asexual populations are doomed to accumulate ever-increasing deleterious mutation loads as a consequence of the magnified power of genetic drift and mutation that accompanies small population size. Evolutionary theory predicts that mutational decay is inevitable for small asexual populations, provided deleterious mutation rates are high enough. Such populations are expected to experience the effects of Muller's Ratchet where the most-fit class of individuals is lost at some rate due to chance alone, leaving the second-best class to ultimately suffer the same fate, and so on, leading to a gradual decline in mean fitness. The mutational meltdown theory built upon Muller's Ratchet to predict a synergism between mutation and genetic drift in promoting the extinction of small asexual populations that are at the end of a long genomic decay process. Since deleterious mutations are harmful by definition, accumulation of them would result in loss of individuals and a smaller population size. Small populations are more susceptible to the ratchet effect and more deleterious mutations would be fixed as a result of genetic drift. This creates a positive feedback loop that accelerates the extinction of small asexual populations. This phenomenon has been called mutational meltdown. From the onset, there would have had to be a population of diversified microbes, not just the population of one progenitor, but various populations with different genetic make-ups, internally compartmentalized, able to perform Horizontal Gene Transfer and recombination. Unless these preconditions were met, the population would die. 1
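The ratchet described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model taken from any of the cited papers: the function name, population size, selection coefficient `s`, and mutation probability `u` are all hypothetical, each offspring gains at most one new deleterious mutation per generation, and there is no back mutation and no recombination.

```python
import random

def simulate_ratchet(pop_size=50, generations=200, s=0.02, u=0.3, seed=1):
    """Return the smallest mutation load in the population after each generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each individual
    min_load_history = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to fitness (1 - s) ** load.
        weights = [(1.0 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Mutation: each offspring gains one new deleterious mutation with
        # probability u. Loads never shrink (asexual, no back mutation).
        loads = [k + (1 if rng.random() < u else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history

history = simulate_ratchet()
# Every increase in the minimum load marks the loss of the fittest class.
is_one_way = all(a <= b for a, b in zip(history, history[1:]))
```

Because offspring can only inherit or add mutations, the minimum load in `history` can rise but never fall, which is the one-way "click" of the ratchet. Letting offspring combine genomes from two parents would be one way to extend the sketch so that a lost least-loaded class can be rebuilt.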
A plurality of ancestors
The origin of life did not coincide with the organismal LUCA; rather, a profound gap in time, biological evolution, geochemical change, and surviving evidence separates the two. After life emerged from prebiotic processes, diversification ensued and the initial self-replicating and evolving living systems occupied a wide range of available ecological niches. From this time until the existence of the organismal LUCA, living systems, lineages and communities would have come and gone, evolving via the same processes that are at work today, including speciation, extinction, and gene transfer. 4
Eugene V. Koonin (2020): The LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defense systems. 8
Horizontal gene transfer and the origin of life
Gregory P Fournier (2015): The genomic history of prokaryotic organismal lineages is marked by extensive horizontal gene transfer (HGT) between groups of organisms at all taxonomic levels. These HGT events have played an essential role in the origin and distribution of biological innovations. Analyses of ancient gene families show that HGT existed in the distant past, even at the time of the organismal last universal common ancestor (LUCA). Mobile genetic elements, including transposons, plasmids, bacteriophage, and self-splicing molecular parasites, have played a crucial role in facilitating the movement of genetic material between organisms. Ancient HGT during Hadean/Archaean times is more difficult to study than more recent transfers, although it has been proposed that its role was even more pronounced during earlier times in life’s history.
Aude Bernheim (2019): None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population 5
Eugene V. Koonin (2014): Recombinases derived from unrelated mobile genetic elements have essential roles in both prokaryotic and vertebrate adaptive immune systems. 7
From the onset, there would have had to be a population of diversified microbes: not just the population of one progenitor species, but various populations with different genetic make-ups, able to perform horizontal gene transfer (HGT) and recombination. There also had to be transposons, plasmids, viruses and viral sequences, and other mobile genetic elements and parasites. Unless these preconditions were met, the population would go extinct.
Gene regulation
The regulation of genes is essential and occurs in all life forms. Genes have to be expressed at the right time, and the cell's machinery has to locate them quickly and precisely. Genes are often described as analogous to blueprints. A better comparison might be books in a library: each book contains the instructions to make a specific molecular machine or to operate the cell, and the gene regulatory network corresponds to the library software that finds books on the shelf. The regulatory circuitry controls how the cell operates and how it responds and adapts to the surrounding environmental conditions. It activates transcription and represses it when needed, and it is responsible for forming the phenotypes best adapted to those conditions. It controls DNA replication, the partition of nascent chromosomes into daughter cells, and the repair of DNA, among other essential tasks. Since these functions are indispensable, they had to be fully operational when life started.
1. Dana K. Howe: Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution, 2008
2. Eugene V. Koonin: Inevitability of Genetic Parasites, 26 September 2016
3. Eugene V. Koonin: Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states, 4 December 2017
4. Gregory P. Fournier: Ancient horizontal gene transfer and the last common ancestors, 22 April 2015
5. Aude Bernheim: The pan-immune system of bacteria: antiviral defence as a community resource, 6 November 2019
6. Felix Broecker: Evolution of Immune Systems From Viruses and Transposable Elements, 29 January 2019
7. Eugene V. Koonin: Evolution of adaptive immunity from transposable elements combined with innate immune systems, December 2014
8. Eugene V. Koonin: The LUCA and its complex virome, 14 July 2020