ElShamah - Reason & Science: Defending ID and the Christian Worldview

Welcome to my library—a curated collection of research and original arguments exploring why I believe Christianity, creationism, and Intelligent Design offer the most compelling explanations for our origins. Otangelo Grasso



Perguntas (Questions) ....


101 Re: Perguntas .... Wed Sep 22, 2021 6:39 pm

Otangelo


Admin

Was it a boat on Mount Ararat or not? There are food remains; it is a boat with pitch on the walls.
It's just evidence of an ancient boat.

1. How could such a structure have been constructed at 13,800 feet in the permanent snow cap? I have snapshots of each wood plank with different cuts. At the site that Philip Williams visited alone, I counted over 150 planks of different sizes and cuts. Panda Lee reported one plank, found on his 2008 expedition, to be 20 meters long. The planks are dispersed and buried in ice deep below the surface. In the access tunnels and wood chambers, there are massive stairs made of wood trunks, each weighing over 200 lbs, buried in ice. There are huge wood walls impregnated with pitch and curved like a boat hull. How was all this constructed up there?

2. How could it have been buried so deep, under 20-30 feet of frozen volcanic rock and ice? Sub-freezing temperatures make it virtually impossible to construct a large ship under those conditions.

3. How could it have been made, considering that its location is so unstable and dangerous? One site leans precariously on a ledge, another on the side of a glacier slowly moving down the mountain. Large rocks regularly tumble down the mountain, burying the structure and threatening the lives of workers.

4. It is a huge structure. It is in at least two pieces which together appear to be about the size of Noah’s Ark (450 feet long). How could it have been made, dragging that much timber that high, and fabricating and assembling all the intricate wooden joints? Is that not too much for this height and temperature?

5. It is too complicated. It has a bowed hull, three decks, numerous deep square wooden joints for square wooden nails, and tongue-and-groove joined boards with evidence of handcraft. Would it not be too intricate and complex to construct under such difficult conditions?

6. It contains pottery, food remains, skeletons of animals, and various artifacts of ancient age.

7. If it is of recent construction, why is there surface patina on it which does not exist on recently fabricated boards, and there is no known way to fabricate it?

8. Noah and his family worked on level ground for perhaps 120 years. How could someone build this big ship on a high mountain under these seemingly impossible conditions in the few weeks per year after meltwater stops flowing and before winter snow prevents access to the sites?

Kangaroos: why is there no evidence of them travelling?

The flood in archaeology and geology: no evidence of a simultaneous flood

Noah's flood was worldwide:
There are many extra-biblical evidences that point to a worldwide catastrophe such as a global flood. There are vast fossil graveyards found on every continent and large amounts of coal deposits that would require the rapid covering of vast quantities of vegetation. Oceanic fossils are found upon mountain tops around the world. Cultures in all parts of the world have some form of flood legend. All of these facts and many others are evidence of a global flood.
https://www.gotquestions.org/global-flood.html

Five civilizations at the flood event

400 years were time enough for the Tower of Babel

Archaeology does not support that hypothesis either

Strata are dated at various digs

Historians find evidence of

excavation of the city of Jericho

Civilization vanished and then appeared again

The Egyptians were some of the best historians of the world. Old Kingdom age. Age of the great pyramid builders. How were the pyramids built? Population problem?

there should be no humans when the pyramids were built

south America, new and old DNA populations







https://reasonandscience.catsboard.com

102Perguntas .... - Page 5 Empty Re: Perguntas .... Tue Oct 12, 2021 7:38 am

Otangelo


Admin

Figure 1 (image Rsos210664f01)

H-element of Titanokorys gainesi gen. et sp. nov., paratype ROMIP 65168. (a) Part; (b) counterpart; (c) close-up of ornamentation, photographed under low-angle light; (d), (e) close-ups of posterolateral margins. Ap, anterolateral processes; Bp, bilobate axial posterior region; He, H-element; Lp, posterolateral processes; Mn, medial notch; On, ocular notch; Ri, ridges associated with a reticulated pattern; Sa, sagittal spine; Sl1,2, terminal (Sl1) and medial (Sl2) spines of posterolateral processes; Tu, tubercles; Vm, the ventrolateral margin of H-element. Scale bars: (a,b) = 20 mm; (c,d) = 5 mm.

Figure 2 (image Rsos210664f02)

Assemblage of Titanokorys gainesi gen. et sp. nov., paratype ROMIP 65741. (a) Overview of slab, showing close association of H-element and partial appendage with an assemblage of Cambroraster falcatus consisting of an H-element and pair of appendages; (b) detail of T. gainesi; (c) close-up of endites from frontal appendage; (d) close-up of endites from frontal appendage of C. falcatus; (e) close-up of anterior margin of H-element, showing ornamentation. Ap, anterolateral processes; EnX, endite no. X; He, H-element; He-C, H-element of C. falcatus; Fa, frontal appendage; Fa-C, frontal appendage of C. falcatus; Ri, ridges associated with a reticulated pattern; Sa, sagittal spine; Se, secondary spines on endites; Sp, spiniform distal endites; Ts, Terminal spine; Tu, tubercles. Scale bars, (a,b) = 20 mm; (c–e) = 5 mm.

Figure 3 (image Rsos210664f03)

Assemblage of Titanokorys gainesi gen. et sp. nov., holotype ROMIP 65415. (a) Overview of slab, with boxed regions indicating close-ups in other panels; note associated agnostids (Peronopsis cf. columbiensis) possibly feeding on the remains or encrusting biofilms [27]; (b,c) original obliquely preserved H-element, with arrows showing the direction of deformation and dashes indicating sagittal axis of symmetry (b) and hypothetical undeformed version (c) using distort mode in Adobe Photoshop version 21.2.2 (based on the length-width proportions of ROMIP 65168). (d) Close-up of P-element spine; (e) close-up of P-element showing ridges; (f) close-up of bands of gill lamellae; (g,h) appendages and oral cone photographed using different low-angle light orientations to emphasize different details; (i,j) overall view (i) and close-up (j) of the frontal appendage of Cambroraster falcatus, ROMIP 65084, showing comparatively shorter spiniform distal endites and shorter secondary spines on more proximal endites. (k) Line drawing of appendages and oral cone of T. gainesi (from g,h); (l–n) close-ups of frontal appendages using different low-angle light orientations (l, close-up of g; m, close-up of h). Bu, burrow; Gb, gill blade; Ig, individual gill filament; In, indeterminate; Oc, oral cone; Pc, Peronopsis cf. columbiensis; Pd, peduncle (podomere 1); Pe, P-element; PoX, podomere no. X; Ps, P-element spine; other abbreviations see figures 1 and 2. Scale bars: (a–c) = 50 mm; (e,g–i,k–n) = 10 mm; (d,f,j) = 5 mm.

Figure 4 (image Rsos210664f04)

H-elements of Titanokorys gainesi gen. et sp. nov. showing ornamentation. (a,b) Paratype ROMIP 65749; (a) overview, note associated ptychopariid trilobites; (b) close-up of boxed region from (a); (c,d) paratype ROMIP 65748; (c) overview photographed under low-angle light; (d) close-up of boxed region in (c) showing tuberculate margin. For abbreviations, see figure 1. Scale bars = 10 mm.

Figure 6 (image Rsos210664f06)

Comparative morphology of Pahvantia hastata. (a–c) P. hastata KUMIP 314089; (a) the part showing distal ends of broken endites to the left of the gill blades; (b) part and counterpart superposed to show the nearly complete appendage partly overlying the gill blades, lower inset showing complete counterpart with carapace elements, upper inset showing a close-up of partial appendage and gills on counterpart; (c) counterpart superposed on line drawing of part; (d) appendage of Hurdia for comparison, ROMIP 59259; (e–h) disarticulated Hurdia assemblages, showing groups of connected gill blades associated with other body parts; (e,f) ROMIP 60031; (g,h) ROMIP 60041. Scale bars: (a–c) = 2 mm; (b) upper inset, 5 mm; lower inset, 10 mm; (d–h) = 10 mm. Ds, dorsal spine; Ot, Ottoia prolifica; PEn, peduncular endite; other abbreviations see figures 1 and 3. (a–c) Images courtesy Rudy Lerosey-Aubril.


103 Re: Perguntas .... Thu Nov 11, 2021 2:03 pm

Otangelo


Admin

Environmental adaptive pressures in the ocean remain largely the same. There are varied ecosystems, but all fish have to adapt to live in the ocean, in salty water. Deep-sea creatures living thousands of meters below the ocean surface are exposed to darkness and to the crushing pressure of the weight of the water above them.

Social effects of evolutionary theory

The promotion of materialism
Darwin's theory of evolution paved the way for introducing philosophical naturalism into science. Richard Dawkins wrote: "Although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist."

Before the theory of evolution was popularized in the 19th century, the major figures advancing science were Christians. They are the true fathers of science. Bacon, for example, set out the first conception of the scientific method, firmly based on experimental evidence and inductive reasoning.

Faith in evolution diminishes faith in the Bible, the existence of God, and the Genesis account
Creationist and evolutionist positions are perceived and portrayed by many as mutually exclusive and diametrically opposed. The greatest impact is on the Christian faith, whose very foundation was shaken: the creation of man and the world, the fall of man, and redemption. This includes the rejection of a teleological view of nature, the attack on natural theology, and the depiction of man as merely an advanced ape. Many use evolution to justify denying the existence of a supreme being, an afterlife, and spiritual rewards.

Scientific racism
https://en.wikipedia.org/wiki/Scientific_racism

Perceptions of race and ethnicity.
An evolutionary perspective was used to highlight racial differences and to argue for the inferiority of those who are phenotypically different. Drawing trees of life led Ernst Haeckel to believe in the evolution of a Germanic super-race.
He wrote: “if one must draw a sharp boundary, it has to be drawn between the most highly developed and civilised man on one hand, and the rudest savages on the other, and the latter have to be classed with the animals”.

The historical evidence is overwhelming that human evolution was an integral part of Nazi racial ideology. It held a prominent place in the Nazi school curriculum and in training courses in the Nazi worldview. Nazi officials and SS anthropologists agreed that humans, including the Nordic race, had evolved from primates. They believed that the Nordic race had evolved to a higher level of intelligence, physical prowess, and social solidarity than other races, in large part because they had faced what biologists today would call greater selective pressure.

In Nazi Germany, the Darwinist idea of evolution through struggle was taken up in order to prove that the superior pure races would prevail over the mixed inferior ones. Racial thinking facilitated the rise of political anti-Semitism. Racism held that the Jews were not just a religious community but biologically different from other races.

How can one explain the enthusiastic reception of Blavatsky's ideas by significant numbers of Europeans and Americans from the 1880s onwards? Theosophy offered an appealing mixture of ancient religious ideas and new concepts borrowed from the Darwinian theory of evolution and modern science. This syncretic faith thus possessed the power to comfort certain individuals whose traditional outlook had been upset by the discrediting of orthodox religion.

Human zoos



Social Darwinism
An idea popular in the 19th century that holds that "the survival of the fittest" explains and justifies differences in wealth and success among societies and people.

The value and dignity of human life
Without God, there can be no intrinsic sanctity or inherent value of human life, and there can be no measure to explain why a cockroach is less valuable than a man.

Eugenics
Eugenics claimed that human civilization was subverting natural selection by allowing the "less fit" to survive and "out-breed" the "more fit." Later advocates of this theory would suggest radical and often coercive social measures to attempt to "correct" this imbalance.

In From Darwin to Hitler, Richard Weikart claims that Darwinism's impact on ethics and morality played a key role not only in the rise of eugenics, but also in euthanasia, infanticide, abortion, and racial extermination, all ultimately embraced by the Nazis.

"The cultural consequences of this triumph of materialism were devastating. Materialists denied the existence of objective standards binding on all cultures, claiming that environment dictates our moral beliefs." ... "materialism spawned a virulent strain of utopianism. Thinking they could engineer the perfect society through the application of scientific knowledge, materialist reformers advocated coercive government programs that falsely promised to create heaven on earth."

Lack of purpose in life.
If there is no God, and evolution is true, then life is empty and purposeless, leading to nihilistic pessimism. If we are just animals, there is no purpose and meaning in life. Then pain, death, and suffering are a necessary part of life, essential to furthering the evolution of life on this planet. A loving God has no place in such a scenario. Such a scenario robs us of the sense that there is a “master plan” from God for each one of us.

There can be no fundamental meaning if there is no God who made us for a specific purpose, and if our lives will one day cease to exist. If that is so, then on the day we cease to exist, even IF there is a God, what we did during our lifetime will ultimately cease to have a fundamental meaning. It is just a momentary transition out of oblivion into oblivion.


There can be no objective moral values if evolution is true
1. If there is no God, there are no objective moral values, since they are prescribed "ought to be's".
2. If there is no God, then moral values are just a matter of personal opinion, and as such, not objectively or universally valid at all. According to Naturalism/Materialism, any claims of morality have to be relativistic, utilitarian, and/or cultural in basis but *not* intrinsic or transcendent.
3. If that is the case, unbelievers have no moral standard to judge anything as morally good or bad.
4. Therefore, in order to criticize God, they need to borrow from the theistic worldview, and as such, their criticism is self-contradicting and invalid.
5. Even IF they could make a case to criticize God's choices, that would not refute his existence.


104 Re: Perguntas .... Thu May 19, 2022 5:53 am

Otangelo


Admin

Science has not even been able to explain the natural origin of the four basic building blocks of life. Just as the bricks of a house require precise sizes and dimensions and have to be made in factories, in serial production, usually involving complex manufacturing processes, production lines, compartments, etc., so do the four basic building blocks of life. Carbohydrates, amino acids, nucleotides, and phospholipids in the precise form used in the cell are not simply encountered naturally in the environment. Cells use complex metabolic pathways to make them, using the molecular machines inside the cells, proteins, and furthermore recycle them; those that have done their job are disposed of as trash in a complex machine called the proteasome.

Proteins can be imagined as tiny self-operating, preprogrammed robots that are in many cases lined up in a production-line-like manner, much as in a car factory. The raw materials are imported from the exterior of the cell and enter the metabolic pathway, where one enzyme does the first manufacturing step, handing over the intermediate product to the second, and so forth, going through several steps until the end product is ready to be used wherever it is needed. In some cases, the intermediate product is toxic to the cell, for example by producing reactive oxygen species (ROS). The trick to solving this problem is to employ proteins that have more than one reactive center. So the raw product goes in, the entire process goes through all necessary stages inside the protein, and no intermediate product leaks out to contaminate the cell. Pretty smart, huh?!

One of the essential parts of the cell is the cell membrane. It employs phospholipids, which use so-called fatty acids in their structure. One of these awe-inspiring multistep nanomachines, which performs several synthesis processes to make fatty acids, is called fatty acid synthase. We'll come to talk about it later.

These tiny production lines produce all the cell components. In abiogenesis research, there are two main views about how everything started. One is the metabolism-first scenario, and the other is the replicator-first scenario. In order to perform metabolic reactions, there is always a team at work. It is a joint venture of several players. Just as the entire production line in a car factory comes to a halt if one robot malfunctions or is damaged and stops doing its job, so it is in the cell. These cellular metabolic pathways are composed of interdependent networks. It is an all-or-nothing business. Everything has to work properly, or the cell dies. Obviously, all this high-tech stuff was not extant on early earth, so the question arises: How did it all start? The four basic building blocks had to be made somehow, alternatively, without having the complex manufacturing steps at hand. So, origin of life (OOL) researchers had to find natural, non-enzymatic processes that could account for the first emergence of these building blocks. As it turns out, all the proposals have failed. Some invoke the escape excuse that "science is still working on it". But is it really just a matter of time until a natural explanation is found? Decades upon decades of intensive, multibillion-dollar scientific research all over the globe, and no result yet?

Even if we hypothesize a path of a natural origin of the basic building blocks on the early earth, a very important gap is commonly overlooked, or not mentioned: How do you go from the non-enzymatic, natural emergence of the building blocks, possibly catalyzed by clay and the like, to the synthesis processes in the cell, using information, energy, and molecular machines embedded in complex metabolic pathways, fully protected and encapsulated in a membrane, full of gates, able to keep out the materials that should not go in, and recognizing and importing those necessary for life? How do you keep a homeostatic milieu? How do you get the complex, highly regulated, orchestrated, and controlled replication process, including all the necessary error-check and repair mechanisms, fully in operation? How do you go from a state of affairs of a prebiotic soup, or hydrothermal vents, or even meteorites in space full of all sorts of amino acids, to a fully self-replicating cell?

Many are trying to explain this fact away, claiming that life could have started much simpler. But how much simpler? And what is actually life? What is the threshold and transition point of non-life, to life? Citing Denton, again, he writes:

1. When we see complexification, that is, interconnecting parts where the system is greater than the sum of its parts, then it is logical to attribute such actions to an intelligently acting mind with foresight, foreknowledge, and distant goals.
2. Making systems with the hallmark of complexity depends on the careful elaboration and design in detail of many elementary parts and interconnecting them in a meaningful way, conferring a specific purpose or function. Not rarely, small changes in one part of the system can cause sudden and unexpected outputs in other parts of the system, system-wide reorganization, or breakdown of the higher function.
3. Random accidents are not the best case-adequate explanation for the origin of emerging properties of a complex system. Intelligent design is.






The RNA world
One has to dig deep to get to the bottom of things, and investigate carefully there, to try to unravel and understand what is really going on and what mechanisms are in play, so that this can serve as a basis for drawing inferences that are true to the case. Unless this is done, the risk of coming to false conclusions is considerable and real.

In short: there is no evidence that there were selective forces driving simple molecules like RNA monomers to catenate, polymerize, start to self-replicate, and undergo mutations through natural selection, becoming informative semantophoretic strings bearing the functional equivalent of a computer hard drive.

Astrophysicist Dr. Paul Sutter claimed in a YouTube video:

"It's possible that early life didn't even use proteins or DNA. It's possible that early life only used RNA. This is called the RNA world hypothesis, and it works because RNA is capable of self-replicating. It's capable of catalyzing reactions. But eventually, short RNA strands appear. And then, these short RNA strands start participating in chemical reactions that get ever more complex. And then, slowly over time due to evolutionary pressure, eventually DNA and proteins emerge as more efficient versions of the same basic process." 9

These are the kinds of explanations that pop up in science articles with a certain frequency. The RNA world hypothesis has been popular for almost four decades. No matter how confidently, enthusiastically, and convincingly the claim is made, this is pseudo-scientific gobbledygook. Ann Gauger from the Discovery Institute gave a good characterization of pseudo-science. She wrote:

"When certain biologists discuss the early stages of life there is a tendency to think too vaguely. They see a biological wonder before them and they tell a story about how it might have come to be. They may even draw a picture to explain what they mean. Indeed, the story seems plausible enough, until you zoom in to look at the details. I don’t mean to demean the intelligence of these biologists. It’s just that it appears they haven’t considered things as completely as they should. Like a cartoon drawing, the basic idea is portrayed, but there is nothing but blank space where the profound detail of biological processes should be."

This modus operandi runs through all evolutionary literature, books, and popular newspaper articles. The crux is in the details. Michael J. Behe put it pointedly. He wrote:
In order to say that some function is understood, every relevant step in the process must be elucidated. The relevant steps in biological processes occur ultimately at the molecular level, so a satisfactory explanation of a biological phenomenon such as sight, or digestion, or immunity, must include a molecular explanation. It is no longer sufficient, now that the black box of vision has been opened, for an ‘evolutionary explanation’ of that power to invoke only the anatomical structures of whole eyes, as Darwin did in the 19th century and as most popularizers of evolution continue to do today. Anatomy is, quite simply, irrelevant. So is the fossil record. It does not matter whether or not the fossil record is consistent with evolutionary theory, any more than it mattered in physics that Newton’s theory was consistent with everyday experience. The fossil record has nothing to tell us about, say, whether or how the interactions of 11-cis-retinal with rhodopsin, transducin, and phosphodiesterase could have developed step-by-step. Neither do the patterns of biogeography matter, or of population genetics, or the explanations that evolutionary theory has given for rudimentary organs or species abundance. 5


Paradoxes of life
Even if there were a bunch of primed, selected nucleotides on early earth, the only trajectory would have been destruction, either through UV light, hydrolysis (where the polymer strings break apart in contact with water), or self-decomposition, becoming asphalts. I often cite Steve Benner's brilliant science paper, Paradoxes in the Origin of Life. He writes:

An enormous amount of empirical data have established, as a rule, that organic systems, given energy and left to themselves, devolve to give uselessly complex mixtures, “asphalts”. The literature reports (to our knowledge) ZERO CONFIRMED OBSERVATIONS where evolution emerged spontaneously from a devolving chemical system. It is impossible for any non-living chemical system to escape devolution to enter into the Darwinian world of the “living”. 6


105 Re: Perguntas .... Wed May 25, 2022 3:37 pm

Otangelo


Admin

The author is a Bible-believing Christian and holds the belief that the Genesis 1 account is literally true. No compromise. To the science-oriented reader, that might sound a bit shocking. How can an educated person in the 21st century believe in talking snakes, donkeys, and 2000-year-old fairy tales told by uneducated sheep-herders? Has science not overcome this? Don't we know better today?

Energy (glucose) + matter (elements) + information (stored in DNA) = building blocks of life (amino acids, DNA and RNA, carbohydrates, phospholipids)
Building blocks (amino acids) + information = ATP synthase machines
ATP synthase machines produce ATP energy.
ATP energy + metabolism = hardware of the cell that stores information, DNA and RNA
Information + machines =

In order for life to start, you need energy and information to make the building blocks of life in the right specified complex functional form. So you need 1. energy, 2. matter, and 3. information (data) in the process. These building blocks are used to make machines. Information directs the arrangement of the building blocks in order to make these machines, like ATP synthase, which makes energy in the form of ATP. Energy and other micromachines are required to make the hardware molecule, DNA, that stores the software that instructs how to make the machines that make energy, and the hardware to store information. Metabolic pathways make the building blocks. Energy is consumed in the process. Information is required in the process.

This is a circle that has no beginning, and no end. Either all of this started fully operational and ready, or it would have never started.  

In order to make the complex building blocks of life, the elements that make them must be available in a useful form: carbon, nitrogen, phosphorus, sulfur. They can only be available to the cell if energy cycles are in place. These energy cycles depend on specialized bacteria. So life is necessary to make the elements available that are necessary to have life. Another cycle had to be fully set up to start everything. A stepwise, gradual evolutionary process cannot achieve that state of affairs.

This is a circle that has no beginning, and no end. Either all of this started fully operational and ready, or it would have never started.  










How were the 20 proteinogenic amino acids selected on early earth?
Science is absolutely clueless about how and why specifically this set of amino acids is incorporated into the genetic code to make proteins. Why 20 (in some rare cases, 22), and not more or fewer, considering that many different ones could have been chosen? Arthur Weber and Stanley Miller wrote in their 1981 science paper, Reasons for the Occurrence of the Twenty Coded Protein Amino Acids:

There are only twenty amino acids that are coded for in protein synthesis, along with about 120 that occur by post-translational modifications. Yet there are over 300 naturally-occurring amino acids known, and thousands of amino acids are possible. The question then is - why were these particular 20 amino acids selected during the process that led to the origin of the most primitive organism and during the early stages of Darwinian evolution. Why Are beta, gamma and delta Amino Acids absent? The selection of α-amino acids for protein synthesis and the exclusion of the beta, gamma, and delta amino acids raises two questions. First, why does protein synthesis use only one type of amino acid and not a mixture of various α, β, γ, δ… acids? Second, why were the α-amino acids selected? The present ribosomal peptidyl transferase has specificity for only α-amino acids. Compounds with a more remote amino group reportedly do not function in the peptidyl transferase reaction. The ribosomal peptidyl transferase has a specificity for L-α-amino acids, which may account for the use of a single optical isomer in protein amino acids. The chemical basis for the selection of α-amino acids can be understood by considering the deleterious properties that beta, gamma, and delta amino acids give to peptides or have for protein synthesis. 1

The question is not only why no more or fewer were selected and incorporated into the amino acid "alphabet", but also how they could have been selected from a prebiotic soup, ponds, puddles, or even the Archaean ocean.
The ribosome core that performs the polymerization, or catenation, of amino acids, joining one amino acid monomer to another, is the ribosomal peptidyl transferase center. It only incorporates alpha-amino acids, as Joongoo Lee and colleagues explain in a scientific article from 2020:

Ribosome-mediated polymerization of backbone-extended monomers into polypeptides is challenging due to their poor compatibility with the translation apparatus, which evolved to use α-L-amino acids. Moreover, mechanisms to acylate (or charge) these monomers to transfer RNAs (tRNAs) to make aminoacyl-tRNA substrates is a bottleneck. The shape, physiochemical, and dynamic properties of the ribosome have been evolved to work with canonical α-amino acids 11

There are no physical requirements dictating that the ribosome could not be constructed to incorporate β, γ, δ… amino acids. Indeed, scientists work on polymer engineering, designing ribosomes that use an expanded amino acid alphabet. A 3D printer is fed with specifically designed polyester filaments that it can process, and it prints various objects based on the software information that dictates the product form. If someone tries to use raw materials that are inadequate, the printer will not be able to perform the job it was designed for. The ribosome is a molecular 3D nanoprinter, as Jan Mrazek and colleagues elucidate in a science paper published in 2014:

Structural and functional evidence point to a model of vault assembly whereby the polyribosome acts like a 3D nanoprinter to direct the ordered translation and assembly of the multi-subunit vault homopolymer, a process which we refer to as polyribosome templating. 12 where the reaction center is also specifically adjusted to perform its reaction with the specific set of α-amino acids. 

The materials that the machine is fed with and the machine itself both have to be designed from scratch in order to function properly. Neither can operate without the adequacy of the other. There is a clear interdependence, which indicates that the amino acid alphabet was selected to work with the ribosome as we know it.

From Georgia Tech:
The preference for the incorporation of the biological amino acids over non-biological counterparts also adds to possible explanations for why life selected for just 20 amino acids when 500 occurred naturally on the Hadean Earth.
“Our idea is that life started with the many building blocks that were there and selected a subset of them, but we don’t know how much was selected on the basis of pure chemistry or how many biological processes did the selecting. Looking at this study, it appears today’s biology may reflect these early prebiotic chemical reactions more than we had thought,” said Loren Williams, professor in Georgia Tech’s School of Chemistry and Biochemistry. 4

The authors mention 500 amino acids supposedly extant on the early Earth. They may have taken that number from a scientific article about nonribosomal peptides (NRPs), which reports the same figure of 500. Areski Flissi and colleagues write:

Secondary metabolites (nonribosomal peptides) are produced by bacteria and fungi. In fact, >500 different building blocks, called monomers, are observed in these peptides, such as derivatives of the proteinogenic amino acids, rare amino acids, fatty acids or carbohydrates. In addition, various types of bonds connect their monomers such as disulfide or phenolic bonds. Some monomers can connect with up to five other monomers, making cycles or branches in the structure of the NRPs. 5

Stuart A. Kauffman and colleagues published a paper in 2018, which gives us an entirely different perspective. They wrote on page 22, in the section Discussion:

Using the PubChem dataset and the Murchison meteorite mass spectroscopy data we could reconstruct the time evolution and managed to calculate the time of birth of amino acids, which is about 165 million years after the start of evolution. ( They mean after the Big Bang)  a mere blink of an eye in cosmological terms. All this puts the Miller-Urey experiment in a very different perspective. the results suggest that the main ingredients of life, such as amino acids, nucleotides and other key molecules came into existence very early, about 8-9 billion years before life. 6

Why should the number of possible amino acids on the early Earth be restricted to 500? In fact, as Allison Soult, a chemist from the University of Kentucky, wrote: Any (large) number of amino acids can possibly be imagined. 7 This number is de facto limitless. The universe should theoretically be able to produce an infinite number of different amino acids. The amino acid (AA) R side chains can have any isomer combination. They can be right-handed or left-handed, carry one or two functional groups, have cyclic (cyclobutane, cyclopentane, and cyclohexane) and/or branched structures, be amphoteric, carry different charges, and so on. Furthermore, the carbon atom bonded to a functional group, like carbonyl, is known as the α carbon atom. The second is β, and so on (α, β, γ, δ…), according to the Greek alphabetical order. It is conceivable that the protein alphabet would be made of β peptides. Nothing physically constrains amino acids from having different configurations. In fact, we do know bioactive peptides that use β-amino acids and form polymer sequences. 3 Every synthetic chemist will confirm this. There is also no plausible reason why only hydrogen, carbon, nitrogen, oxygen, and sulfur should be used out of a pool of 118 elements extant in the universe. If the number of possible AA combinations to form a set is limitless, then the chance of randomly selecting a specific set of AAs for specific functions is practically zero. It would never have happened by non-designed means.
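To give a feel for the numbers involved even when the candidate pool is finite, here is a minimal sketch that counts how many distinct 20-member sets can be drawn from a pool of a given size. The pool sizes 500 and 1913 are the figures quoted in this thread (the Georgia Tech estimate and the library size used by Ilardo and colleagues); the function name count_sets is only an illustrative choice, and the calculation assumes that every set is equally likely, which is a simplification.

```python
import math

def count_sets(library_size, set_size=20):
    """Number of distinct unordered sets of `set_size` items from the library."""
    return math.comb(library_size, set_size)

# Pool sizes taken from the figures quoted in the text (illustrative only).
for library_size in (500, 1913):
    n = count_sets(library_size)
    # len(str(n)) - 1 gives the order of magnitude of the exact integer.
    print(f"{library_size} candidates -> roughly 10^{len(str(n)) - 1} possible sets of 20")
```

The count grows very quickly with the size of the pool, which is the point the paragraph above makes in words.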

Optimality of the amino acid set that is used to encode proteins 
In 2011, Gayle K. Philip and Stephen J. Freeland published a scientific paper titled: Did evolution select a nonrandom "alphabet" of amino acids? They wrote in the abstract:

The last universal common ancestor of contemporary biology (LUCA) used a precise set of 20 amino acids as a standard alphabet with which to build genetically encoded protein polymers. Many alternatives were also available, which highlights the question: what factors led biological evolution on our planet to define its standard alphabet? Here, we demonstrate unambiguous support that the standard set of 20 amino acids represents the possible spectra of size, charge, and hydrophobicity more broadly and more evenly than can be explained by chance alone. 2

We know that conscious intelligent agents with foresight are able to conceptualize and visualize a priori a system of building blocks, like Lego bricks, with a set of properties that optimally perform an intended function or task. We also know that intelligent agents can subsequently instantiate, as a physical 3D object, what they previously conceptualized.

Lego bricks in their present form were launched in 1958. The interlocking principle with its tubes makes it unique and offers unlimited building possibilities. It's just a matter of getting the imagination going – and letting a wealth of creative ideas emerge through play. 8

Amino acids are analogous to Lego bricks. Bricks to build a house are made with the right stability, size, materials, and capacity of isolation for maintaining adequate narrow-range temperatures inside a house. Glass is made with transparency to serve as windows.  (Rare earth) Metals, plastic, rubber, etc. are made to serve as building blocks of complex machines. A mix of atoms will never by itself organize to become the building blocks of a higher-order complex integrated system based on functional, well-integrated, and matching sub-parts. But that is precisely what nature needs in order to complexify into the integrated systems-level organization of cells and multicellularity. We know about the limited range of unguided random processes. And we know the infinite range of engineering solutions that capable intelligent agents can instantiate. 

Gayle K. Philip continues:
We performed three specific tests: we compared (in terms of coverage) (i) the full set of 20 genetically encoded amino acids for size, charge, and hydrophobicity with equivalent values calculated for a sample of 1 million alternative sets (each also comprising 20 members)  results showed that the standard alphabet exhibits better coverage (i.e., greater breadth and greater evenness) than any random set for each of size, charge, and hydrophobicity, and for all combinations thereof. Results indicate that life genetically encodes a highly unusual subset of amino acids relative to any random sample of what was prebiotically plausible. A maximum of 0.03% random sets out-performed the standard amino acid alphabet in two properties, while no single random set exhibited greater coverage in all three properties simultaneously. These results combine to present a strong indication that the standard amino acid alphabet, taken as a set, exhibits strongly nonrandom properties. Random chance would be highly unlikely to represent the chemical space of possible amino acids with such breadth and evenness in charge, size, and hydrophobicity (properties that define what protein structures and functions can be built). It is remarkable that such a simple starting point for analysis yields such clear results.

If the set does exhibit nonrandom properties, and random chance is highly unlikely, where does that optimality come from? It cannot be due to physical necessity. Matter is under no necessity to sort out and instantiate a set of building blocks for distant goals. Evolution by natural selection is a hopelessly inadequate mechanism, and it was not at play at that stage. The only option left is intelligent design.

Later, in 2015, Melissa Ilardo and colleagues echoed Gayle K. Philip in the paper: Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids. They wrote:

We compared the encoded amino acid alphabet to random sets of amino acids. We drew 10^8 random sets of 20 amino acids from our library of 1913 structures and compared their coverage of three chemical properties: size, charge, and hydrophobicity, to the standard amino acid alphabet. We measured how often the random sets demonstrated better coverage of chemistry space in one or more, two or more, or all three properties. In doing so, we found that better sets were extremely rare. In fact, when examining all three properties simultaneously, we detected only six sets with better coverage out of the 10^8 possibilities tested. Sets that cover chemistry space better than the genetically encoded alphabet are extremely rare and energetically costly. The amino acids used for constructing coded proteins may represent a largely global optimum, such that any aqueous biochemistry would use a very similar set. 9
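The kind of comparison described in the two papers can be sketched as a small Monte Carlo experiment. The sketch below is not the authors' code: the library values are synthetic random numbers standing in for one chemical property, the reference set is a hypothetical broad and fairly even set, and the coverage measure (range plus spread of the gaps between neighbours) is only an assumed stand-in for the breadth and evenness measures used in the papers. The last line works out what six better sets out of 10^8 means as a ratio.

```python
import random
import statistics

def coverage(values):
    """Return (breadth, unevenness) of one property across a set.

    Breadth is the range; unevenness is the spread of the gaps between
    neighbouring values (0.0 means perfectly even spacing).
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[-1] - ordered[0], statistics.pstdev(gaps)

def covers_better(candidate, reference):
    """True if the candidate is both broader and more even than the reference."""
    cand_breadth, cand_uneven = coverage(candidate)
    ref_breadth, ref_uneven = coverage(reference)
    return cand_breadth > ref_breadth and cand_uneven < ref_uneven

def fraction_better(library, reference, trials, set_size=20, seed=0):
    """Fraction of randomly drawn sets that cover the property better."""
    rng = random.Random(seed)
    wins = sum(
        covers_better(rng.sample(library, set_size), reference)
        for _ in range(trials)
    )
    return wins / trials

# Synthetic stand-in data, NOT real amino acid properties.
data_rng = random.Random(42)
library = [data_rng.random() for _ in range(1913)]
reference = sorted(library)[50::95][:20]  # hypothetical broad, fairly even set

print("fraction of random sets with better coverage:",
      fraction_better(library, reference, trials=10_000))
print("6 better sets in 10^8 is about 1 in", round(10**8 / 6))
```

With real data, the library would hold measured or computed size, charge, and hydrophobicity values, and the reference would be the 20 encoded amino acids; the papers also test the three properties jointly, which this one-property sketch does not.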

That's pretty impressive and remarkable. It means that only about one in 16.7 million of the random sets tested (six out of 10^8) covered chemistry space better. The most recent paper to be mentioned was written by Andrew J. Doig in 2016. He wrote:

Why the particular 20 amino acids were selected to be encoded by the Genetic Code remains a puzzle. They were selected to enable the formation of soluble structures with close-packed cores, allowing the presence of ordered binding pockets. Factors to take into account when assessing why a particular amino acid might be used include its component atoms, functional groups, biosynthetic cost, use in a protein core or on the surface, solubility and stability. Applying these criteria to the 20 standard amino acids, and considering some other simple alternatives that are not used, we find that there are excellent reasons for the selection of every amino acid. Rather than being a frozen accident, the set of amino acids selected appears to be near ideal.10

The last sentence is remarkable: "the set of amino acids selected appears to be near ideal." It remains a puzzle, like so many other things in biology that find no answer among those who build their inferences on a constrained set of possible explanations, where an intelligent causal agency is excluded a priori. Selecting things for specific goals is a conscious process that requires intelligence, an attribute that chance alone lacks but that an intelligent creator can employ to create life.

Biosynthetic cost: Protein synthesis takes a major share of the energy resources of a cell [12]. Table 1 shows the cost of biosynthesis of each amino acid, measured in terms of number of glucose and ATP molecules required. These data are often nonintuitive. For example, Leu costs only 1 ATP, but its isomer Ile costs 11. Why would life ever therefore use Ile instead of Leu, if they have the same properties? Larger is not necessarily more expensive; Asn and Asp cost more in ATP than their larger alternatives Gln and Glu, and large Tyr costs only two ATP, compared to 15 for small Cys. The high cost of sulfur-containing amino acids is notable.

This is indeed completely counterintuitive and does not conform with naturalistic predictions.

Burial and surface: Proteins have close-packed cores with the same density as organic solids and side chains fixed into a single conformation. A solid core is essential to stabilise proteins and to form a rigid structure with well-defined binding sites. Nonpolar side chains have therefore been selected to stabilise close-packed hydrophobic cores. Conversely, proteins are dissolved in water, so other side chains are used on a protein surface to keep them soluble in an aqueous environment.

The problem here is that molecules, and an arrangement of correctly selected varieties of amino acids, would bear no function until life began. Functional subunits of proteins, or even fully operating proteins on their own, would only have a function after life began and the cell's intrinsic operations were under way. It is as if molecules had an inherent drive to contribute to getting life started, which of course is absurd. The only rational alternative is that a powerful creator had the foresight and knew which arrangement and selection of amino acids would fit and work to make life possible.

Which amino acids came first? It is plausible that the first proteins used a subset of the 20 and a simplified Genetic Code, with the first amino acids acquired from the environment.

Why is that plausible? It is not only implausible, but plainly impossible. The genetic code could not emerge gradually, and there is no known explanation for how it emerged. The author also ignores that the whole process of protein synthesis requires all of its parts to be fully operational right from the beginning. A gradual development by evolutionary selective forces is highly unlikely.

Energetics of protein folding: Folded proteins are stabilized by hydrogen bonding, removal of nonpolar groups from water (hydrophobic effect), van der Waals forces, salt bridges, and disulfide bonds. Folding is opposed by loss of conformational entropy, where rotation around bonds is restricted, and introduction of strain. These forces are well balanced so that the overall free energy changes for all the steps in protein folding are close to zero.

Foresight and superior knowledge would be required to know how to obtain a protein fold that bears function, and in which the opposing forces are balanced so that the overall free energy change is close to zero.


1. S L Miller: Reasons for the occurrence of the twenty coded protein amino acids 1981 https://pubmed.ncbi.nlm.nih.gov/7277510/
2. Gayle K. Philip: Did evolution select a nonrandom "alphabet" of amino acids? 2011 Mar 24 https://pubmed.ncbi.nlm.nih.gov/21434765/
3. Chiara Cabrele: Peptides Containing β-Amino Acid Patterns: Challenges and Successes in Medicinal Chemistry September 10, 2014 https://pubs.acs.org/doi/10.1021/jm5010896
4. Pre-Life Building Blocks Spontaneously Align in Evolutionary Experiment https://news.gatech.edu/news/2019/08/01/pre-life-building-blocks-spontaneously-align-evolutionary-experiment
5. Areski Flissi: Norine: update of the nonribosomal peptide resource https://academic.oup.com/nar/article/48/D1/D465/5613672
6. Stuart A. Kauffman: Theory of chemical evolution of molecule compositions in the universe, in the Miller-Urey experiment and the mass distribution of interstellar and intergalactic molecules 30 Nov 2019 https://arxiv.org/abs/1806.06716
7. https://chem.libretexts.org/Courses/University_of_Kentucky/UK%3A_CHE_103_-_Chemistry_for_Allied_Health_(Soult)/Chapters/Chapter_13%3A_Amino_Acids_and_Proteins/13.1%3A_Amino_Acids
8. https://web.archive.org/web/20150905173143/http://www.lego.com/en-us/aboutus/lego-group/the_lego_history
9. Melissa Ilardo: Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids 24 March 2015 https://www.nature.com/articles/srep09414
10. Andrew J. Doig: Frozen, but no accident – why the 20 standard amino acids were selected 2 December 2016 https://febs.onlinelibrary.wiley.com/doi/pdf/10.1111/febs.13982
11. Joongoo Lee: Ribosome-mediated polymerization of long chain carbon and cyclic amino acids into peptides in vitro 27 August 2020 https://www.nature.com/articles/s41467-020-18001-x
12. Jan Mrazek: Polyribosomes Are Molecular 3D Nanoprinters That Orchestrate the Assembly of Vault Particles 2014 Oct 30 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4245718/

https://reasonandscience.catsboard.com

106Perguntas .... - Page 5 Empty Re: Perguntas .... Thu Jul 21, 2022 5:55 am

Otangelo


Admin

Jean Lehmann (24 March 2022): The degeneracy of the genetic code confers a wide array of properties to coding sequences.  The second position of the anticodon–codon interaction is a critical parameter controlling the extent of non-specific pairings accepted at the third position by the ribosome, a flexibility at the root of degeneracy. That residue A1493 of the decoding center provides a significant contribution to the stability, revealing that the ribosome is directly involved in the establishment of degeneracy. The A1493 and A1492 establish the basis of degeneracy when an elementary kinetic scheme of translation is prevailing. The translation of genetic information relies on base pairing between anticodons and codons. While the first two codon positions are restricted to canonical Watson–Crick base pairs, some flexibility occurs at the third position. This flexibility was postulated by F. Crick in 1966 to account for the observed degeneracy in the genetic code, which had just been fully deciphered. He suggested that G could base pair not only with C but also with U if some base displacement was possible at the third position, a possibility coined the ‘wobble hypothesis'. This flexibility would allow reduced sets of tRNAs to translate all amino-acid encoding codons, thereby making translation more efficient. The reason why unspecific pairing can be accepted at the third position became apparent only about 35 years later when the first structures at atomic resolution of the 30S subunit co-crystallized with mRNA fragments and anticodon stem-loops were elucidated. These structures revealed that, unlike at the first and second positions, the ribosome does not structurally constrain the wobble position, implying that some flexibility in the geometry of base pairing is possible.

In the meantime, it was discovered that extended wobbling, called ‘superwobbling’, can also occur at the third position. In that case, an unmodified U (exceptionally an A) at position 34 of a tRNA can base pair with any base at the third position of the codons. So far, superwobbling has been observed only in mitochondria, chloroplasts and other small genome entities with reduced sets of tRNAs. In such cases, the extent of wobbling matches the degeneracy families associated with each of the 16 N1N2 codon doublets of the genetic code: all codons of any codon family, whether it is two- or four-fold degenerate, are translated by a single tRNA through wobbling and superwobbling, respectively.

The rationale behind the existence of these two degeneracy families was partially unraveled in 1978 by U. Lagerkvist, who noticed that the strength of the base pairs in positions 1 and 2 of the codons and the purine/pyrimidine nature of the base at the second position constituted a set of three criteria (or parameters) with which a complete categorization of the 16 codon doublets into the two degeneracy families was possible, a feature that can be highlighted by a symmetry in the genetic code table. Based on the then available structural organization of the decoding center, and the architecture of the anticodon loop, an interpretation of these parameters was proposed in 2008. The analysis  demonstrated that all three parameters of Lagerkvist determine the number of hydrogen bonds contributing to the stability of the WC geometry of the base pair at the second position of the anticodon (N35-N2) 19
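The split of the 16 N1N2 codon doublets into two degeneracy families can be checked directly against the standard genetic code table. The following is a minimal sketch: it writes codons in the DNA alphabet (T instead of U), builds the table from the usual compact 64-letter string in TCAG order, and uses helper names (doublet_families, four_fold_doublets) that are only illustrative. It labels a doublet "four-fold" when all four third-position bases give the same amino acid and "split" otherwise; it does not model wobble pairing or the Lagerkvist parameters themselves.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def doublet_families():
    """Map each N1N2 doublet to the amino acids read at its four third positions."""
    return {
        n1 + n2: [CODON_TABLE[n1 + n2 + n3] for n3 in BASES]
        for n1, n2 in product(BASES, repeat=2)
    }

def four_fold_doublets():
    """Doublets whose four codons all encode the same amino acid."""
    return sorted(d for d, aas in doublet_families().items() if len(set(aas)) == 1)

for doublet, aas in doublet_families().items():
    kind = "four-fold" if len(set(aas)) == 1 else "split"
    print(doublet + "N", "".join(aas), kind)
print(len(four_fold_doublets()), "of 16 doublets are four-fold degenerate")
```

Running it lists each doublet with the four amino acids it encodes, so the two families described above can be read off the output.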






‘Snooze button’ on biological clocks improves cell adaptability
Vanderbilt University (2013): The circadian clocks that control and influence dozens of basic biological processes have an unexpected “snooze button” that helps cells adapt to changes in their environment. At least some species can alter the way that their biological clocks function by using different “synonyms” that exist in the genetic code. This provides organisms with a novel and previously unappreciated mechanism for responding to changes in their environment. Like many written languages, the genetic code is filled with synonyms: differently spelled “words” that have the same or very similar meanings. For a long time, biologists thought that these synonyms, called synonymous codons, were in fact interchangeable. Recently, they have realized that this is not the case and that differences in synonymous codon usage have a significant impact on cellular processes. While biological clocks are vital to maintaining healthy patterns of sleep, metabolism, physiology and behavior, under certain environmental conditions strict adherence to these rhythms can be disadvantageous. Organisms can ignore the clock under certain circumstances—much like hitting a biological snooze button on the internal timepiece—and enhance their survival in the face of ever-changing circumstances. CCA, CCG and CCC are synonymous codons because they all encode for the same amino acid, proline. It turns out that there is a reason for this redundancy. Some codons are faster and easier for cells to process and assemble into proteins than others. Optimizing all the codons used by the fungal biological clock knocked the clock out, which was totally unexpected! Clock proteins in the fungus are not properly assembled if they are synthesized too rapidly; it’s as if the speed of one’s writing affected our ability to read the text.

In the cyanobacteria, however, a different phenomenon is observed. Researchers optimized the codons in the cyanobacteria’s biological clock. This did not shut the clock down in the algae, but it did have a more subtle, but potentially as profound effect: It significantly reduced cell survival at certain temperatures. The biological clock with optimized codons might work better at lower temperatures. However the substitution also modified the biological clock so it ran with a longer, 30-hour period. When forced to operate in 24-hour daily light/dark cycles, the bacteria with the optimized clock grew significantly slower than “wild-type” cells. In cyanobacteria, it’s as if writing speed changes the meaning. The potential importance of changes in synonymous codon usage in adapting to environmental factors is magnified by the fact that they can influence the operation of biological clocks, which function as a key adaptation to daily environmental rhythms. Biological clocks control and influence dozens of different basic biological processes, including sleeping and feeding patterns, core body temperature, brain activity, hormone production and cell regeneration. It is now clear that variations in codon usage are a fundamental and underappreciated form of gene regulation. 18
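The "synonyms" mentioned in the article can be listed from the standard genetic code table. This is a minimal sketch under the same assumptions as any plain codon table: codons are written in the DNA alphabet (T instead of U), the table is the standard code, and the function name synonymous_codons is only illustrative. It says nothing about which synonymous codons a given organism translates faster, which is the experimental point of the article.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def synonymous_codons():
    """Group the 64 codons by the amino acid (or stop signal) they encode."""
    groups = defaultdict(list)
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
        groups[aa].append("".join(codon))
    return dict(groups)

groups = synonymous_codons()
print("Proline (P) codons:", " ".join(groups["P"]))
# Most degenerate amino acids first.
for aa, codons in sorted(groups.items(), key=lambda item: -len(item[1])):
    print(aa, len(codons), " ".join(codons))
```

The proline line shows the three codons named in the article, CCA, CCG, and CCC, together with the fourth synonym, CCT (CCU in mRNA).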


107Perguntas .... - Page 5 Empty Re: Perguntas .... Sat Jul 23, 2022 4:55 pm

Otangelo


Admin











1. Paul Davies & Jeremy England:  The Origins of Life: Do we need a new theory for how life began? at 15:30 Life = Chemistry + information  Jun 25, 2021
2. Guenther Witzany: Life is physics and chemistry and communication 2014 Dec 31
3. Robert Alicki: Information is not physical 11 Feb 2014
4. Paul C. W. Davies: The algorithmic origins of life 06 February 2013
5. David L Abel: Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems 2012 Mar 14
6. Albert Voie: Biological function and the genetic code are interdependent 2006
7. Paul Davies: Life force 18 September 1999
8. G. F. Joyce, L. E. Orgel: Prospects for Understanding the Origin of the RNA World 1993
10. Claus Emmeche: FROM LANGUAGE TO NATURE - the semiotic metaphor in biology 1991
11. David L Abel: Three subsets of sequence complexity and their relevance to biopolymeric information 11 August 2005
12. George M Church: Next-generation digital information storage in DNA 2012 Aug 16
13. Richard Dawkins on the origins of life (1 of 5) Sep 29, 2008
14. Leroy Hood: The digital code of DNA 2003 Jan 23
15. Hubert P. Yockey: Information Theory, Evolution, and the Origin of Life 2005
16. David L. Abel: The Capabilities of Chaos and Complexity 9 January 2009
17. Peter R. Wills: DNA as information 13 March 2016
18. Paul Davies: The Origin of Life January 31, 2003
19. Sergi Cortiñas Rovira: Metaphors of DNA: a review of the popularisation processes  21 March 2008
20. Massimo Pigliucci:  Why Machine-Information Metaphors are Bad for Science and Science Education 2010
21. David Hume: https://philosophy.lander.edu/intro/introbook2.1/x4211.html
22. Barry Arrington A Dog Is A Chien Is A Perro Is A Hund February 11, 2013
23. Paul Davies: The secret of life won't be cooked up in a chemistry lab
24. 
25. P.Marshall:  Evolution 2.0:Breaking the Deadlock Between Darwin and Design September 1, 2015
26. Sedeer el-Showk: The Language of DNA July 28, 2014
27. Change Laura Tan, Rob Stadler: The Stairway To Life: An Origin-Of-Life Reality Check  March 13, 2020 
28. David L Abel: The Universal Plausibility Metric (UPM) & Principle (UPP) 2009; 6: 27
29. Edward J. Steele: Cause of Cambrian Explosion -Terrestrial or Cosmic? 2018
30. Katarzyna Adamala OPEN QUESTIONS IN ORIGIN OF LIFE: EXPERIMENTAL STUDIES ON THE ORIGIN OF NUCLEIC ACIDS AND PROTEINS WITH SPECIFIC AND FUNCTIONAL SEQUENCES BY A CHEMICAL SYNTHETIC BIOLOGY APPROACH February 2014

29. Sir Fred Hoyle: The Universe: Past and Present Reflections November 1981
30. Robert T. Pennock: Intelligent Design Creationism and Its Critics: Philosophical, Theological, and Scientific Perspectives 2001
31. Paul Davies: The Origin of Life  January 31, 2003

32. Paul Davies & Jeremy England:  The Origins of Life: Do we need a new theory for how life began? Jun 25, 2021

33. Paul Davies: 'I predict a great revolution': inside the struggle to define life 2019
34. David T.F Dryden: How much of protein sequence space has been explored by life on Earth? 15 April 2008
35. Evolution: Possible, or impossible? Probability and the First Proteins
36. Steve Meyer, Signature in the Cell 2009
37. Hubert P.Yockey: A calculation of the probability of spontaneous biogenesis by information theory 7 August 1977
38. M. Emile Borel: Les probabilités dénombrables et leurs applications arithmétiques. 8 November 1908

39. Florian Lauck: Coping with Combinatorial Space in Molecular Design October 2013

40. W.Patrick Walters: Virtual screening—an overview 1 April 1998

41. Brian R. Johnson: Self-organization, Natural Selection, and Evolution: Cellular Hardware and Genetic Software  December 2010

42. Paul Davies: The FIFTH MIRACLE: The Search for the Origin and Meaning of Life  March 16, 2000

43. Daniel J. Nicholson Is the cell really a machine? 4 June 2019

44. MARSHALL W. NIRENBERG Will Society Be Prepared? 11 August 1967

45. Patricia Bralley: An introduction to molecular linguistics February 1996

46. V A Ratner: The genetic language: grammar, semantics, evolution 1993 May;29

47. Eric Alani: DNA Spell Checkers

48. Libretexts: Genetic Information

49. Richard Dawkins: The blind watchmaker  1 January 1986

50. María A Sánchez-Romero: The bacterial epigenome 2020 Jan;18
51. Daniel J. Nicholson: On Being the Right Size, Revisited: The Problem with Engineering Metaphors in Molecular Biology 2020


1. B.Alberts: Molecular Biology of the Cell. 4th edition. 2003
2. Eugene V. Koonin: Origin and evolution of the genetic code: the universal enigma 2012 Mar 5
9. S J Freeland: The genetic code is one in a million 1998 Sep
10. Shalev Itzkovitz: The genetic code is nearly optimal for allowing additional information within protein-coding sequences 2007 Apr; 17
12. PAUL DAVIES: The Fifth Miracle The Search for the Origin and Meaning of Life 2000
13. H.Yockey: Information theory, evolution, and the origin of life 2005
14. Job Merkel: The Language of DNA 15 NOV, 2019
15. Stephen J. Freeland: Early Fixation of an Optimal Genetic Code 01 April 2000
16. Thomas Butler: Extreme genetic code optimality from a molecular dynamics calculation of amino acid polar requirement 17 June 2009
17. Fazale Rana: The Cell's Design: How Chemistry Reveals the Creator's Artistry 1 June 2008, page 172
18. David L. Abel: Redundancy of the genetic code enables translational pausing 2014 Mar 27
21. ULRICH E. STEGMANN: The arbitrariness of the genetic code 9 September 2003
22. L’udmila Lackova: Arbitrariness is not enough: towards a functional approach to the genetic code 2 May 2017
34. B. Alberts Molecular Biology of the Cell 6th ed. 2015
37. D. L. Gonzalez  On the origin of degeneracy in the genetic code 18 October 2019
39. Tessa E.F. Quax: Codon Bias as a Means to Fine-Tune Gene Expression 2016 Jul 16


108Perguntas .... - Page 5 Empty Re: Perguntas .... Tue Aug 02, 2022 4:00 pm

Otangelo


Admin

(Duplodnaviria)  All viruses of this realm share homologous MCPs (HK97-fold), large and small terminase subunits, prohead maturation proteases and portal proteins, indicating that their morphogenetic modules are monophyletic.

Euryarchaeota and Thaumarchaeota.

Furthermore, the observation that many bacterial members of the Duplodnaviria encode archaeal-like genome replication modules, which are not homologous to the bacterial functional counterparts, also argues in favour of the origin of this virus group antedating the archaeal–bacterial divide. The second realm of dsDNA viruses, Varidnaviria, is represented in prokaryotes by four families of bacterial viruses (Tectiviridae, Corticoviridae, Autolykiviridae and Finnlakeviridae), one family of archaeal viruses (Turriviridae) and the family Sphaerolipoviridae, in which different genera include viruses infecting either bacteria or archaea. However, mining metagenomic data for homologues of the DJR MCP using sensitive computational methods resulted in the discovery of a vast diversity of previously unknown viruses of this realm that, in all likelihood, infect prokaryotes. Actual host assignments await but some of these virus genomes were found in geothermal habitats, strongly suggesting archaeal hosts. 

proviruses encoding DJR MCPs, which has substantially expanded the reach of Varidnaviria in both prokaryotic domains. Phylogenetic analysis of the concatenated DJR MCP and genome packaging ATPases of archaeal varidnaviruses suggested coevolution of this group of viruses with the major archaeal lineages rather than recent horizontal transfer from bacteria. Thus, most likely, the LUCA virome also included multiple groups of dsDNA viruses with vertical (both single and double) jelly-roll MCPs. Furthermore, reconstruction of DJR MCP evolution sheds light on the pre-LUCA stages of virus evolution. Among the ssDNA viruses (realm Monodnaviria), only members of a single order, Tubulavirales (until recently known as the family Inoviridae), consisting of filamentous or rod-shaped viruses, appear to be hosted by both bacteria and archaea.

 However, whereas tubulaviruses are ubiquitous in bacteria, their association with archaea was inferred from putative proviruses present in several archaeal lineages, namely methanogens and aenigmarchaea. Such distribution has been judged best compatible with horizontal virus transfer from bacteria to archaea. Given their ubiquity in bacteria, the origin of filamentous bacteriophages concomitantly or soon after the emergence of the last bacterial common ancestor (LBCA) appears likely, whereas their presence in LUCA cannot be ruled out either. Similarly, microviruses with icosahedral capsids and circular ssDNA genomes are nearly ubiquitous in the environment and are genetically highly diverse. Although for the vast majority of these viruses the hosts are unknown, the few known isolates infect broadly diverse bacteria from five different phyla. It is thus likely that microviruses have a long-standing evolutionary history in bacteria, which probably dates back at least to the LBCA.

 In the extant biosphere, RNA viruses dominate the eukaryotic virome but are rare in bacteria (compared with DNA viruses) and unknown in archaea. Bacterial RNA viruses are represented by two families, the positive-sense RNA Leviviridae and dsRNA Cystoviridae. The host range of experimentally identified members of both families is limited to a narrow range of bacteria (almost exclusively Proteobacteria). However, recent metagenomics efforts have drastically expanded the known diversity of leviviruses, indicating that their share in the prokaryotic virome had been substantially under-appreciated. Reverse-transcribing viruses are conspicuously confined to eukaryotes although prokaryotes carry a substantial diversity of non-packaging (that is, non-viral) retroelements, for example, group II introns. The extant distribution of the viruses of the realm Riboviria, with its drastic display of eukaryotic over prokaryotic host ranges, might appear paradoxical given the broadly accepted RNA world concept of the origin of life, implying the early origin of RdRP and reverse transcriptase (RT)  and, as a consequence, the primordial status of RNA viruses. The origin of leviviruses within bacteria is best compatible with their currently characterized distribution and is a distinct possibility.  

Furthermore, unlike the LUCA, for which most evolutionary reconstructions suggest a mesophilic or a moderate thermophilic lifestyle, the last common ancestors of bacteria and archaea are inferred to have been thermophiles or hyperthermophiles. Extremely high temperatures might be restrictive for the propagation of RNA viruses and thus could represent a bottleneck associated with the demise of the ancestral RNA virome (and potentially explain why RNA viruses are unknown in archaea). Thus, of the realm Riboviria, positive-sense RNA viruses are a putative component of the LUCA virome. The ancestral status of many archaea-specific virus groups is difficult to ascertain. However, some monophyletic virus assemblages, such as those with spindle-shaped virions, infect hosts from all major archaeal lineages and thus can be traced to the last archaeal common ancestor. Therefore, their presence in the LUCA virome, with subsequent loss in the bacterial lineage, cannot be ruled out either.

Virus evolution before the LUCA 
The reconstruction of the evolutionary paths from ancestral host proteins to viral capsids sheds light on the early stages of evolution of both realms of dsDNA viruses. The DJR MCP of the Varidnaviria appears to be a unique virus feature, with no potential cellular ancestors detected. By contrast, the SJR MCP of numerous RNA viruses that were also acquired by ssDNA viruses through recombination can be traced to ancestral cellular carbohydrate-binding proteins, with several probable points of entry into the virus world. Thus, the DJR MCP in all likelihood evolved from the SJR MCP early in the evolution of viruses. Remarkably, apparent evolutionary intermediates are detectable in two virus families. Viruses in the family Sphaerolipoviridae encode two ‘vertically’ oriented SJR MCPs that are likely to represent the ancestral duplication preceding the fusion that gave rise to the DJR MCP. The recently discovered archaeal dsDNA viruses in the family Portogloboviridae contain one SJR MCP and thus appear to represent an even earlier evolutionary intermediate. Indeed, structural comparisons of the SJR MCPs from RNA and DNA viruses show that the portoglobovirus MCP is most closely related to the MCPs of sphaerolipoviruses. Combined with the inferred presence in the LUCA virome of multiple groups of Varidnaviria, the discovery of the intermediate MCP forms in capsids of extant viruses implies extensive evolution of varidnaviruses predating the LUCA. The families Portogloboviridae and Sphaerolipoviridae appear to be relics of the pre-LUCA evolution of varidnaviruses and, accordingly, must have been part of the LUCA virome. For the members of the second realm of dsDNA viruses, Duplodnaviria, no cellular ancestor was detected in the dedicated comparative analyses of the sequences and structures of virion proteins.
However, a recent structural comparison has shown that the main scaffold of the HK97-like MCP belongs to the strand-helix-strand-strand (SHS2) fold (with the insertion of an additional, uncharacterized domain of the DUF1884 (PF08967) family) and appears to be specifically related to the dodecin family of the SHS2-fold proteins. Dodecins are widespread proteins in bacteria and archaea that form dodecameric compartments involved in flavin sequestration and storage and are thus plausible ancestors for the HK97-fold MCP. Although, in this case, there are no detectable evolutionary intermediates among viruses, the inferred presence of multiple groups of duplodnaviruses in the LUCA virome implies that the recruitment of dodecin and the insertion of DUF1884 are ancient events. Consistently, viruses with short tails (podovirus morphology), long non-contractile tails (siphovirus morphology) and long contractile tails (myovirus morphology) are all found in both bacteria and archaea, indicating that the morphogenetic toolkit of viruses with HK97-fold MCPs attained considerable versatility in the pre-LUCA era.

Virus replication modules 
Each virus genome includes two major functional modules, one for virion formation (morphogenetic module) and one for genome replication. The two modules rarely display congruent histories over long evolutionary spans and are instead exchanged horizontally between different groups of viruses through recombination, continuously producing new virus lineages. The morphogenetic modules including the vertical jelly-roll and HK97-fold MCPs can be traced to the LUCA virome. One of the most widespread replication modules in the virosphere is the rolling circle replication endonuclease (RCRE) of the HUH superfamily. Homologous RCREs are encoded by viruses with SJR and DJR MCPs, HK97-like MCPs and morphologically diverse ssDNA viruses and are also found in many families of bacterial and archaeal plasmids and transposons. Thus, RCRE can be confidently assigned to the LUCA virome or mobilome (that is, all the MGEs of the LUCA). Protein-primed family B DNA polymerases (pPolBs) represent another replication module with a broad distribution spanning several families of viruses and non-viral MGEs. pPolB is present in bacteria-infecting members of the realms Duplodnaviria (phi29-like podoviruses) and Varidnaviria (Tectiviridae, Autolykiviridae and diverse varidnavirus genomes identified in metagenomic data) as well as in several families of archaeal viruses (Halspiviridae, Thaspiviridae, Ovaliviridae and Pleolipoviridae). In phylogenetic analyses, pPolBs split into two separate clades corresponding to bacterial and archaeal viruses, strongly suggesting that they have coevolved with bacterial and archaeal lineages ever since their divergence from the LUCA. Two other key replication proteins that are among the most common in bacterial and archaeal viruses and MGEs are primases of the archaeo-eukaryotic primase (AEP) superfamily and superfamily 3 helicases (S3H).
Whereas S3H are exclusive to viruses and MGEs, the viral AEP form specific families that are not closely related to the cellular homologues. Notably, bacteria do not employ AEP for primer synthesis, and thus bacterial viruses could not have recruited this protein from their hosts. Thus, AEP and S3H, along with RCRE and pPolB, appear to represent major components of the replication modules of the LUCA virome. More generally, contemporary duplodnaviruses display a remarkable diversity of genome replication modules, from minimalist initiators that recruit cellular DNA replisomes for viral genome replication to near-complete virus-encoded DNA replication machineries. In many cases, these DNA replication proteins do not have close cellular homologues, suggesting a long evolutionary history within the virus world. Notably, some of the phage proteins, such as helicase loaders, have replaced their cellular counterparts at the onset of certain bacterial lineages for the replication of cellular chromosomes. Although some tailed bacterial dsDNA viruses encode replication factors of apparent bacterial origin, in archaeal duplodnaviruses, the proteins involved in informational processes, including components of the genome replication machinery, DNA repair and RNA metabolism, are of archaeal type, with none of the known archaeal viruses encoding components of the bacterial-type replication machinery. Finally, tailed archaeal viruses carry archaeal or eukaryotic-like promoters, consistent with the fact that none of the known archaeal viruses encode RNA polymerases, further pointing to long-term coevolution with the hosts. These considerations argue against (recent) horizontal transfers of duplodnaviruses between bacteria and archaea accounting for the observed distribution of these viruses, even though some such transfers might have occurred. 
Thus, analyses of duplodnavirus and varidnavirus genome replication modules complement those of the morphogenetic modules and suggest extensive divergence of both groups of viruses in the pre-LUCA era.

Conclusions
The informal reconstructions attempted here suggest a remarkably diverse, complex LUCA virome. This ancestral virome was likely dominated by dsDNA viruses from the realms Duplodnaviria and Varidnaviria. In addition, two groups of ssDNA viruses (realm Monodnaviria), namely Microviridae and Tubulavirales, can be traced to the LBCA, whereas spindle-shaped viruses most likely infected the last archaeal common ancestor. The possibility that these virus groups were present in the LUCA virome but were subsequently lost in one of the two primary domains cannot be dismissed. The point of origin of the extant bacterial positive-sense RNA viruses (realm Riboviria) remains uncertain, with both bacterial and primordial origins remaining viable scenarios. Further virus prospecting efforts could shed light on the history of these viruses. Although the inferred LUCA virome in all likelihood did not include members of many extant groups of viruses of prokaryotes, its apparent complexity seems to exceed the typical complexity of well-characterized viromes of bacterial or archaeal species. These observations imply that the LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defence systems.

According to the ‘chimeric’ scenario of virus origins, different groups of viruses evolved through recruitment of cellular proteins as virion components. Here, we present evidence that, contingent on our mapping of both duplodnaviruses and varidnaviruses to the LUCA virome, several such events occurred in the earliest phase of the evolution of life, from the primordial pool of replicators to the LUCA. Moreover, virus evolution during that early era went through multiple, distinct stages as demonstrated by the reconstructed histories of the capsid proteins of the two realms of dsDNA viruses. The cellular SJR-containing carbohydrate-binding or nucleoplasmin-like proteins (the ancestors of the varidnavirus DJR MCPs) and the dodecins (the ancestors of the duplodnavirus MCPs) belong to expansive protein families that have already undergone substantial diversifying evolution prior to the origins of the two realms of viruses. The respective protein families do not belong to the universal core of cellular life, so their apparent pre-LUCA diversification further emphasizes the substantial pangenomic, organizational and functional complexity of the LUCA. This conclusion is indeed compatible with the previous inferences on the LUCA made from the analysis of coalescence in different families of ancient genes, namely that a common ancestor containing all the genes shared by the three domains of life has never existed. Straightforward thinking on the LUCA virome might have envisaged it as a domain of RNA viruses descending from the primordial RNA world. However, the reconstructions suggest otherwise, indicating that the LUCA was similar to the extant prokaryotes with respect to the repertoire of viruses it hosted. These findings do not defy the RNA world scenario but mesh well with the conclusion that DNA viruses have evolved and diversified extensively already in the pre-LUCA era.
The RNA viruses, after all, might have been the first to emerge but, by the time the LUCA lived, they had already been largely supplanted by the more efficient DNA virosphere. 8

Aude Bernheim (2019): For a microorganism to be protected against a wide variety of viruses, it should encode a broad defense arsenal that can overcome the multiple types of viruses that can infect it. Owing to the selective advantage that defense systems provide, they are frequently gained by bacteria and archaea through horizontal gene transfer (HGT). Faced with viruses that encode counter-defense mechanisms, bacteria and archaea cannot rely on a single defense system and thus need to present several lines of defense as a bet-hedging strategy of survival. Given their selective advantage in the arms race against viruses, one might expect that defense systems, once acquired (either through direct evolution or via HGT), would accumulate in prokaryotic genomes and be selected for. Surprisingly, this is not the case as defense systems are known to be frequently lost from microbial genomes over short evolutionary time scales, suggesting that they can impose selective disadvantages in the absence of infection pressure. Competition studies between strains encoding defense systems, such as CRISPR–Cas or Lit Abi, and cognate defense-lacking strains have demonstrated the existence of a fitness cost in the absence of phage infection. Access to a diverse set of defense mechanisms is essential in order to combat the enormous genetic and functional diversity of viruses. None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population. 5
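The 'immune potential' idea in the quote above can be illustrated with a few sets. This is only a toy sketch: the strain names, the assignment of systems to strains, and the placeholder "system_X" are hypothetical, and only "CRISPR-Cas" and "Lit Abi" are names taken from the quoted text.

```python
# Toy model of a pan-immune system: defense systems are spread across strains.
# Strain names, assignments and "system_X" are hypothetical placeholders.
strains = {
    "strain_A": {"CRISPR-Cas", "system_X"},
    "strain_B": {"Lit Abi"},
    "strain_C": {"system_X", "Lit Abi"},
}

def pan_immune_potential(population):
    """Union of all defense systems encoded anywhere in the population."""
    systems = set()
    for encoded in population.values():
        systems |= encoded
    return systems

def survivors(population, blocking_systems):
    """Strains encoding at least one system that blocks a given phage."""
    return sorted(name for name, encoded in population.items()
                  if encoded & blocking_systems)

potential = pan_immune_potential(strains)
# A hypothetical phage that is stopped only by CRISPR-Cas.
alive = survivors(strains, {"CRISPR-Cas"})
```

No single strain encodes every system, yet the population as a whole does, and at least one member survives the hypothetical phage.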

Felix Broecker (2019): Cellular organisms have co-evolved with various mobile genetic elements (MGEs), including transposable elements (TEs), retroelements, and viruses, many of which can integrate into the host DNA. MGEs constitute ∼50% of mammalian genomes, >70% of some plant genomes, and up to 30% of bacterial genomes. The recruitment of transposable elements (TEs), viral sequences, and other MGEs for antiviral defense mechanisms has been a major driving force in the evolution of cellular life. 6

Muller's Ratchet: Another hurdle in the hypothetical origin of life scenarios
E. V. Koonin (2017): Both the emergence of parasites in simple replicator systems and their persistence in evolving life forms are inevitable because the putative parasite-free states are evolutionarily unstable. 3 E. V. Koonin (2016): In the absence of recombination, finite populations are subject to irreversible deterioration through the accumulation of deleterious mutations, a process known as Muller’s ratchet, that eventually leads to the collapse of a population via mutational meltdown. 2

Dana K Howe (2008): The theory of Muller's Ratchet predicts that small asexual populations are doomed to accumulate ever-increasing deleterious mutation loads as a consequence of the magnified power of genetic drift and mutation that accompanies small population size. Evolutionary theory predicts that mutational decay is inevitable for small asexual populations, provided deleterious mutation rates are high enough. Such populations are expected to experience the effects of Muller's Ratchet where the most-fit class of individuals is lost at some rate due to chance alone, leaving the second-best class to ultimately suffer the same fate, and so on, leading to a gradual decline in mean fitness. The mutational meltdown theory built upon Muller's Ratchet to predict a synergism between mutation and genetic drift in promoting the extinction of small asexual populations that are at the end of a long genomic decay process. Since deleterious mutations are harmful by definition, accumulation of them would result in loss of individuals and a smaller population size. Small populations are more susceptible to the ratchet effect and more deleterious mutations would be fixed as a result of genetic drift. This creates a positive feedback loop that accelerates the extinction of small asexual populations. This phenomenon has been called mutational meltdown. From the onset, there would have had to be a population of diversified microbes, not just the population of one progenitor, but a variety of microbes with different genetic make-ups, internally compartmentalized, able to perform Horizontal Gene Transfer and recombination. Unless these preconditions were met, the population would die. 1
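The ratchet described above can be sketched with a minimal simulation. It assumes a small asexual population with no recombination and no back mutation; the population size, mutation probability and fitness cost are illustrative values chosen for the sketch, not figures from the cited papers.

```python
import random

def mullers_ratchet(pop_size=50, generations=200, mutation_prob=0.3, seed=1):
    """Toy asexual model: each individual is its count of deleterious mutations.

    Returns the mutation count of the least-loaded individual in every
    generation. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    selection = 0.02  # assumed fitness cost per deleterious mutation
    population = [0] * pop_size
    least_loaded = [min(population)]
    for _ in range(generations):
        # Fitter individuals (fewer mutations) are more likely to reproduce.
        weights = [(1 - selection) ** k for k in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Offspring inherit the parental load and may gain one new mutation.
        population = [k + (1 if rng.random() < mutation_prob else 0)
                      for k in parents]
        least_loaded.append(min(population))
    return least_loaded

history = mullers_ratchet()
```

Because offspring can never carry fewer mutations than their parent in this model, the load of the best class can only stay the same or increase, which is the one-way behavior the ratchet metaphor refers to.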

A plurality of ancestors
The origin of life did not coincide with the organismal LUCA; rather, a profound gap in time, biological evolution, geochemical change, and surviving evidence separates the two. After life emerged from prebiotic processes, diversification ensued and the initial self-replicating and evolving living systems occupied a wide range of available ecological niches. From this time until the existence of the organismal LUCA, living systems, lineages and communities would have come and gone, evolving via the same processes that are at work today, including speciation, extinction, and gene transfer.  4

Eugene V. Koonin (2020): The LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defense systems. 8

Horizontal Gene transfer, and the Origin of Life
Gregory P Fournier (2015): The genomic history of prokaryotic organismal lineages is marked by extensive horizontal gene transfer (HGT) between groups of organisms at all taxonomic levels. These HGT events have played an essential role in the origin and distribution of biological innovations. Analyses of ancient gene families show that HGT existed in the distant past, even at the time of the organismal last universal common ancestor (LUCA). Mobile genetic elements, including transposons, plasmids, bacteriophage, and self-splicing molecular parasites, have played a crucial role in facilitating the movement of genetic material between organisms. Ancient HGT during Hadean/Archaean times is more difficult to study than more recent transfers, although it has been proposed that its role was even more pronounced during earlier times in life’s history. 4

Aude Bernheim (2019): None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population 5

Eugene V. Koonin (2014): Recombinases derived from unrelated mobile genetic elements have essential roles in both prokaryotic and vertebrate adaptive immune systems. 7

From the onset, there would have had to be a population of diversified microbes, not just the population of one species of progenitor, but a variety with different genetic make-ups, able to perform Horizontal Gene Transfer (HGT) and recombination. Also, there had to be transposons, viral sequences, plasmids, viruses, mobile genetic elements, parasites, etc. Unless these preconditions were met, the population would go extinct.

Gene regulation
The regulation of genes is essential and performed in all life forms. Genes have to be expressed at the right time, and the cell's machinery has to locate them quickly and precisely. It is often mentioned that genes are analogous to blueprints. A better comparison might be to compare them to books in a library. Each book contains the instructions to make a specific molecular machine, or how to operate the cell. The gene regulatory network is comparable to library software used to find books on the shelf. The regulatory circuitry controls how the cell has to operate, and how to respond and adapt to the surrounding environmental conditions. It activates transcription and represses it when needed, and is responsible for forming phenotypes that best adapt to the surrounding environmental conditions. It controls DNA replication, the partition of nascent chromosomes to form daughter cells, and the repair of DNA, among other essential tasks. Obviously, these functions had to be fully operational when life started, since they are indispensable.

1. Dana K Howe: Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution 2008
2. Eugene V. Koonin: Inevitability of Genetic Parasites 2016 Sep 26
3. Eugene V. Koonin: Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states 04 December 2017
4. Gregory P Fournier: Ancient horizontal gene transfer and the last common ancestors 22 April 2015
5. Aude Bernheim: The pan-immune system of bacteria: antiviral defence as a community resource 06 November 2019
6. Felix Broecker: Evolution of Immune Systems From Viruses and Transposable Elements 29 January 2019
7. Eugene V. Koonin: Evolution of adaptive immunity from transposable elements combined with innate immune systems December 2014
8. Eugene V. Koonin: The LUCA and its complex virome 14 July 2020

https://reasonandscience.catsboard.com

109Perguntas .... - Page 5 Empty Re: Perguntas .... Sat Aug 13, 2022 8:39 am

Otangelo


Admin

The CRISPR-Cas system acts in a sequence-specific manner by recognizing and cleaving foreign DNA or RNA. The defense mechanism can be divided into three stages:

(i) adaptation or spacer acquisition,
(ii) crRNA biogenesis, and
(iii) target interference

(a) Adaptation
In a first phase, a distinct sequence of the invading MGE called a protospacer is incorporated into the CRISPR array yielding a new spacer. This event enables the host organism to memorize the intruder's genetic material and displays the adaptive nature of this immune system. Two proteins, Cas1 and Cas2, seem to be ubiquitously involved in the spacer acquisition process as they can be found in almost all CRISPR-Cas types. Exceptions are the type III-C, III-D and IV CRISPR-Cas systems, which harbour no homologous proteins. Moreover, type V-C shows a minimal composition as it comprises only a putative effector protein termed C2C3 and a Cas1 homologue. The selection of protospacers and their processing before integration remain widely obscure in many CRISPR-Cas types. Recent findings, however, shed light on the biochemistry of the spacer integration process. It has been demonstrated that Cas1 and Cas2 of the type I-E system of Escherichia coli form a complex that promotes the integration of new spacers in a manner that is reminiscent of viral integrases and transposases [10–13]. Although both Cas1 and Cas2 are nucleases [14–16], the catalytically active site of Cas2 is dispensable for spacer acquisition [10–12]. A new spacer is usually incorporated at the leader-repeat boundary of the CRISPR array [1] while the first repeat of the array is duplicated [17,18].
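The integration step described above (new spacer at the leader-repeat boundary, with one repeat duplicated) can be sketched as a small data structure. The class, its field names and the placeholder strings are hypothetical and only illustrate the bookkeeping, not the biochemistry.

```python
from dataclasses import dataclass, field

@dataclass
class CrisprArray:
    """Toy CRISPR array: a leader followed by repeat/spacer units."""
    leader: str
    repeat: str
    spacers: list = field(default_factory=list)  # index 0 is next to the leader

    def acquire(self, protospacer: str) -> None:
        """Insert a new spacer at the leader-repeat boundary."""
        self.spacers.insert(0, protospacer)

    def sequence(self) -> str:
        """Leader, then repeat+spacer per spacer, then the terminal repeat.

        Every acquired spacer adds one more repeat, which models the
        duplication of the first repeat during integration.
        """
        units = "".join(self.repeat + spacer for spacer in self.spacers)
        return self.leader + units + self.repeat

# Placeholder strings stand in for real sequences.
array = CrisprArray(leader="LEADER-", repeat="[R]")
array.acquire("spacer_old")
array.acquire("spacer_new")
```

After two acquisitions the most recent spacer sits closest to the leader and the array holds three repeats for two spacers.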

The mechanisms of the different CRISPR-Cas types might be conserved only to a certain extent as several studies have shown variations regarding the requirements and targets of the adaptation machinery. While Cas1 and Cas2 are sufficient to promote spacer acquisition in most studied type I CRISPR-Cas systems, type I-B further requires Cas4 for adaptation [19]. The type I-F CRISPR-Cas system of Pseudomonas aeruginosa additionally requires the interference machinery to promote the uptake of new spacers [20]. Similarly, type II-A systems require Csn2, Cas9 and tracrRNA (trans activating CRISPR RNA—see further details below) for acquisition [1,21,22]. Another, so far unique, adaptation mode was revealed for a type III-B Cas1 protein that is fused to a reverse transcriptase. Here, acquisition from both DNA and RNA was reported [23].

The selection of a target sequence that is integrated into the CRISPR locus is not random. It has been demonstrated that in type I, II and V CRISPR-Cas systems, a short sequence, called the protospacer adjacent motif (PAM), is located directly next to the protospacer and is crucial for acquisition and interference [24–29]. In type II-A CRISPR-Cas systems, the PAM-recognizing domain of Cas9 is responsible for protospacer selection [21,22]. It is believed that after protospacer selection, Cas9 recruits Cas1, Cas2 and possibly Csn2 for integration of the new spacer into the CRISPR array. This feature may be conserved among all class 2 CRISPR-Cas systems although experimental evidence is missing. For type I-E, the Cas1–Cas2 complex is sufficient for spacer selection and integration although it has been reported that the presence of the interference complex increases the frequency of integrated spacers that are adjacent to a proper PAM [24,25]. Moreover, in a process called priming, the interference machinery of several type I CRISPR-Cas systems can stimulate the increased uptake of new spacers upon crRNA-guided binding to a protospacer that was selected upon a first infection [19,25,30]. This process displays a distinct adaptation mode compared to naive spacer acquisition as it strictly requires a pre-existing spacer matching the target. It usually leads to higher acquisition rates from protospacers that lie in close proximity to the target site [25]. Interestingly, primed spacer acquisition does not depend on target cleavage as it also functions for degenerated target sites that would usually result in impaired interference [31]. The exact mechanism remains obscure but it has been demonstrated that the interference complex can recruit Cas1 and Cas2 during PAM-independent binding to DNA [32].
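PAM-dependent protospacer selection, as described above, amounts to scanning the invader sequence for a short motif and taking the adjacent stretch. A minimal sketch follows; the PAM "AAG", the 8-nucleotide protospacer length and the choice to read the protospacer directly downstream of the PAM on a single strand are illustrative assumptions, and real systems differ in all three.

```python
def find_protospacers(dna: str, pam: str = "AAG", length: int = 8):
    """Return (position, protospacer) pairs lying directly after each PAM.

    The default PAM, the length and the orientation are assumptions made
    for this sketch, not values taken from the text.
    """
    hits = []
    start = dna.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        candidate = dna[proto_start:proto_start + length]
        if len(candidate) == length:  # skip PAMs too close to the end
            hits.append((proto_start, candidate))
        start = dna.find(pam, start + 1)
    return hits

# Hypothetical invader sequence containing two PAMs.
invader = "TTAAGCCGGTTAACCAAGTT"
candidates = find_protospacers(invader)
```

Only the first PAM yields a candidate here, because the second one lies too close to the end of the sequence to be followed by a full-length protospacer.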

https://royalsocietypublishing.org/doi/10.1098/rstb.2015.0496


110Perguntas .... - Page 5 Empty Re: Perguntas .... Wed Aug 17, 2022 8:00 am

Otangelo


Admin

Eugene V. Koonin (2013): CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR-associated genes) systems are able to memorize encounters with an infectious agent and attack it specifically afterwards. 3

Simon A. Jackson (2017): Bacteria and archaea are constantly threatened by phage infection and invasion by mobile genetic elements (MGEs) through conjugation and transformation. In response, microorganisms have a defense arsenal, including various innate mechanisms and the CRISPR-Cas (clustered regularly interspaced short palindromic repeats and associated proteins) adaptive immune systems. CRISPR-Cas systems are widely distributed, occurring in 50 and 87% of complete bacterial and archaeal genomes, respectively. These systems function as RNA-guided nucleases (a nuclease is an enzyme capable of cleaving the phosphodiester bonds between nucleotides of nucleic acids) that provide sequence-specific defense against invading mobile genetic elements. CRISPR adaptation is achieved by incorporating short DNA fragments from mobile genetic elements into CRISPR arrays to form memory units called spacers. Early bioinformatic studies showed that many spacers were of foreign origin, hinting that CRISPR loci may act as a form of memory for a prokaryotic immune system. Subsequent confirmation of the link between spacers and resistance to phage and mobile genetic elements was gained experimentally.

CRISPR-Cas systems are adaptive immune defense systems found in bacteria and archaea. There is a diverse range of CRISPR-Cas systems. These systems are divided into two major classes and six types. Each system consists of two components: 

1. a locus for memory storage (the CRISPR array) and 
2. cas genes that encode the machinery driving immunity.

Information stored within CRISPR arrays is used to direct the sequence-specific destruction of invading genetic elements, including viruses and plasmids. As such, all CRISPR-Cas immune systems are reliant on the formation of CRISPR memories, known as spacers, to facilitate future defense. To form these memories, small fragments of invader nucleic acids are added as spacers to the CRISPR memory banks in a process termed CRISPR adaptation. The genetic basis of immunity means that CRISPR adaptation provides heritable benefits, an attribute that is unparalleled in eukaryotic immune systems. There is widespread evidence of highly active CRISPR adaptation in nature, and it is clear that these systems play important roles in shaping microbial evolution and global ecological networks.

CRISPR adaptation requires several processes, including selection and processing of spacer precursors and their subsequent localization to, and integration into, the CRISPR loci. At the heart of CRISPR adaptation is a protein complex, the Cas1-Cas2 “workhorse,” which catalyzes the addition of new spacers to CRISPR memory banks. Foreign DNA is converted to pre-spacer substrates and captured by the Cas1- Cas2 complex. After this, Cas1-Cas2 locates the genomic CRISPR locus and docks in the appropriate position for insertion of the new spacer into the CRISPR array, while duplicating a CRISPR repeat. The cues directing the docking of substrate-laden Cas1-Cas2 differ between systems, with some relying on intrinsic sequence specificity and others assisted by host proteins. Before integration, accurate processing of the spacer precursors is required to ensure that the new spacers are compatible with the protein machinery in order to elicit CRISPR-Cas defense. For a given CRISPR-Cas system, spacers must typically be of a certain length and be inserted into the CRISPR in a specific orientation. It is becoming increasingly apparent that Cas1-Cas2 complexes from diverse systems are capable of ensuring that these system-specific factors are met with high fidelity. New findings also account for the ordering of stored memories: Typically, the insertion of new spacers is directed to one end of CRISPR arrays, and it has been shown that this enhances immunity against recently encountered invaders. The chronological ordering of new spacers has enabled insights into the temporal dynamics of interactions between hosts and invaders that are constantly changing. Some CRISPR-Cas systems use existing spacers to recognize previously encountered elements and promote the formation of new CRISPR memories, a process known as primed CRISPR adaptation. Viruses and plasmids that have escaped previous CRISPR-Cas defenses through genetic mutations trigger primed CRISPR adaptation. 
Primed CRISPR adaptation is also strongly promoted by recurrent invaders, even in the absence of escape mutations. This has led to previously separate paradigms of invader destruction and primed CRISPR adaptation beginning to converge into a unified model.

CRISPR adaptation is crucial for ensuring both population-level protection through spacer diversity and protection of the host through invader clearance. Thus, despite the relative wealth of mechanistic information about CRISPR adaptation in a few specific types, work in other systems continues to reveal distinct modes of operation for spacer acquisition. Do other systems possess analogous mechanisms that have yet to be discovered, or does the absence of priming in these systems explain the prevalence of type I systems in nature?  

Red Queen CRISPR adaptation 
The ability to keep defenses up to date by acquiring new spacers is central to the success of CRISPR-Cas systems. Typically, new spacers are inserted at a specific end of the CRISPR array, adjacent to a “leader” region that contains conserved sequence motifs. The leader usually also contains the promoter driving CRISPR transcription, and it has been demonstrated that integration of new spacers at the leader end enhances defense against phages and mobile genetic elements encountered recently. This “polarized” addition of spacers into CRISPR loci produces a chronological account of the encounters between phages and bacteria that can provide insights into phage-host co-occurrences, evolution, and ecology. Maintenance of CRISPR-Cas defense is reliant on the addition of new spacers into CRISPR arrays. The continuous competition between host CRISPR adaptation and mobile genetic element escape, akin to Red Queen dynamics, has been exposed in several recent metagenome studies. Individual cells within a prokaryotic community acquire different, and often multiple, spacers during CRISPR adaptation. The diversity of CRISPR loci within cell populations optimizes defense by limiting the reproductive success of mutants that escape the CRISPR-Cas defenses of individual cells. Furthermore, the resulting polymorphisms in CRISPR loci enable fast and accurate differentiation of species subtypes, which may prove to have economic and clinical benefits, for example, enabling tracing of pathogens during outbreaks.
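The point about spacer diversity limiting escape mutants can be illustrated with a small calculation. In this hypothetical sketch each cell is reduced to the set of protospacers its spacers match, and the labels "p1" to "p3" are placeholders rather than real sequences.

```python
def immune_fraction(population_spacers, phage_protospacers):
    """Fraction of cells holding at least one spacer that still matches the phage."""
    immune = sum(1 for spacers in population_spacers
                 if spacers & phage_protospacers)
    return immune / len(population_spacers)

# Hypothetical populations: one with diverse spacers, one where all cells are identical.
diverse_cells = [{"p1"}, {"p2"}, {"p1", "p3"}, {"p3"}]
uniform_cells = [{"p1"}] * 4

wild_type = {"p1", "p2", "p3"}
escape_mutant = wild_type - {"p1"}  # a mutation has destroyed protospacer p1

diverse_vs_mutant = immune_fraction(diverse_cells, escape_mutant)
uniform_vs_mutant = immune_fraction(uniform_cells, escape_mutant)
```

Under these assumptions the escape mutant evades every cell of the uniform population, while three of the four cells in the diverse population still recognize it.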

Origins of CRISPR adaptation 
According to their constituent Cas proteins, CRISPR-Cas systems are classified into two major classes consisting of six types and 19 subtypes (Fig. 1). 

Fig. 1. Target interactions and the PAMs of diverse CRISPR-Cas types. 
Recognition of the invading DNA target by the crRNA-Cas effector complexes of types I, II, and V results in the formation of an RNA-DNA hybrid in which the nontarget DNA strand is displaced. The target strand contains the protospacer (red), which is complementary to the spacer sequence in the crRNA (orange). The protospacer-adjacent motif (PAM, blue) is located at either the 3′ end (types I and V) or the 5′ end (type II) of the protospacer. Types III and VI recognize RNA targets, with type III exhibiting additional transcription-dependent DNA targeting. Some type III systems require an RNA-based PAM (rPAM). Type VI systems exhibit specificity for a protospacer-flanking sequence (PFS) motif, which is analogous to a PAM.

Despite the divergence of CRISPR-Cas systems into several types, the proteins primarily responsible for catalyzing spacer acquisition— namely, Cas1 and Cas2—remain relatively conserved, and the genes encoding these proteins are associated with nearly all CRISPR-Cas systems. Indeed, as long as spacers can be acquired from mobile genetic elements, distinct effector machineries capable of using the information stored in CRISPRs are likely to arise. In addition, considerable progress has been made lately toward elucidating the molecular basis of how, when, and why CRISPR adaptation occurs. 

The molecular basis of CRISPR adaptation 
CRISPR adaptation requires the integration of new spacers into CRISPR loci and duplication of the associated repeat sequences. The Cas1 and Cas2 proteins constitute the “workhorse” of spacer integration. Spacers added to CRISPR arrays must be compatible with the diverse range of type-specific effector complex machinery (Fig. 1). Thus, despite being near ubiquitous among CRISPR-Cas types, Cas1-Cas2 homologs meet the varied requirements for the acquisition of appropriate spacer sequences in different systems. For example, the effector complexes of several CRISPR-Cas types only recognize targets containing a specific sequence adjacent to where the CRISPR RNA (crRNA) base-pairs with the target strand of a mobile genetic element (MGE) (Fig. 1). The crRNA-paired target sequence is termed the protospacer, and the adjacent target-recognition motif is called a protospacer-adjacent motif (PAM). PAM-based target discrimination prevents the unintentional recognition and self-destruction of the CRISPR locus by the crRNA-effector complex, yet canonical PAM sequences vary between and sometimes within systems. The Cas1 subunits form two dimers that are bridged by a central Cas2 dimer (Fig. 2A).

Fig. 2. Cas1-Cas2–mediated spacer acquisition.
(A) The Cas1-Cas2 protein complex loaded with a prespacer substrate (the E. coli type I-E structure is shown; Protein Data Bank ID, 5DQZ). 
(B) The Cas1 PAM sensing site, showing the canonical type I-E PAM (CTT, yellow) residue-specific interactions (a residue from the noncatalytic Cas1 monomer is annotated with an asterisk) and the site of PAM processing (scissors). H, histidine; K, lysine; Q, glutamine; R, arginine; Y, tyrosine. 
(C) A schematic representation of the substrate-loaded Cas1-Cas2 protein complex with the active PAM-sensing site highlighted (light purple) and a partially duplexed DNA prespacer substrate (strands are purple and pink). The ruler mechanism determining spacer length for the E. coli type I-E system uses two conserved tyrosine residues (the “Cas1 wedge;” gray hexagons). 
(D) Spacer integration proceeds as follows: (i) The Cas1-Cas2–prespacer complex binds to the leader (green) and first repeat (black). For type I and type II systems, Cas1-Cas2 docking to the leader-proximal repeat is assisted by integration host factor (IHF) and recognition of the leader-anchoring site (LAS), respectively. (ii) The first nucleophilic attack most likely occurs at the leader-repeat junction and gives rise to a half-site intermediate. (iii) The second nucleophilic attack occurs at the boundary between the repeat and the spacer (orange), resulting in full-site integration. (iv) Host DNA repair enzymes fill the integration site. 
(E) The type I-E repeat is magnified to indicate the inverted repeats within its sequence and highlight the anchoring sites of the molecular rulers that determine the point of integration. nt, nucleotides.

Cas1-Cas2–mediated spacer integration prefers double-stranded DNA (dsDNA) substrates and proceeds through a mechanism resembling retroviral integration. In addition to Cas1-Cas2, at least one CRISPR repeat, part of the leader sequence, and several host factors for repair of the insertion sites (e.g., DNA polymerase) are required. Spacer acquisition involves three main processes:

1. substrate capture,
2. recognition of the CRISPR locus, and
3. integration within the array.

Cas1-Cas2 substrate capture 
During substrate capture, Cas1-Cas2 is loaded with an integration-compatible prespacer, which is thought to be partially duplexed dsDNA. For type I systems, the presence of a canonical PAM within the prespacer substrate increases the affinity for Cas1-Cas2 binding but is not requisite. For the E. coli type I-E Cas1-Cas2–prespacer complex, the ends of the dsDNA prespacer are splayed by tyrosine wedges in each Cas1 dimer, which locks open the DNA branch points while fixing in place a core 23–base pair dsDNA region. The 3′ single-stranded ends of the prespacer extend into active subunits of each corresponding Cas1 dimer (Fig. 2, A to C). The length of new spacers is governed by the fixed distances between the two Cas1 wedges and from the branch points to the integrase sites. Many CRISPR-Cas systems have highly consistent yet system-specific spacer lengths, and it is likely that analogous wedge-based Cas1-Cas2 “molecular rulers” exist in these systems to control prespacer length. However, in some systems, such as type III, the length of spacers found within CRISPR arrays appears more variable, and studies of Cas1-Cas2 structure and function in these systems are lacking.


According to Borja Alonso-Lerma (2022), CRISPR-Cas molecular complexes were present in now-extinct Firmicutes bacterial species 2,600 million years ago; the paper calls them ancient enzymes with extraordinary properties.

The CRISPR-Cas molecular complexes of prokaryotes are very diverse in function and composition, with currently over thirty subtypes categorized in six types and two classes (types I, III and IV in class 1, types II, V and VI in class 2). Most of them serve as defense systems that prevent infection by viruses and other transmissible genetic elements. To defend against foreign nucleic acids, CRISPR loci harbour repeat-intervening DNA fragments (spacers) acquired during the so-called adaptation stage from invading nucleic acid fragments that have attacked the cell lineage in past generations. In many CRISPR-Cas systems, only DNA fragments next to specific short sequences called protospacer adjacent motifs (PAMs) are incorporated as new spacers. Over time, adaptation has created a vast collection of invading DNA spacers that are continuously evolving.

Transcripts from the CRISPR array are processed into small RNA molecules (crRNA), each containing a fragment of a single spacer and part of the repeats. The following steps of the CRISPR mechanism may show substantial variations depending on the system type. In the type II CRISPR-Cas system, associated with the signature Cas9 effector protein (referred to as the CRISPR-Cas9 system), the crRNA molecules bound to an accessory trans-activating crRNA (tracrRNA), serve as cognate guides (gRNAs) for the Cas9 nuclease. After binding to a compatible PAM in the invading dsDNA, Cas9 uses its RuvC and HNH nuclease domains to catalyze a double-strand break (DSB) within PAM-adjacent sequences that match the spacer carried by the crRNA.

The accuracy of the targeting depends on the binding of Cas9 to the gRNA, and the Cas9-gRNA complex subsequently to the PAM and remainder of the target site. This recognition ability is also the basis for the utilization of CRISPR-Cas9 as a gene editor, often linking the crRNA and tracrRNA into a single-guide RNA (sgRNA). The incorporation of invading DNA in the CRISPR locus is an adaptive process that is time-dependent, i.e., over time bacteria will acquire new DNA that will be transferred to new generations. The ability of the organisms to process new DNA for self-protection must have arisen due to evolutionary pressure, which might be responsible for the recognition and cleavage properties of Cas nucleases. For instance, although the absence of the PAM in the spacer-proximal sequence of the type II CRISPR repeats protects the CRISPR locus from targeting, the acquisition of spacers matching the host genome has the potential to induce autoimmunity and diverse mechanisms have been identified that prevent or minimize the deleterious consequences of CRISPR-driven DSB in prokaryotes. However, how nucleases have achieved such abilities, as well as the origin of the system, is a matter of debate.

Frank Hille et al., (2018): Prokaryotic CRISPR-Cas adaptive immune systems store memory of past infections, and upon reinfection, deploy RNA-guided nucleases for sequence-specific silencing of phages and other mobile genetic elements (MGEs), such as plasmids and transposons.

The defining feature of these systems, which are found in 50% and nearly 90% of complete bacterial and archaeal genomes, respectively, is the CRISPR array. This genomic locus is composed of alternating identical repeats and unique spacers. The spacer sequences match plasmids and phage genomes, which was the first hint that CRISPR-Cas might function as a prokaryotic defense mechanism. The function of CRISPR-Cas as an adaptive immune system in which the CRISPR array serves as an archive of previous infections was ultimately demonstrated by the observation that phage challenge of Streptococcus thermophilus stimulates expansion of the CRISPR array by acquisition of phage-derived spacers that immunize against subsequent infection. The ability of the immune system to prevent infection by other mobile genetic elements (MGEs) was demonstrated shortly afterward by showing CRISPR-driven inhibition of plasmid conjugation and transformation in Staphylococcus epidermidis. (Bacterial conjugation is the transfer of genetic material between bacterial cells by direct cell-to-cell contact or by a bridge-like connection between two cells.) Adjacent to the CRISPR array is a series of genes encoding the Cas proteins that drive the three phases of immunity: adaptation, CRISPR RNA (crRNA) biogenesis, and interference. During adaptation, foreign nucleic acids are selected, processed, and integrated into the CRISPR array to provide a memory of infection. Memory is retrieved when the CRISPR array is transcribed to produce a long precursor crRNA (pre-crRNA) that is processed within the repeat sequences to yield mature crRNAs. Upon subsequent infection, the interference machinery is guided by crRNAs to cleave complementary sequences, termed protospacers, in the foreign nucleic acids (Figure 1).

Figure 1: The Three Stages of CRISPR Immunity
During adaptation, the Cas1-Cas2 complex selects a part of the foreign DNA and integrates it into the host’s CRISPR array. In the next stage (crRNA maturation), the CRISPR array is transcribed into a long pre-crRNA that is further processed by Cas proteins or, in some cases, by cellular RNases. In the interference stage, the mature crRNAs guide Cas nucleases to the cognate foreign DNA. The Cas proteins cleave the foreign nucleic acid upon binding of the crRNA to the target sequence. In class 1 systems, the interference machinery is a multi-Cas-protein complex, whereas class 2 systems utilize a single Cas protein for target cleavage

By compromising the selfish, often hostile programs encoded by MGEs, CRISPR-Cas systems protect prokaryotes from succumbing to infection. According to the assortment of cas genes and the nature of the interference complex, CRISPR-Cas systems have been assigned to two classes, which are further subdivided into six types and several subtypes that each possess signature cas genes. Class 1 CRISPR-Cas systems (types I, III, and IV) employ multi-Cas protein complexes for interference, whereas in class 2 systems (types II, V, and VI), interference is accomplished by a single effector protein.

Interference is presented first, followed by the preceding processes that license this final stage of immunity. The paper then turns to the role of these immune systems in the ecology of phage-prokaryote interactions and the strategies that phages have evolved to counter CRISPR-Cas.

Interference: Cleaving DNA and RNA Invaders
Sequence-specific destruction of invading MGEs is the basis for CRISPR-Cas defense. In the final stage of CRISPR-Cas-mediated immunity, mature crRNAs guide the interference machinery to cleave invading nucleic acids. In order to store the genetic information of a parasitic MGE, a part of the foreign DNA must be integrated in the genomic CRISPR locus of the host. This, however, raises an inherent problem for the interference machinery: the sole reliance on sequence complementarity between the crRNA and the target sequence would result in cleavage of the CRISPR array. Hence, nearly all characterized CRISPR-Cas systems (except type III) have an authentication and discrimination mechanism that involves coordinated recognition of a short sequence, called the protospacer adjacent motif (PAM), by both the adaptation and interference machinery. The presence of a PAM proximal to the acquired spacer and targeted protospacer and its absence in the CRISPR array facilitates robust immunity while averting auto-immune targeting of the CRISPR array.
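
The self versus non-self logic in the paragraph above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any particular system: it assumes a type II-style PAM (any nucleotide followed by GG) on the 3′ side of the protospacer, exact matching on a single strand, and made-up repeat and spacer sequences.

```python
REPEAT = "GTTTTAGAGCTA"          # made-up repeat; its first bases are not a PAM
SPACER = "ACGTTGCAATCGGATCCATG"  # made-up 20-nt spacer


def has_pam(flank: str) -> bool:
    # Assumed PAM: any nucleotide followed by two guanines (NGG).
    return len(flank) >= 3 and flank[1:3] == "GG"


def targets(dna: str, spacer: str) -> list:
    """Positions where the spacer sequence occurs AND is followed by a PAM."""
    hits = []
    start = dna.find(spacer)
    while start != -1:
        end = start + len(spacer)
        if has_pam(dna[end:end + 3]):
            hits.append(start)
        start = dna.find(spacer, start + 1)
    return hits


phage_dna = "TTAC" + SPACER + "TGG" + "CATTA"  # protospacer followed by a PAM
host_array = REPEAT + SPACER + REPEAT          # same sequence, flanked by repeat

print(targets(phage_dna, SPACER))   # the phage copy is recognised
print(targets(host_array, SPACER))  # the copy in the array is ignored
```

The same 20 nucleotides are present in both molecules; only the flanking sequence decides whether the match counts as a target, which is the point the quoted text makes about averting auto-immune cleavage of the array.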

Interference in Class 1 CRISPR-Cas Systems

Type I

Type I systems are the most widespread CRISPR-Cas systems and employ a crRNA-bound multiprotein complex termed CRISPR-associated complex for antiviral defense (Cascade) for target recognition, as well as the nuclease Cas3 for target cleavage (Figure 2).

Figure 2: The Interference Pathways of Class 1 CRISPR-Cas Systems
Two general pathways for class 1 interference exist. In the first pathway, exemplified by the type I-E system of E. coli, crRNA bound by Cas6 serves as a scaffold for Cascade assembly. Cascade first recognizes the PAM (yellow) on the invader DNA. R-loop formation is induced when crRNA base pairs with the target strand of the DNA. The presence of the R-loop triggers the recruitment of the endonuclease Cas3, which initiates degradation of the non-target strand. Similar to type I systems, type III systems form multi-Cas-protein complexes for interference (Csm and Cmr for type III-A and type III-B, respectively) using the crRNA as a scaffold. Type III-A is shown here as an example. Unlike in type I, Cas6 of type III is not an integral part of the interference complex. The crRNA within the type III complexes binds to complementary regions in target RNA transcripts. Binding triggers a Cas10-mediated double-strand break within the corresponding template DNA, after which Cas7 (Csm3) cleaves the transcript RNA. Upon target binding, Cas10 also generates cyclic oligoadenylates, which activate the RNase Csm6 to degrade non-specific RNAs.

Cas3 is the hallmark protein of type I systems and is recruited upon target binding by Cascade to cleave the foreign DNA. Although the overall architecture of Cascade is conserved, its composition can vary between different subtypes and homology of the subunits has often been established on the basis of functional similarities rather than sequence similarities. Among the seven subtypes that have been identified to date (I-A to I-F and I-U), the I-E system of Escherichia coli is most thoroughly characterized and has the full complement of subunits that are found in type I systems, thus serving as a model for understanding type I interference. 2



1. Luciano Marraffini: (Ph)ighting phages – how bacteria resist their parasites 2020 Feb 13
2. Frank Hille et al.: The Biology of CRISPR-Cas: Backward and Forward. March 8, 2018
3. Eugene V. Koonin (2013): Comparative genomics of defense systems in archaea and bacteria  2013 Apr; 4



Borja Alonso-Lerma: Evolution of CRISPR-associated Endonucleases as Inferred from Resurrected Proteins (2022)

https://reasonandscience.catsboard.com

111Perguntas .... - Page 5 Empty Re: Perguntas .... Wed Aug 17, 2022 8:02 am

Otangelo


Admin

Eugene V. Koonin (2013): The arms race between viruses and their hosts is arguably the most powerful and relentless driving force in evolution. As a result, numerous extremely diverse and elaborate antiviral defense systems have emerged and occupy a substantial part of the genome, especially in free-living archaea and bacteria. The defense systems of prokaryotes can be classified into two broad groups that differ in their modes of action. The first group includes those defense systems that function on the self–non-self discrimination principle, with DNA usually being the target of the discriminatory recognition; these defense mechanisms can be viewed as prokaryotic immunity. At least three types of defense systems and their derivatives belong to this group. The best characterized of these are the extremely numerous and diverse restriction-modification (R-M) systems that use methylation to label the ‘self’ genomic DNA and recognize and cleave any unmodified ‘non-self’ DNA. Another defense system in this group is DNA phosphorothioation (known as the DND system), which labels DNA by phosphorothioation and destroys unmodified DNA. The R-M and DND systems represent the prokaryotic version of innate immunity.
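
The label-and-cleave principle of R-M systems can be illustrated with a toy Python sketch. It assumes the EcoRI recognition site GAATTC as the example site, represents methylation simply as a set of protected positions, and uses made-up host and phage sequences.

```python
SITE = "GAATTC"  # EcoRI recognition site, used here only as an example


def find_sites(dna: str, site: str = SITE) -> list:
    """Start index of every occurrence of the recognition site."""
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


def methylate(dna: str) -> set:
    """'Self' labelling: the modification enzyme marks every site in host DNA."""
    return set(find_sites(dna))


def restrict(dna: str, methylated: set) -> list:
    """Cut at unmethylated sites only and return the resulting fragments."""
    cuts = [i + 1 for i in find_sites(dna) if i not in methylated]  # cut after the G
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(dna[prev:cut])
        prev = cut
    fragments.append(dna[prev:])
    return fragments


host = "ATGAATTCGGCCGAATTCTA"   # made-up host sequence with two sites
phage = "CCGAATTCAAGT"          # made-up phage sequence with one site

print(restrict(host, methylate(host)))  # host DNA survives intact
print(restrict(phage, set()))           # unmodified phage DNA is cut
```

Unlike CRISPR-Cas, nothing here is learned from the invader: any DNA carrying an unmodified site is cut, which is why the quoted text compares R-M systems to innate immunity.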

Unlike R-M and DND systems, which attack non-self invaders indiscriminately, the CRISPR-Cas (CRISPR-associated genes) systems are able to memorize the encounters with the infectious agents and attack them specifically afterward. Thus, CRISPR-Cas is often viewed as a prokaryotic adaptive immunity system.

The second group of defense systems is generally based on programmed cell death or dormancy induced by infection. Numerous and diverse toxin-antitoxin (TA) systems belong in this category. Depending on the nature of toxins and antitoxins, the TA systems are currently classified into three types: type I, with antisense RNA as antitoxin and a protein, usually a small membrane holin-like protein, as a toxin; type II, in which both toxin and antitoxin are proteins; and type III, in which the RNA antitoxin directly inactivates the protein toxin. Two additional types of TA systems (IV and V) have been recently proposed based on distinct mechanisms of action of the respective antitoxins. In addition to the TA systems, abortive infection (ABI) or phage exclusion systems also often use the mechanism of cell death or dormancy. These systems have not been so far classified in detail, but some of them fit well into the TA systems description. The vast majority of toxins in both TA systems and ABI systems interfere with the translation process, mostly via mRNA or tRNA cleavage. 1

Eugene V. Koonin (2018): All CRISPR–Cas systems employ the same architectural and functional principles, and given the conservation of the principal building blocks, share a common ancestry. 2

1. Eugene V. Koonin (2013): Comparative genomics of defense systems in archaea and bacteria 2013 Apr; 4
2. Eugene V. Koonin: The basic building blocks and evolution of CRISPR–Cas systems 2018 Apr 13.

An updated evolutionary classification of CRISPR-Cas systems.
https://europepmc.org/article/PMC/5426118

Owing to the complexity of the gene composition and genomic architecture of the CRISPR–Cas systems, any single, all-encompassing classification criterion is rendered impractical, and thus a ‘polythetic’ approach based on combined evidence from phylogenetic, comparative genomic and structural analysis was developed.

My comment: In other words, they are unable to characterize a common ancestor from which all CRISPR-Cas systems evolved and diversified.


112Perguntas .... - Page 5 Empty Re: Perguntas .... Wed Aug 17, 2022 8:02 am

Otangelo


Admin

Understanding CRISPR-Cas9
https://www.youtube.com/watch?v=cLMo6DYdJRE

1min 26s: Let's begin by talking about how CRISPR-Cas9 works naturally in bacteria like Streptococcus pyogenes or E. coli or something like that. Essentially, CRISPR-Cas9 in a bacterium acts as an adaptive immune response: it remembers when a virus has infected the cell in the past, and it keeps a little bit of viral DNA and uses it so that, if the same species of virus infects the cell again, it will be able to respond much more quickly and more effectively. A bacteriophage, a kind of virus that infects bacteria, injects its DNA into the bacterial cell like a syringe. The first thing that happens involves a pair of enzymes called Cas1 and Cas2. They are actually two separate enzymes, but they function together; they're joined at the hip, they're always together, and they work in concert. What they will do is cut out a region of the virus's DNA called a protospacer and stick it into a part of the bacterial chromosome that's called a CRISPR array.

In the CRISPR array, there are repeats separated by spacers, and so this protospacer will become a spacer in the CRISPR array. The term proto means ahead of or before, and that's exactly what happens: these enzymes Cas1 and Cas2 identify this and think, well, that would be a suitable spacer, let's turn it into a spacer. So it's a protospacer, and they make it into a spacer. Of course, they don't really think anything; they're enzymes, they don't have brains.

The spacer gets inserted at the five-prime end of the CRISPR array, that is, at the five-prime end of the complementary strand of the CRISPR array. So it gets put there, and then they build a new repeat region afterwards, and you'll notice that there's a repeat after each spacer: spacer, repeat, spacer, repeat, spacer, repeat. Every one of those repeats is exactly the same as all the other repeats, that's why it's called a repeat, and the spacers of course are in between them. The word CRISPR stands for clustered regularly interspaced short palindromic repeats. (A palindrome is a word, number, phrase, or other sequence of characters which reads the same backward as forward, such as madam or racecar.) Clustered means all together, because this CRISPR array is all in one place on the chromosome; all these spacers are together in one cluster. Regularly interspaced refers to the spacers that are regularly placed between these repeats along the CRISPR region. As for the palindromic repeats, the repeat part makes sense, right, because these repeats are repeated, and the palindromic bit simply means that there are regions within those repeats that read the same on both strands in a five-prime to three-prime direction.
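
What "reads the same on both strands in a five-prime to three-prime direction" means can be checked with a short Python sketch. The function names are made up for illustration; GAATTC is used only as a familiar example of such a sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    # A DNA palindrome equals its own reverse complement, so both strands
    # read identically in the 5' to 3' direction.
    return seq == reverse_complement(seq)


print(is_dna_palindrome("GAATTC"))  # both strands read GAATTC
print(is_dna_palindrome("ACGCA"))   # a word-style palindrome, but not a DNA one
```

Note the difference from "madam" or "racecar": a DNA palindrome is not the same letters read backward on one strand, but the same letters read forward on the complementary strand.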

Restriction endonucleases usually identify a restriction site because it's a palindrome, and these repeats are often a site where enzymes can interact. Now, when it comes to Cas1 and Cas2 taking a protospacer and turning it into a spacer, they don't just cut the DNA randomly in any old place; they cut it at a precise location immediately upstream of a protospacer adjacent motif.

In this term, protospacer obviously refers to the protospacer, adjacent means next to, and a motif is an English word that just means a regularly repeated pattern, as you'd see on wallpaper or something like that. Of course, it's quite common in any DNA to find two guanines together, but the protospacer adjacent motif, at least in Streptococcus pyogenes, is any nucleotide at all followed by guanine guanine: two guanines together following after anything, adenine, thymine, cytosine, or another guanine, it doesn't really matter. Any nucleotide followed by two guanines, that is the protospacer adjacent motif, or the PAM as it's called.
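
The "any nucleotide followed by two guanines" rule lends itself to a short Python sketch that scans one DNA strand for NGG motifs and reports the stretch immediately upstream of each. The function name, the fixed length of 20 bases, and the example sequence are assumptions made for illustration.

```python
import re

# Lookahead so that overlapping PAMs (e.g. in "AGGG") are all reported.
PAM_PATTERN = re.compile(r"(?=([ACGT]GG))")


def candidate_protospacers(dna: str, length: int = 20) -> list:
    """Return (protospacer, pam) pairs for every NGG with `length` bases upstream."""
    results = []
    for match in PAM_PATTERN.finditer(dna):
        pam_start = match.start()
        if pam_start >= length:  # need a full-length stretch upstream of the PAM
            results.append((dna[pam_start - length:pam_start], match.group(1)))
    return results


proto = "ACGTACGTACGTACGTACGT"        # made-up 20-base stretch without GG
dna = "TT" + proto + "AGG" + "CA"     # one PAM directly downstream of it

print(candidate_protospacers(dna))
print(candidate_protospacers("ATGGCC"))  # PAM present, but too close to the end
```

Only one strand is scanned here; a fuller treatment would also scan the reverse complement.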

These enzymes Cas1 and Cas2 will scan the DNA looking for a PAM site, and when they find it they'll go upstream, that is, to the five-prime end on the complementary strand or the coding strand, and then they'll cut out a section of bases, probably around 20 to 26 bases long, and turn that into a spacer. They'll insert the spacer at the five-prime end of the CRISPR region and then build a new repeat to the five-prime end of that, pushing the spacer further and further toward the five-prime end. The CRISPR array, then, is flexible: if a particular bacterium has been infected by lots of different viruses, it may have a very long CRISPR array with lots of spacers and repeats. Other bacteria might only have one or a few. In some bacteria, people have discovered hundreds of spacers, and in other species of bacteria there may only be a couple of them, so it's a flexible CRISPR array.

From time to time, RNA polymerase will transcribe that whole CRISPR array into an RNA molecule, but it's not a messenger RNA, because it's not going to go to a ribosome and be translated into a polypeptide. We call it pre-CRISPR RNA (pre-crRNA); it's a single RNA molecule containing both repeat and spacer regions. There is another kind of RNA, called unprocessed tracrRNA, which has been transcribed from another gene somewhere else in the cell: about a quarter of the way around the chromosome there's another gene that's transcribed by RNA polymerase to make this unprocessed tracrRNA. The tracrRNA has regions in it that are complementary to regions in the CRISPR RNA, and so those complementary regions will stick to the repeat regions in the CRISPR RNA.

Then an enzyme called RNase comes along and cuts through those repeat regions, giving a piece of RNA that is made of the RNA from the spacer, the RNA from the repeat, and some of the unprocessed tracrRNA, which we now call tracrRNA. So we have this structure in the cell made really of two pieces of RNA. One molecule of RNA is spacer RNA and repeat RNA, a single polymer of RNA nucleotides, and then we have another polymer of RNA nucleotides, held to it by hydrogen bonds, called tracrRNA. Altogether we can refer to that as crRNA:tracrRNA (CRISPR colon tracr RNA). That molecule gets picked up by an enzyme called Cas9. Cas9 grabs that crRNA:tracrRNA and holds onto it, and now we refer to it as gRNA, or guide RNA. So guide RNA is crRNA:tracrRNA, but it's crRNA:tracrRNA that's attached to a Cas9 enzyme.
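
The transcription and cutting steps just described can be sketched roughly in Python. This is a deliberate simplification: the cut is placed at the midpoint of each repeat purely for illustration, the tracrRNA pairing and loading into Cas9 are left out, and the repeat and spacer sequences are made up.

```python
def transcribe(dna: str) -> str:
    """DNA coding-strand sequence to RNA (T becomes U)."""
    return dna.replace("T", "U")


def process_pre_crrna(pre_crrna: str, repeat_rna: str) -> list:
    """Cut once at the midpoint of every repeat; keep the pieces between cuts."""
    cuts = []
    pos = pre_crrna.find(repeat_rna)
    while pos != -1:
        cuts.append(pos + len(repeat_rna) // 2)
        pos = pre_crrna.find(repeat_rna, pos + 1)
    # Each piece between two neighbouring cuts holds one spacer plus repeat halves.
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


REPEAT = "GTTTTAGA"                 # made-up repeat
SPACERS = ["ACGTACGT", "CATGCATG"]  # made-up spacers

array_dna = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)
pre_crrna = transcribe(array_dna)   # one long transcript of the whole array
crrnas = process_pre_crrna(pre_crrna, transcribe(REPEAT))

print(pre_crrna)
print(crrnas)  # one short RNA per spacer, each carrying part of the repeat
```

Each resulting piece contains exactly one spacer together with part of the flanking repeat, which is the feature the transcript emphasises.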

The Cas9 enzyme itself is made of a single polypeptide chain, but it has six different regions that do different things. We don't need to know what all these regions do, but there are a couple that I think are helpful to have a look at. Firstly, there's the area called the PI, which stands for PAM-interacting domain. That's the part of the protein which identifies the PAM, hence the name: it has a region with a shape and charges complementary to two guanines, and that helps the Cas9 enzyme to identify the GG of a PAM on a stretch of DNA. Two other domains are also really helpful, and by domain we just mean a part of the protein that performs a specific function; it is all one polypeptide. HNH and RuvC are what we call nuclease domains, that is, they're the part of the protein that actually cuts the DNA. A lot of the rest of this structure is there just to hold the guide RNA in place in the Cas9 enzyme.

We have here the CRISPR RNA, made of both the RNA that was transcribed from the spacer region of the DNA and the RNA that was transcribed from the repeat region, all as a single RNA molecule. Held to that RNA molecule by complementary base pairing, we've got a separate polymer of nucleotides, a separate RNA. The way to think about this is that the CRISPR RNA is kind of like the lenses in eyeglasses. It's the lenses that do the job of the glasses; the reason one has glasses in the first place is because of what the lenses do. The purpose of the arms of the glasses is just to hold the lenses to one's face, and that's what the tracrRNA does in the guide RNA: it is really there just to anchor the CRISPR RNA to the Cas9 enzyme. It serves as a scaffold, we say, for the CRISPR RNA, holding it to the enzyme.

You may have also heard of sgRNA, and there's a bit of confusion around that. A lot of people use the terms sgRNA and gRNA interchangeably, as though they're the same thing; they're not exactly the same thing. There's even confusion around what sg actually stands for: some people say that it stands for short guide, other people say it stands for synthetic guide, and they both actually make sense, because it is short and it is synthetic, and by synthetic I mean it's made in a laboratory.


CRISPR-Cas9 in a bacterium acts as an adaptive immune response: it remembers when a virus has infected the cell in the past, and it keeps a little bit of viral DNA and uses it so that, if the same species of virus infects the cell again, it will be able to respond much more quickly and more effectively. A bacteriophage infects bacteria by injecting its DNA into the bacterial cell like a syringe. The first thing that happens involves a pair of enzymes called Cas1 and Cas2. They are two separate enzymes, but they function together; they're joined at the hip and work in concert. They cut out a region of the virus's DNA called a protospacer and stick it into a part of the bacterial chromosome that's called a CRISPR array.

In the CRISPR array, there are repeats separated by spacers, and so this protospacer will become a spacer in the CRISPR array. The term proto means ahead of or before, and that's exactly what happens: the enzymes Cas1 and Cas2 are programmed to identify this as a suitable spacer and turn it into a spacer, so Cas1 and Cas2 turn a protospacer into a spacer. The spacer gets inserted at the five-prime end of the CRISPR array, that is, at the five-prime end of the complementary strand of the CRISPR array, and then they build a new repeat region afterward. There's a repeat after each spacer: spacer, repeat, spacer, repeat, spacer, repeat. Every one of those repeats is exactly the same as all the other repeats, that's why it's called a repeat, and the spacers are in between them. The term CRISPR stands for clustered regularly interspaced short palindromic repeats. (A palindrome is a word, number, phrase, or another sequence of characters which reads the same backward as forward, such as madam or racecar.) Clustered means all together, because this CRISPR array is all in one place on the chromosome. All these spacers are together in one cluster, regularly interspaced.

A restriction endonuclease is an enzyme that cleaves DNA into fragments at or near specific recognition sequences known as restriction sites. Restriction endonucleases usually identify a restriction site because it's a palindrome, and these repeats are often a site where enzymes can interact. Now, when it comes to Cas1 and Cas2 taking a protospacer and turning it into a spacer, they don't just cut the DNA randomly in any old place; they cut it at a precise location immediately upstream of a protospacer adjacent motif. It's quite common in any DNA to find two guanines together, but the protospacer adjacent motif, at least in Streptococcus pyogenes, is any nucleotide at all followed by guanine guanine: two guanines together following after anything, adenine, thymine, cytosine, or another guanine, it doesn't really matter. Any nucleotide followed by two guanines, that is the protospacer adjacent motif, or the PAM as it's called.

The enzymes Cas1 and Cas2 scan the DNA looking for a PAM site. When they find one, they go upstream, that is, toward the five-prime end on the complementary (coding) strand, and cut out a section of bases, probably around 20 to 26 bases long. They turn that section into a spacer, insert it at the five-prime end of the CRISPR region, and then build a new repeat at the five-prime end of that, so that new spacers keep accumulating at the five-prime end of the array. The CRISPR array is flexible. If a particular bacterium has been infected by lots of different viruses, it may have a very long CRISPR array with lots of spacers and repeats. Other bacteria might have only one or a few. In some bacteria, people have found hundreds of spacers, while in other species there may be only a couple, so the length of the CRISPR array is flexible.
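The bookkeeping of that insertion step can be sketched as a list that always receives the newest spacer at the front, with a repeat following every spacer. The leader, repeat, and spacer sequences are invented placeholders, and the function is only an illustration of the ordering, not of the enzymatic mechanism.

```python
def integrate_spacer(leader: str, repeat: str, spacers: list[str], new_spacer: str):
    """Add a new spacer at the leader-proximal end and rebuild the array.

    Returns the updated spacer list (newest first) and the array sequence,
    laid out as: leader, repeat, then (spacer, repeat) for each spacer.
    """
    updated = [new_spacer] + list(spacers)
    array = leader + repeat + "".join(spacer + repeat for spacer in updated)
    return updated, array


# Hypothetical toy sequences.
spacers, array = integrate_spacer("TTTT", "GAATTC", ["ACGTAC"], "TTGCAC")
print(spacers)  # ['TTGCAC', 'ACGTAC']
print(array)    # TTTTGAATTCTTGCACGAATTCACGTACGAATTC
```

Because new spacers always go to the same end, the order of the list doubles as a rough timeline of past infections, newest first.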

From time to time, RNA polymerase transcribes the whole CRISPR array into an RNA molecule. It is not a messenger RNA, because it is not going to go to a ribosome and be translated into a polypeptide; we call it pre-CRISPR RNA (pre-crRNA). It is a single RNA molecule containing both repeat and spacer regions. Another kind of RNA, called unprocessed tracrRNA (trans-activating CRISPR RNA), is then recruited. It has been transcribed by RNA polymerase from another gene somewhere else in the genome, and it has regions that are complementary to the repeat regions of the pre-CRISPR RNA, so it sticks to those repeat regions.

Then an enzyme called RNase III comes along and cuts through those repeat regions, giving a piece of RNA made of the RNA from the spacer, the RNA from the repeat, and some of the unprocessed tracrRNA, which we now simply call tracrRNA. So we have a structure in the cell made of two pieces of RNA. One molecule is the CRISPR RNA (crRNA), a single polymer of RNA nucleotides containing spacer RNA and repeat RNA; the other is the tracrRNA, another polymer of RNA nucleotides held to it by hydrogen bonds. Together we can refer to them as crRNA:tracrRNA. That molecule gets picked up by an enzyme called Cas9. Cas9 grabs the crRNA:tracrRNA and holds onto it, and we now refer to it as gRNA, or guide RNA. So guide RNA is crRNA:tracrRNA that is attached to a Cas9 enzyme.

The Cas9 enzyme itself is made of a single polypeptide chain, but it has six different regions that do different things. We don't need to know what all these regions do, but a couple of them are helpful to look at. First, there is the area called PI, which stands for PAM-interacting domain. That is the part of the protein that identifies the PAM, hence the name. It identifies GG: it has a region whose shape and charges are complementary to two guanines, and that helps the Cas9 enzyme identify GG on a stretch of DNA. Two further domains are also really helpful, where domain means a part of the protein that performs a specific function. It is all one polypeptide, but HNH and RuvC are what we call nuclease domains, that is, they are the parts of the protein that actually cut the DNA. Much of the rest of the structure is there just to hold the guide RNA in place in the Cas9 enzyme.

CRISPR adaptation requires several processes, including

1. Selection and processing of spacer precursors and their subsequent localization to, and integration into, the CRISPR loci. At the heart of CRISPR adaptation is a protein complex, the Cas1-Cas2 “workhorse,” which catalyzes the addition of new spacers to CRISPR memory banks. Foreign DNA is converted to pre-spacer substrates and captured by the Cas1- Cas2 complex. 

2. After this, Cas1-Cas2 locates the genomic CRISPR locus and docks in the appropriate position for insertion of the new spacer into the CRISPR array, 

3. Before integration, accurate processing of the spacer precursors is required to ensure that the new spacers are compatible with the protein machinery in order to elicit CRISPR-Cas defense.

4. Duplicating a CRISPR repeat. The cues directing the docking of substrate-laden Cas1-Cas2 differ between systems, with some relying on intrinsic sequence specificity and others assisted by host proteins. For a given CRISPR-Cas system, spacers must typically be of a certain length and be inserted into the CRISPR in a specific orientation. It is becoming increasingly apparent that Cas1-Cas2 complexes from diverse systems are capable of ensuring that these system-specific factors are met with high fidelity. New findings also account for the ordering of stored memories: Typically, the insertion of new spacers is directed to one end of CRISPR arrays, and it has been shown that this enhances immunity against recently encountered invaders. The chronological ordering of new spacers has enabled insights into the temporal dynamics of interactions between hosts and invaders that are constantly changing.

5. Some CRISPR-Cas systems use existing spacers to recognize previously encountered elements and promote the formation of new CRISPR memories, a process known as primed CRISPR adaptation. Viruses and plasmids that have escaped previous CRISPR-Cas defenses through genetic mutations trigger primed CRISPR adaptation. Primed CRISPR adaptation is also strongly promoted by recurrent invaders, even in the absence of escape mutations. This has led to previously separate paradigms of invader destruction and primed CRISPR adaptation beginning to converge into a unified model.

CRISPR adaptation is crucial for ensuring both population-level protection through spacer diversity and protection of the host through invader clearance. Thus, despite the relative wealth of mechanistic information about CRISPR adaptation in a few specific types, work in other systems continues to reveal distinct modes of operation for spacer acquisition. Do other systems possess analogous mechanisms that have yet to be discovered, or does the absence of priming in these systems explain the prevalence of type I systems in nature?

https://reasonandscience.catsboard.com

113Perguntas .... - Page 5 Empty Re: Perguntas .... Wed Aug 17, 2022 8:05 am

Otangelo


Admin

Introduction 
The CRISPR-Cas systems mediating adaptive immunity against viruses and other forms of foreign DNA (notably plasmids) in archaea and bacteria are encoded by large, complex genomic loci that consist of cassettes of CRISPR repeats which are associated with remarkably diverse clusters of CRISPR-associated (cas) genes. At least 45 distinct protein families have been identified among the products of the cas genes. An analysis involving more sensitive methods of sequence comparison and additional evidence from genomic context has revealed distant homologous relationships between some of these families, paring down the number of distinct protein groups to approximately 25. The CRISPR-Cas loci combine the presence of highly conserved genes and gene blocks with extreme variability of both gene composition and operon architecture. This striking fluidity of the CRISPR-Cas system poses both fundamental and more practical challenges. Explaining the evolution of any complex biological system is a fundamental and traditionally difficult problem in evolutionary biology, starting with Darwin’s scenario for the evolution of the eye. In the case of CRISPR-Cas, the difficulty is exacerbated by the unusually polymorphic, apparently loose arrangement of the system components. 

Classification of the CRISPR-Cas Systems 
The recently developed classification of CRISPR-Cas systems divides them into three distinct types (I, II and III). All these systems include two universal genes: cas1 encoding a metal-dependent DNase with no apparent sequence specificity that could be involved in the integration of the alien DNA (spacer) into CRISPR cassettes, and cas2 encoding a metal-dependent endoribonuclease that also appears to be involved in the spacer acquisition stage. Otherwise, however, the three types of CRISPR-Cas systems substantially differ in their sets of constituent genes, and each is characterized by a unique signature gene. The signature genes for the three types are, respectively, cas3 (a superfamily 2 helicase containing an N-terminal HD superfamily nuclease domain), cas9 (a large protein containing a predicted RuvC-like and HNH nuclease domains) and cas10 (a protein containing a domain homologous to the palm domain of nucleic acid polymerases and nucleotide cyclases). Within these three types, CRISPR-Cas systems have been further classified into subtypes on the basis of several considerations that include distinct signature genes, along with the phylogeny of the universal cas1 gene. The Cas proteins known as RAMPs (Repeat-Associated Mysterious Proteins) are present in several copies in both type I and III systems. Some of the RAMPs have been shown to possess sequence- or structure-specific RNAse activity that is involved in the processing of pre-crRNA transcripts. The crystal structures of several RAMPs have been solved and indicate that they contain one or two domains that display distinct versions of the RNA recognition motif (RRM) also known as ferredoxin fold.

Cas Protein Families

Cas1 and Cas2: Signature Cas Proteins Implicated in Spacer Acquisition 
Two Cas proteins, Cas1 and Cas2, are represented in all CRISPR/Cas systems that are predicted to be functionally active. These proteins are thought to function as the ‘information processing’ module of CRISPR-Cas that is involved in spacer integration (the adaptation stage). The predicted roles of Cas1 and Cas2 in spacer acquisition are in agreement with the observations that these proteins are not involved in the antiviral defense stage of the mechanism when a spacer is already present in the CRISPR array. The cas1 and cas2 genes comprise the cores of the three distinct types of CRISPR-Cas systems. The putative nuclease/integrase Cas1 is the most conserved among all Cas proteins. This protein is widely used as a marker for detection of CRISPR-Cas systems in bacterial and archaeal genomes and for construction of phylogenetic trees that provide a framework for reconstruction of CRISPR-Cas system evolution. Based on the evolutionary conservation of several acidic residues and a histidine, Cas1 has been predicted to possess nuclease activity. The Cas1 protein from Pseudomonas aeruginosa is a metal-dependent nuclease that cleaves ssDNA or dsDNA, generating approximately 80 bp DNA fragments. The conserved amino acid residues of Cas1 line a metal-binding pocket in the α-helical domain of a novel fold.

Cas1 structure and domain fusions. 
The cartoon shows the dimeric structure of Cas1. The catalytic alpha-helical domain is shown in dark violet and the N-terminal domain is shown in green. Catalytic residues are yellow and the metal ion is red

The catalytic domain is connected to the N-terminal, mostly beta-stranded domain by a flexible linker; Cas1 protein forms homodimers. Mutation of metal ion-binding amino acid residues of Cas1 inhibits Cas1-catalyzed DNA degradation. The function of the N-terminal domain is not clear. Similar properties have been reported for the Cas1 protein (YgbT) from E. coli. Additionally, nuclease activity of E. coli Cas1 against branched DNAs including Holliday junctions, replication forks and 5′-flaps has been demonstrated. Furthermore, genome-wide screens have shown that YgbT physically and genetically interacts with key components of DNA repair systems such as recB, recC and ruvB, suggesting a dual role for Cas1 protein in bacterial antivirus immunity and DNA repair. Several conserved fusions of Cas1 with other protein domains have been detected; all the genes encoding Cas1 fusion proteins belong to cas operons. The most common is the fusion of Cas1 with the Cas4 protein, a RecB-like nuclease (PD-(D/E)XK nuclease superfamily) containing a C-terminal three-cysteine cluster. This fusion might indicate a role for Cas4 in spacer acquisition. Several fusions of Cas1 with reverse transcriptase (RT) similarly might be indicative of involvement of RT in the function of some CRISPR-Cas systems. Furthermore, some RTs appear to be involved in a distinct abortive infection mechanism of antivirus defense, suggesting the possibility that CRISPR-Cas systems and the abortive infection mechanism could be functionally linked. The cas2 gene is typically located immediately downstream of the cas1 gene and encodes a small protein of approximately 100 amino acids; in subtype I–F CRISPR-Cas systems, cas2 is fused to the cas3 gene. Based on the conservation of aspartate or asparagine located after the N-terminal β-strand, the Cas2 protein has been predicted to possess nuclease activity. Several Cas2 proteins have been crystallized and studied biochemically.

The HD Domain: A Single Strand-Specific DNAse Required for Interference 
The CRISPR-associated HD nuclease is a component of all Type I and Type III systems. In most Type I systems, the HD domain forms an N-terminal fusion with the Cas3 helicase but in some Type I-A systems it appears as a stand-alone gene (cas3''). A few Type I–C systems contain the HD domain as a C-terminal fusion with Cas3. In a limited number of Type I-E systems the Cas3 protein (HD and helicase domains) is fused to a Cascade subunit (Cse1). In several Type III CRISPR-Cas systems the HD domain is fused to the Cas10 protein. In some of these Cas10-HD fusions, the HD domain shows a circular permutation so that the N-terminal metal-binding histidine is displaced to the extreme C-terminus. However, the HD domain of Cas10d (Subtype I-D) does not show the circular permutation, which makes it similar to the HD domain present in Cas3.

Cascade-Associated Proteins 
Expression and transcript processing is the best-characterized stage of CRISPR-Cas-mediated immunity. It has been shown that the long primary transcript of a CRISPR locus (pre-crRNA) is processed into short crRNAs. Processing of pre-crRNA is catalyzed by endoribonucleases encoded by cas genes that function either as subunits of a Cascade (CRISPR-associated complex for antiviral defense) complex consisting of several Cas proteins, or as stand-alone enzymes, e.g. Cas6 of the archaeon Pyrococcus furiosus. In the latter case, the formation of a multisubunit complex (denoted Cmr complex of Type III-B system) also has been observed. Recently, two additional Cas protein complexes have been characterized. The first one is the Csy complex associated with the Type I–F system from P. aeruginosa, which also includes the CRISPR transcript processing endoribonuclease, Cas6f (Csy4), a homolog of Cas6. The second complex is a(rchaeal)Cascade from S. solfataricus, which corresponds to CRISPR-Cas system Type I-A. Preliminary models of the architectures of these complexes are shown in the figure below.

Cascade complexes models.
The models for four characterized Cascade complexes include Cascade from E. coli, Csy complex for the system Type I–F from P. aeruginosa, aCASCADE from S. solfataricus, and Cmr complex from P. furiosus. For the first three complexes, the observed or inferred stoichiometry of subunits is reflected in the cartoons. The stably associated subunits are shown by solid circles and weakly associated subunits are shown by dashed circles. Three groups of RAMPs (Cas5, Cas6, Cas7) are indicated along with the corresponding gene names. Large subunits are shown by magenta shades and the small subunits by yellow shades

The general features of the Cascade complexes in Type I CRISPR-Cas systems are: (1) multiple subunits of Cas7, apparently involved in binding crRNA; (2) strong association between Cas7 and Cas5 proteins; (3) loose association of Cas6 with Cascade; Cas6 missing in some organisms; (4) loose association between the large (Cse1/CasA) and small subunits (Cse2/CasB) if present.

crRNA Biogenesis
Mature crRNAs are key elements in CRISPR-Cas defense against genome invaders. These short RNAs are composed of unique repeat/spacer sequences that guide the Cas protein(s) to the cognate invading nucleic acids for their destruction. The biogenesis of mature crRNAs involves highly precise processing events. Interestingly, different types of CRISPR-Cas systems have distinct crRNA maturation mechanisms. The CRISPR repeat-spacer array is transcribed as a precursor CRISPR RNA molecule (pre-crRNA) that undergoes one or two maturation steps. In type I CRISPR-Cas systems, pre-crRNA is cleaved within the repeat regions by a specific Cas6-like endoribonuclease that at least in some cases is a subunit of a Cascade complex to yield the mature crRNAs. In type III systems, the standalone endoribonuclease Cas6 processes pre-crRNA by cleavage within the repeats, producing an intermediate molecule that is further trimmed to generate the mature crRNAs. Type II systems have a unique crRNA biogenesis pathway, in which a trans-acting small RNA (encoded by the CRISPR-Cas locus) base pairs with each repeat sequence of the pre-crRNA to form a double-stranded RNA template that is cleaved by the housekeeping endoribonuclease III in the presence of protein Cas9 (Csn1). The generated intermediates are then subjected to further maturation by a yet to be revealed mechanism.


The core components of the CRISPR-Cas defense machinery are the short CRISPR RNAs (crRNAs) that associate with one or more Cas proteins to target and destroy invading nucleic acids. The CRISPR-Cas systems are extremely variable in their Cas gene composition; a recent reevaluation has resulted in a classification with three main CRISPR-Cas types that are further divided into subtypes. Despite the Cas diversification, all systems share a common molecular mechanism for genome silencing in which the mature crRNAs contain a unique invader-derived partial sequence that guides the Cas protein(s) to the cognate invading nucleic acids for their eventual destruction. Critical for the activity of CRISPR-Cas is the maturation of crRNAs from the precursor transcript of the CRISPR repeat-spacer array. 

The biogenesis of mature crRNAs can be divided into three steps. 

In the first step, transcription, a long primary transcript or precursor crRNA (pre-crRNA) is transcribed from a promoter located upstream of the leader preceding the CRISPR repeat-spacer array. 
In the second step, cleavage, the pre-crRNA is cleaved at a specific site within the repeats to yield intermediate crRNAs that consist of the entire spacer sequence flanked by partial repeat sequences. 
In some cases, an additional step, processing, concerns a second nucleolytic processing of the intermediate crRNA that generates the active mature crRNAs. 

The diversification of CRISPR-Cas into various (sub)types together with the large panel of distinct Cas proteins correlates with distinct types of crRNA biogenesis. A common theme among the subtypes is the (unidirectional) transcription of pre-crRNA followed by a first processing event within the repeats. In types I and III, a Cas6-like protein catalyzes this step (Fig. 5.1). 

Comparison of crRNA processing pathways in type I, II, and III systems. 
In the type I-E system, the palindromic repeats in pre-crRNA form hairpin structures that are recognized by the nuclease Cas6e (Cse3), which is an integral subunit of Cascade. After cleavage, the crRNA hairpin remains associated with Cas6e while other subunits bind the 5′ handle and spacer, which is used for the recognition of cognate genetic element sequences. In type II systems, pre-crRNA with unstructured repeats is bound to an RNA species known as tracrRNA that is complementary to the repeat sequence, forming an RNA duplex that is recognized and cleaved by host RNase III in the presence of Cas9 (Csn1) protein. Further processing by unknown nucleases generates mature crRNA. In type III-B systems, crRNA is generated by the Cas6 endonuclease (as mentioned for type I systems). Cas6 binds unstructured pre-crRNA, cleaving within the repeat to generate crRNA with 5′ and 3′ repeat-derived termini. These crRNAs are taken up by archaeal Cascade (homologous to a type I-A system) or alternatively loaded into the Cmr (type III-B) complex, when present. In the latter case, the 3′ repeat-derived sequence is trimmed away by unknown nucleases. The recently described Cas5d endoribonuclease of subtype I-C that also cleaves pre-crRNA within the repeats and assembles in a Cascade-like complex (Nam et al. 2012) is not represented here.

In type II, a trans-acting small RNA directs pre-crRNA dicing by housekeeping endoribonuclease III-mediated cleavage within the repeats in the presence of Cas9 (Csn1) (Fig. above). The processed crRNAs from types I (I-A, I-E, I-F) do not seem to undergo further maturation, whereas types II and III (and possibly some type I subtypes) have a second maturation step to produce the active crRNAs, the distinct components and mechanisms of which are yet to be determined (Fig. above).

crRNA Biogenesis in Type I Systems 
Type I systems are present in both bacteria and archaea. Like all CRISPR-Cas systems, types I are predicted to target mobile genetic sequences. Experimental evidence has been provided for spacer acquisition in Escherichia coli (subtype I-E), and the correlating resistance against plasmid and phage. In Pseudomonas aeruginosa, the system (subtype I-F) is required for inhibition of biofilm formation that depends on an integrated bacteriophage and its role in phage maintenance resistance is yet to be demonstrated. Type I systems are characterized by a Cascade (-like) ribonucleoprotein complex and a nuclease/helicase (Cas3) required for interference. Processing of the pre-crRNA transcript is catalyzed by a Cas6-like metal-independent endoribonuclease that cleaves the repeat sequence at a conserved position 8 nt upstream of the repeat-spacer boundary. The mature crRNAs end up in Cascade where they play the crucial role of guiding the complex to the complementary target DNA. In most type I systems characterized so far, the Cas6-like enzyme is a subunit of a Cascade-like complex, which is distinct from the apparent standalone version of Cas6 that may supply the intermediate or mature crRNAs to different complexes in type III systems. The crRNAs of subtypes I-E and I-F have stable hairpin structures, the functions of which might be to initially expose the cleavage site to the Cas6 catalytic domain, and to subsequently assist in the stable interaction between guide crRNA and Cascade. Following Cas6-mediated cleavage within the repeats, crRNAs of sub-types I-A, I-E, and I-F are not processed any further.
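The cleavage rule described above (one cut in every repeat, 8 nt upstream of the repeat-spacer boundary) can be sketched as simple string slicing. The repeat and spacer sequences are invented placeholders and the function name is hypothetical; the sketch ignores RNA structure and protein binding and only shows why each processed crRNA ends up as an 8-nt repeat-derived 5′ handle, a complete spacer, and the remainder of the next repeat.

```python
def cleave_pre_crrna(repeat: str, spacers: list[str], handle_len: int = 8) -> list[str]:
    """Cut a pre-crRNA once per repeat, `handle_len` nt before the repeat's 3' end.

    Returns only the fragments that contain a spacer; the short pieces left
    over at the two ends of the transcript are discarded.
    """
    pre_crrna = repeat + "".join(spacer + repeat for spacer in spacers)

    cut_sites = []
    position = 0  # start of the current repeat within the transcript
    for spacer in list(spacers) + [""]:  # one repeat before each spacer, plus the last repeat
        cut_sites.append(position + len(repeat) - handle_len)
        position += len(repeat) + len(spacer)

    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Hypothetical 16-nt toy repeat and two toy spacers (RNA alphabet).
REPEAT = "GUUUUAGAGCUAUGCU"
for crrna in cleave_pre_crrna(REPEAT, ["AAAA", "CCCC"]):
    print(crrna)
# GCUAUGCUAAAAGUUUUAGA
# GCUAUGCUCCCCGUUUUAGA
```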

crRNA Biogenesis in Type II Systems 
Type II CRISPR-Cas systems are characterized by a minimal locus with only four genes (cas9, cas1, cas2, and either csn2 or cas4) and the presence of tracrRNA in the vicinity of the cas operon or repeat-spacer array. Type II systems are present in bacteria but have, at this point, never been detected in archaea. The system has been studied mainly in streptococci where the first biological evidence for immunity against both cell death (mediated by lytic phages, Streptococcus thermophilus) and acquisition of virulence genes (mediated by lysogenic bacteriophages, Streptococcus pyogenes) was demonstrated. Type II is also active against plasmid maintenance. In 2011, a study in the Gram-positive human pathogen S. pyogenes revealed a unique crRNA biogenesis pathway characteristic for type II wherein a first processing event is achieved by the coordinated action of three factors: a trans-acting small RNA, the host-encoded RNase III and the Cas9 protein.

crRNA Biogenesis in Type III Systems 
Type III CRISPR-Cas systems are present in both bacteria and archaea. This variant has mainly been studied in the archaeon P. furiosus (subtype III-B). In addition, crRNA biogenesis has recently been investigated in the Gram-positive bacterial pathogen Staphylococcus epidermidis (subtype III-A). In archaeal species, subtype III-B spacers are predicted to target viruses although no in vivo experiment has yet proven the full activity of the system in the limitation of virus propagation. However, targeting of a small RNA, antisense to pre-crRNA, was recently demonstrated in P. furiosus. In S. epidermidis, the subtype III-A was demonstrated to be critical for limiting horizontal dissemination of antibiotic resistance by directly targeting invading conjugative plasmid DNA. The hallmark of crRNA production in type III is the protein Cas6, which is also present in type I. As mentioned above, in type I systems, Cas6-like endoribonucleases are either an integral component of the Cascade complexes (for example Cas6e and Cas6f in E. coli and P. aeruginosa, respectively) or are weakly associated with the complex (for example Cas6 in S. solfataricus aCascade). In contrast, Cas6 of subtype III-B seems to function as a standalone CRISPR repeat RNA-specific endoribonuclease in P. furiosus, S. solfataricus and presumably in other type III systems of many archaea and possibly bacteria. crRNA maturation in type III occurs in two steps. A first processing event involves dicing of pre-crRNA by Cas6-mediated cleavage within the repeats to generate 1X intermediate units that undergo further maturation to produce the active mature crRNAs. Another feature of the CRISPR-Cas type III is the presence of csm and cmr genes encoding repeat-associated mysterious proteins (RAMPs) in subtype III-A and III-B, respectively. The functions of these Cas proteins remain to be clarified although some recent studies have indicated that they may function in crRNA biogenesis and/or targeting of invading nucleic acids (DNA in the case of subtype III-A and RNA in the case of subtype III-B).


114Perguntas .... - Page 5 Empty Re: Perguntas .... Wed Aug 17, 2022 11:14 am

Otangelo


Admin

Adaptation

Cas1 preferentially binds CRISPR DNA in a Cas2-dependent manner, further supporting a direct role in spacer acquisition. It could be speculated that the Cas1-Cas2 complex both transports the spacer material and performs spacer integration, which would explain the need for the many Cas1 subunits in the complex. A few additional factors are known to be required for spacer acquisition: Cas9, Csn2 and tracrRNA in Type II-A and Cas4 in Type I-B. The roles of tracrRNA, Csn2 and Cas4 are unclear but Cas9 probably guides the integration machinery. Host polymerases, ligases and recombination proteins are likely to perform generic steps in the adaptation, as such factors can be found in every host cell. Spacer selection appears guided by certain sequence elements in the target. Analysis of target sequences has revealed a short motif next to the target sequence, called the protospacer adjacent motif (PAM), that is crucial for discrimination between self and non-self. While initially thought to be important only for interference, the PAM also has a role in spacer acquisition. This is supported by the fact that most newly acquired spacers have a PAM next to their protospacer. In the Type II-A system, Cas9 is responsible for identifying the PAM, as mutations that disable Cas9's PAM recognition result in acquisition from protospacers without PAM. In Type I-E, PAM recognition during spacer acquisition may be different, as PAMs appear to be identified by Cas1-Cas2 alone. However, Cascade increases the frequency of correct PAMs for inserted spacers. In Type I-E, spacers are preferentially incorporated from extra-chromosomal elements, which is demonstrated to be a result of a connection between adaptation and replication. In Type II-A, spacer acquisition may not be biased toward extra-chromosomal elements, as cells with nuclease-deficient Cas9 demonstrate unbiased spacer sampling and an increased rate of spacer acquisition in one study.
How are spacers actually integrated into the CRISPR array? Cas1 nuclease activity is required for nicking the CRISPR array in E. coli, and Cas1 is possibly responsible for the integration of the new spacer. An in vitro study with Type I-E Cas1 and Cas2 confirms that the complex can insert DNA fragments into a CRISPR array by a mechanism reminiscent of retroviral integrases and transposases. In both the Type I-E and Type II-A system, it is demonstrated that parts of the leader and one repeat are required for spacer integration. Further, the leader-proximal repeat serves as template for synthesis of the new repeat, probably by a strand separation mechanism. The leader dependence is likely the cause for the observed polar addition of spacers to the CRISPR, although there are reported exceptions. The palindromic nature of many CRISPR repeats is important to determine the position and direction of spacer integration into the array. It is indicated that palindromic repeats form cruciform DNA structures that recruit Cas1 and Cas2, and such structures are known to be a target for Cas1 cleavage. Interestingly, in vitro spacer integration can also be performed at other sequences predicted to form cruciform structures, in the absence of repeats. Taken together, spacer integration is directed both by sequence and structure of the CRISPR. Adaptation has been shown to be coupled to the interference machinery through primed spacer acquisition, which occurs when there is a targeting spacer already present in the CRISPR array. The interference machinery and a pre-existing spacer accelerate the acquisition of subsequent spacers from the same target. Primed spacer acquisition was first described in the Type I-E system in E. coli, but has subsequently been reported for I-B in H. hispanica and I-F in P. atrosepticum, but so far not in any Type II or III system. Priming seems to occur by slightly different processes in the described cases but the exact molecular mechanisms remain unknown. In Type I-F systems, Cas2 is fused to Cas3 [13], further indicating a direct connection between the adaptation and interference processes. Interestingly, spacers with several mismatches that are incapable of providing protection against the target still induce primed spacer acquisition. It should be noted that although Cas9 is required for spacer acquisition in the Type II-A system, this is not an example of primed spacer acquisition as the requirement is not dependent on a pre-existing spacer against the target. The advantages of primed spacer acquisition are obvious: multiple spacers provide increased resistance against invading DNA, and make it more difficult for the target to evolve escape mutants as several sites would need to be changed simultaneously.


115Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Aug 19, 2022 11:38 am

Otangelo


Admin

F. Hille et al. (2018): Prokaryotic CRISPR-Cas adaptive immune systems store memory of past infections, and upon reinfection, deploy RNA-guided nucleases for sequence-specific silencing of phages and other mobile genetic elements (MGEs), such as plasmids and transposons.

The defining feature of these systems, which are found in 50% and nearly 90% of complete bacterial and archaeal genomes, respectively, is the CRISPR array. This genomic locus is composed of alternating identical repeats and unique spacers. The spacer sequences match plasmids and phage genomes, which was the first hint that CRISPR-Cas might function as a prokaryotic defense mechanism. CRISPR-Cas functions as an adaptive immune system in which the CRISPR array serves as an archive of previous infections. Adjacent to the CRISPR array is a series of genes encoding the Cas proteins that drive the three phases of immunity:

1. adaptation, 
2. CRISPR RNA (crRNA) biogenesis, and 
3. interference. 

During adaptation, foreign nucleic acids are selected, processed, and integrated into the CRISPR array to provide a memory of infection. Memory is retrieved when the CRISPR array is transcribed to produce a long precursor crRNA (pre-crRNA) that is processed within the repeat sequences to yield mature crRNAs. Upon subsequent infection, the interference machinery is guided by crRNAs to cleave complementary sequences, termed protospacers, in the foreign nucleic acids. By compromising the selfish, often hostile programs encoded by mobile genetic elements (MGEs), CRISPR-Cas systems protect prokaryotes from succumbing to infection. According to the assortment of cas genes and the nature of the interference complex, CRISPR-Cas systems have been assigned to two classes, which are further subdivided into six types and several subtypes that each possess signature cas genes. Class 1 CRISPR-Cas systems (types I, III, and IV) employ multi-Cas protein complexes for interference, whereas in class 2 systems (types II, V, and VI), interference is accomplished by a single effector protein.
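The two-class scheme described above can be written down as a small lookup table. This is only a sketch of the classification as the text states it; the dictionary layout and the function name `class_of` are illustrative assumptions, not part of any published tool.

```python
# Classification as stated in the text: two classes, six types.
# The data layout and function name are hypothetical, for illustration only.
CRISPR_CLASSES = {
    "class 1": {"types": ("I", "III", "IV"), "effector": "multi-Cas protein complex"},
    "class 2": {"types": ("II", "V", "VI"), "effector": "single effector protein"},
}


def class_of(system_type: str) -> str:
    """Return the class for a type such as 'II' or a subtype such as 'I-E'."""
    base = system_type.split("-")[0].upper()  # 'I-E' -> 'I'
    for name, info in CRISPR_CLASSES.items():
        if base in info["types"]:
            return name
    raise ValueError(f"unknown CRISPR-Cas type: {system_type}")
```

For example, `class_of("I-E")` gives `"class 1"`, and `class_of("II-A")` gives `"class 2"`.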

Interference is presented first, followed by the preceding processes that license this final stage of immunity. Also covered are the role of these immune systems in the ecology of phage-prokaryote interactions and the strategies that phages have evolved to counter CRISPR-Cas.


116 Re: Perguntas .... Thu Aug 25, 2022 9:12 am

Otangelo


Admin

CRISPR adaptation
The adaptation phase provides the genetic memory that is a prerequisite for the subsequent expression and interference phases that neutralize the re-invading nucleic acids. Conceptually, the process can be divided into two steps:

1. Protospacer selection (Cas1-Cas2 substrate capture) and generation of spacer material, followed by
2. Integration of the spacer into the CRISPR array and synthesis of a new repeat.

A bacteriophage infects a bacterium by injecting its DNA into the cell, like a syringe. The first responders are a pair of enzymes called Cas1 and Cas2. They are two separate enzymes, but they are joined in a complex and always work in concert. They cut out a region of the virus's DNA called a protospacer and insert it into the bacterial data bank, a part of the chromosome called a CRISPR array.

In the CRISPR array, there are repeats separated by spacers (small sections of DNA extracted from invading DNA, such as that of a phage), so this protospacer, after processing, will become a spacer in the CRISPR array. The prefix proto means ahead of or before, and that is exactly what happens: the enzymes Cas1 and Cas2 are programmed to identify a section of phage DNA suitable to become a spacer and turn it into one. The spacer gets inserted at the five-prime end of the CRISPR array, on the complementary strand, and the machinery then builds a new repeat region. There is a repeat after each spacer: spacer, repeat, spacer, repeat. Every one of those repeats is exactly the same as all the other repeats, which is why it is called a repeat, and the spacers sit in between them. The term CRISPR stands for clustered regularly interspaced short palindromic repeats. (A palindrome is a word, number, phrase, or other sequence of characters that reads the same backward as forward, such as madam or racecar.) Clustered means all together, because the CRISPR array is all in one place on the chromosome: the repeats and spacers sit together in one regularly interspaced cluster.
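The repeat-spacer layout can be made concrete with a toy model. The sketch below assumes that new spacers are always added at one end of the array and that each insertion brings one more copy of the identical repeat with it; the class name `CrisprArray` and its methods are hypothetical, and the repair enzymes that actually synthesize the new repeat are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy model of a CRISPR array: identical repeats separated by unique spacers."""
    repeat: str                                   # every repeat has this same sequence
    spacers: list = field(default_factory=list)   # index 0 = most recently acquired

    def integrate(self, spacer: str) -> None:
        """Add a new spacer at the insertion end; older spacers shift one place back."""
        self.spacers.insert(0, spacer)

    def sequence(self) -> str:
        """Lay the locus out as repeat, spacer, repeat, ..., repeat."""
        parts = [self.repeat]
        for spacer in self.spacers:
            parts += [spacer, self.repeat]
        return "".join(parts)
```

With a toy repeat `"GTTTTAGA"`, integrating `"aaaa"` and then `"cccc"` lays the locus out as repeat, `cccc`, repeat, `aaaa`, repeat: the newest spacer sits closest to the insertion end, and there is always one more repeat than there are spacers.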

A restriction endonuclease is an enzyme that cleaves DNA into fragments at or near a specific recognition site. Restriction endonucleases usually identify a restriction site because it is a palindrome. These repeats are often a site where enzymes can interact. When Cas1 and Cas2 take a protospacer and turn it into a spacer, they do not cut the DNA randomly at any place. They cut at a precise location next to the protospacer, marked by a short motif. It is quite common in any DNA to find two guanines together, and the protospacer adjacent motif, at least in Streptococcus pyogenes, is any nucleotide at all followed by guanine-guanine. The first nucleotide can be adenine, thymine, cytosine, or another guanine; it does not matter. Any nucleotide followed by two guanines is the protospacer adjacent motif, or PAM as it is called.

The enzymes Cas1 and Cas2 scan the DNA looking for a PAM site. When they find one, they go upstream, that is, toward the five-prime end on the complementary (coding) strand, cut out a section of around 20 to 26 bases, and turn it into a spacer. They insert it at the five-prime end of the CRISPR region and then build a new repeat at the five-prime end of that, so that older spacers are pushed further and further away from the insertion site. The CRISPR array is flexible. If a particular bacterium has been infected by many different phages, it may have a very long CRISPR array with lots of spacers and repeats. Other bacteria might have only one or a few. In some bacteria, hundreds of spacers have been found, while in other species there may be only a couple. 13
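The scan-for-PAM-then-cut-upstream idea can be sketched in a few lines. This is a simplified illustration, not a model of the enzymes: it looks only at one strand, uses the NGG motif mentioned above for Streptococcus pyogenes, and takes a fixed 20 bases (an assumed default from the 20 to 26 range in the text). The function name `find_protospacers` is hypothetical.

```python
import re


def find_protospacers(dna: str, length: int = 20):
    """Yield (protospacer, pam) pairs for every NGG motif on the given strand.

    The protospacer is taken as the `length` bases immediately 5' of the PAM.
    Motifs too close to the start of the sequence are skipped.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping motifs (e.g. inside "GGGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start >= length:
            yield dna[pam_start - length:pam_start], match.group(1)
```

Calling `list(find_protospacers(phage_dna))` on a phage sequence returns every candidate on that strand; a fuller sketch would also scan the reverse complement.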

Dipali G Sashital (2019): The key proteins in collecting and storing the virus DNA are called Cas1, Cas2 and Cas4. Previous work suggests that Cas4 is important for cutting suitable lengths of DNA for storage. The adaptation proteins Cas1 and Cas2 are conserved among most CRISPR systems, suggesting a common molecular mechanism for acquiring spacers. Cas1 and Cas2 catalyze spacer integration via two transesterification reactions mediated by nucleophilic attack on each strand of a double-stranded prespacer substrate at the phosphodiester backbone within the CRISPR array. Integration occurs at the first repeat in the CRISPR array, with one attack occurring between the upstream leader sequence and the repeat and the other occurring on the opposite strand between the repeat and first spacer within the array. These reactions result in the insertion of the prespacer between two single-strand repeats, and this gapped intermediate is repaired by host factors. In order to form a functional spacer, the adaptation complex must capture and process longer fragments of DNA from the invader containing a flanking sequence called a protospacer adjacent motif (PAM). The PAM is an essential motif during target recognition by the surveillance complex and must be present next to the target in order for interference to occur. However, the PAM is not part of the spacer and must be removed from the prespacer prior to integration through a processing step. In addition, integration must occur in the correct orientation to produce a crRNA that is complementary to the PAM-containing strand of the invader. In some systems, additional Cas proteins, such as Cas4, are also required during adaptation. Cas4 is widespread in type I, II, and V systems. In in vivo studies, deletion of cas4 reduced the adaptation efficiency and resulted in the acquisition of non-functional spacers from regions that lacked a correct PAM. 
Some systems have two cas4 genes that work together to define the PAM, length and orientation of spacers, suggesting that the two Cas4 proteins are involved in processing each end of the prespacer and that they may be present during integration. Similarly, in vitro studies have suggested that Cas4 is involved in PAM-dependent prespacer processing. Cas4 endonucleolytically cleaves PAM-containing 3ʹ-single-stranded overhangs that flank double-stranded prespacers. Importantly, Cas4 cleavage activity is dependent on the presence of Cas1 and Cas2, and Cas4 inhibits premature integration of unprocessed prespacers. These observations suggest that Cas4 associates with the Cas1-Cas2 complex, although direct biochemical and structural evidence for this Cas4-Cas1-Cas2 complex remains elusive. 14

Simon A. Jackson (2017): The Cas1 and Cas2 proteins constitute the “workhorse” of spacer integration. Spacers added to CRISPR arrays must be compatible with the diverse range of type-specific effector complex machinery. Thus, despite being near ubiquitous among CRISPR-Cas types, Cas1-Cas2 homologs meet the varied requirements for the acquisition of appropriate spacer sequences in different systems. For example, the effector complexes of several CRISPR-Cas types only recognize targets containing a specific sequence adjacent to where the CRISPR RNA (crRNA) base-pairs with the target strand of a mobile genetic element (MGE). The crRNA-paired target sequence is termed the protospacer, and the adjacent target-recognition motif is called a protospacer-adjacent motif (PAM). PAM-based target discrimination prevents the unintentional recognition and self-destruction of the CRISPR locus by the crRNA-effector complex, yet canonical PAM sequences vary between and sometimes within systems. The Cas1 subunits form two dimers that are bridged by a central Cas2 dimer. In addition to Cas1-Cas2, at least one CRISPR repeat, part of the leader sequence, and several host factors for repair of the insertion sites (e.g., DNA polymerase) are required.

Cas1-Cas2 substrate capture
During substrate capture, Cas1-Cas2 is loaded with an integration-compatible prespacer, which is thought to be partially duplexed dsDNA. For type I systems, the presence of a canonical PAM within the prespacer substrate increases the affinity for Cas1-Cas2 binding but is not requisite. The 3′ single-stranded ends of the prespacer extend into active subunits of each corresponding Cas1 dimer. The length of new spacers is governed by the fixed distances between the two Cas1 wedges and from the branch points to the integrase sites. Many CRISPR-Cas systems have highly consistent yet system-specific spacer lengths, and it is likely that analogous wedge-based Cas1-Cas2 “molecular rulers” exist in these systems to control prespacer length. However, in some systems, such as type III, the length of spacers found within CRISPR arrays appears more variable, and studies of Cas1-Cas2 structure and function in these systems are lacking.

Recognition of the CRISPR array 
Before integration, the substrate-bound Cas1- Cas2 complex must locate the CRISPR leader repeat sequence. Specific sequences upstream of CRISPR arrays direct leader-polarized spacer integration, both through direct Cas1-Cas2 recognition and assisted by host proteins. The Cas1-Cas2 complexes of several systems show an intrinsic affinity for the leader-repeat region in vitro, yet this is not always wholly sufficient to provide the specificity observed in vivo. It was recently discovered that for the type I-E system, leader-repeat recognition is assisted by the integration host factor (IHF) heterodimer. IHF binds the CRISPR leader in a sequence-specific manner and induces 120° DNA bending, providing a cue to accurately localize Cas1-Cas2 to the leader-repeat junction. A conserved sequence motif upstream of the IHF pivot is proposed to stabilize the Cas1- Cas2–leader-repeat interaction and increase the efficiency of spacer acquisition, supporting binding of the adaptation complex to DNA sites on either side of the bound IHF. IHF is absent in many prokaryotes, including archaea, indicating that other leader-proximal integration mechanisms exist. Indeed, type II-A Cas1-Cas2 from Streptococcus pyogenes catalyzed leader-proximal integration in vitro at a level of precision comparable to that of the type I-E system with IHF. In type II systems, a short leader-anchoring site (LAS) adjacent to the first repeat and ≤6 base pairs of this repeat are essential for CRISPR adaptation and are conserved in systems with similar repeats. Placement of an additional LAS in front of a nonleader repeat resulted in the integration of spacers at both sites, whereas LAS deletion caused ectopic integration at a downstream repeat adjacent to a spacer containing a LAS-like sequence (15). Hence, in contrast to type I-E systems, type II-A systems appear to rely solely on intrinsic sequence specificity for the leader-repeat junction.

Integration into the CRISPR array 
For CRISPR-Cas types that are reliant on PAM sequences for recognition of targets, the acquisition of interference-proficient spacers requires the processing of the prespacer substrate at a specific position relative to the PAM. Each of the four Cas1 monomers in the Cas1-Cas2 complex contains a PAM-sensing domain. The presence of a PAM in the active site of just one of the Cas1 monomers is sufficient to appropriately position the substrate and PAM relative to the cleavage site. Furthermore, the presence of a PAM within the prespacer substrate ensures integration into the CRISPR in the correct orientation. This directional fidelity is critical because otherwise the PAM in the MGE target would lie at the wrong end of the crRNA target binding site, thus precluding target recognition. To avoid premature loss of the PAM directional cue, processing of the prespacer likely occurs after Cas1-Cas2 orients and docks at the leader-proximal repeat. Cas1-mediated processing of the prespacer creates two 3′OH ends required for nucleophilic attack on each strand of the leader-proximal repeat. The initial nucleophilic attack most likely occurs at the leader-repeat junction and forms a half-site intermediate; then, a second attack at the existing repeat-spacer junction generates the full-site integration product. After the first nucleophilic attack, the intrinsic sequence specificity of the Cas1-Cas2 complex defines the site of the second attack and ensures accurate repeat duplication. CRISPR repeats are often semi-palindromic, containing two short inverted repeat (IR) elements, but the location of these can vary. In type I-B and I-E systems, the IRs occur close to the center of the repeat and are important for spacer acquisition. In the type I-E system, both IRs act as anchors for the Cas1-Cas2 complex, which contains two molecular rulers to position the Cas1 active site for the second nucleophilic attack at the repeat-spacer boundary. 
However, in the type I-B system from Haloarcula hispanica, only the first IR is essential for integration, and a single molecular ruler, directed by an anchor between the IRs, has been proposed. In the type II-A systems of Streptococcus thermophilus and S. pyogenes, the IRs are located distally within the repeats, suggesting that these short sequences may directly position the nucleophilic attacks without a need for molecular rulers. Although these recent findings suggest that leader-repeat regions at the beginning of CRISPR arrays contain sequences to ensure appropriate Cas1-Cas2 localization, further work is required to determine how the spacer integration events are specifically orchestrated in the diverse range of CRISPR-Cas types.

Production of prespacers from foreign DNA 
Despite the elegance of memory-directed defense, CRISPR adaptation is not without complications. For example, the inadvertent acquisition of spacers from host DNA must be avoided because this will result in cytotoxic self-targeting, akin to autoimmunity in eukaryotic adaptive immune systems. Therefore, production of prespacer substrates from MGEs should outweigh production from host DNA.

Naïve CRISPR adaptation 
Acquisition of spacers from MGEs that are not already cataloged in host CRISPRs is termed naïve CRISPR adaptation. For naïve CRISPR adaptation, prespacer substrates are generated from foreign material and loaded onto Cas1-Cas2. The main known source of these precursors is the host RecBCD complex. Stalled replication forks that occur during DNA replication can result in double-strand breaks (DSBs), which are repaired through RecBCD-mediated unwinding and degradation of the dsDNA ends back to the nearest Chi sites (In Escherichia coli, acquisition of new spacers largely depends on RecBCD-mediated processing of double-stranded DNA breaks occurring primarily at replication forks, and that the preference for foreign DNA is achieved through the higher density of Chi sites on the self chromosome, in combination with the higher number of forks on the foreign DNA. This explains the strong preference to acquire spacers both from high copy plasmids and from phages). During this repair process, RecBCD produces single-stranded DNA (ssDNA) fragments, which have been proposed to subsequently anneal to form partially duplexed prespacer substrates for Cas1-Cas2. The greater number of active origins of replication and the paucity of Chi sites on MGEs, compared with the host chromosome, bias naïve adaptation toward foreign DNA. Furthermore, RecBCD recognizes the unprotected dsDNA ends that are commonly present in phage genomes upon injection or before packaging, which theoretically provides an additional phage-specific source of naïve prespacer substrates. Despite the role of RecBCD in substrate generation, naïve CRISPR adaptation can occur in its absence, albeit with reduced bias toward foreign DNA. Thus, events other than double-strand breaks (DSBs) might also stimulate naïve CRISPR adaptation, such as R-loops that occur during plasmid replication, lagging ends of incoming conjugative elements, and even CRISPR-Cas–mediated spacer integration events themselves. 
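The Chi-site argument is about density: fewer Chi sites per unit length means RecBCD degrades more DNA before stopping. A rough way to compare two sequences is to count Chi motifs per kilobase. The sketch below assumes the E. coli Chi sequence 5'-GCTGGTGG-3' (a fact not stated in the text above) and counts on the given strand only, ignoring the orientation dependence of Chi recognition. The function name `chi_sites_per_kb` is hypothetical.

```python
import re

# E. coli Chi site; assumed here, it is not given in the passage above.
CHI_SITE = "GCTGGTGG"


def chi_sites_per_kb(seq: str) -> float:
    """Return the number of Chi motifs per 1000 bases on the given strand."""
    if not seq:
        raise ValueError("sequence must not be empty")
    # Lookahead so that overlapping occurrences are not missed.
    count = len(re.findall(f"(?={CHI_SITE})", seq.upper()))
    return 1000 * count / len(seq)
```

Comparing `chi_sites_per_kb(host_chromosome)` with `chi_sites_per_kb(plasmid)` would show the kind of difference the passage describes, provided real sequences are supplied.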
Furthermore, we do not know whether all CRISPR-Cas systems have an intrinsic bias toward production of prespacers from foreign DNA. In high-throughput studies of native systems, the frequency of acquisition of spacers from host genomes is likely to be underestimated, because the autoimmunity resulting from self-targeting spacers means that these genotypes are typically lethal. For example, in the S. thermophilus type II-A system, spacer acquisition appears biased toward MGEs, yet nuclease-deficient Cas9 fails to discriminate between host and foreign DNA. It is unknown whether CRISPR adaptation in type II systems is reliant on DNA break repair. Further studies in a range of host systems are required to clarify how diverse CRISPR-Cas systems balance the requirement for naïve production of prespacers from MGEs against the risk of acquiring spacers from host DNA.

crRNA-directed CRISPR adaptation (priming) 
Mutations in the target PAM or protospacer sequences can abrogate immunity, allowing MGEs to escape CRISPR-Cas defenses. Furthermore, the protection conferred by individual spacers varies: Often, several MGE-specific spacers are required to mount an effective defense and to prevent proliferation of escape mutants. Thus, to maintain effective immunity, CRISPR-Cas systems need to undergo CRISPR adaptation faster than MGEs can evade targeting. Indeed, type I systems have a mechanism known as primed CRISPR adaptation (or priming) to facilitate rapid spacer acquisition, even against highly divergent invaders. Priming uses MGE target recognition that is facilitated by preexisting spacers to trigger the acquisition of additional spacers from previously encountered elements. Thus, priming is advantageous when MGE replication within the host cell exceeds defense capabilities. This can occur when cells are infected by mobile genetic element escape mutants or when the levels of CRISPR-Cas activity are insufficient to provide complete immunity using only the existing spacers, even in the absence of MGE escape mutations. Priming begins with target recognition by crRNA-effector complexes.  Therefore, factors that influence target recognition (i.e., the formation and stability of the crRNA-DNA hybrid), including PAM sensing and crRNA-target complementarity, affect the efficiency of primed CRISPR adaptation. Furthermore, these same factors can induce conformational rearrangements in the target-bound crRNA-effector complex that result in favoring either the interference or priming pathways. In type I-E systems, the Cas8e (Cse1) subunit of Cascade can adopt one of two conformational modes, which may promote either direct or Cas1-Cas2–stimulated recruitment of the effector Cas3 nuclease. Cas3, which is found in all type I systems, exhibits 3′ to 5′ helicase and endonuclease activity that nicks, unwinds, and degrades target DNA. 
In vitro activity of the type I-E Cas3 produces ssDNA fragments of ~30 to 100 nucleotides that are enriched for PAMs in their 3′ ends and that anneal to provide partially duplexed prespacer substrates. The spatial positioning of Cas1-Cas2 during primed substrate generation has not been clearly established, although Cas1-Cas2–facilitated recruitment of Cas3 would imply that the CRISPR adaptation machinery is localized close to the site of prespacer production. In type I-F systems, Cas3 is fused to the C terminus of Cas2 (Cas2-3), so these systems form Cas1–Cas2-3 complexes that couple the CRISPR adaptation machinery directly to the source of prespacer generation during priming. Despite different target recognition modes favoring distinct Cas3 recruitment routes, primed CRISPR adaptation can be provoked by mobile genetic element escape mutants and non-escape (interference proficient) targets. However, when the intracellular copy number influences of the MGE are excluded, interference-proficient targets promote greater spacer acquisition than escape mutants. This forms a positive feedback loop, reinforcing immunity against recurrent threats even in the absence of escapees. If the copy number of the MGE within the host cell is factored in, then escape mutants actually trigger more spacer acquisition. This is because interference rapidly clears targeted MGEs from the cell, whereas escape mutants that evade immediate clearance by existing CRISPR-Cas immunity persist for longer. Over time, the prolonged presence of the escape MGE, combined with the priming-centric CRISPR-Cas target recognition mode, results in higher net production of prespacer substrates and spacer integration. 
Because priming is initiated by site-specific target recognition (i.e., targeting a priming protospacer), Cas1-Cas2–compatible prespacers are subsequently produced from MGEs with locational biases. However, priming is stimulated more strongly from the interference-proficient protospacer than from the original priming protospacer. 15

Cas protein–assisted production of spacers
DNA breaks induced by interference activity of class 2 CRISPR-Cas effector complexes could trigger host DNA repair mechanisms (e.g., RecBCD), thereby providing substrates for Cas1- Cas2. In agreement with a model for DNA break–stimulated enhancement of CRISPR adaptation, restriction enzyme activity can stimulate RecBCD-facilitated production of prespacer substrates. RecBCD activity may also partially account for the enhanced CRISPR adaptation observed during phage infection of a host possessing an innate restriction-modification defense system. Whether the enhanced CRISPR adaptation was RecBCD-dependent in this example is unknown. In a CRISPR Cas–induced DNA break model, the production of prespacer substrates is preceded by a sequence-specific target recognition. Although direct evidence to support this concept is lacking, CRISPR adaptation in type II-A systems requires Cas1-Cas2, Cas9, a transactivating crRNA (tracrRNA; a cofactor for crRNA processing and interference in type II systems), and Csn2. The PAM-sensing domain of Cas9 enhances the acquisition of spacers with interference-proficient PAMs. However, Cas9 nuclease activity is dispensable, and existing spacers are not strictly necessary, suggesting that the PAM interactions of Cas9 could be sufficient to select appropriate new spacers. Some Cas9 variants can also function with non-CRISPR RNAs and tracrRNA. This raises the possibility that host or MGE-derived RNAs might direct promiscuous Cas9 activity, resulting in DNA breaks or replication fork stalling that could potentially result in prespacer generation.

Roles of accessory Cas proteins in CRISPR adaptation
Although Cas1 and Cas2 play a central role in CRISPR adaptation, type-specific variations in cas gene clusters occur. In many systems, Cas1-Cas2 is assisted by accessory Cas proteins, which are often mutually exclusive and type-specific. For example, in the S. thermophilus type II-A system, deletion of csn2 impaired the acquisition of spacers from invading phages. Direct interaction between Cas1 and Csn2 also suggests a role for Csn2 in conjunction with the spacer acquisition machinery. Csn2 multimers cooperatively bind to the free ends of linear dsDNA and can translocate by rotation-coupled movement. Given that substrate-loaded type II-A Cas1-Cas2 is capable of full-site spacer integration in vitro, Csn2 may be required for prespacer substrate production, selection, or processing. Potentially, Csn2 binding to the free ends of dsDNA provides a cue for nucleases to assist in prespacer generation. Cas4, another ring-forming accessory protein, is found in type I, II-B, and V systems. Confirming its role in CRISPR adaptation, Cas4 is necessary for type I-B priming in H. hispanica and interacts with a Cas1-Cas2 fusion protein in the Thermoproteus tenax type I-A system. Fusions between Cas4 and Cas1 are found in several systems, which indicates a functional association with the spacer acquisition machinery. Cas4 contains a RecB-like domain and four conserved cysteine residues, which are presumably involved in the coordination of an iron-sulfur cluster. However, Cas4 proteins appear to be functionally diverse, with some possessing uni or bidirectional exonuclease activity, whereas others exhibit ssDNA endonuclease activity and unwinding activity on dsDNA. Because of its nuclease activity, Cas4 is hypothesized to be involved in prespacer generation. In type III systems, spacers complementary to RNA transcribed from MGEs are required for immunity. 
Some bacterial type III systems contain fusions of Cas1 with reverse transcriptase domains (RTs) that provide a mechanism to integrate spacers from RNA substrates. The RT-Cas1 fusion from M. mediterranea can integrate RNA precursors into an array, which are subsequently reverse-transcribed to generate DNA spacers. However, integration of DNA-derived spacers also occurs, indicating that the RNA derived–spacer route is not exclusive. Hence, the combined integrase and reverse transcriptase activity of RT-Cas1–Cas2 enhances CRISPR adaptation against highly transcribed DNA MGEs and potentially against RNA-based invaders. Other host proteins may also be necessary for prespacer substrate production. For example, RecG is required for efficient primed CRISPR adaptation in type I-E and I-F systems, but its precise role remains speculative. Additionally, it is still enigmatic why some CRISPR-Cas systems require accessory proteins, whereas closely related types do not. For example, type II-C systems lack cas4 and csn2, which assist CRISPR adaptation in type II-A and II-B systems, respectively. These type-specific differences exemplify the diversity that has arisen.

The genesis of adaptive immunity in prokaryotes 
Casposons are transposon-like elements typified by the presence of Cas1 homologs, or casposases, which catalyze site-specific DNA integration and result in the duplication of repeat sites, analogously to spacer acquisition. It is possible that ancestral innate defenses gained DNA integration functionality from casposases, thus seeding the genesis of prokaryotic adaptive immunity. The innate ancestor remains unidentified but is likely to be a nuclease-based system. Co-occurrence of casposon-derived terminal IRs and casposases in the absence of full casposons might represent an intermediate of the signature CRISPR repeat-spacer-repeat structures. However, the evolutionary journey from the innate immunity– casposase hybrid to full adaptive immunity is unclear. Evolution of diverse CRISPR-Cas types would have required stringent coevolution of the Cas1-Cas2 spacer acquisition machinery, PAM and leader-repeat sequences, crRNA processing mechanisms, and effector complexes. In some systems, mechanisms to enhance the production of Cas1-Cas2–compatible prespacers from MGEs, such as priming, might have arisen because naïve CRISPR adaptation is an inefficient process with a high probability of acquiring spacers from host DNA. However, it was recently shown that promiscuous binding of crRNA-effector complexes to the host genome results in a basal level of lethal “self-priming” in a type I-F system. Host CRISPR and cas gene regulation mechanisms might have arisen to balance the likelihood of self-acquisition events against the requirement to adapt to new threats—for example, when the risk of phage infection or horizontal gene transfer is high. Alternatively, it has been proposed that selective acquisition of self-targeting spacers could provide benefits, such as invoking altruistic cell death, facilitating rapid genome evolution, regulating host processes, or even preventing the uptake of other CRISPR-Cas systems. 15

Interference: Cleaving DNA and RNA Invaders
Sequence-specific destruction of invading MGEs is the basis for CRISPR-Cas defense. In the final stage of CRISPR-Cas-mediated immunity, mature crRNAs guide the interference machinery to cleave invading nucleic acids. In order to store the genetic information of a parasitic MGE, a part of the foreign DNA must be integrated in the genomic CRISPR locus of the host. This, however, raises an inherent problem for the interference machinery: the sole reliance on sequence complementarity between the crRNA and the target sequence would result in cleavage of the CRISPR array. Hence, nearly all characterized CRISPR-Cas systems (except type III) have an authentication and discrimination mechanism that involves coordinated recognition of a short sequence, called the protospacer adjacent motif (PAM), by both the adaptation and interference machinery. The presence of a PAM proximal to the acquired spacer and targeted protospacer and its absence in the CRISPR array facilitates robust immunity while averting auto-immune targeting of the CRISPR array. 13
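The self versus non-self logic can be illustrated with a minimal check. This is a simplification under stated assumptions: crRNA base-pairing is modeled as an exact string match on one strand, the PAM is the NGG motif discussed earlier for Streptococcus pyogenes, and the repeat sequence in the usage note is a toy value. The function name `should_cleave` is hypothetical.

```python
import re


def should_cleave(target: str, pos: int, spacer: str, pam_regex: str = r"[ACGT]GG") -> bool:
    """Return True if `target` holds the spacer sequence at `pos`, followed by a PAM.

    A matching sequence without a PAM (as in the host's own CRISPR array,
    where a repeat follows the spacer) is left alone.
    """
    end = pos + len(spacer)
    if target[pos:end] != spacer:
        return False
    return re.fullmatch(pam_regex, target[end:end + 3]) is not None
```

With `spacer = "ATGCATGCATGCATGCATGC"`, a phage sequence `spacer + "TGGACGT"` passes the check at position 0, while a toy array `"GTTTTAGA" + spacer + "GTTTTAGA"` fails at position 8 because the spacer is followed by the repeat rather than by a PAM.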


117 Re: Perguntas .... Fri Sep 09, 2022 7:36 am

Otangelo


Admin

An RNA world could not explain the origin of the genetic code
Susan Lindquist (2010): An RNA-only world could not explain the emergence of the genetic code, which nearly all living organisms today use to translate genetic information into proteins. The code takes each of the 64 possible three-nucleotide RNA sequences and maps them to one of the 20 amino acids used to build proteins. 69
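The mapping described here is small enough to write out in full. The sketch below builds the standard genetic code as a lookup table from a widely used compact encoding (bases in the order U, C, A, G; `*` marks a stop codon). It shows that 61 of the 64 codons are assigned to the 20 amino acids and 3 serve as stop signals. The variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, first base varying slowest.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 4 x 4 x 4 = 64 triplets with its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(CODON_TABLE.values()) - {"*"}
```

Here `CODON_TABLE["AUG"]` is `"M"` (methionine, the usual start codon), `len(CODON_TABLE)` is 64, and `len(amino_acids)` is 20.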

Without amino acids, there could not be an assignment. 64 trinucleotide codons are assigned to 20 amino acids. Both had to be present in order for this assignment to occur. But having both would not be enough either. In reality, the entire system had to be created from the get-go, fully developed, and operational from the beginning. That led Fujio Egami (1981) to present a "working hypothesis on the interdependent genesis of nucleotide bases, protein amino acids, and the primitive genetic code: the primitive genetic code was dependent upon the concentration of different nucleotide bases and amino acids coexisting in the primeval environments and upon the selective affinity between bases and amino acids." 70
 
Charles W Carter, Jr (2017): Computational and structural modeling argue that some mutual, interdependent processes embedded information into proteins and nucleic acids.

While talking about evolutionary modification, Egami was probably not aware that an interdependent system cannot be the product of evolutionary change. An interdependent system must be right all at once, right from the start. Stepwise, gradual evolution of the system is not possible, since intermediate stages would confer no advantage, nor function. Mapping nucleotides to amino acids is only possible if all players are there. 71

The RNA-peptide world
The RNA-peptide world tries to build a bridge between the replication first, and metabolism first scenarios, advancing the RNA world and combining it with catalytic peptides and primitive metabolism.

Stephen D. Fried (2022): Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics, and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative ‘peptide-polynucleotide stage’. 72

The presupposition is that prebiotic chemical conditions permitted the emergence of activated ribonucleotides and amino acids. The proposal hypothesizes that RNAs started to interact with small peptides (short amino acid strands) right from the beginning, rather than everything starting exclusively with RNAs that would only later transition to a mutually beneficial interaction with amino acids. In modern cells, DNA, which stores the genetic data using the genetic code, is transcribed into messenger RNA (mRNA), which is subsequently translated in the ribosome into functional amino acid sequences, which form polypeptides and, in the end, proteins. The core problem is the origin of the codon-amino acid assignment through the genetic code. The RNA-peptide world attempts to address this problem by positing an RNA-peptide stage as the first step toward the current solution, in which sophisticated translation is performed by the ribosome.

Charles W. Carter, Jr. (2015): In the RNA-world scenario, the necessary catalysts were initially entirely RNA-based and did not include genetically encoded proteins. In the RNA-peptide world, the idea that coded peptides functioned catalytically in the early stages of the origin of life directly contradicts the second central tenet of the “RNA World” scenario. The important distinction between this scenario and the RNA World hypothesis is that the requisite specificity is low in the initial stages of the former but unacceptably high in the latter. Low specificity processes occur with greater frequency and hence are more likely to have occurred first. The unavailability of activated amino acids was the most critical barrier to the emergence of protein synthesis. 73

Dave Speijer (2015): Very small RNAs (versatile and stable due to base-pairing) and amino acids, as well as dipeptides, coevolved. The “RNA world” hypothesis is seen as one of the main contenders for a viable theory on the origin of life. Relatively small RNAs have catalytic power, RNA is everywhere in present-day life, the ribosome is seen as a ribozyme, and rRNA and tRNA are crucial for modern protein synthesis. However, this view is incomplete at best. The modern protein-RNA ribosome most probably is not a distorted form of a “pure RNA ribosome” evolution started out with. Though the oldest center of the ribosome seems “RNA only”, we cannot conclude from this that it ever functioned in an environment without amino acids and/or peptides. Very small RNAs (versatile and stable due to base-pairing) and amino acids, as well as dipeptides, coevolved. Remember, it is the amino group of aminoacylated tRNA that attacks peptidyl-tRNA, destroying the bond between peptide and tRNA. This activity of the amino acid part of aminoacyl-tRNA illustrates the centrality of amino acids in life. With the rise of the “RNA world” view of early life, the pendulum seems to have swung too much towards the ribozymatic part of early biochemistry. The necessary presence and activity of amino acids and peptides is in need of highlighting. We argue that an RNA world completely independent of amino acids never existed.

Indeed, I agree that an RNA world never existed. But did an RNA-peptide world?

Speijer: The idea of an independent RNA world without oligopeptides or amino acids stabilizing structures and helping in catalysis does not seem a viable concept. On the other hand, the idea of catalytic protein existing without RNA storing the polypeptide sequences, which have catalytic activity, and organizing the production of these sequences, also does not seem a viable concept. Here we argue for a “coevolutionary” theory in which amino acids and (very small) peptides, as well as small RNAs, existed together and where their separate abilities not only reinforced each other’s survival but allowed life to more quickly climb the ladder of complexity.

Every naturalistic approach works only from the simple to the complex in a slow, gradual manner. Even if the path is not linear but has ups and downs, the outcome is always more functional complexity at the end. That is Speijer's proposal as well: "Starting with small molecules (easily) derived from prebiotic chemistry, we will try to reconstruct a possible history in which every stage of increased complexity arises from the previous more simple stage because specific nucleotide/amino acid (RNA/peptide) interactions allowed it to do so." Observe how Speijer introduces teleonomy into the explanation, as if RNAs and amino acids operated or behaved with the "aim" or purpose of maintaining a state of affairs that was not even there yet. RNAs and amino acids on their own are not alive. They are molecules used in biology. But molecules have no innate drive or "urge" to maintain a specific state of affairs that would favor a future outcome: the gradual complexification that would, in the end, result in the existence of self-replicating cells.

Dennis R. Salahub (2008): We now come at a crucial and, we have to admit, somewhat theoretical juncture: coevolution is illustrated by the presumption that RNAs could not persist without peptide protection, that very short (very early) peptides were made more abundant by RNA-producing them, and that they co-evolve forming longer RNAs and peptides. This would constitute an RNA/peptide world of ribozymes and short oligopeptides. These oligopeptides had RNA protection functions (DADVDGD being the obvious ancestor sequence of the universal RNA polymerase active site sequence NADFDGD). This motif (Asn-Ala-Asp-Phe-Asp-Gly-Asp) is a specific stretch of amino acids that is central in all cellular life. RNA polymerases catalyze the transcription from DNA to mRNA. Dennis R. Salahub (2008): Most known RNA polymerases (RNAPs) share a universal heptapeptide, called the NADFDGD motif. The crystal structures of RNAPs indicate that in all cases this motif forms a loop with an embedded triad of aspartic acid residues. This conserved loop is the key part of the active site. 74

The odds of getting this sequence randomly are one in 20^7, or roughly one in 1.3 × 10^9; that is, a pool of the 20 amino acids used in life would have to be shuffled about 1.3 billion times to get this specified functional sequence. Not to forget that it is incorporated in a much longer polymer sequence that also has to be functional, and that it is embedded and working in a joint venture with other polymer subunit strands of RNA polymerase. A far-fetched scenario.
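The size of the sequence space for a heptapeptide can be checked directly. A minimal sketch, assuming a uniform and independent random draw from the 20 canonical amino acids at each of the seven positions (an idealization; the variable names are illustrative):

```python
ALPHABET_SIZE = 20   # canonical amino acids, assumed equally likely at every position
MOTIF = "NADFDGD"    # the conserved heptapeptide discussed above

sequence_space = ALPHABET_SIZE ** len(MOTIF)   # number of distinct 7-residue sequences
probability = 1 / sequence_space               # chance of hitting the motif in one draw

print(f"possible heptapeptides: {sequence_space:,}")
print(f"probability per draw:   {probability:.2e}")
```

Under these assumptions the count is 20^7 = 1,280,000,000, which is on the order of 10^9.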

Kunnev (2018): The hypothesis assumes that ribonucleotides would polymerize leading to very short RNAs from 2 to about 40 bases. The polymerization would incorporate random sequences and random 3D structures. The process would preserve mostly stable ones. Wet-Dry cycles could facilitate the process of RNA polymerization. Compartmentalization is another important factor since most of the described events are unlikely to occur in very low concentrations. Some level of environmental separation would be expected, for example, micro-chambers out of porous surface of rocks or lipid vesicles or both. Surface adsorption might have facilitated RNA-RNA interactions, RNA-lipids interactions and some beneficial chemical reactions. Thus, clay surfaces have been shown to promote encapsulation of RNA into vesicles and grow by incorporating fatty acid supplied as micelles and can divide without dilution of their contents. At temperatures between 1°C and the denaturation temperature (about 55°C), short random RNA oligos would get stabilized via intra and intermolecular hybridization based on Watson-Crick base pairing, forming complexes of various 3D shape and size. Larger hybridized regions would confer greater stability and would be selected for. Highly self-complementary RNAs would be unlikely to exist, forcing intermolecular hybridization of short sequences and the emergence of complexes of several RNA oligos. The formation of RNA complexes also assumes a thermal cycle that would drive the process by sequential denaturation (~55–100°C) and re-annealing (<55°C) phases. Frequent repetition of the thermal cycle and stability selection would favor accumulation of complexes with higher degree of complementarity and higher GC content. Non-enzymatic aminoacylation between 2′ or 3′ positions of ribose and activated amino acids could occur.
In addition, ribozymes capable of amino acid transfer from one RNA to another have been selected under laboratory conditions and similar molecules could have participated in aminoacylation of RNAs. Aminoacylated RNAs would be involved in complex formation, bringing some of the aminoacylated RNA 3′-ends in close proximity. This would promote peptide bond formation between two adjacent amino acids, most likely with the assistance of wet/dry natural cycles. All amino acids would have statistically equal probability to aminoacylate RNA. At that stage, any RNA molecule could be aminoacylated and could serve as a template. 75

That means any available amino acid nearby could be involved in the reaction, including amino acids not used in life, and it could be attached anywhere on the RNA molecule. There is also no restriction on the possible RNA configurations, which could involve any sort of nucleobase: no mechanism would prevent nucleobases other than those used in life from taking part in the reactions. The result would simply be a disordered, random accumulation of RNA-peptides.

Kunnev: We presume that following this initial stage all components of the translation system would co-evolve in a stepwise way. Specialization of ribosomal Large Subunit—LSU will start with evolution of peptidyl transferase center (PTC). The evolution of peptides to proteins would occur from small motif to domains and finally— folded proteins. 

Felix Müller (2022): The ability to grow peptides on RNA with the help of non-canonical vestige nucleosides offers the possibility of an early co-evolution of covalently connected RNAs and peptides, which then could have dissociated at a higher level of sophistication to create the dualistic nucleic acid–protein world that is the hallmark of all life on Earth. It is difficult to imagine how an RNA world with complex RNA molecules could have emerged without the help of proteins and it is hard to envision how such an RNA world transitions into the modern dualistic RNA and protein world, in which RNA predominantly encodes information whereas proteins are the key catalysts of life. 76

This story is very "sketchy" and superficial when it comes to elucidating the trajectory from these small RNA-peptides to fully developed proteins. This is a common modus operandi for upholding a story that, on closer inspection, does not withstand scrutiny.

Charles Carter, structural biologist (2017): For life to take hold, the mystery polymer would have had to coordinate the rates of chemical reactions that could differ in speed by as much as 20 orders of magnitude. 73

Marcel Filoche (2019): Enzymes speed up biochemical reactions at the core of life by as much as 15 orders of magnitude. Yet, despite considerable advances, the fine dynamical determinants at the microscopic level of their catalytic proficiency are still elusive. Rate-promoting vibrations in the picosecond range, specifically encoded in the 3D protein structure, are localized vibrations optimally coupled to the chemical reaction coordinates at the active site. Remarkably, our theory also exposes a hitherto unknown deep connection between the unique localization fingerprint and a distinct partition of the 3D fold into independent, foldspanning subdomains that govern long-range communication. The universality of these features is demonstrated on a pool of more than 900 enzyme structures, comprising a total of more than 10,000 experimentally annotated catalytic sites. Our theory provides a unified microscopic rationale for the subtle structure-dynamics-function link in proteins. The intricate networks of metabolic cascades that power living organisms ultimately rest on the exquisite ability of enzymes to increase the rate of chemical reactions by many orders of magnitude. Although many molecular machines contain intrinsically disordered domains, the 3D fold is central to enzyme functioning. In particular, increasing evidence is accumulating in the literature in favor of the existence of specific fold-encoded motions believed to govern the relevant collective coordinate(s) that are coupled to the chemical transformation. These motions typically correspond to localized vibrations of the protein scaffold that contribute to the catalytic reaction, i.e., modes that, if impeded, would lead to a deterioration of the catalytic efficiency. 77

Mathieu E. Rebeaud (2021): The more the function of a machine depends on its precise setup and arrangement respecting very limited tolerances, the more efforts have to be undertaken to achieve the required precision, demanding engineering solutions where nothing can be left to chance. That is precisely the case with proteins. There is an extraordinarily limited tolerance upon which proteins have to be engineered and designed, a requirement to achieve the necessary catalytic functions. That sets the bar for the cause to instantiate this state of affairs very high, for which random events are entirely inadequate!! The situation becomes even worse when we consider what Mathieu E. Rebeaud described as (2021): the challenge of reaching and maintaining properly folded and functional proteomes. Most proteins must fold to their native structure in order to function, and their folding is largely imprinted in their primary amino acid sequence. However, many proteins, especially large multidomain polypeptides, or certain protein types such as all-beta or repeat proteins, tend to misfold and aggregate into inactive species that may also be toxic. Life met this challenge by evolving molecular chaperones that can minimize protein misfolding and aggregation, even under stressful out-of-equilibrium conditions favoring aggregation. 78

Hays S. Rye (2013): Protein folding is a spontaneous process that is essential for life, yet the concentrated and complex interior of a cell is an inherently hostile environment for the efficient folding of many proteins. Some proteins—constrained by sequence, topology, size, and function—simply cannot fold by themselves and are instead prone to misfolding and aggregation. This problem is so deeply entrenched that a specialized family of proteins, known as molecular chaperones assists in protein folding. The bacterial chaperonin GroEL, along with its co-chaperonin GroES, is probably the best-studied example of this family of protein-folding machine. 79

Chaperones have no function unless there are misfolded proteins that need to be refolded in order to function. But non-functional proteins accumulating in the cell would be toxic waste and would eventually kill the cell. So this creates another chicken-and-egg problem. What came first: protein synthesis, or chaperones helping proteins to fold correctly? Consider as well that, as Jörg Martin puts it (2000): The intracellular assembly of GroEL-type chaperonins appears to be a chaperone-dependent process itself and requires functional preformed chaperonin complexes!! 80 There are machines in the cell that help other machines to fold correctly, and those machines in turn help yet other machines to fold so that they can operate properly! Amazing!

Thorsten Hugel (2020): In a living cell, protein function is regulated in several ways, including post-translational modifications (PTMs), protein-protein interaction, or by the global environment (e.g. crowding or phase separation). While site-specific PTMs act very locally on the protein, specific protein interactions typically affect larger (sub-)domains, and global changes affect the whole protein non-specifically. Herein, we directly observe protein regulation under three different degrees of localization, and present the effects on the Hsp90 chaperone system at the levels of conformational steady states, kinetics and protein function. Interestingly using single-molecule FRET, we find that similar functional and conformational steady states are caused by completely different underlying kinetics. We disentangle specific and non-specific effects that control Hsp90’s ATPase function, which has remained a puzzle up to now. Lastly, we introduce a new mechanistic concept: functional stimulation through conformational confinement. Our results demonstrate how cellular protein regulation works by fine-tuning the conformational state space of proteins. 81

Susan Lindquist (2010): Cells also require a ubiquitin-proteasome system, targeting terminally misfolded proteins for degradation, and with translocation machineries to get proteins to their proper locations. These protein folding agents constitute a large, diverse, and structurally unrelated group. Many are upregulated in response to heat and are therefore termed heat shock proteins (HSPs).  HSP90 is one of the most conserved HSPs, present from bacteria to mammals, and is an essential component of the protective heat shock response. The role of HSP90, however, extends well beyond stress tolerance. Even in nonstressed cells, HSP90 is highly abundant and associates with a wide array of proteins (known as clients) that depend on its chaperoning function to acquire their active conformations. 20% of yeast proteins are influenced by Hsp90 function, making it the most highly connected protein in the yeast genome, and GroES mediates the folding of ~10% of proteins in E. coli. 82

Short RNA-peptides, or peptides on their own, are not functional and are useless in a supposed "proto-cell" unless they have the right size and sequence and are able to fold into the functional 3D conformation. In the face of this evidence, theorizing about intermediate states and transitions of growing size and complexity over long periods of time, until a functional state of affairs is achieved, is untenable. It opposes the evidence just described. Sophisticated, exquisite mechanisms have to be instantiated from the get-go to guarantee the right setup and folding of full-length proteins. Such a hypothesized transition would never work and would never happen. These RNA-peptides would simply lie around and sooner or later disintegrate. Explanations that do not include an intelligent agent are entirely inadequate to account for the origin of these high-tech engineering marvels implemented on a molecular scale!

George Church, Professor of Genetics, described the ribosome as "the most complicated thing that is present in all organisms". The peptidyl transferase center (PTC) is the core of the ribosome, where peptide bond formation occurs, which is a central catalytic reaction in life, where proteins are synthesized, and is as such of particular importance.  The process is so intriguingly complex, that a science paper in 2015 had to admit that: "The detailed mechanism of peptidyl transfer, as well as the atoms and functional groups involved in this process are still in limbo." 83 The PTC is a ribozyme, which means it is composed of ribosomal RNAs ( rRNAs). Francisco Prosdocimi (2020): The PTC region has been considered crucial in the understanding about the origins of life. It has been described as the most significant trigger that engendered a mutualistic behavior between nucleic acids and peptides, allowing the emergence of biological systems. The emergence of this proto-PTC is a prerequisite to couple a chemical symbiosis between RNAs and peptides. Of 1434 complete sequences of 23S ribosomal RNAs analyzed, it was demonstrated that site A2451 from the 23S rRNA, which is the catalytic site of the PTC, is essential for the peptide bond to occur, and is absolutely preserved in each and every analyzed sequence. The PTC is known to be a flexible and efficient catalyst as it is capable of recognizing different, specific substrates (20 different amino acids bind to aminoacyl-tRNAs) and polymerizing proteins at a similar rate. 84  

Sávio T. Farias (2014): Studies reveal that the PTC has a symmetrical structure comprising approximately 180 nucleotides. Molecular structure models suggest that the catalytic portion of the 23S rRNA entities of the symmetrical region possesses the common stem-elbow-stem (SES) structural motif. 85

Let's suppose that this structure emerged in an RNA-peptide world. Let's also set aside the odds of finding a functional sequence among the roughly 10^108 (4^180) possible sequences of 180 nucleotides, weighed against the maximum number of possible events in a universe that is 13.8 billion years old (about 4 × 10^17 seconds), where every atom (10^80) changes its state at the maximum rate of 10^40 times per second, which is about 10^138. Even if we had such a core PTC, it would have no function whatsoever unless all other players were in place to perform translation from RNA to amino acids, with the genetic code implemented as well, along with the entire chain from DNA to mRNA and on to the events of translation. All these proposals, the RNA world and the RNA-peptide world, are based on silly pipe dreams that are called theories when they are no more than ideas, the products of fertile minds, and not results based on scientific evidence, experimentation, and tests in the lab. These are just invented scenarios, born of the need to keep an explanatory framework based on philosophical naturalism and to find answers that do not require invoking a supernatural entity. All these proposals have been shown to be inadequate and doomed to failure. Biological cells are too complicated, sophisticated, integrated, and functional to warrant the belief that they could have originated by unguided means - the ribosome is a prime example supporting this conclusion.
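The orders of magnitude in this kind of estimate can be checked with a short sketch. It takes the 180-nucleotide length, the 10^80 atoms, and the 10^40 state changes per second from the paragraph above, and assumes the commonly cited universe age of 13.8 billion years; the variable names are illustrative.

```python
import math

SEQ_LENGTH = 180        # nucleotides in the symmetrical PTC region
NUCLEOTIDES = 4         # A, C, G, U

# log10 of the number of possible sequences of this length (4^180)
log10_sequences = SEQ_LENGTH * math.log10(NUCLEOTIDES)

SECONDS_PER_YEAR = 365.25 * 24 * 3600
age_seconds = 13.8e9 * SECONDS_PER_YEAR   # assumed age of the universe in seconds

LOG10_ATOMS = 80        # atoms in the observable universe, as stated in the text
LOG10_RATE = 40         # state changes per atom per second, as stated in the text

# log10 of the upper bound on events: seconds x atoms x rate
log10_events = math.log10(age_seconds) + LOG10_ATOMS + LOG10_RATE

print(f"possible 180-nt sequences: about 10^{log10_sequences:.1f}")
print(f"age of the universe:       about {age_seconds:.2e} s")
print(f"upper bound on events:     about 10^{log10_events:.1f}")
```

Working in logarithms keeps the arithmetic transparent: exponents of multiplied quantities simply add, so any change to one assumption shifts the result by a known amount.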


69. Susan Lindquist: HSP90 at the hub of protein homeostasis: emerging mechanistic insights 2010 Jul;11
70. F Egami: A working hypothesis on the interdependent genesis of nucleotide bases, protein amino acids, and primitive genetic code 1981 Sep;11
71. Charles W Carter, Jr: Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding 24 October 2017

72. Stephen D. Fried: Peptides before and during the nucleotide world: an origins story emphasizing cooperation between proteins and nucleic acids 09 February 2022
73. Charles W. Carter, Jr.: What RNA World? Why a Peptide/RNA Partnership Merits Renewed Experimental Attention 23 January 2015
74. Dennis R. Salahub: Characterization of the active site of yeast RNA polymerase II by DFT and ReaxFF calculations 08 April 2008
75. Dimiter Kunnev: Possible Emergence of Sequence Specific RNA Aminoacylation via Peptide Intermediary to Initiate Darwinian Evolution and Code Through Origin of Life 2018 Oct 2;8
76. Felix Müller: A prebiotically plausible scenario of an RNA–peptide world 11 May 2022
77. Marcel Filoche: Universality of fold-encoded localized vibrations in enzymes 26 Feb 2019
78. Mathieu E. Rebeaud:  On the evolution of chaperones and cochaperones and the expansion of proteomes across the Tree of Life May 17, 2021
79. Hays S. Rye: GroEL-Mediated Protein Folding: Making the Impossible, Possible 2013 Sep 25
80. Jörg Martin: Assembly and Disassembly of GroEL and GroES Complexes 2000
81. Thorsten Hugel: Controlling protein function by fine-tuning conformational flexibility 2020 Jul 22
82. Susan Lindquist: HSP90 at the hub of protein homeostasis: emerging mechanistic insights 2010 Jul;11
83. Hadieh Monajemi: The P-site A76 2′-OH acts as a peptidyl shuttle in a stepwise peptidyl transfer mechanism 2015
84. Francisco Prosdocimi: The Ancient History of Peptidyl Transferase Center Formation as Told by Conservation and Information Analyses 2020 Aug 5
85. Sávio T. Farias: Origin and evolution of the Peptidyl Transferase Center from proto-tRNAs 2014


118Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 8:17 am

Otangelo


Admin

Chapter 7

https://reasonandscience.catsboard.com/t2809-on-the-origin-of-life-by-the-means-of-an-intelligent-designer#9369

Biosemiotic information
So far, I have dealt mostly with the physical aspect of life and the origin of the basic building blocks. In this chapter, we will take a closer look at a fundamental and essential aspect of life: the information stored in biomolecules. Life is more than physics and chemistry. In a conversation with J. England, Paul Davies succinctly described life as chemistry + information. 1 Witzany (2015) gave a similar description: "Life is physics and chemistry and communication." 2 It is even more than just information. Life employs advanced languages, analogous to human languages.

Paul Davies (2013): Chemistry is about substances and how they react, whereas biology appeals to concepts such as information and organization. Informational narratives permeate biology. DNA is described as a genetic "database", containing "instructions" on how to build an organism. The genetic "code" has to be "transcribed" and "translated" before it can act. And so on. If we cast the problem of life's origin in computer jargon, attempts at chemical synthesis focus exclusively on the hardware – the chemical substrate of life – but ignore the software – the informational aspect. To explain how life began we need to understand how its unique management of information came about. In the 1940s, the mathematician John von Neumann compared life to a mechanical constructor, and set out the logical structure required for a self-reproducing automaton to replicate both its hardware and software. But Von Neumann's analysis remained a theoretical curiosity. Now a new perspective has emerged from the work of engineers, mathematicians and computer scientists, studying the way in which information flows through complex systems 3

Sungchul Ji (2006): Biological systems and processes cannot be solely accounted for based on the laws of physics and chemistry. They require in addition the principles of semiotics, the science of symbols and signs, including linguistics. Von Neumann was the first to recognize the interrelationship required for self-replication: symbol-matter complementarity. Linguistics provides a fundamental principle to account for the structure and function of the cell. Cell language has counterparts to 10 of the 13 design features of human language characterized by Hockett and Lyon. 4

Cells are information-driven factories
Specified complex information observed in biomolecules dictates and directs the making of irreducibly complex molecular machines, robotic molecular production lines, and chemical cell factories. In other words: Cells have a codified description of themselves in digital form stored in genes and have the machinery to transform that blueprint, through information transfer from genotype to phenotype, into an identical representation in analog 3D form, the physical 'reality' of that description. No law in physics or chemistry is known to specify that A should represent, or be assigned to mean, B. The cause leading to a machine’s and factory's functionality has only been found in the mind of the engineer and nowhere else.

Paul Davies (1999): How did stupid atoms spontaneously write their own software … ? Nobody knows … … there is no known law of physics able to create information from nothing. 5

Timothy R. Stout (2019): A living cell may be viewed as an information-driven machine. 6

David L Abel (2005): An algorithm is a finite sequence of well-defined, computer-implementable instructions. Genetic algorithms instruct sophisticated biological organization. A linear, digital, cybernetic string of symbols representing syntactic, semantic, and pragmatic prescription.  Genes are not analogous to messages; genes are messages. Genes are literal programs. They are sent from a source by a transmitter through a channel.   Prescriptive sequences are called "instructions" and "programs." They are algorithmically complex sequences. They are cybernetic. 7

G. F. Joyce (1993): A blueprint cannot produce a car all by itself without a factory and workers to assemble the parts according to the instructions contained in the blueprint; in the same way, the blueprint contained in RNA cannot produce proteins by itself without the cooperation of other cellular components which follow the instructions contained in the RNA. 8

Claus Emmeche (1991): Biological systems start from the (digital) axioms and definitions and develop an analogic three-dimensional geometry: an instance of the morphology of life. 9

Is the claim that DNA stores information just a metaphor? 
There has been a long-standing dispute: Is DNA a code? Does DNA store information in a literal sense, or is that just a metaphor? Many have objected that DNA can be described as storing information and using a code only in a metaphorical sense, not literally. Some have also claimed that DNA is just chemistry. That has caused a lot of confusion.

Sergi Cortiñas Rovira (2008): The most popular metaphor is the one of information (DNA = information). It is an old association of ideas that dates back to the origins of genetics, when research was carried out into the molecule (initially thought to be proteins) that should have contained the information to duplicate cells and organisms. In this type of popularisation model, DNA was identified with many everyday-use objects able to store information: a computer file of living beings, a database for each species, or a library with all the information about an individual. To Dawkins, the human DNA is a “user guide to build a living being” or “the architect’s designs to build a building”. 10

Massimo Pigliucci (2010): Genes are often described by biologists using metaphors derived from computational science: they are thought of as carriers of information, as being the equivalent of ‘‘blueprints’’ for the construction of organisms. Modern proponents of Intelligent Design, the latest version of creationism, have exploited biologists’ use of the language of information and blueprints to make their spurious case. In this article we illustrate how the use of misleading and outdated metaphors in science can play into the hands of pseudoscientists. Thus, we argue that dropping the blueprint and similar metaphors will improve both the science of biology and its understanding by the general public. We will see that analogies between living organisms and machines or programs (what we call ‘‘machine-information metaphors’’) are in fact highly misleading in several respects.

This is the claim. How does Pigliucci justify his accusation? He continues:

‘‘direct encoding systems’’, such as human-designed software, suffer from ‘‘brittleness’’, that is they break down if one or a few components stop working, as a result of the direct mapping of instructions to outcomes. If we think of living organisms as based on genetic encoding systems—like blueprints—we should also expect brittleness at the phenotypic level which, despite the claims of creationists and ID supporters that we have encountered above, is simply not observed.  Indeed, the fact that biological organisms cannot possibly develop through a type of direct encoding of information is demonstrated by calculations showing that the gap between direct genetic information (about 30,000 protein-coding genes in the human genome) and the information required to specify the spatial position and type of each cell in the body is of several orders of magnitude. Where does the difference come from? An answer that is being explored successfully is the idea that the information that makes development possible is localized and sensitive (as well as reactive) to the conditions of the immediate surroundings. In other words, there is no blueprint for the organism, but rather each cell deploys genetic information and adjusts its status to signals coming from the surrounding cellular environment, as well as from the environment external to the organism itself. 11

The answer to this claim is a resounding no. What defines organismal architecture and body plans, phenotypic complexity, anatomical novelty, and the ability to adapt is preprogrammed, prescribed, instructional complex information. It is encoded through at least 33 variations of the genetic code and 45 epigenetic codes, together with complex communication networks whose signaling acts on a structural level in an integrated, interlocked fashion. These networks are pre-programmed to respond to nutritional demands and environmental cues, and to control reproduction, homeostasis, metabolism, defense systems, and cell death. So the phenomena described by Pigliucci, and the fact that genes alone do not explain the phenotype, are not explained by denying that genes literally store information. They are explained by the fact that even more prescriptive, instructional information is in operation, including at the epigenetic level. Pigliucci's claims mislead in the opposite direction of the truth.

Pigliucci argues that: phenotypes are fault-tolerant—to use software engineering terminology—because they are not brittle: giving up talk of blueprints and computer programs immediately purchases an understanding of why living organisms are not, in fact, irreducibly complex.

Agreed, the metaphor of a blueprint or computer program may be faulty, or not fully up to the task of describing what goes on in biological information systems. That is not because these terms fail to describe the state of affairs literally, but because they do not fully convey the sophistication of the information engineering at work in living systems, next to which anything we intelligent human agents have come up with is pale, rudimentary, and primitive.

Richard Dawkins (2008): After the seventh minute of his speech, Dawkins admits that: Can you think of any other class of molecule, that has that property, of folding itself up, into a uniquely characteristic enzyme, of which there is an enormous repertoire, capable of catalyzing an enormous repertoire of chemical reactions, and this is in itself to be absolutely determined by a digital code. 12

Hubert Yockey (2005): Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies. 13

Barry Arrington (2013): Here’s an example of an arbitrary arrangement of signs: DOG. This is the arrangement of signs English speakers use when they intend to represent Canis lupus familiaris. In precise semiotic parlance, the word “dog” is a “conventional sign” for Canis lupus familiaris among English speakers. Here, “conventional” is used in the sense of a “convention” or an agreement. In other words, English speakers in a sense “agree” that “dog” means Canis lupus familiaris.

Now, the point is that there is nothing inherent in a dog that requires it to be represented in the English language with the letters “D” followed by “O” followed by “G.”  If the rules of the semiotic code (i.e., the English language) were different, the identical purpose could be accomplished through a different arrangement of signs.  We know this because in other codes the same purpose is accomplished with vastly different signs.  In French the purpose is accomplished with the following arrangement of signs:  C H I E N.  In Spanish the purpose is accomplished with the following arrangement of signs:  P E R R O.  In German the purpose is accomplished with the following arrangement of signs:  H U N D.

In each of the semiotic codes the purpose of signifying an animal of the species Canis lupus familiaris is accomplished through an arbitrary set of signs.  If the rules of the code were different, a different set of signs would accomplish the identical purpose.  For example, if, for whatever reason, English speakers were collectively to agree that Canis lupus familiaris should be represented by “B L I M P,” then “blimp” would accomplish the purpose of representing Canis lupus familiaris just as well as “dog.”

How does this apply to the DNA code?  The arrangement of signs constituting a particular instruction in the DNA code is arbitrary in the same way that the arrangement of signs for representing Canis lupus familiaris is arbitrary.  For example, suppose in a particular strand of DNA the arrangement “AGC” means “add amino acid X.”  There is nothing about amino acid X that requires the instruction “add amino acid  X” to be represented by  “AGC.”  If the rules of the code were different the same purpose (i.e, instructing the cell to “add amino acid  X”) could be accomplished using “UAG” or any other combination.  Thus, the sign AGC is “arbitrary” in the sense UB was using the word.

Why is all of this important to ID? It is important because it shows that the DNA code is not analogous to a semiotic code. It is isometric with a semiotic code. In other words, the digital code embedded in DNA is not “like” a semiotic code, it “is” a semiotic code. This in turn is important because there is only one known source for a semiotic code: intelligent agency. Therefore, the presence of a semiotic code embedded within the cells of every living thing is powerful evidence of design, and the burden is on those who would deny design to demonstrate how a semiotic code could be developed through blind chance or mechanical law or both. 14

DNA is a semantophoretic molecule (a biological macromolecule that stores genetic information). RNA and DNA are analogous to a computer hard disk. DNA monomers are joined into long strings (like the wagons of a train) built from the four nucleobases adenine, guanine, cytosine, and thymine (uracil replaces thymine in RNA). The aperiodic sequence of nucleotides carries instructional information that directs the assembly and polymerization of amino acids in the ribosome, forming polymer strands that make up proteins, the molecular workers of the cell.

No one who understands the subject argues that the information stored in DNA is called information only as a “metaphor”, meaning that it merely ‘looks like’ coded information and information processing but is not really so. That is blatantly false. The trinucleotide codon "words" lined up in DNA work exactly as the alphabetic letters arranged in this sentence do. The words that I write here have symbolic meanings that you can look up in a dictionary, and I have strung them together in a narrative sequence to tell you a story about biological information. In the genetic code, each codon likewise has a symbolic meaning that a cell (and you) can look up in a ‘dictionary’, the genetic code table, and codons are strung together in sequences that have meaning for the workings of the cell. The cell exercises true information storage, retrieval, and processing, resulting in the functional proteins required to make a living organism, and no educated person in biology would deny it.

DNA and RNA are the hardware, and the specified complex sequence of nucleotides is the software. That information is conveyed using a genetic code, a set of rules that assigns meaning to trinucleotide codon words. The information in DNA is first transcribed to messenger RNA (mRNA), which acts like a postal carrier delivering a message from A to B, and then translated in the ribosome. A set of three nucleotides (a trinucleotide) forms a codon. Life uses 64 codon "words" that are assigned, or mapped, to 20 (in certain cases 22) amino acids. Origin-of-life researchers are confronted with the problem of explaining the origin of the complex, specified (or instructional assembly) information stored in DNA and, on top of that, the origin of the genetic code. These are two very distinct problems that are often conflated, and much of the confusion comes from the ambiguity of the term “genetic code”. Here is a quote from Francis Crick, who seems to be the one who coined the term: Unfortunately the phrase “genetic code” is now used in two quite distinct ways. Laymen often use it to mean the entire genetic message in an organism. Molecular biologists usually mean the little dictionary that shows how to relate the four-letter language of the nucleic acids to the twenty-letter language of the proteins, just as the Morse code relates the language of dots and dashes to the twenty-six letters of the alphabet… The proper technical term for such a translation is, strictly speaking, not a code but a cipher. In the same way, the Morse code should really be called the Morse cipher. I did not know this at the time, which was fortunate because “genetic code” sounds a lot more intriguing than “genetic cipher”.
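
The figure of 64 codon "words" follows from simple counting: three positions, each filled with one of four bases. A minimal Python sketch of that arithmetic is given here. The variable names are illustrative assumptions, and the three stop codons listed are those of the standard genetic code, which leaves 61 codons that specify amino acids.

```python
from itertools import product

BASES = "UCAG"  # the four RNA nucleobases
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

# Every ordered triplet of the four bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid rather than a stop signal.
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))        # 64
print(len(sense_codons))  # 61
```

Because 61 sense codons map to only 20 standard amino acids, most amino acids are specified by more than one codon.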

The specification, from triplet codon to amino acid, is called a cipher. It is like a translation from one language to another. Take Google Translate as an example: we type the English word "language", and the program returns the equivalent German word, "Sprache". As in all translations, there must be someone or something that is bilingual, in this case to turn the coded instructions written in the nucleic-acid language into a result written in the amino-acid language. In cells, the adaptor molecule, tRNA, performs this task. One end of the tRNA carries an anticodon complementary to the codon on the messenger RNA, and the other end is attached to the amino acid that is coded for. The correct amino acid is attached to the correct tRNA by an enzyme called aminoacyl-tRNA synthetase. This raises an even tougher problem concerning the coding assignments, i.e., which triplets code for which amino acids. How did these designations come about? Because nucleic-acid bases and amino acids don’t recognize each other directly but have to deal via the tRNA chemical intermediary, there is no obvious reason why particular triplets should go with particular amino acids. Other translations are conceivable. Coded instructions are a good idea, but the actual code seems to be pretty arbitrary. Perhaps it is simply a frozen accident, a random choice that just locked itself in, with no deeper significance? That is what Crick proposed. How could that not be called just an "ad-hoc" assertion, in the absence of any other reasonable or likely explanation? Unless, of course, we permit the divine into the picture.

Of the two problems, one deals with sequence specificity, and the other with mapping, that is, assigning the meaning of one biomolecule to another. For example, the DNA codon TTA (thymine, thymine, adenine), which reads UUA in mRNA, is assigned to the amino acid leucine (Leu). That means that when an mRNA strand carrying the codon UUA enters the ribosome, the translation machine, specialized molecules (tRNAs, aminoacyl-tRNA synthetases, etc.) are recruited, and leucine is picked and added to the growing polymer strand being formed in the ribosome. In the end, that strand will fold into a very specific, functional 3D configuration and become part of a protein that performs a precise function in the cell. As the instructions of a floor plan or a blueprint direct the making of a machine, so the information conveyed in the sequence of trinucleotide codons directs the making of molecular machines. The two situations correspond one to one. But it goes further than that. Individual machines often operate in a joint venture with other machines, forming production lines as part of a team that builds intermediate products, which are only later assembled with other intermediate products to form a functional device of highly integrated complexity. Metadata is necessary: diverse levels of information that operate together. DNA contains coding and non-coding genes. Non-coding genes are employed in the gene regulatory network. They dictate the timing of gene expression and orchestrate the spatiotemporal pattern in which individual cells, or in multicellular organisms the embryo, develop. This is the second level of DNA information.
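
The assignment of codons to amino acids described above can be pictured as a lookup table. The following is a minimal Python sketch, not a model of the real molecular machinery: the table holds only a handful of entries from the standard genetic code, and the function names `transcribe` and `translate` as well as the example sequence are assumptions made for illustration.

```python
# Partial codon table (mRNA codon -> amino acid). A small subset of the
# standard genetic code, chosen for illustration only.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUA": "Leu",
    "UUG": "Leu",
    "GGU": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": None,   # stop
    "UAG": None,   # stop
    "UGA": None,   # stop
}


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons three bases at a time until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return peptide


dna = "ATGTTAGGTAAATAA"  # hypothetical example sequence
mrna = transcribe(dna)
print(mrna)             # AUGUUAGGUAAAUAA
print(translate(mrna))  # ['Met', 'Leu', 'Gly', 'Lys']
```

Note that the DNA triplet TTA appears as UUA in the transcript and is looked up as leucine, and that swapping entries in the table would change the product without changing any of the chemistry of the sequence itself.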

Paul Davies (2013): The significant property of biological information is not its complexity, great though that may be, but the way it is organised hierarchically. In all physical systems there is a flow of information from the bottom upwards, in the sense that the components of a system serve to determine how the system as a whole behaves.  3

So why has that been such a conundrum? Because many want to avoid the design inference at all costs. Hume wrote in the form of a dialogue between three characters: Philo, Cleanthes, and Demea. The design argument is spoken by Cleanthes in Part II of the Dialogues: The curious adapting of means to ends, throughout all nature, resembles exactly, though it much exceeds, the production of human contrivance, or human design, thought, wisdom and intelligence. Since therefore the effects resemble each other, we are led to infer, by all the rules of analogy, that the causes also resemble; and that the Author of nature is somewhat similar to the mind of man; though possessed of much larger faculties, proportioned to the grandeur of the work executed. By this argument a posteriori, and by this argument alone, do we prove at once the existence of a Deity, and his similarity to human mind and intelligence.

Does DNA store prescriptive, or descriptive information? 
One common misconception is that natural principles are just discovered and described by us; in other words, that life is supposedly just chemistry, and we merely describe the state of affairs going on there. Consider two cans of Coca-Cola, one regular and the other diet. Both bear information that we can describe: one can contains Coca-Cola, and the other contains Diet Coke. But that content did not arise naturally. A chemist invented the formulas for Coke and Diet Coke, and making them depends not on descriptive but on PREscriptive information. The same occurs in nature. We discover that DNA contains a genetic code, but the rules upon which the genetic code operates are prescriptive. The rules are arbitrary. The genetic code is constrained to behave in a certain way. What genes store is information organized much like a library (the genome), which holds many books (genes). Some contain the instructions, the know-how, to make proteins; the non-coding sections store regulatory elements (promoters, enhancers, silencers, insulators, microRNAs (miRNAs), etc.) that work like a program, directing and controlling the operation of the cell, such as when a gene has to be expressed (when the information in a gene has to be transcribed and translated). This is information that prescribes how to assemble and operate the cell factory, so it is prescriptive information.

How exactly is information related to biology?
It is related in several ways. I will address two of them. DNA contains information in the sense that the nucleotide sequences, or arrangements of characters, instruct how to produce a specific amino acid chain that will fold into a functional form. DNA base sequences convey instructions. They perform functions and produce specific effects. Thus, they possess not only statistical information but also instructional assembly information.

Instructional assembly information
Paul Davies Origin of Life (2003), page 18: Biological complexity is instructed complexity or, to use modern parlance, it is information-based complexity. Inside each and every one of us lies a message. It is inscribed in an ancient code, its beginnings lost in the mists of time. Decrypted, the message contains instructions on how to make a human being. The message isn't written in ink or type, but in atoms, strung together in an elaborately arranged sequence to form DNA, short for deoxyribonucleic acid. It is the most extraordinary molecule on Earth. Although DNA is a material structure, it is pregnant with meaning. The arrangement of the atoms along the helical strands of your DNA determines how you look and even, to a certain extent, how you feel and behave. DNA is nothing less than a blueprint, or more accurately an algorithm or instruction manual, for building a living, breathing, thinking human being. We share this magic molecule with almost all other life forms on Earth. From fungi to flies, from bacteria to bears, organisms are sculpted according to their respective DNA instructions. Each individual's DNA differs from others in their species (with the exception of identical twins), and differs even more from that of other species. But the essential structure – the chemical make-up, the double helix architecture – is universal. 15

Tan, Change; Stadler, Rob (2020): In DNA and RNA, no chemical or physical forces impose a preferred sequence or pattern upon the chain of nucleotides. In other words, each base can be followed or preceded by any other base without bias, just as the bits and bytes of information on a computer are free to represent any sequence without bias. This characteristic of DNA and RNA is critical—in fact, essential—for DNA and RNA to serve as unconstrained information carriers. However, this property also obscures any natural explanation for the information content of life—the molecules themselves provide no explanation for the highly specific sequence of nucleotides required to code for specific biologic functions. Only two materialistic explanations have been proposed for the information content of life: fortuitous random arrangements that happen to be functional or the combination of replication, random mutations, and natural selection to improve existing functionality over time. 16

A section of Alosa pseudoharengus (a fish) mitochondrion DNA. This reference sequence continues on all the way up to 16,621 “letters.” Each nucleotide is a physical symbol vehicle in a material symbol system. The specific selection of symbols and their syntax (particular sequencing) prescribes needed three-dimensional molecular structures and metabolic cooperative function prior to natural selection’s participation. (Source: http://www.genome.jp/dbget-bin/www_bget?refseq+NC_009576).

David L. Abel (2009): The figure above  shows the prescriptive coding of a section of DNA. Each letter represents a choice from an alphabet of four options. The particular sequencing of letter choices prescribes the sequence of triplet codons and ultimately the translated sequencing of amino acid building blocks into protein strings. The sequencing of amino acid monomers (basically the sequencing of their R groups) determines minimum Gibbs-free-energy folding into secondary and tertiary protein structure. It is this three-dimensional structure that provides “lock-and-key” binding fits, catalysis, and other molecular machine formal functions. The sequencing of nucleotides in DNA also prescribes highly specific regulatory micro RNAs and other epigenetic factors. Thus linear digital instructions program cooperative and holistic metabolic proficiency. 17

George M Church (2012): DNA is among the densest and most stable information media known. The development of new technologies in both DNA synthesis and sequencing makes DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing. 18
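
To make the idea of storing arbitrary digital data in a four-letter alphabet concrete, here is a minimal Python sketch. It assumes a simple mapping of two bits per base, which is a hypothetical scheme chosen for clarity and is not claimed to be the encoding used in the quoted study; the names `encode` and `decode` are likewise illustrative.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(text):
    """Turn text into a base sequence: UTF-8 bytes -> bits -> bases."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna):
    """Reverse the mapping: bases -> bits -> bytes -> text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


strand = encode("DNA")
print(strand)          # CACACATGCAAC
print(decode(strand))  # DNA
```

Each byte becomes four bases under this assumed mapping, so the three-letter word above needs a strand of twelve bases.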

Peter R. Wills (2016): The biological significance of DNA lies in the role it plays as a carrier of information, especially across generations of reproducing organisms, and within cells as a coded repository of system specification and stability. 19

David L Abel (2005): Genes are not analogous to messages; genes are messages. 7

Leroy Hood (2003): The value of having an entire genome sequence is that one can initiate the study of a biological system with a precisely definable digital core of information for that organism: a fully delineated genetic source code. Genes that encode the protein and RNA molecular machines of life, and the regulatory networks that specify how these genes are expressed in time, space and amplitude. 20

Information related to the genetic code
Information is divided into five levels. These can be illustrated with a STOP sign.
The first level, statistics, tells us the STOP sign is one word and has four letters. It is related to the improbability of a sequence of symbols (or the uncertainty to obtain it).
The second level, syntax, requires the information to fall within the rules of grammar such as correct spelling, word, and sentence usage. The word STOP is spelled correctly.
The third level, semantics, provides meaning and implications. The STOP sign means that when we walk or drive and approach the sign we are to stop moving, look for traffic and proceed when it is safe.
The fourth level, pragmatics, is the application of the coded message. It is not enough to simply recognize the word STOP and understand what it means; we must actually stop when we approach the sign.
The fifth level, apobetics, is the overall purpose of the message. The STOP signs are placed by our local government to provide safety and traffic control.

The code in DNA completely conforms to all five of these levels of information.
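
The first of the five levels, statistics, is the only one that can be computed from symbol frequencies alone. A minimal Python sketch using Shannon's entropy formula is given here; the function name `shannon_entropy` and the example strings are assumptions for illustration. The calculation measures only the average improbability of the symbols and says nothing about syntax, semantics, pragmatics, or apobetics.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Average information per symbol in bits: H = -sum(p * log2(p))."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Four distinct, equally frequent symbols carry 2 bits per symbol.
print(shannon_entropy("STOP"))      # 2.0
print(shannon_entropy("ACGTACGT"))  # 2.0

# A sequence dominated by one symbol carries less information per symbol.
print(shannon_entropy("AAAC") < shannon_entropy("ACGT"))  # True
```

A scrambled string such as "PTSO" scores exactly the same as "STOP", which is why the statistical level by itself cannot distinguish a meaningful message from a meaningless one.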

Perry Marshall, Evolution 2.0: The alphabet (symbols), syntax (grammar), and semantics (meaning) of any communication system must be determined in advance before any communication can take place. Otherwise, you could never be certain that what the transmitter is saying is the same as what the receiver is hearing. It’s like when you visit a Russian website and your browser doesn’t have the language plug-in for Russian. The text just appears as a bunch of squares. You would never have any idea if the Russian words were spelled right. When a message’s meaning is not yet decided, it requires intentional action by conscious agents to reach a consensus. The simple process of creating a new word in English, like “blog”, requires speakers who agree on the meaning of the other words in their sentences. Then they have to mutually agree to define the new word in a specific way. Once a word is agreed upon, it is added to the dictionary. The dictionary is a decode table for the English language. Even if noise might occasionally give you a real word by accident, it could never also tell you what that word means. Every word has to be defined by mutual agreement and used in the correct context in order to have meaning. 21

Okay, you probably wish you could see an example of how that works in the cell, right? Let's make an analogy. Suppose you have a recipe for spaghetti with a special tomato sauce, written in a Word document saved on your computer. You have a Japanese friend, and you communicate with him only through Google Translate. Now he wants to try out that recipe and asks you to send him a copy. So you write an email, attach the Word document, and send it to him. When he receives it, he runs it through Google Translate and gets the recipe in Japanese, written in kanji, the logographic Japanese characters he understands. With the information at hand, he can make the spaghetti with that fine special tomato sauce exactly as described in the recipe. For that communication to happen, you use the 26 letters of the alphabet at your end to write the recipe, and your friend has 2,136 kanji characters that permit him to understand the recipe in Japanese. Google Translate does the translation work.

While your recipe is written in a Word document saved on your computer, in the cell the recipe (the instructions or master plan) for the construction of proteins, the life-essential molecular machines and veritable workhorses, is written in genes through DNA. While you use the 26 letters of the alphabet to write your recipe, the cell uses DNA, built from deoxyribonucleotides, with four monomer "letters". Kanji has 2,136 characters, the alphabet has 26 letters, and binary computer codes use 0 and 1. The language of DNA is digital, but not binary. Where binary encoding has two symbols to work with (0 and 1, hence "binary"), DNA uses four different organic bases: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA stores genetic information in codons, the equivalent of words, each consisting of three DNA nucleotides. While you used sentences to write the spaghetti recipe, the equivalent sentences in the cell are genes, written with codon "words". With four possible nucleobases, three nucleotides give 4^3 = 64 different possible "words" (trinucleotide sequences). In the standard genetic code, three of these 64 codons (UAA, UAG, and UGA, written in RNA letters) are stop codons.

There has to be a mechanism to extract the information in the genome and send it to the ribosome, the factory that makes proteins, which is at another place in the cell, free-floating in the cytoplasm. The message contained in the genome is transcribed by a very complex molecular machine called RNA polymerase. It makes a transcript, a copy of the message in the genome, and that transcript is sent to the ribosome. The transcript is called messenger RNA, or mRNA for short.

In communications and information processing, a code is a system of rules for converting information, such as a letter or word, into another form (another word, letter, etc.). In translation, 64 genetic codons are assigned to 20 amino acids. The genetic code refers to this assignment of codons to amino acids, and is thus the cornerstone template underlying the translation process. Assignment means designating, ascribing, corresponding, and correlating. The ribosome does basically what Google Translate does. But while Google Translate just gives the recipe in another language, and our Japanese friend still has to make the spaghetti, the ribosome actually makes the end product, proteins, in one step. Imagine the brainpower involved in the entire process, from inventing the recipe to the moment the spaghetti is on your Japanese friend's table. What is involved?

1. Your imagination of the recipe
2. Inventing an alphabet, a language
3. Inventing the medium to write down the message
4. Inventing the medium to store the message
5. Storing the message in the medium
6. Inventing the medium ( the machine) to extract the message
7. Inventing the medium to send the message
8. Inventing the second language (Japanese)
9. Inventing the translation code/cipher from your language to Japanese
10. Making the machine that performs the translation
11. Programming the machine to know both languages, to make the translation
12. Making ( performing) the translation
13. Making the spaghetti at the other end using the recipe in Japanese

1. Creating a recipe to make a cake is always a mental process. Creating a blueprint to make a machine is always a mental process.
2. To suggest that a physical process can create instructional assembly information, a recipe or a blueprint, is like suggesting that throwing ink on paper will create a blueprint. It is never going to happen!
3. Physics and chemistry alone do not possess the tools to create a concept, or functional complex machines made of interlocked parts for specific purposes.
4. The only cause capable of creating conceptual semiotic information is a conscious intelligent mind.
5. DNA stores codified information to make proteins, and cells, which are chemical factories in a literal sense.

Information is not physical
Robert Alicki (2014): Information is a disembodied abstract entity independent of its physical carrier. “Information is always tied to a physical representation. It is represented by engraving on a stone tablet, a spin, a charge, a hole in a punched card, a mark on paper, or some other equivalent.” Information is neither classical nor quantum, it is independent of the properties of physical systems used for its processing. 3

Paul C. W. Davies (2013): The key distinction between the origin of life and other ‘emergent’ transitions is the onset of distributed information control, enabling context-dependent causation, where an abstract and non-physical systemic entity (algorithmic information) effectively becomes a causal agent capable of manipulating its material substrate. Biological information is functional due to the right sequence. A variety of terms have been employed for measuring functional biological information: complex and specified information (CSI), Functional Sequence Complexity (FSC), and instructional complex information. I like the term instructional because it accurately defines what is being done, namely instructing the right sequence of amino acids to make proteins, and also the sequence of messenger RNA, which is used for gene regulation and a variety of as yet unexplored functions. Another term is prescriptive information (PI). It also accurately describes what genes do: they prescribe how proteins have to be assembled. But it also smuggles in a meaning that is highly disputed between proponents of intelligent design and proponents of unguided evolution. Prescribing implies that an intelligent agency preordained the nucleotide sequence in order to be functional. 22

David L Abel (2012): Biological information frequently manifests its “meaning” through instruction or actual production of formal bio-function. Such information is called Prescriptive Information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition suggesting both prescribed data and prescribed algorithms are constituents of PI.  In addition to algorithm execution, there needs to be an assembly algorithm. Any manufacturing engineer knows that nothing (in production) is built without plans that precisely define orders of operations to properly and economically assemble components to build a machine or product. There must be by necessity, an order of operations to construct biological machines. This is because biological machines are neither chaotic nor random, but are functionally coherent assemblies of proteins/RNA elements. An Algorithm is a set of rules or procedures that precisely defines a finite sequence of operations. These instructions prescribe a computation or action that, when executed, will proceed through a finite number of well-defined states that leads to specific outcomes.  One of the greatest enigmas of molecular biology is how codonic linear digital programming is not only able to anticipate what the Gibbs free energy folding will be, but it actually prescribes that eventual folding through its sequencing of amino acids. Much the same as a human engineer, the nonphysical, formal PI instantiated into linear-digital codon prescription makes use of physical realities like thermodynamics to produce the needed globular molecular machines. 23

An algorithm is a finite sequence of well-defined, computer-implementable instructions resulting in precise intended functions. A prescriptive algorithm in a biological context can be described as performing control operations using rules, axioms, and coherent instructions. These instructions are performed, using a linear, digital, cybernetic string of symbols representing syntactic, semantic, and pragmatic prescriptive information. Cells host algorithmic programs for cell division, cell death, enzymes pre-programmed to perform DNA splicing, programs for dynamic changes of gene expression in response to the changing environment. Cells use pre-programmed adaptive responses to genomic stress,  pre-programmed genes for fetal development regulation, temporal programs for genome replication, pre-programmed animal genes dictating behaviors including reflexes and fixed action patterns, pre-programmed biological timetables for aging etc. A programming algorithm is like a recipe that describes the exact steps needed to solve a problem or reach a goal. We've all seen food recipes - they list the ingredients needed and a set of steps for how to make a meal. Well, an algorithm is just like that.  A programming algorithm describes how to do something, and it will be done exactly that way every time.

Albert Voie (2006): Life expresses both function and sign systems. Due to the abstract character of function and sign systems, life is not a subsystem of natural laws. This suggests that our reason is limited in respect to solving the problem of the origin of life and that we are left accepting life as an axiom. Memory-stored controls transform symbols into physical states. Von Neumann made no suggestion as to how these symbolic and material functions in life could have originated. He felt, "That they should occur in the world at all is a miracle of the first magnitude."


No natural law restricts the possibility-space of a written or spoken text. Languages are indeed abstract and non-physical, and it is easy to see that they are subsystems of the mind and belong to another category of phenomena than subsystems of the laws of nature, such as molecules. Another similar set of subsystems is functional objects. In the engineering sense, a function is a goal-oriented result coming from an intelligent entity. The origin of a machine cannot be explained solely as a result of physical or chemical events. Machines can go wrong and break down, something that does not happen to the laws of physics and chemistry. In fact, a machine can be smashed and the laws of physics and chemistry will go on operating unfailingly in the parts remaining after the machine ceases to exist. Engineering principles create the structure of the machine which harnesses energy based on the laws of physics for the purposes the machine is designed to serve. Physics cannot reveal the practical principles of design or coordination which are the structure of the machine. The cause leading to a machine’s functionality is found in the mind of the engineer and nowhere else.

In life, there is interdependency between biological sign systems and data on the one hand, and the construction, assembly, and operation of the machines that this data directs on the other. The abstract sign-based genetic language stores the abstract information necessary to build functional biomolecules.

Von Neumann believed that life was ultimately based on logic. Von Neumann’s abstract machine consisted of two central elements: a Universal Computer and a Universal Constructor. The Universal Constructor builds another Universal Constructor based on the directions contained in the Universal Computer. When finished, the Universal Constructor copies the Universal Computer and hands the copy to its descendant. As a model of a self-replicating system, it has its counterpart in life where the Universal Computer is represented by the instructions contained in the genes, while the Universal Constructor is represented by the cell and its machinery. 24

On the one side, there is the computer storing the data; on the other, the construction machines. The construction machines build another identical construction machine, based on the data stored in the computer. Once finished, the construction machines copy the computer and the data and hand them down to the descendant. As a model of a self-replicating system, it has its counterpart in life, where the computer is represented by the instructions contained in the genes, while the construction machines are represented by the cell and its machinery that transcribes, translates, and replicates the information stored in genes. RNA polymerase transcribes, and the ribosome translates, the information stored in DNA, producing a faithful reproduction of the cell and all the machinery inside it. Once done, the genome is replicated and handed over to the descendant cell, and the mother cell has produced a daughter cell. The entire process of self-replication is data-driven and based on a sequence of events that can only be instantiated by understanding and knowing the right sequence of events. There is an interdependence of data and function. The function is performed by machines that are constructed based on the data instructions. The cause that instantiates such a sequence of events and state of affairs has never been shown to be anything other than a mind.
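The data-to-machine flow described above (genes read out by RNA polymerase and the ribosome) can be illustrated with a minimal Python sketch. It assumes a tiny made-up gene and uses only a handful of entries from the standard genetic code; the names `transcribe`, `translate`, and `toy_gene` are hypothetical and chosen purely for illustration, not taken from any of the cited papers.

```python
# A few entries from the standard genetic code (mRNA codon -> amino acid).
# The full table has 64 codons; this subset only covers the toy gene below.
CODON_TABLE = {
    "AUG": "Met",   # also serves as the start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": "STOP",
}


def transcribe(dna_coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read codons three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


# Hypothetical 15-base sequence, chosen only to exercise the table above.
toy_gene = "ATGGCTTGGAAATAA"
peptide = translate(transcribe(toy_gene))
print(peptide)
```

The sketch only shows that the mapping from stored symbols to product is a lookup followed by a fixed procedure; it models none of the cellular machinery that carries out those steps.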

Timothy R. Stout (2019): A body of information is stored in a genome within the cell. Cellular “hardware” then reads, decodes, and uses the information. The information drives the operation in a manner analogous to how software in a computer drives computer hardware. In both cases, proper information needs to be available for use by functioning hardware which in turn is controlled by it. The gradual step-by-step developmental processes characteristic of evolution are not compatible with the first appearance of a computer. There is a minimum amount of functioning information required for computer operation. The information and hardware must interact with each other in a very intricate, intertwined manner. The minimum amounts required for each are staggeringly complex. In industry, a computer needs to be designed before it is fabricated. The probability is virtually zero for an unguided, random combination of logic gates to form a functioning computer, complete with internal memory, memory address logic, data registers, a central processing unit, data input, and output components, control signal inputs, and outputs, and connections between internal components. Beyond this, there are no known means for random combinations of logic to generate a body of information tailored to work with a specific form of computer hardware. There are no known means for such information to be stored for use by the computer and to be accessible by it. Computers are the product of deliberate intelligent action, not random processes. Since computers and living cells are both information-driven machines, this suggests the possibility that the difficulties facing initial computer fabrication could also apply to initial cell fabrication. If this suggestion proves valid, it poses serious issues concerning the adequacy of natural processes being adequate to account for the information-driven physical life we see around us. 
There is another aspect of this problem that has particular significance. In industry, both computers and processor-driven applications ranging from microwave ovens to self-driving automobiles start with a predefined system specification.

Typically, this will define an overall task for the machine to accomplish. Some tasks may be done in hardware or software. Typically, the software is cheaper and more readily adapts to a wide range of possible variations in operation. However, hardware is faster and requires minimal input to trigger its operation. The specification determines whether a particular task is to be done in hardware or software. It also determines how the software and the hardware interact with each other to accomplish a given task. A major objective of the system specification is to define a software specification describing what the software needs to do and a hardware specification defining what the hardware needs to do. In industry, separate hardware and software design engineering teams then design a product meeting their specified goals. In an ideal world, the system specification will be so complete and accurate and the proficiency of the software and hardware engineers in implementing their specifications will likewise be so complete and accurate that the system will work the first time the power is turned on and the two are brought together. In real life, this is not typical.

If a living cell is more complex than a physical computer and if debug of computer design typically is an extremely difficult task, this suggests that a living cell must have its origin in a being so intelligent that it can anticipate all of the behaviors of the various arrangements of building block amino acids and nucleotides. The first cell must appear in working form without needing debug. This is particularly the case since special test equipment for identifying design problems would not be available in a prebiotic scenario. Although mutation and natural selection can have use in adapting an already living cell to changing environmental conditions, they appear inadequate to meet the requirements of initial cellular appearance. Slight modification of an existing, already working design is trivial compared to the difficulties of implementing an initial design. During my experience as an industrial design engineer, I was active on many design projects that were canceled for various reasons. I have worked on designs that were ready for a prototype to be built, but funds were not provided to make it. There is a difference between having a paper design, no matter how good it might be, and actually having resources to build the product. It is insufficient for an intelligent being to design a living cell capable of survival in the environment in which it will appear. Since the design specification appears outside of natural law, its physical implementation must also take place outside of natural law. Natural processes have no ability to implement non-material plans. The actual appearance on Earth of a living cell required an intelligent being to work outside of natural law in order to arrange molecules and atoms into dynamic relationships with each other in accordance with a predefined specification, one which was developed through intelligence and apart from natural processes. 
There is a word we use to call an extremely intelligent being who can move molecules and atoms into predetermined, dynamic relationships at will—God. This paper has plausibly demonstrated how unsuppressed, unbiased scientific observation leads to a Being with the characteristics of God as the source of the physical life we see around us. 25

Instructional assembly information always has a mental origin
David L Abel (2009): Even if multiple physical cosmoses existed, it is a logically sound deduction that linear-digital genetic instructions using a representational material symbol system (MSS) cannot be programmed by the chance and/or fixed laws of physicodynamics. This fact is not only true of the physical universe but would be just as true in any imagined physical multiverse. Physicality cannot generate non-physical Prescriptive Information (PI). Physicodynamics cannot practice formalisms (The Cybernetic Cut). Constraints cannot exercise formal control unless those constraints are themselves chosen to achieve formal function. 26

Edward J. Steele (2018): The transformation of an ensemble of appropriately chosen biological monomers (e.g. amino acids, nucleotides) into a primitive living cell capable of further evolution appears to require overcoming an information hurdle of super astronomical proportions, an event that could not have happened within the time frame of the Earth except, we believe, as a miracle. All laboratory experiments attempting to simulate such an event have so far led to dismal failure. It would thus seem reasonable to go to the biggest available “venue” in relation to space and time. A cosmological origin of life thus appears plausible and overwhelmingly likely to us. 27

Katarzyna Adamala (2014):  There is a conceptual problem, namely the emergence of specific sequences among a vast array of possible ones, the huge “sequence space”, leading to the question “why these macromolecules, and not the others?” One of the main open questions in the field of the origin of life is the biogenesis of proteins and nucleic acids as ordered sequences of monomeric residues, possibly in many identical copies. The first important consideration is that functional proteins and nucleic acids are chemically speaking copolymers, i.e., polymers formed by several different monomeric units, ordered in a very specific way. 

Attempts to obtain copolymers, for instance by a random polymerization of monomer mixtures, yield a difficult-to-characterize mixture of all different products. To the best of our knowledge, there is no clear approach to the question of the prebiotic synthesis of macromolecules with an ordered sequence of residues. The copolymeric nature of proteins and nucleic acids challenges our understanding of the origin of life also from a theoretical viewpoint. The number of all possible combinations of the building blocks (20 amino acids, 4 nucleotides) forming copolymers of even moderate length is ‘astronomically’ high, and the total number of possible combinations is often referred to as the “sequence space”. Simple numerical considerations suggest that the exhaustive exploration of the sequence spaces, both for proteins and nucleic acids, was physically not possible in the early Universe, both for lack of time and limited chemical material. There are no methods described in the literature to efficiently generate long polypeptides, and we also lack a theory for explaining the origin of some macromolecular sequences instead of others.

The theoretical starting point is the fact that the number of natural proteins on Earth, although apparently large, is only a tiny fraction of all the possible ones. Indeed, there are thought to be roughly 10^13 proteins of all sizes in extant organisms. This number, however, is negligible when compared to the number of all theoretically possible different proteins. The discrepancy between the actual collection of proteins and all possible ones stands clear if one considers that the number of all possible 50-residues peptides that can be synthesized with the standard 20 amino acids is 20^50, namely 10^65. Moreover, the number of theoretically possible proteins increases with length, so that the related sequence space is beyond contemplation; in fact, if we take into account the living organisms, where the average length of proteins is much greater, the number of possible different proteins becomes even bigger. The difference between the number of possible proteins (i.e. the sequence space) and the number of those actually present in living organisms is comparable, in a figurative way, to the difference that exists between a drop of water and an entire Ocean. This means that there is an astronomically large number of proteins that have never been subjected to the long pathway of natural evolution on Earth: the “Never Born Proteins” (NBPs). 28
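The arithmetic behind the quoted figures can be checked with a short Python sketch. It assumes nothing beyond the numbers given in the passage (20 amino acids, 50 residues, roughly 10^13 extant proteins); the helper name `sequence_space_log10` is hypothetical.

```python
import math

AMINO_ACIDS = 20   # standard amino acid alphabet, as in the quoted passage
EXTANT_LOG10 = 13  # the passage's estimate: roughly 10^13 extant proteins


def sequence_space_log10(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


# 20^50 expressed as a power of ten.
exponent_50 = sequence_space_log10(50)
print(f"20^50 is about 10^{exponent_50:.2f}")

# Orders of magnitude separating the 50-residue space from extant proteins.
gap = exponent_50 - EXTANT_LOG10
print(f"sequence space exceeds extant proteins by about 10^{gap:.0f}")
```

Working in logarithms keeps the numbers manageable; the same helper can be called with longer lengths to see how quickly the exponent grows.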



Last edited by Otangelo on Fri Sep 09, 2022 9:35 am; edited 1 time in total

https://reasonandscience.catsboard.com

119Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 8:17 am

Otangelo


Admin

1. Paul Davies & Jeremy England:  The Origins of Life: Do we need a new theory for how life began? at 15:30 Life = Chemistry + information  Jun 25, 2021
2. Guenther Witzany: [url=https://pubmed.ncbi.nlm.nih.gov/25557438/#:~:text=Manfred Eigen extended Erwin Schroedinger's,quasispecies and hypercycles%2C which have]Life is physics and chemistry and communication[/url] 2014 Dec 31
3. Paul Davies: The secret of life won't be cooked up in a chemistry lab
4. SUNGCHUL JI: The Linguistics of DNA: Words, Sentences, Grammar, Phonetics, and Semantics  06 February 2006
5. Paul Davies: Life force 18 September 1999
6. Timothy R. Stout: Information-Driven Machines and Predefined Specifications: Implications for the Appearance of Organic Cellular Life April 8, 2019
7. David L Abel: Three subsets of sequence complexity and their relevance to biopolymeric information 11 August 2005
8. G. F. Joyce, L. E. Orgel: Prospects for Understanding the Origin of the RNA World 1993
9. Claus Emmeche: FROM LANGUAGE TO NATURE - the semiotic metaphor in biology 1991
10. Sergi Cortiñas Rovira: Metaphors of DNA: a review of the popularisation processes  21 March 2008
11. Massimo Pigliucci:  Why Machine-Information Metaphors are Bad for Science and Science Education 2010
12. Richard Dawkins on the origins of life (1 of 5) Sep 29, 2008
13. Hubert P. Yockey: Information Theory, Evolution, and the Origin of Life 2005
14. Barry Arrington A Dog Is A Chien Is A Perro Is A Hund February 11, 2013
15. Paul Davies: The Origin of Life January 31, 2003
16. Change Laura Tan, Rob Stadler: The Stairway To Life: An Origin-Of-Life Reality Check  March 13, 2020 
17. David L. Abel: The Capabilities of Chaos and Complexity 9 January 2009
18. George M Church: Next-generation digital information storage in DNA 2012 Aug 16
19. Peter R. Wills: DNA as information 13 March 2016
20. Leroy Hood: The digital code of DNA 2003 Jan 23
21. P.Marshall:  Evolution 2.0:Breaking the Deadlock Between Darwin and Design September 1, 2015
22. Paul C. W. Davies: The algorithmic origins of life 06 February 2013
23. David L Abel: Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems 2012 Mar 14
24. Albert Voie: Biological function and the genetic code are interdependent 2006
25. Timothy R. Stout: Information-Driven Machines and Predefined Specifications: Implications for the Appearance of Organic Cellular Life April 8, 2019
26. David L Abel: The Universal Plausibility Metric (UPM) & Principle (UPP) 2009; 6: 27
27. Edward J. Steele: Cause of Cambrian Explosion -Terrestrial or Cosmic? 2018
28. Katarzyna Adamala OPEN QUESTIONS IN ORIGIN OF LIFE: EXPERIMENTAL STUDIES ON THE ORIGIN OF NUCLEIC ACIDS AND PROTEINS WITH SPECIFIC AND FUNCTIONAL SEQUENCES BY A CHEMICAL SYNTHETIC BIOLOGY APPROACH February 2014


120Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 9:40 am

Otangelo


Admin

29. Sir Fred Hoyle: The Universe: Past and Present Reflections November 1981

30. Robert T. Pennock: Intelligent Design Creationism and Its Critics: Philosophical, Theological, and Scientific Perspectives 2001

31. Paul Davies: The Origin of Life  January 31, 2003

32. Paul Davies & Jeremy England:  The Origins of Life: Do we need a new theory for how life began? Jun 25, 2021

33. Paul Davies: 'I predict a great revolution': inside the struggle to define life 2019

34. David T.F Dryden: How much of protein sequence space has been explored by life on Earth? 15 April 2008

35. Evolution: Possible, or impossible? Probability and the First Proteins

36. Steve Meyer, Signature in the Cell 2009

37. Hubert P.Yockey: A calculation of the probability of spontaneous biogenesis by information theory 7 August 1977

38. M. Émile Borel: Les probabilités dénombrables et leurs applications arithmétiques 8 November 1908

39. Florian Lauck: Coping with Combinatorial Space in Molecular Design October 2013

40. W.Patrick Walters: Virtual screening—an overview 1 April 1998

41. Brian R. Johnson: Self-organization, Natural Selection, and Evolution: Cellular Hardware and Genetic Software  December 2010

42. Paul Davies: [url=https://www.amazon.com/FIFTH-MIRACLE-Search-Origin-Meaning/dp/068486309X#:~:text=Are We Alone in the,years ago%2C Mars resembled earth.]The FIFTH MIRACLE: The Search for the Origin and Meaning of Life[/url]  March 16, 2000

43. Daniel J. Nicholson Is the cell really a machine? 4 June 2019

43a P.Marshall:  Evolution 2.0:Breaking the Deadlock Between Darwin and Design September 1, 2015

44. MARSHALL W. NIRENBERG Will Society Be Prepared? 11 August 1967

45. Patricia Bralley: An introduction to molecular linguistics February 1996

46. V A Ratner: The genetic language: grammar, semantics, evolution 1993 May;29

47. Eric Alani: DNA Spell Checkers

48. Libretexts: Genetic Information

49. Richard Dawkins: The blind watchmaker  1 January 1986


50. María A Sánchez-Romero: The bacterial epigenome 2020 Jan;18

51. Daniel J. Nicholson: On Being the Right Size, Revisited: The Problem with Engineering Metaphors in Molecular Biology 2020

52. David F. Coppedge Cilia Are Antennas for Human Senses and Development October 26, 2007

53. E. Camprubí: The Emergence of Life  27 November 2019

54. B.Alberts: Molecular Biology of the Cell. 4th edition. 2003

55. B. Alberts Molecular Biology of the Cell 6th ed. 2015

56. J.Monod: Chance and Necessity: An Essay on the Natural Philosophy of Modern Biology 12 September 1972

57. PAUL DAVIES: The Fifth Miracle The Search for the Origin and Meaning of Life 2000

58. Job Merkel: The Language of DNA 15 NOV, 2019

59. ULRICH E. STEGMANN: The arbitrariness of the genetic code 9 September 2003

60. David L. Abel: The Capabilities of Chaos and Complexity 9 January 2009

61. Ludmila Lackova: [url=https://pubmed.ncbi.nlm.nih.gov/28488159/#:~:text=Arbitrariness in the genetic code,between amino acids and nucleobases.]Arbitrariness is not enough: towards a functional approach to the genetic code[/url] 2 May 2017

62. Eugene V. Koonin: Origin and Evolution of the Universal Genetic Code 2017

63. Thomas Butler: Extreme genetic code optimality from a molecular dynamics calculation of amino acid polar requirement 17 June 2009

64. S J Freeland: The genetic code is one in a million 1998 Sep

65. S J Freeland: Early Fixation of an Optimal Genetic Code 01 April 2000

66. Shalev Itzkovitz: The genetic code is nearly optimal for allowing additional information within protein-coding sequences 2007 Apr; 17

67. H.Yockey: [url=https://www.cambridge.org/br/academic/subjects/life-sciences/evolutionary-biology/information-theory-evolution-and-origin-life?format=HB&isbn=9780521802932#:~:text=Information Theory%2C Evolution and the,the algorithmic language of computers.]Information theory, evolution, and the origin of life[/url] 2005

68. Fazale Rana: The Cell's Design: How Chemistry Reveals the Creator's Artistry 1 June 2008, page 172

69. D. L. Gonzalez  On the origin of degeneracy in the genetic code 18 October 2019

70. Tessa E.F. Quax: Codon Bias as a Means to Fine-Tune Gene Expression 2016 Jul 16

71. M.Eberlin Foresight 2019



72. David L Abel: Redundancy of the genetic code enables translational pausing  2014 May 20

72a.  Susha Cheriyedath: START and STOP Codons Feb 26, 2019

73. University of Utah Reading the genetic code depends on context APRIL 17, 2017

74. J Lehmann: Physico-chemical constraints connected with the coding properties of the genetic system 2000 Jan 21

75. Carl R. Woese: Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process  2000 Mar; 6

76. CARL R. WOESE: The Biological Significance of the Genetic Code 1969

77. Ádám Radványi: The evolution of the genetic code: Impasses and challenges February 2018

77a. Henri Grosjean: An integrated, structure- and energy-based view of the genetic code 2016 Sep 30

78. C R Woese: The molecular basis for the genetic code. 1967

79. Eugene V. Koonin: Origin and Evolution of the Universal Genetic Code 2017

80. TZE-FEI WONG: A Co-Evolution Theory of the Genetic Code 1975

81. Irene A. Chen:  An expanded genetic code could address fundamental questions about algorithmic information, biological function, and the origins of life 20 July 2010

82. Takahito Mukai: Rewriting the Genetic Code  July 11, 2017

82a. Dirson Jian Li: Formation of the Codon Degeneracy during Interdependent Development between Metabolism and Replication 20 December 2021

83. C R Woese: Order in the genetic code 1965 Jul;5

84. Stephen J. Freeland: The Case for an Error Minimizing Standard Genetic Code October 2003

85. J.Monod: Chance and Necessity: An Essay on the Natural Philosophy of Modern Biology 12 September 1972

86. John Maynard Smith: The Major Transitions in Evolution 1997

87. Victor A. Gusev Arzamastsev:  [url=https://www.webpages.uidaho.edu/~stevel/565/literature/Genetic code - Lucky chance or fundamental law of nature.pdf]Genetic code: Lucky chance or fundamental law of nature?[/url] 1997

88. Yuri I Wolf On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization 2007 May 31

89. Eugene V. Koonin: Origin and evolution of the genetic code: the universal enigma 2012 Mar 5

90. Marcello Barbieri Code Biology  February 2018

91. Julian Mejıa: Origin of Information Encoding in Nucleic Acids through a Dissipation-Replication Relation April 18, 2018

92. Charles W Carter: Insuperable problems of the genetic code initially emerging in an RNA World 2018 February

93. Florian Kaiser: The structural basis of the genetic code: amino acid recognition by aminoacyl-tRNA synthetases 28 July 2020

94a. JOSEF BERGER: THE GENETIC CODE AND THE ORIGIN OF LIFE 1976

94. David L. Abel: The Capabilities of Chaos and Complexity 9 January 2009

95. S.C. Meyer,  P.A. Nelson Can the Origin of the Genetic Code Be Explained by Direct RNA Templating?  August 24, 2011

96. M.Eberlin Foresight 2019

97. David L. Abel: The Capabilities of Chaos and Complexity 9 January 2009

98. Victor A.Gusev: Genetic code: Lucky chance or fundamental law of nature? December 2004


121Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 11:59 am

Otangelo


Admin

Chapter 8

The origin of Viruses
Peter Pollard (2014): Viruses are vital to our very existence. No one seems to stick up for the good guys that keep ecosystems diverse and balanced. Phage species richness is immense with a predicted 100 million phage species.
The sheer number of these good viruses is astonishing. Their concentration in a productive lake or river is often 100 million per millilitre – that’s more than four times the population of Australia squeezed into a ¼ of a teaspoon of water. Globally the oceans contain 10^30 viruses.

Curtis A. Suttle (2005): If the viruses were stretched end to end they would span ~10 million light years. In context, [their total carbon content] is equivalent to the carbon in ~75 million blue whales (~10% carbon, by weight), and [their end-to-end length] is ~100 times the distance across our own galaxy. This makes viruses the most abundant biological entity in the water column of the world’s oceans, and the second largest component of biomass after prokaryotes. 1
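The end-to-end figure can be roughly reproduced from the 10^30 oceanic viruses quoted above. The following Python sketch is only an order-of-magnitude check: the particle length of about 100 nm and the galactic diameter of about 100,000 light years are assumptions introduced here, not values stated in the quoted text.

```python
VIRUS_COUNT = 1e30                # oceanic virus count quoted above
VIRUS_LENGTH_M = 100e-9           # assumption: ~100 nm per virus particle
METERS_PER_LIGHT_YEAR = 9.461e15  # one light year in meters
GALAXY_DIAMETER_LY = 1e5          # assumption: Milky Way ~100,000 light years

# Total length of all particles laid end to end, converted to light years.
total_length_ly = VIRUS_COUNT * VIRUS_LENGTH_M / METERS_PER_LIGHT_YEAR

# How many galactic diameters that length would span.
galaxy_multiples = total_length_ly / GALAXY_DIAMETER_LY

print(f"end-to-end length: {total_length_ly:.2e} light years")
print(f"galactic diameters spanned: {galaxy_multiples:.0f}")
```

Under these assumptions the result lands close to the quoted ~10 million light years and ~100 galactic diameters; a different assumed particle size would shift both numbers proportionally.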

Pollard continues: What are viruses? Viruses are not living organisms. They are simply bits of genetic material (DNA or RNA) covered in protein, that behave like parasites. They attach to their target cell (the host), inject their genetic material, and replicate themselves using the host cells’ metabolic pathways. Then the new viruses break out of the cell: the cell explodes (lyses), releasing hundreds of viruses. Viruses are very picky about who they will infect. Each viral type has evolved to infect only one host species. Viruses that infect bacteria dominate our world. A virus that infects one species of bacteria won’t infect another bacterial species, and definitely can’t infect you. We have our own suite of a couple of dozen viral types that cause us disease and death. Algae and plants are primary producers, and the foundation of the world’s ecosystems. Using sunlight they turn raw elements like carbon dioxide, nitrogen and phosphorus into organic matter. In turn, they are eaten by herbivores, which are in turn eaten by other animals, and so on. Energy and nutrients are passed on up the food chain until animals die. But what ensures that the primary producers get the raw elements they need to get started? The answer hinges on the viruses’ relationship with bacteria. A virus doesn’t go hunting for its prey. It relies on randomly encountering a host; it’s a numbers game. When the host, such as a bacterial cell, grows rapidly, that number increases. The more of a bacterial species there is, the more likely it will come into contact with its viral nemesis, “killing the winner”. This means that no single bacterial species dominates an ecosystem for very long. In freshwater, for example, you see very high rates of bacterial growth. You would think this high bacterial production would become part of the food chain and end up as fish food. But that is rarely the case.
We now realise that the bacteria actually disappear from these ecosystems. So where do the bacteria go? The answer lies in the interaction between bacteria and viruses. When a virus bursts open a bacterial cell its “guts” are spewed back into the water along with all the new viruses. The cell contents then become food for the neighboring bacteria, thereby stimulating their growth. These bacteria increase in numbers and upon coming into contact with their viral nemesis they, too, become infected and lyse. This process of viral infection, lysis, and nutrient release occur over and over again. Bacteria are, in effect, cannibalizing each other with the help of their associated viruses. Very quickly, the elements that support the food web are put back into circulation with the help of viruses.

It’s the combination of high bacterial growth and viral infection that keeps ecosystems functioning. Thus viruses are a critical part of inorganic nutrient recycling. So while they are tiny and seem insignificant, viruses actually play an essential global role in the recycling of nutrients through food webs.

The origin of viruses, essential agents for life, is another mystery besides the origin of life
This is a major conundrum. Researchers struggle to find a coherent narrative to explain the fact that life depends on viruses and vice versa. If we want to elucidate how life began, it is important to take a closer look at the origin of viruses as well. Viruses are rarely in the spotlight when it comes to elucidating biological origins. Unjustifiably so, since they are essential for life.

What came first, cells or viruses? 
This is a classic chicken-and-egg problem: Gladys Kostyrka (2016): Cells depend on viruses, but viruses depend on cells as a host for replication. What came first? How could viruses play critical roles in the origin of life (OL) if life relies on cellular organization and if viruses are defined as parasites of cells? In other words, how could viruses play a role in the emergence of cellular life if the existence of cells is a prerequisite for the existence of viruses? 2

Colin Hill (2021): From the giant, ameba-infecting marine viruses to the tiny Porcine circovirus harboring only two genes, viruses and their cellular hosts are ecologically and evolutionarily intertwined. 3

C.Arnold (2014): Koonin believes in a Virus World. According to him, the ancestors of modern viruses emerged when all life was still a floating stew of genetic information, amino acids, and lipids. The earliest pieces of genetic material were, according to him, likely short pieces of RNA with relatively few genes that often parasitized other floating bits of genetic material to make copies of themselves. These naked pieces of genetic information swapped genes at a primeval genetic flea market, appropriating hand-me-downs from other elements and discarding genes that were no longer needed.

This appears to be one of many attempts to make sense of and explain the origin of viruses, but it is not convincing. Recent evidence puts a question mark on such hypotheses. C.Arnold continues: The largest virus ever discovered, pithovirus is more massive than even some bacteria. Most viruses copy themselves by hijacking their host's molecular machinery. But pithovirus is much more independent, possessing some replication machinery of its own. Pithovirus's relatively large number of genes also differentiated it from other viruses, which are often genetically simple; the smallest have a mere four genes. Pithovirus has around 500 genes, and some are used for complex tasks such as making proteins and repairing and replicating DNA. "It was so different from what we were taught about viruses," Abergel said. The stunning find, first revealed in 2014, isn't just expanding scientists' notions of what a virus can be. It is reframing the debate over the origins of life. The ancestors of modern viruses are far from being evolutionary laggards.

The origin of their complexity demands an explanation just as much as the origin of the first cells. C.Arnold again: The predominant theories for the origin of viruses propose that they emerged either from a type of degenerate cell that had lost the ability to replicate on its own or from genes that had escaped their cellular confines. Giant viruses, first described in 2003, began to change that line of thinking for some scientists. These novel entities represented an entirely new kind of virus. Indeed, the first specimen, isolated from an amoeba living in a cooling tower in England, was so odd that it took scientists years to understand what they had. The scientists named it mimivirus, for MImicking MIcrobe virus, because amoebae appear to mistake it for their typical bacterial meal. Then, next, with a staggeringly high number of genes, approximately 2,500, pandoravirus seemed to herald an entirely new class of viral life. "More than 90 percent of its genes do not resemble anything else found on Earth."

If viruses developed from cells, they should be less diverse because cells would contain the entire range of genes available to viruses. Viruses are also more diverse when it comes to reproduction. "Cells only have two main ways of replicating their DNA. Viruses, on the other hand, have many more methods at their disposal." 4

Hugh Ross (2020): Without viruses, bacteria would multiply and, within a relatively short time period, occupy every niche and cranny on Earth’s surface. The planet would become a giant bacterial slime ball. Those sextillions of bacteria would consume all the resources essential for life and die. Viruses keep Earth’s bacterial population in check. They break up and kill bacteria at the just-right rates and in the just-right locations so as to maintain a population and diversity of bacteria that is optimal for both the bacteria and all the other life forms. It is important to note that all multicellular life depends on bacteria being present at the optimal population level and optimal diversity. We wouldn’t be here without viruses! Viruses also play a crucial role in Earth’s carbon cycle. They and the bacterial fragments they create are carbonaceous substances. Through their role in precipitation, they collect as vast carbonaceous sheets on the surfaces of the world’s oceans. These sheets or mats of viruses and bacterial fragments sink slowly and eventually land on the ocean floors. As they are sinking they provide important nutrients for deep-sea and benthic (bottom-dwelling) life. Plate tectonics drive much of the viral and bacterial fragments into Earth’s crust and mantle where some of that carbonaceous material is returned to the atmosphere through volcanic eruptions. 5

Virus-archaea interactions play a central role in global biogeochemical cycles. Ramesh K Goel (2021): Viruses play vital biogeochemical and ecological roles by (a) expressing auxiliary metabolic genes during infection, (b) enhancing the lateral transfer of host genes, and (c) inducing host mortality. Even in harsh and extreme environments, viruses are major players in carbon and nutrient recycling from organic matter. 6

Eugene V. Koonin (2020): Lytic infections (involving the replication of a viral genome) of cellular organisms, primarily bacteria, by viruses play a central role in the biological matter turnover in the biosphere. Considering the enormous abundance and diversity of viruses and other mobile genetic elements (MGEs), and the ubiquitous interactions between MGEs and cellular hosts, a thorough investigation of the evolutionary relationships among viruses and mobile genetic elements (MGEs) is essential to advance our understanding of the evolution of life. 7

Eugene V Koonin (2013): Virus killing of marine bacteria and protists largely determines the composition of the biota, provides a major source of organic matter for consumption by heterotrophic organisms, and also defines the formation of marine sediments through the deposition of skeletons of killed plankton organisms such as foraminifera and diatoms. 8

Rachel Nuwer (2020): If all viruses suddenly disappeared, the world would be a wonderful place for about a day and a half, and then we’d all die – that’s the bottom line. The vast majority of viruses are not pathogenic to humans, and many play integral roles in propping up ecosystems. Others maintain the health of individual organisms – everything from fungi and plants to insects and humans. “We live in a balance, in a perfect equilibrium.” In 2018, for example, two research teams independently made a fascinating discovery. A gene of viral origin encodes for a protein that plays a key role in long-term memory formation by moving information between cells in the nervous system. 9

P. Forterre (2008): Historically, three hypotheses have been proposed to explain the origin of viruses: 

(1) they originated in a precellular world (‘the virus-first hypothesis’); 
(2) they originated by reductive evolution from parasitic cells (‘the reduction hypothesis’); and 
(3) they originated from fragments of cellular genetic material that escaped from cell control (‘the escape hypothesis’). 

All these hypotheses had specific drawbacks. The virus-first hypothesis was usually rejected firsthand since all known viruses require a cellular host. The reduction hypothesis was difficult to reconcile with the observation that the most reduced cellular parasites in the three domains of life, such as Mycoplasma in Bacteria, Microsporidia in Eukarya, or Nanoarchaea in Archaea, do not look like intermediate forms between viruses and cells. Finally, the escape hypothesis failed to explain how such elaborate structures as complex capsids and nucleic acid injection mechanisms evolved from cellular structures since we do not know any cellular homologs of these crucial viral components. 

Much like the concept of prokaryotes became the paradigm on how to think about bacterial evolution, the escape hypothesis became the paradigm favored by most virologists to solve the problem of virus origin. This scenario was chosen mainly because it was apparently supported by the observation that modern viruses can pick up genes from their hosts. In its classical version, the escape theory suggested that bacteriophages originated from bacterial genomes and eukaryotic viruses from eukaryotic genomes. This led to a damaging division of the virologist community into those studying bacteriophages and those studying eukaryotic viruses, ‘phages’ and viruses being somehow considered to be completely different entities. The artificial division of the viral world between ‘viruses’ and bacteriophages also led to much confusion on the nature of archaeal viruses. Indeed, although most of them are completely unrelated to bacterial viruses, they are often called ‘bacteriophages’, since archaea (formerly archaebacteria) are still considered by some biologists as ‘strange bacteria’. For instance, archaeal viruses are grouped with bacteriophages in the drawing that illustrates viral diversity in the last edition of the Virus Taxonomy Handbook. Hopefully, these outdated visions will finally succumb to the accumulating evidence from molecular analyses. 

Viruses Are Not Derived from Modern Cells 
Abundant data are now already available to discredit the escape hypothesis in its classical adaptation of the prokaryote/eukaryote paradigm. This hypothesis indeed predicts that proteins encoded by bacterial viruses (avoiding the term bacteriophage here) should be evolutionarily related to bacterial proteins, whereas proteins encoded by viruses infecting eukaryotes should be related to eukaryotic proteins. This turned out to be wrong since, with a few exceptions (that can be identified as recent transfers from their hosts), most viral encoded proteins have either no homologs in any cell or only distantly related homologs. In the latter cases, the most closely related cellular homolog is rarely from the host and can even be from cells of a domain different from the host. More and more biologists are thus now fully aware that viruses form a world of their own, and that it is futile to speculate on their origin in the framework of the old prokaryote/ eukaryote dichotomy.

A more elaborate version has been proposed by William Martin and Eugene Koonin, who suggested that life originated and evolved in the cell-like mineral compartments of a warm hydrothermal chimney. In that model, viruses emerged from the assemblage of self-replicating elements using these inorganic compartments as the first hosts. The formation of true cells occurred twice independently only at the end of the process (and at the top of the chimney), producing the first archaea and bacteria. The latter escaped from the same chimney system as already fully elaborated modern cells. In the model, viruses first co-evolved with acellular machineries producing nucleotide precursors and proteins.

The emergence of the RNA world involves at least the existence of complex mechanisms to produce ATP, RNA, and proteins. This means an elaborated metabolism to produce ribonucleotide triphosphate (rNTP) and amino acids, RNA polymerases, and ribosomes, as well as an ATP-generating system. If such a complex metabolism was present, it appears unlikely that it was unable to produce lipid precursors, hence membranes. If this is correct, then ‘modern’ viruses did not predate cells but originated in a world populated by primitive cells. 

Viruses and the Origin of DNA 
Considering the possibility that at least some DNA viruses originated from RNA viruses, it has been suggested that DNA itself could have appeared in the course of virus evolution (in the context of competition between viruses and their cellular hosts). Indeed, DNA is a modified form of RNA, and both viruses and cells often chemically modify their genomes to protect themselves from nucleases produced by their competitor. It is usually considered that DNA replaced RNA in the course of evolution simply because it is more stable (thanks to the removal of the reactive oxygen in position 2' of the ribose) and because cytosine deamination (producing uracil) can be corrected in DNA (where uracil is recognized as an alien base) but not in RNA. 10

Anyone who studies biochemistry knows the enormous complexity of ribonucleotide reductase (RNR) enzymes, which remove the oxygen from the 2' position of ribose in the RNA backbone, converting ribonucleotides into the deoxyribonucleotides that make up DNA. There is no scientific explanation for how RNA could have transitioned to DNA, nor for the origin of the ultra-complex machinery needed to catalyze the required reactions. Molecules have no goals and no foresight. They did not think about the advantage of stability when transitioning to DNA. Nothing about inert chemicals and physical forces says they want to become part of a living, self-replicating entity called a cell at the end of a chemical evolutionary process. Molecules have no "drive"; they do not urge or "want" to find ways to become information-bearing biomolecules, to harness energy as ATP molecules, to become more efficient, or to become part of a molecular machine or, in the end, a complex organism. There is a further hurdle to overcome. More and more biologists are now fully aware that viruses form a world of their own. Proteins encoded by bacterial viruses are not related to bacterial proteins. Modern viruses exhibit very different types of genomes (RNA, DNA, single-stranded, double-stranded), including highly modified DNA, whereas all modern cellular organisms have double-stranded DNA genomes. So the question becomes how viruses with a DNA genome originated, given that they had an origin independent of living cells. Even more: P. Forterre (2008): Many DNA viruses encode their own enzymes for deoxynucleotide triphosphate (dNTP) production, ribonucleotide reductases (the enzymes that produce deoxyribonucleotides from ribonucleotides), and thymidylate synthases (the enzymes that produce deoxythymidine monophosphate (dTMP) from deoxyuridine monophosphate (dUMP)).
That means RNR enzymes would have evolved independently, in a convergent manner, twice!

Forterre continues: The replacement of RNA by DNA as cellular genetic material would have thus allowed genome size to increase, with a concomitant increase in cellular complexity (and efficiency) leading to the complete elimination of RNA cells by the ancestors of modern DNA cells. This traditional textbook explanation has been recently criticized as incompatible with Darwinian evolution since it does not explain what immediate selective advantage allowed the first organism with a DNA genome to predominate over former organisms with RNA genomes. Indeed, the newly emerging DNA cell could not have immediately enlarged its genome and could not have benefited straight away from a DNA repair mechanism to remove uracil from DNA. Instead, if the replacement of RNA by DNA occurred in the framework of the competition between cells and viruses, either in an RNA virus or in an RNA cell, modification of the RNA genome into a DNA genome would have immediately produced a benefit for the virus or the cell. It has been argued that the transformation of RNA genomes into DNA genomes occurred preferentially in viruses because it was simpler to change in one step the chemical composition of the viral genome than that of the cellular genomes (the latter interacting with many more proteins). Furthermore, modern viruses exhibit very different types of genomes (RNA, DNA, single-stranded, double-stranded), including highly modified DNA, whereas all modern cellular organisms have double-stranded DNA genomes. This suggests a higher degree of plasticity for viral genomes compared to cellular ones. The idea that DNA originated first in viruses could also explain why many DNA viruses encode their own enzymes for deoxynucleotide triphosphate (dNTP) production, ribonucleotide reductases (the enzymes that produce deoxyribonucleotides from ribonucleotides), and thymidylate synthases (the enzymes that produce deoxythymidine monophosphate (dTMP) from deoxyuridine monophosphate (dUMP)).
Because in modern cells, dTMP is produced from dUMP, the transition from RNA to DNA occurred likely in two steps, first with the appearance of ribonucleotide reductase and production of U-DNA (DNA containing uracil), followed by the appearance of thymidylate synthases and formation of T-DNA (DNA containing thymine). The existence of a few bacterial viruses with U-DNA genomes has been taken as evidence that they could be relics of this period of evolution. If DNA first appeared in the ancestral virosphere, one has also to explain how it was later on transferred to cells. One scenario posits the co-existence for some time of an RNA cellular chromosome and a DNA viral genome (episome) in the same cell, with the progressive transfer of the information originally carried by the RNA chromosome to the DNA ‘plasmid’ via retro-transposition. 10

Viruses, the most abundant biological entities on earth
Steven W. Wilhelm (2012): Viruses are the most abundant life forms on Earth, with an estimated 10^31 total viruses globally. 11

Eugene V. Koonin (2020): Viruses appear to be the dominant biological entities on our planet, with the total count of virus particles in aquatic environments alone at any given point in time reaching the staggering value of 10^31, a number that is at least an order of magnitude greater than the corresponding count of cells. The genetic diversity of viruses is harder to assess, but, beyond doubt, the gene pool of viruses is, in the least, comparable to that of hosts. The estimates of the number of distinct prokaryotes on earth differ widely, in the range of 10^7 to 10^12, and accordingly, estimation of the number of distinct viruses infecting prokaryotes at 10^8 to 10^13 is reasonable. Even assuming the lowest number in this range and even without attempting to count viruses of eukaryotes, these estimates represent vast diversity. Despite the rapid short-term evolution of viruses, the key genes responsible for virion formation and virus genome replication are conserved over the long term due to selective constraints. Genetic parasites inescapably emerge even in the simplest molecular replicator systems and persist through their subsequent evolution. Together with the ubiquity and enormous diversity of viruses in the extant biosphere, these findings lead to the conclusion that viruses and other mobile genetic elements (MGEs) played major roles in the evolution of life ever since its earliest stages. 7

G.Witzany (2015): If we imagine that 1 ml of seawater contains one million bacteria and ten times more viral sequences, it can be determined that 10^31 bacteriophages infect 10^24 bacteria per second. 12

No common ancestor for viruses
Eugene V. Koonin (2020): In the genetic space of viruses and mobile genetic elements (MGEs), no genes are universal or even conserved in the majority of viruses. Viruses have several distinct points of origin, so there has never been a last common ancestor of all viruses. 7

Viruses and the tree of life (Virology blog, 2009): Viruses are polyphyletic (a group whose members come from multiple ancestral sources): In a phylogenetic tree, the characteristics of members of taxa are inherited from previous ancestors. Viruses cannot be included in the tree of life because they do not share characteristics with cells, and no single gene is shared by all viruses or viral lineages. Viruses are polyphyletic – they have many evolutionary origins. Viruses don’t have a structure derived from a common ancestor. Cells obtain membranes from other cells during cell division. According to this concept of ‘membrane heredity’, today’s cells have inherited membranes from the first cells. Viruses have no such inherited structure. They play an important role in regulating population and biodiversity. 13

Eugene V. Koonin (2017): The entire history of life is the story of virus-host coevolution. Therefore the origins and evolution of viruses are an essential component of this process. A signature feature of the virus state is the capsid, the proteinaceous shell that encases the viral genome. Although homologous capsid proteins are encoded by highly diverse viruses, there are at least 20 unrelated varieties of these proteins. Viruses are the most abundant biological entities on earth and show a remarkable diversity of genome sequences, replication and expression strategies, and virion structures.  Virus genomes typically consist of distinct structural and replication modules that recombine frequently and can have different evolutionary trajectories. 14  

The importance of the admission that viruses do not share a common ancestor cannot be overstated. Researchers also admit that, under a naturalistic framework, the origin of viruses remains obscure and unexplained. One reason is that viruses depend on a host cell in order to replicate. Another is that the capsid shells that protect the viral genome are unique: they have no counterpart in cellular life. A science paper that I quote below describes capsids with a "geometrically sophisticated architecture not seen in other biological assemblies". This seems to be interesting evidence of design. The claim that their origin has something to do with evolution is also misleading: evolution plays no role in explaining either the origin of life or the origin of viruses. The fact that "no single gene is shared by all viruses or viral lineages" prohibits drawing a tree of viruses leading to a common ancestor.

Edward C. Holmes (2011): The discovery of mimivirus has undoubtedly had a major impact on theories of viral origins. More striking is that most (∼70% at the time of writing) mimivirus genes have no known homologs, in either virus or cellular genomes, so their origins are unknown. More importantly, the discovery of mimivirus highlights our profound ignorance of the virosphere. It is therefore a truism that a wider sampling of viruses in nature is likely to tell us a great deal more about viral origins. Although perhaps less lauded, the discovery of conserved protein structures among diverse viruses with little if any primary sequence similarity has even grander implications for our understanding of viral origins. 15

Capsid-encoding organisms in contrast to ribosome-encoding organisms
Eugene V. Koonin (2014): Viruses were defined as one of the two principal types of organisms in the biosphere, namely, as capsid-encoding organisms in contrast to ribosome-encoding organisms, i.e., all cellular life forms. Structurally similar, apparently homologous capsids are present in a huge variety of icosahedral viruses that infect bacteria, archaea, and eukaryotes. These findings prompted the concept of the capsid as the virus “self” that defines the identity of deep, ancient viral lineages. This “capsidocentric” perspective on the virus world is buttressed by observations on the extremely wide spread of certain capsid protein (CP) structures that are shared by an enormous variety of viruses, from the smallest to the largest ones, that infect bacteria, archaea, and all divisions of eukaryotes. The foremost among such conserved capsid protein structures is the so-called jelly roll capsid (JRC) protein fold, which is represented, in a variety of modifications, in extremely diverse icosahedral (spherical) viruses that infect hosts from all major groups of cellular life forms. In particular, the presence of the double-beta-barrel JRC (JRC2b) in a broad variety of double-stranded DNA (dsDNA) viruses infecting bacteria, archaea, and eukaryotes has been touted as an argument for the existence of an “ancient virus lineage,” of which this type of capsid protein is the principal signature. Under this approach, viruses that possess a single beta-barrel JRC (JRC1b)—primarily RNA viruses and single-stranded DNA (ssDNA) viruses— could be considered another major viral lineage. A third lineage is represented by dsDNA viruses with icosahedral capsids formed by the so-called HK97-like capsid protein (after bacteriophage HK97, in which this structure was first determined), with a fold that is unrelated to the jelly roll fold. 
This assemblage of viruses is much less expansive than those defined by either JRC2b or JRC1b, but nevertheless, it unites dsDNA viruses from all three domains of cellular life. The capsid-based definition of a virus does capture a quintessential distinction between the two major empires of life forms, i.e., viruses and cellular life forms.    16

Viruses with a different genetic alphabet
Stephen Freeland (2022): The genetic material of more than 200 bacteriophage viruses uses 2-aminoadenine (Z) instead of adenine (A). This minor difference in chemical structures is nevertheless a fundamental deviation from the standard alphabet of four nucleobases established by biological evolution at the time of life's Last Universal Common Ancestor (LUCA). Placed into broader context, the finding illustrates a deep shift taking place in our understanding of the chemical basis for biology. 17

What is the best explanation for viral origin?
Edward C. Holmes (2011): The central debating point in discussions of the origin of viruses is whether they are ancient, first appearing before the last universal cellular ancestor (LUCA), or evolved more recently, such that their ancestry lies with genes that “escaped” from the genomes of their cellular host organisms and subsequently evolved independent replication. The escaped gene theory has traditionally dominated thinking on viral origins, in large part because viruses are parasitic on cells now and it has been argued that this must always have been the case. However, there is no gene shared by all viruses, and recent data are providing increasingly strong support for a far more ancient origin. 18

Koonin mentions three possible scenarios for their origin. One of them: Eugene V. Koonin (2017): The virus-first hypothesis, also known as the primordial virus world hypothesis, regards viruses (or virus-like genetic elements) as intermediates between prebiotic chemical systems and cellular life and accordingly posits that virus-like entities originated in the precellular world.

The second: The regression hypothesis, in contrast, submits that viruses are degenerated cells that have succumbed to obligate intracellular parasitism and in the process shed many functional systems that are ubiquitous and essential in cellular life forms, in particular the translation apparatus. The third, the escape hypothesis postulates that viruses evolved independently in different domains of life from cellular genes that embraced selfish replication and became infectious. 14

The second and third are questionable, in the face of the fact that evolution would weed out degenerated cell parts that harm survival. The hypothesis that these parts would become parasites runs directly against the evolutionary paradigm, since evolution is about the survival of the fittest, not about evolving parasites that would kill the cell. Furthermore, if viruses were not present right from the beginning, how would ecological homeostasis be guaranteed?

Koonin agrees that the first is the most plausible. He writes:  The diversity of genome replication-expression strategies in viruses, contrasting the uniformity in cellular organisms, had been considered to be most compatible with the possibility that the virus world descends directly from a precellular stage of evolution, and an updated version of the escape hypothesis states that the first viruses have escaped not from contemporary but rather from primordial cells, predating the last universal cellular ancestor. The three evolutionary scenarios imply different timelines for the origin of viruses but offer little insight into how the different components constituting viral genomes might have combined to give rise to modern viruses.

The conclusion that can be drawn is that viruses co-emerged with life, and that this occurred multiple times. If emerging just once is extremely unlikely based on the odds, how much more unlikely is emerging multiple times?

Koonin continues: A typical virus genome encompasses two major functional modules, namely, determinants of virion formation and those of genome replication. Understanding the origin of any virus group is possible only if the provenances of both components are elucidated. Given that viral replication proteins often have no closely related homologs in known cellular organisms, it has been suggested that many of these proteins evolved in the precellular world or in primordial, now extinct, cellular lineages. The ability to transfer the genetic information encased within capsids—the protective proteinaceous shells that comprise the cores of virus particles (virions)—is unique to bona fide viruses and distinguishes them from other types of selfish genetic elements such as plasmids and transposons. Thus, the origin of the first true viruses is inseparable from the emergence of viral capsids. Studies on the origin of viral capsids are severely hampered by the high sequence divergence among these proteins.

Analysis of the available sequences and structures of major capsid proteins (CP) and nucleocapsid (NC) proteins encoded by representative members of 135 virus taxa (117 families and 18 unassigned genera) allowed us to attribute structural folds to 76.3% of the known virus families and unassigned genera. The remaining taxa included viruses that do not form viral particles (3%) and viruses for which the fold of the major virion proteins is not known and could not be predicted from the sequence data (20.7%). The former group includes capsidless viruses of the families Endornaviridae, Hypoviridae, Narnaviridae, and Amalgaviridae, all of which appear to have evolved independently from different groups of full-fledged capsid-encoding RNA viruses. The latter category includes eight taxa of archaeal viruses with unique morphologies and genomes, pleomorphic bacterial viruses of the family Plasmaviridae, and 19 diverse taxa of eukaryotic viruses. It should be noted that, with the current explosion of metagenomics studies, the number and diversity of newly recognized virus taxa will continue to rise. Although many of these viruses are expected to have previously observed CP/NC protein folds, novel architectural solutions doubtlessly will be discovered as well. 17
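The percentages in the passage above can be checked against the taxon counts it gives. The short sketch below is only an arithmetic cross-check. It assumes that the four named capsidless families make up the whole 3% group and that Plasmaviridae counts as one taxon; the figure of 103 assigned taxa is obtained by subtraction and is not stated in the quoted text.

```python
# Cross-check of the taxon percentages quoted from Koonin (2017).
FAMILIES = 117
UNASSIGNED_GENERA = 18
TOTAL_TAXA = FAMILIES + UNASSIGNED_GENERA  # 135, as quoted

# Assumption: the four named capsidless families are the entire "3%" group.
capsidless = 4  # Endornaviridae, Hypoviridae, Narnaviridae, Amalgaviridae

# Assumption: 8 archaeal taxa + Plasmaviridae (1 family) + 19 eukaryotic taxa.
unknown_fold = 8 + 1 + 19

# Derived by subtraction, not stated in the source.
assigned_fold = TOTAL_TAXA - capsidless - unknown_fold


def percent(count):
    """Share of all 135 taxa, rounded to one decimal place."""
    return round(100 * count / TOTAL_TAXA, 1)


if __name__ == "__main__":
    for label, count in [
        ("fold assigned", assigned_fold),
        ("no virion formed", capsidless),
        ("fold unknown", unknown_fold),
    ]:
        print(f"{label}: {count} taxa, {percent(count)}%")
```

Under these assumptions the three groups add up to all 135 taxa, and the rounded shares agree with the 76.3%, 3% and 20.7% given in the quotation.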

Gladys Kostyrka (2016): To French molecular biologist and microbiologist Patrick Forterre, viruses could not exist without cells because he endorses their definition as intracellular obligate parasites. However, this does not mean that viruses did not exist prior to DNA cells. On the basis of comparative sequence analyses of proteins and nucleic acids from viruses and their cellular hosts, Forterre hypothesized that viruses originated before DNA cells and before LUCA (the Last Universal Cellular Ancestor). Forterre’s hypothesis was first formulated in the 1990s and was inspired by protein phylogenies. “Comparative sequence analyses of type II DNA topoisomerases and DNA polymerases from viruses, prokaryotes and eukaryotes suggest that viral genes diverged from cellular genes before the emergence of the last common ancestor (LCA) of prokaryotes and eukaryotes”. At least some viruses originated not from the known cellular domains (Bacteria, Eukarya, and Archaea) but before these three domains were formed. In other words, these viruses must have originated before LUCA. There are several genes shared by many groups of viruses with extremely diverse replication-expression strategies, genome size and host ranges. In other words, there are several “hallmark genes”, coding for several hallmark proteins present in many viruses. Yet these genes and proteins are not supposed to be shared by viruses that do not have the same origin, given their diversity. This “key observation” of several hallmark viral genes is thus problematic. It is even more problematic if one takes into account the fact that these genes are not found in any cellular life forms. It is then highly improbable that these viral hallmark genes were originally cellular genes that were transferred to viruses. Koonin assumes that these genes originated in a primordial viral world and were conserved.
“The simplest explanation for the fact that the hallmark proteins involved in viral replication and virion formation are present in a broad variety of viruses but not in any cellular life forms seems to be that the latter actually never possessed these genes. Rather, the hallmark genes, probably, antedate cells and descend directly from the primordial pool of virus-like genetic elements” 19

If Koonin's hypothesis were correct, nucleotides in a primordial pool would have needed foresight to assemble into genes that only later became virions dependent on cell hosts. That does not seem tenable. The evidence is better interpreted by the creationist model: it is consistent with the hypothesis that God created each species/kind and each virus separately. Positing multiple independent origin events by natural means, plus the emergence of symbiotic and parasitic relationships, only multiplies the improbabilities, and naturalistic proposals become more and more untenable.

Achieving the same function through different molecular assembly routes refutes an evolutionary-naturalistic origin of viruses
Eugene V. Koonin (2015): The ability to form virions is the key feature that distinguishes viruses from other types of mobile genetic elements, such as plasmids and transposons. The origin of bona fide viruses thus appears to be intimately linked to and likely concomitant with the origin of the capsids. However, tracing the provenance of viral capsid proteins (CPs) proved to be particularly challenging because they typically do not display sequence or structural similarity to proteins from cellular life forms. Over the years, a number of structural folds have been discovered in viral CPs. Strikingly, morphologically similar viral capsids, in particular, icosahedral, spindle-shaped and filamentous ones, can be built from CPs which have unrelated folds. Thus, viruses have found multiple solutions to the same problem. Nevertheless, the process of de novo origin of viral CPs remains largely enigmatic.  9

Stephen J. Gould (1990): …No finale can be specified at the start, none would ever occur a second time in the same way, because any pathway proceeds through thousands of improbable stages. Alter any early event, ever so slightly, and without apparent importance at the time, and evolution cascades into a radically different channel. 21

Fazale Rana (2001): Gould’s metaphor of “replaying life’s tape” asserts that if one were to push the rewind button, erase life’s history, and let the tape run again, the results would be completely different.  The very essence of the evolutionary process renders evolutionary outcomes as nonreproducible (or nonrepeatable). Therefore, “repeatable” evolution is inconsistent with the mechanism available to bring about biological change. 22

William Schopf (2002): Because biochemical systems comprise many intricately interlinked pieces, any particular full-blown system can only arise once…Since any complete biochemical system is far too elaborate to have evolved more than once in the history of life, it is safe to assume that microbes of the primal LCA cell line had the same traits that characterize all its present-day descendants. 23 24

Hugh M. B. Harris (2021): Viruses are ubiquitous. They infect almost every species and are probably the most abundant biological entities on the planet, yet they are excluded from the Tree of Life (ToL). Viruses may well be essential for ecosystem diversity. 25

Matti Jalasvuori (2012): Viruses play a vital role in all cellular and genetic functions, and we can therefore define viruses as essential agents of life. Viruses provide the largest reservoir of genes known in the biosphere but were not ‘stolen’ from the host. Such capsids cannot be of host origin. It is well accepted by virologists that viruses often contain many complex genes (including core genes) that cannot be attributed to having been derived from host genes. 26

Julia Durzyńska (2015): Many attempts have been made to define the nature of viruses and to uncover their origin. As the origin of viruses and that of living cells are most probably interdependent, we decided to reveal ideas concerning the nature of the cellular last universal common ancestor (LUCA). Many viral particles (virions) contain specific viral enzymes required for replication. A few years ago, a new division of all living organisms into two distinct groups was proposed: ribosome-encoding organisms (REOs) and capsid-encoding organisms (CEOs). 27

Eugene V. Koonin (2012): Probably an even more fundamental departure from the three-domain schema is the discovery of the Virus World, with its unanticipated, astonishing expanse and equally surprising evolutionary connectedness. Virus-like parasites inevitably emerge in any replicator systems, so THERE IS NO EXAGGERATION IN THE STATEMENT THAT THERE IS NO LIFE WITHOUT VIRUSES. And in quite a meaningful sense, not only viruses taken together, but also major groups of viruses seem to be no less (if not more) fundamentally distinct as the three (or two) domains of cellular life forms, given that viruses employ different replication-expression cycles, unlike cellular life forms which, in this respect, are all the same. 28

Shanshan Cheng (2013): Viral capsid proteins protect the viral genome by forming a closed protein shell around it. Most of currently found viral shells with known structure are spherical in shape and observe icosahedral symmetry. Comprised of a large number of proteins, such large, symmetrical complexes assume a geometrically sophisticated architecture not seen in other biological assemblies. Geometry of the complex architecture aside, another striking feature of viral capsid proteins lies in the folded topology of the monomers, with the canonical jelly-roll β barrel appearing most prevalent (but not sole) as a core structural motif among capsid proteins that make up these viral shells of varying sizes. Our study provided support for the hypothesis that viral capsid proteins, which are functionally unique in viruses in constructing protein shells, are also structurally unique in terms of their folding topology. 29

Eugene V. Koonin (2020): In a seminal 1971 article, Baltimore classified all then-known viruses into six distinct classes that became known as Baltimore classes (BCs) (a seventh class was introduced later), on the basis of the structure of the virion's nucleic acid (traditionally called the virus genome):

The seven Baltimore classes (BCs)
For each BC, the processes of replication, transcription, translation, and virion assembly are shown by color-coded arrows (see the inset). Host enzymes that are involved in virus genome replication or transcription are prefixed with “h-,” and in cases when, in a given BC, one of these processes can be mediated by either a host- or a virus-encoded enzyme, the latter is prefixed with “v-.” Otherwise, virus-encoded enzymes are not prefixed. CP, capsid protein; DdDp, DNA-directed DNA polymerase; DdRp, DNA-directed RNA polymerase; gRNA, genomic RNA; RdRp, RNA-directed RNA polymerase; RT, reverse transcriptase; RCRE, rolling-circle replication (initiation) endonuclease.

1. Double-stranded DNA (dsDNA) viruses, with the same replication-expression strategy as in cellular life forms
2. Single-stranded DNA (ssDNA) viruses that replicate mostly via a rolling-circle mechanism
3. dsRNA viruses
4. Positive-sense RNA [(+)RNA] viruses that have ssRNA genomes with the same polarity as the virus mRNA(s)
5. Negative-sense RNA [(−)RNA] viruses that have ssRNA genomes complementary to the virus mRNA(s)
6. RNA reverse-transcribing viruses that have (+)RNA genomes that replicate via DNA intermediates synthesized by reverse transcription of the genome
7. DNA reverse-transcribing viruses replicating via reverse transcription but incorporating into virions a dsDNA or an RNA-DNA form of the virus genome.
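
The seven classes listed above amount to a small lookup keyed on the packaged genome type and on whether replication involves reverse transcription. A minimal sketch in Python; the `BaltimoreClass` record and the `lookup` helper are hypothetical names introduced for illustration, and class 7 is simplified to its dsDNA form (the list also allows an RNA-DNA form):

```python
from typing import NamedTuple


class BaltimoreClass(NamedTuple):
    number: int
    genome: str                  # nucleic acid packaged in the virion
    reverse_transcribing: bool   # replicates via reverse transcription


# Entries follow the numbered list above; the string labels are this sketch's own.
CLASSES = [
    BaltimoreClass(1, "dsDNA", False),
    BaltimoreClass(2, "ssDNA", False),
    BaltimoreClass(3, "dsRNA", False),
    BaltimoreClass(4, "(+)ssRNA", False),
    BaltimoreClass(5, "(-)ssRNA", False),
    BaltimoreClass(6, "(+)ssRNA", True),
    BaltimoreClass(7, "dsDNA", True),   # simplified: may also be an RNA-DNA form
]


def lookup(genome, reverse_transcribing=False):
    """Return the Baltimore class numbers matching a genome type."""
    return [
        c.number
        for c in CLASSES
        if c.genome == genome and c.reverse_transcribing == reverse_transcribing
    ]


print(lookup("dsDNA"), lookup("(+)ssRNA", reverse_transcribing=True))
```

The point of the sketch is only that the same genome type can fall into different classes depending on the replication route, as with classes 1 and 7 or classes 4 and 6.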

Evidence supports monophyly for some of the BCs but refutes it for others. Generally, the evolution of viruses and MGEs is studied with methods of molecular evolutionary analysis that are also used for cellular organisms. However, the organizations of the genetic spaces dramatically differ between viruses and their cellular hosts.

Figure: An illustration of the "pathways" each Baltimore group goes through to synthesize mRNA, and of the 6 “superviral hallmark genes” in virus genomes of the seven Baltimore classes.

Koonin (2020): The origins of the superviral hallmark genes (VHGs) appear to be widely different. In particular, RdRps, RTs, and RCREs most likely represent the heritage of the primordial replicator pool as indicated by the absence of orthologs of these proteins in cellular life-forms. At the top of the megataxonomy are the four effectively independent realms that, however, are connected at an even higher rank of unification through the super-VHG domains. 7

Rob Phillips (2018): The International Committee on Taxonomy of Viruses or ICTV classifies viruses into seven orders:

Herpesvirales, large eukaryotic double-stranded DNA viruses;
Caudovirales, tailed double-stranded DNA viruses typically infecting bacteria;
Ligamenvirales, linear double-stranded viruses infecting archaea;
Mononegavirales, nonsegmented negative (or antisense) strand single-stranded RNA viruses of plants and animals;
Nidovirales, positive (or sense) strand single-stranded RNA viruses of vertebrates;
Picornavirales, small positive strand single-stranded RNA viruses infecting plants, insects, and animals;
Tymovirales, monopartite positive single-stranded RNA viruses of plants.

In addition to these orders, there are ICTV families, some of which have not been assigned to an ICTV order. Only those ICTV viral families with more than a few members present in our dataset are explored. 30

Structure and Assembly of Complex Viruses
Carmen San Martin (2013): Viral particles consist essentially of a proteinaceous capsid protecting a genome and involved also in many functions during the virus life cycle. In simple viruses, the capsid consists of a number of copies of the same, or a few different proteins organized into a symmetric oligomer. Structurally complex viruses present a larger variety of components in their capsids than simple viruses. They may contain accessory proteins with specific architectural or functional roles; or incorporate non-proteic elements such as lipids. They present a range of geometrical variability, from slight deviations from the icosahedral symmetry to complete asymmetry or even pleomorphism. Putting together the many different elements in the virion requires an extra effort to achieve correct assembly, and thus complex viruses require sophisticated mechanisms to regulate morphogenesis. This chapter provides a general view of the structure and assembly of complex viruses.

A viral particle consists essentially of a proteinaceous capsid with multiple roles in the protection of the viral genome, cell recognition and entry, intracellular trafficking, and controlled uncoating. Viruses adopt different strategies to achieve these goals. Simple viruses generally build their capsids from a number of copies of the same, or a few different proteins, organized into a symmetric oligomer. In the case of complex viruses, capsid assembly requires further elaborations. What are the main characteristics that define a structurally complex virus? Structural complexity on a virus often, but not necessarily, derives from the need to house a large genome, in which case a larger capsid is required. However, capsid or genome sizes by themselves are not determinants of complexity. For example, flexible filamentous viruses can reach lengths in the order of microns, but most of their capsid mass is built by a single capsid protein arranged in a helical pattern. On the other hand, architecturally complex viruses such as HIV have moderate-sized genomes (7–10 kb of single-stranded (ss) RNA). Structurally complex viruses incorporate a larger variety of components into their capsids than simple viruses. They may contain accessory proteins with specific architectural or functional roles or incorporate non-proteic elements such as lipids. 31

Forming viral symmetric shells
Roya Zandi (2020): The process of formation of virus particles in which the protein subunits encapsidate genome (RNA or DNA) to form a stable, protective shell called the capsid is an essential step in the viral life cycle. The capsid proteins of many small single-stranded RNA viruses spontaneously package their wild-type (wt) and other negatively charged polyelectrolytes, a process basically driven by the electrostatic interaction between positively charged protein subunits and negatively charged cargo.  Regardless of the virion size and assembly procedures, most spherical viruses adopt structures with icosahedral symmetry. How exactly capsid proteins (CPs) assemble to assume a specific size and symmetry have been investigated for over half a century now. As the self-assembly of virus particles involves a wide range of thermodynamics parameters, different time scales, and an extraordinary number of possible pathways, the kinetics of assembly has remained elusive, linked to Levinthal’s paradox for protein folding. The role of the genome on the assembly pathways and the structure of the capsid is even more intriguing. The kinetics of virus growth in the presence of RNA is at least 3 orders of magnitude faster than that of empty capsid assembly, indicating that the mechanism of assembly of CPs around RNA might be quite different. Some questions then naturally arise: What is the role of RNA in the assembly process, and by what means then does RNA preserve assembly accuracy at fast assembly speed? Two different mechanisms for the role of the genome have been proposed: (i) en masse assembly and (ii) nucleation and growth.

The assembly interfaces in many CPs are principally short-ranged hydrophobic in character, whereas there is a strong electrostatic, nonspecific long-ranged interaction between RNA and CPs. To this end, the positively charged domains of CPs associate with the negatively charged RNA quite fast and form an amorphous complex. Hydrophobic interfaces then start to associate, which leads to the assembly of a perfect icosahedral shell. Based on the en masse mechanism, the assembly pathways correspond to situations in which intermediates are predominantly disordered. They found that, at neutral pH, a considerable number of CPs were rapidly (∼28 ms) adsorbed to the genome, which more slowly (∼48 s) self-organized into compact but amorphous nucleoprotein complexes (NPC). By lowering the pH, they observed a disorder−order transition as the protein−protein interaction became strong enough to close up the capsid and to overcome the high energy barrier separating NPCs from virions. 32

1. Curtis A. Suttle: Viruses in the sea (2005)
2. Gladys Kostyrka: What roles for viruses in origin of life scenarios? (27 February 2016)
3. Hugh M. B. Harris: A Place for Viruses on the Tree of Life (14 January 2021)
4. Carrie Arnold: Could Giant Viruses Be the Origin of Life on Earth? (July 17, 2014)
5. Hugh Ross: Viruses and God’s Good Designs (March 30, 2020)
6. Ramesh K. Goel: Viruses and Their Interactions With Bacteria and Archaea of Hypersaline Great Salt Lake (2021 Sep 28)
7. Eugene V. Koonin: Global Organization and Proposed Megataxonomy of the Virus World (4 March 2020)
8. Eugene V. Koonin: A virocentric perspective on the evolution of life (October 2013)
9. Rachel Nuwer: Why the world needs viruses to function (2020)
10. P. Forterre: Origin of Viruses (2008)
11. Steven W. Wilhelm: Ocean viruses and their effects on microbial communities and biogeochemical cycles (2012 Sep 5)
12. G. Witzany: Viruses are essential agents within the roots and stem of the tree of life (21 February 2010)
13. Viruses and the tree of life (19 March 2009)
14. Eugene V. Koonin: Multiple origins of viral capsid proteins from cellular ancestors (March 6, 2017)
15. Edward C. Holmes: What Does Virus Evolution Tell Us about Virus Origins? (2011 Jun; 85)
16. Eugene V. Koonin: Virus World as an Evolutionary Network of Viruses and Capsidless Selfish Elements (2, June 2014)
17. Stephen Freeland: Undefining life's biochemistry: implications for abiogenesis (23 February 2022)
18. Edward C. Holmes: What Does Virus Evolution Tell Us about Virus Origins? (2011 Jun; 85)
19. Gladys Kostyrka: What roles for viruses in origin of life scenarios? (27 February 2016)
20. Eugene V. Koonin: Evolution of an archaeal virus nucleocapsid protein from the CRISPR-associated Cas4 nuclease (2015)
21. Stephen J. Gould: Wonderful Life: The Burgess Shale and the Nature of History (1990)
22. Fazale Rana: Repeatable Evolution or Repeated Creation? (2001)
23. J. William Schopf: Life’s Origin (2002)
24. Fazale Rana: Newly Discovered Example of Convergence Challenges Biological Evolution (2008)
25. Hugh M. B. Harris: A Place for Viruses on the Tree of Life (14 January 2021)
26. Matti Jalasvuori: Viruses: Essential Agents of Life (2012)
27. Julia Durzyńska: Viruses and cells intertwined since the dawn of evolution (2015)
28. Eugene V. Koonin: The Logic of Chance: The Nature and Origin of Biological Evolution (2012)
29. Shanshan Cheng: Viral Capsid Proteins Are Segregated in Structural Fold Space (February 7, 2013)
30. Rob Phillips: A comprehensive and quantitative exploration of thousands of viral genomes (2018 Apr 19)
31. Carmen San Martin: Structure and Assembly of Complex Viruses (19 April 2013)
32. Roya Zandi: How a Virus Circumvents Energy Barriers to Form Symmetric Shells (March 2, 2020)

https://reasonandscience.catsboard.com

122Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 12:35 pm

Otangelo


Admin


123Perguntas .... - Page 5 Empty Re: Perguntas .... Fri Sep 09, 2022 10:55 pm

Otangelo


Admin

Muller's Ratchet: Another hurdle in the hypothetical origin of life scenarios
E. V. Koonin (2017): Both the emergence of parasites in simple replicator systems and their persistence in evolving life forms are inevitable because the putative parasite-free states are evolutionarily unstable. 33 E. V. Koonin (2016): In the absence of recombination, finite populations are subject to irreversible deterioration through the accumulation of deleterious mutations, a process known as Muller’s ratchet, that eventually leads to the collapse of a population via mutational meltdown. 33a

Dana K Howe (2008): The theory of Muller's Ratchet predicts that small asexual populations are doomed to accumulate ever-increasing deleterious mutation loads as a consequence of the magnified power of genetic drift and mutation that accompanies small population size. Evolutionary theory predicts that mutational decay is inevitable for small asexual populations, provided deleterious mutation rates are high enough. Such populations are expected to experience the effects of Muller's Ratchet where the most-fit class of individuals is lost at some rate due to chance alone, leaving the second-best class to ultimately suffer the same fate, and so on, leading to a gradual decline in mean fitness. The mutational meltdown theory built upon Muller's Ratchet to predict a synergism between mutation and genetic drift in promoting the extinction of small asexual populations that are at the end of a long genomic decay process. Since deleterious mutations are harmful by definition, accumulation of them would result in loss of individuals and a smaller population size. Small populations are more susceptible to the ratchet effect and more deleterious mutations would be fixed as a result of genetic drift. This creates a positive feedback loop that accelerates the extinction of small asexual populations. This phenomenon has been called mutational meltdown. From the onset, there would have had to be a population of diversified microbes, not just a population of one progenitor but a varied one with different genetic make-ups, internally compartmentalized, and able to perform Horizontal Gene Transfer and recombination. Unless these preconditions were met, the population would die. 33b
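
The ratchet described above can be illustrated with a toy simulation. This is a minimal sketch, not a model taken from the cited papers: it assumes a fixed-size asexual population, multiplicative fitness, at most one new deleterious mutation per offspring, and no back mutation or recombination. The function name and all parameter values are hypothetical.

```python
import random


def mullers_ratchet(pop_size=50, generations=200, mut_rate=0.5, s=0.02, seed=1):
    """Return the mutation count of the least-loaded class in each generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size            # deleterious mutations carried by each individual
    min_load_history = []
    for _ in range(generations):
        # Fitness declines multiplicatively with each deleterious mutation.
        weights = [(1 - s) ** k for k in loads]
        # Asexual reproduction: each offspring copies one parent, chosen by fitness.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Each offspring gains one new mutation with probability mut_rate;
        # nothing is ever repaired, because there is no recombination here.
        loads = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


history = mullers_ratchet()
print(history[0], history[-1])
```

Because offspring never carry fewer mutations than their parent in this sketch, the least-loaded class can only stay the same or move up by a "click"; it never recovers once lost, which is the one-way behavior the quote describes.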

Comment: One would have to presume that the first universal common ancestor started as a diversified population, which is the point of objection. If even ONE origin-of-life (OoL) event was extremely unlikely, imagine the odds for multiple diversified organisms arising together so as to escape Muller's ratchet. That's a far stretch! On top of that, this population would have had to encounter a fully flourishing virus world, ready to start the evolutionary arms race necessary for ecological homeostasis, which is essential for life to exist on earth. Good luck explaining the origin of over 20 separate virus lineages, the polyphyly and multiple ancestries of viruses, and the capsid world!!

A plurality of ancestors
The origin of life did not coincide with the organismal LUCA; rather, a profound gap in time, biological evolution, geochemical change, and surviving evidence separates the two. After life emerged from prebiotic processes, diversification ensued and the initial self-replicating and evolving living systems occupied a wide range of available ecological niches. From this time until the existence of the organismal LUCA, living systems, lineages and communities would have come and gone, evolving via the same processes that are at work today, including speciation, extinction, and gene transfer.  34
Eugene V. Koonin (2020): The LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus–host interactions, in particular multiple defense systems. 35

Horizontal Gene Transfer and the Origin of Life
Gregory P Fournier (2015): The genomic history of prokaryotic organismal lineages is marked by extensive horizontal gene transfer (HGT) between groups of organisms at all taxonomic levels. These HGT events have played an essential role in the origin and distribution of biological innovations. Analyses of ancient gene families show that HGT existed in the distant past, even at the time of the organismal last universal common ancestor (LUCA). Mobile genetic elements, including transposons, plasmids, bacteriophage, and self-splicing molecular parasites, have played a crucial role in facilitating the movement of genetic material between organisms. Ancient HGT during Hadean/Archaean times is more difficult to study than more recent transfers, although it has been proposed that its role was even more pronounced during earlier times in life’s history.  

Aude Bernheim (2019): None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population 36

Eugene V. Koonin (2014): Recombinases derived from unrelated mobile genetic elements have essential roles in both prokaryotic and vertebrate adaptive immune systems. 37

From the onset, there would have had to be a population of diversified microbes, not just a population of one species of progenitor but a varied one with different genetic make-ups, able to perform Horizontal Gene Transfer (HGT) and recombination. There also had to be transposons, viral sequences, plasmids, viruses, mobile genetic elements, parasites, etc. Unless these preconditions were met, the population would go extinct.

The virome of the Last Universal Common Ancestor (LUCA)
Eugene V. Koonin (2020): Given that all life forms are associated with viruses and/or other mobile genetic elements, there is no doubt that the LUCA was a host to viruses. Even a conservative version of this reconstruction suggests a remarkably complex virome that already included the main groups of extant viruses of bacteria and archaea. The presence of a highly complex virome implies the substantial genomic and pan-genomic complexity of the LUCA itself. Viruses and other mobile genetic elements (MGEs) are involved in parasitic or symbiotic relationships with all cellular life forms. Genetic parasites must have been inalienable components of life from their very beginnings. Unlike cellular life forms, viruses employ all existing types of nucleic acids as replicating genomes packaged into virions. This diversity of the replication and expression strategies has been captured in a systematic form in the ‘Baltimore classification’ of viruses. There are four realms (the highest rank in virus taxonomy) of viruses that are monophyletic (a group of taxa composed of a common ancestor and all of its descendants) with respect to their core gene sets and partially overlap with the Baltimore classification: Riboviria, Monodnaviria, Duplodnaviria and Varidnaviria. Riboviria includes viruses with positive-sense, negative-sense, and double-stranded RNA (dsRNA) genomes as well as reverse-transcribing viruses with RNA and DNA genomes. Members of this realm are unified by the homologous RNA-dependent RNA polymerases (RdRPs) and reverse archaea that form several distinct, seemingly unrelated groups.

The accessory genes that are present in each strain in addition to the core genome and collectively account for the bulk of the pangenome include diverse anti-parasite defense systems, genes involved in inter-microbial conflicts, such as antibiotic production and resistance, and integrated mobile genetic elements (MGEs). Given that genetic parasites are intrinsic components of any replicator system, this pangenome structure should necessarily have been established at the earliest stages of cellular evolution. Thus, we can conclude with reasonable confidence that it was a prokaryotic population with a pangenomic complexity comparable to that of the extant archaea and bacteria. The LUCA virome was likely dominated by dsDNA viruses. More specifically, several groups of tailed dsDNA viruses (Duplodnaviria) indicate that (at least) this realm of viruses had already reached considerable diversity prior to the radiation of archaea and bacteria.   Each virus genome includes two major functional modules, one for virion formation (morphogenetic module) and one for genome replication. The two modules rarely display congruent histories over long evolutionary spans and are instead exchanged horizontally between different groups of viruses through recombination, continuously producing new virus lineages. 

LUCA's ancestral virome was likely dominated by dsDNA viruses from the realms Duplodnaviria and Varidnaviria. LUCA was not a homogenous microbial population but rather a community of diverse microorganisms, with a shared gene core that was inherited by all descendant life-forms and a diversified pangenome that included various genes involved in virus-host interactions, in particular multiple defense systems. 35

Koonin mentions that Duplodnaviria and Varidnaviria are subdivided into three classes, DJR MCP viruses, Sphaerolipoviridae, and Portogloboviridae, which were extant in the LUCA.

Aude Bernheim (2019): For a microorganism to be protected against a wide variety of viruses, it should encode a broad defense arsenal that can overcome the multiple types of viruses that can infect it. Owing to the selective advantage that defense systems provide, they are frequently gained by bacteria and archaea through horizontal gene transfer (HGT). Faced with viruses that encode counter-defense mechanisms, bacteria and archaea cannot rely on a single defense system and thus need to present several lines of defense as a bet-hedging strategy of survival. Given their selective advantage in the arms race against viruses, one might expect that defense systems, once acquired (either through direct evolution or via HGT), would accumulate in prokaryotic genomes and be selected for. Surprisingly, this is not the case as defense systems are known to be frequently lost from microbial genomes over short evolutionary time scales, suggesting that they can impose selective disadvantages in the absence of infection pressure. Competition studies between strains encoding defense systems, such as CRISPR–Cas or Lit Abi, and cognate defense-lacking strains have demonstrated the existence of a fitness cost in the absence of phage infectionAccess to a diverse set of defense mechanisms is essential in order to combat the enormous genetic and functional diversity of viruses. None of the strains encode all defense systems. However, if these strains are mixed as part of a population, the pan-genome of this population would encode an ‘immune potential’ that encompasses all of the depicted systems. As these systems can be readily available by HGT, given the high rate of HGT in defense systems, the population in effect harbors an accessible reservoir of immune systems that can be acquired by population members. 
When the population is subjected to infection, this diversity ensures that at least some population members would encode the appropriate defense system, and these members would survive and form the basis for the perpetuation of the population 36
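
The "immune potential" idea in the Bernheim quote can be pictured with a small Python sketch. The strain names and their defense repertoires below are made up for illustration; the only point is that no single strain carries everything, while the population as a whole does.

```python
# Hypothetical strains and the defense systems each one carries (illustrative only).
strains = {
    "strain_A": {"CRISPR-Cas", "restriction-modification"},
    "strain_B": {"abortive infection", "restriction-modification"},
    "strain_C": {"toxin-antitoxin"},
}

def immune_potential(population):
    """Union of all defense systems encoded anywhere in the population (the pan-genome view)."""
    systems = set()
    for defenses in population.values():
        systems |= defenses
    return systems

def survivors(population, required_defense):
    """Names of strains that already encode the defense needed against a given phage."""
    return sorted(name for name, defenses in population.items()
                  if required_defense in defenses)

pan = immune_potential(strains)
print("Population immune potential:", sorted(pan))
print("Any single strain encodes everything:",
      any(defenses == pan for defenses in strains.values()))
print("Survivors of a phage stopped only by CRISPR-Cas:", survivors(strains, "CRISPR-Cas"))
```

The sketch leaves out horizontal gene transfer and fitness costs, which the quote treats as the reason the repertoire stays spread across the population instead of piling up in one genome.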

Felix Broecker (2019): Cellular organisms have co-evolved with various mobile genetic elements (MGEs), including transposable elements (TEs), retroelements, and viruses, many of which can integrate into the host DNA. MGEs constitute ∼50% of mammalian genomes, >70% of some plant genomes, and up to 30% of bacterial genomes. The recruitment of transposable elements (TEs), viral sequences, and other MGEs for antiviral defense mechanisms has been a major driving force in the evolution of cellular life. 38

The immune system and defense mechanisms of the LUCA
LUCA must already have had sophisticated, complex immune and defense systems to protect itself from invaders such as viruses, plasmids, and phages. It had to, because viruses are sophisticated too.

Bacteriophage resistance mechanisms
Anna Lopatina (2020): In general, bacteria are known to resist phage infections by mutating or altering their surface receptors, targeting the phage nucleic acids, producing small molecules that poison phage replication, or committing suicide upon phage infection. 39

Tina Y. Liu (2020): All cells must defend against infection by harmful genetic elements, like viruses or transposons. Prokaryotes use a multitude of different strategies to combat their viruses, which are called phages. These include, but are not limited to, adsorption and injection blocking, abortive infection, toxin-antitoxin, restriction-modification, and CRISPR-Cas (clustered regularly interspaced short palindromic repeats, CRISPR-associated) systems. 40

Luciano Marraffini (2019): Everywhere bacteria are found, they coexist with their respective phages, undergoing continuous cycles of infection. As a consequence, in order to survive and thrive, bacteria have developed an arsenal of anti-phage mechanisms. The diversity and sophistication of bacteria’s anti-phage mechanisms are astounding. The red queen hypothesis states that an organism must constantly evolve to maintain its relative fitness in the face of a predator. In the context of the bacteria-phage relationship, this means that bacteria continuously evolve and update anti-phage mechanisms, while phages adapt to overcome these mechanisms. Competitive bacteria-phage coevolution, often referred to as an “evolutionary arms race”, has produced a multitude of bacterial defense mechanisms that act to inhibit every stage of the phage life cycle. As a result of this arms race, bacteria and phages coevolve, and seem to exist in stable equilibria without dramatic fluctuations or extinction events in natural environments. Key to this arms race is the propensity of bacterial defense systems to spread through horizontal gene transfer. Whereas this in principle could lead to an extensive proliferation of defense mechanisms to provide more protection to the host population, bacteria only tend to have a subset of the available diversity of anti-phage mechanisms. This is in part due to fitness costs associated with carrying defense systems. Therefore bacteria, even in the context of a race for survival against their parasites, must tune the trade-off between the cost of carrying anti-phage systems and the benefit of resisting phage infection.

Infection begins with the binding to specific surface proteins or cell wall components of the host cell, an event that is followed by the injection of the phage’s genome. Consequently, bacteria use both broad and phage-specific mechanisms to prevent phage adsorption and injection. Many bacteria spend much of their life cycle embedded in biofilms, an extracellular matrix made up of polymers where bacteria live in close proximity, often on surfaces. Biofilms protect bacteria in various ways. Biofilms can conditionally survive and grow in the presence of phage. Cells inside the colony divide, and are shielded by peripheral cells that get infected. The biofilm structure prevents phage access to the biofilm interior. This depended on the presence of the host protein curli, which forms amyloid fibres that promote the formation of an extracellular matrix and a dense cell packing. In addition to the protective shield provided by biofilms, Gram-negative bacteria can secrete outer membrane vesicles (OMVs), spherical structures made up of outer membrane components and periplasmic cargo which pinch off the cell. Since they contain exposed outer membrane proteins that can act as phage receptors, OMVs can act as decoys, sequestering extracellular phage. One report showed that pre-incubation with OMVs reduced T4 infectivity in E. coli.

Another mechanism to prevent adsorption is the introduction of mutations within receptor genes that affect the protein or its expression. This is a common mode of resistance that is perhaps best exemplified by the identification of mutations in LamB, the phage lambda receptor, in E. coli resistant cells. More recently it was found that receptor expression can be modulated by lysogenic phages via Sie. The P. aeruginosa prophage D3112 expresses the protein Tip, which interacts with the ATPase PilB to prevent type IV pili extension. D3112, as well as other phages that use these pili as receptors, are therefore unable to infect D3112 lysogens. Indeed, a systematic screen of P. aeruginosa Sie mechanisms identified many prophages interfering with either type IV pilus function, or with the O-antigen, another typical P. aeruginosa phage receptor in the surface polysaccharide. 41

Simon J. Labrie et al. (2010): Bacteriophages (or phages) are now widely recognized to outnumber bacteria by an estimated tenfold. The remarkable diversity of phages is best illustrated by the frequency of novel genes that are found in newly characterized phage genomes. Such natural variation is a reflection of the array of bacterial hosts that are available to phages and the high evolution rate of phages when facing selective pressure created by antiphage barriers. In most environments, a large pool of phages and hosts are involved in continuous cycles of co-evolution, in which emerging phage-insensitive hosts help to preserve bacterial lineages, whereas counter-resistant phages threaten such new bacterial strains. Phages and phage resistance mechanisms, therefore, have key roles in regulating bacterial populations in most, if not all, habitats.

Preventing phage adsorption 
Adsorption of phages to host receptors is the initial step of infection and, perhaps, one of the most intricate events, as phages must recognize a particular host-specific cell component. Phages are faced with an astonishing diversity in the composition of host membranes and cell walls. Furthermore, bacteria have a range of barriers to prevent phage adsorption. These adsorption-blocking mechanisms can be divided into at least three categories: the blocking of phage receptors, the production of extracellular matrix, and the production of competitive inhibitors.  42

Luciano Marraffini (2019): Bacteria can also prevent adsorption by hiding or masking surface receptors. For example, in Pseudomonas aeruginosa, type IV pili can be glycosylated to prevent the binding of several pilus-specific phages. Receptors can also be blocked by polysaccharide capsules, which shield the whole bacterial surface. The polysialic acid capsule of E. coli K1 prevents phage T7 attachment to its receptor, lipopolysaccharide (LPS), thereby reducing infectivity. In response, phages can have enzymes in their tails that degrade various capsules, giving rise to an evolutionary arms race that results in the extreme diversification of capsule synthesis and hydrolyzing enzyme genes of the host and phage, respectively. Finally, surface proteins can also hide phage receptors. E. coli lytic phage T5 uses the outer membrane iron uptake protein FhuA as its receptor and expresses the lipoprotein Llp to mask it. This prevents additional T5 particles, and possibly other phages that use FhuA as receptor, such as T1 and phi80, from entering and disturbing T5’s infection cycle. This phenomenon is an example of superinfection exclusion (Sie), a process where intracellular phages, including prophages, block the infection of the same (homotypic Sie) or a different (heterotypic Sie) phage. 41

Blocking of phage receptors. 
Simon J. Labrie et al. (2010): To limit phage propagation, bacteria can adapt the structure of their cell surface receptors or their three-dimensional conformation. For example, Staphylococcus aureus produces a cell-wall anchored virulence factor, immunoglobulin G-binding protein A, which binds to the Fc fragment of immunoglobulin G. It has been shown that phage adsorption improves when the bacteria produce less protein A, indicating that the phage receptor is masked by this protein. Phage T5, which infects Escherichia coli, produces a lipoprotein (Llp) that blocks its own receptor, ferrichrome-iron receptor (FhuA). Llp is expressed at the beginning of the infection, thereby preventing superinfection. This protein also protects newly synthesized phage T5 virions from inactivation by binding free receptors that are released from lysed cells. Host cells also use lipoproteins to inhibit phages, as seen in E. coli F+ strains. The outer-membrane protein TraT, encoded by the F plasmid, masks or modifies the conformation of outer-membrane protein A (OmpA), which is a receptor for many T-even-like E. coli phages. Bordetella spp. use phase variation to alter their cell surface, which is necessary for the colonization and survival of the bacteria. The production of many adhesins and toxins is under the control of the BvgAS two-component regulatory system. Bvg+ Bordetella spp. cells express colonization and virulence factors, including adhesins, toxins and a type III secretion system, that are not expressed in the Bvg– phase. The phage receptor, pertactin autotransporter (Prn), is expressed only in the Bvg+ phase, thus the efficiency of infection of the Bordetella phage BPP-1 is 1 million-fold higher for Bvg+ cells than for Bvg– cells. Interestingly, although this receptor is absent from Bvg– cells, the phage BPP-1 is still able to infect them, albeit at a much lower rate, indicating that this phage has evolved a strategy to overcome the absence of its primary receptor. 
Some phages that infect Bordetella spp. use a newly discovered family of genetic elements known as diversity generating retroelements to promote genetic variability. These phages switch hosts through a template-dependent, reverse-transcriptase-mediated process that introduces nucleotide substitutions in the variable region of the phage gene mtd, which encodes major tropism determinant protein, the protein that is responsible for host recognition. Comparative genome analyses have revealed putative diversity-generating retroelement systems in other phages, including in those that infect Bifidobacterium spp.

Production of extracellular matrix. 
The production of structured extracellular polymers can promote bacterial survival in various ecological niches by protecting the bacteria against harsh environmental conditions and, in some cases, providing a physical barrier between phages and their receptors. Some phages also specifically recognize these extracellular polymers and even degrade them. Polysaccharide-degrading enzymes can be classified into two groups: hydrolases (also known as the polysaccharases) and lyases. The lyases cleave the linkage between the monosaccharide and the C4 of uronic acid and introduce a double bond between the C4 and C5 of uronic acid. The hydrolases break the glycosyl–oxygen bond in the glycosidic linkage. These viral enzymes are found either bound to the phage structure (connected to the receptor-binding complex) or as free soluble enzymes from lysed bacterial cells. Alginates are exopolysaccharides that are mainly produced by Pseudomonas spp., Azotobacter spp. and some marine algae. An increased phage resistance was observed for alginate-producing Azotobacter spp. cells. However, phage F116, which targets Pseudomonas spp., produces an alginate lyase, facilitating its dispersion in the alginate matrix as well as reducing the viscosity of this matrix. It was proposed that alginate is involved in the adsorption of phage 2 and φPLS-I, which also target Pseudomonas spp., as an alginate-deficient mutant was phage resistant.

Hyaluronan (also known as hyaluronic acid) is composed of alternating N-acetylglucosamine and glucuronic-acid residues and is produced by pathogenic streptococci as a constituent of their capsule. This virulence factor helps bacterial cells to escape the immune system by interfering with defense mechanisms that are mediated by antibodies, complements, and phagocytes. Interestingly, genes encoding hyaluronan-degrading enzymes (known as hyaluronidases) are often found in the prophages that are inserted into the genomes of pathogenic bacterial strains. Not only are these prophage-encoded enzymes able to destroy the bacterial hyaluronan, but they also degrade human hyaluronan, helping the bacteria to spread through connective tissues. Both virulent and temperate streptococcal phages possess hyaluronidase, but the quantity of enzyme produced by temperate phages is several orders of magnitude higher than the quantity produced by virulent phages, therefore enabling the temperate phages to cross the hyaluronan barrier. Cell surface glycoconjugates of E. coli strains and Salmonella spp. serovars are extremely diverse. At least two serotype-specific surface sugars are produced by E. coli isolates: the lipopolysaccharide O antigen and the capsular polysaccharide K antigen. Phages have co-evolved with that diversity, and some are specific to these antigens. Capsular-negative mutants are insensitive to K antigen-specific phages. A similar observation was made with Salmonella phage P22, which recognized the O antigen. Furthermore, the P22 tail spike also possesses an endoglycosidase activity, enabling the phage to cross the 100 nm O antigen layer. Phage Φv10, which specifically binds to the O antigen of E. coli O157:H7, possesses an O-acetyltransferase that modifies the O157 antigen to block adsorption of Φv10 and similar phages.

Preventing phage DNA injection
Superinfection exclusion (Sie) systems are proteins that block the entry of phage DNA into host cells, thereby conferring immunity against specific phages. These proteins are predicted to be membrane-anchored or associated with membrane components. The genes encoding these proteins are often found in prophages, suggesting that in many cases Sie systems are important for phage–phage interactions rather than phage–host interactions. Many different Sie systems have been identified, although only a few have been characterized. 42

Luciano Marraffini (2019): Blocking the entry of phage DNA into the cytoplasm is another mechanism of preventing phage infections. The E. coli prophage HK97 confers both homotypic as well as heterotypic (against the closely related phage HK75) Sie through the expression of gp15. This is an inner membrane (transmembrane) protein that interacts with the host glucose transporter PtsG, and most likely disrupts its association with phage components required for translocating the viral genome across the inner membrane, thereby preventing the transfer of DNA into the cytoplasm. Another recent example of a heterotypic Sie mechanism preventing DNA injection is found in the mycobacteriophage Fruitloop. During the lytic cycle, Fruitloop gp52 inactivates Wag31, an essential mycobacterial protein involved in cell wall synthesis at the cell poles. This prevents DNA injection by an unrelated group of mycobacteriophages that rely on Wag31, including the phages Hedgerow and Rosebush. 41

Sie systems in Gram-negative bacteria. 
Simon J. Labrie et al. (2010): Coliphage T4, a well-characterized virulent phage, has two Sie systems encoded by imm and sp. These systems cause rapid inhibition of DNA injection into cells, preventing subsequent infection by other T-even-like phages. The Imm and Sp systems act separately and have different mechanisms of action. Imm prevents the transfer of phage DNA into the bacterial cytoplasm by changing the conformation of the injection site. Imm has two non-conventional transmembrane domains and is predicted to be localized to the membrane, but Imm alone does not confer complete phage immunity and must be associated with another membrane protein to exert its function and achieve complete exclusion. The membrane protein Sp inhibits the activity of the T4 lysozyme (which is encoded by gp5), thereby presumably preventing the degradation of peptidoglycan and the subsequent entry of phage DNA. The T4 lysozyme is found at the extremity of the tail and creates holes in the host cell wall, facilitating the injection of phage DNA into the cell. The Sim and SieA systems are associated with the prophages that are found in several Enterobacteriaceae species and have been well characterized, although the molecular mechanisms of their blocking activities are not yet fully understood. To exert its activity, Sim must be processed at its amino terminus in a SecA-dependent manner. The resulting 24 kDa Sim protein confers resistance against coliphages P1, c1, c4 and vir mutants. The only evidence that has led to the proposal that Sim blocks DNA entry is that phage adsorption is not affected by the presence of this protein and a bacterium can be successfully transformed with the phage genome. Finally, SieA is found in the inner membrane of Salmonella enterica subsp. enterica serovar Typhimurium carrying lysogenic phage P22 and prevents the infection of phages L, MG178, and MG40.
Notably, it was initially believed that SieB was also involved in superinfection exclusion, but it was later shown to cause phage-abortive infection.

Sie systems in Gram-positive bacteria. 
To date, only a few examples of mechanisms that inhibit phage DNA injection have been identified in Gram-positive bacteria. Most were identified in Lactococcus lactis, a species used in industrial milk fermentation processes. The best-characterized system is Sie2009, which was identified in the genome of the temperate lactococcal phage Tuc2009 and then subsequently found in other prophages in the genomes of several L. lactis strains. Most lactococcal prophages (including Tuc2009) belong to the P335 lactococcal phage group, and Sie2009 from these phages confers resistance to a genetically distinct group of lactococcal phages (the 936 group). The 936 group is the predominant group of L. lactis-specific phages found in the dairy industry. Lactococcal Sie systems are predicted to be localized to the membrane, and they provide resistance by inhibiting the transfer of phage DNA into host cells. Finally, a Sie-like system was recently found in the prophage of Streptococcus thermophilus, another bacterial species used in industrial milk fermentation processes. Prophage TP-J34 encodes a signal-peptide-bearing 142-amino-acid lipoprotein (LTP) that blocks the injection of phage DNA into the cell. Surprisingly, this system confers resistance to some lactococcal phages when transformed into L. lactis.

Cutting phage nucleic acids
Restriction–modification systems.
Many, if not all, bacterial genera possess restriction-modification (R–M) systems. Their activities are due to several heterogeneous proteins that have been classified into at least four groups (type I–type IV). The principal function of the R–M system is thought to be protecting the cell against invading DNA, including viruses. When unmethylated phage DNA enters a cell harboring an R–M system, it will be either recognized by the restriction enzyme and rapidly degraded or, to a lesser extent, methylated by a bacterial methylase to avoid restriction, therefore leading to the initiation of the phage’s lytic cycle. The fate of phage DNA is determined mainly by the processing rates of these two enzymes. As the restriction enzyme is often more active than the methylase, the incoming phage DNA is usually cleaved, although the host DNA is always protected by the methylase activity. Moreover, methylases are usually more specific for hemimethylated DNA (that is, DNA containing methyl groups on only one of the two DNA strands). When the phage DNA is methylated, the new virions become insensitive to the cognate restriction enzyme and readily infect neighboring cells containing the same R–M system. The phage will remain insensitive until it infects a bacterium that does not encode the same methylase gene, in which case the new virions will become unmethylated again and will therefore be sensitive once again to the R–M system of the original bacterium. To cope with these R–M systems, phages have evolved several anti-restriction strategies. One of these strategies is the absence of endonuclease recognition sites in their genomes through the accumulation of point mutations. For example, the polyvalent Staphylococcus phage K has no Sau3A sites (which have a 5′-GATC-3′ recognition sequence) in its double-stranded-DNA genome. The antiviral efficiency of an R–M system is directly proportional to the number of recognition sites in a viral double-stranded-DNA genome.
Furthermore, some phages have overcome R–M systems through the acquisition of the cognate methylase gene in their genomes. Perhaps the most striking example of an anti-restriction system is found in phage T4. The genome of this virulent phage contains the unusual base hydroxymethylcytosine (HmC) instead of the cytosine that is found in the host DNA. This modification allows phage T4 DNA to be impervious to R–M systems that recognize specific sequences containing a cytosine. In the co-evolutionary arms race, some bacteria have acquired the ability to attack modified phage DNA. In contrast to classical R–M systems, modification-dependent systems (MDSs) are specific for either methylated or hydroxymethylated DNA. Only a few MDS enzymes have been thoroughly characterized, such as DpnI from Streptococcus pneumoniae and McrA, McrBC and Mrr from E. coli. Interestingly, phage T4 is also resistant to MDS enzymes, because its HmC residues are glucosylated. In yet another twist, E. coli CT596 is able to attack glucosylated DNA, as it possesses a two-component system consisting of glucose-modified restriction S (GmrS) and GmrD proteins encoded by a prophage. This system specifically recognizes and cleaves DNA containing glucosylated HmC but has no effect on unglycosylated DNA. Some T4-like phages have a gene encoding internal protein I (IPI), which is specifically designed to disable the GmrS–GmrD system. During infection, mature IPI (IPI*) is injected into the host cell along with the phage genome. According to its structure, IPI* may interact with the GmrS–GmrD complex to inactivate its restriction activity. Interestingly, some bacterial strains have found ways to bypass IPI* by using a single, fused polypeptide.
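
To make the point about recognition sites concrete, here is a minimal Python sketch that counts how often a restriction site (the 5′-GATC-3′ Sau3A site mentioned above) occurs in a double-stranded genome. The two short sequences are invented toy examples, not real phage genomes, and the function names are my own. A genome with zero sites, as reported for phage K, gives that enzyme nothing to cut.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq[i:i + len(motif)] == motif)

def count_sites(genome, motif):
    """Count recognition sites on either strand of a double-stranded genome.

    For a palindromic motif such as GATC both strands read the same,
    so the site is counted once; otherwise the reverse complement is
    searched as well.
    """
    genome, motif = genome.upper(), motif.upper()
    total = count_overlapping(genome, motif)
    rc = reverse_complement(motif)
    if rc != motif:
        total += count_overlapping(genome, rc)
    return total

# Toy sequences (hypothetical): one with two GATC sites, one with none.
sensitive_genome = "ATGATCCGGATCTTA"
site_free_genome = "ATGACCCGGTTCTTA"

print(count_sites(sensitive_genome, "GATC"))
print(count_sites(site_free_genome, "GATC"))
```

The count is only a proxy. Whether a given site is actually cut also depends on methylation and on the relative rates of the restriction enzyme and the methylase, as the Labrie passage explains.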

Luciano Marraffini (2019): Restriction-modification (RM) systems are a ubiquitous and extremely diverse mode of anti-phage defense. They are normally made up of two activities: a restriction endonuclease and a methyltransferase (the modification component). The restriction endonuclease recognizes short DNA motifs, usually 4–8 base-pairs long, and cuts the phage DNA. These DNA motifs exist in both the bacterial host and invading phage, but the host protects its genome by using the methyltransferase to modify its own DNA to avoid recognition by the restriction enzyme. An invading phage is usually not methylated, and will therefore be cut upon injection. RM systems are classified into four major types, based on their mechanism of action and subunit composition. Both type I and III systems translocate along DNA and cleave away from the recognition sites. Type II systems, known for their use in molecular cloning, cleave within or near the recognition site. Type IV systems lack a methylase and only contain a restriction endonuclease which only cleaves modified DNA. Finally, there are examples of “inverted” RM systems that do not belong to any of these types. The phage ϕC31 can propagate in Streptomyces coelicolor A3(2) harboring the four-gene “phage growth limiting” (pgl) locus, but only mounts one cycle of infection. The released phages are unable to reinfect Pgl+ hosts, presumably due to the action of the methyltransferase pglX, which modifies new phage DNA to make it susceptible for restriction in the next Pgl+ host by an unknown mechanism.

RM systems and DNA modifications exemplify an elaborate “arms race” between E. coli and phage T4. T4 contains hydroxymethylcytosine (HMC) instead of cytosine in its DNA, inhibiting all type I-III RM systems that recognize sites containing cytosine. To counter this, E. coli uses McrBC, a type IV system specific for HMC-containing DNA. In response, T4 can glycosylate its DNA, which impairs McrBC activity. Against this, E. coli has evolved an additional type IV system, the GmrSGmrD system, that can cleave glycosylated DNA.
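
The layered moves and counter-moves between E. coli and T4 can be written down as a simple lookup, sketched here in Python. This is a deliberately crude model and the names are mine: each host system is assumed to cleave exactly one DNA state, sequence specificity and host methylation are ignored, and a phage inhibitor (such as IPI* against GmrS-GmrD, mentioned in the Labrie passage) simply switches a system off.

```python
# Which phage DNA state each host system can cleave, following the text above.
# Simplifying assumption: one system, one DNA state, no sequence specificity.
CLEAVES = {
    "type I-III RM": {"cytosine"},
    "McrBC": {"HMC"},
    "GmrS-GmrD": {"glucosylated HMC"},
}

def phage_dna_restricted(dna_state, host_systems, phage_inhibitors=frozenset()):
    """Return True if any active host system can cleave phage DNA in this state.

    phage_inhibitors lists host systems that the phage disables
    (for example "GmrS-GmrD" when the phage injects IPI*).
    """
    active = [system for system in host_systems if system not in phage_inhibitors]
    return any(dna_state in CLEAVES[system] for system in active)

# Walk through the rounds of the arms race described in the text.
host = ["type I-III RM"]
print(phage_dna_restricted("cytosine", host))            # unmodified phage DNA
print(phage_dna_restricted("HMC", host))                 # T4 switches to HMC
host.append("McrBC")
print(phage_dna_restricted("HMC", host))                 # host adds McrBC
print(phage_dna_restricted("glucosylated HMC", host))    # T4 glucosylates its HMC
host.append("GmrS-GmrD")
print(phage_dna_restricted("glucosylated HMC", host))    # host adds GmrS-GmrD
print(phage_dna_restricted("glucosylated HMC", host, {"GmrS-GmrD"}))  # phage brings IPI*
```

Each new layer only matters against the previous counter-move, which is why the text describes the process as an escalating sequence and not a single defense.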

The CRISPR–Cas system
Tina Y. Liu (2020): CRISPR-Cas systems stand out as the only known RNA programmed pathways for detecting and destroying bacteriophages and plasmids. Class 1 CRISPR-Cas systems, the most widespread and diverse of these adaptive immune systems, use an RNA-guided multi-protein complex to find foreign nucleic acids and trigger their destruction. These multisubunit complexes target and cleave DNA and RNA, and regulatory molecules control their activities. CRISPR-Cas loci constitute the only known adaptive immune system in bacteria and archaea. They typically include an array of repeat sequences (CRISPRs) with intervening “spacers” matching sequences of DNA or RNA from viruses or other mobile genetic elements, and a set of genes encoding CRISPR-associated (Cas) proteins.

Transcription across the CRISPR array produces a precursor crRNA (pre-crRNA) that is processed by nucleases into small, non-coding CRISPR RNAs (crRNAs). Each crRNA molecule assembles with one or more Cas proteins into an effector complex that binds crRNA-complementary regions in foreign DNA or RNA. The effector complex then triggers degradation of the targeted DNA or RNA using either an intrinsic nuclease activity or a separate nuclease in trans.
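
The logic of the array (repeats with intervening spacers) and of crRNA-guided recognition can be sketched in a few lines of Python. Everything here is a toy assumption: the repeat and spacer sequences are invented and far shorter than real ones, matching is exact, both strands are checked by reverse complement, and real requirements such as a PAM or the Cas proteins themselves are left out.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def extract_spacers(array, repeat):
    """Split a CRISPR array on its repeat sequence and return the spacers in order."""
    return [part for part in array.split(repeat) if part]

def is_targeted(foreign_dna, spacers):
    """True if any spacer (or its reverse complement) occurs in the foreign DNA.

    This stands in for a crRNA base-pairing with a complementary region
    on either strand of the invader.
    """
    foreign_dna = foreign_dna.upper()
    return any(spacer in foreign_dna or reverse_complement(spacer) in foreign_dna
               for spacer in spacers)

# Hypothetical 8-base repeat and two 8-base spacers, much shorter than real ones.
REPEAT = "GTTTTAGA"
array = REPEAT + "ACGTTGCA" + REPEAT + "TTGACCAG" + REPEAT

spacers = extract_spacers(array, REPEAT)
print(spacers)
print(is_targeted("CCCTTGACCAGGGG", spacers))   # carries a sequence matching a spacer
print(is_targeted("AAAAAAAACCCC", spacers))     # no match to any spacer
```

The stored spacers act as the "memory" of past infections: only an invader carrying a matching sequence is recognized, which is what makes the system adaptive and sequence-specific.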

Giedrius Gasiunas (2012): The silencing of invading nucleic acids is executed by ribonucleoprotein complexes preloaded with small, interfering CRISPR RNAs (crRNAs) that act as guides for targeting and degradation of foreign nucleic acid. The Cas9–crRNA complex of the Streptococcus thermophilus CRISPR3/Cas system introduces a double-strand break at a specific site in DNA containing a sequence complementary to crRNA. DNA cleavage is executed by Cas9, which uses two distinct active sites, RuvC and HNH, to generate site-specific nicks on opposite DNA strands. Results demonstrate that the Cas9–crRNA complex functions as an RNA-guided endonuclease with RNA-directed target sequence recognition and protein-mediated DNA cleavage. 43

J. Cepelewicz (2020): CRISPR acts like an adaptive immune system; it enables bacteria that have been exposed to a virus to pass on a genetic “memory” of that infection to their descendants, which can then mount better defenses against a repeat infection. It’s a system that works so well that an estimated half of all bacterial species use CRISPR. Researchers have uncovered dozens of other systems that bacteria use to rebuff phage invasions. But in laboratory studies, bacteria primarily develop what’s known as surface-based phage resistance. Mutations change receptor molecules on the surface of the bacterial cell, so that the phage can no longer recognize and invade it.

The strategy is akin to shutting a door and throwing away the key: It offers the bacteria complete safety from infection by the virus. But that protection comes at a significant price, because it also disrupts whatever nutrient uptake, waste disposal, communication task or other cellular function the receptor would have been providing — taking a constant toll on a cell’s fitness.

In contrast, CRISPR only drags on a cell’s resources when it’s active, during a viral infection. Even so, CRISPR represents a riskier gambit: It doesn’t start to work until phages have already entered the cell, meaning that there’s a chance the viruses could overcome it. And CRISPR doesn’t just attack viral DNA; it can also prevent bacteria from taking up beneficial genes from other microbes, like those that confer antibiotic resistance. What factors affect the trade-offs in costs and fitness? For the past six years, Edze Westra, an evolutionary ecologist at the University of Exeter in England, has led a team pursuing the answer to that question. In 2015, they discovered that nutrient availability and phage density affected whether Pseudomonas bacteria relied on surface-based or CRISPR-based resistance. In environments poor in resources, receptor modifications were more burdensome, so CRISPR became a better bargain. When resources were plentiful, bacteria grew more densely and phage epidemics became more frequent. Bacteria then faced greater selective pressure to close themselves off from infection entirely, and so they shut down receptors to gain surface-based resistance. This explained why surface-based resistance was so common in laboratory cultures. Growing in a test tube rich in nutrients, “these bacteria are on a holiday,” Westra said. “They are having a terrific time.”

Still, these rules weren’t cut and dried. Plenty of bacteria in natural high-nutrient environments use CRISPR, and plenty of bacteria in natural low-nutrient environments don’t. “It’s all over the place,” Westra said. “That told us that we were probably still missing something.”

How Biodiversity Reshapes the Battle
Then one of Westra’s graduate students, Ellinor Opsal, proposed another potential factor: the diversity of the biological communities in which bacteria live. This factor is harder to study, but scientists had previously observed that it could affect phage immunity in bacteria. For example, in 2005, James Bull, a biologist at the University of Texas, Austin, and William Harcombe, his graduate student at the time (now at the University of Minnesota), found that E. coli bacteria didn’t evolve immunity to a phage when a second bacterial species was present. Similarly, Britt Koskella, an evolutionary biologist at the University of California, Berkeley, and one of her graduate students, Catherine Hernandez, reported last year that phage resistance failed to arise in Pseudomonas bacteria living on their actual host (a plant), though they always gained immunity in a test tube. Could the diversity of the surroundings influence not just whether or not resistance to phages evolved, but the nature of that resistance?

To find out, Westra’s team performed a new set of experiments: Instead of altering the nutrient conditions for Pseudomonas bacteria growing with phages, they added three other bacterial species — species that competed against Pseudomonas for resources but weren’t targeted by the phage. Left to themselves, Pseudomonas would normally develop surface-based mutations. But in the company of rivals, they were far more likely to turn to CRISPR. Further investigation showed that the more complex community dynamics had shifted the fitness costs: The bacteria could no longer afford to inactivate receptors because they not only had to survive the phage, but also had to outcompete the bacteria around them. These results from Westra’s group dovetail with earlier findings that phages can produce greater diversity in bacterial communities. “Now, that diversity is actually feeding back to the phage side of things” by affecting phage resistance, Koskella said. “It’s neat to see that coming full circle.” By understanding that kind of feedback loop, she added, “we can start to ask more general questions about the impacts that phages have in a community context.”

For one, the bacteria’s shift toward a CRISPR-based phage response had another, broader effect. When Westra’s group grew Pseudomonas in moth larvae hosts, they found that the bacteria with surface-based resistance were less virulent, killing the larvae much more slowly than the bacteria with active CRISPR systems did. 44

Discovering CRISPR
S.H. Sternberg (2015): The CRISPR locus was first identified in Escherichia coli as an unusual series of 29-bp repeats separated by 32-bp spacer sequences (Ishino et al., 1987). 21
Carl Zimmer tells the story (2015): The scientists who discovered CRISPR had no way of knowing that they had discovered something so revolutionary. They didn’t even understand what they had found. In 1987, Yoshizumi Ishino and colleagues at Osaka University in Japan published the sequence of a gene called iap belonging to the gut microbe E. coli. To better understand how the gene worked, the scientists also sequenced some of the DNA surrounding it. They hoped to find spots where proteins landed, turning iap on and off. But instead of a switch, the scientists found something incomprehensible. Near the iap gene lay five identical segments of DNA. DNA is made up of building blocks called bases, and the five segments were each composed of the same 29 bases. These repeat sequences were separated from each other by 32-base blocks of DNA, called spacers. Unlike the repeat sequences, each of the spacers had a unique sequence.

This peculiar genetic sandwich didn’t look like anything biologists had found before. When the Japanese researchers published their results, they could only shrug. “The biological significance of these sequences is not known,” they wrote. It was hard to know at the time if the sequences were unique to E. coli, because microbiologists only had crude techniques for deciphering DNA. But in the 1990s, technological advances allowed them to speed up their sequencing. By the end of the decade, microbiologists could scoop up seawater or soil and quickly sequence much of the DNA in the sample. This technique — called metagenomics — revealed those strange genetic sandwiches in a staggering number of species of microbes. They became so common that scientists needed a name to talk about them, even if they still didn’t know what the sequences were for. In 2002, Ruud Jansen of Utrecht University in the Netherlands and colleagues dubbed these sandwiches “clustered regularly interspaced short palindromic repeats” — CRISPR for short.
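The repeat-spacer "sandwich" described above can be sketched in a few lines of Python. This is only a toy illustration: the sequences are random stand-ins, not the real E. coli locus, and the helper names (random_dna, build_array, split_array) are made up for this example. It builds five identical 29-base repeats with four unique 32-base spacers between them, then recovers the spacers by cutting at every copy of the repeat.

```python
import random

REPEAT_LEN = 29   # repeat length reported for the E. coli locus
SPACER_LEN = 32   # spacer length reported for the same locus

def random_dna(length, rng):
    """Return a random DNA string (a stand-in for real sequence data)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def build_array(repeat, spacers):
    """Lay out repeat-spacer-repeat-...-repeat, as in a CRISPR locus."""
    parts = [repeat]
    for spacer in spacers:
        parts.append(spacer)
        parts.append(repeat)
    return "".join(parts)

def split_array(locus, repeat):
    """Recover the spacers by cutting the locus at every copy of the repeat."""
    return [piece for piece in locus.split(repeat) if piece]

rng = random.Random(0)                      # fixed seed so the run is repeatable
repeat = random_dna(REPEAT_LEN, rng)
spacers = [random_dna(SPACER_LEN, rng) for _ in range(4)]  # 5 repeats -> 4 gaps
locus = build_array(repeat, spacers)
recovered = split_array(locus, repeat)

print(len(locus), "bases in the toy locus")
print(recovered == spacers)
```

Real CRISPR finders cannot assume the repeat is known in advance, so they search for it first; this sketch skips that step.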

Jansen’s team noticed something else about CRISPR sequences: They were always accompanied by a collection of genes nearby. They called these genes Cas genes, for CRISPR-associated genes. The genes encoded enzymes that could cut DNA, but no one could say why they did so, or why they always sat next to the CRISPR sequence. Three years later, three teams of scientists independently noticed something odd about CRISPR spacers. They looked a lot like the DNA of viruses. “And then the whole thing clicked,” said Eugene Koonin. At the time, Koonin, an evolutionary biologist at the National Center for Biotechnology Information in Bethesda, Md., had been puzzling over CRISPR and Cas genes for a few years. As soon as he learned of the discovery of bits of virus DNA in CRISPR spacers, he realized that microbes were using CRISPR as a weapon against viruses.

Koonin knew that microbes are not passive victims of virus attacks. They have several lines of defense. Koonin thought that CRISPR and Cas enzymes provide one more. In Koonin’s hypothesis, bacteria use Cas enzymes to grab fragments of viral DNA. They then insert the virus fragments into their own CRISPR sequences. Later, when another virus comes along, the bacteria can use the CRISPR sequence as a cheat sheet to recognize the invader.
Scientists didn’t know enough about the function of CRISPR and Cas enzymes for Koonin to make a detailed hypothesis. But his thinking was provocative enough for a microbiologist named Rodolphe Barrangou to test it. To Barrangou, Koonin’s idea was not just fascinating, but potentially a huge deal for his employer at the time, the yogurt maker Danisco. Danisco depended on bacteria to convert milk into yogurt, and sometimes entire cultures would be lost to outbreaks of bacteria-killing viruses. Now Koonin was suggesting that bacteria could use CRISPR as a weapon against these enemies.

To test Koonin’s hypothesis, Barrangou and his colleagues infected the milk-fermenting microbe Streptococcus thermophilus with two strains of viruses. The viruses killed many of the bacteria, but some survived. When those resistant bacteria multiplied, their descendants turned out to be resistant too. Some genetic change had occurred. Barrangou and his colleagues found that the bacteria had stuffed DNA fragments from the two viruses into their spacers. When the scientists chopped out the new spacers, the bacteria lost their resistance. Barrangou, now an associate professor at North Carolina State University, said that this discovery led many manufacturers to select for customized CRISPR sequences in their cultures, so that the bacteria could withstand virus outbreaks. “If you’ve eaten yogurt or cheese, chances are you’ve eaten CRISPR-ized cells,” he said.

In 2007, Blake Wiedenheft joined Doudna’s lab as a postdoctoral researcher, eager to study the structure of Cas enzymes to understand how they worked. Doudna agreed to the plan — not because she thought CRISPR had any practical value, but just because she thought the chemistry might be cool. “You’re not trying to get to a particular goal, except understanding,” she said. As Wiedenheft, Doudna and their colleagues figured out the structure of Cas enzymes, they began to see how the molecules worked together as a system. When a virus invades a microbe, the host cell grabs a little of the virus’s genetic material, cuts open its own DNA, and inserts the piece of virus DNA into a spacer. As the CRISPR region fills with virus DNA, it becomes a molecular most-wanted gallery, representing the enemies the microbe has encountered. The microbe can then use this viral DNA to turn Cas enzymes into precision-guided weapons. The microbe copies the genetic material in each spacer into an RNA molecule. Cas enzymes then take up one of the RNA molecules and cradle it. Together, the viral RNA and the Cas enzymes drift through the cell. If they encounter genetic material from a virus that matches the CRISPR RNA, the RNA latches on tightly. The Cas enzymes then chop the DNA in two, preventing the virus from replicating.
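The "most-wanted gallery" idea in the paragraph above can be sketched as a simple lookup. This is a heavily simplified toy, not how Cas enzymes actually work: it matches plain DNA strings instead of RNA guides, the sequences are invented, the function names are hypothetical, and the cut position (the middle of the matched site) is an arbitrary choice for illustration.

```python
def acquire_spacer(invader, start, length):
    """Copy a fragment of the invader's DNA to keep as a new spacer."""
    return invader[start:start + length]

def cleave_if_known(invader, spacers):
    """Scan the invader for any stored spacer and cut it if one matches.

    Returns two fragments when a spacer matches, or the intact genome
    as a single piece when nothing in the memory matches.
    """
    for spacer in spacers:
        pos = invader.find(spacer)
        if pos != -1:
            # ASSUMPTION: cut in the middle of the matched site (illustrative only)
            cut = pos + len(spacer) // 2
            return [invader[:cut], invader[cut:]]
    return [invader]

# Invented sequences standing in for two different viruses
phage_a = "ATGCCGTTAGACCTGAATCGGCTAAGTCCGATTACG"
phage_b = "TTGGCACAGTCATGGACCTTAGCAGGTACCATGAAC"

# A microbe that survived phage A keeps a piece of it as a spacer
memory = [acquire_spacer(phage_a, 8, 12)]

second_attack = cleave_if_known(phage_a, memory)   # recognized, chopped in two
unknown_attack = cleave_if_known(phage_b, memory)  # not in memory, left intact

print(len(second_attack), len(unknown_attack))
```

The point of the sketch is only the logic: what was stored after the first infection decides what gets cut during the next one.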

CRISPR, microbiologists realized, is also an adaptive immune system. It lets microbes learn the signatures of new viruses and remember them. And while we need a complex network of different cell types and signals to learn to recognize pathogens, a single-celled microbe has all the equipment necessary to learn the same lesson on its own. But how did microbes develop these abilities? Ever since microbiologists began discovering CRISPR-Cas systems in different species, Koonin and his colleagues have been reconstructing the systems’ evolution. CRISPR-Cas systems use a huge number of different enzymes, but all of them have one enzyme in common, called Cas1. The job of this universal enzyme is to grab incoming virus DNA and insert it in CRISPR spacers. Recently, Koonin and his colleagues discovered what may be the origin of Cas1 enzymes.

Along with their own genes, microbes carry stretches of DNA called mobile elements that act like parasites. The mobile elements contain genes for enzymes that exist solely to make new copies of their own DNA, cut open their host’s genome, and insert the new copy. Sometimes mobile elements can jump from one host to another, either by hitching a ride with a virus or by other means, and spread through their new host’s genome.

Koonin and his colleagues discovered that one group of mobile elements, called casposons, makes enzymes that are pretty much identical to Cas1. In a new paper in Nature Reviews Genetics, Koonin and Mart Krupovic of the Pasteur Institute in Paris argue that the CRISPR-Cas system got its start when mutations transformed casposons from enemies into friends. Their DNA-cutting enzymes became domesticated, taking on a new function: to store captured virus DNA as part of an immune defense. While CRISPR may have had a single origin, it has blossomed into a tremendous diversity of molecules. Koonin is convinced that viruses are responsible for this. Once they faced CRISPR’s powerful, precise defense, the viruses evolved evasions. Their genes changed sequence so that CRISPR couldn’t latch onto them easily. And the viruses also evolved molecules that could block the Cas enzymes. The microbes responded by evolving in their turn. They acquired new strategies for using CRISPR that the viruses couldn’t fight. Over many thousands of years, in other words, evolution behaved like a natural laboratory, coming up with new recipes for altering DNA. 45

K. Severinov (2020): CRISPR-Cas are diverse (two classes, six types) prokaryotic adaptive immunity systems that protect cells from phages and other mobile genetic elements (MGEs). They consist of CRISPR arrays and CRISPR-associated cas genes. The total number of spacers in an array varies from one to several hundred. 46

Understanding CRISPR-Cas9
Imagine a company that has to install a biometric security system at its headquarters. The word biometrics comes from the Greek “bios” (life) and “metrikos” (measure); such a system analyzes people's biological characteristics to verify or establish their identity. To admit the employees who are permitted to enter the building and keep out those who are not, the system must first collect data and store it in a memory bank. Whenever someone arrives at the building, they pass through the security check, and the data they provide is compared with the data in the memory bank. If there is a match, the person is allowed in; if not, entry is refused.

Analogously, cells do almost the same, with a few differences. They have an ingenious security system based on enemy recognition: from each encounter they build a sophisticated data bank, which is then used to recognize future invaders and annihilate them. One difference is that the cell's data bank lists enemies rather than employees, so a match means destruction, not admission.

Figure: A roadmap of CRISPR-Cas adaptation and defense.
In the example illustrated, a bacterial cell is infected by a bacteriophage. The first stage of CRISPR-Cas defense is CRISPR adaptation. This involves the incorporation of small fragments of DNA from the invader into the host CRISPR array. This forms a genetic “memory” of the infection. The memories are stored as spacers (colored squares) between repeat sequences (R), and new spacers are added at the leader-proximal (L) end of the array. The Cas1 and Cas2 proteins, encoded within the cas gene operon, form a Cas1-Cas2 complex (blue)—the “workhorse” of CRISPR adaptation. In this example, the Cas1-Cas2 complex catalyzes the addition of a spacer from the phage genome (purple) into the CRISPR array. The second stage of CRISPR-Cas defense involves transcription of the CRISPR array and subsequent processing of the precursor transcript to generate CRISPR RNAs (crRNAs). Each crRNA contains a single spacer unit that is typically flanked by parts of the adjoining repeat sequences (gray). Individual crRNAs assemble with Cas effector proteins (light green) to form crRNA-effector complexes. The crRNA-effector complexes catalyze the sequence-specific recognition and destruction of foreign DNA and/or RNA elements. This process is known as interference.
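The two stages in the caption (adaptation, then processing of the array into crRNAs) can be sketched as bookkeeping on a list. This is a toy under stated assumptions: the class name CrisprArray, the 12-base repeat, the example spacers and the flank length of 4 bases are all made up for illustration, and "transcription" is reduced to swapping T for U.

```python
from dataclasses import dataclass, field

@dataclass
class CrisprArray:
    repeat: str
    spacers: list = field(default_factory=list)  # index 0 is the leader-proximal end

    def adapt(self, fragment):
        """Adaptation: add a new spacer at the leader-proximal end (newest first)."""
        self.spacers.insert(0, fragment)

    def transcribe(self):
        """Precursor transcript: repeat, spacer, repeat, ... written as RNA (T -> U)."""
        dna = self.repeat + "".join(s + self.repeat for s in self.spacers)
        return dna.replace("T", "U")

    def process(self, flank=4):
        """Cut the precursor into crRNAs: one spacer flanked by parts of the repeats.

        ASSUMPTION: 'flank' bases of repeat are kept on each side; the real
        amount depends on the CRISPR-Cas type.
        """
        r = self.repeat.replace("T", "U")
        return [r[-flank:] + s.replace("T", "U") + r[:flank] for s in self.spacers]

array = CrisprArray(repeat="GTTTTAGAGCTA")   # invented 12-base repeat
array.adapt("AAACCCGGGTTT")                  # memory of an older infection
array.adapt("CATCATCATCAT")                  # memory of the most recent infection

precursor = array.transcribe()
crrnas = array.process()

print(array.spacers[0])   # the newest spacer sits next to the leader
print(crrnas)
```

Because new spacers always go in at the leader-proximal end, the order of the list doubles as a rough record of which infections came most recently.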




33. Eugene V. Koonin: Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states 04 December 2017
33a. Eugene V. Koonin: Inevitability of Genetic Parasites 2016 Sep 26
33b. Dana K. Howe: Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution 2008
34. Gregory P. Fournier: Ancient horizontal gene transfer and the last common ancestors 22 April 2015
35. Eugene V. Koonin: The LUCA and its complex virome 14 July 2020
36. Aude Bernheim: The pan-immune system of bacteria: antiviral defence as a community resource 06 November 2019
37. Eugene V. Koonin: Evolution of adaptive immunity from transposable elements combined with innate immune systems December 2014
38. Felix Broecker: Evolution of Immune Systems From Viruses and Transposable Elements 29 January 2019
39. Anna Lopatina: Abortive Infection: Bacterial Suicide as an Antiviral Immune Strategy 2020 Sep 29
40. Tina Y. Liu: Chemistry of Class 1 CRISPR-Cas effectors: Binding, editing, and regulation 16 October 2020
41. Luciano Marraffini: (Ph)ighting phages – how bacteria resist their parasites 2020 Feb 13
42. Simon J. Labrie: Bacteriophage resistance mechanisms 2010 Mar 29
43. Giedrius Gasiunas: Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria September 4, 2012
44. Jordana Cepelewicz: Biodiversity Alters Strategies of Bacterial Evolution January 6, 2020
45. Carl Zimmer: Breakthrough DNA Editor Born of Bacteria February 6, 2015
46. K. Severinov: Detection of CRISPR adaptation February 3, 2020



12. Aryn A. Price et al.: Harnessing the Prokaryotic Adaptive Immune System as a Eukaryotic Antiviral Defense 2016 Feb 3
13. Devashish Rath: The CRISPR-Cas immune system: Biology, mechanisms and applications October 2015
14. Dipali G. Sashital: The Cas4-Cas1-Cas2 complex mediates precise prespacer processing during CRISPR adaptation Apr 25, 2019
15. Simon A. Jackson: CRISPR-Cas: Adapting to change 7 Apr 2017
16. M. P. Terns et al.: Three CRISPR-Cas immune effector complexes coexist in Pyrococcus furiosus 2015 Jun; 21
21. Samuel H. Sternberg et al.: Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System 2015 Nov 5

https://reasonandscience.catsboard.com

124Perguntas .... - Page 5 Empty Re: Perguntas .... Tue Sep 13, 2022 6:15 am

Otangelo


Admin

Origin of CRISPR-Cas molecular complexes of prokaryotes



U. Qimron: CRISPR and their associated proteins comprise a significant prokaryotic defense system against viruses and horizontally transferred nucleic acids. 26
P. Cossart (2016): In nature, bacteria need to defend themselves constantly, particularly against bacteriophages (or phages), the viruses that specifically attack bacteria. A phage generally attaches itself to a bacterium, injects its DNA into it, and subverts the bacterium’s mechanisms of replication, transcription, and translation in order to replicate itself. The phage DNA reproduces its own DNA, transcribes it into RNA, and produces phage proteins that accumulate to generate new phages and eventually cause the bacterial cell to explode (or lyse), releasing hundreds of new bacteriophages. Phages continually infect bacteria everywhere—in soil, in water, and even in our own intestinal microbiota. Bacteriophage families are numerous and vary widely in their form, size, composition, and the bacteria they target. To begin their attack, bacteriophages need a site of attachment, a particular component on the surface of a bacterium. This site of attachment is specific for each virus and the bacteria that it can infect. Bacteria have an immune system called CRISPR. CRISPR regions in the chromosomes allow bacteria to recognize predators, particularly previously encountered phages, and to destroy them. CRISPR regions protect and essentially “vaccinate” bacteria against bacteriophages. In fact, it has been shown that bacteria can be artificially vaccinated! When a population of bacteria is inoculated with a phage, a small number survive and are able to integrate a fragment of the phage DNA into their genome, in the region called the CRISPR locus. This allows the bacteria, if the phage ever attacks again, to recognize the phage DNA and degrade it. This ingenious phenomenon, known as interference, occurs due to the structure of the CRISPR region and to cas genes (CRISPR-associated genes) located near this region.
The CRISPR locus is a region of the chromosome composed of repeated sequences of around 50 nucleotides, interspersed with sequences known as spacers that are similar to those of bacteriophages. Some bacteria have several CRISPR loci with different sequence repetitions. Around 40% of bacteria have one or more CRISPRs, whereas others have none. CRISPR loci can be quite long, sometimes with more than 100 repetitions and spacers. CRISPRs have two functions: acquisition and interference. Acquisition, also called adaptation, is the process of acquiring fragments of DNA from a phage, and interference is the immunization process by Cas proteins encoded by cas genes (Fig. below). 

Figure: CRISPR acquisition and interference.


125Perguntas .... - Page 5 Empty Re: Perguntas .... Tue Sep 27, 2022 9:19 pm

Otangelo


Admin

The amazing design of the bacteriophage and its DNA packaging motor

https://reasonandscience.catsboard.com/t2134-the-amazing-design-of-bacteriophage-viruses-and-its-dna-packaging-motor

Introduction

F. Arisaka (2005): Bacteriophage is an elaborate molecular machine that carries its genomic DNA and efficiently injects it into bacteria. It has a complicated assembly mechanism, in which scaffolding proteins and, in some cases, cleavage of polypeptide bonds are involved. T4 phage belongs to the Caudovirales, which designates a group of phages that have a tail. More than 95% of phages have tails, and possession of the tail is unique to bacteriophages. Bacteria, as single-cell organisms, have a much more strongly constructed membrane structure than eukaryotic cells. For example, E. coli, a gram-negative bacterium, has triple-layered cell membranes, namely the outer membrane, the peptidoglycan layer, and the inner membrane. Phages have a structure as complex as the tail in order to breach the tough barrier of the host cells. 24


Vergote (2018): The virus bacteriophage T4 resembles the Lunar Lander that was used in the 70’s by the Apollo space program. It has a landing system, duplicates of one protein in the head and a tail used to pass that DNA to infect bacteria. If you are looking for the best design, nature is the perfect place to start. 20 
Is it a coincidence? Bacteriophage T4 consists of a capsid shell (the head, where it stores and protects its genome) and a syringe-like tail used to insert the DNA into a host. The tail terminates in a multiprotein baseplate that changes its conformation from a “high-energy” dome shape to a “low-energy” star shape during infection. The phage also has an ultrafast DNA packaging motor that translocates, or packs, long stretches of the virus's genetic material into its capsid shell.

Bacteriophages are molecular machines created for one reason: to kill bacteria, and so to keep bacterial populations under control.
Eric S Miller (2003): T4 bacteriophages constitute a beautifully integrated system of biological machines and networks 13

Eric S Miller (2010): Phage T4 is one of the most extensively investigated viruses and has been the central focus of several monographs and reviews over the last 25 years. The T4 biological system is amenable to investigation by genetic, phylogenetic, biochemical, biophysical, structural, computational, and other tools. 12

A. Roberts (2015): Phage help maintain microbial diversity and balance within Earth’s biosphere. Phage are thought to turn over 20–50 percent of the biomass in Earth’s oceans daily! In the absence of these microbial predators it is hard to imagine how our planet would ever sustain life beyond mere microbes. The planet would be covered with microbial competition specialists, sequestering all of Earth’s resources necessary for advanced life. If not for bacterial predation via phage, bacteria would certainly dominate life to the exclusion of advanced organisms. 2

Vincent R. Racaniello (2015): There are more than 10^30 bacteriophage particles in the world’s oceans, enough to extend out into space for 200 million light-years if arranged head to tail. The average human body contains approximately 10^13 cells, but these are outnumbered 10-fold by bacteria and as much as 100-fold by virus particles. 14

Nicola Twilley (2015): There are an estimated 10^31—ten million trillion trillion—phages on Earth, more than every other organism, including bacteria, put together. According to researchers in Vancouver, these tiny viruses cause a collective trillion trillion successful infections per second, in the process destroying up to forty percent of all bacterial cells in the ocean every single day. Following their deaths at the hands of phages, those carbon-containing microorganisms sink down into the marine sediment, effectively removing greenhouse gases from circulation. Anything that bacteria do, from breaking down the carcasses of dead animals to converting atmospheric nitrogen into plant food, is at the mercy of the phages that infect, kill, or otherwise transform them. Phages are the puppet masters; they insure that essential biochemical processes run smoothly. 11
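The big numbers quoted above are easy to sanity-check with a few lines of arithmetic. This is a rough sketch only: the particle length of 200 nm (head plus tail) is my own assumption, not a figure given by either source, and the result scales directly with whatever length is assumed.

```python
LIGHT_YEAR_M = 9.4607e15   # metres in one light-year
PHAGE_LENGTH_M = 200e-9    # ASSUMPTION: about 200 nm per particle, head plus tail

def chain_length_ly(particles, particle_length_m=PHAGE_LENGTH_M):
    """Length in light-years of a head-to-tail chain of phage particles."""
    return particles * particle_length_m / LIGHT_YEAR_M

# "ten million trillion trillion" written out as a power of ten
named_count = 10**7 * 10**12 * 10**12

chain_for_1e30 = chain_length_ly(1e30)
chain_for_1e31 = chain_length_ly(1e31)

print(named_count == 10**31)
print(f"{chain_for_1e30:.2e} light-years for 10^30 particles")
print(f"{chain_for_1e31:.2e} light-years for 10^31 particles")
```

Under this assumed length, a chain of 10^31 particles comes out at roughly 200 million light-years and a chain of 10^30 at roughly 20 million, so the quoted distance depends heavily on which count and which particle size are used.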

J. Sarfati (2008): Viruses are particles so tiny that they can’t be seen by an ordinary light microscope, but only under an electron microscope. Viruses come in many different sizes, shapes and designs, and they operate in diverse ways. They are composed of DNA (or RNA in the case of RNA viruses, including retroviruses) and protein. They are not living organisms because they cannot carry out the necessary internal metabolism to sustain life, nor can they reproduce themselves. They are biologically inert until they enter into host cells. Then they start to propagate using host cellular resources. The infected cell produces multiple copies of the virus, then often bursts to release the new viruses so the cycle can repeat. One of the most common types is the bacteriophage (or simply ‘phage’) which infects bacteria. It consists of an infectious tailpiece made of protein, and a head capsule (capsid) made of protein and containing DNA packaged at such high pressure that when released, the pressure forces the DNA into the infected host cell. How does the virus manage to assemble this long information molecule at high pressure inside such a small package, especially when the negatively charged phosphate groups repel each other? It has a special packaging motor, more powerful than any molecular motor yet discovered, even those in muscles. The genome is about 1,000 times longer than the diameter of the virus. It is the equivalent of reeling in and packing 100 yards of fishing line into a coffee cup, but the virus is able to package its DNA in under five minutes. This motor exerts a force twice as powerful as a car engine. So the motor, a terminase enzyme complex, ‘can capture and begin packaging a target DNA molecule within a few seconds.’ Such a powerful motor must use a lot of energy, and in one second, this one goes through over 300 units of life’s energy currency ATP.
The virus has a complementary motor-enzyme, ATPase, built into its packaging engine, to release the energy of the ATP. And not only is the packing motor powerful, it can change its speed as if it had gears. The researchers say that this is important, because the DNA fed to it from the cell is likely not a straightforward untangled thread. Just as it is good for a car to have brakes and gears, rather than only being able to go 60 miles per hour, the DNA-packaging motor may need to slow down, or stop and wait if it encounters an obstruction. It may permit DNA repair, transcription or recombination—the swapping of bits of DNA to enhance genetic diversity—to take place before the genetic material is packaged within the viral capsid. 4

Joseph W. Francis (2003): Microbes and viruses perform essential roles in all ecosystems of the biosphere. Microbes and viruses perform many beneficial activities in ecosystems and in symbiotic partnerships with all biological organisms. I propose that microbes were created as an organosubstrate; a link between macro-organisms and a chemically rich but inert physical environment, to provide a substrate upon which multicellular creatures can thrive and persist in intricately designed ecosystems. Viewed in this context microbes and viruses could also be thought of as a single, complex, massive, multicellular, multitaxon organism with incredible and powerful life supporting properties. Many microbes live on and within living organisms. It is estimated that the number of microbes living on the human body far exceeds the 70 trillion human cells that comprise it. The discipline of microbial ecology is increasingly revealing that microbial and viral symbionts play vitally important roles within organisms and ecosystems. In fact, axenic (germ-free life) probably does not exist in nature; all animal species with the exception of prenatal life are thought to live with microbial symbionts. A tremendous number of symbiotic relationships are being discovered. Many of these relationships involve complex lifestyles and anatomies that appear to be designed to foster the symbiotic lifestyle. A general survey of symbiotic relationships also shows that the most common functions provided by symbionts involve nutritional support, protection and reproduction/population control.

Structure of the bacteriophage T4
Head: Elongated and hexagonal in outline, with a prismoid structure. Its protein shell is called the capsid.
Capsid: Built from protein subunits called capsomeres, around 2000 of them.
Genetic material: Phages in general carry DNA or RNA, linear or circular. T4 carries double-stranded DNA (172 kbp according to Leiman, quoted below), tightly packed inside the head.
Neck: Also called the collar. It connects the head and the tail and has a circular, plate-like structure.
Tail: A hollow tube surrounded by a protein sheath.
Sheath: Composed of around 144 protein subunits arranged in 24 rings. It is highly contractile.
Base plate: Hexagonal in shape, located at the distal end of the tail.
Tail fibers: Long, thread-like filaments attached to the base plate. They confer host specificity. There are generally six of them, each about 130 x 2 nm.
Spikes: Also called tail pins. They recognize the receptor sites on the host cell.

Hari Charan (2020): This overall structure is necessary for the way phages deliver their payload of genetic material into bacteria. Once the phage is on the surface of a bacterium, the tube portion contracts and the phage acts like a microscopic hypodermic needle, literally injecting the genetic material into the bacterium.

P. G. Leiman (2003): Bacteriophage T4 is one of the most complex viruses. More than 40 different proteins form the mature virion, which consists of a protein shell encapsidating a 172-kbp double-stranded genomic DNA, a ‘tail,’ and fibers, attached to the distal end of the tail. The fibers and the tail carry the host cell recognition sensors and are required for attachment of the phage to the cell surface. The tail also serves as a channel for delivery of the phage DNA from the head into the host cell cytoplasm. The tail is attached to the unique ‘portal’ vertex of the head through which the phage DNA is packaged during head assembly. Bacteriophage T4 is a double-stranded DNA (dsDNA) tailed virus that infects E. coli. The T4 is one of the most complex viruses, with a genome that contains 274 open reading frames out of which more than 40 encode structural proteins. The mature virus, or ‘virion,’ consists of a prolate head with hemiicosahedral ends encapsidating the genomic DNA; a cocylindrical contractile tail, terminated with a baseplate; and six fibers attached to the baseplate. The head, tail, and fibers assemble via independent ordered pathways and join together to form a mature virus particle. Unlike animal viruses, infection of host cells by tailed bacteriophages is highly efficient – only one bacteriophage T4 particle is required, in general, to infect a host cell. Upon infection, the phage shuts down host-specific nucleic acid and protein syntheses, thus ensuring production of only its own components in amounts sufficient to assemble up to 200 progeny virus particles per infected cell. The efficiency of the infection process and the large genome of bacteriophage T4, in which only half of the genes are necessary for proliferation on E. coli, contribute to the diversity of the phages from the T4-like family, a subgroup of Myoviridae. These phages propagate on a wide range of bacterial hosts that grow in diverse environments.
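To get a feel for the numbers in Leiman's description, here is a small back-of-the-envelope calculation. The only value not taken from the quote is the rise of 0.34 nm per base pair, which is the usual textbook figure for B-form DNA and is used here as an assumption.

```python
GENOME_BP = 172_000        # 172-kbp genome, from the quote above
PROGENY_PER_CELL = 200     # "up to 200 progeny virus particles per infected cell"
RISE_NM_PER_BP = 0.34      # ASSUMPTION: textbook rise per base pair for B-form DNA

# Stretched-out (contour) length of one T4 genome
contour_nm = GENOME_BP * RISE_NM_PER_BP
contour_um = contour_nm / 1000

# Total phage DNA that one infected cell must produce for a full burst
total_bp = GENOME_BP * PROGENY_PER_CELL

print(f"one genome is about {contour_um:.1f} micrometres long when stretched out")
print(f"a full burst needs about {total_bp / 1e6:.1f} million base pairs of phage DNA")
```

So a single genome, laid out straight, would be tens of micrometres long, which gives some idea of how tightly the DNA has to be packed into the head.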

Bacteriophage T4 head structure
 
The head of bacteriophage T4 is composed of more than 3000 polypeptide chains of at least 12 kinds of protein. The shell has icosahedral ends and a cylindrical equatorial midsection with a unique portal vertex where the phage tail is attached. 9

Moh Lan Yap (2015): The head is first assembled as an empty capsid that is subsequently packaged with DNA by an ATP-dependent packaging machine. This machine binds to the same special pentameric vertex that is later occupied by the phage tail.  The capsid is composed of 930 post-translationally modified monomers, or 155 hexamers of the major protein, gene product 23 (gp23*, where the * signifies post-translational cleavage). The presence of proteins, homologous to the major capsid protein, which form pentamers as opposed to hexamers is a frequent solution to the formation of the pentameric vertices in icosahedral viruses. The portal protein has multiple roles. It initiates head assembly, genome packaging and serves as the genome gatekeeper to prevent leakage of the packaged DNA.  The rod-shaped Soc binds between two gp23* hexamers, thus forming a continuous mesh surrounding the hexameric gp23* on the capsid. Soc maintains the stability of the head under extreme environments. Hoc is an elongated molecule protruding from the center of gp23* hexamers. Its Ig-like domains, exposed on the outer surface of the head, may provide survival advantages to the phage. 21

The Molecular Architecture of the Bacteriophage T4 Neck

Andrei Fokine et al. (2013): The T4 head and tail are assembled via independent pathways. Assembly of the T4 head is a complex process that includes a number of intermediate stages. The head assembly is initiated by the dodecamer (a complex of 12 protein subunits) of the portal protein, gp20 (gp, gene product). First, a head precursor, called the prohead, is assembled, which is subsequently processed by a scaffold-associated protease. Then the phage genomic DNA is packaged into the capsid through the portal vertex by an ATP-driven motor composed of five gp17 molecules. Upon completion of the DNA packaging, the head assembly is finalized by attachment of several copies of the gp13 and gp14 proteins to the portal vertex. Monomers of gp13 and gp14 have a size of 309 and 256 amino acid (aa) residues, respectively. The gp13–gp14 complex seals the portal vertex and creates a site for attachment of the independently assembled tail. Mutant phages lacking these proteins produce heads that are unable to bind tails and lose their DNA.

The T4 tail assembly begins with the baseplate formation and proceeds with polymerization of the tail tube and the contractile sheath. The tail tube is formed by gp19 molecules (163 aa residues). The length of the tube is controlled by a mechanism involving the “tape-measure protein”, gp29. The elongation of the tail tube is terminated by attachment of the hexamer of the 175-residue tail tube terminator protein, gp3, which binds to the last row of gp19 subunits (probably also to gp29) and stabilizes the tail tube. The T4 tail tube is used as a scaffold for the polymerization of the contractile sheath. The gp18 sheath molecules (659 aa residues) assemble around the tube in the form of a six-start helix. The T4 tail assembly is completed by the hexamer of the tail terminator protein, gp15 (the monomer is 272 aa residues long), which binds to the top of the tail. Contraction of the tail during infection is associated with a substantial rearrangement of the gp18 subunits and results in shortening of the sheath to less than one-half of its original length. 22

The tail structure

Phys.Org (2016): To infect bacteria, most bacteriophages employ a 'tail' that stabs and pierces the bacterium's membrane to allow the virus's genetic material to pass through. The most sophisticated tails consist of a contractile sheath surrounding a tube akin to a stretched coil spring at the nanoscale. When the virus attaches to the bacterial surface, the sheath contracts and drives the tube through it. All this is controlled by a million-atom baseplate structure at the end of the tail. Phages are widely distributed on the planet. They accompany bacteria everywhere - in the soil, water, hot springs, algal bloom, animal intestines etc - and have a dramatic impact on the diversity of bacterial populations, including for example, the microbiome of the human gut. 18

The tail, fibers, and infection process  

Phages from the Myoviridae family have exceptionally complex, contractile tails. Bacteriophage T4 devotes 25 kbp of its genome to tail assembly, which is comparable with the size of the entire adenovirus genome (36 kbp). Products of at least 22 genes are involved in tail assembly, which include a phage-encoded chaperone that participates in folding of the long and short tail fibers. The bacteriophage T4 tail is composed of two concentric protein cylinders, at one end of which is the baseplate and fibers. The inner cylinder, called the tail tube, is built of 144 copies of gp19. The tail tube has a 40 Å-diameter channel for DNA passage from the head to the infected cell. The outer cylinder, called the tail sheath, tightly envelops the 90 Å-diameter tail tube and has a width of about 210 Å. It is composed of 144 copies of gp18. The subunits comprising each cylinder form a six-start helix with a pitch of 41 Å and a right-handed twist angle of 17°. The helix has a length of 984 Å and contains 24 repeats. During infection, the phage recognizes an E. coli bacterium using its long tail fibers (LTFs) connected to the baseplate. The phage then anchors the baseplate to the lipopolysaccharide cell surface receptors using the short tail fibers (STFs), which are initially assembled under the baseplate. This event triggers a hexagon-to-star conformational change in the baseplate and causes an irreversible contraction of the tail sheath, releasing about 25 kcal/mol of energy per gp18 monomer. During this process, the gp18 hexamers flatten, rotate, and expand radially, resulting in a decrease of their thickness by 26 Å and an increase of the twist angle by 15°. The contracted tail sheath has a length of only 360 Å and a width of 270 Å. The tail tube does not change its length during sheath contraction. As a result, almost half of the tube protrudes out of the contracted tail sheath and the baseplate.
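
The figures quoted in this passage hang together arithmetically, and a few lines of Python make that explicit. This is only a sketch that restates the numbers given above (pitch, repeats, thickness decrease, twist angles, energy per gp18 monomer). The names are my own, the totals are plain products rather than measured values, and treating the quoted pitch as the axial rise per ring is my reading of the text.

```python
# Numbers quoted in the passage above (lengths in angstroms, angles in degrees).
PITCH = 41                # quoted pitch, read here as axial rise per ring
REPEATS = 24              # rings (repeats) along the extended sheath
SUBUNITS_PER_RING = 6     # six-start helix: one gp18 hexamer per ring
THICKNESS_DECREASE = 26   # flattening of each hexamer on contraction
TWIST_EXTENDED = 17       # twist angle before contraction
TWIST_INCREASE = 15       # additional twist after contraction
ENERGY_PER_GP18 = 25      # kcal/mol released per gp18 monomer


def sheath_summary():
    """Derive the sheath totals implied by the quoted per-ring values."""
    subunits = SUBUNITS_PER_RING * REPEATS
    extended = PITCH * REPEATS
    contracted = (PITCH - THICKNESS_DECREASE) * REPEATS
    return {
        "gp18_copies": subunits,
        "extended_length": extended,
        "contracted_length": contracted,
        "contracted_twist": TWIST_EXTENDED + TWIST_INCREASE,
        "length_ratio": contracted / extended,
        "total_energy": ENERGY_PER_GP18 * subunits,
    }


if __name__ == "__main__":
    for key, value in sheath_summary().items():
        print(key, value)
```

The per-ring values reproduce the quoted totals: 144 copies of gp18, 984 Å extended, and 360 Å contracted, which is well under half the original length.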

The sheath can be caused to contract by exposing the phage to 3 M urea. Nevertheless, the DNA is not released until the tail tube tip binds to a cytoplasmic membrane receptor common to enteric bacteria, suggesting that tail contraction does not cause the release of DNA. The interaction of the tail tube tip with the cytoplasmic membrane involves creation of a channel for DNA passage. During DNA transfer from the capsid into the cell, the membrane remains virtually undamaged since the transfer requires a proton motive force across the membrane. The assembly pathway of the bacteriophage T4 tail is regulated by ordered sequential interactions of proteins rather than sequential gene expression. The baseplate, a remarkably complex multiprotein structure, is assembled first. It is composed of about 150 subunits of at least 16 different gene products, many of which are oligomeric (table 3). These proteins form six independently assembled wedges that join together around the central hub with the help of the trimeric proteins gp9 and gp12. Each wedge is assembled by sequential interactions of the seven protein oligomers: gp11, gp10, gp7, gp8, gp6, gp53, and gp25. The baseplate hub is formed by gp5, gp27, gp29 and, probably, gp28. The assembly of the baseplate is completed with the attachment of six copies of gp48 and six copies of gp54 to the external interface between the wedges and the hub. The latter proteins serve as a starting point for polymerization of gp19 to form the tail tube, which is terminated with gp3. The tail tube serves as a scaffold for polymerization of the tail sheath around it. During this process, gp18 stores energy in its conformation (possibly by ATP hydrolysis), making the non-contracted T4 tail a stretched spring. The length of the tail tube is controlled by the ruler protein, gp29, which also participates in assembly of the central part of the baseplate. The length of the tail sheath is determined by the length of the tube.
The assembly of the tail is completed by attachment of a gp15 hexamer to the last ring of the tail sheath. The baseplate is a dome-shaped object.  The hollow tail tube stems from the center of the baseplate. 15
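
The idea of a pathway controlled by "ordered sequential interactions of proteins" can be pictured with a toy model. The sketch below uses the wedge order listed above and assumes, as a simplification of my own, that a missing protein blocks every later step. The function names are hypothetical and this is not a simulation of real binding kinetics.

```python
# Wedge assembly order as listed in the passage above.
WEDGE_ORDER = ["gp11", "gp10", "gp7", "gp8", "gp6", "gp53", "gp25"]


def assemble_wedge(available):
    """Add proteins in the fixed order; stop at the first one that is missing.

    ASSUMPTION: a missing protein halts all later steps (toy simplification).
    """
    assembled = []
    for protein in WEDGE_ORDER:
        if protein not in available:
            break
        assembled.append(protein)
    return assembled


def wedge_complete(available):
    """True only if every protein in the ordered pathway could be added."""
    return assemble_wedge(available) == WEDGE_ORDER


if __name__ == "__main__":
    everything = set(WEDGE_ORDER)
    print(assemble_wedge(everything))
    # Remove one protein and the pathway stalls at that step.
    print(assemble_wedge(everything - {"gp8"}))
```

In this model, taking gp8 out of the pool leaves an intermediate of gp11, gp10, and gp7 only, even though gp6, gp53, and gp25 are all present.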

V. V. Mesyanzhinov (2004): Products of at least 22 genes are involved in assembly of the T4 phage tail (Table above) that uses the energy of the sheath contraction for DNA ejection into the host cell. The assembly pathway of the tail is based on strictly ordered sequential interactions of proteins. The baseplate is a remarkably complex multiprotein structure of the tail that serves as a control unit of virus infection. The baseplate is composed of ~150 subunits of at least 16 different gene products, many of which are oligomeric, and assembled from six identical wedges that surround a central hub. The T4 gp11 (the short tail fiber connecting protein), gp10, gp7, gp8, gp6, gp53, and gp25 combine sequentially to build up a wedge. The central hub is formed by gp5, gp27, and gp29 and probably gp26 and gp28. Assembly of the baseplate is completed by attaching gp9 and gp12 forming the short tail fibers, and also gp48 and gp54 that are required to initiate polymerization of the tail tube, a channel for DNA ejection that is constructed of 138 copies of gp19. The length of the tail tube is probably determined by the “ruler protein” or template, gp29. The tail tube serves as a template for assembly of 138 copies of gp18 that form the contractile tail sheath. In the absence of the tail tube, gp18 assembles into long polysheaths with a structure similar in several aspects to the contracted state. Both the tail tube and the tail sheath have helical symmetry. Assembled tail sheath represents a metastable supramolecular structure, and sheath contraction is an irreversible process. During contraction the length of tail sheath decreases from 980 to 360 Å and its outer diameter increases from 210 to 270 Å. The assembly of the tail is completed by a gp15 hexamer that binds to the last gp18 ring of the tail sheath. The assembled tail associates with the head after DNA packaging.
Then six gpwac (fibritin) molecules attach to the neck of the virion forming a ring embracing it (“collar”) and thin filaments protruding from the collar (“whiskers”) that help with attachment of the phage particle to other fibrous proteins, the long tail fibers. 23

The cell-puncturing device of bacteriophage T4

ScienceDaily (2016): EPFL scientists have now shown, in atomic detail, how the baseplate coordinates the virus's attachment to a bacterium with the contraction of the tail's sheath.

ScienceDaily (2002): The viral machine works as follows: The virus uses its long-tail fibers to recognize its host and to send a signal back to the baseplate. Once the signal is received, the short-tail fibers help anchor the baseplate into the cell surface receptors. As the virus sinks down onto the surface, the baseplate undergoes a change — shifting from a hexagon to a star-shaped structure. At this time, the whole tail structure shrinks and widens, bringing the internal pin-like tube in contact with the outer membrane of the E. coli cell. As the tail tube punctures the outer and inner membranes of the E. coli cell, the virus' DNA is injected through the tail tube into the host cell. 17

P. G. Leiman (2003): The entire baseplate-tail-tube complex consists of one million atoms, making up 145 chains of 15 different proteins. The scientists were also able to identify a minimal set of molecular components in the baseplate that work together like miniature gears to control the activity of the virus's tail. These components, and the underlying functional mechanism, are the same across many viruses and even bacteria that use similar tail-like structures to inject toxins into neighboring cells. 15

One of the most remarkable features of the baseplate is the spike, or needle, along the axis of the dome. The crystal structure of the gp5-gp27 complex (fig. below) can be fitted into the baseplate map so that the needle density is occupied by the C-terminal domain of gp5. The gp27 trimer forms a channel suitable for the passage of a dsDNA and serves as an extension of the tail tube. Gp5 consists of three domains: an N-terminal oligosaccharide binding-fold domain, the middle lysozyme domain, and the C-terminal triple β-helix domain (fig. above). The gp5 lysozyme domain has 43% sequence identity and a closely similar structure to the T4 lysozyme encoded by gene e (T4L).

Structure of the T4 baseplate

ScienceDaily (2002): The baseplate is the "nerve center" of the virus. When the long and short fibers attach to E. coli, the baseplate transmits this message to the tail, which contracts like a muscle. The baseplate controls both the needlepoint of the tail and the cutting enzyme that make a tiny, nanometer-sized hole through the cell wall of the E. coli. The viral DNA is then squeezed through the tail into the host. The E. coli, thus infected, starts to make only new phage particles and ultimately dies. 16

M. I. Taylor (2016): Bacteriophages (viruses of bacteria) use a specialized organelle called a tail to deliver their genetic material and proteins across the cell envelope during infection. In phages with the most complex contractile tails, attachment to the host cell is accompanied by a substantial transformation of the tail structure: the external tail sheath contracts and drives a spike-tipped, rigid tube through the host cell membrane. Other macromolecular complexes, such as the type VI secretion system (T6SS), metamorphosis-associated contractile (MAC) arrays, R-type pyocins, Serratia antifeeding prophage, Photorhabdus virulence cassette and rhapidosomes, use a similar contractile sheath–rigid tube mechanism to breach the bacterial or eukaryotic cell envelope. The most complex part of these ‘contractile injection systems’ is the baseplate, which is responsible for coordinating host recognition or other environmental signals with sheath contraction. The T4 baseplate is currently thought to contain at least 15 different proteins with copy numbers ranging from 1 to 18. Assembly of the T4 baseplate involves two large independent intermediates: a hub and a wedge. Several phage and possibly host cell chaperones mediate the joining of six wedges to the hub, which is then followed by the attachment of receptor-binding fibres to this structure. The nascent baseplate initiates tube assembly and subsequent polymerization of the sheath in the extended high-energy state. The remarkable structure and transformation of the T4 tail and other contractile injection systems have received considerable attention. This is a 440,000-atom (not counting hydrogens) structure. 19

V. V. Mesyanzhinov (2004): Practically every assembled T4 particle is able to infect an E. coli host cell. The baseplate is the control center of the viral infectivity, and understanding of the baseplate structure, a multiprotein machine, is a challenging problem. Below we present the data about baseplate proteins with known atomic structure.

The tail tube & tail sheath terminators

Moh Lan Yap (2015): The polymerized tail tube and sheath are capped by the terminator proteins gp3 and gp15, respectively, to prevent depolymerization before the tail attaches to the head. Both gp3 and gp15 form hexameric rings that interact with the last row of gp19 and gp18 molecules. The central pore and the side surface of gp15 are negatively charged, whereas the top and the bottom surfaces are positively charged. The top and bottom surfaces interact with gp14 and gp3 proteins, respectively. The interaction between gp15 and gp18 is different in the extended and contracted (postinfection) conformations. In the contracted tail, the negatively charged side surface of the gp15 hexamer interacts with positively charged surfaces of the C-terminal domains of the gp18 molecules. These interactions help to maintain the integrity of the tail in its contracted form. The gp15 hexamer may have undergone a conformational change during the infection process, which might be propagated through gp14 and gp13, to the portal assembly to allow the release of the genomic DNA. 21

Long tail fibers collar & whiskers

Moh Lan Yap (2015): The long tail fibers (LTFs) consist of four different proteins, namely gp34, gp35, gp36 and gp37. The chaperone protein gp57A is required for the trimerization of gp34 and gp37, whereas the chaperone protein gp38 is required for proper folding of gp37. The proximal half of the fiber is formed by gp34, which interacts with the adaptor protein gp9 on the baseplate. The monomeric gp35 forms the hinge or knee connecting the proximal and distal parts of the LTFs. The proximal and distal half-fibers assemble independently. Subsequently, the C-terminal part of gp36 binds to the N-terminal region of gp37. The distal part of the fiber is a trimer that can be divided by visual inspection of EM images into 11 domains (D1–D11) (Figure B). Domains D1 and D2 are a part of gp36 and D3–D11 are a part of gp37. The crystal structure of what had been assigned by EM as domains D10 and D11 of gp37 (residues 811–1026) has been determined (Figure C). This structure suggested that a more accurate description of this part of gp37 was in terms of a collar, needle and head domain. The head domain sits at the tip of the distal end and thus should be the T4 component that recognizes the receptor-binding site on the host cell. Since gp37 is known to bind to lipopolysaccharide, and as protein-saccharide interactions usually involve aromatic side chains, a series of Tyr and Trp residues at the tip of gp37 might be important for host recognition. Sequence analysis shows that the collar and needle domains are conserved among other phages, whereas the head domain has diverged, suggesting that the host range specificity is determined by the head domain, consistent with the head domain having the receptor recognition function. 21

V. V. Mesyanzhinov (2004): THE STRUCTURES OF PHAGE FIBERS: Certain viruses, like adeno- and reoviruses, as well as many bacteriophages use fibrous proteins to recognize their host receptors. T4 has three types of fibrous proteins: the long tail fibers, the short tail fibers, and whiskers. The long tail fibers, which are ~1450 Å long and only ~40 Å in diameter, are primary reversible adsorption devices. Each fiber consists of the rigid proximal halves, encoded by gene 34, and the distal ones, encoded by genes 36 and 37. These halves are connected by gp35 that forms a hinge region and interacts with gp34 and gp36. The proteins that form the long tail fibers are homotrimers, except for gp35 that assembles as a monomer. The N-terminus of gp34 forms the baseplate-binding bulge, and the C-terminus of gp37 binds to a cell lipopolysaccharide (LPS) receptor. Two phage-encoded chaperones, gp57A and gp38, are required for assembly of both long tail fiber proximal and distal parts. In the closely related phage T2, gp38 is a structural component of the distal part of the long tail fiber; it binds to the tip of gp37 and is responsible for receptor recognition. Gp57A is also required for assembly of short tail fibers. Another two assembly-assisting proteins, gp63 and gpwac, participate in the attachment of the long tail fibers to the baseplate.

Structure of short tail fibers. 
The short tail fiber is a club-shaped molecule ~340 Å long consisting of a parallel, in-register assembled trimer of gp12 of 527 residues per polypeptide chain. Short tail fibers are attached to the baseplate by the N-terminal thin part, while the globular C-terminus binds to the host cell LPS receptors. The ordered residues, 246-289, revealed a new folding motif, which is composed of intertwined strands. Residues 290-327 form a central right-handed triple β-helix. X-ray crystallography of this fragment at 1.5 Å resolution reveals the structure of the C-terminal part of the molecule that has a novel “knitted” fold, consisting of three extensively intertwined gp12 monomers, and interacts with LPS. The intertwining of the receptor binding domain represents a case of the 3D “domain swapping” phenomenon found in several proteins.

Fibritin structure (Whiskers) 
T4 fibrous protein gpwac (whisker antigen control), or fibritin, forms the collar and whiskers and is attached to the neck formed by gp13 and gp14. Fibritin is a homotrimer assembled in parallel and attached to the T4 neck by the N-terminal domain. The C-terminal domain is a “foldon”, a unit required for fibritin folding and initiation of protein assembly. It was shown that the foldon provides correct alignment of three polypeptide chains [82, 83]. The foldon is a protein unit that forms at the initial steps of folding and often remains intact after it is transferred into other proteins. Fibritin belongs to a specific class of accessory proteins acting in the phage assembly as a bi-complementary template accelerating the connection of the distal parts of long tail fibers to their proximal parts. Being a structural component of the mature phage particle, fibritin also works as a primitive molecular sensor. Under conditions unfavorable for phage growth (low temperature), fibritin holds the long fibers in a fixed position, raised to the tail and capsid, keeping virus particles noninfectious. 23

How DNA got into the bacteriophage

In a bacterium infected by T4, new bacteriophages are assembled in a stepwise process. The shaft builds up. DNA is replicated and the prohead assembles as an empty shell. But how does the DNA get into the prohead? The initiation of DNA import is not entirely clear yet, but once the DNA, the packaging motor, and the prohead interact, the DNA is rapidly threaded through a pore in the circular motor at a speed of about 2000 base pairs a second. Once the head is full, the packaging motor cuts the DNA, and the motor complex falls away. The shaft and long tail fibers are attached to complete the infectious particle. Within one hour, more than 100 new phages are released from a single infected bacterium, which makes the T4 phage one of the most efficient, but also most fascinating, killing machines.
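
To get a feel for what 2000 base pairs a second means, here is a minimal sketch that turns the quoted rate into a packaging time. The genome length is not given in the text above; the roughly 169,000 bp used in the example is a commonly cited figure for T4 that I am bringing in as an assumption. The function name is my own, and a constant rate is a simplification.

```python
PACKAGING_RATE_BP_PER_S = 2000  # rate quoted in the paragraph above


def packaging_time_seconds(genome_bp, rate_bp_per_s=PACKAGING_RATE_BP_PER_S):
    """Time to thread genome_bp base pairs through the motor at a constant rate."""
    if genome_bp < 0 or rate_bp_per_s <= 0:
        raise ValueError("genome_bp must be >= 0 and rate_bp_per_s must be > 0")
    return genome_bp / rate_bp_per_s


if __name__ == "__main__":
    # ASSUMPTION: ~169,000 bp is a commonly cited T4 genome size, not a value
    # taken from the text above.
    seconds = packaging_time_seconds(169_000)
    print(f"{seconds:.1f} s to fill one head")
```

Under these assumptions a head is filled in roughly a minute and a half.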

The argument from the DNA’s molecular motor
1. There is a “very fast and powerful molecular motor” that crams the viral DNA tightly into the capsid with the help of five moving parts.
2. The parts of the motor move in sequence like the pistons in a car's engine, progressively drawing the genetic material into the virus's head, or capsid.
3. The motor is needed to insert DNA into the capsid of the T4 virus, which is called a bacteriophage because it infects bacteria.
4. The T4 molecular motor is the strongest yet discovered in viruses and proportionately twice as powerful as an automotive engine. The motors generate 20 times the force produced by the protein myosin, one of the two proteins responsible for the contraction and strength of muscles.
5. Even viruses, which are not even alive by the scientific definition of being able to reproduce independently, show incredible design.
6. If design is what we observe, then there must be a designer.
7. God, most probably, exists.

1. http://www.pnas.org/content/111/42/15096.short
2. Anjeanette Roberts: Celebrating 3.8 Billion Years of Bacteriophage, October 22, 2015
3. http://www.ncbi.nlm.nih.gov/pubmed/22297528
4. https://creation.com/images/pdfs/tj/j22_1/j22_1_15-16.pdf
6. http://creation.com/did-god-make-pathogenic-viruses
7. Joseph W. Francis: The Organosubstrate of Life: A Creationist Perspective of Microbes and Viruses, 2003
8. http://teaguesterling.com/dna/motor-protein.pdf
9. P. G. Leiman: Structure and morphogenesis of bacteriophage T4, 9 May 2003
10. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3109452/
11. Nicola Twilley: Inside the World of Viral Dark Matter, February 6, 2015
12. Eric S Miller: Bacteriophage T4 and its relatives, 28 October 2010
13. Eric S Miller: Bacteriophage T4 genome, 2003 Mar;6
14. Vincent R. Racaniello: Principles of Virology, Volume 1: Molecular Biology, August 18, 2015
15. Science Daily: How viruses infect bacteria: A tale of a tail, May 18, 2016
16. Science Daily: New Understanding Of Complex Virus Nano-Machine For Cell Puncturing And DNA Delivery, February 4, 2002
17. Science Daily: Study Reveals New Information On How Viruses Enter Cells, February 7, 2002
18. Phys.Org: How viruses infect bacteria: A tale of a tail, May 18, 2016
19. M. I. Taylor et al.: Structure of the T4 baseplate and its function in triggering sheath contraction, 18 May 2016
20. Jaap Vergote: Design and nature, Jun 24, 2018
21. Moh Lan Yap: Structure and function of bacteriophage T4, 2015 Aug 1
22. Andrei Fokine: The Molecular Architecture of the Bacteriophage T4 Neck, 2013 Feb 19
23. V. V. Mesyanzhinov: Molecular Architecture of Bacteriophage T4, July 9, 2004
24. Fumio Arisaka: Assembly and infection process of bacteriophage T4, 03 November 2005

https://reasonandscience.catsboard.com
