Origin of CRISPR-Cas molecular complexes of prokaryotes

Otangelo

https://reasonandscience.catsboard.com/t3243-origin-of-crispr-cas-molecular-complexes-of-prokaryotes

1. All entrance identity check systems based on data collection and storage mechanisms are designed.
2. CRISPR-Cas is an immune system based on data storage and identity check systems.
3. Therefore, it was designed.

Casey Luskin: Researchers Suggest Molecular Machine Is Irreducibly Complex, August 8, 2014
https://evolutionnews.org/2014/08/researchers_sug/

Structure of molecular machine that targets viral DNA for destruction determined
"The structure of this biological machine is conceptually similar to an engineer's blueprint, and it explains how each of the parts in this complex assemble into a functional complex that efficiently identifies viral DNA when it enters the cell," Wiedenheft said. "This surveillance machine consists of 12 different parts and each part of the machine has a distinct job. If we're missing one part of the machine, it doesn't work."
https://www.sciencedaily.com/releases/2014/08/140807154143.htm

R.N. Jackson: Crystal structure of the CRISPR RNA–guided surveillance complex from Escherichia coli, 7 Aug 2014
https://www.science.org/doi/10.1126/science.1256328

Programmable Memory in Prokaryotes April 12, 2017
https://evolutionnews.org/2017/04/programmable-memory-in-prokaryotes/

Luciano A. Marraffini (2018): Many bacteria and archaea have the unique ability to heritably alter their genomes by incorporating small fragments of foreign DNA, called spacers, into CRISPR loci. Once transcribed and processed into individual CRISPR RNAs, spacer sequences guide Cas effector nucleases to destroy complementary, invading nucleic acids. Collectively, these two processes are known as the CRISPR–Cas immune response. 25

Udi Qimron: CRISPR and their associated proteins comprise a significant prokaryotic defense system against viruses and horizontally transferred nucleic acids 26

https://www.youtube.com/watch?v=qc6xgb4VXl0
CRISPR (clustered regularly interspaced short palindromic repeats) is a stretch of information in the bacterial genome made of short sequences repeated at regular intervals, and a protein associated with the CRISPR locus, CRISPR-associated protein 9 (Cas9), works like molecular scissors. CRISPR can be understood as a defense mechanism against phage viruses, the key enemies of bacteria such as E. coli, which infect and kill them. The bacterium has developed a defense mechanism embedded in its genome in the form of a DNA sequence, the CRISPR locus, together with a cas gene that produces the protein Cas9. Scientists noticed that sequences in the CRISPR locus are strikingly similar to sequences in phage genomes and scratched their heads: why were these sequences identical, and what was the cause or consequence of that? When a phage infects the bacterium, its genetic material enters the cell, and fragments of it become incorporated into the CRISPR locus, where they become part of the bacterial genome. If the same phage infects again, the CRISPR locus is transcribed into a specific RNA that targets the phage genome. The cas gene product, Cas9, associates with this RNA, called the guide RNA (crRNA), and with another RNA called tracrRNA, and together Cas9, the tracrRNA, and the guide RNA cleave the phage genome. Once cleaved, the phage DNA is degraded, and that is how the bacterium protects itself from phage viruses. It is a bacterial adaptive immune system, so precise and beautiful.

Pascale Cossart (2016): In nature, bacteria need to defend themselves constantly, particularly against bacteriophages (or phages), the viruses that specifically attack bacteria. A phage generally attaches itself to a bacterium, injects its DNA into it, and subverts the bacterium’s mechanisms of replication, transcription, and translation in order to replicate itself. The phage replicates its DNA, transcribes it into RNA, and produces phage proteins that accumulate to generate new phages and eventually cause the bacterial cell to explode (or lyse), releasing hundreds of new bacteriophages. Phages continually infect bacteria everywhere: in soil, in water, and even in our own intestinal microbiota. Bacteriophage families are numerous and vary widely in their form, size, composition, and the bacteria they target. To begin their attack, bacteriophages need a site of attachment, a particular component on the surface of a bacterium. This site of attachment is specific for each virus and the bacteria that it can infect. Bacteria have an immune system called CRISPR. CRISPR regions in the chromosomes allow bacteria to recognize predators, particularly previously encountered phages, and to destroy them. CRISPR regions protect and essentially “vaccinate” bacteria against bacteriophages. In fact, it has been shown that bacteria can be artificially vaccinated! When a population of bacteria is inoculated with a phage, a small number survive and are able to integrate a fragment of the phage DNA into their genome, in the region called the CRISPR locus. This allows the bacteria, if the phage ever attacks again, to recognize the phage DNA and degrade it. This ingenious phenomenon, known as interference, occurs due to the structure of the CRISPR region and to cas genes (CRISPR-associated genes) located near this region.
The CRISPR locus is a region of the chromosome composed of repeated sequences of around 50 nucleotides, interspersed with sequences known as spacers that are similar to those of bacteriophages. Some bacteria have several CRISPR loci with different sequence repetitions. Around 40% of bacteria have one or more CRISPRs, whereas others have none. CRISPR loci can be quite long, sometimes with more than 100 repetitions and spacers. CRISPRs have two functions: acquisition and interference. Acquisition, also called adaptation, is the process of acquiring fragments of DNA from a phage, and interference is the immunization process by Cas proteins encoded by cas genes (Fig. below).

(Top) The three steps involved in CRISPR function.
(1) Integration of a piece of DNA from a phage into the CRISPR locus (acquisition);
(2) the expression of Cas proteins and of pre-crRNA, which is then split into small crRNAs;
(3) the interference that takes place when the DNA injected by a phage into a bacterium pairs with a crRNA, forming a hybrid that is then degraded, consequently preventing infection.
(Bottom) Schematic drawing of genome modification (gene editing) by an sgRNA (single-guide RNA) made of crRNA and tracrRNA and the endonuclease Cas9.
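The repeat–spacer layout of a CRISPR locus can be pictured as a simple string. Below is a minimal Python sketch, not a real bioinformatics tool: it assumes a single repeat sequence and exact matches only (real arrays can carry slightly degenerate repeats), and the 29-nt repeat and 32-nt spacers are made-up placeholders that only mirror the lengths Ishino and colleagues reported for E. coli.

```python
# Toy model of a CRISPR array: identical repeats separated by unique spacers.
# Layout: repeat, spacer 1, repeat, spacer 2, ..., spacer n, repeat.
# The sequences below are hypothetical placeholders, not real CRISPR sequences.

REPEAT = "GTTCACTGCCGTACAGGCAGCTTAGAAAT"   # hypothetical 29-nt repeat
SPACERS = ["ACGTTGCA" * 4,                  # hypothetical 32-nt spacers,
           "TTAGGCCA" * 4,                  # each standing in for a fragment
           "CAGTCAGT" * 4]                  # of previously captured foreign DNA


def build_array(repeat: str, spacers: list[str]) -> str:
    """Place one copy of the repeat before every spacer, plus a closing repeat."""
    parts = [repeat]
    for spacer in spacers:
        parts.extend([spacer, repeat])
    return "".join(parts)


def extract_spacers(array: str, repeat: str) -> list[str]:
    """Split on exact copies of the repeat; the non-empty pieces are the spacers."""
    return [piece for piece in array.split(repeat) if piece]


if __name__ == "__main__":
    array = build_array(REPEAT, SPACERS)
    print(len(array))                                 # 212 = 4 x 29 + 3 x 32
    print(extract_spacers(array, REPEAT) == SPACERS)  # True
```

Reading the spacers back out is just splitting on the repeat, which is why the unique stretches between identical repeats stood out so clearly once sequencing became routine.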

Bacteria have numerous proteins with various complementary and synergistic functions in the process of adaptation and interference. They permit the addition of DNA fragments into the CRISPR locus, but their main purpose is to react to invading phages. The CRISPR locus is transcribed into a long CRISPR RNA, which is then split into small RNAs called crRNAs, each containing a spacer and a part of the repeated sequence. When a phage injects its DNA into the bacterium, the crRNA recognizes and binds to it. An enzyme then recognizes the hybrid and cleaves the phage DNA at the point where the crRNA has paired. Replication of the phage DNA is blocked, and the infection is stopped. Genome editing became possible with the identification of the proteins involved in the cleavage of the hybrid DNA. This process is performed by a complex of proteins containing the protein Cas1 and sometimes by a single protein called Cas9. Cas9 is unique in that it can attach itself to a DNA strand and, due to the two distinct domains of its structure, cut this DNA on each of its two strands. This protein is the basis of the CRISPR/Cas9 technology, which enables a variety of genome modifications and mutations in mammals, plants, insects, and fish in addition to bacteria. This system works due to the Cas9 protein and also a guide RNA hybrid that is made from one RNA similar to the region to be mutated and a second RNA called tracrRNA, or trans-activating crRNA. tracrRNA was discovered next to the CRISPR locus in Streptococcus pyogenes and was shown to be partially complementary to the repeated regions of the locus, enabling it to guide the Cas9 protein and the crRNA toward the target. In summary, by expressing the Cas9 protein with a composite RNA made up of a sequence identical to the target region, a tracrRNA, and a fragment complementary to the tracrRNA, one can now introduce a mutation or deletion into a target genome of any origin.
After the 2012 publication in Science of the elegant studies by the teams led by Emmanuelle Charpentier and Jennifer Doudna, the CRISPR method was so intriguing that it provoked an avalanche of research and publications demonstrating that this technique could be used in many cases and with many variations. 24
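How a Cas9–guide RNA complex picks its cut site can be sketched in a few lines. This is a simplified Python illustration, assuming the well-characterized Streptococcus pyogenes Cas9 rules: a 20-nt guide, a 5′-NGG-3′ PAM (protospacer-adjacent motif) immediately 3′ of the matching DNA, and a blunt cut 3 bp upstream of the PAM. It scans only the strand it is given and requires a perfect match; the guide and target sequences are hypothetical, and real Cas9 also checks the opposite strand and can tolerate some mismatches.

```python
def find_cas9_cut_sites(dna: str, guide: str) -> list[int]:
    """Return cut positions on the given strand for an SpCas9-like nuclease.

    A site needs an exact match to the guide followed immediately by an NGG PAM.
    Each returned index i means the blunt cut falls between dna[i-1] and dna[i].
    """
    dna, guide = dna.upper(), guide.upper()
    n = len(guide)
    cut_sites = []
    for start in range(len(dna) - n - 2):
        protospacer = dna[start:start + n]
        pam = dna[start + n:start + n + 3]
        if protospacer == guide and pam[1:] == "GG":   # N can be any base
            cut_sites.append(start + n - 3)            # 3 bp upstream of the PAM
    return cut_sites


def cut(dna: str, position: int) -> tuple[str, str]:
    """Split the site (written as one strand) into two blunt-ended pieces."""
    return dna[:position], dna[position:]


if __name__ == "__main__":
    guide = "GACGCATAAAGATGAGACGC"             # hypothetical 20-nt guide
    target = "TTTT" + guide + "TGG" + "AAAA"   # hypothetical target with a TGG PAM
    sites = find_cas9_cut_sites(target, guide)
    print(sites)                               # [21]
    print(cut(target, sites[0]))
```

Changing the PAM in the example from TGG to, say, TCA makes the same perfectly matching sequence invisible to the sketch, which reflects the role the PAM plays in target selection.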

The CRISPR–Cas system
Tina Y. Liu (2020): CRISPR-Cas systems stand out as the only known RNA-programmed pathways for detecting and destroying bacteriophages and plasmids. Class 1 CRISPR-Cas systems, the most widespread and diverse of these adaptive immune systems, use an RNA-guided multi-protein complex to find foreign nucleic acids and trigger their destruction. These multisubunit complexes target and cleave DNA and RNA, and regulatory molecules control their activities. CRISPR-Cas loci constitute the only known adaptive immune system in bacteria and archaea. They typically include an array of repeat sequences (CRISPRs) with intervening “spacers” matching sequences of DNA or RNA from viruses or other mobile genetic elements, and a set of genes encoding CRISPR-associated (Cas) proteins.

Transcription across the CRISPR array produces a precursor crRNA (pre-crRNA) that is processed by nucleases into small, non-coding CRISPR RNAs (crRNAs). Each crRNA molecule assembles with one or more Cas proteins into an effector complex that binds crRNA-complementary regions in foreign DNA or RNA. The effector complex then triggers degradation of the targeted DNA or RNA using either an intrinsic nuclease activity or a separate nuclease in trans.
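The processing step Liu describes (a long pre-crRNA cut into individual crRNAs) can be illustrated with a short Python sketch. It assumes a Cas6-like endonuclease that cuts every repeat at the same internal position, so each crRNA ends up as a spacer flanked by the tail of the upstream repeat (its 5′ handle) and the start of the downstream repeat (its 3′ handle). The default cut offset of 21 nt into a 29-nt repeat is chosen to mimic the 8-nt 5′ handle reported for E. coli type I-E crRNAs, but treat it as an assumption; the sequences are hypothetical and written with T rather than U for simplicity.

```python
# Hypothetical sequences (DNA letters stand in for RNA; T instead of U).
REPEAT = "GTTCACTGCCGTACAGGCAGCTTAGAAAT"                       # hypothetical 29-nt repeat
SPACERS = ["ACGTTGCA" * 4, "TTAGGCCA" * 4, "CAGTCAGT" * 4]     # hypothetical 32-nt spacers


def process_pre_crrna(pre_crrna: str, repeat: str, cut_offset: int = 21) -> list[str]:
    """Cut inside every repeat at `cut_offset` and return the spacer-containing pieces.

    Each returned crRNA = repeat[cut_offset:] + spacer + repeat[:cut_offset].
    """
    cut_sites = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cut_sites.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    boundaries = [0] + cut_sites + [len(pre_crrna)]
    pieces = [pre_crrna[a:b] for a, b in zip(boundaries, boundaries[1:])]
    # The first and last pieces are repeat fragments with no spacer; discard them.
    return pieces[1:-1]


if __name__ == "__main__":
    pre_crrna = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)
    for crrna in process_pre_crrna(pre_crrna, REPEAT):
        print(len(crrna), crrna)   # each piece: 8 + 32 + 21 = 61 nt
```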

Giedrius Gasiunas (2012): The silencing of invading nucleic acids is executed by ribonucleoprotein complexes preloaded with small, interfering CRISPR RNAs (crRNAs) that act as guides for targeting and degradation of foreign nucleic acid. The Cas9–crRNA complex of the Streptococcus thermophilus CRISPR3/Cas system introduces a double-strand break at a specific site in DNA containing a sequence complementary to crRNA. DNA cleavage is executed by Cas9, which uses two distinct active sites, RuvC and HNH, to generate site-specific nicks on opposite DNA strands. Results demonstrate that the Cas9–crRNA complex functions as an RNA-guided endonuclease with RNA-directed target sequence recognition and protein-mediated DNA cleavage. 20

J.Cepelewicz (2020): CRISPR acts like an adaptive immune system; it enables bacteria that have been exposed to a virus to pass on a genetic “memory” of that infection to their descendants, which can then mount better defenses against a repeat infection. It’s a system that works so well that an estimated half of all bacterial species use CRISPR. Researchers have uncovered dozens of other systems that bacteria use to rebuff phage invasions. But in laboratory studies, bacteria primarily develop what’s known as surface-based phage resistance. Mutations change receptor molecules on the surface of the bacterial cell, so that the phage can no longer recognize and invade it.

The strategy is akin to shutting a door and throwing away the key: It offers the bacteria complete safety from infection by the virus. But that protection comes at a significant price, because it also disrupts whatever nutrient uptake, waste disposal, communication task or other cellular function the receptor would have been providing — taking a constant toll on a cell’s fitness.

In contrast, CRISPR only drags on a cell’s resources when it’s active, during a viral infection. Even so, CRISPR represents a riskier gambit: It doesn’t start to work until phages have already entered the cell, meaning that there’s a chance the viruses could overcome it. And CRISPR doesn’t just attack viral DNA; it can also prevent bacteria from taking up beneficial genes from other microbes, like those that confer antibiotic resistance. What factors affect the trade-offs in costs and fitness? For the past six years, Edze Westra, an evolutionary ecologist at the University of Exeter in England, has led a team pursuing the answer to that question. In 2015, they discovered that nutrient availability and phage density affected whether Pseudomonas bacteria relied on surface-based or CRISPR-based resistance. In environments poor in resources, receptor modifications were more burdensome, so CRISPR became a better bargain. When resources were plentiful, bacteria grew more densely and phage epidemics became more frequent. Bacteria then faced greater selective pressure to close themselves off from infection entirely, and so they shut down receptors to gain surface-based resistance. This explained why surface-based resistance was so common in laboratory cultures. Growing in a test tube rich in nutrients, “these bacteria are on a holiday,” Westra said. “They are having a terrific time.”

Still, these rules weren’t cut and dried. Plenty of bacteria in natural high-nutrient environments use CRISPR, and plenty of bacteria in natural low-nutrient environments don’t. “It’s all over the place,” Westra said. “That told us that we were probably still missing something.”

How Biodiversity Reshapes the Battle
Then one of Westra’s graduate students, Ellinor Opsal, proposed another potential factor: the diversity of the biological communities in which bacteria live. This factor is harder to study, but scientists had previously observed that it could affect phage immunity in bacteria. For example, in 2005, James Bull, a biologist at the University of Texas, Austin, and William Harcombe, his graduate student at the time (now at the University of Minnesota), found that E. coli bacteria didn’t evolve immunity to a phage when a second bacterial species was present. Similarly, Britt Koskella, an evolutionary biologist at the University of California, Berkeley, and one of her graduate students, Catherine Hernandez, reported last year that phage resistance failed to arise in Pseudomonas bacteria living on their actual host (a plant), though they always gained immunity in a test tube. Could the diversity of the surroundings influence not just whether or not resistance to phages evolved, but the nature of that resistance?

To find out, Westra’s team performed a new set of experiments: Instead of altering the nutrient conditions for Pseudomonas bacteria growing with phages, they added three other bacterial species — species that competed against Pseudomonas for resources but weren’t targeted by the phage. Left to themselves, Pseudomonas would normally develop surface-based mutations. But in the company of rivals, they were far more likely to turn to CRISPR. Further investigation showed that the more complex community dynamics had shifted the fitness costs: The bacteria could no longer afford to inactivate receptors because they not only had to survive the phage, but also had to outcompete the bacteria around them. These results from Westra’s group dovetail with earlier findings that phages can produce greater diversity in bacterial communities. “Now, that diversity is actually feeding back to the phage side of things” by affecting phage resistance, Koskella said. “It’s neat to see that coming full circle.” By understanding that kind of feedback loop, she added, “we can start to ask more general questions about the impacts that phages have in a community context.”

For one, the bacteria’s shift toward a CRISPR-based phage response had another, broader effect. When Westra’s group grew Pseudomonas in moth larvae hosts, they found that the bacteria with surface-based resistance were less virulent, killing the larvae much more slowly than the bacteria with active CRISPR systems did. 18
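The cost trade-off described in this excerpt (surface resistance pays a constant fitness price, while CRISPR pays only during infection but can be overcome) can be made concrete with a deliberately crude toy model. This is not Westra's model; every parameter below (the cost of a disabled receptor, the cost of running CRISPR during an infection, the chance that CRISPR fails) is hypothetical, and "fitness" is simply expected relative growth per generation.

```python
def fitness_surface(cost_receptor: float) -> float:
    """Surface resistance: always immune, but always pays for the lost receptor."""
    return 1.0 - cost_receptor


def fitness_crispr(p_infection: float, cost_active: float, p_fail: float) -> float:
    """CRISPR: free when no phage is around; when infected, pays an activity cost
    and survives only if the defense works (failure is treated as death)."""
    survive_infection = (1.0 - p_fail) * (1.0 - cost_active)
    return (1.0 - p_infection) + p_infection * survive_infection


def break_even_infection_rate(cost_receptor: float, cost_active: float, p_fail: float) -> float:
    """Infection probability above which surface resistance beats CRISPR.

    Solves fitness_crispr(p) == fitness_surface; values above 1 mean CRISPR always wins.
    """
    loss_per_infection = 1.0 - (1.0 - p_fail) * (1.0 - cost_active)
    return cost_receptor / loss_per_infection


if __name__ == "__main__":
    # Hypothetical numbers: 10% receptor cost, 5% activity cost, 10% failure rate.
    print(break_even_infection_rate(0.10, 0.05, 0.10))   # about 0.69
    # A costlier receptor (e.g. a resource-poor setting) raises the threshold,
    # here above 1, so CRISPR wins at any infection rate.
    print(break_even_infection_rate(0.20, 0.05, 0.10))
```

In this toy setting, frequent phage exposure (as in dense, nutrient-rich cultures) tips the balance toward surface resistance, and a costlier receptor tips it toward CRISPR, which matches the direction of the 2015 result quoted above; the specific numbers carry no biological meaning.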

Discovering CRISPR
S.H. Sternberg (2015): The CRISPR locus was first identified in Escherichia coli as an unusual series of 29-bp repeats separated by 32-bp spacer sequences (Ishino et al., 1987). 21

Carl Zimmer tells us the story (2015): The scientists who discovered CRISPR had no way of knowing that they had discovered something so revolutionary. They didn’t even understand what they had found. In 1987, Yoshizumi Ishino and colleagues at Osaka University in Japan published the sequence of a gene called iap belonging to the gut microbe E. coli. To better understand how the gene worked, the scientists also sequenced some of the DNA surrounding it. They hoped to find spots where proteins landed, turning iap on and off. But instead of a switch, the scientists found something incomprehensible. Near the iap gene lay five identical segments of DNA. DNA is made up of building blocks called bases, and the five segments were each composed of the same 29 bases. These repeat sequences were separated from each other by 32-base blocks of DNA, called spacers. Unlike the repeat sequences, each of the spacers had a unique sequence.

This peculiar genetic sandwich didn’t look like anything biologists had found before. When the Japanese researchers published their results, they could only shrug. “The biological significance of these sequences is not known,” they wrote. It was hard to know at the time if the sequences were unique to E. coli, because microbiologists only had crude techniques for deciphering DNA. But in the 1990s, technological advances allowed them to speed up their sequencing. By the end of the decade, microbiologists could scoop up seawater or soil and quickly sequence much of the DNA in the sample. This technique — called metagenomics — revealed those strange genetic sandwiches in a staggering number of species of microbes. They became so common that scientists needed a name to talk about them, even if they still didn’t know what the sequences were for. In 2002, Ruud Jansen of Utrecht University in the Netherlands and colleagues dubbed these sandwiches “clustered regularly interspaced short palindromic repeats” — CRISPR for short.

Jansen’s team noticed something else about CRISPR sequences: They were always accompanied by a collection of genes nearby. They called these genes Cas genes, for CRISPR-associated genes. The genes encoded enzymes that could cut DNA, but no one could say why they did so, or why they always sat next to the CRISPR sequence. Three years later, three teams of scientists independently noticed something odd about CRISPR spacers. They looked a lot like the DNA of viruses. “And then the whole thing clicked,” said Eugene Koonin. At the time, Koonin, an evolutionary biologist at the National Center for Biotechnology Information in Bethesda, Md., had been puzzling over CRISPR and Cas genes for a few years. As soon as he learned of the discovery of bits of virus DNA in CRISPR spacers, he realized that microbes were using CRISPR as a weapon against viruses.

Koonin knew that microbes are not passive victims of virus attacks. They have several lines of defense. Koonin thought that CRISPR and Cas enzymes provide one more. In Koonin’s hypothesis, bacteria use Cas enzymes to grab fragments of viral DNA. They then insert the virus fragments into their own CRISPR sequences. Later, when another virus comes along, the bacteria can use the CRISPR sequence as a cheat sheet to recognize the invader.
Scientists didn’t know enough about the function of CRISPR and Cas enzymes for Koonin to make a detailed hypothesis. But his thinking was provocative enough for a microbiologist named Rodolphe Barrangou to test it. To Barrangou, Koonin’s idea was not just fascinating, but potentially a huge deal for his employer at the time, the yogurt maker Danisco. Danisco depended on bacteria to convert milk into yogurt, and sometimes entire cultures would be lost to outbreaks of bacteria-killing viruses. Now Koonin was suggesting that bacteria could use CRISPR as a weapon against these enemies.

To test Koonin’s hypothesis, Barrangou and his colleagues infected the milk-fermenting microbe Streptococcus thermophilus with two strains of viruses. The viruses killed many of the bacteria, but some survived. When those resistant bacteria multiplied, their descendants turned out to be resistant too. Some genetic change had occurred. Barrangou and his colleagues found that the bacteria had stuffed DNA fragments from the two viruses into their spacers. When the scientists chopped out the new spacers, the bacteria lost their resistance. Barrangou, now an associate professor at North Carolina State University, said that this discovery led many manufacturers to select for customized CRISPR sequences in their cultures, so that the bacteria could withstand virus outbreaks. “If you’ve eaten yogurt or cheese, chances are you’ve eaten CRISPR-ized cells,” he said.

In 2007, Blake Wiedenheft joined Doudna’s lab as a postdoctoral researcher, eager to study the structure of Cas enzymes to understand how they worked. Doudna agreed to the plan — not because she thought CRISPR had any practical value, but just because she thought the chemistry might be cool. “You’re not trying to get to a particular goal, except understanding,” she said. As Wiedenheft, Doudna and their colleagues figured out the structure of Cas enzymes, they began to see how the molecules worked together as a system. When a virus invades a microbe, the host cell grabs a little of the virus’s genetic material, cuts open its own DNA, and inserts the piece of virus DNA into a spacer. As the CRISPR region fills with virus DNA, it becomes a molecular most-wanted gallery, representing the enemies the microbe has encountered. The microbe can then use this viral DNA to turn Cas enzymes into precision-guided weapons. The microbe copies the genetic material in each spacer into an RNA molecule. Cas enzymes then take up one of the RNA molecules and cradle it. Together, the viral RNA and the Cas enzymes drift through the cell. If they encounter genetic material from a virus that matches the CRISPR RNA, the RNA latches on tightly. The Cas enzymes then chop the DNA in two, preventing the virus from replicating.

CRISPR, microbiologists realized, is also an adaptive immune system. It lets microbes learn the signatures of new viruses and remember them. And while we need a complex network of different cell types and signals to learn to recognize pathogens, a single-celled microbe has all the equipment necessary to learn the same lesson on its own. But how did microbes develop these abilities? Ever since microbiologists began discovering CRISPR-Cas systems in different species, Koonin and his colleagues have been reconstructing the systems’ evolution. CRISPR-Cas systems use a huge number of different enzymes, but all of them have one enzyme in common, called Cas1. The job of this universal enzyme is to grab incoming virus DNA and insert it in CRISPR spacers. Recently, Koonin and his colleagues discovered what may be the origin of Cas1 enzymes.

Along with their own genes, microbes carry stretches of DNA called mobile elements that act like parasites. The mobile elements contain genes for enzymes that exist solely to make new copies of their own DNA, cut open their host’s genome, and insert the new copy. Sometimes mobile elements can jump from one host to another, either by hitching a ride with a virus or by other means, and spread through their new host’s genome.

Koonin and his colleagues discovered that one group of mobile elements, called casposons, makes enzymes that are pretty much identical to Cas1. In a new paper in Nature Reviews Genetics, Koonin and Mart Krupovic of the Pasteur Institute in Paris argue that the CRISPR-Cas system got its start when mutations transformed casposons from enemies into friends. Their DNA-cutting enzymes became domesticated, taking on a new function: to store captured virus DNA as part of an immune defense. While CRISPR may have had a single origin, it has blossomed into a tremendous diversity of molecules. Koonin is convinced that viruses are responsible for this. Once they faced CRISPR’s powerful, precise defense, the viruses evolved evasions. Their genes changed sequence so that CRISPR couldn’t latch onto them easily. And the viruses also evolved molecules that could block the Cas enzymes. The microbes responded by evolving in their turn. They acquired new strategies for using CRISPR that the viruses couldn’t fight. Over many thousands of years, in other words, evolution behaved like a natural laboratory, coming up with new recipes for altering DNA. 17
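The logic of Barrangou's experiment, that resistance depends on carrying a spacer matching the virus and is lost when that spacer is removed, can be expressed as a simple check. A minimal Python sketch under stated simplifications: a cell counts as immune if any of its spacers occurs exactly in the phage genome on either strand. Real systems also require a PAM and can tolerate some mismatches, and all sequences here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_immune(spacers: list[str], phage_genome: str) -> bool:
    """True if any spacer matches either strand of the phage genome exactly."""
    return any(
        spacer in phage_genome or reverse_complement(spacer) in phage_genome
        for spacer in spacers
    )


if __name__ == "__main__":
    old_spacer = "CAGTCAGT" * 4            # hypothetical spacer from an unrelated phage
    new_spacer = "TTAGGCCA" * 4            # hypothetical spacer acquired from this phage
    phage = "CCCC" + new_spacer + "GGGG"   # hypothetical phage genome fragment

    print(is_immune([new_spacer, old_spacer], phage))   # True: resistant survivor
    print(is_immune([old_spacer], phage))               # False: new spacer removed
```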

Diversity, ecology, and evolution of the CRISPR-Cas systems
Devashish Rath (2015): The length and sequence of repeats and the length of spacers are well conserved within a CRISPR locus, but may vary between CRISPRs in the same or different genomes. Repeat sequences are in the range of 21 bp to 48 bp, and spacers are between 26 bp and 72 bp. The observed variation is perhaps not surprising given how widespread the system is. The number of spacers within a CRISPR locus varies widely, from a few to several hundred. Genomes can have single or multiple CRISPR loci, and in some species these loci can make up a significant part of the chromosome. Some CRISPR loci lack adjacent cas genes and instead rely on trans-encoded factors (that is, Cas proteins encoded at other loci in the genome). Another feature associated with CRISPR loci is the presence of a conserved sequence, called leader, located upstream of the CRISPR with respect to the direction of transcription. The Cas proteins are a highly diverse group. Many are predicted or identified to interact with nucleic acids, e.g. as nucleases, helicases and RNA-binding proteins. The Cas1 and Cas2 proteins are involved in adaptation and are virtually universal for CRISPR-Cas systems. Other Cas proteins are only associated with certain types of CRISPR-Cas systems. The diversity of Cas proteins, presence of multiple CRISPR loci, and frequent horizontal transfer of CRISPR-Cas systems make classification a complex task. The most adopted classification identifies Type I, II and III CRISPR-Cas systems, with each having several subgroups. Different types of CRISPR-Cas systems can co-exist in a single organism. Recently, a Type IV system was proposed, which contains several Cascade genes but no CRISPR, cas1 or cas2. The Type IV complex would be guided by protein-DNA interaction, not by crRNA, and would constitute an innate immune system preset to attack certain sequences.
The Type I systems are defined by the presence of the signature protein Cas3, a protein with both helicase and DNase domains responsible for degrading the target. Currently, six subtypes of the Type I system are identified (Type I-A through Type I-F) that have a variable number of cas genes. Apart from cas1, cas2 and cas3, all Type I systems encode a Cascade-like complex. Cascade binds crRNA and locates the target, and most variants are also responsible for processing the crRNA. Cascade also enhances spacer acquisition in some cases. In the Type I-A system, Cas3 is a part of the Cascade complex. The Type II CRISPR-Cas systems encode Cas1 and Cas2, the Cas9 signature protein, and sometimes a fourth protein (Csn2 or Cas4). Cas9 assists in adaptation, participates in crRNA processing, and cleaves the target DNA with the help of crRNA and an additional RNA called tracrRNA. Type II systems have been divided into subtypes II-A and II-B, but recently a third, II-C, has been suggested. The csn2 and cas4 genes, both encoding proteins involved in adaptation, are present in Type II-A and Type II-B, respectively, while Type II-C lacks a fourth gene. The Type III CRISPR-Cas systems contain the signature protein Cas10, of unclear function. Most Cas proteins are destined for the Csm (in Type III-A) or Cmr (in Type III-B) complexes, which are similar to Cascade. Interestingly, while all Type I and II systems are known to target DNA, Type III systems target DNA and/or RNA. So far, the Type II systems have been found exclusively in bacteria, while the Type I and Type III systems occur in both bacteria and archaea. The large number of genomes with detected CRISPRs could be used as an argument for the system's importance as a defense mechanism. However, the CRISPR-Cas systems are probably mobile genetic elements that frequently transfer horizontally, which also contributes to their high prevalence.
Other findings indicate that phages can still replicate in bacterial populations carrying one spacer that targets them, but not in populations carrying two.
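One simple way to see why a second spacer matters, under the hypothetical assumption that a phage escapes a spacer only through a mutation in the matching protospacer, and that such mutations arise independently at each targeted site with probability mu per genome copy: escaping k spacers at once then requires k simultaneous mutations, with probability of roughly mu to the power k. The mutation probability and population size below are illustrative numbers, not measurements.

```python
def escape_probability(mu: float, k: int) -> float:
    """Chance that one phage genome carries escape mutations in all k targeted sites,
    assuming independent mutations with probability mu per site (a simplification)."""
    return mu ** k


def expected_escapers(population: float, mu: float, k: int) -> float:
    """Expected number of escape mutants among `population` phage genomes."""
    return population * escape_probability(mu, k)


if __name__ == "__main__":
    mu = 1e-5        # hypothetical per-site escape mutation probability
    phages = 1e8     # hypothetical number of phage genomes produced
    for k in (1, 2):
        print(k, expected_escapers(phages, mu, k))   # roughly 1000 vs 0.01
```

With these made-up numbers, a single spacer leaves about a thousand escape mutants to keep the infection going, while two spacers leave, on average, far fewer than one.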

Eugene V. Koonin (2019): The number and diversity of known CRISPR–Cas systems have substantially increased in recent years. The new classification includes 2 classes, 6 types and 33 subtypes compared with 5 types and 16 subtypes in 2015. At the adaptation stage, a distinct complex of Cas proteins binds to a target DNA, often after recognizing a distinct, short motif known as a protospacer-adjacent motif (PAM), and cleaves out a portion of the target DNA, the protospacer. After duplication of the repeat at the 5ʹ end of the CRISPR array, the adaptation complex inserts the protospacer DNA into the array, so that it becomes a spacer. Some CRISPR–Cas systems employ an alternative mechanism of adaptation — namely, spacer acquisition from RNA, via reverse transcription by a reverse transcriptase encoded at the CRISPR–cas locus. At the expression stage, the CRISPR array is typically transcribed as a single transcript — the pre-CRISPR RNA (pre-crRNA) — that is processed into mature CRISPR RNAs (crRNAs), each containing the spacer sequence and parts of the flanking repeats. In different CRISPR–Cas variants, the pre-crRNA processing is mediated by a distinct subunit of a multiprotein Cas complex, by a single, multidomain Cas protein, or by non-Cas host RNases. At the interference stage, the crRNA, which typically remains bound to the processing complex (protein), serves as a guide to recognize the protospacer (or a closely similar sequence) in the invading genome of a virus or plasmid, which is then cleaved and inactivated by a Cas nuclease (or nucleases) that either is part of the effector or is recruited at the interference stage. The above summary is a brief, oversimplified description of the CRISPR–Cas functionality that inevitably omits many details. These can be found in recent reviews on different aspects of CRISPR–Cas biology. 
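The adaptation stage Koonin outlines (PAM recognition, excision of a protospacer, duplication of the leader-proximal repeat, and insertion of the new spacer at the leader end) can be sketched as string operations. This is a hypothetical Python illustration, not a model of any specific Cas1–Cas2 complex: it assumes the PAM sits immediately 3′ of the protospacer (as in Cas9-based systems; type I PAMs lie on the other side), uses exact matching, and the leader, repeat, PAM and spacer sequences are all made up.

```python
from typing import Optional

LEADER = "TTTTTTTT"                          # hypothetical leader sequence
REPEAT = "GTTCACTGCCGTACAGGCAGCTTAGAAAT"     # hypothetical 29-nt repeat
OLD_SPACER = "CAGTCAGT" * 4                  # hypothetical existing spacer


def pick_protospacer(invader: str, pam: str, length: int) -> Optional[str]:
    """Return the `length`-nt stretch immediately 5' of the first PAM with room for it."""
    pam_start = invader.find(pam, length)
    if pam_start == -1:
        return None
    return invader[pam_start - length:pam_start]


def acquire_spacer(array: str, repeat: str, new_spacer: str) -> str:
    """Duplicate the leader-proximal repeat and insert the new spacer between the copies.

    `array` is the repeat-spacer region without the leader; it must start with `repeat`.
    """
    if not array.startswith(repeat):
        raise ValueError("array is expected to start with the repeat")
    return repeat + new_spacer + array


if __name__ == "__main__":
    array = REPEAT + OLD_SPACER + REPEAT
    invader = "CCCC" + "ACGTTGCA" * 4 + "TGG" + "AAAA"   # hypothetical phage DNA
    protospacer = pick_protospacer(invader, pam="TGG", length=32)
    locus = LEADER + acquire_spacer(array, REPEAT, protospacer)
    print(locus.count(REPEAT))   # 3 repeats now flank 2 spacers
```

Because new spacers always enter next to the leader, the array in this sketch also records the order of encounters, with the most recent invader closest to the leader.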
Similar to other biological defense mechanisms, archaeal and bacterial CRISPR–Cas systems show a remarkable diversity of Cas protein sequences, gene compositions, and architectures of the genomic loci. Our knowledge of this diversity is continuously expanding through the screening of ever-growing genomic and metagenomic databases. To keep pace with such expansion, a robust classification of CRISPR–Cas systems is essential for the progress of CRISPR research, but this presents formidable challenges, owing to the lack of universal markers and the fast evolution of the CRISPR–cas loci. Therefore, the two previous CRISPR–Cas classifications, published in Nature Reviews Microbiology in 2011 and 2015, employed a multipronged approach that combined comparisons of the gene compositions of CRISPR–Cas systems and their loci architectures with sequence similarity-based clustering and phylogenetic analysis of conserved Cas proteins, such as Cas1. The 2015 classification included 5 types and 16 subtypes, as well as introducing the major division of CRISPR–Cas systems into two classes that radically differ with respect to the architectures of their effector modules involved in crRNA processing and interference. The class 1 systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes (such as the Cascade complex in type I systems) that, with contributions from additional Cas proteins, mediate pre-crRNA processing and interference. By contrast, class 2 systems encompass a single, multidomain crRNA-binding protein (such as Cas9 in type II systems) that combines all activities required for interference and, in some variants, also those involved in pre-crRNA processing (Box 1).

Class 1 CRISPR–Cas systems have effector modules composed of multiple Cas proteins that form a crRNA-binding complex and function together in binding and processing of the target. Class 2 systems have a single, multidomain crRNA-binding protein that is functionally analogous to the entire effector complex of class 1. Part a of the figure illustrates the generic organizations of the class 1 and class 2 CRISPR–Cas loci. Part b of the figure shows the functional modules of CRISPR–Cas systems. The scheme shows the typical relationships between the genetic, structural and functional organizations of the six types of CRISPR–Cas systems. Protein names follow the current nomenclature. An asterisk indicates the putative small subunit that might be fused to the large subunit in several type I subtypes. The pound symbols (#) indicate that other unknown sensor, effector and ring nuclease protein families could be involved in the same signaling pathway. Dispensable (and/or missing, in some subtypes and variants) components are indicated by dashed outlines. Cas6 is shown with a thin solid outline for type I because it is dispensable in some, but not most, systems and with a dashed line for type III because most of these systems apparently use the Cas6 protein provided in trans by other CRISPR–cas loci. The three colours for Cas9, Cas10, Cas12 and Cas13 reflect the fact that these proteins contribute to different stages of the CRISPR–Cas response. The CRISPR-associated Rossmann fold (CARF) and higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domain proteins are the most common sensors and effectors, respectively, in the type III ancillary modules, but several alternative sensors and effectors have been identified as well (ref. 43). Ring nucleases are a distinct variety of CARF domain proteins that cleave cyclic oligoA produced by Cas10 and thus control the indiscriminate RNase activity of the HEPN domain of Csx1.
LS, large subunit; SS, small subunit; tracrRNA, transactivating CRISPR RNA.

CRISPR–Cas classification
No genes are shared by all CRISPR–Cas systems, ruling out the possibility of a straightforward, comprehensive phylogenetic classification analogous to that employed for cellular life forms.

Class 1 and its derivatives.
The classification of class 1 CRISPR–Cas systems, which include types I, III and IV, has remained relatively stable compared with the 2015 version (Fig. below).

Updated classification of class 1 CRISPR–Cas systems.
The figure schematically shows representative (typical) CRISPR–cas loci of each class 1 subtype and of selected distinct variants, with the dendrogram on the left showing the likely evolutionary relationships between the types and subtypes. The column on the right indicates the organism and the corresponding gene range. Homologous genes are colour-coded and identified by a family name. The gene names follow the previous classification (ref. 18). Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. The small subunit is encoded by csm2, cmr5, cse2, csa5 and several additional families of homologous genes that are collectively denoted cas11. The adaptation module genes cas1 and cas2 are dispensable in subtypes III-A and III-E (dashed lines). Gene regions coloured cream represent the HD nuclease domain; the HD domain in Cas10 is distinct from that in Cas3 and Cas3ʹʹ. Functionally uncharacterized genes are shown in grey. The tan shading shows the effector module. The grey shading of different hues shows the two levels of classification: subtypes and variants. Most of the subtype III-B, III-C, III-E and III-F loci, as well as IV-B and IV-C loci, lack CRISPR arrays and are shown accordingly, although for each of the type III subtypes exceptions have been detected. CHAT, protease domain of the caspase family; RT, reverse transcriptase; TPR, tetratricopeptide repeat.

Updated classification of class 2 CRISPR–Cas systems.
The figure schematically shows representative (typical) CRISPR–cas loci for each class 2 subtype and for selected distinct variants, with the dendrogram on the left showing the likely evolutionary relationships between the types and subtypes. The column on the right indicates the organism and the corresponding gene range. Homologous genes are colour coded and are identified by a family name following the previous classification (ref. 18). Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. The grey shading of different hues shows the two levels of classification: subtypes and variants. The adaptation module genes cas1 and cas2 are present in only a subset of the subtype V-D, VI-A and VI-D loci and are accordingly shown by dashed lines. The WYL-domain-encoding genes and csx27 genes are also dispensable and shown by dashed lines. Additional genes encoding components of the interference module, such as transactivating CRISPR RNA (tracrRNA), are shown. The domains of the effector proteins are colour-coded: RuvC-like nuclease, green; HNH nuclease, yellow; higher eukaryotes and prokaryotes nucleotide-binding (HEPN) RNase, purple; transmembrane domains, blue.

Outline of a complete scenario for the origins and evolution of CRISPR–Cas systems.
The figure depicts a hypothetical scenario of the origin of CRISPR–Cas systems from an ancestral signalling system (possibly an abortive infection defence system (Abi)). This putative ancestral Abi module shares a cyclic oligoA polymerase Palm domain (RNA recognition motif (RRM) fold) with Cas10 and is proposed to function analogously to type III CRISPR–Cas systems. Specifically, cyclic oligoA molecules that are synthesized in response to virus infection bind to the CRISPR-associated Rossmann fold (CARF) domain of the second protein in this system, resulting in activation of the RNase activity of the higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domain, which induces dormancy through indiscriminate RNA cleavage. This putative ancestral Abi module would give rise to the type III-like CRISPR–Cas effector module via duplication of the RRM domain, with subsequent inactivation of one of the copies (the two RRM domains are denoted RRM1 and RRM2). The ancestral class 1 CRISPR–Cas system is inferred to have evolved through the merger of two modules: the adaptation module, including the CRISPR repeats, derived from a casposon, and the type III-like effector module, likely derived from the ancestral Abi system. The subsequent acquisition of the HD nuclease domain by the effector module provided for RNA-guided DNA cleavage. Inactivation of the oligoA polymerase domain in the effector complex, or possibly replacement of Cas10 by an unrelated protein and acquisition of the Cas3 helicase, led to the emergence of type I systems, which lack the cyclic oligoA-dependent signalling pathway and exclusively cleave double-stranded DNA. Class 2 systems of type II and different subtypes of type V appear to have evolved independently by the recruitment of distinct TnpB nucleases that are encoded by IS605-like transposable elements. Type VI likely originated from an RNA-cleaving, HEPN domain-containing abortive infection or toxin–antitoxin system. 
Some CRISPR–Cas systems, such as type IV and Tn7-linked systems I-F3 and V-K, were subsequently recruited by mobile genetic elements and lost their interference capacity along with the original defence function. The key evolutionary events are described to the right of the images. The typical CRISPR–cas operon organization is shown for each CRISPR–Cas subtype and for selected distinct variants. Homologous genes are colour-coded and identified by a family name following the previous classification (ref. 18). The multiforking arrows denote events that have been inferred to have occurred on multiple, independent occasions during the evolution of CRISPR–Cas systems. GGDD, key catalytic motif of the cyclase or polymerase domain of Cas10 that is involved in the synthesis of cyclic oligoA signalling molecules; HRAMP, haloarchaeal repeat-associated mysterious proteins; TR, terminal repeats; tracrRNA, transactivating CRISPR RNA; TSD, target site duplication, the likely source of ancestral repeats. 23

Imagine a company that has to install a biometric security system in its headquarters. Biometrics comes from the Greek words “bios” (life) and “metrikos” (measure); it is the analysis of people's biological characteristics for identity verification or identification. To distinguish the employees who are permitted to enter the building from those who are not welcome, the data must first be collected and stored in a memory bank. Every time someone arrives at the building, that person goes through the security check, and the data provided are compared with the data in the memory bank. If there is a match, the person is permitted to enter; if not, entry is refused.

Analogously, cells can do almost the same, with a few differences. They have an ingenious security check system based on recognizing enemies; from that knowledge they build a sophisticated data bank that is used to recognize future invasions by the same enemies and to destroy them.

Diversity of CRISPR–Cas systems.
The CRISPR-associated (Cas) proteins can be divided into distinct functional categories as shown. The three types of CRISPR–Cas systems are defined on the basis of a type-specific signature Cas protein (indicated by an asterisk) and are further subdivided into subtypes. The CRISPR ribonucleoprotein (crRNP) complexes of type I and type III systems contain multiple Cas subunits, whereas the type II system contains a single Cas9 protein. Boxes indicate components of the crRNP complexes for each system. The type III-B system is unique in that it targets RNA, rather than DNA, for degradation.

Understanding CRISPR-Cas9
Eugene V Koonin (2011): CRISPR-Cas systems have three distinct functional stages of their operation. During the first stage, adaptation, short pieces of DNA (characteristic length of approximately 30 bp) homologous to virus or plasmid sequences (known as proto-spacers) are integrated into the CRISPR loci. The short (3 or 4 nucleotides) proto-spacer adjacent motifs (PAMs) located immediately downstream of the proto-spacer appear to determine the selection of the protospacer followed by integration into a pre-existing CRISPR array. The second stage, expression and processing, involves transcription and cleavage of the long primary transcript of a CRISPR locus (pre-crRNA) that is processed into short crRNAs. This step is catalyzed by endoribonucleases encoded by the cas genes that either operate as a subunit of a larger complex (e.g. Cascade, CRISPR-associated complex for antiviral defense in Escherichia coli) or as a stand-alone enzyme, e.g., Cas6 in the archaeon Pyrococcus furiosus. At the third stage, interference, the alien nucleic acid (DNA or RNA) is targeted by a ribonucleoprotein complex containing a crRNA guide and a set of Cas proteins, and cleaved within or in the vicinity of the PAM sequence. In several CRISPR-Cas systems, crRNAs have been shown to be complementary to either strand of the phage or plasmid, which is most compatible with DNA being the target. Direct demonstration of DNA being the target of the CRISPR-Cas machinery has come from experiments in Staphylococcus epidermidis. In this case, insertion of a self-splicing intron into the proto-spacer sequence of the target gene rendered the respective plasmid resistant to the CRISPR-mediated immunity. 22

A roadmap of CRISPR-Cas adaptation and defense.
In the example illustrated, a bacterial cell is infected by a bacteriophage. The first stage of CRISPR-Cas defense is CRISPR adaptation. This involves the incorporation of small fragments of DNA from the invader into the host CRISPR array. This forms a genetic “memory” of the infection. The memories are stored as spacers (colored squares) between repeat sequences (R), and new spacers are added at the leader-proximal (L) end of the array. The Cas1 and Cas2 proteins, encoded within the cas gene operon, form a Cas1-Cas2 complex (blue)—the “workhorse” of CRISPR adaptation. In this example, the Cas1-Cas2 complex catalyzes the addition of a spacer from the phage genome (purple) into the CRISPR array. The second stage of CRISPR-Cas defense involves transcription of the CRISPR array and subsequent processing of the precursor transcript to generate CRISPR RNAs (crRNAs). Each crRNA contains a single spacer unit that is typically flanked by parts of the adjoining repeat sequences (gray). Individual crRNAs assemble with Cas effector proteins (light green) to form crRNA-effector complexes. The crRNA-effector complexes catalyze the sequence-specific recognition and destruction of foreign DNA and/or RNA elements. This process is known as interference. 13
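The way the array stores its "memories" (new spacers added at the leader-proximal end, each one followed by a fresh copy of the repeat) can be pictured as a simple ordered data structure. The Python sketch below is a toy model only: the class name `CrisprArray`, its methods and the placeholder leader, repeat and spacer strings are hypothetical, and real integration is a chemical reaction carried out by Cas1-Cas2 and host repair enzymes, not a list operation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprArray:
    """Toy model of a CRISPR locus: a leader followed by repeat-spacer units.

    Hypothetical illustration: the class, method names and sequences are
    placeholders, not real CRISPR data.
    """
    leader: str
    repeat: str
    # Index 0 is the leader-proximal (most recently acquired) spacer.
    spacers: List[str] = field(default_factory=list)

    def acquire(self, spacer: str) -> None:
        # New spacers go in at the leader end, pushing older ones away from it.
        self.spacers.insert(0, spacer)

    def sequence(self) -> str:
        # One repeat copy precedes the first spacer and one follows every
        # spacer, so each acquisition adds exactly one extra repeat copy.
        parts = [self.leader, self.repeat]
        for spacer in self.spacers:
            parts.extend([spacer, self.repeat])
        return "".join(parts)

    def remembers(self, fragment: str) -> bool:
        # Crude "memory" lookup: is this fragment already stored as a spacer?
        return fragment in self.spacers


if __name__ == "__main__":
    array = CrisprArray(leader="LLLL", repeat="RR")
    array.acquire("aaa")  # first infection
    array.acquire("bbb")  # later infection by a different phage
    print(array.spacers)     # ['bbb', 'aaa']
    print(array.sequence())  # LLLLRRbbbRRaaaRR
```

Reading the array from the leader outward therefore gives the infection history from newest to oldest, which is why the spacers are described as a chronological record.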

Overview of the CRISPR–Cas system.
Adaptive immunity by CRISPR–Cas systems is mediated by CRISPR RNAs (crRNAs) and Cas proteins, which form multicomponent CRISPR ribonucleoprotein (crRNP) complexes. The cas genes are colored according to function, as indicated by the four functional categories in coloured boxes:

spacer acquisition (yellow);
crRNA processing (pink);
crRNA assembly and surveillance (blue); and
target degradation (purple).

Involvement of non-Cas components (grey) is indicated, either when experimentally demonstrated (for example, RNase III processing in type II systems) or when anticipated (for example, the potential involvement of housekeeping repair and/or recombination enzymes). The first stage is known as acquisition, which occurs following the entry of an invading mobile genetic element (in this case, a viral genome). The invading DNA is fragmented and a new protospacer (green) is selected, processed and integrated as a new spacer at the leader end of the CRISPR array. During the second stage, which is known as expression, the CRISPR locus is transcribed and the pre-crRNA is processed into small crRNAs by CRISPR-associated (Cas6) and/or housekeeping ribonucleases (such as RNase III). The mature crRNAs and Cas proteins assemble to form a crRNP complex. During the final stage of interference, the crRNP scans invading DNA for a complementary nucleic acid target and on successful recognition, the target is eventually degraded by Cas nucleases. 27

S. H. Sternberg (2015): CRISPR-Cas immunity is conferred through integration of short DNA fragments into the CRISPR locus, and these spacer sequences record the history of past infections. The CRISPR locus is transcribed, and the resultant transcript is processed into shorter CRISPR-RNAs (crRNAs). CRISPR-Cas systems are classified as types I, II or III, which can be distinguished based on the presence of the signature Cas3, Cas9, or Cas10 genes, respectively. Type I systems are the most common, and much of our understanding of type I CRISPR-Cas systems comes from studies of E. coli Cascade (CRISPR-associated complex for antiviral defense), which is composed of the five proteins Cse1, Cse2, Cas7, Cas5e, and Cas6e. These proteins assemble on a 61-nt crRNA, yielding a 405-kDa complex. The crRNA contains the 32-nt spacer sequence, which directs Cascade to sequences (protospacers) in foreign DNA, leading to formation of an R-loop intermediate. Cascade then recruits Cas3, which has an N-terminal histidine-aspartate (HD) nuclease domain and C-terminal superfamily 2 (SF2) helicase domain, to degrade the DNA. Cascade must discriminate between spacer sequences found in the bacterial chromosome and those found in foreign DNA. This discrimination is thought to be accomplished through recognition of a trinucleotide sequence motif called the protospacer-adjacent motif (PAM; 5′-A[A/T]G-3′ for E. coli Cascade), which is adjacent to the protospacer in foreign DNA, but absent in the CRISPR locus. Strict sequence requirements present a potential weakness because mutations in either the PAM or protospacer can allow foreign DNA to escape CRISPR-Cas immunity. However, bacteria can rapidly restore immunity using a positive-feedback loop, called priming, to update the CRISPR locus. Priming requires Cascade with a crRNA bearing at least partial complementarity to the escape target, suggesting Cascade must be able to locate targets even when they bear mutations sufficient to escape immunity.
Priming also requires Cas3 and the Cas1-Cas2 complex, which integrate new sequences into the CRISPR locus. The PAM-dependent pathway is highly efficient and allows Cascade to recruit Cas3 for strand-specific degradation of the target genome. The PAM-independent pathway is less efficient, but Cascade can still bind tightly to the DNA, ensuring that it can initiate the sequence of molecular events that precede primed spacer acquisition. Through this pathway, Cas3 recruitment becomes strictly dependent on Cas1-Cas2, and Cas1-Cas2 also attenuate Cas3 nuclease activity and enable Cas3 to rapidly translocate in either direction along the foreign DNA. These results establish Cas1-Cas2 as a trans-acting factor necessary for the recruitment and regulation of Cas3 at escape targets. Based on our findings, we propose a mechanistic framework describing how Cascade, Cas1, Cas2, and Cas3 work together to process and disable foreign genetic elements. 21
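The self/non-self discrimination attributed to the PAM in the Sternberg quotation can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the PAM 5′-A[A/T]G-3′ is read on the same strand as the spacer sequence, in the 3 nt immediately 5′ of the protospacer; only exact matches count, so the mismatch-tolerant binding that underlies priming is ignored; the function name `cascade_would_target` and all sequences are hypothetical. In the "locus" example the spacer is flanked by the end and start of the E. coli type I-E repeat (assumed here to be GAGTTCCCCGCGCCAGCGGGGATAAACCG), so the 3 nt before it are CCG, which is not a PAM.

```python
import re

# 5'-A[A/T]G-3': the E. coli Cascade PAM given in the quotation.
PAM = re.compile(r"A[AT]G")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cascade_would_target(dna: str, spacer: str) -> bool:
    """Simplified model: True if `spacer` occurs exactly on either strand of
    `dna` with a PAM in the 3 nt immediately 5' of it on that strand.
    Mismatch tolerance (as used in priming) is deliberately ignored."""
    for strand in (dna, reverse_complement(dna)):
        start = strand.find(spacer)
        while start != -1:
            if start >= 3 and PAM.fullmatch(strand[start - 3:start]):
                return True
            start = strand.find(spacer, start + 1)
    return False


if __name__ == "__main__":
    spacer = "CCTTGACGTACGATCGATCGTAGCTAGCTAGG"  # hypothetical 32-nt spacer
    phage = "TTTAAG" + spacer + "TTT"              # protospacer with PAM
    locus = "ATAAACCG" + spacer + "GAGTTCCC"      # spacer flanked by repeat DNA
    escape = "TTTACG" + spacer + "TTT"             # PAM mutated: AAG -> ACG
    print(cascade_would_target(phage, spacer))   # True
    print(cascade_would_target(locus, spacer))   # False
    print(cascade_would_target(escape, spacer))  # False
```

The same spacer is therefore "seen" in the phage but ignored in the cell's own CRISPR locus, and a single PAM mutation is enough for escape in this exact-match model.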

M. P. Terns et al. (2015): CRISPR-Cas immune systems function to defend prokaryotes against potentially harmful mobile genetic elements including viruses and plasmids. The multiple CRISPR-Cas systems (Types I, II, and III) each target destruction of foreign nucleic acids via structurally and functionally diverse effector complexes (crRNPs). CRISPR-Cas effector complexes are comprised of CRISPR RNAs (crRNAs) that contain sequences homologous to the invading nucleic acids and Cas proteins specific to each immune system type. CRISPR-Cas systems confer prokaryotes with adaptive immunity against viruses, conjugative plasmids, and other potential genome invaders. A host CRISPR (clustered regularly interspaced short palindromic repeats) locus contains a leader region (typically 100–500 bp) followed by multiple copies of a repeat sequence (∼30–40 bp) separated by similarly sized, variable invader-derived sequences. Each crRNA contains a guide region comprised of invader-derived sequences that allow crRNA-Cas protein effector complexes to recognize and destroy invader nucleic acids. CRISPR-associated (Cas) proteins provide enzymatic machinery and structural components to carry out the distinct phases of the CRISPR-Cas pathway. Moreover, modules of Cas proteins (e.g., Csa, Cst, Cse, Csm, Cmr) comprise the distinct CRISPR-Cas immune systems: Type I (A-G), Type II (A-C), and Type III (A-B). 16

Dipali G Sashital (2019): Within this system, the CRISPR locus is programmed with ‘spacer’ sequences that are derived from foreign DNA and serve as a record of prior infection events. 14

CRISPR-Cas9 in bacteria acts as an adaptive immune response: it remembers when a virus has infected the cell in the past by keeping a little bit of viral DNA stored in a memory bank (the spacers). If the same species of virus infects the cell again, the system compares the injected DNA with the sequences in the data bank, recognizes it, responds quickly and effectively, and destroys it.

Devashish Rath (2015): The CRISPR-Cas mediated defense process can be divided into three stages.

1. Adaptation or spacer acquisition, where a short fragment of invading DNA is inserted into the CRISPR locus for future recognition of that invader;
2. crRNA biogenesis (expression), which involves the biogenesis of guide RNA units (crRNAs) that assemble with Cas proteins into effector complexes;
3. Target interference, where these effector complexes vigilantly scan for and degrade invading genetic material previously identified by, and integrated into, the CRISPR-Cas system.

The first stage, adaptation, leads to the insertion of new spacers in the CRISPR locus. In the second stage, expression, the system gets ready for action by expressing the cas genes and transcribing the CRISPR into a long precursor CRISPR RNA (pre-crRNA). The pre-crRNA is subsequently processed into mature crRNA by Cas proteins and accessory factors. In the third and last stage, interference, target nucleic acid is recognized and destroyed by the combined action of crRNA and Cas proteins.
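The processing step can be made concrete with the numbers quoted earlier for E. coli Cascade (a 61-nt crRNA carrying a 32-nt spacer). The Python sketch below additionally assumes the 29-nt E. coli type I-E repeat GAGTTCCCCGCGCCAGCGGGGATAAACCG and a single Cas6e-style cut after repeat nucleotide 21, which leaves an 8-nt 5′ handle and a 21-nt 3′ hairpin on each crRNA (8 + 32 + 21 = 61). The function names and spacer sequences are hypothetical, DNA letters stand in for RNA letters, and the additional trimming that happens in other CRISPR–Cas types is ignored.

```python
from typing import List

# Assumed E. coli type I-E values (illustrative): 29-nt repeat, and a
# Cas6e-style cut after repeat nucleotide 21. DNA letters are used in place
# of RNA letters (T for U) for simplicity.
REPEAT = "GAGTTCCCCGCGCCAGCGGGGATAAACCG"
CUT_AFTER = 21


def build_pre_crrna(leader: str, spacers: List[str], repeat: str = REPEAT) -> str:
    """Hypothetical helper: transcript of leader-R-S1-R-S2-...-R."""
    return leader + repeat + "".join(s + repeat for s in spacers)


def process_pre_crrna(pre_crrna: str, repeat: str = REPEAT,
                      cut_after: int = CUT_AFTER) -> List[str]:
    """Cut once inside every repeat copy; the pieces between consecutive
    cuts are the mature crRNAs (5' handle + spacer + 3' hairpin)."""
    cuts = []
    pos = pre_crrna.find(repeat)
    while pos != -1:
        cuts.append(pos + cut_after)
        pos = pre_crrna.find(repeat, pos + len(repeat))
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    spacers = ["A" * 32, "C" * 32]  # hypothetical 32-nt spacers
    crrnas = process_pre_crrna(build_pre_crrna("TTTT", spacers))
    for crrna in crrnas:
        # 8-nt handle (ATAAACCG) + 32-nt spacer + 21-nt hairpin = 61 nt
        print(len(crrna), crrna[:8])
```

Because each cut falls inside a repeat, every mature crRNA carries one complete spacer flanked by pieces of the two neighbouring repeats, as described in the quotations.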

A. Price et al. (2016): CRISPR-Cas systems operate as adaptive immune defenses to target and degrade nucleic acids derived from bacteriophages and other foreign genetic elements. 12

1. Dana K Howe: Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution 2008
2. Eugene V Koonin: Inevitability of Genetic Parasites 2016 Sep 26
3. Eugene V. Koonin: Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states 04 December 2017
4. Gregory P Fournier: Ancient horizontal gene transfer and the last common ancestors 22 April 2015
5. Aude Bernheim: The pan-immune system of bacteria: antiviral defence as a community resource 06 November 2019
6. Felix Broecker: Evolution of Immune Systems From Viruses and Transposable Elements 29 January 2019
7. Eugene V. Koonin: Evolution of adaptive immunity from transposable elements combined with innate immune systems December 2014
8. Eugene V. Koonin: The LUCA and its complex virome 14 July 2020
9. Luciano Marraffini: (Ph)ighting phages – how bacteria resist their parasites 2020 Feb 13
10. Simon J Labrie: Bacteriophage resistance mechanisms 2010 Mar 29
11. Anna Lopatina: Abortive Infection: Bacterial Suicide as an Antiviral Immune Strategy 2020 Sep 29
12. Aryn A. Price et al.: Harnessing the Prokaryotic Adaptive Immune System as a Eukaryotic Antiviral Defense 2016 Feb 3
13. Devashish Rath: The CRISPR-Cas immune system: Biology, mechanisms and applications October 2015
14. Dipali G Sashital: The Cas4-Cas1-Cas2 complex mediates precise prespacer processing during CRISPR adaptation Apr 25, 2019
15. Simon A. Jackson: CRISPR-Cas: Adapting to change 7 Apr 2017
16. M. P. Terns et al.: Three CRISPR-Cas immune effector complexes coexist in Pyrococcus furiosus 2015 Jun; 21
17. Carl Zimmer: Breakthrough DNA Editor Born of Bacteria February 6, 2015
18. Jordana Cepelewicz: Biodiversity Alters Strategies of Bacterial Evolution January 6, 2020
19. Tina Y. Liu: Chemistry of Class 1 CRISPR-Cas effectors: Binding, editing, and regulation 16 October 2020
20. Giedrius Gasiunas: Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria September 4, 2012
21. Samuel H. Sternberg et al.: Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System 2015 Nov 5
22. Eugene V Koonin: Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems 2011 Jul 14
23. Eugene V Koonin: Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants 19 December 2019
24. Pascale Cossart: The New Microbiology: From Microbiomes to CRISPR 2016
25. Luciano A. Marraffini: Molecular mechanisms of CRISPR–Cas spacer acquisition 31 August 2018
26. Udi Qimron: Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli 2012 Feb 8
27. John van der Oost et al.: Unravelling the structural and mechanistic basis of CRISPR–Cas systems 09 June 2014

https://www.the-scientist.com/news-opinion/prokaryotes-are-capable-of-learning-to-recognize-phages-70378

Last edited by Otangelo on Wed Nov 02, 2022 6:13 pm; edited 43 times in total

CRISPR adaptation

The adaptation phase provides the genetic memory that is a prerequisite for the subsequent expression and interference phases that neutralize the re-invading nucleic acids. Conceptually, the process can be divided into three steps:

1. Protospacer selection (Cas1-Cas2 substrate capture);
2. Generation of spacer material;
3. Integration of the spacer into the CRISPR array and synthesis of a new repeat.

A bacteriophage infects a bacterium by injecting its DNA into the cell like a syringe. The first thing that happens is that a pair of enzymes, Cas1 and Cas2, goes to work. They are two separate proteins, but they are joined in a complex and always act together in concert. They cut out a region of the virus's DNA called a protospacer and insert it into the bacterial data bank, a part of the chromosome called the CRISPR array.

In the CRISPR array, identical repeat sequences are separated by spacers, which are short sections of DNA taken from invading phages. The protospacer, after processing, becomes a new spacer in the CRISPR array. The prefix proto means ahead of or before, and that is exactly what happens: Cas1 and Cas2 are programmed to identify a section of phage DNA suitable to become a spacer and turn it into one. The spacer is inserted at the leader (5′) end of the CRISPR array, and then the machinery builds a new copy of the repeat. There is a repeat after each spacer: spacer, repeat, spacer, repeat. Every repeat is identical to all the other repeats, which is why it is called a repeat, and the spacers lie between them. The term CRISPR stands for clustered regularly interspaced short palindromic repeats (a palindrome is a word, number, phrase or other sequence of characters that reads the same backward as forward, such as madam or racecar). Clustered means that the whole array sits in one place on the chromosome; regularly interspaced means that the repeats are separated at regular intervals by the spacers.

A restriction endonuclease is an enzyme that cleaves DNA into fragments at or near a specific recognition site, and such recognition sites are usually palindromic. Palindromic sequences like the CRISPR repeats are likewise often sites where enzymes interact. When it comes to taking a protospacer and turning it into a spacer, the DNA is not cut randomly at any place. It is cut at a precise location, next to a short motif adjacent to the protospacer. In Streptococcus pyogenes this protospacer adjacent motif, or PAM, is 5′-NGG-3′: any nucleotide at all (adenine, thymine, cytosine or guanine, it does not matter) followed by two guanines. Two guanines in a row are quite common in any DNA, so suitable PAM sites are plentiful.

Cas1 and Cas2 scan the DNA looking for a PAM site (in type II systems such as that of S. pyogenes, PAM recognition during adaptation also depends on Cas9). When a PAM is found, the machinery moves upstream, that is, toward the 5′ end of the same strand, cuts out a section of about 30 bases, and turns it into a spacer. The spacer is inserted at the leader (5′) end of the CRISPR array and a new repeat is built next to it, so older spacers are pushed further and further away from the leader. The CRISPR array is flexible. A bacterium that has been infected by many different phages may have a very long CRISPR array with many spacers and repeats, while others might have only one or a few. In some bacteria, researchers have found hundreds of spacers; in other species there may be only a couple. 13
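The PAM-guided choice of protospacers described above can be sketched as a scan for 5′-NGG-3′ motifs. This Python sketch is a simplification under stated assumptions: it scans one strand only, takes the nucleotides immediately 5′ of each NGG as the candidate protospacer with an assumed default length of 30, and says nothing about which proteins perform the selection; the function name `candidate_protospacers` and the example sequence are hypothetical.

```python
import re
from typing import List, Tuple


def candidate_protospacers(dna: str, length: int = 30) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every 5'-NGG-3' on this strand
    that has at least `length` nucleotides on its 5' side.

    Simplified: one strand only; `length` is an assumed spacer length.
    """
    hits = []
    # A zero-width lookahead finds overlapping PAMs too (e.g. both in 'AGGG').
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start >= length:
            protospacer = dna[pam_start - length:pam_start]
            hits.append((pam_start - length, protospacer, match.group(1)))
    return hits


if __name__ == "__main__":
    dna = "ACGT" * 8 + "TGG" + "CATC"  # hypothetical 39-nt sequence
    for start, protospacer, pam in candidate_protospacers(dna):
        print(start, protospacer, pam)
```

The lookahead is used instead of a plain `re.finditer(r"[ACGT]GG", ...)` because a run such as GGG contains two overlapping NGG motifs, and each of them marks a different candidate protospacer.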

Dipali G Sashital (2019): The key proteins in collecting and storing the virus DNA are called Cas1, Cas2 and Cas4. Previous work suggests that Cas4 is important for cutting suitable lengths of DNA for storage. The adaptation proteins Cas1 and Cas2 are conserved among most CRISPR systems, suggesting a common molecular mechanism for acquiring spacers. Cas1 and Cas2 catalyze spacer integration via two transesterification reactions mediated by nucleophilic attack on each strand of a double-stranded prespacer substrate at the phosphodiester backbone within the CRISPR array. Integration occurs at the first repeat in the CRISPR array, with one attack occurring between the upstream leader sequence and the repeat and the other occurring on the opposite strand between the repeat and first spacer within the array. These reactions result in the insertion of the prespacer between two single-strand repeats, and this gapped intermediate is repaired by host factors. In order to form a functional spacer, the adaptation complex must capture and process longer fragments of DNA from the invader containing a flanking sequence called a protospacer adjacent motif (PAM). The PAM is an essential motif during target recognition by the surveillance complex and must be present next to the target in order for interference to occur. However, the PAM is not part of the spacer and must be removed from the prespacer prior to integration through a processing step. In addition, integration must occur in the correct orientation to produce a crRNA that is complementary to the PAM-containing strand of the invader. In some systems, additional Cas proteins, such as Cas4, are also required during adaptation. Cas4 is widespread in type I, II, and V systems. In in vivo studies, deletion of cas4 reduced the adaptation efficiency and resulted in the acquisition of non-functional spacers from regions that lacked a correct PAM. 
Some systems have two cas4 genes that work together to define the PAM, length and orientation of spacers, suggesting that the two Cas4 proteins are involved in processing each end of the prespacer and that they may be present during integration. Similarly, in vitro studies have suggested that Cas4 is involved in PAM-dependent prespacer processing. Cas4 endonucleolytically cleaves PAM-containing 3ʹ-single-stranded overhangs that flank double-stranded prespacers. Importantly, Cas4 cleavage activity is dependent on the presence of Cas1 and Cas2, and Cas4 inhibits premature integration of unprocessed prespacers. These observations suggest that Cas4 associates with the Cas1-Cas2 complex, although direct biochemical and structural evidence for this Cas4-Cas1-Cas2 complex remains elusive. 14

Simon A. Jackson (2017): The Cas1 and Cas2 proteins constitute the “workhorse” of spacer integration. Spacers added to CRISPR arrays must be compatible with the diverse range of type-specific effector complex machinery. Thus, despite being near ubiquitous among CRISPR-Cas types, Cas1-Cas2 homologs meet the varied requirements for the acquisition of appropriate spacer sequences in different systems. For example, the effector complexes of several CRISPR-Cas types only recognize targets containing a specific sequence adjacent to where the CRISPR RNA (crRNA) base-pairs with the target strand of a mobile genetic element (MGE). The crRNA-paired target sequence is termed the protospacer, and the adjacent target-recognition motif is called a protospacer-adjacent motif (PAM). PAM-based target discrimination prevents the unintentional recognition and self-destruction of the CRISPR locus by the crRNA-effector complex, yet canonical PAM sequences vary between and sometimes within systems. The Cas1 subunits form two dimers that are bridged by a central Cas2 dimer. In addition to Cas1-Cas2, at least one CRISPR repeat, part of the leader sequence, and several host factors for repair of the insertion sites (e.g., DNA polymerase) are required.

Cas1-Cas2 substrate capture
During substrate capture, Cas1-Cas2 is loaded with an integration-compatible prespacer, which is thought to be partially duplexed dsDNA. For type I systems, the presence of a canonical PAM within the prespacer substrate increases the affinity for Cas1-Cas2 binding but is not requisite. The 3′ single-stranded ends of the prespacer extend into active subunits of each corresponding Cas1 dimer. The length of new spacers is governed by the fixed distances between the two Cas1 wedges and from the branch points to the integrase sites. Many CRISPR-Cas systems have highly consistent yet system-specific spacer lengths, and it is likely that analogous wedge-based Cas1-Cas2 “molecular rulers” exist in these systems to control prespacer length. However, in some systems, such as type III, the length of spacers found within CRISPR arrays appears more variable, and studies of Cas1-Cas2 structure and function in these systems are lacking.

Naïve CRISPR adaptation
Acquisition of spacers from MGEs that are not already cataloged in host CRISPRs is termed naïve CRISPR adaptation. For naïve CRISPR adaptation, prespacer substrates are generated from foreign material and loaded onto Cas1-Cas2. The main known source of these precursors is the host RecBCD complex. Stalled replication forks that occur during DNA replication can result in double-strand breaks (DSBs), which are repaired through RecBCD-mediated unwinding and degradation of the dsDNA ends back to the nearest Chi sites. (In Escherichia coli, acquisition of new spacers largely depends on RecBCD-mediated processing of DSBs occurring primarily at replication forks, and the preference for foreign DNA is achieved through the higher density of Chi sites on the self chromosome combined with the higher number of replication forks on the foreign DNA. This explains the strong preference for acquiring spacers from both high-copy plasmids and phages.) During this repair process, RecBCD produces single-stranded DNA (ssDNA) fragments, which have been proposed to subsequently anneal to form partially duplexed prespacer substrates for Cas1-Cas2. The greater number of active origins of replication and the paucity of Chi sites on MGEs, compared with the host chromosome, bias naïve adaptation toward foreign DNA. Furthermore, RecBCD recognizes the unprotected dsDNA ends that are commonly present in phage genomes upon injection or before packaging, which theoretically provides an additional phage-specific source of naïve prespacer substrates. Despite the role of RecBCD in substrate generation, naïve CRISPR adaptation can occur in its absence, albeit with reduced bias toward foreign DNA. Thus, events other than DSBs might also stimulate naïve CRISPR adaptation, such as R-loops that occur during plasmid replication, lagging ends of incoming conjugative elements, and even CRISPR-Cas–mediated spacer integration events themselves.
Furthermore, we do not know whether all CRISPR-Cas systems have an intrinsic bias toward production of prespacers from foreign DNA. In high-throughput studies of native systems, the frequency of acquisition of spacers from host genomes is likely to be underestimated, because the autoimmunity resulting from self-targeting spacers means that these genotypes are typically lethal. For example, in the S. thermophilus type II-A system, spacer acquisition appears biased toward MGEs, yet nuclease-deficient Cas9 fails to discriminate between host and foreign DNA. It is unknown whether CRISPR adaptation in type II systems is reliant on DNA break repair. Further studies in a range of host systems are required to clarify how diverse CRISPR-Cas systems balance the requirement for naïve production of prespacers from MGEs against the risk of acquiring spacers from host DNA.
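The Chi-site argument in the naïve-adaptation discussion above can be illustrated with a toy calculation. This is a rough sketch under stated assumptions, not a model of RecBCD biochemistry: it uses the E. coli Chi octamer 5′-GCTGGTGG-3′, scans only one strand in one direction from a hypothetical break position, ignores Chi orientation and recognition efficiency, and cuts the degraded tract into fragments of an illustrative fixed size. The function names are hypothetical.

```python
CHI_ECOLI = "GCTGGTGG"  # E. coli Chi octamer, written 5'->3' on the scanned strand


def degradation_tract(seq: str, break_pos: int, chi: str = CHI_ECOLI) -> str:
    """Return the stretch degraded from a break up to the first downstream Chi site.

    Simplification: one strand, one direction, and every Chi site stops
    degradation. With no Chi site, the tract runs to the end of the sequence.
    """
    chi_pos = seq.find(chi, break_pos)
    end = chi_pos if chi_pos != -1 else len(seq)
    return seq[break_pos:end]


def prespacer_fragments(tract: str, size: int = 33) -> list[str]:
    """Cut a degraded tract into non-overlapping fragments of one illustrative size."""
    return [tract[i:i + size] for i in range(0, len(tract) - size + 1, size)]


# Toy inputs: a Chi site 40 nt from the break versus a Chi-free stretch.
self_like = "A" * 40 + CHI_ECOLI + "A" * 200
phage_like = "A" * 248

for label, seq in (("self-like", self_like), ("phage-like", phage_like)):
    tract = degradation_tract(seq, break_pos=0)
    print(label, "tract:", len(tract), "nt;", "fragments:", len(prespacer_fragments(tract)))
```

With these toy inputs the Chi-bounded tract is 40 nt long and yields a single fragment, while the Chi-free tract yields seven. This crudely mirrors why Chi-poor foreign DNA supplies more prespacer material.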

crRNA-directed CRISPR adaptation (priming)
Mutations in the target PAM or protospacer sequences can abrogate immunity, allowing MGEs to escape CRISPR-Cas defenses. Furthermore, the protection conferred by individual spacers varies: Often, several MGE-specific spacers are required to mount an effective defense and to prevent proliferation of escape mutants. Thus, to maintain effective immunity, CRISPR-Cas systems need to undergo CRISPR adaptation faster than MGEs can evade targeting. Indeed, type I systems have a mechanism known as primed CRISPR adaptation (or priming) to facilitate rapid spacer acquisition, even against highly divergent invaders. Priming uses MGE target recognition that is facilitated by preexisting spacers to trigger the acquisition of additional spacers from previously encountered elements. Thus, priming is advantageous when MGE replication within the host cell exceeds defense capabilities. This can occur when cells are infected by mobile genetic element escape mutants or when the levels of CRISPR-Cas activity are insufficient to provide complete immunity using only the existing spacers, even in the absence of MGE escape mutations. Priming begins with target recognition by crRNA-effector complexes. Therefore, factors that influence target recognition (i.e., the formation and stability of the crRNA-DNA hybrid), including PAM sensing and crRNA-target complementarity, affect the efficiency of primed CRISPR adaptation. Furthermore, these same factors can induce conformational rearrangements in the target-bound crRNA-effector complex that result in favoring either the interference or priming pathways. In type I-E systems, the Cas8e (Cse1) subunit of Cascade can adopt one of two conformational modes, which may promote either direct or Cas1-Cas2–stimulated recruitment of the effector Cas3 nuclease. Cas3, which is found in all type I systems, exhibits 3′ to 5′ helicase and endonuclease activity that nicks, unwinds, and degrades target DNA. 
In vitro activity of the type I-E Cas3 produces ssDNA fragments of ~30 to 100 nucleotides that are enriched for PAMs in their 3′ ends and that anneal to provide partially duplexed prespacer substrates. The spatial positioning of Cas1-Cas2 during primed substrate generation has not been clearly established, although Cas1-Cas2–facilitated recruitment of Cas3 would imply that the CRISPR adaptation machinery is localized close to the site of prespacer production. In type I-F systems, Cas3 is fused to the C terminus of Cas2 (Cas2-3), so these systems form Cas1–Cas2-3 complexes that couple the CRISPR adaptation machinery directly to the source of prespacer generation during priming. Despite different target recognition modes favoring distinct Cas3 recruitment routes, primed CRISPR adaptation can be provoked by both MGE escape mutants and non-escape (interference-proficient) targets. However, when the influence of MGE copy number within the cell is excluded, interference-proficient targets promote greater spacer acquisition than escape mutants. This forms a positive feedback loop, reinforcing immunity against recurrent threats even in the absence of escapees. If the copy number of the MGE within the host cell is factored in, escape mutants actually trigger more spacer acquisition. This is because interference rapidly clears targeted MGEs from the cell, whereas escape mutants that evade immediate clearance by existing CRISPR-Cas immunity persist for longer. Over time, the prolonged presence of the escape MGE, combined with the priming-centric CRISPR-Cas target recognition mode, results in higher net production of prespacer substrates and spacer integration.
Because priming is initiated by site-specific target recognition (i.e., targeting a priming protospacer), Cas1-Cas2–compatible prespacers are subsequently produced from MGEs with locational biases. However, priming is stimulated more strongly from the interference-proficient protospacer than from the original priming protospacer. 15

Cas protein–assisted production of spacers
DNA breaks induced by interference activity of class 2 CRISPR-Cas effector complexes could trigger host DNA repair mechanisms (e.g., RecBCD), thereby providing substrates for Cas1-Cas2. In agreement with a model for DNA break–stimulated enhancement of CRISPR adaptation, restriction enzyme activity can stimulate RecBCD-facilitated production of prespacer substrates. RecBCD activity may also partially account for the enhanced CRISPR adaptation observed during phage infection of a host possessing an innate restriction-modification defense system. Whether the enhanced CRISPR adaptation was RecBCD-dependent in this example is unknown. In a CRISPR-Cas–induced DNA break model, the production of prespacer substrates is preceded by a sequence-specific target recognition. Although direct evidence to support this concept is lacking, CRISPR adaptation in type II-A systems requires Cas1-Cas2, Cas9, a transactivating crRNA (tracrRNA; a cofactor for crRNA processing and interference in type II systems), and Csn2. The PAM-sensing domain of Cas9 enhances the acquisition of spacers with interference-proficient PAMs. However, Cas9 nuclease activity is dispensable, and existing spacers are not strictly necessary, suggesting that the PAM interactions of Cas9 could be sufficient to select appropriate new spacers. Some Cas9 variants can also function with non-CRISPR RNAs and tracrRNA. This raises the possibility that host or MGE-derived RNAs might direct promiscuous Cas9 activity, resulting in DNA breaks or replication fork stalling that could potentially result in prespacer generation.
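The effect of Cas9's PAM-sensing domain on spacer choice can be pictured as a filter over candidate protospacers. The sketch below is illustrative only: it assumes the S. pyogenes Cas9 PAM 5′-NGG-3′ lying immediately 3′ of the protospacer on the scanned strand, a hypothetical 30-nt protospacer length, a single-strand scan, and it models a mutated PAM-recognition domain simply as "no filter". The function name is hypothetical.

```python
import re


def candidate_protospacers(seq: str, length: int = 30, pam: str = "NGG",
                           pam_sensing: bool = True) -> list[tuple[int, str]]:
    """List (start, protospacer) windows that could be sampled as new spacers.

    pam_sensing=True: a window qualifies only if the bases immediately 3' of it
    match the PAM pattern, where 'N' means any base.
    pam_sensing=False: a crude stand-in for a mutated PAM-recognition domain;
    every window of the given length qualifies.
    """
    if not pam_sensing:
        return [(i, seq[i:i + length]) for i in range(len(seq) - length + 1)]
    pam_regex = re.compile(pam.replace("N", "[ACGT]"))
    hits = []
    for i in range(len(seq) - length - len(pam) + 1):
        flank = seq[i + length:i + length + len(pam)]
        if pam_regex.fullmatch(flank):
            hits.append((i, seq[i:i + length]))
    return hits


invader = "A" * 30 + "TGG" + "A" * 5   # toy sequence with one NGG PAM
print(len(candidate_protospacers(invader)))                     # PAM-restricted choice
print(len(candidate_protospacers(invader, pam_sensing=False)))  # unrestricted choice
```

On this toy sequence only one window is PAM-adjacent, whereas nine windows are available once the PAM filter is switched off. This corresponds to the shift from PAM-matched to random protospacers described for Cas9 PAM-domain mutants.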

Roles of accessory Cas proteins in CRISPR adaptation
Although Cas1 and Cas2 play a central role in CRISPR adaptation, type-specific variations in cas gene clusters occur. In many systems, Cas1-Cas2 is assisted by accessory Cas proteins, which are often mutually exclusive and type-specific. For example, in the S. thermophilus type II-A system, deletion of csn2 impaired the acquisition of spacers from invading phages. Direct interaction between Cas1 and Csn2 also suggests a role for Csn2 in conjunction with the spacer acquisition machinery. Csn2 multimers cooperatively bind to the free ends of linear dsDNA and can translocate by rotation-coupled movement. Given that substrate-loaded type II-A Cas1-Cas2 is capable of full-site spacer integration in vitro, Csn2 may be required for prespacer substrate production, selection, or processing. Potentially, Csn2 binding to the free ends of dsDNA provides a cue for nucleases to assist in prespacer generation. Cas4, another ring-forming accessory protein, is found in type I, II-B, and V systems. Confirming its role in CRISPR adaptation, Cas4 is necessary for type I-B priming in H. hispanica and interacts with a Cas1-Cas2 fusion protein in the Thermoproteus tenax type I-A system. Fusions between Cas4 and Cas1 are found in several systems, which indicates a functional association with the spacer acquisition machinery. Cas4 contains a RecB-like domain and four conserved cysteine residues, which are presumably involved in the coordination of an iron-sulfur cluster. However, Cas4 proteins appear to be functionally diverse, with some possessing uni- or bidirectional exonuclease activity, whereas others exhibit ssDNA endonuclease activity and unwinding activity on dsDNA. Because of its nuclease activity, Cas4 is hypothesized to be involved in prespacer generation. In type III systems, spacers complementary to RNA transcribed from MGEs are required for immunity.
Some bacterial type III systems contain fusions of Cas1 with reverse transcriptase domains (RTs) that provide a mechanism to integrate spacers from RNA substrates. The RT-Cas1 fusion from M. mediterranea can integrate RNA precursors into an array, which are subsequently reverse-transcribed to generate DNA spacers. However, integration of DNA-derived spacers also occurs, indicating that the RNA derived–spacer route is not exclusive. Hence, the combined integrase and reverse transcriptase activity of RT-Cas1–Cas2 enhances CRISPR adaptation against highly transcribed DNA MGEs and potentially against RNA-based invaders. Other host proteins may also be necessary for prespacer substrate production. For example, RecG is required for efficient primed CRISPR adaptation in type I-E and I-F systems, but its precise role remains speculative. Additionally, it is still enigmatic why some CRISPR-Cas systems require accessory proteins, whereas closely related types do not. For example, type II-C systems lack cas4 and csn2, which assist CRISPR adaptation in type II-A and II-B systems, respectively. These type-specific differences exemplify the diversity that has arisen.

The genesis of adaptive immunity in prokaryotes
Casposons are transposon-like elements typified by the presence of Cas1 homologs, or casposases, which catalyze site-specific DNA integration and result in the duplication of repeat sites, analogously to spacer acquisition. It is possible that ancestral innate defenses gained DNA integration functionality from casposases, thus seeding the genesis of prokaryotic adaptive immunity. The innate ancestor remains unidentified but is likely to be a nuclease-based system. Co-occurrence of casposon-derived terminal inverted repeats and casposases in the absence of full casposons might represent an intermediate of the signature CRISPR repeat-spacer-repeat structures. However, the evolutionary journey from the innate immunity–casposase hybrid to full adaptive immunity is unclear. Evolution of diverse CRISPR-Cas types would have required stringent coevolution of the Cas1-Cas2 spacer acquisition machinery, PAM and leader-repeat sequences, crRNA processing mechanisms, and effector complexes. In some systems, mechanisms to enhance the production of Cas1-Cas2–compatible prespacers from MGEs, such as priming, might have arisen because naïve CRISPR adaptation is an inefficient process with a high probability of acquiring spacers from host DNA. However, it was recently shown that promiscuous binding of crRNA-effector complexes to the host genome results in a basal level of lethal “self-priming” in a type I-F system. Host CRISPR and cas gene regulation mechanisms might have arisen to balance the likelihood of self-acquisition events against the requirement to adapt to new threats, for example when the risk of phage infection or horizontal gene transfer is high. Alternatively, it has been proposed that selective acquisition of self-targeting spacers could provide benefits, such as invoking altruistic cell death, facilitating rapid genome evolution, regulating host processes, or even preventing the uptake of other CRISPR-Cas systems. 15

Interference: Cleaving DNA and RNA Invaders
Sequence-specific destruction of invading MGEs is the basis for CRISPR-Cas defense. In the final stage of CRISPR-Cas-mediated immunity, mature crRNAs guide the interference machinery to cleave invading nucleic acids. To store the genetic information of a parasitic MGE, part of the foreign DNA must be integrated into the genomic CRISPR locus of the host. This, however, raises an inherent problem for the interference machinery: relying solely on sequence complementarity between the crRNA and the target sequence would result in cleavage of the CRISPR array itself. Hence, nearly all characterized CRISPR-Cas systems (except type III) have an authentication and discrimination mechanism that involves coordinated recognition of a short sequence, called the protospacer adjacent motif (PAM), by both the adaptation and interference machinery. The presence of a PAM next to the protospacer from which a spacer was acquired and next to the targeted protospacer, together with its absence in the CRISPR array, enables robust immunity while averting auto-immune targeting of the CRISPR array. 13
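The self/non-self logic described above reduces to a two-part check: the spacer must match the DNA, and a PAM must sit next to the match. The sketch below is a hypothetical illustration, not the recognition rules of any particular system: it searches one strand only, assumes the PAM lies immediately 5′ of the protospacer, and uses an arbitrary example PAM (AAG). Real PAM sequences, lengths and positions differ between CRISPR-Cas types, and all sequences here are made up.

```python
def is_interference_target(dna: str, spacer: str, pam: str = "AAG") -> bool:
    """Return True if `spacer` occurs in `dna` with `pam` immediately 5' of it.

    Sequence complementarity alone is not enough: the PAM must also be present.
    Only one strand is checked, and the default PAM is illustrative.
    """
    start = dna.find(spacer)
    while start != -1:
        if start >= len(pam) and dna[start - len(pam):start] == pam:
            return True
        start = dna.find(spacer, start + 1)
    return False


spacer = "TTGACCGTACGGATCA"             # hypothetical spacer sequence
repeat = "GTTCCCCGCG"                   # hypothetical repeat; its 3' end is not a PAM
invader = "CC" + "AAG" + spacer + "GG"  # protospacer with a PAM
escape = "CC" + "ACG" + spacer + "GG"   # same protospacer, PAM mutated
crispr_array = repeat + spacer + repeat # the host's own stored copy

for label, dna in (("invader", invader), ("PAM escape mutant", escape),
                   ("host CRISPR array", crispr_array)):
    print(label, is_interference_target(dna, spacer))
```

Only the invader is flagged: the escape mutant fails because its PAM is altered, and the host array fails because the spacer there is flanked by repeat sequence rather than a PAM.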

The adaptation phase provides the genetic memory that is a prerequisite for the subsequent expression and interference phases that neutralize re-invading nucleic acids. The insertion of new spacers has been experimentally demonstrated in several CRISPR-Cas subtypes: type I-A (Sulfolobus solfataricus and Sulfolobus islandicus), I-B (Haloarcula hispanica), I-E (E. coli), and I-F (Pseudomonas aeruginosa and Pectobacterium atrosepticum), and type II-A (S. thermophilus and a Streptococcus pyogenes system expressed in Staphylococcus aureus). There are two types of spacer acquisition: naïve, when the invader has not been encountered before, and primed, when the CRISPR array already holds a record of the invader. Although spacer acquisition is observed, the mechanism is only partly understood. Conceptually, the process can be divided into two steps: protospacer selection and generation of spacer material, followed by integration of the spacer into the CRISPR array and synthesis of a new repeat. Occasional deletion of spacers is required to limit the size of the CRISPR array, but little is known about the mechanism or frequency of such events. The key factors in spacer integration are Cas1 and Cas2. This function was suggested early, because the proteins are ubiquitous but dispensable for interference. It was later confirmed by overexpression of Cas1 and Cas2 from a type I-E system in E. coli, which resulted in spacer integration even in the absence of all other Cas proteins. Both Cas1 and Cas2 are nucleases, and mutations in the active site of Cas1 abolish spacer integration in E. coli. Cas1 and Cas2 from E. coli form a complex in which one Cas2 dimer binds two Cas1 dimers. Formation of the complex is required for spacer acquisition, but Cas2 nuclease activity is dispensable.

Adaptation: Memorizing Invading Nucleic Acids
Adaptation, also known as spacer acquisition, is the step in which memory of previous infections is formed and is the reason why CRISPR-Cas immunity is adaptive and heritable. The CRISPR array serves as a genetic memory bank, and spacer acquisition into the array is accomplished in several steps: the detection of an MGE, protospacer selection, protospacer processing, and spacer integration into the CRISPR array. The key players of spacer acquisition are Cas1 and Cas2, which are present in nearly all CRISPR-Cas systems. In the type I-E CRISPR-Cas system of E. coli, a stable complex composed of two Cas1 dimers bridged by one Cas2 dimer (abbreviated as Cas1-Cas2) acts as an integrase in which Cas1 is catalytic and Cas2 has a structural function. Cas1 and Cas2 are the only Cas proteins required for naive spacer acquisition in the type I-E system (Figure A).

Spacer Acquisition in Type I Systems
(A) During naive spacer acquisition in type I-E systems, the Cas1-Cas2 complex is sufficient for the recognition of a canonical PAM. After initial fragmentation of the invading DNA by RecBCD (not shown), suitable protospacers are integrated at the leader-proximal end of the CRISPR array (inset). The CRISPR-unrelated integration host factor (IHF) is essential for this process, as it binds a specific sequence of the leader, yielding a sharply bent DNA structure. DNA bending allows the Cas1-Cas2 complex to recognize and bind the leader-proximal repeat. The 3′-OH ends of the protospacer perform nucleophilic attacks on the leader side and spacer side of the repeat backbone. During the first integration reaction, the leader-repeat boundary is nicked and ligated to one strand of the protospacer. During the second integration reaction, the other protospacer strand is ligated to the opposite end of the repeat, leading to the duplication of the first repeat. DNA polymerase and ligase subsequently fill the single-strand gaps.
(B) Primed spacer acquisition requires an existing spacer matching the target. Mutations in the seed sequence or the PAM, however, abolish interference. In some cases, crRNA guides Cascade to bind the imperfect target sequence, but the complex fails to recruit Cas3 for DNA degradation. Here, Cas1-Cas2 recruits Cas3, and the complex translocates bidirectionally (dashed arrows) away from the target site without degrading the DNA. Cas1-Cas2 selects a suitable protospacer with a canonical PAM for spacer integration. (C) Interference-driven spacer acquisition also requires the presence of an existing spacer against the invader, which results in target cleavage by the interference machinery. Following the degradation of target DNA by Cas3, Cas1-Cas2 captures DNA fragments and subsequently integrates them into the CRISPR array.

However, additional Cas proteins are required in other systems. In the type II-A CRISPR-Cas system of Streptococcus pyogenes and S. thermophilus, all Cas proteins (Cas9, Cas1, Cas2, and Csn2) and tracrRNA are essential for spacer integration. The adaptation mechanisms of type I and type II are the most thoroughly characterized and provide a model for our current understanding of spacer acquisition.

The Origin of Protospacer
Spacer acquisition begins with the detection of foreign genetic elements that are subsequently processed and integrated into the CRISPR array. In order to avoid auto-immunity, it is important that the adaptation machinery display a preference for foreign versus self DNA and/or that the activity of the adaptation machinery is enhanced by signals of (imminent) infection (see The Ecology and Regulation of CRISPR-Cas). A study in E. coli revealed that the degraded DNA fragments generated during the repair of double-stranded DNA (dsDNA) breaks (DSBs) are an important source of protospacers. The RecBCD repair complex is recruited to DSB sites, which are often found at replication forks. RecBCD unwinds and degrades the DNA until it reaches a crossover hotspot instigator (Chi) site. Sequences proximal to the Chi sites, as well as sites of replication fork stalling, were shown to be protospacer sampling hotspots, suggesting that the RecBCD degradation fragments are captured by the adaptation machinery. The underrepresentation of Chi sites on foreign DNA compared with the genomic DNA of E. coli allows RecBCD to degrade larger portions of the foreign genome and serves as a basis for preferential acquisition of non-self DNA. A similar mechanism was recently described in the type II-A system of S. pyogenes, where regions between exposed DNA ends and Chi sites were highly favored for spacer sampling. In phage DNA, sequences between the injected linear DNA ends and the closest Chi site are spacer-sampling hotspots. It has been demonstrated that the AddAB machinery (the Gram-positive functional counterpart of RecBCD) is necessary for efficient spacer acquisition, which suggests a self- versus non-self-discrimination strategy similar to that observed in E. coli. The reliance on other host proteins suggests that the adaptation machinery lacks an intrinsic ability to distinguish between self- and non-self-DNA.
Indeed, overexpression of catalytically inactive Cas9, which abolishes interference and thus prevents auto-immunity, resulted in a surplus of genome-derived spacers over plasmid-derived spacers in the type II-A system of S. thermophilus. Considering that spacer integration is a rare event, a low acquisition rate might be a strategy to compensate for inefficient self- versus non-self-discrimination in order to reduce the chance of auto-immunity and/or allow beneficial horizontal gene transfer.

Protospacer Selection and Processing
In addition to preferential fragmentation of foreign DNA by the RecBCD/AddAB machinery, selection of specific protospacers by the adaptation machinery is often non-random. In type I and type II systems, the adaptation machinery selects protospacers with a PAM that is compatible with the interference machinery. Studies in E. coli showed that the Cas1-Cas2 complex is sufficient for PAM recognition; the Cas1 subunits preferentially bind the PAM-complementary sequence. Moreover, type I-E Cas1-Cas2 prefers protospacers with 3′ single-stranded overhangs of at least 7 nt at both ends, showing that both the PAM and the substrate structure affect protospacer selection. These dual-forked DNA substrates are likely derived from the partial re-annealing of the ssDNA fragments generated by RecBCD, or by the interference machinery during interference-driven adaptation. Two Cas1 tyrosine wedges splay the dual-forked DNA and stabilize the 23-bp duplex. This positions the 3′ overhangs near the active sites of the Cas1 dimers. Cas1 cleaves the 3′ overhangs to generate a 33-nt product with a 3′-OH on each overhang. Two nucleotides of the PAM-complementary sequence are removed in this process, thus preventing acquisition of spacers that would result in cleavage of the CRISPR array. The structure of the Cas1-Cas2 complex seems to serve as a molecular ruler that determines the protospacer size and thus prepares the protospacer for integration into the CRISPR array. Unlike type I-E, Cas1 and Cas2 alone are not sufficient for naive spacer acquisition in type II-A systems. Here, Cas9, Csn2 and tracrRNA are additional requirements. Cas9 selects protospacers that are adjacent to a PAM, while random protospacers are selected when the PAM recognition domain of Cas9 is mutated. Cas9 catalytic activity is dispensable for protospacer acquisition, indicating that Cas9 is not involved in protospacer processing.
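The "molecular ruler" arithmetic can be sketched as follows. The 23-bp duplex, the 7-nt minimum overhang and the 33-nt product come from the passage above. The 5-nt length of each trimmed 3′ overhang is an assumption, chosen because the duplex plus two such overhangs gives 23 + 5 + 5 = 33. The 7-nt preference is treated here as a hard cut-off, which is a simplification, and the function names are hypothetical.

```python
from typing import Optional

DUPLEX_BP = 23        # duplex held between the two Cas1 tyrosine wedges (from the text)
MIN_OVERHANG_NT = 7   # preferred minimum 3' overhang (from the text; used as a hard cut-off)
KEPT_OVERHANG_NT = 5  # ASSUMPTION: overhang length left after Cas1 cleavage


def trim_strand(strand: str, duplex_bp: int = DUPLEX_BP,
                keep_nt: int = KEPT_OVERHANG_NT,
                min_overhang: int = MIN_OVERHANG_NT) -> Optional[str]:
    """Trim one strand (5'->3': paired region, then 3' overhang) at the ruler position.

    Returns None if the overhang is shorter than min_overhang.
    """
    overhang = len(strand) - duplex_bp
    if overhang < min_overhang:
        return None
    return strand[:duplex_bp + keep_nt]


def spacer_length(duplex_bp: int = DUPLEX_BP, keep_nt: int = KEPT_OVERHANG_NT) -> int:
    """Spacer length once the gaps opposite both trimmed overhangs are filled in."""
    return keep_nt + duplex_bp + keep_nt


long_strand = "A" * 23 + "C" * 10   # 10-nt overhang: accepted and trimmed
short_strand = "A" * 23 + "C" * 4   # 4-nt overhang: rejected
print(len(trim_strand(long_strand)), trim_strand(short_strand), spacer_length())
```

The point of the sketch is that the final length is set by fixed geometry (duplex plus two trimmed overhangs), not by the length of the incoming fragment, as long as the fragment is long enough to be cut.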

Spacer Integration
The CRISPR array is preceded by an AT-rich leader sequence. Spacer integration preferentially occurs at the leader end of the CRISPR array and thus keeps a chronological record of previous infections. The mechanism of protospacer integration has been studied in detail in the type I-E system of E. coli. In vitro studies showed that the mechanism by which Cas1-Cas2 integrates new spacers is similar to that of viral integrases and transposases. First, the 3′-OH of the protospacer performs a nucleophilic attack at the target site and thus attaches to the 5′ phosphate of the leader-proximal repeat. This process depends on the recognition of the leader-repeat boundary, which is specified through binding of the leader sequence by a CRISPR-independent protein called integration host factor (IHF). IHF sharply bends the DNA, which results in a U-shaped leader structure and favors recognition of the leader-repeat boundary by Cas1-Cas2 (Figure A; inset). In the second step, the 3′-OH of the other protospacer strand is ligated to the opposite end of the first repeat. Two inverted repeat motifs in the CRISPR repeat are important during this step: they serve as anchors for the Cas1-Cas2 complex and determine the position of the second integration site (Goren et al., 2016). Upon complex binding, the repeat becomes distorted, which is crucial for making the second integration site accessible to Cas1. The incorporation of the new spacer in the correct orientation is ensured by the presence of the partial PAM on the protospacer. Though some PAM nucleotides are removed prior to integration, this likely occurs after binding of the acquisition complex to the leader-repeat junction, so directionality is preserved (Figure A; inset). Unlike type I-E, recognition of the leader-repeat end in type II-A is IHF-independent and requires a short motif termed the leader-anchoring site (LAS), which consists of 5 bp of the repeat-proximal leader end and is directly recognized by Cas1-Cas2.
Interestingly, mutations in the LAS can lead to ectopic spacer integration within the CRISPR array. Although spacer acquisition is less effective in this case, the recognition of alternative anchoring sites gives the system flexibility to overcome alterations of the canonical LAS by integrating a new spacer at an alternative anchoring site. However, spacers located within the CRISPR array provide less resistance against phages than leader-proximal spacers, likely due to the lower abundance of distally encoded crRNAs. After recognition of the LAS, the type II-A Cas1-Cas2 complex can conduct the first integration reaction at either end of the first repeat, although integration at the leader boundary is usually preferred. Structural data supporting this model were recently presented for the type II-A integration complex of Enterococcus faecalis. Here, terminal sequences on both sides of the repeat were shown to be sufficient but suboptimal for target recognition. Additional interactions of Cas1 with the first four repeat-proximal nucleotides of the leader, however, allow a more efficient interaction with the target, thus explaining the preference for the first integration event at the leader side of the first repeat. The first reaction generates a half-site integration intermediate in which only one strand of the protospacer is ligated to one end of the repeat. The second integration event depends on proper protospacer size, recognition of the opposite repeat end, and bending of the repeat by the Cas1-Cas2 complex. If these requirements are not fulfilled, full-site integration cannot occur, and the acquisition complex presumably reverses the first integration reaction, or the half-site integration intermediate is removed by DNA repair proteins.
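The bookkeeping consequence of leader-proximal integration (the new spacer goes next to the leader and the first repeat is duplicated) can be shown with a toy string model. It deliberately ignores strand chemistry, half-site intermediates and ectopic integration. The leader, repeat and spacer strings are placeholders, and the function names are hypothetical.

```python
def assemble_array(leader: str, repeat: str, spacers: list[str]) -> str:
    """Build leader-R-S1-R-S2-...-R from spacers listed newest first."""
    parts = [leader, repeat]
    for spacer in spacers:
        parts.extend([spacer, repeat])
    return "".join(parts)


def integrate_spacer(spacers: list[str], new_spacer: str) -> list[str]:
    """Add a spacer at the leader end; the assembled array gains one repeat copy."""
    return [new_spacer] + spacers


leader, repeat = "<leader>", "[R]"   # placeholder strings, not real sequences
spacers = ["s1"]                     # existing memory of an earlier invader
before = assemble_array(leader, repeat, spacers)
after = assemble_array(leader, repeat, integrate_spacer(spacers, "s2"))
print(before)   # leader, repeat, s1, repeat
print(after)    # the first repeat now appears again after the new spacer s2
print(before.count(repeat), "->", after.count(repeat), "repeat copies")
```

Reading the spacer list from left to right gives the order of acquisition from newest to oldest, which is the chronological record mentioned in the text.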

Cas Mosterd (2020): In recent years, steady progress has been made in our understanding of CRISPR-Cas systems, with the interference phase attracting the most attention. Many molecular details remain poorly understood in the spacer acquisition process. During this phase, a stretch of DNA (prespacer) is first captured by Cas proteins. 2

Martin Wilkinson (2019): A necessary stage prior to Cas1-Cas2-directed spacer integration is the acquisition (capture) of specific-length prespacer DNA substrates from the viral DNA adjacent to the PAM sites. In type I CRISPR systems, such as in E. coli, a complex of Cas1-Cas2 alone is sufficient to integrate new spacers into the CRISPR array. By contrast, in type II-A CRISPR systems, Cas1 proteins are unable to recognize the PAM sequence and so depend on the PAM-recognition function of Cas9 to generate new spacers. Spacer acquisition also requires the tetrameric dsDNA-binding protein Csn2. Once a correctly processed protospacer is generated, the Cas1-Cas2 subcomplex alone is sufficient to carry out the integration stage into the host CRISPR array. Cas1, Cas2, Csn2, and Cas9 are all required for spacer acquisition in type II-A CRISPR-Cas systems. The Cas1₈–Cas2₄–Csn2₈ complex structure (subscripts give the number of monomers) shows how these three proteins interact to form a large multi-subunit complex. Furthermore, the presence of double-stranded DNA running along the central channel of the complex suggests a protective role for the complex. The length of occluded DNA is approximately 30 bp, very similar to the length of the prespacer fragments produced as substrates for integration by the Cas1-Cas2 complex. Bacteria contain RecBCD and/or AddAB nuclease-helicase complexes whose role, in addition to DNA repair, is digestion of invading bacteriophage DNA. These enzyme complexes are highly effective DNA-degrading machines. 1

Cas Mosterd (2020): The Cas proteins and the bound prespacer are directed to the CRISPR locus, a repeat sequence is duplicated, and the prespacer is integrated between the two repeats as a novel spacer. The cas1 and cas2 genes are essential and specific to the adaptation phase, and both are relatively well conserved in almost all CRISPR-Cas systems. Other universal elements of the acquisition process are the leader sequence and the first repeat of the CRISPR locus. Cas1 is a metal-dependent endonuclease that cleaves single-stranded DNA, double-stranded DNA (dsDNA), and single-stranded RNA (ssRNA). Like Cas1, Cas2 is also a metal-dependent endonuclease. The Cas2 protein was first described as having activity against ssRNA. Cas1 and Cas2 form a complex consisting of a Cas2 dimer flanked by two Cas1 dimers (Cas1₄–Cas2₂). Within the complex, only the nuclease activity of Cas1 is essential for adaptation. Cas1 has a strong binding affinity for the CRISPR locus, as it is involved in the integration of new spacers. In the absence of Cas2, Cas1 is incapable of binding to the CRISPR locus, making both proteins essential during adaptation. Cas2 functions as an adaptor protein, bridging the Cas1 proteins as well as binding and stabilizing the prespacer DNA. In addition, a CRISPR repeat and a leader sequence are required for the integration of a novel spacer. The AT-rich leader sequence is typically located upstream of the CRISPR locus and often contains the promoter that directs transcription of the CRISPR locus into pre-crRNA. Repressor proteins silence transcription in certain species, whereas in other species, expression of the CRISPR operon can be constitutive and upregulated during phage infection. In addition to its function in transcription, the leader sequence has another role within the integration process, as the integration of novel spacers usually occurs at the leader end of the CRISPR locus.
Indeed, the leader sequence and the flanking repeat sequence harbour recognition signals with the ability to direct the Cas1–Cas2 complex to this position. These recognition sites are located within 10 bp of the integration site in both the leader and repeat sequences. The way in which sequences are integrated into the CRISPR locus would, in principle, also allow the acquisition of spacers that target the bacterial chromosome (self-DNA) instead of foreign DNA. However, there appears to be a bias towards acquiring new spacers from foreign DNA compared with self-DNA. The number of PAM sequences present on bacterial self-DNA is similar to that of foreign DNA, and therefore, the reason cannot be related to PAM prevalence. It has been shown that degradation products of the bacterial RecBCD complex, which processes double-stranded DNA breaks (DSBs), are one of the sources of protospacers for the Cas1–Cas2 complex, although this does not apply to all CRISPR-Cas system types. The large majority of these DSBs occur at replication forks during DNA replication, and more are found on plasmids than on the bacterial chromosome. The RecBCD complex unwinds and degrades the DNA, starting at the DSB, until it reaches a crossover hotspot instigator (Chi) site. Compared with plasmid and phage DNA, bacterial genomic DNA is rich in Chi sites and is therefore indirectly protected from spacer acquisition. Because phage DNA enters the cell as linear DNA, contains few Chi sites, and is highly replicative, it is presumably an easier target than chromosomal DNA for the RecBCD complex. This may explain why the CRISPR-Cas system preferentially acquires spacers from foreign DNA sources rather than its own. In type I systems, the degradation products from the activity of Cas3 during primed adaptation (see below) have also been demonstrated to function as prespacers. 
While the RecBCD complex is found only in Gram-negative bacteria, Gram-positive bacteria possess a highly similar mechanism in the form of the AddAB repair machinery. As with the RecBCD complex, the function of the AddAB machinery has been demonstrated to have a large impact on spacer acquisition. Depending on the type of CRISPR-Cas system, additional factors are required to process the prespacers into spacers ready for integration. In type I-E systems, Cas2 is fused to a DnaQ domain, which degrades prespacers at the 3′ end to generate suitable spacers for integration. A similar role has been proposed for Cas4, present in types I, II, and V systems. Cas4 cuts the 3′ overhangs of prespacer DNA until a bound Cas1₄–Cas2₂ complex is encountered, generating spacers of specific length and a correct PAM. As such, Cas4 prevents the integration of nonfunctional spacers with an inappropriate length or PAM. The function of Cas4 and DnaQ can also be performed by non-CRISPR nucleases in other systems. 2

Luciano A. Marraffini (2018): Protospacer capture: Identification of foreign nucleic acids. CRISPR systems use various mechanisms to bias spacer acquisition to foreign genetic elements. For the generation of spacer substrates, CRISPR systems use the DNA repair machinery of the host: RecBCD in Gram-negative organisms and its homologue, AddAB, in Gram-positive organisms. RecBCD stimulates spacer acquisition from double-strand breaks. This activity is limited by chi sites, which are eight-nucleotide sequence motifs. Because chi sites are enriched in the host chromosome relative to genomes of phages or plasmids (for example, in the Escherichia coli genome, chi sites are found at rates 14 times higher than expected), this is a mechanism to constrain spacer acquisition from the host genome and differentiate self from non-self nucleic acids. The free dsDNA end that is presented to the cell during infection by dsDNA phages is exploited by the CRISPR system to preferentially acquire spacers from the phage DNA (Fig. 2a), as the bacterial chromosome is circular and lacks free DNA ends (with the exception of accidental dsDNA breaks, most common at the terminus).

Fig. 2 | Protospacer selection and capture.
a | RecBCD in Gram-negative organisms (or AddAB in Gram-positive organisms) generates substrates for spacer acquisition following the injection of viral DNA, possibly by producing more invader DNA molecules that contain free ends.
b | Two mechanisms for selection of functional targets. In the type I-E system, Cas1–Cas2 has inherent substrate preference for protospacers with a canonical protospacer-adjacent motif (PAM). In type II, the PAM-interacting domain of Cas9 (loaded with trans-activating CRISPR RNA (tracrRNA), not shown) guides the Cas1–Cas2 complex (as well as the accessory protein Csn2) in selecting protospacers.
c | The CRISPR RNA (crRNA)-guided CRISPR-associated complex for antiviral defence (Cascade) binds to a foreign target in a PAM-dependent manner, and it subsequently recruits the nuclease Cas3, which results in the generation of suitable substrates for spacer acquisition.
d | Imperfect target recognition by Cascade results in an altered conformation of the Cse1 subunit. This leads to the recruitment of a nuclease-inactive Cas3 in a Cas1–Cas2-dependent manner, which mediates primed spacer acquisition.

This also biases the pool of acquired spacers to the end of the phage genome that is being injected. This results in acquisition of spacers that facilitate the immediate recognition and cleavage of invading DNA at the very beginning of the infection and results in more effective immunity. Although RecBCD is important for efficient spacer acquisition, its degradation products are reported to be ssDNA fragments. Given that the in vitro spacer integration studies showed that dsDNA protospacer substrates are markedly favored over ssDNA ones, it remains unresolved if and how RecBCD degradation products could be used for spacer integration. Alternatively, it is possible that the Cas1–Cas2 machinery physically associates with RecBCD to either directly take up degradation products from RecBCD or to sample intact dsDNA upstream of RecBCD. Moreover, given that spacer acquisition can occur in the absence of RecBCD and AddAB, there may be alternative pathways for spacer generation, which will be an interesting area of future study. There is also evidence that CRISPR systems avoid deleterious levels of autoimmunity by limiting the rate of spacer acquisition. In laboratory settings, successful acquisition of new spacers against phages is an extremely rare event, estimated to occur in only 1 in 10⁷ cells. Spacer acquisition from the host genome is equally rare and does not pose substantial fitness costs to the host. However, increased rates of spacer acquisition in mutants have been shown to lead to higher levels of toxicity, which suggests that the rate of spacer acquisition has been tuned to balance the benefits of protection with autoimmunity. To mitigate growth rate costs associated with autoimmunity, it is also possible for spacer acquisition to be temporally regulated. Indeed, quorum sensing has been implicated as a regulator of CRISPR activity in at least two bacterial species. 25
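
To get a feel for the acquisition rate quoted above (about 1 in 10⁷ cells), here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every cell acquires a spacer independently with that fixed probability; the culture sizes are hypothetical.

```python
def expected_adapted_cells(population: int, rate: float = 1e-7) -> float:
    """Expected number of cells that acquire a new spacer, assuming each cell
    does so independently with probability `rate` (illustrative assumption)."""
    return population * rate


def prob_at_least_one(population: int, rate: float = 1e-7) -> float:
    """Probability that at least one cell in the population acquires a spacer,
    under the same independence assumption."""
    return 1.0 - (1.0 - rate) ** population


for n in (10**6, 10**7, 10**9):  # hypothetical culture sizes
    print(n, round(expected_adapted_cells(n), 2), round(prob_at_least_one(n), 3))
```

Under these assumptions, a culture of 10⁹ cells would be expected to contain on the order of 100 newly adapted cells.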

Gil Amitai (2016): Composition of the adaptation machinery: CRISPR–Cas adaptation is a complex, multistage process in which a protospacer needs to be extracted from an invading foreign DNA and subsequently stored within the CRISPR array as a spacer. First, the foreign DNA needs to be recognized as a target for spacer acquisition. Second, a sequence of a specific size (typically 30–40 bp, depending on the subtype of CRISPR–Cas system) needs to be acquired from the foreign DNA. Finally, the acquired sequence must be integrated as a new spacer into the CRISPR array, and the adjacent repeat sequence needs to be duplicated.

The three stages of CRISPR–Cas immunity.
a | Organization of a typical CRISPR–cas locus in a bacterial or archaeal genome. The numbers, order, and identities of the cas genes are variable between CRISPR–Cas subtypes, and the number of spacer–repeat units varies between species.
b | In the adaptation stage, the Cas1–Cas2 complex, which comprises two Cas1 dimers and a single Cas2 dimer, acquires a protospacer from the invader DNA and integrates it as a new spacer into the CRISPR array. Integration is coupled with a duplication of the first repeat.
c | In the expression and maturation stage, the CRISPR array is transcribed and then processed into mature CRISPR RNAs (crRNAs), each containing a transcribed spacer and part of the repeat sequence. These crRNAs form ribonucleoprotein (RNP) complexes with Cas proteins. The Cas proteins in these complexes vary between subtypes and include CRISPR-associated complex for antiviral defence (Cascade) proteins (type I CRISPR–Cas systems), Cas9 (type II systems), Csm proteins (type III‑A systems) and Cmr proteins (type III‑B systems).
d | In the interference stage, the crRNA–Cas RNP complex identifies the target DNA through complementary base-pairing in the presence of a protospacer adjacent motif (PAM; in type I, type II and type V systems), and the target sequence is then degraded by nuclease proteins or domains. Both the position of the PAM and the identity of the nuclease that degrades the target vary between CRISPR–Cas subtypes.

Although the components and prerequisites of the spacer acquisition machinery vary between organisms and subtypes of CRISPR–Cas system, several components seem to be universally conserved and are essential among all CRISPR–Cas subtypes. These components are the Cas proteins Cas1 and Cas2 and, within the CRISPR array locus, the leader sequence and the first CRISPR repeat. Cas1 and Cas2 are essential for spacer acquisition in all studied CRISPR–Cas systems, but do not seem to have any role in the expression and maturation stage or the interference stage. Cas1 is an endonuclease, its endonuclease activity is essential for spacer acquisition, and it was shown to be involved in all stages of spacer acquisition. Cas1 and Cas2 are usually encoded in the same operon and form a structurally stable protein complex. Cas2 has various DNA and RNA cleavage activities, but these activities are not essential for spacer acquisition; therefore, the primary role of Cas2 in spacer acquisition is currently thought not to involve its catalytic activity.

The Cas1–Cas2 complex typically inserts new spacers into the junction between the leader sequence and the first repeat of the CRISPR array. The leader sequence is a long AT-rich sequence positioned immediately upstream of the CRISPR array, and it usually contains both the promoter that drives crRNA expression and the recognition sequence for spacer insertion. The junction between the leader sequence and the first CRISPR repeat is the preferred site of new spacer integration, and the minimal sequence required for integration spans only a short segment at the 3′ end of the leader sequence and a single repeat unit. Owing to the preference for integration at this junction, spacers are inserted into the CRISPR array with a polarity towards the leader sequence end of the array, generating a chronologically ordered array in which the most recently acquired spacer is the spacer most proximal to the leader sequence. The sequences from which the spacers are derived are called protospacers (denoting the sequence segments residing in the foreign DNA molecule prior to integration into the CRISPR array). For type I, type II and type V CRISPR–Cas systems, a protospacer adjacent motif (PAM) is present upstream or downstream of the protospacer in the foreign DNA. The PAM is a short (2–5 nucleotide) sequence that is essential for cleavage of the target DNA during the interference stage. During spacer acquisition, spacers are preferentially selected from protospacers that have a cognate PAM for the CRISPR–Cas system in question. Although the Cas1–Cas2 complex was shown to be sufficient to mediate PAM-dependent spacer acquisition in type I CRISPR–Cas systems, PAM recognition in type II systems additionally requires Cas9.
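
The leader-end insertion rule described above is what produces a chronologically ordered array. The toy model below represents the array as a list of labels ("LEADER", "R" for a repeat, and hypothetical spacer names) and applies only that rule: insert the new spacer at the leader/first-repeat junction and duplicate the first repeat. It ignores sequences, strand chemistry and the enzymology of Cas1–Cas2.

```python
from typing import List

LEADER, REPEAT = "LEADER", "R"  # placeholder labels, not real sequences


def integrate_spacer(array: List[str], new_spacer: str) -> List[str]:
    """Return a new array with `new_spacer` inserted at the leader/first-repeat
    junction and the first repeat duplicated (toy model)."""
    if len(array) < 2 or array[0] != LEADER or array[1] != REPEAT:
        raise ValueError("array must start with the leader followed by a repeat")
    # Leader, duplicated repeat, new spacer, then the original array after the leader.
    return [LEADER, REPEAT, new_spacer] + array[1:]


def spacers(array: List[str]) -> List[str]:
    """Spacers in leader-to-trailer order."""
    return [x for x in array if x not in (LEADER, REPEAT)]


array = [LEADER, REPEAT]          # an array holding a single repeat
for s in ("S1", "S2", "S3"):      # spacers acquired in this order
    array = integrate_spacer(array, s)

print(array)            # ['LEADER', 'R', 'S3', 'R', 'S2', 'R', 'S1', 'R']
print(spacers(array))   # ['S3', 'S2', 'S1']: newest spacer is leader-proximal
```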

The source material for new spacers
At the initial stage of spacer acquisition, the foreign DNA needs to be recognized and processed to derive the substrate for spacer integration by the Cas1–Cas2 complex. A recent genome-wide study of protospacer hotspots in E. coli suggested that the substrates for integration are degraded DNA intermediates that are formed during the repair of double-strand breaks (DSBs). When a DSB occurs in an E. coli cell, the RecBCD exonuclease complex recognizes the exposed double-stranded DNA (dsDNA) end and then rapidly unwinds and degrades the DNA until it reaches an 8-bp sequence motif (5′-GCTGGTGG-3′) called a Chi site. Using deep-sequencing analysis of millions of spacer acquisition events in Cas1–Cas2-expressing E. coli, it was found that protospacer hotspots are located between replication fork stalling sites (which are major sources of DSBs) and the nearest Chi site. This suggested that the Cas1–Cas2 complex acquires new spacers from the debris emerging from RecBCD-mediated DNA degradation (FIG. 2a).
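
The proposed logic (DNA is processed from a free end only until the first Chi site) can be illustrated with a toy scan. This is a simplification: the sequences are hypothetical, the scan runs over a single string from its left end, and it ignores the strand polarity and orientation with which RecBCD actually recognizes Chi; only the octamer itself is taken from the text above.

```python
CHI = "GCTGGTGG"  # E. coli Chi octamer, as quoted above


def processed_before_chi(seq: str, chi: str = CHI) -> int:
    """Number of nucleotides scanned from the left (free) end of `seq` before
    the first Chi octamer; the whole length if no Chi is present (toy model)."""
    idx = seq.find(chi)
    return len(seq) if idx == -1 else idx


self_like = "ATATATATAT" + CHI + "A" * 100   # hypothetical Chi-rich "host" DNA
phage_like = "ACGT" * 50                      # hypothetical Chi-free "phage" DNA

print(processed_before_chi(self_like))    # 10
print(processed_before_chi(phage_like))   # 200
```

The host-like sequence yields far less processed material than the phage-like one, which is the self/non-self bias the study proposes.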

Figure 2 | Discrimination between self and non-self DNA in type I‑E CRISPR–Cas system adaptation.
a | For CRISPR–Cas systems, the source material for new spacers is suggested to be derived from the processing of linear double-stranded DNA (dsDNA) ends, which are found in phage DNA or are formed following a double-strand break (DSB). The multisubunit RecBCD nuclease enzyme processes these ends, producing single-stranded DNA (ssDNA) intermediates. DNA processing by RecBCD proceeds until the enzyme reaches the nearest instance of a specific octameric sequence known as a Chi site.
b | The Escherichia coli genome is highly enriched in Chi sites, so that RecBCD processing is soon terminated and thus produces only a small amount of host genome-derived degradation material. By contrast, foreign DNA that lacks Chi site enrichment is more extensively processed, providing ample material for new spacers.

Indeed, the artificial induction of DSBs at a specific position in the E. coli genome resulted in the formation of a strong hotspot for spacer acquisition between the DSB site and the nearest Chi site on either side of the induced break. These data are also suggestive of an elegant solution to the problem of discrimination between self and non-self DNA in CRISPR–Cas-based immunity. Under native conditions, RecBCD is thought to degrade linear dsDNA into single-stranded DNA (ssDNA) molecules with sizes ranging from tens to thousands of nucleotides. Recent structural studies have shown that the Cas1–Cas2 complex binds to protospacers with a 23-bp dsDNA core and splayed ssDNA ends. Presumably, therefore, RecBCD-generated ssDNA fragments reanneal in the cell to form incomplete dsDNA intermediates that are substrates for spacer acquisition by the Cas1–Cas2 complex. However, an alternative possibility is that the Cas1–Cas2 complex initially binds to ssDNA, and then DNA polymerase activity from an unknown source generates the second strand to form a dsDNA. Hence, further studies are required to elucidate the mechanism of the very early steps of spacer acquisition.

Discrimination between self and non-self DNA.
In natural settings, the accidental acquisition of spacers from ‘self’ DNA, that is, from the genome of the cell, instead of from invading DNA is usually detrimental, as it results in the degradation of self DNA by the CRISPR–Cas interference machinery. Such self-targeting leads to CRISPR–Cas autoimmunity, and it has been shown that escape from this autoimmunity usually involves the mutational inactivation of cas genes, mutations in the repeats next to the self-derived spacer or escape mutations in the PAM. Therefore, it is necessary for CRISPR–Cas systems to avoid acquiring self DNA to minimize these harmful effects. Indeed, early observations in the E. coli type I-E CRISPR–Cas system showed a strong preference for spacer acquisition from foreign DNA and an avoidance of self DNA. The involvement of the RecBCD machinery and Chi sites in generating the substrate for spacer acquisition provides a simple explanation for the strong bias against acquiring self DNA. Chi sites are highly enriched in the E. coli genome, occurring on average once every 4.6 kb (instead of approximately once every 65 kb, as expected by chance). Therefore, when a DSB occurs in the E. coli genome, RecBCD degrades only a short length of self DNA before the degradation activity is halted by the nearest Chi site (which is 4.6 kb away, on average). Thus, only a small number of degraded self DNA molecules are generated as potential substrates for spacer acquisition by the Cas1–Cas2 complex. By contrast, a DSB in exogenous DNA that is not enriched for Chi sites results in long-range DNA degradation by RecBCD, generating ample substrates for new spacers (FIG. 2b). Moreover, as the genetic material of phages usually enters the host cell as linear dsDNA, the linear end is perceived by RecBCD as a DSB, promoting the degradation of phage DNA and the formation of substrates for new spacers. To counter this mechanism, some phages express RecBCD inhibitors, and others enrich their genomes with Chi sites. 
The suggested RecBCD-based machinery also explains the preference of the Cas1–Cas2 complex for protospacers from high-copy-number plasmids, even though such plasmids are circular rather than linear. It has previously been documented that most DSBs in the cell are produced at replication forks during DNA replication. Importantly, two replication forks are present on the chromosome during DNA replication, but the number of replication forks on plasmid DNA is proportional to the plasmid copy number (one or two forks per copy). As a result, in cells with high-copy-number plasmids, replication forks are much more abundant on plasmid DNA than on the chromosome. This relative abundance of replication forks on plasmid DNA is therefore expected to cause more DSBs in plasmids than in the chromosome; this would yield more linear plasmid DNA molecules that form substrates for RecBCD and, ultimately, a larger number of plasmid-derived protospacers as source material for the Cas1–Cas2 complex. Indeed, in several experimental systems in which the E. coli type I-E Cas1–Cas2 complex was expressed without the presence of the interference machinery, the acquisition of new spacers showed a strong bias for plasmid DNA compared with chromosomal DNA. Interestingly, some CRISPR–Cas systems contain the protein Cas4, which has a RecB nuclease domain that has ssDNA-targeted exonuclease activity. One may speculate that the RecB domain of Cas4 operates as an alternative RecB nuclease in bacteria in which RecBCD is absent, or that it competes with host RecB. It is important to note that the mechanism for discrimination between self and non-self DNA described here has to date been observed only in the type I-E CRISPR–Cas system of E. coli. It is possible that other systems in other organisms use alternative mechanisms to avoid self DNA during the adaptation process. 
For example, it has been observed that inactivation of the Cas9 nuclease activity in a type II-A CRISPR–Cas system leads to pervasive spacer acquisition from the self chromosome, indicating that a different mode of discrimination between self and non-self DNA operates in type II CRISPR–Cas systems.
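
The two quantitative arguments in this passage (Chi-site enrichment and the excess of replication forks on plasmids) can be checked with simple arithmetic. The sketch assumes random sequence with equal base frequencies and counts one specific 8-nt motif in a single orientation, which appears to be how the ~65 kb expectation is obtained; the plasmid copy number of 50 is a hypothetical value, and reading the fork ratio as a DSB ratio assumes an equal DSB rate per fork.

```python
# 1. Chi-site enrichment: chance spacing of one specific 8-nt motif, assuming
#    random sequence with equal base frequencies, scanned in one orientation.
expected_chi_spacing_bp = 4 ** 8      # 65,536 bp, roughly the "65 kb" quoted
observed_chi_spacing_bp = 4_600       # ~4.6 kb average quoted for E. coli
chi_enrichment = expected_chi_spacing_bp / observed_chi_spacing_bp


# 2. Replication forks: plasmid forks scale with copy number (one or two per
#    copy), versus two forks on the replicating chromosome.
def fork_ratio(plasmid_copies: int, forks_per_copy: int = 1,
               chromosome_forks: int = 2) -> float:
    """Plasmid-to-chromosome replication-fork ratio (toy model)."""
    return plasmid_copies * forks_per_copy / chromosome_forks


print(f"expected Chi spacing: {expected_chi_spacing_bp} bp")
print(f"Chi enrichment: {chi_enrichment:.1f}-fold")
print(f"fork ratio, 50 copies x 1 fork: {fork_ratio(50)}")  # hypothetical copy number
```

The roughly 14-fold enrichment agrees with the "14 times higher than expected" figure in the Marraffini (2018) quote above, and with 50 plasmid copies at one fork each, plasmid forks would outnumber chromosomal forks 25 to 1.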

John van der Oost (2014): ‘Self’ versus ‘non-self’ discrimination by CRISPR–Cas systems
All immune systems must efficiently distinguish ‘self’ from ‘non-self’ to avoid autoimmunity. (Autoimmune disease occurs when an immune response attacks our own tissues.) In DNA-targeting CRISPR–Cas systems, the mechanism of discrimination occurs during CRISPR surveillance. The protospacer itself cannot be used for discrimination, as the crRNA spacer is also complementary to its template in the CRISPR locus on the host chromosome. Type-specific short sequences (of 2–3 nucleotides), which are collectively known as protospacer adjacent motifs (PAMs), are necessary for discrimination. The most important feature of the PAM is that it differs from the corresponding sequence of the CRISPR repeat, which enables discrimination between a non-self target and a self non-target. Indeed, experimental analyses of CRISPR interference by type I and type II systems have confirmed an important role for the PAM motif. Moreover, studies of CRISPR adaptation in these systems indicate that the PAM is also important for spacer acquisition. This makes sense, as only functional protospacers (that is, those that provide immunity) are selected for integration into the CRISPR array. Type I and type II systems use a ‘non-self activation’ strategy that involves protein-mediated detection of a PAM that is located adjacent to the targeted protospacers in the invading DNA. This eventually results in the ‘switching on’ of interference, most probably by a conformational change that triggers either the recruitment of a nuclease to the crRNP complex (for example, Cas3 in type I systems) or the induction of intrinsic crRNP nuclease activity (for example, Cas9 in type II systems). In type I systems, PAMs are located downstream (at the 3ʹ end) of the protospacer on the target strand, whereas PAMs of type II systems are located upstream (at the 5ʹ end) of the protospacer. 
Recognition of PAMs may occur in a single-stranded conformation, which either exclusively involves the strand that base pairs with the crRNA (in type I systems) or the displaced strand (in type II systems). Type III systems seem to lack the PAM-based system; instead, the type III‑A system uses a ‘self inactivation’ strategy that involves base pairing between the 5ʹ handle of the crRNA (as part of the Csm complex) and the repeat sequence in the CRISPR locus on the host chromosome. Base pairing in this region of the crRNA signals binding to the chromosomal CRISPR array (self DNA), which seems to trigger the ‘switching off’ of the interference process, possibly by preventing the recruitment of the nuclease.
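
The ‘non-self activation’ logic can be sketched as a check that requires both a spacer match and an adjacent PAM. In the toy example below, the spacer, the 3-nt PAM "AAG", its placement on the 5′ side, and the repeat sequence are all hypothetical; only one strand is scanned, and the type III ‘self inactivation’ mechanism is not modelled.

```python
def is_targeted(dna: str, spacer: str, pam: str, pam_side: str = "5prime") -> bool:
    """Toy check: True if `spacer` occurs in `dna` with `pam` immediately on
    `pam_side` ("5prime" or "3prime") of that occurrence. Single strand only."""
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    start = dna.find(spacer)
    while start != -1:
        if pam_side == "5prime":
            flank = dna[max(0, start - len(pam)):start]
        else:
            end = start + len(spacer)
            flank = dna[end:end + len(pam)]
        if flank == pam:
            return True
        start = dna.find(spacer, start + 1)  # look for further matches
    return False


SPACER = "TTGACCATGCA"      # hypothetical spacer (real spacers are ~30-40 nt)
REPEAT = "GAGTTCCCCG"       # hypothetical repeat sequence
PAM = "AAG"                 # hypothetical PAM

host_array = REPEAT + SPACER + REPEAT     # self: spacer flanked by repeat
phage = "CCCC" + PAM + SPACER + "TTTT"    # non-self: spacer flanked by PAM
escape = "CCCCACG" + SPACER + "TTTT"      # PAM mutated: AAG -> ACG

print(is_targeted(host_array, SPACER, PAM))  # False
print(is_targeted(phage, SPACER, PAM))       # True
print(is_targeted(escape, SPACER, PAM))      # False
```

Because the sequence next to the spacer in the host array is repeat-derived rather than a PAM, the array itself is not flagged, while a PAM mutation in the invader lets it escape, as described for autoimmunity escape mutants above.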

James K Nuñez (2014): Crystal structure of the Cas1–Cas2 complex: To gain insights into the structural organization of the Cas1–Cas2 complex, we determined the crystal structure of the complex. Crystal structures of Cas1 and Cas2 alone from various organisms, including E. coli K12, have been determined previously. Cas1 proteins are asymmetrical homodimers with each monomer having an N-terminal β-sheet domain and C-terminal α-helical domain. Cas2 proteins are symmetrical homodimers with a core ferredoxin fold. The overall architecture of the asymmetric unit is a heterohexameric complex consisting of two Cas1 dimers (Cas1a-b and Cas1c-d) that sandwich one Cas2 dimer (Fig. 2).

Figure 2 Crystal structure of the Cas1–Cas2 complex.
(a) Overall structure, consisting of a Cas2 dimer (yellow and orange) and two Cas1 dimers (denoted with suffixes a–d; blue and teal).
(b) Superposition of the Cas1a–Cas1b dimer with the previously determined E. coli Cas1 structure (gray, PDB 3NKD; ref. 24). The dashed orange circle highlights the conformational change observed in the α-helical domain of Cas1a.
(c) Superposition of the Cas2 dimer in the complex with the previously determined E. coli Cas2 structure (gray, PDB 4MAK). The blue arrows point to the last resolved residue in the 4MAK structure. The N and C indicate the termini of each monomer; r.m.s.d. values of the superpositions are indicated.

Cas1a and Cas1c make contacts with the Cas2 dimer, and we observed no contacts between Cas1b or Cas1d and the Cas2 dimer. The Cas1c–Cas2 protein-protein interface buries a large surface area of ~3,100 Å², whereas the Cas1a–Cas2 interface buries an additional 800 Å² contributed by the C terminus of Cas1a, as described further below. Superposition of the two Cas1 dimers (a-b dimer with c-d dimer) shows high structural similarity, with an r.m.s. deviation (r.m.s.d.) of 0.394 Å for the Cα atoms. Similar contacts are present between Cas1a and Cas1c with Cas2 on opposite sides, thus creating a symmetrical complex. Although Cas1 and Cas2 predominantly form a heterotetrameric complex in solution, our crystal structure suggests that the complex may also be capable of accessing a hexameric state during acquisition.
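
The r.m.s.d. values quoted here (for example, 0.394 Å over Cα atoms) follow the standard definition RMSD = sqrt((1/N) Σ |rᵢ − r′ᵢ|²) over N matched atoms. The minimal sketch below computes it for coordinates that are assumed to be already superposed; the superposition step itself (for example, the Kabsch algorithm) is not shown, and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long lists of matched
    atom coordinates (e.g. C-alpha atoms), assumed to be already superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # hypothetical coordinates, in Å
b = [(0.0, 0.0, 0.0), (4.0, 4.0, 0.0)]   # second atom displaced by 5 Å
print(f"{rmsd(a, b):.3f}")               # 3.536, i.e. sqrt(25 / 2)
```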

Conformational changes and contacts within the complex
The interface between Cas1 and Cas2 consists of hydrogen-bonding, electrostatic and hydrophobic interactions. We observed extensive electrostatic contacts between three arginine residues (R245, R252 and R256) in α8 of Cas1 with two acidic residues (E65 and D84) of Cas2. The R252 residue is positioned between E65 and D84 and may sample salt bridges between the two acidic residues, although we observed continuous density between R252 and E65 at the Cas1a–Cas2 interface. In the same region, backbone hydrogen-bond contacts are present between the newly resolved Cas2 β7 C terminus and β4 of Cas1. To identify Cas1 and Cas2 conformational changes that occur upon complex formation, we superimposed previously determined structures of apo Cas1 and Cas2 from E. coli with the Cas1–Cas2 complex structure (Fig. 2b,c). In addition to minor conformational changes present in the canonical βαββαβ ferredoxin fold of Cas2, the C terminus forms two antiparallel β-sheets (β6–β7) that contact β4 of Cas1 (Fig. 2c). This region is unresolved in the apo-Cas2 structure, which terminates at the C terminus of β5. Presumably, the β6–β7 region is flexible before complex formation with Cas1.

John van der Oost (2014): The involvement of Cas1 and Cas2.
The strict conservation of Cas1 and Cas2 in all CRISPR–Cas systems, together with the finding that Cas1 and Cas2 are required for the integration of new spacers, suggests that the basic mechanism of CRISPR adaptation is conserved (FIG. 3).

Figure 3 | CRISPR spacer acquisition.
a | Proposed stages of CRISPR spacer acquisition: fragmentation of invading DNA (in this case, phage DNA), selection of the protospacer by recognition of the protospacer adjacent motif (PAM), processing of the pre-spacer, nicking of the leader-end repeat in the CRISPR locus, integration of the new spacer and duplication of the flanking repeat. Both type I and type II systems rely on PAM recognition for spacer integration, whereas the type III systems do not.
b | Crystal structures of Cas1 (from Pseudomonas aeruginosa) and Cas2 (from Desulfovibrio vulgaris), which are the two main endonucleases that are involved in spacer acquisition. Cas1 is a metal-dependent, dimeric endonuclease (DNase) with a unique three-dimensional fold that consists of an amino-terminal β‑strand domain and a carboxy-terminal α‑helical domain. Sequence conservation (indicated by colour intensity) of Cas1 shows that the metal ion-binding site is highly conserved among Cas1 family proteins. Cas2 is a metal-dependent, dimeric endonuclease (RNase and/or DNase), with a metal-binding site at the interface of the two subunits (which is composed of RAMP domains).

Although the simultaneous expression of both Cas1 and Cas2 enables spacer acquisition, their precise functions in the adaptation process remain elusive. Cas1 is a metal-dependent endonuclease that catalyzes the cleavage of double-stranded DNA (dsDNA), single-stranded DNA (ssDNA) and branched DNA in a sequence-independent manner. Crystal structures of the homodimeric Cas1 protein have shown that it consists of an amino-terminal β-strand domain and a carboxy-terminal α-helical domain (FIG. 3b). The C-terminal domain contains a conserved binding site for a divalent metal ion, which is crucial for DNA degradation in vitro and spacer acquisition in vivo. The metal-binding site is surrounded by a cluster of basic residues that form a positively charged strip across the surface of the C-terminal domain. This surface has been implicated in DNA binding and might be involved in the positioning of substrates close to the metal ion in the active site. Cas2 is a metal-dependent nuclease that contains a RAMP-like fold with a typical β1 α1 β2 β3 α2 β4 arrangement, in which the two α-helices are positioned together on one face of a four-stranded antiparallel β-sheet (FIG. 3b).

The β-sheets from two Cas2 protomers form a β-sandwich, and conserved amino acids are positioned along the dimer interface. The substitution of a conserved aspartic acid residue in each protomer, located at the dimer interface, does not affect their assembly (FIG. 3b), but it perturbs the binding of a metal ion and disrupts nuclease activity. Although several studies have reported that Cas2 proteins are endoribonucleases, other Cas2 proteins mainly catalyze the cleavage of dsDNA, which indicates that they are deoxyribonucleases. Differences in the loop regions might explain differences in substrate preference; for example, Cas2 proteins that have a long loop connecting α2 to β4 have a relatively narrow substrate-binding cleft and correspond to ribonucleases. By contrast, Cas2 proteins that have long β1–α1 loops contain wider substrate-binding clefts and show deoxyribonuclease activity. A recent study has revealed that Cas1 and Cas2 from E. coli form a stable complex that interacts with the CRISPR locus. The data show that an intact Cas1–Cas2 complex is essential for spacer acquisition in vivo. Importantly, although Cas1 activity is required for protospacer processing and/or spacer integration, Cas2 activity is not needed for spacer acquisition.

Other factors involved in spacer acquisition
In addition to the participation of Cas1 and Cas2, there are indications that a variable set of accessory factors might be involved in spacer acquisition. Pulldown assays have shown that Cas1 of Escherichia coli interacts with RecBCD and RuvB, which are housekeeping proteins that are involved in general DNA repair and recombination. Moreover, several cases of gene fusion and conserved gene clustering suggest that CRISPR acquisition might require additional Cas proteins, such as Csn2, Cas4, Csa1 and Cas3. Attempts have been made to verify the putative roles of some of these proteins in CRISPR adaptation. 
Csn2 is encoded by all type II-A systems and has been shown to be involved in CRISPR adaptation in Streptococcus thermophilus. Several structural studies have revealed that Csn2 forms a tetrameric ring-shaped complex with a positively charged central cavity that binds to, and slides along, DNA fragments. The apparent lack of Csn2 catalytic activity suggests that it might have an accessory role during spacer acquisition (such as stabilizing the double-strand break during spacer integration) or that it might be involved in the recruitment of additional factors. Cas4 and Csa1 share amino acid sequence similarity with RecB- and AddB-type nuclease–helicases. The Cas4 protein of Sulfolobus solfataricus is a ring-shaped decamer that has DNA-targeting 5ʹ to 3ʹ exonuclease activity. In addition, some Cas4 homologues have been reported to have endonuclease activity as well as helicase activity. Fusions of Cas4 and Cas1 occur in several bacterial and archaeal type I and type III systems, which indicates that the two proteins are functionally related. Cas4 from Thermoproteus tenax has been shown to form a complex in vitro with a Cas1–Cas2 fusion protein and Csa1. However, such complexes have not yet been isolated from a natural system, which may indicate that the proteins interact only transiently in vivo. Furthermore, fusion proteins (such as Cas4–Cas1 and Cas1–Cas2) are likely to contribute to stabilizing these complexes. Cas3 is a multidomain nuclease–helicase that is fused to Cas2 in type I-F systems. In the type I-F system of Pectobacterium atrosepticum, a direct interaction between Cas1 and the Cas2–Cas3 fusion protein has been observed, which suggests that Cas3 has a dual role, functioning during CRISPR interference as well as during spacer acquisition. The proposed role for Cas3 during both acquisition and interference might be related to a phenomenon that is known as ‘primed spacer acquisition’. 
Priming refers to the positive-feedback loop that accelerates the acquisition of new spacers from previously encountered genetic elements. In the type I-E system, this process requires Cas1, Cas2, Cas3 and an RNP complex that is composed of crRNA and multiple Cas proteins (that is, Cascade), which suggests that many proteins participate in this process. However, the mechanism of primed spacer acquisition is currently unknown.

PAM-dependent spacer acquisition
Jiuyu Wang (2015): Bacteria acquire memory of viral invaders by incorporating invasive DNA sequence elements into the host CRISPR locus, generating a new spacer within the CRISPR array. Our study reveals a protospacer DNA comprising a 23-bp duplex bracketed by tyrosine residues, together with anchored flanking 3′ overhang segments. The PAM-complementary sequence in the 3′ overhang is recognized by the Cas1a catalytic subunits in a base-specific manner, and subsequent cleavage at positions 5 nt from the duplex boundary generates a 33-nt DNA intermediate that is incorporated into the CRISPR array via a cut-and-paste mechanism. Upon protospacer binding, Cas1-Cas2 undergoes a significant conformational change, generating a flat surface conducive to proper protospacer recognition. Here, our study provides important structure-based mechanistic insights into PAM-dependent spacer acquisition.
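As a rough illustration of what PAM-dependent selection means, the sketch below scans one DNA strand for a PAM and returns the fixed-length stretch that follows it as a candidate prespacer. It is a toy string model, not the Cas1-Cas2 mechanism: the function name `find_candidate_prespacers` is hypothetical, the PAM `AAG` (the motif commonly reported for E. coli type I-E adaptation) and the 32-nt length are assumptions chosen for illustration, and the complementary strand and the PAM-derived nucleotide that ends up in the new repeat are ignored.

```python
def find_candidate_prespacers(strand: str, pam: str = "AAG", length: int = 32) -> list[tuple[int, str]]:
    """Toy model: return (start, sequence) for each stretch of `length` nt that
    directly follows an occurrence of `pam` on the given strand."""
    strand, pam = strand.upper(), pam.upper()
    hits = []
    # Stop early enough that a full-length protospacer still fits after the PAM.
    for i in range(len(strand) - len(pam) - length + 1):
        if strand[i:i + len(pam)] == pam:
            start = i + len(pam)
            hits.append((start, strand[start:start + length]))
    return hits


invader = "CC" + "AAG" + "T" * 32 + "GG"   # hypothetical invader sequence
print(find_candidate_prespacers(invader))  # one hit, starting at index 5
```

Sequence stretches without a PAM, or PAMs too close to the end to leave room for a full-length protospacer, yield no candidates in this model.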

An AT-rich leader sequence located upstream of the first repeat is essential for spacer acquisition and promotes the transcription of the CRISPR array. The CRISPR-Cas system defends against invasive nucleic acids from phages or plasmids in three steps. First, in the spacer acquisition step (also called adaptation), a new spacer is acquired from the invader DNA and integrated into the CRISPR locus. Second, the CRISPR locus is transcribed and processed into short mature CRISPR RNAs (crRNAs), which then bind to Cas proteins and form a protein-RNA complex. Finally, the invading nucleic acid complementary to the crRNA is recognized and degraded by the protein-crRNA complex. While the expression and interference steps are now well characterized in molecular and functional terms, the adaptation step still awaits detailed analysis. Recent studies have shown that the protospacer-adjacent motif (PAM) is fundamental to avoiding autoimmunity: invading DNA can be cleaved during interference only if it is flanked by the correct PAM. PAMs were also shown to be critical for the recognition and selection of protospacers during acquisition, since protospacers flanked by the correct PAM could be incorporated into the CRISPR array. Interestingly, in Escherichia coli, the last nucleotide of the new repeat is derived from the first nucleotide of the incoming spacer, and this nucleotide is in fact the last nucleotide of the PAM sequence. Cas1 and Cas2 are the only two Cas proteins universally conserved across all CRISPR-Cas systems. Previous in vitro analysis showed that Cas1 is a metal-dependent DNase, capable of cleaving single-stranded (ss) DNA, double-stranded (ds) DNA, cruciform DNA, and branched DNA in a sequence-independent manner. Cas2 homologs, in turn, have been described variously as metal-dependent endoribonucleases that cleave ssRNA, as nucleases that cleave dsDNA, or as proteins with no significant nuclease activity.
However, one recent study demonstrated that the ‘active site’ of Cas2 is not required for spacer acquisition, suggesting that Cas2 could have other, as yet unknown, functions. Overexpression of E. coli Cas1 and Cas2 induces new spacer acquisition by inserting exactly 33 nt of foreign DNA behind the first repeat, indicating that Cas1 and Cas2 are both necessary and sufficient for new spacer acquisition. Previous studies demonstrated that Cas1 and Cas2 form a stable complex, which functions as an integrase that incorporates new spacers into the CRISPR locus. In E. coli, the integration process involves the staggered cleavage of the first CRISPR repeat, and new spacers are incorporated proximal to the leader sequence. From this, three fundamental questions arise as to how Cas1-Cas2 mediates spacer acquisition. First, what are the physiological DNA substrates of Cas1-Cas2, and what are the respective roles of Cas1 and Cas2? Second, while spacers are known to be of a set length in each species, what molecular mechanisms determine spacer length? Third, how does the acquisition machinery select protospacers containing a PAM sequence?
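The point about PAMs and autoimmunity can be made concrete with a toy check: the same 32-nt sequence counts as a target when a PAM sits next to it in an invader, but not inside the host's own CRISPR array, where it is flanked by repeat sequence instead. The function `is_interference_target`, the repeat and spacer strings, and the single PAM `AAG` placed on the 5′ side are all hypothetical; the real type I-E machinery accepts a small set of PAM variants and does more than an exact string comparison.

```python
def is_interference_target(dna: str, spacer: str, pam: str = "AAG") -> bool:
    """Toy check: True only if `spacer` occurs in `dna` immediately after `pam`."""
    dna, spacer, pam = dna.upper(), spacer.upper(), pam.upper()
    start = dna.find(spacer)
    while start != -1:
        # A match counts only when the PAM sits right next to it (5' side in this toy model).
        if start >= len(pam) and dna[start - len(pam):start] == pam:
            return True
        start = dna.find(spacer, start + 1)
    return False


spacer = "ACGT" * 8                       # hypothetical 32-nt spacer
repeat = "GTTCCCCGCG"                     # hypothetical repeat, no PAM at its 3' end
invader = "TT" + "AAG" + spacer + "TT"    # protospacer flanked by the PAM
own_array = repeat + spacer + repeat      # the same sequence inside the CRISPR locus

print(is_interference_target(invader, spacer))    # True
print(is_interference_target(own_array, spacer))  # False
```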

To understand the molecular mechanisms of spacer acquisition, we determined the crystal structure of E. coli Cas1-Cas2 bound with dual-forked DNA. The protospacer DNA captured by Cas1-Cas2 adopts a dual-forked form, with the 3′ overhangs of the protospacer essential for new spacer acquisition. The PAM-complementary sequence (5′-CTT-3′), located within the 3′ overhang, is recognized in a sequence-specific manner and is cleaved by Cas1a, generating a DNA intermediate that has 5-nt 3′ overhangs on the two partner strands. Given that tyrosine residues cap either end of a 23-bp duplex, Cas1-Cas2 predetermines the length of the newly acquired spacer, thereby highlighting the role of both Cas1 and Cas2 in the acquisition mechanism. Moreover, Cas1-Cas2 undergoes a significant conformational change upon protospacer binding, thereby generating optimal protospacer and target binding sites.
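The lengths quoted above fit together as a simple sum. Assuming (our reading of the numbers, not a statement of the paper) that both 5-nt 3′ overhangs left after Cas1a cleavage end up in the integrated spacer, one on either side of the 23-bp duplex measured out by the tyrosine caps:

```latex
L_{\text{spacer}} \;=\; \underbrace{5}_{3'\ \text{overhang}} \;+\; \underbrace{23}_{\text{duplex}} \;+\; \underbrace{5}_{3'\ \text{overhang}} \;=\; 33
```

This is consistent with the 33 nt of foreign DNA that E. coli inserts behind the first repeat.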

Search and Optimization of the DNA Substrate
In terms of nomenclature, within each symmetric half of the complex, the proteins are labeled Cas1a, Cas1b, and Cas2 and Cas1a′, Cas1b′, and Cas2′. Analysis of our structures showed that this complex contains a pair of Cas1 dimers sandwiching one Cas2 dimer, similar to the structure of DNA-free Cas1-Cas2.

Figure 1. Crystal Structure (2.6 Å) of E. coli Cas1-Cas2 Bound to a Dual-Forked DNA
(A) A representation of the CRISPR-Cas locus of E. coli K12. The CRISPR locus consists of a series of repeats (orange diamonds) that are separated by spacer sequences (red rectangles) of constant length. Cas1 and Cas2 are shown in magenta and green, respectively.
(B) Schematic diagram of the dual-forked DNA, which is a 23-mer palindromic duplex with 5′-(T)6 and 3′-(T)10 overhangs on both ends. The nucleotides in the 5′ overhangs are numbered from -6 to -1; those in the DNA duplex are numbered from 1 to 23; and those in the 3′ overhang are numbered from 24 to 33. The two strands of DNA are colored red and blue, respectively.
(C) Structure of the dual-forked DNA in the Cas1-Cas2 complex.
(D) Orthogonal views of the crystal structure of the complex of Cas1-Cas2 bound to the dual-forked DNA. Cas1a and Cas1a′ are shown in light orange, and Cas1b and Cas1b′ are shown in magenta. The two monomers of Cas2 are shown in green and cyan, respectively. The proposed Arch segment is labeled.
(E) Surface view of the Cas1-Cas2 dual-forked DNA complex in the same orientation as Figure 1D, bottom.

In this 2-fold symmetric complex, the two single-forked DNAs lie on the surface of Cas1-Cas2 in a head-to-head orientation. Each 10-bp duplex lies on the interface of a Cas1a/b dimer, with the fork facing toward the edge of the Cas1a/b dimer and the duplex end positioned on the Cas1-Cas2 interface. These findings strongly indicate that the two DNA forks always face toward the outside of Cas1-Cas2, suggesting that this orientation of the forks is fixed in the protein complex. While the two forks face outward, the blunt ends of both duplexes extend toward the center, where the Cas2 dimer is located. Interestingly, the blunt ends do not meet but leave a gap in between, indicating that Cas1-Cas2 associates with duplex DNA longer than 20 bp. To test this assumption, we used various substrates, including single-fork DNA containing either 11- or 12-bp duplexes and dual-forked DNA with duplexes of 21–24 bp in length, flanked by 3′ and 5′ overhangs at both ends. To our surprise, the complex with dual-forked DNA substrates resulted in crystals with greatly improved diffraction, from which we obtained a structure of the complex at a higher resolution of 2.6 Å. This result suggests that this dual-forked DNA is closely related to the in vivo substrate used by Cas1-Cas2. 3

Simon A. Jackson (2017): Recognition of the CRISPR array
Before integration, the substrate-bound Cas1-Cas2 complex must locate the CRISPR leader-repeat sequence. Specific sequences upstream of CRISPR arrays direct leader-polarized spacer integration, both through direct Cas1-Cas2 recognition and with the assistance of host proteins. The Cas1-Cas2 complexes of several systems show an intrinsic affinity for the leader-repeat region in vitro, yet this is not always sufficient to provide the specificity observed in vivo. It was recently discovered that for the type I-E system, leader-repeat recognition is assisted by the integration host factor (IHF) heterodimer. IHF binds the CRISPR leader in a sequence-specific manner and induces a 120° DNA bend, providing a cue to accurately localize Cas1-Cas2 to the leader-repeat junction. A conserved sequence motif upstream of the IHF pivot is proposed to stabilize the Cas1-Cas2–leader-repeat interaction and increase the efficiency of spacer acquisition, supporting binding of the adaptation complex to DNA sites on either side of the bound IHF. IHF is absent in many prokaryotes, including archaea, indicating that other mechanisms of leader-proximal integration exist. Indeed, type II-A Cas1-Cas2 from Streptococcus pyogenes catalyzed leader-proximal integration in vitro with a precision comparable to that of the type I-E system with IHF. In type II systems, a short leader-anchoring site (LAS) adjacent to the first repeat and ≤6 base pairs of this repeat are essential for CRISPR adaptation and are conserved in systems with similar repeats. Placement of an additional LAS in front of a nonleader repeat resulted in the integration of spacers at both sites, whereas LAS deletion caused ectopic integration at a downstream repeat adjacent to a spacer containing a LAS-like sequence. Hence, in contrast to type I-E systems, type II-A systems appear to rely solely on intrinsic sequence specificity to recognize the leader-repeat junction.
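The type II-A rules just described (integration next to the repeat preceded by the LAS, integration at both sites when a second LAS is added, ectopic integration when the LAS is deleted but a spacer carries a LAS-like sequence) can be mimicked with a toy sequence search. The motif `ATTTG`, the repeat `GCCCGC`, and the function `integration_sites` are hypothetical, and the model ignores the repeat bases that belong to the real recognition site as well as the IHF-dependent route of type I-E systems.

```python
def integration_sites(locus: str, repeat: str, las: str) -> list[int]:
    """Toy model of type II-A site choice: return the start index of every
    repeat copy that is immediately preceded by the leader-anchoring motif."""
    sites = []
    pos = locus.find(repeat)
    while pos != -1:
        if pos >= len(las) and locus[pos - len(las):pos] == las:
            sites.append(pos)
        pos = locus.find(repeat, pos + 1)
    return sites


LAS, R = "ATTTG", "GCCCGC"   # hypothetical motif and repeat, not real sequences

normal = "CCCC" + LAS + R + "TTTTTT" + R + "CCAAAA" + R
no_las = "CCCC" + R + "TTTTTT" + R + "CATTTG" + R   # LAS deleted; one spacer is LAS-like
extra = "CCCC" + LAS + R + "TTTTTT" + LAS + R       # a second LAS inserted

print(integration_sites(normal, R, LAS))  # [9]: the leader-proximal repeat
print(integration_sites(no_las, R, LAS))  # [28]: ectopic, downstream repeat
print(integration_sites(extra, R, LAS))   # [9, 26]: both anchored repeats
```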

Integration into the CRISPR array
For CRISPR-Cas types that are reliant on PAM sequences for recognition of targets, the acquisition of interference-proficient spacers requires the processing of the prespacer substrate at a specific position relative to the PAM. Each of the four Cas1 monomers in the Cas1-Cas2 complex contains a PAM-sensing domain. The presence of a PAM in the active site of just one of the Cas1 monomers is sufficient to appropriately position the substrate and PAM relative to the cleavage site. Furthermore, the presence of a PAM within the prespacer substrate ensures integration into the CRISPR array in the correct orientation. This directional fidelity is critical because otherwise the PAM in the target of the mobile genetic element (MGE) would lie at the wrong end of the crRNA target binding site, thus precluding target recognition. To avoid premature loss of the PAM directional cue, processing of the prespacer likely occurs after Cas1-Cas2 orients and docks at the leader-proximal repeat. Cas1-mediated processing of the prespacer creates the two 3′-OH ends required for nucleophilic attack on each strand of the leader-proximal repeat. The initial nucleophilic attack most likely occurs at the leader-repeat junction and forms a half-site intermediate; a second attack at the existing repeat-spacer junction then generates the full-site integration product. After the first nucleophilic attack, the intrinsic sequence specificity of the Cas1-Cas2 complex defines the site of the second attack and ensures accurate repeat duplication. CRISPR repeats are often semi-palindromic, containing two short inverted repeat (IR) elements, but the location of these can vary. In type I-B and I-E systems, the IRs occur close to the center of the repeat and are important for spacer acquisition. In the type I-E system, both IRs act as anchors for the Cas1-Cas2 complex, which contains two molecular rulers to position the Cas1 active site for the second nucleophilic attack at the repeat-spacer boundary.
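At the level of sequence, the two nucleophilic attacks and the subsequent gap filling amount to duplicating the first repeat so that it flanks the new spacer on both sides. The sketch below shows only that bookkeeping; the function `integrate_spacer` and its strings are hypothetical, and the fixed `expected_length` check is a crude stand-in (an assumption) for the molecular rulers that position the second attack.

```python
from typing import Optional


def integrate_spacer(leader: str, array: str, repeat: str, prespacer: str,
                     expected_length: int = 33) -> Optional[str]:
    """Toy model of leader-proximal spacer integration (sequence strings only).

    `array` must begin with the leader-proximal repeat. Returns the new locus,
    or None when the prespacer length does not fit the assumed ruler, i.e.
    when the second, full-site attack could not be positioned.
    """
    if not array.startswith(repeat):
        raise ValueError("array must begin with the leader-proximal repeat")
    if len(prespacer) != expected_length:
        return None  # only a half-site product; assumed to be reversed or repaired
    # Attack 1 at the leader-repeat junction and attack 2 at the repeat-spacer
    # junction open the first repeat on both sides; gap filling then copies it,
    # so the new spacer ends up flanked by two identical repeats.
    return leader + repeat + prespacer + array


leader, R = "ATAT", "GGCC"           # hypothetical leader and repeat
array = R + "TTTT" + R               # one existing spacer
new_locus = integrate_spacer(leader, array, R, "A" * 33)
print(new_locus.count(R))            # 3: the first repeat has been duplicated
print(integrate_spacer(leader, array, R, "A" * 30))  # None
```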
However, in the type I-B system from Haloarcula hispanica, only the first IR is essential for integration, and a single molecular ruler, directed by an anchor between the IRs, has been proposed. In the type II-A systems of Streptococcus thermophilus and S. pyogenes, the IRs are located distally within the repeats, suggesting that these short sequences may directly position the nucleophilic attacks without a need for molecular rulers. Although these recent findings suggest that leader-repeat regions at the beginning of CRISPR arrays contain sequences to ensure appropriate Cas1-Cas2 localization, further work is required to determine how the spacer integration events are specifically orchestrated in the diverse range of CRISPR-Cas types.

Production of prespacers from foreign DNA
Despite the elegance of memory-directed defense, CRISPR adaptation is not without complications. For example, the inadvertent acquisition of spacers from host DNA must be avoided because this will result in cytotoxic self-targeting, akin to autoimmunity in eukaryotic adaptive immune systems. Therefore, production of prespacer substrates from MGEs should outweigh production from host DNA.

Cas protein–assisted production of spacers
DNA breaks induced by interference activity of class 2 CRISPR-Cas effector complexes could trigger host DNA repair mechanisms (e.g., RecBCD), thereby providing substrates for Cas1-Cas2. In agreement with a model for DNA break–stimulated enhancement of CRISPR adaptation, restriction enzyme activity can stimulate RecBCD-facilitated production of prespacer substrates. RecBCD activity may also partially account for the enhanced CRISPR adaptation observed during phage infection of a host possessing an innate restriction-modification defense system, although whether the enhanced adaptation was RecBCD-dependent in this example is unknown. In a CRISPR-Cas-induced DNA break model, the production of prespacer substrates is preceded by sequence-specific target recognition. Although direct evidence to support this concept is lacking, CRISPR adaptation in type II-A systems requires Cas1-Cas2, Cas9, a trans-activating crRNA (tracrRNA; a cofactor for crRNA processing and interference in type II systems), and Csn2. The PAM-sensing domain of Cas9 enhances the acquisition of spacers with interference-proficient PAMs. However, Cas9 nuclease activity is dispensable, and existing spacers are not strictly necessary, suggesting that the PAM interactions of Cas9 could be sufficient to select appropriate new spacers. Some Cas9 variants can also function with non-CRISPR RNAs and tracrRNA. This raises the possibility that host- or MGE-derived RNAs might direct promiscuous Cas9 activity, resulting in DNA breaks or replication fork stalling that could potentially result in prespacer generation.

Spacer Integration
The CRISPR array is preceded by an AT-rich leader sequence. Spacer integration preferentially occurs at the leader end of the CRISPR array and thus keeps a chronological record of previous infections. The mechanism of protospacer integration has been studied in detail in the type I-E system of E. coli. In vitro studies showed that the mechanism by which Cas1-Cas2 integrates new spacers is similar to that of viral integrases and transposases. First, the 3′-OH of the protospacer performs a nucleophilic attack at the target site and thus attaches to the 5′ phosphate of the leader-proximal repeat. This process depends on the recognition of the leader-repeat boundary, which is specified through binding of the leader sequence by a CRISPR-independent protein called integration host factor (IHF). IHF sharply bends the DNA, which results in a U-shaped leader structure and favors recognition of the leader-repeat boundary by Cas1-Cas2 (Figure A; inset). In the second step, the 3′-OH of the other protospacer strand is ligated to the opposite end of the first repeat. Important during this step are two inverted repeat motifs in the CRISPR repeat, which serve as anchors for the Cas1-Cas2 complex and determine the position of the second integration site (Goren et al., 2016). Upon complex binding, the repeat becomes distorted, which is crucial for making the second integration site accessible to Cas1. The incorporation of the new spacer in the correct orientation is ensured by the presence of the partial PAM on the protospacer. Though some PAM nucleotides are removed prior to integration, this likely occurs after binding of the acquisition complex to the leader-repeat junction, so directionality is preserved (Figure A; inset). Unlike type I-E, recognition of the leader-repeat end in type II-A is IHF-independent and requires a short motif termed the leader-anchoring site (LAS), which consists of 5 bp of the repeat-proximal leader end and is directly recognized by Cas1-Cas2.
Interestingly, mutations in the LAS can lead to ectopic spacer integration within the CRISPR array. Although spacer acquisition is less efficient in this case, the recognition of alternative anchoring sites gives the system flexibility to compensate for alterations of the canonical LAS by integrating new spacers at an alternative anchoring site. However, spacers located within the CRISPR array provide less resistance against phages than leader-proximal spacers, likely due to the lower abundance of distally encoded crRNAs. After recognition of the LAS, the type II-A Cas1-Cas2 complex can conduct the first integration reaction at either end of the first repeat, although integration at the leader boundary is usually preferred. Structural data supporting this model were recently presented for the type II-A integration complex of Enterococcus faecalis. Here, terminal sequences on both sides of the repeat were shown to be sufficient but suboptimal for target recognition. Additional interactions of Cas1 with the first four repeat-proximal nucleotides of the leader, however, allow a more efficient interaction with the target, which explains the preference for the first integration event at the leader side of the first repeat. The first reaction generates a half-site integration intermediate in which only one strand of the protospacer is ligated to one end of the repeat. The second integration event depends on proper protospacer size, recognition of the opposite repeat end, and bending of the repeat by the Cas1-Cas2 complex. If these requirements are not fulfilled, full-site integration cannot occur, and the acquisition complex presumably reverses the first integration reaction, or the half-site integration intermediate is removed by DNA repair proteins. 13

Acquisition of spacers
The acquisition of new invader-derived spacers generally proceeds in a polarized manner at the leader end of the CRISPR locus, which results in a chronological record of previously encountered foreign nucleic acids. The most recent experimental data support the following model for the step-wise acquisition of novel spacers. The recognition and fragmentation of invading DNA is likely to be the first step in the process. A recent study reported functional synergy between a restriction–modification (R–M) system and CRISPR–Cas in Streptococcus thermophilus, which suggests that fragments of invader DNA that are generated by the R–M system might be potential substrates for spacer acquisition. The CRISPR–Cas system selects suitable spacers by the detection of a specific protospacer adjacent motif (PAM), followed by processing of the DNA substrates into spacer precursors of a defined size. After the opening of the leader-end repeat by the nicking of both strands at opposite sides of the repeat, the new spacer is integrated in a specific, PAM-dependent orientation. In support of this model, the leader-end repeat is duplicated during spacer acquisition. In addition to DNA that is derived from MGEs (that is, ‘non-self’ DNA), fragments of chromosomal DNA (that is, ‘self’ DNA) are occasionally integrated as novel CRISPR spacers. However, as these self-targeting spacers are associated with cytotoxicity, their presence in the genome is typically associated with a modified PAM or an inactivated CRISPR–Cas system. In the absence of Cas proteins that are essential for target cleavage, the acquisition of chromosome-derived spacers has indeed been observed, but it occurs at least 100-fold less frequently than the acquisition of plasmid-derived spacers. This suggests that CRISPR–Cas systems can distinguish invading, non-self DNA from self DNA.
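The leader-end polarity is what turns the array into a chronological record. In the toy model below (hypothetical function `build_locus`, with text labels standing in for sequences), each new spacer is inserted next to the leader, so reading the finished array from the leader end lists encounters newest-first.

```python
def build_locus(leader: str, repeat: str, spacers_in_order_met: list[str]) -> str:
    """Toy model: each new spacer is inserted at the leader end, so the final
    array reads newest-first, a chronological record of past encounters."""
    spacers_in_array: list[str] = []
    for spacer in spacers_in_order_met:
        spacers_in_array.insert(0, spacer)  # leader-proximal insertion
    return leader + repeat + "".join(s + repeat for s in spacers_in_array)


# Hypothetical labels stand in for sequences: three invaders met in the order 1, 2, 3.
print(build_locus("[leader]", "-R-", ["phage1", "plasmid2", "phage3"]))
# [leader]-R-phage3-R-plasmid2-R-phage1-R-
```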

1. Martin Wilkinson et al.: Structure of the DNA-Bound Spacer Capture Complex of a Type II CRISPR-Cas System, 9 May 2019
2. Cas Mosterd: A short overview of the CRISPR-Cas adaptation stage, 9 June 2020
3. Wang et al.: Structural and Mechanistic Basis of PAM-Dependent Spacer Acquisition in CRISPR-Cas Systems, 5 November 2015
4. James K. Nuñez: Cas1–Cas2 complex formation mediates spacer acquisition during CRISPR–Cas adaptive immunity, 4 May 2014


Otangelo · 4 Interference: Cleaving DNA and RNA Invaders Wed Aug 17, 2022 1:16 pm


crRNA Biogenesis: Generating Guides for Cas Proteins

The hallmark of CRISPR-Cas defense is the utilization of crRNAs for sequence-specific targeting of invading genetic elements. The transcription start point of the precursor crRNA (pre-crRNA) usually lies within the leader sequence preceding the CRISPR array. The transcript is subsequently processed within the repeats to generate mature crRNAs, which are usually composed of a repeat segment that is recognized by Cas proteins in a structure- and/or sequence-dependent manner and a spacer portion that is important for target binding (Figure below).


The crRNA Maturation Pathways of Class 1 and Class 2 CRISPR-Cas Systems
In class 1 systems, the CRISPR array is transcribed yielding a long pre-crRNA. Cas6-family enzymes recognize the repeat structure and/or sequence and process the RNA into intermediate or mature crRNAs. In some cases (e.g., type I-A and type I-B), Cas6 acts as a dimer to process the unstructured pre-crRNA. crRNA maturation in class 2 systems differs significantly. In type II, tracrRNA (red) and pre-crRNA form duplexes, which are bound and stabilized by Cas9. This enables the processing by the host protein RNase III. The intermediate crRNA is further matured by an unknown RNase. Type II-C systems were described to employ an RNase-III-independent pathway. Here, promoter sequences within the repeats enable internal transcription and formation of mature crRNAs, which form crRNA:tracrRNA duplexes that bind Cas9. In type V-A and type VI, Cas12a and Cas13, respectively, recognize the structure and sequence of their repeats in order to cleave the pre-crRNA upstream of the stem structure. In type V-A, an additional uncharacterized processing event occurs. Processing by Cas13 of type VI yields mature crRNA. However, crRNA maturation is not an absolute requirement for interference in this system.

Class 1 crRNA Maturation
The process of crRNA maturation is very similar in type I and type III systems. Both typically employ Cas6 enzymes to specifically process the repeat within the pre-crRNA. A notable exception is the type I-C system, which does not encode a Cas6 homolog; here, Cas5d functionally replaces Cas6. The majority of type I pre-crRNAs harbor palindromic sequences within their repeats and are thus able to form stable stem-loop structures that are recognized by Cas6 or Cas5d (the suffix ‘d’ in Cas5d refers to ‘Dvulg’, the former name of this protein in the type I-C system). The nucleases cleave the RNA directly downstream of the hairpin, yielding mature crRNAs that are composed of a full spacer flanked by a short repeat-derived 5' handle and the 3' stem-loop. Most Cas6 enzymes remain bound to the crRNA after repeat cleavage and therefore act as scaffolds for the formation of Cascade. By contrast, all Cas6a homologs (type I-A) described so far and some Cas6b homologs (type I-B) release the crRNA after the processing event. Unlike those of the other subtypes, type I-A and I-B repeats are non-palindromic, and it was believed that in these cases Cas6 recognizes only the repeat sequence. However, recent studies revealed the significance of a stem-loop structure for repeat cleavage and suggest that Cas6 remodels the repeats to form the requisite stem-loop structure and reposition the cleavage site. Whereas most other Cas6 proteins are monomeric, Cas6 proteins in systems with non-palindromic crRNA repeats (mainly I-A and I-B) form dimers, raising the idea that dimerization might be related to the remodeling function of Cas6. As some repeat sequences in type III are either unstructured or form only weak stem-loops, they might also rely on Cas6 to remodel the crRNA. Supporting this assumption, type III Cas6 proteins show high sequence similarity to Cas6 homologs of type I-A and I-B.
In addition, the overall processing mechanism in type III is highly similar to that of type I-A and I-B: in these systems, Cas6 cleaves the pre-crRNA within the repeat region and the processed crRNA undergoes further trimming at the 3' end, thus removing the hairpin. Moreover, like the Cas6 homologs of type I-A and I-B, type III Cas6 proteins are not part of the interference complex. Homologs of Cas6 are not present in subtypes III-C and III-D; here, Cas5 proteins might be involved in pre-crRNA processing, comparable to type I-C. The mechanism of crRNA processing in the recently classified type IV awaits characterization. However, the presence of cas5 orthologs and cas6-like genes in this type suggests a processing mechanism similar to that of other class 1 types.

Class 2 crRNA Maturation
Class 2 systems co-opt the interference machinery and, in some cases, non-Cas proteins for crRNA maturation. Type II systems and type V-B systems require tracrRNA for CRISPR-mediated immunity. The effector protein of the respective type (for example, Cas9 of type II-A) binds and stabilizes the tracrRNA:crRNA duplex and further recruits the host protein RNase III for processing within the duplexed repeat. After a second cleavage by an unknown RNase, which removes the 5' repeat-derived tag, the effector complex composed of Cas9 and the tracrRNA:crRNA duplex is ready for interference. Type II-C systems of Neisseria meningitidis and Campylobacter jejuni also utilize a tracrRNA:crRNA duplex for target interference. In these systems, however, the repeats of the type II-C arrays were found to contain promoter elements that drive the transcription of individual crRNAs. These crRNAs can be processed by RNase III, but this is not a prerequisite for a functional interference complex. In type V and type VI systems, the effector proteins Cas12 and Cas13, respectively, possess dual nuclease activity for crRNA processing and target interference. In type V-A systems, Cas12a recognizes the repeat hairpin structure and cleaves within the repeat to generate crRNAs with 5' repeat-derived tags. For type V-B and type V-C systems, comprehensive data are still missing, but it seems that their effector proteins Cas12b and Cas12c process precursor crRNA and that the former also requires tracrRNA. Like Cas12a, the Cas13 effector protein of type VI systems does not require tracrRNA for crRNA processing. Cas13a recognizes the sequence and structure of the repeat within the pre-crRNA and cleaves upstream of the hairpin. Interestingly, crRNA maturation is not a strict requirement for target cleavage in type VI-A, as pre-crRNAs can also serve as functional guides. In type VI-B systems, the repeats can vary in length within one CRISPR array, while spacer size remains the same.
Therefore, Cas13b-mediated processing results in the generation of mature crRNAs that harbor a 30-nt spacer portion combined with either a 36-nt or an 88-nt repeat sequence. Both repeat architectures were shown to promote target cleavage by the effector.
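If a mature type VI-B crRNA is taken to consist only of the 30-nt spacer portion plus one of the two repeat-derived portions (an assumption for this back-of-the-envelope check), the two architectures give total lengths of:

```latex
\begin{aligned}
30 + 36 &= 66\ \text{nt} \\
30 + 88 &= 118\ \text{nt}
\end{aligned}
```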

crRNA Biogenesis
Mature crRNAs are key elements in CRISPR-Cas defense against genome invaders. These short RNAs are composed of unique repeat/spacer sequences that guide the Cas protein(s) to the cognate invading nucleic acids for their destruction. The biogenesis of mature crRNAs involves highly precise processing events. Interestingly, different types of CRISPR-Cas systems have distinct crRNA maturation mechanisms. The CRISPR repeat-spacer array is transcribed as a precursor CRISPR RNA molecule (pre-crRNA) that undergoes one or two maturation steps. In type I CRISPR-Cas systems, pre-crRNA is cleaved within the repeat regions by a specific Cas6-like endoribonuclease, which at least in some cases is a subunit of a Cascade complex, to yield the mature crRNAs. In type III systems, the standalone endoribonuclease Cas6 processes pre-crRNA by cleavage within the repeats, producing an intermediate molecule that is further trimmed to generate the mature crRNAs. Type II systems have a unique crRNA biogenesis pathway, in which a trans-acting small RNA (encoded by the CRISPR-Cas locus) base pairs with each repeat sequence of the pre-crRNA to form a double-stranded RNA template that is cleaved by the housekeeping endoribonuclease RNase III in the presence of the protein Cas9 (Csn1). The resulting intermediates then undergo further maturation by a mechanism that has yet to be revealed.

The core components of the CRISPR-Cas defense machinery are the short CRISPR RNAs (crRNAs) that associate with one or more Cas proteins to target and destroy invading nucleic acids. The CRISPR-Cas systems are extremely variable in their Cas gene composition; a recent reevaluation has resulted in a classification with three main CRISPR-Cas types that are further divided into subtypes. Despite the Cas diversification, all systems share a common molecular mechanism for genome silencing in which the mature crRNAs contain a unique invader-derived partial sequence that guides the Cas protein(s) to the cognate invading nucleic acids for their eventual destruction. Critical for the activity of CRISPR-Cas is the maturation of crRNAs from the precursor transcript of the CRISPR repeat-spacer array.

The biogenesis of mature crRNAs can be divided into three steps.

In the first step, transcription, a long primary transcript or precursor crRNA (pre-crRNA) is transcribed from a promoter located upstream of the leader preceding the CRISPR repeat-spacer array.
In the second step, cleavage, the pre-crRNA is cleaved at a specific site within the repeats to yield intermediate crRNAs that consist of the entire spacer sequence flanked by partial repeat sequences.
In some cases, a third step, processing, consists of a second nucleolytic cleavage of the intermediate crRNA that generates the active mature crRNAs.
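The three steps can be sketched as a toy string transformation on an array of the form repeat-spacer-repeat. The function `crrna_biogenesis` and its parameters are hypothetical: `cut_offset` stands in for the subtype-specific cleavage position within each repeat, the optional `trim_3prime` stands in for the second processing step whose machinery is still unknown, sequences are written with DNA letters for simplicity, and the leader and the flanks outside the first and last cuts are simply dropped.

```python
def crrna_biogenesis(repeat: str, spacers: list[str], cut_offset: int,
                     trim_3prime: int = 0) -> list[str]:
    """Toy model of crRNA biogenesis for an array R-S1-R-S2-...-R.

    Step 1 (transcription): build the pre-crRNA (leader omitted for simplicity).
    Step 2 (cleavage): cut every repeat copy `cut_offset` nt after its start.
    Step 3 (optional processing): remove `trim_3prime` nt from each 3' end.
    """
    pre_crrna = repeat + "".join(s + repeat for s in spacers)
    cuts, pos = [], 0
    for spacer in spacers:
        cuts.append(pos + cut_offset)       # cut inside the repeat copy starting at `pos`
        pos += len(repeat) + len(spacer)    # jump to the next repeat copy
    cuts.append(pos + cut_offset)           # cut inside the last repeat copy
    # Each piece between two cuts: 3' part of one repeat + spacer + 5' part of the next.
    intermediates = [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]
    if trim_3prime > 0:
        intermediates = [c[:-trim_3prime] for c in intermediates]
    return intermediates


print(crrna_biogenesis("GGGAAA", ["TTTT", "CCCC"], cut_offset=4))
# ['AATTTTGGGA', 'AACCCCGGGA']
print(crrna_biogenesis("GGGAAA", ["TTTT", "CCCC"], cut_offset=4, trim_3prime=3))
# ['AATTTTG', 'AACCCCG']
```

Leaving `trim_3prime` at zero corresponds to pathways in which the cleavage products are used without further maturation.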

The diversification of CRISPR-Cas into various (sub)types together with the large panel of distinct Cas proteins correlates with distinct types of crRNA biogenesis. A common theme among the subtypes is the (unidirectional) transcription of pre-crRNA followed by a first processing event within the repeats. In types I and III, a Cas6-like protein catalyzes this step (Fig. 5.1).


Comparison of crRNA processing pathways in type I, II, and III systems.
In the type I-E system, the palindromic repeats in pre-crRNA form hairpin structures that are recognized by the nuclease Cas6e (Cse3), which is an integral subunit of Cascade. After cleavage, the crRNA hairpin remains associated with Cas6e while other subunits bind the 5′ handle and the spacer, which is used for the recognition of cognate genetic element sequences. In type II systems, pre-crRNA with unstructured repeats is bound to an RNA species known as tracrRNA that is complementary to the repeat sequence, forming an RNA duplex that is recognized and cleaved by host RNase III in the presence of the Cas9 (Csn1) protein. Further processing by unknown nucleases generates mature crRNA. In type III-B systems, crRNA is generated by the Cas6 endonuclease (as mentioned for type I systems). Cas6 binds unstructured pre-crRNA, cleaving within the repeat to generate crRNA with 5′ and 3′ repeat-derived termini. These crRNAs are taken up by archaeal Cascade (homologous to a type I-A system) or alternatively loaded into the Cmr (type III-B) complex, when present. In the latter case, the 3′ repeat-derived sequence is trimmed away by unknown nucleases. The recently described Cas5d endoribonuclease of subtype I-C, which also cleaves pre-crRNA within the repeats and assembles into a Cascade-like complex (Nam et al. 2012), is not represented here.

In type II, a trans-acting small RNA directs pre-crRNA dicing by housekeeping RNase III-mediated cleavage within the repeats in the presence of Cas9 (Csn1) (Fig. above). The processed crRNAs of type I subtypes (I-A, I-E, I-F) do not seem to undergo further maturation, whereas types II and III (and possibly some type I subtypes) have a second maturation step to produce the active crRNAs, whose components and mechanisms are yet to be determined (Fig. above).

crRNA Biogenesis in Type I Systems
Type I systems are present in both bacteria and archaea. Like all CRISPR-Cas systems, type I systems are predicted to target mobile genetic sequences. Experimental evidence has been provided for spacer acquisition in Escherichia coli (subtype I-E) and for the correlating resistance against plasmids and phages. In Pseudomonas aeruginosa, the system (subtype I-F) is required for the inhibition of biofilm formation that depends on an integrated bacteriophage, and its role in resistance against phage is yet to be demonstrated. Type I systems are characterized by a Cascade(-like) ribonucleoprotein complex and a nuclease/helicase (Cas3) required for interference. Processing of the pre-crRNA transcript is catalyzed by a Cas6-like, metal-independent endoribonuclease that cleaves the repeat sequence at a conserved position 8 nt upstream of the repeat-spacer boundary. The mature crRNAs end up in Cascade, where they play the crucial role of guiding the complex to the complementary target DNA. In most type I systems characterized so far, the Cas6-like enzyme is a subunit of a Cascade-like complex, which is distinct from the apparently standalone version of Cas6 that may supply intermediate or mature crRNAs to different complexes in type III systems. The crRNAs of subtypes I-E and I-F have stable hairpin structures, whose functions might be to initially expose the cleavage site to the Cas6 catalytic domain and subsequently to assist the stable interaction between the guide crRNA and Cascade. Following Cas6-mediated cleavage within the repeats, crRNAs of subtypes I-A, I-E, and I-F are not processed any further.
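The arithmetic of Cas6 processing can be made concrete with a toy model. This is a minimal sketch, assuming a type I-E-like array with 29-nt repeats and 32-nt spacers (the sizes given for the E. coli locus later in this thread) and a cut 8 nt upstream of each repeat-spacer boundary, as described above. The repeat sequence is modeled on the E. coli I-E repeat and, like the poly-A/poly-C spacers and the function name `process_pre_crrna`, should be treated as illustrative rather than annotated data. Each processed crRNA then carries an 8-nt 5' handle, the full spacer and a 21-nt 3' handle: 8 + 32 + 21 = 61 nt, the mature crRNA length quoted for E. coli in the type I interference section.

```python
# Toy model of Cas6-mediated pre-crRNA processing in a type I-E-like system.
# Assumptions (illustrative only): uniform 29-nt repeats, 32-nt spacers, and a
# cut 8 nt upstream of each repeat-spacer boundary. Sequences use DNA letters.

REPEAT = "GAGTTCCCCGCGCCAGCGGGGATAAACCG"  # 29 nt, modeled on the E. coli I-E repeat
HANDLE_5P = 8  # repeat-derived nucleotides left on the 5' side of each spacer


def process_pre_crrna(pre_crrna: str, repeat: str = REPEAT) -> list[str]:
    """Cut inside every repeat, HANDLE_5P nt before its 3' end, and return the
    fragments between consecutive cuts (5' handle + spacer + 3' handle)."""
    cut_offset = len(repeat) - HANDLE_5P  # cut position within each repeat
    cuts = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cuts.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical array: repeat-spacer-repeat-spacer-repeat.
spacers = ["A" * 32, "C" * 32]
pre_crrna = REPEAT + spacers[0] + REPEAT + spacers[1] + REPEAT
crrnas = process_pre_crrna(pre_crrna)

for crrna in crrnas:
    # length, 8-nt 5' handle, 21-nt 3' handle
    print(len(crrna), crrna[:HANDLE_5P], crrna[-21:])
```

The outermost repeat halves (before the first cut and after the last) are discarded here, which is a simplification: the sketch only tracks the complete repeat-spacer-repeat units.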

crRNA Biogenesis in Type II Systems
Type II CRISPR-Cas systems are characterized by a minimal locus with only four genes (cas9, cas1, cas2, and either csn2 or cas4) and by the presence of tracrRNA in the vicinity of the cas operon or repeat-spacer array. Type II systems are present in bacteria but, to date, have never been detected in archaea. The system has been studied mainly in streptococci, where the first biological evidence was obtained for immunity against both cell death (mediated by lytic phages in Streptococcus thermophilus) and acquisition of virulence genes (mediated by lysogenic bacteriophages in Streptococcus pyogenes). Type II systems also act against plasmid maintenance. In 2011, a study in the Gram-positive human pathogen S. pyogenes revealed a unique crRNA biogenesis pathway characteristic of type II, in which a first processing event is achieved by the coordinated action of three factors: a trans-acting small RNA, the host-encoded RNase III, and the Cas9 protein.

crRNA Biogenesis in Type III Systems
Type III CRISPR-Cas systems are present in both bacteria and archaea. This variant has mainly been studied in the archaeon P. furiosus (subtype III-B). In addition, crRNA biogenesis has recently been investigated in the Gram-positive bacterial pathogen Staphylococcus epidermidis (subtype III-A). In archaeal species, subtype III-B spacers are predicted to target viruses, although no in vivo experiment has yet proven that the system limits virus propagation. However, targeting of a small RNA antisense to pre-crRNA was recently demonstrated in P. furiosus. In S. epidermidis, subtype III-A was shown to be critical for limiting the horizontal dissemination of antibiotic resistance by directly targeting invading conjugative plasmid DNA. The hallmark of crRNA production in type III is the protein Cas6, which is also present in type I. As mentioned above, in type I systems, Cas6-like endoribonucleases are either integral components of the Cascade complexes (for example, Cas6e and Cas6f in E. coli and P. aeruginosa, respectively) or are weakly associated with the complex (for example, Cas6 in the S. solfataricus aCascade). In contrast, Cas6 of subtype III-B seems to function as a standalone CRISPR repeat RNA-specific endoribonuclease in P. furiosus, S. solfataricus, and presumably in other type III systems of many archaea and possibly bacteria. crRNA maturation in type III occurs in two steps. A first processing event dices pre-crRNA by Cas6-mediated cleavage within the repeats to generate 1X intermediate units (one spacer with its flanking repeat portions) that undergo further maturation to produce the active mature crRNAs. Another feature of type III CRISPR-Cas systems is the presence of csm and cmr genes encoding repeat-associated mysterious proteins (RAMPs) in subtypes III-A and III-B, respectively.
The functions of these Cas proteins remain to be clarified, although some recent studies have indicated that they may function in crRNA biogenesis and/or targeting of invading nucleic acids (DNA in the case of subtype III-A and RNA in the case of subtype III-B).

Interference: Cleaving DNA and RNA Invaders

Sequence-specific destruction of invading MGEs is the basis of CRISPR-Cas defense. In the final stage of CRISPR-Cas-mediated immunity, mature crRNAs guide the interference machinery to cleave invading nucleic acids. To store the genetic information of a parasitic MGE, a part of the foreign DNA must be integrated into the genomic CRISPR locus of the host. This, however, raises an inherent problem for the interference machinery: relying solely on sequence complementarity between the crRNA and the target sequence would result in cleavage of the CRISPR array itself, which also contains the spacer. Hence, nearly all characterized CRISPR-Cas systems (except type III) have an authentication and discrimination mechanism that involves coordinated recognition of a short sequence, called the protospacer adjacent motif (PAM), by both the adaptation and the interference machinery. The presence of a PAM next to the protospacer from which a spacer was acquired, and its absence in the CRISPR array, enables robust immunity while averting autoimmune targeting of the CRISPR array.
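The logic of PAM-based discrimination can be sketched as a simple sequence check. This is a minimal sketch under stated assumptions: the PAM "AAG", all sequences and the function name `has_licensed_target` are hypothetical; only one DNA strand is scanned; and the PAM side is a parameter because type I systems expect the PAM 5' of the protospacer (on the non-target strand), whereas Cas9 of type II expects it 3'. The point is only that the same spacer counts as a target in the invader, where a PAM flanks it, but not in the CRISPR array, where repeat sequence flanks it.

```python
# Toy check of PAM-based self/non-self discrimination (type I/II logic).
# Assumptions: one strand is scanned; the PAM "AAG" and all sequences are
# hypothetical placeholders; pam_side says on which side of the protospacer
# the PAM must sit (5' for type I, 3' for Cas9 of type II).


def has_licensed_target(dna: str, spacer: str, pam: str, pam_side: str = "5prime") -> bool:
    """True if `spacer` occurs in `dna` with `pam` immediately adjacent on `pam_side`."""
    start = dna.find(spacer)
    while start != -1:
        if pam_side == "5prime":
            flank = dna[max(0, start - len(pam)):start]
        else:
            end = start + len(spacer)
            flank = dna[end:end + len(pam)]
        if flank == pam:
            return True
        start = dna.find(spacer, start + 1)
    return False


spacer = "GTCAGTCAGGATCCTTGA"                      # hypothetical spacer
invader = "TTTT" + "AAG" + spacer + "CCCC"         # protospacer with a PAM next to it
crispr_array = "GATAAACCG" + spacer + "GAGTTCC"    # same spacer, flanked by repeat

print(has_licensed_target(invader, spacer, "AAG"))       # True: attack the invader
print(has_licensed_target(crispr_array, spacer, "AAG"))  # False: spare the own array
```

Real PAM recognition is protein-DNA contact rather than string equality, and several PAM variants are often tolerated; the sketch keeps only the all-or-nothing rule.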

Interference in Class 1 CRISPR-Cas Systems

Type I
Type I systems are the most widespread CRISPR-Cas systems and employ a crRNA-bound multiprotein complex termed CRISPR-associated complex for antiviral defense (Cascade) for target recognition, as well as the nuclease Cas3 for target cleavage (Figure below).

The Interference Pathways of Class 1 CRISPR-Cas Systems
Two general pathways for class 1 interference exist. In the first pathway, exemplified by the type I-E system of E. coli, crRNA bound by Cas6 serves as a scaffold for Cascade assembly. Cascade first recognizes the PAM (yellow) on the invader DNA. R-loop formation is induced when the crRNA base pairs with the target strand of the DNA. The presence of the R-loop triggers the recruitment of the endonuclease Cas3, which initiates degradation of the non-target strand. In the second pathway, type III systems, like type I systems, form multi-Cas-protein complexes for interference (Csm and Cmr for type III-A and type III-B, respectively) using the crRNA as a scaffold. Type III-A is shown here as an example. Unlike in type I, Cas6 of type III is not an integral part of the interference complex. The crRNA within the type III complexes binds to complementary regions in target RNA transcripts. Binding triggers Cas10-mediated cleavage of the corresponding DNA, after which Cas7 (Csm3) cleaves the transcript RNA. Upon target binding, Cas10 also generates cyclic oligoadenylates, which activate the RNase Csm6 to degrade RNA non-specifically.

Cas3 is the hallmark protein of type I systems and is recruited upon target binding by Cascade to cleave the foreign DNA. Although the overall architecture of Cascade is conserved, its composition can vary between subtypes, and homology of the subunits has often been established on the basis of functional rather than sequence similarities. Among the seven subtypes identified to date (I-A to I-F and I-U), the I-E system of Escherichia coli is the most thoroughly characterized and has the full complement of subunits found in type I systems, thus serving as a model for understanding type I interference. Cascade of the type I-E CRISPR-Cas system has a molecular weight of 405 kDa and the following composition: (Cas5e)1-(Cas6e)1-(Cas7e)6-(Cas8e)1-(Cas11e)2. In the former nomenclature, Cas8e and Cas11e were known as Cse1 and Cse2, respectively. In almost all type I systems, pre-crRNA is processed by an RNase of the Cas6 family (or Cas5d in subtype I-C). In E. coli, pre-crRNA processing by Cas6e yields 61-nt-long mature crRNAs that encompass the full spacer sequence, flanked on both sides by repeat portions (see crRNA Biogenesis in Type I Systems above). Cas6e remains bound to the 3' repeat portion of the crRNA after processing. Cascade then assembles into a seahorse-like shape. The crRNA is an integral part of Cascade; it is bound along the backbone of the complex and capped by Cas5e at the 5' end. The helical backbone is composed of six tightly connected Cas7e proteins that adopt hand-like shapes, with the thumb domains responsible for the tight connection of the subunits. Starting from the last nucleotide of the 5' repeat handle, the thumbs of the subunits kink the crRNA at every sixth nucleotide. The five nucleotides in between are aligned along the palm domain to enable efficient base pairing of the crRNA with the target DNA.
Cas11e and Cas8e are defined as the small and large subunits of Cascade, respectively. Two Cas11e subunits interact directly with the Cas7e backbone and form the "belly" of the complex; Cas8e interacts with Cas5e, Cas7e, and Cas11e and forms the tail. Recognition of the PAM in the double-stranded target DNA is mediated by the large subunit, which also initiates local unwinding of the DNA and the subsequent binding of the crRNA to the complementary (target) strand of the protospacer. Crucial for protospacer binding by the Cascade complex are the first eight PAM-proximal nucleotides of the crRNA (termed the seed sequence), with the exception of the sixth nucleotide, which does not bind to the target. Mutations in the seed sequence greatly impair the binding of Cascade to the target in E. coli. The non-target strand is bound by the two Cas11e subunits, leading to the formation and stabilization of the so-called R-loop structure, which is accompanied by substantial conformational changes of the small and large subunits and thus allows recruitment of the nuclease Cas3 for target cleavage. The HD domain of Cas3 nicks the displaced non-target DNA strand, inducing structural changes in the protein that activate its ATP-dependent helicase activity. As a result, Cas3 translocates along and successively degrades the non-target DNA strand in the 3' to 5' direction, leaving a single-stranded DNA (ssDNA) gap of roughly 200–300 nt in the target genome. This, however, might be an intermediate degradation product, as partially single-stranded DNA might not lead to full destruction of the invader. Complete degradation of the target DNA is thought to be mediated either by other host nucleases or by the potent, Cascade-independent ssDNA nuclease activity of Cas3 that has been observed in vitro. Although the overall structure of Cascade and the involvement of Cas3 are conserved, there are several notable subtype-specific differences in the type I interference machinery. Several subtypes lack certain Cascade subunits found in type I-E. In fact, subtypes I-A and I-E are the only systems that harbor a separate gene for the small subunit. In the other subtypes, the small subunit is either fused to or functionally replaced by Cas8. An even more minimal Cascade architecture is seen in type I-C, which lacks a Cas6 homolog, and in type I-Fv (a variant of type I-F), where the large and small subunits are absent and functionally replaced by Cas5fv and Cas7fv. An interesting variation in the overall shape of Cascade was found in type I-F, in which the backbone of the surveillance complex (known as Csy in this subtype) has a short helical pitch and almost forms a closed ring but subsequently "unwinds" upon target DNA recognition. Although Cas3 is the signature protein of type I systems, fusion or fission of the cas3 gene is seen in several subtypes.
Collectively, these studies suggest that there is significant genetic and functional plasticity in the components of the type I interference machinery but that the overall architecture and modules for crRNA binding and processing (Cas6 and/or Cas5), the backbone (Cas7), PAM-recognition and R-loop stabilization, and target cleavage (Cas3) are conserved. The seed sequence is crucial for type I-E and I-F interference and is therefore likely another common feature among type I systems. High-resolution structures and detailed insight into target recognition and cleavage are still awaited for subtypes I-A, I-D and I-U. However, given the presence of genes encoding the core functional units of the interference machinery, it is likely that they follow the same principles that have been established by investigation of interference in the other type I systems.
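The seed rule described above for type I-E Cascade (PAM-proximal positions 1 to 8, except position 6, which does not pair with the target) can be written as a short check. This is a minimal sketch, assuming that the spacer and the protospacer (non-target strand) are compared in the same 5'-to-3' orientation, so that identity stands in for complementarity to the target strand, and that position 1 is the spacer nucleotide next to the 5' handle (the PAM-proximal end in type I). The names `seed_mismatches` and `mutate` are hypothetical.

```python
# Toy seed check for type I-E Cascade: PAM-proximal crRNA positions 1-8 form the
# seed, except position 6, which does not pair with the target.
# Assumption: spacer and protospacer are compared in the same 5'->3' orientation,
# so identity stands in for complementarity to the target strand; position 1 is
# the spacer nucleotide next to the 5' handle (PAM-proximal end in type I).

SEED_POSITIONS = [p for p in range(1, 9) if p != 6]  # 1-based: 1-5, 7, 8


def seed_mismatches(spacer: str, protospacer: str) -> list[int]:
    """Return the 1-based seed positions where spacer and protospacer differ."""
    return [p for p in SEED_POSITIONS if spacer[p - 1] != protospacer[p - 1]]


def mutate(seq: str, pos: int, base: str) -> str:
    """Hypothetical helper: replace the base at 1-based position `pos`."""
    return seq[:pos - 1] + base + seq[pos:]


spacer = "ACGTACGTACGTACGT"  # hypothetical spacer
for pos, base in [(3, "A"), (6, "T"), (12, "A")]:
    target = mutate(spacer, pos, base)
    # An empty list means the seed is intact in this toy model.
    print(pos, seed_mismatches(spacer, target))
```

In this toy model, a mismatch at position 6 or outside the seed leaves the seed intact; real Cascade binding also depends on PAM recognition and PAM-distal pairing, which the sketch ignores.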

Interference in Class 2 CRISPR-Cas Systems

Type II
Cas9 is a dual RNA-guided DNA endonuclease that is required for interference and immunity in type II systems (Figure below).

The Interference Pathways of Class 2 CRISPR-Cas Systems
In class 2 systems, interference is accomplished by a single effector protein. In the three well-characterized examples of class 2 interference (type II, type V-A, and type VI), the effector proteins (Cas9, Cas12a, and Cas13, respectively) participate in crRNA maturation and are therefore already bound to the guide RNA prior to target selection and cleavage. During type II interference, Cas9 in complex with tracrRNA (red) and crRNA interrogates the target DNA for the correct PAM (yellow) sequence before probing for complementarity to the crRNA. Base pairing of the crRNA to the target strand induces an R-loop structure that finally triggers cleavage of the target and non-target strands by the HNH and RuvC domains, respectively, to yield a blunt double-strand break in the DNA. Type V systems utilize Cas12 proteins for interference. Cas12a of the type V-A system does not require tracrRNA. Following PAM recognition, the RuvC domain of Cas12a cleaves the target sequence at the PAM-distal end, yielding a staggered cut. The tracrRNA requirement and mechanisms may differ for other Cas12 proteins. In type VI, Cas13 is guided by the crRNA to target complementary ssRNAs. Target binding requires a protospacer flanking site (PFS, yellow) and induces conformational changes of Cas13, resulting in an activated catalytic site within the two HEPN domains of the protein. In the activated state, Cas13 acts as an RNase, indiscriminately cutting any exposed RNA, including the target RNA, resulting in global RNA degradation.

The type II-A, II-B, and II-C systems are differentiated on the basis of the size of the cas9 gene and the presence of subtype-specific genes. In addition to crRNA, Cas9 requires trans-activating crRNA (tracrRNA), a small RNA that bears complementarity to the repeat regions of crRNA. Once bound to the mature dual RNA (tracrRNA:crRNA) or to the engineered single-guide RNA (sgRNA) chimera developed for genome engineering applications, Cas9 identifies target DNA through PAM recognition and subsequent base pairing of the guide RNA with the DNA. If the target displays sufficient complementarity to the RNA guide, Cas9 generates a blunt double-strand break 3 bp upstream of the PAM. Cas9 has a bilobed structure with a central cleft that accommodates the crRNA:DNA duplex. The α-helical recognition (REC) lobe and the nuclease (NUC) lobe are joined by a disordered linker and by the highly conserved arginine-rich bridge helix, which forms multiple contacts with the sgRNA. The NUC lobe contains the conserved HNH and RuvC nuclease domains and a variable C-terminal domain that interacts with the PAM. Detailed structural analysis of Cas9 in inactive and active nucleic-acid-bound states, together with numerous biochemical studies, has substantially advanced our understanding of the interference mechanism in type II systems, and this work is the subject of several in-depth reviews. Structural studies have confirmed that guide RNA binding regulates Cas9 activity by inducing a large conformational rearrangement in the protein. This rearrangement orders the PAM-interacting residues and the seed sequence of the guide RNA, rendering the protein competent for DNA binding and PAM recognition. The guide RNA-bound surveillance complex scans the DNA and, upon recognition of its cognate PAM in the non-target strand, induces local unwinding of the DNA duplex so that the guide RNA can probe for complementarity of the 10- to 12-nt seed sequence in the PAM-proximal region of the target strand.
Base pairing between the guide RNA and the target DNA, together with additional conformational changes in Cas9, promotes further invasion of the guide RNA beyond the seed sequence, thus stabilizing the R-loop structure. PAM-distal complementarity and divalent cations are necessary for conformational activation of the HNH domain into the cleavage-competent state. Conformational activation of the HNH domain is coupled to rearrangements of the linker loops between the HNH and RuvC domains. This allosteric communication between the nuclease domains results in concerted cleavage of the target strand by the HNH domain and of the non-target strand by the RuvC domain.
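Target selection by Cas9 and the position of its blunt cut (3 bp upstream of the PAM, as stated above) can be sketched as a pattern search. This is a minimal sketch, assuming the widely used S. pyogenes Cas9 with a 20-nt guide and a 5'-NGG-3' PAM located 3' of the protospacer on the scanned (non-target) strand; the text above does not specify the PAM, and the guide sequence and the function name `cas9_cut_sites` are hypothetical. Only one strand is scanned, and seed and PAM-distal mismatch tolerance is ignored.

```python
import re

# Toy Cas9 target finder. Assumptions: S. pyogenes-style 5'-NGG-3' PAM located
# 3' of the protospacer on the scanned (non-target) strand; only that strand is
# scanned; perfect guide complementarity is required. Cas9 makes a blunt cut
# 3 bp upstream (5') of the PAM.


def cas9_cut_sites(dna: str, guide: str) -> list[int]:
    """Return indices i such that the blunt cut falls between dna[i-1] and dna[i]."""
    pattern = "(?=" + re.escape(guide) + "[ACGT]GG)"  # lookahead allows overlapping hits
    return [m.start() + len(guide) - 3 for m in re.finditer(pattern, dna)]


guide = "GTCACCTCCAATGACTAGGG"            # hypothetical 20-nt guide (DNA letters)
target = "TTTT" + guide + "TGG" + "AAAA"  # protospacer followed by a TGG PAM
no_pam = "TTTT" + guide + "TCC" + "AAAA"  # same protospacer, no valid PAM

sites = cas9_cut_sites(target, guide)
print(sites)                                 # [21]
print(target[:sites[0]], target[sites[0]:])  # the two blunt-ended halves
print(cas9_cut_sites(no_pam, guide))         # []
```

The lookahead pattern is a design choice so that overlapping candidate sites in repetitive DNA are all reported rather than skipped.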

Type III
Type III CRISPR-Cas systems employ Cascade-like complexes (termed Csm for III-A and Cmr for III-B) that display high similarity to type I effector complexes in their overall composition and structure (Figure above). However, in contrast to other described interference mechanisms, type III systems target both RNA and DNA substrates, and DNA cleavage strictly depends on transcription of the target sequence. In the following, we describe the type III-A and III-B complexes according to the new nomenclature presented by Koonin et al. (2017); the protein names according to the former nomenclature are given in parentheses. Similar to Cascade, the Csm and Cmr complexes assemble along the mature crRNA, which is bound by Cas5 (Csm4/Cmr3) at the 5' repeat handle. The backbone is composed of Cas7-family proteins (Csm3 and Csm5 for type III-A; Cmr4, Cmr6, and Cmr1 for type III-B), while Cas11 (Csm2/Cmr5) and Cas10 are the small and large subunits, respectively. Target cleavage is initiated by binding of the type III effector complex to the nascent target transcript in a crRNA-dependent manner. The Cas7 subunits (Csm3/Cmr4) cleave the ssRNA at every sixth nucleotide. DNA cleavage is carried out by the HD domain of the Cas10 subunit and strictly requires transcription of the target in both type III systems. RNases of the Csm6 or related Csx1 families are frequently associated with type III CRISPR-Cas systems. Both Csm6 and Csx1 nonspecifically degrade foreign transcripts and have auxiliary or sometimes essential functions during type III interference, even though they are not part of the effector complex. Recently, it was revealed that the Cas10 subunit of the Csm complex not only mediates target DNA cleavage but also converts ATP into cyclic oligoadenylates that act as second messengers to activate the Csm6 RNase.
Production of this messenger by Cas10 depends on binding of the Csm complex to the target RNA and thus constitutes a regulatory mechanism that triggers robust interference in the presence of an invader. In most type III systems, base pairing of the 5' repeat portion (5' tag) of the crRNA with the target inhibits DNA cleavage and serves as a PAM-independent mechanism of self- versus non-self discrimination. However, the necessity of a PAM sequence (rather than the 5' tag complementarity check) was revealed for the type III-B system of Pyrococcus furiosus. Here, the so-called RNA-PAM (rPAM) is located 3' of the crRNA-complementary sequence on the target RNA and was shown to be crucial for DNA cleavage by the Cmr complex. In contrast, a recent study on the type III-A system of S. epidermidis found no evidence that a PAM or rPAM is necessary in this system, indicating subtype- or species-related differences in self- versus non-self discrimination in type III systems. Subtypes III-C and III-D were identified only recently, and little is known about their interference mechanisms. Despite the absence of the conserved adaptation module (the genes cas1 and cas2) and sequence divergence in their cas10 genes, their overall genetic composition is comparable to that of the well-characterized subtypes III-A and III-B, suggesting similar interference mechanisms. Type IV systems are also categorized as class 1 CRISPR-Cas systems and harbor genes resembling cas5, cas7, and cas8. However, comprehensive data on type IV-mediated CRISPR-Cas immunity are still missing.
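The PAM-independent rule described above for most type III systems (DNA cleavage is blocked when the crRNA 5' tag pairs with the target) can be sketched as follows. This is a minimal sketch, assuming hypothetical RNA sequences, an 8-nt tag, and all-or-nothing complementarity, whereas real complexes respond to partial pairing at particular tag positions; it does not model the rPAM requirement reported for P. furiosus. The names `revcomp` and `dna_cleavage_licensed` are hypothetical. Because the crRNA reads 5'-tag-spacer-3' and pairs antiparallel, the tag faces the region just 3' of the protospacer on the target RNA.

```python
# Toy model of type III self/non-self discrimination by 5' tag complementarity.
# Assumptions: all sequences are hypothetical RNA (ACGU); complementarity is
# treated as all-or-nothing, whereas real complexes respond to partial pairing.

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def dna_cleavage_licensed(target_rna: str, tag: str, spacer: str) -> bool:
    """True if the crRNA (tag + spacer) finds its target and the target's 3' flank
    does NOT pair with the 5' tag (non-self); False otherwise."""
    protospacer = revcomp(spacer)      # region of the target that pairs with the spacer
    i = target_rna.find(protospacer)
    if i == -1:
        return False                   # no target bound, so no Cas10 activation
    flank_start = i + len(protospacer)
    flank = target_rna[flank_start:flank_start + len(tag)]
    return flank != revcomp(tag)       # full tag pairing signals "self"


tag = "AUUGAAAG"                       # hypothetical 8-nt 5' repeat tag
spacer = "GCAUCGGAUCCAUGCA"            # hypothetical spacer
invader_rna = "AAAA" + revcomp(spacer) + "CCCCCCCC"        # foreign transcript
antisense_array = "AAAA" + revcomp(spacer) + revcomp(tag)  # RNA antisense to the array

print(dna_cleavage_licensed(invader_rna, tag, spacer))      # True
print(dna_cleavage_licensed(antisense_array, tag, spacer))  # False
```

The antisense case mirrors the situation mentioned earlier for P. furiosus, where an RNA antisense to pre-crRNA can be bound; in this toy model it is recognized as "self" because the flanking repeat sequence pairs with the tag.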

Type V
Type V CRISPR-Cas systems are divided into subtypes V-A, V-B, and V-C that are characterized by effector proteins Cas12a (formerly called Cpf1), Cas12b (C2c1), and Cas12c (C2c3), respectively. Phylogenetic analysis and the low amino acid similarity of these proteins to one another and to Cas9 suggest that they all evolved independently from distinct transposon-associated nucleases of the TnpB family. Whereas Cas12c awaits detailed characterization, the structure and activity of Cas12a and Cas12b have been recently investigated and are described below. Furthermore, several loci tentatively annotated as type V-U lack the cas1-cas2 adaptation module but encode small putative effector proteins containing RuvC-like and zinc finger motifs whose potential regulatory or defense functions will require further investigation. Unlike Cas9 and Cas12b, Cas12a of type V-A does not require tracrRNA for activity (Figure above). After PAM recognition and sufficient base pairing between the crRNA and target DNA, Cas12a and Cas12b cleave both DNA strands, resulting in staggered double-stranded breaks with 5- and 7-nt overhangs distal to the PAM, respectively. In contrast to type II systems, which utilize diverse PAMs that are located on the non-target strand, Cas12 proteins recognize the PAM on both DNA strands, with the non-target PAM sequence being T-rich. Interestingly, Cas12b does not possess a PAM recognition domain like Cas9 or Cas12a. Moreover, Cas12a and Cas12b require a seed sequence of approximately 18 nt, implying that they could be highly specific alternatives to Cas9 for genome editing applications. Cas12a and Cas12b share the bilobed structure of Cas9, composed of the REC and NUC lobes. In Cas12a, cleavage of both DNA strands occurs in a single catalytic site of the RuvC domain. How both strands are positioned in the catalytic site and whether they are cleaved successively will require further investigation. 
Structures of Cas12b ternary complex with extended target and non-target DNA strands suggest that both strands can be positioned in the same RuvC catalytic pocket and that at least the target DNA strand is cleaved by the RuvC domain. Details of the catalytic reaction remain unknown, but it is plausible that Cas12a and Cas12b utilize a similar mechanism.
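The difference between the blunt Cas9 cut and the staggered Cas12 cut described above comes down to where each strand is cut relative to the PAM. This is a minimal sketch, assuming that the non-target strand is written 5' to 3' starting with a T-rich, TTTV-like PAM (Cas12a PAMs sit 5' of the protospacer) and that the non-target and target strands are cut 18 and 23 nt 3' of the PAM. These offsets are values commonly reported for Cas12a and are used here only for illustration; the sequences and the function name `overhang` are hypothetical.

```python
# Toy model of a staggered Cas12a cut versus a blunt cut.
# Assumptions: the non-target strand is written 5'->3' and starts with the PAM
# (Cas12a PAMs sit 5' of the protospacer); cut offsets count bases 3' of the PAM.
# Default offsets (18 on the non-target strand, 23 on the target strand) are
# commonly reported Cas12a values and are used here only for illustration.


def overhang(nontarget: str, pam_len: int, nt_cut: int = 18, t_cut: int = 23) -> str:
    """Return the single-stranded overhang (read on the non-target strand) left
    between the two strand cuts; an empty string means a blunt end."""
    lo, hi = sorted((nt_cut, t_cut))
    return nontarget[pam_len + lo:pam_len + hi]


pam = "TTTA"                                             # hypothetical TTTV-like PAM
protospacer = "ACGTACGTACGTACGTAC" + "GGCCA" + "TTTTT"   # hypothetical sequence
nontarget = pam + protospacer

sticky = overhang(nontarget, len(pam))
print(sticky, len(sticky))                          # GGCCA 5
print(repr(overhang(nontarget, len(pam), 17, 17)))  # '' means a blunt end
```

With equal offsets on both strands the function returns an empty string, the blunt-end case of Cas9; with offsets that differ by 7, it would return a 7-nt overhang like the one described for Cas12b.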

Type VI
Recent computational searches led to the identification of type VI systems, which are defined by the presence of proteins containing two RxxxxH motifs. These motifs are characteristic of higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains, which are commonly found in RNases. Type VI systems encode the HEPN-containing effector protein Cas13, which, unlike other class 2 effectors, cleaves ssRNA (Figure above). Cas13 is activated by "target" ssRNAs complementary to the crRNA to degrade not only the target ssRNA but also collateral ssRNAs, similar to the Csm6 and Csx1 enzymes of type III systems. Although Cas13 enzymes can, in principle, cleave any ssRNA by employing the conserved arginine and histidine residues within the two HEPN domains of the NUC lobe, Cas13a subfamilies display divergent preferences for either uridine or adenine 5' to the scissile bond. Cas13 enzymes are programmed by crRNA and activated by target ssRNA but do not require tracrRNA. Cas13a tolerates peripheral mismatches in the crRNA:target ssRNA duplex but requires a central seed sequence for RNase activity. In addition, a non-G protospacer flanking site (PFS) 3' of the target sequence is important for activation of Leptotrichia shahii Cas13a but might differ in other species. In contrast, Cas13b activity requires a PFS on each side of the protospacer; the 5' PFS is non-C, whereas NAN or NNA is preferred at the 3' PFS. Cas13a and Cas13b also process the repeat regions of pre-crRNA, but biochemical and structural studies suggest distinct active sites in Cas13a for RNA-activated RNA degradation and for pre-crRNA processing. Furthermore, crRNA maturation does not appear to be an absolute requirement for interference in type VI systems. Upon binding to (pre-)crRNA, Cas13a undergoes conformational changes that stabilize the crRNA and facilitate target binding.
Binding of target ssRNA activates the RNase activity of Cas13a by inducing further conformational changes that bring the catalytic sites of the HEPN domains into close proximity. In contrast to the internal active sites of other class 2 effectors, the two HEPN domains of Cas13a form a composite active site at the external surface of the enzyme that is proposed to account for non-specific RNA degradation. When expressed heterologously in E. coli with phage-specific crRNAs, Cas13a and Cas13b can confer protection against ssRNA phages, potentially by targeted degradation of the phage genome and/or mRNAs. However, the indiscriminate RNase activity of Cas13 enzymes leads to restriction of bacterial growth, implying that type VI might degrade host RNAs to induce death or dormancy of infected cells. The balance between death versus dormancy and between self- versus non-self-targeting in type VI interference might be determined by the intrinsic activity of Cas13, the relative phage load, or by inhibitors or activators of Cas13, such as the recently identified Csx27 and Csx28 proteins.
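The switch-like behavior of Cas13a described above (inactive until a matching target with a permissive PFS is bound, then cleaving RNA indiscriminately) can be sketched as a two-state object. This is a minimal sketch, assuming hypothetical RNA sequences, full complementarity between spacer and target, and the L. shahii Cas13a rule quoted above that the base 3' of the protospacer must not be G. Once active, every RNA in the pool is treated as degraded, a deliberate simplification of real cleavage kinetics. The class `ToyCas13a` and its methods are hypothetical.

```python
from dataclasses import dataclass

# Toy model of Cas13a activation and collateral RNA cleavage.
# Assumptions: hypothetical RNA sequences; activation requires full
# complementarity to the spacer plus a non-G base 3' of the protospacer
# (the L. shahii Cas13a PFS rule); once active, every RNA in the pool is
# treated as degraded, which simplifies real cleavage kinetics.

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


@dataclass
class ToyCas13a:
    spacer: str           # guide region of the crRNA, 5'->3'
    active: bool = False  # composite HEPN active site formed?

    def sense(self, rna: str) -> None:
        """Switch to the active state if `rna` is a licensed target."""
        protospacer = revcomp(self.spacer)
        i = rna.find(protospacer)
        if i == -1:
            return
        pfs = rna[i + len(protospacer):i + len(protospacer) + 1]
        if pfs != "G":
            self.active = True

    def degrade(self, pool: list[str]) -> list[str]:
        """Return surviving RNAs: all of them while inactive, none once active."""
        return [] if self.active else list(pool)


cas13 = ToyCas13a(spacer="GAUCCAUGCAUGGCUA")       # hypothetical guide
host_rnas = ["AUGGCUAAC", "GGCAUCCUU"]              # hypothetical host transcripts

cas13.sense("AAA" + revcomp(cas13.spacer) + "GAA")  # G in the PFS: no activation
print(cas13.active, cas13.degrade(host_rnas))
cas13.sense("AAA" + revcomp(cas13.spacer) + "AAA")  # non-G PFS: activation
print(cas13.active, cas13.degrade(host_rnas))
```

The second call shows the collateral effect: after activation by a phage-like target, host transcripts are lost as well, which is the behavior linked above to growth restriction or dormancy of the infected cell.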

Basic structural and functional blocks of CRISPR–Cas systems
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5898231/

Eugene V. Koonin (2019): A hypothetical scenario of the origin of CRISPR–Cas systems from an ancestral signaling system (possibly an abortive infection defense system (Abi)). This putative ancestral Abi module shares a cyclic oligoA polymerase Palm domain (RNA recognition motif (RRM) fold) with Cas10 and is proposed to function analogously to type III CRISPR–Cas systems. Specifically, cyclic oligoA molecules that are synthesized in response to virus infection bind to the CRISPR-associated Rossmann fold (CARF) domain of the second protein in this system, resulting in activation of the RNase activity of the higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domain, which induces dormancy through indiscriminate RNA cleavage. This putative ancestral Abi module would give rise to the type III-like CRISPR–Cas effector module via duplication of the RRM domain, with subsequent inactivation of one of the copies (the two RRM domains are denoted RRM1 and RRM2). 23

Eugene V. Koonin (2019): CRISPR-Cas systems are the most biologically complex of the prokaryotic defense systems because they possess an integral capacity for creating immune memory and thus represent bona fide adaptive immunity. The origin of the most prevalent forms of the effector modules remains a hard problem. In type III, the entire effector complex, with the sole exception of the small subunit, is composed of domains with the same structural fold, the RRM fold, which is topologically identical to the widespread ferredoxin-like fold.

1. Eugene V. Koonin: Origins and evolution of CRISPR-Cas systems 25 March 2019
2. Eugene V. Koonin: The basic building blocks and evolution of CRISPR–Cas systems 2013 Dec;4

Discovering CRISPR
S.H. Sternberg (2015): The CRISPR locus was first identified in Escherichia coli as an unusual series of 29-bp repeats separated by 32-bp spacer sequences (Ishino et al., 1987). 21

Carl Zimmer tells us the story (2015): The scientists who discovered CRISPR had no way of knowing that they had discovered something so revolutionary. They didn’t even understand what they had found. In 1987, Yoshizumi Ishino and colleagues at Osaka University in Japan published the sequence of a gene called iap belonging to the gut microbe E. coli. To better understand how the gene worked, the scientists also sequenced some of the DNA surrounding it. They hoped to find spots where proteins landed, turning iap on and off. But instead of a switch, the scientists found something incomprehensible. Near the iap gene lay five identical segments of DNA. DNA is made up of building blocks called bases, and the five segments were each composed of the same 29 bases. These repeat sequences were separated from each other by 32-base blocks of DNA, called spacers. Unlike the repeat sequences, each of the spacers had a unique sequence.

This peculiar genetic sandwich didn’t look like anything biologists had found before. When the Japanese researchers published their results, they could only shrug. “The biological significance of these sequences is not known,” they wrote. It was hard to know at the time if the sequences were unique to E. coli, because microbiologists only had crude techniques for deciphering DNA. But in the 1990s, technological advances allowed them to speed up their sequencing. By the end of the decade, microbiologists could scoop up seawater or soil and quickly sequence much of the DNA in the sample. This technique — called metagenomics — revealed those strange genetic sandwiches in a staggering number of species of microbes. They became so common that scientists needed a name to talk about them, even if they still didn’t know what the sequences were for. In 2002, Ruud Jansen of Utrecht University in the Netherlands and colleagues dubbed these sandwiches “clustered regularly interspaced short palindromic repeats” — CRISPR for short.

Jansen’s team noticed something else about CRISPR sequences: They were always accompanied by a collection of genes nearby. They called these genes Cas genes, for CRISPR-associated genes. The genes encoded enzymes that could cut DNA, but no one could say why they did so, or why they always sat next to the CRISPR sequence. Three years later, three teams of scientists independently noticed something odd about CRISPR spacers. They looked a lot like the DNA of viruses. “And then the whole thing clicked,” said Eugene Koonin. At the time, Koonin, an evolutionary biologist at the National Center for Biotechnology Information in Bethesda, Md., had been puzzling over CRISPR and Cas genes for a few years. As soon as he learned of the discovery of bits of virus DNA in CRISPR spacers, he realized that microbes were using CRISPR as a weapon against viruses.

Koonin knew that microbes are not passive victims of virus attacks. They have several lines of defense. Koonin thought that CRISPR and Cas enzymes provide one more. In Koonin’s hypothesis, bacteria use Cas enzymes to grab fragments of viral DNA. They then insert the virus fragments into their own CRISPR sequences. Later, when another virus comes along, the bacteria can use the CRISPR sequence as a cheat sheet to recognize the invader.
Scientists didn’t know enough about the function of CRISPR and Cas enzymes for Koonin to make a detailed hypothesis. But his thinking was provocative enough for a microbiologist named Rodolphe Barrangou to test it. To Barrangou, Koonin’s idea was not just fascinating, but potentially a huge deal for his employer at the time, the yogurt maker Danisco. Danisco depended on bacteria to convert milk into yogurt, and sometimes entire cultures would be lost to outbreaks of bacteria-killing viruses. Now Koonin was suggesting that bacteria could use CRISPR as a weapon against these enemies.

To test Koonin’s hypothesis, Barrangou and his colleagues infected the milk-fermenting microbe Streptococcus thermophilus with two strains of viruses. The viruses killed many of the bacteria, but some survived. When those resistant bacteria multiplied, their descendants turned out to be resistant too. Some genetic change had occurred. Barrangou and his colleagues found that the bacteria had stuffed DNA fragments from the two viruses into their spacers. When the scientists chopped out the new spacers, the bacteria lost their resistance. Barrangou, now an associate professor at North Carolina State University, said that this discovery led many manufacturers to select for customized CRISPR sequences in their cultures, so that the bacteria could withstand virus outbreaks. “If you’ve eaten yogurt or cheese, chances are you’ve eaten CRISPR-ized cells,” he said.

In 2007, Blake Wiedenheft joined Doudna’s lab as a postdoctoral researcher, eager to study the structure of Cas enzymes to understand how they worked. Doudna agreed to the plan — not because she thought CRISPR had any practical value, but just because she thought the chemistry might be cool. “You’re not trying to get to a particular goal, except understanding,” she said. As Wiedenheft, Doudna and their colleagues figured out the structure of Cas enzymes, they began to see how the molecules worked together as a system. When a virus invades a microbe, the host cell grabs a little of the virus’s genetic material, cuts open its own DNA, and inserts the piece of virus DNA into a spacer. As the CRISPR region fills with virus DNA, it becomes a molecular most-wanted gallery, representing the enemies the microbe has encountered. The microbe can then use this viral DNA to turn Cas enzymes into precision-guided weapons. The microbe copies the genetic material in each spacer into an RNA molecule. Cas enzymes then take up one of the RNA molecules and cradle it. Together, the viral RNA and the Cas enzymes drift through the cell. If they encounter genetic material from a virus that matches the CRISPR RNA, the RNA latches on tightly. The Cas enzymes then chop the DNA in two, preventing the virus from replicating.

CRISPR, microbiologists realized, is also an adaptive immune system. It lets microbes learn the signatures of new viruses and remember them. And while we need a complex network of different cell types and signals to learn to recognize pathogens, a single-celled microbe has all the equipment necessary to learn the same lesson on its own. But how did microbes develop these abilities? Ever since microbiologists began discovering CRISPR-Cas systems in different species, Koonin and his colleagues have been reconstructing the systems’ evolution. CRISPR-Cas systems use a huge number of different enzymes, but all of them have one enzyme in common, called Cas1. The job of this universal enzyme is to grab incoming virus DNA and insert it in CRISPR spacers. Recently, Koonin and his colleagues discovered what may be the origin of Cas1 enzymes.

Along with their own genes, microbes carry stretches of DNA called mobile elements that act like parasites. The mobile elements contain genes for enzymes that exist solely to make new copies of their own DNA, cut open their host’s genome, and insert the new copy. Sometimes mobile elements can jump from one host to another, either by hitching a ride with a virus or by other means, and spread through their new host’s genome.

Koonin and his colleagues discovered that one group of mobile elements, called casposons, makes enzymes that are pretty much identical to Cas1. In a new paper in Nature Reviews Genetics, Koonin and Mart Krupovic of the Pasteur Institute in Paris argue that the CRISPR-Cas system got its start when mutations transformed casposons from enemies into friends. Their DNA-cutting enzymes became domesticated, taking on a new function: to store captured virus DNA as part of an immune defense. While CRISPR may have had a single origin, it has blossomed into a tremendous diversity of molecules. Koonin is convinced that viruses are responsible for this. Once they faced CRISPR’s powerful, precise defense, the viruses evolved evasions. Their genes changed sequence so that CRISPR couldn’t latch onto them easily. And the viruses also evolved molecules that could block the Cas enzymes. The microbes responded by evolving in their turn. They acquired new strategies for using CRISPR that the viruses couldn’t fight. Over many thousands of years, in other words, evolution behaved like a natural laboratory, coming up with new recipes for altering DNA. 17

The CRISPR–Cas system
Tina Y. Liu (2020): CRISPR-Cas systems stand out as the only known RNA programmed pathways for detecting and destroying bacteriophages and plasmids. Class 1 CRISPR-Cas systems, the most widespread and diverse of these adaptive immune systems, use an RNA-guided multi-protein complex to find foreign nucleic acids and trigger their destruction. These multisubunit complexes target and cleave DNA and RNA, and regulatory molecules control their activities. CRISPR-Cas loci constitute the only known adaptive immune system in bacteria and archaea. They typically include an array of repeat sequences (CRISPRs) with intervening “spacers” matching sequences of DNA or RNA from viruses or other mobile genetic elements, and a set of genes encoding CRISPR-associated (Cas) proteins. Transcription across the CRISPR array produces a precursor crRNA (pre-crRNA) that is processed by nucleases into small, non-coding CRISPR RNAs (crRNAs). Each crRNA molecule assembles with one or more Cas proteins into an effector complex that binds crRNA-complementary regions in foreign DNA or RNA. The effector complex then triggers degradation of the targeted DNA or RNA using either an intrinsic nuclease activity or a separate nuclease.

Giedrius Gasiunas (2012): The silencing of invading nucleic acids is executed by ribonucleoprotein complexes preloaded with small, interfering CRISPR RNAs (crRNAs) that act as guides for targeting and degradation of foreign nucleic acid. The Cas9–crRNA complex of the Streptococcus thermophilus CRISPR3/Cas system introduces a double-strand break at a specific site in DNA containing a sequence complementary to crRNA. DNA cleavage is executed by Cas9, which uses two distinct active sites, RuvC and HNH, to generate site-specific nicks on opposite DNA strands. Results demonstrate that the Cas9–crRNA complex functions as an RNA-guided endonuclease with RNA-directed target sequence recognition and protein-mediated DNA cleavage. 20
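To make the "double-strand break at a specific site" concrete, here is a minimal Python sketch of where a Cas9-like nuclease would cut. It assumes the S. pyogenes Cas9 rules: an NGG PAM immediately 3ʹ of a 20-nt protospacer, with the HNH nick (target strand) and the RuvC nick (non-target strand) falling at the same position 3 bp upstream of the PAM, which leaves blunt ends. The S. thermophilus enzyme in the quote recognizes a different PAM. The function name `cas9_cut` and its parameters are illustrative, not taken from any library.

```python
def cas9_cut(dna: str, protospacer_start: int, guide_len: int = 20) -> tuple[str, str]:
    """Split `dna` where a Cas9-like enzyme would make its double-strand break.

    Sketch assumptions: `dna` is the non-target strand, the protospacer is
    dna[protospacer_start : protospacer_start + guide_len], and an NGG PAM
    must follow it directly.
    """
    pam_start = protospacer_start + guide_len
    pam = dna[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("no NGG PAM directly 3' of the protospacer")
    # HNH (target strand) and RuvC (non-target strand) both cut 3 bp upstream
    # of the PAM, so a single index describes the blunt-ended break.
    cut = pam_start - 3
    return dna[:cut], dna[cut:]


left, right = cas9_cut("AAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "CCCC", protospacer_start=4)
print(left, right)  # AAAAACGTACGTACGTACGTA CGTTGGCCCC
```

Returning the two fragments, rather than an index alone, makes it easy to see that the break falls inside the protospacer and not in the PAM.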

J.Cepelewicz (2020): CRISPR acts like an adaptive immune system; it enables bacteria that have been exposed to a virus to pass on a genetic “memory” of that infection to their descendants, which can then mount better defenses against a repeat infection. It’s a system that works so well that an estimated half of all bacterial species use CRISPR. Researchers have uncovered dozens of other systems that bacteria use to rebuff phage invasions. But in laboratory studies, bacteria primarily develop what’s known as surface-based phage resistance. Mutations change receptor molecules on the surface of the bacterial cell, so that the phage can no longer recognize and invade it. The strategy is akin to shutting a door and throwing away the key: It offers the bacteria complete safety from infection by the virus. But that protection comes at a significant price, because it also disrupts whatever nutrient uptake, waste disposal, communication task or other cellular function the receptor would have been providing — taking a constant toll on a cell’s fitness. In contrast, CRISPR only drags on a cell’s resources when it’s active, during a viral infection. Even so, CRISPR represents a riskier gambit: It doesn’t start to work until phages have already entered the cell, meaning that there’s a chance the viruses could overcome it. And CRISPR doesn’t just attack viral DNA; it can also prevent bacteria from taking up beneficial genes from other microbes, like those that confer antibiotic resistance. What factors affect the trade-offs in costs and fitness? For the past six years, Edze Westra, an evolutionary ecologist at the University of Exeter in England, has led a team pursuing the answer to that question. In 2015, they discovered that nutrient availability and phage density affected whether Pseudomonas bacteria relied on surface-based or CRISPR-based resistance. In environments poor in resources, receptor modifications were more burdensome, so CRISPR became a better bargain. 
When resources were plentiful, bacteria grew more densely and phage epidemics became more frequent. Bacteria then faced greater selective pressure to close themselves off from infection entirely, and so they shut down receptors to gain surface-based resistance. This explained why surface-based resistance was so common in laboratory cultures. Growing in a test tube rich in nutrients, “these bacteria are on a holiday,” Westra said. “They are having a terrific time.”

Still, these rules weren’t cut and dried. Plenty of bacteria in natural high-nutrient environments use CRISPR, and plenty of bacteria in natural low-nutrient environments don’t. “It’s all over the place,” Westra said. “That told us that we were probably still missing something.”

How Biodiversity Reshapes the Battle
Then one of Westra’s graduate students, Ellinor Opsal, proposed another potential factor: the diversity of the biological communities in which bacteria live. This factor is harder to study, but scientists had previously observed that it could affect phage immunity in bacteria. For example, in 2005, James Bull, a biologist at the University of Texas, Austin, and William Harcombe, his graduate student at the time (now at the University of Minnesota), found that E. coli bacteria didn’t evolve immunity to a phage when a second bacterial species was present. Similarly, Britt Koskella, an evolutionary biologist at the University of California, Berkeley, and one of her graduate students, Catherine Hernandez, reported last year that phage resistance failed to arise in Pseudomonas bacteria living on their actual host (a plant), though they always gained immunity in a test tube. Could the diversity of the surroundings influence not just whether or not resistance to phages evolved, but the nature of that resistance?

To find out, Westra’s team performed a new set of experiments: Instead of altering the nutrient conditions for Pseudomonas bacteria growing with phages, they added three other bacterial species — species that competed against Pseudomonas for resources but weren’t targeted by the phage. Left to themselves, Pseudomonas would normally develop surface-based mutations. But in the company of rivals, they were far more likely to turn to CRISPR. Further investigation showed that the more complex community dynamics had shifted the fitness costs: The bacteria could no longer afford to inactivate receptors because they not only had to survive the phage, but also had to outcompete the bacteria around them. These results from Westra’s group dovetail with earlier findings that phages can produce greater diversity in bacterial communities. “Now, that diversity is actually feeding back to the phage side of things” by affecting phage resistance, Koskella said. “It’s neat to see that coming full circle.” By understanding that kind of feedback loop, she added, “we can start to ask more general questions about the impacts that phages have in a community context.”

For one, the bacteria’s shift toward a CRISPR-based phage response had another, broader effect. When Westra’s group grew Pseudomonas in moth larvae hosts, they found that the bacteria with surface-based resistance were less virulent, killing the larvae much more slowly than the bacteria with active CRISPR systems did. 18

Diversity, ecology, and evolution of the CRISPR-Cas systems
Devashish Rath (2015): The length and sequence of repeats and the length of spacers are well conserved within a CRISPR locus, but may vary between CRISPRs in the same or different genomes. Repeat sequences are in the range of 21 bp to 48 bp, and spacers are between 26 bp and 72 bp. The number of spacers within a CRISPR locus varies widely; from a few to several hundred. Genomes can have single or multiple CRISPR loci and in some species, these loci can make up a significant part of the chromosome. Not all CRISPR loci have adjacent cas genes; some instead rely on trans-encoded factors (factors encoded elsewhere in the genome, such as Cas proteins from another CRISPR–cas locus). Another feature associated with CRISPR loci is the presence of a conserved sequence, called leader, located upstream of the CRISPR with respect to the direction of transcription. The Cas proteins are a highly diverse group. Many are predicted or identified to interact with nucleic acids; e.g. as nucleases, helicases and RNA-binding proteins. The Cas1 and Cas2 proteins are involved in adaptation and are virtually universal for CRISPR-Cas systems. Other Cas proteins are only associated with certain types of CRISPR-Cas systems. The diversity of Cas proteins, the presence of multiple CRISPR loci, and frequent horizontal transfer of CRISPR-Cas systems make classification a complex task. The most adopted classification identifies Type I, II and III CRISPR-Cas systems, with each having several subgroups. Different types of CRISPR-Cas systems can co-exist in a single organism. Recently, a Type IV system was proposed, which contains several Cascade genes but no CRISPR, cas1 or cas2. Type IV complex would be guided by protein-DNA interaction, not by crRNA, and constitutes an innate immune system preset to attack certain sequences. The Type I systems are defined by the presence of the signature protein Cas3, a protein with both helicase and DNase domains responsible for degrading the target.
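The locus architecture described above (identical repeats alternating with variable spacers, within the length ranges Rath quotes) can be illustrated with a short Python sketch that splits an array on a known repeat and checks the lengths. It assumes the repeat sequence is already known and every copy matches it exactly; real CRISPR-finding tools have to discover the repeat and tolerate degenerate copies. `split_crispr_array` is a hypothetical helper name.

```python
REPEAT_LEN = (21, 48)   # bp, range quoted by Rath (2015)
SPACER_LEN = (26, 72)   # bp, range quoted by Rath (2015)


def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`.

    Assumes every repeat copy is an exact match. Text before the first repeat
    (e.g. the leader) and after the last repeat is not treated as a spacer.
    """
    if not REPEAT_LEN[0] <= len(repeat) <= REPEAT_LEN[1]:
        raise ValueError(f"repeat length {len(repeat)} outside {REPEAT_LEN}")
    pieces = array.split(repeat)
    spacers = pieces[1:-1]
    for spacer in spacers:
        if not SPACER_LEN[0] <= len(spacer) <= SPACER_LEN[1]:
            raise ValueError(f"spacer length {len(spacer)} outside {SPACER_LEN}")
    return spacers
```

Splitting on the repeat mirrors how the array is described: the repeats are the constant frame, and everything between two of them is one spacer.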

Eugene V. Koonin (2019): The number and diversity of known CRISPR–Cas systems have substantially increased in recent years. The new classification includes 2 classes, 6 types and 33 subtypes compared with 5 types and 16 subtypes in 2015. At the adaptation stage, a distinct complex of Cas proteins binds to a target DNA, often after recognizing a distinct, short motif known as a protospacer-adjacent motif (PAM), and cleaves out a portion of the target DNA, the protospacer. (The PAM is a component of the invading virus or plasmid: it lies next to the protospacer in the invading DNA but is not copied into the CRISPR array together with the spacer, so the host's own array carries no spacer–PAM combination and is not attacked.) After duplication of the repeat at the 5ʹ end of the CRISPR array, the adaptation complex inserts the protospacer DNA into the array, so that it becomes a spacer. Some CRISPR–Cas systems employ an alternative mechanism of adaptation — namely, spacer acquisition from RNA, via reverse transcription by a reverse transcriptase encoded at the CRISPR–cas locus. At the expression stage, the CRISPR array is typically transcribed as a single transcript — the pre-CRISPR RNA (pre-crRNA) — that is processed into mature CRISPR RNAs (crRNAs), each containing the spacer sequence and parts of the flanking repeats. In different CRISPR–Cas variants, the pre-crRNA processing is mediated by a distinct subunit of a multiprotein Cas complex, by a single, multidomain Cas protein, or by non-Cas host RNases. At the interference stage, the crRNA, which typically remains bound to the processing complex (protein), serves as a guide to recognize the protospacer (or a closely similar sequence) in the invading genome of a virus or plasmid, which is then cleaved and inactivated by a Cas nuclease (or nucleases). The above summary is a brief, oversimplified description of the CRISPR–Cas functionality that inevitably omits many details.
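The role of the PAM in telling self from non-self can be shown with a minimal Python sketch of the interference check: a spacer only counts as a hit when its match in the DNA is immediately followed by a PAM. The sketch assumes an S. pyogenes-style NGG PAM on the 3ʹ side of the protospacer and exact matching; PAM sequence and position differ between CRISPR–Cas types, and real effectors tolerate some mismatches. `is_targeted` and the repeat sequence used below are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_ngg_pam(site: str) -> bool:
    return len(site) == 3 and site[1:] == "GG"


def is_targeted(spacer: str, dna: str) -> bool:
    """True if `spacer` matches `dna` on either strand with an NGG PAM right after it."""
    for strand in (dna, reverse_complement(dna)):
        start = strand.find(spacer)
        while start != -1:
            end = start + len(spacer)
            if has_ngg_pam(strand[end:end + 3]):
                return True
            start = strand.find(spacer, start + 1)
    return False


spacer = "GATTACAGATTACAGATTAC"
repeat = "GTTTTAGAGCTATGCTGTTTTG"          # hypothetical repeat
phage = "TTT" + spacer + "AGG" + "TTT"     # protospacer followed by a PAM
host_array = repeat + spacer + repeat      # same spacer, but a repeat follows it
print(is_targeted(spacer, phage), is_targeted(spacer, host_array))  # True False
```

The same 20-nt sequence is present in both molecules; only the neighbouring PAM decides which one is attacked.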

Similar to other biological defense mechanisms, archaeal and bacterial CRISPR–Cas systems show a remarkable diversity of Cas protein sequences, gene compositions, and architectures of the genomic loci. Our knowledge of this diversity is continuously expanding through the screening of ever-growing genomic and metagenomic databases. To keep pace with such expansion, a robust classification of CRISPR–Cas systems is essential for the progress of CRISPR research, but this presents formidable challenges, owing to the lack of universal markers and the fast evolution of the CRISPR–cas loci. The 2015 classification included 5 types and 16 subtypes, as well as introducing the major division of CRISPR–Cas systems into two classes that radically differ with respect to the architectures of their effector modules involved in crRNA processing and interference.

The class 1 systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes (such as the Cascade complex in type I systems) that, with contributions from additional Cas proteins, mediate pre-crRNA processing and interference. By contrast, class 2 systems encompass a single, multidomain crRNA-binding protein (such as Cas9 in type II systems) that combines all activities required for interference and, in some variants, also those involved in pre-crRNA processing (Box 1).

Box 1: Class 1 CRISPR–Cas systems have effector modules composed of multiple Cas proteins that form a crRNA-binding complex and function together in binding and processing of the target. Class 2 systems have a single, multidomain crRNA-binding protein that is functionally analogous to the entire effector complex of class 1. Part a of the figure illustrates the generic organizations of the class 1 and class 2 CRISPR–Cas loci. Part b of the figure shows the functional modules of CRISPR–Cas systems. The scheme shows the typical relationships between the genetic, structural and functional organizations of the six types of CRISPR–Cas systems. Protein names follow the current nomenclature. An asterisk indicates the putative small subunit that might be fused to the large subunit in several type I subtypes. The pound symbols (#) indicate that other unknown sensor, effector and ring nuclease protein families could be involved in the same signaling pathway. Dispensable (and/or missing, in some subtypes and variants) components are indicated by dashed outlines. Cas6 is shown with a thin solid outline for type I because it is dispensable in some, but not most, systems and with a dashed line for type III because most of these systems apparently use the Cas6 protein provided in trans by other CRISPR–cas loci. The three colours for Cas9, Cas10, Cas12 and Cas13 reflect the fact that these proteins contribute to different stages of the CRISPR–Cas response. The CRISPR-associated Rossmann fold (CARF) and higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domain proteins are the most common sensors and effectors, respectively, in the type III ancillary modules, but several alternative sensors and effectors have been identified as well (ref. 43). Ring nucleases are a distinct variety of CARF domain proteins that cleave cyclic oligoA produced by Cas10 and thus control the indiscriminate RNase activity of the HEPN domain of Csx1.
LS, large subunit; SS, small subunit; tracrRNA, trans-activating CRISPR RNA.

Understanding CRISPR-Cas9
Imagine a company that has to install a security system based on biometrics at its headquarters. Biometrics comes from the Greek words “bios” (life) and “metrikos” (measure). Such a system analyzes biological characteristics of people for identity verification or identification. To distinguish the employees who are permitted to enter the building from those who are not welcome, data must first be collected and stored in a memory bank. Every time someone arrives at the building, they go through the security check, and the data they provide is compared with the data in the memory bank. If there is a match, the person is permitted to enter; if not, entry is refused.

Analogously, cells can do almost the same thing, with a few differences (most notably, a match here marks an enemy to be destroyed rather than a person to be let in). They have an ingenious security check system based on enemy recognition; from that knowledge they build a sophisticated data bank, which is used to recognize future enemy invasions and annihilate them.

Pascale Cossart (2016): In nature, bacteria need to defend themselves constantly, particularly against bacteriophages (or phages), the viruses that specifically attack bacteria. A phage generally attaches itself to a bacterium, injects its DNA into it, and subverts the bacterium’s mechanisms of replication, transcription, and translation in order to replicate itself. The phage replicates its own DNA, transcribes it into RNA, and produces phage proteins that accumulate to generate new phages and eventually cause the bacterial cell to explode (or lyse), releasing hundreds of new bacteriophages. Phages continually infect bacteria everywhere—in soil, in water, and even in our own intestinal microbiota. Bacteriophage families are numerous and vary widely in their form, size, composition, and the bacteria they target. To begin their attack, bacteriophages need a site of attachment, a particular component on the surface of a bacterium. This site of attachment is specific for each virus and the bacteria that it can infect. Bacteria have an immune system called CRISPR. CRISPR regions in the chromosomes allow bacteria to recognize predators, particularly previously encountered phages, and to destroy them. CRISPR regions protect and essentially “vaccinate” bacteria against bacteriophages. In fact, it has been shown that bacteria can be artificially vaccinated! When a population of bacteria is inoculated with a phage, a small number survive and are able to integrate a fragment of the phage DNA into their genome, in the region called the CRISPR locus. This allows the bacteria, if the phage ever attacks again, to recognize the phage DNA and degrade it. This ingenious phenomenon, known as interference, occurs due to the structure of the CRISPR region and to cas genes (CRISPR-associated genes) located near this region.
The CRISPR locus is a region of the chromosome composed of repeated sequences of around 50 nucleotides, interspersed with sequences known as spacers that are similar to those of bacteriophages. Some bacteria have several CRISPR loci with different sequence repetitions. Around 40% of bacteria have one or more CRISPRs, whereas others have none. CRISPR loci can be quite long, sometimes with more than 100 repetitions and spacers. CRISPRs have two functions: acquisition and interference. Acquisition, also called adaptation, is the process of acquiring fragments of DNA from a phage, and interference is the immunization process by Cas proteins encoded by cas genes.

Bacteria have numerous proteins with various complementary and synergistic functions in the process of adaptation and interference. They permit the addition of DNA fragments into the CRISPR locus, but their main purpose is to react to invading phages. The CRISPR locus is transcribed into a long CRISPR RNA, which is then split into small RNAs called crRNAs, each containing a spacer and a part of the repeated sequence. When a phage injects its DNA into the bacterium, the crRNA recognizes and binds to it. An enzyme then recognizes the hybrid and cleaves the phage DNA at the point where the crRNA has paired. Replication of the phage DNA is inactivated, and the infection is stopped. The key step toward genome editing was the identification of the proteins that cleave the DNA in this hybrid. In some systems this is done by a complex of several Cas proteins (in type I systems, the Cascade complex together with the nuclease Cas3), and in others by a single protein called Cas9. Cas9 is unique in that it can attach itself to a DNA strand and, thanks to two distinct nuclease domains in its structure, cut this DNA on each of its two strands. This protein is the basis of the CRISPR/Cas9 technology, which enables a variety of genome modifications and mutations in mammals, plants, insects, and fish in addition to bacteria. This system works thanks to the Cas9 protein and a hybrid guide RNA made from one RNA matching the region to be mutated and a second RNA called tracrRNA, or trans-activating crRNA. tracrRNA was discovered next to the CRISPR locus in Streptococcus pyogenes and was shown to be partially complementary to the repeated regions of the locus, enabling it to guide the Cas9 protein and the crRNA toward the target. In summary, by expressing the Cas9 protein with a composite RNA made up of a sequence identical to the target region, a tracrRNA, and a fragment complementary to the tracrRNA, one can now introduce a mutation or deletion into a target genome of any origin.
After the 2012 publication in Science of the elegant studies by the teams led by Emmanuelle Charpentier and Jennifer Doudna, the CRISPR method was so intriguing that it provoked an avalanche of research and publications demonstrating that this technique could be used in many cases and with many variations. 24

Understanding CRISPR-Cas9 1min 26s: Let's begin by talking about how CRISPR-Cas9 works naturally in bacteria like Streptococcus pyogenes. Essentially, CRISPR-Cas9 in bacteria acts as an adaptive immune response: it remembers when a virus has infected the cell in the past, keeps a little bit of viral DNA, and uses it so that if the same species of virus infects the cell again, the cell can respond much more quickly and effectively. A bacteriophage is a kind of virus that infects bacteria by injecting its DNA into the bacterial cell, like a syringe. The first thing that happens is that a pair of enzymes called Cas1 and Cas2 (two separate enzymes, but joined at the hip: they are always together and work in concert) cut out a region of the virus's DNA called a protospacer and stick it into a part of the bacterial chromosome called a CRISPR array.

In the CRISPR array there are repeats separated by spacers, and this protospacer will become a spacer in the CRISPR array. The prefix proto- means ahead of or before, and that's exactly what happens: Cas1 and Cas2 identify this stretch and "think", well, that would be a suitable spacer, let's turn it into a spacer. So it's a protospacer, and they make it into a spacer. Of course, they don't really think anything; they're enzymes, they don't have brains.

The spacer gets inserted at the 5ʹ end of the CRISPR array, that is, at the 5ʹ end of the complementary strand of the CRISPR array, and then a new repeat is built next to it, so there is a repeat after each spacer: spacer, repeat, spacer, repeat, spacer, repeat. Every one of those repeats is exactly the same as all the other repeats, which is why it's called a repeat, and the spacers, of course, are in between them. The word CRISPR stands for clustered regularly interspaced short palindromic repeats. (A palindrome is a word, number, phrase, or other sequence of characters that reads the same backward as forward, such as madam or racecar.) Clustered means all together, because the CRISPR array is all in one place on the chromosome: all these spacers sit together in one cluster. Regularly interspaced refers to the spacers that are regularly placed between the repeats along the CRISPR region. The repeat part makes sense, because these repeats are repeated, and the palindromic bit simply means that there are regions within those repeats that read the same on both strands in the 5ʹ to 3ʹ direction.
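A DNA palindrome in the sense used here is a sequence equal to its own reverse complement, which is different from an English palindrome like racecar. One consequence: an odd-length sequence can never qualify, because its middle base would have to be its own complement. A minimal Python sketch follows; EcoRI's recognition site GAATTC serves as a familiar example, and the helper names are illustrative. CRISPR repeats are often only partially palindromic, so a strict test like this one would miss many of them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse: the other strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    return seq == reverse_complement(seq)


def palindromic_sites(seq: str, length: int = 6) -> list[int]:
    """Start positions of every fully palindromic window of the given length."""
    return [i for i in range(len(seq) - length + 1)
            if is_dna_palindrome(seq[i:i + length])]


print(is_dna_palindrome("GAATTC"), is_dna_palindrome("GATTACA"),
      palindromic_sites("TTGAATTCTT"))  # True False [2]
```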

Restriction endonucleases usually identify a restriction site because it's a palindrome, and in the same way these palindromic repeats are often sites where enzymes can interact. Now, when it comes to Cas1 and Cas2 taking a protospacer and turning it into a spacer, they don't just cut the DNA randomly in any old place; they cut it at a precise location, immediately upstream of a protospacer adjacent motif.

The term protospacer obviously refers to the protospacer; adjacent means next to; and motif is an English word that just means a regularly repeated pattern, as you'd see on wallpaper or something like that. Of course, it's quite common in any DNA to find two guanines together, but the protospacer adjacent motif, at least in Streptococcus pyogenes, is any nucleotide at all followed by guanine, guanine: two guanines following anything, whether adenine, thymine, cytosine or another guanine, it doesn't really matter. Any nucleotide followed by two guanines (written NGG) is the protospacer adjacent motif, or the PAM, as it's called.
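Scanning a sequence for the S. pyogenes PAM ("any nucleotide followed by two guanines", NGG) is a simple pattern search. In this minimal Python sketch a regular-expression lookahead is used so that overlapping PAMs (as in GGG) are all reported; only the strand given is scanned, and `find_ngg_pams` is a hypothetical name.

```python
import re

NGG = re.compile(r"(?=[ACGT]GG)")  # zero-width lookahead, so overlapping hits are found


def find_ngg_pams(seq: str) -> list[int]:
    """0-based start positions of every NGG PAM on this strand."""
    return [m.start() for m in NGG.finditer(seq.upper())]


print(find_ngg_pams("AGGTGGG"))  # [0, 3, 4]
```

A plain `re.finditer(r"[ACGT]GG", ...)` would consume characters and report only [0, 3] here, missing the PAM that starts at position 4.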

These enzymes, Cas1 and Cas2 (in type II systems, with help from Cas9 and Csn2), scan the DNA looking for a PAM site, and when they find one they go upstream, that is, toward the 5ʹ end of that strand, and cut out a section of probably around 20 to 26 bases. They turn that into a spacer, insert the spacer at the 5ʹ end of the CRISPR region, and build a new repeat, pushing the older spacers further and further from the 5ʹ end. The CRISPR array, then, is flexible: if a particular bacterium has been infected by lots of different viruses, it may have a very long CRISPR array with lots of spacers and repeats, while other bacteria might have only one or a few. In some bacteria, people have discovered hundreds of spacers, and in other species there may be only a couple of them, so it's a flexible CRISPR array.
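The acquisition steps described here (find a PAM, take the stretch just upstream of it as a protospacer, insert it at the leader end of the array, and add a new repeat) can be sketched in Python as below. The sketch simplifies heavily: it takes the first NGG PAM it finds, treats the array as a plain string that begins with a leader followed by a repeat, and uses an illustrative `spacer_len` default rather than a measured value. All names are hypothetical.

```python
def acquire_spacer(array: str, leader: str, repeat: str, invader: str,
                   spacer_len: int = 30) -> str:
    """Return a new array with one spacer from `invader` added at the leader end.

    Assumes `array` starts with `leader` immediately followed by `repeat`.
    `spacer_len` is an illustrative default, not a measured value.
    """
    if not array.startswith(leader + repeat):
        raise ValueError("array must start with leader + repeat")
    # i is the start of a candidate NGG PAM; it needs spacer_len bases upstream.
    for i in range(spacer_len, len(invader) - 2):
        if invader[i + 1:i + 3] == "GG":
            protospacer = invader[i - spacer_len:i]
            break
    else:
        return array  # no usable PAM: the array is unchanged
    # Duplicate the leader-proximal repeat and put the new spacer between the
    # two copies, so older spacers end up further from the leader.
    return leader + repeat + protospacer + array[len(leader):]
```

Building the result as leader, repeat, new spacer, and then the old array (which itself starts with a repeat) reproduces the repeat duplication described in the Koonin quote without any separate copy step.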

From time to time, RNA polymerase transcribes the whole CRISPR array into an RNA molecule. It's not a messenger RNA, because it's not going to go to a ribosome and be translated into a polypeptide; we call it pre-CRISPR RNA (pre-crRNA), a single RNA molecule containing both repeat and spacer regions. Another kind of RNA, unprocessed tracrRNA (pronounced "tracer RNA"), is transcribed by RNA polymerase from a separate gene located near the CRISPR array. The tracrRNA has regions that are complementary to regions in the CRISPR RNA, so those complementary regions stick to the repeat regions in the CRISPR RNA.

Then an enzyme, an RNase (host RNase III in S. pyogenes), comes along and cuts through those repeat regions, giving a piece of RNA made of the RNA from the spacer and the RNA from the repeat, together with some of the unprocessed tracrRNA, which we now simply call tracrRNA. So we have a structure in the cell made of two pieces of RNA: one molecule, spacer RNA plus repeat RNA, which is a single polymer of RNA nucleotides, and another polymer of RNA nucleotides, called tracrRNA, held to it by hydrogen bonds. Altogether we can refer to it as crRNA:tracrRNA (read "CRISPR colon tracer RNA"). That molecule gets picked up by an enzyme called Cas9. Cas9 grabs the crRNA:tracrRNA and holds onto it, and now we refer to it as gRNA, or guide RNA. So guide RNA is crRNA:tracrRNA that is attached to a Cas9 enzyme.
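The processing step can be modelled in a few lines of Python: transcribe the array (T becomes U), then cut inside every repeat so that each crRNA carries one spacer plus part of the following repeat. The cut position `cut_offset` is a hypothetical parameter; in S. pyogenes the cut is made inside the tracrRNA:repeat duplex and the crRNA is reported to be trimmed further afterwards, details this sketch leaves out.

```python
def transcribe(dna: str) -> str:
    """The RNA has the sequence of the coding strand, with U in place of T."""
    return dna.replace("T", "U")


def process_pre_crrna(array_dna: str, repeat_dna: str, cut_offset: int) -> list[str]:
    """Cut a pre-crRNA inside each repeat and return the resulting crRNAs.

    Each crRNA = one spacer + the first `cut_offset` nt of the repeat after it.
    Pieces before the first repeat and after the last repeat are discarded.
    """
    pre_crrna = transcribe(array_dna)
    repeat = transcribe(repeat_dna)
    spacers = pre_crrna.split(repeat)[1:-1]
    return [spacer + repeat[:cut_offset] for spacer in spacers]
```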

The Cas9 enzyme is made of a single polypeptide chain, but it has six different regions (domains) that do different things. We don't need to know what all of them do, but a couple are worth a look. First, there is the PI domain, which stands for PAM-interacting domain: the part of the protein that identifies the PAM, hence the name. In S. pyogenes Cas9 it identifies the GG of the NGG PAM. It has a region whose shape and charges are complementary to two guanines, and that helps Cas9 find "N-G-G" on a stretch of DNA. Two other domains, HNH and RuvC, are what we call nuclease domains: they are the parts of the protein that actually cut the DNA, with HNH cutting the strand that pairs with the guide and RuvC cutting the other strand. By "domain" we just mean a part of the protein that performs a specific function; it is all still one polypeptide. Much of the rest of the structure is there just to hold the guide RNA in place in the Cas9 enzyme.
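How Cas9 combines the PAM check with the guide match can be sketched as follows. This is a simplified model, not Cas9's actual search mechanism: it assumes the S. pyogenes NGG PAM, a 20-nucleotide guide written in DNA letters, perfect base matching, and a search on one strand only. The cut position, 3 base pairs upstream of the PAM with both strands cut at the same place (a blunt end), reflects the documented behavior of S. pyogenes Cas9.

```python
def cas9_cut_sites(dna: str, guide: str) -> list[int]:
    """Return cut positions where `guide` matches `dna` and is immediately
    followed by an NGG PAM. A position p means the cut lies between
    dna[p - 1] and dna[p].

    Simplifying assumptions: exact matches only, NGG PAM, and only the strand
    given (the one whose sequence reads like the guide) is searched."""
    sites = []
    start = dna.find(guide)
    while start != -1:
        pam_start = start + len(guide)
        pam = dna[pam_start:pam_start + 3]
        # N can be any base; the PI domain reads the two guanines.
        if len(pam) == 3 and pam[1:] == "GG":
            # HNH and RuvC each cut one strand, 3 bp upstream (5') of the PAM.
            sites.append(pam_start - 3)
        start = dna.find(guide, start + 1)
    return sites

guide = "ATCGGCTAACGTTAGCCTAG"  # hypothetical 20-nt guide
print(cas9_cut_sites("TTTT" + guide + "TGG" + "AAAA", guide))  # [21]
print(cas9_cut_sites("TTTT" + guide + "TTA", guide))           # []
```

The second call illustrates the role of the PI domain: a sequence that matches the guide but lacks a PAM is not cut.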

Here we have both parts. The CRISPR RNA is made of the RNA transcribed from the spacer region of the DNA and the RNA transcribed from the repeat region, all as a single polymer of nucleotides, a single RNA molecule. Held to that molecule by complementary base pairing is a separate polymer of nucleotides, a separate RNA: the tracrRNA. One way to think about this is that the CRISPR RNA is like the lenses in a pair of eyeglasses. The lenses do the job of the glasses; the reason anyone wears glasses in the first place is what the lenses do for them. The arms of the glasses just hold the lenses to one's face, and that is what the tracrRNA does in the guide RNA: it is there to anchor the CRISPR RNA to the Cas9 enzyme. We say it serves as a scaffold, holding the CRISPR RNA to the enzyme.

You may also have heard of sgRNA, and there is some confusion around it. Many people use the terms sgRNA and gRNA interchangeably, as though they were the same thing, but they are not exactly the same. There is also confusion about what "sg" stands for: some say "short guide", others "synthetic guide". The usual expansion is "single guide RNA": a laboratory-made (synthetic) RNA in which the crRNA and tracrRNA are joined into one molecule, so that one RNA does the job of the natural two-part guide.

Eugene V Koonin (2011): CRISPR-Cas systems have three distinct functional stages of their operation. During the first stage, adaptation, short pieces of DNA (characteristic length of approximately 30 bp) homologous to virus or plasmid sequences (known as proto-spacers) are integrated into the CRISPR loci. The short (3 or 4 nucleotides) proto-spacer adjacent motifs (PAMs) located immediately downstream of the proto-spacer appear to determine the selection of the protospacer followed by integration into a pre-existing CRISPR array. The second stage, expression and processing, involves transcription and cleavage of long primary transcript of a CRISPR locus (pre-crRNA) that is processed into short crRNAs. This step is catalyzed by endoribonucleases encoded by the cas genes that either operate as a subunit of a larger complex (e.g. Cascade, CRISPR-associated complex for antiviral defense in Escherichia coli) or as a stand-alone enzyme, e.g., Cas6 in the archaeon Pyrococcus furiosus. At the third stage, interference, the alien nucleic acid (DNA or RNA) is targeted by a ribonucleoprotein complex containing a crRNA guide and a set of Cas proteins, and cleaved within or in the vicinity of the PAM sequence. In several CRISPR-Cas systems, crRNA have been shown to be complementary to either strand of the phage or plasmid which is best compatible with DNA being the target. Direct demonstration of DNA being the target of the CRISPR-Cas machinery has come from experiments in Staphylococcus epidermidis. In this case, insertion of a self-splicing intron into the proto-spacer sequence of the target gene rendered the respective plasmid resistant to the CRISPR-mediated immunity. 22

A roadmap of CRISPR-Cas adaptation and defense.
In the example illustrated, a bacterial cell is infected by a bacteriophage. The first stage of CRISPR-Cas defense is CRISPR adaptation. This involves the incorporation of small fragments of DNA from the invader into the host CRISPR array. This forms a genetic “memory” of the infection. The memories are stored as spacers (colored squares) between repeat sequences (R), and new spacers are added at the leader-proximal (L) end of the array. The Cas1 and Cas2 proteins, encoded within the cas gene operon, form a Cas1-Cas2 complex (blue)—the “workhorse” of CRISPR adaptation. In this example, the Cas1-Cas2 complex catalyzes the addition of a spacer from the phage genome (purple) into the CRISPR array. The second stage of CRISPR-Cas defense involves transcription of the CRISPR array and subsequent processing of the precursor transcript to generate CRISPR RNAs (crRNAs). Each crRNA contains a single spacer unit that is typically flanked by parts of the adjoining repeat sequences (gray). Individual crRNAs assemble with Cas effector proteins (light green) to form crRNA-effector complexes. The crRNA-effector complexes catalyze the sequence-specific recognition and destruction of foreign DNA and/or RNA elements. This process is known as interference. 13
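The three stages in this roadmap can be tied together in a toy simulation. It is a sketch under loud assumptions: sequences are strings, a hypothetical two-letter PAM "GG" and a spacer length of 8 are used, "expression" simply copies the stored spacers into crRNAs, and recognition is an exact text match. The class and method names are made up for illustration.

```python
# Toy three-stage model: adaptation, expression/processing, interference.
# PAM, SPACER_LEN and all sequences are hypothetical illustration values.
PAM = "GG"
SPACER_LEN = 8

class ToyCrisprCell:
    def __init__(self) -> None:
        self.spacers: list[str] = []  # newest first (leader end of the array)
        self.crrnas: list[str] = []   # one guide per spacer after processing

    def adapt(self, invader: str) -> None:
        """Stage 1: store the SPACER_LEN bases just upstream of the first PAM."""
        i = invader.find(PAM, SPACER_LEN)
        if i != -1:
            self.spacers.insert(0, invader[i - SPACER_LEN:i])

    def express(self) -> None:
        """Stage 2: 'transcribe and process' the array into individual crRNAs."""
        self.crrnas = list(self.spacers)

    def interfere(self, invader: str) -> bool:
        """Stage 3: True if some crRNA matches a protospacer followed by a PAM."""
        return any(crrna + PAM in invader for crrna in self.crrnas)

phage = "TTACGATCAGGCCC"      # hypothetical phage sequence
cell = ToyCrisprCell()
print(cell.interfere(phage))  # False: no memory of this phage yet
cell.adapt(phage)
cell.express()
print(cell.interfere(phage))  # True: the stored spacer recognizes it
```

Interference only works after both earlier stages have run, which is the dependency the roadmap describes: memory first, then crRNAs, then recognition.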

S. H. Sternberg (2015): CRISPR-Cas immunity is conferred through integration of short DNA fragments into the CRISPR locus, and these spacer sequences record the history of past infections. The CRISPR locus is transcribed, and the resultant transcript is processed into shorter CRISPR-RNAs (crRNAs). CRISPR-Cas systems are classified as types I, II or III, which can be distinguished based on the presence of the signature Cas3, Cas9, or Cas10 genes, respectively. Type I are the most common, and much of our understanding of type I CRISPR-Cas systems comes from studies of E. coli Cascade (CRISPR-associated complex for antiviral defense), which is comprised of the five proteins Cse1, Cse2, Cas7, Cas5e, and Cas6e. These proteins assemble on a 61-nt crRNA, yielding a 405-kDa complex. The crRNA contains the 32-nt spacer sequence, which directs Cascade to sequences (protospacers) in foreign DNA, leading to formation of an R-loop intermediate. Cascade then recruits Cas3, which has an N-terminal histidine-aspartate (HD) nuclease domain and C-terminal superfamily 2 (SF2) helicase domain, to degrade the DNA. Cascade must discriminate between spacer sequences found in the bacterial chromosome and those found in foreign DNA. This discrimination is thought to be accomplished through recognition of a trinucleotide sequence motif called the protospacer-adjacent motif (PAM; 5′-A[A/T]G-3′ for E. coli Cascade), which is adjacent to the protospacer in foreign DNA, but absent in the CRISPR locus. Strict sequence requirements present a potential weakness because mutations in either the PAM or protospacer can allow foreign DNA to escape CRISPR-Cas immunity. However, bacteria can rapidly restore immunity using a positive-feedback loop to update the CRISPR locus. Priming requires Cascade with a crRNA bearing at least partial complementarity to the escape target, suggesting Cascade must be able to locate targets even when they bear mutations sufficient to escape immunity.
Priming also requires Cas3 and the Cas1-Cas2 complex, which integrate new sequences into the CRISPR locus. The PAM-dependent pathway is highly efficient and allows Cascade to recruit Cas3 for strand-specific degradation of the target genome. The PAM-independent pathway is less efficient, but Cascade can still bind tightly to the DNA, ensuring that it can initiate the sequence of molecular events that precede primed spacer acquisition. Through this pathway, Cas3 recruitment becomes strictly dependent on Cas1-Cas2, and Cas1-Cas2 also attenuate Cas3 nuclease activity and enable Cas3 to rapidly translocate in either direction along the foreign DNA. These results establish Cas1-Cas2 as a trans-acting factor necessary for the recruitment and regulation of Cas3 at escape targets. Based on our findings, we propose a mechanistic framework describing how Cascade, Cas1, Cas2, and Cas3 work together to process and disable foreign genetic elements. 21
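The self versus non-self check described by Sternberg can be sketched in a few lines. Assumptions: the PAM is the 5′-A[A/T]G-3′ motif quoted above, placed immediately 5′ of the protospacer on the strand whose sequence matches the spacer (the orientation usually described for type I-E); matching is exact; and the spacer and flanking repeat sequences are hypothetical.

```python
import re

# ASSUMPTIONS: E. coli type I-E PAM 5'-A[A/T]G-3', located immediately 5' of
# the protospacer on the strand written here; exact matches only.
PAM_RE = re.compile(r"A[AT]G")

def cascade_targets(dna: str, spacer: str) -> bool:
    """True if `spacer` occurs in `dna` directly preceded by a valid PAM."""
    start = dna.find(spacer)
    while start != -1:
        if start >= 3 and PAM_RE.fullmatch(dna[start - 3:start]):
            return True
        start = dna.find(spacer, start + 1)
    return False

spacer = "CTGACCTTAGCATCGA"               # hypothetical spacer
phage = "TTT" + "AAG" + spacer + "TTT"    # foreign DNA: PAM present
escape = "TTT" + "ACG" + spacer + "TTT"   # PAM mutated: escapes recognition
own_locus = "GCGCCG" + spacer + "GCGCCG"  # CRISPR locus: repeat, no PAM
for name, seq in [("phage", phage), ("escape", escape), ("own locus", own_locus)]:
    print(name, cascade_targets(seq, spacer))
# phage True / escape False / own locus False
```

The escape case illustrates the weakness the quote mentions: a single PAM mutation is enough to avoid recognition, which is what priming compensates for.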

M. P. Terns et al. (2015): CRISPR-Cas immune systems function to defend prokaryotes against potentially harmful mobile genetic elements including viruses and plasmids. The multiple CRISPR-Cas systems (Types I, II, and III) each target destruction of foreign nucleic acids via structurally and functionally diverse effector complexes (crRNPs). CRISPR-Cas effector complexes are comprised of CRISPR RNAs (crRNAs) that contain sequences homologous to the invading nucleic acids and Cas proteins specific to each immune system type. CRISPR-Cas systems confer prokaryotes with adaptive immunity against viruses, conjugative plasmids, and other potential genome invaders. A host CRISPR (clustered regularly interspaced short palindromic repeats) locus contains a leader region (typically 100–500 bp) followed by multiple copies of a repeat sequence (∼30–40 bp) separated by similarly sized, variable invader-derived sequences. Each crRNA contains a guide region comprised of invader-derived sequences that allow crRNA-Cas protein effector complexes to recognize and destroy invader nucleic acids. CRISPR-associated (Cas) proteins provide enzymatic machinery and structural components to carry out the distinct phases of the CRISPR-Cas pathway. Moreover, modules of Cas proteins (e.g., Csa, Cst, Cse, Csm, Cmr) comprise the distinct CRISPR-Cas immune systems: Type I (A-G), Type II (A-C), and Type III (A-B). 16

Dipali G Sashital (2019): Within this system, the CRISPR locus is programmed with ‘spacer’ sequences that are derived from foreign DNA and serve as a record of prior infection events. 14

In bacteria, CRISPR-Cas9 acts as an adaptive immune response: it remembers when a virus has infected the cell in the past by keeping a small piece of viral DNA in a memory bank (the spacers). If the same kind of virus infects the cell again, the system can compare the injected DNA with the sequences in that bank, recognize it, respond quickly and effectively, and destroy it.

Devashish Rath (2015): The CRISPR-Cas mediated defense process can be divided into three stages.

1. Adaptation or spacer acquisition,
2. crRNA biogenesis (expression), and
3. Target interference

CRISPR memory update
https://www.youtube.com/watch?v=piHaA1nBsDY

1. Naive adaptation
- RecBCD enters the MGE DNA at a double-strand break and degrades it until it reaches a Chi site, which protects one strand from further degradation
- Cas1-Cas2 is recruited to the PAM site
2. Prespacer trimming
- DNA Pol III trims the prespacer
- The PAM-containing end of the prespacer is processed, and Cas1-Cas2 integrates the prespacer at the S-site.
3. Spacer integration
4. Repeat duplication
- Integration leaves the repeat as two single strands, one on each side of the new spacer. DNA polymerase then fills in each single-stranded repeat, so that the spacer is flanked by two double-stranded repeats. The new spacer is now in place.
5. Interference
- The spacer is transcribed and processed into a crRNA, which is loaded into Cascade
6. Primed adaptation

My comment: This is a logical sequence with a clear end goal: to protect bacteria from invading MGEs. Foresight is necessary to instantiate such an interdependent system, which requires all sequential operating steps to be fully implemented, instantiated, and operational, with each member, each enzyme, and each protein machine operating in a coordinated fashion, in a joint venture, in precise orchestration.

The first stage, adaptation, leads to the insertion of new spacers in the CRISPR locus. In the second stage, expression, the system gets ready for action by expressing the cas genes and transcribing the CRISPR into a long precursor CRISPR RNA (pre-crRNA). The pre-crRNA is subsequently processed into mature crRNA by Cas proteins and accessory factors. In the third and last stage, interference, target nucleic acid is recognized and destroyed by the combined action of crRNA and Cas proteins.

A.Price et al., (2016): CRISPR-Cas systems operate as adaptive immune defenses to target and degrade nucleic acids derived from bacteriophages and other foreign genetic elements. 12

1. Dana K. Howe: Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution, 2008
2. Eugene V. Koonin: Inevitability of Genetic Parasites, 2016 Sep 26
3. Eugene V. Koonin: Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states, 04 December 2017
4. Gregory P. Fournier: Ancient horizontal gene transfer and the last common ancestors, 22 April 2015
5. Aude Bernheim: The pan-immune system of bacteria: antiviral defence as a community resource, 06 November 2019
6. Felix Broecker: Evolution of Immune Systems From Viruses and Transposable Elements, 29 January 2019
7. Eugene V. Koonin: Evolution of adaptive immunity from transposable elements combined with innate immune systems, December 2014
8. Eugene V. Koonin: The LUCA and its complex virome, 14 July 2020
9. Luciano Marraffini: (Ph)ighting phages – how bacteria resist their parasites, 2020 Feb 13
10. Simon J. Labrie: Bacteriophage resistance mechanisms, 2010 Mar 29
11. Anna Lopatina: Abortive Infection: Bacterial Suicide as an Antiviral Immune Strategy, 2020 Sep 29
12. Aryn A. Price et al.: Harnessing the Prokaryotic Adaptive Immune System as a Eukaryotic Antiviral Defense, 2016 Feb 3
13. Devashish Rath: The CRISPR-Cas immune system: Biology, mechanisms and applications, October 2015
14. Dipali G. Sashital: The Cas4-Cas1-Cas2 complex mediates precise prespacer processing during CRISPR adaptation, Apr 25, 2019
15. Simon A. Jackson: CRISPR-Cas: Adapting to change, 7 Apr 2017
16. M. P. Terns et al.: Three CRISPR-Cas immune effector complexes coexist in Pyrococcus furiosus, 2015 Jun; 21
17. Carl Zimmer: Breakthrough DNA Editor Born of Bacteria, February 6, 2015
18. Jordana Cepelewicz: Biodiversity Alters Strategies of Bacterial Evolution, January 6, 2020
19. Tina Y. Liu: Chemistry of Class 1 CRISPR-Cas effectors: Binding, editing, and regulation, 16 October 2020
20. Giedrius Gasiunas: Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria, September 4, 2012
21. Samuel H. Sternberg et al.: Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System, 2015 Nov 5
22. Eugene V. Koonin: Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems, 2011 Jul 14
23. Eugene V. Koonin: Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants, 19 December 2019
24. Pascale Cossart: The New Microbiology: From Microbiomes to CRISPR, 2016

https://www.the-scientist.com/news-opinion/prokaryotes-are-capable-of-learning-to-recognize-phages-70378

Last edited by Otangelo on Tue Aug 23, 2022 9:25 am; edited 3 times in total


CRISPR adaptation

Devashish Rath (2015): The adaptation phase provides the genetic memory that is a prerequisite for the subsequent expression and interference phases that neutralize the re-invading nucleic acids. Conceptually, the process can be divided into two steps:

1. Protospacer selection (Cas1-Cas2 substrate capture) and generation of spacer material, followed by
2. Integration of the spacer into the CRISPR array and synthesis of a new repeat.

A bacteriophage infects a bacterium by injecting its DNA into the bacterial cell like a syringe. The first thing that happens is that a pair of enzymes, Cas1 and Cas2, which are separate proteins but always work together in concert as a complex, cut out a region of the virus's DNA called a protospacer and insert it into the bacterial data bank, a part of the chromosome called the CRISPR array.

In the CRISPR array there are repeats (short, identical sequences that belong to the bacterium itself) separated by spacers (short sections of DNA taken from invading phages), and the protospacer, after processing, becomes a new spacer in the array. The prefix proto- means "before", and that is exactly what happens: Cas1 and Cas2 are programmed to identify a section of phage DNA suitable to become a spacer and turn it into one. The spacer is inserted at the leader end (the 5′ end) of the CRISPR array, and the machinery then builds a new repeat next to it, so there is a repeat after each spacer: repeat, spacer, repeat, spacer. Every repeat is identical (or nearly so) to all the other repeats, which is why they are called repeats, and the spacers lie between them. CRISPR stands for clustered regularly interspaced short palindromic repeats. A palindrome is a word, number, phrase, or other sequence of characters that reads the same backward as forward, such as madam or racecar; for DNA, a sequence is called palindromic when it reads the same 5′ to 3′ on both strands. "Clustered" means that the whole CRISPR array sits in one place on the chromosome, and "regularly interspaced" means the repeats are separated by spacers at regular intervals.
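For DNA, "palindromic" means something different from "madam" or "racecar": a double-stranded sequence is palindromic when it equals its own reverse complement. The check below is a minimal sketch; GAATTC, the EcoRI restriction site, is used as a familiar example.

```python
# Watson-Crick complement for the four DNA bases (uppercase only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Read the opposite strand 5'->3': complement every base, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]

def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)

print(is_dna_palindrome("GAATTC"))  # True: EcoRI site
print(is_dna_palindrome("GAATTG"))  # False
print("GAATTC" == "GAATTC"[::-1])   # False: not a 'racecar'-style palindrome
```

CRISPR repeats are usually only partly palindromic (they contain short inverted repeats), so a whole repeat would typically fail this strict test.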

A restriction endonuclease is an enzyme that cleaves DNA into fragments at or near a specific recognition site, and that restriction site is usually a palindrome. Likewise, the CRISPR repeats are often sites where enzymes interact. When Cas1 and Cas2 take a protospacer and turn it into a spacer, they don't cut the DNA randomly in any place: they cut at a precise location defined by a short motif adjacent to the protospacer. Two guanines together are common in any DNA, and in Streptococcus pyogenes the protospacer adjacent motif is any nucleotide followed by two guanines, written NGG. The first base can be adenine, thymine, cytosine, or guanine; it doesn't matter. Any nucleotide followed by two guanines is the protospacer adjacent motif, or PAM.
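The scan for NGG and the choice of the stretch next to it can be sketched as below. This follows the simplified picture above; the default protospacer length of 20 is a hypothetical choice, only one strand is scanned, and the function name is made up for illustration.

```python
def protospacer_candidates(dna: str, length: int = 20) -> list[tuple[int, str, str]]:
    """Scan one strand for NGG PAMs and return (start, protospacer, pam) for
    each PAM that has `length` bases immediately upstream (5') of it.

    ASSUMPTIONS: S. pyogenes-style NGG PAM, uppercase A/C/G/T input, one
    strand only, and a hypothetical fixed protospacer length."""
    hits = []
    for p in range(length, len(dna) - 2):  # p = first base of the PAM
        pam = dna[p:p + 3]
        if pam[1:] == "GG":                # N can be any nucleotide
            hits.append((p - length, dna[p - length:p], pam))
    return hits

seq = "CCCCCATTGCAGG"  # hypothetical phage fragment
for start, proto, pam in protospacer_candidates(seq, length=8):
    print(start, proto, pam)  # 2 CCCATTGC AGG
```

Note that a run such as GGG yields two overlapping PAMs (NGG starting at each of the first two positions), so a real scanner has to decide how to treat such cases.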

Cas1 and Cas2 scan the DNA looking for a PAM site. When they find one, they move upstream, toward the 5′ end of the strand that carries the PAM, and cut out a section of around 30 bases (the exact length depends on the system). They turn that into a spacer, insert it at the leader end (the 5′ end) of the CRISPR array, and then build a new repeat next to it, so older spacers are pushed progressively further from the leader. The CRISPR array is flexible. A bacterium that has been infected by many different phages may have a very long CRISPR array with many spacers and repeats, while another might have only one or a few. In some bacterial species researchers have found hundreds of spacers; in others, only a couple. 13

Dipali G Sashital (2019): The key proteins in collecting and storing the virus DNA are called Cas1, Cas2 and Cas4. (Cas4 is widespread in type I, II, and V systems.) Previous work suggests that Cas4 is important for cutting suitable lengths of DNA for storage. The adaptation proteins Cas1 and Cas2 are conserved among most CRISPR systems, suggesting a common molecular mechanism for acquiring spacers. Cas1 and Cas2 catalyze spacer integration via two transesterification reactions mediated by nucleophilic attack on each strand of a double-stranded prespacer substrate at the phosphodiester backbone within the CRISPR array. Integration occurs at the first repeat in the CRISPR array, with one attack occurring between the upstream leader sequence and the repeat and the other occurring on the opposite strand between the repeat and first spacer within the array. These reactions result in the insertion of the prespacer between two single-strand repeats, and this gapped intermediate is repaired by host factors. In order to form a functional spacer, the adaptation complex must capture and process longer fragments of DNA from the invader containing a flanking sequence called a protospacer adjacent motif (PAM). The PAM is an essential motif during target recognition by the surveillance complex and must be present next to the target in order for interference to occur. However, the PAM is not part of the spacer and must be removed from the prespacer prior to integration through a processing step. In addition, integration must occur in the correct orientation to produce a crRNA that is complementary to the PAM-containing strand of the invader. In some systems, additional Cas proteins, such as Cas4, are also required during adaptation. In in vivo studies, deletion of cas4 reduced the adaptation efficiency and resulted in the acquisition of non-functional spacers from regions that lacked a correct PAM.
Some systems have two cas4 genes that work together to define the PAM, length and orientation of spacers, suggesting that the two Cas4 proteins are involved in processing each end of the prespacer and that they may be present during integration. Similarly, in vitro studies have suggested that Cas4 is involved in PAM-dependent prespacer processing. Cas4 endonucleolytically cleaves PAM-containing 3ʹ-single-stranded overhangs that flank double-stranded prespacers. Importantly, Cas4 cleavage activity is dependent on the presence of Cas1 and Cas2, and Cas4 inhibits premature integration of unprocessed prespacers. These observations suggest that Cas4 associates with the Cas1-Cas2 complex, although direct biochemical and structural evidence for this Cas4-Cas1-Cas2 complex remains elusive. 14

My comment: Cas1 and Cas2 scan DNA looking for a PAM they are programmed to identify; the adaptation complex captures and processes longer fragments of DNA from the invader; the machinery then builds a new repeat region; the PAM is not part of the spacer and must be removed from the prespacer prior to integration; afterward, host factors repair gapped intermediates; and Cas4 inhibits premature integration of unprocessed prespacers. These are all terms with teleological significance, and they indicate preprogrammed design. We can observe here sophisticated operations that are executed in a sequential, logical, machine-like fashion, where each player has its specific task and function.

Simon A. Jackson (2017): The Cas1 and Cas2 proteins constitute the “workhorse” of spacer integration. Spacers added to CRISPR arrays must be compatible with the diverse range of type-specific effector complex machinery. Thus, despite being near ubiquitous among CRISPR-Cas types, Cas1-Cas2 homologs meet the varied requirements for the acquisition of appropriate spacer sequences in different systems. For example, the effector complexes of several CRISPR-Cas types only recognize targets containing a specific sequence adjacent to where the CRISPR RNA (crRNA) base pairs with the target strand of a mobile genetic element (MGE). The crRNA-paired target sequence is termed the protospacer, and the adjacent target-recognition motif is called a protospacer-adjacent motif (PAM). PAM-based target discrimination prevents the unintentional recognition and self-destruction of the CRISPR locus by the crRNA-effector complex, yet canonical PAM sequences vary between and sometimes within systems. The Cas1 subunits form two dimers that are bridged by a central Cas2 dimer. In addition to Cas1-Cas2, at least one CRISPR repeat, part of the leader sequence, and several host factors for repair of the insertion sites (e.g., DNA polymerase) are required.

Cas1-Cas2 substrate capture
In E. coli, expression of the Cas1–Cas2 protein complex triggers acquisition of new 33-base-pair (bp) spacers at the A/T-rich leader end of the CRISPR locus. The protospacer sequence is derived from the M13 bacteriophage genome and is acquired at high frequency into the E. coli CRISPR locus after infection. Cas1–Cas2 has the ability to select specific DNA substrates before integration.

Many CRISPR-Cas systems have highly consistent yet system-specific spacer lengths, and it is likely that analogous wedge-based Cas1-Cas2 “molecular rulers” exist in these systems to control prespacer length.
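The idea of a molecular ruler combined with PAM removal (the PAM "is not part of the spacer and must be removed from the prespacer prior to integration", as Sashital puts it above) can be sketched as string trimming. Assumptions: the 33-bp length is the E. coli value given above; the PAM is written at the right-hand end purely for illustration (which end and strand carries it depends on the system); and trimming happens in one step, whereas real processing involves several proteins. The function name is hypothetical.

```python
RULER_BP = 33  # E. coli spacer length quoted in the text

def trim_prespacer(prespacer: str, pam: str, ruler: int = RULER_BP) -> str:
    """Toy 'molecular ruler': keep the `ruler` bases immediately before the
    last occurrence of `pam`, discarding the PAM and anything beyond it.

    ASSUMPTION: the PAM-containing end is on the right of the string."""
    i = prespacer.rfind(pam)
    if i < ruler:  # also covers i == -1 (no PAM found)
        raise ValueError("no PAM with enough upstream sequence to measure a spacer")
    return prespacer[i - ruler:i]

spacer = "ACGT" * 8 + "A"                   # hypothetical 33-nt spacer
prespacer = "GC" + spacer + "AAG" + "TTTT"  # overhang + spacer + PAM + overhang
print(trim_prespacer(prespacer, "AAG") == spacer)  # True
```

A fixed ruler explains why spacer length is so consistent within a system while still differing between systems: change `ruler` and every spacer changes with it.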

Jennifer A Doudna (2014):

Recognition of the CRISPR array
Before integration, the substrate-bound Cas1-Cas2 complex must locate the CRISPR leader repeat sequence. Specific sequences upstream of CRISPR arrays direct leader-polarized spacer integration, both through direct Cas1-Cas2 recognition and assisted by host proteins. The Cas1-Cas2 complexes of several systems show an intrinsic affinity for the leader-repeat region in vitro, yet this is not always wholly sufficient to provide the specificity observed in vivo. It was recently discovered that for the type I-E system, leader-repeat recognition is assisted by the integration host factor (IHF) heterodimer. IHF binds the CRISPR leader in a sequence-specific manner and induces 120° DNA bending, providing a cue to accurately localize Cas1-Cas2 to the leader-repeat junction. A conserved sequence motif upstream of the IHF pivot is proposed to stabilize the Cas1-Cas2–leader-repeat interaction and increase the efficiency of spacer acquisition, supporting binding of the adaptation complex to DNA sites on either side of the bound IHF. IHF is absent in many prokaryotes, including archaea, indicating that other leader-proximal integration mechanisms exist. Indeed, type II-A Cas1-Cas2 from Streptococcus pyogenes catalyzed leader-proximal integration in vitro at a level of precision comparable to that of the type I-E system with IHF. In type II systems, a short leader-anchoring site (LAS) adjacent to the first repeat and ≤6 base pairs of this repeat are essential for CRISPR adaptation and are conserved in systems with similar repeats. Placement of an additional LAS in front of a nonleader repeat resulted in the integration of spacers at both sites, whereas LAS deletion caused ectopic integration at a downstream repeat adjacent to a spacer containing a LAS-like sequence (15). Hence, in contrast to type I-E systems, type II-A systems appear to rely solely on intrinsic sequence specificity for the leader-repeat junction.
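The type II-A rule described at the end of this passage (integration happens at a repeat that sits right after a leader-anchoring site) can be sketched as a search. All sequences are hypothetical placeholders, spacers are written in lowercase, and exact matching stands in for the protein-DNA recognition that actually does the work.

```python
def integration_sites(array: str, repeat: str, las: str) -> list[int]:
    """Return the start index of every repeat copy that is immediately
    preceded by the leader-anchoring site (LAS)."""
    sites = []
    i = array.find(repeat)
    while i != -1:
        if i >= len(las) and array[i - len(las):i] == las:
            sites.append(i)
        i = array.find(repeat, i + 1)
    return sites

R, LAS = "GCTAGCAT", "TTAC"  # hypothetical repeat and LAS
normal = "CCCC" + LAS + R + "aaaa" + R + "cccc" + R
extra_las = "CCCC" + LAS + R + "aaaa" + LAS + R + "cccc" + R
las_deleted = "CCCCGGGG" + R + "gg" + LAS + R + "cccc" + R

print(integration_sites(normal, R, LAS))       # [8]: first repeat only
print(integration_sites(extra_las, R, LAS))    # [8, 24]: both marked repeats
print(integration_sites(las_deleted, R, LAS))  # [22]: ectopic, downstream
```

The three cases mirror the three observations quoted above: normal leader-proximal integration, integration at both sites when a second LAS is added, and ectopic integration next to a spacer that happens to contain a LAS-like sequence when the real LAS is deleted.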

Integration into the CRISPR array
For CRISPR-Cas types that are reliant on PAM sequences for recognition of targets, the acquisition of interference-proficient spacers requires the processing of the prespacer substrate at a specific position relative to the PAM. Each of the four Cas1 monomers in the Cas1-Cas2 complex contains a PAM-sensing domain. The presence of a PAM in the active site of just one of the Cas1 monomers is sufficient to appropriately position the substrate and PAM relative to the cleavage site. Furthermore, the presence of a PAM within the prespacer substrate ensures integration into the CRISPR in the correct orientation. This directional fidelity is critical because otherwise the PAM in the MGE target would lie at the wrong end of the crRNA target binding site, thus precluding target recognition. To avoid premature loss of the PAM directional cue, processing of the prespacer likely occurs after Cas1-Cas2 orients and docks at the leader-proximal repeat. Cas1-mediated processing of the prespacer creates two 3′OH ends required for nucleophilic attack on each strand of the leader-proximal repeat. The initial nucleophilic attack most likely occurs at the leader-repeat junction and forms a half-site intermediate; then, a second attack at the existing repeat-spacer junction generates the full-site integration product. After the first nucleophilic attack, the intrinsic sequence specificity of the Cas1-Cas2 complex defines the site of the second attack and ensures accurate repeat duplication. CRISPR repeats are often semi-palindromic, containing two short inverted repeat (IR) elements, but the location of these can vary. In type I-B and I-E systems, the IRs occur close to the center of the repeat and are important for spacer acquisition. In the type I-E system, both IRs act as anchors for the Cas1-Cas2 complex, which contains two molecular rulers to position the Cas1 active site for the second nucleophilic attack at the repeat-spacer boundary. 
However, in the type I-B system from Haloarcula hispanica, only the first IR is essential for integration, and a single molecular ruler, directed by an anchor between the IRs, has been proposed. In the type II-A systems of Streptococcus thermophilus and S. pyogenes, the IRs are located distally within the repeats, suggesting that these short sequences may directly position the nucleophilic attacks without a need for molecular rulers. Although these recent findings suggest that leader-repeat regions at the beginning of CRISPR arrays contain sequences to ensure appropriate Cas1-Cas2 localization, further work is required to determine how the spacer integration events are specifically orchestrated in the diverse range of CRISPR-Cas types.

Production of prespacers from foreign DNA
Despite the elegance of memory-directed defense, CRISPR adaptation is not without complications. For example, the inadvertent acquisition of spacers from host DNA must be avoided because this will result in cytotoxic self-targeting, akin to autoimmunity in eukaryotic adaptive immune systems. Therefore, production of prespacer substrates from MGEs should outweigh production from host DNA.

Naïve CRISPR adaptation
Acquisition of spacers from MGEs that are not already cataloged in host CRISPRs is termed naïve CRISPR adaptation. For naïve CRISPR adaptation, prespacer substrates are generated from foreign material and loaded onto Cas1-Cas2. The main known source of these precursors is the host RecBCD complex. Stalled replication forks that occur during DNA replication can result in double-strand breaks (DSBs), which are repaired through RecBCD-mediated unwinding and degradation of the dsDNA ends back to the nearest Chi sites (In Escherichia coli, acquisition of new spacers largely depends on RecBCD-mediated processing of double-stranded DNA breaks occurring primarily at replication forks, and the preference for foreign DNA is achieved through the higher density of Chi sites on the self chromosome, in combination with the higher number of forks on the foreign DNA. This explains the strong preference to acquire spacers both from high-copy plasmids and from phages). During this repair process, RecBCD produces single-stranded DNA (ssDNA) fragments, which have been proposed to subsequently anneal to form partially duplexed prespacer substrates for Cas1-Cas2. The greater number of active origins of replication and the paucity of Chi sites on MGEs, compared with the host chromosome, bias naïve adaptation toward foreign DNA. Furthermore, RecBCD recognizes the unprotected dsDNA ends that are commonly present in phage genomes upon injection or before packaging, which theoretically provides an additional phage-specific source of naïve prespacer substrates. Despite the role of RecBCD in substrate generation, naïve CRISPR adaptation can occur in its absence, albeit with reduced bias toward foreign DNA. Thus, events other than double-strand breaks (DSBs) might also stimulate naïve CRISPR adaptation, such as R-loops that occur during plasmid replication, lagging ends of incoming conjugative elements, and even CRISPR-Cas–mediated spacer integration events themselves.
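The Chi-density argument in the parenthesis (more Chi sites on the host chromosome means less DNA processed per break, hence less prespacer material from self) can be illustrated with a toy calculation. Assumptions: the E. coli Chi sequence 5′-GCTGGTGG-3′; RecBCD enters at a break and moves toward the left end of the string as written, the direction from which it recognizes Chi written 5′ to 3′; Chi is always recognized; and the host-like and phage-like sequences are invented.

```python
CHI = "GCTGGTGG"  # E. coli Chi site, 5'->3'

def bases_degraded(dna: str, break_pos: int) -> int:
    """Toy estimate of how much DNA RecBCD processes after entering at a
    double-strand break at index break_pos and moving leftward until it meets
    a Chi site. With no Chi site, it processes everything to the left end."""
    chi = dna.rfind(CHI, 0, break_pos)
    stop = chi + len(CHI) if chi != -1 else 0
    return break_pos - stop

host_like = "A" * 50 + CHI + "A" * 10  # hypothetical: Chi close to the break
phage_like = "A" * 68                  # hypothetical: no Chi site at all
print(bases_degraded(host_like, len(host_like)))    # 10
print(bases_degraded(phage_like, len(phage_like)))  # 68
```

More DNA processed per break means more single-stranded fragments that could anneal into prespacers, which is the sense in which Chi-poor foreign DNA is favored.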
Furthermore, we do not know whether all CRISPR-Cas systems have an intrinsic bias toward production of prespacers from foreign DNA. In high-throughput studies of native systems, the frequency of acquisition of spacers from host genomes is likely to be underestimated, because the autoimmunity resulting from self-targeting spacers means that these genotypes are typically lethal. For example, in the S. thermophilus type II-A system, spacer acquisition appears biased toward MGEs, yet nuclease-deficient Cas9 fails to discriminate between host and foreign DNA. It is unknown whether CRISPR adaptation in type II systems is reliant on DNA break repair. Further studies in a range of host systems are required to clarify how diverse CRISPR-Cas systems balance the requirement for naïve production of prespacers from MGEs against the risk of acquiring spacers from host DNA.

crRNA-directed CRISPR adaptation (priming)
Mutations in the target PAM or protospacer sequences can abrogate immunity, allowing MGEs to escape CRISPR-Cas defenses. Furthermore, the protection conferred by individual spacers varies: Often, several MGE-specific spacers are required to mount an effective defense and to prevent proliferation of escape mutants. Thus, to maintain effective immunity, CRISPR-Cas systems need to undergo CRISPR adaptation faster than MGEs can evade targeting. Indeed, type I systems have a mechanism known as primed CRISPR adaptation (or priming) to facilitate rapid spacer acquisition, even against highly divergent invaders. Priming uses MGE target recognition that is facilitated by preexisting spacers to trigger the acquisition of additional spacers from previously encountered elements. Thus, priming is advantageous when MGE replication within the host cell exceeds defense capabilities. This can occur when cells are infected by mobile genetic element escape mutants or when the levels of CRISPR-Cas activity are insufficient to provide complete immunity using only the existing spacers, even in the absence of MGE escape mutations. Priming begins with target recognition by crRNA-effector complexes. Therefore, factors that influence target recognition (i.e., the formation and stability of the crRNA-DNA hybrid), including PAM sensing and crRNA-target complementarity, affect the efficiency of primed CRISPR adaptation. Furthermore, these same factors can induce conformational rearrangements in the target-bound crRNA-effector complex that result in favoring either the interference or priming pathways. In type I-E systems, the Cas8e (Cse1) subunit of Cascade can adopt one of two conformational modes, which may promote either direct or Cas1-Cas2–stimulated recruitment of the effector Cas3 nuclease. Cas3, which is found in all type I systems, exhibits 3′ to 5′ helicase and endonuclease activity that nicks, unwinds, and degrades target DNA. 
In vitro activity of the type I-E Cas3 produces ssDNA fragments of ~30 to 100 nucleotides that are enriched for PAMs at their 3′ ends and that anneal to provide partially duplexed prespacer substrates. The spatial positioning of Cas1-Cas2 during primed substrate generation has not been clearly established. However, Cas1-Cas2–facilitated recruitment of Cas3 would imply that the CRISPR adaptation machinery is localized close to the site of prespacer production. In type I-F systems, Cas3 is fused to the C terminus of Cas2 (Cas2-3), so these systems form Cas1–Cas2-3 complexes that couple the CRISPR adaptation machinery directly to the source of prespacer generation during priming. Although different target recognition modes favor distinct Cas3 recruitment routes, primed CRISPR adaptation can be provoked both by MGE escape mutants and by non-escape (interference-proficient) targets. When differences in the intracellular copy number of the MGE are excluded, interference-proficient targets promote greater spacer acquisition than escape mutants. This forms a positive feedback loop, reinforcing immunity against recurrent threats even in the absence of escapees. If the copy number of the MGE within the host cell is factored in, however, escape mutants actually trigger more spacer acquisition. This is because interference rapidly clears targeted MGEs from the cell, whereas escape mutants that evade immediate clearance by existing CRISPR-Cas immunity persist for longer. Over time, the prolonged presence of the escape MGE, combined with the priming-centric target recognition mode, results in higher net production of prespacer substrates and spacer integration.
Because priming is initiated by site-specific target recognition (i.e., targeting a priming protospacer), Cas1-Cas2–compatible prespacers are subsequently produced from MGEs with locational biases. However, priming is stimulated more strongly from an interference-proficient protospacer than from the original priming protospacer. 15

Cas protein–assisted production of spacers
DNA breaks induced by the interference activity of class 2 CRISPR-Cas effector complexes could trigger host DNA repair mechanisms (e.g., RecBCD), thereby providing substrates for Cas1-Cas2. Consistent with a model of DNA break–stimulated CRISPR adaptation, restriction enzyme activity can stimulate RecBCD-facilitated production of prespacer substrates. RecBCD activity may also partially account for the enhanced CRISPR adaptation observed during phage infection of a host that possesses an innate restriction-modification defense system. However, whether that enhancement was RecBCD-dependent is unknown. In a CRISPR-Cas–induced DNA break model, sequence-specific target recognition precedes the production of prespacer substrates. Direct evidence for this concept is lacking. Nonetheless, CRISPR adaptation in type II-A systems does require Cas1-Cas2, Cas9, a trans-activating crRNA (tracrRNA; a cofactor for crRNA processing and interference in type II systems), and Csn2. The PAM-sensing domain of Cas9 enhances the acquisition of spacers with interference-proficient PAMs. However, Cas9 nuclease activity is dispensable, and existing spacers are not strictly necessary, suggesting that the PAM interactions of Cas9 could be sufficient to select appropriate new spacers. Some Cas9 variants can also function with non-CRISPR RNAs and tracrRNA. This raises the possibility that host- or MGE-derived RNAs might direct promiscuous Cas9 activity, causing DNA breaks or replication fork stalling that could in turn generate prespacers.

Roles of accessory Cas proteins in CRISPR adaptation
Although Cas1 and Cas2 play a central role in CRISPR adaptation, cas gene clusters vary from type to type. In many systems, Cas1-Cas2 is assisted by accessory Cas proteins, which are often mutually exclusive and type-specific. For example, in the S. thermophilus type II-A system, deletion of csn2 impaired the acquisition of spacers from invading phages. Direct interaction between Cas1 and Csn2 also suggests that Csn2 works together with the spacer acquisition machinery. Csn2 multimers cooperatively bind to the free ends of linear dsDNA and can translocate by rotation-coupled movement. Substrate-loaded type II-A Cas1-Cas2 is capable of full-site spacer integration in vitro. Csn2 may therefore be required for prespacer substrate production, selection, or processing. Potentially, Csn2 binding to the free ends of dsDNA provides a cue for nucleases to assist in prespacer generation. Cas4, another ring-forming accessory protein, is found in type I, II-B, and V systems. Confirming its role in CRISPR adaptation, Cas4 is necessary for type I-B priming in H. hispanica and interacts with a Cas1-Cas2 fusion protein in the Thermoproteus tenax type I-A system. Fusions between Cas4 and Cas1 are found in several systems, indicating a functional association with the spacer acquisition machinery. Cas4 contains a RecB-like domain and four conserved cysteine residues, which are presumably involved in coordinating an iron-sulfur cluster. However, Cas4 proteins appear to be functionally diverse. Some possess uni- or bidirectional exonuclease activity, whereas others exhibit ssDNA endonuclease activity and unwinding activity on dsDNA. Because of its nuclease activity, Cas4 is hypothesized to be involved in prespacer generation. In type III systems, spacers complementary to RNA transcribed from MGEs are required for immunity.
Some bacterial type III systems contain fusions of Cas1 with reverse transcriptase (RT) domains that provide a mechanism to integrate spacers from RNA substrates. The RT-Cas1 fusion from M. mediterranea can integrate RNA precursors into an array, where they are subsequently reverse-transcribed to generate DNA spacers. However, integration of DNA-derived spacers also occurs, indicating that the RNA-derived spacer route is not exclusive. Hence, the combined integrase and reverse transcriptase activity of RT-Cas1–Cas2 enhances CRISPR adaptation against highly transcribed DNA MGEs and potentially against RNA-based invaders. Other host proteins may also be necessary for prespacer substrate production. For example, RecG is required for efficient primed CRISPR adaptation in type I-E and I-F systems, but its precise role remains speculative. Additionally, it is still unclear why some CRISPR-Cas systems require accessory proteins whereas closely related types do not. For example, type II-C systems lack csn2 and cas4, which assist CRISPR adaptation in type II-A and II-B systems, respectively. These type-specific differences exemplify the diversity that has arisen.

The genesis of adaptive immunity in prokaryotes
Casposons are transposon-like elements typified by the presence of Cas1 homologs, or casposases. Casposases catalyze site-specific DNA integration that results in the duplication of repeat sites, analogously to spacer acquisition. It is possible that ancestral innate defenses gained DNA integration functionality from casposases, thus seeding the genesis of prokaryotic adaptive immunity. The innate ancestor remains unidentified but is likely to have been a nuclease-based system. Casposon-derived terminal inverted repeats (IRs) and casposases sometimes co-occur in the absence of full casposons. These pairings might represent an intermediate stage on the way to the signature CRISPR repeat-spacer-repeat structure. However, the evolutionary journey from the innate immunity–casposase hybrid to full adaptive immunity is unclear. Evolution of diverse CRISPR-Cas types would have required stringent coevolution of the Cas1-Cas2 spacer acquisition machinery, PAM and leader-repeat sequences, crRNA processing mechanisms, and effector complexes. In some systems, mechanisms that enhance the production of Cas1-Cas2–compatible prespacers from MGEs, such as priming, might have arisen because naïve CRISPR adaptation is inefficient and has a high probability of acquiring spacers from host DNA. However, it was recently shown that promiscuous binding of crRNA-effector complexes to the host genome results in a basal level of lethal “self-priming” in a type I-F system. Host regulation of CRISPR and cas genes might have arisen to balance the likelihood of self-acquisition events against the need to adapt to new threats, for example when the risk of phage infection or horizontal gene transfer is high. Alternatively, it has been proposed that selective acquisition of self-targeting spacers could provide benefits. Examples include invoking altruistic cell death, facilitating rapid genome evolution, regulating host processes, or even preventing the uptake of other CRISPR-Cas systems. 15

Interference: Cleaving DNA and RNA Invaders
Sequence-specific destruction of invading MGEs is the basis of CRISPR-Cas defense. In the final stage of CRISPR-Cas-mediated immunity, mature crRNAs guide the interference machinery to cleave invading nucleic acids. To store the genetic information of a parasitic MGE, part of the foreign DNA must be integrated into the host's genomic CRISPR locus. This, however, raises an inherent problem for the interference machinery: relying solely on sequence complementarity between the crRNA and the target would result in cleavage of the CRISPR array itself. Hence, nearly all characterized CRISPR-Cas systems (except type III) have an authentication and discrimination mechanism. Both the adaptation and interference machinery recognize a short sequence called the protospacer adjacent motif (PAM). A PAM sits next to the protospacer, both when the spacer is acquired and when the target is attacked, but is absent from the CRISPR array. This arrangement enables robust immunity while averting autoimmune targeting of the CRISPR array. 13
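The PAM rule described above can be reduced to a two-part check: the crRNA must match, and a PAM must sit next to the match. The following is a minimal sketch under stated assumptions. It uses a type I-E-like 5′-AAG-3′ PAM placed immediately 5′ of the protospacer, scans only one strand, and uses invented sequences. It is not a model of any specific system's PAM specificity.

```python
PAM = "AAG"  # assumed type I-E-like PAM, read immediately 5' of the protospacer


def is_targeted(dna: str, spacer: str, pam: str = PAM) -> bool:
    """True if any copy of `spacer` on this strand is immediately preceded by
    `pam`. Only one strand is scanned, to keep the sketch short."""
    start = dna.find(spacer)
    while start != -1:
        if start >= len(pam) and dna[start - len(pam):start] == pam:
            return True
        start = dna.find(spacer, start + 1)
    return False


# Hypothetical sequences: the same spacer sits next to a repeat in the host
# CRISPR array but next to a PAM in the invading phage.
spacer = "GATTACAGATTACA"
repeat = "GTTCCC"
crispr_array = repeat + spacer + repeat
phage = "TTAAG" + spacer + "TT"
print(is_targeted(crispr_array, spacer))  # False: no PAM, array is spared
print(is_targeted(phage, spacer))         # True: PAM present, phage is cut
```

In the array, the repeat occupies the position where a PAM would have to be. The crRNA can therefore pair perfectly with the array and still not license cleavage.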

The adaptation phase provides the genetic memory that is a prerequisite for the subsequent expression and interference phases, which neutralize re-invading nucleic acids. The insertion of new spacers has been experimentally demonstrated in several CRISPR-Cas subtypes:

- type I-A (Sulfolobus solfataricus and Sulfolobus islandicus)
- type I-B (Haloarcula hispanica)
- type I-E (E. coli)
- type I-F (Pseudomonas aeruginosa and Pectobacterium atrosepticum)
- type II-A (S. thermophilus, and a Streptococcus pyogenes system expressed in Staphylococcus aureus)

There are two types of spacer acquisition: naïve, when the invader has not been encountered before, and primed, when there is a pre-existing record of the invader in the CRISPR array. Although spacer acquisition is observed, its mechanism is only partly understood. Conceptually, the process can be divided into two steps. The first is protospacer selection and generation of spacer material. The second is integration of the spacer into the CRISPR array and synthesis of a new repeat. Occasional deletion of spacers is required to limit the size of the CRISPR array, but little is known about the mechanism or frequency of such events. The key factors in spacer integration are Cas1 and Cas2. This function was suggested early on, because the proteins are ubiquitous yet dispensable for interference. It was later confirmed by overexpression of Cas1 and Cas2 from a type I-E system in E. coli, which resulted in spacer integration even in the absence of all other Cas proteins. Both Cas1 and Cas2 are nucleases, and mutations in the active site of Cas1 abolish spacer integration in E. coli. Cas1 and Cas2 from E. coli form a complex in which one Cas2 dimer binds two Cas1 dimers. Formation of the complex is required for spacer acquisition, but Cas2 nuclease activity is dispensable.

Adaptation: Memorizing Invading Nucleic Acids
Adaptation, also known as spacer acquisition, is the step in which memory of previous infections is formed and is the reason why CRISPR-Cas immunity is adaptive and heritable. The CRISPR array serves as a genetic memory bank, and spacer acquisition into the array is accomplished in several steps: the detection of an MGE, protospacer selection, protospacer processing, and spacer integration into the CRISPR array. The key players of spacer acquisition are Cas1 and Cas2, which are present in nearly all CRISPR-Cas systems. In the type I-E CRISPR-Cas system of E. coli, a stable complex composed of two Cas1 dimers bridged by one Cas2 dimer (abbreviated as Cas1-Cas2) acts as an integrase in which Cas1 is catalytic and Cas2 has a structural function. Cas1 and Cas2 are the only Cas proteins required for naive spacer acquisition in the type I-E system (Figure A);

Fig. 1 Spacer Acquisition in Type I Systems
(A) During naive spacer acquisition in type I-E systems, the Cas1-Cas2 complex is sufficient for the recognition of a canonical PAM. After initial fragmentation of the invading DNA by RecBCD (not shown), suitable protospacers are integrated at the leader-proximal end of the CRISPR array (inset). The CRISPR-unrelated integration host factor (IHF) is essential for this process, as it binds a specific sequence of the leader, yielding a sharply bent DNA structure. DNA bending allows the Cas1-Cas2 complex to recognize and bind the leader-proximal repeat. The 3′-OH ends of the protospacer perform nucleophilic attacks on the leader side and the spacer side of the repeat backbone. During the first integration reaction, the leader-repeat boundary is nicked and ligated to one strand of the protospacer. During the second integration reaction, the other protospacer strand is ligated to the opposite end of the repeat, leading to duplication of the first repeat. DNA polymerase and ligase subsequently fill the single-strand gaps.
(B) Primed spacer acquisition requires an existing spacer matching the target. Mutations in the seed sequence or the PAM, however, abolish interference. In some cases, crRNA guides Cascade to bind the imperfect target sequence, but the complex fails to recruit Cas3 for DNA degradation. Here, Cas1-Cas2 recruits Cas3, and the complex translocates bidirectionally (dashed arrows) away from the target site without degrading the DNA. Cas1-Cas2 then selects a suitable protospacer with a canonical PAM for spacer integration.
(C) Interference-driven spacer acquisition also requires an existing spacer against the invader, which results in target cleavage by the interference machinery. After Cas3 degrades the target DNA, Cas1-Cas2 captures DNA fragments and subsequently integrates them into the CRISPR array.

however, additional Cas proteins are required in other systems. In the type II-A CRISPR-Cas system of Streptococcus pyogenes and S. thermophilus, all Cas proteins (Cas9, Cas1, Cas2, and Csn2) and tracrRNA are essential for spacer integration. The adaptation mechanisms of type I and type II are the most thoroughly characterized. They provide a model for our current understanding of spacer acquisition.

The Origin of Protospacers
Spacer acquisition begins with the detection of foreign genetic elements, which are subsequently processed and integrated into the CRISPR array. To avoid autoimmunity, the adaptation machinery must prefer foreign over self DNA, and/or its activity must be enhanced by signals of (imminent) infection (see The Ecology and Regulation of CRISPR-Cas). A study in E. coli revealed that the DNA fragments generated during the repair of double-stranded DNA (dsDNA) breaks (DSBs) are an important source of protospacers. The RecBCD repair complex is recruited to DSB sites, which are often found at replication forks. RecBCD unwinds and degrades the DNA until it reaches a crossover hotspot instigator (Chi) site. Sequences proximal to Chi sites, as well as sites of replication fork stalling, were shown to be protospacer sampling hotspots. This suggests that RecBCD degradation fragments are captured by the adaptation machinery. Chi sites are underrepresented on foreign DNA compared with the genomic DNA of E. coli. RecBCD can therefore degrade larger portions of the foreign genome, which serves as a basis for preferential acquisition of non-self DNA. A similar mechanism was recently described in the type II-A system of S. pyogenes, where regions between exposed DNA ends and Chi sites were highly favored for spacer sampling. In phage DNA, sequences between the injected linear DNA ends and the closest Chi site are spacer-sampling hotspots. The AddAB machinery (the Gram-positive counterpart of RecBCD) was shown to be necessary for efficient spacer acquisition. This suggests a self- versus non-self-discrimination strategy similar to the one observed in E. coli. The reliance on other host proteins suggests that the adaptation machinery lacks an intrinsic ability to distinguish between self and non-self DNA.
Indeed, overexpression of catalytically inactive Cas9, which abolishes interference and thus prevents auto-immunity, resulted in a surplus of genome-derived spacers over plasmid-derived spacers in the type II-A system of S. thermophilus. Considering that spacer integration is a rare event, a low acquisition rate might be a strategy to compensate for inefficient self- versus non-self-discrimination in order to reduce the chance of auto-immunity and/or allow beneficial horizontal gene transfer.

Protospacer Selection and Processing
In addition to preferential fragmentation of foreign DNA by the RecBCD/AddAB machinery, selection of specific protospacers by the adaptation machinery is often non-random. In type I and type II systems, the adaptation machinery selects protospacers with a PAM that is compatible with the interference machinery. Studies in E. coli showed that the Cas1-Cas2 complex is sufficient for PAM recognition; the Cas1 subunits preferentially bind the PAM-complementary sequence. Moreover, type I-E Cas1-Cas2 prefers protospacers with 3′ single-stranded overhangs of at least 7 nt at both ends. This shows that both the PAM and the substrate structure affect protospacer selection. These dual-forked DNA substrates likely arise from partial re-annealing of ssDNA fragments. The fragments are generated by RecBCD, or by the interference machinery during interference-driven adaptation. Two Cas1 tyrosine wedges splay the dual-forked DNA and stabilize the 23-bp duplex, positioning the 3′ overhangs near the active sites of the Cas1 dimers. Cas1 cleaves the 3′ overhangs to generate a 33-nt product with a 3′-OH on each overhang. Two nucleotides of the PAM-complementary sequence are removed in this process, thus preventing acquisition of spacers that would result in cleavage of the CRISPR array. The Cas1-Cas2 complex thus seems to serve as a molecular ruler that determines protospacer size and prepares the protospacer for integration into the CRISPR array. Unlike in type I-E, Cas1 and Cas2 alone are not sufficient for naive spacer acquisition in type II-A systems. Here, Cas9, Csn2, and tracrRNA are additionally required. Cas9 selects protospacers that are adjacent to a PAM, whereas random protospacers are selected when the PAM recognition domain of Cas9 is mutated. Cas9 catalytic activity is dispensable for protospacer acquisition, indicating that Cas9 is not involved in protospacer processing.
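To make PAM-dependent selection concrete, here is a sketch that lists candidate protospacers on one strand. It assumes the S. pyogenes Cas9 PAM (NGG, immediately 3′ of the protospacer). The 30-nt window length is a hypothetical default. Setting `pam_aware=False` only mimics the random selection reported for Cas9 PAM-domain mutants; it does not model that mutant's biochemistry.

```python
from typing import List


def candidate_protospacers(seq: str, length: int = 30,
                           pam_aware: bool = True) -> List[str]:
    """Windows of `length` nt on one strand that could become spacers.

    pam_aware=True keeps only windows immediately followed (3') by an NGG PAM,
    as recognized by S. pyogenes Cas9. pam_aware=False returns every window,
    mimicking the random selection reported for PAM-domain mutants."""
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        pam = seq[start + length:start + length + 3]
        if not pam_aware or (len(pam) == 3 and pam[1:] == "GG"):
            hits.append(window)
    return hits


# Hypothetical 13-nt sequence with a single TGG PAM at its 3' end.
seq = "A" * 10 + "TGG"
print(candidate_protospacers(seq, length=5))                        # ['AAAAA']
print(len(candidate_protospacers(seq, length=5, pam_aware=False)))  # 9
```

With PAM sensing, only one of the nine possible windows qualifies. Without it, every window is an equally likely spacer source.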

Spacer Integration
The CRISPR array is preceded by an AT-rich leader sequence. Spacer integration preferentially occurs at the leader end of the CRISPR array and thus keeps a chronological record of previous infections. The mechanism of protospacer integration has been studied in detail in the type I-E system of E. coli. In vitro studies showed that Cas1-Cas2 integrates new spacers by a mechanism similar to that of viral integrases and transposases. First, the 3′-OH of the protospacer performs a nucleophilic attack at the target site and is thereby attached to the 5′ phosphate of the leader-proximal repeat. This step depends on recognition of the leader-repeat boundary. That boundary is specified when a CRISPR-independent protein called integration host factor (IHF) binds the leader sequence. IHF sharply bends the DNA, producing a U-shaped leader structure that favors recognition of the leader-repeat boundary by Cas1-Cas2 (Figure A; inset). In the second step, the 3′-OH of the other protospacer strand is ligated to the opposite end of the first repeat. Two inverted repeat motifs in the CRISPR repeat are important during this step. They serve as anchors for the Cas1-Cas2 complex and determine the position of the second integration site (Goren et al., 2016). Upon complex binding, the repeat becomes distorted, which is crucial for making the second integration site accessible to Cas1. The partial PAM on the protospacer ensures that the new spacer is incorporated in the correct orientation. Some PAM nucleotides are removed prior to integration. This likely occurs after the acquisition complex binds to the leader-repeat junction, so directionality is preserved (Figure A; inset). Unlike in type I-E, recognition of the leader-repeat end in type II-A is IHF-independent. Instead, it requires a short motif termed the leader-anchoring site (LAS), consisting of the 5 bp at the repeat-proximal end of the leader, which Cas1-Cas2 recognizes directly.
Interestingly, mutations in the LAS can lead to ectopic spacer integration within the CRISPR array. Spacer acquisition is less effective in this case. Even so, recognition of alternative anchoring sites lets the system overcome alterations of the canonical LAS by integrating new spacers elsewhere. However, spacers located within the CRISPR array provide less resistance against phages than leader-proximal spacers, likely due to the lower abundance of distally encoded crRNAs. After recognition of the LAS, the type II-A Cas1-Cas2 complex can conduct the first integration reaction at either end of the first repeat, although integration at the leader boundary is usually preferred. Structural data supporting this model were recently presented for the type II-A integration complex of Enterococcus faecalis. Here, terminal sequences on both sides of the repeat were shown to be sufficient but suboptimal for target recognition. Additional interactions of Cas1 with the first four repeat-proximal nucleotides of the leader, however, allow a more efficient interaction with the target. This explains the preference for the first integration event at the leader side of the first repeat. The first reaction generates a half-site integration intermediate in which only one strand of the protospacer is ligated to one end of the repeat. The second integration event depends on proper protospacer size, recognition of the opposite repeat end, and bending of the repeat by the Cas1-Cas2 complex. If these requirements are not fulfilled, full-site integration cannot occur. In that case, the acquisition complex presumably reverses the first integration reaction, or DNA repair proteins remove the half-site intermediate.
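The two-step reaction has bookkeeping consequences: the repeat is duplicated, and the newest spacer ends up next to the leader. These can be mimicked with plain strings. This is a sequence-level sketch only. It ignores strands, IHF or LAS recognition, PAM trimming, and gap-filling chemistry, and all sequences are hypothetical placeholders.

```python
def integrate_at_leader(array: str, leader: str, repeat: str, spacer: str) -> str:
    """Leader-proximal spacer integration with duplication of the first repeat.

    Step 1 (half-site): one spacer strand is joined at the leader-repeat junction.
    Step 2 (full-site): the other strand is joined at the far end of that repeat.
    Gap filling then copies the repeat, so the new spacer ends up flanked by
    repeats: leader + repeat + spacer + repeat + older units."""
    if not array.startswith(leader + repeat):
        raise ValueError("leader-repeat junction not found")
    rest = array[len(leader):]  # begins with the original first repeat
    return leader + repeat + spacer + rest


# Hypothetical short sequences chosen so the repeat appears nowhere else.
leader, repeat = "ATATAT", "GTTC"
array = leader + repeat + "AAAA" + repeat                  # one old spacer, AAAA
array = integrate_at_leader(array, leader, repeat, "CCCC")  # first infection
array = integrate_at_leader(array, leader, repeat, "TTTT")  # later infection
print(array)  # ATATATGTTCTTTTGTTCCCCCGTTCAAAAGTTC
```

Reading outward from the leader gives TTTT, CCCC, AAAA: newest to oldest. This is the "chronological record" in string form.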

Cas Mosterd (2020): In recent years, steady progress has been made in our understanding of CRISPR-Cas systems, with the interference phase attracting the most attention. In the adaptation phase, a stretch of DNA (the prespacer) is first captured by Cas proteins. Then, the Cas proteins and the bound prespacer are directed to the CRISPR locus, where a repeat sequence is duplicated and the prespacer is integrated between the two repeats as a novel spacer. The cas1 and cas2 genes are essential and specific to the adaptation phase, and both are relatively well conserved in almost all CRISPR-Cas systems. Other universal elements of the acquisition process are the leader sequence and the first repeat of the CRISPR locus, suggesting that some processes are likely conserved in all types.

Cas1 is a metal-dependent endonuclease (endonucleases are enzymes that cleave phosphodiester bonds within a polynucleotide chain). It cleaves single-stranded DNA, double-stranded DNA (dsDNA), and single-stranded RNA (ssRNA). Like Cas1, Cas2 is a metal-dependent endonuclease. Cas1 and Cas2 form a complex consisting of a Cas2 dimer flanked by two Cas1 dimers (Cas1₄–Cas2₂). Within the complex, only the nuclease activity of Cas1 is essential for adaptation. Cas1 has a strong binding affinity for the CRISPR locus, reflecting its role in integrating new spacers. In the absence of Cas2, however, Cas1 cannot bind the CRISPR locus, making both proteins essential during adaptation. Cas2 functions as an adaptor protein, bridging the Cas1 proteins as well as binding and stabilizing the prespacer DNA.

In addition, a CRISPR repeat and a leader sequence are required for the integration of a novel spacer. The AT-rich (adenine- and thymine-rich) leader sequence is typically located upstream of the CRISPR locus. It often contains the promoter that directs transcription of the CRISPR locus into pre-crRNA. Repressor proteins silence this transcription in certain species, whereas in others, expression of the CRISPR operon can be constitutive and upregulated during phage infection. Beyond its role in transcription, the leader sequence also participates in integration, as novel spacers are usually integrated at the leader end of the CRISPR locus. Indeed, the leader sequence and the flanking repeat sequence harbor recognition signals that direct the Cas1–Cas2 complex to this position. These recognition sites lie within 10 bp of the integration site in both the leader and repeat sequences.

Bacterial self-DNA carries about as many PAM sequences as foreign DNA does. PAM prevalence therefore cannot explain the preference for foreign DNA. One source of protospacers for the Cas1–Cas2 complex is the degradation products of the bacterial RecBCD complex, which processes double-stranded DNA breaks (DSBs). This does not apply to all CRISPR-Cas system types. The large majority of these DSBs occur at replication forks during DNA replication, and more are found on plasmids than on the bacterial chromosome. The RecBCD complex unwinds and degrades the DNA, starting at the DSB, until it reaches a crossover hotspot instigator (Chi) site. Compared with plasmid and phage DNA, bacterial genomic DNA is rich in Chi sites and is therefore indirectly protected from spacer acquisition. Phage DNA enters the cell as linear DNA, contains few Chi sites, and is highly replicative. It is therefore presumably an easier target for the RecBCD complex than chromosomal DNA. This may explain why the CRISPR-Cas system preferentially acquires spacers from foreign DNA rather than from its own. In type I systems, the degradation products of Cas3 activity during primed adaptation have also been shown to function as prespacers. While the RecBCD complex is found only in Gram-negative bacteria, Gram-positive bacteria possess a highly similar mechanism in the form of the AddAB repair machinery. As with the RecBCD complex, the AddAB machinery has been shown to have a large impact on spacer acquisition.
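The phrase "rich in Chi sites" can be made quantitative by comparing an observed Chi density with what chance alone would give. In uniform random DNA, a single exact 8-bp motif is expected once every 4^8 = 65,536 bp on each strand. Counting both strands, that is about 0.031 per kb. The helpers below are an illustrative sketch, not from the source. To test the claim, one would run `chi_sites_per_kb` on a real genome and compare the result with the random baseline. The test sequence here is artificial.

```python
CHI = "GCTGGTGG"     # E. coli Chi motif, 5'->3'
CHI_RC = "CCACCAGC"  # the same motif read on the opposite strand


def chi_sites_per_kb(seq: str) -> float:
    """Chi motifs per kilobase, counted on both strands of `seq`."""
    k = len(CHI)
    hits = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in (CHI, CHI_RC))
    return 1000 * hits / len(seq)


def random_expectation_per_kb(motif_len: int = 8) -> float:
    """Expected hits per kb for one exact, non-palindromic motif in uniform
    random DNA, both strands: 2 * 1000 / 4**motif_len."""
    return 2 * 1000 / 4 ** motif_len


print(round(random_expectation_per_kb(), 4))  # 0.0305
print(chi_sites_per_kb(CHI + "A" * 992))      # 1.0
```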

Depending on the type of CRISPR-Cas system, additional factors are required to process prespacers into spacers ready for integration. In some type I-E systems, Cas2 is fused to a DnaQ domain, which degrades prespacers at the 3′ end to generate suitable spacers for integration. A similar role has been proposed for Cas4, present in type I, II, and V systems. Cas4 trims the 3′ overhangs of prespacer DNA until it encounters a bound Cas1₄–Cas2₂ complex, thereby generating spacers of a specific length and with a correct PAM. In this way, Cas4 prevents integration of nonfunctional spacers with an inappropriate length or PAM. In other systems, the functions of Cas4 and DnaQ can be performed by non-CRISPR nucleases.
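The net effect described for Cas4 is that only correctly sized, PAM-bearing prespacers go forward. That outcome can be expressed as a simple filter. This is not a model of Cas4's cleavage chemistry. The footprint length, the PAM, and where the PAM sits on the product are all hypothetical parameters chosen for illustration.

```python
from typing import Optional


def process_prespacer(prespacer: str, footprint: int, pam: str) -> Optional[str]:
    """Toy length-and-PAM gate.

    The 3' end is trimmed back to a hypothetical `footprint` (the stretch
    assumed to be protected by bound Cas1-Cas2). The product is passed on only
    if it is long enough to fill that footprint and still carries the
    PAM-derived sequence, placed at the 5' end here purely for illustration."""
    if len(prespacer) < footprint:
        return None  # too short: would give a spacer of the wrong length
    trimmed = prespacer[:footprint]  # 3' excess removed
    return trimmed if trimmed.startswith(pam) else None


print(len(process_prespacer("AAG" + "C" * 30, 20, "AAG")))  # 20
print(process_prespacer("C" * 33, 20, "AAG"))               # None (no PAM)
print(process_prespacer("AAGC", 20, "AAG"))                 # None (too short)
```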

1. Jennifer A. Doudna et al.: Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity. June 2014, vol. 21
2. Cas Mosterd et al.: A short overview of the CRISPR-Cas adaptation stage. 19 June 2020

Last edited by Otangelo on Wed Aug 24, 2022 9:23 am; edited 4 times in total


https://news.berkeley.edu/2016/12/22/compact-crispr-systems-found-in-some-of-worlds-smallest-microbes/

https://cdnsciencepub.com/doi/full/10.1139/cjm-2020-0212?af=R


Deep in the ocean, in our own gut, and everywhere in between, phage viruses infect all sorts of bacteria. Phages carry around DNA (or RNA) genomes as blueprints for their assembly and spread. However, phages cannot reproduce on their own. They must take resources from bacteria during the process of infection and use these resources to produce more phages.

THE PHAGE INFECTION PROCESS
When a phage lands on a bacterium, it “locks onto” the bacterial surface using its legs or tail fibers. Then the phage’s DNA travels from its storage space in the head, down a long tube (the tail), and into the bacterial cell. After a phage injects its DNA into a bacterium, the bacterium’s own protein machinery replicates the phage genome. The bacterium then follows the instructions in the phage DNA and generates more heads, tails, tail fibers, and other phage pieces. These “body parts” assemble to form new phages. Once enough new phages have accumulated, they lyse (burst open) the bacterium and escape into the environment, much like a balloon popping. Each time a new bacterial cell is infected, a new swarm of phages is produced. The infection can thereby spread exponentially, quickly taking over communities of bacteria. This is known as the lytic cycle. In contrast, so-called lysogenic phages hide their DNA inside an infected bacterium’s genome. They lie in wait and only produce more phages later, when conditions are right.

HOW DO BACTERIA DEFEND AGAINST VIRUSES?
If phages are so good at killing bacteria, how are there any bacteria left on Earth? Bacteria have a whole range of tricks to fend off phages, and each of their defense systems works a bit differently.

Perhaps the most obvious defense is to prevent phages from landing on bacterial cells in the first place. The bacterial cell surface is studded with receptors: proteins or groups of proteins that recognize molecules from the outside environment and transmit signals into the bacterium. Phages have taken advantage of these receptors for their own purposes, attaching to their bacterial hosts by sticking to them in a process called adsorption. Some bacteria keep phages from binding to their surfaces by altering the structure of their receptors or by introducing physical barriers that prevent attachment.

Alternatively, some bacteria block phage DNA from entering the cytoplasm (the cell interior). To inject their DNA into a bacterial cell, phages must penetrate the cell membrane; by adding new proteins to this membrane, bacteria can clog the phage entryway and block DNA injection. When phage DNA does make it into the cytoplasm, some bacteria can prevent the genome from being replicated, while others prevent replicated phage genomes from being packaged into capsids and bursting out of the cell.

Many bacteria deploy restriction-modification (RM) systems to destroy phage DNA that is injected into the cell. These systems rely on scissor-like proteins called restriction enzymes, which cut phage DNA apart and thereby destroy the instructions for making more phages.

To keep their own DNA from being cut by restriction enzymes, bacteria use a second enzyme, a methyltransferase, to attach small chemical tags called methyl groups to the same recognition sequences in their own genome. Restriction enzymes ignore these methylated sites, so the bacterial genome stays safe while unmethylated phage DNA is cut.

If a phage manages to bypass all these safeguards, the bacterium’s last line of defense is cell suicide. This “altruistic” act kills the individual bacterium but prevents the production of more phage copies that could go on to infect neighboring cells. One common version of this process is known as abortive infection.
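The self versus non-self logic of a restriction-modification system can be sketched in a few lines. This is a toy model under stated assumptions: it uses the EcoRI recognition sequence GAATTC (cut between G and A) purely as an example, represents methylation as a set of marked site positions rather than chemistry, and treats invading phage DNA as completely unmethylated. The function names and sequences are hypothetical.

```python
from __future__ import annotations

SITE = "GAATTC"  # EcoRI recognition sequence, used only as an example
CUT_OFFSET = 1   # EcoRI cuts between G and A: G^AATTC


def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Start positions of every occurrence of `site` in `dna`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def methylate(dna: str, site: str = SITE) -> set[int]:
    """Modification: the host marks every recognition site in its own DNA."""
    return set(find_sites(dna, site))


def restrict(dna: str, methylated: set[int], site: str = SITE) -> list[str]:
    """Restriction: cut every recognition site that is not methylated.

    Returns the resulting fragments; an uncut molecule comes back as one piece.
    """
    cuts = [pos + CUT_OFFSET for pos in find_sites(dna, site) if pos not in methylated]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


if __name__ == "__main__":
    host = "TTGAATTCAAGGC"        # hypothetical host fragment, one site
    phage = "CCGAATTCGGGAATTCTT"  # hypothetical phage fragment, two sites
    print(restrict(host, methylate(host)))  # -> ['TTGAATTCAAGGC']
    print(restrict(phage, set()))           # -> ['CCG', 'AATTCGGG', 'AATTCTT']
```

Protection is keyed to the marked site itself, not to where the DNA came from, which is what lets one enzyme pair tell the host genome apart from invading DNA.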

BEYOND INNATE DEFENSES
The defense systems described above are considered innate defenses. This means that they generally evolve slowly, act quickly during infection, and defend against phages in general rather than against any one specific phage. Almost every bacterium has some form of innate defense.

About half of bacteria also have an adaptive immune mechanism called a CRISPR-Cas system, which defends against specific types of phages. This system can adapt, rapidly generating immunity against new phage challengers.

The CRISPR immune system
Bacteria remember and fend off specific phage threats through CRISPR-Cas immunity.

Bacteria use a variety of defenses to block attacks from phages, including a unique form of adaptive immunity known as a CRISPR-Cas system. These systems are found in about half of all bacterial species and nearly all archaea, though they appear to be less common in some groups of microbes. CRISPR systems work by capturing small pieces of invading phage DNA, stashing them in the host cell’s genome, and using these molecular memories to find and destroy matching phages.

Architecture of a CRISPR locus
Generally, an intact CRISPR locus will include a set of cas genes next to an array of CRISPR repeats and spacers that have been captured from phage DNA.

Repeats - These are segments of DNA that all have the same sequence. They're generally around 30 base pairs long. The repeats are called "palindromic" in the CRISPR acronym because they usually contain a stretch of bases followed by the complementary bases of the same sequence, in reverse order. The terminology comes from palindromes in language: words or phrases that can be read the same way forward and backward, like "race car." This internal complementarity is important because it means that most CRISPR repeats fold into a structure called a stem-loop or hairpin once they're transcribed into RNA.

Spacers - These sit between the repeats. They are short segments of phage DNA, each one different from the next. These are the "molecular memories" described above. Spacers are remnants of previous phage infections, and they are passed down through generations of bacteria. This means that an individual cell doesn’t have to have been infected itself in order to be immune to a phage.
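The "palindromic" property of a repeat can be checked mechanically: a stretch of bases whose reverse complement reappears further along the same strand can pair with it to form the stem of a hairpin, with the bases in between forming the loop. The sketch below finds the first such perfect inverted repeat. The stem length, minimum loop size, and toy sequence are hypothetical choices, not a real CRISPR repeat, and real RNA structure prediction uses thermodynamic folding tools rather than exact matching.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_stem(repeat: str, stem_len: int = 5, min_loop: int = 3) -> tuple[int, int] | None:
    """Return (i, j) where repeat[i:i+stem_len] can pair with repeat[j:j+stem_len].

    The second arm must be the reverse complement of the first and must start
    at least `min_loop` bases after the first arm ends. Returns None if no
    perfect inverted repeat of that size exists.
    """
    n = len(repeat)
    for i in range(n - stem_len + 1):
        partner = reverse_complement(repeat[i:i + stem_len])
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if repeat[j:j + stem_len] == partner:
                return i, j
    return None


if __name__ == "__main__":
    toy_repeat = "TTGACCATTTTTGGTCTT"  # hypothetical: GACCA ... loop ... TGGTC
    print(find_hairpin_stem(toy_repeat))  # -> (2, 11)
```

In the toy sequence, GACCA and TGGTC read as the same sequence on opposite strands, so folding the RNA back on itself pairs them into a stem.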

In most cases, you'll find a set of CRISPR-associated, or cas, genes right next to the repeat-spacer array. These genes encode Cas proteins, which carry out all the activities needed for immunity. Besides fending off phages, CRISPR systems can also block other invasive nucleic acids, like plasmids.

There are a variety of different CRISPR systems, but they all defend bacteria by following three basic steps: spacer acquisition, CRISPR RNA biogenesis, and interference. Let’s go through each step in a little more detail.

CRISPR is an adaptive immune system found in bacteria and archaea. Cas proteins store pieces of phage DNA as a memory of infection. Other Cas complexes use these memories as guides to find and destroy matching phage genomes to stop subsequent infection.

SPACER ACQUISITION
CRISPR systems work by remembering phage infections, so the first step of CRISPR immunity is making a memory. When a phage infects a bacterium equipped with a CRISPR-Cas system, a new spacer sequence can be added to the CRISPR array. This proceeds through a process called acquisition or adaptation.

In this process, two Cas proteins, Cas1 and Cas2, work together to capture pieces of injected phage DNA. The Cas1–Cas2 complex acts as a ruler and works with accessory proteins to measure and cut out a precisely sized piece of DNA. The complex holds onto the phage DNA and inserts it as a new spacer at one end of the CRISPR array, the end next to the leader sequence. A new copy of the repeat is made in the process, so every spacer is always flanked by a repeat on each side. The bacterium thereby stores a memory of the phage.
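The bookkeeping of spacer acquisition, with new spacers added at one end and each new spacer bringing a new repeat copy, can be modelled as a simple data structure. This is a sketch under stated assumptions: the class, method names, repeat sequence, and fixed spacer length are all hypothetical, and the "ruler" is reduced to a length check rather than any real protein mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy CRISPR array: repeat, spacer, repeat, spacer, ..., repeat."""

    repeat: str
    spacer_length: int  # hypothetical fixed size enforced by the "ruler"
    spacers: list[str] = field(default_factory=list)  # index 0 = leader end (newest)

    def acquire(self, prespacer: str) -> None:
        """Integrate a prespacer as the newest spacer at the leader end."""
        if len(prespacer) != self.spacer_length:
            # Stand-in for Cas1-Cas2 measuring a precisely sized piece of DNA.
            raise ValueError("prespacer has the wrong length")
        self.spacers.insert(0, prespacer)

    def as_dna(self) -> str:
        """Spell out the array; n spacers always sit between n + 1 repeats."""
        return self.repeat + "".join(spacer + self.repeat for spacer in self.spacers)


if __name__ == "__main__":
    array = CrisprArray(repeat="GTTTT", spacer_length=4)
    array.acquire("AAAC")  # first infection
    array.acquire("CCCG")  # later infection: added at the same end
    print(array.as_dna())  # -> GTTTTCCCGGTTTTAAACGTTTT
```

Reading the array from the leader end therefore lists the recorded infections from newest to oldest.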

This step is akin to a vaccination — you can imagine the repeat-spacer array is like a vaccination card that shows all the phages against which the bacterium is immune.

Step 1: Spacer acquisition
The first step in establishing CRISPR immunity is to capture a short snippet of an invading phage's DNA.

CRISPR RNA BIOGENESIS
CRISPR systems store these memories of phage infection in their DNA, but don’t use the DNA to recognize subsequent phage infections directly. Instead, they convert the DNA memories into RNA. Bacteria make many RNA copies of the phage memories and use them to find newly invading phages. It is useful to make these RNA "working copies" because they can be broken down and recycled without destroying the original, permanent memory.

The biological term for this process is CRISPR RNA biogenesis. It involves transcribing the whole CRISPR array of repeats and spacers into a long piece of RNA called a precursor CRISPR RNA (pre-crRNA). Typically, Cas proteins then cut or 'cleave' the long RNA into short, individual segments. These contain one spacer and parts of the repeats. The final, fully trimmed RNA copies of the phage memories are called CRISPR RNAs (crRNAs). Next, Cas proteins will use each crRNA to hunt for and positively identify matching phage DNA.
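The two halves of crRNA biogenesis, transcription of the whole array and cleavage within each repeat, can be sketched as string operations. The cut position inside the repeat (`cut_offset`), the toy repeat and spacers, and the function names are hypothetical; which Cas protein (or host enzyme) cuts, and where, differs between CRISPR systems.

```python
from __future__ import annotations


def transcribe(dna: str) -> str:
    """RNA copy of a DNA coding strand: same sequence, with U in place of T."""
    return dna.replace("T", "U")


def process_pre_crrna(pre_crrna: str, repeat_rna: str, cut_offset: int) -> list[str]:
    """Cut the pre-crRNA `cut_offset` bases into every repeat copy.

    Each piece between two neighbouring cuts is one crRNA: the 3' part of one
    repeat, a single spacer, and the 5' part of the next repeat.
    """
    cuts = []
    start = pre_crrna.find(repeat_rna)
    while start != -1:
        cuts.append(start + cut_offset)
        start = pre_crrna.find(repeat_rna, start + len(repeat_rna))
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    repeat, spacers = "CCGGA", ["TTAT", "AATA"]  # hypothetical toy array
    array_dna = repeat + "".join(s + repeat for s in spacers)
    pre_crrna = transcribe(array_dna)
    print(process_pre_crrna(pre_crrna, transcribe(repeat), cut_offset=3))
    # -> ['GAUUAUCCG', 'GAAAUACCG']
```

Each crRNA carries one spacer (UUAU or AAUA) framed by repeat fragments, while the DNA array itself is never modified, so the RNA working copies can be broken down without losing the stored memory.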

Step 2: crRNA biogenesis and processing
The CRISPR array is transcribed into one long RNA molecule and then sliced into shorter pieces that each contain a single spacer.

INTERFERENCE
The final step in CRISPR immunity leads to the destruction of invading phage DNA. This process is called interference, because it involves the CRISPR system "interfering with" or stopping the phage's life cycle. Interference starts with a single Cas protein or a group of Cas proteins grabbing onto or binding one crRNA. Together, the Cas protein(s) and crRNA form what is known as the search complex or effector complex, or, as we'll call it here, the surveillance complex. (When multiple molecules of protein, RNA, or other components stick to each other and work together, they're called a complex.)

When a phage injects its DNA into a bacterial cell, the surveillance complex scans the phage’s genome for a sequence that matches the spacer in its crRNA. This target sequence is called the protospacer. Because the crRNA is a copy of DNA from an earlier phage, it is complementary to, and can form base pairs with, one strand of the DNA injected by any later phage carrying the same sequence. Base pairs between RNA and DNA look very similar to the ones between the two strands of DNA in a normal double helix.

The surveillance complex unzips the phage DNA and checks to see if the crRNA can base-pair with one strand. If the whole spacer can base-pair, it means the surveillance complex has found the target it's been searching for, so it cleaves the phage DNA to destroy it. If the surveillance complex searches through all the DNA in the cell but doesn't find a match, it doesn't make any cuts. CRISPR surveillance complexes destroy DNA by cutting it apart. They make different kinds of cuts depending on the CRISPR-Cas system. Some systems make a single break in each strand. Some cut both the target and any nearby RNA or DNA. Still others chew the target up bit by bit, like a lawnmower. After the phage DNA is cut at least once, other proteins inside the cell help break down the rest.

Since the phage’s replication instructions are destroyed, the infection is over. Thanks to its CRISPR defense system, the bacterium survives. In summary, the CRISPR system protected the cell by remembering an old phage infection and using that memory to recognize and stop the same type of phage when it attacked again.
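The search step can be sketched as an exact string search over both strands of the phage genome. This deliberately simplified model, with hypothetical function names and toy sequences, treats a hit as requiring a perfect, full-length match, as described above, and ignores PAM checks, tolerated mismatches, and the differences in how each system actually cuts.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_protospacers(crrna_spacer: str, genome: str) -> list[tuple[str, int]]:
    """Scan both strands of a double-stranded genome for the crRNA spacer.

    The crRNA pairs with the strand complementary to it, so a hit means the
    other strand carries the spacer sequence itself (T in place of U).
    '+' positions index `genome`; '-' positions index its reverse complement.
    """
    spacer_dna = crrna_spacer.replace("U", "T")
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(spacer_dna)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(spacer_dna, start + 1)
    return hits


def interfere(crrna_spacer: str, genome: str) -> bool:
    """Cut (True) only if the whole spacer finds a match; otherwise no cut."""
    return bool(find_protospacers(crrna_spacer, genome))


if __name__ == "__main__":
    phage = "AAACCCGGGTTTACGTAC"  # hypothetical phage genome (one strand shown)
    print(find_protospacers("ACGUAC", phage))  # -> [('+', 12)]
    print(interfere("UUUUUU", phage))          # -> False: no match, no cut
```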

Step 3: Interference
Surveillance complexes composed of one or more Cas proteins carry crRNA guides and search for complementary targets to destroy. When a phage's genome is cut, it cannot replicate and infection is stopped.

https://www.youtube.com/watch?v=Aqw4DihmoQY

Prespacer - The acquisition machinery selects a segment of phage DNA called a prespacer and integrates it into the bacterial CRISPR array.
Spacer - Once the phage DNA is incorporated into the CRISPR array, it is called a spacer. After a crRNA is transcribed from the array, the phage-derived sequence of the RNA is also called a spacer.
Protospacer - The stretch of DNA in the phage genome that is complementary to the crRNA and cleaved by the surveillance complex is called the protospacer. The PAM is a protospacer-adjacent motif, meaning a short sequence right next to the surveillance complex's target.

Avoiding self-targeting via the PAM
The protospacer adjacent motif (PAM) prevents CRISPR enzymes from cutting the repeat-spacer array

WHAT IS A PAM AND WHY IS IT IMPORTANT?
During CRISPR defense, bacteria use Cas proteins to destroy invading phage DNA. They identify that phage DNA by comparing it to RNA copies of the same sequence stored in their genome's CRISPR array. Thus, you might be wondering, why don’t Cas proteins end up cutting the spacers in the array?

Recognizing the difference between “self” DNA (the bacterial genome) and “non-self” DNA (foreign DNA like a phage genome) is essential because bacteria typically die when their genomes are damaged. CRISPR systems avoid harmful destruction of the bacterium's own genome by relying on short DNA sequences called protospacer adjacent motifs, or PAMs. These sit next to CRISPR targets in phage DNA but not next to the spacers in the bacterial CRISPR array. As a result, Cas proteins cut PAM-containing phage DNA but not the PAM-less spacers in the bacterium's own array.

We'll explain this in more detail using the CRISPR-Cas9 system as an example.

Avoiding self-targeting via PAM sequences
Surveillance complexes will only cut DNA targets containing short sequences called PAMs. In the Type II-A system shown here, Cas9 guides the acquisition proteins (Cas1, Cas2, and Csn2) to select new spacers from DNA right next to a PAM (in this case, the sequence "NGG") in the phage genome. In contrast, the repeats in CRISPR arrays do not contain the PAM. Thus, despite the complementarity between the crRNA guide and spacer, Cas9 will not bind or cut within the array.

THE CRISPR-CAS9 SYSTEM AND ITS PAM
Cas9 works by searching for any DNA that is complementary to its CRISPR RNA (crRNA) guide. Once Cas9 finds such DNA, it cuts the DNA apart. Since the crRNAs are transcribed from the bacterial CRISPR array, one strand of each spacer in the array is always complementary to its crRNA. The PAM requirement is what prevents Cas9 from cutting the array. But how?

Before testing whether its crRNA can base-pair with a stretch of DNA, Cas9 first checks for its specific PAM sequence: GG, two guanine bases. If there’s no PAM, Cas9 floats away. Even if the crRNA contains a perfect, complementary match to a DNA sequence, Cas9 will not find it without first seeing a PAM. When Cas9 successfully finds a PAM, it then checks to see if the adjacent DNA matches its crRNA guide. If it does, then Cas9 knows it’s okay to cut. The repeat-spacer array does not contain PAM sequences, so Cas9 does not cut it.

The SpyCas9 PAM is NGG
The Cas9 protein from Streptococcus pyogenes recognizes the PAM, "NGG." The "N" means any genetic letter (A, T, G, or C). Basically, as long as there's a "GG" one nucleotide away from the target, Cas9 will bind to it. Other Cas9s and other CRISPR systems have their own PAMs.
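The PAM-first order of operations described above can be sketched as follows. This is a simplified, single-strand model with hypothetical function names and toy sequences: it encodes the NGG rule from the text, uses a 6-base spacer only to keep the example readable (the S. pyogenes Cas9 guide is normally 20 bases), and places the PAM immediately 3' of the protospacer on the strand whose sequence matches the spacer.

```python
from __future__ import annotations


def matches_pam(triplet: str, pam: str = "NGG") -> bool:
    """True if `triplet` fits the PAM pattern, where 'N' stands for any base."""
    return len(triplet) == len(pam) and all(p in ("N", b) for p, b in zip(pam, triplet))


def cas9_targets(spacer: str, dna: str, pam: str = "NGG") -> list[int]:
    """PAM-first search on one strand.

    Step 1: look for a PAM. Step 2: only then test whether the bases
    immediately 5' of it match the spacer. Returns protospacer start positions.
    """
    spacer = spacer.replace("U", "T")
    k = len(spacer)
    hits = []
    for p in range(k, len(dna) - len(pam) + 1):
        if matches_pam(dna[p:p + len(pam)], pam):  # no PAM: move on
            if dna[p - k:p] == spacer:             # PAM found: check the guide
                hits.append(p - k)
    return hits


if __name__ == "__main__":
    phage = "TTACGTACTGGAA"     # protospacer ACGTAC followed by the PAM TGG
    array = "CCGGAACGTACCCGGA"  # same spacer, but followed by repeat sequence
    print(cas9_targets("ACGUAC", phage))  # -> [2]
    print(cas9_targets("ACGUAC", array))  # -> []
```

The array contains a perfect copy of the spacer, yet no target is reported, because the bases after it belong to the repeat rather than to a PAM.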

How is it that there's always a PAM next to the intended target in the phage genome? The CRISPR system makes sure of this during the first step of CRISPR immunity, acquisition. When a phage injects its DNA into a bacterium, Cas1, Cas2, and another protein called Csn2 work together with Cas9 to choose appropriate pieces of the phage DNA, called prespacers, to insert into the CRISPR array. Indeed, these proteins only choose prespacers that are next to PAMs. The diligent activities of Cas1, Cas2, Csn2, and the PAM requirement enable bacteria to defend themselves from phages while avoiding having their own DNA cleaved.
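The acquisition-side rule, choosing prespacers only where a PAM sits next to them, is the mirror image of Cas9's PAM check during interference. In this hypothetical sketch the Cas1, Cas2, Csn2, and Cas9 machinery is reduced to a single filter function; the function names, spacer length, and toy phage sequence are illustrative only.

```python
from __future__ import annotations


def fits_pam(triplet: str, pam: str = "NGG") -> bool:
    """True if `triplet` matches the PAM pattern; 'N' stands for any base."""
    return len(triplet) == len(pam) and all(p in ("N", b) for p, b in zip(pam, triplet))


def select_prespacers(phage_dna: str, length: int, pam: str = "NGG") -> list[str]:
    """Return every `length`-base stretch lying immediately 5' of a PAM.

    Only these candidates are offered for integration, so each resulting spacer
    has a PAM beside its protospacer in the phage genome (but not in the array).
    """
    return [
        phage_dna[p - length:p]
        for p in range(length, len(phage_dna) - len(pam) + 1)
        if fits_pam(phage_dna[p:p + len(pam)], pam)
    ]


if __name__ == "__main__":
    print(select_prespacers("TTACGTACTGGAA", 6))  # -> ['ACGTAC']
```

Because selection and targeting apply the same PAM rule, any spacer chosen this way will later find its PAM-flanked target in the phage, while its copy in the array stays PAM-less.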

CRISPR diversity
CRISPR systems all work a little bit differently.

Bacteria have a wide variety of CRISPR systems to defend themselves. At its simplest, a CRISPR-Cas system is composed of a set of cas genes and a CRISPR array. Microbes can have different combinations of cas genes, and each distinct set defines a different CRISPR system. The Cas proteins that make up a system are like a team whose members each have unique roles but work together to accomplish a goal. Imagine a sports league: everyone is playing the same game, but not every team is the same. Most CRISPR systems contain the cas1 gene, which encodes the Cas1 protein that adds new spacers to the CRISPR array so bacteria can fend off new phages. A CRISPR array can hold anywhere from one spacer to hundreds of spacers. What varies between systems? For just about every aspect of CRISPR immunity, nature has come up with a few variations. Let’s consider each step of CRISPR immunity, one by one.

DIFFERENCES IN ACQUISITION, CRISPR RNA BIOGENESIS, AND SURVEILLANCE COMPLEXES
Acquisition - This step tends to be pretty similar across systems, but Cas1 and Cas2 get help from a variety of additional proteins that vary by system, including other Cas proteins and sometimes host proteins that aren't encoded next to the cas genes and play other roles in the cell.

crRNA biogenesis - Various Cas proteins cut pre-crRNAs into mature crRNAs. Sometimes there are dedicated Cas proteins for this job. Sometimes the same protein that accomplishes the interference step also processes its crRNA. Other times, unrelated bacterial proteins participate too. The make-up of the processed crRNA guides can vary between systems as well. crRNAs are typically a single strand of RNA that folds into a hairpin or stem-loop structure at one end. However, the presence, size, shape, and location of the hairpin within the RNA vary. Sometimes crRNAs also need additional RNAs to fully function. For example, the common CRISPR-Cas9 system uses both a crRNA and an additional piece of RNA called a tracrRNA.

Interference - This step features the most variation between systems and serves as the basis for their classification. CRISPR-Cas systems are divided into two main classes. Class 1 systems have a large, multi-protein surveillance complex that finds nucleic acid targets. Sometimes that complex cuts the target directly, and sometimes it activates a separate cutting protein, called a nuclease, or multiple separate nucleases, for destruction. Class 2 systems, on the other hand, use a single protein to find and cleave targets. Each of these classes is further divided into types and subtypes that vary based on which set of cas genes they use.
While we mostly discuss the DNA-targeting Cas9 system in CRISPRpedia, some Cas enzymes cleave RNA, and some systems destroy both DNA and RNA. Different systems usually recognize unique PAMs.
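The class distinction comes down to one question, whether interference uses a multi-protein complex or a single protein, and can be written as a small lookup table. The table below is a partial, simplified sketch: it lists five well-known types with their signature genes (cas3 for type I, cas10 for type III, cas9 for type II, cas12 for type V, cas13 for type VI), omits type IV and all subtypes, and the data structure and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CrisprType:
    name: str
    signature_gene: str
    multi_protein_effector: bool  # True: large complex; False: one protein


# Partial and simplified: type IV and all subtypes are omitted.
TYPES = [
    CrisprType("I", "cas3", True),
    CrisprType("III", "cas10", True),
    CrisprType("II", "cas9", False),
    CrisprType("V", "cas12", False),
    CrisprType("VI", "cas13", False),
]


def crispr_class(crispr_type: CrisprType) -> int:
    """Class 1 uses a multi-protein surveillance complex; Class 2 a single protein."""
    return 1 if crispr_type.multi_protein_effector else 2


def classify_by_genes(genes: set[str]) -> list[str]:
    """Label every type whose signature gene appears in the given cas gene set."""
    return [
        f"Type {t.name} (Class {crispr_class(t)})"
        for t in TYPES
        if t.signature_gene in genes
    ]


if __name__ == "__main__":
    print(classify_by_genes({"cas1", "cas2", "csn2", "cas9"}))  # -> ['Type II (Class 2)']
```

Feeding in the genes of the Type II-A system discussed earlier (cas1, cas2, csn2, cas9) returns Type II, Class 2, because cas9 is the only signature gene present.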

Perhaps the greatest variability comes from how CRISPR systems destroy nucleic acids. The Cas3 enzyme, found in type I systems, moves along target DNA, chopping it up like a lawn mower. Some surveillance complexes cut the same target in multiple places. Other CRISPR enzymes, like Cas9, cut just once. Finally, some CRISPR systems don’t limit cutting to their matching target but instead, once a target has been identified, begin cutting up any nearby pieces of DNA or RNA too.

Variations between CRISPR-Cas systems
CRISPR systems come in a variety of forms. They all have similar parts and can all cut genetic material. Yet, they differ in the types of genetic material they cut, the make-up of the components that do the cutting (effectors), and the ways they cut.

CRISPR SYSTEM VARIETY LETS BACTERIA FEND OFF DIVERSE THREATS

https://innovativegenomics.org/crisprpedia/crispr-in-nature/

Otangelo

CRISPR-Cas molecular complexes of prokaryotes

In 2014, ScienceDaily reported: "The structure of this biological machine is conceptually similar to an engineer's blueprint, and it explains how each of the parts in this complex assemble into a functional complex that efficiently identifies viral DNA when it enters the cell," Wiedenheft said. "This surveillance machine consists of 12 different parts and each part of the machine has a distinct job. If we're missing one part of the machine, it doesn't work." 28

Comment: That is the description of an irreducibly complex machine: all of its parts must be present for the machine to work, and if one is missing, the machine's function breaks down.

Devashish Rath (2015): The CRISPR-Cas mediated defense process can be divided into three stages.

1. Adaptation or spacer acquisition, where a short fragment of invading DNA is inserted into the CRISPR locus for future recognition of that invader;
2. crRNA biogenesis (expression), which involves the biogenesis of guide RNA units (crRNA);
3. Target interference, where these effector complexes vigilantly scan for and degrade invading genetic material previously identified by, and integrated into, the CRISPR-Cas system.

The first stage, adaptation, leads to the insertion of new spacers in the CRISPR locus. In the second stage, expression, the system gets ready for action by expressing the cas genes and transcribing the CRISPR into a long precursor CRISPR RNA (pre-crRNA). The pre-crRNA is subsequently processed into mature crRNA by Cas proteins and accessory factors. In the third and last stage, interference, target nucleic acid is recognized and destroyed by the combined action of crRNA and Cas proteins. 13

Comment: The CRISPR-Cas system confers an advantage only if all three stages are fully developed and operate in conjunction, as a team. The stages individually provide no function, nor are the individual proteins known to be used in other biological systems from which they could have been co-opted. Co-option is a common argument brought forward by those who hold that parts integrated into a higher-order system could previously have been used elsewhere and then recruited into their present role.

13. Devashish Rath: The CRISPR-Cas immune system: Biology, mechanisms and applications October 2015

28. ScienceDaily: Structure of molecular machine that targets viral DNA for destruction determined August 7, 2014
