ElShamah Ministries: Defending the Christian Worldview and Creationism

Otangelo Grasso: This is my personal virtual library, where i collect information, which leads in my view to the Christian faith, creationism, and Intelligent Design as the best explanation of the origin of the physical Universe, life, and biodiversity


The various codes in the cell


Posted Thu Oct 22, 2015 12:07 pm





M.Barbieri: The genetic code is at the centre of life but it is not the only code that exists in living systems. Any organic code is a mapping between two independent worlds, and requires molecular structures that act like adaptors, i.e., that perform two independent recognition processes. The adaptors are required because there is no necessary connection between two independent worlds, and a set of rules is required in order to guarantee the specificity of the connection. The adaptors, in short, are essential in all organic codes. They are the molecular fingerprints of the codes, and their presence in a biological process is a sure sign that that process is based on a code. In splicing and in signal transduction, for example, I have shown (in 2003) that there are true adaptors at work, and that means that those processes are based on splicing codes and on signal transduction codes. 2

Code Biology: the study of all Codes of Life

Biological cells depend not only on physical matter but essentially on pre-programmed information and on at least twenty-eight informational code and language systems that host and store that information, in order to arrange and produce the complex cellular structures essential for life and to maintain the life-essential functions, that is, reproduction, metabolism, food uptake, intracellular organizational arrangement, growth and development, permanence, change, and adaptation. Living cells host multiple kinds of informational code systems which are used to store complex instructional/specifying information (CSI). All code systems, languages, information, and translation systems can be tracked back to an intelligent origin. Evolution is not a driving force to explain the origin of cells and their language systems and programmed information content. Nor does physicochemical attraction explain the arrangement of nucleotides, molecules and amino acids resulting in the formation of complex molecular machines and intracellular molecular production lines. The only alternative to intelligence is random self-assembly by unguided lucky events. Random chaotic events are, however, too unspecific to explain the extremely organized and controlled error-check and repair mechanisms and the factory-like production systems that cells host. Therefore, biological cells, cell code systems and the coded information (CSI) most probably have a mind as the causal origin.

1. Regulating, governing, controlling, recruiting, interpreting, recognizing, orchestrating, elaborating strategies, guiding, and instructing are all tasks of the gene regulatory network.
2. Where no intelligence is present, such activity can only be exercised if the correct actions were pre-programmed by intelligence.
3. Therefore, most probably, the gene regulatory network was programmed by an intelligent agency.

An outstanding implication of the existence of organic codes in Nature comes from the fact that any code involves meaning, and we therefore need to introduce in biology, with the standard methods of science, not only the concept of biological information but also that of biological meaning. The study of the organic codes, in conclusion, is bringing to light new mechanisms that operated in the history of life and new fundamental concepts. It is an entirely new field of research, the exploration of a vast and still largely unexplored dimension of the living world, the real new frontier of biology.

The irreducible interdependence of information generation and transmission systems
1. Codified information transmission system depends on: 
a) A language where symbols, letters, words, waves or frequency variations, sounds, pulses, or a combination of those are assigned to something else. Assigning meaning to characters through a code system requires a common agreement on meaning. Statistics, Semantics, Syntax, and Pragmatics are used according to combinatorial, context-dependent, and content-coherent rules.
b) Information encoded through that code,
c) An information storage system, 
d) An information transmission system, that is encoding, transmitting, and decoding.
e) Eventually translation ( the assignment of the meaning of one language to another )
f)  Eventually conversion ( digital-analog conversion, modulators, amplifiers)
g) Eventually transduction converting the nonelectrical signals into electrical signals
2. In living cells, information is encoded through at least 30 genetic and almost 30 epigenetic codes that form various sets of rules and languages. They are transmitted through a variety of means, that is, the cell cilia as the center of communication, microRNAs influencing cell function, the nervous system, synaptic transmission, neuromuscular transmission, transmission between nerves and body cells, axons as wires, the transmission of electrical impulses by nerves between the brain and receptor/target cells, vesicles, exosomes, platelets, hormones, biophotons, biomagnetism, cytokines and chemokines, elaborate communication channels related to the defense against microbe attacks, and nuclei as modulators-amplifiers. These information transmission systems are essential for maintaining all biological functions, that is, organismal growth and development, metabolism, regulating nutrition demands, controlling reproduction, homeostasis, constructing biological architecture, complexity, form, controlling organismal adaptation, change, regeneration/repair, and promoting survival.
3. The origin of such complex communication systems is best explained by an intelligent designer. Since no humans were involved in creating these complex computing systems, a suprahuman super-intelligent agency must have been the creator of the communication systems used in life. 

1.  The 31 Genetic Codes 
2.  The Acoustic codes
3.  The Adhesion Code
4.  The Apoptosis Code
5.  The Bioelectric Code
6.  The Biophoton Code
7.  The Calcium Code
8.  The Chaperone Code
9.  The Chromatin Code
10. The Circular motif ( ribosome) Code
11. The Cytoskeleton Code
12. The Coactivator/corepressor/epigenetic Code
13. The Code of human language
14. The compartment Code
15. The Hidden Code within the Genetic Code
16. The DNA methylation Code
17. The Differentiation Code
18. The Domain substrate specificity Code of Nonribosomal peptide synthetases (NRPS)
19. The Error correcting Code
20. The Genomic Code
21. The Genomic regulatory Code
22. The Glycomic Code
23. The Histone Code
24. The HOX Code
25. The immune response code, or language
26. The Lamin Code
27. The Metabolic Code
28. The Myelin Code
29. The Neuronal spike-rate Code
30. The Non-ribosomal Code
31. The Nucleosome Code
32. The Nuclear signalling Code
33. The Olfactory Code
34. The Operon Code
35. The Phosphorylation Code
36. The Post-translational modification Code for transcription factors
37. The RNA Code
38. The Ribosomal Code
39. The Riboswitch Code
40. The Splicing Codes
41. The Signal transduction Code
42. The Signal Integration Codes
43. The Sugar Code
44. The Synaptic Adhesive Code
45. The Talin Code
46. The Transcription factor Code
47. The Transcriptional cis-regulatory Code
48. The Tubulin Code
49. The Ubiquitin Code


The transcription factor code: defining the role of a developmental transcription factor in the adult brain.
For the human brain to develop and function correctly, each of its 100 billion neurons must follow a specific and pre-programmed code of gene expression. This code is driven by key transcription factors that regulate the expression of numerous proteins, moulding the neurons identity to create its unique shape and electrical behaviour.

Unraveling a novel transcription factor code determining the human arterial-specific endothelial cell signature
Our pioneering profiling study on freshly isolated ECs unveiled a combinatorial transcriptional code that induced an arterial fingerprint more proficiently than the current gold standard, HEY2, and this code conveyed an in vivo arterial-like behavior upon venous ECs.

The transcriptional regulatory code of eukaryotic cells--insights from genome-wide analysis of chromatin organization and transcription factor binding.
The term 'transcriptional regulatory code' has been used to describe the interplay of these events in the complex control of transcription. With the maturation of methods for detecting in vivo protein-DNA interactions on a genome-wide scale, detailed maps of chromatin features and transcription factor localization over entire genomes of eukaryotic cells are enriching our understanding of the properties and nature of this transcriptional regulatory code.

The Splicing code
Origin and evolution of spliceosomal introns

The RNA-binding protein binding code
A compendium of RNA-binding motifs for decoding gene regulation

microRNA binding code
The code within the code: microRNAs target coding regions

The Glycan or Sugar Code
Biological information transfer beyond the genetic code: the sugar code

The non-ribosomal code
A allowed the identification of amino acid residues that play a decisive role in the coordination of the substrate and have led to the concept of the so-called nonribosomal code, which allows the prediction of A-domain selectivity on the basis of its primary sequence

Coded information can always be tracked back to an intelligence, which has to set up the convention of meaning of the code and the information carrier, which can be a book, the hardware of a computer, or the smoke of a fire with which one Indian tribe signals to another. All communication systems have an encoder which produces a message which is processed by a decoder. In the cell there are several code systems. DNA is the best known; it stores coded information through the four nucleic acid bases. But there are several others, less well known. Recently there was some hype about a second DNA code. In fact, it is essential for the expression of genes. The cell uses several formal communication systems according to Shannon's model because they encode and decode messages using a system of symbols. As Hubert Yockey wrote:

“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey,  Information Theory, Evolution, and the Origin of Life,  Cambridge University Press, 2005).

An organism's DNA encodes all of the RNA and protein molecules required to construct its cells. Yet a complete description of the DNA sequence of an organism, be it the few million nucleotides of a bacterium or the few billion nucleotides of a human, no more enables us to reconstruct the organism than a list of English words enables us to reconstruct a play by Shakespeare. In both cases, the problem is to know how the elements in the DNA sequence or the words on the list are used. Under what conditions is each gene product made, and, once made, what does it do? The different cell types in a multicellular organism differ dramatically in both structure and function. If we compare a mammalian neuron with a liver cell, for example, the differences are so extreme that it is difficult to imagine that the two cells contain the same genome. The genome of an organism contains the instructions to make all the different cells, and the expression of either a neuron or a liver cell can be regulated at many of the steps in the pathway from DNA to RNA to protein. The most important, in my view, is the control of transcription by sequence-specific DNA-binding proteins, called transcription factors or regulators. These proteins recognize specific sequences of DNA (typically 5–10 nucleotide pairs in length) that are often called cis-regulatory sequences. Transcription regulators bind to these sequences, which are dispersed throughout genomes, and this binding puts into motion a series of reactions that ultimately specify which genes are to be transcribed and at what rate. Approximately 10% of the protein-coding genes of most organisms are devoted to transcription regulators. Transcription regulators must recognize short, specific cis-regulatory sequences within the double-helical structure of DNA.
The outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves. The 20 or so contacts that are typically formed at the protein–DNA interface add together to ensure that the interaction is both highly specific and very strong.

These instructions are written in a language that is often called the 'gene regulatory code'. The preference for a given nucleotide at a specific position is mainly determined by physical interactions between the amino acid side chains of the TF (transcription factor) and the accessible edges of the base pairs that are contacted. It is possible that some complex code, comprising rules from each of the different layers, contributes to TF-DNA binding; however, determining the precise rules of TF binding to the genome will require further scientific research. So genomes contain both a genetic code specifying amino acids and a parallel regulatory code specifying recognition sequences for transcription factors. The two codes have been assumed to operate independently of one another and to be segregated physically into the coding and non-coding genomic compartments, although the potential for some coding exons to accommodate transcriptional enhancers or splicing signals has long been recognized. Stergachis et al. found that ~15% of human codons are dual-use codons ('duons') that simultaneously specify both amino acids and TF recognition sites.
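To make the idea of a position-specific nucleotide preference concrete, here is a minimal sketch of how a short recognition sequence can be located with a position weight matrix. The count matrix is hypothetical (invented numbers for an imaginary four-base motif, not data from any real transcription factor), and the equal background base frequency and the pseudocount are assumptions chosen only to keep the example small.

```python
import math

# Hypothetical per-position base counts for an imaginary 4-bp motif.
# The numbers are invented for illustration; they are not measured data.
COUNTS = {
    "A": [8, 1, 0, 9],
    "C": [1, 0, 0, 0],
    "G": [0, 9, 1, 1],
    "T": [1, 0, 9, 0],
}
BACKGROUND = 0.25   # assumption: all four bases equally likely
PSEUDOCOUNT = 1.0   # assumption: avoids log(0) for unseen bases


def log_odds_matrix(counts):
    """Convert per-position base counts into log2 odds scores."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] + PSEUDOCOUNT for b in "ACGT")
        matrix.append({
            b: math.log2(((counts[b][i] + PSEUDOCOUNT) / total) / BACKGROUND)
            for b in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, window, score)."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_site(sequence):
    """Return the highest-scoring window in an uppercase ACGT sequence."""
    return max(scan(sequence, log_odds_matrix(COUNTS)), key=lambda h: h[2])
```

Calling `best_site("CCAGTACC")` picks out the window `AGTA` at offset 2, the only window in which every position carries the base that the hypothetical matrix prefers. Real binding depends on more than such a matrix, which is the point of the remark above that the precise rules still require further research.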

In order for communication to happen, two things are required: 1. the sequence of DNA bases located in the regulatory region of the gene, and 2. transcription factors that read the code. If either is missing, communication fails, the gene that has to be expressed cannot be found, and the whole procedure of gene expression fails. This is an irreducibly complex system. The gene regulatory code could not arise in a stepwise manner either, since the code has the right significance only if fully developed. That's an example par excellence of intelligent design. The fact that these transcription factor binding sequences overlap protein-coding sequences suggests that both sequences were designed together, in order to optimize the efficiency of the DNA code. As we learn more and more about DNA structure and function, it is apparent that the code was not just cobbled together by the trial-and-error method of natural selection, but that it was specifically designed to provide optimal efficiency and function.

Stephen Meyer puts it this way in his excellent book Darwin's Doubt, p. 270:


Keep in mind, too, that animal forms have more than just genetic information. They also need tightly integrated networks of genes, proteins, and other molecules to regulate their development; in other words, they require developmental gene regulatory networks, the dGRNs. Developing animals face two main challenges. First, they must produce different types of proteins and cells and, second, they must get those proteins and cells to the right place at the right time. Davidson has shown that embryos accomplish this task by relying on networks of regulatory DNA-binding proteins (called transcription factors) and their physical targets. These physical targets are typically sections of DNA (genes) that produce other proteins or RNA molecules, which in turn regulate the expression of still other genes.

These interdependent networks of genes and gene products present a striking appearance of design. Davidson's graphical depictions of these dGRNs look for all the world like wiring diagrams in an electrical engineering blueprint or a schematic of an integrated circuit, an uncanny resemblance Davidson himself has often noted. "What emerges, from the analysis of animal dGRNs," he muses, "is almost astounding: a network of logic interactions programmed into the DNA sequence that amounts  essentially to a hardwired biological computational device." These molecules collectively form a tightly integrated network of signaling molecules that function as an integrated circuit. Integrated circuits in electronics are systems of individually functional components such as transistors, resistors, and capacitors that are connected together to perform an overarching function. Likewise, the functional components of dGRNs—the DNA-binding proteins, their DNA target sequences, and the other molecules that the binding proteins and target molecules produce and regulate—also form an integrated circuit, one that contributes to accomplishing the overall function of producing an adult animal form. 

Davidson himself has made clear that the tight functional constraints under which these systems of molecules (the dGRNs) operate preclude their gradual alteration by the mutation and selection mechanism. For this reason, neo-Darwinism has failed to explain the origin of these systems of molecules and their functional integration. Like advocates of evolutionary developmental biology, Davidson himself favors a model of evolutionary change that envisions mutations generating large-scale developmental effects, thus perhaps bypassing nonfunctional intermediate circuits or systems. Nevertheless, neither proponents of "evo-devo," nor proponents of other recently proposed materialistic theories of evolution, have identified a mutational mechanism capable of generating a dGRN or anything even remotely resembling a complex integrated circuit. Yet, in our experience, complex integrated circuits—and the functional integration of parts in complex systems generally—are known to be produced by intelligent agents—specifically, by engineers. Moreover, intelligence is the only known cause of such effects. Since developing animals employ a form of integrated circuitry, and certainly one manifesting a tightly and functionally integrated system of parts and subsystems, and since intelligence is the only known cause of these features, the necessary presence of these features in developing Cambrian animals would seem to indicate that intelligent agency played a role in their origin 

The Calcium Code
Steady-state stomatal closure could be restored if calcium oscillations similar to wild type were imposed; thus, the cells have an intact downstream signaling pathway, but cannot initiate the proper calcium oscillation code to trigger the pathway.

Defined changes of cytosolic Ca2+ concentration are triggered by cellular second messengers, such as NAADP, IP3, IP6, sphingosine-1-phosphate, and cADPR, and it is evident that the identity and intensity of a specific stimulus impulse result in stimulus-specific and dynamic alterations of cytosolic Ca2+ concentration. This heterogeneity of increases in cytosolic-free Ca2+ ion concentration in terms of duration, amplitude, frequency, and spatial distribution led A.M. Hetherington and coworkers to formulate the concept of "Ca2+ signatures". Signal information would be encoded by a specific Ca2+ signature that is defined by precise control of spatial, temporal, and concentration parameters of alterations in cytosolic Ca2+ concentration.

The RNA code

In 2004, oncologist Gideon Rechavi at Tel Aviv University in Israel and his colleagues compared all the human genomic DNA sequences then available with their corresponding messenger RNAs — the molecules that carry the information needed to make a protein from a gene.

They were looking for signs that one of the nucleotide building blocks in the RNA sequence, called adenosine (A), had changed to another building block called inosine (I). This 'A-to-I editing' can alter a protein's coding sequence, and, in humans, is crucial for keeping the innate immune response in check. "It sounds simple, but in real life it was really complicated," Rechavi recalls. "Several groups had tried it before and failed" because sequencing mistakes and single-nucleotide mutations had made the data noisy. But using a new bioinformatics approach, his team uncovered thousands of sites in the transcriptome (the complete set of mRNAs found in an organism or cell population), and later studies upped the number into the millions.
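The comparison described above can be sketched in a few lines. Inosine is read as guanosine when RNA is reverse-transcribed and sequenced, so an edited site shows up as a genomic A paired with a G in the cDNA. The function name and the toy sequences below are hypothetical, and the sketch assumes the two sequences are already aligned base for base; it is not the bioinformatics approach Rechavi's team used.

```python
def candidate_a_to_i_sites(genomic, cdna):
    """Return 0-based positions where a genomic A is read as G in the cDNA.

    Assumes both sequences are already aligned and of equal length.
    Every hit is only a candidate: sequencing errors and single-nucleotide
    variants produce the same A-to-G mismatch.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
        if g == "A" and c == "G"
    ]
```

For the toy pair `"ATGACCA"` and `"ATGGCTA"` the function reports position 3 only; the C-to-T mismatch at position 5 is ignored because it cannot come from A-to-I editing. Telling true edits apart from noise is the part the quoted passage calls "really complicated".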

Inosine is something of a special case: researchers can readily detect this chink in the armour by comparing DNA and RNA sequences. But at least one-quarter of our mRNAs harbour chemical tags — decorations to the A, C, G and U nucleotides — that are invisible to today's sequencing technologies. (Similar chemical tags, called epigenetic markers, are also found on DNA.) Researchers aren't sure what these chemical changes in RNA do, but they're trying to find out.

A wave of studies over the past five years — many of which focus on a specific RNA mark called N6-methyladenosine (m6A) — have mapped these alterations across transcriptomes and demonstrated their importance to health and disease. But the problem is vast: these marks coat not only mRNA but other RNA transcripts as well, and they cut across all the domains of life and beyond, marking even viruses with their presence.

The modifications themselves are not new. What has given them meaning and driven epitranscriptomics into the spotlight is the discovery of enzymes that can add, remove and interpret them. In 2010, chemical biologist Chuan He at the University of Chicago, Illinois, proposed that these chemical tags could be reversible and important regulators of gene expression. Not long afterwards, his group demonstrated the first eraser of these marks on mRNA, an enzyme called FTO. That discovery meant that m6A wasn't just a passive mark; cells actively controlled it. And this realization came at about the same time that global approaches, harnessing the power of next-generation sequencing, made it possible to map m6A and other modifications across the transcriptome.

David Coppedge In Life, Not One Code but Many May 19, 2022

Distinct responses to rare codons in select Drosophila tissues

1. http://www.garlandscience.com/res/pdf/9780815341291_ch08.pdf
2. Barbieri: Organic codes

Last edited by Otangelo on Fri Sep 23, 2022 2:02 pm; edited 66 times in total




The Hidden Codes That Shape Protein Evolution 1

Despite redundancy in the genetic code (1), the choice of codons used is highly biased in some proteins, suggesting that additional constraints operate in certain protein-coding regions of the genome. This suggests that the preference for particular codons, and therefore amino acids in specific regions of the protein, is often determined by factors unrelated to protein structure or function (2, 3).  Stergachis et al. (4) reveal that transcription factors bind within protein-coding regions (in addition to nearby noncoding regions) in a large number of human genes. Thus, a transcription factor “binding code” may influence codon choice and, consequently, protein evolution. This “binding” code joins other “regulatory” codes that govern chromatin organization (3), enhancers (5, 6), mRNA structure (7), mRNA splicing (3), microRNA target sites (6, 8), translational efficiency (9), and cotranslational folding (10), all of which have been proposed to constrain codon choice, and thus protein evolution (see the figure).


Constraining codes. Regulatory elements within protein-coding regions (such as transcription factor binding) can influence codon choice and amino acid preference that are independent of protein structure or function
Redundancy in the genetic code might facilitate the existence of multiple overlapping regulatory codes within protein-coding regions of the genome.

How widespread is the phenomenon of “regulatory” codes that overlap the genetic code, and how do they constrain the evolution of protein sequences? Stergachis et al. address these questions for the transcription factor–binding regulatory code. They use deoxyribonuclease I (DNase I) footprinting to map transcription factor occupancy (a protein bound to DNA can protect that region from enzymatic cleavage) at nucleotide resolution across the human genome in 81 diverse cell types. The authors determined that ~14% of the codons within 86.9% of human genes are occupied by transcription factors. Such regions, called “duons,” therefore encode two types of information: one that is interpreted by the genetic code to make proteins and the other, by the transcription factor–binding regulatory code to influence gene expression. This requirement for transcription factors to bind within protein-coding regions of the genome has led to a considerable bias in codon usage and choice of amino acids, in a manner that is constrained by the binding motif of each transcription factor.
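The kind of bookkeeping behind a figure like "~14% of codons are occupied" can be illustrated with a small interval-overlap sketch. The function names and coordinates are hypothetical, intervals are assumed to be half-open `(start, end)` pairs on one chromosome, and expanding them into sets is a simplification that only suits toy input, not genome-scale data.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def coding_fraction_in_footprints(coding_exons, footprints):
    """Fraction of coding bases that lie inside at least one footprint."""
    coding = covered_positions(coding_exons)
    if not coding:
        return 0.0
    occupied = coding & covered_positions(footprints)
    return len(occupied) / len(coding)
```

With one 100-base exon `[(0, 100)]` and footprints `[(10, 20), (15, 24), (120, 130)]`, the two overlapping footprints cover positions 10 to 23, the third lies outside the exon, and the function returns 0.14. The toy numbers were picked to echo the reported proportion, nothing more.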

To investigate whether single-nucleotide variants within duons affect transcription factor binding, Stergachis et al. mapped the known variants that are associated with a disease or a trait onto duons. Of those, 17.4% quantitatively skew the allelic origins of DNA fragments protected from cleavage by DNase I in human cells, suggesting that such single-nucleotide variants affect transcription factor occupancy. They also determined that such variants are not biased toward whether they result in synonymous or nonsynonymous changes in the protein sequence. Intriguingly, a large fraction of the variants that result in a nonsynonymous change are predicted not to alter protein function. This indicates that some variants within duons might primarily affect transcription factor binding instead. This supports the emerging idea that single-nucleotide variants within protein-coding regions can lead to disease without affecting protein structure or function (11, 12). Thus, the whole spectrum of “regulatory” codes within protein-coding regions should be considered when assessing the impact of single-nucleotide variants and interpreting disease mutation data from exome sequencing (only the protein-coding regions of the genome) and cancer genome studies.
Do the regulatory codes harmoniously coexist? Evidence is emerging that there can be conflicts. For example, in the fruit fly Drosophila melanogaster, there is a striking decrease in the use of codons that are optimal for translation, but a rise in codons that enhance RNA splicing, toward the end of exons (13). This may indicate that the requirement for accurate RNA splicing has superseded that for optimal translation. Likewise, Stergachis et al. observed that the binding motifs of transcription factors within protein-coding genomic regions are selectively devoid of sequences that contain a stop codon.
What features might permit synergistic coexistence of the regulatory and genetic codes? One major constraint of protein-coding genes is the requirement for the encoded polypeptide segment to fold into a defined tertiary structure. It is possible that in regions where folding constraints are not present, such as in intrinsically disordered regions (14), there might be increased tolerance for protein-coding genomic regions to harbor more regulatory elements that can be interpreted by different regulatory codes.
Stergachis et al. make a number of important genome-scale observations, but several mechanistic questions remain to be answered. For instance, although the authors report a weak tendency for transcription factors to preferentially bind to the protein-coding regions of highly expressed genes, it is unclear how the binding of a transcription factor within protein-coding regions mechanistically influences the expression of a gene. Perhaps this type of binding might result in alternative promoters with different transcriptional start sites or affect the expression of neighboring genes (by acting as a distal enhancer element, for example). It is also unclear whether binding of a transcription factor within a protein-coding region may not directly affect gene expression but instead determine the formation and maintenance of higher-order chromatin structure.
Future research will need to determine the number of overlapping codes that can be tolerated by the genetic code. There is also the question of possible trade-offs, in terms of maintaining regulation and functionality, that have been made to accommodate coexistence of codes and whether this can lead to nonoptimal or deleterious consequences. For instance, protein-coding regions that cannot tolerate mutations due to multiple overlapping codes may be exploited by pathogens during host infection. The investigation of overlapping codes opens new vistas on the functional interpretation of variation in coding regions and makes it clear that the story of the genetic code has not yet run its course.

1) http://www.sciencemag.org/content/342/6164/1325

Professor Moran's opinion on this : http://sandwalk.blogspot.com.br/2014/01/press-release-hyperbole-and-duon.html

Last edited by Admin on Tue Nov 24, 2015 1:36 pm; edited 1 time in total




Exonic transcription factor binding directs codon choice and impacts protein evolution 1

The genetic code, common to all organisms, contains extensive redundancy, wherein most amino acids can be specified by 2–6 synonymous codons. The observed ratios of synonymous codons are highly non-random, and codon usage biases are fixtures of both prokaryotic and eukaryotic genomes (1). In organisms with short life spans and large effective population sizes, codon biases have been linked to translation efficiency and mRNA stability (2–7). However, these mechanisms explain only a small fraction of observed codon preferences in mammalian genomes (7–11), which appear to be under selection (12).
Genomes also contain a parallel regulatory code specifying recognition sequences for transcription factors (TFs) (13), and the genetic and regulatory codes have been assumed to operate independently of one another, and to be segregated physically into the coding and non-coding genomic compartments. However, the potential for some coding exons to accommodate transcriptional enhancers or splicing signals has long been recognized (14–18).
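As a small illustration of what "ratios of synonymous codons" means, the sketch below builds the standard genetic code table and reports, for each amino acid in a coding sequence, the share taken by each synonymous codon. The function name and the example sequence are hypothetical; the table is the standard code written in the usual T, C, A, G order.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code; codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_fractions(cds):
    """For each amino acid, return the share of each synonymous codon used."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = defaultdict(dict)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]][codon] = n
    return {
        aa: {codon: n / sum(usage.values()) for codon, n in usage.items()}
        for aa, usage in per_amino_acid.items()
    }
```

For the made-up sequence `"CTGCTGCTGCTAAACAAT"` this gives leucine as 75% CTG and 25% CTA, and asparagine as an even split between AAC and AAT. A skew like the leucine one, repeated across many genes, is what is meant by codon usage bias.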
To define intersections between the regulatory and genetic codes, we generated nucleotide-resolution maps of transcription factor occupancy in 81 diverse human cell types using genomic DNaseI footprinting (19). Collectively, we defined 11,598,043 distinct 6–40bp footprints genome-wide (~1,018,514 per cell-type), 216,304 of which localized completely within protein-coding exons (~24,842 per cell-type) (Fig. 1A–B, S1A, Table S1). ~14% of all human coding bases contact a TF in at least one cell type (avg. 1.1% per cell type; Figs. 1C, S1B) and 86.9% of genes contained coding TF footprints (avg. 33% per cell type) (Figs. S1C–D).

Figure 1

TFs densely populate and evolutionarily constrain protein-coding exons
(A) Distribution of DNaseI footprints. (B) Per-nucleotide DNaseI cleavage and ChIP-seq signal for coding CTCF (left) and NRSF (right) binding elements. (C) Proportion of coding bases within DNaseI footprints in each of 81 cell types (left), or any cell type (right). (D) Average footprint density within first, internal, or final coding exons (mean +/− SEM; p-value, paired t-test, n.s.: p-value> 0.1). (E) PhyloP conservation at 4FDBs within and outside footprints. (F) Estimated mutational age at all (grey), synonymous (brown) and nonsynonymous (red) coding SNVs (European) within and outside footprints (p-values per (21)) (G) Structure of DNA-bound KLF4 vs. average per-nucleotide DNaseI cleavage and evolutionary constraint at KLF4 footprints. (H) Average per-nucleotide conservation at 4FDBs (brown) and NDBs (red) overlapping KLF4 (left) and NFIC (right) footprints. (r = Pearson correlation, conservation at promoter bases vs. 4FDBs (top) or NDBs (bottom)). (I)Evolutionary constraint imparted by 63 TFs at promoter elements, 4FDBs and NDBs (Pearson correlations).

The exonic TF footprints we observed likely underestimate the true fraction of protein-coding bases that contact TFs since (i) TF footprint detection increases substantially with sequencing depth (13), and (ii) the 81 cell types sampled, though extensive, is far from complete, as we saw little evidence of saturation of coding TF footprint discovery (Fig. S2).

Figure 2
Transcription factors modulate global codon biases
(A) Proportions of all codons (grey), or codons outside of (yellow), or within (purple) footprints, that encode asparagine (top) or leucine (bottom). Note that codons with bias (AAC for asparagine and CTG for leucine) preferentially localize within footprints. (B) Preferential footprinting of biased codons, calculated as in (A) (p-values, Pearson's chi-squared test). (C) Preferential footprinting of each codon trinucleotide in coding vs non-coding regions (C = coding, NC = non-coding). (D) Difference in average evolutionary constraint at 3rd positions of biased codons outside vs. within footprints (p-values, Mann-Whitney test). (E) Proportions of amino acids encoded by CpG-containing codons among all codons (grey), codons outside footprints (yellow), or codons within footprints (purple)

To ascertain coding footprints more completely, we developed an approach for targeted exonic footprinting via solution-phase capture of DNaseI-seq libraries using RNA probes complementary to human exons (19). Targeted capture footprinting of exons from abdominal skin and mammary stromal fibroblasts yielded ~10-fold increases in DNaseI cleavage, equivalent to sequencing >4 billion reads per sample using conventional genomic footprinting (Fig. S3A), quantitatively exposing many additional TF footprints (Fig. S3B–D). Overall, we identified an average of ~175,000 coding footprints per cell type (Fig. S1E), 7-12-fold more than conventional footprinting.

Figure 3
TFs exploit and avoid specific coding features

(A) Percentage of TF motifs occupied in coding vs. non-coding regions (p-values, paired t-test). (B) Density of NFYA (left), AP2 (middle) and SP1 (right) footprints relative to translated region of first coding exons. (C) (top) Density of YY1 footprints across first coding exons. (bottom) YY1 recognition sequence and corresponding amino acid sequence within YY1 footprints overlapping start codons. (D) (top left and bottom) For NRSF as per (C). (right, arrow) Protein domain annotation of first exon third-frame NRSF footprints vs. SP1 footprints. (E) TF preference (avoidance) of stop codon trinucleotides within vs. outside footprints in non-coding regions (p-values, Pearson's chi-squared test).

While coding sequences are densely occupied by TFs in vivo, the density of TF footprints at different genic positions varied widely, with many genes exhibiting sharply increased density in the translated portion of their first coding exon (Figs. 1D, S4A). By contrast, internal coding exons were as likely as flanking intronic sequences to harbor TF footprints (Fig.1D). The total number of coding DNaseI footprints within a gene was related both to the length of the gene, and to its expression level (Fig. S4B–D).

Figure 4
Genetic variation in duons frequently alters TF occupancy

(A) Proportion of coding footprints overlapping a SNV in any of 81 cell-types. (B) Proportion of SNVs in duons that allelically alter TF occupancy. (C) (top) Per-nucleotide DNaseI cleavage at common nonsynonymous G→A SNV (rs8110393) in G/G and A/A homozygous cells. (bottom) Allelic SP1 occupancy in heterozygous (G/A) cells. (D) Proportion of synonymous and nonsynonymous variants in duons that allelically alter TF occupancy. (E–F) Proportion of nonsynonymous variants from (D) grouped by predicted impact of coding variant on protein function using (E) SIFT or (F) Polyphen-2. Note that none of the bins are significantly different (Fisher's exact test; n.s. indicates p-value > 0.1).

Given their abundance, we sought to determine whether exonic TF binding elements were under evolutionary selection. 4-fold degenerate coding bases are frequently used as a model of neutral (or nearly neutral) evolution (20), but may exhibit constraint when a functional signal impinges on coding sequence (11). Across the coding compartment, 4-fold degenerate bases (4FDBs) within TF footprints show significantly greater evolutionary constraint vs. non-footprinted 4FDBs (Figs. 1E, S5A–B), indicating that TF-DNA recognition constrains the third codon position.
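As an illustration of what a 4-fold degenerate base is, here is a minimal Python sketch that derives the 4-fold degenerate codon families from the standard genetic code and locates their third positions in an in-frame coding sequence. The function names and the example sequence are hypothetical and are not taken from the paper; the paper's own pipeline is not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes():
    """Two-base prefixes for which any third base encodes the same amino acid."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return sorted(
        p for p in prefixes
        if len({CODON_TABLE[p + b] for b in BASES}) == 1
    )


def fourfold_positions(cds):
    """0-based indices of 4-fold degenerate third positions in an in-frame CDS."""
    prefixes = set(fourfold_prefixes())
    return [
        i + 2
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 2] in prefixes
    ]


if __name__ == "__main__":
    print(fourfold_prefixes())
    print(fourfold_positions("ATGGCTAAC"))  # hypothetical sequence: Met-Ala-Asn
```

A mutation at any of the positions returned by `fourfold_positions` leaves the encoded amino acid unchanged, which is why such bases are commonly treated as a near-neutral baseline.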
To test for evolutionary constraint at coding footprints in modern human populations, we quantified the age of mutations arising within or outside of coding footprints using exome sequencing data from 4,298 individuals of European ancestry (Fig. S5C) and 2,217 individuals of African American ancestry (Fig. S5D) (21). This analysis revealed that mutations within coding footprints were on average 10.2% younger than those outside of footprints (Figs. 1F, S5E), signaling influence of coding TF elements on human fitness.
Strikingly, both synonymous and nonsynonymous mutations within coding footprints were significantly younger than those outside of footprints (Figs. 1F, S5E), indicating that coding TF binding constrains both codon and amino acid evolution. The genome-wide recognition sequence landscape of each TF has evolved to fit the molecular topography of its protein-DNA binding interface (13) (Fig. 1G). To study how specific TFs influence codon and amino acid choice at their recognition sites, we compared the per-nucleotide evolutionary conservation profiles of TF recognition sequences at non-coding, 4FDBs and non-degenerate coding bases (NDBs). For example, the conservation profiles at 4FDBs and NDBs at KLF4 and NFIC recognition sites closely mirror those of recognition sites in non-coding regions (promoter; Fig. 1H). As such, these TFs constrain both codon choice (via constraint on 4FDBs), and amino acid choice (via NDBs) encoded at their recognition sites. Analysis of conservation profiles for 63 TFs with prevalent occupancy within coding regions (19) showed that 73% constrain 4FDBs, and 51% constrain NDBs (Figs. 1I, S6, S7). Thus, individual TFs may influence both codon and amino acid choice.
To examine how TF binding relates to codon usage patterns, we examined TF binding at preferred (biased) vs. non-preferred codons. For example, across all human proteins Asparagine is encoded by the AAC codon 52% of the time (vs. AAT, 48%), indicating a generalized 4% bias in favor of this codon. However, genome-wide, 60.4% of Asn codons within footprints are AAC, vs. only 50.8% outside of footprints (i.e., a 9.6% occupancy bias towards the preferred codon) (Fig. 2A). Strikingly, apart from Arginine (see below), for all amino acids encoded by two or more codons, the codon that is preferentially utilized genome-wide is also preferentially occupied by TFs (Fig. 2B, Table S2).
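The arithmetic behind the asparagine example can be sketched in a few lines of Python. The codon counts below are hypothetical, chosen only so that the proportions equal the percentages quoted above; they are not the paper's data.

```python
def codon_share(counts, codon):
    """Fraction of an amino acid's codons that are `codon`."""
    return counts[codon] / sum(counts.values())


# Hypothetical counts per 1,000 Asn codons, matching the quoted percentages.
ASN_ALL = {"AAC": 520, "AAT": 480}      # all codons: 52% vs. 48%
ASN_INSIDE = {"AAC": 604, "AAT": 396}   # within footprints: 60.4% AAC
ASN_OUTSIDE = {"AAC": 508, "AAT": 492}  # outside footprints: 50.8% AAC

# Generalized bias: share of the preferred codon minus share of the alternative.
genome_wide_bias = codon_share(ASN_ALL, "AAC") - codon_share(ASN_ALL, "AAT")

# Occupancy bias: preferred-codon share inside footprints minus share outside.
occupancy_bias = codon_share(ASN_INSIDE, "AAC") - codon_share(ASN_OUTSIDE, "AAC")

if __name__ == "__main__":
    print(f"genome-wide bias toward AAC: {genome_wide_bias:.1%}")
    print(f"occupancy bias toward AAC:   {occupancy_bias:.1%}")
```

The two differences correspond to the 4% generalized bias and the 9.6% occupancy bias given in the text.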
To determine whether preferential occupancy of biased codons is inherent to TF recognition sequences, we compared trinucleotide frequencies within coding vs. non-coding footprints. Trinucleotide combinations favored by TFs within coding sequence were equivalent to those favored in non-coding sequence (Fig. 2C), indicating that global TF binding preferences are directly reflected in the frequency of different codons. Notably, baseline trinucleotide frequencies within coding and non-coding sequence are largely independent of one another (Table S2). The fact that the third position of preferred codons overlapping footprints is under excess evolutionary constraint (Fig. 2D, Table S2) supports a general role for TFs in potentiating codon usage biases through the selective preservation of preferred codons.
While nearly all codon biases parallel TF recognition preferences genome-wide, Arginine, one of the 5 amino acids encoded by codons containing CpGs (4 out of 6 codons), was a notable exception. CpGs frequently occur in regulatory DNA (Table S2), yet have an elevated mutational rate (22). Consequently, although TFs may favor CpG-containing codons (Fig. 2E), and impart excess constraint thereto (Table S2), the higher mutational rate at such codons is likely incompatible with preferential utilization.
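The codon counts in this paragraph can be checked directly against the standard genetic code. The following Python sketch (function and variable names are illustrative) lists the amino acids that have at least one CpG-containing codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_codons_by_amino_acid():
    """Map each amino acid (one-letter code) to its codons containing a CG dinucleotide."""
    result = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*" and "CG" in codon:
            result.setdefault(aa, []).append(codon)
    return result


if __name__ == "__main__":
    cpg = cpg_codons_by_amino_acid()
    for aa, codons in sorted(cpg.items()):
        total = sum(1 for a in CODON_TABLE.values() if a == aa)
        print(f"{aa}: {len(codons)} of {total} codons contain CpG: {sorted(codons)}")
```

This yields five amino acids (Ala, Pro, Arg, Ser, Thr), with arginine alone having four of its six codons contain a CpG, consistent with the figures stated above.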
We note that codons outside footprints still exhibit usage biases (Fig. 2A and Table S2); however, it is likely that these biases also reflect the actions of TFs. Firstly, our conclusions above are drawn from a conservative and incomplete annotation of duons. Secondly, because TF trinucleotide preferences and codon biases have not changed substantially since the divergence of humans and mice (Fig. S8), preferences at any given codon may result from a TF binding element extant in some ancestral species to human. Thirdly, codon usage bias can be exaggerated due to mutual reinforcement with other cellular factors such as tRNA abundances (23, 24). Indeed, such mechanisms could be linked to codon biases created by exonic TF occupancy through a feedback mechanism that potentiates intrinsic TF-imposed biases, resulting in both abundant and rare codons and associated tRNAs, differences in which could in turn affect protein synthesis and stability (25–27).
To analyze positional occupancy patterns of specific TFs within coding sequence, we systematically matched TF recognition sequences with footprints, providing an accurate measure of a TF's in vivo occupancy (13, 28). This analysis revealed that a subset of TFs selectively avoid coding sequences (Fig. 3A). Intriguingly, TFs involved in positioning the transcriptional pre-initiation complex, such as NFYA and SP1 (29), preferentially avoid the translated region of the first coding exon (Fig. 3A), and typically occupy elements immediately upstream of the methionine start codon (Figs. 3B, S9A). Conversely, TFs involved in modulating promoter activity, such as YY1 and NRSF, preferentially occupy the translated region of the first coding exon (30, 31) (Fig. 3A,C). These findings indicate that the translated portion of the first coding exon may serve functionally as an extension of the canonical promoter.
More broadly, the repressor NRSF preferentially occupies and evolutionarily constrains sequences coding for leucine-rich protein domains, such as signal peptide and transmembrane domains (Figs. 3D, S9B,C). Also, TFs such as CTCF and SREBP1 preferentially occupy and constrain splice sites (Fig. S10A–D), which are otherwise generally depleted of DNaseI footprints (Fig. S10E). The above results suggest that specific protein structural and splicing features may undergo exaptation for specific regulatory purposes.
We also found that the occupancy of specific TFs within coding sequence parallels the extent of CpG methylation at their binding site (Fig. S11). This raises the possibility that gene body methylation, which is paradoxically extensive at actively transcribed genes (32, 33), may provide a tunable mechanism for thwarting opportunistic TF occupancy within coding sequence during transcription.
If TFs, through selective recognition sequences, could impose changes in protein sequence, deleterious consequences could arise if such changes resulted in a nonsense substitution. We observed that TFs generally avoid stop codons (Fig. S10E). Surprisingly, this finding extends to non-coding regions, where stop codon trinucleotides (TAA, TAG and TGA) are selectively depleted within footprints. This indicates that the global TF repertoire has been selectively purged of DNA binding domains capable of recognizing, and thus preferentially stabilizing, nonsense codons (Fig. 3E and S10F).
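The kind of comparison described here, stop-codon trinucleotide frequency inside versus outside footprints, can be sketched as follows. This is a simplified illustration under stated assumptions: footprints are given as non-overlapping half-open `(start, end)` intervals on a single strand, and the example sequence is invented. It is not the paper's method, which also involves statistical testing across many cell types.

```python
STOP_TRINUCLEOTIDES = {"TAA", "TAG", "TGA"}


def count_stops(segment):
    """Return (hits, windows) over all overlapping trinucleotides in a segment."""
    segment = segment.upper()
    windows = max(len(segment) - 2, 0)
    hits = sum(segment[i:i + 3] in STOP_TRINUCLEOTIDES for i in range(windows))
    return hits, windows


def stop_fraction_inside_outside(sequence, footprints):
    """Fraction of trinucleotides that are TAA/TAG/TGA inside vs. outside footprints.

    Segments are counted separately so that no trinucleotide spans a
    footprint boundary.
    """
    inside, outside, cursor = [], [], 0
    for start, end in sorted(footprints):
        outside.append(sequence[cursor:start])
        inside.append(sequence[start:end])
        cursor = end
    outside.append(sequence[cursor:])

    def fraction(segments):
        hits = windows = 0
        for seg in segments:
            h, w = count_stops(seg)
            hits += h
            windows += w
        return hits / windows if windows else 0.0

    return fraction(inside), fraction(outside)


if __name__ == "__main__":
    seq = "GGGCGGGGCTAATAGTGA"   # invented example sequence
    footprints = [(0, 9)]        # invented footprint covering the GC-rich half
    print(stop_fraction_inside_outside(seq, footprints))
```

A lower fraction inside footprints than outside would correspond to the depletion the authors report.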
The high sequencing coverage provided by genomic footprinting revealed 592,867 heterozygous single nucleotide variants (SNVs) across the 81 cell type samples, and 3% of coding footprints harbored heterozygous SNVs (Fig. 4A). Functional SNVs that disrupt TF occupancy quantitatively skew the allelic origins of DNaseI cleavage fragments (13), and 17.4% of all heterozygous coding SNVs within footprints showed this signature (Figs. 4B, S12), including both synonymous and nonsynonymous variant classes (Fig. 4C). The potential of a coding SNV to disrupt overlying TF occupancy was independent of the class of variant (Fig. 4D), or whether a nonsynonymous variant was predicted to be deleterious to protein function (Fig. 4E–F).
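One common way to detect the allelic skew described above is an exact binomial test on the reads carrying each allele at a heterozygous site, with a null expectation of equal representation. The Python sketch below illustrates that idea only; it is an assumption for illustration, not the statistical procedure used in the paper (which follows its reference 13), and the read counts are invented.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    observing k successes in n trials under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


if __name__ == "__main__":
    # Invented example: 40 DNaseI cleavage fragments overlap a heterozygous
    # G/A site, 31 of them carrying the G allele.
    ref_reads, total_reads = 31, 40
    p_value = binomial_two_sided_p(ref_reads, total_reads)
    print(f"two-sided p-value for allelic imbalance: {p_value:.4g}")
```

A small p-value suggests that one allele is preferentially cleaved, which is the signature of a variant altering TF occupancy; in practice many sites are tested, so a multiple-testing correction would also be needed.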
Notably, 13.5% of common disease- and trait-associated SNVs identified by genome-wide association studies (GWAS) (19) fall within duons (Fig. S13A). GWAS SNPs in duons encompass both synonymous (12%) and nonsynonymous (88%) substitutions (Fig. S13A), and may directly affect pathogenetic mechanisms (Fig. S13B–F, Table S3). As such, disease-associated variants within duons may compromise both regulatory and/or protein-structural functions. These findings have substantial practical implications for the interpretation of genetic variation in coding regions.
In summary, our results indicate that simultaneous encoding of amino acid and regulatory information within exons is a major functional feature of complex genomes. The information architecture of the received genetic code is optimized for superimposition of additional information (34, 35), and this intrinsic flexibility has been extensively exploited by natural selection. While TF binding within exons may serve multiple functional roles, we note that our analyses above are agnostic to these roles, which may be complex (36).

1) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3967546/

Laurence Moran on the paper: http://sandwalk.blogspot.com.br/2014/01/the-duon-delusion-and-why-transcription.html


4The various codes in the cell Empty Re: The various codes in the cell Tue Nov 24, 2015 2:25 pm



The transcription factor code :

The transcription factor code: defining the role of a developmental transcription factor in the adult brain.
For the human brain to develop and function correctly, each of its 100 billion neurons must follow a specific and pre-programmed code of gene expression. This code is driven by key transcription factors that regulate the expression of numerous proteins, moulding the neuron's identity to create its unique shape and electrical behaviour.

Unraveling a novel transcription factor code determining the human arterial-specific endothelial cell signature
Our pioneering profiling study on freshly isolated ECs unveiled a combinatorial transcriptional code that induced an arterial fingerprint more proficiently than the current gold standard, HEY2, and this code conveyed an in vivo arterial-like behavior upon venous ECs.

The transcriptional regulatory code of eukaryotic cells--insights from genome-wide analysis of chromatin organization and transcription factor binding.
The term 'transcriptional regulatory code' has been used to describe the interplay of these events in the complex control of transcription. With the maturation of methods for detecting in vivo protein-DNA interactions on a genome-wide scale, detailed maps of chromatin features and transcription factor localization over entire genomes of eukaryotic cells are enriching our understanding of the properties and nature of this transcriptional regulatory code.

Human Genes Encoding Transcription Factors and Chromatin-Modifying Proteins Have Low Levels of Promoter Polymorphism: A Study of 1000 Genomes Project Data
Genome-wide analysis of histone modifications revealed that, like transcription factors, each chromatin-remodelling protein can affect transcriptional level of thousands of genes, thereby orchestrating gene activity according to intracellular conditions or external stimuli [30].
Thus, both classes of proteins are involved in the complicated process of transcriptional control, ensuring correct expression of specific genes. Both the so-called “transcription factor-binding regulatory code” and the “histone code” may be effectively used for prediction of gene expression activity. Moreover, these codes are redundant for predicting gene expression.

Last edited by Admin on Tue Nov 20, 2018 5:03 am; edited 1 time in total


5The various codes in the cell Empty The Splicing code Tue Nov 24, 2015 2:38 pm



The Splicing code


Breaking the second genetic code
The ‘splicing code’ is indeed breakable. One difficulty with understanding alternative pre-mRNA splicing is that the selection of particular exons in mature mRNAs is determined not only by intron sequences adjacent to the exon boundaries, but also by a multitude of other sequence elements present in both exons and introns. These auxiliary sequences are recognized by regulatory factors that assist or prevent the function of the spliceosome, the molecular machinery in charge of intron removal.

15% to 50% of human disease mutations affect splice site selection. Tissue-dependent splicing is regulated by trans-acting factors, cis-acting RNA sequence motifs, and other RNA features, such as exon length and secondary structure. For nearly two decades, researchers have sought to define a regulatory splicing code in the form of a set of RNA features that can account for abundances of spliced isoforms. Through detailed investigation of a small number of examples of regulated splicing, it is clear that a splicing code must account for various features that act together to control splicing. Furthermore, a code should enable the reliable prediction of the regulatory properties of previously uncharacterized exons and the effects of mutations within regulatory elements. Here we describe a method for inferring a splicing regulatory code that addresses these challenges.

Wang further observes that splicing "is a tightly regulated process, and a great number of human diseases are caused by the 'misregulation' of splicing in which the gene was not cut and pasted correctly." This implies that important protein products are produced by splicing, meaning that the splicing code plays an important functional role in cells.

After the gene is copied, the transcript is edited, splicing out the introns and gluing together the exons. Not only is it a fantastically complex process, it also adds tremendous versatility to how genes are used. A given gene may be spliced into alternate sets of exons, resulting in different protein machines. There are three genes, for example, that generate over 3,000 different spliced products to help control the neuron designs of the brain.

And how does the splicing machinery know where to cut and paste? There is an elaborate code that the splicing machinery uses to decide how to do its splicing. This splicing code is extremely complicated, using not only sequence patterns in the DNA transcript, but also the shape of the transcript, as well as other factors.

Recent analysis from the Encyclopedia of DNA Elements (ENCODE) project (1) indicates that most of the human genome is transcribed and consists of ~60,000 genes, of which ~20,000 are protein-coding genes. This number is surprisingly low given the proteomic complexity that is evident in many tissues, particularly the central nervous system (CNS). The human transcriptome is composed of a vast RNA population that undergoes further diversification by splicing. The spliceosome splices these 20,000 genes, and each gene is spliced into over 300 different protein species or variants.

This split gene architecture introduces a requirement for an intricate splicing regulatory network that consists of:

1. an array of RNA regulatory sequences,
2. RNA–protein complexes, and
3. splicing factors.

Aberrant splicing is a significant cause of pathology. Detecting specific splice sites in this large sequence pool is the responsibility of the major and minor spliceosomes in collaboration with numerous splicing factors. This complexity makes splicing susceptible to sequence polymorphisms and deleterious mutations.  Indeed, RNA mis-splicing underlies a growing number of human diseases with substantial societal consequences.

This means that the spliceosome and the splicing code are far more decisive than the genome alone in creating the blueprint necessary for the production of proteins in the human body. And if aberrant splicing causes pathology, then mutations are not beneficial but generate disease.

Do I have to explain further what that means?

1. http://sci-hub.tw/10.1016/j.biosystems.2017.11.002
2. http://sci-hub.tw/https://www.nature.com/articles/nrg.2015.3

A few notes on the 'species-specific' alternative splicing code

Last edited by Admin on Sun May 20, 2018 4:12 pm; edited 3 times in total


6The various codes in the cell Empty The rna binding protein binding code Tue Nov 24, 2015 4:28 pm



The RNA-binding protein binding code

A compendium of RNA-binding motifs for decoding gene regulation

The eukaryotic-wide RNA-binding protein specificity code
The RNA-binding proteins play substantial and diverse roles in the post-transcriptional regulation (PTR) of gene expression. For example, recent work estimates that 40-60% of the variability in human protein levels is controlled post-transcriptionally, suggesting that regulation by RBPs contributes as much to gene expression levels as transcription factors. In collaboration with the Hughes lab, we have recently biochemically measured RNA binding preferences for more than 200 RBPs and, because RBP RNA-specificity is highly conserved, we were able to infer motifs for nearly 5,000 more RBPs by homology. This work was recently published.


7The various codes in the cell Empty Re: The various codes in the cell Tue Nov 24, 2015 4:44 pm



microRNA binding code

The code within the code: microRNAs target coding regions
We report here an analysis of published proteomics experiments that further support a functional role for coding region microRNA binding sites.
Among possible genetic codes, the universal code has been shown to be nearly optimal for incorporating embedded information. Evidence thus far supports the conclusion that the coding regions of genes can contain additional information besides the amino acid sequence of the encoded protein, including functional microRNA binding sites.

microRNAs represent ~4% of the genes in the human genome.
This discovery suggests that the genome is far from being deciphered, and most importantly that miRNAs are likely to represent just the “tip of the iceberg” with many other small non-coding RNAs to be discovered.




Astonishing DNA complexity demolishes neo-Darwinism

Multiple Codes
A major outcome of the studies so far is that there are multiple information codes operating in living cells. The protein code is the simplest, and has been studied for half a century. But a number of other codes are now known, at least by inference.

Cell memory code.
DNA is a very long, thin molecule. If you unwound the DNA from just one human cell it would be about 2 metres long! To squash this into a tiny cell nucleus, the DNA is wound up in four separate layers of chromatin structure (as described earlier). The first level of this chromatin structure carries a ‘histone code’ that contains information about the cell’s history (i.e. it is a cell memory).8,9 The DNA is coiled twice around a group of 8 histone molecules, and a 9th histone pins this structure into place to form what is called a nucleosome. These nucleosomes can carry various chemical modifications that either allow, or prevent, the expression of the DNA wrapped around them. Every time a cell divides into two new cells, its DNA double-helix splits into two single strands, which then each produce a new double-strand. But nucleosomes are not duplicated like the DNA strands. Rather, they are distributed between either one or the other of the two new DNA double strands, and the empty spaces are filled by new nucleosomes. Cell division is therefore an opportunity for changes in the nucleosomal composition of a specific DNA region. Changes can also happen during the lifetime of a cell due to chemical reactions allowing inter-conversions between the different nucleosome types. The memory effect of these changes can be that a latent capacity that was dormant comes to life, or, conversely, a previously active capacity shuts down.

Differentiation code.
In humans, there are about 300 different cell types in our bodies that make up the different tissue types (nerves, blood, muscle, liver, spleen, eyes etc). All of these cells contain the same DNA, so how does each cell know how to become a nerve cell rather than a blood cell? The required information is written in code down the side of the DNA double-helix in the form of different molecules attached to the nucleotides that form the ‘rungs’ in the ‘ladder’ of the helix. This code silences developmental genes in embryonic stem cells, but preserves their potential to become activated during embryogenesis. The embryo itself is largely defined by its DNA sequence, but its subsequent development can be altered in response to lineage-specific transcriptional programs and environmental cues, and is epigenetically maintained.

Replication Code.
The replication code was discovered by addressing the question of how cells maintain their normal metabolic activity (which continually uses the DNA as source information) when it comes time for cell division. The key problem is that a large proportion of the whole genome is required for the normal operation of the cell—probably at least 50% in unspecialized body cells and up to 70–80% in complex liver and brain cells—and, of course, the whole genome is required during replication. This creates a huge logistic problem—how to avoid clashes between the transcription machinery (which needs to continually copy information for ongoing use in the cell) and the replication machinery (which needs to unzip the whole of the DNA double-helix and replicate a ‘zipped’ copy back onto each of the separated strands). The cell’s solution to this logistics nightmare is truly astonishing. Replication does not begin at any one point, but at thousands of different points. But of these thousands of potential start points, only a subset are used in any one cell cycle—different subsets are used at different times and places. A full understanding is yet to emerge because the system is so complex; however, some progress has been made:

The large set of potential replication start sites is not essential, but optional. In early embryogenesis, for example, before any transcription begins, the whole genome replicates numerous times without any reference to the special set of potential start sites.

The pattern of replication in the late embryo and adult is tissue-specific. This suggests that cells in a particular tissue cooperate by coordinating replication so that while part of the DNA in one cell is being replicated, the corresponding part in a neighbouring cell is being transcribed. Transcripts can thus be shared so that normal functions can be maintained throughout the tissue while different parts of the DNA are being replicated.

DNA that is transcribed early in the cell division cycle is also replicated in the early stage (but the transcription and replication machines are carefully kept apart). The early transcribed DNA is that which is needed most often in cell function. The correlation between transcription and replication in this early phase allows the cell to minimize the ‘downtime’ in transcription of the most urgent supplies while replication takes place.

There is a ‘pecking order’ of control. Preparation for replication may take place at thousands of different locations, but once replication does begin at a particular site, it suppresses replication at nearby sites so that only one copy of the DNA is made. If transcription happens to occur nearby, replication is suppressed until transcription is completed. This clearly demonstrates that keeping the cell alive and functioning properly takes precedence over cell division.

There is a built-in error correction system called the ‘cell-cycle checkpoints’. If replication proceeds without any problems, correction is not needed. However, if too many replication events occur at once, the potential for conflict between transcription and replication increases, and/or it may indicate that some replicators have stalled because of errors. Once the threshold number is exceeded, the checkpoint system is activated, the whole process is slowed down, and errors are corrected. If too much damage occurs, the daughter cells will be mutant, or the cell’s self-destruct mechanism (the apoptosome) will be activated to dismantle the cell and recycle its components.

An obvious benefit of the pattern of replication initiation being never the same from one cell division to the next is that it prevents accumulation of any errors that are not corrected. The exact location of the replication code is yet to be pinpointed, but because it involves transcription factors gaining access to transcription sites, and this is known to be controlled by chromatin structure, then the code itself is probably written into the chromatin structure.



9The various codes in the cell Empty The tubulin code Wed Jan 13, 2016 6:34 pm



The tubulin code

The α- and β-tubulin heterodimer – the building block of microtubules – undergoes multiple post-translational modifications (PTMs) (Table above). The modified tubulin subunits are non-uniformly distributed along microtubules. Analogous to the model of the ‘histone code’ on chromatin, diverse PTMs are proposed to form a biochemical ‘tubulin code’ that can be ‘read’ by factors that interact with microtubules (Verhey and Gaertig, 2007).

Who are the Interpreters of the Tubulin Code?

A major implication of the tubulin code is that PTMs influence the recruitment of protein complexes (microtubule effectors), which in turn contribute to microtubule-based functions. Three major classes of microtubule binding proteins can be considered as interpreters of the tubulin code. First, microtubule associated proteins (MAPs) such as Tau, MAP1 and MAP2 that bind statically along the length of microtubules. Second, plus end tracking proteins (+TIPs) that bind in a transient manner to the plus-ends of growing microtubules. And third, molecular motors that use the energy of ATP hydrolysis to carry cargoes along microtubule tracks.

This is a relevant and remarkable fact, and it raises the question of how the "tubulin code", alongside the several other codes in the cell, emerged. Once more, this shows that intelligence was involved in creating these biomolecular structures and their specified, complex, coded instructional patterns, since coded information has only ever been shown to be produced by intelligent minds. Furthermore, what good would the tubulin code be if no specific goal was in mind? The code acts as an emitter, and if the information has no destination, there is no reason for the code to exist in the first place. So both sender and receiver must exist first as hardware: the sender is the microtubule, with its post-translationally modified tubulin units in a specified coded conformation, and the receiver can be kinesin or dynein motor proteins, which are directed to the right destination, or other proteins.

Last edited by Admin on Wed Jan 04, 2017 5:14 am; edited 2 times in total


10The various codes in the cell Empty The Glycan or Sugar Code Wed Jan 13, 2016 6:54 pm



The Glycan or Sugar Code 1
Carbohydrates are essential for all forms of life, but the largest variety of their functions is now found in higher eukaryotes. The majority of eukaryotic proteins are modified by cotranslational and posttranslational attachment of complex oligosaccharides (glycans) to generate the most complex epiproteomic modification – protein glycosylation. 

Most proteins are glycosylated: that is, complex carbohydrates are chemically bonded to them to generate enormous diversity in protein functions. [5] Since carbohydrate molecules are branched, they carry many more orders of magnitude of information than linear molecules such as DNA and RNA. This has been called the “sugar code,” and although it is highly specified it is largely independent of DNA sequence information. A third biochemical alphabet forming code words with an information storage capacity second to no other substance class in rather small units (words, sentences) is established by monosaccharides (letters). As hardware, oligosaccharides surpass peptides by more than seven orders of magnitude in the theoretical ability to build isomers, when the total of conceivable hexamers is calculated. 2

According to the most widely held modern version of Darwin’s theory, DNA mutations can supply raw materials for morphological evolution because they alter a genetic program that controls embryo development. Yet a genetic program is not sufficient for embryogenesis: biological information outside of DNA is needed to specify the body plan of the embryo and much of its subsequent development. Some of that information is in cell membrane patterns, which contain a two-dimensional code mediated by proteins and carbohydrates. These molecules specify targets for morphogenetic determinants in the cytoplasm, generate endogenous electric fields that provide spatial coordinates for embryo development, regulate intracellular signaling, and participate in cell–cell interactions. Although the individual membrane molecules are at least partly specified by DNA sequences, their two-dimensional patterns are not. Furthermore, membrane patterns can be inherited independently of the DNA. I review some of the evidence for the membrane code and argue that it has important implications for modern evolutionary theory.

1) http://www.ncbi.nlm.nih.gov/pubmed/10798195

2) https://reasonandscience.catsboard.com/t2071-carbohydrates-and-glycobiology-the-3rd-alphabet-of-life-after-dna-and-proteins?highlight=glycan+code

Last edited by Otangelo on Wed Nov 11, 2020 1:40 pm; edited 3 times in total


11The various codes in the cell Empty Re: The various codes in the cell Sat Jan 16, 2016 2:44 pm




Trifonov advocates[19]:4 the notion that biological sequences bear many codes, contrary to the generally recognized single genetic code (coding amino acid order). He was also the first to demonstrate[20] that there are multiple codes present in the DNA. He points out that even so-called non-coding DNA has a function, i.e. contains codes, although different from the triplet code.


Trifonov recognizes[19]:5–10 specific codes in the DNA, RNA and proteins:

in DNA sequences
chromatin code (Trifonov 1980) is a set of rules responsible for positioning of the nucleosomes.
in RNA sequences

RNA-to-protein translation code (triplet code)
Every triplet in the RNA sequence corresponds (is translated) to a specific amino acid.

splicing code
is a code responsible for RNA splicing; still poorly identified.

framing code (Trifonov 1987)
The consensus sequence of the mRNA is (GCU)n which is complementary to (xxC)n in the ribosomes. It maintains the correct reading frame during mRNA translation.

translation pausing code (Makhoul & Trifonov 2002)
Clusters of rare codons are placed at a distance of 150 bp from each other. The translation time of these codons is longer than that of their synonymous counterparts, which slows down the translation process and thus gives the freshly synthesized segment of a protein time to fold properly.

in protein sequences

protein folding code (Berezovsky, Grosberg & Trifonov 2000)
Proteins are composed of modules. The newly synthesized protein is folded module by module, not as a whole.

fast adaptation codes (Trifonov 1989)
are present in all three types of biological sequences. They are represented by tandem repeats (AB...MN)n. The number of repetitions (n) can change in the cell genome as a response to stress which may (or may not) help the cell to adapt to the environmental pressure. 

codes of evolutionary past

binary code (Trifonov 2006)
The first ancient codons were GGC and GCC, from which the other codons have been derived by a series of point mutations. Nowadays, we can see it in modern genes as "mini-genes" containing a purine at the middle position in the codons, alternating with segments having a pyrimidine in the middle nucleotides.

genome segmentation code (Kolker & Trifonov 1995)
Methionines tend to occur every 400 bp in modern DNA sequences as a result of the fusion of ancient independent sequences.

The codes can overlap each other, so that up to 4 different codes can be identified in one DNA sequence (specifically a sequence involved in a nucleosome). According to Trifonov, other codes are yet to be discovered.
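Of the codes listed above, the triplet code is the easiest to show concretely. The Python sketch below builds the standard codon table and translates a short, made-up mRNA codon by codon. The example sequence and the function name are my own illustrative choices; none of the other codes in the list (framing, pausing, chromatin and so on) is modelled here.

```python
from itertools import product

# Standard genetic code in the usual UCAG ordering; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(len(CODON_TABLE))           # 64
print(translate("AUGGCUGGCUAA"))  # MAG
```

The table has 64 entries mapping onto 20 amino acids plus stop, so several codons share the same meaning. That redundancy is what leaves room for synonymous codon choices, such as the rare-codon clusters of the translation pausing code.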



12The various codes in the cell Empty Code Biology Fri Apr 22, 2016 10:28 pm



Code Biology

Marcello Barbieri, page 14:

The Signal Transduction Codes

Signal transduction is the process by which cells transform the signals from the environment, called first messengers, into internal signals, called second messengers. First and second messengers belong to two independent worlds because there are literally hundreds of first messengers (hormones, growth factors, neurotransmitters, etc.) but only four great families of second messengers (cyclic AMP, calcium ions, diacylglycerol and inositol trisphosphate) (Alberts et al. 2007). The crucial point is that the molecules that perform signal transduction are true adaptors. They consist of three subunits: a receptor for the first messengers, an amplifier for the second messengers, and a mediator in between (Berridge 1985). This allows the transduction complex to perform two independent recognition processes, one for the first messenger and the other for the second messenger. Laboratory experiments have proved that any first messenger can be associated with any second messenger, which means that there is a potentially unlimited number of arbitrary connections between them. In signal transduction, in short, we find all three essential components of a code:

(1) two independent worlds of molecules (first messengers and second messengers), 
(2) a set of adaptors that create a mapping between them, and 
(3) the proof that the mapping is arbitrary because its rules can be experimentally changed.

The cells that evolved new codes, such as splicing codes, cytoskeleton codes, compartment codes, histone code and so on, became eukarya and have generated increasingly complex cellular structures.

Before the origin of the genetic code, the common ancestor was engaged in evolving coding rules and was therefore a code exploring system.

Would the common ancestor not already have needed a sophisticated code and cipher system in place, along with the molecular machines required for transcription, translation and replication? And why would random chance explore codes in order to set them up at all?

After the origin of the code, however, no other modification in coding rules was allowed and the cell became a code conservation system.

There are 18 different cell codes known. Why would natural mechanisms stop allowing other, different codes?

Another part of the ancestral cells, however, maintained the potential to evolve the rules of different codes and behaved as new code exploring, or code generating, systems. In the early Eukarya, for example, the cells had a code conservation part for the genetic code, but also a code exploring part for the splicing code, and this tells us something important about life.

Well, the splicing code has nothing to do with the genetic code.

The origin of the first cells was based on the ability of the ancestral systems to generate the rules of the genetic code

What systems were these, and why would such a "system" generate a code and its rules at all?

Another outstanding implication of the existence of organic codes in Nature comes from the fact that any code involves meaning and we need therefore to introduce in biology, with the standard methods of science, not only the concept of biological information but also that of biological meaning. The study on the organic codes, in conclusion, is bringing to light new mechanisms that operated in the history of life and new fundamental concepts. It is an entirely new field of research, the exploration of a vast and still largely unexplored dimension of the living world, the real new frontier of biology.

A Gallery of Organic Codes 
The Apparatus of Protein Synthesis
The Genetic Code 
Stereochemistry and Arbitrariness 
The Splicing Codes 
The Metabolic Code
The Signal Transduction Codes 
The Signal Integration Codes 
The Histone Code 
Is the “Histone Code” an Organic Code?
The Tubulin Code 
The Sugar Code 
The Glycomic Code

Last edited by Admin on Sun Jul 03, 2016 12:37 pm; edited 3 times in total


13The various codes in the cell Empty The Metabolic Code Fri Apr 22, 2016 11:01 pm



The Splicing Codes

Code Biology

Marcello Barbieri, page 14

The primary transcripts of the genes are often transformed into messenger RNAs by removing some RNA pieces (called introns) and by joining together the remaining pieces (the exons). This cutting-and-sealing operation, known as splicing, is a true assembly because exons are assembled into messengers, and we need therefore to find out if it is a catalyzed assembly (like transcription) or a codified assembly (like translation). In the first case splicing would require only catalysts (comparable to RNA-polymerases), whereas in the second case it would need an assembly machine and a set of adaptors (comparable to ribosome and tRNAs). These parallels immediately suggest that splicing is a codified process because it is implemented by structures that are very much comparable to those of protein synthesis. The splicing bodies, known as spliceosomes, are huge molecular machines like ribosomes, and employ small molecules, known as small-nuclear- RNAs (snRNAs) which are comparable to tRNAs. The similarity, however, goes much deeper than that because splicing is carried out by molecular structures that are true adaptors. They perform two independent recognition processes, one for the beginning and one for the end of each exon, thus creating a specific correspondence between primary transcripts and messenger RNAs. Splicing, in other words, is a codified process based on adaptors and takes place with sets of rules that have been referred to as splicing codes. It must be underlined, however, that there are two outstanding complications in splicing. One is the fact that the order in which the exons are joined together can be shuffled in various ways, an operation, called alternative splicing, that allows many species to generate a whole family of variant proteins from the same gene. 
The expression of these proteins, furthermore, can change from one tissue to another and in different stages of embryonic development, thus enormously increasing the protein variety that can be associated to a gene. Alternative splicing has in this way a powerful role in the generation of biological complexity, and splicing mistakes often have pathological effects; it has been estimated that they account for about one fifth of all inherited diseases.
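The cutting-and-sealing operation described above can be pictured with a toy Python sketch. It assumes the exon boundaries are already known and simply joins the chosen exons, so it shows only the outcome of splicing, not how spliceosomes recognize the boundaries. The transcript, the coordinates and the function name are all made up for illustration, and exon skipping is used as one simple form of alternative splicing.

```python
# Hypothetical toy transcript: uppercase = exons, lowercase = introns.

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the given (start, end) exon slices in order, dropping everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)

pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccguacag" + "UUUUAA"
exon1, exon2, exon3 = (0, 6), (16, 22), (32, 38)

full = splice(pre_mrna, [exon1, exon2, exon3])
skipped = splice(pre_mrna, [exon1, exon3])  # alternative splicing: exon 2 left out

print(full)     # AUGGCCGAUUACUUUUAA
print(skipped)  # AUGGCCUUUUAA
```

The same primary transcript yields two different messengers depending on which exons are selected, which is how one gene can be associated with a family of variant proteins.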

The other great complication of splicing is the fact that many introns carry sequences that are similar to exons but translate into nonsense and for this reason are called pseudo exons or pseudo genes. They would create havoc if incorporated into mRNAs and the splicing machinery needs the means to differentiate real exons from pseudo ones. The result is that real exons contain internal identity marks that are known as exonic splicing enhancers (ESEs) and exonic splicing silencers (ESSs).

Question: Would these identity marks not have had to be present in the process right from the beginning? Would their absence, or their incomplete development, not make it impossible for the process to run without mistakes?

The presence of these marks, in turn, means that the adaptors of the splicing codes are not single molecules but combinations of molecules because they must be able to recognize not only the beginning and the end of the real exons, but also their internal identity marks.

This makes a stepwise emergence of the whole process even more implausible. Recognition of both the beginning and the end of each exon is required, which means that the genome needs start and stop signals at the right places, and the molecular machines programmed to recognize those signals must be in place, fully developed and fully programmed; the identity marks are required as well, besides the hardware. Furthermore, this seems to be one more irreducibly complex system, since both the software and the hardware had to be in place, just right, fully developed and programmed from the beginning.

The actual deciphering of the splicing codes has already started but it is taking considerably longer than that of the genetic code because it is incredibly more complex. Let us keep in mind that the discovery of the genetic code has been facilitated by two particularly favourable features. More precisely, by the fact that 

(1) the adaptors are single molecules (the tRNAs) and 
(2) the coding units form a closed set (64 codons and 20 amino acids).

In the case of splicing, instead, the adaptors are combinations of molecules (combinatorial codes), and the domain (or alphabet) of the codes is open and potentially unlimited. The overall complexity of splicing is such that the most practical way of discovering its codes is by building computational models that are capable of predicting new splicing rules on the basis of existing data. Such models have already started appearing in the literature, and represent our first glimpse of the rules of the splicing codes.

The Metabolic Code 

This is the first organic code that came to light after the discovery of the genetic code. It was described in Science, in 1975, by Gordon Tomkins, a professor of biochemistry at the University of California, San Francisco. Tragically, Tomkins died that very year, aged 49, from a brain tumour, and apparently his idea died with him. Recently, however, there has been an attempt to rescue his work from oblivion (Swan and Goldberg 2010) and here we will try to show that such an attempt is amply justified. Tomkins investigated the evolution of metabolism and started from the need of the ancestral cells to obtain energy. “Since both nucleic acid and protein synthesis are endergonic reactions, primordial cells were almost certainly endowed with the capacity to capture the necessary energy from the environment and to transform it into usable form, presumably ATP (adenosine triphosphate). The biosynthetic capabilities of primitive cells were, however, probably quite limited ... survival would therefore have required the evolution of regulatory mechanisms that could maintain a relatively constant intracellular environment in the face of changes in external conditions” (Tomkins 1975). Granted this basic need of the cells to evolve regulatory mechanisms, Tomkins distinguished between two types of regulation that he called simple and complex. Simple regulation is characterized by a relationship (positive or negative) between the components of a metabolic circuit, with the end products affecting their own metabolism.

Complex regulation is characterized by two new entities that Tomkins called symbols and domains. In order to illustrate them, Tomkins gave the example of molecules that are accumulated inside a cell as a consequence of a particular environment and become a symbol of that environment. In most microorganisms, for example, cyclic AMP is accumulated as a result of carbon starvation and becomes a symbol of that deficiency. Another example is ppGpp (guanosine 5′-diphosphate 3′-diphosphate) that accumulates as a result of amino acid starvation and represents a symbol of that condition. These molecules are symbols because they bear no structural relationship to the molecules that promote their accumulation (cyclic AMP, for example, is accumulated as a result of glucose starvation, but it is not a chemical analog of glucose). This is what suggested to Tomkins the existence of a metabolic code. “Since a particular environmental condition is correlated with a corresponding intracellular symbol, the relationship between the extra- and intracellular events may be considered as a metabolic code in which a specific symbol represents a unique state of the environment.” Tomkins went on to show how metabolic coding in unicellular organisms might have evolved into the endocrine system of the metazoa, and described what happens in the slime mold Dictyostelium discoideum. “Given sufficient nutrients, this organism exists as independent myxamoebas. Upon starvation, they generate cyclic AMP and release it into the surrounding medium. This substance serves as a chemical attractant that causes the aggregation of a large number of myxamoebas to form a multicellular slug. In this case, as in E. coli, cyclic AMP acts as an intracellular symbol of carbon-source starvation. In addition, however, the cyclic nucleotide is released from the Dictyostelium cells in which it is formed and diffuses to other nearby cells, promoting the aggregation response. 
Cyclic AMP thus acts in these organisms both as an intracellular symbol of starvation and as a hormone which carries this metabolic information from one cell to another.”

Hormones, according to Tomkins, evolved in order “to carry information from sensor cells in direct contact with the environment, to more sequestered responder cells. Specifically, the metabolic state of a sensor cell, represented by the levels of its intracellular symbols, is encoded by the synthesis and secretion of corresponding levels of hormones. When hormones reach the responder cells, the metabolic message is decoded into corresponding primary intracellular symbols. In this way, endocrine cells act as both sensors and responders, that is, intermediates in the transmission of metabolic information from primary sensor cells to the tissues in which the final chemical responses take place.”

The Signal Transduction Codes

Living cells react to many physical and chemical stimuli from the environment, and in general their reactions consist in the expression of specific genes. We need therefore to understand how the environment interacts with the genes, and the turning point, in this field, came from the discovery that the external signals (known as first messengers) never reach the genes. They are invariably transformed into a different world of internal signals (called second messengers) and only these, or their derivatives, reach the genes. In most cases, the molecules of the external signals do not even enter the cell and are captured by specific receptors of the cell membrane, but even those that do enter (some hormones) must interact with intracellular receptors in order to influence the genes (Sutherland 1972). The transfer of information from environment to genes takes place therefore in two distinct steps: one from first to second messengers, called signal transduction, and a second path from second messengers to genes which is known as signal integration. The surprising thing about signal transduction is that there are literally hundreds of first messengers (ions, nutrients, hormones, growth factors, neurotransmitters, etc.) whereas the second messengers belong to only four molecular families: cyclic AMP or GMP, calcium ions (Ca2+), inositol trisphosphate (IP3), and diacylglycerol (DAG) (Alberts et al. 2007). First and second messengers, in other words, belong to two very different worlds, and this suggests immediately that signal transduction may be based on organic codes. This is reinforced by the discovery that there is no necessary connection between first and second messengers, because it has been proven that the same first messengers can activate different types of second messengers, and that different first messengers can act on the same type of second messengers (Alberts et al. 2007). The only plausible explanation is that signal transduction is based on organic codes, but of course one would like a direct proof. The signature of an organic code, as we have seen, is the presence of adaptors and the transmembrane receptor proteins of signal transduction do have the defining characteristics of the adaptors. 

The transduction system consists of at least three types of molecules: 

a receptor for the first messengers, 
an amplifier for the second messengers and 
a mediator in between (Berridge 1985). 

This transmembrane system performs two independent recognition processes, one for the first and the other for the second messenger, and the two steps are connected by the bridge of the mediator. This connection, on the other hand, could be implemented in countless different ways since any first messenger can be coupled with any second messenger, and this makes it imperative to have a selection in order to guarantee biological specificity. 

In signal transduction, in short, we find the three defining features of a code: 

(1) two independent worlds of objects (first messengers and second messengers), 
(2) a potentially unlimited number of arbitrary connections produced by adaptors, and 
(3) a set of coding rules (a selection of the adaptors) that ensures the specificity of the correspondence. 
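The three features listed above can be pictured with a minimal Python sketch. Each transduction complex is modelled as a receptor, a mediator and an amplifier, and the "coding rules" are simply whichever complexes a cell happens to have. All messenger names, pairings and identifiers below are hypothetical placeholders chosen for illustration, not experimental data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransductionComplex:
    receptor: str   # recognizes a first messenger
    mediator: str   # bridges the two recognition steps
    amplifier: str  # produces a second messenger

def coding_rules(complexes):
    """The 'code' is the mapping selected by the complexes that are present."""
    return {c.receptor: c.amplifier for c in complexes}

def possible_connections(n_first: int, n_second: int) -> int:
    """Every first messenger could in principle be paired with every second one."""
    return n_first * n_second

# Two hypothetical selections of adaptors over the same two worlds of molecules.
rules_a = coding_rules([
    TransductionComplex("hormone_X", "mediator_1", "cAMP"),
    TransductionComplex("growth_factor_Y", "mediator_2", "Ca2+"),
])
rules_b = coding_rules([
    TransductionComplex("hormone_X", "mediator_3", "Ca2+"),
    TransductionComplex("growth_factor_Y", "mediator_4", "cAMP"),
])

print(rules_a["hormone_X"], rules_b["hormone_X"])  # cAMP Ca2+
print(possible_connections(100, 4))                # 400
```

Both rule sets connect the same first messengers to the same families of second messengers, only differently, which is the sense in which the mapping is called arbitrary in the text.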

The effects that external signals have on cells, in short, do not depend on the energy or the information that they carry, but on the meaning that cells give them with sets of rules that have been referred to as signal transduction codes (Barbieri 1998, 2003). One may wonder at this point why signal transduction codes are never mentioned in biochemistry books despite the fact that their molecules are true adaptors. The problem here is that the study of signal transduction started when organic codes were not known, and it has always been assumed a priori that in this process there is no need for them. A code, in short, has not been found simply because it has never been looked for. The genetic code, on the contrary, was predicted on theoretical grounds, and it was discovered precisely because experiments were devised with the specific purpose of looking for it.

The Signal Integration Codes

We have seen that there are only four families of second messengers in the cell, and yet the reactions that they set in motion can pick up an individual gene among tens of thousands. How this is achieved is still a mystery, but some progress has been made. Perhaps the most illuminating discovery, so far, is that second messengers do not act independently. Calcium ions and cyclic-AMPs, for example, have effects that on some occasions reinforce each other whereas on others they are mutually exclusive. The cell, in short, can combine its internal signals in countless different ways, and it is precisely this combinatorial ability that explains why a small number of second messengers can generate an extraordinarily high number of specific genetic responses. The activation of second messengers, in other words, sets in motion a cascade of reactions that normally ends with the expression of a target gene, and again we need to understand if they are normal catalyzed reactions or if at least some of them are based on the rules of a code. One of the most interesting clues, in this field, is the fact that signalling molecules have in general more than one function. Epidermal growth factor, for example, stimulates the proliferation of fibroblasts and keratinocytes, but it has an antiproliferative effect on hair follicle cells, whereas in the intestine it is a suppressor of gastric acid secretion. Other findings have proved that all growth factors can have three distinct functions, with proliferative, anti-proliferative, and proliferation-independent effects. They are, in short, multifunctional molecules. In addition to growth factors, it has been found that many other molecules have multiple functions. Adrenaline, for example, is a neurotransmitter, but it is also a hormone produced by the adrenal glands to spring the body into action by increasing the blood pressure, speeding up the heart and releasing glucose from the liver. 
Acetylcholine is another common neurotransmitter in the brain, but it also acts on the heart (where it induces relaxation), on skeletal muscles (where the result is contraction), and in the pancreas (which is made to secrete enzymes). Cholecystokinin is a peptide that acts as a hormone in the intestine, where it increases the bile flow during digestion, whereas in the nervous system it is a neurotransmitter. Encephalins are sedatives in the brain, but in the digestive system they are hormones which control the mechanical movements of food. Insulin is universally known for lowering the sugar levels in the blood, but it also controls fat metabolism and in other less known ways it is affecting almost every cell of the body. The discovery of multifunctional molecules suggests that their function is not decided solely by their structure, but also by the context in which they find themselves. What matters, in other words, is not their ability to catalyze a specific reaction, but the fact that they are employed as molecular signs that can be given one meaning in a certain context and a different meaning in another one. A second finding that points to the existence of codes in signal integration is the fact that the regulation processes set in motion by second messengers are strongly conserved in evolution, and yet the actual reactions involved have undergone great changes in the history of life. The regulation of cellular energy homeostasis, for example, has been highly conserved from yeast to man, with the key role being played by a protein kinase that is called AMPK in animals and Snf1 in yeast. Despite this overall conservation, it has been found that an evolutionary divergence of about 150 million years between two species of budding yeasts (Saccharomyces cerevisiae and Kluyveromyces lactis) has produced substantial differences in their Snf1 regulatory networks. 
Again, what seems to matter in these regulation processes is not a specific set of catalysts, but a set of rules that can be implemented in many different ways. The information carried by first messengers, in conclusion, undergoes two great transformations in its journey towards the genes. First, it is transformed into internal messengers with the rules of the signal transduction codes, and then it is channelled along complex three-dimensional circuits that integrate it with other signals according to the rules of one or more signal integration codes.
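The combinatorial point made in this section, that a few second messengers can specify many responses, can be put in numbers with a short Python sketch. It assumes, purely for illustration, that each of the four families can sit at a small number of distinguishable levels. Real signalling is graded and time-dependent, so the discrete levels and the function name are my own simplification.

```python
from itertools import product

SECOND_MESSENGERS = ["cAMP/cGMP", "Ca2+", "IP3", "DAG"]

def signal_states(levels: int) -> int:
    """Distinct combined states if each family has `levels` distinguishable levels."""
    return levels ** len(SECOND_MESSENGERS)

# Enumerate the simplest on/off case explicitly.
on_off_states = list(product([0, 1], repeat=len(SECOND_MESSENGERS)))

print(len(on_off_states))  # 16
print(signal_states(3))    # 81
print(signal_states(10))   # 10000
```

Even this crude count grows quickly with the number of levels, and it still leaves out timing, location and the downstream cascades mentioned in the text.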

The Histone Code

The classic double helix described by Watson and Crick has a width of 2 nm (two millionths of a millimeter), but in eukaryotes many segments of this filament are folded around groups of eight histone proteins and form blocks, called nucleosomes, that give the filament a ‘beads-on-a-string’ appearance. This string, called chromatin, is almost six times thicker than the double helix and is further folded into spirals of nucleosome groups, called solenoids, that arrange it in fibers of increasing thickness and ultimately into the 600 nm fiber of the chromosome. These multiple foldings allow the eukaryotic cells to pack their long chromosomes into the tiny space of their nuclei, and for this reason it was initially assumed that the histones have a purely packaging role. The experimental data, however, have shown that the ‘tails’ of the histones (the parts that protrude from the surface of the nucleosomes) are subject to a wide variety of post-translational modifications (in particular acetylation, methylation and phosphorylation) that have highly dynamic roles and are involved in the activation or repression of gene activity. The histone tails represent about 25–30 % of the histone mass, and their post-translational modifications can alter the chromatin either directly or indirectly. The direct modifications are those that physically open or close the molecular space (in particular the electrostatic barrier) that surrounds the genes and in this way control the transit of DNA-binding proteins. Several discoveries, however, have shown that the most frequent effects are obtained by indirect mechanisms. In these cases, the modified histone tails provide ‘marks’ on the surface of the nucleosomes that are recognized by specialized effector proteins which set in motion chains of biological reactions that eventually end in the activation or the repression of specific genes. 
A crucial breakthrough, in this field, was the discovery that the post-translational modifications of the histones do not act individually. Most of them are involved in both the activation and the repression of genes (the phosphorylation of histone H3, for example, takes part in the condensation as well as in the decondensation of chromatin), which means that the final result is due to a combination of histone marks rather than a single one. This led David Allis and colleagues to propose that the histone marks operate in combinatorial groups, like letters that are put together into the words of a molecular ‘language’ that was referred to as histone code. The same concept was independently proposed by Brian Turner who argued that there is an epigenetic code at the heart of the regulation mechanisms that are initiated by histone tail modifications. Turner pointed out that these modifications are epigenetic because they operate in addition to genetic changes, and underlined that they have both short-term and long-term effects. The short-term modifications change rapidly in response to external signals and represent a mechanism by which the genome quickly responds to the environment. The long-term modifications, instead, are those that are put in place at early stages of embryonic development and allow the transcription or the silencing of specific genes at more advanced stages. The existence of long-term effects was revealed by the discovery that many histone modifications survive the trauma of mitosis and are transmitted to the daughter cells. This is particularly important in embryonic development where the cells must perpetuate their state of differentiation into distinct tissues. The histone modifications, in other words, provide a mechanism of cell memory, in the sense that they enable the cells to ‘remember’ their specific pattern of gene expression for many generations. 
It has been shown, for example, that the expression of Hox genes in embryonic development is regulated by histone modifications. Another example of long-term effects is provided by the histone modifications that allow neural cells to generate faster action potentials the more they are used, making the transmission of action potentials increasingly easier. Today, in conclusion, a large number of data support the idea that the regulation of genetic activity by histone modifications plays a fundamental role in all eukaryotes and is based on the rules of a combinatorial code that has become known as ‘histone code’.

Is the “Histone Code” an Organic Code?

This question is the title of a paper where Stefan Kühn and Jan-Hendrik Hofmeyr described the results of a research project dedicated to find out whether or not the histone code has all the essential characteristics of an organic code. The prototype example of the genetic code shows that an organic code requires three things: 

(1) two independent molecular worlds, 
(2) a set of molecular adaptors that create a mapping between them, and 
(3) the demonstration that the mapping is arbitrary because its rules can be changed.

Kühn and Hofmeyr tested the histone code in respect to all these points.

1. The Two Independent Worlds of the Histone Code
An organic code is a mapping between organic signs and organic meanings, and in many cases signs and meanings are both organic molecules. The genetic code, for example, is a mapping between codons and amino acids, whereas the signal transduction code is a mapping between first and second messengers. Kühn and Hofmeyr, however, pointed out that the organic meanings can be biological effects rather than molecules. In principle this may not seem an extension of the original definition because biological effects are necessarily implemented by molecules, but in practice it is a very useful generalization because there are cases in which a biological function is an experimental reality even when its molecular components are not fully known. And this is precisely the case in the histone code, where the organic signs are groups of histone modifications and the organic meanings are biological reactions that promote the activation or the repression of specific genes. The histone code, in other words, is a mapping between two independent worlds.

2. The Adaptors of the Histone Code
The effector proteins of the histone code are the molecules that establish a bridge between organic signs and organic meanings, but in order to prove that they are true adaptors it is necessary to show that they operate independently on signs and meanings. Kühn and Hofmeyr underlined that this is precisely what happens because the effector proteins have two distinct domains: one that recognizes histone modifications and a different type that initiates biological reactions. It has been shown, for example, that the acetylated lysines are specifically recognized only by the bromodomains of the effector proteins. The methylated amino acids are recognized by a greater variety of domains but again each recognition step is absolutely specific. The effector proteins, in other words, perform two independent recognition processes on signs and meanings and are therefore true adaptors.

3. The Arbitrariness of the Histone Code
An organic code is arbitrary when its rules are not dictated by physical necessity and in this case it must be possible, at least in principle, to exchange the part of an adaptor that recognizes an organic sign with a different one and show that the modified adaptor associates the old organic meaning to the new sign. Kühn and Hofmeyr noticed that the experimental data support this possibility because there is evidence that the chromodomains of the effector proteins can be interchanged. The histone code, in conclusion, did pass the three tests and Kühn and Hofmeyr ended their paper with these words: “Although we probably do not yet know the complete histone code, we have more than enough information to be able to recognize the histone code as a bona fide organic code.”
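The three tests above can be pictured with a toy model. The sketch below is only an illustration of the logic, not a model of real chromatin biology: the class `Adaptor`, the helper names, and the sign and meaning labels are all hypothetical placeholders. It shows an adaptor with two independent domains, and shows that exchanging the reading domain attaches the old meaning to a new sign.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adaptor:
    """Toy adaptor: two independent recognition domains (hypothetical model)."""
    reader_domain: str    # the organic sign this adaptor recognises
    effector_domain: str  # the organic meaning (biological effect) it triggers


def build_code(adaptors):
    """Return the sign -> meaning mapping realised by a set of adaptors."""
    return {a.reader_domain: a.effector_domain for a in adaptors}


def swap_reader(adaptor, new_sign):
    """Exchange only the reading domain; the effector domain is untouched."""
    return Adaptor(reader_domain=new_sign, effector_domain=adaptor.effector_domain)


# Placeholder labels, chosen only for illustration.
original = Adaptor(reader_domain="sign A", effector_domain="effect X")
code_before = build_code([original])

# Arbitrariness test: the modified adaptor maps a NEW sign to the OLD meaning.
modified = swap_reader(original, "sign B")
code_after = build_code([modified])

print(code_before)  # {'sign A': 'effect X'}
print(code_after)   # {'sign B': 'effect X'}
```

The point of the sketch is that the mapping lives in the adaptor, not in the sign or the meaning, which is what the domain-exchange test is meant to probe.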

Nucleosome Positioning Codes 1

DNA molecules are much longer than the cells that contain them. This requires their compaction, which introduces also an opportunity: the regulation of transcription through a differentiated fashion of DNA packaging. In eukaryotes DNA molecules can guide their own packaging into nucleosomes by having the desired mechanical properties (stiffnesses and intrinsic curvature) written into their base-pair (bp) sequence. This has been referred to as the “nucleosome positioning code” . Nucleosomes are the fundamental packaging units of eukaryotic DNA, where 147 bp are wrapped in a 1 3/4 left-handed superhelical turn around an octamer of histone proteins. As the DNA is strongly deformed when wrapped around the histones, sequence-dependent geometrical and mechanical properties could—at least locally—overrule other effects that also influence nucleosome positioning like the presence of proteins that compete for the same DNA stretch or the action of chromatin remodellers.
Multiplexing Genetic and Nucleosome Positioning Codes: A Computational Approach

The Tubulin Code

Tubulin is the major component of the microtubules, the filaments that form an internal scaffolding in all eukaryotic cells and give origin to organelles such as cilia, centrioles, basal bodies and the mitotic spindle. Most microtubules are in a state of rapid turnover by dynamic instability and alternate very quickly between growth and shrinkage. Within the cell, however, there is also a population of microtubules that are relatively stable, in the sense that their turnover is measured in hours rather than minutes. The function of the stable microtubules is still not completely known, but there are clear indications that they are involved in the morphogenesis of the eukaryotic cell. What is certain, is that the stable microtubules undergo a variety of post-translational modifications (PTMs) that have been strongly conserved  because they are found in all eukaryotic taxa. These PTMs consist in processes like acetylation, phosphorylation, polyglutamylation, polyglycylation, detyrosination, and palmitoylation that act preferentially on stable microtubules. They have been studied with various tests on purified tubulin, but the experiments have failed to detect any direct effect of the PTMs on the dynamics of the microtubules. This means that PTMs do not act by changing directly the intrinsic properties of the microtubules, but rather by providing combinatorial signals for the recruitment of proteins that interact with the microtubules. Different combinations of PTMs, in other words, act like signposts that specify the properties that stable microtubules are going to have in different regions of the cell or in different periods of the cell cycle. To this set of signposts that operate on stable microtubules, Kristen Verhey  and Jacek Gaertig  gave the name of Tubulin code. Any organic code, as we have seen, requires molecules that act like adaptors between two different domains. 
Verhey and Gaertig have called these molecules ‘interpreters’, and have identified three major classes of microtubule binding proteins that can be considered interpreters of the tubulin code:

 “First, microtubule associated proteins (MAPs) such as Tau, MAP1 and MAP2 that bind statically along the length of microtubules. 
Second, plus-end tracking proteins (+TIPs) that bind in a transient manner to the plus-ends of growing microtubules. 
And third, molecular motors that use the energy of ATP hydrolysis to carry cargoes along microtubule tracks.” 

Verhey and Gaertig have also called attention to a unique characteristic of the tubulin code. Many epigenetic modifications are transmitted from one generation to the next, but this does not usually happen in the tubulin world: “Some microtubule-based organelles (e.g., centrosomes and basal bodies) are inherited by a template-driven mechanism but there is no evidence that the template organelle directly influences the PTM pattern in the new organelle. Rather, the PTM pattern is recreated in the newly formed organelle in a gradual manner ... Other microtubule-based structures, such as cytoplasmic microtubules, the mitotic spindle and cilia, are formed de novo mostly, if not entirely, from unmodified tubulin heterodimers. Thus, in case of both template-dependent and template-independent microtubular structures, PTM patterns are probably recreated without a direct influence of preexisting PTMs.” The existence of the tubulin code, in conclusion, is based on sound experimental evidence but the actual deciphering of its rules is still at a preliminary stage and requires a detailed understanding of how the PTMs influence the recruitment of proteins and regulate the functions of the stable microtubules.

The Sugar Code

For a long time, sugars have been regarded as molecules that provide energy (mostly in the form of glucose and glycogen) or structural support (like cellulose in plants), but molecular biology has shown that they also have a third outstanding function: by binding to proteins they generate glycoproteins, molecules that take part in countless communication processes in and between cells. The addition of sugars to proteins is a post-translational modification, called glycosylation, that greatly expands the potentialities of many protein families and gives origin to glycoproteins that perform a wide variety of functions. Some operate on the cell membrane and act as antennae for receiving molecular signals or as docking sites for importing compounds. Other glycoproteins take part in cell-to-cell interactions, for example in sperm-oocyte attachment, in bacteria-to-cell relationships and in the aggregation of platelets. A third family operates in the immune system where glycoproteins interact with antigens, recognize white blood cells, and take part in the major histocompatibility complex (MHC). Yet another family is that of the glycoproteins that act as hormones, like human chorionic gonadotropin (HCG), thyroid-stimulating hormone (TSH) and erythropoietin (EPO). Then there are glycoproteins that have protective functions (mucins), some that are involved in transport (transferrin) and others that act as enzymes (alkaline phosphatase). The key point in these interactions is that in most cases it is the sugar component that determines the recognition ability of the glycoproteins. This point has been particularly underlined by Winterburn and Phelps (1972), who convincingly argued that “the significance of the glycosyl residues is to impart a discrete recognitional role on the protein”. Sugars, in other words, are carriers of information because their sequences have specific biological functions, and yet the information they carry is only partially contained in the genome.
In most cases it is due to subtle epigenetic modifications in the terminals of the sugar antennae. It has been found, furthermore, that sugars have a capacity to store information that is many orders of magnitude higher than that of nucleotides and amino acids. This makes us realize that, after nucleotides and amino acids, sugars are a third great family of informational molecules, but how do they transmit their messages to the other components of the cell? The key discovery, on this point, is that the functions that are associated with sugars are not set in motion by the sugars. In most cases, they are set in motion by proteins that interact with the sugars and recognize the specific role that they have in any given set of circumstances. These sugar-binding proteins became popular in the early 1900s mainly because they served to determine the chemical structure of the ABO blood groups and were originally called agglutinins. In 1954, however, Boyd argued that they should be given a new name that reflects the unique function that they actually perform, i.e., the highly specific selection of carbohydrates. To this purpose he proposed to call them lectins, on the ground that this term derives “from the Latin lectus, the past participle of legere meaning to pick, choose or select” (Boyd 1954). The next step in the discovery of the informational properties of the sugars was the recognition, by Hans-Joachim Gabius, that their messages must be decoded in order to have biological effects, and that lectins are the decoding devices in this process. Gabius, in other words, realized that lectins are adaptors, molecules that act as intermediaries between sugars and biological reactions and establish connections between them that are not determined by physical necessity. This is why he proposed that there is a Sugar code at the basis of the communication processes that involve sugars, and that “lectins are the translators of the Sugar code”.



14 The Glycomic Code, Thu Apr 28, 2016 7:23 am



The Glycomic Code

Not only the universe but biological systems as well are fine-tuned on a razor's edge. Some things are easily overlooked, yet they determine whether advanced life can arise on planet Earth. Who would imagine that plant cell walls require complex coded information, and that their assembly into special, complex matrix structures is controlled by arbitrary rules, in order to prevent microorganisms from entering plant cells and destroying them? If that were not the case, we would not be here.


An extracellular matrix called the cell wall surrounds all plant cells and one of its most common components is cellulose, a polymer formed by long chains of glucose that bind to each other with such great affinity that most of the water is excluded from their surface. The result is a structure that is very hard to hydrate and to break. Cell walls, however, are not made of cellulose only. There are other polymers that occur in significant amounts and in most cases they are similar to cellulose in structure, but are branched in more complex ways. Because of their similarity to cellulose, these branched polysaccharides have been called hemicelluloses. They surround the cellulose microfibrils and interact with each other with non-covalent bonds in such a complex way that the hemicelluloses are even harder to disassemble. On top of that, there is an even higher level of complexity: the cellulose-hemicellulose domain is embedded in a matrix of pectins (polysaccharides with a very complex chemical structure) which forms a jelly-like structure that retains water and at the same time further reduces the pores of the cell wall. The complexity is probably due to the fact that the cell walls, in addition to controlling the expansion and the growth of the plant cells, must also form a barrier that prevents, or makes it extremely difficult for, microorganisms to enter into the cell cytoplasm. When microorganisms invade a plant cell, it may seem that all they need to gain access is a few enzymes that degrade pectins, hemicellulose and cellulose, but that is not the case. In fact, if microorganisms could easily enter into plant cells, most plants would not survive and life on Earth would not exist in its present form. So, how did plants manage to defend themselves?
The second most abundant polysaccharide on Earth, after cellulose, is xyloglucan, and by using enzymes such as cellulases, researchers could study whether the oligosaccharides found in xyloglucan were arranged randomly or not. A first answer to this question came from the discovery by Buckeridge et al. of a new xyloglucan polymer that contains two families of oligosaccharides, one with four and the other with five glucoses (tetramers and pentamers).

There are regularities in the tetramers and pentamers of the xyloglucan molecules. This was probably the first proof that the constitutive blocks of xyloglucan are not arranged randomly. After that finding, Marcos Buckeridge and Amanda De Souza performed experiments on a large number of hemicelluloses and found that some enzymes have higher specificity for certain regions in all branched polymers, which implies that their molecules too are non-randomly organized. These regularities in hemicelluloses suggested that their assembly is controlled by rules, and the fact that they are the result of contingent developments indicated that the rules are arbitrary. This is why the authors proposed that there is a glycomic code in plant cell wall hemicelluloses. A consequence of this proposal is the idea that plants have an increasingly complex system of coding rules for the assembly of hemicelluloses in order to keep at bay the invading organisms by forcing them to develop an increasingly high number of specific enzymes. As a result, only a few microorganisms managed to find the key to enter any given plant cell. The constraints imposed by the glycomic code on plant cell walls are so severe that many organisms, including us, have digestive systems that are totally dependent on cell walls (familiarly known as food fibers).

Furthermore, plants themselves hardly degrade their own walls. It is true that in a forest cell walls are eventually degraded, but this is achieved by communities of microorganisms and never (or rarely) by a single species. If the glycomic code of plant cell walls did not exist, in conclusion, we would probably not be here because plants would be utterly different from what they presently are.

Breaking the “Glycomic Code” of Cell Wall Polysaccharides May Improve Second-Generation Bioenergy Production from Biomass 2

Plant cell walls display a highly complex organization that confers resistance (recalcitrance) to enzymatic hydrolysis. This poses a barrier  due to the difficulty of enzymes in accessing wall polymers. Here, we examine the fine structure of some of the main cell wall hemicelluloses and present some evidences that lend support to the idea of a glycomic code, which can be defined as the diversity of encrypted results of the biosynthetic mechanisms of plant cell wall polysaccharides that give rise to fine-structural domains containing information in polysaccharides. These are responsible for the formation of polymer composites with different levels of polymer-polymer interactions and recalcitrance to hydrolysis. Polysaccharide motifs that are recalcitrant to hydrolysis are here called pointrons, and the ones that are available to enzyme attack are named pexons. From the biotechnological viewpoint, the understanding of the glycomic code will require further identification of pointrons and possibly the transformation of them into pexons so that walls would become suitable to hydrolysis. 

Do plant cell walls have a code? 3

A code is a set of rules that establish correspondence between two worlds, signs (consisting of encrypted information) and meaning (of the decrypted message). A third element, the adaptor, connects both worlds, assigning meaning to a code. We propose that a Glycomic Code exists in plant cell walls where signs are represented by monosaccharides and phenylpropanoids and meaning is cell wall architecture with its highly complex association of polymers. Cell wall biosynthetic mechanisms, structure, architecture and properties are addressed according to Code Biology perspective, focusing on how they oppose to cell wall deconstruction. Cell wall hydrolysis is mainly focused as a mechanism of decryption of the Glycomic Code. Evidence for encoded information in cell wall polymers fine structure is highlighted and the implications of the existence of the Glycomic Code are discussed. Aspects related to fine structure are responsible for polysaccharide packing and polymer-polymer interactions, affecting the final cell wall architecture. The question whether polymers assembly within a wall display similar properties as other biological macromolecules (i.e. proteins, DNA, histones) is addressed, i.e. do they display a code?

The Hox Codes

Code biology, Barbieri, page 107

In 1979, David Elder proposed a model that was capable of accounting for the regularities that exist in the bodies of many segmented worms (annelids). The segments of these animals are often subdivided into annuli whose number varies according to a simple rule: if a segment contains n annuli, the following segment contains either the same number n (repetition) or n plus or minus 1 (digital modification). Elder noticed that this type of rule is known to the designers of electronic circuits as a Gray code, a code that is binary (because it employs circuits that have only one of two states), combinatorial (because its outcomes are obtained by combinations of circuits) and progressive (because consecutive outcomes must be coded by combinations that differ in the state of one circuit only). The results obtained with these rules describe with great accuracy what is observed in segmented worms, and Elder proposed therefore that the body plan of these animals is based on a combinatorial code that is a biological equivalent of the Gray code. He underlined in particular that the coding principle cannot be the classical “one gene-one pattern”, but “one combination of genes-one pattern” and for this reason he called it epigenetic code (Elder 1979). After the discovery of the Hox genes, it became increasingly clear that they are used in many different permutations, according to a combinatorial set of rules that became known as Hox code. The term Hox code was introduced independently by Paul Hunt and colleagues (1991) and by Kessel and Gruss (1991) to account for the finding that the individual characteristics of the vertebrae are determined by different combinations of Hox genes. Later on, it was found that this is true in most other organs and it became standard practice to refer to any combination of Hox genes as a Hox code.
The epigenetic code proposed by Elder, in particular, is a Hox code because it is Hox genes that are responsible for the body plan of the segmented worms. It must be underlined that the Hox genes can be used in different combinations not only in various parts of a body, but also in different stages of embryonic development. At the phylotypic stage, for example, the Hox genes specify characteristics of the phylum, whereas in later stages they determine characteristics at lower levels of organization. There is, in short, a hierarchy of Hox gene expressions, and therefore a hierarchy of Hox codes. At this point, however, we have to face a key definition problem: is it legitimate to say that the Hox codes are true organic codes? More precisely, that they have the basic features that we find, for example, in the genetic code? An organic code is a mapping between two independent worlds and cannot exist without a set of adaptors that physically realize the mapping. The Hox codes have been defined instead as patterns of combinatorial gene expression and do not require adaptors because a molecular pattern in one world is not a mapping between two independent worlds. We have therefore two different definitions of code, one based on mapping and the other on patterns, or sequences, and it is important to keep them separate because they have different biological implications.
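The three properties of a Gray code mentioned above (binary, combinatorial, progressive) can be checked with a few lines of Python. This is a minimal sketch using the standard reflected binary Gray code from electronics; it is not Elder's biological model, and the function names are chosen here for illustration only.

```python
def gray(n: int) -> int:
    """Return the n-th reflected binary Gray code word as an integer."""
    return n ^ (n >> 1)


def differing_bits(a: int, b: int) -> int:
    """Count how many binary 'circuits' differ between two code words."""
    return bin(a ^ b).count("1")


# Eight consecutive code words, built from combinations of three binary states.
codes = [gray(i) for i in range(8)]
words = [format(c, "03b") for c in codes]
print(words)  # ['000', '001', '011', '010', '110', '111', '101', '100']

# Progressive property: consecutive words differ in exactly one position.
steps = [differing_bits(a, b) for a, b in zip(codes, codes[1:])]
print(steps)  # [1, 1, 1, 1, 1, 1, 1]
```

Each word is a combination of three two-state positions, and moving from one word to the next flips exactly one of them, which is the "progressive" rule described in the text.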

2) http://link.springer.com/article/10.1007%2Fs12155-014-9460-6#/page-1
3) http://www.ncbi.nlm.nih.gov/pubmed/26706079




How the various informational codes in the cell point to design


The deeper science digs, the more we discover how complex and ingenious life is. Those who argue that the more science moves forward, the less God is necessary, have it just backward. It is exactly the opposite: the more science discovers, the more intelligent design becomes evident. Could Darwin ever have imagined how complex cell factories are, or that life is permeated by information? So far, we know about at least 12 different code systems in the cell, of which the glycan code of glycoproteins exceeds DNA complexity by far, and science is just beginning to unravel its complexity.

The various codes in the cell


The Genetic Code
The Splicing Codes
The Metabolic Code
The Signal Transduction Codes
The Signal Integration Codes
The Histone Code
The Tubulin Code
The Sugar Code
The Glycomic Code
The non-ribosomal code
The Calcium Code
The RNA code

and at least 19 different gene codes (below I am listing 24):

The different genetic codes

The National Center for Biotechnology Information (NCBI) currently acknowledges nineteen different coding languages for DNA.


1. The Standard Code
2. The Vertebrate Mitochondrial Code
3. The Yeast Mitochondrial Code
4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5. The Invertebrate Mitochondrial Code
6. The Ciliate, Dasycladacean and Hexamita Nuclear Code
9. The Echinoderm and Flatworm Mitochondrial Code
10. The Euplotid Nuclear Code
11. The Bacterial, Archaeal and Plant Plastid Code
12. The Alternative Yeast Nuclear Code
13. The Ascidian Mitochondrial Code
14. The Alternative Flatworm Mitochondrial Code
16. Chlorophycean Mitochondrial Code
21. Trematode Mitochondrial Code
22. Scenedesmus obliquus Mitochondrial Code
23. Thraustochytrium Mitochondrial Code
24. Pterobranchia Mitochondrial Code
25. Candidate Division SR1 and Gracilibacteria Code
26. Pachysolen tannophilus Nuclear Code
27. Karyorelict Nuclear Code
28. Condylostoma Nuclear Code
29. Mesodinium Nuclear Code
30. Peritrich Nuclear Code
31. Blastocrithidia Nuclear Code

They had to emerge independently. One could not have evolved from another.


16 Re: The various codes in the cell, Thu Jul 30, 2020 1:54 pm



The various codes in the cell


Evolution and Unprecedented Variants of the Mitochondrial Genetic Code in a Lineage of Green Algae  16 October 2019
Mitochondria of diverse eukaryotes have various departures from the standard genetic code. There are unprecedented codes, not extant in any translation system examined so far, necessitating redefinition of existing translation tables and creating at least seven new ones. The unanticipated degree of diversity of the genetic code employed by nuclear genomes was recently documented by studies of various microbial eukaryotes, unveiling codes with no dedicated termination codons. Changes in the genetic code employed by plastids were considered very rare, but this perspective is changing with a series of recent discoveries afforded by sequencing of plastid genomes of exotic algae and plants. However, the most important playground for code diversification is mitochondria and their translation apparatus. Human mitochondria provided the first known deviation from the standard code, and a plethora of code modifications have been described from mitochondria of diverse eukaryotes over the following four decades, amounting to 13 different mitochondrial code variants included in the present list of alternative translation tables maintained by NCBI (which catalogs 33 codes). 1 The following genetic codes are described here:

1. The Standard Code
2. The Vertebrate Mitochondrial Code
3. The Yeast Mitochondrial Code
4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5. The Invertebrate Mitochondrial Code
6. The Ciliate, Dasycladacean and Hexamita Nuclear Code
9. The Echinoderm and Flatworm Mitochondrial Code
10. The Euplotid Nuclear Code
11. The Bacterial, Archaeal and Plant Plastid Code
12. The Alternative Yeast Nuclear Code
13. The Ascidian Mitochondrial Code
14. The Alternative Flatworm Mitochondrial Code
16. Chlorophycean Mitochondrial Code
21. Trematode Mitochondrial Code
22. Scenedesmus obliquus Mitochondrial Code
23. Thraustochytrium Mitochondrial Code
24. Rhabdopleuridae Mitochondrial Code
25. Candidate Division SR1 and Gracilibacteria Code
26. Pachysolen tannophilus Nuclear Code
27. Karyorelict Nuclear Code
28. Condylostoma Nuclear Code
29. Mesodinium Nuclear Code
30. Peritrich Nuclear Code
31. Blastocrithidia Nuclear Code
33. Cephalodiscidae Mitochondrial UAA-Tyr Code
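To make concrete what a difference between two of the tables listed above means, here is a minimal sketch comparing table 1 (the Standard Code) with table 2 (the Vertebrate Mitochondrial Code). The four reassignments used (TGA, ATA, AGA, AGG) are the well-known differences between these two tables; the example DNA string and the helper name `translate` are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard code, in TCAG codon order ('*' marks a stop).
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AAS))

# Vertebrate mitochondrial code: same table with four reassigned codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna: str, table: dict) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence chosen to contain the reassigned codons.
dna = "ATGATATGAAGATTT"
print(translate(dna, STANDARD))   # MI
print(translate(dna, VERT_MITO))  # MMW
```

The same sequence yields different products under the two tables, because the adaptor machinery of each system reads TGA, ATA and AGA differently.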

This list is, however, incomplete, as it ignores a growing number of additional variants that have appeared in the literature.

34. The parasitic nematode Radopholus similis mitochondrial genome 2
35. Picoplanktonic Green Alga Pycnococcus provasolii Reduced Mitochondrial Genome 3
36. Clathrina clathrus Mitochondrial DNA code 4
37. Ashbya mitochondria code 5 6

Structural genomic features of metazoan mtDNA drawn into a phylogenetic context. 6

Mitochondria are the only cytoplasmic organelles in humans to house their own DNA (mtDNA). A prototypical molecule of human mtDNA is 16,569 bp long. It is a closed circle of 37 maternally inherited genes, 22 of which encode tRNAs, 13 specify polypeptides, and 2 encode rRNAs. 7

1. https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
2. https://link.springer.com/article/10.1186/1756-0500-2-192
3. https://link.springer.com/article/10.1007%2Fs00239-010-9322-6
4. https://academic.oup.com/mbe/article/30/4/865/1068534
5. https://dash.harvard.edu/handle/1/11879200
6. https://www.nature.com/articles/hdy200862
7. https://www.ncbi.nlm.nih.gov/books/NBK210010/




The problem of the origin of the hardware and software in the cell is far greater than commonly appreciated

- Getting the basic elements to make the building blocks of life
- RNA world
- RNA and DNA synthesis
- Polymerization through catalysts on clay
- The Eigen threshold
- The transition from the RNA world, to the DNA world
- Obtaining the genetic Code
- The genetic code is optimal amongst 1 million
- The second, overlapping code in DNA
- The amazing information storage capacity of DNA
- Getting the information in the genome
- Getting the gene expression machinery to make proteins
- Origin of the 37 gene codes: Did they evolve? 

It is well known that the origin of the information stored in DNA, in particular the information needed to make the first organism, is a problem that science has not explained and that remains unsolved. This has traditionally been a major argument used by ID proponents to make their case for design. Proponents of materialism often resort to the so-called RNA world, but it is plagued with problems. The two foremost are the hardware problem and the software problem: first, how to get RNA and DNA on the Hadean Earth, and second, how to get the information needed to give life a first go. Prebiotic synthesis of RNA and DNA has never been solved. The hurdles are truly formidable. I have listed 37 different unsolved issues. 1 Adherents of evolution usually start their narrative at a point when life had already started. While it is true that mutations provoke change, it is far from substantiated that such changes, whether single point mutations, lateral gene transfer, changes to larger sections like exons, genetic drift, or gene flow, could bring forward the millions of different species on Earth. But when we look at the root of the problem, the gigantic difficulty that science faces in solving the riddle of how information-rich life started becomes clear. No naturalistic explanations exist, despite decades of attempts to solve the riddle. The problem is formidable and manifold. First of all, there is no evidence that the atoms in the usable form required to make RNA and DNA were extant on the early Earth. 19 Secondly, even IF we presuppose that this problem has a viable solution, polymerization of RNA strands by catalysis on clay is just wishful thinking. 3 But even supposing that was the way it went, there is the next problem:

The primary incentive behind the theory of self-replicating systems that Manfred Eigen outlined was to develop a simple model explaining the origin of biological information and, hence, of life itself. Eigen’s theory revealed the existence of the fundamental limit on the fidelity of replication (the Eigen threshold): If the product of the error (mutation) rate and the information capacity (genome size) is below the Eigen threshold, there will be stable inheritance and hence evolution; however if it is above the threshold, the mutational meltdown and extinction become inevitable (Eigen, 1971). The Eigen threshold lies somewhere between 1 and 10 mutations per round of replication
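The relationship described in this passage can be written as a short calculation. This is only a sketch of the arithmetic as the quoted text states it (error rate times genome size compared against a threshold); the error rates and genome size below are hypothetical round numbers, and the default threshold of 1 is simply the lower end of the range of 1 to 10 given in the quotation.

```python
def mutations_per_replication(error_rate_per_base: float, genome_size: int) -> float:
    """Expected number of mutations per round of replication."""
    return error_rate_per_base * genome_size


def below_eigen_threshold(error_rate_per_base: float, genome_size: int,
                          threshold: float = 1.0) -> bool:
    """True if stable inheritance is possible under the stated threshold."""
    return mutations_per_replication(error_rate_per_base, genome_size) < threshold


def max_genome_size(error_rate_per_base: float, threshold: float = 1.0) -> float:
    """Largest genome (in bases) that stays at the threshold for a given error rate."""
    return threshold / error_rate_per_base


# Hypothetical numbers for illustration only.
genome = 5000  # bases
for rate in (1e-2, 1e-4):
    load = mutations_per_replication(rate, genome)
    print(rate, load, below_eigen_threshold(rate, genome), max_genome_size(rate))
```

With these assumed values, a copying error rate of one in a hundred puts a 5000-base genome far above the threshold, while a rate of one in ten thousand keeps it below. This is the tension the next paragraph describes: higher fidelity needs more encoded machinery, which in turn needs a larger genome.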

The very origin of the first organisms presents at least an appearance of a paradox because a certain minimum level of complexity is required to make self-replication possible at all; high-fidelity replication requires additional functionalities that need even more information to be encoded.  The crucial question is how the Darwin-Eigen cycle could have started—how was the minimum complexity that is required to achieve the minimally acceptable replication fidelity attained? In even the simplest modern systems, such as RNA viruses, replication is catalyzed by complex protein polymerases. The replicase itself is produced by translation of the respective mRNA(s), which is mediated by the immensely complex ribosomal apparatus. Hence, the dramatic paradox of the origin of life is that to attain the minimum complexity required for a biological system to start on the Darwin-Eigen spiral, a system of a far greater complexity appears to be required. How such a system could emerge is a  puzzle that defeats conventional evolutionary thinking, all of which is about biological systems moving along the spiral; the solution is bound to be unusual. The origin of life—or, to be more precise, the origin of the first replicator systems and the origin of translation—remains a huge enigma, and progress in solving these problems has been very modest—in the case of translation, nearly negligible.4

Now let us suppose that this problem would be overcome by RNA catalysis. The next huge step would be to go from short RNA polynucleotides to long, stable DNA chains. The transition from RNA to DNA is the next overwhelmingly huge problem. Highly complex nanomachines are required to synthesize DNA from RNA: at least 26 hypercomplex enzymes, like RNR proteins, are required. 6 Of course, to make those, DNA is required, which turns the riddle into a catch-22 problem:

What came first, DNA or the machines that make DNA? 

Now let's suppose that problem has been solved, and we have the raw materials, RNA and DNA, and some working prebiotic polymerization mechanism. Let's even suppose that RNA polymerization on clay would work.
The next problem would be to form the genetic code, of 64 codons, and the assignment of the meaning of each codon to one of the 20 amino acids used to make proteins. 8 That is the genetic cipher, or the translation code. Assigning the meaning of one symbol to something else is ALWAYS based on mind. 7 There is NO viable alternative explanation. One science paper has called the origin of the genetic code the universal enigma 10 On top of that, the genetic code is near-optimal amongst 1 million alternative codes, which are less robust. How to explain that feat?   9 Furthermore, an “overlapping language” has been found in the genetic code. How to explain THAT marvel of ingeniosity? Now, let's suppose we had RNA, DNA, polymerization, and the genetic code. We can equate it to an information storing hard disk but of far higher sophistication than anything devised by man. 12 Even Richard Dawkins had to admit in 

The Blind Watchmaker, pp. 116–117.... 
there is enough information capacity in a single human cell to store the Encyclopaedia Britannica, all 30 volumes of it, three or four times over. 
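
As a rough back-of-the-envelope check of the numbers involved, the short sketch below counts the codons and estimates the raw storage capacity of a genome. It assumes at most 2 bits per base (four possible letters) and a haploid human genome of about 3.1 billion base pairs; that genome size is a commonly quoted round figure and is not taken from the quote above. The function name is made up for this illustration.

```python
import itertools
import math

BASES = "ACGT"

# Four possible symbols per position give log2(4) = 2 bits per base at most.
BITS_PER_BASE = math.log2(len(BASES))

# Every ordered triplet of bases is one codon: 4 * 4 * 4 = 64.
CODONS = ["".join(triplet) for triplet in itertools.product(BASES, repeat=3)]

# Assumption: haploid human genome length, a commonly quoted round figure.
HUMAN_GENOME_BP = 3_100_000_000


def raw_capacity_megabytes(base_pairs: int) -> float:
    """Upper bound on raw storage in megabytes (1 MB = 10**6 bytes)."""
    bits = base_pairs * BITS_PER_BASE
    return bits / 8 / 1_000_000


capacity_mb = raw_capacity_megabytes(HUMAN_GENOME_BP)
print(len(CODONS), "codons")
print(f"about {capacity_mb:.0f} MB of raw capacity")
```

This is only an upper bound on raw capacity under the stated assumptions; it says nothing about how much of that capacity carries functional information.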

Now, let's suppose we have fully operational raw materials, and the genetic language in which to store genetic information. Only now can we ask: Where did the information come from to make the first living organism? Various attempts have been made to lower the minimal information content needed to produce a fully working, operational cell. Often, Mycoplasma is mentioned as a reference for the threshold between the living and the non-living. Mycoplasma genitalium is held to be the smallest possible living, self-replicating cell. It is, however, a pathogen, an obligate parasite that lives and survives only within the body or cells of another organism (humans). As such, it IMPORTS many nutrients from the host organism. The host provides most of the nutrients such bacteria require, hence the bacteria do not need the genes for producing such compounds themselves. It therefore does not require the same complexity of biosynthesis pathways to manufacture all nutrients as a free-living bacterium does.

Better candidates are the simplest free-living bacteria, such as Pelagibacter ubique. 13 It is known to be one of the smallest and simplest self-replicating, free-living cells. It has complete biosynthetic pathways for all 20 amino acids. These organisms get by with about 1,300 genes in a genome of 1,308,759 base pairs, coding for 1,354 proteins. 14 They survive without any dependence on other life forms. Incidentally, these are also the most “successful” organisms on Earth: they make up about 25% of all microbial plankton cells. If a chain could link up, what is the probability that the code letters might by chance be in some order which would be a usable gene, usable somewhere, anywhere, in some potentially living thing? If we take a model size of 1,200,000 base pairs, the chance of getting one specific sequence at random would be 1 in 4^1,200,000, or about 1 in 10^722,000. This probability is hard to imagine, but an illustration may help.
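
The conversion from 4^1,200,000 to a power of ten can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only: it assumes the simple model used here, in which each of the 1,200,000 positions is drawn independently and uniformly from four bases and exactly one target sequence counts as a success. The variable names are chosen for this illustration.

```python
import math

GENOME_LENGTH = 1_200_000  # model size in base pairs, as used in the text
ALPHABET_SIZE = 4          # A, C, G, T

# 4**1_200_000 has hundreds of thousands of digits, so work with log10:
# log10(4**n) = n * log10(4).
log10_sequences = GENOME_LENGTH * math.log10(ALPHABET_SIZE)

# Split into a power of ten and a leading factor between 1 and 10.
exponent = math.floor(log10_sequences)
leading_factor = 10 ** (log10_sequences - exponent)

print(f"4^{GENOME_LENGTH:,} is roughly {leading_factor:.1f} x 10^{exponent:,}")
```

The exponent comes out a little above 722,000, which is where the rounded figure of 10^722,000 comes from.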

Imagine covering the whole of the USA with small coins, edge to edge. Now imagine piling other coins on each of these coins, and continuing to pile coins on each one until the stacks reach the moon, about 400,000 km away! Suppose you were told that within this vast mountain of coins there was one coin different from all the others. The statistical chance of finding that one coin is about 1 in 10^28 (taking coins roughly 2 cm across and 1.5 mm thick), and a chance of 1 in 10^722,000 is unimaginably smaller still.

Now, after several miraculous chemical evolutionary events, we eventually have a functional genome, with complex, codified instructional information stored to make a hypothetical minimal self-replicating cell. But we have not yet dealt with the origin of the transcription and translation machinery necessary to express the genetic information and make proteins. Where did that machinery come from? Of course, genetic information is required to specify the amino acid chains that make up these machines. The problem is nothing short of monumental. This macromolecular machinery is among the most complex known. To make proteins, and to direct and insert them in the right place where they are needed, at least 25 unimaginably complex biosynthesis and production-line-like manufacturing steps are required. Each step requires extremely complex molecular machines composed of numerous subunits and co-factors, which themselves require the very same processing procedure, which makes their origin an irreducible catch-22 problem. 16
To exemplify this, let's take the ribosome. 17
The origin of the translation system is, arguably, the central and the hardest problem in the study of the origin of life, and one of the hardest in all evolutionary biology. 
The design of the translation system in even the simplest modern cells (such as Carsonella or Mycoplasma) is extremely complex. At the heart of the system is the ribosome, a large complex of at least three RNA molecules and 60–80 proteins arranged in precise spatial architecture and interacting with other components of the translation system in the most finely choreographed fashion. These other essential components include the complete set of tRNAs for the 20 amino acids (~40 tRNA species considering the presence of isoacceptor tRNAs in all species), the set of 18–20 cognate aminoacyl-tRNA synthetases, and a complement of at least 7–8 translation factors. Together with the universal conservation of ~30 RNA species [three rRNAs, the signal recognition particle (SRP) RNA, and tRNAs of at least 18 specificities] 5

To end the story: besides the standard genetic code, science has so far catalogued 37 other codes, especially in mitochondria. Invertebrates use a different mitochondrial genetic code than vertebrates do, and both of those codes differ from the “universal” genetic code. That means that the eukaryotic cells that eventually evolved into invertebrates must have formed when a cell that used the “universal” code engulfed a cell that used a different code. Of course, that raises the question of whether two different codes emerged originally. However, the eukaryotic cells that eventually evolved into vertebrates must have formed when a cell that used the “universal” code engulfed a cell that used yet another code. As a result, invertebrates must have evolved from one line of eukaryotic cells, while vertebrates must have evolved from a completely separate line of eukaryotic cells. But this isn’t possible, since evolution depends on vertebrates evolving from invertebrates.

Now, of course, this serious problem can be solved by assuming that while invertebrates evolved into vertebrates, their mitochondria also evolved to use a different genetic code. But how would that be possible? After all, the invertebrates supposedly spent millions of years evolving, and through all those years, their mitochondrial DNA was set up based on one code. How could the code change without destroying the function of the mitochondria? At a minimum, this adds another task to the long, long list of unfinished tasks necessary to explain how evolution could possibly work. Along with explaining how nuclear DNA can evolve to produce the new structures needed to change invertebrates into vertebrates, proponents of evolution must also explain how, at the same time, mitochondria can evolve to use a different genetic code!
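
To make the difference between these codes concrete, the sketch below lists only the codons whose meaning changes between the standard code and the two mitochondrial codes discussed here. The assignments follow NCBI translation tables 1 (standard), 2 (vertebrate mitochondrial) and 5 (invertebrate mitochondrial); the dictionary layout and the helper function are my own illustration, and all other codons (which read the same in the three tables) are left out.

```python
# Codons whose meaning differs among the three codes.
# "*" marks a stop codon. Codons not listed read the same in all three.
STANDARD = {"UGA": "*", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}           # table 1
VERTEBRATE_MITO = {"UGA": "Trp", "AUA": "Met", "AGA": "*", "AGG": "*"}      # table 2
INVERTEBRATE_MITO = {"UGA": "Trp", "AUA": "Met", "AGA": "Ser", "AGG": "Ser"}  # table 5


def differing_codons(code_a: dict, code_b: dict) -> list:
    """Return the codons (sorted) that the two codes read differently."""
    return sorted(codon for codon in code_a if code_a[codon] != code_b[codon])


standard_vs_vertebrate = differing_codons(STANDARD, VERTEBRATE_MITO)
vertebrate_vs_invertebrate = differing_codons(VERTEBRATE_MITO, INVERTEBRATE_MITO)

print("standard vs vertebrate mito:    ", standard_vs_vertebrate)
print("vertebrate vs invertebrate mito:", vertebrate_vs_invertebrate)
```

Under these tables, the two mitochondrial codes differ from each other in how they read AGA and AGG, and both differ from the standard code in UGA and AUA as well.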

There would be much more to ask: Where did the gene regulatory network that orchestrates gene expression come from, how is it regulated, and how are proteins directed to their final destination? But I leave that to another article. So, the final question: How is all this better explained? By chance, or by intelligent design? I go with the latter.

1. https://reasonandscience.catsboard.com/t1279p75-abiogenesis-is-mathematically-impossible#7759
2. https://reasonandscience.catsboard.com/t2437-essential-elements-and-building-blocks-for-the-origin-of-life#7789
3. https://reasonandscience.catsboard.com/t2865-rna-dna-it-s-prebiotic-synthesis-impossible#7307
4. https://reasonandscience.catsboard.com/t2234-the-origin-of-replication-and-translation-and-the-rna-world
5. https://reasonandscience.catsboard.com/t2234-the-origin-of-replication-and-translation-and-the-rna-world#4442
6. https://reasonandscience.catsboard.com/t2894-prevital-unguided-origin-of-the-four-basic-building-blocks-of-life-impossible#7650
7. https://reasonandscience.catsboard.com/t2057-origin-of-translation-of-the-4-nucleic-acid-bases-and-the-20-amino-acids-and-the-universal-assignment-of-codons-to-amino-acids
8. https://reasonandscience.catsboard.com/t2001-origin-and-evolution-of-the-genetic-code-the-universal-enigma
9. https://reasonandscience.catsboard.com/t1404-the-genetic-code-is-nearly-optimal-for-allowing-additional-information-within-protein-coding-sequences
10. https://reasonandscience.catsboard.com/t2363-the-genetic-code-insurmountable-problem-for-non-intelligent-origin
11. https://reasonandscience.catsboard.com/t2185-the-second-code-of-dna
12. https://reasonandscience.catsboard.com/t2052-the-amazing-dna-information-storage-capacity
13. https://microbewiki.kenyon.edu/index.php/Pelagibacter_ubique#:~:text=Description%20and%20significance,of%20all%20microbial%20plankton%20cells.
14. https://www.uniprot.org/proteomes/UP000002528
15. https://reasonandscience.catsboard.com/t2508-abiogenesis-uncertainty-quantification-of-a-primordial-ancestor-with-a-minimal-proteome-emerging-through-unguided-natural-random-events#7792
16. https://reasonandscience.catsboard.com/t2039-the-interdependent-and-irreducible-structures-required-to-make-proteins
17. https://reasonandscience.catsboard.com/t1661-translation-through-ribosomes-amazing-nano-machines
18. http://blog.drwile.com/?p=14280
19. https://reasonandscience.catsboard.com/t2437-essential-elements-and-building-blocks-for-the-origin-of-life#7789


18The various codes in the cell Empty The NEURAL CODE Tue Nov 24, 2020 9:15 am






19The various codes in the cell Empty Re: The various codes in the cell Thu Jan 07, 2021 4:19 pm



Error-correcting codes and information in biology

Anyone trying to learn about error-correcting codes as used in communication technology will be exposed to a plentiful, highly mathematical literature. 1 Their domain widely exceeds the technical one, since they have a first-order role, although poorly recognized, in biology and in linguistics. They are actually omnipresent in both. Literal communication consists of reproducing at a certain place a message which is available elsewhere. The message to be reproduced is a sequence of symbols, each belonging to a predetermined finite set of signs called the alphabet. These signs can be distinguished from each other without any ambiguity. The analysis of speech shows that it can be interpreted as a sequence of elementary sound waveforms, finite in number, referred to as phonemes, the set of which constitutes a phonetic alphabet specific to each language. A written alphabet like the Latin one is intended to represent the oral language by means of an established (although often rather loose) correspondence between its letters and the phonemes of the language. The communication process is irreversible, in the sense that it is impossible to delete or change already transmitted symbols. The necessary agent of a communication is a sequence of signs which belong to the alphabet, referred to as a message. For a spoken message, these signs are the phonemes of some language; for a written one, they are letters of the alphabet used by some human community. Events, objects, beings, and also ideas, reasonings, feelings, myths, etc. can be evoked by a message. Because we perceive speech from a very early age and learn to read in childhood, we do not wonder about this almost miraculous correspondence between combinations of a small number of signs and an unlimited number of elements of the concrete and the mental world.
This correspondence is realized by means of a language, a complex system which comprises a number of sequences of signs from a set of phonemes (in its oral form) or letters (in its written form): its words, the set of which constitutes the lexicon of this language. A dictionary associates the words of the lexicon with concrete objects or abstract entities, or actions they incur or relations between them. A grammar defines the rules which enable combining the words so as to express the relations between the objects they represent. By the agency of the language, a message establishes a communication between the speaker and the listener, or the writer and the reader, which bears a meaning and is referred to as semantic.
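
To give a feel for what an error-correcting code does, here is a minimal sketch of the simplest one, a repetition code with majority-vote decoding, applied to a message over a four-letter alphabet. It is not taken from the cited paper, and real communication systems (and any proposed biological analogues) use far more efficient codes; the function names and the example message are chosen only for this illustration.

```python
from collections import Counter


def encode(message: str, repeat: int = 3) -> str:
    """Repeat every symbol `repeat` times, e.g. 'GA' -> 'GGGAAA'."""
    return "".join(symbol * repeat for symbol in message)


def decode(received: str, repeat: int = 3) -> str:
    """Split into blocks of `repeat` symbols and keep the majority symbol."""
    blocks = [received[i:i + repeat] for i in range(0, len(received), repeat)]
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)


original = "GATTACA"
sent = encode(original)

# Simulate one transmission error: the symbol at position 4 is changed.
corrupted = sent[:4] + "C" + sent[5:]

recovered = decode(corrupted)
print(sent)
print(corrupted)
print(recovered)
```

With three copies of each symbol, any single error within a block is outvoted by the two intact copies, at the cost of tripling the length of the message.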

1. https://www.sciencedirect.com/science/article/abs/pii/S0303264719301145


20The various codes in the cell Empty 21. The immune response code, or language Sat Jan 29, 2022 2:01 pm



21. The immune response code, or language

Dr. Francis Collins: Immune Macrophages Use Their Own ‘Morse Code’, July 7th, 2021 1

In the language of Morse code, the letter “S” is three short sounds and the letter “O” is three longer sounds. Put them together in the right order and you have a cry for help: S.O.S. Now an NIH-funded team of researchers has cracked a comparable code that specialized immune cells called macrophages use to signal and respond to a threat.
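
The Morse example can be written down as a tiny lookup table, which is all a code of this kind is: a fixed mapping from letters to signal patterns. The sketch below covers only the two letters mentioned here, and the function name is made up for the illustration.

```python
# Only the two letters from the example: a dot is a short sound, a dash a long one.
MORSE = {"S": "...", "O": "---"}


def to_morse(text: str) -> str:
    """Encode text letter by letter, separating letters with a space."""
    return " ".join(MORSE[letter] for letter in text.upper())


signal = to_morse("SOS")
print(signal)
```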

In fact, by “listening in” on thousands of macrophages over time, one by one, the researchers have identified not just a lone distress signal, or “word,” but a vocabulary of six words. Their studies show that macrophages use these six words at different times to launch an appropriate response. What’s more, they have evidence that autoimmune conditions can arise when immune cells misuse certain words in this vocabulary. This bad communication can cause them to incorrectly attack substances produced by the immune system itself as if they were foreign invaders.

The findings, published recently in the journal Immunity, come from a University of California, Los Angeles (UCLA) team led by Alexander Hoffmann and Adewunmi Adelaja. As an example of this language of immunity, the video above shows in both frames many immune macrophages (blue and red). You may need to watch the video four times to see what’s happening (I did). Each time you run the video, focus on one of the highlighted cells (outlined in white or green), and note how its nuclear signal intensity varies over time. That signal intensity is plotted in the rectangular box at the bottom.

The macrophages come from a mouse engineered in such a way that cells throughout its body light up to reveal the internal dynamics of an important immune signaling protein called nuclear NFκB. With the cells illuminated, the researchers could watch, or “listen in,” on this important immune signal within hundreds of individual macrophages over time to attempt to recognize and begin to interpret potentially meaningful patterns.

On the left side, macrophages are responding to an immune activating molecule called TNF. On the right, they’re responding to a bacterial toxin called LPS. While the researchers could listen to hundreds of cells at once, in the video they’ve randomly selected two cells (outlined in white or green) on each side to focus on in this example.

As shown in the box in the lower portion of each frame, the cells didn’t respond in precisely the same way to the same threat, just like two people might pronounce the same word slightly differently. But their responses nevertheless show distinct and recognizable patterns. Each of those distinct patterns could be decomposed into six code words. Together these six code words serve as a previously unrecognized immune language!

Overall, the researchers analyzed how more than 12,000 macrophage cells communicated in response to 27 different immune threats. Based on the possible arrangement of temporal nuclear NFκB dynamics, they then generated a list of more than 900 pattern features that could be potential “code words.”

Using an algorithm developed decades ago for the telecommunications industry, they then monitored which of the potential words showed up reliably when macrophages responded to a particular threatening stimulus, such as a bacterial or viral toxin. This narrowed their list to six specific features, or “words,” that correlated with a particular response.

To confirm that these pattern features contained meaning, the team turned to machine learning. If they taught a computer just those six words, they asked, could it distinguish the external threats to which the computerized cells were responding? The answer was yes.

But what if the computer had five words available, instead of six? The researchers found that the computer made more mistakes in recognizing the stimulus, leading the team to conclude that all six words are indeed needed for reliable cellular communication.

To begin to explore the implications of their findings for understanding autoimmune diseases, the researchers conducted similar studies in macrophages from a mouse model of Sjögren’s syndrome, a systemic condition in which the immune system often misguidedly attacks cells that produce saliva and tears. When they listened in on these cells, they found that they used two of the six words incorrectly. As a result, they activated the wrong responses, causing the body to mistakenly perceive a serious threat and attack itself.

While previous studies have proposed that immune cells employ a language, this is the first to identify words in that language, and to show what can happen when those words are misused. Now that researchers have a list of words, the next step is to figure out their precise definitions and interpretations and, ultimately, how their misuse may be corrected to treat immunological diseases.

Evolution News: Another Language Found in Life: Immune Signaling, June 18, 2021 2

Begin with a remarkable fact: the body’s immune system finds specific targets and mounts a coordinated response to eliminate them. That much is common knowledge. Everyone knows why this happens, too: without it, the organism would die. Inquiring minds, though, want to know how the immune system does it. Scientists at UCLA believe they have discovered the Rosetta Stone of the immune system: a molecular “language” that activates the body’s defenses to mount a coordinated and accurate response to pathogens.

1. https://directorsblog.nih.gov/2021/07/07/immune-macrophages-use-their-own-morse-code/
2. https://evolutionnews.org/2021/06/another-language-found-in-life-immune-signaling/

