One of the greatest mysteries in cell life is how the information stored in the cell itself can dynamically control the many changes that continuously take place in living cells and in living beings. So, the first question is: what is this information, and where is it stored? 2
Of course, the classical answer is that it is in DNA, and in particular in protein-coding genes. But today we know that this answer is not enough.
Indeed, a cell is an ever-changing reality. If we take a cell, any cell, at some specific time t, that cell is the repository of a lot of information, at that moment and in that state. That information can be roughly divided into (at least) two different compartments:
a) Genomic information, which is stored in the sequence of nucleotides in the genome. This information is relatively stable and, with a few important exceptions, is the same in all the cells of a multicellular being.
b) Nongenomic information. This includes all the specific configurations which are present in that cell at time t, and in particular all epigenetic information (configurations that modify the state of the genomic information) and, more generally, all configurations in the cell. The main components of this dynamic information are the cell transcriptome and proteome at time t and the sum total of its chromatin configurations.
The transcriptome/proteome is the sum total of all proteins and RNAs (and maybe other functional molecules) that are present in the cell at time t, and which define what the cell is and does at that time.
The chromatin configuration can be considered as a special “reading” of the genomic information, individualized by many levels of epigenetic control. While the genomic information is more or less the same in all cells, it can be expressed in myriads of different ways, according to the chromatin organization at that moment, which determines what genes or parts of the genome are “available” at time t in the cell. In this way, one genomic sequence can be read in multiple different ways, with different functional meanings and effects. So, if we just stick to protein-coding genes, the 20000 genes in the human genome are available only partially in each cell at each moment, and that allows for a myriad of combinatorial dynamic “readings” of the one stable genome.
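To get a feeling for what “a myriad of combinatorial readings” can mean, here is a minimal sketch under a deliberately crude assumption that is mine and not a biological claim: each of the 20000 protein-coding genes is treated as simply available or not available, so that the number of possible on/off patterns is 2^20000. The function name is hypothetical, chosen only for this illustration.

```python
import math

N_GENES = 20_000  # protein-coding genes, the figure quoted in the text


def count_digits_of_power_of_two(exponent: int) -> int:
    """Return the number of decimal digits of 2 ** exponent."""
    return math.floor(exponent * math.log10(2)) + 1


# Toy assumption: every gene is either "available" or "not available",
# so the number of distinct availability patterns is 2 ** N_GENES.
digits = count_digits_of_power_of_two(N_GENES)
print(f"2^{N_GENES} is a number with {digits} decimal digits")
```

Real cells, of course, explore only a tiny and highly constrained part of that space, and availability is not a binary property, but the order of magnitude shows why one stable genome is compatible with so many different cell states.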
Now, let’s try to imagine the flow of dynamic information in the cell as a continuous interaction between these two big levels of organization:
Two important points:
- The interaction between transcriptome/proteome and chromatin configuration is, indeed, an interaction. The transcriptome/proteome determines the chromatin configuration in many ways: for example, changing the methylation of DNA (DNA methyltransferases); or changing the post-translational modifications (methylation, acetylation, ubiquitination and others) of histones (covalent histone-modifying complexes); or creating new loops in chromatin (transcription factors); or directly remodeling chromatin itself (ATP-dependent chromatin remodeling complexes). In the same way, any modification of the chromatin landscape immediately influences what the existing transcriptome/proteome is and can do, because it directly changes the transcriptome/proteome as a result of the changes in gene transcription. Of course, this can modify the availability of genes, promoters, enhancers, and regulatory regions in general at the chromatin level. That’s the meaning of the two big red arrows connecting, at each stage, the two levels of regulation. The same concept is evident in Fig. 1, which shows how the output of transcription has immediate, complex and constant feedback on transcription regulation itself.
- As a result of the continuous changes in the transcriptome/proteome and in chromatin configurations, cell states continuously change in time (yellow arrows). However, this continuous flow of different functional states in each cell can have two different meanings, as shown by the two alternative big brown arrows on the right:
- Cells can change dramatically, following a definite developmental pathway: that’s what happens in cell differentiation, for example from a hematopoietic stem cell to differentiated blood cells like lymphocytes, monocytes, neutrophils, and so on. The end of the differentiation is the final differentiated cell, which is in a sense more “stable”, having reached its final intended “form”.
- Those “stable” differentiated cells, however, are still in a continuous flow of informational change, which is still driven by continuous modifications in the transcriptome/proteome and in chromatin configurations. Even if these changes are less dramatic, and do not change the basic identity of the differentiated cell, they are still necessary to allow adaptation to different contexts: for example, varying messages from neighboring cells or from the rest of the body (hormonal, neurologic, or other), other stimuli from the environment (metabolic conditions, stress, and so on), or even simply the adherence to circadian (or other) rhythms. In other words, “stable” cells are not stable at all: they change continuously while retaining their basic cell identity, and those changes are, again, driven by continuous modifications in the transcriptome/proteome and in the chromatin configurations of the cell.
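The two-way interaction described above can be sketched as a toy state machine. This is only an illustration: the gene names, the `OPENS` rule table and the `step` function are all invented here, and real regulation is neither this discrete nor this simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    accessible: frozenset  # genes whose chromatin is open at time t
    expressed: frozenset   # gene products present at time t


# Hypothetical rules, invented for illustration only:
# OPENS[x] lists the genes whose chromatin is opened by product x.
OPENS = {"TF_A": {"gene_B"}, "gene_B": {"gene_C"}}


def step(state: CellState) -> CellState:
    """Advance the toy cell by one time step."""
    # Chromatin -> transcriptome: only accessible genes are expressed.
    expressed = frozenset(state.accessible)
    # Transcriptome -> chromatin: products already present open new regions.
    opened = set(state.accessible)
    for product in state.expressed:
        opened |= OPENS.get(product, set())
    return CellState(frozenset(opened), expressed)


state = CellState(accessible=frozenset({"TF_A"}), expressed=frozenset())
for _ in range(5):
    state = step(state)
print(sorted(state.expressed))
```

Starting from a cell where only the hypothetical `TF_A` is accessible, the state keeps changing for a few steps and then settles: each change in what is expressed modifies what is accessible, and vice versa, which is the point of the two red arrows.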
Now, let’s have a look at the main components that make the whole process possible. I will mention only briefly the things that have been known for a long time, and will give more attention to the components for which some recent, deeper understanding is available. We start with those components that are part of the DNA sequence itself: in other words (IOWs), the genes themselves and those regions of DNA that are involved in regulating their transcription (cis-regulatory elements).

Cis elements

Genes and promoters

Of course, genes are the oldest characters in this play. We have the 20000 protein-coding genes in the human genome, which represent about 1.5% of the whole genomic sequence of 3 billion base pairs. But we must certainly add the genes that code for noncoding RNAs: at present, about 15000 genes for long noncoding RNAs and about 5000 genes for small noncoding RNAs, plus about 15000 pseudogenes. So, the concept of a gene is now very different from what it was in the past, and it includes many DNA sequences that have nothing to do with protein coding. Moreover, it is interesting to observe that many non-protein-coding genes, in particular those that code for lncRNAs, have a complex exon-intron structure, like protein-coding genes, and undergo splicing, and even alternative splicing. For a good recent review about lncRNAs, see here:

The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression

Let’s go to promoters. Wikipedia:

In genetics, a promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5′ region of the sense strand). Promoters can be about 100–1000 base pairs long.

A promoter includes:
- The transcription start site (TSS), the point where transcription starts
- A binding site for RNA polymerase
- Binding sites for general transcription factors, such as the TATA box and the BRE in eukaryotes
- Other parts that can interact with different regulatory elements.
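The gene figures quoted earlier in this section can be checked with a little arithmetic. The sketch below uses only the round numbers given in the text; the variable names are mine, and the “average coding length” is just the quotient of two round figures, not a measured value.

```python
GENOME_BP = 3_000_000_000       # whole genomic sequence, from the text
PROTEIN_CODING_GENES = 20_000   # from the text
LNCRNA_GENES = 15_000           # from the text
SMALL_NCRNA_GENES = 5_000       # from the text
PSEUDOGENES = 15_000            # from the text

# "About 1.5%" of the genome, computed with integers to avoid rounding noise.
coding_bp = GENOME_BP * 15 // 1000
avg_coding_bp_per_gene = coding_bp // PROTEIN_CODING_GENES
total_loci = (PROTEIN_CODING_GENES + LNCRNA_GENES
              + SMALL_NCRNA_GENES + PSEUDOGENES)

print(f"protein-coding sequence: {coding_bp:,} bp")
print(f"average per protein-coding gene: {avg_coding_bp_per_gene:,} bp")
print(f"annotated loci in the four classes: {total_loci:,}")
```

So, on these numbers, protein-coding genes are a minority of the annotated loci, and their coding sequence amounts to a few tens of millions of base pairs out of three billion.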
The second modality is the foundation for the concept of histone code: in that sense, histone marks work as signals of a symbolic code, whose effects in most cases are mediated by complex networks of proteins which can write, read or erase the signals.

3D configuration of Chromatin

One of the final effects of epigenetic markers, either DNA methylation or histone modifications, is the change in the 3D configuration of chromatin, which in turn is related to chromatin accessibility and therefore to transcription regulation.
This is, again, a very deep and complex issue. There are specific techniques to study chromatin configuration in space, which are independent of the mapping of chromatin accessibility and of epigenetic markers. The most used are chromosome conformation capture (3C) and genome-wide 3C (Hi-C). Essentially, these techniques are based on specific procedures of fixation and digestion of chromatin that preserve chromatin loops and make it possible to analyze them, and therefore the associations between distant genomic sites (IOWs, enhancer-promoter associations), in specific cells and in specific cell states.
Again to make it brief, chromatin topology depends essentially on at least two big factors:
- The generation of specific loops throughout the genome because of enhancer-promoter associations
- The interactions of chromatin with the nuclear lamina
As a result of those, and other, factors, chromatin generates different levels of topologic organization, which can be described, in a very gross simplification, as follows, going from simpler to more complex structures:
- Local loops
- Topologically associating domains (TADs): these are bigger regions that delimit and isolate sets of specific interaction loops. They can correspond to the idea of isolated “transcription factories”. TADs are separated, at a genomic level, by specific insulators (see later)
- Lamina-associated domains (LADs) and nucleolus-associated domains (NADs): these usually correspond to mainly inactive chromatin regions
- Chromosomal territories, which are regions of the nucleus preferentially occupied by particular chromosomes
- A and B nuclear compartments: at a higher level, chromatin in the nucleus seems to be divided into two gross compartments: the A compartment is mainly formed by active chromatin, the B compartment by repressed chromatin
The figure below shows a simple representation of some of these concepts.
The concept of TAD is particularly interesting, because TADs are insulated units of transcription: many different enhancer-promoter interactions (and therefore loops) can take place inside a TAD, but not usually between one TAD and another one. This happens because TADs are separated by strong insulators.
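The idea that TADs are insulated units can be put in a few lines of code. This is a minimal sketch with invented boundary coordinates and hypothetical function names; it also hardens “not usually” into a strict rule, which real chromatin does not obey.

```python
from bisect import bisect_right

# Hypothetical TAD boundary positions (in bp) on one chromosome, sorted.
TAD_BOUNDARIES = [1_000_000, 2_500_000, 4_000_000]


def tad_index(position: int) -> int:
    """Return the index of the TAD that contains a genomic position."""
    return bisect_right(TAD_BOUNDARIES, position)


def loop_allowed(enhancer: int, promoter: int) -> bool:
    """Toy rule: an enhancer-promoter loop forms only inside one TAD."""
    return tad_index(enhancer) == tad_index(promoter)


# Same TAD: more than a megabase apart, but no boundary in between.
print(loop_allowed(1_200_000, 2_400_000))
# Different TADs: only 200 kb apart, but a boundary lies between them.
print(loop_allowed(900_000, 1_100_000))
```

The two calls show the interesting consequence: in this picture, what matters for an enhancer-promoter interaction is not only the linear distance, but whether an insulator sits between the two sites.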
A very good summary about TADs can be found in the following paper:
Minor Loops in Major Folds: Enhancer–Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease
This is taken from Fig. 1 in that paper, and gives a good idea of what TADs are:
Structural organization of chromatin
(A) Chromosomes within an interphase diploid eukaryotic nucleus are found to occupy specific nuclear spaces, termed chromosomal territories.
(B) Each chromosome is subdivided into topological associated domains (TAD) as found in Hi-C studies. TADs with repressed transcriptional activity tend to be associated with the nuclear lamina (dashed inner nuclear membrane and its associated structures), while active TADs tend to reside more in the nuclear interior. Each TAD is flanked by regions having low interaction frequencies, as determined by Hi-C, that are called TAD boundaries (purple hexagon).
(C) An example of an active TAD with several interactions between distal regulatory elements and genes within it.
Source: Matharu, Navneet (2015-12-03). “Minor Loops in Major Folds: Enhancer–Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease”. PLOS Genetics 11 (12): e1005640. DOI:10.1371/journal.pgen.1005640. PMID 26632825. PMC: PMC4669122. ISSN 1553-7404.
Author: Navneet Matharu, Nadav Ahituv
By Navneet Matharu, Nadav Ahituv [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
There is, of course, a good correlation between the three types of analysis and genomic mapping that we have described:
- Chromatin accessibility mapping
- Epigenetic marks
- Chromatin topology studies
However, these three approaches are different, even if strongly related. They are not measuring the same thing, but different things that contribute to the same final scenario.
CTCF and Cohesin

But what are these insulators, the boundaries that separate TADs one from another?
While the nature of insulators can be complex and varies somewhat from species to species, in mammals the main proteins responsible for that function are CTCF and cohesin.
CTCF is indeed a TF, a zinc finger protein with repressive functions. While it has other important roles, it is the major marker of TAD insulators in mammals. It is 727 AAs long in humans, and its evolutionary history shows a definite information jump in vertebrates (0.799 baa, 581 bits) as shown in Fig. 5, which is definitely uncommon for a TF.
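The two numbers given for the CTCF jump are consistent with each other, as a quick check shows. This sketch assumes that “baa” stands for bits per amino acid site, so that the total is simply baa multiplied by the protein length; the function name is mine.

```python
CTCF_LENGTH_AA = 727  # length of human CTCF, from the text
JUMP_BAA = 0.799      # jump in bits per amino acid site (assumed meaning of "baa")


def total_bits(baa: float, length_aa: int) -> float:
    """Total information jump in bits, given bits per site and length."""
    return baa * length_aa


bits = total_bits(JUMP_BAA, CTCF_LENGTH_AA)
print(f"{bits:.3f} bits, i.e. about {round(bits)} bits")
```

The product rounds to the 581 bits quoted above.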
We have already encountered CTCF as one of the epigenetic markers used in histone code mapping. Its importance in transcription regulation and in many other important cell functions cannot be overemphasized.
Cohesin is a multiprotein complex which forms a ring around the double stranded DNA, and contributes to a lot of important stabilizations of the DNA fiber in different situations, especially mitosis and meiosis. But we know now that it is also a major actor in insulating TADs, as can be seen in Fig 4, and in regulating chromatin topology. Cohesin and its interacting proteins, like MAU2 and NIPBL, are a fascinating and extremely complex issue of their own, so I just mention them here because otherwise this already too long post would become unacceptably long. However, I suggest here a final, very recent review about these issues, for those interested:
Forces driving the three‐dimensional folding of eukaryotic genomes
The last decade has radically renewed our understanding of higher order chromatin folding in the eukaryotic nucleus. As a result, most current models are in support of a mostly hierarchical and relatively stable folding of chromosomes dividing chromosomal territories into A‐ (active) and B‐ (inactive) compartments, which are then further partitioned into topologically associating domains (TADs), each of which is made up from multiple loops stabilized mainly by the CTCF and cohesin chromatin‐binding complexes. Nonetheless, the structure‐to‐function relationship of eukaryotic genomes is still not well understood. Here, we focus on recent work highlighting the biophysical and regulatory forces that contribute to the spatial organization of genomes, and we propose that the various conformations that chromatin assumes are not so much the result of a linear hierarchy, but rather of both converging and conflicting dynamic forces that act on it.
Summary and Conclusions

So this is the part where I should argue about how all the things discussed in this OP do point to design. Or maybe I should simply keep silent in this case. Because, really, there should be no need to say anything.
But I will. Because, you know, I can already hear our friends on the other side argue, debate, or just suggest, that there is nothing in all these things that neo-darwinism can’t explain. They will, they will. Or they will just keep silent.
So, I will briefly speak.
First of all, a summary of what has been said. I will give it as a list of what really happens, as far as we know, each time that a gene starts to be transcribed in the appropriate situation: maybe to contribute to the differentiation of a cell, maybe to adjust to a metabolic challenge, or to anything else.
- So, our gene was not transcribed, say, “half an hour ago”, and now it begins to be transcribed. What has happened to effect this change?
- As we know, first of all some specific parts of DNA that were not active “half an hour ago” had to become active. At the very least, the gene itself, its promoter, and one appropriate enhancer. Therefore, some specific condition of the DNA in those sites must have changed: maybe through changes in histone marks, maybe through chromatin remodeling proteins, maybe through some change in DNA methylation, maybe through the activity of some TF, or some multi-protein structure made by TFs or other proteins, maybe in other ways. What we know is that, whatever the change, in the end it has to change some aspects of the pre-existing chromatin state in that cell: chromatin accessibility, nucleosome distribution, 3D configuration, probably all of them. Maybe the change is small, but it must be there. In our Fig. 2 (at the beginning of this long post) the red arrows are therefore acting from left to right, to effect a transition from state 1 to state 2.
- So, the appropriate DNA sequences are now accessible. What happens then?
- At the promoter, we need at least that the multiprotein structure formed by our 6 general TFs and the multiprotein structure that is RNA Pol II bind the promoter. See Figure 3.
- Always at the promoter, the huge multiprotein structure which is the Mediator complex must join all the rest. See Figure 4.
- At the enhancer, one or more specific TFs must bind the appropriate motif by the appropriate DBD, interact one with the other, recruit possible co-factors.
- At this point, the structure bound at the enhancer must interact with the distant structure at the promoter, probably through the Mediator complex, generating a new chromatin loop, usually in the context of the same TAD. See Fig. 7.
- So, now the 3D configuration of chromatin has changed, and transcription can start.
- But as the new gene is transcribed, and then probably translated (through many further intermediate regulation steps, of course, like the Spliceosome and all the rest), the transcriptome/proteome is changing too. In many cases, that will imply changes in factors that can act on chromatin itself, for example if the new protein is a TF, or any other protein involved directly or indirectly in the processes described above, or even if it can in some way generate new signals that will in the end act on transcription regulation. Maybe the change is small, but it must be there. In our Fig. 2 (at the beginning of this long post) the red arrows are now probably acting from right to left, possibly initiating a transition from state 2 to state 3.
- After all, that is what must have happened at the beginning of this sequence, when some new condition in the transcriptome/proteome started the transcription of our new protein.
And now, a few considerations:
- This is just an essential outline: what really happens is much, much more complex
- As we have seen, the working of all this huge machinery requires a lot of complex and often very specific proteins: first of all the 2000 specific TFs, and then the dozens, maybe hundreds, of proteins that implement the different steps, many of which are individually huge, often thousands of AAs long.
- The result of this machinery and of its workings is that thousands of proteins are transcribed and translated smoothly at different times and in different cells. The result is that a stem cell is a stem cell, a hepatocyte a hepatocyte and a lymphocyte a lymphocyte. IOWs, the miracle of differentiation. The result is also that liver cells, renal cells, blood cells, after having differentiated to their “stable” state, still perform new wonders all the time, changing their functional states and adapting to all sorts of necessities. The result is also that tissues and organs are held together, that 10^11 neurons are neatly arranged to perform amazing functions, and so on. All these things rely heavily on a correct, constant control of transcription in each individual cell.
- This scenario is, of course, irreducibly complex. Sure, many individual components could probably be shown not to be absolutely necessary for some rough definition of function: transcription can probably initiate even in the absence of some regulatory factor, and so on. But the point is that the incredibly fine regulation of the whole process, its management and control, certainly require all or almost all the components that we have described here.
- Beyond its extraordinary functional complexity, this regulation network also uses at its very core at least one big sub-network based on a symbolic code: the histone code. Therefore, it exhibits a strong and complex semiotic foundation.
So, the last question could be: can all this be the result of a neo-darwinian process of RV + NS of simple, gradual steps?
That, definitely, I will not answer. I think that everybody already knows what I believe. As for others, everyone can decide for themselves.
Brian R. Johnson (2010): By creating depressions or hills, for example, materials on the surface of the membrane can direct molecules toward or away from particular regions. The mechanism for how a cell does work in this case is inherent to the properties of membranes and is not contained in the DNA. The cell's DNA, in contrast, contains instructions for how to take advantage of the membrane's properties at the appropriate time and in the correct manner. Further, such genetic information would be useless without a fully functional membrane, inherited from a parent, on which to act. Thus, a membrane that evolved in the distant past has subsequently passed from one generation to the next ever since, with instructions that modulate its properties accumulating over the years. 3
3. Brian R. Johnson: Self-organization, Natural Selection, and Evolution: Cellular Hardware and Genetic Software December 2010