James A. Shapiro: Evolution: A View from the 21st Century 2011
The main question to ask is how transcriptional regulatory circuits arise in the first place. How are similar binding sites amplified and distributed to multiple locations throughout the genome? How do higher-order circuit elements, enhancers, and more-complex cis-regulatory modules (CRMs) form and then disperse through the genome? These questions are distinct from those that evolutionists ask about the origins and diversification of coding sequences.
How Various Cellular Components Communicate
The DNA can be thought of as the cell’s “recipe book,” written using four “letters” – the nucleotide molecules. Every cell in our body contains DNA with the same nucleotide sequence (with some exceptions). However, the tissues of our body – muscle, bone, skin, etc. are quite different from each other in how they are formed and how they function.
How is possible that all tissues’ cells contain the same DNA sequence but cells in different tissues express different sets of genes and function differently?
The answer lies in the regulation of gene expression – a wide range of mechanisms that, together, regulate which recipes out of the DNA-book will be “cooked,”i.e., which genes would be expressed in each particular cell, in what amounts, and when. Although this recipe book is the same in all cells, the recipes cooked from it may be quite different.
The mechanisms that regulate gene expression may be broken into four major stages, revolving around the production, transfer, translation, and decay of messenger RNA (mRNA) molecules – each encodes a unique protein:
mRNA Synthesis and Maturation: the DNA is a large molecule (almost 2 meters long). In the process called “transcription”, a gene encoding one protein (one recipe of the cookbook), is copied out into mRNA molecule, the nucleotide sequence of which is dictated by the DNA nucleotide sequence; this molecule carries the instructions for building the protein.
mRNA Transport from the cell nucleus into the cytoplasm, an intra-cell environment outside the nucleus, where proteins are produced. The nucleus can be thought of as a “safe” where the precious recipe book is kept. Recipes are copied out of it as necessary, but the book itself is never taken out of the safe.
mRNA Translation: this stage is carried out by the ribosome – the “protein factory.” The ribosome reads the mRNA instruction (a single recipe) and produces a protein. Proteins are composed of amino acids, the sequence of which is dictated by the mRNA nucleotide sequence; the amino acids sequence determines the protein nature and functionality. Proteins perform many functions in our body and are responsible, in part, for what we are.
mRNA Decay: like most molecules in our body, mRNAs are turned over. Their degradation is carried out by factors that, as Choder’s group reported in 2013 (in Cell), also participate in transcription. Thus, mRNA synthesis and decay processes are linked.
Though similar life forms that share a fundamental body plan (the mammals, the apes within the mammals) do share many of the same protein genes, they do not share the programming for the use of those genes. 21
Comparisons of the levels of morphological and protein divergence between humans and chimps demonstrated that the level of protein divergence was too small to account for the anatomical differences between these two species. To reconcile the level of divergence between proteins and morphology, it has been proposed that morphological divergence is based mostly on changes in the mechanisms controlling gene expression and not changes in the protein-coding genes themselves. The past decades have seen major advances in developmental genetics that have changed the way we approach the origin of morphological characters. These advances have produced several generalizations about the relationship between genetics and phenotypes. Among the most widely recognized is the concept of toolbox genes, that is that different body plans are realized with a conserved set of developmental genes, namely transcription factors and signalling molecules. 20
Toolbox genes do not change their functions, although their expression patterns can change.
That raises the question of how toolbox genes emerged in the first place.
Internal Signaling and Information
Developmental Gene Regulatory Networks dGRN's
Animal body plan design, along with cell type differentiation, is controlled by the precise regulation of gene expression in time and space, which in turn is driven by developmental gene regulatory networks (dGRNs) Changes of control genes regulating development are related to morphological change and divergence, suggesting that the changes in morphology are the result of nucleotide substitutions in cis-regulatory elements and amino-acid substitutions in transcription factors affecting the regulation of gene expression. Intergenic genomic regions and differences of protein-coding sequences have an important role in determining differences in gene regulatory patterns, and consequently, in animal body plan diversity. Such networks comprise a constellation of elements including regulatory genes (transcription factors, signalling molecules, noncoding RNAs), regulatory sequences (cis-regulatory modules, enhancers, promoters, insulators), and target genes (differentiation a and structural genes b ), which interplay tightly to trigger induction or repression of gene expression. The right execution of this molecular choreography, repeated anew in every generation, is fundamental to the life of every animal on Earth. Recent advances determined the regulatory gene interactions that underpin the dGRNs and how these interactions control the process of regulation of gene expression during animal development. In this sense, many molecular mechanisms that underlie dGRNs and influence cell type and animal body plan development have been characterized in vertebrates. Thus, dGRNs plays a key role to explain animal diversity. 16
Just as development is a system property of the regulatory genome, it must be considered at system level that the development process can change. Modification of the body plan depends on alteration of the structure of developmental gene regulatory networks as a whole. The hierarchy and multiple additional design features of these networks act to produce Boolean regulatory state specification functions at upstream phases of development of the body plan. These are created by the logic outputs of network subcircuits, and in modern animals these outputs are impervious to continuous adaptive variation. Animal body plans is a system-level property of the developmental gene regulatory networks (dGRNs) which control ontogeny of the body plan. It follows that gross morphological novelty requires dramatic alterations in dGRN architecture, always involving multiple regulatory genes, and typically affecting the deployment of whole network subcircuits. Because dGRNs are deeply hierarchical, and it is the upper levels of these GRNs that control major morphological features in development, a question dealt with below in this essay arises: how can we think about selection in respect to dGRN organization? The answers lie in the architecture of dGRNs and the developmental logic they generate at the system level, far from micro-evolutionary mechanism. While adaptive evolutionary variation occurs constantly in modern animals at the periphery of dGRNs, the stability over geological epochs of the developmental properties that define the major attributes of their body plans requires special explanations rooted deep in the structure/function relations of dGRNs. 19
Mechanistic developmental biology has shown that its fundamental concepts are largely irrelevant to the process by which the body plan is formed in ontogeny. In addition, it gives rise to lethal errors in respect to evolutionary process. Neo-Darwinian evolution is uniformitarian in that it assumes that all process works the same way, so that evolution of enzymes or flower colors can be used as current proxies for study of evolution of the body plan. It
erroneously assumes that change in protein coding sequence is the basic cause of change in developmental program, and it erroneously assumes that evolutionary change in body plan morphology occurs by a continuous process. All of these assumptions are basically counterfactual. This cannot be surprising since the neo-Darwinian synthesis from which these ideas stem was a pre-molecular biology concoction focused on population genetics and adaptation natural history, neither of which have any direct mechanistic import for the genomic regulatory systems that drive embryonic development of the body plan.
When the properties of the gene regulatory networks that actually generate body plans and body parts are taken into account, it can be seen that many entirely new and different mechanistic factors come into play. The result is that just as the paleontological record of change in animal morphology is the opposite of uniformitarian, so, for very good reasons that are embedded in their structure/function relations, are the mechanisms of dGRN emergence.
No observations on single genes can ever illuminate the overall mechanisms of the development of the body plan or of body parts.
The architecture of animal body plans as change and conservation of developmental Gene Regulatory Network (dGRN) structure: mechanistic consequences:
Since dGRN structure depends on cis-regulatory linkages at nodes, the change in dGRN structure occurs by redeployment of cis-regulatory modules controlling regulatory gene expression.
Since dGRNs are deeply hierarchical effects of given cis-regulatory change depend specifically on their location in dGRN.
Since dGRNs are deeply hierarchical, subcircuits operating at upper levels (early in developmental process) preclude/prevent/prohibit/hinder/impede certain downstream linkages, and mediate others, i.e., canalize dGRN structure (and developmental process).
dGRNs can be represented as complex logic maps that state in detail the interactions between developmental control genes (transcription factors and components of cell signaling pathways) and cis-regulatory modules (promoter, enhancers, and insulators) in order to visualize how differentiation and structural genes (target genes) are turned off or on at a given time and location during development. . In addition, dGRNs have a modular architecture, consisting of multiple sub-circuits—each in charge of individual regulatory tasks defined by a set of specific developmental control genes and their cis-regulatory modules.
Building on this modular architecture, dGRNs are hierarchical as they are divided into different components. For example, the components controlling the initial stages of development are at the top of the hierarchy, while the portions governing intermediate processes, such as spatial subdivision and morphological patterns are in the middle, and the components controlling more specific functions, including cell differentiation and organogenesis/morphogenesis, are at the periphery.
Elucidating the network of interactions between genes that govern cell differentiation through development is one of the core challenges in genome research. These networks are known as developmental gene regulatory networks (dGRNs) and consist largely of the functional linkage between developmental control genes, cis-regulatory modules, and differentiation genes, which generate spatially and temporally refined patterns of gene expression. 16
The components known as kernels, which consist of conserved interactions among transcription factors, are highly conserved regulatory interactions; they are responsible for the progenitor states of a developing structure. Other components of the network, known as intermediate and peripheral, have great impacts on the phenotype. Understanding how the components of dGRNs have emerged is a central issue in evo-devo biology.
Initial strategies for unravelling dGRNs
Regulatory information at different levels of network organization, from single node to subcircuit to large-scale GRNs depends on regulatory design features such as network architecture, hierarchical organization, and cis-regulatory logic which contribute to the developmental function of network circuits. 15
Transcriptional regulatory circuits: A cell senses its environment and calculates the amount of protein it needs for it various functions. This information processing is done by transcription networks. These networks, a major study object of systems biology, often contain recurring network topologies called 'motifs' . Composition and engineering concepts for these circuits have been extensively studied. dGRN's are composed of interesting functions such as scillators, frequency multipliers and frequency band-pass filters. Transcriptional regulatory circuits can be seen as an analog to electronic circuits. Data input, data processing and data output is an abstraction found in both circuit types. Transcriptional circuits have chemicals as an input. Data processing happens as functional clusters of genes impact each other's expression through inducible transcription factors and cis-regulatory elements. The output is e.g. proteins. Promoters control the expression of genes in response to one or more transcription factors. Rules for programming gene expression with combinatorial promoters have been identified. These networks are engineered to perform a wide range of logic functions.
Gene regulatory networks are pre-programmed to instruct the ordered gene expression and as result, the assembly of the basic functional units into structures of higher order complexity, like Cell factories and multicellular organisms, that can be compared to a city of interconnected factories. Simply expecting a change at the DNA level – the production of one or more slightly different molecular machines – is not going to produce a changed body structure or, as a comparison, the architecture and structure of an entire city, which what is claimed and supposed of macroevolution to be able of doing. For that to occur, a change to the assembly process, the dGRN is needed. That would be, as to evolve/mutate/change the City of New York and transform it into the City of Los Angeles. After the dGRN has done its work and the body plan has been produced variety can occur. But that would be microevolution – which isn’t disputed. It is as if a Sky Scraper in Manhattan would be imploded, and a new, completely different building would be constructed. That would however not change the structural or road map of Manhattan as a whole. Fiddling with dGRNs are always catastrophically bad. Far more is needed than new genetic material to create a new kind of organism. New genetic material is required, but its existence alone will not produce a new body plan. To get that you also need a new or altered dGRN.
dGRNs can be represented as complex logic maps that state in detail the interactions between developmental control genes (transcription factors and components of cell signaling pathways) and cis-regulatory modules (promoter, enhancers, and insulators) in order to visualize how differentiation and structural genes (target genes) are turned off or on at a given time and location during development In addition, dGRNs have a modular architecture, consisting of multiple sub-circuits—each in charge of individual regulatory tasks defined by a set of specific developmental control genes and their cis-regulatory modules (promoter, enhancers, and insulators) in order to visualize how differentiation and structural genes (target genes) are turned off or on at a given time and location during development. dGRNs have a modular architecture, consisting of multiple sub-circuits—each in charge of individual regulatory tasks defined by a set of specific developmental control genes and their cis-regulatory modules. Building on this modular architecture, dGRNs are hierarchical as they are divided into different components. For example, the components controlling the initial stages of development are at the top of the hierarchy, while the portions governing intermediate processes, such as spatial subdivision and morphological patterns are in the middle, and the components controlling more specific functions, including cell differentiation and organogenesis/morphogenesis, are at the periphery.
Animal morphology results from the functional organization of the gene regulatory networks (GRNs) that control development of the body plan. 13 The body plan is formed by the execution of an inherited genomic regulatory program for embryonic development. A major mechanism in dGRN's which determine gene expression is the alteration of the structure and architecture of cis-regulatory modules. The basic control task is to determine transcriptional activity throughout embryonic time and space, and here ultimately lies causality in the developmental process. The genomic control apparatus for any given developmental episode consists of the specifically expressed genes that encode the transcription factors required to direct the events of that episode, most importantly including the cis-regulatory control regions of these genes. The cis-regulatory sequences combinatorially determine which regulatory inputs will affect the expression of each gene and what other genes it will affect; that is, they hard-wire the functional linkages among the regulatory genes, forming network subcircuits. The subcircuits perform biologically meaningful jobs, for example, acting as logic gates, interpreting signals, stabilizing given regulatory states, or establishing specific regulatory states in given cell lineages. GRNs are inherently hierarchical: the networks controlling each phase of development are assemblages of subcircuits, the subcircuits are assemblages of specific regulatory linkages among specific genes, and the linkages are individually determined by assemblages of cis-regulatory transcription factor target sites. But at the highest level of its organization, the developmental GRN is hierarchical. Development progresses from phase to phase, and this fundamental phenomenon reflects the underlying sequential hierarchy of the GRN control system. In the earliest embryonic phases, the function of the developmental GRN is establishment of specific regulatory states in the spatial domains of the developing organism. In this way the design of the future body plan is mapped out in regional regulatory landscapes, which differentially endow the potentialities of the future parts. Lower down in the hierarchy, GRN apparatus continues regional regulatory specification on finer scales. Ultimately, precisely confined regulatory states determine how the differentiation and morphogenetic gene batteries at the terminal periphery of the GRN will be deployed.
Understanding the regulation of gene expression is one of the key problems in current biology. 10 Transcriptional regulation is a key mechanism for cells to accomplish changes in gene expression levels. 11 Gene regulation plays a key role in the control of fundamental processes in living organisms, ranging from development to nutrition and metabolic coordination. 12 Genes are regulated at several levels of integration but one key step is the control of gene transcription. Determining the fundamental structure of transcriptional Gene Regulatory Networks (GRNs, considered here as the relationships of transcription factors (TFs) and their targets) is a major challenge of systems biology.
The genes included in dGRNs encode transcription factors, components of signal transduction pathways, and often effector genes as markers of differentiated cell states. 1 dGRNs have the potential of providing a causal understanding of how upstream specification controls downstream events (i.e. differentiation or cell biological functions).
Coded information can always be tracked back to a intelligence, which has to set up the convention of meaning of the code, and the information carrier, that can be a book, the hardware of a computer, or the smoke of a fire of a indian tribe signalling to another. All communication systems have an encoder which produces a message which is processed by a decoder. In the cell there are several code systems. DNA is the most well known, it stores coded information through the four nucleic acid bases. But there are several others, less known. Recently there was some hype about a second DNA code. In fact, it is essential for the expression of genes. The cell uses several formal communication systems according to Shannon’s model because they encode and decode messages using a system of symbols. As Shannon wrote :
“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).
An organism’s DNA encodes all of the RNA and protein molecules required to construct its cells. Yet a complete description of the DNA sequence of an organism—be it the few million nucleotides of a bacterium or the few billion nucleotides of a human—no more enables us to reconstruct the organism than a list of English words enables us to reconstruct a play by Shakespeare. In both cases, the problem is to know how the elements in the DNA sequence or the words on the list are used. Under what conditions is each gene product made, and, once made, what does it do? The different cell types in a multicellular organism differ dramatically in both structure and function. If we compare a mammalian neuron with a liver cell, for example, the differences are so extreme that it is difficult to imagine that the two cells contain the same genome. The genome of a organism contains the instructions to make all different cells, and the expression of either a neuron cell or liver cell can be regulated at many of the steps in the pathway from DNA to RNA to Protein. The most important imho is CONTROL OF TRANSCRIPTION BY SEQUENCESPECIFIC DNA-BINDING PROTEINS, called transcription factors or regulators. These proteins recognize specific sequences of DNA (typically 5–10 nucleotide pairs in length) that are often called cis-regulatory sequences. Transcription regulators bind to these sequences, which are dispersed throughout genomes, and this binding puts into motion a series of reactions that ultimately specify which genes are to be transcribed and at what rate. Approximately 10% of the protein-coding genes of most organisms are devoted to transcription regulators. Transcription regulators must recognize short, specific cis-regulatory sequences within this structure. The outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves. The 20 or so contacts that are typically formed at the protein–DNA interface add together to ensure that the interaction is both highly specific and very strong.
These instructions are written in a language that is often called the ‘gene regulatory code’. The preference for a given nucleotide at a specific position is mainly determined by physical interactions between the aminoacid side chains of the TF ( transcription factor ) and the accessible edges of the base pairs that are contacted. It is possible that some complex code, comprising rules from each of the different layers, contributes to TF– DNA binding; however, determining the precise rules of TF binding to the genome will require further scientific research. So, Genomes contain both a genetic code specifying amino acids, and this regulatory code specifying transcription factor (TF) recognition sequences. We find that ~15% of human codons are dual-use codons (`duons') that simultaneously specify both amino acids and TF recognition sites. Genomes also contain a parallel regulatory code specifying recognition sequences for transcription factors (TFs) , and the genetic and regulatory codes have been assumed to operate independently of one another, and to be segregated physically into the coding and non-coding genomic compartments. the potential for some coding exons to accommodate transcriptional enhancers or splicing signals has long been recognized
In order for communication to happen, 1. The sequence of DNA bases located in the regulatory region of the gene is required, and 2. transcription factors that read the code. If one of both is missing, communication fails, the gene that has to be expressed, cannot be encountered, and the whole procedure of gene expression fails. This is an irreducible complex system. The gene regulatory code could not arise in a stepwise manner either, since if that were the case, the code has only the right significance if fully developed. That's an example par excellence of intelligent design.. The fact that these transcription factor binding sequences overlap protein-coding sequences, suggest that both sequences were designed together, in order to optimize the efficiency of the DNA code. As we learn more and more about DNA structure and function, it is apparent that the code was not just hobbled together by the trial and error method of natural selection, but that it was specifically designed to provide optimal efficiency and function.
The control of gene expression involves complex circuits that exhibit enormous variation in design. 2 For years the most convenient explanation for these variations was historical accident. According to this view, evolution is a haphazard process in which many different designs are generated by chance. The central importance of gene regulation in modern molecular biology provides strong motivation to search for more of these underlying design principles. Gene circuits sense their environmental context and orchestrate the expression of a set of genes to produce appropriate patterns of cellular response. The importance of this role has made the experimental study of gene regulation central to nearly all areas of modern molecular biology. The fruits of several decades of intensive investigation have been the discovery of a plethora of both molecular mechanisms and circuitry by which these are interconnected. Several elements of design, each exhibiting a variety of realizations, have been identified among elementary gene circuits in prokaryotic organisms. Design principles appear to govern the realization of these elements. Experimental studies of specific gene systems by molecular biologists have revealed an immense variety of molecular mechanisms that are combined into complex gene circuits, and the patterns of gene expression observed in response to environmental and developmental signals are equally diverse. Are these variations in design the result of historical accident or have they been selected for specific functional reasons?
There are several different levels of hierarchical organization that intervene between the genotype and the phenotype. These levels are linked by gene circuits that can be characterized in terms of the following elements of design:
- transcription unit,
- input signalling,
- mode of control,
- logic unit,
- expression cascade,
The Transcription unit consists of a set of coordinately regulated structural genes that encode proteins, an up-stream promoter site which transcription of the genes is initiated, and a down-stream terminator site at which transcription ceases. Transcription units are the principal feature around which gene circuits are organized. On the input side, signals in the extracellular or intracellular environment are detected by binding to specific receptor molecules, which propagate the signal to specific regulatory molecules in a process called transduction, although in many cases the regulator molecules are also the receptor molecules. Regulator molecules in turn bind to the modulator sites of transcription units in one of two alternative modes and the signals are combined in a logic unit to determine the rate of transcription. On the output side, transcription initiates an expression cascade that yields one or many mRNA products, one or many protein products, and possibly one or many products of enzymatic activity. Thus, the transcription unit emits a fan-out of signals, which are then connected in a diverse fashion to the receptors of other transcription units to complete the interlocking gene circuitry.
Input signalling The input signals for transcription units can arise either from the external environment or from within the cell. When signals originate in the extracellular environment, they often involve binding of signal molecules to specific receptors in the cellular membrane. In bacteria, alterations in the membrane-bound receptor are communicated directly to regulator proteins via short signal transduction pathways called ‘‘two-component systems.’’ In other cases, signal molecules in the environment are transported across the membrane, and in some cases are subsequently modified metabolically, to become signal molecules that bind directly to regulator proteins . in these cases the receptor and regulator are one and the same molecule.
Mode of control Regulators exerts their control over gene expression by acting in one of two different modes. In the positive mode, they stimulate expression of an otherwise quiescent gene, and induction of gene expression is achieved by supplying the functional form of the regulator. In the negative mode, regulators block expression of an otherwise active gene, and induction of gene expression is achieved by removing the functional form of the regulator. Each of these two designs positive or negative requires the transcription unit to have the appropriate modulator site initiator type or operator type and promoter function low level or high level.
Logic unit The control regions associated with transcription units may be considered the logic unit where input signals from various regulators are integrated to govern the rate of transcription initiation. There are two lines of evidence suggesting that most transcription units in bacteria have only a few regulatory inputs. If the number of inputs was fewer on average, the behavior of the network was too fixed; whereas if the number was greater on average, the behavior was too chaotic. The optimal behaviour associated with a few inputs often is described as ‘‘operating at the edge of chaos.’’ Second, with the arrival of the genomic era and the sequencing of the complete genome for a number of bacteria, there is now experimental evidence regarding the distribution of inputs per transcription unit. The sequence for Escherichia coli has shown that the number of modulator sites located near the promoters of transcription units is on average approximately two to three. The large majority have two and a few have as many as five.
Expression cascades Expression cascades produce the output signals from transcription units. They typically reflect the flow of information from DNA to RNA to protein to metabolites, which has been called the ‘‘Central Dogma’’ of molecular biology.
Gene regulatory networks (GRNs) provide system level explanations of developmental and physiological functions in the terms of the genomic regulatory code. Depending on their developmental functions, GRNs differ in their degree of hierarchy, and also in the types of modular sub-circuit of which they are composed, although there is a commonly employed sub-circuit repertoire. Mathematical modelling of some types of GRN sub-circuit has deepened biological understanding of the functions they mediate. The structural organization of various kinds of GRN reflects their roles in the life process, and causally illuminates developmental processes. GRNs determine the main events of postembryonic development, including organogenesis and formation of adult parts and cell types. Beyond that, GRNs control a vast array of physiological capabilities and modes of response to environmental fluctuations and challenges. GRNs are composed of multiple sub-circuits, that is, the individual regulatory tasks into which a process can be parsed are each accomplished by a given GRN sub-circuit. Thus the operational significance of a GRN structure will be indicated by the types of sub-circuit it contains. However, GRNs have more global organizational properties as well. GRNs may be deeply layered, generating successive regulatory transactions, or they may be shallow, in the sense that they mandate few transactions between the initial inputs and the terminal activation of effector genes.
‘The overall control principle is that the embryonic process is finely divided into precise little “jobs” to be done, and each is assigned to a specific subcircuit or wiring feature in the upper-level dGRN. There is always an observable consequence if a dGRN subcircuit is interrupted. Since these consequences are always catastrophically bad, flexibility is minimal, and since the subcircuits are all interconnected, the whole network partakes of the quality that there is only one way for things to work. And indeed the embryos of each species develop in only one way. 7
Thus we can think of a crown group dGRN as a terminal, finely divided, extremely elegant control system that allows continuing alteration and variation only after the body plan per se has formed, i.e., in structural terms, at the dGRN periphery, and in developmental terms, late in the process. It is no surprise, from this point of view, that cell type re-specification by insertion of alternative differentiation drivers is changing only at the dGRN periphery, quite a different matter from altering body plan.
The significance of GRN subcircuits
Some things never change, and a principle is that developmental jobs are controlled through the logic outputs of genetic subcircuits. How the animal body plan has emerged is a question that in the end can only be addressed in the terms of transcriptional regulatory systems biology.
Five different sub-circuits of the GRN control five distinct cell biological activities. 6 The individual components of a complex developmental process are in general controlled by GRN subcircuits, and it is their architecture that illuminates the basic logic of development. The same topology of each of the subcircuits appears repeatedly in diverse developmental GRNs and executes similar network functions. But in each instance, subcircuits of a given type are composed of entirely different regulatory genes. In fact, in these examples, there is no case where the same regulatory genes are used more than once for a given type of circuit. This demonstrates that subcircuit function depends exclusively on topological design and is not determined by the specific biochemical nature of the constituent regulatory factors, except for their general properties as activating or repressing transcription factors. Perhaps most surprising is the fundamental similarity of inductive signalling logic despite the tremendous varieties of signal transduction pathways responding to diverse ligands. Genomic regulatory functions in development depend on subcircuit logic functions and not on unique properties of particular individual transcription factors. Feedback subcircuits, the various feedforward subcircuits, double-negative gate subcircuits, Janus signalling subcircuits, etc. are required constituents of GRNs for embryonic development.
The recurrent use of the same subcircuits implies that there may exist a finite complement of genomically encoded subcircuits which recur regularly in developmental GRNs, and from which these GRNs are constructed. This idea has deep implications for the origins of the morphological programs that control bilaterian development.
The utilization of given subcircuit architectures is clearly not what would be expected on the basis of random occurrence of regulatory interactions. To give one striking example, one might expect that isolated autoregulatory wiring would occur with equal frequency throughout the different regions of a GRN, that is, any gene might have a similar probability of engaging in autoregulation positive or negative. But what is observed is clearly different from this. Here we found among all these examples only three occurrences of isolated positive autoregulations but 10 occurrences of positive autoregulation in the context of positive intergenic feedback subcircuits, even though there are many times more genes in these networks as a whole than are included in intergenic feedback subcircuits. Additional evidence that isolated positive autoregulation has been disfavored is the relatively high frequency of isolated negative autoregulation that we observe, 18 occurrences in the same set of networks, obviously none in the context of positive intergenic feedbacks. On a random basis, the likelihood of a positive and negative autoregulatory site should be quite similar, but in fact we see 3 versus 18 occurrences outside of intergenic feedback subcircuits.
Subcircuit occurrence depends on requirement in GRN context for given subcircuit functions, which of course is not expected to be random. We can predict that as comparative subcircuit databases expand beyond the initial attempts we have made here, a large body of evidence will accumulate displaying the nonrandom occurrence of all the canonical subcircuit topologies we here consider.
Modular GRN sub-circuits are defined by their topologies, and the topology of a sub-circuit directly indicates its function in life. Sub-circuits perform developmental biology jobs that can be defined uniquely, and not with very common ‘motifs’ such as the coherent feed-forward loop, which although it has specific dynamic properties, appears in so many different contexts that no unique developmental biology function can be associated with it.
Design principles govern transcriptional regulation networks that control gene expression in cells. 4 ‘Network motifs’ are patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks. Each network motif has a specific function in determining gene expression, such as generating temporal expression programs and governing the responses to fluctuating external signals. The transcriptional network can be represented as a directed graph, in which each node represents an operon (an operon is a group of contiguous genes that are transcribed into a single mRNA molecule). and edges represent direct transcriptional interactions. Each edge is directed from an operon that encodes a transcription factor to an operon that is regulated by that transcription factor.
The ‘feedforward loop’ motif is defined by a transcription factor X that regulates a second transcription factor Y, such that both X and Y jointly regulate an operon Z. We term X the ‘general transcription factor’, Y the ‘specific transcription factor’, and Z the ‘effector operon(s)’. For example, if X and Y both positively regulate Z, and X positively regulates Y, the feedforward loop is coherent. If, on the other hand, X represses Y, then the motif is incoherent.
An understanding of the design principles of biochemical networks such as gene regulatory, metabolic, or intracellular signalling networks is a central concern of systems biology. In particular, the intricate interplay between network topology and resulting dynamics is crucial to our understanding of such networks, as is their presumed modular structure. 5 A topological feature of central interest is the existence of positive and negative feedback loops. Feedback loops have a decisive effect on dynamics, which has been studied extensively through the analysis of mathematical network models, both continuous and discrete. In biological networks, each variable can only attain two values (0/1 or on/off).
Logic gates evoke images of circuit boards, but cells are arguably equally good in relying on logic computations. A classic example is the Lac operon, which activates itself upon the condition “lactose AND NOT glucose”. In recent years, there have been multiple reports on rationally designed, genetically encoded logic gates and circuits in living cells. Just like the Lac operon, these gates receive two or more molecular signals (inputs) and generate a product (output) whose level is logically linked to the inputs. 18
Stephen C.Meyer, Darwin's doubt, page 228:
Think, again, of arranging Lego blocks. There are many ways of arranging small numbers of Lego blocks. These various arrangements form common structural motifs such as: two blocks stuck together at right angles; several curved blocks forming circular rings, stacked blocks forming hollow squares or walls or cube-like shapes; blocks arranged as prisms or cylinders; flat layers of blocks stacked two bumps thick or three bumps thick or more. Though these structural elements stick together because of interactions between the bumps and indentations on each block, those bumps and indentations themselves do not specify any particular larger structure—a castle or an airplane, for example—because each motif may be combined or recombined with many other structural motifs in numerous different ways. The shape and properties of the modular elements do not dictate the type of larger structure that must be built from them. Instead, to build a particular structure, the modular elements must be arranged in particular ways. And since there are many possible ways to arrange these modular elements, only one or a few of which will result in a desired structure, every Lego set includes a blueprint with step-by-step instructions—in other words, additional information.
Producing a body plan from the different types of cell clusters generated by Newman’s dynamical patterning modules (DPMs), would also require additional information. Newman does not account for this information. He correctly highlights the way certain recurrent motifs for organizing groups of cells seem to form spontaneously as the result of physical interactions between individual cells (his DPMs). He does not, however, establish that these groups of cells must arrange themselves into specific tissues, organs, or body plans in response to any known physical process or law. Instead, it seems entirely possible that these modular elements (cell clusters) have many “degrees of freedom” and can be arranged in innumerable ways. If so, then some additional information—an overall organismal blueprint or set of assembly instructions—would need to direct the arrangement of these modular elements. Newman does not consider this possibility. Nor does he cite any law-like self-organizational process that would eliminate the need for such information to direct animal development.
Transcription regulation networks control the expression of genes. The transcription networks of well-studied microorganisms appear to be made up of a small set of recurring regulation patterns, called network motifs. The same network motifs have recently been found in diverse organisms from bacteria to humans, suggesting that they serve as basic building blocks of transcription networks. Transcription factors respond to biological signals and accordingly change the transcription rate of genes, allowing cells to make the proteins they need at the appropriate times and amounts. Transcription networks contain a small set of recurring regulation patterns, called network motifs. Network motifs can be thought of as recurring circuits of interactions from which the networks are built. Network motifs were first systematically defined in Escherichia coli, in which they were detected as patterns that occurred in the transcription network much more often than would be expected in random networks. The same motifs have since been found in organisms from bacteria and yeast to plants and animals. 14
There are two types of transcription networks:
- sensory networks that respond to signals such as stresses and nutrients, and
- developmental networks that guide differentiation events.
I will first consider sensory networks, the motifs of which are common to both types of network. I will then turn to motifs that are specific to developmental networks. Network motifs are also found in other biological networks, such as those that involve protein modifications or interactions between neuronal cells. Each network motif can carry out specific information-processing functions. Simple regulation occurs when transcription factor Y regulates gene X with no additional interactions. Figure a, below:
Simple regulation and autoregulation.
a | In simple regulation, transcription factor Y is activated by a signal Sy . When active, it binds the promoter of gene X to enhance or inhibit its transcription rate.
b | In negative autoregulation (NAR), X is a transcription factor that represses its own promoter.
c | In positive autoregulation (PAR), X activates its own promoter.
d | NAR speeds the response time (the time needed to reach halfway to the steady-state concentration) relative to a simple-regulation system that reaches the same steady-state expression. PAR slows the response time.
e | An experimental study of NAR, using a synthetic gene circuit in which the repressor TetR fused to GFP represses its own promoter. High-resolution fluorescence measurements in living Escherichia coli cells show that this NAR motif has a response time about fivefold faster than a simple-regulation design.
f | A schematic cell-cell distribution of protein levels. NAR tends to make this distribution narrower in comparison with simple regulation, whereas PAR tends to make it wider and in extreme cases bimodal with two populations of cells. X/Xst, X concentration relative to steady state Xst
Y is usually activated by a signal, Sy . The signal can be an inducer molecule that directly binds Y, or a modification of Y by a signal-transduction cascade, and so on. When transcription begins, the concentration of the gene product X rises and converges to a steady-state level (FIG d). This level is equal to the ratio of the production and degradation rates, where degradation includes both active degradation and the effect of dilution by cell growth. When production stops, the concentration of the gene product decays exponentially. In both cases, the response time, which is defined as the time it takes to reach halfway between the initial and final levels, is equal to the half-life of the gene product a. The faster the degradation rate, the shorter the response time. For proteins that are not actively degraded, as is the case for most proteins in growing bacterial cells, the response time is equal to one cell-generation time. This is a result of the dilution effect from cell growth.
Negative autoregulation (NAR) occurs when a transcription factor represses the transcription of its own gene (FIG.b). This network motif occurs in about half of the repressors in E. coli, and in many eukaryotic repressors. NAR has been shown to display two important functions. First, NAR speeds up the response time of gene circuits. This occurs when NAR uses a strong promoter to obtain a rapid initial rise in the concentration of protein X. When X concentration reaches the repression threshold for its own promoter, the production rate of new X decreases. Thus, the concentration of X locks into a steady-state level that is close to its repression threshold. By contrast, a simply regulated gene that is designed to reach the same steady-state level must use a weaker promoter. As a result, an NAR system reaches 50% of its steady state faster than a simply regulated gene (FIG. d). The dynamics of NAR show a rapid initial rise followed by a sudden locking into the steady state, possibly accompanied by an overshoot or damped oscillations. Response acceleration (or speed-up) by NAR has been demonstrated experimentally. Speed-up in a natural context was demonstrated in the SOS DNA-repair system of E. coli, in which the master regulator, LexA, represses its own promoter.
Positive autoregulation (PAR) occurs when a transcription factor enhances its own rate of production (FIG. c). The effects are opposite to those of NAR: response times are slowed and variation is usually enhanced. In addition to speeding responses, NAR can reduce cell–cell variation in protein levels. These variations are due to an inherent source of noise: the production rates of proteins fluctuate by tens of percents (FIG.f). This noise results in cell–cell variation in protein level. NAR can, in many cases, reduce these variations: high concentrations of X reduce its own rate of production, whereas low concentrations cause an increased production rate. The result is a narrower distribution of protein levels than would be expected in simply regulated genes (FIG.f) However, if the NAR feedback contains a long delay, noise can also be amplified. PAR slows the response time because at early stages, when levels of X are low, production is slow. Production picks up only when X concentration approaches the activation threshold for its own promoter. Thus, the desired steady state is reached in an S-shaped curve (FIG.d). The response time is longer than in a corresponding simple-regulation system, as shown theoretically and experimentally. PAR tends to increase cell–cell variability. If PAR is weak (that is, X moderately enhances its own production rate), the cell–cell distribution of X concentration is expected to be broader than in the case of a simply regulated gene (FIG. f). Strong PAR can lead to bimodal distributions, whereby the concentration of X is low in some cells but high in others. In cells in which the concentration is high, X activates its own production and keeps it high indefinitely. Strong PAR can, therefore, lead to a differentiation-like partitioning of cells into two populations (FIG.f). In some cases, PAR can be useful as a memory to maintain gene expression. In other cases, a bimodal distribution is thought to help cell populations to maintain a mixed phenotype so that they can better respond to a stochastic environment.
The second family of network motifs is the feedforward loop (FFL). It appears in hundreds of gene systems in E. coli and yeast, as well as in other organisms. This motif consists of three genes: a regulator, X, which regulates Y, and gene Z, which is regulated by both X and Y. Because each of the three regulatory interactions in the FFL can be either activation or repression, there are eight possible structural types of FFL (FIG a).
Named after George Boole, a 19th-century mathematician, Boolean logic is the basis of digital circuit design. For example, the AND function means that both conditions must be on to trigger the circuit. In notational form, where 1 means present and 0 means absent,
1 AND 1 = true or “on”
0 AND 0 = false or “off”
1 AND 0 = false or “off”
In the OR function, either one (or both) of the inputs can be on to trigger the circuit:
1 OR 1 = true or “on”
0 OR 0 = false or “off”
1 OR 0 = true or “on”
Negation of these functions, called NAND and NOR, reverse the outputs. For a logic gate with two inputs and one output, such as a transistor, there are 16 possible operations (AND, NAND, OR, NOR, XOR, XNOR, …). 17
Feedforward loops (FFLs).
a | The eight types of feedforward loops (FFLs) are shown. In coherent FFLs, the sign of the direct path from transcription factor X to output Z is the same as the overall sign of the indirect path through transcription factor Y. Incoherent FFLs have opposite signs for the two paths.
b | The coherent type-1 FFL with an AND input function at the Z promoter.
c | The incoherent type-1 FFL with an AND input function at the Z promoter. SX and SY are input signals for X and Y.
To understand the function of the FFLs, we need to understand how X and Y are integrated to regulate the Z promoter. Two common ‘input functions’ are an ‘AND gate’, in which both X and Y are needed to activate Z, and an ‘OR gate’, in which binding of either regulator is sufficient. Other input functions are possible, such as the additive input function in the flagella system24,31 and the hybrid of AND and OR logic in the lac promoter32. However, much of the essential behaviour of FFLs can be understood by focusing on the stereotypical AND and OR gates. Each of the eight FFL types can thus appear with at least two input functions. In the best studied transcriptional networks (E. coli and yeast), two of the eight FFL types occur much more frequently than the other six types. These common types are the coherent type-1 FFL (C1-FFL) and the incoherent type-1 FFL (I1-FFL). Here I discuss their dynamical functions in detail; the functions of all eight FFL types.
The C1-FFL is a ‘sign-sensitive delay’ element and a persistence detector. In the C1-FFL, both X and Y are transcriptional activators (FIG. b above). I will first consider the behaviour of the FFL when the Z promoter has an AND input function, and then turn to the case of the OR input function. With an AND input function, the C1-FFL shows a delay after stimulation, but no delay when stimulation stops. To see this, let’s follow the behaviour of the FFL. When the signal Sx appears, X becomes active and rapidly binds its downstream promoters. As a result, Y begins to accumulate. However, owing to the AND input function, Z production starts only when Y concentration crosses the activation threshold for the Z promoter. This results in a delay of Z expression following the appearance of Sx
The coherent type-1 feedforward loop (C1-FFL) and its dynamics.
The C1-FFL with an AND input function shows delay after stimulus (SX) addition and no delay after stimulus removal. It thus acts as a sign-sensitive filter, which responds only to persistent stimuli.
In contrast, when the signal Sx is removed, X rapidly becomes inactive. As a result, Z production stops because deactivation of its promoter requires only one arm of the AND gate to be ‘shut off ’. Hence, there is no delay in deactivation of Z after the signal Sx is removed (FIG. above). This dynamic behaviour is called sign-sensitive delay; that is, delay depends on the sign of the Sx step. An ON step (addition of Sx) causes a delay in Z expression, but
an OFF step (removal of Sx) causes no delay. The duration of the delay is determined by the biochemical parameters of the regulator Y; for example, the higher the activation threshold for the Z promoter by Y, the longer the delay. The delay that is generated by the FFL can be useful to filter out brief spurious pulses of signal. A signal that appears only briefly does not allow Y to accumulate and cross its threshold, and thus does not induce a Z response. Only persistent signals lead to Z expression(FIG. above). The sign-sensitive delay function of this motif has been experimentally demonstrated in the arabinose utilization system of E. coli.
An experimental study of the C1-FFL in the arabinose system of Escherichia coli, using fluorescent-reporter strains and high-resolution measurements in living cells.
This system (represented by red circles) shows a delay after addition of the input signal (cAMP), and no delay after its removal, relative to a simple-regulation system that responds to the same input signal (the lac system, represented by blue squares).
a Genes that direct differentiation of embryonic stem cells (hESCs) into posterior specialized differentiated progenitor cells
b A structural gene is a gene that codes for any RNA or protein product other than a regulatory factor (i.e. regulatory protein).
Last edited by Otangelo on Mon Oct 24, 2022 8:26 am; edited 86 times in total