Transcription factors (TF)
https://reasonandscience.catsboard.com/t2738-transcription-factors-tf
DNA is like a politician, surrounded by a flock of protein handlers and advisers that must vigorously massage it, twist it, and on occasion, reinvent it before the grand blueprint of the body can make any sense at all.” These “handlers and advisers” are the transcription factors. During development, transcription factors play essential roles in every aspect of embryogenesis, controlling differential gene expression leading to differentiation. When in doubt, it is usually a transcription factor’s fault, a sentiment that is often used by politicians, too. Transcription factors can be grouped together in families based on similarities in DNA-binding domains
Slight differences of the transcription factors in the amino acids of their DNA-binding sites at the binding site can cause the binding site to recognize different DNA sequences. DNA regulatory elements such as enhancers and silencers function by binding transcription factors, and each element can have binding sites for several transcription factors. Transcription factors bind to the DNA of the regulatory element using one site on the protein and other sites to interact with other transcription factors and proteins, leading to the recruitment of histone-modifying enzymes.
Question: How could the recruitment of the right TF partners be due to mutations and natural selection, if there is almost an infinite number of combinations which would not work, and interaction of the other side of TF's with histone-modifying enzymes depends on a finely tuned, orchestrated, and highly specific interactions of three players: TF's, histone-modifying enzymes, and histones ?
For example, the association of the Pax6, Sox2, and l-Maf transcription factors in lens cells recruits a histone acetyltransferase that can transfer acetyl groups to the histones and dissociate the nucleosomes in that area. Similarly, when MITF, a transcription factor essential for ear development and pigment production, binds to its specific DNA sequence, it also binds to (different) histone acetyltransferase that facilitates the dissociation of nucleosomes. In addition, the Pax7 transcription factor that activates muscle-specific genes binds to the enhancer region of these genes within the muscle precursor cells. Pax7 then recruits a histone methyltransferase that methylates the lysine in the fourth position of histone H3 (H3K4), resulting in the trimethylation of this lysine and the activation of transcription. The displacement of nucleosomes along the DNA makes it possible for other transcription factors to find their binding sites and regulate expression. In addition to recruiting histone-modifying enzymes, transcription factors can also work by stabilizing the transcription pre-initiation complex that enables RNA polymerase II to bind to the promoter
The bridge between enhancer and promoter can be made by transcription factors.
Certain transcription factors bind to DNA on the promoter (where RNA polymerase II will initiate transcription), whereas other transcription factors bind to the enhancer (which regulates when and where transcription can occur). Other transcription factors do not bind to the DNA; rather, they link the transcription factors that have bound to the enhancer and promoter sequences. In this way, the chromatin loops to bring the enhancer to the promoter. The example shown here is the mouse β-globin gene. (A) Transcription factors assemble on the enhancer, but the promoter is not used until the GATA1 transcription factor binds to the promoter. (B) GATA1 can recruit several other factors, including Ldb1, which forms a link uniting the enhancer-bound factors to the promoter-bound factors.
At any one time most of the genes of a higher organism are silent: only a minority of the genes in a given cell is expressed, that is, being transcribed into messenger RNA, which in turn acts as a template for protein synthesis. The selective expression of any one gene is accomplished primarily through the interaction of protein transcription factors with characteristic DNA sequences included in the control region of the gene, which is most commonly located near to the actual coding region. The binding of a set of such factors, or regulatory proteins, acts as a molecular switch for the activation of RNA polymerase and other components of the transcriptional machinery, which are common to all genes. The supply of a particular combination of such transcription factors ensures that a gene is switched on at the right place and at the right time. 3
Gene expression is controlled by binding of a protein to a regulatory region, or promoter. TFs that bind to cis-regulator DNA sequences are responsible for either positively or negatively influencing the transcription of specific genes, essentially determining whether a particular gene will be turned "on" or "off" in an organism. Much of the complexity in differentiation in animal and plant cells can be attributed to elaborate systems made up of short (6 to 8 base pair) cis-regulatory DNA sequences or motifs, as well as the TFs that bind to the motifs, interact with each other to form complexes, and recruit RNA polymerase II. Most eukaryotic genes have promoters that consist of the TATA box close to the 5' end of the gene and, farther upstream, several motifs recognized by specific transcription factors. 7
Transcription Factors are required to initiate transcription
Differentiated eukaryotic cells possess a remarkable capacity for the selective expression of specific genes. The synthesis rates of a particular protein in two cells of the same organism may differ by as much as a factor of 10^9, or 1 billion. For example, reticulocytes (immature red blood cells) synthesize large amounts of haemoglobin but no detectable insulin, whereas the pancreatic cells produce large quantities of insulin but no haemoglobin.
In contrast, prokaryotic systems generally exhibit no more than a thousandfold range in their transcription rates so that at least a few copies of all the proteins they encode are present in any cell. Nevertheless, the basic mechanism for initiating transcription of structural genes is the same in eukaryotes and prokaryotes: Protein factors bind selectively to the promoter regions of DNA.
A complex of at least six general transcription factors operate in Eukaryotic Cells and have in prokaryotic Cells an equivalent of just one a prokaryotic σ-factor.
Properties and Functions of the Eukaryotic General Transcription Factors
The six GTFs are highly conserved from yeast to humans.
Question: How could there have been a transition from simpler transcription initiation in eukaryotes, using just one σ-factor, to six, in eukaryotic cells, considering, that the six GTFs are highly conserved ( they have not changed in supposed deep evolutionary time ? )
Complex Switches Control Gene Transcription in Eukaryotes
When compared to the situation in bacteria, transcription regulation in eukaryotes involves many more proteins and much longer stretches of DNA. It often seems bewilderingly complex. Yet many of the same principles apply. As in bacteria, the time and place that each gene is to be transcribed is specified by its cis-regulatory sequences, which are “read” by the transcription regulators that bind to them. Once bound to DNA, positive transcription regulators (activators) help RNA polymerase begin transcribing genes, and negative regulators (repressors) block this from happening. In bacteria, most of the interactions between DNA-bound transcription regulators and RNA polymerases (whether they activate or repress transcription) are direct. In contrast, these interactions are almost always indirect in eukaryotes: many intermediate proteins, including the histones, act between the DNA-bound transcription regulator and RNA polymerase. Moreover, in multicellular organisms, it is common for dozens of transcription regulators to control a single gene, with cis-regulatory sequences spread over tens of thousands of nucleotide pairs. DNA looping allows the DNA-bound regulatory proteins to interact with each other and ultimately with RNA polymerase at the promoter. Finally, because nearly all of the DNA in eukaryotic organisms is compacted by nucleosomes and higher-order structures, transcription initiation in eukaryotes must overcome this inherent block.
Transcription factor domains
Transcription factors have three major domains. The first is a DNA -binding domain that recognizes a particular DNA sequence in the enhancer. There are several different types of DNA-binding domains, and they often designate the major family classifications for transcription factors. Some of the most common protein domains that convey DNA binding are the Homeodomain, Zinc Finger Leucine Zipper, Helix-Loop-Helix, and Helix-Turn-Helix. For instance, the homeodomain transcription factor Pax69 uses its paired DNA-binding sites to recognize the enhancer sequence, CAATTAGTCACGCTTGA. In contrast, the MITF transcription factor involved in ear and pigment cell development contains both leucine zipper and helix-loop-helix domains, and it recognizes shorter DNA sequences called the E-box (CACGTG) and the M-box. These sequences for MITF binding have been found in the regulatory regions of genes encoding several pigment-cell-specific enzymes of the tyrosinase family. Without MITF, these proteins are not synthesized properly, and melanin pigment is not made.
The second domain is a trans-activating domain that activates or suppresses the transcription of the gene whose promoter or enhancer it has bound. Usually, this transactivating domain enables the transcription factor to interact with the proteins involved in binding RNA polymerase II or with enzymes that modify histones. MITF contains such a domain of amino acids in the center of the protein. When the MITF dimer is bound to its target sequence in the enhancer, the trans-activating region is able to bind a transcription-associated factor (TAF), p300/CBP. The p300/CBP protein is a histone acetyltransferase enzyme that can transfer acetyl groups to each histone in the nucleosomes. Acetylation of the nucleosomes destabilizes them and allows the genes for pigment-forming enzymes to be expressed.
Finally, third, there is usually a protein-protein interaction domain that allows the transcription factor’s activity to be modulated by TAFs or other transcription factors. MITF has a protein-protein interaction domain that enables it to dimerize with another MITF protein. The resulting homodimer (i.e., two identical protein molecules bound together) is the functional protein that binds to enhancer DNA of certain genes and activates transcription
Three-dimensional model of the homodimeric transcription factor MITF
(one protein shown in red, the other in blue) binding to a promoter element in DNA (white). The amino termini are located at the bottom of the figure and form the DNA-binding domains that recognize an 11-base-pair sequence of DNA having the core sequence CATGTG. The protein-protein interaction domain is located immediately above. MITF has the basic helix-loop-helix structure found in many transcription factors. The carboxyl end of the molecule is thought to be the trans-activating domains that bind the p300/CBP transcription-associated factor
(TAF).
Insulators
The boundaries of gene expression appear to be set by DNA sequences called insulators. Insulator sequences limit the range in which an enhancer can activate gene expression. They thereby “insulate” a promoter from being activated by another gene’s enhancers. Some insulator DNA regions have been found to bind a zinc-finger transcription factor called CTCF,11 which functions to alter the three-dimensional conformation of chromatin and thereby separate (or insulate) enhancer elements from the promoter. CTCF is ubiquitously expressed in eukaryotes and has been charted to bind tens of thousands of sites on the genome. Mechanistically, CTCF physically interacts with cohesin, a ring-shaped complex of multiple subunits that function to stabilize chromatin loop structures. It is hypothesized that CTCF uses its 11 zinc-finger domains to selectively bind DNA, often insulator elements, to create loop structures that distance enhancers from promoters. For instance, the chick β-globin gene has been shown to form a complex with cohesin. This CTCF-cohesin complex may bind to the enhancer-bound Mediator, thereby preventing the enhancer from activating the adjacent promoter.
Pioneer transcription factors: breaking the silence
Finding an enhancer is not easy because the DNA is usually so wound up that the enhancer sites are not accessible. Given that the enhancer might be covered by nucleosomes, how can a transcription factor find its binding site? That is the job of certain transcription factors that penetrate repressed chromatin and bind to their enhancer DNA sequences. They have been called “pioneer” transcription factors, and they appear to be critical in establishing certain cell lineages. One of these transcription factors is FoxA1, which binds to certain enhancers and opens up the chromatin to allow other transcription factors access to the promoter. FoxA1 is extremely important in specifying liver cells, remaining bound to the DNA during mitosis, and providing a mechanism to reestablish normal transcription in presumptive liver cells. Another pioneer transcription factor is the Pax7 protein mentioned above. It activates muscle-specific gene transcription in a population of muscle stem cells by binding to its DNA recognition sequence and being stabilized there by dimethylated H3K4 on the nucleosomes. It then recruits the histone methyltransferase that converts the dimethylated H3K4 into the trimethylated H3K4 associated with active transcription.
Regulatory Proteins ( transcription factors ) Have Discrete DNA-Binding Domains
Transcription factors ( TFs) must recognize short, specific cis-regulatory sequences within this structure. The outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves. Because the major groove is wider and displays more molecular features than does the minor groove, nearly all transcription regulators make the majority of their contacts with the major groove.
TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs, based on three-dimensional structures of protein-DNA complexes. 4 Regulatory proteins generally bind to specific DNA sequences. Their affinity for these target sequences is roughly 10^4 to 10^6 times higher than their affinity for any other DNA sequence. Most regulatory proteins have discrete DNA-binding domains containing substructures that interact closely and specifically with the DNA. These binding domains usually include one or more of a relatively small group of recognizable and characteristic structural motifs. To bind specifically to DNA sequences, regulatory proteins must recognize surface features on the DNA. Most of the chemical groups that differ among the four DNA bases and thus permit discrimination between base pairs are hydrogen-bond donor and acceptor groups exposed in the major groove of DNA and most of the protein-DNA contacts that impart specificity are hydrogen bonds.
Groups in DNA available for protein binding.
Shown here are functional groups on all four base pairs that are displayed in the major and minor grooves of DNA. Groups that can be used for base-pair recognition by proteins are shown in red.
A notable exception is the nonpolar surface near C-5 of pyrimidines, where thymine is readily distinguished from cytosine by its protruding methyl group. Protein-DNA contacts are also possible in the minor groove of the DNA, but the hydrogen-bonding patterns here generally do not allow ready discrimination between base pairs. Within regulatory proteins, the amino acid side chains most often hydrogen-bonding to bases in the DNA are those of Asn, Gln, Glu, Lys, and Arg residues. Is there a simple recognition code in which a particular amino acid always pairs with a particular base? The two hydrogen bonds that can form between Gln or Asn and the N6 and N-7 positions of adenine cannot form with any other base. And an Arg residue can form two hydrogen bonds with N-7 and O6 of guanine
Two examples of specific amino acid residue–base pair interactions that have been observed in DNA-protein binding.
Examination of the structure of many DNA-binding proteins has shown that a protein can recognize each base pair in more than one way which adds to complexity and combination variety. To interact with bases in the major groove of DNA, a protein requires a relatively small substructure that can stably protrude from the protein surface. The DNA-binding domains of regulatory proteins tend to be small (60 to 90 amino acid residues), and the structural motifs within these domains that are actually in contact with the DNA are smaller still. The DNA-binding sites for regulatory proteins are often inverted repeats of a short DNA sequence (a palindrome) at which multiple subunits of a regulatory protein bind cooperatively.
Control of transcription by sequence-specific DNA - binding proteins
How does a cell determine which of its thousands of genes to transcribe? Perhaps the most important concept, one that applies to all species on Earth, is based on a group of proteins known as transcription regulators. These proteins recognize specific sequences of DNA (typically 5–10 nucleotide pairs in length) that are called cis-regulatory sequences because they must be on the same chromosome (that is, in cis) to the genes they control. Transcription Factors (TFs) bind to these sequences, which are dispersed throughout genomes, and this binding puts into motion a series of reactions that ultimately specify which genes are to be transcribed and at what rate. Approximately 10% of the protein-coding genes of most organisms are devoted to transcription regulators, making them one of the largest classes of proteins in the cell. Cis-regulatory sequences typically lie near the gene, often in the intergenic region directly upstream from the transcription start point of the gene. Although a few genes are controlled by a single cis-regulatory sequence that is recognized by a single transcription regulator, the majority have complex arrangements of cis-regulatory sequences, each of which is recognized by a different transcription regulator.
the outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves
How the different base pairs in DNA can be recognized from their edges without the need to open the double helix. The four possible configurations of base pairs are shown, with potential hydrogen-bond donors indicated in blue, potential hydrogen-bond acceptors in red, and hydrogen bonds of the base pairs themselves as a series of short, parallel red lines. Methyl groups, which form hydrophobic protuberances, are shown in yellow, and hydrogen atoms that are attached to carbons, and are therefore unavailable for hydrogen-bonding, are white. From the major groove, each of the four base-pair configurations projects a unique pattern of features.
Because the major groove is wider and displays more molecular features than does the minor groove, nearly all transcription regulators make the majority of their contacts with the major groove
Transcription Regulators Contain Structural Motifs That Can Read DNA Sequences
Molecular recognition in biology generally relies on an exact fit between the surfaces of two molecules, and the study of transcription regulators has provided some of the clearest examples of this principle. A transcription regulator recognizes a specific cis-regulatory sequence because the surface of the protein is extensively complementary to the special surface features of the double helix that displays that sequence. Each transcription regulator makes a series of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. Although each individual contact is weak, the 20 or so contacts that are typically formed at the protein–DNA interface add together to ensure that the interaction is both highly specific and very strong
Question: How could that recognition have arisen? Had it not have to be fully developed and functioning right from the beginning, otherwise the cell would be unable to recognize which genes to express?
The binding of a transcription regulator to a specific DNA sequence. On the left, a single contact is shown between a transcription regulator and DNA; such contacts allow the protein to “read” the DNA sequence. On the right, the complete set of contacts between a transcription regulator (a member of the homeodomain family) and its cis-regulatory sequence is shown. The DNA-binding portion of the protein is 60 amino acids long. Although the interactions in the major groove are the most important, the protein is also seen to contact both the minor groove and phosphates in the sugar-phosphate DNA backbone.
In fact, DNA–protein interactions include some of the tightest and most specific molecular interactions known in biology. Although each example of protein–DNA recognition is unique in detail, x-ray crystallographic and nuclear magnetic resonance (NMR) spectroscopic studies of hundreds of transcription regulators have revealed that many of them contain one or another of a small set of DNA-binding structural motifs
The transcription factor Code
One important consequence of the combinatorial association of transcription factors is coordinated gene expression. The simultaneous expression of many cell-specific genes can be explained by the binding of transcription factors by the enhancer elements. For example, many genes that are specifically activated in the lens contain an enhancer that binds Pax6. So, all the other transcription factors might be assembled at the enhancer, but until Pax6 binds, they cannot activate the gene. Similarly, many of the coexpressed muscle-specific genes contain enhancers that bind the Mef2 transcription factor, and the enhancers on genes encoding pigment-producing enzymes bind MITF. In some instances, entire ensembles of transcription factors appear to direct simultaneous gene transcription. Junion and colleagues have shown, for example, that a particular ensemble of five transcription factors is bound on hundreds of enhancers that are active in the developing Drosophila heart muscle cells.
Combinatorial Control: In order to specify which gene will be expressed in a given situation, your cells use a diverse collection of DNA-binding proteins to control access to the DNA. Surprisingly, there are relatively few of these proteins: by some estimates, the human genome encodes about 2,600 of them. But then, the capabilities of this limited set are greatly expanded by using them in combination, by requiring two or more to bind simultaneously to activate a gene. In this way, each protein may be used in many ways and the spectrum of responses is far more varied. 12
Usually, a combination of several (as many as six) transcription factors is necessary to form a transcription complex which can harness and activate the RNA polymerase to initiate transcription at the right starting point. A little reflection will convince one of the reason why a set of such protein factors is required, rather than a single protein for each gene. If the latter were the case, that protein would have to be coded for by the expression of another gene, which would, in turn, require another protein transcription factor and so on, leading to an infinite recurrence. However, if a set of proteins is involved, then different combinations can be used for different genes. Thus a smaller number of regulatory proteins can control a large number of genes.
In addition, this provides the means for multiple control at the level of the gene, which has the advantage that transcription may be regulated in a quantitative rather than in an all-or-none manner, and also for producing a network of interacting genes, since the protein product of one gene can affect the expression of another. So one has a combinatorial principle at work here operating at the level of a combination of proteins. In the case of zinc finger proteins, the principle also operates within individual proteins, where different subdomains can be combined to give greater variety or precision of recognition-a microcosm, as it were, of the macroscopic picture.
Transcriptional control is dependent on the interactions of all the TFs and whether they attract RNA polymerase or block it from initiating transcription. Multiple TFs can accumulate, creating a bulk the size of a ribosome. Once bound together, changes to the functional domains of a TF and/or covalent interactions with other factors can turn transcription on or off, depending on whether they allow or prohibit the recruitment of RNA polymerase. Two TFs bound at sites near one another on the DNA strand can combine to form a dimer and bend the DNA in what is believed to be part of the activation process. Some TFs are believed to act as tethering elements between distant enhancers and promoters by forming connections with other proteins.
Combinatorial interactions among transcription factors (TFs) are critical for integrating diverse intrinsic and extrinsic signals, fine-tuning regulatory output and increasing the robustness and plasticity of regulatory systems. In higher eukaryotes, transcription factors (TFs) rarely operate by themselves, but rather directly or indirectly interact with specific partner TFs or chromatin regulators when binding to enhancers. It has been estimated that roughly 75% of all metazoan TFs heterodimerize with other factors 8
Cellular signalling cascades regulate the activity of transcription factors that convert extracellular information into gene regulation. A group of Broad scientists has found that TFs' binding sites within enhancers cluster in distinct patterns reflecting the factors' roles in gene expression control. These patterns may constitute a position-based code 5 The TF clusters may constitute a general regulatory code, with different cell types substituting specific TFs to activate different sets of enhancers. 6
Post-translational modification code for transcription factors
PTMs can involve covalently linking chemical groups, lipids, carbohydrates or (poly)peptide chains to amino acids of the target molecule during or after its translation 10 Cellular responses to environmental or physiological cues rely on transduction pathways that must ensure discrimination between different signals. These cascades ‘crosstalk’ and lead to a combinatorial regulation. This often results in different combinations of posttranslational modifications (PTMs) on target proteins, which might act as a molecular barcode. A PTM code is necessary in the context of transcription factors regulating multiple processes. Thus, the coding potential of PTM combinations should both provide a further layer of information integration from several transduction pathways and warrant highly specific cellular outputs. 9
Some TFs have the ability to regulate several seemingly unrelated processes, and an indiscriminate modulation of their targets, as induced by a global activation or inhibition, would be problematic. For example, FOXO factors regulate functions as diverse as glucose metabolism, cell differentiation, longevity, neuropeptide secretion, stress resistance and apoptosis, cancer progression, and female fertility. How can such TFs properly modulate specific cellular processes in response to different signals? For signalling pathways to mediate precise effects, the ‘molecular behavior’ of these TFs (i.e. interaction with partners and targets) can change specifically as a function of their PTMs. The existence of a PTM code for TFs is expected to have several non-exclusive ways of action: distinct PTM isoforms could have:
(i) distinct DNA-binding specificities;
(ii) different DNA-binding affinities with the same sequence specificity; and/or
(iii) distinct protein partners (altering sequence specificity or not).
Indeed, some PTMs can give rise to new protein-binding abilities by creating new interaction surfaces. Modulation of the PTM status of TFs provides a layer of swift signal integration enabling fine-tuning of cellular responses to environmental or physiological cues. 10
Post-translational modifications of proteins are a central feature of the signal transduction pathways that regulate gene transcription in response to hormones. Modifications that affect the function of transcriptional regulators may also serve as a mechanism to incorporate multiple signals, mediate cross-talk, and modulate specific responses. Hormone-responsive transcription factors are subject to multiple modifications which imply an additional level of regulation conferred by enzymes that mediate specific modifications, such as phosphorylation, ubiquitination, SUMOylation, and S-nitrosylation. These modifications can affect protein stability, sub-cellular localization, interactions with co-repressors and activators, and DNA binding. 11
Post‐translational control of transcription factors: methylation ranks highly
A Eukaryotic Gene Control Region Consists of a Promoter Plus Many cis-Regulatory Sequences
In eukaryotes, RNA polymerase II transcribes all the protein-coding genes and many noncoding RNA genes. This polymerase requires five general transcription factors (27 subunits in toto, in contrast to bacterial RNA polymerase, which needs only a single general transcription factor (the σ subunit). The stepwise assembly of the general transcription factors at a eukaryotic promoter provides, in principle, multiple steps at which the cell can speed up or slow down the rate of transcription initiation in response to transcription regulators. Because the many cis-regulatory sequences that control the expression of a typical gene are often spread over long stretches of DNA, we use the term gene control region to describe the whole expanse of DNA involved in regulating and initiating transcription of a eukaryotic gene. This includes the promoter, where the general transcription factors and the polymerase assemble, plus all of the cis-regulatory sequences to which transcription regulators bind to control the rate of the assembly processes at the promoter
The gene control region for a typical eukaryotic gene.
The promoter is the DNA sequence where the general transcription factors and the polymerase assemble. The cis-regulatory sequences are binding sites for transcription regulators, whose presence on the DNA affects the rate of transcription initiation. These sequences can be located adjacent to the promoter, far upstream of it, or even within introns or entirely downstream of the gene. The broken stretches of DNA signify that the length of DNA between the cis-regulatory sequences and the start of transcription varies, sometimes reaching tens of thousands of nucleotide pairs in length. The TATA box is a DNA recognition sequence for the general transcription factor TFIID. As shown in the lower panel, DNA looping allows transcription regulators bound at any of these positions to interact with the proteins that assemble at the promoter. Many transcription regulators act through Mediator, while some interact with the general transcription factors and RNA polymerase directly. Transcription regulators also act by recruiting proteins that alter the chromatin structure of the promoter. Whereas Mediator and the general transcription factors are the same for all RNA polymerase II-transcribed genes, the transcription regulators and the locations of their binding sites relative to the promoter differ for each gene.
Transcription Factors Exert Combinatorial Control
It is possible for a TF to respond to a physical signal, such as red or far-red light, but the signal must be transduced to a chemically modified activator that interacts with the TF. The complexity and fine gradations of DNA expression in eukaryotes result from combinatorics, in that the combination of chromatin and TF signals, rather than the individual TF signal, is read out.
Structural biology has been at the forefront of the search for a protein-DNA recognition code. 5 Unlike genic coding regions, which are easily interpreted from the triplet code, noncoding regulatory elements are difficult to decode. Regulatory TFBSs are often clustered, with binding sites from different TFs in close proximity to one another. Ultimately, enhancers are difficult to decode and require further experimental work for their identification and functional characterization. It is possible that some complex code, comprising rules from each of the different layers, contributes to TF-DNA binding; however, determining the precise rules of TF binding to the genome will require further high-quality structural and high-throughput binding data.
The proteins encoded by selector and axial patterning genes most often belong to one of two categories: transcription factors or components of signalling pathways. Ultimately, these proteins exert their effect through the control of gene expression. Thus developmental processes such as embryonic axis formation and segmentation are organized by regulating gene expression in discrete regions and cell populations of the embryo. The transcription factors found in the Drosophila toolkit include representatives of most of the known families of sequence-specific DNA-binding proteins. These families are distinguished by the type of secondary structures in the folded protein that are involved in protein subunit interactions and contact with DNA. Most transcription factors possess either a helix-turn-helix, zinc finger, leucine zipper, or helix-loop-helix (HLH) motif
Structural motifs of major transcription factor families
(a) The second and third a-helices of the homeodomain form a helix-turn-helix structure.
(b) The zinc finger motif involves a coordination complex of Zn++ with critically positioned cysteine (C) and histidine (H) residues. The “finger” contacts DNA.
(c) The leucine zipper structure is formed by association of two subunits with regularly spaced leucine residues.
(d) The helix-loop-helix structure is similar to the leucine zipper except that a protein loop interrupts the helices and the association of the subunits.
Common DNA-binding motifs.
The basic helix-loop-helix, helix-turn-helix, zinc finger, and leucine zipper are common structural motifs found in many eukaryotic transcription factors. The primary structure of each motif is shown above a representation of the structure interacting with DNA. The basic helix-loop-helix, helix-turn-helix, and leucine zipper motifs form dimers when interacting with DNA. The zinc finger forms a trimer.
The homeodomain superfamily belongs to the helix-turn-helix class of factors, for example (Fig. a above ). Three proteins with divergent homeodomain sequencesaBicoid, Fushi tarazu, and Zenahave very different roles in organizing
the anteroposterior axis, segmentation, and dorsoventral axis patterning, respectively. All three proteins are also encoded within the Antennapedia Complex, surrounded by Hox genes. Most other homeodomain proteins are encoded by genes dispersed throughout the genome. Several of the gap genes encode zinc finger proteins, whereas genes such as the dorsoventral patterning gene twist and the pair-rule segmentation gene hairy encode basic HLH
proteins. The identification of these DNA-binding motifs in proteins has allowed biologists to deduce the biochemical function of a gene product by inspecting its encoded sequence, rather than resorting to exhaustive biochemical analysis. Furthermore, because these motifs are involved in contact with DNA, they are often constrained with respect to evolutionary changes in their sequence. As a result, these highly conserved motifs are useful for isolating
gene homologs in other taxa.
The second major category of toolkit genes encode proteins involved in the process of cell signaling, either as ligands, receptors for ligands, or components involved in the intracellular transduction of signals. At least seven major signalling pathways operate in the Drosophila embryo:
- the Hedgehog,
- Notch,
- Wingless,
- Dpp/transforming growth factor-β (TGF-β),
- Toll,
- epidermal growth factor (EGF),
- fibroblast growth factor (FGF)
signalling pathways. All of these pathways have similar elements namely, each has at least one signalling ligand, at least one receptor spanning the cell membrane, and at least one DNA-binding transcription factor that responds to signalling inputs by binding to target genes to turn them on or off.
Although the signalling logic may be similar among pathways, the biochemistry is not. Many structural types of ligands and receptors exist, and each pathway regulates the activity of different transcription factors. The response to ligand binding is mediated by a variety of mechanisms that often involve post-translation modifications (for example, protein phosphorylation, proteolysis, binding to a cofactor) that regulate the activity of, or the translocation of, transcription factors to the cell nucleus.
A generic signalling pathway
Most signalling pathways operate through similar logic but have different proteins and signal transduction mechanisms. Signalling begins when membrane-bound receptors bind a ligand, leading to the release or activation of associated intracellular proteins. Receptor activation often leads to the modification of inactive transcription factors that are translocated to the cell nucleus, bind to cis-regulatory DNA sequences or to DNA-binding proteins, and regulate the level of target gene transcription.
Crucial determinants of gene expression patterns are DNA-binding transcription factors that choose genes for transcriptional activation or repression by recognizing the sequence of DNA bases in their promoter regions. [url=http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA methylation patterns and epigenetic memory.pdf]1[/url]
Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. 2
Transcription factors (TFs) are proteins that bind to specific sequences on the DNA near their target genes, thus modulating transcription initiation. TFs can activate or repress transcription depending where they bind relative to the transcription start site of the target gene. Each TF regulates a set of genes, in response to specific environmental and/or intracellular triggers. 9 A complete transcriptional regulatory interaction between a TF and its target gene-(s) encompasses
(1) signal sensing,
(2) signal transduction,
(3) the TF; and
(4) the target gene-(s)
A transcription factor (TF) (or sequence-specific DNA-binding factor) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence. The function of TFs is to regulate - turn on and off - genes in order to make sure that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization (body plan) during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are up to 2600 TFs in the human genome. TFs work alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes. A defining feature of TFs is that they contain at least one DNA-binding domain (DBD), which attaches to a specific sequence of DNA adjacent to the genes that they regulate. .Other proteins such as coactivators, chromatin remodelers, histone acetyltransferases, histone deacetylases, kinases, and methylases are also essential to gene regulation, but lack DNA-binding domains, and therefore are not TFs. 4
By binding to the promoter region at the start of other genes they turn them on, initiating the production of another protein, and so on. Each time a cell divides, two cells result which, although they contain the same genome in full, can differ in which genes are turned on and making proteins. Sometimes a 'self-sustaining feedback loop' ensures that a cell maintains its identity and passes it on. Less understood is the mechanism of epigenetics by which chromatin modification may provide cellular memory by blocking or allowing transcription. A major feature of multicellular animals is the use of morphogen gradients, which in effect provide a positioning system that tells a cell wherein the body it is, and hence what sort of cell to become. A gene that is turned on in one cell may make a product that leaves the cell and diffuses through adjacent cells, entering them and turning on genes only when it is present above a certain threshold level. These cells are thus induced into a new fate, and may even generate other morphogens that signal back to the original cell. Over longer distances, morphogens may use the active process of signal transduction. Such signalling controls embryogenesis, the building of a body plan from scratch through a series of sequential steps.
In eukaryotes, transcription can be modulated at many different levels, from the assembly of the core transcriptional machinery to the architecture and intranuclear localization of chromosomes. A vast array of proteins, including RNA polymerases, histones, histone modifiers, transcription factors, and co-factors, are involved in maintaining the accuracy and specificity of the regulatory process. 8 DNA-binding transcription factors (TFs) play a central
role, as they are responsible for directing transcription initiation to specific gene promoters based on their sequence-recognition abilities.
Transcriptional regulation
Transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals and thus mount a response. Some examples of this include producing the mRNA that encodes enzymes to adapt to a change in a food source, producing the gene products involved in cell cycle-specific activities, and producing the gene products responsible for cellular differentiation in higher eukaryotes, as studied in evolutionary developmental biology.
The regulation of transcription is a vital process in all living organisms. It is orchestrated by transcription factors and other proteins working in concert to finely tune the amount of RNA being produced through a variety of mechanisms. Prokaryotic organisms and eukaryotic organisms have very different strategies for accomplishing control over transcription, but some important features remain conserved between the two. Most importantly is the idea of combinatorial control, which is that any given gene is likely controlled by a specific combination of factors to control transcription. 3
Much of the early understanding of transcription came from prokaryotic organisms, although the extent and complexity of transcriptional regulation is greater in eukaryotes. Prokaryotic transcription is governed by three main sequence elements:
Promoters are elements of DNA that may bind RNA polymerase and other proteins for the successful initiation of transcription directly upstream of the gene. Operators recognize repressor proteins that bind to a stretch of DNA and inhibit the transcription of the gene. Positive control elements that bind to DNA and incite higher levels of transcription While these means of transcriptional regulation also exist in eukaryotes, the transcriptional landscape is significantly more complicated both by the number of proteins involved as well as by the presence of introns and the packaging of DNA into histones.
Perspectives on Gene Regulatory Network Evolution
2017
Animal development proceeds through the activity of genes and their cis-regulatory modules (CRMs) working together in sets of gene regulatory networks (GRNs).
The emergence of species-specific traits and novel structures results from evolutionary changes in GRNs. This is a baseless claim. To understand evolution we must study how GRNs evolve at the molecular scale. That sequent sentence proves my point. They do not know how the GRN's supposed evolution changes body plans. A relatively small set of common ‘toolkit’ genes shape the animal body plan regulated by conserved ‘subcircuits’ of regulatory interactions, themselves parts of larger GRNs. GRNs consist of transcription factors (TFs) and enhancers (CRMs) that control spatiotemporal patterns of gene expression. A strict definition of GRNs requires at a minimum both TFs and CRMs. The implicit admission of irreducibility is evident. For a GRN to be understood as a network, both edges (TFs) and nodes (CRMs) must be defined, and changes to both the TFs and the CRMs must be considered when exploring the mechanisms of GRN
1. http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA methylation patterns and epigenetic memory.pdf]http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA%20methylation%20patterns%20and%20epigenetic%20memory.pdf
2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3967546/
3. http://sci-hub.tw/https://nyaspubs.onlinelibrary.wiley.com/doi/abs/10.1111/j.1749-6632.1995.tb24814.x
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4149858/
5. https://www.broadinstitute.org/news/sketching-out-transcription-factor-code-binding-patterns-reflect-factors-gene-expression-roles
6. http://www.pnas.org/content/115/30/E7222
7. https://www.nature.com/scitable/topicpage/transcription-factors-and-transcriptional-control-in-eukaryotic-1046
8. https://academic.oup.com/nar/article/42/4/e24/2435199
9. http://gero.usc.edu/labs/benayounlab/files/2018/06/A-post-translational-modification-code-for-transcription-factors.pdf
10. https://www.ccs.neu.edu/home/radivojac/papers/nussinov_trendsbiochemsci_2012.pdf
11. https://academic.oup.com/jxb/article/66/16/4933/499938
12. http://pdb101.rcsb.org/motm/122
https://reasonandscience.catsboard.com/t2738-transcription-factors-tf
DNA is like a politician, surrounded by a flock of protein handlers and advisers that must vigorously massage it, twist it, and on occasion, reinvent it before the grand blueprint of the body can make any sense at all.” These “handlers and advisers” are the transcription factors. During development, transcription factors play essential roles in every aspect of embryogenesis, controlling differential gene expression leading to differentiation. When in doubt, it is usually a transcription factor’s fault, a sentiment that is often used by politicians, too. Transcription factors can be grouped together in families based on similarities in DNA-binding domains
Slight differences of the transcription factors in the amino acids of their DNA-binding sites at the binding site can cause the binding site to recognize different DNA sequences. DNA regulatory elements such as enhancers and silencers function by binding transcription factors, and each element can have binding sites for several transcription factors. Transcription factors bind to the DNA of the regulatory element using one site on the protein and other sites to interact with other transcription factors and proteins, leading to the recruitment of histone-modifying enzymes.
Question: How could the recruitment of the right TF partners be due to mutations and natural selection, if there is almost an infinite number of combinations which would not work, and interaction of the other side of TF's with histone-modifying enzymes depends on a finely tuned, orchestrated, and highly specific interactions of three players: TF's, histone-modifying enzymes, and histones ?
For example, the association of the Pax6, Sox2, and l-Maf transcription factors in lens cells recruits a histone acetyltransferase that can transfer acetyl groups to the histones and dissociate the nucleosomes in that area. Similarly, when MITF, a transcription factor essential for ear development and pigment production, binds to its specific DNA sequence, it also binds to (different) histone acetyltransferase that facilitates the dissociation of nucleosomes. In addition, the Pax7 transcription factor that activates muscle-specific genes binds to the enhancer region of these genes within the muscle precursor cells. Pax7 then recruits a histone methyltransferase that methylates the lysine in the fourth position of histone H3 (H3K4), resulting in the trimethylation of this lysine and the activation of transcription. The displacement of nucleosomes along the DNA makes it possible for other transcription factors to find their binding sites and regulate expression. In addition to recruiting histone-modifying enzymes, transcription factors can also work by stabilizing the transcription pre-initiation complex that enables RNA polymerase II to bind to the promoter
The bridge between enhancer and promoter can be made by transcription factors.
Certain transcription factors bind to DNA on the promoter (where RNA polymerase II will initiate transcription), whereas other transcription factors bind to the enhancer (which regulates when and where transcription can occur). Other transcription factors do not bind to the DNA; rather, they link the transcription factors that have bound to the enhancer and promoter sequences. In this way, the chromatin loops to bring the enhancer to the promoter. The example shown here is the mouse β-globin gene. (A) Transcription factors assemble on the enhancer, but the promoter is not used until the GATA1 transcription factor binds to the promoter. (B) GATA1 can recruit several other factors, including Ldb1, which forms a link uniting the enhancer-bound factors to the promoter-bound factors.
At any one time most of the genes of a higher organism are silent: only a minority of the genes in a given cell is expressed, that is, being transcribed into messenger RNA, which in turn acts as a template for protein synthesis. The selective expression of any one gene is accomplished primarily through the interaction of protein transcription factors with characteristic DNA sequences included in the control region of the gene, which is most commonly located near to the actual coding region. The binding of a set of such factors, or regulatory proteins, acts as a molecular switch for the activation of RNA polymerase and other components of the transcriptional machinery, which are common to all genes. The supply of a particular combination of such transcription factors ensures that a gene is switched on at the right place and at the right time. 3
Gene expression is controlled by binding of a protein to a regulatory region, or promoter. TFs that bind to cis-regulator DNA sequences are responsible for either positively or negatively influencing the transcription of specific genes, essentially determining whether a particular gene will be turned "on" or "off" in an organism. Much of the complexity in differentiation in animal and plant cells can be attributed to elaborate systems made up of short (6 to 8 base pair) cis-regulatory DNA sequences or motifs, as well as the TFs that bind to the motifs, interact with each other to form complexes, and recruit RNA polymerase II. Most eukaryotic genes have promoters that consist of the TATA box close to the 5' end of the gene and, farther upstream, several motifs recognized by specific transcription factors. 7
Transcription Factors are required to initiate transcription
Differentiated eukaryotic cells possess a remarkable capacity for the selective expression of specific genes. The synthesis rates of a particular protein in two cells of the same organism may differ by as much as a factor of 10^9, or 1 billion. For example, reticulocytes (immature red blood cells) synthesize large amounts of haemoglobin but no detectable insulin, whereas the pancreatic cells produce large quantities of insulin but no haemoglobin.
In contrast, prokaryotic systems generally exhibit no more than a thousandfold range in their transcription rates so that at least a few copies of all the proteins they encode are present in any cell. Nevertheless, the basic mechanism for initiating transcription of structural genes is the same in eukaryotes and prokaryotes: Protein factors bind selectively to the promoter regions of DNA.
A complex of at least six general transcription factors operate in Eukaryotic Cells and have in prokaryotic Cells an equivalent of just one a prokaryotic σ-factor.
Properties and Functions of the Eukaryotic General Transcription Factors
The six GTFs are highly conserved from yeast to humans.
Question: How could there have been a transition from simpler transcription initiation in eukaryotes, using just one σ-factor, to six, in eukaryotic cells, considering, that the six GTFs are highly conserved ( they have not changed in supposed deep evolutionary time ? )
Complex Switches Control Gene Transcription in Eukaryotes
When compared to the situation in bacteria, transcription regulation in eukaryotes involves many more proteins and much longer stretches of DNA. It often seems bewilderingly complex. Yet many of the same principles apply. As in bacteria, the time and place that each gene is to be transcribed is specified by its cis-regulatory sequences, which are “read” by the transcription regulators that bind to them. Once bound to DNA, positive transcription regulators (activators) help RNA polymerase begin transcribing genes, and negative regulators (repressors) block this from happening. In bacteria, most of the interactions between DNA-bound transcription regulators and RNA polymerases (whether they activate or repress transcription) are direct. In contrast, these interactions are almost always indirect in eukaryotes: many intermediate proteins, including the histones, act between the DNA-bound transcription regulator and RNA polymerase. Moreover, in multicellular organisms, it is common for dozens of transcription regulators to control a single gene, with cis-regulatory sequences spread over tens of thousands of nucleotide pairs. DNA looping allows the DNA-bound regulatory proteins to interact with each other and ultimately with RNA polymerase at the promoter. Finally, because nearly all of the DNA in eukaryotic organisms is compacted by nucleosomes and higher-order structures, transcription initiation in eukaryotes must overcome this inherent block.
Transcription factor domains
Transcription factors have three major domains. The first is a DNA -binding domain that recognizes a particular DNA sequence in the enhancer. There are several different types of DNA-binding domains, and they often designate the major family classifications for transcription factors. Some of the most common protein domains that convey DNA binding are the Homeodomain, Zinc Finger Leucine Zipper, Helix-Loop-Helix, and Helix-Turn-Helix. For instance, the homeodomain transcription factor Pax69 uses its paired DNA-binding sites to recognize the enhancer sequence, CAATTAGTCACGCTTGA. In contrast, the MITF transcription factor involved in ear and pigment cell development contains both leucine zipper and helix-loop-helix domains, and it recognizes shorter DNA sequences called the E-box (CACGTG) and the M-box. These sequences for MITF binding have been found in the regulatory regions of genes encoding several pigment-cell-specific enzymes of the tyrosinase family. Without MITF, these proteins are not synthesized properly, and melanin pigment is not made.
The second domain is a trans-activating domain that activates or suppresses the transcription of the gene whose promoter or enhancer it has bound. Usually, this transactivating domain enables the transcription factor to interact with the proteins involved in binding RNA polymerase II or with enzymes that modify histones. MITF contains such a domain of amino acids in the center of the protein. When the MITF dimer is bound to its target sequence in the enhancer, the trans-activating region is able to bind a transcription-associated factor (TAF), p300/CBP. The p300/CBP protein is a histone acetyltransferase enzyme that can transfer acetyl groups to each histone in the nucleosomes. Acetylation of the nucleosomes destabilizes them and allows the genes for pigment-forming enzymes to be expressed.
Finally, third, there is usually a protein-protein interaction domain that allows the transcription factor’s activity to be modulated by TAFs or other transcription factors. MITF has a protein-protein interaction domain that enables it to dimerize with another MITF protein. The resulting homodimer (i.e., two identical protein molecules bound together) is the functional protein that binds to enhancer DNA of certain genes and activates transcription
Three-dimensional model of the homodimeric transcription factor MITF
(one protein shown in red, the other in blue) binding to a promoter element in DNA (white). The amino termini are located at the bottom of the figure and form the DNA-binding domains that recognize an 11-base-pair sequence of DNA having the core sequence CATGTG. The protein-protein interaction domain is located immediately above. MITF has the basic helix-loop-helix structure found in many transcription factors. The carboxyl end of the molecule is thought to be the trans-activating domains that bind the p300/CBP transcription-associated factor
(TAF).
Insulators
The boundaries of gene expression appear to be set by DNA sequences called insulators. Insulator sequences limit the range in which an enhancer can activate gene expression. They thereby “insulate” a promoter from being activated by another gene’s enhancers. Some insulator DNA regions have been found to bind a zinc-finger transcription factor called CTCF,11 which functions to alter the three-dimensional conformation of chromatin and thereby separate (or insulate) enhancer elements from the promoter. CTCF is ubiquitously expressed in eukaryotes and has been charted to bind tens of thousands of sites on the genome. Mechanistically, CTCF physically interacts with cohesin, a ring-shaped complex of multiple subunits that function to stabilize chromatin loop structures. It is hypothesized that CTCF uses its 11 zinc-finger domains to selectively bind DNA, often insulator elements, to create loop structures that distance enhancers from promoters. For instance, the chick β-globin gene has been shown to form a complex with cohesin. This CTCF-cohesin complex may bind to the enhancer-bound Mediator, thereby preventing the enhancer from activating the adjacent promoter.
Pioneer transcription factors: breaking the silence
Finding an enhancer is not easy because the DNA is usually so wound up that the enhancer sites are not accessible. Given that the enhancer might be covered by nucleosomes, how can a transcription factor find its binding site? That is the job of certain transcription factors that penetrate repressed chromatin and bind to their enhancer DNA sequences. They have been called “pioneer” transcription factors, and they appear to be critical in establishing certain cell lineages. One of these transcription factors is FoxA1, which binds to certain enhancers and opens up the chromatin to allow other transcription factors access to the promoter. FoxA1 is extremely important in specifying liver cells, remaining bound to the DNA during mitosis, and providing a mechanism to reestablish normal transcription in presumptive liver cells. Another pioneer transcription factor is the Pax7 protein mentioned above. It activates muscle-specific gene transcription in a population of muscle stem cells by binding to its DNA recognition sequence and being stabilized there by dimethylated H3K4 on the nucleosomes. It then recruits the histone methyltransferase that converts the dimethylated H3K4 into the trimethylated H3K4 associated with active transcription.
Regulatory Proteins ( transcription factors ) Have Discrete DNA-Binding Domains
Transcription factors ( TFs) must recognize short, specific cis-regulatory sequences within this structure. The outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves. Because the major groove is wider and displays more molecular features than does the minor groove, nearly all transcription regulators make the majority of their contacts with the major groove.
TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs, based on three-dimensional structures of protein-DNA complexes. 4 Regulatory proteins generally bind to specific DNA sequences. Their affinity for these target sequences is roughly 10^4 to 10^6 times higher than their affinity for any other DNA sequence. Most regulatory proteins have discrete DNA-binding domains containing substructures that interact closely and specifically with the DNA. These binding domains usually include one or more of a relatively small group of recognizable and characteristic structural motifs. To bind specifically to DNA sequences, regulatory proteins must recognize surface features on the DNA. Most of the chemical groups that differ among the four DNA bases and thus permit discrimination between base pairs are hydrogen-bond donor and acceptor groups exposed in the major groove of DNA and most of the protein-DNA contacts that impart specificity are hydrogen bonds.
Groups in DNA available for protein binding.
Shown here are functional groups on all four base pairs that are displayed in the major and minor grooves of DNA. Groups that can be used for base-pair recognition by proteins are shown in red.
A notable exception is the nonpolar surface near C-5 of pyrimidines, where thymine is readily distinguished from cytosine by its protruding methyl group. Protein-DNA contacts are also possible in the minor groove of the DNA, but the hydrogen-bonding patterns here generally do not allow ready discrimination between base pairs. Within regulatory proteins, the amino acid side chains most often hydrogen-bonding to bases in the DNA are those of Asn, Gln, Glu, Lys, and Arg residues. Is there a simple recognition code in which a particular amino acid always pairs with a particular base? The two hydrogen bonds that can form between Gln or Asn and the N6 and N-7 positions of adenine cannot form with any other base. And an Arg residue can form two hydrogen bonds with N-7 and O6 of guanine
Two examples of specific amino acid residue–base pair interactions that have been observed in DNA-protein binding.
Examination of the structure of many DNA-binding proteins has shown that a protein can recognize each base pair in more than one way which adds to complexity and combination variety. To interact with bases in the major groove of DNA, a protein requires a relatively small substructure that can stably protrude from the protein surface. The DNA-binding domains of regulatory proteins tend to be small (60 to 90 amino acid residues), and the structural motifs within these domains that are actually in contact with the DNA are smaller still. The DNA-binding sites for regulatory proteins are often inverted repeats of a short DNA sequence (a palindrome) at which multiple subunits of a regulatory protein bind cooperatively.
Control of transcription by sequence-specific DNA - binding proteins
How does a cell determine which of its thousands of genes to transcribe? Perhaps the most important concept, one that applies to all species on Earth, is based on a group of proteins known as transcription regulators. These proteins recognize specific sequences of DNA (typically 5–10 nucleotide pairs in length) that are called cis-regulatory sequences because they must be on the same chromosome (that is, in cis) to the genes they control. Transcription Factors (TFs) bind to these sequences, which are dispersed throughout genomes, and this binding puts into motion a series of reactions that ultimately specify which genes are to be transcribed and at what rate. Approximately 10% of the protein-coding genes of most organisms are devoted to transcription regulators, making them one of the largest classes of proteins in the cell. Cis-regulatory sequences typically lie near the gene, often in the intergenic region directly upstream from the transcription start point of the gene. Although a few genes are controlled by a single cis-regulatory sequence that is recognized by a single transcription regulator, the majority have complex arrangements of cis-regulatory sequences, each of which is recognized by a different transcription regulator.
the outside of the double helix is studded with DNA sequence information that transcription regulators recognize: the edge of each base pair presents a distinctive pattern of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves
How the different base pairs in DNA can be recognized from their edges without the need to open the double helix. The four possible configurations of base pairs are shown, with potential hydrogen-bond donors indicated in blue, potential hydrogen-bond acceptors in red, and hydrogen bonds of the base pairs themselves as a series of short, parallel red lines. Methyl groups, which form hydrophobic protuberances, are shown in yellow, and hydrogen atoms that are attached to carbons, and are therefore unavailable for hydrogen-bonding, are white. From the major groove, each of the four base-pair configurations projects a unique pattern of features.
Because the major groove is wider and displays more molecular features than does the minor groove, nearly all transcription regulators make the majority of their contacts with the major groove
Transcription Regulators Contain Structural Motifs That Can Read DNA Sequences
Molecular recognition in biology generally relies on an exact fit between the surfaces of two molecules, and the study of transcription regulators has provided some of the clearest examples of this principle. A transcription regulator recognizes a specific cis-regulatory sequence because the surface of the protein is extensively complementary to the special surface features of the double helix that displays that sequence. Each transcription regulator makes a series of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. Although each individual contact is weak, the 20 or so contacts that are typically formed at the protein–DNA interface add together to ensure that the interaction is both highly specific and very strong
Question: How could that recognition have arisen? Had it not have to be fully developed and functioning right from the beginning, otherwise the cell would be unable to recognize which genes to express?
The binding of a transcription regulator to a specific DNA sequence. On the left, a single contact is shown between a transcription regulator and DNA; such contacts allow the protein to “read” the DNA sequence. On the right, the complete set of contacts between a transcription regulator (a member of the homeodomain family) and its cis-regulatory sequence is shown. The DNA-binding portion of the protein is 60 amino acids long. Although the interactions in the major groove are the most important, the protein is also seen to contact both the minor groove and phosphates in the sugar-phosphate DNA backbone.
In fact, DNA–protein interactions include some of the tightest and most specific molecular interactions known in biology. Although each example of protein–DNA recognition is unique in detail, x-ray crystallographic and nuclear magnetic resonance (NMR) spectroscopic studies of hundreds of transcription regulators have revealed that many of them contain one or another of a small set of DNA-binding structural motifs
The transcription factor Code
One important consequence of the combinatorial association of transcription factors is coordinated gene expression. The simultaneous expression of many cell-specific genes can be explained by the binding of transcription factors by the enhancer elements. For example, many genes that are specifically activated in the lens contain an enhancer that binds Pax6. So, all the other transcription factors might be assembled at the enhancer, but until Pax6 binds, they cannot activate the gene. Similarly, many of the coexpressed muscle-specific genes contain enhancers that bind the Mef2 transcription factor, and the enhancers on genes encoding pigment-producing enzymes bind MITF. In some instances, entire ensembles of transcription factors appear to direct simultaneous gene transcription. Junion and colleagues have shown, for example, that a particular ensemble of five transcription factors is bound on hundreds of enhancers that are active in the developing Drosophila heart muscle cells.
Combinatorial Control: In order to specify which gene will be expressed in a given situation, your cells use a diverse collection of DNA-binding proteins to control access to the DNA. Surprisingly, there are relatively few of these proteins: by some estimates, the human genome encodes about 2,600 of them. But then, the capabilities of this limited set are greatly expanded by using them in combination, by requiring two or more to bind simultaneously to activate a gene. In this way, each protein may be used in many ways and the spectrum of responses is far more varied. 12
Usually, a combination of several (as many as six) transcription factors is necessary to form a transcription complex which can harness and activate the RNA polymerase to initiate transcription at the right starting point. A little reflection will convince one of the reason why a set of such protein factors is required, rather than a single protein for each gene. If the latter were the case, that protein would have to be coded for by the expression of another gene, which would, in turn, require another protein transcription factor and so on, leading to an infinite recurrence. However, if a set of proteins is involved, then different combinations can be used for different genes. Thus a smaller number of regulatory proteins can control a large number of genes.
In addition, this provides the means for multiple control at the level of the gene, which has the advantage that transcription may be regulated in a quantitative rather than in an all-or-none manner, and also for producing a network of interacting genes, since the protein product of one gene can affect the expression of another. So one has a combinatorial principle at work here operating at the level of a combination of proteins. In the case of zinc finger proteins, the principle also operates within individual proteins, where different subdomains can be combined to give greater variety or precision of recognition-a microcosm, as it were, of the macroscopic picture.
Transcriptional control is dependent on the interactions of all the TFs and whether they attract RNA polymerase or block it from initiating transcription. Multiple TFs can accumulate, creating a bulk the size of a ribosome. Once bound together, changes to the functional domains of a TF and/or covalent interactions with other factors can turn transcription on or off, depending on whether they allow or prohibit the recruitment of RNA polymerase. Two TFs bound at sites near one another on the DNA strand can combine to form a dimer and bend the DNA in what is believed to be part of the activation process. Some TFs are believed to act as tethering elements between distant enhancers and promoters by forming connections with other proteins.
Combinatorial interactions among transcription factors (TFs) are critical for integrating diverse intrinsic and extrinsic signals, fine-tuning regulatory output and increasing the robustness and plasticity of regulatory systems. In higher eukaryotes, transcription factors (TFs) rarely operate by themselves, but rather directly or indirectly interact with specific partner TFs or chromatin regulators when binding to enhancers. It has been estimated that roughly 75% of all metazoan TFs heterodimerize with other factors 8
Cellular signalling cascades regulate the activity of transcription factors that convert extracellular information into gene regulation. A group of Broad scientists has found that TFs' binding sites within enhancers cluster in distinct patterns reflecting the factors' roles in gene expression control. These patterns may constitute a position-based code 5 The TF clusters may constitute a general regulatory code, with different cell types substituting specific TFs to activate different sets of enhancers. 6
Post-translational modification code for transcription factors
PTMs can involve covalently linking chemical groups, lipids, carbohydrates or (poly)peptide chains to amino acids of the target molecule during or after its translation 10 Cellular responses to environmental or physiological cues rely on transduction pathways that must ensure discrimination between different signals. These cascades ‘crosstalk’ and lead to a combinatorial regulation. This often results in different combinations of posttranslational modifications (PTMs) on target proteins, which might act as a molecular barcode. A PTM code is necessary in the context of transcription factors regulating multiple processes. Thus, the coding potential of PTM combinations should both provide a further layer of information integration from several transduction pathways and warrant highly specific cellular outputs. 9
Some TFs have the ability to regulate several seemingly unrelated processes, and an indiscriminate modulation of their targets, as induced by a global activation or inhibition, would be problematic. For example, FOXO factors regulate functions as diverse as glucose metabolism, cell differentiation, longevity, neuropeptide secretion, stress resistance and apoptosis, cancer progression, and female fertility. How can such TFs properly modulate specific cellular processes in response to different signals? For signalling pathways to mediate precise effects, the ‘molecular behavior’ of these TFs (i.e. interaction with partners and targets) can change specifically as a function of their PTMs. The existence of a PTM code for TFs is expected to have several non-exclusive ways of action: distinct PTM isoforms could have:
(i) distinct DNA-binding specificities;
(ii) different DNA-binding affinities with the same sequence specificity; and/or
(iii) distinct protein partners (altering sequence specificity or not).
Indeed, some PTMs can give rise to new protein-binding abilities by creating new interaction surfaces. Modulation of the PTM status of TFs provides a layer of swift signal integration enabling fine-tuning of cellular responses to environmental or physiological cues. 10
Post-translational modifications of proteins are a central feature of the signal transduction pathways that regulate gene transcription in response to hormones. Modifications that affect the function of transcriptional regulators may also serve as a mechanism to incorporate multiple signals, mediate cross-talk, and modulate specific responses. Hormone-responsive transcription factors are subject to multiple modifications which imply an additional level of regulation conferred by enzymes that mediate specific modifications, such as phosphorylation, ubiquitination, SUMOylation, and S-nitrosylation. These modifications can affect protein stability, sub-cellular localization, interactions with co-repressors and activators, and DNA binding. 11
Post‐translational control of transcription factors: methylation ranks highly
A Eukaryotic Gene Control Region Consists of a Promoter Plus Many cis-Regulatory Sequences
In eukaryotes, RNA polymerase II transcribes all the protein-coding genes and many noncoding RNA genes. This polymerase requires five general transcription factors (27 subunits in toto, in contrast to bacterial RNA polymerase, which needs only a single general transcription factor (the σ subunit). The stepwise assembly of the general transcription factors at a eukaryotic promoter provides, in principle, multiple steps at which the cell can speed up or slow down the rate of transcription initiation in response to transcription regulators. Because the many cis-regulatory sequences that control the expression of a typical gene are often spread over long stretches of DNA, we use the term gene control region to describe the whole expanse of DNA involved in regulating and initiating transcription of a eukaryotic gene. This includes the promoter, where the general transcription factors and the polymerase assemble, plus all of the cis-regulatory sequences to which transcription regulators bind to control the rate of the assembly processes at the promoter
The gene control region for a typical eukaryotic gene.
The promoter is the DNA sequence where the general transcription factors and the polymerase assemble. The cis-regulatory sequences are binding sites for transcription regulators, whose presence on the DNA affects the rate of transcription initiation. These sequences can be located adjacent to the promoter, far upstream of it, or even within introns or entirely downstream of the gene. The broken stretches of DNA signify that the length of DNA between the cis-regulatory sequences and the start of transcription varies, sometimes reaching tens of thousands of nucleotide pairs in length. The TATA box is a DNA recognition sequence for the general transcription factor TFIID. As shown in the lower panel, DNA looping allows transcription regulators bound at any of these positions to interact with the proteins that assemble at the promoter. Many transcription regulators act through Mediator, while some interact with the general transcription factors and RNA polymerase directly. Transcription regulators also act by recruiting proteins that alter the chromatin structure of the promoter. Whereas Mediator and the general transcription factors are the same for all RNA polymerase II-transcribed genes, the transcription regulators and the locations of their binding sites relative to the promoter differ for each gene.
Transcription Factors Exert Combinatorial Control
It is possible for a TF to respond to a physical signal, such as red or far-red light, but the signal must be transduced to a chemically modified activator that interacts with the TF. The complexity and fine gradations of DNA expression in eukaryotes result from combinatorics, in that the combination of chromatin and TF signals, rather than the individual TF signal, is read out.
Structural biology has been at the forefront of the search for a protein-DNA recognition code. 5 Unlike genic coding regions, which are easily interpreted from the triplet code, noncoding regulatory elements are difficult to decode. Regulatory TFBSs are often clustered, with binding sites from different TFs in close proximity to one another. Ultimately, enhancers are difficult to decode and require further experimental work for their identification and functional characterization. It is possible that some complex code, comprising rules from each of the different layers, contributes to TF-DNA binding; however, determining the precise rules of TF binding to the genome will require further high-quality structural and high-throughput binding data.
The proteins encoded by selector and axial patterning genes most often belong to one of two categories: transcription factors or components of signalling pathways. Ultimately, these proteins exert their effect through the control of gene expression. Thus developmental processes such as embryonic axis formation and segmentation are organized by regulating gene expression in discrete regions and cell populations of the embryo. The transcription factors found in the Drosophila toolkit include representatives of most of the known families of sequence-specific DNA-binding proteins. These families are distinguished by the type of secondary structures in the folded protein that are involved in protein subunit interactions and contact with DNA. Most transcription factors possess either a helix-turn-helix, zinc finger, leucine zipper, or helix-loop-helix (HLH) motif
Structural motifs of major transcription factor families
(a) The second and third a-helices of the homeodomain form a helix-turn-helix structure.
(b) The zinc finger motif involves a coordination complex of Zn++ with critically positioned cysteine (C) and histidine (H) residues. The “finger” contacts DNA.
(c) The leucine zipper structure is formed by association of two subunits with regularly spaced leucine residues.
(d) The helix-loop-helix structure is similar to the leucine zipper except that a protein loop interrupts the helices and the association of the subunits.
Common DNA-binding motifs.
The basic helix-loop-helix, helix-turn-helix, zinc finger, and leucine zipper are common structural motifs found in many eukaryotic transcription factors. The primary structure of each motif is shown above a representation of the structure interacting with DNA. The basic helix-loop-helix, helix-turn-helix, and leucine zipper motifs form dimers when interacting with DNA. The zinc finger forms a trimer.
The homeodomain superfamily belongs to the helix-turn-helix class of factors, for example (Fig. a above ). Three proteins with divergent homeodomain sequencesaBicoid, Fushi tarazu, and Zenahave very different roles in organizing
the anteroposterior axis, segmentation, and dorsoventral axis patterning, respectively. All three proteins are also encoded within the Antennapedia Complex, surrounded by Hox genes. Most other homeodomain proteins are encoded by genes dispersed throughout the genome. Several of the gap genes encode zinc finger proteins, whereas genes such as the dorsoventral patterning gene twist and the pair-rule segmentation gene hairy encode basic HLH
proteins. The identification of these DNA-binding motifs in proteins has allowed biologists to deduce the biochemical function of a gene product by inspecting its encoded sequence, rather than resorting to exhaustive biochemical analysis. Furthermore, because these motifs are involved in contact with DNA, they are often constrained with respect to evolutionary changes in their sequence. As a result, these highly conserved motifs are useful for isolating
gene homologs in other taxa.
The second major category of toolkit genes encode proteins involved in the process of cell signaling, either as ligands, receptors for ligands, or components involved in the intracellular transduction of signals. At least seven major signalling pathways operate in the Drosophila embryo:
- the Hedgehog,
- Notch,
- Wingless,
- Dpp/transforming growth factor-β (TGF-β),
- Toll,
- epidermal growth factor (EGF),
- fibroblast growth factor (FGF)
signalling pathways. All of these pathways have similar elements namely, each has at least one signalling ligand, at least one receptor spanning the cell membrane, and at least one DNA-binding transcription factor that responds to signalling inputs by binding to target genes to turn them on or off.
Although the signalling logic may be similar among pathways, the biochemistry is not. Many structural types of ligands and receptors exist, and each pathway regulates the activity of different transcription factors. The response to ligand binding is mediated by a variety of mechanisms that often involve post-translation modifications (for example, protein phosphorylation, proteolysis, binding to a cofactor) that regulate the activity of, or the translocation of, transcription factors to the cell nucleus.
A generic signalling pathway
Most signalling pathways operate through similar logic but have different proteins and signal transduction mechanisms. Signalling begins when membrane-bound receptors bind a ligand, leading to the release or activation of associated intracellular proteins. Receptor activation often leads to the modification of inactive transcription factors that are translocated to the cell nucleus, bind to cis-regulatory DNA sequences or to DNA-binding proteins, and regulate the level of target gene transcription.
Crucial determinants of gene expression patterns are DNA-binding transcription factors that choose genes for transcriptional activation or repression by recognizing the sequence of DNA bases in their promoter regions. [url=http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA methylation patterns and epigenetic memory.pdf]1[/url]
Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. 2
Transcription factors (TFs) are proteins that bind to specific sequences on the DNA near their target genes, thus modulating transcription initiation. TFs can activate or repress transcription depending where they bind relative to the transcription start site of the target gene. Each TF regulates a set of genes, in response to specific environmental and/or intracellular triggers. 9 A complete transcriptional regulatory interaction between a TF and its target gene-(s) encompasses
(1) signal sensing,
(2) signal transduction,
(3) the TF; and
(4) the target gene-(s)
A transcription factor (TF) (or sequence-specific DNA-binding factor) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence. The function of TFs is to regulate - turn on and off - genes in order to make sure that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization (body plan) during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are up to 2600 TFs in the human genome. TFs work alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes. A defining feature of TFs is that they contain at least one DNA-binding domain (DBD), which attaches to a specific sequence of DNA adjacent to the genes that they regulate. .Other proteins such as coactivators, chromatin remodelers, histone acetyltransferases, histone deacetylases, kinases, and methylases are also essential to gene regulation, but lack DNA-binding domains, and therefore are not TFs. 4
By binding to the promoter region at the start of other genes they turn them on, initiating the production of another protein, and so on. Each time a cell divides, two cells result which, although they contain the same genome in full, can differ in which genes are turned on and making proteins. Sometimes a 'self-sustaining feedback loop' ensures that a cell maintains its identity and passes it on. Less understood is the mechanism of epigenetics by which chromatin modification may provide cellular memory by blocking or allowing transcription. A major feature of multicellular animals is the use of morphogen gradients, which in effect provide a positioning system that tells a cell wherein the body it is, and hence what sort of cell to become. A gene that is turned on in one cell may make a product that leaves the cell and diffuses through adjacent cells, entering them and turning on genes only when it is present above a certain threshold level. These cells are thus induced into a new fate, and may even generate other morphogens that signal back to the original cell. Over longer distances, morphogens may use the active process of signal transduction. Such signalling controls embryogenesis, the building of a body plan from scratch through a series of sequential steps.
In eukaryotes, transcription can be modulated at many different levels, from the assembly of the core transcriptional machinery to the architecture and intranuclear localization of chromosomes. A vast array of proteins, including RNA polymerases, histones, histone modifiers, transcription factors, and co-factors, are involved in maintaining the accuracy and specificity of the regulatory process. 8 DNA-binding transcription factors (TFs) play a central
role, as they are responsible for directing transcription initiation to specific gene promoters based on their sequence-recognition abilities.
Transcriptional regulation
Transcriptional regulation is the means by which a cell regulates the conversion of DNA to RNA (transcription), thereby orchestrating gene activity. A single gene can be regulated in a range of ways, from altering the number of copies of RNA that are transcribed, to the temporal control of when the gene is transcribed. This control allows the cell or organism to respond to a variety of intra- and extracellular signals and thus mount a response. Some examples of this include producing the mRNA that encodes enzymes to adapt to a change in a food source, producing the gene products involved in cell cycle-specific activities, and producing the gene products responsible for cellular differentiation in higher eukaryotes, as studied in evolutionary developmental biology.
The regulation of transcription is a vital process in all living organisms. It is orchestrated by transcription factors and other proteins working in concert to finely tune the amount of RNA being produced through a variety of mechanisms. Prokaryotic organisms and eukaryotic organisms have very different strategies for accomplishing control over transcription, but some important features remain conserved between the two. Most importantly is the idea of combinatorial control, which is that any given gene is likely controlled by a specific combination of factors to control transcription. 3
Much of the early understanding of transcription came from prokaryotic organisms, although the extent and complexity of transcriptional regulation is greater in eukaryotes. Prokaryotic transcription is governed by three main sequence elements:
Promoters are elements of DNA that may bind RNA polymerase and other proteins for the successful initiation of transcription directly upstream of the gene. Operators recognize repressor proteins that bind to a stretch of DNA and inhibit the transcription of the gene. Positive control elements that bind to DNA and incite higher levels of transcription While these means of transcriptional regulation also exist in eukaryotes, the transcriptional landscape is significantly more complicated both by the number of proteins involved as well as by the presence of introns and the packaging of DNA into histones.
Perspectives on Gene Regulatory Network Evolution
2017
Animal development proceeds through the activity of genes and their cis-regulatory modules (CRMs) working together in sets of gene regulatory networks (GRNs).
The emergence of species-specific traits and novel structures results from evolutionary changes in GRNs. This is a baseless claim. To understand evolution we must study how GRNs evolve at the molecular scale. That sequent sentence proves my point. They do not know how the GRN's supposed evolution changes body plans. A relatively small set of common ‘toolkit’ genes shape the animal body plan regulated by conserved ‘subcircuits’ of regulatory interactions, themselves parts of larger GRNs. GRNs consist of transcription factors (TFs) and enhancers (CRMs) that control spatiotemporal patterns of gene expression. A strict definition of GRNs requires at a minimum both TFs and CRMs. The implicit admission of irreducibility is evident. For a GRN to be understood as a network, both edges (TFs) and nodes (CRMs) must be defined, and changes to both the TFs and the CRMs must be considered when exploring the mechanisms of GRN
1. http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA methylation patterns and epigenetic memory.pdf]http://arquivo.ufv.br/DBV/PGFVG/BVE684/htms/pdfs_revisao/expressaogenica/DNA%20methylation%20patterns%20and%20epigenetic%20memory.pdf
2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3967546/
3. http://sci-hub.tw/https://nyaspubs.onlinelibrary.wiley.com/doi/abs/10.1111/j.1749-6632.1995.tb24814.x
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4149858/
5. https://www.broadinstitute.org/news/sketching-out-transcription-factor-code-binding-patterns-reflect-factors-gene-expression-roles
6. http://www.pnas.org/content/115/30/E7222
7. https://www.nature.com/scitable/topicpage/transcription-factors-and-transcriptional-control-in-eukaryotic-1046
8. https://academic.oup.com/nar/article/42/4/e24/2435199
9. http://gero.usc.edu/labs/benayounlab/files/2018/06/A-post-translational-modification-code-for-transcription-factors.pdf
10. https://www.ccs.neu.edu/home/radivojac/papers/nussinov_trendsbiochemsci_2012.pdf
11. https://academic.oup.com/jxb/article/66/16/4933/499938
12. http://pdb101.rcsb.org/motm/122
Last edited by Admin on Tue Nov 27, 2018 1:06 pm; edited 9 times in total