liv: ribbon diagram of a p53 monomer (p53)
[personal profile] liv
[personal profile] lilacsigil wanted another post about transcription factors and why some of them just blindly copy and some have more complex roles. I am not sure quite what further to explain without going into technical details, so I'll have a go at that. If it works out that this post is boring or too obscure, please feel free to ask me more questions about what it is that you actually want to know.

When I talk as a geneticist or a cell biologist, I generally say that this factor does this thing, or this factor turns on this gene, but if I want to explain the underlying biochemistry I have to unpack that simplification. I'm going to talk a bit about what DNA is actually like chemically, though I'm still mostly going to be working with abstractions, so actual chemists are probably not going to be impressed by this post. The thing is that most cellular proteins don't really "do" things; they act as chemical catalysts, lowering the activation energy for chemical reactions enough that the chemistry can happen at 37 degrees Celsius and not need biologically implausible temperature and pressure conditions.

So what's actually going on chemically when DNA is transcribed to make RNA? Most people have seen the iconic image of DNA as a double helix. It's a little bit like a spiral staircase, having two parallel rails which are invariant, made of alternating sugars and phosphates, linked by a series of steps which are more or less flat and more or less perpendicular to the rails. But unlike a real spiral staircase, there is half a step attached to each of the rails and they are only loosely joined in the middle. These half steps are the DNA bases, which come in four slightly different forms, known as A, C, G and T. A and T match up and link together via two weak hydrogen bonds, and C and G match up and link together via three weak hydrogen bonds. The order of these bases carries the information to specify genes. The other way that the DNA double helix is not like a spiral staircase is that the two rails are in fact slightly out of phase, so rather than all the turns of the spiral being equally spaced, there are alternating big gaps and small gaps. Again that's probably familiar to most people, as the DNA double helix is such a common iconic symbol all over the place. The big gaps and the small gaps are called the major and minor grooves of the helix. Oh, and the whole structure is not in fact rigid; it's pretty flexible.
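To make the pairing rule concrete, here is a minimal Python sketch. The function names are made up for illustration, and it treats a strand as a plain string of letters, ignoring the backbone and the direction of each strand. It looks up the partner of each base and adds up the hydrogen bonds holding a stretch of double helix together, using the two-bond and three-bond counts given above.

```python
# Pairing partners and hydrogen-bond counts, as described in the text:
# A pairs with T via two bonds, C pairs with G via three.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def partner_strand(strand):
    """Return the bases sitting opposite each base of `strand`, position by position."""
    return "".join(PAIR[base] for base in strand)


def hydrogen_bonds(strand):
    """Total number of hydrogen bonds linking `strand` to its partner strand."""
    return sum(H_BONDS[base] for base in strand)


example = "GATTACA"
print(partner_strand(example))   # CTAATGT
print(hydrogen_bonds(example))   # 16
```

A stretch rich in C and G has more bonds per step than one rich in A and T, which is one reason the two kinds of sequence do not come apart equally easily.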

People talk about the genetic code, in the sense that the sequence of bases corresponds to the sequence of amino acids in the proteins eventually synthesized. There are 64 possible triplets of bases, and between them they represent 20 amino acids plus stop. But the sequence of the DNA doesn't only specify the sequence of the protein, it also has to specify where genes start and end, which of the two strands should actually be interpreted as a gene, and which genes should be activated in which circumstances. So some sequences of bases don't say "add a glycine, then an alanine, then a leucine", they say "start here" or "only transcribe me if we're short of a particular metabolite", or any number of other possible instructions.
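The count of 64 triplets is easy to check for yourself. The sketch below enumerates every three-letter combination of the four bases. The three stop triplets listed are those of the standard genetic code, written as DNA; that detail is not from the text above, and a few organisms and organelles use slightly different tables.

```python
from itertools import product

BASES = "ACGT"

# Every possible ordered triplet of bases: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Stop triplets in the standard genetic code (an added detail, written as DNA).
STOP_CODONS = {"TAA", "TAG", "TGA"}
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 64
print(len(sense_codons))  # 61
```

With 61 triplets left over to cover 20 amino acids, most amino acids are necessarily spelled by more than one triplet.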

So there are two parts to transcription: there are the transcription factors and accessory proteins which interpret the meta-instructions, if you like, and there is the transcription machinery, whose job is to copy the relevant DNA strand to make complementary RNA, which ultimately determines the sequence of the proteins that get synthesized. As it happens the transcription factors, the interpreting machinery, can "read" the sequence, whereas the transcription machinery is blind. I'm not sure there's a particular reason for this division of functions; the system just happened to evolve this way. But it does have advantages: it means that the cell can have complex and fine control over which genes are transcribed when, without the control sequences changing the structure and therefore the function of the proteins that get created.

Although the DNA bases, the steps in the staircase, are often represented as abstract shapes with three or two pin connections, in reality all four bases are slightly chemically different from each other at the edges, not just at the connection surface. This means that transcription factors can read the signals that say things like: start here and copy in this direction. This is because the transcription factors have surfaces which fit, both chemically and in shape, with the surface formed by one particular sequence of DNA bases. Since a short sequence might show up just by chance specifying part of a protein, many transcription factors have two-part recognition sequences, often mirror imaged, because that kind of structure is rarely found in sequences that code for proteins, meaning that there's less potential confusion between control sequences and coding sequences. This also means that transcription factors often work in combinatorial ways, because you need at least two to be able to read the two halves of the recognition sequence, and the two factors may be the same as each other or different. The two halves of a transcription factor will only work together if they interlock in exactly the right way, so you can have AA, AB, AC all controlling different genes, giving more possible outcomes with fewer separate factors.
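Two ideas from this paragraph can be sketched in a few lines of Python. The first function checks whether a site is "mirror imaged" in the sense that it reads the same on both strands, taking the strands to run in opposite directions; that is an assumption about what mirror imaged means here. The second part counts how many distinct pairs can be made from a handful of factors. The example sequences are purely illustrative and are not the recognition site of any particular transcription factor; real sites often also have a spacer between the two halves, which this sketch ignores.

```python
from itertools import combinations_with_replacement

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read in that strand's own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_mirror_imaged(site):
    """True if the site reads identically on both strands (an inverted repeat)."""
    return site == reverse_complement(site)


print(is_mirror_imaged("GAATTC"))   # True
print(is_mirror_imaged("GATTACA"))  # False

# Hypothetical factor labels A, B, C (not DNA bases): every unordered pair,
# including a factor paired with itself.
factors = ["A", "B", "C"]
dimers = ["".join(pair) for pair in combinations_with_replacement(factors, 2)]
print(dimers)  # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']
```

Three factors give up to six pairings in principle, though only the ones that actually interlock would be usable in a cell.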

The actual process of transcription requires unwinding the two strands of DNA and breaking the weak links between the paired bases. This leaves some of the sequence exposed with its connectors sticking out and able to form hydrogen bonds. And the core of the transcription machinery, the RNA polymerase enzyme, catalyses sticking RNA nucleotides together to make a continuous strand of RNA (a nucleotide is a base plus a sugar and a phosphate, which form the invariant backbone). The only way this chemical reaction can happen is if two bases are really precisely aligned, which requires both the enzyme acting as a scaffold, and the incoming bases making connections to their complementary partners in the exposed DNA. If the connectors aren't matched, the RNA bases won't be held firmly enough to form new bonds. That's how the DNA sequence manages to determine the RNA sequence.
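Stripped of all the chemistry, the copying step amounts to a lookup: each exposed DNA base admits only its complementary RNA base. The sketch below is a deliberately crude model of that. One detail not mentioned above is that RNA uses the base U where DNA would use T. The function name is made up, and the template is assumed to be written in the order the polymerase reads it, so the RNA comes out in the order it is built.

```python
# Which RNA base pairs with each exposed DNA template base.
# RNA has U in place of T, so a template A is matched by U.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template):
    """Build an RNA string by adding, one at a time, the base that pairs
    with each base of the template strand."""
    rna = []
    for base in template:
        rna.append(DNA_TO_RNA[base])  # only the matching base is held firmly enough
    return "".join(rna)


print(transcribe("TACGGT"))  # AUGCCA
```

Note that nothing in `transcribe` looks at what the sequence means; it just pairs bases, which is the sense in which the machinery is blind.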

So basically transcription factors recognize the start of the gene, and also receive signals from inside or outside the cell so that they are only active when a particular gene is needed. They then provide docking sites which allow the transcription machinery to build up a complex, precisely aligned to carry out initiation, that is, starting the process of transcription. The RNA polymerase itself performs elongation, blindly adding bases to the RNA according to the sequence of the complementary DNA. Then transcription termination factors recognize sequences indicating the end of a gene and halt the process at the appropriate point. There are factors which bend the DNA so that it's distorted out of shape, which allows different transcription factors to read the sequence, or brings together in physical space two regions that are distant on the chromosome, allowing one protein to bind to both of them and potentially creating a start site that might not have been obvious from the linear sequence of the DNA. There are repressors which bind to and block the sites recognized by transcription factors, preventing the gene from being activated when this is not appropriate.
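Putting the stages together, here is a toy model of initiation, elongation and termination, very much a sketch under loud assumptions. The motifs `TTGA` and `CCCC` are invented placeholders, not real start or stop signals, and real control regions are far richer than a single short motif. The function name and its parameters are hypothetical too.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def toy_transcription(template, start_site, stop_site,
                      factor_active, repressor_bound=False):
    """Return the RNA made from `template`, or None if transcription never starts.

    start_site and stop_site are hypothetical motifs standing in for the
    sequences that transcription factors and termination factors recognize.
    """
    # Initiation: needs an active factor, an unblocked site, and the site itself.
    if not factor_active or repressor_bound:
        return None
    site = template.find(start_site)
    if site == -1:
        return None
    begin = site + len(start_site)

    # Termination: stop at the first stop motif after the start, if there is one.
    end = template.find(stop_site, begin)
    if end == -1:
        end = len(template)

    # Elongation: blind base-by-base copying between the two points.
    return "".join(DNA_TO_RNA[base] for base in template[begin:end])


gene = "AATTGATACGGTCCCCAA"
print(toy_transcription(gene, "TTGA", "CCCC", factor_active=True))   # AUGCCA
print(toy_transcription(gene, "TTGA", "CCCC", factor_active=False))  # None
print(toy_transcription(gene, "TTGA", "CCCC", factor_active=True,
                        repressor_bound=True))                       # None
```

All the decisions sit in the first few lines, and the copying line at the end makes none, which mirrors the division of labour between the factors and the polymerase.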

I'm running a day behind on the meme at this point; I wrote this yesterday while travelling but didn't get online to post it until today. I don't know if what I've written quite makes any sense, so please do ask any questions, either to clarify what I've written here, or to ask about how transcription factors work at a different level from this.

[December Days masterpost]

(no subject)

Date: 2014-12-21 10:51 pm (UTC)
ewx: (Default)
From: [personal profile] ewx
That's very interesting l-) I have some questions...
How closely associated are TFs with genes - i.e. are there distinct types of TF for every gene, or can one TF find the start and/or end of many different genes?
Is there much evolution to speak of in TFs? Or are they so critical that any variation produces a non-viable organism?

(no subject)

Date: 2014-12-22 03:54 am (UTC)
lilacsigil: 12 Apostles rocks, text "Rock On" (12 Apostles)
From: [personal profile] lilacsigil
Thank you! The advantages of having separate transcription readers and transcription machinery were really interesting, along with the two-part recognition system. It's very cool that they're often in the mirror image form to distinguish them from random proteins that might come along. And with so many steps in the process I can see why some transcriptions go really wrong: but I guess the catch is finding out when they go wrong in a way that causes disease or sub-optimal function, and how that works in different people...
