Molecular Biology of Gene Regulation: Transcription Factors

Transcript of Part 1: Gene Regulation: An Introduction

00:00:00.22		My name is Bob Tjian,
00:00:01.18		I'm a professor of MCB at the University of California, Berkeley,
00:00:05.29		and I'm also serving as the President of the Howard Hughes Medical Institute.
00:00:09.29		I'm going to spend the next 25 or 30 minutes telling you about some fundamentals
00:00:15.09		of one of the most important molecular processes in living cells,
00:00:21.02		which is the expression of genes through a process called transcription.
00:00:26.19		Now, first, to understand what gene expression means,
00:00:32.08		you have to have a sense of what we tend to refer to in the field as the "central dogma" of molecular biology.
00:00:39.25		Another way to think about this is the flow of biological information from DNA
00:00:46.28		(in other words, our chromosomes, of which every cell has its complement)
00:00:50.21		to be transcribed into a sister molecule called RNA.
00:00:56.25		So this process of converting DNA into RNA is called "transcription,"
00:01:01.13		and that is the topic of this lecture.
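As a rough illustration of the DNA-to-RNA step just described, here is a minimal Python sketch. It assumes the simplest textbook picture: RNA is built by base-pairing against the template strand, so it matches the coding strand with U in place of T. The function names and the example sequences are hypothetical, and nothing here models the polymerase or its regulation.

```python
# Minimal sketch of transcription as a string operation (illustrative only).

def transcribe(coding_strand: str) -> str:
    """Return the RNA matching a DNA coding (sense) strand written 5'->3'."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA sequence may contain only A, C, G, T")
    # The RNA has the same sequence as the coding strand, with U replacing T.
    return dna.replace("T", "U")


def template_to_rna(template_strand: str) -> str:
    """Build RNA by base-pairing against the template strand (read 3'->5')."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_strand.upper())


if __name__ == "__main__":
    # Made-up nine-base example; both routes give the same RNA.
    print(transcribe("ATGGCGTAA"))
    print(template_to_rna("TACCGCATT"))
```

Both calls describe the same event from the point of view of the two DNA strands, which is why they return the same RNA string.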
00:01:05.12		This process is very complicated, as you will see by the end of my two lectures.
00:01:11.20		And it is very important for many, many fundamental processes in biology.
00:01:18.08		What I'm going to spend today's lecture on is the discovery of a large family of transcription proteins:
00:01:27.03		These are "factors," we call them, that are key molecules
00:01:31.11		that regulate the use of genetic information that has been encoded in the genome.
00:01:38.22		Now, transcription factors, or proteins, are involved in many fundamental aspects of biology,
00:01:46.04		including embryonic development, cellular differentiation, and cell fate.
00:01:51.20		In other words, pretty much what your cells are doing, how a tissue works,
00:01:56.13		and how an organism survives and reproduces is dependent on the process of gene expression,
00:02:03.17		and the first step in this process is transcription.
00:02:08.07		Now, there are many other reasons why a large group of people and scientists are interested in transcription,
00:02:16.13		and another reason is that understanding the fundamental molecular mechanisms that control transcription,
00:02:23.27		in humans or in any other organism, can inform us
00:02:29.13		and teach us about what happens when something goes wrong, for example, in diseases.
00:02:34.28		And I list here just a few diseases that we could study as a result of understanding the structure and
00:02:42.19		function of these transcription factor proteins that I'm going to be telling you about.
00:02:48.08		And of course the hope is that in understanding the molecular underpinnings of complex diseases,
00:02:53.26		like cancer, diabetes, Parkinson's, and so forth,
00:02:58.00		that we will be able to develop and use better, more specific therapeutic drugs
00:03:05.01		and also to develop more accurate and rapid diagnostic tools.
00:03:10.04		So those are a couple of the reasons why many of us have spent,
00:03:14.01		in my case, over 30 years, studying this process of transcriptional regulation.
00:03:20.15		Now, to get the whole thing started, I have to give you a sense of what the magnitude of the problem is.
00:03:27.05		So imagine that one would really like to understand how this process of decoding the genome happens in humans.
00:03:34.24		So, as you may know, the human genome has some 3 billion base pairs
00:03:39.04		(or bits of genetic information), and that encodes roughly 22,000 genes.
00:03:45.01		These are stretches of DNA sequence that encode, ultimately,
00:03:50.08		a product that is a protein, which actually makes the cells function.
00:03:55.21		So as I already explained to you, there's this flow of biological information
00:04:00.19		where you have to extract the information buried in DNA, convert it into RNA,
00:04:05.17		and what I'm not going to tell you about today is the process of going from RNA to protein,
00:04:10.06		which is a reaction called a translational reaction.
00:04:13.27		I'm going to instead just focus on the first step of converting DNA into RNA,
00:04:18.22		which is the process of transcription.
00:04:22.02		Now, one of the most amazing results that we got over the last decade or so was,
00:04:29.11		when the human genome was entirely sequenced,
00:04:34.13		we realized that actually the number of genes in humans is not vastly different from many other organisms,
00:04:42.15		even simple organisms like little worms or fruit flies and so forth.
00:04:47.14		That is, roughly 22,000 to 25,000 is the number of genes that all these different organisms have.
00:04:55.19		And yet, anybody looking at us versus a little roundworm in the soil or a fruit fly
00:05:02.13		can tell that we're a much more complex organism with a much bigger brain,
00:05:06.24		much more complex behavior, and so forth.
00:05:09.27		So how does this happen?
00:05:11.21		Part of the answer to this very interesting mystery or paradox
00:05:17.10		lies in the way genes are organized and how they're regulated.
00:05:21.22		And one of the most striking results of the genome sequencing project was to realize that
00:05:27.01		a vast, vast majority of the DNA in our chromosomes is actually not coding for specific gene products,
00:05:35.02		and that only roughly 3% of the DNA is actually encoding.
00:05:40.11		Those little arrows that I show you on this purple DNA are the gene-coding regions,
00:05:46.13		so you'll notice that there's a lot of "non-arrow" sequences,
00:05:50.04		which I'll show you in this next slide as green.
00:05:52.20		These are "non-coding" regions, so the vast majority (97% or greater) is non-coding.
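To put the numbers quoted above side by side, here is a short Python sketch of the arithmetic. It uses only the round figures from the lecture (3 billion base pairs, roughly 3% coding, roughly 22,000 genes); the function name is hypothetical and the per-gene average is a back-of-the-envelope value, not a measured quantity.

```python
# Back-of-the-envelope arithmetic using the round numbers from the lecture.

def coding_summary(genome_bp: int, coding_percent: int, gene_count: int) -> dict:
    """Split a genome into coding and non-coding base pairs."""
    coding_bp = genome_bp * coding_percent // 100  # integer math avoids float rounding
    noncoding_bp = genome_bp - coding_bp
    return {
        "coding_bp": coding_bp,
        "noncoding_bp": noncoding_bp,
        "avg_coding_bp_per_gene": coding_bp / gene_count,
    }


if __name__ == "__main__":
    summary = coding_summary(3_000_000_000, 3, 22_000)
    for key, value in summary.items():
        print(f"{key}: {value:,.0f}")
```

With these inputs, about 90 million base pairs are coding and about 2.9 billion are not, which is the space where the regulatory sequences discussed next reside.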
00:05:58.26		So what are these other sequences doing?
00:06:02.18		And of course, it turns out that these sequences carry very important,
00:06:07.15		little fragments of DNA which we call "regulatory sequences."
00:06:12.09		And these are the sequences that actually control whether a gene gets turned on or not.
00:06:19.02		I'll be spending much of the next 20 minutes telling you about how this process all works
00:06:24.14		and how these little bits of DNA sequence actually function to control gene expression.
00:06:34.10		Now, the other thing I have to bring you up to date on
00:06:37.10		is this mysterious process we're calling "transcription,"
00:06:41.01		which reads double-stranded DNA and then makes a related molecule,
00:06:45.13		which is a single-stranded RNA molecule, which is an informational molecule.
00:06:49.25		That reaction is catalyzed by a very complex, multi-subunit enzyme called RNA polymerase II.
00:06:59.02		Now, there's the Roman numeral II at the end of this because there are actually at least three enzymes in most mammals,
00:07:07.13		that carry out different processes and different types of RNA production.
00:07:11.23		But I'm only going to tell you about the ones that make the classical messenger RNA,
00:07:16.17		which then ultimately becomes proteins.
00:07:20.03		Now, one of the things we learned early on in the study of mammalian
00:07:25.28		(or other multicellular organism) transcription processes is that,
00:07:30.17		despite the fact that this enzyme is quite complex in its structure,
00:07:35.28		it turns out to be an enzyme that nevertheless needs a lot of help to do its job.
00:07:42.16		So, on its own, this RNA polymerase II cannot tell the difference between the non-coding regions of the genome
00:07:49.29		and places where it's supposed to be coding, or reading, to make the appropriate messenger RNAs.
00:07:57.00		So this sort of leads you to think that there must be a number of other factors
00:08:02.25		that somehow direct RNA polymerase to the right place at the right time in the genome of every cell in your body,
00:08:11.08		so that the right products get made, so each cell in your body is functioning properly.
00:08:18.15		And this is where things get really interesting.
00:08:23.01		Some 25, 30 years ago, a number of laboratories took on the job of hunting for these elusive and,
00:08:31.17		as it turned out, specialized protein factors that recognize these little stretches of DNA sequences that I've been telling you about,
00:08:40.14		that make up the vast majority of the non-coding part of the genome,
00:08:45.15		and how these proteins can then recognize and, ultimately,
00:08:49.13		physically interact with these little bits of genetic information to then turn genes on or off.
00:08:57.21		Now, in this lecture,
00:09:00.02		I can't go into all the details of the types of experiments or the ranges of experiments
00:09:05.24		that many, many laboratories have done over the last two decades
00:09:09.11		to finally work out this molecular puzzle of how transcription works.
00:09:14.24		But I can tell you that there are fundamentally two major approaches
00:09:18.14		that have been taken over the last few decades to get a "parts list"
00:09:24.14		of the machinery that decodes the genome and carries out the process of transcription.
00:09:29.20		One is kind of the old style.
00:09:32.16		I'll call it "bucket biochemistry":
00:09:35.21		Take a live cell, crush it up, spread out all of its parts,
00:09:40.17		and then try to figure out how to put it back together again.
00:09:43.04		That's what I call "in vitro" biochemistry.
00:09:45.23		And the other one is "in vivo" genetics,
00:09:47.27		where you effectively use genetic tools (mutagenesis)
00:09:51.23		to go in there and selectively remove or "knock down"
00:09:55.18		or "knock out" certain genes
00:09:58.03		and gene products, and then ask what is the consequence on that cell or that organism.
00:10:03.01		Both of these technologies are very powerful and highly complementary,
00:10:10.20		and they continue to be used.
00:10:13.13		Today, I will focus primarily on the in vitro biochemical techniques
00:10:18.16		which led us to the discovery of the first few classes of transcription factors,
00:10:24.15		and in subsequent lectures, we'll go to more recent technologies that
00:10:29.22		allow us to speed up this whole process of identifying key regulatory molecules and how they work.
00:10:38.26		So, let's go back to the basic unit of gene expression, which is a gene,
00:10:45.04		here shown in the orange arrow, and the non-coding sequences surrounding it.
00:10:51.29		And you'll see that now I've added a few more elements to this purple DNA.
00:10:56.06		You see some symbols: blue square, round circle that's pink, and then a yellow triangle.
00:11:02.22		Those are just a way for me to graphically represent the little bits of
00:11:07.28		DNA sequences that I told you about that are the regulatory sequences.
00:11:11.08		So the little, round one happens to be very GC-rich,
00:11:14.22		the triangle one is a classical element that's called a TATA box
00:11:19.06		(I'll tell you about it a little bit later),
00:11:20.20		and the blue one is yet another recognition element.
00:11:23.18		So, why are we so interested in these little stretches of nucleic acid sequence
00:11:28.26		in the genome when it's buried amongst billions of other sequences?
00:11:33.10		Well, these individual little sequences turn out to be very important because of
00:11:37.25		where they sit (you'll notice they are sitting near the top of the arrow),
00:11:42.07		and they are recognized by very special proteins, which are the transcription factors.
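The idea of short regulatory sequences sitting near the start of a gene can be sketched as a simple string search. The Python example below is only an illustration: the consensus strings GGGCGG (GC box) and TATAAA (TATA box) are commonly cited textbook forms chosen here as assumptions, the promoter fragment is made up, and only one strand is scanned. Real binding sites tolerate mismatches that an exact-match search would miss.

```python
import re

# Assumed consensus strings, used only for illustration.
MOTIFS = {
    "GC box": "GGGCGG",
    "TATA box": "TATAAA",
}


def find_motifs(dna: str, motifs: dict = MOTIFS) -> list:
    """Return (motif name, 0-based start) for every exact match, sorted by position."""
    dna = dna.upper()
    hits = []
    for name, consensus in motifs.items():
        # A zero-width lookahead reports overlapping matches as well.
        for match in re.finditer(f"(?={consensus})", dna):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up promoter fragment, not a real genomic sequence.
    promoter = "ACGTGGGCGGTTACCTATAAAGGCATCG"
    for name, position in find_motifs(promoter):
        print(f"{name} at position {position}")
```

A transcription factor does the equivalent search physically, by shape and chemistry rather than by string comparison.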
00:11:48.11		So now I'm showing you some symbols with little cut-outs
00:11:51.24		which fit into either the square, the circle, or the triangle.
00:11:56.03		So, transcription factors, at least one major family of transcription factors,
00:12:03.07		are proteins whose three-dimensional structure is folded into a shape that
00:12:08.21		allows them to recognize these short stretches of double-stranded DNA,
00:12:13.19		in fact, largely through interactions with the major groove of DNA.
00:12:17.16		And I'll show you a structure of one in a little bit.
00:12:22.03		Now it turns out that there are probably thousands of these transcription factors,
00:12:26.23		because the number of genes that we have to control,
00:12:29.25		as I showed you, is on the order of 20,000 or 25,000 genes.
00:12:33.18		And so it turns out that you need a pretty large percentage of the genome devoted to
00:12:38.23		encoding these regulatory proteins, in order for a complex organism like ourselves to survive.
00:12:45.09		The other component of this, let's call it the "transcriptional apparatus,"
00:12:49.16		is of course the enzyme that catalyzes RNA synthesis, and I already told you that this enzyme
00:12:55.04		on its own can't tell the difference between random DNA sequence and a gene or a promoter.
00:13:01.16		These other sequence-specific DNA-binding proteins are the ones that
00:13:06.17		must recruit or otherwise direct RNA polymerase to essentially land on the right place and
00:13:14.03		at the right time in the genome, to turn on a certain subset of genes that are specifically required
00:13:20.26		in a specialized cell type, whatever cell you happen to be looking at.
00:13:26.16		So, that is kind of the first level of complexity of informational interactions between
00:13:33.05		the transcription factors and the more ubiquitous,
00:13:38.20		I would call it promiscuous, RNA polymerase II enzyme.
00:13:43.19		Well, as it turns out, it took several decades to work out most,
00:13:49.18		if not all, of the components of this so-called transcriptional machinery.
00:13:56.14		And it turns out, in this slide I'm showing you things are already starting to get more complicated,
00:14:03.05		so not only do you have RNA polymerase, but you have a bunch of other proteins that
00:14:07.04		go by names like TFIIA, B, D, E, H, F, and so forth.
00:14:13.12		So, it looks like there are going to be many, many proteins that
00:14:17.14		are necessary to form the transcriptional apparatus.
00:14:22.04		And then on top of that, you need sequence-specific DNA-binding proteins,
00:14:26.13		which I already described to you, to further inform or otherwise regulate the process of when
00:14:33.04		a particular RNA polymerase molecule should be binding to a particular gene.
00:14:37.29		So that's the sort of overview.
00:14:40.05		Now let me get into the specifics of how we actually discovered the family of proteins,
00:14:45.15		and it'll be interesting for you to see how science in this field evolved.
00:14:51.14		Now, as is often the case, when you first try to tackle a very complex problem,
00:14:56.28		and of course we didn't really know how complex it was when we began these studies,
00:15:00.24		but we assumed it might be complicated.
00:15:03.16		Certainly it would be more complicated than systems that we already had some idea about,
00:15:09.06		for example in bacteria or bacteriophages.
00:15:13.24		We took a lesson from our studies of bacteriophages and decided that,
00:15:18.11		to begin to dissect the molecular complexities of
00:15:22.09		the transcription process in animal cells, we should start with viruses,
00:15:26.28		because we knew that viruses will enter these host cells, these complex cells that
00:15:31.27		we ultimately want to study, and have to use the same
00:15:35.17		molecular machinery to transcribe their genes as the host mammalian cell would do.
00:15:41.27		So, this was kind of a trick or a way to look at a
00:15:45.28		molecular window into a complex system and try to simplify it.
00:15:49.22		And in our case, the early studies of the late 70s and early 80s
00:15:54.18		involved a very simple, one of the simplest double-stranded DNA viruses, called "Simian virus 40."
00:16:00.28		And Simian virus 40 is of course a monkey virus, which was nice because it's very close to humans,
00:16:06.15		and many things that we could learn about the way this virus uses its host, which are monkey cells,
00:16:12.12		to replicate and to express its RNAs and genes, would be applicable to our studies of humans, as you'll see.
00:16:20.11		And this virus was one of the first whose DNA,
00:16:24.07		its double-stranded DNA of about 5000 base pairs, was fully sequenced.
00:16:29.14		This was long before rapid, modern-day sequencing was available,
00:16:33.12		so this gave us a very powerful tool.
00:16:35.20		It basically allowed us to look at the entire genome of this virus,
00:16:39.03		which was tiny by comparison, only 5243 base pairs.
00:16:44.28		But just that information was already very important
00:16:47.27		because it very quickly allowed us, for example, to map where the genes are,
00:16:53.02		and one of the genes encoded a protein called the "Tumor antigen"
00:16:57.19		which turns out to be a transcription factor.
00:17:00.08		This then allowed us to get our hands (basically, to do biochemistry and genetics)
00:17:06.04		on the very first eukaryotic transcription factor.
00:17:09.14		Which in this case happens to be a repressor;
00:17:12.09		that is, a protein that, when it binds to DNA, just the same way as I showed you for the model case,
00:17:19.24		binds through specific protein-DNA interactions,
00:17:23.28		but in this case actually shuts transcription down rather than turning it up.
00:17:29.19		In the process of studying the way that this little virus, when it infects a mammalian cell,
00:17:36.15		uses proteins like T-antigen to regulate its gene expression,
00:17:42.17		it became clear that it had to use the host machinery to do the process.
00:17:47.22		And that meant that there must be monkey proteins that
00:17:53.00		are also involved in activating or repressing genes of this virus,
00:17:57.19		and this then led us to the most important step,
00:18:00.19		which is to transfer the technology we learned about viruses
00:18:04.06		and how to work with the virus transcription factor, like T-antigen,
00:18:07.26		to the cellular ones. And I'm going to give you just one example
00:18:11.05		of how the simple jump into the host cell allowed us to discover the first human transcription factor.
00:18:17.28		So, the question we then asked back in the early 1980s was:
00:18:23.17		What host molecule is regulating the expression of transcription of this virus, when the virus is in the host?
00:18:31.10		And we knew from the DNA sequence of the virus that
00:18:34.02		there were these six very GC-rich snippets of DNA that were regulatory,
00:18:41.05		because if we deleted them, the virus no longer would express the gene of interest.
00:18:45.20		So we knew that something was probably responsible for recognizing these GC boxes,
00:18:51.10		and we knew that it wasn't a virally encoded gene because we had tested
00:18:54.29		all of the viral genes, of which there were only six to begin with.
00:18:59.05		So we knew it had to be a host gene, and that led us to a whole,
00:19:04.06		I would say "family," of experiments that
00:19:06.20		led to the discovery of sequence-specific mammalian transcription factors.
00:19:10.25		And, as I said, we could've taken multiple approaches
00:19:13.25		to try to address this complicated issue.
00:19:16.26		I'll just give you one example of using in vitro biochemistry
00:19:20.29		to finally get our hands on this key, sequence-specific human transcription factor,
00:19:27.12		which of course has a homolog in the monkey.
00:19:31.29		And the way we did it was very interesting and simple in retrospect,
00:19:37.02		and that is recognizing the fact that whatever this protein was,
00:19:41.24		it had to have the property of recognizing those GC boxes that were sitting next to the viral gene.
00:19:49.22		We assumed that it must be a sequence-specific DNA-binding protein,
00:19:53.09		so all we had to do was figure out a way to extract proteins from human cells or monkey cells,
00:20:01.05		and then try to fish out those specific proteins out of the many thousands
00:20:05.26		of different proteins that were in this gemisch of cellular extract that would be responsible for
00:20:11.12		discriminating between random DNA sequences and the specific GC box.
00:20:17.15		And I'll quickly run through sort of the logic behind this.
00:20:20.19		So what I'm showing you here is a solid surface with DNA coupled to it that is
00:20:28.00		highly enriched for the recognition element, the GC box,
00:20:31.14		which should be the sequence recognized by the protein of interest.
00:20:35.14		Now, we had no idea what this protein was going to look like,
00:20:37.18		how many proteins there were going to be, and so forth,
00:20:39.20		but we knew it had to recognize the GC box.
00:20:42.05		So, we're going to try to fish this out of a pool of many thousands of other proteins.
00:20:47.18		Now, the key trick here was that,
00:20:49.28		because all cell extracts contain not only one DNA-binding protein but, as I told you,
00:20:55.25		thousands of different DNA-binding proteins, most of them, or in fact, in our case,
00:21:00.17		none of the other several hundred to a thousand proteins that could bind DNA, actually happened to recognize the GC box.
00:21:08.27		They just bind other DNA sequences.
00:21:11.07		So to kind of favor our protein being able to bind to our GC box and
00:21:16.07		not have to compete with all the other proteins, what we did was
00:21:20.22		to add nonspecific DNA in massive stoichiometric excess, so that all the other proteins that
00:21:28.21		wouldn't recognize the GC box would still have some partner to hang onto.
00:21:33.11		And this trick worked very well.
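Why excess nonspecific DNA helps can be sketched with a simple equilibrium competition model. The Python example below is a toy calculation under stated assumptions: DNA sites are in large excess over protein (so free site concentrations equal the totals), each protein has one dissociation constant for the column DNA and one for the competitor DNA, and every number is hypothetical rather than taken from the actual purification.

```python
# Toy competition model for DNA-affinity chromatography (all values hypothetical).

def fraction_on_column(kd_column: float, kd_competitor: float,
                       column_sites: float, competitor_sites: float) -> float:
    """Equilibrium fraction of a protein bound to the column DNA.

    Assumes DNA sites greatly outnumber protein molecules, so free site
    concentrations equal total concentrations. All inputs share one unit (e.g. nM).
    """
    on_column = column_sites / kd_column
    on_competitor = competitor_sites / kd_competitor
    return on_column / (1.0 + on_column + on_competitor)


if __name__ == "__main__":
    column, competitor = 100.0, 10_000.0  # 100-fold excess of nonspecific DNA

    # A GC-box binder: tight on the column DNA, weak on the competitor.
    specific = fraction_on_column(1.0, 1000.0, column, competitor)
    # A generic DNA binder: the same affinity for any DNA.
    generic = fraction_on_column(100.0, 100.0, column, competitor)
    generic_no_competitor = fraction_on_column(100.0, 100.0, column, 0.0)

    print(f"GC-box binder retained: {specific:.2f}")
    print(f"generic binder retained: {generic:.2f}")
    print(f"generic binder, no competitor: {generic_no_competitor:.2f}")
```

With these made-up numbers, the competitor drops retention of the generic binder from about half to about 1%, while the GC-box binder stays about 90% bound. That is the qualitative effect the trick relies on.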
00:21:35.00		So, having the specific DNA on the solid resin and the nonspecific DNA flowing all over the place,
00:21:43.23		we could capture selectively the pink molecules here, which were the GC box recognition ones,
00:21:50.07		and the blue-green molecules of course predominantly bind to nonspecific DNA.
00:21:56.02		I show you one little blue one on the column because nothing works perfectly in real science
00:22:01.10		and it tells you that we have to go through this process iteratively to finally obtain
00:22:06.21		a preparation that's purely pink molecules with no green-blue ones.
00:22:11.18		Well, that turned out to work very, very well,
00:22:14.13		and that whole process of biochemical fractionation followed by a direct affinity, sequence-specific DNA resin
00:22:23.21		gave us the ability to perform a biochemical purification, followed by a molecular cloning,
00:22:30.12		of the gene that encodes the transcription factor Sp1.
00:22:35.03		And then we carried out a bunch of experiments, which I'll tell you about next,
00:22:38.20		to show that this protein actually does activate transcription.
00:22:43.16		And of course, we went back and we proved that this protein, which turned out to be a rather large polypeptide,
00:22:49.08		can indeed recognize the GC box, and it doesn't matter if it's a GC box from the SV40 genome
00:22:55.20		or any other GC box that we could find in the human genome.
00:22:59.16		It would find that sequence and bind to it,
00:23:02.12		and then it would generally activate transcription.
00:23:05.22		So this led to the discovery of the first of a
00:23:09.03		very large family of sequence-specific DNA-binding proteins.
00:23:13.23		Now, I told you that the way these proteins tend to recognize short DNA sequences
00:23:19.08		is to interact with DNA through the major groove, and here is a perfect example.
00:23:23.12		So the stick, blue model there shows the actual three structures
00:23:28.17		that are called "zinc fingers," and the reason they're called zinc fingers is because
00:23:31.29		there are amino acids that are organized around a center that contains a zinc ion,
00:23:37.27		which holds the three-dimensional shape of the polypeptide in a position just right for
00:23:43.23		fitting into the major groove of DNA, and the DNA here is shown in pink.
00:23:47.24		And you can see that that blue outline fits right into the major groove of the DNA, but not to the minor groove.
00:23:54.25		And one of the most important findings was not only the discovery of the first human transcription factor,
00:24:00.27		but the realization that most if not all sequence-specific DNA-binding transcription factors
00:24:06.18		have a similar structural motif. That is to say,
00:24:10.19		some structure is built to recognize sequences in the major groove of DNA,
00:24:15.24		and these three-dimensional motifs are recognizable as amino acid sequences in the genome.
00:24:23.27		So we can now much more quickly scan the entire sequence of a genome and
00:24:28.24		identify genes that are likely to be DNA-binding proteins,
00:24:31.28		as a result of understanding the structure-function relationships of these DNA-binding motifs, like zinc fingers.
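Scanning sequences for a DNA-binding motif can be sketched with a regular expression. The Python example below uses a simplified spacing pattern for the classical Cys2-His2 zinc finger (Cys, 2 to 4 residues, Cys, 12 residues, His, 3 to 5 residues, His). That pattern is an assumption chosen for illustration and is looser than curated motif definitions; the peptides are synthetic, not real protein sequences, and the function name is hypothetical.

```python
import re

# Simplified C2H2 zinc-finger spacing pattern (an illustrative assumption):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def count_c2h2_fingers(protein: str) -> int:
    """Count non-overlapping matches of the simplified C2H2 pattern."""
    return len(C2H2_PATTERN.findall(protein.upper()))


if __name__ == "__main__":
    # Synthetic peptide built to satisfy the spacing rule; alanine is filler.
    finger = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
    three_fingers = "GGGGG".join([finger] * 3)

    print(count_c2h2_fingers(three_fingers))  # three tandem fingers
    print(count_c2h2_fingers("MKTAYIAKQR"))   # no cysteines, so no match
```

A pattern this loose will produce false positives on real proteomes, so in practice a hit would only flag a candidate DNA-binding protein for closer study.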
00:24:40.03		So, what I'd like to show you now is that I've only introduced you to one class of transcription factors,
00:24:48.08		which are the sequence-specific DNA-binding proteins.
00:24:51.07		Well, I think I gave you a little taste of the level of complexity that's probably going to be needed to be able to
00:24:58.11		build the machine that's ultimately going to be able to allow you to transcribe every gene in every cell of the human body.
00:25:07.08		So that turns out to be a much more elaborated machine than what I just showed you.
00:25:12.05		So I want to show you now what is sort of our state-of-the-art thinking about
00:25:17.18		what is actually needed to build the machinery at a gene to allow it to be expressed and transcribed.
00:25:25.16		And the term I want to introduce you to is the "pre-initiation complex,"
00:25:30.27		and it's pretty much what it says.
00:25:33.10		It's the complex of multiple subunits that has to essentially land on the promoter of a gene,
00:25:40.28		which will be designated for later expression.
00:25:45.14		And this is a process that is probably quite orderly;
00:25:50.01		that is, there's an order of events that happens, which we,
00:25:52.28		by the way, are not entirely sure exactly what the order is or
00:25:56.18		even if the order is the same from one gene to the next.
00:25:59.12		But we can kind of see where it starts and where it ends up,
00:26:02.07		and the pathway in between, I would say, is still a little bit murky.
00:26:06.18		And the story here, again, starts with a little snippet of DNA called the TATA box,
00:26:11.13		which I already introduced you to briefly.
00:26:13.21		It's an AT-rich sequence which sits at the 5' end,
00:26:18.22		or the beginning, of many genes, but not all genes...
00:26:21.01		maybe 20% of the genes might contain this AT-rich region.
00:26:26.15		And that AT sequence is the signal or landmark, if you like,
00:26:31.16		for a particular protein to bind to it.
00:26:34.07		And that protein is called, not surprisingly,
00:26:36.26		the "TATA-binding protein," because it's the TATA sequence.
00:26:40.11		And so this represents a second class of transcription factors.
00:26:45.08		These are not the type that I just introduced you to, which are going to be different for every gene.
00:26:50.22		The TATA sequence is present in a very large number of genes, so it can't be gene-specific.
00:26:56.22		But it turns out to be very crucial for our understanding of how gene regulation works.
00:27:02.06		So, you start with a TATA-binding protein finding a TATA box.
00:27:08.01		We later found out that the TATA-binding protein rarely functions on its own
00:27:12.07		and has a bunch of friends that we call "TAFs," for "TBP-associated factors."
00:27:17.03		And now you're talking about an assembly, a multi-subunit complex of almost a million daltons.
00:27:23.12		There are somewhere between 12 to 15 subunits in addition to the TATA-binding protein that make up this
00:27:29.05		little complex of proteins that kind of travels around together, and this is found in most cell types.
00:27:36.00		And later on I'll show you in a subsequent lecture that not every cell type might have
00:27:40.29		exactly the same complement of these subunits,
00:27:43.17		but many of them have this prototypic complex.
00:27:47.27		Is this enough for building the pre-initiation complex?
00:27:52.06		Unfortunately not. It turns out that there are a host of other,
00:27:57.21		I'll call them "ancillary factors," in addition to the multi-subunit RNA polymerase itself,
00:28:03.12		that are necessary for you to build up an assembly that is necessary to form an active,
00:28:12.06		ready-to-activate transcriptional pre-initiation complex, or the PIC.
00:28:18.27		This is kind of the picture we're getting to, and even this picture,
00:28:24.29		with many, many colors and many, many different polypeptides,
00:28:28.18		that add up to probably greater than 85 individual proteins that all have to kind of fit together like a jigsaw puzzle...
00:28:37.00		it's probably not even the whole story.
00:28:39.17		You'll notice I still have one big, red question mark there,
00:28:42.24		because I think, as we begin to study specific cell types and specific processes,
00:28:48.09		like embryonic development or germ layer formation, additional components that are not present here
00:28:55.19		in this prototypic pre-initiation complex will come into play.
00:29:00.14		And that's a subject of a subsequent lecture.
00:29:03.15		But already you can tell that the transcriptional machinery is anything but simple.
00:29:09.23		So, can we get a better idea of what transcription might actually look like?
00:29:15.19		What's happening when a transcription process takes place?
00:29:18.20		So, let me first of all say that I'm going to finish my lecture now with a little cartoon,
00:29:24.15		which is our attempt to imagine the events that take place when you form a pre-initiation complex,
00:29:33.01		you bring regulatory proteins to the activated gene, and what happens during this process.
00:29:40.09		Now, keep in mind that this is, at this point, mostly a cartoon that is in our imagination,
00:29:47.25		and only parts of this, if any, are probably real.
00:29:52.29		But it gives you a sense of the complexity of the transactions that have to
00:29:57.26		take place just for one gene to transcribe and express itself.
00:30:02.07		So let me show you the movie, and then we'll finish
00:30:05.28		just by keeping in mind that there is much to be learned, and in my next lecture,
00:30:11.09		we'll go into the selectivity of this process in specialized cell types.
00:30:16.20		So now let's see what this cartoon of transcription looks like.
00:30:22.08		So we start off with DNA with some pre-assembled TFIID molecule,
00:30:26.27		and along comes this other green molecule, which is actually a co-factor,
00:30:30.24		which then forms this very large complex with RNA polymerase.
00:30:34.06		And then a distal activator protein came in and activated the process,
00:30:39.10		and this molecule, this bluish molecule that's moved away from the complex,
00:30:46.18		is actually the RNA polymerase.
00:30:48.03		And that little, yellow, sort of a bead-on-a-string is actually the RNA product.
00:30:53.20		So that gives you a sense of:
00:30:56.01		Things have to happen quickly, and yet it involves many, many molecules having to assemble
00:31:01.02		and then disassemble for this reaction to happen.
00:31:04.06		And in my next lecture, we'll go into more specific aspects of this reaction,
00:31:10.13		and particularly during embryonic development and tissue-specific gene expression.