Bacteriophages: Genes and Genomes

Transcript of Part 2: Bacteriophages: Genomic insights.

00:00:01.02 Hi. My name is Graham Hatfull. I am a professor at the University of Pittsburgh
00:00:04.15 and a Howard Hughes Medical Institute professor.
00:00:07.03 In part two we are going to talk about some of the insights that we can gain from comparing
00:00:14.02 the genomes of bacteriophages and perhaps learn something about how they are constructed and how they have evolved.
00:00:22.03 In part one we saw some morphologies of bacteriophages, what they look like in the electron microscope,
00:00:32.29 and I showed you some different types of structures
00:00:37.23 that could arguably reflect different ways in which those bacteriophages have evolved.
00:00:44.27 But we have to be very careful about interpreting differences in virion morphology, what the viruses look like,
00:00:53.09 and their evolutionary relationships and how the genomes compare to each other.
00:00:58.09 I've illustrated that in this particular slide
00:01:01.00 where I have shown five examples of bacteriophages.
00:01:05.06 These would all be classified according to their long flexible tails
00:01:10.14 as being members of the Siphoviridae, the Sipho viruses,
00:01:15.03 each with their heads and their tails attached.
00:01:18.18 It might be tempting to look at these and say well they all look very similar to each other,
00:01:25.06 almost indistinguishable, perhaps they all are genetically similar.
00:01:30.11 In fact this is an example where these five share essentially little or no sequence similarity at the genomic level whatsoever.
00:01:40.01 So if we want to understand how genomes have evolved
00:01:44.00 and how they are related to each other from a phylogenetic perspective,
00:01:48.23 we need to go in, isolate the DNA, and sequence those genomes and then compare them.
00:01:54.24 There are various ways in which we can compare the genomic sequences.
00:02:00.22 We can compare them by looking at the similarities of the nucleotide sequences,
00:02:07.26 essentially sequencing the DNA or if it's RNA, the RNA,
00:02:13.05 and then comparing them one to another and seeing what is shared.
00:02:16.08 A second way of doing that would be to look at the genes
00:02:21.09 and compare them through the predicted amino acid sequence similarities of the proteins that are encoded by those genes.
00:02:28.22 Right here I am showing you an example of what it looks like if we take two bacteriophages
00:02:34.09 and compare their nucleotide sequences.
00:02:37.20 And this particular representation is referred to as a dot plot.
00:02:42.05 And what we have done is to take two bacteriophage genomes, in this case Fruitloop and Boomer,
00:02:49.10 and we have aligned the two sequences, and we are going to slide one next to the other computationally,
00:02:58.03 and ask if there are segments that are similar to each other within a particular window of comparison.
00:03:06.07 And every time we see sequence similarity, a dot is presented on this dot plot.
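The sliding-window comparison just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used to make the plot in the lecture: the function name `dot_plot`, the window size, and the match threshold are assumptions chosen for the example, and the toy sequences are invented.

```python
def dot_plot(seq_a, seq_b, window=10, min_matches=10):
    """Return (i, j) positions where a window of seq_a resembles a window of seq_b.

    A dot is recorded when at least `min_matches` of the `window` aligned
    bases are identical. Both parameters are illustrative assumptions.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical bases between the two windows.
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Invented toy sequences: they share the segment "GATTACA" at different offsets.
genome_a = "GATTACAGGC"
genome_b = "CCGATTACAT"
shared = dot_plot(genome_a, genome_b, window=5, min_matches=5)
```

Dots that fall along a diagonal, as they do for the shared segment here, correspond to the solid line segments in the plot. Lowering `min_matches` tolerates mismatches within a window and so picks up more diverged segments.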
00:03:12.11 And what you can see here is that
00:03:15.12 there's a rather complex series of relationships reflecting a quasi diagonal line
00:03:25.08 from the top left to the bottom right of this representation.
00:03:29.21 So where you can see a relatively solid line that means that there is a segment of DNA
00:03:35.24 which is substantially similar between the two.
00:03:38.23 Where you fail to see a line, such as in the top left hand corner, is a region where the two genomes
00:03:47.14 appear to be substantially dissimilar.
00:03:50.10 They don't have shared nucleotide sequences.
00:03:52.15 And then there's all sorts of complicated interruptions and shifts in the diagonal line as you look between these.
00:04:02.13 And this tells us an important aspect, a component, of what we see when we compare these types of genomes.
00:04:11.09 And that is, they are not simply completely similar from end to end or completely dissimilar from end to end.
00:04:18.20 But quite commonly we see these interrupted portions
00:04:22.22 where different segments of the genomes are related to each other in different ways
00:04:27.20 as though different parts of the genome have different evolutionary histories,
00:04:33.13 different ways of arriving in the genomes as we see them in Fruitloop and Boomer today.
00:04:40.14 So from this type of analysis and looking at a number of bacteriophage genomes,
00:04:47.08 we can see the following general conclusions.
00:04:51.15 First of all, the DNA that is isolated from these particular virions, these double stranded DNA virus types,
00:05:01.12 is linear. So the genomes have a left end and a right end.
00:05:08.23 They tend to form predominantly two types of groups that we can see when we look at the linear genomes.
00:05:17.06 There are those that have defined ends.
00:05:20.03 That means that if you isolate the molecules from a million particles
00:05:25.10 of a particular phage type, each of the million DNA molecules that you get out
00:05:30.08 has the same left and right ends. In other cases that is not true.
00:05:35.26 The DNAs have the same overall genetic constitution,
00:05:41.14 but the specific physical ends of the left and the right can be positioned in different places.
00:05:47.27 And therefore they are referred to as being circularly permuted.
00:05:52.18 They are not circular. They are linear,
00:05:54.21 but they represent different positions of the ends relative to the genetic information.
00:06:01.28 Often these viruses also contain terminal redundancies,
00:06:06.22 which means that one segment of the genome is duplicated at both ends.
00:06:11.19 And so these two major types of genomes that you see either have defined ends
00:06:17.27 or terminally redundant and circularly permuted ends
00:06:20.24 and there are other viruses that have different variations on these themes.
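The difference between defined ends and circularly permuted, terminally redundant ends can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each linear molecule is cut from a repeating copy of the genome at some starting position and is slightly longer than one genome length. The function names and the eight-letter "genome" are hypothetical.

```python
def cut_linear_molecule(genome, start, extra):
    """Cut a linear molecule of len(genome) + extra bases, beginning at `start`.

    The genome is read as if it repeated end to end (an assumption of this toy
    model), so the first `extra` bases are duplicated at the other end.
    """
    n = len(genome)
    return "".join(genome[(start + k) % n] for k in range(n + extra))


def is_terminally_redundant(molecule, extra):
    """True if the first `extra` bases are repeated at the right end."""
    return extra > 0 and molecule[:extra] == molecule[-extra:]


def is_circular_permutation(mol_a, mol_b):
    """True if two equal-length linear sequences are rotations of each other."""
    return len(mol_a) == len(mol_b) and mol_b in (mol_a + mol_a)


genome = "ABCDEFGH"                                       # each letter stands for a gene
particle_1 = cut_linear_molecule(genome, start=0, extra=2)  # "ABCDEFGHAB"
particle_2 = cut_linear_molecule(genome, start=3, extra=2)  # "DEFGHABCDE"
```

Both molecules are linear and carry every gene, but their physical ends sit at different places relative to the genetic map, and each repeats a short segment at both ends. A phage with defined ends would correspond to every particle using the same `start`.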
00:06:25.22 The sizes of bacteriophage genomes vary enormously.
00:06:31.05 There are those that are as small as perhaps 5000 bases, and there are those that are as large as 500 kilobases,
00:06:39.02 which is quite amazing when you think that 500 kilobases
00:06:44.22 is about the same size as the smallest of the free living bacterial genomes.
00:06:50.14 And so there are examples of viruses that are the same size genomically
00:06:55.00 and have the same or more genes as small bacterial genomes.
00:07:00.15 The phage genomes tend to be densely packed with genes.
00:07:05.19 And so most of the DNA is encoding genes.
00:07:10.16 And as I mentioned before in this section, the phages infecting bacteria from different genera
00:07:17.05 tend to be unrelated at the DNA level.
00:07:21.13 So this slide shows an example of what we see when we take a DNA sequence of a particular phage,
00:07:34.00 in this case it is a phage called Giles,
00:07:36.26 and we use computational approaches and a bioinformatic strategy to identify
00:07:44.07 the protein coding genes that are present within the virus.
00:07:49.02 And so the genome is largely filled with protein coding genes,
00:07:54.22 and they are shown here by these boxes, either colored or in white.
00:08:00.21 The genome is represented by what looks like this railroad track here
00:08:05.24 which has markers every kilobase and every 100 bases.
00:08:10.21 And the genome for Giles is linear with defined ends,
00:08:14.27 and so in this representation it begins in top left hand corner and goes to the bottom right hand corner,
00:08:22.14 and each of the genes is shown in these boxes, represented either above or below the DNA.
00:08:30.00 Genes that are shown above the DNA are transcribed in the rightwards direction,
00:08:36.03 coming this way, and those that are shown below such as a couple or three genes in the top left hand corner
00:08:44.23 are transcribed in the leftwards direction.
00:08:47.19 So those are the standards that we use for presenting the genes
00:08:52.25 and illustrating the direction that they are transcribed
00:08:56.17 relative to the overall genome structure.
00:08:59.19 You can see here from these genes that they are densely packed into this particular genome.
00:09:06.10 There are few non-coding spaces between the genes.
00:09:10.19 The genes essentially represent 95% or more of the genetic information that's available.
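A figure such as "95% or more" is a coding fraction: the share of the genome covered by at least one gene. A minimal sketch of that calculation follows, assuming gene coordinates are given as 1-based inclusive (start, end) pairs. The coordinates in the example are invented and are not those of Giles.

```python
def coding_fraction(genome_length, genes):
    """Fraction of the genome covered by at least one gene.

    `genes` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping genes are counted once.
    """
    covered = 0
    last_end = 0  # right-most position already counted
    for start, end in sorted(genes):
        start = max(start, last_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / genome_length


# Hypothetical 1000 bp genome with three genes, two of which overlap slightly.
toy_genes = [(1, 300), (290, 600), (650, 1000)]
density = coding_fraction(1000, toy_genes)  # 951 of 1000 bases are coding
```

Strand is ignored here: a base counts as coding whether the gene covering it is transcribed rightwards or leftwards.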
00:09:19.12 In this particular representation we have colored the genes in such a way
00:09:26.12 as to reflect the relationships that some of these genes share with genes that you find in other bacteriophages.
00:09:34.27 The genes that are shown in white, and you can see some across the top here,
00:09:40.02 are simply genes for which we don't have any other close relatives in any of the databases.
00:09:46.03 And this illustrates the point that phages such as this can be replete with genes
00:09:52.11 that are not closely related to known genes
00:09:55.24 and for which we have rather little idea as to what they do.
00:09:59.04 I mentioned that when we compare the nucleotide sequences of phages we can see that it looks as though the parts
00:10:12.01 have evolved differently to each other.
00:10:15.00 And this leads to the idea that phage genomes are characteristically mosaic.
00:10:21.09 They are constructed architecturally from segments which have been put together in a particular way.
00:10:29.01 Modules if you like. And that each of these modules is in effect mobile
00:10:35.08 or can move around the population of bacteriophages
00:10:38.28 such that you can find it in more than one or perhaps several different genomic contexts.
00:10:45.27 And this slide illustrates how this might look when you see mosaicism
00:10:52.15 at the level of nucleotide sequence comparisons.
00:10:56.00 So this is showing a small segment of three phage genomes.
00:11:01.19 PG1 is at the top, Rosebush is in the middle, and Qyrzula is towards the bottom.
00:11:08.17 You can see the genome represented by the markers in the railroad tracks for each of these.
00:11:14.19 The genes that are encoded are shown by the color boxes with their gene names inside the boxes,
00:11:19.11 and where these genomes contain and share nucleotide sequence similarity
00:11:25.04 there is a color coded area shading between the two such as you can see here.
00:11:32.01 Now Rosebush and Qyrzula have very evident and strong nucleotide sequence similarity
00:11:39.22 both in the left part here and over here in the right part as well.
00:11:44.09 PG1 shows no sequence similarity to Rosebush or Qyrzula that is evident by this comparison,
00:11:53.22 in this example, because there is no color shading over on this left part.
00:11:58.29 Nonetheless, in this middle segment things are different.
00:12:04.23 There appears to be very little sequence similarity between Rosebush and Qyrzula
00:12:10.16 because there is no shading in that area.
00:12:15.22 However, when we compare PG1 and Rosebush we can see in this central segment right here
00:12:22.22 that there is indeed a purple color shading that reflects strong sequence similarity
00:12:29.17 between these two genomes, PG1 and Rosebush, in this center portion.
00:12:35.04 So this is really important because it illustrates an example where the different segments of these genomes
00:12:42.09 particularly Rosebush appear to have had different evolutionary histories.
00:12:47.09 They've come from different places.
00:12:49.14 This segment that's in the middle of Rosebush clearly did not come from the same place as Qyrzula.
00:12:55.02 It appears to have come from a common ancestor which had more in common in this region with PG1.
00:13:02.15 So this is a good example of mosaicism, a key architectural feature of bacteriophage genomes.
00:13:08.26 When you look at the nucleotide sequence level you can see precisely where these types of events occur,
00:13:18.27 at the boundaries that must reflect where recombination occurred to give you this exchange of information.
00:13:26.08 And in this particular slide I am showing the detailed information of two genomes.
00:13:33.28 The one at the top here you can see the sequences,
00:13:36.27 and in blue the amino acid sequences of the predicted genes in that region.
00:13:41.18 In the bottom you can see a second genome that we are comparing.
00:13:46.00 And this red shading over on the right hand side is
00:13:50.28 simply reflecting a segment where these genomes are closely related.
00:13:54.22 The nucleotide sequences, the DNAs are extremely similar if not identical in this red part,
00:14:03.09 but over here, they are completely different. They are completely dissimilar.
00:14:08.07 And so the key point that you can see from this type of comparison is that this module boundary, this junction
00:14:15.20 between the red and the white parts where recombination must have happened,
00:14:21.13 this module boundary, corresponds precisely to the boundaries of the genes.
00:14:27.16 It is this boundary which is where this gene starts up here, and its comparable gene begins down here.
00:14:38.04 These genes to the left are very different, and to the right they are identical.
00:14:42.23 So the module boundary, or the recombinant joint which must have brought these together
00:14:48.15 coincides with the gene boundaries themselves.
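One way to check whether a module boundary coincides with a gene boundary is to locate the point where two aligned sequences switch from dissimilar to identical and compare it with the annotated gene starts. The sketch below assumes the two sequences are already aligned and of equal length. The sequences, the gene coordinate, and the function names are invented for illustration.

```python
def identical_region_start(seq_a, seq_b):
    """Index (0-based) where the identical run reaching the right end begins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    i = len(seq_a)
    # Walk leftwards from the right end while the bases still agree.
    while i > 0 and seq_a[i - 1] == seq_b[i - 1]:
        i -= 1
    return i


def boundary_at_gene_start(boundary, gene_starts, tolerance=0):
    """True if the boundary lies within `tolerance` bases of any gene start."""
    return any(abs(boundary - start) <= tolerance for start in gene_starts)


# Invented example: unrelated sequence on the left, a shared gene on the right.
top_genome = "TTGACCA" + "ATGGCTAAAGG"
bottom_genome = "CAGTTGG" + "ATGGCTAAAGG"
boundary = identical_region_start(top_genome, bottom_genome)    # 7
coincides = boundary_at_gene_start(boundary, gene_starts=[7])   # True
```

With real data, unrelated bases next to the junction can match by chance, so a small `tolerance` may be a reasonable allowance.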
00:14:52.25 And this is a common and important observation and it helps us to think about how mosaicism can be generated.
00:15:01.14 And there are two fundamental models.
00:15:03.05 The first is that recombination happens at targeted, short, conserved boundary sequences.
00:15:13.06 The idea that there are some short conserved segments of sequences,
00:15:17.00 perhaps a dozen or a couple of dozen nucleotides in length
00:15:19.23 which correspond to those boundary regions.
00:15:23.05 And that homologous recombination perhaps encoded by host enzymes
00:15:26.26 catalyzes exchange at that region in order to promote recombination
00:15:32.15 at places where genes themselves in their entirety get exchanged.
00:15:36.22 There are some examples of that that have been reported in the literature.
00:15:41.29 So this is certainly an event that can happen.
00:15:45.05 We think however that it is more likely that much of the mosaicism that you see
00:15:51.02 because it is this pervasive feature throughout phage genomes
00:15:55.07 can occur by an alternative mechanism which is by illegitimate recombination
00:15:59.19 at what are essentially randomly chosen sequences.
00:16:04.04 In other words, that even though we see a close correspondence
00:16:08.08 between the point of recombination and the gene boundaries,
00:16:12.03 this does not result by this model from targeted exchange at that point.
00:16:19.03 Rather that the exchange positions are random
00:16:22.28 and the reason why that correspondence occurs is because of
00:16:26.12 selection for gene function for those genes that can actually work.
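The argument that random exchange plus selection can still produce boundaries at gene edges can be illustrated with a toy simulation. It assumes, as a deliberate simplification, that a recombinant survives only if the join does not interrupt any gene. The gene layout, trial count, and function name are hypothetical.

```python
import random


def surviving_join_positions(genome_length, genes, trials, seed=0):
    """Pick random join positions and keep those that do not interrupt a gene.

    A join at position p falls between base p - 1 and base p (0-based).
    `genes` holds (start, end) pairs, 0-based and inclusive. The survival rule
    is a toy stand-in for selection for gene function.
    """
    rng = random.Random(seed)
    survivors = []
    for _ in range(trials):
        p = rng.randrange(genome_length)
        interrupts_gene = any(start < p <= end for start, end in genes)
        if not interrupts_gene:
            survivors.append(p)
    return survivors


# Hypothetical genome: one gene covering 95 of 100 bases.
survivors = surviving_join_positions(100, [(0, 94)], trials=1000)
```

Every join position is chosen at random, yet the survivors all lie at or outside the gene's edges, and most trials leave no survivor at all. That is the sense in which the process is both mostly destructive and, after selection, apparently targeted.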
00:16:32.03 And so this just illustrates the different types of examples of recombination.
00:16:39.08 In the top panel one could imagine the targeted recombination, targeted homologous recombination,
00:16:45.03 could occur at these short black segments. Short segments of DNA that are conserved at gene boundaries
00:16:53.28 in order to give you these exchange events in these recombinants.
00:16:57.08 This middle panel here shows an example of illegitimate recombination
00:17:02.06 where recombination has essentially happened anywhere.
00:17:05.20 It has happened between sequences that are not related to each other.
00:17:09.02 And you get whatever gobbledygook may arise from just a random exchange in the process.
00:17:15.15 And at the bottom here I want to emphasize that we do expect recombination to occur between shared sequences
00:17:25.04 such as whole genes that are shared.
00:17:27.01 Homologous recombination of this sort always happens,
00:17:31.19 and it gives you new combinations of flanking genes,
00:17:34.28 such as A now joined together with C.
00:17:38.03 Ok. So homologous recombination is always going to play a role in reassorting
00:17:43.07 the types of genes that can be present in the modules.
00:17:47.03 But homologous recombination of this general type does not generate new recombinant boundaries,
00:17:55.18 new module boundaries unless it is in this targeted approach.
00:18:00.27 So as I mentioned we think that whereas there are a small number of examples
00:18:06.26 that would support the exchange of boundary sequences,
00:18:11.02 by far the majority of the boundaries that we see when we compare phage genomes
00:18:16.13 show no evidence of such boundary sequences, lending support to the idea that illegitimate recombination
00:18:23.27 is playing a key role. But there are some really important consequences
00:18:28.26 that we have to think about as a model for illegitimate recombination in this process.
00:18:33.16 First of all, illegitimate recombination, recombination between sequences
00:18:38.15 that don't share anything or very little in common,
00:18:41.11 is likely to happen at rather low frequencies.
00:18:45.02 It is going to occur at random positions, for the most part,
00:18:49.26 and that when you put together two pieces of DNA randomly,
00:18:55.00 for the most part it is just going to generate genomic garbage.
00:18:59.17 Material which may not have a genome of the appropriate length,
00:19:04.24 and will have lost some genes and is liable to be non-functional.
00:19:11.03 So in its essence we can think of it as a rather disruptive or destructive type of process.
00:19:19.21 And one can imagine that if this was going to play an important role,
00:19:24.24 that you would probably need multiple low frequency events in order to actually generate survivors,
00:19:34.07 the phoenix that can rise from the ashes with a full complement
00:19:37.04 of functional sequences that can function as a virus.
00:19:41.27 If sequences are going to recombine randomly with each other
00:19:49.07 then there is no necessity to think of these events as being predominantly involving two phage genomes.
00:19:58.12 The bacterial chromosome is about a hundred times the size of an average bacteriophage genome,
00:20:03.22 and therefore there is going to be a strong propensity or at least an opportunity
00:20:08.05 for the phage genome to recombine with the bacterial chromosome.
00:20:13.21 The process we can think of as being one that is infrequent and yet extremely creative.
00:20:22.08 This is the way in which you can take pieces of DNA
00:20:26.08 and put them together in a way which has perhaps never been seen before in nature.
00:20:32.10 That's a way of making new genes, or perhaps putting domains together in novel combinations,
00:20:40.06 and generating new types of functions which perhaps have not been seen in nature before.
00:20:47.08 And so this fits in very much with our model as described by Darwin for the process of the origin of species,
00:20:58.09 where we can think of the variation being generated by these illegitimate recombination events
00:21:05.26 and then natural selection working on what is essentially this garbage
00:21:11.14 in order to select from that those components that work.
00:21:16.15 Even though we would think of this as being a very low frequency event,
00:21:21.25 requiring infrequent recombination events and multiple numbers of them, it is nonetheless creative.
00:21:30.27 And as we saw previously, phages are likely to have been evolving for many, many years
00:21:40.19 in a very dynamic population very successfully.
00:21:44.26 So this will give us these recombinant joints.
00:21:49.09 These recombinant joints once they are formed are likely to be stably maintained.
00:21:53.07 There's no mechanism necessarily for undoing them and therefore
00:21:57.14 these survive as we see today as the fossilized relics of recombination events
00:22:02.29 that probably happened many hundreds of millions or even billions of years ago.
00:22:08.27 And thinking about the mechanisms by which this might happen,
00:22:12.00 it's been shown that many bacteriophages encode recombinase enzymes
00:22:19.25 which have the capability to recombine genomes at least at very short sequences
00:22:26.23 that don't have to be completely identical to each other,
00:22:32.00 raising the interesting possibility that bacteriophages actually encode their own machinery
00:22:36.20 that can facilitate this type of recombination,
00:22:40.27 and indeed the generation of the mosaic genomes as we see them.
00:22:44.10 Comparing bacteriophages that are very different in their sequences to each other
00:22:53.03 is quite limiting, and this shows us that if we really want to learn more
00:23:00.14 about the details about how mosaicism is created and how it works,
00:23:04.29 we really have to think about, and very carefully, about what types of genomes we want to compare with each other.
00:23:12.04 And we will see an example of that in part three.
00:23:15.09 So we can conclude then from this genomic comparison of phages
00:23:24.14 that phage genomes are architecturally mosaic.
00:23:28.15 That mosaicism is fueled by this process of illegitimate recombination.
00:23:32.27 And that genome segments can eventually be reassorted by homologous recombination
00:23:39.13 once new joints between new genes are generated to form that mosaicism.
00:23:45.09 In part three, we'll look at a rather particular case
00:23:49.14 of the detailed analysis of bacteriophages that infect one particular common host
00:23:54.27 where all those bacteriophages can be argued
00:23:58.00 to be potentially in genetic communication with each other,
00:24:01.21 and we can therefore explore what they look like
00:24:04.06 and the insights that they can give us in bacteriophage evolution.