Bacteriophages: Genes and Genomes

Transcript of Part 3: Mycobacteriophage genomics

00:00:01.00		Hello. My name is Graham Hatfull.
00:00:03.08		I'm a professor at the University of Pittsburgh
00:00:05.23		and a Howard Hughes Medical Institute professor.
00:00:08.21		Today we are talking about bacteriophages, their genes and their genomes,
00:00:12.23		and in part three we are going to focus in on a comparative analysis
00:00:17.00		of a particular type of bacteriophages. These are the mycobacteriophages, phages that infect mycobacterial hosts.
00:00:26.09		And so I should explain why we would want to choose phages of a particular host.
00:00:37.00		And indeed, why we would want to focus on this particular group.
00:00:40.27		So, perhaps one of the most important aspects is that phages
00:00:46.10		that infect very different bacteria tend to be very unrelated to each other.
00:00:51.04		And therefore there is not much to be learned about the detailed mechanisms
00:00:56.02		of phage evolution by comparing them.
00:00:59.13		They are so different there is little to be learned.
00:01:02.23		On the other hand if we were to focus on the phages that infect a common bacterial host,
00:01:08.01		then we would argue that they must all be in some way
00:01:12.17		in genetic, at least potentially, in genetic communication with each other.
00:01:18.22		And then comes the question as to well which bacteria host should we use
00:01:22.13		in order to isolate and characterize these viruses?
00:01:27.06		And there are of course many bacteria to choose from.
00:01:32.05		If we had to think of them as ones that would be the most useful, the most interesting,
00:01:36.20		we might want to think about focusing on some bacterial pathogens.
00:01:40.17		Or alternatively bacteria that are important for other criteria.
00:01:46.15		Environmentally important, or other key aspects of their biology.
00:01:53.17		So we focused on the mycobacteriophages.
00:01:58.07		And in part because we think that the mycobacterial hosts are of sufficient importance
00:02:06.00		that they really warrant taking advantage of the viral systems that we could develop.
00:02:13.10		Not just for understanding the viruses, but for understanding the hosts that they infect.
00:02:18.17		And so I'll mention two bacterial species within this genus.
00:02:26.22		One is Mycobacterium tuberculosis, which is the causative agent of human TB.
00:02:34.19		And I'll mention a relative of Mycobacterium tuberculosis, which is called Mycobacterium smegmatis,
00:02:41.10		and this is important because it is a very helpful surrogate for us to use in the lab.
00:02:47.07		Mycobacterium tuberculosis we can grow in the lab,
00:02:51.18		but we have to be very cautious and careful with it for two reasons.
00:02:55.25		Primarily because it is a rather nasty bacterial pathogen,
00:03:01.22		and we certainly don't want any of us working in the lab to be infected with that organism.
00:03:08.19		But it has another feature that somewhat complicates its growth and manipulation in the lab.
00:03:13.22		And that is that it grows extremely slowly. It has a doubling time of about 24 hours.
00:03:18.19		So it takes a day to go from one cell to two cells with Mycobacterium tuberculosis.
00:03:24.28		That makes research pretty slow going on M. tb.,
00:03:31.23		but you also have to be very careful about sterility and your aseptic technique
00:03:36.18		because almost everything out there grows faster than Mycobacterium tuberculosis,
00:03:41.24		and if you are not careful, you will end up growing that rather than M. tb.
00:03:46.09		Mycobacterium smegmatis, in contrast, is a non-pathogen.
00:03:51.20		It does not cause disease in healthy adult human beings,
00:03:56.20		and it grows relatively quickly. It has a doubling time of about three hours,
00:04:01.29		which means that we can grow a lawn, a smooth lawn, of Mycobacterium smegmatis
00:04:05.29		on Petri dishes in about 24 hours, and we can grow individual colonies in three to four days.
00:04:14.02		Mycobacterium tuberculosis is actually a very serious and important human pathogen.
00:04:21.13		About two million people a year die from Mycobacterium tuberculosis infections, from TB.
00:04:31.02		And it is estimated that Mycobacterium tuberculosis kills more people
00:04:35.27		in the world than any other single, infectious agent.
00:04:39.05		Many people that are infected with the organism actually don't get disease
00:04:45.09		because the bacterium establishes a latent infection and doesn't cause health problems.
00:04:53.20		Although, it can do either with old age,
00:04:58.15		or with a compromise of your immune system, such as for example with HIV infection.
00:05:04.29		Not only is tuberculosis a very prevalent disease,
00:05:10.24		but there is a growing and widespread concern
00:05:13.22		about drug-resistant strains of Mycobacterium tuberculosis,
00:05:17.27		that are either difficult to treat or effectively untreatable.
00:05:23.18		There's clearly a need for new strategies for diagnosis, prevention, and cure of tuberculosis.
00:05:32.02		And so we think these are good reasons to focus on the phages
00:05:36.20		that infect these organisms in the hope that they could contribute towards that specific cause.
00:05:43.25		And so this is an important point. The mycobacteriophages can really lead us in two directions.
00:05:51.10		They can tell us about viral diversity and the evolution of bacteriophages,
00:05:56.09		and at the same time they can provide tools for controlling TB
00:06:01.01		and in fact can provide elements that we need to manipulate TB to understand it and to work with it.
00:06:09.05		I am not going to focus here too much on the specific applications of the mycobacteriophages.
00:06:16.27		I thought that I would just mention one in passing,
00:06:19.28		which is the use of mycobacteriophages as a novel type of diagnostic system
00:06:24.28		in order to test whether a person is infected with TB
00:06:30.21		and indeed whether it is a drug resistant or a drug sensitive strain.
00:06:35.19		This is a strategy which was first described by my colleagues Bill Jacobs and Barry Bloom.
00:06:41.07		The idea is to make so called reporter mycobacteriophages,
00:06:45.26		recombinant phages that carry a gene that can report
00:06:50.27		and tell us about the metabolism of the mycobacterial cell.
00:06:57.04		So you can construct reporter phages that carry a gene
00:07:01.11		such as firefly luciferase that will make the bacteria emit light.
00:07:06.13		Or we can make reporter phages that carry green fluorescent protein from jellyfish
00:07:13.06		that when that is introduced by infection of the host, it makes the cell fluoresce.
00:07:18.12		And we can use these properties, fluorescence or light emission,
00:07:23.10		in order to then monitor what type of bacteria a particular patient is infected with.
00:07:30.18		And so this is an idea that I think shows considerable promise
00:07:33.05		and is currently undergoing further research and development.
00:07:37.20		If we want to compare the genomes of mycobacteriophages
00:07:48.13		in order to understand how they are related to each other,
00:07:52.06		how they've evolved, what their diversity is, well,
00:07:55.23		we need to have the mycobacteriophages in order to characterize.
00:08:00.08		And so we have gone out over the past few years
00:08:03.21		to isolate new mycobacteriophages and to genomically characterize them.
00:08:10.04		And whilst this has been a major focus in my laboratory,
00:08:14.17		this has also proven a very successful approach for both high school students
00:08:22.23		and undergraduate students to become involved in research endeavors
00:08:28.28		by going out and isolating new mycobacteriophages and sequencing them.
00:08:33.03		And now with the Howard Hughes Medical Institute science education alliance,
00:08:38.27		there are hundreds of students who are contributing to this cause,
00:08:42.16		and because of this we now have many new mycobacteriophages to characterize and to compare.
00:08:51.02		The process is relatively simple.
00:08:53.11		We start with a sample of soil or compost or wherever you might think to go
00:09:01.20		and look and to find out if there are some bacteriophages present.
00:09:04.18		The sample is mixed up with some liquid. The particulate matter is removed.
00:09:12.02		And we simply incubate some of that in the presence of our permissive bacterial host,
00:09:18.12		which is Mycobacterium smegmatis.
00:09:20.07		We lay those out on a Petri dish, as shown here, and we look for plaques,
00:09:26.02		for areas where a phage that was present in our original sample
00:09:32.09		has now infected these cells to form a plaque.
00:09:34.25		We can then pick an individual plaque, purify it, remove all of the other contaminants,
00:09:43.15		and we can propagate it in the laboratory until we have a high titer or a concentrated stock.
00:09:49.12		From that we can make DNA.
00:09:51.08		The DNA can be sequenced to give us tens of thousands of nucleotides of sequence information,
00:10:00.20		and then we use computational approaches and bioinformatics
00:10:04.20		to predict where all the genes are in these genomes, and then we can compare them.
00:10:11.01		So we are using Mycobacterium smegmatis as our host,
00:10:16.24		fast growing, non-pathogen, and our samples predominantly come from soil and compost.
00:10:23.06		We have usually just simply plated out the sample with our permissive host,
00:10:28.24		but because the specific phages that we're after can be present at relatively low concentrations,
00:10:37.05		there is an approach that can be used with enrichment, where you simply take your soil or your compost sample,
00:10:43.00		you mix it and incubate it with some permissive host cells,
00:10:47.21		in this case Mycobacterium smegmatis, that allows even the small number of particles
00:10:53.23		that may be present to infect, to reproduce themselves,
00:10:57.27		and so that when it comes to the plating and the identification of
00:11:03.06		plaques they're present at higher concentrations.
00:11:05.20		There's a couple of different approaches,
00:11:08.03		but this is a relatively reproducible and simple process for discovering new phages.
00:11:16.02		So by this point thousands of mycobacteriophages have been isolated
00:11:19.06		using Mycobacterium smegmatis as a host.
00:11:22.16		I should state I think that some of these infect smegmatis,
00:11:27.14		but don't infect Mycobacterium tuberculosis, whereas others do.
00:11:32.19		And so we use a surrogate strain, Mycobacterium smegmatis, as a host,
00:11:37.23		but it is likely that the host range, the cell preferences of the phages that we isolate
00:11:43.28		are going to be all over the place and at this stage are not well defined.
00:11:47.13		We've... the most recent publication that describes the characterization of these
00:11:56.07		appeared earlier this year in 2010, and described a comparative analysis of 60 of these.
00:12:04.14		But because of the impact of the science education alliance program
00:12:10.21		as well as the ongoing studies at Pittsburgh,
00:12:12.17		the number of new phages and sequenced genomes, it is positively exploding.
00:12:18.27		And at this point in the middle of October in 2010,
00:12:23.28		there are 154 completed genome sequences and much analysis waiting to be done.
00:12:33.11		All of these phages, it turns out, even though they don't have to be,
00:12:38.20		are double stranded DNA tailed phages.
00:12:42.14		We haven't isolated any RNA phages or any single stranded DNA phages.
00:12:48.02		They are all double stranded DNA, tailed phages.
00:12:51.12		Now in part one of this lecture we saw that perhaps the most common order of bacteriophages
00:12:59.06		are the Caudovirales, the double stranded DNA, tailed viruses.
00:13:02.21		Just like these that I showed you. I also told you that there's three common types.
00:13:09.06		The so-called Siphoviridae with the long flexible tails, the Myoviridae with the contractile tails,
00:13:14.24		and the Podoviridae that have short stubby tails.
00:13:17.21		If we just compare the morphotypes of these 60 genomes,
00:13:23.13		which have been analyzed and published,
00:13:27.09		53 of them are of this Siphovirus type, 7 of them are of the Myovirus type.
00:13:33.08		We have no Podoviruses at all.
00:13:37.21		And so these numbers appear to hold true for the larger collection
00:13:42.05		of mycobacteriophages, and therefore we have growing confidence in the idea
00:13:47.15		that there really are no Podoviruses amongst the mycobacteriophages.
00:13:53.00		We don't know whether this is because phages with the short stubby tails
00:13:58.20		are physically incapable of infecting bacteria like the mycobacteria
00:14:04.25		that have thick and chemically complex cell walls,
00:14:08.06		or whether it's just a reflection of a restriction
00:14:12.12		of evolutionary opportunities to generate those types of phages.
00:14:19.04		So that's a little bit of a mystery as we don't have any Podoviruses,
00:14:24.18		but we have lots of examples of these other two morphotypes.
00:14:28.28		When we look at the genomes there are some basic parameters
00:14:34.21		that we can see that are helpful in thinking about what these genomes are like.
00:14:38.11		First of all, the average length of all of them is 72,588 base pairs.
00:14:47.01		We don't really understand why mycobacteriophages would have that particular length.
00:14:51.07		Phages of other bacterial hosts often have very different average lengths
00:14:58.02		including those that are only half as long as the average mycobacteriophage genome.
00:15:03.09		And so we don't really know what determines this parameter,
00:15:08.02		either for the mycobacteriophages or indeed for any other phages.
00:15:12.28		There's also a large range in size from a little under 42,000 base pairs
00:15:20.23		up to about 164 and a half thousand base pairs.
00:15:25.06		So there is a lot of diversity in terms of size range.
00:15:29.25		The GC content on average for all of these 60 genomes is about 63 and a half percent.
00:15:37.00		A number which closely mirrors the GC content of the bacterial host Mycobacterium smegmatis.
00:15:44.08		And that's not a surprise because it has been seen from the analysis of phages of other bacterial hosts
00:15:52.27		that the GC content of the phages often mirrors that of the hosts.
00:15:57.09		What's perhaps more surprising, however, is that the range of GC content
00:16:03.10		amongst these phages is actually really amazingly broad
00:16:07.09		spanning from 56.3% at the lower end up to 69% at the upper end.
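The GC content figures above are simple base counts. A minimal sketch of how such a percentage and its range across a collection might be computed is shown here; the function name `gc_content` and the toy sequences are illustrative assumptions, not the analysis pipeline used for the published genomes.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequences, not real phage genomes.
toy_genomes = {"toyA": "GGCCGCATGC", "toyB": "ATGCGCATTA"}
percentages = {name: gc_content(seq) for name, seq in toy_genomes.items()}
gc_range = (min(percentages.values()), max(percentages.values()))
```

Run over real genome sequences, the same calculation would give the per-genome values whose minimum and maximum define the span quoted in the talk.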
00:16:14.28		And we've been trying to think for some time as to what this span of GC content reflects.
00:16:23.16		One attractive idea although it remains to be fully tested is that these particular mycobacteriophages
00:16:33.13		whilst they have a common host in Mycobacterium smegmatis
00:16:37.12		may not necessarily have been infecting Mycobacterium smegmatis
00:16:43.17		as their preferred bacterial host in the environment from which we recovered the phages
00:16:49.13		in their recent ecological and evolutionary times.
00:16:54.11		In other words, they may have preferences for infecting some other bacterial host
00:17:00.20		that we have yet to figure out what that is.
00:17:03.24		But that might account for the range of GC content that we would see.
00:17:09.06		And so one of the things that we would like to do to test this idea
00:17:11.28		is to actually determine the specific host range
00:17:15.18		on a whole range of bacteria that are related to the Mycobacteria
00:17:21.24		to see if we can discern a pattern or a correlation between GC content and the host preferences.
00:17:27.17		And finally if we look at the number of genes that are present,
00:17:31.21		of these 60 genomes there is a total of 6858 open reading frames or putative protein coding genes, ORFs,
00:17:40.19		about a hundred and fourteen ORFs on average per genome.
00:17:45.27		And interestingly the average ORF size, the average size of an open reading frame, is only 616 base pairs.
00:17:55.14		That's about two thirds of the average size of a bacterial gene.
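As a rough back-of-envelope check, the figures quoted so far can be combined to estimate how much of an average genome is taken up by open reading frames. This sketch assumes the average genome length and the ORF statistics refer to the same 60 genomes, and it ignores overlapping genes, so the result is only an approximation.

```python
# Figures quoted in the talk for the 60 published genomes.
total_orfs = 6858
genome_count = 60
mean_orf_length_bp = 616
mean_genome_length_bp = 72588

orfs_per_genome = total_orfs / genome_count
# Rough coding fraction: ignores overlapping genes and any non-protein-coding genes.
coding_bp_per_genome = orfs_per_genome * mean_orf_length_bp
coding_fraction = coding_bp_per_genome / mean_genome_length_bp

print(f"ORFs per genome: {orfs_per_genome:.1f}")
print(f"Approximate coding fraction: {coding_fraction:.2f}")
```

Under these assumptions the estimate comes out at roughly 114 ORFs per genome, with ORFs covering most of the average genome.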
00:18:02.22		And this appears to be a parameter which is true not just for the mycobacteriophages,
00:18:08.08		but for other bacteriophages that people have looked at.
00:18:11.01		And we've been interested as to why this number
00:18:13.13		should be quite so different from that of the bacterial host.
00:18:17.14		It fits, however, I think, with the idea that illegitimate recombination
00:18:23.10		is playing a key role in how these genomes evolve.
00:18:29.06		And in fact we can see that many of the segments of DNA that appear to have come in
00:18:34.26		relatively recently from other genomes tend to be on the small side.
00:18:39.07		And therefore we can think of this process of evolution, as we talked about in part two,
00:18:46.29		may actually contribute to driving the average gene size down.
00:18:52.20		So we can take our 60 genomes, and we can ask the question:
00:18:58.25		"how are they related to each other at the nucleotide sequence level?"
00:19:03.27		And we can use an approach that we saw in part two,
00:19:10.17		which is where we can compare the nucleotide sequences in a dot plot analysis.
00:19:16.20		And one way of doing this is illustrated here.
00:19:21.29		Now what we've done is to take our 60 genomes,
00:19:24.29		and we've simply joined them together end to end to make a long concatamer,
00:19:30.26		and we've done that in random order.
00:19:33.01		We've just taken our sixty sequences joined them together
00:19:36.14		to get a long span and then simply compared them with each other.
00:19:40.02		Not surprisingly there is a diagonal line from the top left to the bottom right
00:19:45.22		because that simply tells us that every phage genome is identical to itself.
00:19:51.22		That is a good thing.
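The idea behind this kind of dot plot can be sketched in a few lines: join the genomes end to end, compare the joined sequence with itself, and record a dot wherever a short word matches. The function `dot_plot`, the word size of 4, and the toy sequences below are assumptions for illustration; real dot plot software typically uses longer words, and usually considers the reverse strand as well, which this sketch does not.

```python
from typing import Dict, List, Tuple


def dot_plot(seq_x: str, seq_y: str, word: int = 4) -> List[Tuple[int, int]]:
    """Return (x, y) positions where seq_x and seq_y share an identical word."""
    # Index every word of seq_y by its start positions.
    index: Dict[str, List[int]] = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    dots = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            dots.append((i, j))
    return dots


# Hypothetical toy "genomes"; real inputs would be tens of thousands of bases each.
genomes = ["ACGTTGCAAC", "GGATCCGGAT", "ACGTTGCATC"]
joined = "".join(genomes)
dots = dot_plot(joined, joined)
main_diagonal = [(i, j) for i, j in dots if i == j]
off_diagonal = [(i, j) for i, j in dots if i != j]
```

Every position matches itself, which produces the main diagonal. The off-diagonal dots here come from the first and third toy sequences sharing a stretch of bases, and from short repeats within the second.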
00:19:53.00		And then there's a number of diagonal lines you can see
00:19:57.05		where a particular phage in this part of the array
00:20:01.24		is similar to a second phage that is sitting in a different part of the array.
00:20:09.08		And because the genomes are in a random order in this concatamer,
00:20:13.10		these various types of relationships are scattered over this dot plot.
00:20:21.06		And we can see though, I think, that we have phage genomes that are similar to each other,
00:20:27.16		but there must be many that are completely dissimilar to each other at the nucleotide sequence level.
00:20:33.06		So having done this and identified, generally speaking, who is most closely related to who else,
00:20:40.13		what we can do is we can take each of the genomes
00:20:44.03		and we can change the order in which we've arrayed them in this concatamer,
00:20:50.10		and then repeat this computational comparison.
00:20:54.09		So when we do that, this is what the plot looks like.
00:20:57.14		And so all we've done is simply to group the genomes together that are similar to each other.
00:21:04.26		So for example if you look in the top right hand corner all of those genomes that are similar to each other are positioned
00:21:10.10		next to them in the top left hand part of the plot.
00:21:13.03		We can take this gross nucleotide sequence similarity
00:21:18.02		to put the genomes together into what we refer to as clusters.
00:21:24.04		Such as Cluster A, Cluster B, C, D, E, etc.
00:21:27.10		And so those clusters go up to cluster I,
00:21:30.13		and on the right hand side where it says Sin,
00:21:36.05		this corresponds to what we refer to as singleton genomes.
00:21:41.02		And out of these 60 genomes, there are 5 that are singletons,
00:21:45.11		which means that each of those has no close relatives
00:21:49.10		either here or anywhere through the biological world.
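One way to picture the grouping step is as follows: given a list of genome pairs judged to be similar, put genomes that are linked by any chain of similar pairs into the same group, and treat groups of one as singletons. This is a minimal sketch under that assumption. The linking rule, the function name `group_genomes`, and the placeholder genome names are hypothetical, and the actual cluster assignments described in the talk were made by inspecting the dot plots.

```python
from typing import Dict, List, Tuple


def group_genomes(names: List[str],
                  similar_pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Group genomes so that any two linked by a chain of similar pairs share a group."""
    parent: Dict[str, str] = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical placeholder genomes and similarity calls.
names = ["p1", "p2", "p3", "p4", "p5"]
similar_pairs = [("p1", "p4"), ("p4", "p5")]
groups = group_genomes(names, similar_pairs)
clusters = [g for g in groups if len(g) > 1]
singletons = [g[0] for g in groups if len(g) == 1]
# New order for the joined sequence: cluster members side by side, singletons last.
reordered = [name for g in clusters for name in g] + singletons
```

Joining the genomes in the `reordered` order and repeating the comparison is what places related genomes next to each other in the second plot.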
00:21:56.00		There is some important texture to this grouping and these clusterings,
00:22:01.25		and we can readily identify some clusters as being, having more than one closely related type.
00:22:11.14		And we therefore subdivide the cluster into sub-clusters.
00:22:16.09		You can see here for the cluster C that there are many of these genomes,
00:22:21.13		in fact almost all of them are very similar to each other,
00:22:25.08		and constitute sub-cluster C1, and then there is a single genome over here
00:22:31.10		which is related to the other C cluster genomes, but less so, so that constitutes sub-cluster C2.
00:22:41.10		So we have a large number of different types of genomes,
00:22:43.27		more than twenty substantially different types of genomes,
00:22:47.02		just within this group of 60 that we are looking at.
00:22:51.21		And so each of these genomes, and you can see them identified by name here,
00:22:57.05		as we zoom in on the different clusters and sub clusters.
00:23:00.18		Here we are looking at clusters A through to E.
00:23:03.15		Cluster C as I indicated can be divided into sub-cluster C1
00:23:10.25		with Bxz1, Cali, Catera, Rizal, ScottMcG, and Spud.
00:23:16.26		And then Myrna is the sole member of sub-cluster C2.
00:23:20.22		And these are the remaining clusters, F, G, H, and I.
00:23:27.24		And then here are the singletons over on the right hand side here:
00:23:31.14		Corndog, Giles, TM4, Wildcat, and Omega.
00:23:37.00		And so we can take each of these genomes that we've assorted with each other
00:23:47.17		according to their nucleotide sequence similarity,
00:23:50.18		or if they are singletons, they're one of a type.
00:23:53.14		We can generate the genome maps,
00:23:55.26		and we can see what features they have and what they look like.
00:23:58.18		This is showing Giles, which I introduced previously in part 2 of the lecture,
00:24:03.29		and you can see its densely packed genes with the rightwards transcribed genes above the DNA,
00:24:12.22		and the leftwards transcribed genes below the DNA.
00:24:15.20		It is densely packed and we've color coordinated these genes according to their relatives.
00:24:22.03		And so we now have these genome maps for all of these phage genomes
00:24:28.11		and these maps then can be compared,
00:24:31.01		and in fact the genes and the predicted proteins can be compared as well.
00:24:35.23		So we look at these 60 mycobacteriophages,
00:24:39.20		and we see that the genes are tightly packed with few non-coding regions.
00:24:42.29		There's many, many genes, but there appears to be few operons.
00:24:49.11		Meaning that we think that there may be a hundred genes, but there may be only 2, 3, or 4 sites
00:24:56.09		for transcription initiation or promoters that are used to express these genes.
00:25:02.19		We actually know very little about the patterns of gene expression of any of these phages,
00:25:07.05		but the bioinformatic predictions are that there will be blocks of genes that are transcribed together.
00:25:15.16		The virion genes, those are the genes that encode the structural components, the heads and the tails,
00:25:24.23		those genes typically tend to be grouped together in the genome,
00:25:28.19		and they have a common order or synteny
00:25:32.09		which is conserved even though the genomic sequences may be extremely different to each other.
00:25:40.23		Especially once we examine the parts of the genomes outside of these virion genes,
00:25:46.14		we find vast numbers of genes, many of them relatively small,
00:25:51.03		which have a completely unknown function.
00:25:54.25		And we have failed to predict what they can do simply from comparing them with other genomes.
00:26:01.22		And so what we've done is to create a computer program.
00:26:08.06		This was a program called Phamerator, and it was written by a colleague of mine, Dr. Steve Cresawn,
00:26:13.27		which can then begin to analyze all of the genes and how they are related to each other
00:26:19.05		by comparing them at the amino acid sequence level.
00:26:22.24		This is really important because so far I have shown you how
00:26:26.29		we can compare genomes at the nucleotide sequence level,
00:26:30.05		I also showed you that we have lots of examples because we have many different types of genomes
00:26:36.08		that appear to not share nucleotide sequence similarity
00:26:41.10		even though they are in genetic communication with each other, at least in principle,
00:26:46.10		because of the use of the common host.
00:26:49.05		Just because they don't have nucleotide sequence similarity
00:26:53.00		doesn't mean that they are completely unrelated.
00:26:56.00		And in fact, once we start to look at the gene relationships
00:27:01.16		by comparing the amino acid sequences
00:27:04.22		we can begin to see the patterns that reflect the common origins of the phages,
00:27:10.15		even though they no longer share nucleotide sequence similarity.
00:27:14.06		And so this program that Steve Cresawn wrote
00:27:19.07		called Phamerator facilitates this in a very important process.
00:27:25.07		What it does is it takes each of these open reading frames out of 60 genomes,
00:27:30.06		we have these 6,858 genes.
00:27:33.14		It takes each of the predicted proteins
00:27:37.10		and compares them with everything else
00:27:39.23		using alignment programs such as BLASTp and Clustal.
00:27:45.23		Genes which are related to each other because
00:27:49.21		they meet a particular threshold of similarity we group together.
00:27:55.13		And we put them in groups, and those groups are called phamilies or phams.
00:27:59.20		And of these 6,858 genes we have a total of 1,523 distinct phamilies or sequences.
00:28:12.04		A large proportion of those are what we refer to as "orphams".
00:28:18.13		They are phamilies but they only contain a single member.
00:28:21.25		Not because we believe that other members don't exist
00:28:26.14		but because this population of phages appears to be very diverse
00:28:31.12		and presumably quite large,
00:28:33.10		and we simply haven't yet identified the relatives
00:28:36.29		of these orphams that constitute these phamilies.
00:28:40.03		And so this is about 45% of all of our phamilies only have a single member.
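Once every gene has been assigned to a phamily, counting orphams is straightforward bookkeeping. The sketch below assumes the assignments already exist as a simple gene-to-pham mapping; the gene labels, pham numbers, and data layout are invented for illustration and do not reflect how Phamerator stores its results.

```python
from collections import Counter

# Hypothetical gene-to-pham assignments; labels are invented for illustration.
gene_to_pham = {
    "phageA_gp1": 1, "phageB_gp1": 1, "phageC_gp2": 1,
    "phageA_gp2": 2,
    "phageB_gp2": 3, "phageC_gp3": 3,
    "phageC_gp4": 4,
}

# Number of member genes in each pham.
pham_sizes = Counter(gene_to_pham.values())
# An orpham is a pham with exactly one member.
orphams = sorted(pham for pham, size in pham_sizes.items() if size == 1)
orpham_fraction = len(orphams) / len(pham_sizes)

print(f"{len(pham_sizes)} phams, {len(orphams)} orphams ({orpham_fraction:.0%})")
```

Applied to the figures in the talk, about 45% of 1,523 phamilies corresponds to roughly 685 orphams.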
00:28:45.16		This Phamerator program is extremely helpful for generating the maps
00:28:51.25		and displaying the relationships that help us understand
00:28:54.19		the mosaic components by which these are put together.
00:28:59.10		And so here I am showing segments of four genomes,
00:29:01.22		that you can see, just parts that are aligned
00:29:07.03		showing the boxes here and the numbers above the boxes such as here at the top in the middle, 1406,
00:29:16.13		refers to a particular phamily. That's a phamily number for which that gene is a member.
00:29:22.12		And then in this display we can color coordinate the degree
00:29:27.14		of sequence similarity at the nucleotide level between the various genomes.
00:29:32.26		And this is actually reflecting a part that I showed you... a part of these genomes
00:29:36.22		that we talked about in part two.
00:29:39.18		Now we can do this type of representation with large numbers of these genomes.
00:29:47.05		When we look at particular clusters, any particular cluster
00:29:51.03		can have genomes that are very similar to each other,
00:29:55.28		or they can be actually quite diverse, depending on the particular cluster
00:30:00.25		that you look at and the degree of sequence similarity.
00:30:04.03		I am just illustrating this with the Cluster G phages,
00:30:09.06		for which in our expanded set we actually have 4 members now,
00:30:13.29		and the color coordination, the purple between these four genomes illustrates how very closely related they are.
00:30:21.25		And when we compare the colors of the genes at the protein levels,
00:30:25.21		you can see that these are also very similar.
00:30:30.13		This method is very powerful in part because it is a very easy way
00:30:35.18		of seeing rather smaller differences
00:30:37.27		that nonetheless have played a key role in how these genomes have evolved.
00:30:42.20		For example, down in the right hand end you can see these convolutions
00:30:46.22		here of segments that have been lost from one genome or gained by another.
00:30:51.13		And in fact this illustrates the finding of a new mobile genetic element,
00:30:57.10		a new ultra small transposon that appears to play a role in these particular...
00:31:02.08		in the evolution of these particular genomes.
00:31:05.09		So in part two we saw a lot about how mosaicism is the key architectural feature
00:31:14.16		of bacteriophage genomes. Because now we are looking at this group of mycobacteriophages
00:31:22.12		infecting a common host,
00:31:24.15		we have lots of examples where even though there is no nucleotide sequence similarity,
00:31:29.23		we can see that the genes are shared through common amino acid sequence similarity.
00:31:37.15		And therefore we can look at patterns that are contributing in generating the process of genome mosaicism
00:31:46.09		even in the absence of substantial sequence similarity.
00:31:51.23		And what we find is a massive amount of mosaicism
00:31:56.11		where the modules that contribute to the structure of the genome often correspond simply to single genes.
00:32:08.14		So modules correspond to single genes when we conduct this type of analysis.
00:32:15.03		And we've developed a particular tool for representing this, representing the phylogenies if you like,
00:32:24.08		where we can take individual Phams- here is one Pham 233 and here is another Pham, Pham 471-
00:32:31.27		and in these representations, we've simply drawn as points around the circle
00:32:38.24		all of the genomes that we have available to us,
00:32:41.16		and for that particular sequence family we've drawn an arc between those genomes
00:32:48.28		that have a member of that Pham.
00:32:52.29		And therefore it essentially represents or reflects the phylogeny
00:32:57.26		or the evolutionary history of this particular family of sequences.
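The circle representation just described can be sketched as two small steps: place every genome at a point around a circle, then list an arc for each pair of genomes that both carry a member of a given Pham. The function names and the placeholder genomes `g1` to `g6` are assumptions for illustration; this is not the code used to draw the published figures.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def circle_positions(genomes: List[str]) -> Dict[str, Tuple[float, float]]:
    """Place genomes at equally spaced points on a unit circle."""
    n = len(genomes)
    return {
        name: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, name in enumerate(genomes)
    }


def pham_arcs(genomes: List[str], members: Set[str]) -> List[Tuple[str, str]]:
    """Return one arc for every pair of genomes that both carry a member of the pham."""
    carriers = [name for name in genomes if name in members]
    return list(combinations(carriers, 2))


# Hypothetical genomes and pham memberships.
genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]
positions = circle_positions(genomes)
arcs_pham_x = pham_arcs(genomes, {"g1", "g3", "g6"})
arcs_pham_y = pham_arcs(genomes, {"g2", "g3"})
```

In this toy example genome `g3` carries a member of both phams but is joined to different partners in each, which is the kind of pattern that points to distinct histories for neighboring genes.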
00:33:02.09		In the top part of the figure I've just shown a small segment of phage Omega from genes 125 to 128.
00:33:11.27		Gene 126 in Omega has a relative that we can see through amino acid sequence similarity
00:33:19.23		to a gene in this genome called Cjw1.
00:33:24.23		Gene 127 in Omega has a relative in a genome called KBG.
00:33:32.11		In that case, gene 84. But, and this is important, the context, the flanking sequences in each case is different.
00:33:46.15		Ok, the sequences to the left of Omega 126, which corresponds to Omega gene 125,
00:33:53.06		are completely unrelated to Cjw1 gene 72,
00:33:58.17		which is at the left part of that gene in Cjw1.
00:34:03.10		And the same goes for the KBG comparison.
00:34:08.26		So in this case we can see that we don't have any nucleotide sequence similarity between these,
00:34:13.15		but we can dissect these evolutionary relationships that show
00:34:21.09		that these two adjacent genes in this example in Omega
00:34:24.16		have clear and distinct evolutionary histories. They have different phylogenies.
00:34:30.02		And this is one example, but we have clearly thousands of examples of Phams
00:34:36.00		which share and exhibit these types of relationships.
00:34:40.26		And this has considerable importance when you start to think
00:34:46.02		about questions of phylogeny of whole phage genomes.
00:34:50.23		Why not just take whole phage genomes and construct a phylogenetic tree
00:34:54.16		so you can see how they are all related to each other?
00:34:58.16		The problem is that the genomes are mosaic,
00:35:03.10		built from modules and pieces, and all those bits and those pieces have distinct evolutionary histories.
00:35:10.04		They have different phylogenies. There is arguably no single, clear, evident
00:35:16.17		phylogeny for a genome as a whole.
00:35:18.27		The genome represents an individual phage,
00:35:23.04		and its evolutionary history is reflected in a multiplicity of events
00:35:28.17		that have put those pieces together in that particular combination, in that order, in that particular virus.
00:35:36.02		I am just showing some other genome maps here of
00:35:43.19		some of these genomes illustrating again that for some of these genes
00:35:52.05		that are encoding the structural proteins, we know what they do.
00:35:55.20		But for most of the other genes, and there are large numbers of them,
00:36:00.02		we really have absolutely no idea what they do.
00:36:03.01		And we would certainly like to know what their functions are and indeed what their structures are.
00:36:08.00		And so now if we expand our analysis to include the unpublished information,
00:36:13.12		and this was done for 153 genomes that are completely sequenced,
00:36:19.09		there are over 17,000 open reading frames and almost 3000 Phamilies of distinct and different protein sequences.
00:36:26.12		The number of Orphams has come down slightly.
00:36:30.26		It is about 41% as we have started to find some of the relatives of genes that were previously Orphams.
00:36:38.22		And amazingly, if we take these almost 3000 Phamilies,
00:36:43.11		and we compare them against the sequence databases,
00:36:46.13		we find that about 80% of them are novel genes. They are novel sequences.
00:36:52.21		They have no relatives in either other phages or anything else that has been sequenced in the databases.
00:36:59.18		Even of the 20% of Phams that do match, so you know there is a related protein out there in the databases,
00:37:08.14		about half of those are for genes for which people don't know what they do anyway.
00:37:14.09		So database searching is an interesting exercise with these bacteriophage genes.
00:37:23.17		It provides rather little information as to what the functions of the genes are.
00:37:27.06		It is obviously very helpful when it does,
00:37:29.24		but the amazing thing is we just don't know what most of these genes do, and we would like to.
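
As an illustrative aside to the transcript, the percentages quoted above can be turned into rough counts. This is a back-of-the-envelope sketch, not the authors' analysis: the total is rounded to 3000 because the talk says "almost 3000", and "about half" is taken as exactly 50%.

```python
def pham_breakdown(total_phams: int = 3000,
                   orpham_pct: int = 41,
                   novel_pct: int = 80,
                   matched_unknown_pct: int = 50) -> dict:
    """Convert the quoted percentages into approximate pham counts.

    total_phams is rounded up from "almost 3000", and matched_unknown_pct
    stands in for "about half"; both are assumptions for illustration.
    """
    orphams = total_phams * orpham_pct // 100
    novel = total_phams * novel_pct // 100
    matched = total_phams - novel                      # have a database relative
    matched_unknown = matched * matched_unknown_pct // 100
    matched_known = matched - matched_unknown          # relative with a known function
    return {
        "orphams": orphams,
        "novel": novel,
        "matched": matched,
        "matched_known": matched_known,
        "matched_known_pct_of_total": 100 * matched_known / total_phams,
    }


if __name__ == "__main__":
    for key, value in pham_breakdown().items():
        print(f"{key}: {value}")
```

Under these rounded inputs, only about one pham in ten has a database relative whose function is known, which is why database searching tells us so little here.
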
00:37:34.27		In this particular system we've made some headway
00:37:40.05		in developing a tool that can now help us address this question.
00:37:46.03		It is called BRED, or bacteriophage recombineering of electroporated DNA,
00:37:50.26		and it provides a simple, reproducible technique
00:37:56.19		for constructing mutants in mycobacteriophage genomes, whether deletions, insertions, or point mutations.
00:38:04.26		This method is published, and I won't go through its details here,
00:38:09.02		but it really just requires a simple electroporation step,
00:38:13.24		an ability to put phage DNA and a synthetic substrate together
00:38:18.04		inside a cell, and those techniques are well established
00:38:20.23		for doing so, followed by nothing more complicated than simply doing
00:38:27.27		polymerase chain reaction, or PCR, screens
00:38:30.27		amongst a dozen or so of the progeny plaques that are recovered
00:38:37.04		in order to find those that have the mutation that you need.
00:38:40.11		And this is all accomplished through the establishment of a so-called mycobacterial recombineering system
00:38:47.01		that enables this to happen
00:38:49.12		at much higher frequency than you would normally see it.
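
As an illustrative aside to the transcript, the reason a screen of "a dozen or so" plaques can be enough is simple probability. The sketch below assumes each plaque independently carries the mutation with some frequency. The value 0.1 used in the example is purely hypothetical, since the talk does not state the recovery frequency, and the function names are invented for this illustration.

```python
import math


def p_at_least_one(freq: float, plaques: int) -> float:
    """Probability that at least one of `plaques` screened carries the mutant,
    assuming each plaque is independent with mutant frequency `freq`."""
    return 1.0 - (1.0 - freq) ** plaques


def plaques_needed(freq: float, target: float) -> int:
    """Smallest number of plaques giving at least `target` chance of a hit."""
    if not (0.0 < freq < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("freq and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - freq))


if __name__ == "__main__":
    hypothetical_freq = 0.1  # assumed value, not taken from the lecture
    print(p_at_least_one(hypothetical_freq, 12))
    print(plaques_needed(hypothetical_freq, 0.95))
```

The higher the recombination frequency, the fewer plaques need to be screened, which is the practical benefit of the recombineering system mentioned above.
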
00:38:52.20		And so this is very powerful because we can use that type of approach
00:38:57.05		to now go and ask what those genes do.
00:39:00.05		And indeed we can use it to try to develop applications for some of what we've found and what we are learning
00:39:07.23		that might be useful for the genetics of mycobacteria or specifically control of tuberculosis.
00:39:13.17		And I will give one brief example of that which is a couple of genes called Lysin A and Lysin B.
00:39:21.04		In this case I am again showing Giles as an example.
00:39:27.06		In part one we talked about an important step that happens at the conclusion of lytic growth,
00:39:34.01		and that is that in order for the phage particles that have been generated by infection to get out
00:39:41.06		then the cell wall needs to be compromised. It needs to be broken open. The cell needs to be lysed.
00:39:49.03		And the phage encodes the enzymes that enable that to happen.
00:39:52.09		We know very little about the process in mycobacteriophages,
00:39:58.16		but we were surprised in looking at the genomes that there are two candidate genes that are involved,
00:40:04.14		lysin A and lysin B, and we were able to use this engineering technique
00:40:12.14		to construct mutants in which we've removed either one of these genes
00:40:17.19		and examined what the behaviors of the phages were.
00:40:20.26		That enabled us to figure out exactly what roles these genes are playing in lysis.
00:40:27.20		And I won't show you all the detailed experiments that gave us the conclusions as to what these do,
00:40:35.29		but I think the results are very clear.
00:40:39.19		So here is a portrayal of what the mycobacterial cell wall looks like,
00:40:44.13		where you have an inner membrane.
00:40:45.20		You have the peptidoglycan of the cell wall.
00:40:49.14		There is a sugar layer called arabinogalactan,
00:40:53.20		and covalently attached to this is the so-called mycobacterial outer membrane,
00:40:59.14		which is composed of an interesting type of lipid called the mycolic acids.
00:41:04.23		And this is found in the mycobacteria and it is found in tuberculosis,
00:41:08.11		but most bacteria don't have this type of outer wall structure.
00:41:16.15		So not surprisingly we find that genomically the Lysin B enzyme
00:41:24.18		is encoded almost exclusively by mycobacteriophages.
00:41:28.28		And that was part of what clued us in to Lysin B playing a role in perhaps degrading this cell wall structure.
00:41:37.00		What we now know is that Lysin A is the enzyme that degrades the peptidoglycan.
00:41:43.11		And Lysin B is this novel enzyme that actually cleaves the mycobacterial outer membrane
00:41:50.27		from this arabinogalactan layer and therefore facilitates complete lysis
00:41:56.28		of the cell during the process of release of the progeny viruses at the conclusion of the lytic cycle.
00:42:06.16		And we are obviously very interested in these enzymes
00:42:09.05		because they are enzymes that degrade the cell walls of mycobacteria.
00:42:14.27		And therefore we like the idea that these enzymes could
00:42:19.19		play potentially useful roles, either in the lab to break open and destroy mycobacteria,
00:42:26.29		or even in a clinical setting, either to help inactivate mycobacteria
00:42:34.05		or to act synergistically with antibiotics
00:42:37.14		to make them work better and quicker in killing
00:42:40.11		Mycobacterium tuberculosis in an infected patient.
00:42:44.21		So I gave you just one example there of how we can begin to identify what these genes do
00:42:51.21		and how some of them may be useful.
00:42:53.07		We have seen that mycobacteriophages are highly diverse.
00:42:56.14		They have these architecturally mosaic genomes,
00:42:59.17		and we can dissect this mosaicism not just by looking at the nucleotide sequence similarities,
00:43:05.10		but by comparing amino acid sequence similarities, an approach that is greatly enhanced
00:43:12.26		and aided by the fact that we now have this large number of phages
00:43:19.01		and phage genomes that infect a common host.
00:43:22.06		And I think that that raises the idea that there is probably a lot to be learned
00:43:27.12		from generating similar collections of bacteriophages that infect other bacterial hosts.
00:43:34.26		And the larger these collections grow, the greater the insights
00:43:39.04		and the resolution of the information that we can gain
00:43:42.12		about how similar they are, how related they are to each other, and the specific mechanisms by which they have evolved.
00:43:48.14		80% of the genes are of unknown function, and we and others have our work cut out
00:43:56.01		to try to find out what these are, what they do,
00:43:59.17		what they look like structurally, and why they are there.
00:44:04.09		We are beginning to learn about how they got to be there in these genomes.
00:44:08.12		Now we need to know what they do.
00:44:10.04		I've told you that the techniques have now been established
00:44:14.02		that we can begin to readily manipulate these genomes.
00:44:17.00		These are tools that one could again imagine applying to other bacterial hosts
00:44:22.14		and other viruses in order to address these questions.
00:44:26.10		And I think that we now have a very powerful toolbox
00:44:30.14		in this large set of phages, in this large number of genomes,
00:44:36.07		that can be used to understand what makes
00:44:41.06		Mycobacterium tuberculosis, a major human pathogen, tick.
00:44:46.26		And how we can exploit and use those genes and those genomes
00:44:51.21		for contributing towards the diagnosis, the prevention, and cure of human TB.
00:45:00.10		I would like to finish by acknowledging those who have helped to support this research,
00:45:07.19		both the National Institutes of Health and the Howard Hughes Medical Institute.
00:45:11.23		All the work that I have talked about was performed by a truly stunning set of colleagues,
00:45:18.28		and I've listed many of their names there.
00:45:25.01		As I mentioned throughout, the genomic work has in part been done
00:45:30.26		by a large number of undergraduate students and high school students,
00:45:35.23		both in Pittsburgh and beyond. I don't have you all listed here,
00:45:39.13		but the contributions I think are really massive,
00:45:42.20		and I acknowledge that contribution, and thank you for that.
00:45:46.24		And so thank you for your attention to this iBioSeminars lecture.

This material is based upon work supported by the National Science Foundation and the National Institute of General Medical Sciences under Grant No. 2122350 and 1 R25 GM139147. Any opinions, findings, conclusions, or recommendations expressed in these videos are solely those of the speakers and do not necessarily represent the views of the Science Communication Lab/iBiology, the National Science Foundation, the National Institutes of Health, or other Science Communication Lab funders.

© 2023 - 2006 iBiology · All content under CC BY-NC-ND 3.0 license