Genomics and Cell Biology of the Apicomplexa
Transcript of Part 3: Designing and Mining Pathogen Genome Databases: From Genes to Drugs and Vaccines I
00:00:05.04 Hello, my name is David Roos, 00:00:07.06 and I'm a professor at the 00:00:09.14 University of Pennsylvania in Philadelphia. 00:00:11.25 And welcome to the third in this series 00:00:15.11 of iBio's lectures on my five favorite parasites, 00:00:20.14 organisms in the phylum Apicomplexa, 00:00:22.29 which include a wide range of pathogens 00:00:26.19 responsible for diseases such as malaria and toxoplasmosis. 00:00:31.13 And as you may have heard in the last couple of talks, 00:00:35.10 we've discussed various aspects of the cell biology 00:00:38.20 of these parasites 00:00:40.27 and how they serve... 00:00:43.14 provide a fantastic window 00:00:46.03 into a wide range of cell biological 00:00:48.07 and molecular genetic studies 00:00:50.12 on these organisms, 00:00:52.24 and a window into eukaryotic pathogens in general. 00:00:56.25 In this talk, I'd like to turn our attention 00:00:59.05 to another aspect of technologically driven science, 00:01:05.00 in fact an area that's really changing the way 00:01:09.14 we think about biology as a whole, 00:01:11.10 something that I'm sure has been touched on in many of the iBio seminars. 00:01:13.19 In particular, that's the area of genomics, 00:01:15.25 an area which is characterized by vast amounts of data -- 00:01:22.16 complete data sets for all of the genes in the genome, 00:01:24.24 all of the proteins that are expressed 00:01:27.01 and the interactions between them. 00:01:30.24 And with the genome projects for... 00:01:34.19 the genome projects for humans 00:01:36.23 and for many other organisms, 00:01:38.19 such as the mosquitoes 00:01:41.13 that carry malaria parasites 00:01:43.28 and for the malaria parasite itself -- Plasmodium falciparum -- 00:01:46.27 with these large scale data sets available, 00:01:49.21 we can start to look at many interesting biological questions 00:01:54.09 and explore those in interesting ways. 
00:02:01.01 So, by way of background, or to provide some sort of a context, 00:02:04.14 let me remind you that in the last couple of lectures 00:02:07.21 we've discussed 00:02:11.11 a remarkable cell biological system present in malaria parasites, 00:02:15.22 and I've told you how malaria parasites and their relatives 00:02:21.28 harbor a fascinating organelle, 00:02:23.28 which they stole from an ancestral plant or alga 00:02:27.07 when an ancient parasite -- symbolized here by this... 00:02:30.17 by this cartoon diagram indicated in pink -- 00:02:34.12 ate a eukaryotic alga 00:02:37.05 and retained the algal chloroplast 00:02:40.14 as a distinctive organelle 00:02:44.05 that is essential for parasite survival, 00:02:47.07 although it is no longer photosynthetic. 00:02:50.26 Here's a picture of that organelle, 00:02:53.04 colored in green in the electron micrograph 00:02:55.22 of a malaria parasite, 00:02:57.24 quite distinct from the parasite nucleus 00:02:59.17 or the Golgi apparatus 00:03:02.07 and specialized secretory organelles involved in invasion, 00:03:05.04 the inner membrane complex that we talked about 00:03:08.15 in the first of the... of these... of this iBio lecture series 00:03:13.02 that's critical for the assembly of parasites in a remarkable process 00:03:17.24 -- and the apicoplast -- 00:03:21.19 the apicomplexan plastid surrounded by multiple membranes, 00:03:23.13 distinct from a primary endosymbiotic organelle, 00:03:26.19 the mitochondria. 00:03:28.26 And I told you how this organelle 00:03:32.03 is essential for parasite survival. 00:03:33.19 And we now have, effectively, 00:03:35.25 a complete metabolic pathway map, 00:03:37.22 including many enzymes and pathways 00:03:42.02 which are the subject of intense interest 00:03:45.00 as possible targets for new 00:03:48.07 anti-malarial development. 
00:03:50.09 What I also told you in the last talk 00:03:52.23 is that that metabolic pathway map 00:03:57.00 was described using a variety of traditional biological approaches 00:04:02.25 -- biochemistry, cell biological purification, 00:04:05.20 more modern approaches of proteomics -- 00:04:08.23 but that the most successful strategy 00:04:10.25 for identifying pathways and processes 00:04:13.25 associated with the apicoplast 00:04:15.22 was in fact a bioinformatics approach, 00:04:18.25 taking advantage of the availability of genome sequence data 00:04:22.27 -- some of it complete, some not -- 00:04:24.25 developing a range of computational 00:04:29.08 and experimental screens 00:04:31.03 to identify proteins that had one or another features 00:04:36.04 that we suspected might be associated 00:04:39.12 with the apicoplast, 00:04:41.11 giving rise to a candidate series of genes 00:04:43.20 which could then be tested experimentally 00:04:46.03 at the laboratory bench. 00:04:49.05 So, this is, I think, 00:04:51.00 a dramatic illustration of the success of 00:04:56.00 computational biological approaches 00:04:57.22 for practical application in expediting traditional biological research. 00:05:05.13 And it rapidly became clear 00:05:09.28 that the same sorts of approaches 00:05:11.28 that were so successful in asking questions like 00:05:14.02 "find me plant chloroplast genes 00:05:16.27 in malaria parasites" 00:05:18.25 might readily be used to address 00:05:21.09 a variety of other questions. 00:05:23.12 We might want to ask, for example, 00:05:25.29 can we find proteins 00:05:30.00 on the left-hand end of chromosome 10 00:05:33.05 in malaria parasites 00:05:36.03 which have multiple transmembrane domains 00:05:38.12 and might therefore be involved in importing nutrients, 00:05:41.29 or exporting drugs, 00:05:43.29 and play an important role in drug sensitivity 00:05:46.16 or drug resistance? 
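A question like the one just posed (proteins near the left-hand end of chromosome 10 with multiple transmembrane domains) amounts to filtering gene records on a few attributes. The following is a minimal sketch only, not the PlasmoDB interface: the `Gene` record, the gene identifiers, the 500,000-nucleotide cutoff for "left-hand end," and the two-domain threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    gene_id: str
    chromosome: int
    start: int        # distance in nucleotides from the left end of the chromosome
    tm_domains: int   # number of predicted transmembrane domains


def left_end_transporter_candidates(genes, chromosome=10,
                                    max_start=500_000, min_tm=2):
    """Return genes on the given chromosome that lie within max_start
    nucleotides of the left end and have at least min_tm predicted
    transmembrane domains. Both cutoffs are assumed, not from the talk."""
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= max_start
        and g.tm_domains >= min_tm
    ]


# Made-up records, not real annotations.
genes = [
    Gene("example_A", 10, 120_000, 7),     # left end, many TM domains
    Gene("example_B", 10, 1_400_000, 9),   # right chromosome, too far along
    Gene("example_C", 10, 90_000, 0),      # left end, no TM domains
    Gene("example_D", 4, 50_000, 12),      # wrong chromosome
]

hits = left_end_transporter_candidates(genes)
print([g.gene_id for g in hits])
```

Only `example_A` satisfies all three conditions. In practice each condition would come from a separate analysis (chromosomal location from the assembly, transmembrane domains from a prediction tool), which is why integrating those data in one place matters.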
00:05:49.27 And so, the desire to be able to address questions like that 00:05:52.05 gave rise to the Plasmodium genome database, 00:05:54.25 which we will talk about further, 00:05:56.11 as part of the Plasmodium genome project. 00:06:01.03 PlasmoDB, the Plasmodium genome database, 00:06:03.12 is accessible at PlasmoDB.org. 00:06:06.03 And this project has itself been so successful 00:06:10.15 that it's given rise to a series of other pathogen databases. 00:06:17.06 Let's take just a moment to talk about 00:06:21.19 the philosophy behind PlasmoDB. 00:06:24.21 Our goal in this project is 00:06:30.19 to take information that emerges from genome sequencing projects... 00:06:33.19 ideally, curated information... 00:06:35.22 but to integrate that with automated analyses, 00:06:38.16 recognizing that the rapid pace with which data emerges 00:06:41.17 from these large-scale projects 00:06:43.28 precludes the ability to look at 00:06:46.26 each and every bit of information manually. 00:06:50.15 We want to incorporate data that comes 00:06:53.01 not only from genome sequencing projects 00:06:56.10 and cDNA sequencing projects, 00:06:58.22 but also information that comes from population genetics studies, 00:07:00.20 from functional genomic studies on RNAs and proteins, 00:07:05.05 from chromatin modifications, 00:07:07.04 protein-protein interactome studies, 00:07:08.25 clinical outcomes data, 00:07:10.19 and many other sorts of projects. 00:07:13.23 To provide that data in a rapid way 00:07:17.17 that most importantly of all enables laboratory researchers like you, 00:07:22.15 like myself, like individuals in my laboratory 00:07:25.24 to ask their own questions. 00:07:28.00 And what I mean by this is that 00:07:30.26 our goal is not simply to provide a catalog, 00:07:32.27 an encyclopedia of malaria, 00:07:34.16 in which one could look up the answer 00:07:38.21 to any question that might arise. 
00:07:40.18 And the reason for that is best manifested 00:07:43.23 by the apicoplast project. 00:07:47.06 So, for example, 00:07:50.06 imagine that we had a catalog 00:07:52.28 of all of the information associated 00:07:55.13 with malaria parasites, circa a decade ago, 00:07:58.02 and we wanted to look up the answer to the question of, 00:08:01.19 find plant genes in malaria parasites? 00:08:04.25 That's not an entry that we would 00:08:07.16 have found in that encyclopedia. 00:08:10.23 And yet we're able to identify those genes 00:08:12.25 in what seemed like a nonsensical question 00:08:14.29 by taking advantage of generic tools 00:08:16.29 that allow us to look for proteins 00:08:19.02 that have particular attributes 00:08:21.11 associated with targeting to this organelle 00:08:23.21 or its evolutionary history, 00:08:25.21 as we talked about in the last lecture of this series. 00:08:28.07 And this project as a whole 00:08:30.11 has been quite successful, 00:08:32.22 with worldwide access, many, many thousands of hits per day 00:08:37.22 from dozens and dozens of countries, 00:08:40.26 over 100 countries around the world. 00:08:44.21 Our motivating goal is that there is 00:08:47.11 no such thing as a stupid question. 00:08:49.21 So, let's see what we would look at... 00:08:52.02 observe if we were to look at 00:08:55.15 the Plasmodium genome database, 00:08:58.02 and I encourage all of you to do so yourselves. 00:09:01.12 So, let's turn to look at 00:09:05.19 what we would see if we look at the Plasmodium genome database, 00:09:07.29 and I encourage each of you to do so... 00:09:10.08 do so yourselves, 00:09:11.23 and we'll be spending a bit of time on this database. 00:09:14.13 If you were to look at that... 00:09:16.17 we can look... 
we can enter the database 00:09:18.26 from the standpoint of an individual gene. 00:09:20.28 This particular gene, 00:09:22.29 a gene with the uninformative name PF10_0407, 00:09:27.14 corresponds to a gene that is annotated 00:09:30.13 as a dihydrolipoamide acetyl transferase, 00:09:33.00 at least putatively. 00:09:35.23 Now, this gene may be of interest to you; 00:09:37.28 it's of no particular interest to me. 00:09:41.04 We can see that it is on chromosome 10, 00:09:43.11 located at about 840,000+ nucleotides 00:09:47.00 from one end, 00:09:48.29 that it's the subject of an ongoing annotation effort, 00:09:55.16 including a modification of the initial gene model; 00:09:57.27 it appears that a new exon has been... 00:09:59.14 has been added into this predicted gene. 00:10:01.13 And I've excised a vast wealth of additional information 00:10:03.14 that comes from expression profiling studies 00:10:05.17 and computational analyses 00:10:08.05 of predicted motifs 00:10:09.29 and knockout studies that are indicated 00:10:12.23 down at the bottom of this gene. 00:10:14.13 And if you know more about this gene, 00:10:16.13 I'd encourage you to click on the link 00:10:18.14 for user comments, popping up a window, 00:10:21.04 adding additional information 00:10:23.14 that can serve as a global online laboratory notebook 00:10:27.00 about this particular gene 00:10:29.05 so that the information is available to others. 00:10:31.24 Now, of course, you might not want to explore 00:10:34.18 the genome from the standpoint of individual genes one at a time. 00:10:37.29 You might want to look at the genome 00:10:39.28 from the standpoint of the genome sequence itself. 00:10:44.20 And so, we can take a look at this from a chromosomal point of view. 
00:10:48.08 In this case, we're now looking at chromosome 11, 00:10:52.07 at the entire 2 million nucleotides of chromosome 11 00:10:54.21 spanning this range above, 00:10:56.26 and there are dozens or hundreds of genes on this, 00:11:00.10 too many packed in together too tightly 00:11:02.24 to be able to explore. 00:11:05.26 And we can look at additional information, 00:11:08.02 in this case displaying information 00:11:10.18 related to polymorphic data 00:11:12.26 coming from the emerging genome sequences 00:11:15.03 of dozens and soon to be hundreds 00:11:18.15 of malaria parasites, 00:11:20.20 in this case color-coded 00:11:23.04 to represent those that are coding versus non-coding 00:11:25.20 or synonymous versus non-synonymous polymorphisms. 00:11:29.28 We can zoom in on that further 00:11:32.18 and display additional information, 00:11:34.10 in this case from a comparative genomic standpoint 00:11:36.16 looking at chromosome 4 00:11:39.12 from Plasmodium falciparum 00:11:42.00 and comparing that genome sequence 00:11:45.27 with comparable regions, 00:11:48.00 syntenic regions, 00:11:50.01 regions which can be aligned from Plasmodium vivax 00:11:52.10 -- another human malaria parasite -- 00:11:54.13 or Plasmodium yoelii 00:11:56.12 -- a rodent malaria parasite. 00:11:58.27 And by aligning these, 00:12:01.29 we can see that virtually all of the genes 00:12:03.28 across this long span of DNA 00:12:06.25 -- hundreds of thousands of nucleotides -- 00:12:09.01 can be matched up one-to-one with genes 00:12:12.15 in the Plasmodium vivax genome, 00:12:14.05 with a few exceptions. 00:12:15.27 There are some regions that don't line up very well. 00:12:17.26 We seem to be unable to find 00:12:20.00 a corresponding portion to the telomeric ends 00:12:22.04 -- the left or the right telomeric ends of this chromosome -- 00:12:26.13 and that's perhaps not surprising 00:12:28.26 given that this region is rich in polymorphic genes 00:12:31.29 specific to Plasmodium falciparum. 
00:12:34.29 Similarly, we have other gaps. 00:12:36.19 This gap here, 00:12:39.13 in which more polymorphic antigens 00:12:42.01 from Plasmodium falciparum correspond 00:12:44.18 to a break between two contiguous sequences 00:12:47.00 of Plasmodium vivax, 00:12:49.16 and another region down here in which an insertion 00:12:51.26 of several of these polymorphic genes 00:12:54.05 absent from Plasmodium vivax 00:12:56.00 corresponds to no obvious genes in Plasmodium vivax, 00:13:00.07 and here, conversely, 00:13:03.01 a region... a family of genes 00:13:05.02 -- vir genes in Plasmodium vivax -- 00:13:07.09 which seem to have no match in Plasmodium falciparum. 00:13:11.09 Interestingly, we can see that 00:13:14.09 while the Plasmodium yoelii genome 00:13:16.03 has not been assembled to completion... 00:13:18.08 there are scores of individual contigs aligned here... 00:13:24.11 we can nevertheless find a one-to-one mapping 00:13:28.20 between most of these genes and the genes in Plasmodium vivax 00:13:30.25 and Plasmodium falciparum. 00:13:32.16 Thus, while the genome sequence for Plasmodium yoelii 00:13:36.02 is not complete, 00:13:38.17 it's sufficiently complete 00:13:40.22 that we can find most of the genes that may be of interest. 00:13:44.19 Now, I tried to make the point that 00:13:47.26 these tools are not simply resources to look up the answer to these questions 00:13:53.15 -- how many genes are there on chromosome 4 00:13:56.05 in Plasmodium falciparum? -- 00:13:58.08 but can be used as research tools as well. 00:14:01.09 And I'd like to give you an illustration of that 00:14:03.19 from an interesting study that came 00:14:06.08 as a result of a workshop 00:14:09.16 for East African students in Tanzania 00:14:12.22 who were excited by the recent release 00:14:16.13 of Plasmodium vivax data 00:14:18.10 and wanted to explore that a little bit differently... 00:14:21.13 a little bit further. 
00:14:23.11 And in the course of looking, 00:14:25.19 they noticed there are certain regions of the chromosome... 00:14:27.02 and here, a small portion of the Plasmodium falciparum chromosome... 00:14:30.12 chromosome 4, 00:14:33.26 zooming in on that portion of the larger chromosome 00:14:36.07 that we looked at earlier... 00:14:38.02 they noticed a particular region 00:14:39.29 with no annotated genes 00:14:42.03 that was extraordinarily rich in A and T nucleotides, 00:14:46.03 that the percent GC composition is virtually nil, 00:14:50.05 even in the context of Plasmodium falciparum, 00:14:52.17 the most AT-rich eukaryotic genome known. 00:14:56.08 And so, they asked the question, 00:14:57.29 whether this region, 00:14:59.19 which is known to correspond to the centromeric region 00:15:01.25 of the chromosome in Plasmodium falciparum, 00:15:03.28 might also correspond 00:15:07.07 to centromeric regions in Plasmodium vivax 00:15:09.04 or Plasmodium yoelii. 00:15:11.01 And so, they pulled up individual tracks 00:15:14.04 to display the genomes of Plasmodium yoelii 00:15:17.01 and found that in this case 00:15:20.02 the incompletely assembled Plasmodium yoelii genome 00:15:23.09 contained no sequence which was known to map to this region. 00:15:25.29 But in Plasmodium vivax, 00:15:27.26 we can see a single contiguous piece of DNA 00:15:30.05 that spans this region 00:15:32.20 that is also poor in annotated genes. 00:15:35.06 And sure enough, when these students turned on a display 00:15:38.20 to visualize the A and T nucleotides, 00:15:41.02 they see an AT-rich region of the Plasmodium vivax genome, 00:15:45.12 which undoubtedly corresponds to the... 00:15:48.07 the centromere... 00:15:49.21 these genes as well. 
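The students' observation, a stretch of sequence far richer in A and T than its surroundings, can be approximated with a sliding-window scan of base composition. This is a sketch under stated assumptions rather than the method used in the workshop or in the database's genome browser: the window size, step, and 95% threshold are arbitrary choices, and the test sequence is synthetic.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq, window=100, step=50, threshold=0.95):
    """Return (start, end) coordinates of windows whose AT fraction
    meets or exceeds the threshold. Parameters are illustrative."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        if at_fraction(seq[start:start + window]) >= threshold:
            hits.append((start, start + window))
    return hits


def merge_intervals(intervals):
    """Merge overlapping or touching intervals into contiguous regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Synthetic sequence: 50% AT flanks around a 200-nucleotide pure-AT core.
flank = "ATGC" * 50
core = "AT" * 100
sequence = flank + core + flank

regions = merge_intervals(at_rich_windows(sequence))
print(regions)
```

On the synthetic sequence the scan recovers the pure-AT core at positions 200 to 400. On a real genome as AT-rich as Plasmodium falciparum, the threshold would need to be set relative to the background composition, and a candidate region would still need the supporting evidence described above (lack of annotated genes, synteny with a known centromere).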
00:15:51.26 So, just to be clear, a group of students 00:15:53.27 with no previous exposure to these approaches, 00:15:56.24 sitting in Tanzania, 00:15:58.25 was able, for the first time anywhere in the world, 00:16:01.26 to carry out exploratory research, 00:16:04.28 computationally identifying highly probable 00:16:09.27 centromeric regions of the Plasmodium vivax genome. 00:16:14.03 Now, this is certainly a question 00:16:16.21 of great intellectual and academic interest. 00:16:20.02 But when we're dealing with important human pathogens 00:16:23.10 such as malaria parasites, 00:16:25.02 we obviously have many other concerns as well 00:16:29.02 of a practical nature. 00:16:30.24 We might want to ask, for example, 00:16:32.11 what are appropriate targets for drug or vaccine development? 00:16:36.19 And in addressing those kinds of questions, 00:16:40.04 it's worth taking a little bit of time 00:16:42.14 to think about what the questions are 00:16:46.14 that you'd like to ask 00:16:48.12 and how to formulate those in a computationally accessible form. 00:16:51.05 Just as we asked for the apicoplast, 00:16:53.26 find all genes that have one or another aspect 00:16:56.26 of the targeting signals associated with proteins 00:17:00.15 destined for the... that organelle, 00:17:02.28 and one or another of the evolutionary signatures 00:17:05.19 that suggests a plant or algal plastid ancestry. 00:17:09.06 We might want to ask, 00:17:11.16 if we were looking for targets for drug development, 00:17:13.10 for parasite homologues of genes 00:17:17.06 that are known to be targets for drug development... 00:17:19.14 dihydrofolate reductase, for example, 00:17:21.22 a prominent target for chemotherapeutic drugs 00:17:24.02 and indeed for drugs effective against malaria, 00:17:26.29 or proteases, a more recently explored class of targets 00:17:33.29 for a... 
for a variety of infectious diseases, 00:17:37.18 genes that might be associated with the cytoskeleton, 00:17:40.21 that might highlight those remarkable aspects 00:17:43.13 of parasite division 00:17:45.27 that distinguish them from the way our cells divide 00:17:48.15 that we talked about in the first iBio lecture in this series. 00:17:51.14 We might want to look for proteins associated with the apicoplast 00:17:55.02 or that are expressed at appropriate stages of the parasite life cycle, 00:17:58.29 or in the appropriate place, 00:18:01.15 or for which we have candidate inhibitors. 00:18:04.12 All of these are questions 00:18:07.12 that might be relevant to drug target identification, 00:18:10.28 and we'll come back to look at some of those questions 00:18:14.01 a little bit later on. 00:18:15.17 But as maybe a more challenging exercise, 00:18:18.16 I'd like to think instead about vaccine candidates. 00:18:22.25 And this is a... I would submit, 00:18:24.19 a more challenging exercise 00:18:26.25 for at least two reasons, 00:18:28.11 first of all because we know perhaps less 00:18:32.07 about what constitutes an effective vaccine target, 00:18:35.01 and secondly because I for one 00:18:39.07 claim no expert knowledge 00:18:41.11 about vaccinology. 00:18:43.01 But even as a naive immunologist, 00:18:44.26 I might imagine that we might want to look for genes, for example, 00:18:49.19 that are immunodominant or expressed on the surface of the... 00:18:52.21 of the parasite. 
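Formulating a question "in a computationally accessible form" often comes down to combining the results of simple searches with unions and intersections. The apicoplast query described earlier (one or another targeting signal, and one or another evolutionary signature) can be sketched this way. The gene identifiers and the four result sets below are placeholders invented for illustration; they do not correspond to real search results or to any particular database's query syntax.

```python
# Each set stands for the gene IDs returned by one independent search.
# All contents are made up.
targeting_signal_a = {"g1", "g2", "g3"}
targeting_signal_b = {"g2", "g4"}
ancestry_signature_a = {"g2", "g5"}
ancestry_signature_b = {"g4", "g6"}

# "One or another" within a category is a union ...
has_targeting_signal = targeting_signal_a | targeting_signal_b
has_plastid_ancestry = ancestry_signature_a | ancestry_signature_b

# ... and requiring both categories is an intersection.
candidates = sorted(has_targeting_signal & has_plastid_ancestry)
print(candidates)
```

The same pattern would apply to a vaccine-candidate question: one set per criterion (for example, surface expression or stage of expression), combined by union where any one criterion suffices and by intersection where all must hold. The resulting list is a set of candidates to test at the bench, not an answer.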
00:18:55.07 Now, I'd be happy to take you through 00:18:58.10 a series of graphic illustrations 00:19:01.12 of how one might go about identifying candidate drug targets 00:19:04.16 or vaccine targets, 00:19:08.00 but I would urge all of you, 00:19:10.07 when confronted with resources 00:19:14.18 -- online resources that are designed to be able to explore these projects -- 00:19:19.20 to challenge yourself and challenge whoever is introducing you to those resources 00:19:24.18 to let you explore it on your own. 00:19:27.10 And so, rather than providing a series 00:19:32.06 of pre-prepared graphic illustrations, 00:19:34.05 I'd like to run a live demonstration. 00:19:37.15 We're going to take a break for just a moment, 00:19:39.16 and I'm going to set up my laptop here 00:19:43.03 to be able to run a live experiment -- 00:19:47.03 a series of questions that's designed 00:19:49.27 to identify candidate vaccine targets. 00:19:52.17 And I hope you'll find this of interest. 00:19:54.11 It's something you may want to follow along on your own computer, 00:19:58.12 either now or at some other time. 00:19:59.29 So, let's take just a break and get this set up.