Genomics and Cell Biology of the Apicomplexa
Transcript of Part 4: Designing and Mining Pathogen Genome Databases: From Genes to Drugs and Vaccines II
00:00:02.06 We're back, 00:00:03.29 and we're now looking live at the Plasmodium genome database 00:00:06.26 at PlasmoDB.org. 00:00:10.16 And before we turn to the question 00:00:13.10 that we raised on trying to identify candidate vaccine targets 00:00:17.06 for malaria, 00:00:19.16 let me just provide a little bit of context 00:00:22.26 that may be a little easier to see 00:00:26.05 than what we'd seen before 00:00:28.29 in those canned screen dumps. 00:00:32.10 The first point I'd like to make is that 00:00:34.25 the success of the Plasmodium genome database 00:00:37.12 has been such that it has led to 00:00:40.13 its expansion to encompass a variety of other organisms. 00:00:43.29 The PlasmoDB project 00:00:47.10 morphed into the Apicomplexan Genome Database, 00:00:50.06 ApiDB, 00:00:52.10 which was itself expanded still further 00:00:54.03 into the Eukaryotic Pathogen Genome Database, 00:00:58.17 encompassing a wide range of organisms -- 00:01:02.28 not only apicomplexan parasites, 00:01:05.28 such as Cryptosporidium and Plasmodium 00:01:08.18 and Toxoplasma and Theileria, 00:01:10.20 but also other species as well, 00:01:12.23 such as Giardia and Trichomonas, 00:01:14.24 which we won't be talking about today. 00:01:17.02 This project is in fact just part of a larger 00:01:20.22 Bioinformatics Resource Center project 00:01:23.04 funded by the US NIAID 00:01:25.09 that includes several different genome databases from... 00:01:30.11 dealing with a variety of pathogens. 00:01:33.00 And those of you who are interested in other pathogens 00:01:36.11 might want to explore this further. 00:01:38.24 Now, the purpose of having an overar... 00:01:40.28 overarching website for exploration 00:01:43.02 not only of malaria parasites 00:01:45.06 but many eukaryotic pathogens 00:01:48.05 is that there are a variety of questions 00:01:49.28 that you might want to ask 00:01:52.12 that extend beyond an individual species. 
00:01:57.21 And these are several ways that you can explore that. 00:02:00.02 I should also point out that this homepage 00:02:02.13 also provides, in addition to links 00:02:05.02 to some of these other pages, 00:02:07.10 other bits of information. 00:02:09.00 You might, for example, be interested in tutorials 00:02:11.18 that highlight some of the features 00:02:13.15 we'll be talking about. 00:02:15.11 And further down the page, here, 00:02:17.02 you can see links to... 00:02:19.06 links to individual tutorials, 00:02:21.14 links to publications, 00:02:23.09 workshops, exercises, and so on. 00:02:27.25 We can run questions across 00:02:31.00 a variety of these organisms. 00:02:32.17 If we were interested in apicomplexan parasites, for example, 00:02:34.22 we might want to take a look at metabolic pathway maps 00:02:37.18 for these organisms. 00:02:39.20 And in this case, we've taken annotations 00:02:43.01 from the Plasmodium genome database, 00:02:45.11 the Toxoplasma genome database, 00:02:47.08 the Cryptosporidium genome database, 00:02:49.20 and mapped those on top of 00:02:52.23 the KEGG metabolic pathway projects 00:02:54.29 emerging from database projects in Japan. 00:02:57.18 If, for example, we take a look at carbohydrate metabolism, 00:02:59.28 and dive in further to see the glycolytic pathway, 00:03:03.06 the key to this analysis is indicated up at the top, 00:03:07.14 in which Toxoplasma is indicated in red, 00:03:09.20 Plasmodium is indicated in green, 00:03:12.24 Cryptosporidium in yellow, 00:03:14.16 and human in blue. 00:03:16.09 And so, we can see, by looking at the painting 00:03:18.14 of this metabolic pathway, 00:03:20.01 that from top to bottom 00:03:22.22 all of these organisms are capable 00:03:25.06 of carrying out glycolysis. 00:03:27.16 Now, that might not sound very surprising. 00:03:29.28 But we can dive in a little bit deeper 00:03:32.11 and take a look, for example, at the TCA cycle. 
00:03:34.25 And now we see a somewhat different pattern, 00:03:37.05 in which the yellow bug 00:03:39.21 -- in this case, Cryptosporidium -- 00:03:42.12 doesn't do this pathway. 00:03:45.03 And indeed, that's the case. 00:03:47.02 Cryptosporidium is an anaerobe, which doesn't carry out... 00:03:49.21 which doesn't carry out oxidative phosphorylation. 00:03:54.18 There are many other pathways we could look at 00:03:58.18 if we were interested, for example, 00:04:00.17 in some of those metabolic pathways, 00:04:02.16 which we now know are associated with the apicoplast -- 00:04:04.27 for example, pathways involved in the biosynthesis of steroids. 00:04:08.25 We could see that purely from the pattern 00:04:11.26 of gene presence and absence, 00:04:14.07 the red and green organisms 00:04:17.28 -- Toxoplasma and Plasmodium -- 00:04:19.29 clearly use a different pathway 00:04:22.24 for synthesizing isoprenoids 00:04:24.25 than the blue organism -- human. 00:04:26.13 And indeed, this is the case. 00:04:28.10 One of the most striking findings from the biochemistry of the apicoplast 00:04:31.27 is that these parasites synthesize isoprenoid subunits 00:04:37.12 via a deoxyxylulose pathway 00:04:40.15 typically associated with chloroplasts -- 00:04:42.25 quite distinct from the HMG-CoA reductase pathway 00:04:45.26 found in humans. 00:04:48.00 Cryptosporidium does neither, 00:04:49.29 presumably salvaging isoprenoid units, 00:04:52.17 which it can use apparently 00:04:54.24 to produce squalene, 00:04:56.15 but none of these organisms is capable of converting that squalene 00:05:00.10 all the way into cholesterol, 00:05:02.25 so this is not a sterol biosynthesis pathway, 00:05:05.26 but it's certainly a pathway for the production 00:05:08.16 of isoprenoid precursors. 00:05:11.18 And there are many other fascinating aspects of parasite biochemistry 00:05:15.06 to take a look at. 
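[Editorial sketch, not part of the spoken talk.] The pathway "painting" described above amounts to a presence/absence table of organisms per pathway. The short Python sketch below restates the calls made in the talk as sets; the pathway labels and the function names `lacking` and `parasite_specific` are illustrative assumptions, not anything provided by the database itself.

```python
# Presence/absence calls as summarised in the talk; labels are illustrative.
ORGANISMS = ["Toxoplasma", "Plasmodium", "Cryptosporidium", "human"]

PRESENT = {
    "glycolysis": {"Toxoplasma", "Plasmodium", "Cryptosporidium", "human"},
    "TCA cycle": {"Toxoplasma", "Plasmodium", "human"},
    "isoprenoid synthesis (chloroplast-type)": {"Toxoplasma", "Plasmodium"},
    "isoprenoid synthesis (HMG-CoA reductase)": {"human"},
}


def lacking(pathway):
    """Return the organisms in which the pathway is not painted."""
    return [org for org in ORGANISMS if org not in PRESENT[pathway]]


def parasite_specific(pathway, host="human"):
    """True if at least one organism has the pathway and the host does not."""
    present = PRESENT[pathway]
    return bool(present) and host not in present


if __name__ == "__main__":
    for name in PRESENT:
        print(name, "| lacking:", lacking(name),
              "| absent from host:", parasite_specific(name))
```

A pathway that is present in the parasites but absent from the host is the pattern a drug-target search would look for first.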
00:05:17.00 So, let's return to the Eukaryotic Pathogen Genome Database, 00:05:20.18 and return further to the Plasmodium genome database 00:05:23.20 that we... 00:05:25.08 that is the subject of our discussion today. 00:05:28.22 Now, as you've already seen, 00:05:30.18 we can explore this database in many different ways. 00:05:33.01 We can look, for example, at individual genes, 00:05:36.09 and we'll just take a look at a single gene listed here, 00:05:38.29 the default gene on the... 00:05:41.09 on this pathway 00:05:44.06 known as the apical membrane antigen 1, 00:05:46.27 a famous gene in the world of malaria biology 00:05:49.23 because this has been advanced as one of the leading 00:05:55.19 vaccine candidates for malaria parasites, 00:05:58.07 although there are a variety of concerns about AMA1 00:06:01.04 which lead investigators 00:06:04.10 to be interested in identifying other candidates 00:06:07.09 that might also be worth exploration. 00:06:10.02 We can see that AMA1 is present on chromosome 11 00:06:13.05 and, as you've already seen in the illustration 00:06:15.15 from a chromosome-based view, 00:06:17.23 is a highly polymorphic antigen. 00:06:20.22 Many dozens of polymorphisms 00:06:23.16 known to be associated with this gene, 00:06:25.10 and we can see what those polymorphisms are 00:06:28.09 from different species. 00:06:29.24 We can see, for example, that this particular polymorphism 00:06:31.26 changes the coding potential, 00:06:34.13 such that in the reference 3D7 strain 00:06:36.27 the nucleotide C corresponds to a proline 00:06:39.26 whereas in many of the other species on this list... 00:06:42.29 many of the other isolates... 00:06:46.03 a T... C-to-T nucleotide polymorphism 00:06:49.25 results in a proline-to-serine mutation. 
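[Editorial sketch, not part of the spoken talk.] The coding change described above can be checked against the standard genetic code: every proline codon has the form CCN, and changing the first C to T gives TCN, which always encodes serine. The sketch below builds the standard codon table and classifies a single-base change. The example codon "CCA" is an assumption for illustration; the talk does not state which proline codon AMA1 carries at that site.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def snp_effect(codon, position, new_base):
    """Classify a single-base change at `position` (0-based) within a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    # Hypothetical reference codon; C-to-T at the first position.
    print(snp_effect("CCA", 0, "T"))
```

The same function shows why position matters: a C-to-T change at the third position of a proline codon is silent, whereas at the second position it would give leucine rather than serine.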
00:06:52.07 We can see user comments which have been entered, 00:06:55.28 providing additional information on these genes; 00:06:58.01 links to a variety of other gene pages; 00:07:00.28 protein features which have been identified 00:07:04.13 by a variety of means; 00:07:06.14 predicted structural information; 00:07:09.09 proteomic data indicating 00:07:12.21 that there's evidence for expression at the protein level; 00:07:15.12 microarray analysis on several different platforms 00:07:18.15 indicating that this gene 00:07:21.24 is most abundantly expressed late in the intraerythrocytic life cycle, 00:07:25.25 as one might expect for a gene 00:07:28.06 that is present in merozoites, the extracellular stage, 00:07:32.22 that one might want to target in a vaccine 00:07:36.16 that would be effective against 00:07:39.20 the disease-causing stage of malaria parasites; 00:07:43.24 additional information from other expression studies, 00:07:46.14 from knockout studies; 00:07:48.16 sequence information that can be shown here; 00:07:51.06 and so forth. 00:07:53.20 But as we described ear... discussed earlier, 00:07:56.21 the real power of this database 00:07:59.15 comes not solely from viewing it as a catalogue 00:08:02.25 of available information 00:08:05.08 but as an opportunity for being able 00:08:09.03 to ask your own questions. 00:08:11.28 So, what kinds of questions can we ask? 00:08:14.03 Here, under the queries and tools link 00:08:18.00 indicated at the upper left-hand corner of your screen, 00:08:21.26 we can see a grid describing 00:08:24.05 a wide range of questions 00:08:26.16 that one might choose to ask. 00:08:28.18 For example, we might imagine that chromosomal location 00:08:32.26 was in some way informative for candidate vaccine targets. 
00:08:36.07 I'm not quite sure how that would work; 00:08:38.21 it's not really clear to me 00:08:41.24 how proximity to a centromere 00:08:44.00 might be indicative of a good target for vaccine development, 00:08:47.03 so I'm not going to pursue that line of inquiry, 00:08:49.27 but you might want to if you have reason for thinking 00:08:53.15 that chromosomal location is informative 00:08:55.21 for effective vaccine targets. 00:08:58.02 Let's instead start with some more obvious kinds of approaches. 00:09:02.18 We certainly would expect that a target 00:09:05.27 for vaccine development 00:09:08.14 would have to be antigenic in some way. 00:09:10.20 And so, here we can take advantage 00:09:13.10 of extensively curated information 00:09:15.26 that comes from the Immune Epitope Database Project, 00:09:18.05 whose research on other databases 00:09:20.21 has been incorporated into this database. 00:09:22.28 And we can look for genes that have been annotated 00:09:25.06 with a very high confidence of antigenic function 00:09:29.15 against Plasmodium falciparum. 00:09:32.05 And what we see is several genes -- 00:09:35.13 41 to be exact. 00:09:38.18 Now, that's a disappointingly small number. 00:09:41.10 It includes the merozoite surface protein 1, which, 00:09:45.10 with AMA1, 00:09:47.15 is also viewed as a promising candidate for antimalarial vaccine development, 00:09:50.20 and a variety of other merozoite surface proteins, 00:09:53.01 as one might expect. 00:09:55.23 But, surely, there must be more than 41 candidate targets 00:10:00.20 in a genome of many thousands of genes. 00:10:03.29 So, let's modify this query 00:10:07.06 to ask a different question, 00:10:10.00 ask not just those antigens 00:10:12.22 that have a high confidence of immune reactivity 00:10:17.04 but those that have any confidence of immune reactivity. 00:10:21.09 And now we come up with a much larger list of genes, 00:10:24.05 which may have lower... 
00:10:26.06 for which we may have lower confidence, 00:10:28.12 but things that we might want to explore further. 00:10:32.12 What else might we want to ask? 00:10:34.15 We'll return to our query grid here 00:10:36.29 and ask about other information. 00:10:38.27 So, we've asked about genes 00:10:41.19 that show some evidence -- based on manual curation -- 00:10:44.09 of being effective epitopes. 00:10:47.08 We might also imagine that genes 00:10:49.20 that would be effective targets 00:10:53.10 for vaccine development 00:10:58.13 would have to be expressed in the right place and at the right time. 00:11:03.08 By in the right place, we mean presumably 00:11:05.27 on the surface of an infected red blood cell 00:11:08.22 or on the surface of the parasite itself, 00:11:11.08 and we can gauge that information 00:11:14.11 by looking at cellular location, here. 00:11:16.17 We know that signal peptides 00:11:18.29 are likely to be involved in targeting proteins outside of the cell, 00:11:21.10 so let's ask for all proteins in a malaria parasite 00:11:24.01 that have a predicted signal sequence. 00:11:27.07 And we're interested, in this case, in Plasmodium falciparum, 00:11:29.24 although we could interrogate other malaria parasites as well. 00:11:33.04 And when we ask a question like that, 00:11:35.19 we get a list of many hundreds of genes, 00:11:39.05 including genes that certainly wouldn't be at the top 00:11:43.01 of anyone's list as a vaccine target, 00:11:45.28 such as this pseudogene that's listed... 00:11:48.04 that's indicated here. 00:11:50.28 Interestingly, as I look at this number, 00:11:54.02 while we find many hundreds of genes, 00:11:56.25 there are not that many hundreds of genes. 
00:11:59.01 I would have naively expected that for an organism 00:12:01.12 that makes its living by secreting proteins to modify the host cell, 00:12:05.14 using those specialized apical secretory organelles 00:12:08.20 that we discussed in the first lecture of the series, 00:12:11.24 surely more than 10% of the parasite genome would be... 00:12:16.10 would be secreted. 00:12:19.08 What could possibly account for this... 00:12:22.19 for this shortfall? 00:12:24.22 These organisms, remember, are eukaryotic organisms. 00:12:27.26 And this brings us face to face 00:12:30.08 with the bane of genome annotation 00:12:34.10 in eukaryotic species, 00:12:36.23 and that is the following... 00:12:39.08 that while we are quite good at identifying coding sequence, 00:12:42.19 it's quite difficult to identify every single exon 00:12:46.19 that is encoded into a... 00:12:48.25 that's translated into a protein in eukaryotic species. 00:12:52.02 And that's particularly true 00:12:56.22 at the extreme 5' end of the gene; 00:12:59.03 the first exon is the most difficult to identify. 00:13:02.01 And that, in turn, manifests itself 00:13:04.24 as an inability to accurately predict signal sequences. 00:13:10.10 So, we can imagine expanding our search 00:13:13.23 a little more broadly 00:13:16.15 to identify more proteins. 00:13:18.10 Let's imagine, for example, 00:13:20.10 if we go back to the grid of questions that we've asked, 00:13:22.29 that we might want to ask 00:13:26.03 not only for proteins that have a recognizable signal peptide 00:13:29.27 but also for proteins that have recognizable transmembrane domains, 00:13:34.12 anticipating that those that have a transmembrane domain 00:13:36.19 without a signal sequence 00:13:39.14 are probably proteins for which we weren't really able 00:13:42.10 to recognize the signal sequence accurately. 00:13:44.29 Once again, we will look only in Plasmodium falciparum. 
00:13:47.24 I'm not interested in proteins 00:13:50.06 with two or ten transmembrane domains. 00:13:52.06 I'm really interested in proteins that have at least one... 00:13:54.12 I don't care how many... 00:13:56.05 you know, at least one, 00:13:58.01 and no more than a thousand transmembrane domains. 00:14:00.22 And now we see a slightly larger number -- 00:14:03.11 actually, about double the number... 1,700+ proteins. 00:14:07.23 Now, some of those proteins... 00:14:09.29 so, this will presumably include many of those proteins 00:14:12.28 with signal peptides. 00:14:14.17 Some of them will be secreted without a transmembrane domain. 00:14:17.01 But it includes many other proteins 00:14:19.14 that we have some confidence are associated at least with a membrane, 00:14:22.09 although we have no confidence that it's associated 00:14:26.03 with the surface membrane of the parasite. 00:14:28.18 Now, those of you with sharp eyes may have noticed 00:14:31.24 that over on the far left-hand end of the screen 00:14:34.28 is a box indicated as "My Query History," 00:14:38.17 and this is a history of all of the questions 00:14:42.27 that we've asked in the context of this session. 00:14:45.27 We asked, first of all, 00:14:48.14 for this individual gene, AMA1. 00:14:51.06 Secondly, we asked for the... 00:14:53.24 for the high confidence epitopes, 00:14:56.28 and here for epitopes with even low confidence, 00:15:01.28 proteins with signal peptides or with transmembrane domains. 00:15:04.22 And what we're really interested in for... 00:15:07.08 from the standpoint of location 00:15:11.13 is genes that have either a signal peptide or a transmembrane domain, 00:15:14.24 and so I'm going to combine these queries using a combination, here, 00:15:18.14 to look for the results of prote... 00:15:20.22 of our search for a signal peptide -- 00:15:23.22 that is, query 4 -- 00:15:26.01 or a transmembrane domain. 
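[Editorial sketch, not part of the spoken talk.] Combining two entries from the query history with "or" is a set union over gene identifiers. The sketch below models a query history as a dictionary of named result sets; the gene IDs and the helper name `combine_union` are invented placeholders, not real PlasmoDB identifiers or functions.

```python
# Hypothetical query history: query name -> set of placeholder gene IDs.
history = {
    "signal peptide": {"gene_A", "gene_B", "gene_C"},
    "transmembrane domain": {"gene_C", "gene_D"},
}


def combine_union(history, first, second, new_name):
    """Store the union of two earlier results under a new, descriptive name."""
    history[new_name] = history[first] | history[second]
    return history[new_name]


if __name__ == "__main__":
    combined = combine_union(
        history,
        "signal peptide",
        "transmembrane domain",
        "signal peptide or transmembrane domain",
    )
    # A gene found by both queries (gene_C here) is counted only once.
    print(sorted(combined))
```

Because shared genes are counted once, the union is smaller than the sum of the two lists.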
00:15:28.10 And the result that I get will be a much larger set 00:15:32.16 -- or a somewhat larger set -- 00:15:35.04 of about 2,000 proteins that have either a signal peptide 00:15:37.05 or a transmembrane domain. 00:15:38.23 Once again, it includes many proteins 00:15:41.09 that I don't think any of us would advocate 00:15:43.07 as vaccine targets... 00:15:45.02 the cytochrome oxidase genes 00:15:46.28 associated presumably with the parasite mitochondrion. 00:15:49.24 But we can see, now, the results of this query, 00:15:51.28 a new question which I'm going to rename, 00:15:53.27 just so I don't lose track of it. 00:15:56.02 And I'll just call this "signal peptide or transmembrane domain," 00:16:01.00 just so I don't forget about what the question is that I've asked. 00:16:06.01 So, we can imagine a wide range of other questions, 00:16:08.11 and I would encourage any of you who have questions 00:16:11.02 you would like to ask 00:16:13.28 to explore this query grid 00:16:17.15 for accessible questions that may be relevant to the ways 00:16:22.06 that you choose to interrogate the database. 00:16:24.17 We might also want to know about proteins 00:16:27.07 that are not only in the right place on the surface 00:16:30.00 but also at the right time. 
00:16:31.26 Remember that intraerythrocytic life cycle -- 00:16:34.16 in which a parasite invades into an erythrocyte, 00:16:38.21 sets up its home as a ring stage parasite, 00:16:40.25 develops and metabolizes 00:16:43.09 and grows as a trophozoite, 00:16:44.28 finally emerging by assembling daughter parasites 00:16:47.18 as a schizont, 00:16:49.15 before rupturing outside of the red blood cell to release merozoites -- 00:16:53.08 we might expect that if we were interested in a vaccine that targeted 00:16:56.27 the red blood cell stage of malaria 00:17:00.12 that's responsible for the clinical symptoms, 00:17:03.04 we'd be interested in targeting those merozoites, 00:17:05.27 a very short-lived form about which it's difficult 00:17:09.14 to gather detailed information. 00:17:11.16 We could ask, for example, for proteomic da... 00:17:14.05 for protein data, 00:17:16.29 looking for mass spec-based data... 00:17:19.27 evidence of expression on merozoites. 00:17:23.03 And you may wish to explore that... 00:17:25.10 the datasets associated with transcription, 00:17:28.03 some of which were described in Joseph DeRisi's iBio seminar 00:17:31.27 on malaria... 00:17:35.05 are probably more extensive and more... 00:17:37.13 and more comprehensive. 00:17:39.04 So, I'm going to, instead, 00:17:42.13 interrogate the expression profile for... from trans... 00:17:46.22 from transcript levels. 00:17:49.25 And there are a number of queries that can be used 00:17:52.14 against various different organisms using various different datasets. 00:17:54.17 Since you may be familiar with the data set generated in the DeRisi lab, 00:17:57.05 we'll take a look at that data here, 00:17:59.03 looking at expression timing 00:18:01.21 based on glass slide microarrays, 00:18:03.20 although there are other ways that you can interrogate this data as well. 
00:18:06.15 Now, we've already seen, 00:18:09.01 from looking at the expression profile of AMA1, 00:18:11.24 that the transcripts were most abundant 00:18:15.17 towards the end of that 48-hour window 00:18:18.22 of replication inside an erythrocyte. 00:18:21.00 And that makes sense if you imagine 00:18:23.23 that transcription is going to precede translation, 00:18:26.17 and so we might imagine that in that schi... 00:18:29.12 stage of schizogony, 00:18:32.10 proteins are most like... is the most likely time to transcribe genes 00:18:36.24 that are going to be translated for protein in merozoites. 00:18:40.21 And so, I'm going to ask for genes 00:18:42.26 that are maximally expressed at... 00:18:45.18 in the last third of that intraerythrocytic... 00:18:48.28 of that intraerythrocytic life cycle, 00:18:51.20 that is, the last 16 hours. 00:18:53.13 In other words, we're looking for things... 00:18:56.06 genes where expression is maximal at 00:19:01.03 40 plus or minus 8 hours. 00:19:04.01 I don't care when the gene is turned off. 00:19:06.11 But I'm going to look for genes that are upregulated by 4-fold 00:19:11.01 -- you can change these parameters if you wish -- 00:19:14.03 and that are reasonably abundant, 00:19:16.19 let's say in the top 60th percentile 00:19:20.08 of all genes in the genome. 00:19:23.16 And now we'll run this query, and presumably return hundreds of genes that are... 00:19:28.26 that fulfill those criteria -- 00:19:31.24 600 genes in this particular question. 00:19:38.04 600 genes. 00:19:39.14 And we can see, if I stand aside, the actual expression profile. 00:19:41.27 This hypothetical protein shows, indeed, the pattern that we expect: 00:19:46.16 maximally expressed towards the end of the intraerythrocytic life cycle 00:19:49.14 in all three of these strains 00:19:52.22 symbolized by the red, blue, and yellow curves. 00:19:56.21 Alright. 00:19:59.01 Are there other questions that we... that we might want to address? 
00:20:03.05 Well, as a geneticist, 00:20:05.27 I guess I would be particularly interested in taking advantage of some 00:20:08.11 of the most exciting new datasets 00:20:10.16 that have emerged for malaria parasites, 00:20:12.24 from resequencing projects designed to assess the diversity of parasites 00:20:16.18 throughout the world. 00:20:18.15 And these have... and as a result of such studies, 00:20:21.28 we can identify polymorphisms, 00:20:24.08 single nucleotide polymorphisms, or SNPs, 00:20:28.04 that distinguish one gene from another. 00:20:30.09 And so, we'll consider comparing 00:20:35.11 any two strains of our choosing, 00:20:37.01 and I'm going to compare the reference strain, 3D7, 00:20:39.02 the strain whose complete genome was first sequenced, 00:20:43.01 with a field isolate, 00:20:46.24 a field isolate from Ghana, the GHANA1 strain. 00:20:49.23 And we can set our parameters in various different ways. 00:20:53.10 We could ask, for example, 00:20:55.16 for polymorphisms that are known to affect 00:20:59.00 coding potential 00:21:00.22 or for the density of polymorphisms. 00:21:02.25 Just to keep things simple, 00:21:04.22 I'm going to ask for any gene that has 00:21:07.08 at least five known polymorphisms. 00:21:10.06 But once again, you may want to manipulate these parameters. 00:21:14.05 And asking a question like this 00:21:16.27 gives us back several hundred genes, 00:21:19.09 including, as we might expect, 00:21:21.16 the variant surface antigens, PfEMP1 genes 00:21:24.08 that I don't think are... 00:21:27.15 would be likely advocated as a single-subunit vaccine, 00:21:32.13 but certainly genes that are likely to be highly polymorphic. 00:21:39.10 There are many other questions you can consider asking, 00:21:41.22 and this is... 00:21:43.15 this has already become a fairly long session, 00:21:45.10 so I'm going to just limit myself to one more question, 00:21:47.09 a question related to the evolutionary biology of these parasites. 
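[Editorial sketch, not part of the spoken talk.] The SNP query above compares two strains and keeps genes with at least five known polymorphisms. A minimal version of that count is shown below. The records are invented placeholders in the form (gene, position, allele in the first strain, allele in the second strain) and do not come from the real 3D7 or GHANA1 data.

```python
from collections import Counter


def polymorphic_genes(snps, min_snps=5):
    """Return genes with at least `min_snps` sites that differ between strains."""
    counts = Counter(
        gene for gene, _pos, allele_a, allele_b in snps if allele_a != allele_b
    )
    return sorted(gene for gene, n in counts.items() if n >= min_snps)


# Invented records: (gene, position, allele in strain A, allele in strain B).
snps = [("gene_X", pos, "C", "T") for pos in (10, 55, 120, 300, 410)] + [
    ("gene_Y", 20, "A", "G"),
    ("gene_Y", 80, "G", "G"),   # identical alleles: not a difference
    ("gene_Y", 95, "T", "C"),
]

if __name__ == "__main__":
    print(polymorphic_genes(snps))
```

A stricter variant could count only differences that change the encoded amino acid, which corresponds to the "coding potential" option mentioned in the talk.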
00:21:51.11 One might imagine, 00:21:54.08 if we were looking for candidate vaccine targets, 00:21:58.21 that we'd be interested in genes that are specific to malaria parasites, 00:22:03.16 so we can interrogate for genes 00:22:06.16 across the range of life, 00:22:08.25 for where those genes are found. 00:22:11.07 And we might imagine, 00:22:13.20 as we scroll down to look at eukaryotic organisms, 00:22:16.03 and the apicomplexa in particular, 00:22:18.10 that we'd be interested in genes that are found in Plasmodium falciparum -- of course -- 00:22:22.15 and perhaps, if we wanted to consider a candidate target 00:22:25.07 with broad-spectrum activity, 00:22:27.17 we might want to look for genes that are also present in Plasmodium vivax, 00:22:30.12 the second leading cause of malaria in humans. 00:22:35.21 But we're certainly not interested in genes 00:22:38.04 that are present in humans. 00:22:39.22 And so, I'm going to... 00:22:41.10 I'm going to ask in this particular question 00:22:43.23 for genes that are absent from humans 00:22:45.27 or maybe absent from mammals altogether. 00:22:49.25 And running a question like this gives me a large fraction of the genome, 00:22:55.09 a third of the parasite genome, 00:22:57.07 which is distinctive in being present in Plasmodium falciparum. 00:23:00.11 Most of these proteins, we know nothing about... 00:23:02.21 hypothetical proteins. 00:23:04.18 So, let's return to our now rather long list of questions 00:23:09.13 that we suggest might be relevant 00:23:11.27 to vaccine development. 00:23:17.13 We've asked for proteins that have antigenicity. 00:23:23.00 That was our question number 3. 00:23:26.01 And I'm going to try to combine that information with the other questions I've asked. 00:23:32.16 I'm going to ask for proteins now, 00:23:34.25 not just for... 
not for the union of proteins with signal peptides 00:23:38.05 and transmembrane domains, 00:23:40.04 as we asked earlier, 00:23:41.25 but for the intersection of these various different queries. 00:23:43.21 I'm going to ask for genes 00:23:46.12 that have some level of immunogenicity 00:23:48.03 and also have either a signal peptide 00:23:50.17 or a transmembrane domain 00:23:53.16 -- so, that was my question number 6 -- 00:23:56.01 were also present at the right time, 00:23:58.25 expressed abundantly in schizonts; 00:24:02.14 and were highly polymorphic, 00:24:05.24 indicating diversifying selection, 00:24:08.00 presumably under control of the immune system; 00:24:10.01 and also showed this evolutionary profile 00:24:13.15 that was present in these particular species. 00:24:16.02 So, this is a question 00:24:18.20 that I've actually never asked in exactly this same way, 00:24:21.11 although I've certainly run many similar sorts 00:24:25.25 of questions in the past. 00:24:27.28 And I can see that in this particular set of queries, 00:24:30.18 I come up with a list of 23 proteins. 00:24:34.15 Let me turn off this track ind... 00:24:38.20 showing the expression profiling 00:24:40.26 so we can see these a little bit better, 00:24:42.23 and I'm going to display all of them on one page. 00:24:47.05 And now, we can ask a little more readily 00:24:50.04 about the various proteins we've looked at. 00:24:53.02 So, let's scroll down the list. 00:24:57.01 First on the list -- just first alphabetically -- 00:24:58.27 is a hypothetical protein. 00:25:00.23 It's a conserved hypothetical protein. 00:25:02.21 Is this a vaccine antigen... I don't know. 00:25:05.14 But my eye is immediately drawn to what you might think of 00:25:08.24 for this computational experiment as a positive control, 00:25:13.20 that AMA1 protein, 00:25:16.22 the protein that is one of the leading vaccine targets 00:25:19.00 for antimalarial vaccine development. 
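[Editorial sketch, not part of the spoken talk.] The final step above is an intersection: a gene stays on the shortlist only if it appears in every one of the individual query results. The sketch below also shows one way to express the phylogenetic profile (present in Plasmodium falciparum and Plasmodium vivax, absent from human) as a filter. AMA1 and MSP1 are used as labels because the talk reports that both pass all filters; every other gene name, every set membership, and the function name `phyletic_filter` are invented for illustration.

```python
def phyletic_filter(profiles, required, excluded):
    """Keep genes found in all `required` taxa and in none of `excluded`."""
    return {
        gene for gene, taxa in profiles.items()
        if required <= taxa and not (excluded & taxa)
    }


# Invented phylogenetic profiles: gene -> taxa in which it is found.
phyletic_profiles = {
    "AMA1": {"P. falciparum", "P. vivax"},
    "MSP1": {"P. falciparum", "P. vivax"},
    "gene_housekeeping": {"P. falciparum", "P. vivax", "human"},
    "gene_falciparum_only": {"P. falciparum"},
}

# Invented results for the individual questions asked during the session.
queries = {
    "any-confidence epitope": {"AMA1", "MSP1", "gene_housekeeping"},
    "signal peptide or transmembrane domain": {
        "AMA1", "MSP1", "gene_falciparum_only",
    },
    "maximal expression late in the cycle": {
        "AMA1", "MSP1", "gene_housekeeping",
    },
    "five or more SNPs": {"AMA1", "MSP1", "gene_falciparum_only"},
    "phyletic profile": phyletic_filter(
        phyletic_profiles,
        required={"P. falciparum", "P. vivax"},
        excluded={"human"},
    ),
}

# A gene survives only if every query returned it.
shortlist = set.intersection(*queries.values())

POSITIVE_CONTROLS = {"AMA1", "MSP1"}

if __name__ == "__main__":
    print(sorted(shortlist))
    print("positive controls recovered:", POSITIVE_CONTROLS <= shortlist)
```

Checking that known candidates survive the intersection is the computational positive control the speaker refers to; if they dropped out, one of the filters would be too strict.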
00:25:22.00 A number of other proteins: a guanylyl cyclase, a kinase, 00:25:26.09 several other hypothetical proteins. 00:25:28.22 It's hard for me to believe that a tRNA ligase 00:25:31.17 would be a good vaccine candidate, 00:25:34.03 but here's the second of my positive controls, 00:25:37.06 MSP1, the other of these leading candidates 00:25:40.06 for an intraerythrocytic or an erythrocytic stage vaccine, 00:25:44.05 and several other proteins which have certainly been considered. 00:25:47.14 This CLAG9 protein has been advanced, for example, 00:25:50.12 as a candidate target for vaccine development. 00:25:54.10 So, my point here is not to argue 00:25:57.16 that computational approaches, 00:26:00.03 considered in and of themselves, 00:26:02.06 are ever going to be sufficient 00:26:05.07 for identifying successful vaccine targets. 00:26:07.29 That would be as absurd as saying 00:26:11.23 that we can identify the function of the apicoplast 00:26:14.24 solely by using cell biological approaches 00:26:19.08 of organelle purification without any biochemical or genetic characterization. 00:26:25.09 But certainly, in a few minutes sitting here at the computer, 00:26:28.04 we've been able to filter the many thousands of genes 00:26:32.24 in the parasite genome 00:26:35.13 down to a rather short list, a list of 23, 00:26:37.12 that includes both of our positive control antigens, 00:26:41.04 AMA1 and MSP1. 00:26:43.05 And I would imagine that if I were interested 00:26:46.02 in vaccine development for malaria, 00:26:49.01 I would certainly want to explore further 00:26:51.22 the other 21 proteins on this list, 00:26:54.07 as a manageable set that might be worth exploring 00:26:57.16 for candidate genes 00:27:01.02 that may be as good as or even better 00:27:04.25 than AMA1 or MSP1 as vaccine targets 00:27:08.23 for antimalarial development.