Baker begins his talk by describing two reciprocal research problems. The first is how to predict the three-dimensional structure of a protein from a specific amino acid sequence, while the second is how to determine the amino acid sequence that will generate a new protein designed to have a specific structure. Baker’s lab is addressing the second of these challenges by developing computer programs (such as Rosetta@home) that calculate the lowest energy, or most likely, structures for differently folded amino acid sequences. Baker explains how his lab can design a new protein structure, not found in nature, and, using the computer programs they have developed, determine the amino acid sequence that should fold into it. It is then possible to back-translate to the DNA sequence and synthesize the gene that can then be used to make the protein. When the structures of these synthesized proteins are determined by crystallography and compared to the predicted structures of the designed proteins, they are found to overlap very closely, demonstrating that the protein design algorithms work well.
In the second of his talks, Baker tells us how his lab has moved beyond designing new protein structures to designing new protein functions. The first example he describes is the development of an inhibitor of the influenza virus. Baker’s lab designed a protein structure that fits into a highly conserved region of the hemagglutinin protein found on the surface of influenza. Preliminary lab data suggests that this designed protein protects mice from infection with the flu virus. Baker also describes experiments in which proteins were designed to fit together and build multicomponent materials such as nanocages, nanolayers and nanowires.
All Course Materials for this Session (Educators only) – Created by Aditya Anand and Lynn Wang
00:00:08.19 I'm David Baker of the University of Washington,
00:00:11.14 and today I'm going to give you an introduction
00:00:13.09 to protein design.
00:00:16.00 Proteins function
00:00:18.17 by folding to unique native structures,
00:00:21.08 and some representative native structures
00:00:23.00 are shown on this slide.
00:00:25.24 Proteins are encoded in genes
00:00:28.12 in our genomes.
00:00:30.07 Each gene encodes one protein,
00:00:32.05 and the proteins fold up to these
00:00:34.05 unique native structures
00:00:36.03 in order to carry out their biological function.
00:00:40.04 Native structures of proteins
00:00:42.00 are likely the lowest energy states
00:00:44.17 for the protein sequence,
00:00:47.13 so for each amino acid sequence
00:00:50.17 of a protein
00:00:52.11 there corresponds an energy landscape,
00:00:54.25 of which I've shown a cartoon here,
00:00:57.10 and there are many different possible conformations
00:01:00.01 a protein can have.
00:01:01.29 The native state of a protein
00:01:03.13 is the lowest energy state,
00:01:05.04 which I've shown here.
00:01:08.28 There are two research problems
00:01:10.18 I'm going to describe today.
00:01:12.10 The first problem
00:01:14.03 is the problem of predicting protein structure.
00:01:16.28 In our genomes,
00:01:18.28 we have on the order of 30,000 different genes.
00:01:22.20 Each encodes a unique protein,
00:01:24.22 and each organism that exists on Earth
00:01:27.23 has a different genome
00:01:29.17 with a different complement of genes,
00:01:31.09 and hence proteins.
00:01:33.06 So, there's a general problem
00:01:35.03 of predicting what the structures and functions
00:01:37.12 of these proteins are.
00:01:39.04 So, the top arrow
00:01:42.13 shows going from an amino acid sequence
00:01:45.02 to a 3-dimensional structure.
00:01:48.06 So, in this case
00:01:50.01 we have a fixed amino acid sequence
00:01:52.08 and we have to find the lowest-energy structure.
00:01:55.05 The inverse problem
00:01:56.26 is the protein design problem,
00:01:58.20 which I'm going to focus on today.
00:02:00.10 In this case,
00:02:01.18 we don't start with a naturally occurring amino acid sequence
00:02:03.27 or a naturally occurring structure.
00:02:05.20 Rather, we start with a brand new structure
00:02:08.12 that we'd like to make
00:02:09.29 and we go backwards
00:02:11.12 to find an amino acid sequence
00:02:13.13 which will fold up to that structure.
00:02:16.23 Both of these problems,
00:02:18.14 the protein structure prediction problem
00:02:20.15 and the protein design problem,
00:02:21.23 are very hard problems,
00:02:23.16 and I'm going to tell you why in the next few slides.
00:02:26.18 The first reason they're hard
00:02:28.13 is that a polypeptide chain
00:02:30.25 can have a very large number of different possible conformations.
00:02:33.20 For each side chain in a...
00:02:36.20 for each amino acid in a protein chain,
00:02:39.19 there are many rotatable bonds,
00:02:41.24 as shown in this schematic,
00:02:43.28 so each side chain,
00:02:45.12 each amino acid can have
00:02:47.11 on the order of 3 different conformations.
00:02:50.24 So, if you have a 100 residue protein,
00:02:53.03 that means you have 3 conformations
00:02:55.07 for the first one,
00:02:56.18 3 for the second one,
00:02:58.05 and the number of possible conformations, total,
00:03:00.00 you get by multiplying together
00:03:01.23 all of these possibilities.
00:03:03.13 So, it's 3 times 3 times 3...
00:03:05.22 up to 100 times.
00:03:08.05 So, more generally,
00:03:09.27 if you have...
00:03:11.25 if Nres is the number of amino acids in the protein,
00:03:13.24 the number of different conformations
00:03:15.20 is 3 to that power, so 3^Nres.
00:03:18.11 And this is an astronomical number.
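To put a figure on "astronomical", the count described above can be evaluated directly. This is a minimal sketch that takes the talk's rough estimate of 3 conformations per residue at face value; the function name `conformation_count` is a hypothetical helper, not part of any software mentioned in the talk.

```python
# Illustrative count only: assumes every residue independently adopts
# one of `per_residue` conformations (3 is the rough figure from the talk).
def conformation_count(n_res, per_residue=3):
    """Return per_residue ** n_res, the number of chain conformations."""
    return per_residue ** n_res

n = conformation_count(100)
# Python integers are arbitrary precision, so the exact value is available.
print(f"3^100 has {len(str(n))} digits (about {n:.2e})")
```

For a 100-residue protein this gives a 48-digit number, roughly 5 x 10^47, which is why exhaustive enumeration is not an option.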
00:03:21.13 The second reason that these problems are hard,
00:03:24.26 in particular the design problem is hard,
00:03:26.21 is there's also an astronomical number of protein sequences.
00:03:29.09 So again, the first residue
00:03:30.27 can be any 1 of the 20 different amino acids.
00:03:33.05 The second position
00:03:34.20 can also be any 1 of the 20 amino acids,
00:03:37.29 so the number of possible sequences
00:03:39.22 is 20 times 20 times 20...
00:03:41.12 to the Nres power,
00:03:42.29 which is again a very, very large number.
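The sequence count can be written out in the same way. The worked line below only restates the talk's formula, with N_res the number of residues and N_seq (a symbol introduced here for illustration) the number of possible sequences:

```latex
\[
N_{\mathrm{seq}} = 20^{N_{\mathrm{res}}}
                 = 10^{\,N_{\mathrm{res}} \log_{10} 20}
                 \approx 10^{\,1.30\, N_{\mathrm{res}}},
\qquad
N_{\mathrm{res}} = 100 \;\Rightarrow\; N_{\mathrm{seq}} = 20^{100} \approx 1.3 \times 10^{130}.
\]
```

So for a 100-residue protein the sequence space is larger than the conformation space estimated from 3 states per residue.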
00:03:45.24 The third reason that these are hard problems
00:03:48.06 is that we need to find the lowest energy structure
00:03:52.08 for a sequence,
00:03:53.23 for example, in the protein structure prediction problem.
00:03:56.00 It's hard because calculating energies
00:03:57.20 is difficult to do accurately
00:03:59.28 because proteins have many, many atoms
00:04:02.25 and they're surrounded by water molecules,
00:04:05.05 which also have many atoms.
00:04:07.03 Each water only has three atoms,
00:04:09.11 but there are many, many water molecules.
00:04:11.07 So, we need to calculate energies accurately
00:04:13.22 for systems that have many 1000s of atoms.
00:04:17.22 And now what I'm going to do
00:04:19.10 is tell you about how we go about
00:04:21.23 solving these problems.
00:04:23.21 So, to search through the possible
00:04:26.22 conformations for a protein,
00:04:29.00 we try and mimic the actual folding process,
00:04:33.14 and here you see a movie
00:04:37.06 depicting the computer calculation
00:04:39.04 -- this is using the Rosetta methodology
00:04:41.08 which my group and others
00:04:43.06 have been developing for the last 15 years or so --
00:04:46.06 we try and simulate the actual process of folding
00:04:48.27 so we can sample through
00:04:51.12 and find the lowest energy structures
00:04:53.04 much more quickly than we could
00:04:55.08 if we were sampling all possible configurations,
00:04:57.29 which is essentially impossible.
00:05:00.25 So, this calculation that you see here
00:05:04.19 takes not much longer
00:05:06.16 than it takes you to watch it to actually calculate,
00:05:09.03 to actually carry out on a computer.
00:05:11.24 The challenge is that
00:05:14.01 every folding calculation like this,
00:05:16.03 or nearly every one,
00:05:17.24 will end up in a different final structure,
00:05:19.20 so what we need to do is many, many of these
00:05:22.20 independent calculations
00:05:24.27 to build up a picture
00:05:27.01 of what that energy landscape looks like
00:05:29.02 and where the lowest energy structure is.
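The idea of running many independent calculations and keeping the lowest-energy result can be sketched on a toy problem. Everything here is an assumption made for illustration: the one-dimensional "landscape" in `toy_energy`, the downhill-only acceptance rule, and the function names are hypothetical and are not the Rosetta folding protocol.

```python
import math
import random

def toy_energy(x):
    """Hypothetical rugged 1D landscape: a broad bowl plus ripples
    that create many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

def one_trajectory(rng, steps=2000, step_size=0.1):
    """One independent run: start at a random point and accept only
    moves that lower the energy. Returns (final x, final energy)."""
    x = rng.uniform(-10.0, 10.0)
    e = toy_energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = toy_energy(trial)
        if e_trial < e:
            x, e = trial, e_trial
    return x, e

def sample_landscape(n_runs=200, seed=0):
    """Run many independent trajectories to map out the landscape."""
    rng = random.Random(seed)
    return [one_trajectory(rng) for _ in range(n_runs)]

results = sample_landscape()
best_x, best_e = min(results, key=lambda r: r[1])
end_points = {round(x, 1) for x, _ in results}
print(f"{len(end_points)} distinct end points; "
      f"lowest energy {best_e:.3f} at x = {best_x:.2f}")
```

Most runs get stuck in different local minima, just as described above; only by pooling many runs does the lowest-energy well stand out.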
00:05:33.03 The second problem that I mentioned
00:05:35.11 -- searching through the space of sequences --
00:05:37.21 we handle as shown in this animation.
00:05:42.29 Starting with a protein backbone
00:05:45.09 for which we want to find a very low-energy sequence,
00:05:48.16 we carry out a calculation
00:05:51.05 in which at each step
00:05:53.01 we're randomly substituting in a different amino acid identity,
00:05:57.25 and different side chain conformation for that amino acid,
00:05:59.29 at a randomly selected position.
00:06:02.19 We can do these substitutions very rapidly,
00:06:05.11 we evaluate the energy,
00:06:07.14 and we accept the change
00:06:09.07 if the energy got lower.
00:06:11.02 So, in this way,
00:06:12.21 we can scan through a very large number of possible sequences
00:06:15.21 and quite rapidly
00:06:17.17 identify the lowest energy sequence
00:06:19.25 for a structure.
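The substitute, evaluate, accept-if-lower loop described above can be sketched as follows. The energy function is a stand-in invented for this example (it only rewards hydrophobic residues at buried positions and polar residues at exposed ones), side-chain conformations are not sampled, and the names `toy_energy`, `design_sequence` and `buried_pattern` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # rough, illustrative grouping

def toy_energy(sequence, buried):
    """Hypothetical score: -1 for each residue that matches its environment
    (hydrophobic when buried, polar when exposed), +1 otherwise."""
    energy = 0
    for aa, is_buried in zip(sequence, buried):
        energy += -1 if (aa in HYDROPHOBIC) == is_buried else 1
    return energy

def design_sequence(buried, steps=5000, seed=0):
    """Fixed-backbone design on the toy score: random single substitutions,
    kept only when they lower the energy."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in buried]
    energy = toy_energy(seq, buried)
    for _ in range(steps):
        pos = rng.randrange(len(seq))        # randomly selected position
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)   # random new amino acid identity
        new_energy = toy_energy(seq, buried)
        if new_energy < energy:              # accept if the energy got lower
            energy = new_energy
        else:
            seq[pos] = old                   # otherwise undo the substitution
    return "".join(seq), energy

buried_pattern = [True, False, False, True, True,
                  False, True, False, False, True]
designed, designed_energy = design_sequence(buried_pattern)
print(designed, designed_energy)
```

Each step changes one position and costs one energy evaluation, which is why many candidate sequences can be scanned quickly.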
00:06:22.02 The third problem,
00:06:24.04 the necessity
00:06:25.28 to calculate energies accurately,
00:06:28.18 we solve in the following way.
00:06:30.10 We use a model in which
00:06:32.01 we try and capture
00:06:33.25 the detailed interactions between atoms
00:06:35.16 as accurately as we can,
00:06:38.16 so there are terms in the energy function
00:06:40.23 that favor close atomic packing,
00:06:43.13 but the atoms can't be overlapping,
00:06:46.07 they penalize the burial of polar atoms
00:06:48.21 that would like to interact with solvent,
00:06:51.29 they penalize the burial of such atoms
00:06:53.18 away from water,
00:06:55.14 they favor the formation of hydrogen bonding interactions
00:06:58.04 between polar atoms,
00:06:59.25 we model the electrostatic interactions,
00:07:02.05 the favorability of positive and negative charges
00:07:04.20 to be close together,
00:07:06.10 and we also model
00:07:08.00 the bending preferences
00:07:09.21 of the polypeptide chain.
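An energy function of this kind is a weighted sum of physically motivated terms. The sketch below includes only two of the terms listed above, a packing term that also forbids overlap and an electrostatic term; it is not the Rosetta energy function. The functional forms (a Lennard-Jones 12-6 potential and a bare Coulomb-like term), the weights, the parameter values and the arbitrary units are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def packing_term(r, r_min=4.0, depth=1.0):
    """Lennard-Jones 12-6 form: steeply repulsive when atoms overlap,
    most favorable (-depth) at separation r_min."""
    ratio = r_min / r
    return depth * (ratio ** 12 - 2.0 * ratio ** 6)

def electrostatic_term(r, q1, q2, scale=1.0):
    """Coulomb-like term: favorable (negative) for opposite charges."""
    return scale * q1 * q2 / r

def pair_energy(r, q1, q2, w_pack=1.0, w_elec=1.0):
    """Weighted sum of the individual terms for one atom pair."""
    return (w_pack * packing_term(r)
            + w_elec * electrostatic_term(r, q1, q2))

def total_energy(atoms):
    """Sum pair energies over all atom pairs; each atom is (x, y, z, charge)."""
    energy = 0.0
    for a, b in combinations(atoms, 2):
        r = math.dist(a[:3], b[:3])
        energy += pair_energy(r, a[3], b[3])
    return energy

# Two oppositely charged atoms at the ideal packing distance.
atoms = [(0.0, 0.0, 0.0, 1.0), (4.0, 0.0, 0.0, -1.0)]
print(total_energy(atoms))
```

Terms for hydrogen bonding, burial of polar atoms and backbone bending preferences would enter the same sum, each with its own weight.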
00:07:12.15 So, given what I've told you,
00:07:14.29 the algorithms for searching
00:07:17.06 for the lowest-energy structure
00:07:19.02 for a given amino acid sequence,
00:07:21.03 that was in the movie where the protein structure
00:07:23.23 was moving around,
00:07:26.12 and the algorithm
00:07:27.29 for searching for the lowest-energy sequence
00:07:29.29 for a fixed structure,
00:07:31.16 there are again two problems
00:07:33.24 which we can approach.
00:07:35.11 The first problem is the structure prediction problem
00:07:37.20 where, again, we are going from genome sequences
00:07:41.01 to try to...
00:07:43.20 starting from those
00:07:45.18 and predicting the structures and functions
00:07:47.09 of the proteins that are encoded by those genes.
00:07:50.08 The second problem is the design problem,
00:07:53.03 where we start with something completely new
00:07:55.18 that we would like to make
00:07:57.24 and work backwards
00:07:59.28 to identify a sequence
00:08:01.28 which is predicted to fold up to that structure.
00:08:05.03 And, for the remainder of this talk,
00:08:07.24 I'm going to describe some examples
00:08:10.20 of the second type of calculation,
00:08:12.29 the design calculation.
00:08:16.15 First I want to give you an overview
00:08:18.16 of the different types of protein structures
00:08:20.09 found in nature.
00:08:22.25 There in the top left is a depiction of
00:08:26.04 a globular protein,
00:08:29.23 where the secondary structure elements,
00:08:31.14 the alpha-helices and the beta-sheets,
00:08:33.15 come together and form a roughly spherical protein
00:08:36.23 with hydrophobic residues buried in the interior,
00:08:40.09 and it's the burial of those hydrophobic residues
00:08:42.15 away from solvent
00:08:44.07 which stabilizes the protein.
00:08:46.04 On the right is a protein
00:08:49.01 that consists of long helices packed together
00:08:51.28 to make, for example
00:08:54.07 in the case of what's shown,
00:08:55.26 a channel protein.
00:08:58.00 In the lower left is a repeat protein
00:09:01.02 in which a very simple module
00:09:02.22 is repeated over and over and over again
00:09:04.19 to make a long filament.
00:09:07.17 And then finally, on the bottom right
00:09:10.06 is a small protein
00:09:12.14 which is held together with disulfide bonds,
00:09:14.19 which are shown in yellow.
00:09:16.28 And, nature accomplishes
00:09:19.06 all the great diversity of biological functions,
00:09:22.11 in our bodies and in all living things,
00:09:24.25 through different...
00:09:26.27 by utilizing these different types of proteins
00:09:28.26 in different circumstances
00:09:30.13 where each one is most appropriate.
00:09:32.05 So, what I'm going to describe now
00:09:34.18 is our efforts to design
00:09:37.00 ideal versions of these classes of proteins,
00:09:40.08 not a protein that exists in nature,
00:09:42.25 but sort of like the Platonic ideal
00:09:44.22 of a globular protein
00:09:46.08 or a repeat protein.
00:09:48.21 In contrast to what's been...
00:09:52.03 has come through evolution
00:09:54.01 has been the result of natural selection,
00:09:56.09 so random amino acids substitutions, then selection...
00:10:00.01 the process that...
00:10:01.28 and so what the result is...
00:10:03.16 the proteins you actually get have a lot of history in them
00:10:05.25 and they may have initially functioned in one way
00:10:08.18 and then they were coopted for something else,
00:10:10.26 so each protein has a lot of idiosyncrasies
00:10:13.03 because of its history.
00:10:14.04 What I'm going to now describe to you
00:10:15.20 is taking what we've learned about
00:10:18.05 these classes of proteins
00:10:19.19 and the algorithms I've described to make,
00:10:20.23 again, sort of idealized protein structures
00:10:23.02 which are free of those types of idiosyncrasies.
00:10:27.14 And, the way this works
00:10:29.15 is I've outlined how the calculations...
00:10:32.03 how we calculate a sequence
00:10:33.25 which is predicted to fold up to a given structure,
00:10:37.08 but that's just the first step.
00:10:39.00 The next step is,
00:10:40.19 since we've designed the protein,
00:10:42.13 we know what its amino acid sequence is
00:10:44.11 because we came up with that amino acid sequence...
00:10:47.00 from the amino acid sequence
00:10:48.17 we can work back to the DNA sequence,
00:10:51.07 that's using the genetic code
00:10:53.01 which was worked out in the 1960s...
00:10:55.18 once we know the DNA sequence
00:10:57.17 we can write down...
00:11:00.17 we can essentially buy,
00:11:03.04 or make very easily in the lab,
00:11:05.04 a synthetic piece of DNA
00:11:07.10 that encodes this protein.
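Working back from an amino acid sequence to DNA is a table lookup through the genetic code. Because most amino acids have several codons, the sketch below simply picks one standard codon per amino acid; that choice is arbitrary and made only for illustration, and `CODON_FOR` and `back_translate` are hypothetical names. Real gene design typically also weighs codon usage in the host organism, which is not modeled here.

```python
# One valid codon per amino acid from the standard genetic code.
# The particular codon chosen for each amino acid is arbitrary.
CODON_FOR = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}
STOP_CODON = "TAA"

def back_translate(protein, add_stop=True):
    """Return a DNA coding sequence for a one-letter amino acid string."""
    codons = []
    for aa in protein.upper():
        if aa not in CODON_FOR:
            raise ValueError(f"unknown amino acid: {aa!r}")
        codons.append(CODON_FOR[aa])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)

print(back_translate("MKV"))  # ATGAAAGTGTAA
```

The resulting string is what would be ordered or synthesized as the gene.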
00:11:09.08 So, the protein we've designed on the computer
00:11:10.11 will have never existed in nature,
00:11:12.12 it's something completely new,
00:11:14.29 and the real miracle of this
00:11:16.26 is that it's so easy to manufacture DNA these days
00:11:19.20 that we can, for any crazy protein we design on the computer,
00:11:23.17 we can very, very easily
00:11:26.12 make a gene that encodes that protein
00:11:28.27 and once we have that gene
00:11:30.20 we can make the protein in the laboratory
00:11:33.13 by putting the gene into bacteria,
00:11:35.25 growing up the bacteria,
00:11:37.20 we can extract the protein out,
00:11:39.13 and then we can determine
00:11:41.01 whether that protein folds up to the structure
00:11:43.15 that we designed,
00:11:45.15 and we can also measure other properties of the protein.
00:11:49.00 So, what I'm going to tell you about
00:11:50.23 are several design calculations.
00:11:53.09 We set out to make a brand new protein
00:11:54.26 that was an idealized version
00:11:56.16 of what exists in nature.
00:11:58.21 We carried out the design calculation,
00:12:00.26 we designed a gene encoding the designed protein,
00:12:03.20 we put it into bacteria,
00:12:05.00 purified the protein,
00:12:06.16 and then solved the structure.
00:12:07.28 So, I'm going to be showing you the designed models
00:12:10.01 and then the crystal structures
00:12:11.26 of those designs
00:12:13.16 that we determined experimentally.
00:12:16.18 So, the first example
00:12:18.16 is of the class of globular proteins,
00:12:20.27 which are composed of regular secondary structure elements
00:12:23.11 surrounding a hydrophobic core.
00:12:27.23 After we do the design calculation,
00:12:30.00 where we come up with a sequence
00:12:31.12 that's predicted to adopt the structure,
00:12:34.19 and the two structures I'm talking about here
00:12:36.24 are the ones that are shown
00:12:38.19 under the design column on this slide,
00:12:40.25 again they're idealized so all the helices are perfect helices,
00:12:43.13 the strands are perfect strands,
00:12:45.09 and the loops are very regular,
00:12:47.25 there's one more step.
00:12:49.24 We take advantage of the protein structure prediction calculation
00:12:52.11 I described.
00:12:53.29 So, we take those sequences
00:12:55.20 and we send them out to volunteers
00:12:57.07 all around the world
00:12:58.23 who participate in a project called Rosetta@home,
00:13:00.19 and these volunteers
00:13:02.11 predict what the structure is
00:13:05.21 of that sequence;
00:13:07.00 they search for the lowest-energy state
00:13:08.08 of that sequence.
00:13:09.26 And, in the plots on the left,
00:13:11.20 you see many, many red dots.
00:13:13.22 Each red dot is the result
00:13:15.07 of a different Rosetta@home volunteer.
00:13:18.00 On the y-axis is the energy
00:13:19.23 that's calculated by the Rosetta program
00:13:22.12 that's running on their computer,
00:13:24.05 and on the x-axis
00:13:26.02 is how far away that low-energy structure they found
00:13:29.24 was from the structure we're trying to make,
00:13:32.00 the one that's in the design column.
00:13:34.02 And, you can see, first of all,
00:13:35.13 how big and complicated the space is
00:13:37.04 by the fact that
00:13:39.04 many of these lowest-energy structures that are found
00:13:41.07 are very far away from the structure
00:13:44.10 that we're targeting.
00:13:45.19 So, the x-axis is root-mean-squared deviation
00:13:47.24 in the atomic coordinates.
00:13:50.09 So, these structures on the right of these plots
00:13:53.22 are 10 Ångstroms... each atom is on average 10 Ångstroms away
00:13:56.26 from where it was supposed to be
00:13:58.13 in the designed model.
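The x-axis quantity can be computed as follows. This minimal sketch assumes the two structures have the same atoms in the same order and are already superimposed; in practice the structures are optimally aligned first, a step omitted here. The function name `rmsd` and the example coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two equal-length lists of
    (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Made-up three-atom example: the second copy is shifted 3 units along x.
designed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
predicted = [(x + 3.0, y, z) for x, y, z in designed]
print(rmsd(designed, predicted))
```

A value near zero means the predicted structure sits essentially on top of the designed model, while the right-hand side of the plots corresponds to values around 10 Ångstroms.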
00:14:00.18 So, you can see that different people land
00:14:02.18 in different local minima on the landscape,
00:14:04.18 so different ones of those bumps
00:14:06.09 or those wells
00:14:07.28 that I showed in that schematic near the beginning.
00:14:10.01 But, what you can see is true for both of these sequences
00:14:12.18 is that the lower the energy,
00:14:14.10 that's again on the y-axis...
00:14:16.03 the lower the energy
00:14:18.02 the more the structure tends toward
00:14:20.28 the designed model,
00:14:22.13 and so there's almost a funnel shape
00:14:23.25 to these plots where,
00:14:25.22 as you go to lower and lower RMSD, going left,
00:14:28.23 the energy gets lower and lower.
00:14:30.17 So, the lowest-energy structures
00:14:32.08 found by our Rosetta@home volunteers,
00:14:36.05 who really play a critical role in our research,
00:14:38.18 the lowest-energy structures
00:14:40.11 are almost identical to the designed model.
00:14:42.09 When we see this property,
00:14:43.29 which is the one that we are looking for,
00:14:46.05 we then manufacture a gene,
00:14:48.15 a synthetic piece of DNA that encodes the design,
00:14:51.00 we make it in the lab,
00:14:52.22 and then we solve the structure,
00:14:54.09 in this case by nuclear magnetic resonance,
00:14:56.06 with colleagues
00:14:59.00 in the NESG structural genomics consortium.
00:15:02.09 And, on the right
00:15:04.08 you see the column marked NMR
00:15:06.12 shows the experimentally determined structure,
00:15:08.23 and you can see it's very similar
00:15:10.07 to the designed models
00:15:12.04 in the second column.
00:15:13.21 And, then on the far right are superpositions...
00:15:17.25 blow-up superpositions
00:15:19.28 of the designed model and the experimental structure,
00:15:21.24 and they show that the side chains in these designs are,
00:15:24.20 in actuality,
00:15:28.01 where we designed them to be.
00:15:30.09 So, we've been able to make such structures
00:15:33.19 almost pretty routinely now,
00:15:35.07 so we can make brand new globular protein structures like this
00:15:38.23 quite effectively.
00:15:40.04 In fact, a new student coming to my laboratory
00:15:42.02 typically is assigned the project
00:15:43.26 of making up a brand new protein structure
00:15:45.29 and proving that the design...
00:15:47.20 designing it and then
00:15:49.28 characterizing the design in the laboratory.
00:15:53.12 Now, we can get to larger structures in this way...
00:15:58.25 we can make these Platonic ideals of globular proteins
00:16:01.29 and we can put them together
00:16:04.12 to make larger and more complex structures.
00:16:06.27 So, this shows an example of taking two of the...
00:16:09.26 two idealized building blocks
00:16:11.16 we've solved the structure of, fusing them together,
00:16:14.04 and in the lower panel on the left
00:16:15.23 is the designed model
00:16:17.20 and the right is the crystal structure.
00:16:19.09 So again, this is a completely made up protein,
00:16:21.18 but when we solve its structure experimentally
00:16:23.20 it comes out exactly as we designed it.
00:16:28.14 Now, the second class of proteins I described
00:16:31.03 are not globular, they're not spherical,
00:16:33.16 they can be long and elongated,
00:16:35.09 and this is actually a protein that's very close to my heart
00:16:37.14 because I designed it myself.
00:16:39.09 This protein...
00:16:40.25 a schematic of it is shown on the top right.
00:16:42.28 This is composed of 80 residue helices,
00:16:45.18 and I made it taking advantage
00:16:47.14 of the equations that Francis Crick worked out
00:16:50.24 whereby a backbone structure can be described
00:16:53.27 by a small number of parameters,
00:16:55.29 and I can make many, many different such structures
00:16:58.28 by sampling through different possibilities for these parameters.
00:17:01.26 I do that
00:17:03.21 and then I design each possibility
00:17:05.17 and choose the lowest-energy structures.
00:17:08.01 When this protein is manufactured in the lab...
00:17:12.06 when it was manufactured...
00:17:14.11 I did some initial tests
00:17:16.02 and found it was very stable,
00:17:17.29 and then Joe Rogers, a graduate student in England,
00:17:20.12 was asking me for a protein to do experiments on
00:17:23.14 so I sent him this protein
00:17:25.18 and he sent back this result, which is really quite remarkable.
00:17:30.03 In order to unfold this protein,
00:17:33.18 you have to add extremely high amounts
00:17:35.18 of a chemical denaturant called guanidine,
00:17:37.21 that's on this plot on the left,
00:17:40.17 and the unfolding...
00:17:42.27 you can see that on these lines...
00:17:46.07 as you add more guanidine are pretty flat,
00:17:48.10 and then at very high concentrations, over 7 molar,
00:17:50.22 the protein starts to unfold,
00:17:52.15 but only really does this at very high temperature.
00:17:54.26 So, this is something that's simply not seen
00:17:56.22 for naturally occurring proteins.
00:17:58.15 These designed proteins can be more ideal,
00:18:00.07 so much more stable.
00:18:01.26 And, when the crystal structure was solved of this protein,
00:18:03.25 it was found to be nearly identical
00:18:05.19 to the designed model.
00:18:07.01 So, we can make this class of proteins also.
00:18:10.13 I mentioned repeat proteins,
00:18:12.15 that was a third class,
00:18:14.16 and we've also been able to make
00:18:16.29 idealized versions of these types of proteins.
00:18:19.22 So, on the second column here,
00:18:23.09 you see a repeated protein
00:18:25.22 that goes on indefinitely,
00:18:27.23 and on the left is
00:18:30.02 a comparison of the designed model in red
00:18:33.11 to the crystal structure in grey.
00:18:35.07 You can see they're nearly identical.
00:18:37.28 And, on the right you see another example
00:18:40.08 of an infinitely extending repeat protein
00:18:42.27 where we've made one subsegment of it in the lab,
00:18:46.10 and you again see that the crystal structure
00:18:48.29 is nearly identical to the designed model.
00:18:51.27 So, we're very excited about these
00:18:53.20 as the basis for new types of nanomaterials.
00:18:56.07 We can make rods,
00:18:58.05 straight rods and curved rods,
00:18:59.29 and start building things out of them.
00:19:04.04 And the final class of proteins,
00:19:06.11 those small disulfide-bonded proteins,
00:19:08.10 are very interesting because they could form the basis
00:19:10.24 of new types of therapeutics
00:19:12.14 because they're very small and easy to make.
00:19:15.13 And, here this shows examples of...
00:19:18.28 this is work by Vikram Mulligan, a postdoc in the lab,
00:19:22.00 where he's designed
00:19:23.20 very short peptides
00:19:25.16 that are predicted to fold up to unique structures,
00:19:28.28 and there are three examples in the top row of this slide
00:19:31.25 of designs he made,
00:19:33.25 then below that are NMR structures of these peptides
00:19:36.04 when they're actually made in the lab.
00:19:38.15 And again, these peptides
00:19:40.19 come out with very, very similar structures
00:19:42.28 to the designed models.
00:19:45.08 So, what I hope I've shown you today
00:19:47.07 is I've given you...
00:19:50.01 explained something about how...
00:19:53.11 about the protein structure prediction problem
00:19:56.02 and the protein design problem.
00:19:57.18 I've told you how we go about
00:19:59.13 approaching these problems,
00:20:01.01 and then I've shown you that we can start to design
00:20:03.08 sort of idealized versions
00:20:05.05 of the different classes of proteins
00:20:07.02 that are found in nature,
00:20:08.29 and these proteins are likely...
00:20:12.11 will be the basis for designing a whole new world
00:20:15.18 of functional proteins to solve modern day problems,
00:20:20.05 and I'll talk about that in another iBio seminar.
00:20:24.04 And, I want to acknowledge
00:20:25.26 the fantastic people
00:20:27.24 who have actually done most of this work.
00:20:30.00 So, Nobu and Rie Koga
00:20:32.20 developed these rules for making idealized protein structures,
00:20:36.01 and I showed you...
00:20:38.07 took you through the design of two of their structures.
00:20:40.21 Vikram Mulligan, I mentioned,
00:20:42.08 did the designed cyclic peptide work.
00:20:44.06 TJ Brunette,
00:20:46.23 Po-Ssu Huang,
00:20:48.18 and Fabio did the work on the repeat proteins.
00:20:53.03 And thank you for your attention.
- What is the purpose of designing an “idealized” protein?
- What do all three methods for solving protein design have in common? Why is this important?
- Can you come up with a few applications of protein design?
- Think of a specific problem in which an idealized protein could act as a solution.
00:00:08.04 I'm David Baker.
00:00:09.15 I'm a professor at the University of Washington,
00:00:11.12 and this is part 2 of my iBio seminar,
00:00:13.26 and today I'm going to be talking about
00:00:16.18 the design of new protein functions.
00:00:18.16 In the first part,
00:00:20.14 I spoke about designing brand new protein structures,
00:00:23.06 and now I'm going to show you, today,
00:00:25.12 how we can go beyond designing structure,
00:00:29.11 to designing new protein functions.
00:00:33.16 The motivation for this is really presented by nature.
00:00:38.24 The exquisite functions
00:00:40.04 of naturally occurring proteins
00:00:41.28 really solved the challenges
00:00:43.22 that were faced during biological evolution remarkably well.
00:00:46.20 So, if you think what living things are able to do,
00:00:50.11 they're able to capture energy from the sun,
00:00:53.01 they're able to use that energy to build up molecules,
00:00:55.25 build up complex organisms,
00:00:58.07 and eventually to think,
00:01:00.04 and for me to talk and listen to you...
00:01:02.04 and for you to listen.
00:01:04.03 So, in all those processes...
00:01:07.21 they are largely mediated by proteins.
00:01:12.04 In our genomes,
00:01:14.02 of course, are genes,
00:01:16.09 and those genes give the blueprint for life,
00:01:18.29 but they do so by encoding proteins.
00:01:21.12 Proteins are what actually do the work.
00:01:23.07 And again,
00:01:25.11 the protein complement we have in our bodies,
00:01:27.27 and the other living things currently existing on earth,
00:01:30.23 are really exquisitely tuned by natural selection
00:01:34.02 to solve the problems
00:01:35.21 that were relevant during evolution.
00:01:37.21 However, in today's world
00:01:39.14 we face challenges that were not faced
00:01:42.00 during natural evolution.
00:01:43.14 There are diseases like cancer and Alzheimer's
00:01:45.25 that were not really issues during evolution
00:01:47.25 because we didn't live long enough.
00:01:49.29 We're heating up the planet,
00:01:51.23 we're running out of fuel,
00:01:55.13 and there are new types of viral epidemics
00:01:57.18 that are coming around,
00:02:00.25 and one can have reasonable confidence that
00:02:03.02 if we had another billion years to wait,
00:02:04.24 and there was adequate selection pressure,
00:02:06.15 that all of these problems would be solved
00:02:08.06 beautifully by natural selection.
00:02:10.14 But most of us don't have a billion years to wait,
00:02:12.12 and so what if we could design
00:02:14.24 a whole new world of synthetic proteins
00:02:16.26 that solved today's problems
00:02:19.01 as well as naturally occurring proteins
00:02:22.02 solved the problems that arose during evolution.
00:02:26.19 And that's really the grand challenge of protein design.
00:02:31.13 The methods
00:02:34.11 that are used in the calculations
00:02:36.16 I'm going to tell you about today
00:02:38.11 I reviewed in part 1 of my iBio seminar,
00:02:40.19 but I'll go over the basic ideas quickly again now.
00:02:44.17 The basic principle is that
00:02:46.16 proteins fold to their lowest-energy states,
00:02:48.24 and so if we want to design new proteins
00:02:50.10 that fold up into new structures
00:02:52.05 that carry out new functions,
00:02:54.01 we have to be able to calculate energies
00:02:55.26 reasonably accurately
00:02:57.18 and we have to be able to sample through
00:02:59.21 the different possible protein conformations
00:03:01.15 to find the lowest-energy state.
00:03:03.14 And, over the years, my group,
00:03:04.29 in collaboration with many groups around the world,
00:03:06.25 has developed the Rosetta protein design software...
00:03:09.21 protein structure modeling software
00:03:11.18 to carry out these calculations.
00:03:15.19 If we want to design proteins
00:03:17.06 with new functions,
00:03:18.28 we need hypotheses about
00:03:21.15 the shape of the protein, the configuration of atoms,
00:03:23.15 that would best carry out that function.
00:03:25.25 And the final point is the most important one:
00:03:27.24 we can design new models of new molecules
00:03:30.13 as much as we want on the computer,
00:03:32.12 but if we don't go to the lab and test them,
00:03:34.09 they remain purely science fiction,
00:03:36.27 so the final step in everything which I tell you about
00:03:40.18 is to... after doing the protein design calculation,
00:03:43.16 coming up with a new amino acid sequence
00:03:46.26 that encodes a protein
00:03:49.05 that's predicted to have the desired function,
00:03:51.02 the final step is to
00:03:53.08 manufacture a synthetic gene
00:03:55.19 encoding that new protein,
00:03:57.16 a brand new protein that never existed before,
00:04:01.07 and then take that synthetic gene,
00:04:03.00 put it into bacteria, make the protein,
00:04:05.02 and then see whether the protein does
00:04:06.28 what it was designed to do.
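The step from a designed amino acid sequence to a synthetic gene can be sketched as a simple back-translation. The codon table below picks one valid codon per amino acid; the particular choices are an assumption made for illustration (codons often used in E. coli), and the function name `back_translate` is hypothetical. Real gene synthesis typically involves more careful codon optimization.

```python
# One valid codon per amino acid (standard genetic code).
# Which synonymous codon to use is an illustrative assumption.
PREFERRED_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}
STOP_CODON = "TAA"

def back_translate(protein, add_stop=True):
    """Return a DNA sequence that encodes the given one-letter protein sequence."""
    protein = protein.upper()
    unknown = sorted(set(protein) - set(PREFERRED_CODON))
    if unknown:
        raise ValueError(f"Unknown amino acid codes: {unknown}")
    dna = "".join(PREFERRED_CODON[aa] for aa in protein)
    return dna + (STOP_CODON if add_stop else "")
```

For example, `back_translate("MKW")` gives `ATGAAATGGTAA`: one codon each for methionine, lysine, and tryptophan, followed by a stop codon.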
00:04:10.04 The way the protein design calculations work
00:04:13.22 is shown very schematically here
00:04:15.26 for the simplest possible case.
00:04:17.24 This is the problem
00:04:19.25 where we have a protein backbone we want to make,
00:04:23.25 and we want to find a sequence
00:04:26.03 which is very low energy in this backbone.
00:04:29.00 So, we keep the backbone fixed
00:04:31.05 and we search through the different combinations of amino acids
00:04:33.08 for an amino acid sequence
00:04:34.28 which is very low in energy in this structure.
00:04:37.25 Then, as I said, once we have that sequence,
00:04:40.03 we can go to the lab and make it
00:04:41.22 and experimentally test it.
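The fixed-backbone search described above can be sketched with a deliberately tiny toy problem. Everything here is an assumption made for illustration: the five-position "backbone", the burial pattern, the reduced six-letter alphabet, and the scoring rules in `sequence_energy` are hypothetical and far simpler than a real energy function.

```python
from itertools import product

ALPHABET = "LVDKSG"           # reduced amino acid alphabet (assumption)
HYDROPHOBIC = set("LV")
CHARGE = {"D": -1, "K": +1}

# Hypothetical fixed backbone: which positions are buried, and which
# pairs of positions are close enough in space to interact.
BURIED = [True, False, True, False, False]
CONTACTS = [(1, 3)]

def sequence_energy(seq):
    """Toy energy: lower is better."""
    energy = 0.0
    for aa, buried in zip(seq, BURIED):
        if buried:
            # Buried positions favor hydrophobic residues.
            energy += -1.0 if aa in HYDROPHOBIC else 1.0
        else:
            # Exposed positions favor polar residues.
            energy += 0.5 if aa in HYDROPHOBIC else -0.5
    for i, j in CONTACTS:
        # Opposite charges in contact are favorable, like charges unfavorable.
        energy += 0.5 * CHARGE.get(seq[i], 0) * CHARGE.get(seq[j], 0)
    return energy

def design_sequence():
    """Enumerate every sequence on the fixed backbone and keep the lowest-energy one."""
    best = min(product(ALPHABET, repeat=len(BURIED)), key=sequence_energy)
    return "".join(best), sequence_energy(best)
```

Exhaustive enumeration only works because this example has 6^5 = 7,776 candidate sequences. With 20 amino acids and a backbone of 100 positions the number of combinations is astronomically large, so a practical search has to sample rather than enumerate.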
00:04:44.15 So, the first example
00:04:46.03 I'm going to give you
00:04:48.07 concerns the influenza virus.
00:04:49.22 A schematic of the influenza virus
00:04:51.05 is shown on the upper left,
00:04:52.25 and then in the middle two panels
00:04:55.19 is a blow-up of a surface protein on the influenza virus
00:05:00.13 called the hemagglutinin,
00:05:02.17 and in yellow in the middle panel
00:05:05.10 are two parts of that viral surface protein,
00:05:09.00 this hemagglutinin,
00:05:10.23 which are very highly conserved during evolution.
00:05:12.28 The virus is constantly mutating
00:05:14.20 to evade our immune systems,
00:05:16.09 that's why we need new vaccines every year,
00:05:18.13 but there are certain regions
00:05:20.03 which absolutely don't change
00:05:21.29 because they're critical to the function of the virus.
00:05:24.16 There's a region I'll refer to as the stem region,
00:05:27.14 in the middle of the structure,
00:05:29.07 and then on the top,
00:05:31.10 where the protein is actually attaching
00:05:33.08 to cells in our bodies,
00:05:35.07 this is how the virus gets into our cells,
00:05:36.26 there's a second site called the receptor-binding site.
00:05:39.13 What I'm going to tell you about today
00:05:42.07 is the design of proteins
00:05:44.04 which bind to these sites shown in yellow
00:05:46.07 and block the virus function;
00:05:47.26 they prevent the virus from getting into our cells.
00:05:50.20 So, using the methods that I briefly outlined,
00:05:53.16 we've designed proteins which block the virus
00:05:56.20 that bind at both the site in the stem region on the side
00:05:59.10 and then on the surface,
00:06:01.22 but I'm going to tell you in detail
00:06:03.15 about the ones that bind at the stem site today.
00:06:07.29 So, the design process has two steps,
00:06:10.16 and I'm going to illustrate them for you here.
00:06:13.15 On the left
00:06:15.17 you see a blow-up of that stem region
00:06:18.28 of the influenza virus hemagglutinin,
00:06:22.03 that was the region that was in yellow
00:06:24.12 on the previous slide in the middle of the slide...
00:06:28.08 in the middle of the protein...
00:06:30.12 and you can see that there's kind of
00:06:33.17 a deep groove that we decided we would try
00:06:36.09 and design proteins to bind into.
00:06:39.17 The design calculation has two parts.
00:06:41.28 The first part
00:06:43.29 consists of placing amino acid sidechains
00:06:48.01 into the groove
00:06:49.25 in ways that they make very good interactions.
00:06:52.06 An analogy for our approach
00:06:54.24 is to think of this like a climber
00:06:58.11 would think about a climbing wall,
00:07:00.12 where there's some region
00:07:02.09 that you want to hold onto, like this groove,
00:07:04.18 and the first problem is to find handholds and footholds
00:07:06.21 that allow you to really get a grip on this,
00:07:08.17 and then you have to figure out
00:07:10.10 how you're going to place your body
00:07:12.16 so that you can have your hands and feet
00:07:14.06 in all the good places for them
00:07:16.02 at the same time.
00:07:17.17 So, we start by figuring out where the handholds and footholds are,
00:07:19.21 that is, where we can place disembodied amino acids
00:07:22.17 into this cavity
00:07:24.25 to make really good interactions,
00:07:27.19 and the second part is to place the body,
00:07:30.15 and this can either be a protein
00:07:32.01 that we designed from scratch
00:07:33.27 or one that we design de novo.
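The "handholds, then body" idea can be sketched as a two-stage selection: first fix the hotspot positions, then keep the scaffold placement that supports the most hotspots at once. This is a rough illustration only. The coordinates, the distance tolerance, and the function names are hypothetical, and a real calculation would also optimize orientation and energy rather than just counting matches.

```python
import math

def count_supported_hotspots(hotspots, scaffold_positions, tolerance=1.0):
    """Count hotspots that have a scaffold residue position within `tolerance` angstroms."""
    return sum(
        any(math.dist(h, p) <= tolerance for p in scaffold_positions)
        for h in hotspots
    )

def best_pose(hotspots, poses, tolerance=1.0):
    """Pick the candidate pose (name -> list of xyz positions) that supports the most hotspots."""
    return max(
        poses,
        key=lambda name: count_supported_hotspots(hotspots, poses[name], tolerance),
    )

# Hypothetical data: three hotspots in the groove and two candidate scaffold poses.
HOTSPOTS = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
POSES = {
    "pose_a": [(0.2, 0.0, 0.0), (3.1, 0.1, 0.0), (0.0, 4.3, 0.0)],
    "pose_b": [(0.2, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 9.0, 0.0)],
}
```

With this data, `pose_a` supports all three hotspots while `pose_b` supports only one, so `best_pose(HOTSPOTS, POSES)` selects `pose_a`.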
00:07:36.17 And, so what you see here again
00:07:39.04 in sort of the solid surface representation
00:07:41.15 is the flu virus protein,
00:07:44.04 and you see the sidechains
00:07:46.09 that we placed in the preceding slide
00:07:48.28 docked up against the surface,
00:07:51.00 and now the ribbon-y thing
00:07:52.21 is a brand new designed protein that we've made
00:07:56.20 that holds these critical side chains
00:07:59.02 up against the virus in exactly the right orientations.
00:08:03.21 There are...
00:08:05.06 one of the components of the calculations of the design
00:08:08.07 are electrostatic interactions,
00:08:10.11 favorable interactions between positive atoms and negative atoms,
00:08:13.28 so on the right
00:08:16.12 you see a very red region on the virus,
00:08:19.05 that's negatively charged,
00:08:20.28 and we're putting a blue side chain, which is positively charged,
00:08:22.29 right into that to get more binding energy.
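The electrostatic term can be illustrated with Coulomb's law in the units commonly used for molecular energies. This is a minimal sketch, not the energy function used in the design calculations: the point-charge treatment and the choice of dielectric constant are assumptions.

```python
# Coulomb constant in kcal * angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06

def coulomb_energy(q1, q2, distance, dielectric=80.0):
    """Interaction energy (kcal/mol) of two point charges.

    q1, q2     -- charges in units of the elementary charge
    distance   -- separation in angstroms (must be positive)
    dielectric -- relative permittivity; 80 approximates bulk water (assumption)
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)
```

A positive side chain placed 4 angstroms from a negative group gives a negative (favorable) energy, for example `coulomb_energy(+1, -1, 4.0, dielectric=4.0)` is about -20.75 kcal/mol, while two like charges at the same distance give the same magnitude with a positive sign.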
00:08:27.28 The two designs that I'm going to tell you about
00:08:31.02 are shown here, again,
00:08:32.27 with the influenza virus in yellow
00:08:34.10 and the design in magenta.
00:08:36.16 You see the sidechains
00:08:38.23 fitting into that pocket on the virus,
00:08:42.01 and you see the backbone of the designed protein
00:08:44.09 in the ribbon diagram.
00:08:46.17 Something that's important for me to emphasize
00:08:48.15 is that when we do these calculations,
00:08:51.05 only a fraction of the computed designs
00:08:55.01 that are predicted to bind the virus
00:08:56.26 actually fold up to structures
00:08:59.16 that, when we test them,
00:09:01.17 bind the virus experimentally.
00:09:03.18 These two proteins
00:09:05.14 bind the virus and they bind quite tightly,
00:09:07.21 but most of the designs in fact don't,
00:09:10.11 and it turns out the reason that they don't
00:09:12.13 is probably because these sequences don't fold up,
00:09:15.08 don't really fold up to these structures.
00:09:17.20 Our calculations
00:09:19.12 are not quite good enough,
00:09:21.08 so that we get some designs
00:09:23.03 which simply don't fold properly,
00:09:24.26 but the thing that's very powerful now
00:09:28.27 is it's very easy to synthesize synthetic genes,
00:09:32.27 so we can make many, many, many different designs
00:09:36.02 that have been found in these computer calculations
00:09:40.13 and test them all,
00:09:41.28 and identify those which actually function.
00:09:45.02 Now, I told you that those two proteins
00:09:47.25 in fact do bind the virus,
00:09:49.19 but it's important to know how they bind the virus
00:09:51.18 and how similar it is to the way that we designed them to bind the virus.
00:09:55.11 So, on this slide
00:09:57.02 I show crystal structures,
00:09:58.26 determined in the laboratory of Ian Wilson at Scripps,
00:10:01.04 where the influenza virus protein
00:10:03.02 is shown on the left in magenta and cyan
00:10:09.17 and the design model is in purple,
00:10:12.12 and it's binding, again, in the middle of the influenza virus protein
00:10:15.05 in that stem region,
00:10:17.19 and in red is the crystal structure.
00:10:20.29 What you can see is that the crystal structure...
00:10:24.17 in the crystal structure,
00:10:26.22 this protein we've designed,
00:10:28.11 this one is called HB36 on the left,
00:10:30.13 is binding to the virus
00:10:34.23 exactly like we designed it to bind,
00:10:37.20 and in that inset there in the middle
00:10:40.20 you can see that even the designed side chains
00:10:42.20 in the crystal structure are exactly
00:10:46.01 where they were supposed to be.
00:10:47.27 And the same thing is true for the other designed protein
00:10:49.24 that I described, called HB80.
00:10:52.19 The crystal structure is, again,
00:10:54.24 nearly identical to the design model.
00:10:56.29 So, while I told you that
00:10:59.02 a large fraction of our designs simply don't bind at all,
00:11:01.04 the ones that do bind
00:11:03.13 bind to the virus in essentially exactly the same way
00:11:07.23 that they were supposed to bind the virus.
00:11:10.07 The proteins,
00:11:11.10 after some experimental optimization of the sequence,
00:11:14.11 bind with picomolar affinity to the virus,
00:11:17.18 they're very tight binding proteins,
00:11:22.25 and our collaborator Merika Treants,
00:11:25.13 a graduate student in the laboratory of Deb Fuller,
00:11:27.17 has some very exciting results now
00:11:29.12 showing that mice
00:11:32.12 who would die from a lethal infection from the flu virus
00:11:36.17 are completely protected
00:11:39.00 when these designed proteins,
00:11:40.15 actually the one that was on the left,
00:11:42.14 are given to them,
00:11:44.15 and the protein can be given to them
00:11:46.23 up to 24 hours before or 24 hours after
00:11:49.29 they are infected with the virus.
00:11:51.28 So, we're very excited now about the possibility
00:11:54.05 that this could become a new type of flu therapeutic
00:11:56.25 where either you're going into an area that's infected
00:11:58.28 or you've just been infected.
00:12:01.01 Such designed proteins
00:12:02.29 might be a future treatment for the flu.
00:12:08.09 We're designing proteins now,
00:12:11.10 using the techniques that I've described,
00:12:13.10 to bind to
00:12:15.13 not only other pathogens
00:12:17.12 but to proteins on the surfaces of cancer cells
00:12:21.12 and normal cells
00:12:23.20 to modulate biological function.
00:12:27.05 I don't have time today to tell you about that,
00:12:29.23 but we're able to make proteins
00:12:33.04 that are also useful for figuring out
00:12:35.17 some fundamental biological questions,
00:12:37.14 because we can design proteins
00:12:39.18 that knock out specific interactions,
00:12:41.17 and so that allows biologists, then,
00:12:43.03 to probe what the function of that interaction is.
00:12:45.12 But now I'm going to switch gears
00:12:47.07 and talk about the design of proteins
00:12:49.20 to bind small molecules,
00:12:51.21 and we use a very similar approach.
00:12:53.18 On the left is the structure of
00:12:57.04 a small molecule called digoxigenin,
00:12:59.06 which is used as a therapeutic
00:13:02.15 to treat heart patients, some heart conditions,
00:13:05.03 but if you get too much of it
00:13:07.14 it's very, very dangerous and patients can die.
00:13:09.24 So, we were interested in trying to design a protein
00:13:11.22 that could essentially be a therapeutic sponge
00:13:13.16 and soak it up.
00:13:15.07 The designed protein is shown on the bottom right.
00:13:18.01 In magenta is this dig molecule,
00:13:22.17 I'll call it for short,
00:13:25.24 and in green is a protein we've designed
00:13:28.13 which makes very complementary interactions,
00:13:32.23 those are hydrogen bonding interactions
00:13:34.27 shown in the dashed lines,
00:13:37.17 and it surrounds the dig.
00:13:41.10 Another view of it is shown in the upper panel,
00:13:43.24 where you can see a space-filling view of the designed protein,
00:13:46.06 and you can see it really snugs the surface
00:13:48.04 of the small molecule.
00:13:50.08 So again, this is purely a computer calculation,
00:13:53.21 but we then go to the lab and make the protein...
00:13:57.01 and we make the protein...
00:13:59.07 and when we made the protein
00:14:01.06 we found it bound the small molecule,
00:14:03.17 and Barry Stoddard's group
00:14:05.23 was then able to solve the crystal structure,
00:14:07.23 and that's shown here.
00:14:09.29 In cyan is the...
00:14:11.12 sorry, in magenta is the designed model,
00:14:13.07 that's what I already showed you,
00:14:15.10 it's the designed model of the designed protein
00:14:17.04 bound to this small molecule,
00:14:19.16 and in cyan is the crystal structure,
00:14:21.23 and you can see that the small molecule...
00:14:24.04 first of all, you can see that
00:14:25.28 this designed protein has the correct structure,
00:14:28.04 and second, you can see that the designed molecule
00:14:30.09 binds to that structure
00:14:32.07 in almost exactly the way that was designed,
00:14:34.04 making those same hydrogen bonding interactions.
00:14:36.07 And the left panel shows you the
00:14:39.04 shape complementarity in the crystal structure
00:14:41.09 of this small molecule with the protein.
00:14:44.18 This design was very exciting
00:14:47.02 because it, again, binds the small molecule
00:14:52.03 with picomolar affinity,
00:14:54.10 and we are now using this method
00:14:56.00 to design proteins which bind
00:14:58.04 a number of different types of molecules,
00:15:00.03 both toxins and other types of drugs,
00:15:05.05 and these types of designed proteins
00:15:06.26 could be useful
00:15:08.17 not only for soaking up dangerous molecules
00:15:10.19 in the body,
00:15:12.06 but also for detection of molecules and other purposes.
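To give a feel for what picomolar affinity means, the dissociation constant can be converted into a standard binding free energy with the relation ΔG° = RT ln(Kd / 1 M). The sketch below assumes simple one-to-one binding, a 1 M standard state, and a temperature of 298.15 K; the function names are illustrative.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature * math.log(kd_molar)

def fraction_bound(free_ligand_molar, kd_molar):
    """Fraction of binder occupied at a given free ligand concentration (1:1 binding)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)
```

A Kd of 1 pM (`1e-12`) corresponds to roughly -16.4 kcal/mol, compared with about -8.2 kcal/mol for a 1 µM binder. By definition, half of the binder is occupied when the free ligand concentration equals Kd.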
00:15:16.04 And, I'm going to conclude
00:15:18.08 by telling you about
00:15:20.27 our work on designing new materials.
00:15:23.24 So, many of the materials that you're familiar with,
00:15:26.05 like silk and wool,
00:15:28.01 are made out of proteins,
00:15:29.23 and biology has lots of examples
00:15:32.07 of more specialized sort of nanomaterials,
00:15:35.02 like viruses
00:15:37.25 have these very elaborate and beautiful coat structures
00:15:40.25 which they use to protect their DNA,
00:15:48.05 and the principle of all these materials in biology
00:15:51.23 is self-assembly,
00:15:53.06 where there's a subunit that's made,
00:15:55.13 that's encoded in a gene,
00:15:57.06 and then that subunit interacts
00:15:59.21 with other copies of itself
00:16:02.02 to make a larger structure.
00:16:03.21 And, I'm going to show you now
00:16:05.20 how we can design brand new proteins
00:16:07.11 which self-assemble with other copies of themselves
00:16:10.24 to make larger structures.
00:16:13.18 So, in this first example,
00:16:17.09 what we've done is to take a protein that's shown on the left,
00:16:21.06 and place it on the corners of a cube.
00:16:24.25 And so, there are eight corners on a cube,
00:16:26.22 so we've taken eight copies of this protein
00:16:28.27 and arranged them on the corners of the cube
00:16:30.27 in such a way that
00:16:34.25 the surfaces of these different copies
00:16:37.03 on the different corners
00:16:39.06 touch each other.
00:16:42.11 And we then designed the sequences
00:16:44.19 of these interfaces where they touch
00:16:47.10 so that the proteins...
00:16:49.20 to make very low energy interactions,
00:16:51.20 so that when this protein is made in cells,
00:16:53.22 what we hope is that
00:16:55.22 it will self-assemble into the cubic structure,
00:16:57.19 stabilized by these designed interactions
00:16:59.21 that we've made.
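The geometric idea, one building block copied onto all eight corners of a cube by the cube's own symmetry, can be sketched with two rotations. This is an illustration only: corners are represented as points with coordinates of plus or minus 1, and the function names are hypothetical.

```python
from itertools import product

def cube_corners():
    """The eight corners of a cube centered on the origin."""
    return {tuple(c) for c in product((-1, 1), repeat=3)}

def rotate_fourfold(p):
    """Rotate 90 degrees about the z axis (a four-fold axis of the cube)."""
    x, y, z = p
    return (-y, x, z)

def rotate_threefold(p):
    """Rotate 120 degrees about the body diagonal (1, 1, 1), a three-fold axis."""
    x, y, z = p
    return (z, x, y)

def orbit(start, generators):
    """All positions reachable from `start` by repeatedly applying the generators."""
    seen = {start}
    frontier = [start]
    while frontier:
        point = frontier.pop()
        for g in generators:
            image = g(point)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen
```

Starting from the single corner `(1, 1, 1)`, `orbit((1, 1, 1), [rotate_fourfold, rotate_threefold])` reproduces all eight corners. Because every copy is related to every other by a symmetry operation, designing one interface automatically designs all of its symmetry-related copies.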
00:17:01.13 And, in the lower panel here,
00:17:03.09 you can see an electron micrograph of cells
00:17:05.16 that are making this designed protein,
00:17:07.24 and you can see that these cells
00:17:09.20 are filled with these cubic structures,
00:17:12.29 and the averages of these images
00:17:14.25 are shown on sort of the right column of this panel,
00:17:17.10 and you can see they look quite a bit like the designed model.
00:17:20.22 They look like little dice.
00:17:22.20 In fact, what we'd like to be good enough to do
00:17:24.07 is be able to put different numbers on different sides.
00:17:26.14 We're not quite there yet.
00:17:28.19 When the crystal structure was solved
00:17:31.10 in Todd Yeates' lab,
00:17:33.10 it was found to be nearly identical to the designed model,
00:17:37.01 which we were very excited about.
00:17:38.26 So, we can make these types of nanomaterials
00:17:40.19 and enclosed structures
00:17:42.14 with very high accuracy.
00:17:44.27 This shows another view.
00:17:47.28 The left three columns
00:17:50.01 show the same design I just described,
00:17:52.18 but now viewed down the different symmetry axes of the cube.
00:17:56.21 So for example, the third column
00:17:58.16 is the four-fold axis of a cube,
00:18:00.22 and in the upper row
00:18:02.21 is the designed model,
00:18:04.08 what we were trying to make,
00:18:05.29 and in the lower row is the crystal structure,
00:18:08.08 those are the structures that we actually found experimentally,
00:18:10.25 and you can see they're essentially identical.
00:18:13.25 On the right is a second example
00:18:15.12 where we were trying to design proteins
00:18:17.13 to come together to form a tetrahedron,
00:18:19.10 and again you can see that
00:18:22.03 the designed models in the top row
00:18:23.26 are very similar to the actual crystal structures
00:18:26.17 that were solved experimentally in the bottom row.
00:18:32.29 And Yang Hsia, a graduate student in the lab,
00:18:35.03 has more recently used this approach
00:18:36.26 to try and make even bigger structures
00:18:39.14 like the icosahedron shown on the top left.
00:18:44.21 This is more or less
00:18:46.14 like the play structures that they have in some playgrounds,
00:18:49.28 except this is a complete icosahedron.
00:18:53.17 And, when Yang made this protein in the lab,
00:18:55.24 very recently,
00:18:57.16 he was excited when Shane Gonen,
00:18:59.17 who he sent the protein to to do electron microscopy,
00:19:02.27 sent back the pictures that I'm showing you here.
00:19:04.28 You can't quite see the whole icosahedron
00:19:07.00 but, for example
00:19:08.27 in the lower row on the middle panel,
00:19:11.05 you see something that looks very much like it.
00:19:13.09 So, we're currently trying to solve the high-resolution structure.
00:19:17.08 So, these were materials
00:19:19.02 that were made out of just one component
00:19:20.20 that was identical
00:19:22.08 that was then interacting with other copies of itself.
00:19:24.16 We can make this more sophisticated
00:19:26.14 by, instead of having one component,
00:19:28.17 we can have two components.
00:19:31.01 So, in panel A here, I'm showing two tetrahedra
00:19:34.02 that are inverted relative to each other,
00:19:36.08 one green and one blue.
00:19:40.19 And so, what we're doing here is
00:19:43.06 we're taking one building block, the green one,
00:19:45.13 and putting it at the corners of the green tetrahedron,
00:19:48.02 and another building block, the blue one,
00:19:50.28 and putting it at the corners of the blue tetrahedron,
00:19:53.14 and then as shown in the middle panel here,
00:19:55.18 we can move them...
00:19:57.25 we can slide them closer and further away
00:20:00.04 from the center of these tetrahedra,
00:20:02.24 and we can also rotate each one,
00:20:05.16 and we do this
00:20:07.17 until we find a way in which these fit together
00:20:10.13 in a very shape-complementary way,
00:20:12.13 and that's shown in panel C.
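The slide-and-rotate search can be sketched as a grid search over four numbers: a radial displacement and a rotation angle for each of the two building blocks. The scoring function below is a made-up stand-in with a known optimum, not a real shape-complementarity measure, and the sketch assumes for illustration that each building block is three-fold symmetric, so rotations repeat every 120 degrees.

```python
import math
from itertools import product

def grid_search(score, radii, angles):
    """Try every combination of (radius, angle) for both components; keep the best score."""
    best = None
    for r_a, th_a, r_b, th_b in product(radii, angles, radii, angles):
        s = score(r_a, th_a, r_b, th_b)
        if best is None or s > best[0]:
            best = (s, r_a, th_a, r_b, th_b)
    return best

def toy_complementarity(r_a, th_a, r_b, th_b):
    """Hypothetical score that peaks at r_a=30, th_a=40, r_b=25, th_b=100.

    The factor of 3 in the cosine terms makes the score repeat every
    120 degrees, mimicking a three-fold symmetric building block.
    """
    angular = math.cos(math.radians(3 * (th_a - 40))) + math.cos(math.radians(3 * (th_b - 100)))
    radial = ((r_a - 30) ** 2 + (r_b - 25) ** 2) / 25.0
    return angular - radial

RADII = range(20, 41, 5)    # angstroms from the center (assumed range)
ANGLES = range(0, 120, 10)  # degrees, one full period for a three-fold block
```

`grid_search(toy_complementarity, RADII, ANGLES)` recovers the planted optimum. Restricting each block to sliding along and rotating about its own symmetry axis keeps the search to only four degrees of freedom, which is what makes an exhaustive scan feasible.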
00:20:14.29 At this point it becomes a calculation
00:20:17.09 very similar to what I showed in that movie
00:20:19.11 that I showed at the beginning of my talk,
00:20:21.17 where we now have to design...
00:20:23.22 find an amino sequence...
00:20:25.16 amino acid sequences on both sides,
00:20:27.00 on both the green side and the blue side,
00:20:28.22 which fit together very well
00:20:30.17 and make very strong interactions.
00:20:33.07 And, when we've done that,
00:20:35.15 we again order synthetic genes,
00:20:37.18 or make synthetic genes,
00:20:39.10 that encode both proteins.
00:20:41.03 We make them in bacteria
00:20:42.24 and then we look to see
00:20:44.27 whether there's anything that's assembled,
00:20:47.00 and I'm going to show you the results on the next slide.
00:20:50.01 These are electron micrographs
00:20:51.27 of two of these materials.
00:20:53.29 These are, again, two components,
00:20:55.21 with a green component and a blue component,
00:20:58.13 and the designed models are shown
00:21:01.25 on the lower part of the slide,
00:21:05.24 with one component in green
00:21:07.21 and one component in blue.
00:21:09.17 In the upper panels
00:21:11.02 are electron micrographs of what we get out of E. coli cells,
00:21:13.22 bacterial cells
00:21:15.15 that are expressing these two proteins,
00:21:17.13 and you can see that...
00:21:18.25 first of all what you can see is that, for each design,
00:21:20.19 we get remarkably homogeneous particles,
00:21:22.25 so all the particles in these images
00:21:24.17 look essentially identical,
00:21:26.11 and if you look closely you can see that,
00:21:28.02 for the different shaped designs,
00:21:30.12 we get different shaped structures
00:21:31.29 and they correspond
00:21:34.05 to the shapes that we're trying to design.
00:21:36.06 So, I think in the middle panel,
00:21:38.04 you can see that the holes are a little bit bigger
00:21:40.13 than in the particles on the left panels.
00:21:44.14 And, what's exciting about this
00:21:47.24 for the applications I'll describe
00:21:49.24 is not only that the shapes are coming out right,
00:21:52.00 as we designed,
00:21:53.18 but that every particle is the same.
00:21:55.03 So, for example,
00:21:56.18 if you wanted to make a new type of drug delivery vehicle,
00:21:59.14 there are various ways of making particles
00:22:01.16 for drug delivery now,
00:22:03.18 so say you want to target a toxic compound
00:22:07.09 specifically to the tumor you want to kill,
00:22:10.06 but those methods always...
00:22:12.25 when you look at the particles
00:22:14.10 they're always very heterogeneous,
00:22:15.21 so it's hard to predict what they'll do inside the body.
00:22:17.20 With this technique,
00:22:19.14 we can make particles that are very precise
00:22:21.15 and each one is identical to each other one.
00:22:25.16 So, Todd Yeates' group
00:22:27.10 was again able to solve crystal structures
00:22:29.16 of these two-component materials.
00:22:31.14 So, in the upper rows are the designed models,
00:22:34.09 shown down the different symmetry axes...
00:22:36.17 two of the symmetry axes of these particles,
00:22:39.04 and in the lower rows
00:22:40.27 are the crystal structures of these designs.
00:22:42.26 So again, the process is,
00:22:44.21 you have the computer model, which is what's on the top row,
00:22:48.28 then you order a synthetic gene
00:22:50.26 which encodes both of the designed proteins,
00:22:53.27 you put these synthetic genes into bacteria,
00:22:56.06 you make the proteins,
00:22:58.02 and then you purify them out of E. coli
00:22:59.24 and you look to see what you've got.
00:23:01.29 And then, in this case, go one step further
00:23:04.27 to determine the X-ray crystal structures,
00:23:06.26 and what you can see here
00:23:08.28 is that these designed proteins are again...
00:23:11.09 the crystal structures are essentially identical
00:23:13.03 to the designed models.
00:23:14.18 So, we can make these designed nanomaterials
00:23:16.23 very, very precisely.
00:23:20.12 So, the different types of nanostructures
00:23:23.27 that I've described so far
00:23:26.15 are the ones on the left, and I already mentioned...
00:23:28.29 so, the question is, what good could they be for?
00:23:30.28 One very exciting possibility
00:23:32.20 is targeted drug delivery,
00:23:34.14 where, as I mentioned, you could put a chemotherapy agent
00:23:36.24 inside the cage
00:23:38.16 and then target it to the tumor,
00:23:40.01 so you don't have to take it systemically.
00:23:41.29 You can also put targeting domains on the outside
00:23:44.12 so that it goes exactly where you want it to go,
00:23:47.08 and we're now...
00:23:49.11 a first-year student in the lab
00:23:50.24 is now exploring different ways of putting nucleic acid
00:23:52.29 inside these
00:23:54.25 to make synthetic viruses,
00:23:56.09 not for bad purposes but for good purposes,
00:23:58.14 so we can deliver, say,
00:24:01.00 for gene therapy or for other types of therapy,
00:24:04.18 deliver RNA or DNA molecules
00:24:06.29 exactly in the body
00:24:08.21 where they would be good to go.
00:24:11.00 Another application is to vaccines.
00:24:13.16 We can display...
00:24:15.18 one of the things we're trying to display now
00:24:17.16 is the HIV coat protein...
00:24:19.23 we can display it on the outside of these cages,
00:24:21.28 it will be there in many copies,
00:24:24.02 and hopefully trigger a strong immune response.
00:24:27.21 We can also put molecules called adjuvants
00:24:30.04 inside these cages
00:24:32.01 to stimulate a stronger response.
00:24:33.17 Now, there are other types of particles,
00:24:35.09 other types of nanomaterials that we can design.
00:24:37.28 For example, the wire on the right side.
00:24:41.15 You could imagine things like this
00:24:45.01 being useful for transporting ions
00:24:47.07 or maybe even electrons
00:24:49.01 in some sort of nanoelectronic device.
00:24:50.28 And, my last example today
00:24:52.23 is going to be for what you see in the middle
00:24:55.06 - a designed, repeating, 2-dimensional layer,
00:24:59.17 and this is the work of graduate student Shane Gonen.
00:25:03.14 Here is his design.
00:25:06.16 It's a hexagonal lattice
00:25:09.10 where these proteins are designed to
00:25:11.29 assemble first into hexagons,
00:25:13.22 which then interact with other copies of themselves
00:25:15.14 to tile the plane,
00:25:17.13 and when he makes this protein in E. coli
00:25:19.15 he gets this... this is straight out of a broken E. coli cell.
00:25:23.13 He sees these large arrays that correspond...
00:25:28.21 that have the geometry one would expect for his design,
00:25:32.25 and if he averages his data, the...
00:25:39.05 and then, sort of a representation of a map,
00:25:43.02 a density map that comes from this data
00:25:45.07 is shown in the lower-left panel,
00:25:47.10 and you can see that his model
00:25:49.09 fits into that quite well.
00:25:51.06 But, as you can imagine,
00:25:52.20 we really aren't satisfied until we've determined
00:25:54.13 the high-resolution structure,
00:25:56.10 which Shane is currently working on.
00:25:58.14 I've been very fortunate to have absolutely outstanding colleagues
00:26:01.07 that actually did all the work that I described.
00:26:03.13 Their names are listed on this slide
00:26:05.05 and, more generally,
00:26:08.08 I hope I've given you a sense, today,
00:26:10.04 for the potential of protein design
00:26:12.23 to create a whole new world of designed proteins
00:26:15.27 to solve challenges
00:26:20.02 that we collectively face today.
- Why can we not simply express a part of the HA protein to use as an epitope for antibody production?
- Baker discusses a number of protein design projects: ligand-binding proteins, self-assembling proteins, repeat proteins, and other nanomaterials. How might these proteins be useful in other areas of scientific research, e.g. medicine or environmental science?
- How can you increase the complexity of engineered nanomaterials? Why would you want to do this?
Paper for this Session’s Discussion
Discussion Questions for the Paper
- Briefly describe the computational method of protein design discussed in this paper (Fold From Loops). How many FFL designs were ultimately chosen for filtering and human-guided optimization?
- The authors then did an immunological evaluation of specific FFL designs they optimized. What evidence did the authors have to support and/or go against the clinical relevance of their designed FFL scaffolds?
- What evidence did the authors have from their antibody characterization that their designed scaffolds can “re-elicit” neutralizing antibodies?
- How does epitope-focused vaccine design work? What are the advantages of this method compared to traditional vaccine design methods?
- Do you think there is enough evidence supporting the efficacy of FFL scaffolds for use as a vaccine against RSV? If not, which additional experiments are required? Comment on the feasibility of this approach for developing vaccines against other viruses, e.g. HIV or Ebola.
Tinberg CE, Khare SD, Dou J, Doyle L, Nelson JW, Schena A, Jankowski W, Kalodimos CG, Johnsson K, Stoddard BL, Baker D. Computational design of ligand-binding proteins with high affinity and selectivity. Nature. 2013 Sep 12;501(7466):212-6. PMID:24005320
King NP, Sheffler W, Sawaya MR, Vollmar BS, Sumida JP, André I, Gonen T, Yeates TO, Baker D. Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science. 2012 Jun 1;336(6085):1171-4. PMID:22654060
Fleishman SJ, Whitehead TA, Ekiert DC, Dreyfus C, Corn JE, Strauch EM, Wilson IA, Baker D. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science. 2011 May 13;332(6031):816-21. PMID:21566186
Whitehead TA, Chevalier A, Song Y, Dreyfus C, Fleishman SJ, De Mattos C, Myers CA, Kamisetty H, Blair P, Wilson IA, Baker D. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol. 2012 May 27;30(6):543-8. PMID:22634563
Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT, Baker D. Principles for designing ideal protein structures. Nature. 2012 Nov 8;491(7423):222-7. PMID:23135467
Foldit Publications: http://fold.it/portal/info/science#folditpub
Rosetta@Home information: http://boinc.bakerlab.org/
David Baker received a BA in Biology from Harvard University and a PhD in Biochemistry from the University of California, Berkeley. Currently, Baker is the Head of the Institute for Protein Design and a Professor of Biochemistry at the University of Washington, and a Howard Hughes Medical Institute Investigator. His research utilizes both experimental and…