It has been known for years that the amino acid sequence of a protein determines its three-dimensional structure. But until now, you needed experiments to discover this structure. Now, scientists from the University of Washington and the Howard Hughes Medical Institute (HHMI) in Seattle are using computers alone to predict protein structures. For about a third of the cases studied, this method produced predictions approaching the quality of experimental structures. That might look like a low percentage, but it is in fact a remarkable achievement, which opens a new way to discover how these biological machines, the proteins, work, just by feeding a sequence into a computer.
Here is how David Baker, an HHMI researcher at the University of Washington, describes his research.
"For more than 40 years, people have known the amino acid sequence of a protein specifies its three-dimensional structure, but no one has been able to translate the sequence into an accurate structure," said Baker. "The reason this research is exciting is that we're showing progress in predicting the structure from the sequence. It's not that the problem is solved, but that there is hope."
And indeed, there is hope. Here are some details about the latest results obtained by the researchers.
In the study, a sophisticated computer program folded 17 short strings of amino acids into 100,000 possible variations. When the researchers compared the best predictions to the actual structures solved earlier by other scientists using experimental techniques, they had the same success rate as the best hitters in major league baseball: about one in three.
"For about one-third of our benchmark set of small proteins, we generated relatively high-resolution structure predictions, with parts of the structures predicted to near-atomic resolution," said first author Philip Bradley, a postdoctoral fellow in Baker's lab. "For us, it is a real step forward to achieve structures that are in some way comparable to what you can get by experiments."
The pictures below show two medium-resolution protein structure predictions. The superposition of low-energy models is depicted in blue and the experimental structures showing core sidechains are in red. "The lowest-energy round 1 model (A) for the Enga protein is topologically correct but does not have native-like sidechain packing or loop conformations. The same is true of the lowest-energy model (B) from round 2 for Yhhp." (Credit: HHMI).
Here is how this two-step prediction process works.
The first stage uses an approximate model which allows rapid calculation of the energy and so can be carried out rapidly, while the second uses a very detailed model for which the energy calculations take much longer but are much more accurate. A large scale search through possible structures is carried out in the first stage, and promising locations are then explored in detail in the second stage.
[In this second stage, the computer modeling program] replaces the fuzzy picture of the side chains with detailed, physically realistic models with all the atoms represented. From the positions of the atoms in the sidechains and the protein backbone, the computer then uses a detailed physical chemistry based force field which favors close packing of atoms and hydrogen bonding to more accurately compute the energy of the structure.
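The two-stage idea described above can be illustrated with a toy sketch. This is not the researchers' program: the one-dimensional "structure", the two energy functions, and all names and parameters below are hypothetical stand-ins, chosen only to show a cheap, approximate search followed by a slower, more detailed refinement of the promising candidates.

```python
import math
import random


def coarse_energy(x):
    # Hypothetical cheap model: smooth, fast to evaluate, slightly inaccurate
    # (its minimum sits near, but not exactly at, the true minimum).
    return (x - 2.3) ** 2


def detailed_energy(x):
    # Hypothetical expensive model: a rugged landscape with many local minima
    # and a single global minimum of 0.0 at x = 2.0.
    return (x - 2.0) ** 2 + 0.5 * (1.0 - math.cos(8.0 * (x - 2.0)))


def two_stage_search(n_samples=2000, n_keep=10, n_refine_steps=500, seed=0):
    rng = random.Random(seed)

    # Stage 1: large-scale search scored with the approximate model only.
    candidates = [rng.uniform(-10.0, 10.0) for _ in range(n_samples)]
    promising = sorted(candidates, key=coarse_energy)[:n_keep]

    # Stage 2: refine each promising candidate with the detailed model,
    # accepting a small random move only when it lowers the energy.
    best_x, best_e = None, math.inf
    for x in promising:
        e = detailed_energy(x)
        for _ in range(n_refine_steps):
            trial = x + rng.gauss(0.0, 0.05)
            trial_e = detailed_energy(trial)
            if trial_e < e:
                x, e = trial, trial_e
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    x, e = two_stage_search()
    print(f"lowest-energy candidate: x={x:.3f}, energy={e:.4f}")
```

The point of the split is cost: the detailed model is evaluated only around the few candidates that the cheap model flagged, rather than across the whole search space.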
For more information, you can visit the Baker Lab home page and its page about prediction and design of macromolecular structures, or read more about the Rosetta@home project on protein folding, design and docking.
The research work mentioned above was published in Science under the title "Toward High-Resolution de Novo Structure Prediction for Small Proteins" (Vol. 309, Issue 5742, Pages 1868-1871, September 2005). Here is a link to the abstract. The pictures above were taken from the supporting online material for this article.
Finally, in a related paper published in the August issue of the journal Proteins (Volume 60, Issue 2, Pages 187-194, August 1, 2005), Baker and his colleagues reported that similar approaches can be used to predict the structures of protein complexes. Here is a link to the abstract of this paper, called "Progress in protein-protein docking: Atomic resolution predictions in the CAPRI experiment using RosettaDock with an improved treatment of side-chain flexibility." And please note that CAPRI is not a lovely Italian island in this context, but an acronym for "critical assessment of predicted interactions."
Sources: Howard Hughes Medical Institute news release, September 16, 2005; and various web sites
You'll find related stories by following the links below.