AlphaFold successfully predicts protein structures, a result that could upend biology
An artificial intelligence (AI) network developed by Google’s AI company DeepMind has taken a big step towards solving one of the grand challenges in biology: determining the 3D structure of a protein from its amino acid sequence.
DeepMind’s program, called “AlphaFold”, outperformed hundreds of other teams in a biennial protein structure prediction challenge called the “Critical Assessment of Protein Structure Prediction” (CASP). The results were announced on November 30, at the start of the challenge’s conference, which was held online this year.
The function of a protein is determined by its 3D structure. Source: DeepMind
“This is very remarkable,” said John Moult, a computational biologist at the University of Maryland. Moult co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “To some extent, the problem is solved.”
The ability to accurately predict protein structures from their amino acid sequences would bring huge benefits to the life sciences and medicine. It would greatly improve our understanding of the basic building blocks of cells and accelerate drug discovery.
AlphaFold took the top spot at the last CASP, in 2018, when London-based DeepMind participated for the first time. This year, DeepMind’s deep learning network did even better. Scientists describe its performance as astonishing and say it may herald a revolution in biology.
“It changed the whole situation,” said Andrei Lupas, a CASP assessor and an evolutionary biologist at the Max Planck Institute for Developmental Biology. AlphaFold helped him determine the structure of a protein that had vexed his laboratory for a decade. He believes that AlphaFold will change the way he works and the questions he tries to answer. “It will change medicine, change research, change bioengineering, change everything,” Lupas said.
In some cases, the structures predicted by AlphaFold are almost identical to those determined by “gold standard” experimental methods such as X-ray crystallography and, in recent years, cryo-electron microscopy (cryo-EM). Scientists say that AlphaFold cannot yet replace these laborious and expensive techniques, but that it will open up new ways of studying living things.
Proteins are the building blocks of life and are responsible for most of what happens inside cells. How a protein works and what it does are determined by its 3D structure: “structure is function” is an axiom of molecular biology. Proteins seem to adopt their shape without help, guided only by the laws of physics.
For decades, laboratory experiments have been the main way to obtain good protein structures. The first complete protein structures were determined in the 1950s, using a technique in which X-ray beams are fired at crystallized proteins and the diffracted rays are converted into the protein’s atomic coordinates. X-ray crystallography has produced most known protein structures, but in the past decade, cryo-electron microscopy has become the tool of choice in many structural biology laboratories.
Scientists have long wondered how the components of a protein (a chain of different amino acids) twist and fold into their final shape. Early attempts to predict protein structures by computer in the 1980s and 1990s were not successful, the researchers said. Overstated claims in published papers tended to fall apart when other scientists applied the methods to other proteins.
Moult founded CASP to make this research more rigorous. Participating teams must predict the structures of proteins that have already been solved experimentally but whose structures have not yet been made public. Moult believes that this experiment (he does not call it a competition) has squeezed the hype out of the field and raised its standards. “You are really judging what looks promising, what is useful, and what needs to be discarded,” he said.
DeepMind’s performance at CASP13 in 2018 surprised many scientists in the field, which has long been the stronghold of a small number of academic groups. However, its approach at the time was broadly similar to that of other teams using AI, said Jinbo Xu, a computational biologist at the University of Illinois at Chicago.
The first iteration of AlphaFold applied deep learning to structural and genetic data to predict the distances between pairs of amino acids in a protein. In a second step that does not involve AI, AlphaFold uses this information to produce a “consensus” model of the protein’s structure, said John Jumper, who leads the project at DeepMind.
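To make the idea of “distances between pairs of amino acids” concrete, here is a minimal Python sketch of a distance map, the table of pairwise distances between residues. It is an illustration only, not DeepMind’s code, and it works in the opposite direction to AlphaFold: it starts from known coordinates and computes the distances, whereas the first AlphaFold predicted such distances from the sequence and then built a structure from them. The function name and the four-residue coordinates are made up for this example.

```python
import math


def pairwise_distances(coords):
    """Return the symmetric matrix of distances between every pair of residues.

    coords is a list of (x, y, z) positions, one per residue, in angstroms.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical positions for a four-residue fragment (not real protein data).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

distance_map = pairwise_distances(toy_coords)

for row in distance_map:
    print("  ".join(f"{d:5.2f}" for d in row))
```

Each row of the printed table lists how far one residue is from every other; the diagonal is zero because every residue is at zero distance from itself.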
The team tried to build on this approach but eventually hit a wall, so it changed direction. Jumper said that they designed an AI network that incorporates additional information about the physical and geometric constraints that determine how a protein folds. They also set it a harder task: instead of predicting the relationships between amino acids, the network predicts the final structure of a target protein sequence. “This makes the whole system more complicated,” Jumper said.
Each CASP runs for several months. During that time, target proteins or protein domains are released at regular intervals, about 100 in total, and teams have a few weeks to submit their predicted structures. A team of independent scientists then evaluates the predictions using a range of metrics, which mainly measure how similar a predicted structure is to the one determined experimentally. The assessors do not know who made each prediction.
AlphaFold’s predictions were submitted under the name “group 427”, and the astonishing accuracy of many of them made them stand out, Lupas said. “I guessed it was AlphaFold. Most people guessed it,” he said.
The quality of AlphaFold’s predictions varied, but nearly two-thirds of them were comparable to experimental structures. In some cases, Moult said, it was not clear whether a discrepancy between AlphaFold’s prediction and the experimental result was a prediction error or an artifact of the experiment.
AlphaFold’s predictions matched poorly with experimental structures determined by NMR spectroscopy, but this may be down to the way the raw data are converted into a model, Moult said. AlphaFold also struggles to model individual structures within protein complexes, or groups, where interactions with other proteins distort their shapes.
Overall, the participating teams predicted structures more accurately this year than in the previous CASP, but most of the improvement came from AlphaFold, Moult said. Prediction accuracy is scored on a scale with a maximum of 100. On target proteins of medium difficulty, the best scores from other teams were typically around 75, whereas AlphaFold scored around 90, Moult said.
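The article does not name the scoring metric, but CASP’s headline measure is GDT_TS, which asks what percentage of residues in a predicted structure lie within 1, 2, 4 and 8 angstroms of their experimentally determined positions and averages the four percentages. The Python sketch below illustrates that idea under a simplifying assumption: the two structures are taken to be already superimposed, whereas the real metric searches for the best superposition at each cutoff. The function name and coordinates are invented for illustration.

```python
import math

# Distance cutoffs in angstroms used by GDT_TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_like(predicted, experimental):
    """Return a 0-100 score: the mean percentage of residues within each cutoff.

    Simplification (an assumption of this sketch): both structures are
    already superimposed, so no search over alignments is performed.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical four-residue example: the prediction is off by
# 0.5, 1.5, 3.0 and 10.0 angstroms at successive residues.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (5.3, 0.0, 0.0), (10.6, 0.0, 0.0), (21.4, 0.0, 0.0)]

score = gdt_ts_like(predicted, experimental)
print(f"GDT_TS-like score: {score:.2f}")
```

A prediction identical to the experimental structure scores 100; a score near 90, like those Moult describes for AlphaFold, means that almost every residue sits within a few angstroms of its experimental position.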
About half of the teams mentioned “deep learning” in the abstracts summarizing their methods, Moult said, a sign of how widely AI has spread through the field. Most of the teams that took part in CASP14 were from academia, but companies such as Microsoft and Tencent also entered.
Mohammed AlQuraishi, a computational biologist at Columbia University in New York, also took part in CASP. He is eager to dig into the details of AlphaFold’s performance in the competition and to learn more about how the system works when the DeepMind team presents its approach on December 1. It is possible, though unlikely, that this year’s target proteins were easier than usual and that this contributed to the strong results, he said. But AlQuraishi’s strong hunch is that AlphaFold will be disruptive.
“I think it is fair to say that the field of protein structure prediction will be upended. I suspect that many people will leave because the core problem in this field has been solved,” he said. “This is a breakthrough of the highest order, and it is definitely one of the most important scientific achievements I have seen in my life.”
DeepMind CEO Demis Hassabis said the company is learning what biologists want from AlphaFold. Source: OLI SCARFF/AFP/Getty
Speeding up protein structure prediction
AlphaFold’s predictions helped determine the structure of a bacterial protein that the Lupas lab had been trying to crack for years. Lupas’s team had already collected raw X-ray diffraction data, but turning these Rorschach-like patterns into a structure requires some information about the shape of the protein. The techniques normally used to obtain this information, as well as other prediction tools, had failed. “The model from group 427 gave us our structure in half an hour, after we had spent ten years trying everything,” Lupas said.
DeepMind co-founder and CEO Demis Hassabis said the company plans to make AlphaFold available to other scientists. (DeepMind previously published enough details about the first version of AlphaFold for other scientists to replicate the approach.) AlphaFold can take a few days to produce a predicted structure, which includes estimates of the reliability of different regions of the protein. “We are just beginning to understand what biologists want,” said Hassabis, who sees drug discovery and protein design as potential applications.
In early 2020, DeepMind released predicted structures for several proteins of the new coronavirus, SARS-CoV-2, that had not yet been determined experimentally. DeepMind’s prediction for the Orf3a protein turned out to be very similar to the structure later determined by cryo-electron microscopy, said Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley. Brohawn’s team released that structure in June. “Their previous results are really impressive,” he added.
AlphaFold is unlikely to put laboratories such as Brohawn’s, which use experimental methods to solve protein structures, out of business. But it could mean that relatively low-quality, easy-to-collect experimental data will be enough to obtain a good structure. Some applications, such as the evolutionary analysis of proteins, are set to flourish, because the vast amount of genomic data now available could be reliably translated into structures. “This will empower a new generation of molecular biologists, allowing them to ask more cutting-edge questions,” Lupas said. “In the future, there will be more and more thinking and less pipetting.”
“I never thought I would see this problem solved in my lifetime,” said Janet Thornton, a structural biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute and a former CASP assessor. She hopes that the approach can help to reveal the functions of the thousands of unsolved proteins in the human genome and to make sense of disease-causing gene variations that differ between people.
AlphaFold’s performance is also a turning point for DeepMind. The company is best known for building AI that masters games such as Go, but its long-term goal is to develop programs capable of broad, human-like intelligence. Tackling grand scientific challenges, such as predicting protein structures, is one of the most important applications its AI can have, Hassabis said. “I really think this is the most powerful thing we have ever done, and I mean in terms of actual impact.”