London, November 30th, 2020 – In a major scientific advance, the latest version of DeepMind’s AI system, AlphaFold, has been recognized as a solution to the 50-year-old grand challenge of protein structure prediction, often referred to as the “protein folding problem”, following a rigorous independent evaluation. This breakthrough could significantly accelerate biological research in the long term and, among other things, open up new opportunities for understanding diseases and discovering drugs.
CASP14 results released today show that DeepMind’s latest AlphaFold system achieves an unprecedented level of accuracy in structure prediction. The system is able to determine highly accurate structures within a few days. CASP, the Critical Assessment of protein Structure Prediction, is a biennial community assessment that began in 1994 and is the gold standard for assessing predictive techniques. Participants must blindly predict the structures of proteins that have only recently (or in some cases not yet) been determined experimentally, and then wait for their predictions to be compared with the experimental data.
CASP uses the Global Distance Test (GDT) metric to score accuracy on a scale from 0 to 100. The new AlphaFold system achieved a median score of 92.4 GDT across all targets. The system’s average error is about 1.6 angstroms, roughly the width of an atom. According to Professor John Moult, co-founder and chairman of CASP, a score of around 90 GDT is informally considered to be competitive with results from experimental methods.
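To give a feel for how a GDT-style score behaves, here is a minimal sketch in Python. It is not CASP's scoring code and makes several assumptions that do not come from this release: the predicted and experimental coordinates are taken to be already superimposed (the real metric searches over many superpositions and keeps the best), only one point per residue is compared, and the commonly used 1, 2, 4 and 8 angstrom cutoffs are applied. The function name `gdt_ts` and the toy coordinates are purely illustrative.

```python
import math

def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score (0 to 100) for two pre-aligned coordinate lists.

    Assumption: both lists hold one (x, y, z) point per residue, in angstroms,
    and are already superimposed. No superposition search is performed here.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues that fall within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average the fractions over all cutoffs and rescale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues, with prediction errors of 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

In this toy case one residue is within 1 angstrom, two within 2, and three within both 4 and 8, so the score is the average of 25, 50, 75 and 75. A perfect prediction scores 100.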
We’ve been stuck on this one problem for almost 50 years: how do proteins fold up? To see DeepMind come up with a solution to it, after having worked on this problem personally for so long and after so many stops and starts, wondering if we would ever get there, is a very special moment.
Professor John Moult, Co-Founder and Chairman of CASP, University of Maryland
Why predicting protein structure is important
Proteins are essential to life, and their shapes are closely related to their functions. The ability to accurately predict protein structures enables a better understanding of what they do and how they work. There are currently over 200 million proteins in the main database, and only a fraction of their 3D structures have been mapped.
One major challenge is the astronomical number of ways a protein could theoretically fold before settling into its final 3D structure. Many of the greatest societal challenges, like developing therapies for diseases or finding enzymes that break down industrial waste, are fundamentally tied to proteins and the roles they play. Determining protein shapes and functions is a major area of scientific research, which relies primarily on experimental techniques that can take years of painstaking work per structure and require multi-million dollar specialized equipment.
DeepMind’s approach to the protein folding problem
This breakthrough builds on DeepMind’s first entry in CASP13 in 2018, where the first version of AlphaFold achieved the highest accuracy among all participants. Now DeepMind has developed new deep learning architectures for CASP14, drawing inspiration from the fields of biology, physics, and machine learning, as well as the work of many scientists in the field of protein folding over the past half century.
A folded protein can be thought of as a “spatial graph”, with residues as the nodes and edges connecting residues that are in close proximity. This graph is important for understanding the physical interactions within proteins and their evolutionary history. For the latest version of AlphaFold used in CASP14, DeepMind created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph while reasoning over the implicit graph it is building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph.
By iterating this process, the system develops strong predictions about the underlying physical structure of the protein. In addition, AlphaFold can use an internal confidence measure to predict which parts of each predicted protein structure are reliable.
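The “spatial graph” picture described above can be made concrete with a small sketch. The Python below builds such a graph from 3D residue coordinates by linking any two residues closer than a distance cutoff. It illustrates the data structure only and is not AlphaFold’s method: the function name `contact_graph`, the 8 angstrom cutoff, the use of one point per residue, and the toy coordinates are all assumptions made for this example.

```python
import math
from itertools import combinations

def contact_graph(coords, cutoff=8.0):
    """Build a spatial graph as an adjacency dict {residue index: set of neighbors}.

    Assumption: coords holds one (x, y, z) point per residue, in angstroms.
    Two residues are connected if they lie within `cutoff` of each other
    (8.0 is an illustrative choice, not a value from the release).
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            # Edges are undirected, so record them in both directions.
            graph[i].add(j)
            graph[j].add(i)
    return graph

# Toy chain of six residues that runs out along a line and folds back on itself.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0), (11.4, 5.0, 0.0), (0.0, 5.0, 0.0),
]
graph = contact_graph(coords)
print(sorted(graph[0]))  # [1, 2, 5]
```

Residue 0 is linked to its chain neighbors 1 and 2, and also to residue 5, which is far away in the sequence but close in space. Long-range edges of this kind are what make the graph informative about the fold.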
The system was trained on publicly available data consisting of ~170,000 protein structures from the Protein Data Bank, using a relatively modest amount of compute by modern machine learning standards: roughly 128 TPUv3 cores (approximately equivalent to 100-200 GPUs) running over a few weeks.
Potential for real impact
DeepMind looks forward to working with others to learn more about AlphaFold’s potential, and the AlphaFold team is working with a small number of specialist groups to explore how protein structure predictions could help in understanding specific diseases.
There is also evidence that protein structure prediction could be useful in future pandemic response efforts, as one of many tools developed by the scientific community. Earlier this year, DeepMind predicted several protein structures of the SARS-CoV-2 virus, and the impressively fast work of experimentalists has now confirmed that AlphaFold achieved a high level of accuracy in its predictions.
AlphaFold is one of DeepMind’s most significant advances. But, as with all scientific research, much remains to be done, including working out how multiple proteins form complexes, how they interact with DNA, RNA, or small molecules, and how the exact location of all amino acid side chains can be determined.
As with its previous CASP13 AlphaFold system, DeepMind plans to submit a paper describing how this system works to a peer-reviewed journal in due course, while also investigating how best to provide wider access to the system in a scalable manner.
AlphaFold is breaking new ground in demonstrating the amazing potential of AI as a tool for fundamental scientific discovery. DeepMind looks forward to working with others to unlock this potential.
This computational work represents an amazing advance in the problem of protein folding, a major 50-year-old challenge in biology. It has come decades before many people in the field would have predicted. It will be exciting to see the ways in which it fundamentally changes biological research.
Professor Venki Ramakrishnan, Nobel Laureate and President of the Royal Society
The ultimate vision behind DeepMind has always been to build AI and then use it to expand our knowledge of the world around us by accelerating the pace of scientific discovery. For us, AlphaFold is the first proof of this thesis. This advance is our first major breakthrough in a longstanding major challenge in science that we hope will have a major impact on disease understanding and drug discovery in practice.
Demis Hassabis, PhD, Founder and CEO of DeepMind
This is an incredible AI-powered breakthrough in protein folding that will help us better understand one of the most fundamental building blocks of life. This tremendous advance by DeepMind has immediate practical implications, allowing researchers to tackle new and difficult problems, from future pandemic response to environmental sustainability.
Sundar Pichai, CEO, Google and Alphabet
DeepMind is a multidisciplinary team of scientists, engineers, machine learning experts, and more who work together to research and build safe AI systems that learn to solve problems and advance scientific discovery for all.
DeepMind is known for developing AlphaGo, the first program to defeat a world champion in the complex game of Go. It has published over 1,000 research papers, including more than a dozen in Nature and Science, and has achieved breakthrough results in many challenging areas of AI, from StarCraft II to protein folding.
DeepMind was founded in London in 2010 and partnered with Google in 2014 to accelerate its work. Since then, the community has expanded to include teams in Alberta, Montreal, Paris, and Mountain View, California.