Proteins are the molecular machinery that makes life possible, from the muscle fibers that move us to the enzymes that copy our DNA.
A protein's function depends strongly on its three-dimensional structure, and scientists around the world have long sought to answer a seemingly simple question: if you know the building blocks of these molecular machines, can you predict how they fold into their functional shapes?
This is a difficult question to answer. Because protein structures arise from intricate physical interactions, researchers have turned to artificial neural network models, mathematical frameworks that transform complicated patterns into numerical representations, to predict and “see” the shape of proteins in 3D.
In a new study published in Nature Communications, researchers from Georgia Tech and Oak Ridge National Laboratory build on one such model, AlphaFold 2, to predict not just the physiologically active conformation of individual proteins but also the structure of functional protein pairings known as complexes.
According to Jeffrey Skolnick, Regents’ Professor and Mary and Maisie Gibson Chair in the School of Biological Sciences and one of the study’s corresponding authors, the work could help researchers bypass lengthy experiments and study the structure and interactions of protein complexes on a large scale. Computational models like these, he said, could mean big things for the field.
If these new computational models are successful, Skolnick said, “it could fundamentally change the way biological molecular systems are studied.”
Primed for Protein Prediction
AlphaFold 2 is a deep learning neural network model developed by London-based artificial intelligence lab DeepMind to predict the three-dimensional structure of a single protein given its amino acid sequence.
AlphaFold 2 was highly successful in blind tests at CASP14, the 14th round of the Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, a biennial competition in which researchers from around the world put their computational models to the test, according to Skolnick and Mu Gao, a senior research scientist in the School of Biological Sciences.
“To us, what is striking about AlphaFold 2 is that it not only makes excellent predictions on individual protein domains (the basic structural or functional modules of a protein sequence), but it also performs very well on protein sequences composed of multiple domains,” Skolnick shared.
Because the program could predict the structure of these complex, multi-domain proteins, the research team set out to see whether it could go further still.
“The physical interactions between different [protein] domains of the same sequence are essentially the same as the interactions gluing different proteins together,” Gao explained. “It quickly became clear that relatively simple modifications to AlphaFold 2 could allow it to predict the structural models of a protein complex.”
Davi Nakajima An, a fourth-year undergraduate in the School of Computer Science, was invited to join the team’s effort to explore new approaches.
Instead of feeding in the features of a single protein sequence, as AlphaFold 2 was designed to do, the researchers combined the input features of multiple protein sequences. Their new program, AF2Complex, also incorporates newly designed metrics for estimating the strength of the interactions among the proteins under investigation.
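The idea of combining several sequences into a single input can be sketched in a few lines of Python. This is only an illustration, not AF2Complex's actual code: the function name `join_chains`, the `gap` parameter, and its default value are assumptions made for the example. The sketch concatenates the chains into one sequence and offsets the residue numbering between them, so that a model built for single chains could still tell where one protein ends and the next begins.

```python
from typing import List, Tuple


def join_chains(sequences: List[str], gap: int = 200) -> Tuple[str, List[int], List[int]]:
    """Join several protein sequences into one single-chain-style input.

    Returns the concatenated sequence, a residue index for every position
    (with a jump of `gap` between chains), and a chain label per position.
    The gap size is an illustrative assumption, not a value from the article.
    """
    combined = []
    residue_index = []
    chain_label = []
    next_index = 0
    for chain_number, seq in enumerate(sequences):
        for residue in seq:
            combined.append(residue)
            residue_index.append(next_index)
            chain_label.append(chain_number)
            next_index += 1
        # Skip ahead in the numbering so consecutive chains do not look
        # like one continuous, covalently linked sequence.
        next_index += gap
    return "".join(combined), residue_index, chain_label


if __name__ == "__main__":
    # Two short, made-up sequences standing in for real proteins.
    seq, idx, chains = join_chains(["MKT", "GA"], gap=200)
    print(seq)
    print(idx)
    print(chains)
```

In this toy version only the sequence and numbering are combined; a real pipeline would also have to merge the other per-sequence input features the model expects.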
Charting New Territory
To put AF2Complex to the test, the researchers teamed up with Georgia Tech’s Partnership for an Advanced Computing Environment (PACE) and gave the model the task of predicting the shapes of protein complexes it had never seen before.
Compared with a more traditional method called docking, the modified algorithm correctly predicted the structure of more than twice as many protein complexes. Docking requires prior knowledge of the individual protein structures, which it fits together based on complementary shapes, whereas AF2Complex requires only protein sequences as input.
“Encouraged by these promising results, we extended this idea to an even bigger problem, which is to predict interactions among multiple arbitrarily chosen proteins, e.g., in a simple case, two arbitrary proteins,” shared Skolnick.
In addition to predicting the shape of protein complexes, AF2Complex was tasked with determining which of more than 500 pairs of proteins could form a complex at all. Using newly defined metrics, AF2Complex outperformed traditional docking approaches and AlphaFold 2 in identifying which of the arbitrary pairs were known from experiments to interact.
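This kind of screening amounts to scoring every candidate pair and keeping those above a cutoff, then checking the result against pairs known to interact. Below is a minimal sketch, assuming each pair already has a model-derived interface score between 0 and 1. The function names, the placeholder protein labels, the scores, and the cutoff are all invented for illustration and do not come from the study.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def predicted_interactions(scores: Dict[Pair, float], cutoff: float) -> List[Pair]:
    """Return the pairs whose score meets the cutoff, highest score first."""
    hits = [pair for pair, score in scores.items() if score >= cutoff]
    return sorted(hits, key=lambda pair: scores[pair], reverse=True)


def precision_recall(predicted: List[Pair], known: Set[Pair]) -> Tuple[float, float]:
    """Compare predicted pairs with experimentally known interacting pairs."""
    true_positives = sum(1 for pair in predicted if pair in known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical scores for placeholder proteins A to D.
    toy_scores = {
        ("A", "B"): 0.9,
        ("A", "C"): 0.2,
        ("B", "C"): 0.6,
        ("C", "D"): 0.7,
    }
    # Hypothetical set of pairs "known" to interact.
    toy_known = {("A", "B"), ("C", "D"), ("A", "D")}

    hits = predicted_interactions(toy_scores, cutoff=0.5)
    print(hits)
    print(precision_recall(hits, toy_known))
```

Raising the cutoff generally trades recall for precision, which is why comparisons between methods are usually made across a range of cutoffs and not at a single value.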
To test AF2Complex on the proteome scale, that is, across the full set of proteins an organism can produce, the researchers used the Summit supercomputer at the Oak Ridge Leadership Computing Facility, the world’s second-biggest supercomputing center.
“Thanks to this resource, we were able to apply AF2Complex to about 7,000 pairs of proteins from the bacterium E. coli,” Gao shared.
In that test, the team’s new model not only identified many pairs of proteins known to form complexes but also provided insights into interactions that had been suspected but never observed experimentally, Gao said.
Further investigation of these interactions revealed a possible molecular basis for protein complexes that are crucial for energy transport. These complexes are known to carry hemes, important metabolites that give blood its dark red hue.
Jerry M. Parks, a senior research and development staff scientist at Oak Ridge National Laboratory and a collaborator on the project, used AF2Complex’s predicted structural models to place hemes at their hypothesized reaction sites within the structure.
“These computational models now provide insights into the molecular mechanisms for how this biomolecular system works,” Gao said.
“Deep learning is changing the way one studies a biological system,” Skolnick added. “We envision methods like AF2Complex will become powerful tools for any biologist who would like to understand molecular mechanisms of a biosystem involving protein interactions.”