A new algorithm can visualize the likely evolutionary pathway of genetic sequences responsible for the production of a specific protein. Photo by McCandlish lab, CSHL
April 15 (UPI) -- Scientists have developed a new algorithm that can predict how a protein could evolve to become highly effective or totally unproductive.
The machine learning model -- detailed this week in the journal Nature Communications -- works by analyzing how different combinations of genetic mutations trigger changes in the functionality of a protein.
In the lab, scientists studied the effects of mutations among the genes responsible for the production of a protein called streptococcal GB1. Because the GB1 protein is so complex, it can be influenced by a near-infinite number of mutation combinations.
Scientists built the algorithm with the idea that all mutations matter, because different combinations of mutations produce unique interactions -- what scientists call epistasis.
"Having two mutations that cancel each others' effects is one possible example of epistasis," study author David McCandlish, an assistant professor at Cold Spring Harbor Laboratory, told UPI in an email. "But there are many other possibilities, such as two mutations that amplify each others' effects, or more complex interactions among three or more mutations. The multitude of ways that mutations can interact is part of what makes this area so fascinating."
In other words, a single gene mutation is unlikely to produce a functional protein on its own. Evolutionary adaptations require a multitude of gene mutations.
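One common way to put a number on the interaction between two mutations is to compare the double mutant with what the two single mutants would predict if their effects simply added up. The sketch below is an illustration only, not the study's method: the function name and all fitness values are hypothetical, and fitness is assumed to be measured on a log scale so that independent effects add.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    f_wt: fitness of the unmutated sequence
    f_a, f_b: fitness of each single mutant
    f_ab: fitness of the double mutant
    A result of zero means the two mutations act independently.
    """
    return f_ab - f_a - f_b + f_wt


# Hypothetical log-scale fitness values, chosen only for illustration.
wild_type = 0.0

# Each mutation is harmful alone, but together they restore the original fitness.
cancelling = pairwise_epistasis(wild_type, -1.0, -1.0, 0.0)

# Each mutation helps a little alone, but together they help far more.
amplifying = pairwise_epistasis(wild_type, 0.5, 0.5, 2.0)

# The double mutant is exactly the sum of the two single effects.
independent = pairwise_epistasis(wild_type, 0.5, 0.25, 0.75)

print(cancelling, amplifying, independent)
```

In both the cancelling and the amplifying case the result is nonzero, which is the sense in which the combination cannot be predicted from the single mutations alone.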
"It is often observed that an adaptive mutation only has its beneficial effects in the presence of a specific 'permissive' or 'potentiating' mutation that precedes it," McCandlish said.
So how does a group of genes mutate to yield an effective protein, and what does it look like? The new model offers clues -- clues that can be visualized.
First, scientists were able to use what they already know about genes to simulate their potential behavior.
"Under some simplifying assumption, our first-principles understanding of how selection, mutation and genetic drift interact allows us to write down equations that give the probability of any specific evolutionary path," McCandlish said.
Unfortunately, studying the behavioral patterns of genes doesn't reveal whether the changes they undergo will prove advantageous.
"A key missing ingredient is that we don't typically know the fitnesses of the genotypes along the path, which is needed to model the effects of natural selection," McCandlish said. "The imputation method described in the paper lets us use experimental data to fill in these missing fitnesses, so that we can use our theoretical understanding to calculate these evolutionary probabilities for a specific protein or DNA sequence."
When the algorithm combines basic assumptions about gene behavior with experimental data, it can predict the optimal evolutionary pathways for different sequences of gene mutations. It can also predict how long it will take for one genetic sequence to evolve into another, more effective and efficient sequence.
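To give a sense of how fitness values can feed into the probability of an evolutionary path, here is a minimal sketch under stated assumptions. It is not the paper's code. It assumes a textbook weak-mutation model in which a population moves one mutation at a time, every single-site mutation is equally likely to arise, and each new mutant takes over with Kimura's classic fixation probability for a haploid population. The two-site landscape, the fitness numbers, the population size and all function names are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura-style chance that a new mutant with selection coefficient s
    takes over a haploid population of size n (an assumed model)."""
    if abs(s) < 1e-12:
        return 1.0 / n  # neutral limit
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-2.0 * n * s))


def neighbors(genotype):
    """Yield every genotype one mutation away, for binary strings like '01'."""
    for i, c in enumerate(genotype):
        yield genotype[:i] + ("1" if c == "0" else "0") + genotype[i + 1:]


def step_probability(src, dst, fitness, n):
    """Chance that the next substitution from src leads to dst."""
    rates = {
        g: fixation_probability(fitness[g] - fitness[src], n)
        for g in neighbors(src)
    }
    return rates[dst] / sum(rates.values())


def path_probability(path, fitness, n):
    """Product of the step probabilities along a list of genotypes."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= step_probability(src, dst, fitness, n)
    return p


# Hypothetical log-fitness values standing in for measured or imputed data.
fitness = {"00": 0.0, "10": 0.01, "01": -0.01, "11": 0.03}
population_size = 1000

p_via_10 = path_probability(["00", "10", "11"], fitness, population_size)
p_via_01 = path_probability(["00", "01", "11"], fitness, population_size)
print(f"via 10: {p_via_10:.3g}, via 01: {p_via_01:.3g}")
```

In this toy landscape the route that passes through the beneficial intermediate is far more likely than the route that first requires a harmful mutation, which illustrates why the fitness of every genotype along a path has to be known or estimated before the calculation can be done.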
In the future, scientists could use the algorithm to predict the evolutionary pathway of a potentially dangerous virus. The algorithm could also identify the best way to intercept an evolving virus before it mutates into a more dangerous form.
"Because at present the SARS-CoV-2 genome does not appear to be undergoing positive natural selection for increased functionality in human hosts, this work likely isn't particularly relevant to the current phase of the COVID-19 pandemic," McCandlish said.
"However, the method has many applications to other rapidly evolving pathogens where a key question is how to predict the evolution of drug resistance," he added. "The method may also be particularly useful in protein engineering, where we are trying to find protein sequences that do new functions such as catalyzing new chemical reactions."