Bioinformatics, aka computational biology, lets us learn things that could not have been known or understood before. In this case, the technology made it possible for an international team of researchers to trace not only the origin but also the ancestral tree of the newly discovered coronavirus. It was a massive computational quest with many challenges along the way.
"Coronaviruses have genetic material that is highly recombinant, meaning different regions of the virus's genome can be derived from multiple sources," said author Maciej Boni, PhD, associate professor of biology at Pennsylvania State University, in a report in myScience.
"This has made it difficult to reconstruct SARS-CoV-2's origins. You have to identify all the regions that have been recombining and trace their histories. To do that, we put together a diverse team with expertise in recombination, phylogenetic dating, virus sampling, and molecular and viral evolution."
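To give a rough sense of what "identifying the regions that have been recombining" can mean in practice, here is a toy sketch in Python. It slides a window along an aligned query sequence and reports which of two reference sequences each window most resembles; a switch in the closest reference hints at a possible recombination breakpoint. This is only an illustration under simplifying assumptions, not the method the research team used. The function names and the sequences are invented for the example, and real analyses rely on dedicated recombination-detection and phylogenetic dating software.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length aligned strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_reference_by_window(query, references, window=10, step=10):
    """For each window, report which reference is most similar to the query.

    Returns a list of (start, end, best_reference_name, identity) tuples.
    Ties go to whichever reference appears first in the dictionary.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: window_identity(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, end, best, scores[best]))
    return results


# Toy, made-up aligned sequences (hypothetical, not real viral data).
references = {
    "ref_A": "AAAAAAAAAAAAAAAAAAAA",
    "ref_B": "CCCCCCCCCCCCCCCCCCCC",
}
# First half resembles ref_A, second half resembles ref_B.
query = "AAAAAAAAAACCCCCCCCCC"

segments = closest_reference_by_window(query, references)
for start, end, name, identity in segments:
    print(start, end, name, identity)
```

In this toy case the closest reference changes from ref_A to ref_B at position 10, which is the kind of signal that would prompt a closer look at that region's separate evolutionary history.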
For a better look at just a few of the difficulties and the effort involved in tracing viruses back through time, watch this video.
But tracing the origins and lineage of viruses is not just a matter of satisfying researchers' curiosity. This information has serious implications for preventing future pandemics caused by viruses from the same lineage.
One thing these researchers learned was that the receptor-binding domain (RBD) on the spike protein is one of the oldest traits in the coronavirus family lineage. The RBD is how the virus finds and binds to receptors in human cells. In other words, it is how this family of viruses infects humans.
"This means that other viruses that are capable of infecting humans are circulating in horseshoe bats in China," said David L. Robertson, professor of computational virology, MRC-University of Glasgow Centre for Virus Research.
They also learned that early research on the new virus was wrong in assuming that these viruses need to jump from bats to an intermediary species, where they would evolve further, before infecting humans. Pangolins were initially pegged as the intermediary species in the spread of the novel coronavirus. But as it turns out, that is not quite the case.
"While it is possible that pangolins may have acted as an intermediate host facilitating transmission of SARS-CoV-2 to humans, no evidence exists to suggest that pangolin infection is a requirement for bat viruses to cross into humans. Instead, our research suggests that SARS-CoV-2 likely evolved the ability to replicate in the upper respiratory tract of both humans and pangolins," said Robertson.
The primary implication of this research is that humans need to monitor wild bats more closely to find potential new viruses from this same lineage before they can cause either an epidemic or a worldwide pandemic.
"The key to successful surveillance," said Robertson, "is knowing which viruses to look for and prioritizing those that can readily infect humans. We should have been better prepared for a second SARS virus."
But now, with the aid of technologies like bioinformatics and machine learning, we know what to do next time. Knowledge, after all, is power. And we now have the power to save many human lives in the future.