Few headlines today name anything other than artificial intelligence (AI) as the winner of a computing challenge. Judging by many media accounts, the robots always win, even the bodiless software ‘bots. But other computing approaches do win challenges, at least some of the time, and now bioinformatics has beaten AI in one.
To be clear, AI is a bit of a misnomer in this context because most of those challenges are won by machine learning (ML) and deep learning (DL) models rather than AI as science fiction or serious scientists would define it. Nonetheless, models that can be trained, that “learn” in machine rather than human fashion, are impressive achievements. And, yes, they often win computing challenges because they tend to be very efficient at large-scale pattern detection and incredibly fast at it too.
But in this contest, researchers wanted to know which computing method, AI or bioinformatics, is more accurate at identifying the lab of origin of a synthetic genetic sequence. Todd Treangen, a computer scientist at Rice University’s Brown School of Engineering, and his team pitted sequence alignment and pan-genome-based methods in bioinformatics against deep learning models, and the race was on!
The bioinformatics tool known as PlasmidHawk won by correctly predicting the lab of origin 76% of the time, and by including the responsible lab among its ten most likely candidates 85% of the time. By comparison, a deep learning recurrent neural network (RNN) method identified the lab of origin correctly 70% of the time.
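The two PlasmidHawk figures are what is usually called top-1 and top-10 accuracy. As a rough sketch of how such numbers are computed, assuming each prediction is a list of lab names ranked from most to least likely, the calculation might look like the following. The function name and the sample data are made up for illustration and are not taken from the study.

```python
def top_k_accuracy(ranked_predictions, true_labs, k):
    """Fraction of sequences whose true lab appears in the first k guesses."""
    if not true_labs:
        raise ValueError("need at least one labeled sequence")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_labs)
    )
    return hits / len(true_labs)


# Made-up rankings for four sequences (illustration only, not real results).
rankings = [
    ["lab_a", "lab_b"],
    ["lab_b", "lab_c"],
    ["lab_c", "lab_a"],
    ["lab_a", "lab_c"],
]
truths = ["lab_a", "lab_c", "lab_a", "lab_b"]

top1 = top_k_accuracy(rankings, truths, k=1)
top2 = top_k_accuracy(rankings, truths, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is why a top-ten figure is always at least as high as the single-best-guess figure.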
"This is, in a sense, against the grain given that deep learning approaches have recently outperformed traditional approaches, such as BLAST," Treangen said in an article in Science Daily.
"We show that a sequence alignment-based approach can outperform a convolutional neural network (CNN) deep learning method for the specific task of lab-of-origin prediction," he said.
The bioinformatics tool won the contest using the same data set as one of the deep learning experiments it challenged. The research methodology and findings are further explained in both the Science Daily article and in an open-access paper the Rice University team published in Nature Communications. PlasmidHawk is available on GitLab.
The gist of the method is that "to predict the lab-of-origin, PlasmidHawk scores each lab based on matching regions between an unclassified sequence and the plasmid pan-genome, and then assigns the unknown sequence to a lab with the minimum score," according to lead author Qi Wang, a Rice graduate student, in the Science Daily article.
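To make that description concrete, here is a toy sketch of the general idea: match an unknown sequence against annotated fragments, score each lab, and pick the lab with the lowest score. It is not PlasmidHawk's code. The fragment table, the exact-substring matching (standing in for real sequence alignment), and the weighting that favors fragments shared by fewer labs are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy "pan-genome": each fragment is annotated with the labs
# whose deposited plasmids contain it. Names and sequences are invented.
PAN_GENOME = {
    "ATGCGTAC": {"lab_a", "lab_b", "lab_c"},
    "GGCTTAAC": {"lab_a"},
    "TTGACCGA": {"lab_b", "lab_c"},
}


def score_labs(query, pan_genome):
    """Return {lab: score} for labs with a match; lower scores are better."""
    scores = defaultdict(float)
    for fragment, labs in pan_genome.items():
        if fragment in query:  # exact substring match stands in for alignment
            for lab in labs:
                # Assumed weighting: fragments shared by fewer labs count more.
                scores[lab] -= 1.0 / len(labs)
    return dict(scores)


def predict_lab(query, pan_genome, top_k=1):
    """Rank labs by ascending score (ties broken by name); return top_k."""
    scores = score_labs(query, pan_genome)
    return sorted(scores, key=lambda lab: (scores[lab], lab))[:top_k]


# The query contains the shared fragment and the fragment unique to lab_a.
unknown = "CCATGCGTACTTGGCTTAACGG"
best_guess = predict_lab(unknown, PAN_GENOME)
ranked = predict_lab(unknown, PAN_GENOME, top_k=3)
```

Because every score is traced back to specific matched fragments, a sketch like this also hints at why an alignment-based approach can explain its predictions and absorb new sequences without retraining: adding a plasmid only means adding entries to the fragment table.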
Why it matters
The program may be useful in tracking potentially harmful engineered sequences used in bioweapons and biowarfare. Conceivably it could prove the innocence of a wrongly accused lab too. But it’s also helpful in more normal and benign commercial activities.
"The goal is either to help protect intellectual property rights of the contributors of the sequences or help trace the origin of a synthetic sequence if something bad does happen," Treangen said.
PlasmidHawk achieves these feats in part, according to the paper in Nature Communications, by “precisely singling out the signature sub-sequences that are responsible for the lab-of-origin detection.” In summary, PlasmidHawk “represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences.”
Further, the tool is more efficient and perhaps less complicated to use. The researchers said that PlasmidHawk needs “less pre-processing of data” than machine learning models or algorithms require, and “does not need retraining when adding new sequences to an existing project.” Another huge advantage of this tool over machine learning is its ability to provide “a detailed explanation for its lab-of-origin predictions in contrast to the previous deep learning approaches.”
But winning the race doesn’t knock machine learning and AI out of the game. Quite the contrary.
"The goal is to fill your computational toolbox with as many tools as possible," co-author Ryan Leo Elworth, a postdoctoral researcher at Rice, told Science Daily. "Ultimately, I believe the best results will combine machine learning, more traditional computational techniques and a deep understanding of the specific biological problem you are tackling."