No man is an island, and neither is his DNA an isolated record; enter population genomics, which illuminates your evolutionary path alongside others in your area. The same applies to any species, from microbes to the blue whale. So much additional information is a treasure beyond measure. It's also almost beyond the abilities of computing. Almost. Here's how it works.
Sequencing a genome and unfolding its secrets was once thought to be the stuff of science fiction. Instead, it became one of the most significant scientific achievements in human history. While work continues on learning more about individual genomes, interest in comparing the DNA sequences of entire populations is growing, too.
Population genomics is the study of how populations of a given species change genetically over time. In other words, it's the study of that species' evolution in a specific place across generations. That's easier said than done, however, which is why supercomputing is necessary to actually do all the math.
Here's an easy-to-follow video explaining the concept at a high level, for a quick idea of what all this means.
As you might imagine, scaling up a population genomics computing workload is a bit of a challenge. That’s a lot of information to stuff into a mathematical model for a computer to solve.
Computers do have limits, after all, which is why we humans come up with sophisticated terminology to describe the boundary: terms like "big data," which simply means data too large for the computers we typically use today to handle. The actual size of "big data" is thus always changing, since it's relative to the computing power available on the day of discussion.
By those imprecise terms, population genomics amounts to truly ginormous data, if you can imagine such a thing. Even if you were only processing the genomes of a population of 100 individuals, the resulting amount of information is staggering. The human genome contains about 3,000 megabases (roughly 3 billion base pairs), according to the National Human Genome Research Institute. Now, multiply that by the 100 people in our example population, and you can see how fast the data grows.
But population genomics can study much larger populations. For example, maybe the population under study is the residents of a large city. Tokyo, Japan has a population of 38,001,000; Delhi, India has 25,703,168; and Shanghai, China follows close behind with 23,740,778, according to the World Atlas. Computing such breathtakingly huge genetic data sets requires bioinformatics, more than a dash of machine learning (a subset of AI), and either supercomputing or high-performance computing (HPC), which isn't quite the same thing as supercomputing, but it's close.
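The scaling arithmetic above can be sketched in a few lines of Python. The per-genome size and city populations are the figures quoted in this article; counting one unit per base is a deliberate simplification, since real sequencing output (read redundancy, quality scores) typically multiplies raw storage several-fold.

```python
# Back-of-envelope estimate of raw genomic data at population scale.
# Figures quoted above: ~3,000 megabases per human genome (NHGRI)
# and city populations from the World Atlas.

GENOME_MEGABASES = 3_000  # ~3 billion base pairs per person


def genome_data_terabases(people: int) -> float:
    """Total bases across a population, in terabases (1 Tb = 1e12 bases)."""
    return people * GENOME_MEGABASES * 1e6 / 1e12


for label, people in [
    ("Small study (100 people)", 100),
    ("Tokyo", 38_001_000),
    ("Delhi", 25_703_168),
    ("Shanghai", 23_740_778),
]:
    print(f"{label}: {genome_data_terabases(people):,.1f} terabases")
```

Even the 100-person study comes out to hundreds of billions of bases, and a city-scale population pushes the total into the hundreds of thousands of terabases, which is the kind of scale that makes HPC unavoidable.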
There are population genomics projects (as well as other types of projects) of such a beastly size that even supercomputers fail to calculate them, or fail to finish calculating within one person's lifetime. The truly mind-boggling amounts of data behind those workloads may one day be computed by quantum computers.
As scientists become able to mine and analyze genomic data from the microscopic bacterium to the planet’s entire population, and perhaps one day even populations on other planets, amazing discoveries will unfold at a dizzying rate.
While one of the main goals today is to develop highly effective personalized medicine, there are likely to be other discoveries and innovations that we have yet to imagine. One thing is certain: the more we know, the better off we'll likely be.