Computing Needs for Genome Data Are Getting Larger
The journal PLoS Biology published a paper called “Big Data: Astronomical or Genomical?” on July 7.
The paper, authored by a group of biologists and computer scientists from the University of Illinois at Urbana-Champaign and Cold Spring Harbor Laboratory, argued that better computing resources need to be in place to handle the influx of genomic data expected over the next 10 years.
An estimated 100 million to 2 billion human genomes may be sequenced by 2025, and the authors note that the number could grow even larger. Countries such as Saudi Arabia and England plan to sequence 100,000 of their citizens.
In addition, the continuing drop in the price of sequencing, coupled with technological advances that can detect more genetic variations, could yield even more information about a person’s DNA.
The astronomy field, along with tech companies Twitter and YouTube, was used as a benchmark to show how important it is to have a strong computing system for this cascade of DNA data.
Nature’s Erika Check Hayden writes that “the data storage demands” for this genomic information “could run to as much as 2 to 40 exabytes (1 exabyte is 10^18 bytes) because the number of data that must be stored for a single genome are 30 times larger than the size of the genome itself, to make up for errors incurred during sequencing and preliminary analysis.”
That storage requirement would surpass those of YouTube, Twitter, and the world’s largest astronomy project, the Square Kilometre Array.
According to Hayden, YouTube is projected to need 1 to 2 exabytes for video storage by 2025, whereas Twitter may need 1 to 17 petabytes per year. The Square Kilometre Array may need 1 exabyte per year.
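To put these figures side by side, the short Python sketch below converts each quoted estimate to bytes and compares the upper end of the genomics range with the upper end of each of the others. It is only a rough illustration: the dictionary and the function name `ratio_to_genomics` are made up for this example, decimal units are assumed (1 petabyte = 10^15 bytes, 1 exabyte = 10^18 bytes), and the quoted figures do not all cover the same time frame, so the ratios are indicative at best.

```python
# Decimal storage units in bytes (1 exabyte = 10**18 bytes, as stated in the article).
PETABYTE = 10**15
EXABYTE = 10**18

# (low, high) storage estimates quoted in the article, converted to bytes.
# Note: the time frames differ (some are per year), so this is a rough comparison.
estimates = {
    "Genomics": (2 * EXABYTE, 40 * EXABYTE),
    "YouTube": (1 * EXABYTE, 2 * EXABYTE),
    "Twitter": (1 * PETABYTE, 17 * PETABYTE),
    "Square Kilometre Array": (1 * EXABYTE, 1 * EXABYTE),
}


def ratio_to_genomics(name):
    """Return the high genomics estimate divided by the high estimate for `name`."""
    genomics_high = estimates["Genomics"][1]
    other_high = estimates[name][1]
    return genomics_high / other_high


if __name__ == "__main__":
    for name in estimates:
        if name != "Genomics":
            print(f"Genomics (high) is about {ratio_to_genomics(name):,.0f}x {name} (high)")
```

On these numbers, the upper genomics estimate is tens of times larger than the YouTube and Square Kilometre Array figures and thousands of times larger than the Twitter figure.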
Other concerns raised in the paper include the need for strong security and privacy protocols when sharing this information through the cloud, along with enough processing power to sift through the data quickly and accurately.
Hayden writes, however, that some computer experts felt the comparison with other “big data” fields “is not convincing and a little glib,” but agreed that the computing needs of the genomics field would be enormous.