Six Reasons to Add Object Storage to Your Genomics Lexicon

Tue, 12/15/2015 - 8:42am
Claire Giordano, Senior Director of Emerging Storage Markets, Quantum

In genomics and bioinformatics, advances in next-generation sequencing (NGS) are enabling organizations to generate more genomics data, more quickly. Over the last decade, the amount of genomics data produced worldwide has doubled every 7 months. The growth rates are so staggering that one expert even proposed replacing the term “astronomical” with “genomical.” But all this valuable sequencing data has brought the challenges of scale to the field of genomics: how can organizations give researchers access to all of this NGS data that is high-speed, distributed, long-term, and affordable? One storage technology being adopted to address the challenges of data growth is object storage. Based on erasure coding techniques that are at the foundation of digital communications, object storage can be used to efficiently store large data sets across geographies.
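To put the seven-month doubling time in perspective, here is a rough back-of-the-envelope sketch. It assumes the doubling time stays constant, which is a simplification, and the function name `growth_factor` is only illustrative.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # How much larger would an archive be if it tracked the worldwide trend?
    for years in (1, 5, 10):
        factor = growth_factor(12 * years)
        print(f"{years:>2} years: about {factor:,.1f}x the starting volume")
```

Under that assumption, data volume more than triples every year, which is why capacity planning based on today's footprint goes stale quickly.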

Six reasons to add object storage to your genomics lexicon

Object storage has been around for some time, but the technology is now gaining more and more traction in life sciences. This article outlines six reasons that organizations should add object storage to their genomics lexicon.

1. Massive scale for active archives

With today’s genomical data growth, storage infrastructure needs to scale. Organizations with hundreds of terabytes today will need petascale solutions in the not-too-distant future—and many research organizations already have petabytes of genomics data to manage. And while size really does matter, enabling massive scale is about more than capacity. Scale is about enabling performance even when there are hundreds of scientists analyzing and editing the data. Scale is about enabling sharing across large and distributed teams—especially as genomic research moves into clinical environments. Scale is about enabling organizations to grow their technology infrastructure in a flexible way, without being constrained to a particular operating system, or to a particular networking topology. Object storage is designed to scale in all these dimensions—making it perfect for today’s big genomics archives.

2. Durable access for the long-term

When bioinformaticians are asked about their retention policies for genomics data, it’s not unusual to hear answers like “in perpetuity” and “forever.” To meet the demands of these very-long-term retention policies, institutes need infrastructure that protects the integrity of the data, so that when a researcher needs to access DNA-sequence data in 2 years, or 5 years, or 10 years—the data is valid. With data integrity checks that run in the background, object storage is particularly good at detecting and repairing bit errors, especially as compared to traditional RAID disk storage solutions. And to ensure that data is accessible in 50 years, object storage is self-migrating, which means that it supports non-intrusive rolling upgrades that enable administrators to keep the storage technology current without the cost and disruption of expensive forklift upgrades.
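The background integrity check described above can be illustrated with a small sketch. This is a toy in-memory model, not any vendor's implementation: the class `ToyObjectStore` and its methods are hypothetical names, and SHA-256 is an assumed choice of checksum. It shows detection only; in a real system, repair would draw on redundant fragments.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used as the integrity fingerprint (an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store that keeps a checksum beside each object."""

    def __init__(self) -> None:
        self._objects = {}  # key -> (bytearray of stored bytes, checksum at write time)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = (bytearray(data), checksum(data))

    def flip_bit(self, key: str, byte_index: int = 0) -> None:
        """Simulate silent corruption (bit rot) by flipping one bit in place."""
        stored, _ = self._objects[key]
        stored[byte_index] ^= 0x01

    def scrub(self) -> list:
        """Background check: return keys whose bytes no longer match their checksum."""
        return [
            key
            for key, (stored, digest) in self._objects.items()
            if checksum(bytes(stored)) != digest
        ]


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("sample_001.fastq", b"@read1\nACGTACGT\n+\nIIIIIIII\n")
    print("before corruption:", store.scrub())
    store.flip_bit("sample_001.fastq")
    print("after corruption: ", store.scrub())
```

The point of the sketch is that a flipped bit is caught by the periodic scan rather than by a researcher reading the file years later.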

3. Access across geographies

Object storage employs erasure coding to disperse data across different disks, storage nodes, and racks, and even across data centers and geographies. If a particular disk drive or storage node fails, the data can still be read from the remaining disks and nodes. When data is dispersed across sites, the same resilience applies if an entire data center is destroyed in a natural disaster. Data dispersion yields two big benefits. The first is disaster preparedness. The second, and perhaps the more important one in today’s hyper-connected genomics community, is that spreading data across locations makes it easier to share between those same locations, without the overhead and bandwidth load of traditional data replication technologies.
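The idea behind erasure coding can be sketched with the simplest possible scheme: split an object into k data fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the survivors. This is an illustrative assumption, not the scheme any particular product uses. Production systems typically use stronger codes that tolerate several simultaneous losses, and the function names below are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments (zero-padded) plus one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: List[Optional[bytes]], original_length: int) -> bytes:
    """Reassemble the original bytes, dropping the parity fragment and padding."""
    data_fragments = recover(fragments)[:-1]
    return b"".join(data_fragments)[:original_length]


if __name__ == "__main__":
    original = b"ACGTACGTTTGACA"
    stored = encode(original, k=4)  # imagine each fragment on a different node
    stored[2] = None                # one node fails
    assert decode(stored, len(original)) == original
    print("object recovered after losing one fragment")
```

With k = 4, this toy scheme stores about 1.25 times the original size (ignoring padding), compared with 2 times for keeping a full second copy, which hints at why dispersal can cost less than replication.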

4. Flexible access, regardless of infrastructure

When preserving genomics data for decades, you need to plan for the evolution of the analysis software, the advances in networking connectivity, and the increased sophistication of lab infrastructure. That means you need storage solutions that interoperate well with traditional NFS and CIFS applications, as well as with applications that take advantage of cloud protocols and S3 access. Object storage solutions do this well, supporting the past, the present, and the future of genomics workflows.

5. High-speed access for bioinformatics workflows

Genomics, proteomics—and all the many variations of omics research today—are unlocking the code of life. But unraveling the mysteries of genomes takes complex computational analysis and high-speed access to data. When it comes to petascale archives, object storage offers lower-latency access as compared to big tape archives—this means that it takes less time to read a file that has been archived to object storage, start to finish. And when an object storage solution is integrated as a tier in a multi-tier storage solution, it’s easy to make sure the active data is on higher speed storage such as flash, further enabling the bioinformatics workflow.

6. Affordable access today and tomorrow

While genomics data has been growing at staggering rates, data storage budgets have not. Research teams today place immense value on their data and want to preserve it for future reuse and analysis, but they do not have an infinite budget. Because the erasure coding software in object storage is designed to be resilient to device failures, object storage lowers operating costs: it eliminates the need for most unscheduled maintenance on the hardware, and it can leverage low-power, low-cooling cloud drives.

Object storage is the future of genomics archives

The advances in high-throughput sequencing are changing our planet and our lives. But to analyze and study genomics data, researchers need new types of data management. Genomics research demands storage infrastructure that does more than store data—today’s genomics teams need infrastructure that provides access to information: shared access, high-speed access, distributed access, and affordable access. The good news for researchers whose aim is to transform genomics data into insights: object storage provides a unique combination of features that is an ideal match for today’s genomical data growth.
