Advances in precision medicine, genomics, and imaging; the widespread adoption of electronic health records; and the proliferation of medical Internet of Things (IoT) and mobile devices are resulting in an explosion of structured and unstructured healthcare-related data. Industry analysts expect that by 2020, the amount of medical data in the world will double every 73 days. A typical healthcare consumer in developed countries will generate 1,200 terabytes of data in a lifetime.
A substantial share of this healthcare data deluge serves to advance medical research in areas such as precision medicine, which drives the need for high performance computing (HPC) environments that can support big data demands. For example, sequencing an individual’s entire genome, a task growing ever more common, requires the same amount of data storage as 100 feature-length movies.
But data volume isn’t the only problem faced by medical research – disparate file types generated by different research tools and environments create silos that impede data access, drive down efficiency, drive up costs, and slow time to insight. To address the challenges posed by both the volume and variety of medical research data, world-class healthcare organizations are building data oceans.
Data oceans are built on software-defined infrastructure (SDI), which provides a foundation to manage and run rapidly evolving healthcare and life sciences applications in genomics, imaging, clinical, and other domains. SDI enhances the HPC platform with open analytics frameworks such as Hadoop and Spark and consolidates disparate data stores. Behind the scenes, the SDI architecture creates a data hub to manage the ocean of data, orchestrate the different applications, and provide intelligent workload and policy-driven resource management. Bringing all the data together into one coherent resource for analysis, and making it available to all users anywhere, anytime, is key to facilitating research and accelerating time to insight.
The benefits are substantial. The ability to automatically migrate medical data to the optimal storage tier can substantially reduce costs. Eliminating the need for separate processing platforms for different data types dramatically increases resource utilization. Massively parallel processing and enhanced application and data portability accelerate time to insight.
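As a rough illustration of what policy-driven tiering means in practice, the sketch below picks a storage tier from how long a file has been idle. The tier names, the age thresholds, and the `choose_tier` function are assumptions made for this example only; they are not part of any IBM product API, and a real deployment would express such rules in the storage system's own policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and idle-time limits, ordered hottest to coldest.
TIER_RULES = [
    ("flash", timedelta(days=7)),
    ("disk", timedelta(days=90)),
]
ARCHIVE_TIER = "tape"  # assumed fallback for anything older


@dataclass
class FileRecord:
    path: str
    last_access: datetime


def choose_tier(record, now):
    """Return the first tier whose idle-time limit the file still fits within."""
    idle = now - record.last_access
    for tier, max_idle in TIER_RULES:
        if idle <= max_idle:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    now = datetime(2019, 6, 1)
    files = [
        FileRecord("run_042/reads.fastq", now - timedelta(days=2)),
        FileRecord("run_017/aligned.bam", now - timedelta(days=45)),
        FileRecord("run_001/raw_images.tar", now - timedelta(days=400)),
    ]
    for f in files:
        print(f.path, "->", choose_tier(f, now))
```

Keeping the rules in a single ordered list means the cost trade-off can be tuned by changing thresholds rather than code, which is the general idea behind automated tier migration.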
But it’s not entirely serene sailing across HPC data oceans. They don’t solve every data processing problem. Medical research, like many other HPC environments, generates peaks and valleys of resource demand. The very efficiency and high utilization rates that data oceans are designed to produce can work against them when demand peaks beyond infrastructure capabilities. To accommodate these spikes in demand, traditional HPC environments often divide up jobs and stretch out scheduling – lengthening time to insight. But the very same SDI solutions used to create data oceans can address this challenge as well – by adopting hybrid cloud.
For example, with IBM Spectrum LSF, healthcare researchers can use advanced reporting functionality to identify the bottlenecks that slow jobs down. IBM Spectrum LSF can then move targeted jobs to the cloud. Does an HPC job require more memory? Run it on cloud servers with more memory. Does it need faster access to the underlying data? Provision a massively parallel IBM Spectrum Scale file system for high-speed parallel access. Whatever resources are needed, with IBM SDI solutions, healthcare researchers can provision the required system in the cloud for peak demand periods, dynamically and automatically, and only for as long as needed, resulting in faster insights for a fraction of the cost of building on-premises solutions.
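The kind of placement decision described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not IBM Spectrum LSF's actual interface: the `Job` fields, the `ON_PREM_MAX_MEM_GB` and `MAX_QUEUE_WAIT_MIN` limits, and the `placement` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Assumed site limits; real values would come from the scheduler's reports.
ON_PREM_MAX_MEM_GB = 256   # largest memory footprint an on-premises node can host
MAX_QUEUE_WAIT_MIN = 60    # longest queue wait considered acceptable


@dataclass
class Job:
    name: str
    mem_gb: int
    est_queue_wait_min: int


def placement(job):
    """Return 'cloud' if the job cannot fit or would wait too long on-premises."""
    if job.mem_gb > ON_PREM_MAX_MEM_GB:
        return "cloud"  # no local node has enough memory
    if job.est_queue_wait_min > MAX_QUEUE_WAIT_MIN:
        return "cloud"  # local cluster is saturated; burst instead of waiting
    return "on_prem"


if __name__ == "__main__":
    jobs = [
        Job("variant_calling", mem_gb=64, est_queue_wait_min=10),
        Job("genome_assembly", mem_gb=512, est_queue_wait_min=5),
        Job("image_batch", mem_gb=32, est_queue_wait_min=240),
    ]
    for job in jobs:
        print(job.name, "->", placement(job))
```

Only jobs that hit a bottleneck are sent to the cloud, so cloud resources are paid for during demand peaks rather than continuously.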
The question becomes: can every high performance data architecture provide the infrastructure support, flexibility, and agility needed to meet highly demanding and unpredictable healthcare HPC requirements? Here again, IBM offers solutions for cloud-scale data management and multi-cloud workload orchestration based on a reference architecture for high performance data and AI platforms (HPDA). IBM and L7 Informatics (L7) have together built a cloud-based HPC environment that enables scientists to process and analyze huge volumes of genomics data up to 96% faster. Built on the IBM Cloud platform, the L7 Genomic Cloud uses IBM Spectrum Scale and IBM Spectrum LSF to support rapid data processing and analysis.
Chris Mueller, Founder, L7 Informatics, explains: “IBM Spectrum Scale provides high-performance data storage that we can scale quickly and easily. Built-in tiering capabilities allow a lot of flexibility in how we move data around, enabling customers to seamlessly migrate data from lab instruments up to the cloud for analysis and long-term storage. IBM Spectrum LSF, meanwhile, offers everything we need for HPC workload management in a single package, from job scheduling tools to resource management capabilities. It gives us the tools to manage the L7 Genomic Cloud as a complete HPC environment rather than just as a virtual machine and associated storage layer, providing intelligent, policy-driven scheduling and improved visibility to increase throughput.”
Oceans and clouds. For millennia these natural systems have nourished and supported humanity. Perhaps it’s not as surprising as it might seem that digital versions of them are now helping to accelerate medical advancements that offer great human benefit.
Follow the links in this article to learn more about how you can build software-defined data oceans and agile hybrid clouds that help your organization lower costs while gaining precious insights faster.