Biomedical research institutes are constantly shifting their focus to face the newest challenges in healthcare. As a result, their compute and data requirements change often and can be difficult to predict.
This can be a headache for CTOs and IT departments charged with managing current resources and planning future deployments. Historically, they could rely on traditional data storage appliances to meet their data needs. With storage taken care of, they could focus their attention on the compute side of the equation, trusting that storage capacity and throughput would keep up with whatever the compute side required.
In recent years, however, these appliances have been unable to keep up with the growing complexity and variety of workflows, toolsets, and file sizes. At the same time, advances in CPU- and GPU-based computing have raised the bar for what HPC systems can accomplish when properly designed and deployed.
Now, instead of relying on standard storage appliances, these research computing leaders must find a new solution for improving throughput, removing I/O bottlenecks, and keeping total costs in line with their budget.
This trend is also prevalent in other industries like oil and gas, engineering and manufacturing, financial services, aerospace and defense, as well as those that use AI and ML. In fact, any organization that relies on data to drive insights, develop new markets, or operate core business practices will be unable to count on traditional storage systems as they increase their computing capacity.
Today’s workloads require a new storage solution, one that can cope with the proliferation of data and growing reliance on high-performance workflows. IDC predicts that by 2025 there will be 73.1 ZB of data generated by connected devices alone.[1] Storing, accessing, and managing all this data will be a key consideration, no matter how the data is used.
Engineers can design and build the best HPC compute infrastructure in the world, but it won’t meet their client’s needs without a top-notch, integrated storage component. They need storage technology that provides the I/O throughput necessary for the massive quantities of data they consume and generate.
Because of this, we advise our medical research and other HPC-focused clients to leverage software-defined storage (SDS) on a custom hardware solution built with white-box servers. This is the best combination for performance-minded teams who need a solution tailored to their workload without the unnecessary overhead or technology lock-in of supercomputers-in-a-box.
Silicon Mechanics recently deployed a system that shows how an SDS-based design like this can deliver strong ROI for healthcare and life sciences clients.
We have worked with Oklahoma Medical Research Foundation (OMRF) for years on various HPC systems, so we know the scope of their datasets and challenges. It was clear they needed a tiered storage system to support their compute workload.
Silicon Mechanics began supplying HPC and SDS systems to OMRF in 2016. For this latest HPC cluster, we chose a high-density compute design based on the AMD EPYC 7302 (16 cores, 3.0 GHz), with 256 GB of memory per node and 100 Gb interconnects. The cluster has more than 100 compute nodes.
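To put those specs in perspective, here is a quick back-of-envelope calculation in Python. It assumes exactly 100 nodes (the cluster has "more than 100," so the totals are lower bounds) and treats 100 Gb/s as raw line rate with no protocol overhead, so real-world throughput will be somewhat lower. The constant and function names are illustrative only.

```python
# Back-of-envelope figures for the compute cluster described above.
# Assumptions: exactly 100 nodes (a lower bound) and raw 100 Gb/s line rate
# with no protocol or filesystem overhead.

NODES = 100              # lower bound; the cluster has more than 100 nodes
MEM_PER_NODE_GB = 256    # memory per node, from the spec above
LINK_GBIT_S = 100        # per-node interconnect, in gigabits per second


def aggregate_memory_tb(nodes: int, mem_per_node_gb: int) -> float:
    """Total cluster memory in decimal terabytes."""
    return nodes * mem_per_node_gb / 1000


def link_gbyte_s(gbit_s: float) -> float:
    """Convert a link rate from gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbit_s / 8


def seconds_to_read(file_gb: float, gbit_s: float) -> float:
    """Ideal time for one node to pull a file of file_gb gigabytes at line rate."""
    return file_gb / link_gbyte_s(gbit_s)


if __name__ == "__main__":
    print(f"aggregate memory >= {aggregate_memory_tb(NODES, MEM_PER_NODE_GB):.1f} TB")
    print(f"per-node link     = {link_gbyte_s(LINK_GBIT_S):.1f} GB/s")
    print(f"100 GB file       ~ {seconds_to_read(100, LINK_GBIT_S):.0f} s at line rate")
```

At roughly 12.5 GB/s per node, a single node could in theory pull a 100 GB file in about 8 seconds, which is why the storage layer, not the network, easily becomes the bottleneck if it cannot serve data that fast.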
To get the most benefit from that best-in-class compute, we chose WekaFS, whose massively parallel architecture works well with the latest compute technologies and can tier seamlessly to object storage. The fast, cost-efficient Weka Limitless Data Platform lets OMRF take advantage of the latest storage technologies, such as NVMe, and networking technologies such as NVMe-oF, NVIDIA Mellanox InfiniBand, and 100Gb Ethernet.
The resulting hardware infrastructure supports mixed workloads: some work with a small number of large files (> 100 GB), some with thousands of tiny files (< 1 MB), and many fall somewhere in between.
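One rough way to see why this mix is demanding is to bucket a workload's files by size. The sketch below uses the thresholds from the text (under 1 MB and over 100 GB), with decimal units as an assumption; the two example workloads are invented for illustration and are not OMRF data.

```python
from collections import Counter

# Size boundaries taken from the text; decimal (SI) units are an assumption.
TINY_MAX_BYTES = 1 * 10**6       # "tiny" files: under 1 MB
LARGE_MIN_BYTES = 100 * 10**9    # "large" files: over 100 GB


def size_class(size_bytes: int) -> str:
    """Bucket a single file by size into tiny, large, or in-between."""
    if size_bytes < TINY_MAX_BYTES:
        return "tiny"
    if size_bytes > LARGE_MIN_BYTES:
        return "large"
    return "in-between"


def profile(sizes: list[int]) -> Counter:
    """Count how many files of a workload fall into each size class."""
    return Counter(size_class(s) for s in sizes)


if __name__ == "__main__":
    # Hypothetical workloads, invented purely for illustration.
    few_large = [150 * 10**9, 220 * 10**9]
    many_tiny = [4_096] * 5_000
    print(profile(few_large))   # Counter({'large': 2})
    print(profile(many_tiny))   # Counter({'tiny': 5000})
```

Workloads dominated by tiny files tend to stress metadata operations, while large-file workloads stress sustained bandwidth, so a shared file system has to handle both patterns well.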
This tiered system combines different types of storage media and file systems to maximize ROI. Of the nearly one petabyte of data in the overall system (950 TB across three tiers), 100 TB sits in the active tier on Weka, 250 TB in a scale-out NAS tier, and 600 TB in an object storage archive tier. This flexibility is key to maintaining scalability and performance for future workloads. It also eliminates unnecessary costs, such as an excess of NVMe drives in non-performance tiers, that an appliance or similar standardized configuration might carry.
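The arithmetic behind "nearly a petabyte" and the relative size of each tier can be checked in a few lines. The tier labels and the demotion thresholds in `place()` (30 and 180 days since last access) are purely hypothetical; the text does not describe how this deployment decides when data moves between tiers.

```python
from datetime import timedelta

# Capacity per tier in TB, as given for the deployment above.
TIERS_TB = {"active": 100, "nas": 250, "archive": 600}

# HYPOTHETICAL demotion thresholds for illustration only; the actual
# tiering policy used in this deployment is not described in the text.
ACTIVE_WINDOW = timedelta(days=30)
NAS_WINDOW = timedelta(days=180)


def total_tb(tiers: dict[str, int]) -> int:
    """Sum the capacity of all tiers."""
    return sum(tiers.values())


def tier_shares(tiers: dict[str, int]) -> dict[str, float]:
    """Fraction of total capacity held by each tier."""
    total = total_tb(tiers)
    return {name: cap / total for name, cap in tiers.items()}


def place(idle: timedelta) -> str:
    """Pick a tier for a dataset based on time since it was last accessed."""
    if idle <= ACTIVE_WINDOW:
        return "active"
    if idle <= NAS_WINDOW:
        return "nas"
    return "archive"


if __name__ == "__main__":
    print(f"total: {total_tb(TIERS_TB)} TB")
    for name, share in tier_shares(TIERS_TB).items():
        print(f"{name:>8}: {share:.1%}")
    print(place(timedelta(days=3)), place(timedelta(days=90)), place(timedelta(days=400)))
```

With these capacities, roughly 10.5% of the system is on the fast active tier and about 63.2% is in the archive, which is the kind of split that avoids paying NVMe prices for cold data.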
This new system met all the requirements for storage and throughput, providing up to a 10x speedup on standard research jobs such as Next-Generation Sequencing (NGS) analysis with the GATK pipeline; that's a 900% performance improvement, with those jobs finishing in as little as one-tenth of the time. Not only that, but the system can run more projects concurrently, which is a huge advantage when operating a shared system for different research teams.
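A speedup factor and a percentage improvement are easy to conflate: a job that runs 10x faster finishes in one-tenth of the time, which is a 900% increase in throughput and a 90% reduction in wall-clock time. The sketch below spells out those conversions; the job times are hypothetical and chosen only to produce a 10x speedup.

```python
def speedup(t_old: float, t_new: float) -> float:
    """How many times faster the new run is (old time / new time)."""
    return t_old / t_new


def percent_faster(s: float) -> float:
    """Percentage increase in throughput for a speedup factor s."""
    return (s - 1) * 100


def percent_time_saved(s: float) -> float:
    """Percentage of wall-clock time eliminated for a speedup factor s."""
    return 100 - 100 / s


if __name__ == "__main__":
    # Hypothetical job times (e.g. hours), chosen only to illustrate a 10x speedup.
    s = speedup(t_old=20.0, t_new=2.0)
    print(s, percent_faster(s), percent_time_saved(s))  # 10.0 900.0 90.0
```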
Software-defined storage solutions do not depend on any specific hardware manufacturer or platform. While that is a major advantage in modern data management, it also means the quality and efficiency of the underlying hardware infrastructure are a key variable in performance. You need a balanced, optimized infrastructure that is tested and ready to go when deployed.
You can maximize the benefits of SDS by working with a custom engineering firm like Silicon Mechanics that is capable of tailoring cost-effective open-source systems and providing the necessary professional services expertise.
With a partner dedicated to custom-tailoring, you can ‘rightsize’ your cluster, building a solution to the speed, size, and complexity your workloads actually require instead of simply making it as large and powerful as your budget allows. To move more quickly, consider a partner that uses modular design, which lets you modify and scale the compute and storage sides of a cluster independently.
To learn more about Silicon Mechanics’ system designs using SDS or if you are looking to design a new storage solution for your organization, connect with an expert at siliconmechanics.com/contact.