Optimize Your Cloud Storage Strategy for HPC
When it comes to HPC hybrid cloud management, data is the elephant in the room. The cost of storing, moving, and synchronizing data often determines where workloads should run. While compute costs tend to be predictable and linear, storage costs are more complex. Cloud providers charge for storage along multiple dimensions, including capacity, bandwidth, IOPS, and storage type, often with complicated, tiered rate structures.
While compute instances can be turned on and off quickly, this is not the case for data. When data is maintained in a public cloud, it is often migrated to lower-cost storage tiers when not in use, or retrieved to on-premises storage, resulting in data egress charges. Storage and network-related charges are often where cloud cost overruns occur because of a multitude of hidden costs.
Hidden costs abound
Cloud data storage is made challenging by the sheer number of variables. These include:
- Multiple storage types – Cloud providers offer multiple types of storage, including block storage, filesystems, various classes of object storage, and backup services.
- Data availability options – Depending on data availability requirements, data may be stored redundantly across multiple drives in a single data center, across multiple data centers in the same region or across regions.
- Storage temperature – For object storage, cloud providers typically offer multiple storage classes depending on the frequency of access (standard, vault, cold-vault, etc.).
- IOPS-based charges – For some services, cloud providers charge based on the number of IOPS.
- Per-transaction, data egress, and other costs – Cloud providers typically charge incrementally for data access requests (PUT, GET, LIST, etc.), outbound data movement (egress), snapshots, and data replication.
To avoid cost overruns, select the right type of storage for your application, monitor the various add-on costs above, and avoid persisting data in expensive cloud storage tiers. Some specific recommendations are provided below.
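To see how the variables above combine into a bill, here is a minimal sketch of a monthly cost estimate. The class name, function name, and every rate are hypothetical placeholders chosen for illustration; they are not actual provider prices, and real rate cards are usually tiered rather than flat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRates:
    """Hypothetical flat rates for one storage class (not real prices)."""
    storage_per_gb_month: float   # capacity charge
    retrieval_per_gb: float       # charge for reading data back out of the tier
    egress_per_gb: float          # charge for moving data out of the cloud
    per_1000_requests: float      # PUT/GET/LIST style request charge


def monthly_cost(rates, stored_gb, read_gb, egress_gb, requests):
    """Sum the main cost dimensions for one month of usage."""
    capacity = stored_gb * rates.storage_per_gb_month
    retrieval = read_gb * rates.retrieval_per_gb
    egress = egress_gb * rates.egress_per_gb
    request_fees = requests / 1000 * rates.per_1000_requests
    return capacity + retrieval + egress + request_fees


# Illustrative rates only: the cold tier is cheap to hold but costly to read.
STANDARD = TierRates(0.02, 0.0, 0.09, 0.005)
COLD = TierRates(0.004, 0.03, 0.09, 0.05)

# Compare a rarely read 10 TB dataset with one that is read in full each month.
for read_gb in (500, 10_000):
    standard = monthly_cost(STANDARD, 10_000, read_gb, 0, 0)
    cold = monthly_cost(COLD, 10_000, read_gb, 0, 0)
    print(f"read {read_gb} GB/month: standard={standard:.2f} cold={cold:.2f}")
```

With these assumed rates, the cold tier is cheaper when little of the dataset is read in a month and more expensive when the whole dataset is read, which is why access frequency matters as much as the per-gigabyte storage price.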
Five tips to reduce your cloud storage bill
- Pay attention to data gravity – For workloads that access large datasets, where a job runs is often dictated by the data’s location. Cloud providers bias users toward persisting data in the cloud by providing free data ingress and taxing egress. This tends to increase overall cost and drive consumption of additional cloud services. To manage this, use a workload scheduler that supports cloud bursting, data-aware scheduling, and data staging capabilities. These capabilities will help determine the most cost-efficient location to run workloads and automatically stage data as needed to minimize network transfer and storage-related costs and delays.
- Use object storage appropriately – If you need to persist large datasets in the cloud, look for file systems that support transparent cloud-tiering and store infrequently used data in lower-cost object stores. Make sure that you take data retrieval costs and performance specs into account when selecting long-term storage. Choosing a less expensive cold storage tier can sometimes lead to higher costs depending on how often data is accessed. Despite their low storage costs, cold storage options frequently have incremental data retrieval charges, and restoring data from cold storage takes longer, which can hurt productivity.
- Automate decisions at runtime – Many HPC workloads consist of multi-step workflows with data dependencies between steps. Look for workflow management tools that are hybrid cloud-friendly and that support conditional branch logic based on workflow variables(1). By applying conditional logic, you can make workflows “smart” and automate decisions at runtime. For example, depending on the size of an intermediate dataset and on resource availability, you can determine at runtime whether it is more cost-efficient to move data to the cloud for processing or wait until local resources become available. Workflow steps can optionally orchestrate storage services or proactively migrate data. For example, if you anticipate that a future workflow step will need cloud-based instances, you can provision a file system and start retrieving data from an object store or moving data in advance so that it is accessible when needed.
- Use monitoring and alerting facilities – According to InfoWorld, as much as 35% of cloud spending is wasted(2), and Gartner says that 80% of organizations will overshoot their cloud budgets in 2019(3) because they lack the necessary tools and internal spending controls. Use cloud-provider monitoring and alerting facilities, together with workload and cluster monitoring tools, to detect orphaned instances and idle storage services so that instances can be shut down and data migrated to lower-cost storage tiers before costs overrun.
- Pursue a hybrid-cloud strategy – While cloud storage is convenient, it can be more expensive than on-premises solutions over the long term. For HPC centers, it’s prudent to provide in-house storage solutions that are multi-protocol and hybrid-cloud aware. To ensure workload portability, local storage should present the same access methods as popular cloud services including file, object, and block storage. The local storage should be extensible to multiple clouds to avoid cloud provider lock-in.
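The runtime decision described in the third tip, whether to move an intermediate dataset to the cloud or wait for local resources, can be sketched as a small conditional check. This is an illustrative sketch only: the names, inputs, and decision rule are assumptions, not the logic of any particular scheduler or workflow tool, and it assumes the step runs equally fast in both locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEstimate:
    """Hypothetical estimates for one workflow step."""
    dataset_gb: float         # intermediate data that must be staged to the cloud
    result_gb: float          # output copied back on premises (billed as egress)
    runtime_hours: float      # estimated run time, assumed equal in both locations
    local_wait_hours: float   # expected queue wait for local resources


def transfer_hours(gigabytes, link_gbps):
    """Convert GB to gigabits, divide by link speed, and convert seconds to hours."""
    return gigabytes * 8 / link_gbps / 3600


def choose_location(step, link_gbps, cloud_rate_per_hour, egress_per_gb, budget):
    """Return "cloud" only if bursting is both faster and within budget."""
    cloud_cost = (step.runtime_hours * cloud_rate_per_hour
                  + step.result_gb * egress_per_gb)
    cloud_hours = (transfer_hours(step.dataset_gb, link_gbps)
                   + step.runtime_hours
                   + transfer_hours(step.result_gb, link_gbps))
    local_hours = step.local_wait_hours + step.runtime_hours
    if cloud_cost <= budget and cloud_hours < local_hours:
        return "cloud"
    return "local"
```

A workflow step could call `choose_location` with current queue and link estimates and branch on the result, for example staging data to the cloud only when it returns "cloud". A real implementation would also need to account for storage provisioned in the cloud and for data that is already resident there.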
Smarter solutions for hybrid cloud data management
Whether you decide to deploy storage on premises, in the cloud, or both, IBM offers a variety of solutions that can help reduce cloud spending. Organizations that prefer to run in the cloud can tap a comprehensive set of cloud services, including object storage, block storage, file storage, and IBM Cloud Backup, along with monitoring tools to manage costs.
For clients that want to deploy an on-premises or hybrid-cloud storage environment, IBM’s Elastic Storage Server (ESS) is a software-defined, scalable file and object store that is easy to deploy and manage. It supports a comprehensive set of data access methods (POSIX, NFS, SMB/CIFS, iSCSI, S3, Swift and OpenStack) and provides transparent cloud-tiering to seamlessly extend storage to multiple clouds including the IBM Cloud and Amazon Web Services (AWS).
The IBM Spectrum LSF family provides workload management and cloud-provisioning facilities that enable administrators to manage application workloads across multiple clouds. Capabilities such as data-aware scheduling, IBM Spectrum LSF Data Manager and IBM Spectrum LSF Process Manager enable HPC administrators to automate the cost-saving measures described above and use cloud compute and storage resources more efficiently to reduce overall spending.
(2) 35% of cloud spending is wasted – https://www.infoworld.com/article/3344477/why-35-percent-of-cloud-spending-is-wasted.html
(3) How to Identify Solutions for Managing Costs in Public Cloud IaaS – https://www.gartner.com/en/documents/3847666/how-to-identify-solutions-for-managing-costs-in-public-c0