Data is critical to HPC, and ensuring your simulations have the data they need, when they need it, is essential. However, data can originate from many sources and may need to be consumed by diverse resources. Having the flexibility to add more and different types of storage to your cluster makes that data more readily available for your jobs.
Since launch, Amazon FSx has aimed to give you more options to launch, run, and scale feature-rich, cost-effective storage powered by your choice of filesystem. AWS ParallelCluster integrates with these newer filesystem choices, giving you the same flexibility so you can better architect your HPC storage.
AWS ParallelCluster version 3.2 introduces support for two new Amazon FSx filesystem types (Amazon FSx for NetApp ONTAP and Amazon FSx for OpenZFS). It also lifts the limit on the number of Amazon FSx and Amazon EFS filesystem mounts you can have on your cluster.
By increasing the options for filesystem access, your HPC workloads on AWS will have more pathways to get access to the data they need without you having to do the hard work. In today’s post, we’ll explain this in detail.
ParallelCluster already has support for Amazon Elastic File System (EFS), Amazon Elastic Block Store (EBS) and Amazon FSx for Lustre. In this release we added support for the FSx for NetApp ONTAP and FSx for OpenZFS filesystems.
Different filesystem types have specific characteristics that make them better suited to particular data types and workflows. For example, Amazon FSx for OpenZFS is a simple and powerful shared file storage service based on OpenZFS that delivers ultra-high speed at low cost. You've probably been using OpenZFS for its efficiency and performance features, like copy-on-write (which enables instant snapshots), integrated data resiliency, and its adaptive replacement cache, all built into the filesystem. You now have the choice to use the filesystem that is most appropriate for your needs, without worrying about incompatibility in ParallelCluster.
Prior to this release, ParallelCluster could only mount one filesystem of each type (e.g. one EFS mount and one FSx for Lustre mount). Because of these limited attach points, you had to consolidate your data storage and do more planning of your overall HPC storage configuration.
With ParallelCluster 3.2, you can now mount up to 20 Amazon FSx filesystems and up to 20 Amazon EFS filesystems. These mounts are for existing filesystems, where your data already exists. They are not managed by ParallelCluster, so there is no data movement required. This also means they persist when you delete your cluster, giving you more control in decoupling the cluster infrastructure from the data. The User Guides for each filesystem type document best practices for creation and management (FSx for Lustre, FSx for NetApp ONTAP and FSx for OpenZFS).
Together these two features add a significant level of flexibility for storage solutions within ParallelCluster.
Using these new filesystems
The cluster configuration YAML syntax for multi-filesystem mounts hasn’t changed with this new release – you can simply specify more entries in the SharedStorage section of the configuration file. ParallelCluster will create the cluster with the specified storage mounted and ready to use. Here is an example of the storage configuration with the newly added entries for FSx for NetApp ONTAP and FSx for OpenZFS…
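As a minimal sketch of what such a SharedStorage section might look like, assuming the filesystems already exist: the mount directories, the Name values, and every ID below are hypothetical placeholders, and you would substitute the IDs of your own volumes and filesystems.

```yaml
SharedStorage:
  # Existing FSx for NetApp ONTAP volume (placeholder volume ID)
  - MountDir: /shared/ontap
    Name: ontap-data
    StorageType: FsxOntap
    FsxOntapSettings:
      VolumeId: fsvol-00000000000000001
  # Existing FSx for OpenZFS volume (placeholder volume ID)
  - MountDir: /shared/openzfs
    Name: openzfs-data
    StorageType: FsxOpenZfs
    FsxOpenZfsSettings:
      VolumeId: fsvol-00000000000000002
  # Existing FSx for Lustre filesystem (placeholder filesystem ID)
  - MountDir: /shared/lustre
    Name: lustre-scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-00000000000000003
  # Two existing EFS filesystems mounted side by side (placeholder IDs)
  - MountDir: /shared/efs-home
    Name: efs-home
    StorageType: Efs
    EfsSettings:
      FileSystemId: fs-00000000000000004
  - MountDir: /shared/efs-projects
    Name: efs-projects
    StorageType: Efs
    EfsSettings:
      FileSystemId: fs-00000000000000005
```

Each entry references an existing resource by ID rather than asking ParallelCluster to create one, which is consistent with the filesystems persisting after the cluster is deleted. Mounting two EFS filesystems at once is the kind of layout the earlier one-per-type limit would not have allowed.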