Google Cloud is rapidly expanding its portfolio of solutions for high-performance computing (HPC), highlighted by a flurry of announcements at its latest event, Google Cloud Next ’24. Conference attendees were introduced to new products and to updates across the company’s highly customizable portfolio designed specifically for the HPC community.
A3 VMs Shine in MLPerf Inference v4.0
Google Cloud’s A3 VMs showed impressive results in the latest MLPerf Inference v4.0 benchmark testing. A3 VMs are designed for training sophisticated AI models like LLMs and combine NVIDIA H100 GPUs with Google’s leading networking technology.
Using A3 VMs, Google submitted 20 results across seven models for MLPerf, including Stable Diffusion XL and Llama 2 (70B). All results were within 0-5% of the peak performance demonstrated by NVIDIA’s submissions.
The A3 VM family joins other third-generation offerings such as Google’s HPC-optimized H3 VM, which is well suited to HPC applications like climate modeling, scientific computing, engineering simulation, and more.
Parallelstore
Parallelstore, Google Cloud’s managed parallel file system service, optimizes resources for data-intensive AI/ML workloads by eliminating redundant data storage, which reduces both costs and idle GPU time.
This service is currently in private preview, and Google Cloud reports read throughput of up to 130 GiB/s and I/O latencies of less than 0.3 ms per random read. To learn more about the private preview, reach out to your Google Cloud team.
Cloud HPC Toolkit Additions: Blueprints for ML and CAE
The Cloud HPC Toolkit is open-source software from Google Cloud that makes it easy to deploy HPC environments. It is designed to be highly customizable and extensible, and is intended to address the HPC deployment needs of a broad range of use cases.
The Cloud HPC Toolkit gains two new blueprints. The first is a blueprint for ML workloads (including LLM training) that lets users spin up an HPC system running on A3 VMs with NVIDIA H100 Tensor Core GPUs. Such systems require careful management of infrastructure and network configuration, which the ML blueprint handles through components including the open-source scheduler Slurm, a fully managed Filestore instance, pre-configured user environments, and more.
The second is a blueprint for computer-aided engineering (CAE). CAE workloads are compute-intensive applications including structural analysis, fluid dynamics, thermal analysis, and electromagnetic analysis. The CAE Reference Architecture blueprint uses the H3 and C3 VM families to deliver robust performance for major CAE software such as Ansys Fluent and Siemens Simcenter STAR-CCM+, and is designed to handle memory-intensive workloads and complex resource management efficiently.
Customer Success Story: Stanford University
Stanford’s Doerr School of Sustainability is using Google Cloud’s HPC Toolkit to meet the growing demands of its researchers. The toolkit’s flexible deployment options allow Stanford to integrate cloud computing with on-premises resources and to provide a consistent, familiar user interface through Chrome Remote Desktop. Researchers can access interactive nodes remotely while keeping an experience akin to using on-premises clusters.
As a testament to the unmatched customization the HPC Toolkit offers, the school has developed its own modules for secure and efficient use of Vertex AI instances for code development.
Robert Clapp, a Stanford senior research engineer, explains how HPC Toolkit enables fast, secure, and consistent HPC deployment at scale: “With the Toolkit, we can stand up clusters with different partitions depending on our users’ needs, so that they can take advantage of the latest hardware like NVIDIA GPUs when needed and leverage Google Cloud’s workload-optimized VMs to reach price-performance targets. Dynamic cluster sizes, the ability to use spot VMs when appropriate in cluster partitions, and the ability to quickly get researchers up and running in environments they are used to have all been enhanced by the Toolkit.”
Visit Google Cloud at ISC 2024
The rapid pace of innovation makes this an exhilarating time for HPC customers. Hot on the heels of Google Cloud Next ’24, held just a few weeks ago, comes another major event: ISC High Performance 2024 in Hamburg, Germany. The conference and exhibition, running May 12-16, will highlight the latest advances in HPC, machine learning, data analytics, and quantum computing. The Google Cloud team will be there to connect with the HPC community and demonstrate its continually expanding HPC offerings. Visit Booth D19 to learn more.