Fresh Approaches to Extending Enterprise HPC to Public Clouds

By Sam Mitchell

October 15, 2014

Public cloud can be an easy choice for enterprises looking to extend high performance workloads, reduce infrastructure costs and increase flexibility. The cloud offers the chance to reduce the capital cost of owning and managing excess compute capacity and storage for all workloads. Enterprises can avoid the hidden costs of unused compute capacity by “cloud bursting” or shifting some peak demand toward the cloud-based HPC grid extensions. But how do you connect to existing grid resources and attest to security compliance?

For the security and network management needs of HPC users considering the cloud, the best solution is to connect to the existing grid with overlay networks. An overlay network creates a private, sealed network on top of any existing network. Running overlay networks on top of public cloud resources can add the flexibility, high availability and robust security that HPC grid operators need to cope with unforeseen capacity demands.
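
One practical detail behind an overlay network is address planning: the overlay's private range has to avoid the ranges already used on-premise and by the cloud provider. The sketch below is a minimal illustration using Python's standard `ipaddress` module. The function name and all address ranges are hypothetical and are not taken from any particular overlay product.

```python
import ipaddress


def pick_overlay_subnet(pool, in_use, new_prefix):
    """Return the first subnet of `pool` with prefix length `new_prefix`
    that does not overlap any network in `in_use`, or None if none fits."""
    for candidate in pool.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(net) for net in in_use):
            return candidate
    return None


# Hypothetical ranges, for illustration only.
on_prem_grid = ipaddress.ip_network("10.0.0.0/16")
cloud_subnet = ipaddress.ip_network("172.31.0.0/16")
overlay_pool = ipaddress.ip_network("10.0.0.0/15")

overlay = pick_overlay_subnet(overlay_pool, [on_prem_grid, cloud_subnet], 24)
print(overlay)  # 10.1.0.0/24
```

Every /24 inside 10.0.0.0/16 collides with the on-premise grid, so the first free candidate is 10.1.0.0/24.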

With the following tried and true best practices for high performance computing (HPC) in the public cloud, enterprises – even in regulated industries such as healthcare and financial services – can manage a secure cloud-based HPC environment and still benefit from cloud’s economies of scale. Once HPC nodes are set up and secured in the cloud, connecting existing HPC grids to new deployments can be complex. By using a manageable and compliant cloud network topology, enterprises can ease the transition into cloud-based HPC.

The path to HPC in the public cloud starts by selecting trustworthy cloud providers and creating secure cloud deployments. Historically, HPC environments have been expensive to own, manage and operate as entirely on-premise compute capacity. One reason is that organisations often require extra compute resources for irregular one-off jobs containing sensitive data such as intellectual property. Cloud infrastructure is an excellent way to expand quickly for unexpected one-off projects.

HPC grid extensions can ensure one-off projects do not break the bank and, with added encryption from an on-premise grid to a cloud-based grid extension, that the projects comply with regulatory requirements. Ultimately, HPC cloud best practices can help an enterprise save capital costs, prevent vendor lock-in, conserve IT resources and prevent organisations from having to change HPC vendors.

Organisations can use public cloud to extend existing HPC grids securely while reducing infrastructure costs, increasing flexibility, using existing grid resources, and attesting to security compliance. When possible, HPC grid managers can migrate additional workloads to a remote cloud provider to reduce overhead. HPC organisations can use public cloud to pursue smaller deployments and to keep up with the latest in technology. Plus, enterprise teams can use cloud deployments to develop parallel streams of work on identical, yet distinct, copies of their network topology to boost productivity in an agile and secure cloud environment.

Data is another primary consideration. Enterprise application requirements and data volumes continue to grow significantly. The EMC Digital Universe study estimates the digital universe is doubling in size every two years and will multiply 10-fold between 2013 and 2020 – from 4.4 trillion gigabytes to 44 trillion gigabytes. Thanks to the growing popularity of applied data analytics such as Big Data and the Internet of Things, more enterprise leadership teams will re-examine how their organisations collect, store and analyse data. As data volumes increase, leadership will add pressure to maximise value from the new floods of data.
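
The two growth figures quoted from the study can be checked against each other with a little arithmetic. The sketch below assumes steady exponential growth, which is a simplification, and the function names are illustrative only.

```python
import math


def growth_factor(years, doubling_years):
    """Total multiplication over `years` if size doubles every `doubling_years`."""
    return 2 ** (years / doubling_years)


def implied_doubling_years(factor, years):
    """Doubling time that yields a total growth of `factor` over `years`."""
    return years * math.log(2) / math.log(factor)


span = 2020 - 2013  # seven years

print(round(growth_factor(span, 2), 1))            # 11.3
print(round(implied_doubling_years(10, span), 2))  # 2.11
```

Doubling every two years would give roughly an 11-fold increase over seven years, so the 10-fold figure corresponds to a doubling time of slightly more than two years.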

Enterprises can reduce ongoing management costs by using public cloud to quickly spin up the one-off projects that require additional resources. Plus, enterprises can avoid the hidden costs of unused compute capacity by “cloud bursting” or shifting some peak demand toward the cloud-based HPC grid extensions.
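
To make the cost argument concrete, the sketch below compares sizing an on-premise grid for peak demand with sizing it for baseline demand and bursting the remainder to the cloud. The demand profile, capacities and function names are all hypothetical. A real plan would also need to account for data transfer, start-up time and pricing.

```python
import math


def idle_core_hours(hourly_demand, on_prem_cores):
    """Core-hours of on-premise capacity left unused across the profile."""
    return sum(max(0, on_prem_cores - demand) for demand in hourly_demand)


def burst_core_hours(hourly_demand, on_prem_cores):
    """Core-hours of demand that exceed on-premise capacity."""
    return sum(max(0, demand - on_prem_cores) for demand in hourly_demand)


def cloud_nodes_needed(demand_cores, on_prem_cores, cores_per_node):
    """Whole cloud nodes required to absorb demand above on-premise capacity."""
    overflow = max(0, demand_cores - on_prem_cores)
    return math.ceil(overflow / cores_per_node)


# Hypothetical demand in cores for four consecutive hours, with one peak.
demand = [400, 400, 1200, 400]

print(idle_core_hours(demand, 1200))      # 2400 (grid sized for the peak)
print(idle_core_hours(demand, 400))       # 0    (grid sized for the baseline)
print(burst_core_hours(demand, 400))      # 800  (sent to the cloud instead)
print(cloud_nodes_needed(1200, 400, 16))  # 50
```

In this made-up profile, owning enough hardware for the peak leaves 2,400 core-hours idle, while a baseline-sized grid pays for only 800 core-hours of cloud capacity during the peak.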

Grid-based HPC has been around since long before the cloud, managing data volumes and running efficiently on on-premise hardware. Yet a one-off project can cause major headaches for IT teams that must add the required infrastructure quickly. Both grid computing and “cloud HPC” aim to reduce the cost of computing while increasing reliability. Grid systems that once ran on mainframes now benefit from virtualisation and globally distributed data centres, in the form of “cloud HPC.”

Cloud HPC has evolved from mainframe-based grid computing to solve large problems quickly, adding on-demand provisioning, greater scale and global resources. Cloud HPC combines all the compute power of grid computing with public cloud’s added capacity, scale, and pay-per-use flexibility. And because the cloud business model allows users to quickly sign up and “cloud burst” capacity, HPC grid operators can instantly become cloud HPC grid operators and avoid the costs of unexpected one-off projects.

So what do the advances in public cloud mean for HPC? In recent years, IT spending in the worldwide HPC sector has grown: IDC estimated it at $20.3 billion in 2011, growing at a compound annual growth rate of 7.6%. New low-cost shared compute infrastructures have made the entire cloud into a potential HPC extension. With grid systems and public cloud IaaS, cloud HPC benefits from a network of powerful on-demand supercomputers without the staggering costs to buy, house and maintain the additional hardware.

Public cloud has a myriad of players with different options for pricing, location, performance, and even niche industry needs. For HPC users, the main concerns should be around cloud offerings with instant availability, large capacity, and excellent service-level performance. For example, the 2014 Gartner Magic Quadrant notes that Amazon’s cloud (Amazon Web Services, or AWS) has more than five times as much cloud IaaS compute capacity in use as the next 14 providers listed, combined. That kind of capacity is a plus for HPC workload needs.

The second step in the cloud selection decision is to avoid vendor lock-in. Most cloud providers make it simple to put data into the cloud, but might not have accessible ways to move and remove data from their IaaS offering. Enterprises should seek out vendors with transparent data management policies in addition to security solutions and network controls that keep security and access controls in the hands of the cloud user. Look for providers who allow connections to multiple networks and do not limit data transfers between regions or even in and out of the cloud.

More Data In Transit Means More Data at Risk

The benefits of cloud-based capacity expansion and one-off projects should be clear by now, but what about the original questions of security? Cloud providers do offer some of the best-in-class physical data centre protections and state-of-the-art equipment, but once data moves up the “cloud stack”, lines of ownership and control begin to blur. The current public cloud model shares security responsibility between cloud providers, vendors and end HPC customers. The provider manages and verifies Layer 0 – 3 security, while end users must secure the nodes and applications.

Gartner analyst Lydia Leong writes: “IT managers purchasing cloud IaaS should remain aware that many aspects of security operations remain their responsibility, not the cloud provider’s. Critically, the customer often retains security responsibility for everything above the hypervisor.” HPC users should be vigilant about security, both on-premise and in the cloud. The biggest difference between traditional data centres and cloud HPC is who owns the security responsibilities and who manages complexity.

Public cloud infrastructure definitely offers lower capital costs and on-demand flexibility, but can increase enterprise risk profiles and create new security concerns. The best practice for managing more complex cloud networks is to use end-to-end encryption for all HPC data as it travels to, across, and within public clouds and the public internet.

Data encryption helps mitigate many data centre security risks. Encryption protects data in two states: in transit across the public internet, and at rest on servers in a data centre. Public cloud providers do offer some encryption, but might not offer fully end-to-end encryption within cloud regions or between cloud zones.

Virtual private clouds (VPCs) do offer added security for HPC organisations using public cloud, but the enterprise still does not have full control over access and security across the public internet from data centre to cloud deployment.

HPC users can add security layers on top of public cloud providers’ security features by using a highly available overlay network and site-to-site IPsec encryption. These two tools keep HPC users’ workloads safe from attacks in both the underlying infrastructure and over the public internet, no matter who owns and accesses the network. This way, providers can offer on-demand infrastructure while HPC organisations benefit from low costs, data viability, and high availability.

Using a secure tunnel network architecture and secure network protocols, such as IPsec connections and Secure Sockets Layer (SSL) or its successor, Transport Layer Security (TLS), keeps data safe from endpoint to endpoint. Combined with encryption of stored data, this protects data both at rest in a data centre or cloud and in transit across the public internet. With end-to-end encryption, enterprise HPC data can be secure regardless of location. Encrypted connections between HPC grid nodes can help ensure cloud bursting for higher capacity is secure enough to pass compliance requirements such as PCI DSS and HIPAA in the US.
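
As one illustration of encryption in transit at the application layer, the sketch below builds a client-side TLS configuration with Python's standard `ssl` module. The helper name and the idea of a grid-specific CA bundle are assumptions made for illustration. Site-to-site IPsec tunnels are configured at the network layer and are not shown here.

```python
import ssl


def make_grid_client_context(ca_file=None):
    """Build a TLS client context that verifies the server certificate
    and hostname, and refuses protocol versions older than TLS 1.2.

    `ca_file` is a hypothetical path to a private CA bundle; when it is
    None, the system's default trusted CAs are used instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_grid_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```

A context like this can then wrap a socket between grid nodes with `ctx.wrap_socket(sock, server_hostname=...)`, so that traffic is encrypted and the remote node's identity is checked before any job data is sent.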

HPC Best Practices in Action: US Mutual Fund

A large mutual fund based in Boston uses the elasticity of public cloud to compute financial metrics that had never been possible in their internal infrastructure. The large public cloud they selected had the required elements of capacity, on-demand flexibility, and pay-as-you-go pricing. But they also wanted added security and the agility to prevent vendor lock-in.

What the public cloud offered, on its own, could not provide the security and control needed for this financial institution to extend their existing HPC grids on the same datacentre-based network. The mutual fund required VLAN isolation to ensure customer traffic was separate from all other data traveling to and within the cloud. They also wanted to ensure resilient file storage and data validity beyond the cloud providers’ offerings.

Rather than rebuild their HPC grid, the mutual fund wanted to rapidly connect to and scale up in the public cloud, and determined that the most efficient strategy was to use an overlay network. Their solution also included the full end-to-end, data-in-motion encryption required to meet financial industry data protection regulations. The overlay network allowed the new HPC workloads to act like the existing HPC grid network and pass internal and external security tests.

With an overlay network, the mutual fund securely burst into public cloud IaaS as a natural extension to their grid. The HPC grid extension also ensured all data-in-motion was encrypted from the on-premise grid to the cloud-based grid extension. The mutual fund could then incorporate their cloud HPC results into on-demand reports for their clients.

Public cloud saved expensive physical servers from sitting idle. Best practices prevented vendor lock-in and saved IT teams from re-architecting or changing HPC vendors. Now, the mutual fund company uses public cloud infrastructure to create a secure and automated natural HPC grid extension in which they flex up their processing power in seconds and back down when no longer needed.

About the Author

As Senior Cloud Solutions Architect, Sam Mitchell leads technical elements of cloud adoption. Mitchell runs demos, technical qualification, technical account management, proofs of concept, technical and competitive positioning, RFI/RFP responses and proposals. Before CohesiveFT, Mitchell was a Cloud Solution Architect at Platform Computing, which was acquired by IBM. He was also a Lead Architect at SITA, where he headed up OSS BSS Architecture, Design and Deployment activities on SITA’s cloud offerings.


HPCwire is a registered trademark of Tabor Communications, Inc. Use of this site is governed by our Terms of Use and Privacy Policy.

Reproduction in whole or in part in any form or medium without express written permission of Tabor Communications, Inc. is prohibited.
