Fresh Approaches to Extending Enterprise HPC to Public Clouds

By Sam Mitchell

October 15, 2014

Public cloud can be an easy choice for enterprises looking to extend high performance workloads, reduce infrastructure costs and increase flexibility. The cloud offers the chance to reduce the capital cost of owning and managing excess compute capacity and storage for all workloads. Enterprises can avoid the hidden costs of unused compute capacity by “cloud bursting” or shifting some peak demand toward the cloud-based HPC grid extensions. But how do you connect to existing grid resources and attest to security compliance?

For the security and network management needs of HPC users considering the cloud, the best solution is to connect to the existing grid with overlay networks. An overlay network creates a private, sealed network on top of any existing network. Using overlay networks on top of public cloud resources can add the flexibility, high availability and robust security that HPC grid operators need to cope with unforeseen capacity demands.
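
One practical consequence of layering a private network over someone else's infrastructure is that the address plan has to stay consistent across both sites. The sketch below is only an illustration of that idea, using Python's standard `ipaddress` module. The address ranges, the segment names and the `check_overlay_plan` helper are all hypothetical and are not taken from any particular overlay product.

```python
import ipaddress

# Hypothetical address ranges, chosen only for illustration.
OVERLAY = ipaddress.ip_network("10.0.0.0/8")
ON_PREM_GRID = ipaddress.ip_network("10.10.0.0/16")
CLOUD_EXTENSION = ipaddress.ip_network("10.20.0.0/16")


def check_overlay_plan(overlay, segments):
    """Return a list of problems found in an overlay address plan.

    A segment is flagged if it falls outside the overlay range, and
    each pair of segments is flagged if their ranges overlap.
    """
    problems = []
    for name, net in segments.items():
        if not net.subnet_of(overlay):
            problems.append(f"{name} ({net}) is outside the overlay {overlay}")
    names = list(segments)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if segments[first].overlaps(segments[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


plan = {"on_prem_grid": ON_PREM_GRID, "cloud_extension": CLOUD_EXTENSION}
print(check_overlay_plan(OVERLAY, plan) or "address plan is consistent")
```

A check like this says nothing about encryption or routing. It only catches the addressing mistakes that would stop cloud nodes from looking like part of the existing grid.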

With the following tried and true best practices for high performance computing (HPC) in the public cloud, enterprises – even in regulated industries such as healthcare and financial services – can manage a secure cloud-based HPC environment and still benefit from cloud’s economies of scale. Once HPC nodes are set up and secured in cloud, connecting between existing HPC grids and new deployments can be complex. By using a manageable and compliant cloud network topology, enterprises can ease the transition into cloud-based HPC.

The path to HPC in the public cloud starts by selecting trustworthy cloud providers and creating secure cloud deployments. Historically, HPC environments have been expensive to own, manage and operate as entirely on-premise compute capacity. One reason this happens is that organisations often require extra compute resource for irregular one-off jobs containing sensitive data such as intellectual property. Cloud infrastructure is an excellent way to expand quickly for unexpected one-off projects.

HPC grid extensions can ensure one-off projects do not break the bank and, with added encryption from an on-premise grid to a cloud-based grid extension, that the projects comply with regulatory requirements. Ultimately, HPC cloud best practices can help an enterprise save capital costs, prevent vendor lock-in, conserve IT resources and prevent organisations from having to change HPC vendors.

Organisations can use public cloud to extend existing HPC grids securely while reducing infrastructure costs, increasing flexibility, using existing grid resources, and attesting to security compliance. When possible, HPC grid managers can migrate additional workloads to a remote cloud provider in order to save overhead. HPC organisations can use public cloud to pursue smaller deployments and to keep up with the latest technology. Plus, enterprise teams can use cloud deployments to develop parallel streams of work on identical, yet distinct, copies of their network topology to boost productivity in an agile and secure cloud environment.

Data is another primary consideration. Currently, enterprise application requirements and data volumes continue to grow significantly. The EMC Digital Universe study estimates the digital universe is doubling in size every two years and will multiply 10-fold between 2013 and 2020 – from 4.4 trillion gigabytes to 44 trillion gigabytes. Thanks to the growing popularity of applied data analytics such as Big Data and the Internet of Things, more enterprise leadership teams will re-examine how their organisations collect, store and analyse data. As data volumes increase, leadership will add pressure to maximize value from the new floods of data.
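
The two growth figures quoted from the study can be checked against each other with a few lines of arithmetic. The sketch below assumes smooth exponential growth between 2013 and 2020, which is a simplification made here and not a claim from the study.

```python
import math

start_zb = 4.4   # trillion gigabytes in 2013, as quoted above
end_zb = 44.0    # trillion gigabytes in 2020, as quoted above
years = 2020 - 2013

growth_factor = end_zb / start_zb
# Constant yearly rate that would turn the 2013 figure into the 2020 figure.
annual_rate = growth_factor ** (1 / years) - 1
# Time needed to double at that constant rate.
doubling_time = years * math.log(2) / math.log(growth_factor)

print(f"implied annual growth: {annual_rate:.1%}")
print(f"implied doubling time: {doubling_time:.2f} years")
```

Under that assumption, a 10-fold rise over seven years works out to a doubling time of a little over two years, so the two statements are roughly consistent.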

Enterprises can reduce ongoing management costs by using public cloud to quickly spin up the one-off projects that require additional resources. Plus, enterprises can avoid the hidden costs of unused compute capacity by “cloud bursting” or shifting some peak demand toward the cloud-based HPC grid extensions.
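
The trade-off behind cloud bursting can be made concrete with a toy calculation. The sketch below is a minimal illustration, not a scheduler. The demand figures, the grid size and the `split_demand` helper are hypothetical.

```python
def split_demand(demand_cores, on_prem_cores):
    """Split one interval's demand into (local, burst) core counts."""
    if demand_cores < 0 or on_prem_cores < 0:
        raise ValueError("core counts must be non-negative")
    local = min(demand_cores, on_prem_cores)
    return local, demand_cores - local


# Hypothetical hourly demand, in cores, with one short-lived peak.
hourly_demand = [800, 950, 1400, 2100, 1000]
ON_PREM_CORES = 1000  # hypothetical size of the existing grid

burst_core_hours = sum(split_demand(d, ON_PREM_CORES)[1] for d in hourly_demand)
# Core-hours that would sit idle if the grid were instead sized for the peak.
idle_if_sized_for_peak = sum(max(hourly_demand) - d for d in hourly_demand)

print("core-hours burst to cloud:", burst_core_hours)
print("idle core-hours if sized for peak:", idle_if_sized_for_peak)
```

Whether bursting is cheaper in practice depends on prices, data transfer and licensing, none of which this sketch models.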

Grid HPC computing was around long before the cloud, managing data volumes and running efficiently on on-premise hardware. Yet a one-off project can cause major headaches for IT teams that must add the required infrastructure quickly. Both grid computing and “cloud HPC” aim to reduce the cost of computing while increasing reliability. Grid systems on mainframes benefit when virtualization and globally distributed data centers are added, a combination known as “cloud HPC.”

Cloud HPC has evolved from mainframe-based grid computing to solve large problems quickly, with on-demand provisioning, even more scale and global resources. Cloud HPC combines all the compute power of grid computing with public cloud’s added capacity, scale, and pay-per-use flexibility. And because the cloud business model allows users to quickly sign up and “cloud burst” capacity, HPC grid operators can instantly become cloud HPC grid operators and avoid the costs of unexpected one-off projects.

So what do the advances in public cloud mean for HPC? In recent years, the worldwide HPC sector has actually grown its IT spending, which IDC estimated at $20.3 billion in 2011 and growing at a compound annual growth rate of 7.6%. New low-cost shared compute infrastructures have made the entire cloud into a potential HPC extension. With grid systems and public cloud IaaS, cloud HPC benefits from a network of powerful on-demand supercomputers without the staggering costs to buy, house and maintain the additional hardware.
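
For readers who want to see what a 7.6% compound annual growth rate implies, the sketch below applies the standard compounding formula to the 2011 estimate. It assumes the rate holds steady for each year shown. The article does not state the period the rate covers, so the later years are an illustration of the formula, not an IDC forecast.

```python
BASE_YEAR = 2011
BASE_SPEND_BN = 20.3  # USD billions, the 2011 estimate quoted above
CAGR = 0.076          # compound annual growth rate quoted above


def projected_spend(year):
    """Spending in USD billions if the CAGR held from BASE_YEAR to `year`."""
    return BASE_SPEND_BN * (1 + CAGR) ** (year - BASE_YEAR)


for year in range(BASE_YEAR, BASE_YEAR + 5):
    print(year, round(projected_spend(year), 1))
```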

Public cloud has a myriad of players with different options for pricing, location, performance, and even niche industry needs. For HPC users, the main concerns should be around cloud offerings with instant availability, large capacity, and excellent service-level performance. For example, the 2014 Gartner Magic Quadrant notes that Amazon’s cloud (Amazon Web Services, or AWS) has more than five times the cloud IaaS compute capacity in use of the next 14 providers listed, combined. That kind of capacity is a plus for HPC workload needs.

The second step in the cloud selection decision is to avoid vendor lock-in. Most cloud providers make it simple to put data into the cloud, but might not have accessible ways to move and remove data from their IaaS offering. Enterprises should seek out vendors with transparent data management policies in addition to security solutions and network controls that keep security and access controls in the hands of the cloud user. Look for providers who allow connections to multiple networks and do not limit data transfers between regions or even in and out of the cloud.

More Data In Transit Means More Data at Risk

The benefits of cloud-based capacity expansion and one-off projects should be clear by now, but what about the original questions of security? Cloud providers do offer some of the best-in-class physical data centre protections and state-of-the-art equipment, but once data moves up the “cloud stack,” lines of ownership and control begin to blur. The current public cloud model shares security responsibility between cloud providers, vendors and end HPC customers. The provider manages and verifies security for Layers 0 to 3, while end users must secure the nodes and applications.

Gartner analyst Lydia Leong writes: “IT managers purchasing cloud IaaS should remain aware that many aspects of security operations remain their responsibility, not the cloud provider’s. Critically, the customer often retains security responsibility for everything above the hypervisor.” HPC users should know to be vigilant about security, both on-premise and in the cloud. The biggest difference between traditional data centres and cloud HPC is the set of rules for who owns the security responsibilities and who manages complexity.

Public cloud infrastructure definitely offers lower capital costs and on-demand flexibility, but can increase enterprise risk profiles and create new security concerns. The best practice for managing more complex cloud networks is to use end-to-end encryption for all HPC data as it travels to, across, and within public clouds and the public internet.

Data encryption helps mitigate many data centre security risks. Encryption applies to data in two states: data in transit across the public internet, and data at rest on servers in a data centre. Public cloud providers do offer some encryption, but might not provide fully end-to-end encryption within internal cloud regions or between cloud zones.

Virtual private clouds (VPCs) do offer added security for HPC organisations using public cloud, but the enterprise still does not have full control over access and security across the public internet from data centre to cloud deployment.

HPC users can add security layers on top of public cloud providers’ security features by using a highly available overlay network and site-to-site IPsec encryption. These two tools keep HPC users’ workloads safe from attacks in both the underlying infrastructure and over the public internet, no matter who owns and accesses the network. This way, providers can offer on-demand infrastructure while HPC organisations benefit from low costs, data viability, and high availability.

Using a secure tunnel network architecture and secure network protocols, such as IPsec connections and Secure Sockets Layer (SSL) protocols, keeps data safe from endpoint to endpoint. Full encryption protects both data at rest in a data centre or cloud and data traveling across the public internet. With end-to-end encryption, enterprise HPC data can be secure regardless of location. Encrypted connections between HPC grid nodes can ensure cloud bursting for higher capacity is secure enough to pass compliance requirements such as PCI and HIPAA in the US.
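
At the application layer, the SSL family of protocols is now implemented as TLS, its successor. The sketch below shows one way a client on a grid node might insist on verified, modern TLS using Python's standard `ssl` module. It covers only the application-layer piece and does not configure an IPsec tunnel. The `strict_client_context` helper name and the choice of TLS 1.2 as a floor are assumptions made for illustration.

```python
import ssl


def strict_client_context():
    """Build a client-side TLS context that verifies the peer.

    ssl.create_default_context() already requires a valid certificate
    and checks the host name; the minimum version is raised here so that
    SSLv3 and early TLS versions are refused.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
print("certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:", ctx.check_hostname)
print("minimum version:", ctx.minimum_version.name)
```

A context like this would typically be passed to whatever client library opens connections between nodes, so that an unverified or downgraded connection fails instead of silently proceeding.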

HPC Best Practices in Action: US Mutual Fund

A large mutual fund based in Boston uses the elasticity of public cloud to compute financial metrics that had never been possible in their internal infrastructure. The large public cloud they selected had the required elements of capacity, on-demand flexibility, and pay-as-you-go pricing. But they also wanted added security and the agility to prevent vendor lock-in.

What the public cloud offered, on its own, could not provide the security and control needed for this financial institution to extend their existing HPC grids on the same datacentre-based network. The mutual fund required VLAN isolation to ensure customer traffic was separate from all other data traveling to and within the cloud. They also wanted to ensure resilient file storage and data validity beyond the cloud providers’ offerings.

Rather than rebuild their HPC grid, the mutual fund wanted to rapidly connect to and scale up in the public cloud, and determined that the most efficient strategy was to use an overlay network. Their solution also included the full end-to-end, data-in-motion encryption required to meet financial industry data protection regulations. The overlay network allowed the new HPC workloads to act like the existing HPC grid network and pass internal and external security tests.

With an overlay network, the mutual fund securely burst into public cloud IaaS as a natural extension to their grid. The HPC grid extension also ensured all data-in-motion was encrypted from the on-premise grid to the cloud-based grid extension. The mutual fund could then incorporate their cloud HPC results into on-demand reports for their clients.

Public cloud saved expensive physical servers from sitting idle. Best practices prevented vendor lock-in and saved IT teams from re-architecting or changing HPC vendors. Now, the mutual fund company uses public cloud infrastructure to create a secure and automated natural HPC grid extension in which they flex up their processing power in seconds and back down when no longer needed.

About the Author

As Senior Cloud Solutions Architect, Sam Mitchell leads technical elements of cloud adoption. Mitchell runs demos, technical qualification, technical account management, proofs of concept, technical and competitive positioning, RFI/RFP responses and proposals. Before CohesiveFT, Mitchell was a Cloud Solution Architect at Platform Computing, which was acquired by IBM. He was also a Lead Architect at SITA, where he headed up OSS BSS Architecture, Design and Deployment activities on SITA’s cloud offerings.
