Comprehensively Evaluating HPC Cloud Cost Benefits

By Ian Armas Foster

July 30, 2013

HP Labs partnered with the University of Illinois at Urbana-Champaign to comprehensively evaluate the feasibility of running high performance applications in the cloud. The research set out to answer several questions: how HPC applications fare in the cloud versus on supercomputers (the team used the Ranger and Taub machines for those tests), which applications are best suited for cloud deployment, and what cost benefits certain organizations could see by meeting their high performance needs in a cloud.

Below is a grid of all the platforms they used in testing their various applications. As one can see, the Ranger and Taub systems are there along with public and private cloud instances.

It is important to note the approach the research team took with setting up their cloud systems. While they could have built a dedicated instance that would perform closer to supercomputing standards, they figured that such an instance would be unlikely in the scenario of a mid-sized enterprise or startup looking to purchase on-demand HPC resources.

With that said, they still took steps to optimize the performance. “To get maximum performance from virtual machines, we avoided any sharing of physical cores between virtual cores. In case of cloud, most common deployment of multi-tenancy is not sharing individual physical cores, but rather done at the node, or even coarser level. This is even more true with increasing number of cores per server.”

They tested those cloud systems and the control supercomputers on a variety of applications, including Jacobi2D, used for scientific simulation and image processing, NAMD, a molecular dynamics application, ChaNGa, used for cosmology simulation, and the NQueens problem among others.

The graphs above show how well performance on the various machines scaled for each application. The applications that reportedly had trouble scaling were those that were communication intensive. “IS is a communication intensive benchmark and involves data reshuffling and permutation operations for sorting. Sweep3D also exhibits poor weak scaling after 4–8 cores on cloud. Other communication intensive applications such as LU, NAMD and ChaNGa also stop scaling on private cloud around 32 cores,” the report noted.

In all instances except for the public cloud, the EP, Jacobi2D and NQueens applications scaled up to 256 cores, while the public cloud imposed performance penalties once more than four cores were used.

Once the performance drop-off for clouds was established, a result that was not altogether surprising, the next task was to determine exactly what kind of penalty was suffered. The researchers could then relate that penalty to the cost of provisioning those systems and determine whether cloud is indeed a cost-effective means of securing HPC resources.
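To make the idea of relating a performance penalty to cost concrete, here is a minimal sketch. It assumes a simple per-core-hour pricing model, which is not taken from the report; the function names and every number in it are hypothetical and chosen only for illustration.

```python
def cost_per_run(cores, runtime_hours, price_per_core_hour):
    """Cost of one job under an assumed flat per-core-hour price."""
    return cores * runtime_hours * price_per_core_hour


def break_even_slowdown(reference_price, cloud_price):
    """Largest slowdown factor at which the cloud run costs no more
    than the reference run, for the same core count."""
    return reference_price / cloud_price


# Hypothetical inputs (not from the report): the same 64-core job runs
# 2.5x slower on cloud, but each cloud core-hour costs half as much.
reference_cost = cost_per_run(cores=64, runtime_hours=1.0, price_per_core_hour=0.10)
cloud_cost = cost_per_run(cores=64, runtime_hours=2.5, price_per_core_hour=0.05)

print(f"reference: ${reference_cost:.2f}, cloud: ${cloud_cost:.2f}")
print(f"cloud breaks even up to a {break_even_slowdown(0.10, 0.05):.1f}x slowdown")
```

Under these made-up prices, the cloud run is cheaper only while its slowdown stays below the ratio of the two prices, which is why the size of the performance penalty matters to the cost question.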

“To quantify the amount of variability on cloud and compare it with a supercomputer, we calculated the coefficient of variation (standard deviation/mean) for execution time of ChaNGa across 5 executions,” the report stated. According to the research team, the amount of variability increases as they scale up as a result of a decrease in granularity. “For the case of 256 cores at public cloud, standard deviation is equal to half the mean, implying that on average, values are spread out between 0.5x mean and 1.5x mean resulting in low predictability of performance across runs. In contrast, private cloud shows less variability.”
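The coefficient of variation described in the report is simply the standard deviation divided by the mean. The sketch below shows that calculation for a set of five run times. The timings are hypothetical, not the report's measurements, and the use of the population standard deviation is an assumption, since the report does not say which form it used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(times):
    """Return standard deviation / mean for a list of execution times."""
    avg = mean(times)
    if avg == 0:
        raise ValueError("mean execution time must be non-zero")
    # Assumption: population standard deviation; the report does not specify.
    return pstdev(times) / avg


# Hypothetical execution times in seconds for 5 runs (not from the report).
stable_runs = [10.0, 10.0, 10.0, 10.0, 10.0]
noisy_runs = [5.0, 15.0, 5.0, 15.0, 10.0]

print(f"stable: {coefficient_of_variation(stable_runs):.3f}")
print(f"noisy:  {coefficient_of_variation(noisy_runs):.3f}")
```

A value near zero means run times are predictable, while a value approaching 0.5, as the report cites for 256 cores on public cloud, means a typical run deviates from the mean by about half of it.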

Overall, latency and bandwidth on cloud were a couple of orders of magnitude worse than on the Ranger and Taub machines, as shown in the logarithmic graphs below.

These bandwidth and latency issues pose difficulties for the aforementioned communication intensive applications, where communication among cores and nodes is key to completing a problem.

Again, the researchers note that a dedicated public cloud instance would solve a great deal of these problems. However, such an instance would likely cost more and therefore become less feasible for the mid-sized companies and startups that would utilize it. The multi-tenancy cloud setup renders many high performance applications untenable. “The performance of many HPC applications is very sensitive to the interconnect, as we showed in our experimental evaluation. In particular low latency requirements are typical for the HPC applications that incur substantial communication. This is in contrast with the commodity Ethernet network (1Gbps today moving to 10Gbps) typically deployed in cloud infrastructure,” the report noted.

With that said, it is still prudent for those small and medium-sized companies to enlist cloud-based HPC services, as the cost analysis below shows.

Even the communication intensive applications work well up to a certain number of cores, a number unlikely to be exceeded by a medium-sized institution. “The ability to take advantage of a large variety of different architectures (with different interconnects, processor types, memory sizes, etc.) can result in better utilization at global scale, compared to the limited choices available in any individual organization,” the report argued. Below is a sample of what such an architecture that relies on just four-core cloud-based machines would look like.

The report does go on to say that dedicated instances would be advantageous to large institutions looking for burst capacity, a concept that has been discussed here.
