Half-Time in the Uber-Cloud

By Wolfgang Gentzsch and Burak Yenier

September 20, 2012

Since its first announcement on June 28 here on HPCwire, and its official start on July 20, the Uber-Cloud Experiment has attracted over 160 industry and research organizations and individuals from 22 countries. They all have one goal: to jointly explore the end-to-end process of remotely accessing technical computing resources sitting in HPC centers and in the cloud. The focus of this experiment is on engineering simulations performed by small and medium enterprises that expect a quantum leap in innovation and competitiveness by using high performance computing.

The benefits of remote access to HPC are widely recognized. We have at our disposal most of the technology needed to access and run our engineering workloads on remote resources. But we still face other challenges that relate more to the human element. These include trust in the resource provider; giving up some control over applications, data, and resources; security; provider lock-in; software licensing; an unfamiliar pay-per-use computing model; and a general lack of clarity in distinguishing hype from reality.

To explore these hurdles in detail and to learn more about this end-to-end process, we were able to build 20 teams, each consisting of an end-user and their application, the software provider, the computational resource provider, and an HPC and/or CAE expert who manages the team process. Thanks to our participants, the following teams have been established:

Team: Project Description

Anchor Bolt: Simulating steel-to-concrete fastening capacity for an anchor bolt
Resonance: Electromagnetic simulations of NMR probe heads
Radiofrequency: Radiofrequency field distribution inside a heterogeneous human body
Supersonic: Simulation of jet mixing in supersonic flow with shock
Liquid-Gas: Two-phase flow simulation of separation columns
Wing-Flow: Flow around an aerospace wing
Ship-Hull: Simulation of water flow around a ship hull
Cement-Flows: Burner simulation with different solid fuels in the mining industry
Sprinkler: Simulating water flow through an irrigation water sprinkler
Space Capsule: Aerothermodynamics and stability analysis of a space capsule
Car Acoustics: Low-frequency car acoustics
Dosimetry: Numerical EMC and dosimetry with high-resolution models
Weathermen: Large-scale and high-resolution weather and climate prediction
Wind Turbine: CFD simulations of vertical and horizontal wind turbines
Combustion: Simulating combustion in an IC engine
Blood Flow: Simulation of water/blood flow inside rotating micro channels
ChinaCFD: CFD using a homegrown C/C++ application
Gas Bubbles: Simulation of gas bubbles in a liquid mixing vessel
Side Impact: Optimization of side-door intrusion bars in a crash
ColombiaBio: Analysis of biological diversity in a geographic region using R scripts

All 20 of these projects are underway today. Two teams are still defining their end-user project, 15 are in contact with their assigned computing resources and setting up the project environment, one is initiating and monitoring the end-user project execution, one is reviewing the results with the end user, and one is already documenting the findings of the HPC as a Service process. To illustrate the team process, we present two of the projects and their current status in more detail.

Simulating new probe design for a medical device

Team Expert: Chris Dagdigian from BioTeam

The team’s end user faces a common problem: a periodic need for large compute capacity in order to simulate and refine potential product changes and improvements. Because the HPC requirements are periodic, the company cannot keep the desired amount of capacity in-house; it is difficult to justify capital expenditure for complex assets that may end up sitting idle for long periods of time.

To date the company has invested in a modest amount of internal HPC capacity sufficient to meet base requirements. Additional HPC resources would allow the end user to greatly expand the sensitivity of current simulations and may enable new product & design initiatives previously written off as “untestable.”

The HPC software being employed is CST Studio, a popular commercial application for electromagnetic simulations of many types. The application is currently operating in the Amazon cloud and the team has successfully completed a series of architecture refinements and scaling benchmarks. The hybrid cloud-bursting architecture allows local HPC resources residing at the end-user site to be utilized along with the Amazon cloud-based resources.

At this point in the project the team is still exploring the scaling limits of the Amazon GPU-equipped EC2 instance types and is beginning new tests and scaling runs designed to test HPC task distribution via MPI. The use of MPI will enable them to leverage different EC2 instance type configurations and scale beyond some technical limits imposed by the amount of memory residing within the NVIDIA GPU cards.

They believe they are currently at (or close to) the point at which they are routinely running simulations that would not be technically possible using only the end user's local resources. They also intend to begin testing the Amazon EC2 Spot Market, in which cloud-based assets can be obtained from an auction-like marketplace offering significant cost savings over traditional on-demand hourly prices.

Multiphase flows within the cement and mineral industry

Team Expert: Ingo Seipp from science + computing ag

In this project ANSYS CFX is used to simulate a flash dryer in which hot gas is used to evaporate water from a solid. The team consists of FLSmidth as the end user, Bull as the resource provider with its extreme factory (XF) HPC on demand service, ANSYS as the software provider, and science + computing ag as team experts.

FLSmidth is the leading supplier of complete plants, equipment and services to the global minerals and cement industries. The end user needs about four to five days to complete a simulation run on the local IT infrastructure. He would like to reduce the total throughput time of the project and, in a second step, increase the mesh size to refine the results, without investing in hardware that may not always be fully utilized. This requires running the simulation on more cores and with more memory, spread across more nodes connected by a high-speed network.
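How much a four-to-five-day run might shrink on more cores depends on how much of the work parallelizes. The article does not say, so the following is only a rough sketch using Amdahl's law, a standard first-order model that the team itself is not reported to have used. The 4.5-day baseline is simply the midpoint of the quoted range, and the 95 percent parallel fraction and the function name are hypothetical values chosen for illustration, not measurements from the project.

```python
def estimated_runtime_days(baseline_days, core_multiplier, parallel_fraction):
    """Estimate runtime after multiplying the core count, using Amdahl's law.

    baseline_days     -- runtime on the current (local) hardware
    core_multiplier   -- how many times more cores than the baseline (>= 1)
    parallel_fraction -- share of the baseline runtime that scales with cores
                         (assumed value, between 0 and 1)
    """
    if core_multiplier < 1:
        raise ValueError("core_multiplier must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; only the parallel part shrinks.
    return baseline_days * (serial_fraction + parallel_fraction / core_multiplier)


if __name__ == "__main__":
    BASELINE_DAYS = 4.5       # midpoint of "four to five days" (assumption)
    PARALLEL_FRACTION = 0.95  # hypothetical; not reported in the article
    for multiplier in (1, 2, 4, 8, 16):
        days = estimated_runtime_days(BASELINE_DAYS, multiplier, PARALLEL_FRACTION)
        print(f"{multiplier:>2}x cores: about {days:.2f} days")
```

The model ignores communication overhead and memory limits, which is one reason the interconnect matters: the gain from adding nodes flattens as the serial share comes to dominate.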

XF provides 150 teraflops of computing power with InfiniBand, GPUs, and currently about 30 installed applications; others are added on demand. Users can access XF through an easy-to-use web portal or direct login.

In this project, XF has enabled access for the end user and integrated ANSYS CFX into a web interface for submitting jobs. ANSYS has granted licenses for the duration of the project, and the end user can manage them easily through the portal. Preparations to run the jobs are almost complete, and the first test runs should start shortly.

Announcing Round Two of the Uber-Cloud Experiment

We consider Round One a proof of concept: yes, remote access to HPC resources works, and there is a real need for it. And yes, there are hurdles along the way, but we know how to overcome them.

During the half-time webinar we asked the attendees if they would like to participate in the second round of the Uber-Cloud Experiment. Ninety-seven percent said they would. Therefore, we decided to start a new round of the experiment immediately after the first round completes. It will run from mid-November to mid-February.

Round Two of the experiment will be more professional. The end-to-end process of identifying, accessing and using remote resources (hardware, software, expertise) will become more structured, standardized, and tools-based. We will also handle more teams and more applications beyond CAE, and offer a list of additional professional services, for example, measuring the team effort. Finally, existing teams will be encouraged to use other resources, existing participants can work in new teams, and new participants can join and form new teams.

To learn more about the experiment or to register for Round Two, visit the Uber-Cloud Experiment website.

About the Authors

Wolfgang Gentzsch and Burak Yenier are the creators and facilitators of the Uber-Cloud Experiment. Wolfgang is an HPC veteran. Having worked in leading positions in research, academia and industry for some 30 years, Wolfgang is now an HPC consultant and the chairman of the ISC Cloud conference series for HPC and Big Data in the Cloud. Burak is the vice president of operations at CashEdge, a software-as-a-service company in Silicon Valley, which provides innovative payments and aggregation solutions to financial institutions.
