Help HPC Work Smarter and Accelerate Time to Insight

By Edward O. Pyzer-Knapp, IBM Research UK

November 14, 2019


[Attend the IBM LSF & HPC User Group Meeting at SC19 in Denver on November 19]

To recklessly misquote Jane Austen, it is a truth, universally acknowledged, that a company in possession of a highly complex problem must be in want of a massive technical computing cluster.

Certainly, we are seeing an explosion in the scale and complexity of the problems being tackled by high performance computing solutions, which in turn are becoming larger and larger. Recently, we saw the installation of the IBM-built supercomputers Summit and Sierra at Oak Ridge and Lawrence Livermore National Labs, providing access to compute capability never before seen. This has certainly opened the door to solving some of computing’s most challenging scientific problems, but it is also abundantly clear that there are far more problems – especially in industries such as oil and gas, and the chemical sector – than there is available compute. This is further complicated by the realities of industrial research, where the problems themselves are often so sensitive that they cannot leave the four walls in which they were conceived. Given the sensible assumption that we cannot replicate Summit and Sierra on site for every challenge that occurs, the question looms large – given my resources, how do I move forward?


Computing capability is regularly measured in benchmarks – how fast can a particular computer run a particular problem – and there is nothing inherently wrong with this. Pragmatically, though, what an end-user is most interested in is not the time it takes to run a single simulation, but the time it takes to extract actionable insight. So how might this focus on time to insight be built into a technical computing workflow? IBM Research have developed a workflow accelerator based on the mathematics of Bayesian optimization (IBO) to do just this.

Whilst the mathematics behind this approach can be somewhat complex, the concept is simple. Think about the last time you lost something, say your keys or your glasses. I bet that you didn’t split the whole house into a grid, start at point (0,0,0) and move through the house until the entire grid had been searched. Instead, you formed an opinion on the most likely place you had left them (possibly based on historical information) and searched there; if they were not there, you updated your views and selected the next place to search. This loop of hypothesise, act, observe is well known to the scientific community, and is hardwired deep into our psyche. It is also precisely how Bayesian optimization works, replacing your brain with a sophisticated, data-driven Bayesian model, and your decision making with a construct known as an ‘acquisition function’.
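For readers who like to see the loop written down, here is a minimal sketch of hypothesise, act, observe in plain Python. It is an illustration only, not IBM's implementation: the Gaussian process model, the squared-exponential kernel, the upper-confidence-bound acquisition function, the toy `simulation` function and every name and setting (`length_scale`, `kappa`, `noise`) are assumptions chosen to keep the example short.

```python
import math

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel: how similar we believe two inputs are."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def posterior(xs, ys, x_new, noise=1e-6):
    """Gaussian process belief (mean, std) at x_new, with a zero prior mean."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, candidates, n_iter=10, kappa=2.0):
    """Maximise objective over candidates; returns best x, best y, all tried x."""
    xs = [candidates[0], candidates[-1]]      # two initial observations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        def acquisition(c):                   # hypothesise: where is it worth looking?
            mean, std = posterior(xs, ys, c)
            return mean + kappa * std         # upper confidence bound
        untried = [c for c in candidates if c not in xs]
        x_next = max(untried, key=acquisition)
        xs.append(x_next)                     # act: run the experiment
        ys.append(objective(x_next))          # observe: update what we know
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def simulation(x):
    """Toy stand-in for an expensive simulation; its peak is at x = 0.3."""
    return -(x - 0.3) ** 2

candidates = [i / 50 for i in range(51)]
best_x, best_y, tried = bayes_opt(simulation, candidates, n_iter=10)
print(f"best x = {best_x:.2f} after {len(tried)} of {len(candidates)} possible runs")
```

The point of the sketch is the shape of the loop rather than the numbers: the model's uncertainty (`std`) plays the role of "I haven't looked there yet", and the acquisition function trades that off against where the model currently expects good results.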

It is all well and good having some powerful technology, but in order to drive real business impact, and reduce that time to insight, it is necessary that the bottleneck is not simply moved from the person who owns the problem to the person who understands the algorithm which will solve it. Our solution, therefore, is constructed to separate the concerns of the user and the developer. To integrate it into a workflow, a user only needs to be able to describe their problem as a set of parameters, and then report back what happened when a given set of parameters was tested.
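To make that separation of concerns concrete, here is a small sketch of what such an interface could look like from the user's side. Everything in it is hypothetical: the `Parameter` and `AskTellOptimizer` names, the `ask`/`tell` methods and the toy `run_simulation` function are invented for illustration and are not IBM's API. The suggestion strategy inside `ask` is deliberately a placeholder (uniform random sampling); in a real accelerator this is where the Bayesian model and acquisition function would sit, hidden from the user.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Parameter:
    """One tunable input of the problem, described by a name and a range."""
    name: str
    low: float
    high: float

@dataclass
class AskTellOptimizer:
    """Hypothetical interface: the user only ever calls ask() and tell()."""
    parameters: list
    seed: int = 0
    history: list = field(default_factory=list)

    def __post_init__(self):
        self._rng = random.Random(self.seed)

    def ask(self):
        # Placeholder strategy: uniform random sampling within each range.
        # A real optimizer would use the history to pick a promising point.
        return {p.name: self._rng.uniform(p.low, p.high) for p in self.parameters}

    def tell(self, params, result):
        # The user reports what happened when these parameters were tested.
        self.history.append((params, result))

    def best(self):
        # Lowest reported result so far (this sketch minimises).
        return min(self.history, key=lambda record: record[1])

def run_simulation(params):
    """Toy stand-in for a job submitted to an HPC cluster."""
    return (params["temperature"] - 350.0) ** 2 + (params["pressure"] - 2.0) ** 2

opt = AskTellOptimizer([Parameter("temperature", 300, 400),
                        Parameter("pressure", 1, 5)])
for _ in range(20):
    params = opt.ask()                        # "what should I try next?"
    opt.tell(params, run_simulation(params))  # "here is what happened"

best_params, best_result = opt.best()
print(best_params, best_result)
```

Note that the user's code never touches the algorithm: swapping the placeholder for a smarter strategy would change nothing outside the `ask` method.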

The results of using this kind of technology can be transformative. For example, when determining the phase boundary of a complex phase diagram, industrial users would typically split the problem into a grid of ‘virtual experiments’ and conduct each experiment on an HPC cluster (often taking days to weeks), before combining all the experiments to identify the phase boundary. Through the use of IBO technologies, we were able to run only the experiments which contributed useful information to the problem, and use our internal Bayesian model to interpolate the rest. On a representative problem, using a third of the number of simulations, we were able to produce a phase diagram with a resolution many orders of magnitude greater than was previously provided.

We also worked with IBM’s High-Speed Bus Signal Integrity (HSB-SI) Team, who work on high-integrity designs with low jitter and robustness to manufacturing imperfections. For a simple design task, they would normally run thousands of simulations to map out a design space. Through integrating IBM BOA into their workflow, they were able to achieve the same results with roughly a hundred simulations, which is transformative for their ability to develop and test new prototypes.

In summary, HPC systems will continue to get faster and larger, but for running complex HPC workloads, size is not everything. Remember – you cannot get faster than the simulation you didn’t run because your computing worked smart, not hard.

Attending SC19 in Denver on Nov 17? You may be interested in attending this related session and user group meeting:

MC08: Optimizing Simulation Time to Final Design

Hyatt Regency / Centennial Ballroom F

Wednesday, November 20, 2019
1:00 PM – 2:00 PM

IBM LSF & HPC User Group

Hyatt Regency / Centennial Ballroom G

Tuesday, November 19, 2019
3:00 PM – 5:00 PM

