Code Modernization: Bringing Codes Into the Parallel Age

By Doug Black

June 8, 2017

Advanced computing performance depends on more – much more – than the processor, and that dependence takes many forms. Whatever the current validity of Moore's Law, it's indisputable that the other aspects of the computing ecosystem must keep pace with processor development if the system is to deliver the results everyone's after.

One of those aspects is application code, some of which dates back to the late 1950s, when parallel computing was a futuristic computer science vision. Still in use today, those codes have been goosed, tickled and jolted for better performance, but they remain at their core what they’ve always been: serial applications.

We recently caught up with Joe Curley, senior director of Intel’s code modernization organization, who shared observations about Intel’s effort to optimize, or parallelize, widely used public codes for the latest generations of highly parallel x86 CPUs.

This includes work on applications used by manufacturers in product design, such as OpenFOAM for CFD; advanced MRI diagnostics programs used in the medical industry; seismic codes for the oil and gas industry; and applications used by banks and other financial services organizations.

Obviously, it's in Intel's self-interest to extend the life of the 40-year-old x86 architecture by maintaining an up-to-date code library. But organizations all over the world are hampered by old code that, left unoptimized, drags down the throughput of high performance clusters and impedes the work they do.

Intel’s Joe Curley

A recent development in code modernization, Curley said, has been the incorporation of AI and machine learning techniques, which – when done right – can boost performance far beyond what conventional, processor-focused code modernization alone achieves.

Much of Intel’s code modernization work comes out of its global network of Intel Parallel Computing Centers (IPCCs). Begun four years ago with six centers, the program has expanded to 72 and has worked on 120 codes in more than 21 domains.

The following are excerpts from our interview with Curley, some of which have been re-ordered for clarity.

Definition and Need

Code modernization can mean many things, from moving to a modern language to optimizing performance. We use code modernization in the literal sense: to become modern – using the newest methods with the newest technology.

The typical impact of a code modernization problem is giving someone the ability to take on a problem that was just too big to get at before. We’re trying to extract the maximum performance from an application and take full advantage of modern hardware. Other words have been used: optimization, parallelization and some others. But you can be parallel without being optimal, you can be optimal without being parallel. So we chose a slightly different term. It’s imperfect but it gets across the idea.
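Curley's parallel-versus-optimal distinction is worth pausing on, and a small sketch (ours, not Intel's code) makes it concrete: the first routine below is parallel but not optimal – its strided memory access defeats the cache, so added cores mostly wait on memory – while the second is optimal for the memory system but entirely serial.

```cpp
#include <cstddef>

// Parallel but not optimal: each thread walks a column of a row-major
// matrix, so every access strides across memory and defeats the cache.
void sum_cols_parallel(const double* m, std::size_t n, double* colsum) {
    #pragma omp parallel for
    for (std::size_t j = 0; j < n; ++j) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];               // stride-n access pattern
        colsum[j] = s;
    }
}

// Optimal but not parallel: one thread streaming through memory in order
// can rival the version above, because memory, not compute, is the limit.
void sum_cols_serial(const double* m, std::size_t n, double* colsum) {
    for (std::size_t j = 0; j < n; ++j) colsum[j] = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            colsum[j] += m[i * n + j];       // unit-stride access pattern
}
```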

Modern, general-purpose server processors have 18-22 processing cores, each with two threads and a vector unit built in. They're massively parallel processors. But by and large the applications we run on them have been derived from code written in a sequential processing era. The fundamental problem we work with is that many of the codes used in industry or in the enterprise today are derived from algorithms written anywhere from the 1950s to the 2000s. The microprocessors of the time were primarily single-core machines, so you end up with very serial applications.
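The gap between that hardware and that code can be illustrated with a deliberately tiny example (ours, with made-up names): the first loop is what a serial-era kernel looks like; the second uses OpenMP – a common tool in this kind of modernization work – to engage both the cores and the vector units Curley describes.

```cpp
#include <cstddef>

// A serial-era kernel: one core, one vector lane, one iteration at a time.
void scale_add_serial(const double* a, const double* b, double* out,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = 2.0 * a[i] + b[i];
}

// The modernized form: OpenMP spreads iterations across all cores
// ("parallel for") and asks the compiler to vectorize each thread's chunk
// ("simd"), engaging both levels of parallelism the hardware provides.
void scale_add_modern(const double* a, const double* b, double* out,
                      std::size_t n) {
    #pragma omp parallel for simd
    for (std::size_t i = 0; i < n; ++i)
        out[i] = 2.0 * a[i] + b[i];
}
```

Compiled with OpenMP enabled, the second version can in principle scale with cores times vector lanes; real codes are rarely this clean, which is what makes modernization a project rather than a recompile.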

In order to use a modern processor you could just take that serial application, create many copies of it and try to run them in parallel. And that's been done for years. But the real power and performance breakthroughs happen when someone steps back and asks: how can I start using all of these cores together, computationally and in parallel?
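The difference between the two approaches can also be sketched (again hypothetically): running many serial copies works only when the copies never need to talk, whereas a kernel like the one below – a stand-in for the dependence patterns in CFD or seismic codes – needs its threads to cooperate on one shared grid.

```cpp
#include <vector>
#include <cstddef>

// Cooperative parallelism on ONE large problem: all threads share a single
// grid and each updates a slice of it. Because every point depends on its
// neighbors, you cannot get this by launching independent serial copies.
// (Hypothetical 1-D stencil, standing in for a real CFD/seismic kernel.)
void jacobi_sweep(const std::vector<double>& in, std::vector<double>& out) {
    const std::size_t n = in.size();
    #pragma omp parallel for
    for (std::size_t i = 1; i + 1 < n; ++i)
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
}
```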

What's Encompassed by Code Modernization

Our group does everything from training and academic engagement to building sample codes and working with ISVs and communities, both internally and externally. We focus our efforts on open source communities and open source codes. The reason is that we're not only trying to improve science, we're also trying to improve the understanding of how to program in parallel and how to solve problems. Having a teaching example – an example that a developer can look at – is incredibly important.

We've taken the output from the IPCCs, written it down, created case studies, open source examples, graphs, charts – teaching examples – and put it out through a series of textbooks. But importantly, all of that output can be used by a software developer or an academic to teach people the state of the art.

For the IPCCs, the idea was to find really good problems that would most benefit from the modern machine if only we could unlock the performance of the code. Our work ranges from practical academics to communities that generate community codes. In some cases they're industrial and academic partnerships; some are in the oil and gas industry, working on refinement of core codes that then go back into use in seismic imaging. The idea is for these to be real hands-on collaborations between domain scientists, computer scientists and Intel that have actual practical use within the life of our products.

So we're not only getting the first-order benefit – say, an auto manufacturer using OpenFOAM gets a result faster. That's great; we've made it more efficient. But we're also creating a pool of programmers and developers who'll be building code for the next 20 years, making them more efficient as well.

Example: Medical/Life Sciences

One of our IPCCs was with Princeton University. They were trying to get a better understanding of what was happening inside the human brain by imaging it while the patient was in the medical imaging apparatus – a form of MRI called fMRI. The science on that is pretty well established. They knew how to take the data coming from the MRI, compute on it and create a model of what's going on inside the brain. But in 2012, when we started the project, they estimated the calculation would take 44 years on their cluster. It wasn't a practical problem to solve.

So instead of the serial method they had been using, they could start working in parallel on more energy-efficient, modern equipment. They came up with a couple of things. One: they parallelized their code and saw huge increases in performance. But they also looked at it algorithmically. They began to look at the practicality of machine learning and AI, and how you could use that for science. Since these researchers happened to be from neural medicine centers, they understood how the brain works. They were trying to use the same kind of cognition, or inference, that you have inside your brain algorithmically on the data coming from the medical imaging instrument.
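The interview doesn't include Princeton's code, but the flavor of "change the algorithm, then parallelize it" can be sketched. In correlation-based fMRI analysis, normalizing each voxel's time series first turns every pairwise correlation into a plain dot product, so the whole correlation matrix becomes one dense, highly parallel matrix product (hypothetical code; names and structure are ours):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Hypothetical sketch, NOT the Princeton code. x holds V voxel time series
// of length T (row-major). After z-scoring each row, the Pearson
// correlation corr(i,j) is simply the dot product of rows i and j.
void correlation_matrix(std::vector<double>& x, std::size_t V, std::size_t T,
                        std::vector<double>& corr) {
    #pragma omp parallel for
    for (std::size_t v = 0; v < V; ++v) {        // normalize each voxel
        double* row = &x[v * T];
        double mean = 0.0, ss = 0.0;
        for (std::size_t t = 0; t < T; ++t) mean += row[t];
        mean /= T;
        for (std::size_t t = 0; t < T; ++t)
            ss += (row[t] - mean) * (row[t] - mean);
        const double inv = ss > 0.0 ? 1.0 / std::sqrt(ss) : 0.0;
        for (std::size_t t = 0; t < T; ++t) row[t] = (row[t] - mean) * inv;
    }
    // corr = x * x^T. In production this single step would go to a tuned
    // parallel BLAS (dgemm/dsyrk) -- the algorithmic reformulation is what
    // lets the hardware do what it is good at.
    #pragma omp parallel for
    for (std::size_t i = 0; i < V; ++i)
        for (std::size_t j = 0; j < V; ++j) {
            double s = 0.0;
            for (std::size_t t = 0; t < T; ++t)
                s += x[i * T + t] * x[j * T + t];
            corr[i * V + j] = s;
        }
}
```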

They changed the algorithm, they parallelized their code, they put it all together and ended up with a 10,000X increase in performance. More practically, they were able to take something that would have taken 44 years down to a couple of minutes. They went from something requiring a supercomputing project at a national lab to something that could be done clinically inside a hospital.

That really captures what you can try to do inside a code modernization project. If you can challenge your algorithms, you can look at the best ways to compute, you can look at the parallelization, you can look at energy efficiency, and you can achieve massive increases in performance.

So now, how that hospital treats the neurology of the brain is different because of the advances offered by code modernization. Of course the application of that goes out into the medical community, and you can start looking at fMRI in more clinical environments.

Example: Industrial Design

One of the community applications, OpenFOAM, is used heavily in automobile manufacturing. We've worked with a number of fellow researchers to deliver power and performance improvements of 2-3x, which, for an application of OpenFOAM's size and reach, is really substantial.

It also creates a lighthouse example for commercial ISVs of what can be done. This clearly showed that for computational fluid dynamics at scale, entirely new methods can be applied to the problem. We’ve had a lot of interest and pick-up from commercial ISVs on some of the work being done using some of the community codes.

Here’s the thing we want to get at: What’s the real value in computing a model faster? Most people tend to think of code modernization simply as making a simulation run faster. But one of the things we’ve done is develop software that can help you better visualize your physical design.

Audi, for example, has worked with Autodesk as an ISV partner, and together they've developed a modern ray tracer (rendering engine) – an example of the things we work on inside our code modernization group. We have another group that works on visualization and how to take your images and make them look lifelike. Autodesk has come up with clever ways of doing that and building it into their product line, allowing Audi to remove physical prototypes – both for assembly and for interior and exterior design – from their process.

Think of someone building a clay model of a car and taking it to a wind tunnel, or building a fit-and-finish model of a car, to see how the interior design will look and to see if it’s pleasing to the customer. They’ve removed all that modeling. It’s all being done digitally, not only the digital design and simulation but also the digital prototyping, and then visualizing it through modern software on a departmental-sized computer.

The impact of that, according to Audi when they spoke at ISC, is that it removed seven months from their process for the fit-and-finish prototypes and six for the physical prototypes. If you can shave that much time out of your process you can gain major competitive advantage from HPC.

It’s all made possible by new highly parallel codes and interestingly, all the visualization is done entirely on general-purpose CPUs.
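Why does ray tracing fit general-purpose CPUs so well? Each pixel's ray is independent of every other's, so the render loop is almost perfectly parallel – the same property CPU ray-tracing libraries such as Intel's Embree exploit. A toy sketch (ours, not any product's code):

```cpp
#include <vector>
#include <cstddef>

struct Color { float r, g, b; };

// Hypothetical stand-in for a real ray-tracing kernel; it just shades a
// gradient so the sketch is self-contained and runnable.
Color trace_ray(std::size_t x, std::size_t y, std::size_t w, std::size_t h) {
    return { float(x) / w, float(y) / h, 0.5f };
}

// Every pixel is independent, so the two loops collapse into one big
// parallel iteration space spread across all cores.
std::vector<Color> render(std::size_t width, std::size_t height) {
    std::vector<Color> image(width * height);
    #pragma omp parallel for collapse(2)
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            image[y * width + x] = trace_ray(x, y, width, height);
    return image;
}
```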

Example: Financial Services

For financial services companies, code modernization offers the opportunity to use the same cluster you'd use for the rest of your bank's operations for the most high-performance tasks. Whether it's options valuation or risk management or some of the other tasks you use HPC for, we can do that on general-purpose Xeon CPUs.

In banking, one of the problems is that most of those codes are the crown jewels of the banks. So we can't talk about them; in many cases we don't even see them. But we can work on the STAC-A2 benchmark. STAC is a consortium of banks that has built a suite of benchmarks covering problems that behave sufficiently like their real workloads to give an idea of how fast they can run their software, and the STAC-A2 results get published.

Through code modernization we've repeatedly set world records for STAC-A2 on both our general-purpose Xeon and our Xeon Phi processors. It's an arms race. But we've done it multiple times with general-purpose code.

That allows the bank to take that code as an exemplar, and apply it to their own special algorithms and their own financial science, and get the most performance out of their general-purpose infrastructure.
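As a hedged illustration of the shape of such workloads (STAC-A2 itself is far more elaborate, and the code below is ours, not Intel's or STAC's): option pricing by Monte Carlo generates huge numbers of independent simulated paths, which is why it maps so naturally onto many cores.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <omp.h>

// Toy Monte Carlo European call pricer -- a heavily simplified stand-in
// for STAC-A2-style option workloads. Paths are independent, so the loop
// parallelizes across cores; each thread uses its own RNG stream.
double price_call(double S0, double K, double r, double sigma, double T,
                  long paths) {
    double payoff_sum = 0.0;
    #pragma omp parallel reduction(+ : payoff_sum)
    {
        std::mt19937_64 rng(12345 + omp_get_thread_num());
        std::normal_distribution<double> gauss(0.0, 1.0);
        #pragma omp for
        for (long i = 0; i < paths; ++i) {
            const double z  = gauss(rng);
            const double ST = S0 * std::exp((r - 0.5 * sigma * sigma) * T
                                            + sigma * std::sqrt(T) * z);
            payoff_sum += std::max(ST - K, 0.0);   // call payoff per path
        }
    }
    return std::exp(-r * T) * payoff_sum / double(paths);   // discounted mean
}
```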
