HPC and the Spirit of St. Louis

By Thomas Sterling

June 20, 2012

Every year, as the International Supercomputing Conference in Germany approaches, our good friends here at HPCwire invite me to reflect on the trends of the past 12 months, not so much to provide a potentially tedious list of specific events, product deliveries, and TOP500 mantras but rather to convey a personal sense of what it all adds up to and possibly means for the future of HPC.

This year, to do so, is both easy and difficult. As a field, we are at an inflection point marked by significant progress and innovation in form and method, while at the same time we are confronted by uncertainty at a level that is at least uncomfortable for our system providers and possibly disruptive. There is certainly more contention about degree and direction of product and research investment, both within the US and internationally.

HPC has certainly entered a period of diversity far different than a decade ago, and that is not simply a function of the more than two orders of magnitude of Linpack performance in a time of Pax MPI. The easy part is to recite the buzz words of the year: “GPU”, “big data”, “clouds,” and “exascale.” If you are on one of these trains, then according to popular belief, you are on the fast track.

The mad dash to flops through the means of cramming as many ALUs as possible in the dense (and successful) form factor of a GPU, sometimes referred to as “accelerators,” has pushed past the tipping point, with many, but not all, major new installations incorporating these flops multipliers in their arsenal on the field of big iron.

Both NVIDIA and AMD are providing the punch in heterogeneity, although with very different architectures. It’s actually interesting to watch NVIDIA, on the wrong side of the PCI bus, move ARM into its modules. AMD is moving its accelerator module into, or at least closer to, its multicore array. Choose your own benchmark (you actually should), but for some, the AMD strategy appears to be working, while the NVIDIA offering is clearly in the lead.

In the US, new big systems like Titan at Oak Ridge and Blue Waters at the University of Illinois are betting on this, and preparing to break through 10 petaflops based on Cray’s latest supercomputing offerings. Both China, with Tianhe-1A, and Japan, with TSUBAME 2, have taken a similar path, each with their own novel contributions.

But there are exceptions, even at the top end. Kei (K) in Kobe, the fastest machine at the time of this writing, has a more tightly integrated architecture provided by Fujitsu and delivers about 10 petaflops. An array of IBM Blue Gene systems is banking on millions of lighter-weight cores in a homogeneous system architecture to deliver an easier-to-program, and therefore more general, class of computer with lower power. (Note that GPUs are good on power as well.)

But programming remains a challenge, and if that is not hard enough, portability is even trickier, especially performance and scalability portability. There are on the order of 50,000-plus CUDA programmers but that does not mean the programming of large scalable systems incorporating GPUs is solved. OpenCL, a community-wide effort to provide an open programming methodology and one that addresses the problems somewhat more broadly, is in work and is attracting a growing body of users. OpenACC is an inchoate programming formalism with broader goals and an OpenMP-like touch and feel.

Many assert that we are looking at the system/programming family of the future. Others (and I’m among them) think it is a transitory phase, which will evolve into something as yet undefined. At least one heavy hitter, Intel, is betting on something altogether different: its early MIC chip, which defines a new manycore socket exhibiting homogeneity, reduced power, and generality. Clearly, the HPC community is not of a uniform opinion.

A very constructive movement that has gained momentum over the last year in the field of HPC is dubbed “big data.” In science and engineering, more and more problems are challenged by the management, processing, and communication of potentially enormous amounts of associated data, whether observed by sensors or derived through simulation. The world’s largest telescopes, LIGO (Laser Interferometer Gravitational-Wave Observatory), and of course the LHC (Large Hadron Collider at CERN) are all examples of ongoing experiments that generate constant streams of data that have to be dealt with. But biology and medical science also create an ever-growing body of data where cross correlations and data mining become an increasing challenge.

Storage capacity is only the beginning of the daunting problems confronting big data science. Communication bandwidth, latency, and reliability for data integrity, as well as power and cost, dominate big data science now and will continue to do so at an increasing pace. Fortunately, unlike some other aspects of mostly flops-intense scientific computing, help will come from industry. This is because big data may generate big profits.

The needs of science in this realm are also manifest in the commercial space from large relational databases, through inventory and sales management, to social networks and search engines. These and other markets will drive technology advancement by the vendors that should have substantial impact on the science domain as well. But over-exuberance in our field is abundant and there are some well-intentioned practitioners in the big data arena who assert that this is THE problem in scientific computing. My message to them is: there are enough problems in HPC to go around.

Of course, according to some, the answer to the question of where to put all that data, or for that matter, where to process it (or any other kind of computing one might need to do) is obvious: it’s the cloud! Well maybe.

The value of clouds or “The Cloud” (I don’t know which) is real, permitting shared environments, data sources, services, etc. among multiple people or communities and among the multiple platforms of a single individual. This is a rapidly moving capability and interface, the full societal impact of which is probably unpredictable even to the most visionary among us but can be anticipated to be enormous and far reaching.

But for HPC, the utility of clouds in the future is, well, foggy. There are some sweet spots. Storage of data, larger than easily managed by a modest department, but smaller than some horrific size, is likely. The problem with ultra-large data sets is that they have to get moved. If they accrue slowly and are only lightly sampled, this can work. But if the entire data set has to be processed by local computing resources, then the intervening bandwidth provided by the internet simply may not be adequate.
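The bandwidth concern can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name, the one-petabyte data set, and the link speeds are assumptions chosen for the example, not measurements of any real cloud service or network.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(dataset_bytes: float,
                       link_bits_per_second: float,
                       efficiency: float = 1.0) -> float:
    """Days needed to move a data set over a link.

    `efficiency` is the assumed fraction of the nominal link rate that is
    actually sustained (1.0 means the full rate, an optimistic assumption).
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Convert bytes to bits, divide by the sustained rate, convert to days.
    seconds = dataset_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    PETABYTE = 1e15  # hypothetical data set size, in bytes
    for label, rate in [("1 Gb/s", 1e9), ("10 Gb/s", 1e10), ("100 Gb/s", 1e11)]:
        days = transfer_time_days(PETABYTE, rate)
        print(f"1 PB over {label}: {days:.1f} days")
```

Under these assumptions, a petabyte takes roughly 93 days at a fully sustained 1 Gb/s and still more than 9 days at 10 Gb/s, which is why a data set that must be moved in its entirety before local processing sits poorly with a remote store.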

On the computing side, there is an attraction to amortizing the cost and administration of a large array of computing resources across many users. Indeed, the accessibility of a system of very large scale that could not be acquired by any but a few institutions is a potential breakthrough in operational modality. But HPC reflects different forms of usage. The clouds can supply “throughput computing” and a significant percentage of the HPC workload is of this kind. Indeed, pools of resources including workstation farms across academic campuses and elsewhere have been widely employed over decades.

But HPC has many computational challenges, single programs, that are tightly coupled and for which much of the programming challenge is performance tuning. Latencies have to be low, overheads even lower, and cost of information flow understood and stable. Clouds provide none of this in very large configurations. In some sense, this is their strength; successive requests are serviced by different configurations of available resources on demand. But for very large complex problems, they are not suitable, or at least less than optimal. Success of the cloud will require that we benefit from its advantages but not over-hype it and ultimately become disappointed.

People love milestones to mark progress, and not just HPC people. In the last century two such captured the imagination of the world. One that I lived through was getting to the Moon with “one small step” provided by Neil Armstrong in 1969. But another was a flight non-stop from New York to Paris by Charles Lindbergh in 1927 to claim the Raymond Orteig Prize. Today, the HPC community has self-defined our next milestone as exascale.

Over the last year, this objective has been codified by the US and internationally through meetings, plans, and programs. One international forum, the International Exascale Software Project (IESP), was completed after more than two years with its last of eight meetings in Kobe, Japan. The European Exascale Software Initiative (EESI) was also completed and is now succeeded by EESI-2. Plans are being considered in China, Japan, and Russia for their own path to exascale computing. In the US, the Department of Energy has launched at least three programs to develop a sufficient understanding and capability not just to get to exaflops, but to derive the right kind of exascale systems (hardware and software) and programming methodologies.

The Predictive Science Academic Alliance Program (PSAAP II) has just accepted proposals for exascale application development and system software. The co-design centers are also focused on the development of application algorithms and the systems upon which they are to run. The Modeling of Execution Model projects are exploring and quantifying the very principles upon which future exascale systems will be designed and operated. And the X-stack Program has just selected the teams that will develop next-generation system software and programming environments that will lead to exascale computing while providing nearer term utility as well.

But there is a difference between the milestones of 1927 and 1969 on the one hand, and that of the exascale, on the other. As extraordinary as Lindbergh’s historic accomplishment was, it was an end in itself. That cannot be the case for exascale computing. While our field has been guilty of stunt machines in the past, the cost and importance of achieving useful exascale capability, capacity, and application is too great to invest in merely claiming the first HPL exaflops Rmax run.

And if some institution, agency, or nation does force such an artificial solution for a short-lived sense of glory, then surely the serious HPC community should mark this act with disdain. The future of HPC is the future of exascale but not merely such systems or benchmarks in and of themselves, but rather the scientific, medical, societal, and commercial breakthroughs that these systems will enable.

The Spirit of St. Louis flew from New York to Paris. But it took a ship back to the US. It wouldn’t have made it if it had tried to do it in reverse. The prevailing winds, which helped it fly east, would have been head winds impeding its progress west. The Spirit of St. Louis now lives in the Smithsonian Air and Space Museum, the world’s most popular museum.

When viewing it from the 2nd floor, the discerning eye will notice a very peculiar thing: there is no front-looking window. Lindbergh could not see where he was going (although he did have a small periscope). From the side windows he could see where he was and guess what was coming next, but he did not have the vision ahead.

HPC cannot afford to fly blind. We cannot just use our current position to assume we will make the right incremental progress towards our future destination. And we can’t just build an exascale computer to sit in a museum, even if it does run a benchmark. HPC is a tool for humanity to solve problems of importance when faced with so many critical challenges. No more stunt machines, please.
