So much ‘spaghetti’ gets tossed on walls by the technology community (vendors and researchers) to see what sticks that it is often difficult to peer through the splatter; in 2018 it felt like everyone was strong-arming AI offerings. The danger is not so much disappointing products; it’s that the AI idea becomes so stretched as to lose meaning. So it was something of a surprise, at least for me, that amid the endless spray of AI announcements last year, something fascinating happened – artificial intelligence (not the HAL kind) started irreversibly transforming HPC for scientists almost as quickly as the marketing hype predicted. I like the succinct quote below posted in October on the ORNL web site.
“Emergence of AI is a very rare type of event,” said Sergei Kalinin, director of ORNL’s Institute for Functional Imaging of Materials. “Once in a generation there is a paradigm shift in science, and this is ours.” Kalinin and his colleagues use machine learning to better analyze data streams from the laboratory’s powerful electron and scanning probe microscopes. https://www.ornl.gov/blog/ornl-review/ai-experimentalist-s-experience
How we do science is changing. AI (writ large) is the change. Don’t get the wrong idea. First-principles modeling and simulation are hardly defunct! And 2018 was momentous on many fronts – Big Machines (think Summit/Sierra for starters); Flaring of a Real Processor War (we’ll get to it later); Quantum Something (at least in spending); Rehabilitation of U.S. DoE Secretary Rick Perry (knew you’d come around Rick); Nvidia’s Continued Magic (DGX-2 and T4 introductions); Mergers & Acquisitions (IBM/Red Hat, Microsoft/GitHub); and Inevitable Potholes (e.g. Meltdown/Spectre and other nasty things). It was a very full year.
But the rise of AI in earnest is the key feature of 2018. Let’s not quibble about exactly what constitutes AI – broadly, it encompasses deep learning (neural networks), machine learning, and a variety of data analytics. One has the sense it will self-identify in the hands of users like Dr. Kalinin. Whatever it is, it’s on the verge of transforming not only HPC but all of computing. With regrets to the many subjects omitted (neuromorphic computing and the Exascale Computing Project’s (ECP) admirable output are just two) and apologies for the rapid-fire treatment (less technical) of topics tackled, here are a few reflections on 2018’s frenetic rush and thoughts on what 2019 may bring. Included at the end of each section are a few links to articles on the topic appearing throughout 2018.
- Congratulations IBM and Brace for the March of New BIG Machines
Let’s start with the excellent job done by IBM, its collaborators Nvidia and Mellanox (and others), and the folks at Oak Ridge Leadership Computing Facility (OLCF) and Lawrence Livermore Computing Center (LC) in standing up the Summit (an IBM AC922 system) and Sierra supercomputers. Summit and Sierra are, for now, the top performers on the Top500 List. As important is the science they are already doing (they produced the 2018 Gordon Bell winner and a number of GB finalists). Both systems also reinforce the idea that heterogeneous architectures will likely dominate near-term supercomputers.
IBM has taken lumps in this yearly wrap-up/look-ahead column. This year Big Blue (and partners) deserves a victory lap for these fantastic machines. Designing and standing up leadership machines isn’t for the faint-hearted – ask Intel about Aurora. IBM has plenty of challenges in the broader HPC server market which we’ll get to later.
Supercomputing is a notoriously boom-bust game. Right now it’s booming, driven largely by the global race to exascale computing and also by efforts to create large-scale compute infrastructures able to deliver AI capabilities. Japan’s just-completed AI Bridging Cloud Infrastructure (ABCI) is a good example of the latter. Proficiency at AI is likely to be a requirement for big machines going forward.
Barring unforeseen economic dips (at one point while writing this piece the Dow was down 500 points) the supercomputing boom will continue with a fair number of pre-exascale and exascale class machines under development worldwide and other leadership class or near-leadership class systems also in the pipeline or expected soon. Hyperion Research is forecasting supercomputer spending to basically double from $4.8B in 2017 to $9.5B in 2022. Let the good times roll while they may!
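For a sense of what “basically double” means year over year, here is a minimal sketch that converts the two Hyperion figures quoted above into an implied compound annual growth rate. The function and variable names are illustrative, and the steady-growth assumption is mine, not Hyperion’s.

```python
# Implied compound annual growth rate (CAGR) for the forecast quoted above.
# Assumption: spending grows at a constant rate each year, which the
# forecast itself does not claim.

START_SPEND_BUSD = 4.8   # 2017 supercomputer spending in $B (from the article)
END_SPEND_BUSD = 9.5     # 2022 forecast in $B (from the article)
YEARS = 2022 - 2017      # five yearly growth periods


def cagr(start: float, end: float, periods: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `periods` years."""
    return (end / start) ** (1.0 / periods) - 1.0


growth = cagr(START_SPEND_BUSD, END_SPEND_BUSD, YEARS)
print(f"Implied CAGR 2017-2022: {growth:.1%}")
```

Under that assumption, the forecast works out to growth in the mid-teens percent per year.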
How well the U.S. is doing overall in the supercomputer fray is a matter of debate.
HPCwire noted in its coverage of the Top500 list at SC18 that, “China now claims 229 systems (45.8 percent of the total), while U.S. share has dropped to the lowest ever: 108 systems (21.6 percent). That wide delta in system count is offset by the U.S. having the top two systems and generally operating more powerful systems (and more real HPC systems, as opposed to Web/cloud systems), allowing the U.S. to enjoy a 38 percent performance share, compared to China’s 31 percent. Related to the rise in these non-HPC systems, Gigabit Ethernet ropes together 254 systems. 275 systems on the list are tagged as industry.”
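The percentages in that passage follow directly from the system counts, and dividing performance share by system share shows how “more powerful systems” offsets the count gap. The sketch below uses only the figures quoted above; the `relative_size` ratio is my own illustrative metric, not a Top500 statistic.

```python
# Recompute the Top500 shares quoted above and compare average system size.
TOP500_SIZE = 500

system_counts = {"China": 229, "U.S.": 108}      # systems on the list (quoted)
performance_share = {"China": 0.31, "U.S.": 0.38}  # share of aggregate performance (quoted)


def summarize(counts, perf, list_size=TOP500_SIZE):
    """Return per-country system share and a relative average-size ratio.

    relative_size > 1 means the country's systems are, on average,
    larger than the list-wide average system (illustrative metric).
    """
    summary = {}
    for country, count in counts.items():
        system_share = count / list_size
        summary[country] = {
            "system_share": system_share,
            "relative_size": perf[country] / system_share,
        }
    return summary


shares = summarize(system_counts, performance_share)
for country, row in shares.items():
    print(f"{country}: {row['system_share']:.1%} of systems, "
          f"relative average size {row['relative_size']:.2f}x")
```

By this rough measure the average U.S. entry is well above the list-wide average size, while the average Chinese entry is below it.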
There will always be debate over the value of the Top500 as a metric. Indeed there’s a good deal more to say about supercomputing generally. New architectures are coming. Cray recently unveiled its new Shasta line. The HPC community continues speculating over what Aurora’s architecture will look like. There’s even a paper out of China with ideas for reaching Zettascale.
Instead of hammering away at further big machine dynamics, stop and enjoy the standing up of Summit and Sierra for a moment.
SUPERCOMPUTING LOOK-BACK 2018
- Handicapping the Arm(s) Race as Processor Wars Flare Up
One is tempted to write, “The King is Dead. Long Live the King” following Intel’s discontinuance of its Knights Landing line. That’s unfair. A knight is not a king and KNL was never king, nor perhaps, ever intended to be. Although Intel has travelled a more rugged landscape than usual this year, it is still the dominant force in processors. However, KNL’s misfortune does reflect the increasingly competitive fast-moving processor market. For the first time in years a real war among diverse processor alternatives is breaking out.
AMD’s Epyc line, just launched in June 2017, now has around two percent of the x86 server market according to IDC. Last June, Intel’s then CEO Brian Krzanich worried to analysts about how to keep AMD from capturing 15-20 percent of the market. Don’t snicker. He was right. AMD is on a roll.
AMD has major wins among the major systems and cloud providers. It’s in the game, and often enjoys a price advantage. In October, Cray unveiled its next-gen supercomputing architecture, Shasta, which was selected to be the next flagship system at NERSC. Named “Perlmutter” (after Nobel Prize winning astrophysicist Saul Perlmutter), the system will feature AMD Epyc processors and Nvidia GPUs offering a combined peak performance of ~100 petaflops and a sustained application performance equivalent to about 3X that of the Cray Cori (NERSC-8) supercomputer.
Moving on. Arm, so dominant in the mobile market, has struggled to achieve traction in HPC and the broader server market. Lately the arrival of 64-bit chips is changing attitudes, helped by the filling out of the tool/support ecosystem.
Dell EMC HPC chief Thierry Pellegrino told HPCwire at SC18, “[J]ust like other OEMs out there – we had a SKU available that was 32-bit and didn’t really sell. But I think we are not one of those OEMs that will go out there and just design it and hope people will go and buy it. We depend upon our customers. I can tell you historically customers have asked questions about Arm but have not been very committal. Those discussions are now intensifying…The TX2 (Marvell/Cavium ThunderX2, available last May) looks good and the ThunderX3 roadmap looks great but they aren’t the only ones supplying Arm. Fujitsu has an offering. We also see Ampere with an offering coming up.”
Arm is winning a few key systems deals and turning up in some big machines, such as the Astra system from HPE for Sandia National Lab and in CEA’s selection of an Arm-based system from Atos. The Isambard System at the University of Bristol is another Arm-based large system (Cray XC50) and, of course, Japan’s post-K supercomputer is based on an Arm chip from Fujitsu. Arm is slowly insinuating itself into the server (big and small) market. Cray, for example, has been promoting an on-demand webinar entitled, “Embrace Arm for HPC Confidently with the Cray Programming Environment.”
Then there’s IBM’s Power9. IBM is riding high on the success of Summit and Sierra. Its challenge is winning traction for Power in the broader server market. Here’s Pellegrino again: “I think right now we are very busy and focused on Intel, x86, and Arm. It’s not impossible that Power could become more relevant. We are always looking at technologies. The Power-Nvidia integration was a pretty smart move and we’ve seen some clusters won by Power. But it’s not an avalanche. I think it works great for purposeful applications. For general purpose, I think it’s still looked at as [less attractive] than AMD, Intel and Arm.”
The overall picture is growing clearer. AMD and Arm will take some market share from Intel. It’s no doubt important that AMD’s ROME line (now sampling) impress potential buyers. So far AMD’s return to the datacenter has been without significant error. To some extent, IBM still has to prove itself (price competitive and easy to use) but is making progress and selling systems. Intel, of course, remains king but fate can move quickly.
Bottom line: Processor alternatives are available and for the first time in a long time, the market seems interested.
PROCESSOR LOOK-BACK 2018
- After the CERN AI ‘Breakthrough’, Scientific Computing Won’t be the Same
No startling predictions here. Even so, AI is not only next year’s poster child but likely the poster child for the next decade as we work toward understanding its potential and developing technologies to deliver it. That said, because AI is being adopted or at least tested in so many different venues and applications, charting its many-veined course forward is challenging. Accelerator-driven, heterogeneous architectures with advanced mixed-precision processing capabilities are just the start, and mostly in top-of-line scientific computing systems. Embedded systems are likely to show greater AI variety.
A watershed moment of sorts occurred over the summer when work by CERN scientists was awarded a best poster prize at ISC18 for demonstrating that AI-based models have the potential to act as orders-of-magnitude-faster replacements for computationally expensive tasks in simulation. Their work is part of a CERN openlab project in collaboration with Intel. That project is just one of many projects demonstrating AI effectiveness in scientific computing. The CANcer Distributed Learning Environment (CANDLE) project is another. AI tools developed by CANDLE will find use across a broad range of DoE missions.
Events are moving fast. You may not know, for example, that there’s a bona fide effort underway to develop a Deep500 benchmark, led by Torsten Hoefler and Tal Ben-Nun of ETH in close collaboration with other distinguished researchers such as Satoshi Matsuoka, director of Japan’s RIKEN Center for Computational Science.
“We are organizing a monthly meeting with leading researchers and interested parties from the industry. The meetings are open and posted on the Deep500 website (https://www.deep500.org/). Following that, the next step is to establish a steering committee for the benchmark. It is imperative that we fix the ranking and metrics of the benchmark, as the community is undecided right now on several aspects of this benchmark (see below). We intend to make considerable progress this year, reconvene at SC19,” Ben-Nun told HPCwire.
More than just bragging rights, such a benchmark may have eminently practical uses. Matsuoka described the difficult effort he and colleagues had developing procurement metrics for the ABCI system. HPCwire will have coverage of the emerging Deep500 Benchmark effort and its BOF session at SC18 early in the new year.
The user community does seem hungry for comparison metrics – an HPCwire article on the broader MLPerf standard’s introduction in May, led in part by Google, Baidu, Intel, AMD, Harvard, and Stanford, was one of the highest read articles of the year. Just last week Nvidia loudly trumpeted its performance on the first round of results released by the seven-month-old standard. (Yes, Nvidia fared well.)
AI’s challenges are mostly familiar. Model training is notoriously difficult. Required datasets are often massive and sometimes remote from the compute resource. Pairing CPUs with GPUs is the most common approach. No surprise, Nvidia has jumped onto AI as a best-use of its GPUs (scale up and scale out), DGX-2 computer, and assorted software tools including containerized apps, verticalized stacks, and code compatibility across its products. Intel is likewise driving a stake deep into AI territory with chips and SoCs. Intel is also working intensely on neuromorphic technology (Loihi chip) which may eventually deliver greater deep learning efficiency and lower power consumption.
All of the systems houses (Dell EMC, HPE, IBM, Lenovo, Supermicro, etc.) have ‘AI’ solutions of one or another flavor. Cloud providers and social networks, of course, have been deep into AI writ large for years. They have been busily developing deep learning, machine learning, and data analytics expertise, often sharing their learnings in open source. It’s a virtuous cycle since they are all also heavy consumers of ‘AI’.
It’s really not clear yet how all of this will shake out. Heck, quantum computer pioneer D-Wave launched a machine learning business unit this year. Don’t ask me exactly what it does. What does seem clear is that AI technologies will take on many new tasks and, at least in HPC, increasingly work in concert with traditional modeling and simulation.
Prediction: Next year’s Gordon Bell prize finalists will likely (again) include some AI-driven surprises.
AI LOOK-BACK 2018
- Quantum’s Haze…Are We There Yet? No!
Where to start?
The $1.2 billion U.S. National Quantum Initiative, first passed by the House of Representatives in September, was finally passed by the Senate on Dec. 14. It’s expected to reach the president’s desk by year end and to be signed. It’s a ten-year program covering many aspects of fostering a quantum computing ecosystem. And yes, it is driven in part by geopolitical worries of falling behind in a global quantum computing race. Indeed there are several other like-minded efforts around the globe.
Jim Clarke, director of quantum hardware, Intel Labs, issued a statement in support back when the House acted on the bill: “This legislation will allocate funding for public research in the emerging area of Quantum Computing, which has the potential to help solve some of our nation’s greatest challenges through exponential increases in compute speed. [We] look forward to working with leaders in the Senate to help keep the U.S. at the cutting edge of quantum information science and maintain the economic advantages of this technological leadership.”
HPCwire reported then, “As spelled out in the bill, 1) National Institute of Standards and Technology (NIST) Activities and Workshops would receive $400 million (2019-2023 at $80 million per year); 2) National Science Foundation (NSF) Multidisciplinary Centers for Quantum Research and Education would receive $250 million (2019-2023, at $50 million per year); and 3) Department of Energy Research and National Quantum Information Science Research Centers would receive $625 million (2019-2023 at $125 million per year).”
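As a quick sanity check, the three line items in that quote can be totaled from their per-year amounts. This is only a sketch over the numbers quoted above; the dictionary keys are shorthand labels of my own, and the five-year count assumes 2019 through 2023 inclusive.

```python
# Total the three National Quantum Initiative line items quoted above.
FIRST_YEAR, LAST_YEAR = 2019, 2023
FUNDED_YEARS = LAST_YEAR - FIRST_YEAR + 1  # inclusive range, so five years

# Per-year funding in $M, as quoted; keys are shorthand labels.
per_year_musd = {
    "NIST activities and workshops": 80,
    "NSF multidisciplinary centers": 50,
    "DoE research and QIS centers": 125,
}

totals_musd = {name: rate * FUNDED_YEARS for name, rate in per_year_musd.items()}
grand_total_musd = sum(totals_musd.values())

for name, total in totals_musd.items():
    print(f"{name}: ${total}M over {FUNDED_YEARS} years")
print(f"Combined: ${grand_total_musd / 1000:.3f}B")
```

The per-program totals match the $400 million, $250 million, and $625 million figures in the quote, and the combined amount is in line with the roughly $1.2 billion headline number.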
It’s a big, whole-of-government program and as in all such things the devil will be in the details.
Meanwhile, a report released on December 5th by the National Academies of Sciences, Engineering, and Medicine (Quantum Computing: Progress and Prospects) declares robust, error-corrected quantum computers won’t be practical for at least a decade! Until then, according to the report, noisy intermediate scale quantum computers (NISQ) will have to carry the load and, adds the report, no one is quite sure what NISQs will actually be able to do.
A little like Schrödinger’s Cat, quantum computing is alive or not, depending upon which report you look at (bit of a stretch, I know). By now most of the HPC community is familiar in broad terms with quantum computing’s potential. Many are dabbling already in QC. I attended an excellent workshop at SC18 led by Scott Pakin (Los Alamos National Laboratory) and Eleanor Rieffel (NASA Ames Research Center) and one of our exercises, using an online toolbox, was to build a two-bit adder using gate-based quantum computing code. The show of hands denoting success at the task was not overwhelming.
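For readers curious what such an exercise involves, here is a minimal sketch, not the workshop’s actual toolbox or solution. It assumes “two-bit adder” means adding two single bits (a half adder) and simulates the two reversible gates involved, CNOT and Toffoli, on classical basis states only, with no superposition. All function names are hypothetical.

```python
# Half adder built from reversible gates, simulated on classical basis states.
# Assumptions: single-bit inputs, register layout (a, b, carry), carry starts at 0.
from itertools import product


def cnot(bits, control, target):
    """Flip `target` when `control` is 1 (controlled-NOT)."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def toffoli(bits, control1, control2, target):
    """Flip `target` when both controls are 1 (controlled-controlled-NOT)."""
    out = list(bits)
    out[target] ^= out[control1] & out[control2]
    return tuple(out)


def half_adder(a, b):
    """Return (sum_bit, carry_bit) for single-bit inputs a and b."""
    state = (a, b, 0)
    state = toffoli(state, 0, 1, 2)  # carry becomes a AND b (uses the original b)
    state = cnot(state, 0, 1)        # second wire becomes a XOR b, the sum bit
    _, sum_bit, carry_bit = state
    return sum_bit, carry_bit


for a, b in product((0, 1), repeat=2):
    s, c = half_adder(a, b)
    print(f"{a} + {b} -> sum={s}, carry={c}")
```

The gate order matters: the Toffoli must run before the CNOT overwrites the second input. On real gate-based hardware the same two gates would act on qubits, and the inputs could be superpositions rather than fixed bits.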
Quantum computing is different and tantalizing and needs pursuing.
There is so much ongoing activity in quantum computing that it’s very possible the sober near-term outlook presented in the NASEM report is too pessimistic. At least three vendors – IBM, D-Wave Systems, and Rigetti Computing – have launched web-based platforms providing access to tools, instruction, and quantum processors. The (good) idea here is to jump-start a community of developers working on quantum applications and algorithms. It seems likely other notable quantum pioneers such as Microsoft, Google, and Intel will follow suit with their own web-based quantum computing sandboxes.
Also the chase for quantum supremacy has been joined by quantum advantage. I rather like the NASEM report’s thoughts here:
“Demonstration of ‘quantum supremacy’—that is, completing a task that is intractable on a classical computer, whether or not the task has practical utility—is one [milestone]. While several teams have been focused on this goal, it has not yet been demonstrated (as of mid-2018). Another major milestone is creating a commercially useful quantum computer, which would require a QC that carries out at least one practical task more efficiently than any classical computer. While this milestone is in theory harder than achieving quantum supremacy—since the application in question must be better and more useful than available classical approaches—proving quantum supremacy could be difficult, especially for analog QC. Thus, it is possible that a useful application could arise before quantum supremacy is demonstrated.”
Rigetti is offering a $1 million prize to the first group/individual to demonstrate quantum advantage using its web platform.
Overall, there are many smart, experienced researchers working on quantum computing writ large and that includes many application areas (computing, sensors, communications, etc.). I like how John Martinis, who leads Google’s quantum effort and is a UC Santa Barbara researcher, put it during the Q&A at the release of the NASEM report, which he helped write. He’s also a former HPCwire Person to Watch (2017):
“Progress in the field has been quite good in the last few years, and people have been able not just to do basic physics experiments but [also] to start building quantum computing systems. I think there’s a lot more optimism that people can build things and get it to work properly. Of course there’s [a] lot of work to be done to get them to work well and match them to problems but the pace [of progress] has picked up and there’s interesting things that have come out. I think that in the next year or two, [we] won’t get to solving actual problems yet but there will be a lot better machines out there,” said Martinis.
Somewhere, there’s a pony hidden in the quantum play room. Just don’t expect to find it in 2019. Listed below are links to just a few of the many articles HPCwire has published on QC this year. Also, the NASEM report isn’t a bad reference and is free to download.
QUANTUM LOOK-BACK ON 2018
- Too Little Space for So Many Worthwhile Items
We start our roundup with remembrance of three prominent members of the HPC and science communities. In March, legendary physicist Stephen Hawking died at age 76. Hawking made lasting contributions in many areas and advanced cosmology as a computational science and led the launch of several UK supercomputers dedicated to cosmology and particle physics. In April, computer pioneer Burton J. Smith passed away at age 77. He was an MIT and Microsoft alum, a renowned parallel computing expert, and a leader in the HPC community. His 2007 ISC keynote detailed how computing would be reinvented for the multicore era. A third loss of note was Bob Borchers, one of the founders of the Supercomputing Conference, who died in June. Among his many accomplishments, Borchers served as Director of the Division of Advanced Scientific Computing at the National Science Foundation (NSF).
There are always a few eye-catching M&As. Microsoft’s $7.5 billion gobbling up of GitHub in June is still being closely watched. Several analysts at the time said the move reaffirms Microsoft’s commitment to open-source development. We’ll see. In October, IBM announced plans to purchase Linux powerhouse Red Hat for $34 billion. Probably too soon to say much about the latter deal. Personnel shuffling is part of life in HPC (and everywhere). The wait continues for a new Intel CEO. That said, Intel snapped up Jim Keller in April from Tesla to lead Intel’s system-on-chip development efforts. Keller had been a leader in AMD’s x86 Zen architecture development and has worked extensively on Arm.
HPE’s Spaceborne Computer (based on the HPE Apollo 40) successfully completed its first year in space, demonstrating a system built with commercial off the shelf (COTS) parts could survive the rigors of space. Haven’t heard much lately from Pattern Computer which emerged from stealth in May sporting some familiar HPC names (Michael Riddle, James Reinders). In a nutshell, Pattern Computer says it has developed an approach to exploring data that permits very high dimensionality exploration in contrast to the pairwise approach that now dominates. It hasn’t spelled out details.
Some less praiseworthy moments: Daisuke Suzuki, GM of Pezy Computer, was sentenced (July) to three years in prison, which was then reduced to a four-year suspended sentence. No word yet on Pezy President Motoaki Saito, also on trial. Both were indicted in late 2017 for defrauding the Japanese government of roughly $5.8 million (¥653 million) in 2014. In February, a group of nuclear scientists working at the All-Russian Research Institute of Experimental Physics (RFNC-VNIIEF) were arrested for using lab supercomputing resources to mine crypto-currency, according to a report in Russia’s Interfax News Agency.
It’s interesting what catches readers’ attention – an article about using DL to solve Rubik’s Cube received wide readership. Retro nostalgia? Questions about leading edge semiconductor fabrication capacity are still percolating through the community following GlobalFoundries’ announcement that it has put its 7 nm node on hold and entirely stopped development of nodes beyond 7 nm. With GlobalFoundries shuttering development there are now only three companies left in the game at 10/7 nm: TSMC, Samsung and Intel. At both ISC18 and SC18 BeeGFS was drawing attention – it’s looking more and more like BeeGFS may become a viable option in the parallel file system market.
Chalk this up under Blasé but not Passé. Container technology has clearly gone mainstream in HPC (if that’s not an oxymoron); Sylabs released Singularity 3.0 in the fall. OpenHPC also continues forward. HPCwire ran a good interview with OpenHPC project leader Karl Schulz who said, among other things, that OpenHPC was planning to offer more automated functions; it has already increased the number of recipes (~dozen) and supports Singularity and Charliecloud.
Supermicro has countered a news story (The Big Hack: How China Used a Tiny Chip to Infiltrate U.S. Companies) that appeared on Bloomberg BusinessWeek claiming spies in China hacked Super Micro Computer servers widely distributed throughout the U.S. technology supply chain, including servers used by Amazon and Apple. Supermicro issued a report last week saying an investigation “found absolutely no evidence of malicious hardware on our motherboards.” Amazon also issued a denial, stating, “It’s untrue that AWS knew about a supply chain compromise, an issue with malicious chips, or hardware modifications.” No doubt there’s a dark side to the world.
Leaving the dark side to others, Happy Holidays and a hopeful new year to all. On to 2019!