US, China Vie for Supercomputing Supremacy

By Tiffany Trader

November 14, 2016

The 48th edition of the TOP500 list is fresh off the presses and while there is no new number one system, as previously teased by China, there are a number of notable entrants from the US and around the world and significant trends to report on. Even without the benefit of another mega-system, China is still a force to be reckoned with; the number one and number two machines alone, both Chinese, provide the list with nearly 19 percent of its total FLOPS. We also see the arrival of Knights Landing systems, a continued dip in accelerator-based systems, and InfiniBand losing ground to Ethernet, as non-traditional “supercomputers” from the cloud and Web 2.0 sphere continue to enter the list.

Before we unpack these trends further, let’s jump to the top of the list, where there are two new additions. Joining the top ten club at number five with 14 petaflops is the NERSC Cori supercomputer, and sliding in at number six is Japan’s new 13.6 petaflops Oakforest-PACS supercomputer. Both Cori, the Cray XC40 system installed at Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC), and Oakforest-PACS, a Fujitsu PRIMERGY CX1640 M1 cluster operating at Japan’s Joint Center for Advanced High Performance Computing (JCAHPC), rely on the Intel “Knights Landing” Xeon Phi 7250, a 68-core processor that delivers just under 3 peak teraflops of performance.

The brand-new Theta supercomputer, deployed at Argonne National Laboratory ahead of the larger Aurora install, is also using KNL parts, specifically the 64-core Intel Xeon Phi 7230. Theta provides 5.1 Linpack petaflops, earning it the 18th spot on this list. All told, there are 10 systems using Xeon Phi as the main processing unit.

Nov. 2016 TOP500 top 10

There are also some noteworthy “internal systems” debuting on the list. At number 28 with 3.3 petaflops Linpack (4.9 petaflops peak) is the DGX Saturn V from Nvidia, powered by NVLink’d Pascal P100 GPUs. Constructed with 125 DGX-1s, Saturn V is the most energy-efficient system on the list, grabbing the number one spot on the Green500 list with an 8.17 gigaflops/watt rating. That’s roughly a 22 percent improvement over the 6.67 gigaflops/watt delivered by the most efficient machine on the previous TOP500 list. Nvidia has had this system in development since GTC16 in March. In June at ISC 2016, Marc Hamilton told us the machine was being used to develop millions of lines of code at Nvidia. The graphics chip maker indicated that its automotive teams were its heaviest users.
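For readers who want to check the efficiency math, here is a minimal sketch; only the two gigaflops/watt figures come from the Green500 lists, while the helper function and script are ours, purely for illustration:

```python
# Quick sanity check of the Green500 efficiency figures quoted above.
# Only the two gigaflops/watt values come from the lists; the helper
# function is illustrative, not part of any TOP500/Green500 tooling.

def improvement_pct(new_gflops_per_watt: float, old_gflops_per_watt: float) -> float:
    """Percentage improvement of the new efficiency figure over the old one."""
    return (new_gflops_per_watt / old_gflops_per_watt - 1.0) * 100.0

saturn_v = 8.17   # DGX Saturn V, Green500 leader, Nov. 2016
previous = 6.67   # most efficient machine on the previous list

print(f"improvement: {improvement_pct(saturn_v, previous):.1f}%")  # ~22.5%
```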

Nvidia’s DGX Saturn V

“In this new style of computing you don’t write if/then/else code to recognize a cat or a stop sign or a pedestrian, you’re feeding a lot of data into a deep neural network and adjusting the network,” Hamilton said. “We have today engineers at Nvidia on our automotive DriveWorks software team, and that’s what they’re doing, rather than writing a bunch of if/then/else code in C they’re getting a bunch of data from a car, either simulated or real, they’re piping it into a deep neural network running on the DGX-1 box – so getting the results in 2 hours instead of 24 hours – they’re adjusting the networking, fine-tuning the network and running it again.”
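To make the distinction concrete, here is a deliberately toy sketch of the two styles Hamilton contrasts; the tiny perceptron, its features, and the labeled examples are invented for illustration and have nothing to do with Nvidia’s actual DriveWorks code:

```python
# Illustrative-only contrast between hand-written rules and a model whose
# behavior is adjusted by data. This toy perceptron is our own example.

# "Old style": explicit if/then/else rules over hand-picked features.
def rule_based_is_pedestrian(height_m: float, moving: bool) -> bool:
    if height_m < 1.0:
        return False
    elif not moving:
        return False
    else:
        return True

# "New style": feed labeled examples in and adjust the model's weights.
def train_perceptron(examples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Toy labeled data: (height in metres, moving flag) -> pedestrian or not.
data = [((1.7, 1), 1), ((1.6, 1), 1), ((0.5, 0), 0), ((0.8, 0), 0)]
weights, bias = train_perceptron(data)
print(weights, bias)
```

The point of the sketch is the workflow: in the second style you improve the system by changing the data and retraining, not by editing branching logic by hand.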

The number two greenest super is also using P100 GPUs (the only other machine to do so, although to be precise, these are the PCIe variants) — we’re talking about Piz Daint (installed at the Swiss National Supercomputing Centre), which touts an impressive 7.45 gigaflops/watt. Piz Daint recently received a massive 3.5 petaflops P100 infusion that allowed it to hold onto its number 8 spot on the TOP500 despite two new entrants above it (Cori and Oakforest-PACS).

Penguin Computing qualified its in-house machine, Topaz, for the new list, achieving a 169th ranking with 760 teraflops (Linpack). The Tundra Extreme Scale machine uses Xeon E5-2695v4 processors and Intel Omni-Path architecture.

Dell EMC is also debuting an on-site machine, Zenith, installed at the Dell HPC Innovation labs in Austin, Texas. Ranked at 372 on the list, Zenith is a 451-teraflops (Linpack) machine built with Dell PowerEdge C6320 and PowerEdge R630 servers using Xeon E5-2697v4 processors and the Intel Omni-Path interconnect. Dell EMC will also be unveiling a companion system (not yet submitted to the TOP500), Rattler, which has 80 PowerEdge C6320 nodes fully connected with EDR InfiniBand. Pascal GPUs will be added soon, according to Dell EMC’s Jim Ganthier, “since that is [the GPU] most customers are interested in trying out.”

The China-US Tally

On the previous edition of the TOP500, released at ISC in June, China had overtaken the United States in both system share and performance share. With this list, the US is now matched with China at 171 systems apiece. As the list authors note, in terms of total performance share, the US now holds the narrowest of leads, 33.9 percent compared to runner-up China’s 33.3 percent.

The number one and two systems, TaihuLight and Tianhe-2 respectively, are Chinese, with the 93-petaflops “homegrown” TaihuLight machine commanding a 5.3X FLOPS lead over the fastest US system, the 17.6-petaflops Titan, ranked number three. Although the US has recaptured a bit of ground since the June list, if you take system share, performance share and top-of-the-list status as three primary dimensions of TOP500 leadership, China is in the stronger position.

One can rightly question the significance of machine “scores” and list standings as the Linpack benchmark becomes less representative of performance on modern science and engineering applications, but it’s hard to deny the galvanizing impact of a global-scale competition. After all, it’s the supercomputing race that captures the public’s attention, and you can’t have a race without a way of gauging who’s ahead.

Last year’s SC (2015) was something of a TOP500 coming-out party for China. China’s list share went from 37 systems in June 2015 to 109 systems in November 2015 — and then to 168 systems in June 2016. In the same timeframe, US system share fell from 233 to 199 to 165. As Intersect360 Research CEO Addison Snell has remarked, it wasn’t so much that China discovered supercomputing as it discovered the TOP500 list. In other words, many of these machines were older systems newly earmarked for inclusion on the list.

The US has a major supercomputing refresh planned for 2018-2019 with the CORAL systems coming online, so there will be list churn in the coming years with some jockeying for position, but China won’t be standing still. In addition to the Wuxi supercomputer, China has reported that it will stand up one or two more big systems in the neighborhood of 100 petaflops each. The status of those systems isn’t completely clear, but China has disclosed that it is building three prototype machines ramping up to its 2020 exascale target. The EU and Japan aren’t expecting to reach exascale until at least a year or two after that, with the US on track for 2023.

After US and China, Germany ranks third on the latest TOP500 list with 32 systems, followed by Japan with 27, France with 20, and the UK with 17. A year ago, Japan had 37, Germany had 33, and both France and the UK had 18.

Nov. 2016 TOP500 vendor tree map (% of total list performance)

Looking at the vendor landscape, Cray has staked out the highest share of total list performance at 21.3 percent, up from 19.9 percent. The massive Sunway TaihuLight system claims 13.8 percent of the total installed performance, which gives developer NRCPC second-place bragging rights. HPE is in third place with 9.8 percent, down from 12.9 percent six months ago, but will pick up another 6 percent from SGI systems. IBM and Lenovo are tied for fourth place with 8.8 percent share each. Thanks to Tianhe-2 and Tianhe-1A, NUDT contributes 5.8 percent of the total performance of the list, down from 9.2 percent.
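As a rough cross-check of these shares, the sketch below uses only figures quoted in this article (TaihuLight’s 93 petaflops Linpack and the 672-petaflops list aggregate discussed further down); the script itself is just illustrative arithmetic:

```python
# Rough cross-check of TaihuLight's share of total list performance,
# using only numbers quoted in the article.

taihulight_pflops = 93.0    # TaihuLight Linpack
list_total_pflops = 672.0   # Nov. 2016 list aggregate

share = taihulight_pflops / list_total_pflops * 100.0
print(f"TaihuLight share of list performance: {share:.1f}%")  # ~13.8%
```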

By system share, HPE is on top with 112 systems (22.4 percent). HPE will also gain 28 systems from the SGI acquisition, bringing its grand total to 140 machines. Lenovo is in second place with 92 systems. Cray now has 56 systems, down from 69 systems six months ago. IBM is fifth with 33 systems. No new IBM systems were introduced on this list.

The aggregate performance of all 500 computers on the list stands at 672 petaflops, a 60 percent increase from a year ago. As long as the growth rate stays above 50 percent, the list will reach a total performance of more than 1,000 petaflops (1 exaflops) one year from now. The 60 percent rate represents a slight uptick in year-over-year growth. The growth of the average performance of all systems in the list slowed in 2008 and again in 2013, dropping to around 55 percent per year. Prior to 2008, aggregate system performance was increasing by about 90 percent per year.
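A back-of-the-envelope projection under the growth rates cited here (672 petaflops today, compounding at 50 or 60 percent per year) shows how the aggregate crosses the exaflops mark; the script is a sketch of the compounding, not a TOP500 forecast:

```python
# Back-of-the-envelope projection of aggregate list performance using the
# growth rates cited in the article. Purely illustrative, not a forecast.

def project(total_pflops: float, annual_growth: float, years: int) -> float:
    """Compound the list's aggregate performance over a number of years."""
    return total_pflops * (1.0 + annual_growth) ** years

current_total = 672.0  # petaflops, Nov. 2016 list aggregate

for growth in (0.50, 0.60):
    one_year_out = project(current_total, growth, 1)
    print(f"{int(growth * 100)}% growth -> {one_year_out:.0f} petaflops in a year")
# 50% growth -> 1008 petaflops; 60% growth -> 1075 petaflops (both above 1 exaflops)
```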

Nov. 2016 TOP500 Performance Development

The aggregate performance of the top ten machines is 226 petaflops. 117 systems have cracked the petaflops ceiling, compared with 95 machines on the previous list. The admission point for the TOP100 is currently 1.07 petaflops (up from 958 teraflops). The bar for entry onto the list has been raised to 349.3 Linpack teraflops, up from 285.9 teraflops six months ago.

Accelerator/co-processor systems on the TOP500, 2006-2016 (Source: Nov. 2016 TOP500)

Other highlights from the 48th TOP500 list:

  • A total of 462 systems (92.4 percent) are now using Intel processors, slightly up from 91 percent six months ago.
  • The share of IBM Power processors is now at 22 systems, down from 23 systems six months ago.
  • The AMD Opteron family is used in 7 systems, down from 13 systems on the previous list.
  • A total of 86 systems on the list are using accelerator/co-processor technology, down from 93 in June 2016. Sixty of these use NVIDIA chips, 21 use Intel Xeon Phi technology (as co-processors), one uses ATI Radeon, and one uses PEZY technology. Three systems use a combination of Nvidia and Intel Xeon Phi accelerators/co-processors. Ten systems now use Xeon Phi as the main processing unit.
  • InfiniBand technology is now found on 187 systems, down from 205 systems, and is now the second most-used internal system interconnect technology. Gigabit Ethernet is now at 206 systems, down from 218 systems, in large part thanks to 177 systems now using 10G interfaces.
  • Intel Omni-Path technology, which made its first appearance six months ago with eight systems, is now at 28 systems and is used in the No. 6 system, Oakforest-PACS.

We’ll follow up with more insights and analysis from the TOP500 BoF, which takes place Tuesday night from 5:15-7pm at the Salt Palace Convention Center in Salt Lake City.

For now, the TOP500 compilers — Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee, Knoxville; and Martin Meuer of ISC Group — have put together this poster, which provides a view into key performance trends, as well as the evolving architecture and chip technology landscapes.
