Nvidia Sees Bright Future for AI Supercomputing

By Tiffany Trader

November 23, 2016

Graphics chipmaker Nvidia made a strong showing at SC16 in Salt Lake City last week. Its most prominent wins were taking the number one spot on the Green500 list with its new in-house DGX-1 supercomputer, SaturnV, and partnering with the National Cancer Institute, the U.S. Department of Energy (DOE) and several national laboratories to accelerate cancer research as part of the Cancer Moonshot initiative.

The company kicked off its SC activities with a press briefing on Monday (Nov. 14), during which CEO Jen-Hsun Huang characterized 2016 as a tipping point for the GPU computing approach popularized by Nvidia for over a decade.

Not surprisingly, Huang’s main message was that the GPU computing era has arrived. Throughout the hour-long talk, Huang would revisit the theme of deep learning as both a supercomputing problem and a supercomputing opportunity.

“We believe that supercomputers ought to be designed as AI supercomputers – meaning it has to be good at both computational science as well as data science – that building a machine that’s only good at data science doesn’t make sense and building a supercomputer that’s only good at computational science doesn’t make sense,” he said.

“On the one hand, deep learning requires an enormous amount of data throughput processing – this way of developing software where the computers write software themselves inspired by a lot of data processing behind it is a very important approach to computing but it also has the wonderful opportunity to benefit supercomputing as well, solving problems for science that hasn’t been possible before today,” said Huang.

Huang’s view is that traditional numerical HPC is not going anywhere, but will exist side by side with machine learning methods.

“I’m a big fan of using math when you can; we should use AI when you can’t,” he said. “For example what’s the equation of a cat? It’s probably very similar to the equation for a dog – two ears, four legs, a tail. And so there are a lot of areas where equations don’t work and that’s where I see AI – search problems, recommendation problems, likelihood problems, where there’s either too much data, incomplete data, or no laws of physics that support it. So where do I feel like eating tonight – there’s no laws of physics for that. There’s a lot of these type of problems that we simply can’t solve – I think that they’re going to coexist.”

While Nvidia is enabling parallel computing via thousands of CUDA cores combined with the CUDA programming framework, the CEO emphasized the necessity of a performant central processing unit. “Almost everything we do we start with a strong CPU,” said Huang. “We still believe in Amdahl’s law; we believe that code has a lot of single threaded parts to it and this is an area that we want to continue to be good at.”
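
Huang's appeal to Amdahl's law can be made concrete with a short calculation. The sketch below is illustrative only: the function name, the 95 percent parallel fraction and the worker counts are assumptions, not figures from the briefing, and the `serial_speedup` parameter is a simple extension of the classic formula to show what a faster CPU buys.

```python
def amdahl_speedup(parallel_fraction, workers, serial_speedup=1.0):
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of the original runtime that can be parallelized (0..1).
    workers: number of parallel execution units applied to that share.
    serial_speedup: hypothetical extra factor by which a faster CPU shortens
        the serial share (1.0 gives the classic formula).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1 or serial_speedup <= 0:
        raise ValueError("workers must be >= 1 and serial_speedup must be > 0")
    serial_time = (1.0 - parallel_fraction) / serial_speedup
    parallel_time = parallel_fraction / workers
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (10, 100, 1000):
        base = amdahl_speedup(p, n)
        fast_cpu = amdahl_speedup(p, n, serial_speedup=2.0)
        print(f"{n:>5} workers: {base:6.2f}x, with a 2x faster serial part: {fast_cpu:6.2f}x")
```

With these assumed numbers the speedup can never exceed 20x however many cores are added, because the 5 percent serial share dominates; doubling single-thread speed raises that ceiling to 40x, which is the case for pairing accelerators with a strong CPU.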


The two servers currently shipping with the NVLink P100 GPU – Nvidia’s DGX-1 server and IBM’s Minsky platform – speak to this goal. The DGX-1 connects eight NVLink’d Pascal P100s to two 20-core Intel Xeon E5-2698 v4 chips. The IBM Minsky server leverages two Power8 CPUs and four P100 GPUs connected by NVLink up to the CPUs.

Nvidia’s 124-node supercomputer, SaturnV, plays a crucial role in Nvidia’s plans to usher in AI supercomputing. The machine debuted on the 48th TOP500 list at number 28 with 3.3 petaflops Linpack (4.9 petaflops peak). Even more impressively, it nabbed the number one spot on the Green500 list, achieving more than 9.46 gigaflops/watt. That’s a 42 percent improvement from the 6.67 gigaflops/watt delivered by the most efficient machine on the previous TOP500 list. Extrapolating to exascale gives us 105.7 MW. If we go with a semi-“relaxed” exascale power allowance of 30 MW (the original DARPA target was 20 MW), that is still about 3.5 times the allowance for planned US exascale systems. Three years ago, the extrapolated delta was over 7X.
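
The exascale extrapolation is plain division: one exaflops divided by sustained efficiency gives the power draw. The Python sketch below reproduces that arithmetic from the figures quoted in this article; the function names are illustrative, and it assumes efficiency stays constant as a system scales, which real machines do not guarantee.

```python
EXAFLOPS = 1e18  # one exaflops, in floating point operations per second


def exascale_power_mw(gflops_per_watt):
    """Power in megawatts needed to sustain one exaflops at a given efficiency."""
    watts = EXAFLOPS / (gflops_per_watt * 1e9)
    return watts / 1e6


def required_gflops_per_watt(power_budget_mw):
    """Efficiency in gigaflops/watt needed to fit one exaflops into a power budget."""
    return EXAFLOPS / (power_budget_mw * 1e6) / 1e9


def power_delta(extrapolated_mw, budget_mw):
    """How many times over the budget an extrapolated system would be."""
    return extrapolated_mw / budget_mw


if __name__ == "__main__":
    # Efficiency implied by the 105.7 MW extrapolation quoted in the article.
    print(f"105.7 MW implies {required_gflops_per_watt(105.7):.2f} gigaflops/watt")
    # The previous list's most efficient machine, at 6.67 gigaflops/watt.
    print(f"6.67 gigaflops/watt -> {exascale_power_mw(6.67):.1f} MW at exascale")
    # Gap to the relaxed 30 MW allowance and the original 20 MW target.
    for budget in (30, 20):
        print(f"{budget} MW budget: {power_delta(105.7, budget):.2f}x over, "
              f"needs {required_gflops_per_watt(budget):.1f} gigaflops/watt")
```

The same division shows what remains to be done: a 30 MW exascale system needs roughly 33 gigaflops/watt, and the 20 MW target needs 50.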

SaturnV – its name inspired by the original Moonshot – will be a critical part of the CANDLE (CANcer Distributed Learning Environment) project (covered here). Announced last month, CANDLE’s mission is to exploit high performance computing (HPC), machine learning and data analytics technologies to advance precision oncology. Huang said the partners will be working together to develop “the world’s first deep learning framework designed for exascale.”

“It’s going to be really hard,” he added. “That’s why we’re working with the four DOE labs and have all standardized on the same architecture – SaturnV is the biggest one of them but we’re all using exactly the same architecture and it’s all GPU accelerated and we’re going to develop a framework that allows us to scale to get to exascale.”

Huang noted that when you apply deep learning FLOPS math (that is, 16-bit floating point operations as opposed to the HPC norm of 64-bit FLOPS), exascale is not far away at all.

The [IBM/Nvidia] CORAL machines are on track for 2018 with 300 petaflops peak FP64, which comes out to 1,200 petaflops peak FP16, Huang pointed out. “For AI, FP16 is fine, now in some areas we need FP32, we need variable precision, but that’s the point,” he said. “I think CORAL is going to be the world’s fastest AI supercomputer [and] I think that we didn’t know it then but I believe that we are building an exascale machine already.”

It’s a fair point that dialing down the bits increases data throughput (boosting FLOPS), but as one analyst at the event said, “calling it exascale is changing the rules.”
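
Both sides of that argument can be shown in a few lines of Python. The sketch below is illustrative: it assumes peak throughput doubles each time the operand width is halved, which matches the 300-to-1,200 petaflops figures Huang quoted but is not a property of every architecture, and the function names are hypothetical. It uses the standard library's half-precision `struct` format to show what the narrower format gives up.

```python
import struct


def peak_petaflops(fp64_peak_pf, bits):
    """Peak rate at a given precision, assuming the rate doubles
    each time the operand width is halved (an assumption, see above)."""
    if bits not in (16, 32, 64):
        raise ValueError("bits must be 16, 32 or 64")
    return fp64_peak_pf * 64 / bits


def round_to_half(value):
    """Round a Python float (FP64) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    for bits in (64, 32, 16):
        print(f"FP{bits}: {peak_petaflops(300, bits):,.0f} petaflops peak")
    # FP16 keeps about three significant decimal digits.
    print("0.1 stored as FP16:", round_to_half(0.1))
    # Above 2048, FP16 can no longer represent every integer.
    print("2049 stored as FP16:", round_to_half(2049.0))
```

The throughput gain is real, but so is the loss of precision, which is why the analyst's objection is about the definition of exascale rather than the arithmetic.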

Lending more insight to Nvidia’s plans was Solutions Architect Louis Capps, who presented at the Green500 BoF on November 16.

“This is completely a research platform,” he said of SaturnV. “We’re going to have academics using it. We’re going to have partnerships, collaborations, and internally, we’re working on our deep learning research and our HPC research.”

Embedded, robotics, automotive, and hyperscale computing are all major focus areas, but Capps and Huang both were most effusive about the opportunities at the convergence of data science and HPC. “We’re just now starting to bridge where real HPC work is converging with deep learning,” said Capps.

SaturnV is organized into five 3U boxes per rack, with 15 kilowatts of power on each rack and some 25 racks total. While the press photo of SaturnV indicates 10 servers per rack, this is not reflective of what’s inside. “We could not put that many in ours,” said Capps. “We put this in a datacenter which is not HPC. It was an IT datacenter originally.”

SaturnV was one of two systems on the newly published TOP500 list to employ the Pascal-based P100 GPUs. The number two greenest super, Piz Daint, is using the PCIe variants. Installed at the Swiss National Supercomputing Centre, Piz Daint delivers an energy-efficiency rating of 7.45 gigaflops/watt. Refreshed with the new P100 hardware, Piz Daint achieved 9.8 petaflops on the Linpack benchmark, securing it the eighth spot on the latest list.

Notably, every single one of the top ten systems on the Green500 list is using some flavor of acceleration or manycore. There is no pure-play traditional x86 in the bunch.

[Table: Green500 top 10, November 2016]
Source: Top500/Green500

A compelling testament to this approach came from Thomas Schulthess, director of the Swiss National Supercomputing Centre, where Nvidia K80 GPUs have been used for operational weather forecasting for over a year now. “I know the HPC community has a problem with the heterogeneous approach,” he said. “We’ve done a lot of analysis on this issue. We asked, what would the goals we have at exascale look like if we build a homogeneous Xeon-based system, and there’s no way that you will run significant problems that are significantly bigger and faster than we do today in 5-6 years at exascale if you build it based on a Xeon system.

“The message to the application folks is, you’ve had time to think about it now, but now there is no more choice. If you want to run at exascale, it is going to be on Xeon Phi or GPU-accelerated or the lightweight core, almost Cell-like architectures that we see on TaihuLight.”
