Already Europe’s fastest supercomputer at 7.8 petaflops, Piz Daint (a hybrid CPU/GPU Cray XC30) at the Swiss National Supercomputing Centre (CSCS) will double its performance with a massive upgrade that involves switching to NVIDIA’s newest Pascal GPU architecture and merging with Piz Dora (Cray XC40), a smaller CPU-based machine. The announcement was made at GTC16 yesterday. Last November Piz Daint placed seventh on the TOP500 list.
Plans call for 5,200 NVIDIA K20X GPUs to be replaced by 4,500 Pascal GPUs; which version has not been decided. The Intel processors will also be upgraded from the Sandy Bridge to the Haswell architecture. When completed, the new combined system, all on a single fabric, will keep the Piz Daint name and provide users with two types of compute nodes: hybrid CPU-GPU nodes and CPU-only nodes. Although slightly reduced in physical size, Piz Daint will be more powerful and more flexible, allowing simulations or data analyses to be scaled to a few nodes or to thousands of nodes.
“We are taking advantage of NVIDIA GPUs to significantly accelerate simulations in such diverse areas as cosmology, materials science, seismology and climatology,” said Thomas Schulthess, professor of computational physics at ETH Zurich and director of CSCS. “Tesla accelerators represent a leap forward in computing, allowing our researchers to solve larger, more complex problems that are currently out of reach in a host of fields.”
Pascal GPUs feature a number of breakthrough technologies, including second-generation High Bandwidth Memory (HBM2) that delivers three times higher bandwidth than the previous generation architecture, and 16nm FinFET technology for unprecedented energy efficiency.
Piz Daint will also incorporate Cray’s DataWarp technology. DataWarp’s so-called Burst Buffer mode quadruples the effective bandwidth for long-term storage; in other words, data is input to and output from storage far more quickly. It paves the way for analyzing millions of small, unstructured files. Consequently, Piz Daint will be able to transfer initial results to a specialized area of the supercomputer for analysis while calculations are still under way.
The upgraded machine will help CSCS carry out its mission of tackling grand challenge science as well as critical applied research. Piz Daint will be used to analyze data from the Large Hadron Collider at CERN, to accelerate research on the Human Brain Project’s High Performance Analytics and Computing Platform, and to continue its work in meteorology and climatology among other domain areas, including deep learning — which was of course a highlight of the NVIDIA event.
“Today a lot of the machine learning work [at ETH Zurich] is happening on workstations and I think the researchers are only now starting to realize that they can actually do this at much bigger scale on our supercomputers,” said Schulthess.
Schulthess listed what he sees as the three most important advantages of upgrading to the Pascal architecture and combining the two systems:
- Memory Bandwidth. He expects a substantial memory performance increase. “Exactly how big a boost, we will have to find out — probably NVIDIA doesn’t even know yet, but we do expect a big boost on the memory bandwidth. That’s really important because many applications on the GPU are memory bandwidth bound.”
- Pascal-Haswell Duo. “The combination of Pascal and Haswell versus K20x and Sandy Bridge is important [now] that we have PCIe Gen3. Imagine you have a job distributed over the GPU memory — a weather code or a climate code, [for example] over the GPU memory of many nodes. Now there is no bottleneck. The GPUs talk to each other with a similar bandwidth. Before the piece between the CPU and the GPU was slow and now the bottleneck is gone.”
- Overall Performance. “Pascal is higher performance. I expect that this combination of much better memory bandwidth and faster performance will increase the throughput of the system. And we will open the system to new applications with all these new cool developments that we have today, all these libraries that are coming out of the deep neural network side. Pascal will enable a lot of this.”
All netted out, Schulthess is confident Piz Daint will double performance for both compute and memory bound applications. “We’re not talking about FLOPS; we’re talking about application performance,” he said.
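The distinction Schulthess draws between compute-bound and memory-bound applications can be illustrated with a simple roofline estimate, in which attainable performance is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity. The sketch below is illustrative only: the function names and the GPU figures are hypothetical placeholders, not K20X or Pascal specifications.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline estimate: a kernel is capped by compute or by memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


def speedup(old, new, flops_per_byte):
    """Ratio of attainable performance on the new device versus the old one."""
    return attainable_gflops(*new, flops_per_byte) / attainable_gflops(*old, flops_per_byte)


# Hypothetical (peak GFLOPS, memory bandwidth in GB/s) pairs, NOT vendor figures.
OLD_GPU = (1000.0, 200.0)
NEW_GPU = (2000.0, 600.0)

if __name__ == "__main__":
    # Low intensity = memory-bound kernel; high intensity = compute-bound kernel.
    for intensity in (0.25, 1.0, 20.0):
        gain = speedup(OLD_GPU, NEW_GPU, intensity)
        print(f"{intensity:5.2f} flop/byte -> {gain:.1f}x")
```

With these made-up numbers, the two memory-bound kernels gain by the bandwidth ratio (3x) and the compute-bound kernel gains by the peak ratio (2x), which is why a single FLOPS figure says little about how a given application will fare.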
Not surprisingly, CSCS will again run the LINPACK benchmark on Piz Daint, according to Schulthess, in part for the high profile all supercomputer centers desire but equally because, “LINPACK is very, very good at finding out if there are any hardware problems. It was good last time and I’m sure it will be good for that this time.”
It’s not yet clear how energy efficient the new system will be, but Schulthess thinks it won’t be worse and may be better.
“This whole FLOPS per watt and FLOPS per second is a very narrow view of looking at the performance of a system. You have to look at time-to-solution of applications and you have to look at energy-to-solution of applications. In a sense what you want – and I’ve written this in a number of papers already – is for the time-to-solution to be good enough,” he said.
Weather forecasts are a good example, he noted: they need to be completed as quickly as practical to be most useful. “At some point when the time-to-solution is good enough, then you want to really minimize energy to solution (not FLOPS-per-watt),” he said.
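One way to read this criterion is as a constrained choice: first discard any run that misses the deadline, then pick the one that consumes the least energy. The following is a minimal sketch under that reading; the `Run` class, the `best_run` helper, and all timings and power figures are hypothetical and are not CSCS measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    seconds: float    # time-to-solution
    avg_watts: float  # mean power draw over the run

    @property
    def joules(self) -> float:
        """Energy-to-solution: mean power multiplied by elapsed time."""
        return self.seconds * self.avg_watts


def best_run(runs, deadline_seconds):
    """Among runs that meet the deadline, return the one using the least energy."""
    feasible = [r for r in runs if r.seconds <= deadline_seconds]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.joules)


# Hypothetical configurations of the same forecast job.
RUNS = [
    Run("fast", seconds=1800, avg_watts=900_000),
    Run("balanced", seconds=3000, avg_watts=400_000),
    Run("slow", seconds=7200, avg_watts=150_000),
]

if __name__ == "__main__":
    choice = best_run(RUNS, deadline_seconds=3600)
    print(choice.name, round(choice.joules / 3.6e6, 1), "kWh")
```

In this toy case the slowest run uses the least energy overall but misses the one-hour deadline, so the selection falls to the "balanced" run rather than the fastest one.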
CSCS is exploring the use of Intel’s forthcoming Xeon Phi, but isn’t ready to comment as the work with Intel is ongoing. Software development is another major investment area, said Schulthess, “much more important than the hardware. We will actually double up in the future with our investments.” Predictably, CSCS is “looking at everything, also ARM – but that is a whole separate conversation.” Indeed.
Notably, the merging of Piz Dora into Piz Daint opens up tremendous flexibility and is in keeping with the growing trend to create unified platforms able to handle big data analytics as well as traditional modeling and simulation.
For example, one can pre-process data and then scale the simulation up while the data is always on the same system.
“If we need GPU-acceleration for simulations but the CPUs for pre-processing, we move the data from the pre-processing side to the GPU-accelerated side. So you move data between partitions, but you’re doing this per node, at 10 gigabytes-per-second, which is much higher than I/O bandwidth if you go through the disks. We’ll have very high performance for the whole workflow and make things more convenient for the scientists,” said Schulthess.
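The advantage Schulthess describes comes from every node moving its own share of the data in parallel. A back-of-the-envelope sketch follows; only the 10 gigabytes-per-second per-node figure comes from the quote above, while the dataset size, node count, and shared file system bandwidth are hypothetical, and network contention is ignored.

```python
def partition_transfer_seconds(total_gb, nodes, per_node_gbs):
    """Time to move data between partitions when all nodes transfer in parallel."""
    return total_gb / (nodes * per_node_gbs)


def disk_round_trip_seconds(total_gb, aggregate_gbs):
    """Time to write the data to a shared file system and then read it back."""
    return 2 * total_gb / aggregate_gbs


# Assumed workload: 100 TB spread evenly over 1,000 nodes (hypothetical).
TOTAL_GB = 100_000
NODES = 1_000
PER_NODE_GBS = 10       # per-node rate quoted in the article
DISK_AGG_GBS = 100      # hypothetical aggregate file system bandwidth

if __name__ == "__main__":
    direct = partition_transfer_seconds(TOTAL_GB, NODES, PER_NODE_GBS)
    via_disk = disk_round_trip_seconds(TOTAL_GB, DISK_AGG_GBS)
    print(f"node-to-node: {direct:.0f} s, via disk: {via_disk:.0f} s")
```

Under these assumptions the aggregate node-to-node rate grows with the number of nodes, whereas the file system path is limited by a fixed aggregate bandwidth and has to be traversed twice.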
What’s more, the incorporation of big data analytics tools and practices can help science adopt new approaches. “It’s one thing to bring the data analytics on the systems, but to me there is another very important benefit to the HPC community. The data analytics community is used to a different type of software environment — they like to use Python and SPARK, and in real-time not batches. If we’re able to get supercomputers to run Python and even SPARK, we make them much more usable also to the traditional scientific computing community.”
He cited CSCS work on climate and meteorology as an example: “There’s no reason you wouldn’t want climate scientists to write their models in Python rather than Fortran in the future. Their productivity could go up [significantly] on model development. On an old-style supercomputer, you don’t want to talk about those things. But thanks to the whole data science pressure, we’re creating a software environment that’s much more usable for computational scientists. To me, that’s almost as interesting as the deep learning stuff – enhancing productivity of scientists.”
Turning to the rise of container technology in high-end HPC, perhaps best illustrated by the Docker-Shifter effort at NERSC, Schulthess said CSCS was working with NVIDIA to expose the GPUs in Docker.
Schulthess predicts the revamped Piz Daint will be up and fully running in a year or so. “Our requirements are very high and we are not going to cut corners, but once that is done, moving applications from today’s Piz Daint to the new system, they will just fly; I don’t expect any issues there,” he said. A key reason is that Pascal GPUs are backwards compatible. In the words of NVIDIA, “It’s all CUDA; you can use the same application you had five years ago and it just scales up.”