July 14, 2011

GPU Computing Wades Into the Mainstream

Michael Feldman

No longer the "Next Big Thing" in HPC, GPUs are becoming conventional.

The idea that the most successful technologies become invisible doesn’t yet apply to GPU computing, but it’s getting there. This week there were a handful of major HPC system announcements based on GPU-equipped platforms, but you wouldn’t have known that from the headlines. No longer the interloper in high performance computing, GPUs are beginning to fade into the background, just like every other mainstream HPC technology.

On Monday, Bright Computing announced that Drexel University has installed a large cluster to be used for its astrophysics and molecular dynamics research. In this case large means 176 peak teraflops, not bad for a university with fewer than 25 thousand students. Actually the system’s peak performance is even larger than that. The 176 teraflops are attributed to the 68K NVIDIA GPU cores in the machine. That works out to about 133 of the latest 512-core Tesla GPUs at 1.33 single-precision teraflops per processor. The CPUs in the system were even more invisible though; they weren’t even mentioned.
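The arithmetic behind those Drexel numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not anything from the announcement itself. It assumes "68K" means roughly 68,000 cores, and that 1.33 teraflops is the per-GPU peak (a figure that matches the single-precision rating of a 512-core Tesla M2090; the double-precision peak of that part is about half as much). The variable names are made up for illustration.

```python
# Back-of-the-envelope check of the Drexel GPU figures quoted above.
# All constants come from the article; their interpretation is an assumption.
TOTAL_GPU_CORES = 68_000     # "68K" cores, assumed to mean roughly 68,000
CORES_PER_GPU = 512          # 512-core Tesla part
PEAK_TFLOPS_PER_GPU = 1.33   # per-GPU peak (assumed single precision)

# Whole number of GPUs implied by the core count.
gpu_count = round(TOTAL_GPU_CORES / CORES_PER_GPU)

# Aggregate GPU peak implied by that GPU count.
gpu_peak_tflops = gpu_count * PEAK_TFLOPS_PER_GPU

print(f"Estimated GPUs: {gpu_count}")
print(f"Estimated GPU peak: {gpu_peak_tflops:.1f} teraflops")
```

The estimate lands within about a teraflop of the quoted 176, which suggests the announced figure is the GPU peak alone, with the CPU contribution left out.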

Bright Computing’s notable contribution here is its support for GPUs — CUDA 4.0 specifically — in its cluster management offering. Today, though, all cluster and workload managers support GPU computing to one extent or another. They have to, given the increasing level of penetration of GPUs in HPC clusters. The idea is to help automate the management of the GPU resources in the cluster so that the system admins don’t have to treat these CPU-GPU machines like exotic animals.

On Wednesday, SGI announced that Swinburne University of Technology in Australia is buying a Rackable C3108/Altix UV combo system that will deliver 130 teraflops. Like the Drexel super, the Swinburne machine will be used for astrophysics computations. And, if you weren’t paying close attention, you might not have noticed that the system will incorporate NVIDIA GPUs, in this case, a combination of Tesla C2070 and M2090 GPUs. Although no specifics were offered about the number of Tesla parts employed, it’s a good bet that most of the FLOPS are from the GPU side.

Meanwhile the gang at T-Platforms was talking up the Graph 500 performance of their Lomonosov super, installed at Moscow State University. Although Lomonosov was ranked third on the list, it set a new performance record, hitting 43.5 GE/s (billion edges processed per second). The metric is an attempt to measure the ability of computers to perform data-intensive operations, in contrast to the Linpack benchmark used by the TOP500, which measures a computer’s floating-point computational prowess.

Lomonosov was recently upgraded to 1.3 petaflops, thanks to — you guessed it — NVIDIA GPUs. In this case, the upgrade added 863 GPU teraflops (courtesy of T-Platforms’ NVIDIA Tesla X2070-equipped TB2-TL blades) to Lomonosov’s existing 510 teraflops. It is not clear, though, whether the GPU parts were used to achieve the record-breaking Graph 500 result.
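For a rough sense of how much of Lomonosov's peak now comes from the GPU side, the two figures above can be combined as follows. This is a sketch under the assumption that the added 863 teraflops and the existing 510 teraflops simply sum to the new peak; the variable names are illustrative only.

```python
# Rough breakdown of Lomonosov's upgraded peak, using the article's figures.
EXISTING_TFLOPS = 510    # peak before the upgrade
ADDED_GPU_TFLOPS = 863   # peak added by the Tesla X2070-equipped blades

# Assumption: the two contributions add linearly to give the new peak.
total_tflops = EXISTING_TFLOPS + ADDED_GPU_TFLOPS
total_pflops = total_tflops / 1000

# Fraction of the combined peak that comes from the new GPU blades.
gpu_share = ADDED_GPU_TFLOPS / total_tflops

print(f"Combined peak: {total_pflops:.2f} petaflops")
print(f"Share from added GPUs: {gpu_share:.0%}")
```

Under that assumption the combined peak is a little under 1.4 petaflops, so the quoted 1.3 petaflops is a rounded-down figure, and a bit more than three-fifths of it comes from the new GPU blades.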

Jumping now to China, there was the news that the Tianhe-1 supercomputer has gone into operation at the Changsha Supercomputer Center. It looks like the story originated with China Central Television (CCTV) and was subsequently picked up by the IDG News Service. The system, which is reported to reach a peak performance of 1.1 petaflops, apparently went into production last weekend. According to the report, by October the system will be upgraded to 3 petaflops.

Tianhe-1 has an odd history. It was the world’s first “petascale” supercomputer that employed GPUs, in this case, AMD/ATI Radeon HD 4870 X2 processors. It debuted in the November 2009 TOP500 rankings as a 1.2 (peak) petaflop machine, garnering itself the number five position on the list. By November 2010, it had disappeared from the TOP500, replaced by the now-famous Tianhe-1A, a much larger GPU-equipped Chinese super that delivered 4.7 peak petaflops using NVIDIA parts.

What happened to the Tianhe-1 since last November is a mystery. But given that its peak performance has been shaved by 100 teraflops, I suspect the configuration was modified. Whether that means different GPUs, fewer GPUs, or no GPUs remains to be seen. If you’re interested in the IDG/CCTV report, take a look at the YouTube video.

By the way, even though these CPU-GPU machines are becoming more commonplace, I’ve noticed that the naming convention for them has not quite settled. Some are calling them hybrid systems, while others are referring to them as heterogeneous machines. My preference is the latter, since hybrid implies a mixing of DNA, which I take to mean the processor’s transistors. Since the GPUs and CPUs are still discrete entities, heterogeneous seems the better nomenclature here.

Even the AMD Fusion chips and future Project Denver processors from NVIDIA, which mix CPU and GPU components on-chip, still seem more heterogeneous than hybrid to me. But I have a feeling that when GPUs are integrated to this level and, more importantly, when applications are oblivious to the mix of underlying computational units, we’ll just be calling them processors again. That’s what happens when technology becomes invisible.
