May 31, 2010
A Chinese supercomputer called Nebulae, powered by the latest Fermi GPUs, grabbed the number two spot on the TOP500 list announced earlier today. The new machine delivered 1.27 petaflops of Linpack performance (against a record peak performance of 2.98 petaflops), yielding only to the 1.76 petaflop Jaguar system, which retained its number one berth.
The new Chinese machine is installed at the National Supercomputing Centre in Shenzhen (NSCS) and was built by Chinese HPC vendor Dawning. Nebulae is based on Dawning's TC3600 blades, which house Intel X5650 CPUs connected to NVIDIA Tesla C2050 GPUs. Although of hybrid design, the majority of the FLOPS come from the system's 4,640 NVIDIA GPUs, which by themselves provide 2.32 of the 2.98 peak petaflops. Power consumption on Linpack for this latest petaflop machine is not recorded, but I'm guessing it's between 2.5 and 3.0 MW, which would make it more than twice as power efficient as the Opteron-based Jaguar super.
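That efficiency guess can be sanity-checked with a bit of arithmetic. A minimal sketch, using the 2.5-3.0 MW range guessed above; Jaguar's roughly 7 MW Linpack draw is an outside figure assumed here, not stated in this article:

```python
# Linpack megaflops delivered per watt. Since 1 teraflop = 1e6 megaflops
# and 1 MW = 1e6 W, the unit factors cancel and the ratio is just TF/MW.
def mflops_per_watt(linpack_tflops, megawatts):
    return linpack_tflops / megawatts

jaguar = mflops_per_watt(1760, 7.0)        # ~251 MF/W (assumed ~7 MW draw)
nebulae_low = mflops_per_watt(1270, 3.0)   # ~423 MF/W
nebulae_high = mflops_per_watt(1270, 2.5)  # ~508 MF/W
```

Under these assumptions, the low end of the guessed power range works out to roughly double Jaguar's flops per watt.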
Nebulae represents the second Chinese machine in the top 10. Tianhe-1, now at number 7, is a 563-teraflop system that captured the number 5 slot last November. It is housed at the National Supercomputer Center in Tianjin/NUDT. Like Nebulae, Tianhe-1 is a CPU-GPU hybrid, in this case using ATI Radeon GPUs from AMD.
Yet another Fermi GPU-accelerated system from China that made the list is Mole-8.5, the supercomputer announced last week in a Mellanox press release. That system achieved 207 Linpack teraflops (out of a possible 1,138), garnering the 19th spot on the TOP500. It's installed at the Institute of Process Engineering, Chinese Academy of Sciences.
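The gap between peak and delivered performance on these hybrids is worth making concrete. A quick sketch of Linpack yield (Rmax/Rpeak), using only the teraflop figures reported above:

```python
# Linpack yield (Rmax / Rpeak) for the two GPU hybrids whose peak
# numbers appear in the article; figures in teraflops, as reported.
systems = {
    "Nebulae":  (1270, 2980),
    "Mole-8.5": (207, 1138),
}
yields = {name: rmax / rpeak for name, (rmax, rpeak) in systems.items()}
for name, y in yields.items():
    print(f"{name}: {y:.0%}")   # Nebulae ~43%, Mole-8.5 ~18%
```

Both hybrids leave well over half their theoretical peak on the table; for comparison, CPU-only leaders like Jaguar ran Linpack at around 75 percent of peak (an outside figure, not stated above).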
Whether Nebulae, Tianhe-1, and Mole-8.5 foreshadow a coming of age for high-end GPU computing remains to be seen. These three Chinese systems represent three quarters of all the GPU-equipped machines on the current list, which is still dominated by x86-based CPUs. However, multi-petaflop systems powered by GPUs are now in the pipeline. The Keeneland Project, funded by an NSF Track 2D award, will field an HP system accelerated by NVIDIA GPUs; Georgia Tech, the University of Tennessee, and Oak Ridge National Laboratory will build and manage it. Keeneland is supposed to deliver about 2 peak petaflops and be deployed sometime in 2012. In Japan, TSUBAME 2.0 is also going to be built using NVIDIA GPUs. That system is slated to hit 2.4 petaflops and is scheduled to be installed later this year. This is all good news for GPU vendors, especially NVIDIA, which has invested most heavily in HPC and the GPGPU movement over the past four years.
It's also good news for China. That country is developing its supercomputing resources at a rapid pace now, especially at the top end of the spectrum. This latest list puts 24 Chinese systems in the TOP500 -- tied with Germany, and trailing only the US, UK, and France. And from an aggregate performance standpoint, China is second only to the US.
Besides the China-GPU excitement, the rest of the TOP500 news was rather humdrum. For example, the top systems barely budged, which is somewhat of a rarity. Apart from Nebulae, the only other noteworthy change in the top 10 is the upgraded NASA Pleiades system, which was outfitted with additional SGI Altix ICE 8400 blades. That upped its Linpack performance from 544 teraflops to 722 teraflops, but wasn't enough to move it from the number 6 slot. Ranger, the Sun Constellation system at TACC, was the only machine bumped out of the top 10.
The increase in aggregate performance for the entire list was the lowest in TOP500 history, reflecting the fact that only 143 of the 500 systems were replaced. Even the bottom of the list barely moved: the 500th system was 20 teraflops six months ago, and only 24.7 teraflops on the current list.
Perhaps more ominous is that multicore scaling seems to be slowing. According to TOP500 list co-founder Erich Strohmaier, the move from predominantly dual-core systems to quad-core systems took about two years. If that pace had kept up, we would be seeing many more six- and eight-core systems, which is not the case (425 systems on the list are still quad-core).
This may be due to a temporary hiccup in the CPU rollout cycle. This spring, Intel, AMD, and IBM started rolling out 6-, 8-, and 12-core CPUs, and they should start showing up in HPC installations very shortly. But because bandwidth to RAM is not keeping pace with the additional cores, memory-bound problems can't benefit from a simple increase in CPU core count. This could be encouraging chipmakers to spend relatively more of the transistor budget provided by Moore's Law on features like bigger caches or new instructions, rather than on additional compute engines.
Another unfortunate trend seems to be developing. Although the TOP500 has only been keeping tabs on system power consumption for a couple of years, that metric seems to be steadily rising for the list as a whole, and is rising especially fast for the top 10 systems. Fortunately, power efficiency is going up too, but not fast enough to keep pace with user demand for bigger machines. If that curve can't be bent, a larger and larger percentage of the expense of a supercomputer is going to be consumed by power and cooling.
The trend toward Intel CPUs continues. The vast majority of systems -- 408, to be precise -- are based on Intel processors. AMD chips are in just 47 systems, despite being used in a disproportionate number of the top systems, including the number 1 (Jaguar), 3 (Roadrunner), and 4 (Kraken) machines. IBM Power-based systems are in third place with 42.
Finally, the trend toward InfiniBand remains unabated. There are now 207 systems on the list using InfiniBand fabric, up from 181 just six months ago. Interestingly, Gigabit Ethernet-based machines took a hit, dropping from 259 systems in November 2009, to 242 today. The battle between 10 GigE and InfiniBand awaits.