July 26, 2012
Exascale computing is going to require chipmakers to build extremely efficient microprocessors. This has been the focus of the Green500 list, which ranks clusters by performance per watt rather than by raw speed. In this brave new world of high performance computing, and increasingly in any kind of computing, chip efficiency is now under intense scrutiny.
Yesterday, Real World Technologies posted an article detailing the computational efficiency of CPUs and GPUs, and how those designs have evolved over the last three years. In the analysis, author David Kanter looks at both computational performance per watt and performance per unit of physical die area. He also compares how the chip architectures have fared since 2009, when Kanter did his initial analysis. The evaluations were based on double precision floating point performance.
In 2009, the standout in both performance per watt and performance per die area was AMD’s RV770 GPU. The processor was able to perform 1.6 gigaflops/watt and just under one gigaflop per mm². Intel’s Silverthorne processor was slightly less efficient than the RV770, and had far lower compute density than AMD’s GPU. Subsequently renamed “Atom,” the chip was able to perform between 1.5 and 1.6 gigaflops/watt and was primarily tasked with powering mobile consumer devices.
While the RV770 appeared to be the clear winner in 2009, GPUs were not widely accepted as compute engines and suffered from a number of limitations. Many were unable to perform double precision floating point calculations, and those that could often did so with limited performance. Programming GPUs was also difficult, as the APIs were in their early stages.
GPU technology has improved significantly over the past three years, though. Almost all of these “graphics” processors can perform double precision calculations, and they have become simpler to program thanks to more mature programming frameworks like CUDA and OpenCL. CPUs have also improved over the interval, gaining new vector extensions like x86 AVX.
So what does the landscape look like today? From Kanter's analysis:

IBM currently takes the energy efficiency crown with their Blue Gene/Q (BG/Q) processor, which just so happens to power the most powerful supercomputer in the world. The chip can perform roughly 3.75 gigaflops/watt and is represented in the top 20 systems on the current Green500 list. Not far behind in efficiency is NVIDIA’s Fermi GPU, which performs close to 3 gigaflops/watt. The K computer’s SPARC64 chip is just a little further behind at 2.2 gigaflops/watt. All other mainstream CPUs in use for HPC – Intel’s Sandy Bridge, AMD’s Interlagos and IBM’s POWER7 – are further back, below 1.5 gigaflops/watt.
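To put those figures in perspective, the short Python sketch below compares each quoted efficiency with the 1.6 gigaflops/watt of the 2009 leader, the RV770. It is only an illustration built on the numbers cited in this article; the function names are hypothetical, and Fermi's "close to 3" is rounded to 3.0 as an assumption.

```python
# Efficiency figures in gigaflops per watt, as quoted in this article.
# Assumption: Fermi is described as "close to 3", so 3.0 is an approximation.
RV770_2009 = 1.6  # the 2009 leader in performance per watt

efficiency_2012 = {
    "Blue Gene/Q": 3.75,
    "Fermi": 3.0,
    "SPARC64": 2.2,
}


def relative_gain(current, baseline):
    """Return how many times more efficient `current` is than `baseline`."""
    return current / baseline


def rank_by_efficiency(chips):
    """Return (name, gigaflops/watt) pairs sorted from most to least efficient."""
    return sorted(chips.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gflops_per_watt in rank_by_efficiency(efficiency_2012):
        gain = relative_gain(gflops_per_watt, RV770_2009)
        print(f"{name}: {gflops_per_watt} gigaflops/watt, "
              f"{gain:.2f}x the 2009 RV770")
```

By this rough measure, BG/Q is a little over twice as efficient as the best chip of 2009, while the SPARC64 improves on it by less than half.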
Kanter says this divergence reflects a fundamental difference between traditional processors (x86 CPUs, POWER, and others) and throughput processors (GPUs and BG/Q). But, he notes, the difference in efficiency between the two groups has narrowed since 2009, and he expects them to eventually converge.
Probably not in the short term, though. Before the end of this year, Intel will release its first Many Integrated Core (MIC) coprocessor, now rebranded as Xeon Phi, which promises over 1 teraflop of absolute performance. It will directly compete with NVIDIA’s Kepler K20 GPU, also due out later this year. Both chips will probably best BG/Q silicon on performance/watt.
Further out, it should get even more interesting. Server-capable 64-bit ARM processors, low power x86 CPUs from Intel, heterogeneous CPU-GPU chips from (at least) AMD and NVIDIA, and whatever IBM is planning as a sequel to BG/Q are all in the pipeline for 2013-2014.
Full story at Real World Technologies