August 22, 2011
At the Hot Chips conference in Santa Clara last week, IBM lifted the curtain on its Blue Gene/Q SoC, which will soon power some of the highest performing supercomputers in the world. Next year, two DOE labs are slated to boot up the most powerful Blue Gene systems ever deployed: the 10-petaflop "Mira" system at Argonne National Lab, and the 20-petaflop "Sequoia" super at Lawrence Livermore. Both will employ the latest Blue Gene/Q processor described at the conference.
That, of course, is assuming IBM doesn't back out of those projects as it did recently with its 10-petaflop Power7-based (PERCS) Blue Waters supercomputer for NCSA at the University of Illinois. The company terminated the contract to build and support the $300 million Blue Waters system based on financial considerations, leaving the NCSA and its NSF sponsor looking for another vendor to fill the void. The DOE is certainly not expecting to endure that fate for their Blue Gene/Q acquisitions.
The unveiling of the Blue Gene/Q SoC last week implies IBM is committed to those DOE machines as well as future systems. And unlike the Power7 CPU, which is being used for both enterprise and HPC systems, the Blue Gene technology has always been exclusively designed and built for supercomputing.
Both the Power7 and the new Blue Gene SoC use IBM's 45 nm SOI technology, but the similarities end there. As described at Hot Chips, the BGQ processor is an 18-core CPU: 16 cores run the application, one runs the OS, and one is held in reserve. And even though the chip is a custom design, it uses the PowerPC A2 core that IBM introduced last year at the International Solid-State Circuits Conference. The architecture represents yet another PowerPC variant, which in this case merges the functionality of network and server processors. IBM is using the A2 architecture to implement PowerEN chips for the more traditional datacenter applications such as edge-of-network processing, intelligent I/O devices in servers, network attached appliances, distributed computing, and streaming applications.
As such, the A2 architecture emphasizes throughput and energy efficiency, running at relatively modest clock speeds. In the Blue Gene/Q implementation, the chip is clocked at just 1.6 GHz and consumes a modest 55 watts at peak. To further reduce power consumption, the chip makes extensive use of clock gating.
But thanks to the double-digit core count, support for up to four threads per core, and the quad floating-point unit, it delivers a very respectable 204.8 gigaflops per processor. Contrast that with the Power7, which at 3.5 GHz and 8 cores delivers about 256 gigaflops, but consumes a hefty 200 watts.
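The Blue Gene/Q peak figure can be reproduced from the numbers in the article with a short calculation. This is a rough sketch, not IBM's published derivation: it assumes the "quad" floating-point unit means four double-precision lanes per core and that each lane retires one fused multiply-add (counted as two flops) per cycle. Only the 16 application cores are counted.

```python
# Peak double-precision rate for one Blue Gene/Q chip.
APP_CORES = 16      # of the 18 cores, 16 run application code (from the article)
CLOCK_GHZ = 1.6     # from the article
FPU_LANES = 4       # assumption: "quad" FPU = 4 double-precision lanes per core
FLOPS_PER_FMA = 2   # assumption: one fused multiply-add counts as 2 flops


def peak_gflops(cores, clock_ghz, lanes, flops_per_op):
    """Peak gigaflops = cores x clock (GHz) x lanes x flops per lane per cycle."""
    return cores * clock_ghz * lanes * flops_per_op


bgq_peak = peak_gflops(APP_CORES, CLOCK_GHZ, FPU_LANES, FLOPS_PER_FMA)
print(f"Blue Gene/Q peak: {bgq_peak:.1f} gigaflops")
```

Under those assumptions the product matches the 204.8 gigaflops listed for the chip.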
That gives the Blue Gene/Q chip nearly three times the peak energy efficiency of the more computationally muscular Power7 (3.72 gigaflops/watt versus 1.28 gigaflops/watt). IBM has been able to capture most of that energy efficiency in the Blue Gene/Q servers. The current top-ranked system on the latest Green500 list is a prototype machine that measures 2.1 gigaflops/watt for Linpack, beating even the newest GPU-accelerated machines as well as the Sparc64 VIIIfx-based K supercomputer, the current champ of the TOP500.
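The efficiency comparison is simple division, sketched here using the peak and power figures quoted in the article. The Blue Gene/Q peak is taken as 204.8 gigaflops, the more precise value from the article's table. The dictionary and function names are illustrative only.

```python
# (peak gigaflops, peak watts) per chip, as quoted in the article.
chips = {
    "Blue Gene/Q": (204.8, 55.0),
    "Power7": (256.0, 200.0),
}


def gflops_per_watt(gflops, watts):
    """Peak energy efficiency of a single chip."""
    return gflops / watts


efficiency = {name: gflops_per_watt(*spec) for name, spec in chips.items()}
ratio = efficiency["Blue Gene/Q"] / efficiency["Power7"]

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} gigaflops/watt")
print(f"Blue Gene/Q advantage: {ratio:.2f}x")
```

These are peak chip-level numbers. The 2.1 gigaflops/watt Green500 result is a measured system-level Linpack figure, so the two are not directly comparable.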
Even compared to its Blue Gene predecessors, BGQ represents a step change in performance, thanks to a large bump in both core count and clock frequency. The Blue Gene/Q chip delivers about 15 times as many peak FLOPS as its Blue Gene/P counterpart and about 36 times as many as the original Blue Gene/L SoC.
| Version | Core Architecture | Core Count | Clock Speed | Peak Performance |
|---|---|---|---|---|
| Blue Gene/L | PowerPC 440 | 2 | 700 MHz | 5.6 Gigaflops |
| Blue Gene/P | PowerPC 450 | 4 | 850 MHz | 13.6 Gigaflops |
| Blue Gene/Q | PowerPC A2 | 18 | 1600 MHz | 204.8 Gigaflops |
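The generational gains can be split into their contributing factors using only the table values. The sketch below assumes the Blue Gene/Q peak is spread over its 16 application cores rather than all 18, and derives "flops per cycle per core" by division instead of taking it from any IBM specification.

```python
# (name, compute cores, clock in GHz, peak gigaflops), taken from the table.
# Assumption: Blue Gene/Q peak is counted over 16 of its 18 cores.
generations = [
    ("Blue Gene/L", 2, 0.70, 5.6),
    ("Blue Gene/P", 4, 0.85, 13.6),
    ("Blue Gene/Q", 16, 1.60, 204.8),
]


def flops_per_cycle_per_core(cores, clock_ghz, gflops):
    """Work each core must complete per clock cycle to reach the quoted peak."""
    return gflops / (cores * clock_ghz)


q_name, q_cores, q_clock, q_peak = generations[-1]
q_width = flops_per_cycle_per_core(q_cores, q_clock, q_peak)

for name, cores, clock, peak in generations[:-1]:
    width = flops_per_cycle_per_core(cores, clock, peak)
    print(
        f"{q_name} vs {name}: {q_peak / peak:.1f}x peak = "
        f"{q_cores / cores:.0f}x cores * "
        f"{q_clock / clock:.2f}x clock * "
        f"{q_width / width:.1f}x flops per cycle per core"
    )
```

On these numbers, core count and clock speed do not account for the whole gain. The quoted peaks also imply that each Blue Gene/Q core does twice as much floating-point work per cycle as its predecessors.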
As with Blue Gene/L and P, the Q incarnation uses embedded DRAM (eDRAM), a dynamic random access memory architecture that is integrated onto the processor ASIC. The technology is employed for shared Level 2 cache, replacing the less performant SRAM technology used in traditional CPUs. In the case of Blue Gene/Q, 32 MB of L2 cache have been carved out.
What is brand new for the latest version is transactional memory. According to an EE Times report, the addition of transactional memory will give IBM the distinction of becoming the first company to deliver commercial chips with such technology.
Transactional memory is a technology used to simplify parallel programming by protecting shared data from concurrent access. Basically it prevents data from being corrupted by multiple threads when they simultaneously want to read or write a particular item, and does so in a much more transparent way to the application than the traditional locking mechanism in common use today.
The technology can be implemented in hardware, in software, or in a combination of the two. It has been studied by a number of vendors over the years, most notably Intel, Microsoft, and Sun Microsystems. According to the EE Times report, IBM's implementation exploits the high performance on-chip eDRAM to achieve better latency compared to traditional locking schemes.
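To make the idea concrete, here is a minimal software-only sketch of optimistic transactions. It is not IBM's hardware implementation, and the names `TVar`, `Transaction`, and `atomically` are hypothetical. Each transaction records the version of every cell it reads, buffers its writes, and commits only if nothing it read has changed in the meantime. Otherwise it retries.

```python
import threading


class TVar:
    """A shared cell whose version is bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a cell read by the transaction changed before commit."""


_commit_lock = threading.Lock()  # guards only the short validate-and-publish step


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)  # record version before value
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            if any(t.version != seen for t, seen in self.reads.items()):
                raise Conflict
            for t, value in self.writes.items():
                t.value = value
                t.version += 1


def atomically(body):
    """Run body(tx) until it commits without a conflict."""
    while True:
        tx = Transaction()
        result = body(tx)
        try:
            tx.commit()
            return result
        except Conflict:
            continue  # another thread won the race; retry from scratch


counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(1000):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

The application code in `increment` never takes a lock itself, which is the transparency the article refers to. A hardware implementation would detect the conflict and roll back in the memory system rather than in library code like this.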
If everything goes according to plan, the new processor will elevate the Blue Gene franchise into the double-digit petaflops realm. The aforementioned Mira and Sequoia, taken together, represent 30 petaflops of supercomputing and will both be top 10 systems in 2012. Sequoia, in particular, is positioned to be the top-ranked supercomputer next year, assuming no surprises from China or elsewhere.
Whether the BGQ architecture is the end of the line for the Blue Gene franchise is an open question. As of today, there is no R system on the roadmap and IBM seems to be leaning toward a Power-architecture-only strategy for its custom supercomputing lineup. Even if IBM is able to repurpose the cores of other PowerPC architectures, designing and implementing a custom SoC for a single niche market, albeit a high-margin one, is an expensive proposition.