HPCwire



Japanese University Boots Up 800-Teraflop GPU Supercomputer


Japan's newest supercomputer, an 802-teraflop GPU-accelerated Appro cluster, went into production last week at the University of Tsukuba, just north of Tokyo. The machine represents the lynchpin of the university's HA-PACS project, a three-year effort that will attempt to push the envelope on GPU-pumped supercomputing.

HA-PACS, which stands for Highly Accelerated Parallel Advanced system for Computational Sciences, is just the latest in a series of "PACS" systems at Tsukuba. The original system, known as PACS-9, was installed in 1978 and delivered 7 kiloflops (yes, kiloflops!). Every two to four years thereafter, the university's Center for Computational Sciences upgraded to a new system. The last one, PACS-CS, was deployed in 2006 and topped out at 14.3 teraflops.

The new Appro cluster represents the eighth generation of supercomputers at Tsukuba and is the first to be accelerated by GPUs. As you might suspect, the vast majority of the 802 teraflops is provided by the graphics processors, in this case NVIDIA's latest Tesla GPU, the M2090. Each cluster node pairs four of them with two 8-core Intel Xeon E5 ("Sandy Bridge") CPUs.

In aggregate, the 268-node HA-PACS machine will house 1,072 GPUs and 536 CPUs, along with a total of 34 terabytes of memory on the CPU side and an additional 6.4 terabytes on the GPUs. External storage amounts to just over half a petabyte, based on DataDirect Networks' SFA10000 gear. Thanks to the high computational density afforded by the graphics chips, the entire cluster fits into just 26 racks and draws a little over 400 kW of power.
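The totals above follow directly from the per-node configuration. The short sketch below checks them and derives per-device memory; the only assumption is decimal units (1 TB = 1,000 GB). The result of roughly 6 GB per GPU lines up with the M2090's 6 GB of on-board memory.

```python
# Sanity-check the HA-PACS base-cluster totals quoted in the article.
# Node count, per-node GPU/CPU counts and aggregate memory are from the text;
# the per-device figures are derived here, not quoted.

NODES = 268
GPUS_PER_NODE = 4
CPUS_PER_NODE = 2
CPU_MEMORY_TB = 34.0   # aggregate host (CPU-side) memory
GPU_MEMORY_TB = 6.4    # aggregate GPU memory


def cluster_totals():
    """Return (GPU count, CPU count, host GB per node, GB per GPU)."""
    gpus = NODES * GPUS_PER_NODE
    cpus = NODES * CPUS_PER_NODE
    # Assumption: decimal units, 1 TB = 1000 GB.
    host_gb_per_node = CPU_MEMORY_TB * 1000 / NODES
    gb_per_gpu = GPU_MEMORY_TB * 1000 / gpus
    return gpus, cpus, host_gb_per_node, gb_per_gpu


if __name__ == "__main__":
    gpus, cpus, host_gb, gpu_gb = cluster_totals()
    print(f"{gpus} GPUs, {cpus} CPUs")
    print(f"~{host_gb:.0f} GB host memory per node, ~{gpu_gb:.1f} GB per GPU")
```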

Using top-of-the-line CPUs and GPUs makes for a dense and powerful cluster, with each node delivering just shy of 3 teraflops of peak performance. And even though most of the flops come from the GPUs (665 gigaflops per M2090), each Xeon E5 chips in with a respectable 166 gigaflops, thanks to the new Advanced Vector Extensions (AVX) instructions.
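As a back-of-the-envelope check, the quoted per-device peaks add up to the 802-teraflop headline figure. The 665 and 166 gigaflop numbers are from the article. The 2.6 GHz clock and 8 double-precision flops per cycle per core (one 4-wide AVX add plus one 4-wide AVX multiply) are assumptions, used only to show where a 166-gigaflop CPU peak plausibly comes from.

```python
# Back-of-the-envelope double-precision peak for HA-PACS.
# GPU peak and node layout are from the article; clock_ghz=2.6 and
# flops_per_cycle=8 are assumptions illustrating the AVX-based CPU figure.

NODES = 268
GPUS_PER_NODE = 4
CPUS_PER_NODE = 2
GPU_PEAK_GF = 665.0  # per Tesla M2090, as quoted


def cpu_peak_gf(cores=8, clock_ghz=2.6, flops_per_cycle=8):
    """Peak = cores x clock x flops/cycle (AVX: 4-wide add + 4-wide multiply)."""
    return cores * clock_ghz * flops_per_cycle


def node_peak_gf(cpu_gf):
    """Four GPUs plus two CPU sockets per node."""
    return GPUS_PER_NODE * GPU_PEAK_GF + CPUS_PER_NODE * cpu_gf


def system_peak_tf(cpu_gf):
    """Aggregate peak over all nodes, in teraflops."""
    return NODES * node_peak_gf(cpu_gf) / 1000.0


if __name__ == "__main__":
    cpu = cpu_peak_gf()  # 8 cores x 2.6 GHz x 8 flops/cycle ~ 166.4 GF
    print(f"CPU peak:    {cpu:.1f} GF")
    print(f"Node peak:   {node_peak_gf(cpu):.0f} GF")
    print(f"System peak: {system_peak_tf(cpu):.0f} TF")
    gpu_share = GPUS_PER_NODE * GPU_PEAK_GF / node_peak_gf(cpu)
    print(f"GPU share:   {gpu_share:.0%}")
    # Upper bound on peak GF/W, since the article says "a little over 400 kW".
    print(f"Peak GF/W:   <= {system_peak_tf(cpu) * 1000 / 400_000:.2f}")
```

The same arithmetic shows why the node lands "just shy of 3 teraflops": four GPUs contribute about 2.66 TF and the two CPUs about 0.33 TF.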

This is Appro's second big system deployment at Tsukuba, having delivered the 95-teraflop T2K Open Supercomputer there in 2009. That machine used AMD's quad-core Opterons and no GPUs.

Appro, by the way, is one of the few server vendors offering systems equipped with Xeon E5 CPUs these days, and it already claims four such systems on the TOP500 list: "Zin" (961 teraflops) at Lawrence Livermore National Lab, "Luna" (293 teraflops) at Los Alamos National Lab, "Gordon" (262 teraflops) at the San Diego Supercomputer Center and "Chama" at Sandia National Labs. That's a nice accomplishment, considering Intel has yet to officially release the E5 chips into the wild.

CPUs aside, the main focus for HA-PACS is to extract the most performance from the GPU hardware. The project has a two-pronged mission in this regard: to bring more big science codes to the GPU, and to develop a tightly coupled parallel computing acceleration mechanism in order to "further optimize the utility of the graphics hardware."

On the application side, HA-PACS will be porting codes to the GPU in the areas of subatomic particles, life sciences, astrophysics, nuclear physics and environmental science. For example, astrophysics applications that deal with radiation transfer can take advantage of ray tracing methods, which modern GPUs are tailor-made for. Likewise, for elementary particle physics, GPUs can be used to great advantage to accelerate dense matrix computations.

On the computational research side, the HA-PACS team is in the process of developing custom hardware to support direct communications between the GPUs. The idea is to enable the graphics processors to quickly shuffle data between themselves without the overhead involved in going through the CPU.
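To see why skipping the CPU matters, consider a toy latency/bandwidth ("alpha-beta") model that compares a GPU-to-GPU transfer staged through host memory with a single direct hop. Every latency and bandwidth value below is hypothetical and chosen purely for illustration. None of them is a measurement of HA-PACS, TCA, or any real interconnect, and the hop names are likewise illustrative.

```python
# Toy alpha-beta model: staged (via host memory) vs. direct GPU-to-GPU transfer.
# All latency/bandwidth numbers are HYPOTHETICAL, for illustration only.

from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    latency_us: float     # fixed startup cost per hop, microseconds (hypothetical)
    bandwidth_gbs: float  # sustained bandwidth of the hop, GB/s (hypothetical)


def transfer_time_us(hops, nbytes):
    """Store-and-forward: each hop moves the whole message before the next starts."""
    total = 0.0
    for hop in hops:
        # 1 GB/s = 1e3 bytes per microsecond
        total += hop.latency_us + nbytes / (hop.bandwidth_gbs * 1e3)
    return total


# Hypothetical path through the CPUs: copy up to host memory, cross to the
# remote host, then copy down into the remote GPU.
STAGED = [
    Hop("GPU -> local host memory", latency_us=10.0, bandwidth_gbs=6.0),
    Hop("local host -> remote host", latency_us=2.0, bandwidth_gbs=4.0),
    Hop("remote host memory -> GPU", latency_us=10.0, bandwidth_gbs=6.0),
]

# Hypothetical direct path: one PCIe-based transfer between the GPUs.
DIRECT = [Hop("GPU -> remote GPU", latency_us=2.0, bandwidth_gbs=4.0)]

if __name__ == "__main__":
    for nbytes in (8 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        staged = transfer_time_us(STAGED, nbytes)
        direct = transfer_time_us(DIRECT, nbytes)
        print(f"{nbytes:>10} B  staged {staged:10.1f} us  direct {direct:10.1f} us")
```

Strict store-and-forward is a pessimistic assumption. Many communication stacks pipeline staged copies in chunks, which narrows the gap for large messages. The fixed per-hop latencies, which dominate small messages, are harder to hide.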

This custom hardware, known as the Tightly Coupled Accelerator (TCA), will be distinct from the Appro base cluster but will eventually be integrated with it, says Taisuke Boku, deputy director of the Center for Computational Sciences at the University of Tsukuba. According to Boku, TCA will use PCIe as the communication channel between the GPUs and will employ FPGA technology to make that possible.

The FPGA will be based on an existing implementation developed at Tsukuba called PEACH, which stands for PCI Express Adaptive Communication Hub. The idea is to provide a controller that enables PCIe devices to directly communicate with one another on a peer-to-peer basis, rather than as slave devices.

To make this work for TCA, an upgraded FPGA implementation, known as PEACH2, will be developed. It will incorporate NVIDIA's GPUDirect communication protocols to facilitate data transfers between the Tesla parts. Bandwidth will also be improved over the original PEACH, which used four PCIe Gen2 x4 ports as the communication link. PEACH2 will support four PCIe Gen2 x8 ports, doubling the lanes per port and thus the throughput.
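The doubling is easy to quantify. PCIe Gen2 signals at 5 GT/s per lane with 8b/10b encoding, which yields 4 Gb/s, or 0.5 GB/s, of usable bandwidth per lane per direction before packet overhead. The sketch below applies that to the two port configurations described above. The figures are raw per-direction link rates, not measured PEACH or PEACH2 throughput.

```python
# Raw per-direction PCIe Gen2 link bandwidth for PEACH vs. PEACH2 port configs.
# Ignores TLP/packet overhead, so real payload throughput will be lower.

GEN2_GT_PER_S = 5.0        # transfers per second per lane, in GT/s
GEN2_ENCODING = 8 / 10     # 8b/10b line encoding efficiency


def lane_gbytes_per_s():
    """Usable GB/s per lane per direction: 5 GT/s x 0.8 / 8 bits per byte."""
    return GEN2_GT_PER_S * GEN2_ENCODING / 8


def aggregate_gbytes_per_s(ports, lanes_per_port):
    """Total raw bandwidth across all ports, per direction."""
    return ports * lanes_per_port * lane_gbytes_per_s()


peach = aggregate_gbytes_per_s(ports=4, lanes_per_port=4)   # four Gen2 x4 ports
peach2 = aggregate_gbytes_per_s(ports=4, lanes_per_port=8)  # four Gen2 x8 ports

if __name__ == "__main__":
    print(f"PEACH:  {peach:.1f} GB/s per direction")
    print(f"PEACH2: {peach2:.1f} GB/s per direction ({peach2 / peach:.0f}x)")
```

Going from x4 to x8 ports takes each port from 2 GB/s to 4 GB/s per direction, so the four-port aggregate rises from 8 GB/s to 16 GB/s.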

The first prototype of the TCA is under development now. The plan is to incorporate the technology into a second cluster, which will be glued to the Appro base cluster by early 2013. The TCA cluster will add 200-plus teraflops to production, bringing the integrated HA-PACS system to over a petaflop.
 
The HA-PACS work will be a precursor to the exascale systems Boku and his team at Tsukuba already have in mind. He believes future exascale systems will require some level of accelerated computing technology because of its inherent advantages in performance and energy efficiency.

"The largest issue on the accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," says Boku. "In some applications, we may need a paradigm shift toward a new generation of algorithms. HA-PACS will be the testbed for developing these algorithms."
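One way to put a number on the gap Boku describes is a roofline-style bound: how many flops a GPU must perform per byte it receives over an external link before the link stops being the bottleneck. The 665-gigaflop M2090 peak comes from the article. The 8 GB/s link is an assumption, corresponding to a PCIe Gen2 x16 host connection per direction before protocol overhead.

```python
# Roofline-style view of the compute/communication gap for one M2090.
# GPU peak is from the article; the 8 GB/s link (PCIe Gen2 x16, per direction)
# is an assumption for illustration.

GPU_PEAK_GFLOPS = 665.0
LINK_GBYTES_PER_S = 8.0
BYTES_PER_DOUBLE = 8


def required_intensity(peak_gflops, link_gbytes_per_s):
    """Flops per byte needed so computation, not the link, is the limit."""
    return peak_gflops / link_gbytes_per_s


def attainable_gflops(intensity, peak_gflops, link_gbytes_per_s):
    """Roofline bound: min(peak, arithmetic intensity x link bandwidth)."""
    return min(peak_gflops, intensity * link_gbytes_per_s)


if __name__ == "__main__":
    need = required_intensity(GPU_PEAK_GFLOPS, LINK_GBYTES_PER_S)
    print(f"Needed: {need:.1f} flops/byte "
          f"({need * BYTES_PER_DOUBLE:.0f} flops per double received)")
    for intensity in (0.25, 1.0, 10.0, 100.0):
        got = attainable_gflops(intensity, GPU_PEAK_GFLOPS, LINK_GBYTES_PER_S)
        print(f"{intensity:6.2f} flops/byte -> {got:6.1f} GF "
              f"({got / GPU_PEAK_GFLOPS:.1%} of peak)")
```

Under these assumptions a code has to do more than 80 flops for every byte that crosses the link to keep the GPU busy. That is one reason new algorithms that communicate less, or faster GPU-to-GPU paths like TCA, matter for accelerated systems.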
