HPCwire

Blog: From the Editor

Exotics at SC10


Before general-purpose GPUs broke onto the supercomputing scene a few years ago, HPC was nearly a monoculture, processor-wise. According to the latest IDC numbers, x86 CPUs own about three-quarters of the market by revenue share. But the annual Supercomputing Conference always manages to showcase a number of exotic machines based on more esoteric parts, and this year's event in New Orleans was no exception.

IBM's Blue Gene/Q

First up is IBM's Blue Gene/Q, the company's next-generation architecture in the Blue Gene lineage. If you happened by the IBM booth at the conference, you could get a look at the compute and the I/O boards that will go into the upcoming BG machinery. The compute box has 32 nodes inside, with each node containing 16 PowerPC cores and each core able to manage four threads simultaneously. With 16 GB per node, IBM has managed to maintain a core:GB ratio of 1:1.
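As a quick sanity check on those board-level figures, the totals can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are ours, and the inputs are simply the numbers quoted above, not anything taken from IBM documentation.

```python
# Figures quoted above for the BG/Q compute board shown at SC10.
NODES_PER_BOARD = 32
CORES_PER_NODE = 16
THREADS_PER_CORE = 4
GB_PER_NODE = 16

# Derived totals for one compute board.
cores_per_board = NODES_PER_BOARD * CORES_PER_NODE
threads_per_board = cores_per_board * THREADS_PER_CORE
memory_per_board_gb = NODES_PER_BOARD * GB_PER_NODE
gb_per_core = GB_PER_NODE / CORES_PER_NODE

print(f"cores per board:    {cores_per_board}")
print(f"threads per board:  {threads_per_board}")
print(f"memory per board:   {memory_per_board_gb} GB")
print(f"memory per core:    {gb_per_core} GB")
```

In other words, a single board carries 512 cores and 2,048 hardware threads, backed by 512 GB of memory.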

All of this adds up to a lot of flopping horsepower. According to IBM Deep Computing VP Dave Turek, just four BG/Q racks will deliver a petaflop (sustained) of performance. And true to the Blue Gene lineage, it does so with minimal power. A prototype BG/Q system topped the latest Green500 list announced at SC10 last week, with a figure of 1,684.2 megaflops per watt. That beat out even the power-sipping TSUBAME 2.0 super at Tokyo Tech, which derives most of its computational muscle from energy-efficient NVIDIA Fermi GPUs.

The first Q -- at least the first big one -- will be installed in 2011 at Lawrence Livermore National Laboratory. That system, known as Sequoia, is intended to be THE big production machine for the NNSA's weapon simulation codes maintained under the Advanced Simulation and Computing (ASC) Program. Sequoia is slated to be a 20-petaflop system when it boots up next year.

Fujitsu's K Supercomputer

Also on display at SC10 was Fujitsu's Sparc64 VIIIfx CPU. The processor will be the basis for Kei Soku Keisanki, aka the "K computer." That machine will be the culmination of Japan's Next-Generation Supercomputing Project and is expected to deliver 10 petaflops for its main customer, RIKEN (Japan's Institute of Physical and Chemical Research). The system was originally scheduled to come online in 2010, but the pullout of NEC and Hitachi last year pushed the timeline out significantly. Full deployment is now scheduled to complete in 2012.

The Sparc64 VIIIfx processor itself is an 8-core scalar CPU that can deliver 128 peak gigaflops. Energy efficiency appears to be quite respectable. A cut-down K computer system at RIKEN was ranked number four on the latest Green500 list at 828.67 megaflops per watt (or about half that of the BG/Q prototype). Keep in mind, the original K machine was supposed to contain vector hardware as well, but when NEC and Hitachi bailed, the scalar CPUs from Fujitsu were forced to carry the entire computational load. The current design puts 80,000 Sparc64 VIIIfx chips into the 10-petaflop machine.
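The chip count and per-chip peak line up with the 10-petaflop target, as a rough calculation shows. The sketch below uses only the figures quoted above; the variable names are ours, and the result is a peak number, not a sustained one.

```python
# Figures quoted above for the K computer and the Green500 list.
CHIPS = 80_000
PEAK_GFLOPS_PER_CHIP = 128
CORES_PER_CHIP = 8
K_MFLOPS_PER_WATT = 828.67
BGQ_MFLOPS_PER_WATT = 1684.2

# Aggregate peak performance: gigaflops -> petaflops.
peak_petaflops = CHIPS * PEAK_GFLOPS_PER_CHIP / 1_000_000
total_cores = CHIPS * CORES_PER_CHIP
peak_gflops_per_core = PEAK_GFLOPS_PER_CHIP / CORES_PER_CHIP

# Energy efficiency of the cut-down K system relative to the BG/Q prototype.
efficiency_ratio = K_MFLOPS_PER_WATT / BGQ_MFLOPS_PER_WATT

print(f"peak performance:  {peak_petaflops:.2f} petaflops")
print(f"total cores:       {total_cores:,}")
print(f"peak per core:     {peak_gflops_per_core:.0f} gigaflops")
print(f"K vs. BG/Q MF/W:   {efficiency_ratio:.2f}")
```

That works out to a little over 10 petaflops of peak across 640,000 cores, with an efficiency ratio just under one-half.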

Tilera's Manycore Processors

When a company immodestly claims on its website that it "has solved the multi-processor scalability problem," one would expect the supercomputing crowd to take notice. And it has. Tilera, a maker of manycore microprocessors, has managed to attract both SGI and DARPA for HPC duty.

The 64-core Tilera processor will be an option on SGI's Prism XL supercomputer (the offspring of Project Mojo), the company's new accelerator-centric platform unveiled at SC10. Although most Prisms will likely be outfitted with GPGPUs, SGI determined that the power-sipping Tilera silicon would be a great fit for HPC-style workloads that mainly need integer acceleration -- apps like encryption, image and signal processing, network packet inspection, Web/content delivery, and media format conversion. Whether this particular configuration catches on or not remains to be seen, but you have to give SGI credit for going after market niches that other HPC vendors have largely ignored.

Tilera is also a player in DARPA's Ubiquitous High Performance Computing (UHPC) program, where its manycore tiled processor technology garnered the company a place on one of the four initial teams. Anant Agarwal, Tilera co-founder and CTO (and EE/CS professor at MIT), pitched his UHPC team's Project Angstrom at a Friday panel at SC10. In his presentation, Agarwal emphasized the performance-per-watt strength of the Tilera technology versus conventional CPUs. For example, to hit the 50 mW (milliwatts) per core power target needed for UHPC machines, Tilera has only to modestly scale down its current 200 mW per core designs. Agarwal proposes that his team can deliver that on 11nm technology with a 1,000-core, 50-watt chip that delivers 5 teraflops. Conventional CPUs, he argues, are going to have to undergo a deeper architectural redesign, given that they currently consume around 10 watts per core.
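Agarwal's numbers hang together arithmetically, and running them shows how wide the gap is. The following is a rough sketch using only the figures from the presentation as reported above; the variable names are ours, and power is kept in milliwatts so the arithmetic stays exact.

```python
# Figures from Agarwal's UHPC pitch, as reported above.
CORES = 1_000
CHIP_TERAFLOPS = 5
TARGET_MW_PER_CORE = 50            # UHPC power target
TILERA_MW_PER_CORE = 200           # current Tilera designs
CONVENTIONAL_MW_PER_CORE = 10_000  # roughly 10 watts per core

# Whole-chip power and efficiency for the proposed 1,000-core part.
chip_watts = CORES * TARGET_MW_PER_CORE / 1000
gflops_per_watt = CHIP_TERAFLOPS * 1000 / chip_watts
gflops_per_core = CHIP_TERAFLOPS * 1000 / CORES

# How far each starting point is from the per-core power target.
tilera_reduction = TILERA_MW_PER_CORE / TARGET_MW_PER_CORE
conventional_reduction = CONVENTIONAL_MW_PER_CORE / TARGET_MW_PER_CORE

print(f"chip power:              {chip_watts:.0f} W")
print(f"efficiency:              {gflops_per_watt:.0f} gigaflops per watt")
print(f"per-core performance:    {gflops_per_core:.0f} gigaflops")
print(f"Tilera must cut power:   {tilera_reduction:.0f}x")
print(f"conventional CPUs:       {conventional_reduction:.0f}x")
```

So the proposed chip implies 100 gigaflops per watt, with Tilera needing a 4x per-core power reduction against roughly 200x for a conventional CPU.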

Cray's XMT

Cray's XMT supercomputer has been around since 2007, but has always been overshadowed by the company's mainstream XT and now XE lines. Outside the three-letter intelligence agencies and a few US DOE labs, the machine is not widely known. But Cray is apparently looking to expand its popularity. At SC10 this year, the XMT got some extra attention, appearing in the Disruptive Technologies exhibit and as the focus of its own BoF.

The XMT's forte is scalable data analytics, and the architecture has been designed with this application set in mind. Encompassing Cray's custom SeaStar2 interconnect and Threadstorm processors, the platform's principal architectural feature is globally addressable memory, which makes it possible to run shared memory applications on the machine. All Threadstorm chips can access each other's memory (up to 8 GB per processor), making it feasible to build a system with as much as 64 terabytes of global RAM.

Unlike most shared memory machines, the XMT is built to support a lot of parallelism. Each Threadstorm CPU can manage 128 threads at a time. Combined with speedy access to random chunks of remote shared memory, the system is a much more efficient platform than a conventional distributed memory architecture for those applications that require processing of really big graph-oriented databases. This includes a lot of large-scale scientific data analysis as well as many non-technical informatics applications where the data is unstructured. Look for Cray to keep pushing this technology into this rapidly emerging space.
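Taken together, the memory and threading figures hint at the scale of a maxed-out XMT. The sketch below is our own inference from the numbers quoted above, not a Cray specification; in particular, it assumes binary units (1 TB = 1,024 GB) and that every processor carries the full 8 GB.

```python
# Figures quoted above for the Cray XMT.
GB_PER_PROCESSOR = 8
MAX_GLOBAL_TB = 64
THREADS_PER_PROCESSOR = 128

# Assumption: binary units, i.e. 1 TB = 1,024 GB.
GB_PER_TB = 1024

# Processors needed to reach the maximum global memory,
# and the hardware threads that many processors could keep in flight.
processors = MAX_GLOBAL_TB * GB_PER_TB // GB_PER_PROCESSOR
hardware_threads = processors * THREADS_PER_PROCESSOR

print(f"processors at 64 TB:  {processors:,}")
print(f"hardware threads:     {hardware_threads:,}")
```

Under those assumptions, a 64-terabyte system implies 8,192 processors and about a million concurrent hardware threads.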

SeaMicro's Atom-Based Server

Perhaps the most obscure exotic at SC10 was SeaMicro's SM10000, an Intel Atom-based server that puts eight of the tiny processors onto a card, 512 in a 10U enclosure, and 2,048 in a rack. The Z530 processor being used is a single-core, 1.6 GHz chip that has a max TDP of a mere 2 watts. The company's pitch is that this setup requires just one-quarter of the power and space of conventional x86 servers without requiring any modifications to software. Atom is conveniently x86 compatible.
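The packaging numbers are easy to unpack with a little arithmetic. This is a rough sketch based only on the figures quoted above, with variable names of our own choosing; note that the power totals cover CPU TDP alone and ignore memory, fabric, storage and cooling, so they are not system power figures.

```python
# Figures quoted above for the SeaMicro SM10000.
CPUS_PER_CARD = 8
CPUS_PER_ENCLOSURE = 512
CPUS_PER_RACK = 2048
ENCLOSURE_HEIGHT_U = 10
CPU_TDP_WATTS = 2

# Derived packaging figures.
cards_per_enclosure = CPUS_PER_ENCLOSURE // CPUS_PER_CARD
enclosures_per_rack = CPUS_PER_RACK // CPUS_PER_ENCLOSURE
rack_units_used = enclosures_per_rack * ENCLOSURE_HEIGHT_U

# CPU-only power budget (TDP), excluding everything else in the box.
cpu_watts_per_enclosure = CPUS_PER_ENCLOSURE * CPU_TDP_WATTS
cpu_watts_per_rack = CPUS_PER_RACK * CPU_TDP_WATTS

print(f"cards per enclosure:      {cards_per_enclosure}")
print(f"enclosures per rack:      {enclosures_per_rack} ({rack_units_used}U)")
print(f"CPU TDP per enclosure:    {cpu_watts_per_enclosure} W")
print(f"CPU TDP per rack:         {cpu_watts_per_rack} W")
```

That is 64 cards per enclosure and four enclosures per rack, with the processors themselves accounting for only about 4 kilowatts of TDP across all 2,048 chips.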

The downside is that it's a pretty low-end setup. Memory maxes out at 2 GB per card; network support is Ethernet only; and the single-core chip in the current version is 32-bit. That may be acceptable for low-precision throughput apps that need lots of parallelism but don't require any sort of tight coupling or single-threaded performance. A 64-bit version with InfiniBand or low-latency 10GbE connectivity would be a much more interesting offering. But keep an eye on the Atom. It could be the dark horse in the race for energy-efficient x86 HPC.

Posted by Michael Feldman - November 24, 2010 @ 3:33 PM, Pacific Standard Time

Michael Feldman is the editor of HPCwire.
