November 24, 2010
Before general-purpose GPUs broke onto the supercomputing scene a few years ago, HPC was nearly a monoculture, processor-wise. According to the latest IDC numbers, x86 CPUs own about three-quarters of the market by revenue share. But the annual Supercomputing Conference always manages to showcase a number of exotic machines based on more esoteric parts, and this year's event in New Orleans was no exception.
IBM's Blue Gene/Q
First up is IBM's Blue Gene/Q, the company's next-generation architecture in the Blue Gene lineage. If you happened by the IBM booth at the conference, you could get a look at the compute and the I/O boards that will go into the upcoming BG machinery. The compute box has 32 nodes inside, with each node containing 16 PowerPC cores and each core able to manage four threads simultaneously. With 16 GB per node, IBM has managed to maintain a core:GB ratio of 1:1.
All of this adds up to a lot of flopping horsepower. According to IBM Deep Computing VP Dave Turek, just four BG/Q racks will deliver a petaflop (sustained) of performance. And true to the Blue Gene lineage, it does so with minimal power. A prototype BG/Q system topped the latest Green500 list announced at SC10 last week, with a figure of 1,684.2 megaflops per watt. That beat out even the power-sipping TSUBAME 2.0 super at Tokyo Tech, which derives most of its computational muscle from energy-efficient NVIDIA Fermi GPUs.
The first Q -- at least the first big one -- will be installed in 2011 at Lawrence Livermore National Laboratory. That system, known as Sequoia, is intended to be THE big production machine for the NNSA's weapon simulation codes maintained under the Advanced Simulation and Computing (ASC) Program. Sequoia is slated to be a 20-petaflop system when it boots up next year.
Fujitsu's K Supercomputer
Also on display at SC10 was Fujitsu's Sparc64 VIIIfx CPU. The processor will be the basis for Kei Soku Keisanki, aka the "K computer." That machine will be the culmination of Japan's Next-Generation Supercomputing Project and is expected to deliver 10 petaflops for its main customer, RIKEN (Japan's Institute of Physical and Chemical Research). The system was originally scheduled to come online in 2010, but the pullout of NEC and Hitachi last year pushed the timeline out significantly. Full deployment is now scheduled to complete in 2012.
The Sparc64 VIIIfx processor itself is an 8-core scalar CPU that can deliver 128 peak gigaflops. Energy efficiency appears to be quite respectable. A cut-down K computer system at RIKEN was ranked number four on the latest Green500 list at 828.67 megaflops per watt (or about half that of the BG/Q prototype). Keep in mind, the original K machine was supposed to contain vector hardware as well, but when NEC and Hitachi bailed, the scalar CPUs from Fujitsu were forced to carry the entire computational load. The current design puts 80,000 Sparc64 VIIIfx chips into the 10-petaflop machine.
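As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the calculation assumes peak throughput scales linearly with chip count, which ignores interconnect and other system overheads.

```python
# Back-of-the-envelope check of the K computer figures quoted above.
# Assumption: peak performance scales linearly with the number of chips.

CHIPS = 80_000                 # Sparc64 VIIIfx chips in the planned system
PEAK_GFLOPS_PER_CHIP = 128     # peak gigaflops per chip
CORES_PER_CHIP = 8

# Aggregate peak, converted from gigaflops to petaflops
peak_petaflops = CHIPS * PEAK_GFLOPS_PER_CHIP / 1_000_000
gflops_per_core = PEAK_GFLOPS_PER_CHIP / CORES_PER_CHIP

# Green500 efficiency figures cited in the article, in megaflops per watt
K_MFLOPS_PER_WATT = 828.67
BGQ_MFLOPS_PER_WATT = 1684.2
efficiency_ratio = K_MFLOPS_PER_WATT / BGQ_MFLOPS_PER_WATT

print(f"Aggregate peak: {peak_petaflops:.2f} petaflops")
print(f"Per-core peak:  {gflops_per_core:.0f} gigaflops")
print(f"K vs. BG/Q efficiency ratio: {efficiency_ratio:.2f}")
```

The aggregate peak lands slightly above 10 petaflops, consistent with the stated target, and the efficiency ratio comes out just under one-half.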
Tilera's Manycore Processors
When a company immodestly claims on its website that it "has solved the multi-processor scalability problem," one would expect the supercomputing crowd to take notice. And it has. Tilera, a maker of manycore microprocessors, has managed to attract both SGI and DARPA for HPC duty.
The 64-core Tilera processor will be an option on SGI's Prism XL supercomputer (the offspring of Project Mojo), the company's new accelerator-centric platform unveiled at SC10. Although most Prisms will likely be outfitted with GPGPUs, SGI determined that the power-sipping Tilera silicon would be a great fit for HPC-style workloads that mainly need integer acceleration -- apps like encryption, image and signal processing, network packet inspection, Web/content delivery, and media format conversion. Whether this particular configuration catches on or not remains to be seen, but you have to give SGI credit for going after market niches that other HPC vendors have largely ignored.
Tilera is also a player in DARPA's Ubiquitous High Performance Computing (UHPC) program, where its manycore tiled processor technology garnered the company a place on one of the four initial teams. Anant Agarwal, Tilera co-founder and CTO (and EE/CS professor at MIT), pitched his UHPC team's Project Angstrom at a Friday panel at SC10. In his presentation, Agarwal emphasized the performance-per-watt strength of the Tilera technology versus conventional CPUs. For example, to hit the targeted power draw of 50 mW (milliwatts) per core needed for UHPC machines, Tilera has only to modestly scale its current 200 mW per core designs. Agarwal proposes they can deliver that on 11nm technology with a 1,000-core, 50-watt chip that delivers 5 teraflops. Conventional CPUs, he argues, are going to have to undergo a deeper architectural redesign, given that they currently consume around 10 watts per core.
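To see how those Project Angstrom numbers fit together, here is a minimal sketch of the per-core arithmetic. It uses only the figures quoted from Agarwal's presentation. The variable names are made up for illustration, and it assumes power and performance are spread evenly across cores.

```python
# Per-core arithmetic for the proposed 1,000-core chip.
# Assumption: power and throughput are divided evenly across all cores.

CORES = 1_000
CHIP_WATTS = 50
CHIP_TERAFLOPS = 5
CURRENT_TILERA_MW_PER_CORE = 200     # today's Tilera designs
CONVENTIONAL_WATTS_PER_CORE = 10     # conventional CPUs, per the presentation

target_mw_per_core = CHIP_WATTS * 1000 / CORES        # watts -> milliwatts
gflops_per_core = CHIP_TERAFLOPS * 1000 / CORES       # teraflops -> gigaflops
gflops_per_watt = CHIP_TERAFLOPS * 1000 / CHIP_WATTS

# How far each starting point is from the per-core power target
tilera_reduction = CURRENT_TILERA_MW_PER_CORE / target_mw_per_core
conventional_reduction = CONVENTIONAL_WATTS_PER_CORE * 1000 / target_mw_per_core

print(f"Target power per core: {target_mw_per_core:.0f} mW")
print(f"Throughput per core:   {gflops_per_core:.0f} gigaflops")
print(f"Chip efficiency:       {gflops_per_watt:.0f} gigaflops per watt")
print(f"Tilera needs a {tilera_reduction:.0f}x cut; "
      f"conventional CPUs need {conventional_reduction:.0f}x")
```

On these numbers, Tilera is a factor of 4 away from the per-core power target, while a 10-watt core is a factor of 200 away, which is the gap behind Agarwal's argument.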
Cray's XMT Supercomputer
Cray's XMT supercomputer has been around since 2007, but has always been overshadowed by the company's mainstream XT and now XE lines. Outside the three-letter intelligence agencies and a few US DOE labs, the machine is not widely known. But Cray is apparently looking to expand its popularity. At SC10 this year, the XMT got some extra attention, appearing in the Disruptive Technologies exhibit and as the focus of its own BoF.
The XMT's forte is scalable data analytics, and the architecture has been designed with this application set in mind. Encompassing Cray's custom SeaStar2 interconnect and Threadstorm processors, the platform's principal architectural feature is globally-addressable memory, which makes it possible to run shared memory applications on the machine. All Threadstorm chips can access each other's memory (up to 8 GB per processor), making it feasible to build a system with as much as 64 terabytes of global RAM.
Unlike most shared memory machines, the XMT is built to support a lot of parallelism. Each Threadstorm CPU can manage 128 threads at a time. Combined with speedy access to random chunks of remote shared memory, the system is a much more efficient platform than a conventional distributed memory architecture for those applications that require processing of really big graph-oriented databases. This includes a lot of large-scale scientific data analysis as well as many non-technical informatics applications where the data is unstructured. Look for Cray to keep pushing this technology into this rapidly emerging space.
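The scale implied by those XMT figures can be worked out with a short calculation. This is a sketch: the processor and thread totals below are derived from the quoted per-processor numbers rather than stated by Cray here, and the conversion assumes binary units (1 TB = 1,024 GB) with every processor fully populated.

```python
# Implied maximum XMT configuration, derived from the per-processor figures.
# Assumptions: 1 TB = 1024 GB, and every processor carries the full 8 GB.

GB_PER_PROCESSOR = 8
MAX_GLOBAL_TB = 64
THREADS_PER_PROCESSOR = 128

processors = MAX_GLOBAL_TB * 1024 // GB_PER_PROCESSOR
total_threads = processors * THREADS_PER_PROCESSOR

print(f"Processors needed for {MAX_GLOBAL_TB} TB: {processors:,}")
print(f"Hardware threads across the system: {total_threads:,}")
```

Under those assumptions, a maximum-memory system would hold 8,192 Threadstorm processors managing just over a million threads, all addressing the same memory space.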
SeaMicro's Atom-Based Server
Perhaps the most obscure exotic at SC10 was SeaMicro's SM10000, an Intel Atom-based server that puts eight of the tiny processors onto a card, 512 in a 10U enclosure, and 2,048 in a rack. The Z530 processor being used is a single-core, 1.6 GHz chip that has a max TDP of a mere 2 watts. The company's pitch is that this setup requires just one-quarter of the power and space of conventional x86 servers without requiring any modifications to software. Atom is conveniently x86 compatible.
The downside is that it's a pretty low-end setup. Memory maxes out at 2 GB per card; network support is Ethernet only; and the single-core chip in the current version is 32-bit. That may be acceptable for low-precision throughput apps that need lots of parallelism but don't require any sort of tight coupling or single-threaded performance. A 64-bit version with InfiniBand or low-latency 10GbE connectivity would be a much more interesting offering. But keep an eye on the Atom. It could be the dark horse in the race for energy-efficient x86 HPC.
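The packaging numbers above imply a few more figures, sketched below. The variable names are illustrative. The power totals cover processor TDP only, so they understate real draw, which also includes memory, chipsets, networking, and cooling.

```python
# Density and CPU-only power implied by the SM10000 figures.
# Assumption: power totals count processor TDP only, not memory,
# chipsets, fabric, fans, or power-supply losses.

CPUS_PER_CARD = 8
CPUS_PER_ENCLOSURE = 512
CPUS_PER_RACK = 2048
ENCLOSURE_HEIGHT_U = 10
CPU_TDP_WATTS = 2

cards_per_enclosure = CPUS_PER_ENCLOSURE // CPUS_PER_CARD
enclosures_per_rack = CPUS_PER_RACK // CPUS_PER_ENCLOSURE
rack_units_used = enclosures_per_rack * ENCLOSURE_HEIGHT_U

cpu_watts_per_enclosure = CPUS_PER_ENCLOSURE * CPU_TDP_WATTS
cpu_watts_per_rack = CPUS_PER_RACK * CPU_TDP_WATTS

print(f"Cards per enclosure: {cards_per_enclosure}")
print(f"Enclosures per rack: {enclosures_per_rack} ({rack_units_used}U)")
print(f"CPU TDP per enclosure: {cpu_watts_per_enclosure} W")
print(f"CPU TDP per rack: {cpu_watts_per_rack} W")
```

So a full rack is four 10U enclosures of 64 cards each, with the processors themselves accounting for roughly 4 kW at maximum TDP.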
Posted by Michael Feldman - November 24, 2010 @ 3:33 PM, Pacific Standard Time
Michael Feldman is the editor of HPCwire.