HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

AMD's Server Roadmap Plots a Course for HPC


Our recent coverage of AMD's Financial Analyst Day outlined the chipmaker's overall server strategy for the next couple of years, but left a lot to the imagination with regard to how all this might play out in the high performance computing space. The presentation for the analysts barely acknowledged the HPC market, instead emphasizing AMD's main thrusts in the mainstream server and client segments. In a more recent conversation with John Fruehe, AMD's director of product marketing for the server and embedded group, we were able to get a better idea of how the company sees its HPC prospects for 2010 and beyond.

One might wonder how much AMD -- or Intel, for that matter -- thinks about the HPC market these days. Despite a better growth rate than the mainstream server market, HPC still represents only between 2 and 10 percent of server chip revenue, depending on who you talk to. In the commodity chip business, that's too small a segment to inspire separate processor designs, but too big to ignore. "The beauty of HPC," says Fruehe, "is that you have an opportunity to sell large numbers of processors in a single shot." According to him, that is reason enough to stay in the game.

And in any case, many mainstream enterprise applications require essentially the same performance characteristics as HPC workloads: large numbers of fast cores and high memory bandwidth. The soon-to-be-released 45nm Magny-Cours Opteron sports 8 or 12 cores and four memory channels. That design, says Fruehe, is well-suited to HPC workloads, and he believes it will help AMD capture more of the server market in 2010. Magny-Cours' current competition is the quad-core Nehalem EP, which has three memory channels, and the 8-core Nehalem EX, which can support up to eight sockets. The idea is that Magny-Cours will outrun Nehalem EP on memory bandwidth and out-compete Nehalem EX on price and power consumption.
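To put the memory-channel argument in rough numbers, the sketch below computes peak theoretical DRAM bandwidth per socket as channels x transfer rate x 8 bytes. It assumes both platforms run DDR3-1333 memory on 64-bit channels; that speed is an illustrative assumption, not a configuration stated by AMD or Intel, and sustained bandwidth in practice is lower and depends on how the DIMMs are populated.

```python
# Peak theoretical DRAM bandwidth per socket: channels x transfer rate x bus width.
# ASSUMPTION: both sockets use DDR3-1333 (1333 MT/s) on 64-bit (8-byte) channels.
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Return peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


magny_cours = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=1333)
nehalem_ep = peak_bandwidth_gbs(channels=3, mega_transfers_per_s=1333)

print(f"Magny-Cours (4 channels): {magny_cours:.1f} GB/s per socket")
print(f"Nehalem EP  (3 channels): {nehalem_ep:.1f} GB/s per socket")
print(f"Ratio: {magny_cours / nehalem_ep:.2f}x")
```

At equal memory speed the advantage is simply the channel ratio, 4/3; the real-world gap would shift with whatever DIMM speeds each platform actually supports.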

The Magny-Cours Opteron represents AMD's high-end 6000 series (G34 socket) and is aimed at 2P and 4P servers. The chips in this series emphasize performance and scalability, so they would tend to be the first choice for HPC. AMD thinks that combining 2P and 4P support under one platform will encourage sales of more 4-socket servers, which up until now have represented only about 5 percent of the company's server sales. The problem used to be that the price premium for 4P chips meant that the economics of two 2P servers usually made more sense, even if the application was better off on a four-socket configuration. The G34 chips change that calculation. If the application is not bound by the cluster's interconnect performance (i.e., it is more compute-intensive than I/O-intensive), the user should be able to save money by eliminating the extra switches and adapter cards required by the additional nodes. AMD's advice is that if an HPC application is not saturating the InfiniBand interconnect, a cluster of four-socket nodes may be the way to go.
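The 2P-versus-4P trade-off can be sketched as a simple cost model. Everything here is hypothetical: the `NodeConfig` class is made up for illustration, and the prices are placeholders chosen only to show the shape of the calculation, not AMD or vendor figures.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical cluster-node configuration; all prices are placeholders."""
    sockets_per_node: int
    node_price: float         # board, chassis, memory and CPUs (hypothetical)
    adapter_price: float      # one InfiniBand adapter per node (hypothetical)
    switch_port_price: float  # one switch port per node (hypothetical)

    def nodes_needed(self, total_sockets: int) -> int:
        # Ceiling division: a partially filled node still has to be bought whole.
        return -(-total_sockets // self.sockets_per_node)

    def cluster_cost(self, total_sockets: int) -> float:
        per_node = self.node_price + self.adapter_price + self.switch_port_price
        return self.nodes_needed(total_sockets) * per_node


# Placeholder prices: the 4P node carries a modest premium over two 2P nodes.
two_socket = NodeConfig(2, node_price=6000.0, adapter_price=700.0, switch_port_price=300.0)
four_socket = NodeConfig(4, node_price=12500.0, adapter_price=700.0, switch_port_price=300.0)

SOCKETS = 128
cost_2p = two_socket.cluster_cost(SOCKETS)   # 64 nodes
cost_4p = four_socket.cluster_cost(SOCKETS)  # 32 nodes
print(f"2P build: {two_socket.nodes_needed(SOCKETS)} nodes, ${cost_2p:,.0f}")
print(f"4P build: {four_socket.nodes_needed(SOCKETS)} nodes, ${cost_4p:,.0f}")
```

With these made-up numbers the 4P build comes out cheaper because it halves the number of adapters and switch ports. The model deliberately ignores whether 32 InfiniBand links can carry the traffic that 64 carried before, which is exactly the caveat AMD attaches to its advice.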

In Q2 2010, AMD is expected to launch the 4- and 6-core "Lisbon" Opteron chips, which represent the lower-end 4000 series (C32 socket), supporting 1P and 2P servers. This series is aimed more toward Web-tier enterprise apps and cloud infrastructure, where power efficiency and less computationally dense cluster nodes are the rule. But Fruehe says the 4000 series is also suitable for what he calls "corporate HPC": banks, auto companies, and the like, where clusters of 32 to 128 nodes predominate. In that world, more energy-efficient but less powerful processors are often the answer.

In 2011, AMD will move to the "Bulldozer" architecture, the first server products being the 16-core "Interlagos" and the 8-core "Valencia." These chips are G34 and C32 socket-compatible, respectively, so they can be drop-in replacements for HPC customers looking to do processor-only upgrades. Probably the biggest change in Bulldozer is the relationship between the integer and floating point units. Up until now each integer core on an Opteron was mapped to a single 128-bit FPU. But at least in the first implementation of Bulldozer, two integer cores are paired with a double-wide (256-bit) floating point unit. The goal here is to make FP processing more flexible. With the Bulldozer design, a single integer core can claim the entire FPU and schedule 256-bit FP operations or, alternatively, each core can run 128-bit operations. Fruehe says they also plan to reveal other FPU enhancements designed to bump up performance.
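The FPU-sharing arrangement can be illustrated with a toy cycle-count model. This is a deliberate simplification for illustration, not AMD's actual scheduler or pipeline: it assumes the shared unit processes 256 bits per cycle, that each integer core issues at most one FP op per cycle, and that the two cores alternate first pick. The function name `simulate_module` is hypothetical.

```python
from collections import deque

FPU_WIDTH_BITS = 256          # assumed width of the shared FP unit per cycle
VALID_OP_WIDTHS = (128, 256)  # op widths this toy model accepts


def simulate_module(core0_ops, core1_ops):
    """Return cycles needed to drain two cores' FP op queues through one shared FPU.

    Each list holds op widths in bits. Per cycle, the core with first pick
    issues its next op, then the other core issues its next op if it fits in
    the remaining width. Each core issues at most one op per cycle here.
    """
    for width in list(core0_ops) + list(core1_ops):
        if width not in VALID_OP_WIDTHS:
            raise ValueError(f"unsupported op width: {width}")

    queues = [deque(core0_ops), deque(core1_ops)]
    first = 0   # core that picks first this cycle (alternates for fairness)
    cycles = 0
    while queues[0] or queues[1]:
        free = FPU_WIDTH_BITS
        for core in (first, 1 - first):
            q = queues[core]
            if q and q[0] <= free:
                free -= q.popleft()
        first = 1 - first
        cycles += 1
    return cycles


print(simulate_module([128] * 4, [128] * 4))  # prints 4: both cores co-issue 128-bit ops
print(simulate_module([256] * 4, []))         # prints 4: one core claims the whole FPU
print(simulate_module([256] * 4, [256] * 4))  # prints 8: 256-bit ops must take turns
```

In this model two cores running 128-bit code share the unit with no penalty, a lone core gets full 256-bit width, and two cores that both want 256-bit ops serialize, which is the flexibility trade-off a shared FPU implies.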

The other significant piece of the Bulldozer story is its modular design, which allows chips with different core counts to be built from the same silicon blueprint. Intel introduced a modular design with its Nehalem architecture, but for AMD modularity has special significance, since it can start to think about adding ATI GPU modules onto the die. Fruehe implied that AMD is reconsidering the roles of FPUs and GPUs in general-purpose computing, a rethink that could lead to chip designs where the FP capability is enhanced or even completely replaced by GPU modules. "We think that, down the road, you're going to see more customers utilizing a GPU as a way to enhance the floating point performance," he says.

It's a bit premature for AMD to talk about GPU modularity, at least in the server context. Currently there are no public plans to mingle ATI GPU silicon with Opteron CPUs. The CPU-GPU "Fusion" chips on AMD's existing roadmap are for client-side computing and use an architecture known as the Accelerated Processing Unit (APU). The big hurdles for integrating high-end GPUs with high-end CPUs are the lack of die real estate and power limitations. Until AMD moves to the 22nm process node, it will be difficult to get teraflop-level GPUs sharing silicon with server CPUs.

In the near term, AMD's goal is to find a way to bring the GPU closer to the CPU in order to get better performance. Although Fruehe wouldn't elaborate on how this might be implemented, the company basically has two choices: HyperTransport and PCIe. The PCIe route would be easier from the GPU point of view, since existing graphics cards are already PCIe compatible. Coupling CPUs more tightly to PCIe could involve changes to the socket and chip design or, more simply, the addition of a HyperTransport-to-PCIe bridge tunnel on the board. The HyperTransport-only option would most likely involve devising an Opteron socket-compatible GPU, which could then act as a native coprocessor, as XtremeData has done with its FPGA coprocessor.
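For a rough sense of what "closer" could mean in bandwidth terms, the sketch below compares the peak per-direction throughput of the two candidate links and the time to move a buffer across each. The link parameters are assumptions for illustration only: a PCIe 2.0 x16 slot (5 GT/s per lane with 8b/10b encoding) and a 16-bit HyperTransport 3.0 link clocked at 3.2 GHz (double data rate). It ignores protocol overhead beyond line encoding, and it ignores latency, which is where tighter coupling may matter most. AMD did not say which route or link speeds it would use.

```python
def pcie_gbs(gt_per_s: float, lanes: int, encoding_efficiency: float) -> float:
    """Peak per-direction PCIe throughput in GB/s (one bit per transfer per lane)."""
    return gt_per_s * lanes * encoding_efficiency / 8


def hypertransport_gbs(link_clock_ghz: float, width_bits: int) -> float:
    """Peak per-direction HyperTransport throughput in GB/s (double data rate)."""
    return link_clock_ghz * 2 * width_bits / 8


def transfer_ms(num_bytes: float, gb_per_s: float) -> float:
    """Time in milliseconds to move num_bytes at a sustained gb_per_s (10**9 B/s)."""
    return num_bytes / (gb_per_s * 1e9) * 1e3


# Assumed link configurations (illustrative, not from AMD).
pcie2_x16 = pcie_gbs(gt_per_s=5.0, lanes=16, encoding_efficiency=8 / 10)
ht3_x16 = hypertransport_gbs(link_clock_ghz=3.2, width_bits=16)

BUFFER = 1e9  # 1 GB of input data headed for the GPU
print(f"PCIe 2.0 x16:  {pcie2_x16:.1f} GB/s, {transfer_ms(BUFFER, pcie2_x16):.1f} ms")
print(f"HT 3.0 16-bit: {ht3_x16:.1f} GB/s, {transfer_ms(BUFFER, ht3_x16):.1f} ms")
```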

In any case, it makes sense for AMD to maximize the synergy between its GPU and CPU offerings, since it is the only chipmaker with a deep portfolio in both architectures. Until Intel resurrects its Larrabee product line or NVIDIA decides it can't do without a CPU offering, AMD will remain in that unique position. From Fruehe's perspective, that is a bigger advantage than anything its rivals can claim.

Of course, Intel can claim the high ground in semiconductor process technology, pointing to its 6- to 12-month lead over AMD. This week, Intel previewed its first 32nm Westmere parts, "Clarkdale" for desktop PCs and "Arrandale" for laptops, and is expected to launch the Westmere EP server chip later this year -- perhaps as early as March. The company's manufacturing prowess gives it certain advantages in production costs and generally better raw performance numbers. But that doesn't make up for the lack of a high-end graphics product. "Intel is much further behind us in GPU technology than we're behind them in process technology," notes Fruehe. As Intel recently found out, building a high-end graphics engine is a lot harder than shrinking transistors.
