IBM Unlocks the Cell

By Nicole Hemsoth

September 15, 2006

Last week, the DOE's National Nuclear Security Administration selected IBM to design and build the world's first supercomputer that will use both Cell Broadband Engine (Cell BE) processors and conventional AMD Opteron processors. The petaflop machine, code-named Roadrunner, is scheduled to be deployed at Los Alamos National Laboratory sometime in 2008. This not only represents IBM's first supercomputer containing Cell processors, but it also signifies the company's first large-scale heterogeneous system deployment.

HPCwire got the opportunity to talk with David Turek, vice president of Deep Computing at IBM, about the new system. In this extended interview, Turek reveals IBM's strategy behind the Roadrunner platform and how it fits into the company's supercomputing plans. He also discusses IBM's overall approach to hardware accelerators and heterogeneous computing.

HPCwire: What is the significance of the Roadrunner deployment? Is it a one-off system or does it represent the start of a new line of IBM supercomputers?

Turek: The significance of Roadrunner is that this is our preferred architectural design for the deployment of Cell in the HPC application arena. To be clear, we have no plans to build a giant cluster just out of Cell processors. Instead, we think the Roadrunner model, which employs Cell as an accelerator to a conventional microprocessor-based server, is the correct model.

Over the course of time, we expect accelerators to become a key element to our overarching strategy. So the work that we do here is designed, in particular, to be sufficiently general to encompass a variety of models on how accelerators might be deployed.

Our intention with respect to a more broadly propagated version of Roadrunner is an assignment we've given ourselves for the fall: to see exactly how far this can be extended and how deeply it can be played in the marketplace. First, we've got to resolve programming model issues. Secondly, the early Cell deployment is based on single precision floating point; that's going to go to double precision [for the final deployment]. So there's work to be done here to see exactly how this plays out.

In a sense this is no different than our launch of Blue Gene, which nominally was targeted to a very narrow set of applications, but which over the course of time demonstrated much broader utility. And if you go back still further in time, when we launched the SP system back in the 90s, we viewed that as a more niche product; and that too became more broadly deployed.

So this is an addition to our portfolio. It is not meant to displace or replace anything. We just think that the diversity of application types is such that there will be a need for a broader portfolio rather than a narrower one.

HPCwire: Are you looking at other accelerator devices besides Cell?

Turek: Always. Our technology outlook is pretty broad. We're looking at trends several years in the future. So we've been looking at a variety of schemes for acceleration, and it goes beyond just looking at the conventional idea of using an FPGA for an accelerator — which, by the way, we don't think is a good idea. And it goes as far as us beginning to think about system level acceleration as it applies to workflow, as opposed to process level acceleration as it applies to specific applications.

Let's look at process-level optimization and application decomposition and see how that maps to these kinds of models of acceleration that are embodied in Roadrunner. We know that a lot of people will experiment with and use accelerators. We can't be specific about what they'll all look like over the course of time. But we think that if we get the programming model right, it should be extendable to cover a more diverse range of accelerator [architectures].

So, for all the right reasons, we're extraordinarily proud of Cell and we think it has a huge opportunity to make a terrific impact in a variety of market segments. But we're not blind to the fact that other people can or have developed accelerator technologies.

HPCwire: While the Cell architecture certainly has generated a lot of interest in the HPC community, some of the people I've talked to have expressed doubts about the suitability of Cell for mainstream scientific and technical computing.

Turek: That's why I drew such a stark distinction at the beginning about our having no plans to build just a Cell-based cluster. I think when you talk to people and pose the question the way you did, many people will naturally assume that we're going to have a system based entirely on Cell processors and that's it. Under that scenario, we would agree: that would be a bit of a stretch. On the other hand, after a lot of thoughtful analysis over many months, both internally and in collaboration with the teams at Los Alamos (as we got involved in responding to the RFP), we concluded that deploying Cell as an accelerator to a conventional architecture was a better way to go.

HPCwire: You said that the final deployment of Roadrunner will incorporate a double precision floating point implementation of the Cell processor. What will you be accomplishing in the early stages of Roadrunner that uses the single precision version of Cell?

Turek: The early deployments of Cell are really meant to help us deploy and debug all the software tools and the programming model. All of that gets preserved regardless of whether you're using single or double precision. Then, as we go down the path of producing the double precision Cell B.E., it will be more a matter of deployment and scaling than of specifying programming models, software tools and things of that sort.

HPCwire: On a related topic, are you interested in the work Jack Dongarra is doing with the Cell, using single precision hardware to provide double precision math in software [see Less is More: Exploiting Single Precision Math in HPC]?

Turek: Absolutely. We talk to Jack all the time about this. I think we may experiment with it or have our other Cell collaborators experiment with it — if Jack's OK with that. We consider the work Jack is doing to be very, very important as is the work of all of our other collaborators. By the way, there are many such individuals, spread across many universities around the world.

So we'll talk to Jack and look at that pretty seriously. If we all have a meeting of the minds about how to begin to deploy this, we will let clients like Los Alamos or maybe others make use of that technology. Absolutely, we will do that.
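The technique behind the question above, using single-precision hardware to deliver double-precision answers, is generally known as mixed-precision iterative refinement: factor the matrix once in single precision, then repeatedly compute the residual in double precision and solve for a correction with the cheap single-precision factors. The following is a minimal pure-Python sketch, assuming a small, well-conditioned dense system. Single precision is emulated by rounding every intermediate result through `struct`, and the function names are illustrative only; this is not Dongarra's code or any Cell library.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_single(a):
    """LU factorization with partial pivoting, every operation rounded to single precision.
    Returns the packed L\\U matrix and the row permutation (PA = LU)."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm

def lu_solve_single(lu, perm, rhs):
    """Solve with the single-precision factors (forward, then back substitution)."""
    n = len(lu)
    y = [f32(rhs[perm[i]]) for i in range(n)]
    for i in range(n):  # L has an implicit unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y

def refine(a, b, iters=5):
    """Mixed-precision iterative refinement toward a double-precision solution."""
    lu, perm = lu_factor_single(a)  # the O(n^3) step, done once in single precision
    x = lu_solve_single(lu, perm, b)
    for _ in range(iters):
        # Residual r = b - A x, computed in double precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        d = lu_solve_single(lu, perm, r)  # correction from the single-precision factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in double precision
    return x
```

For example, with `A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]` and `b` built from a known double-precision solution, the plain single-precision solve is only accurate to roughly 1e-8, while `refine(A, b)` recovers the solution to near double-precision accuracy. The expensive factorization never leaves single precision; only the O(n²) residual and update need double precision. The method converges only when the matrix's condition number is well below the reciprocal of single-precision machine epsilon (about 10^7).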

HPCwire: You mentioned you're not really interested in FPGAs as accelerators. Why is that?

Turek: Because they're really hard to program and they're pretty expensive, relatively speaking. We think they're really good for prototyping. But we believe a better model is to put that [functionality] into a custom ASIC or something else. I'm not convinced that the software tools and the other things you need for programming them will ever make it, fundamentally. But I think a model built on custom ASICs or things like Cell, which can take advantage of conventional high-level programming languages and compilers, etc. (and yes, there's work to be done here on programming models), is probably going to be a more effective way to get the kinds of speedups nominally associated with acceleration strategies.

I mean if you look, for example, at the XD1 system that Cray offered, I don't think there is much uptake in the market for that technology. I think the utilization of FPGAs in that was probably fairly scant — you'd have to talk to Cray about that and get some facts on it. There's clearly been more interest from companies talking about things like ClearSpeed [co-processors].

HPCwire: How do you envision applications will be deployed on Roadrunner?

Turek: The design of Roadrunner can be looked at in a couple of different ways. First of all, by having a very large Opteron cluster as kind of the workhorse part of the system, one could choose just to deploy applications quite conventionally on that cluster to achieve the expected benefit. The second thing is that the system has flexibility through the deployment of Cell processors as accelerators, in conjunction with the Opteron cluster, which gives you something like a “turbo-boost” on applications that are capable of exploiting the acceleration. So with Roadrunner, you have choices. You can deploy applications conventionally — read that as MPI — and then you can marry that with a model that uses library calls to give you access to the compute power of the Cell.
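Turek describes two ways to use the machine: run an application conventionally on the Opteron cluster, or keep that structure and call a library that hands compute-heavy kernels to the Cell. IBM had not published the Roadrunner APIs at the time of this interview, so the sketch below is purely hypothetical. `accelerated_saxpy`, `host_saxpy` and `compute_step` are illustrative names, the MPI layer is omitted, and a thread pool with eight workers merely stands in for the eight SPEs of one Cell. It shows only the call structure (the same application code with an optional accelerated kernel), not any real performance behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence

def host_saxpy(a: float, x: Sequence[float], y: Sequence[float]) -> list:
    """Conventional path: the kernel runs on the host processor."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def accelerated_saxpy(a: float, x: Sequence[float], y: Sequence[float],
                      workers: int = 8) -> list:
    """Hypothetical accelerator library call: split the vectors into one block per
    worker (8 here, echoing the Cell's eight SPEs) and run the kernel on each block."""
    n = len(x)
    bounds = [(i * n // workers, (i + 1) * n // workers) for i in range(workers)]

    def block(lo_hi):
        lo, hi = lo_hi
        return [a * x[i] + y[i] for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(block, bounds))  # map preserves block order
    return [v for part in parts for v in part]

def compute_step(a: float, x: Sequence[float], y: Sequence[float],
                 accel: Optional[Callable] = None) -> list:
    """Application code is unchanged; only the kernel behind the call differs."""
    kernel = accel if accel is not None else host_saxpy
    return kernel(a, x, y)
```

Calling `compute_step(2.0, x, y)` uses the host path, and `compute_step(2.0, x, y, accel=accelerated_saxpy)` produces the same result through the offload path. Because the accelerated path is reached through an ordinary function call, an application could start out purely conventional and adopt the accelerator one kernel at a time, which is the flexibility the answer above describes.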

HPCwire: Roadrunner is described as containing 16,000 Opteron processors and 16,000 Cell processors. What's the significance of the one-to-one ratio of Opterons to Cells?

Turek: So, I'll be the first to say that we don't know everything. I think that all these ratios are going to have to be explored in more detail. Right now, for example, when you look at the Cell processor, it's one conventional processing engine and eight SPEs [Synergistic Processing Elements]. Well, you could ask the same question there. Is that the right ratio? I think that it's premature on the part of anybody to be declarative on this topic.

In the context of the Los Alamos application, we've thought this through carefully and believe this is the right plan. Do we think that there's no evidence in the world that would cause us to move away from this? Clearly not. I think as we get into deeper stages of development, both in software and deployment of hardware, and start running real applications (as opposed to running simulations), we're bound to learn something. And I will tell you that if what we learn says you need to tweak this a bit and go this way instead of that way, then we will absolutely do that to give our client the best possible performance.

HPCwire: Is the Blue Gene technology heading for petaflops in its roadmap as well?

Turek: I think the natural progression of what we're doing on these platforms is clearly to anticipate multi-petaflop systems down the road. So sure, if you look at Blue Gene today, the only thing that separates you from the deployment of a petaflop system is money. The future designs factor in a whole lot of other things — not only how you make a petaflop affordable, but also how do you open the aperture to an enhanced set of applications. Basically, this is a reflection upon the experience that we, along with our collaborators, have had over the past year and a half with Blue Gene. And you make adjustments along the way. So do we have an intention to drive the Blue Gene map forward? Absolutely.

And it's not at all in conflict with what we're doing here with Roadrunner because they're different programming models. For us, that's a key point of differentiation. Right now it looks like they may serve different application sets differently. For us that's fine.

We've never been strong believers in the notion that high performance computing, as a market segment, is homogeneous, or, by implication, that the applications that characterize it are homogeneous. And I think that's partly caused by the fact that when we talk about high performance computing, we expand it to include applications that you'll find in financial services, digital media, business intelligence, etc. So we probably have a broader conceptualization of the marketplace than some of the niche players may have. As a result, it conspires to cause us to have a broader portfolio than some of those players might have.

HPCwire: With that in mind, what kinds of application differentiation do you see between Roadrunner and Blue Gene?

Turek: Clearly, Roadrunner represents a bigger memory model than Blue Gene. But it also has a different kind of programming model. Today, for example, MPI applications, in almost 100 percent of the cases, are capable of being ported to Blue Gene, usually within a day, with reasonably good performance. Tuning, we've discovered, takes maybe another two to five days to get really outstanding performance. With respect to the Roadrunner model, that's going to be a bit different because of the way that system is architected. We'll reveal more details about the Roadrunner APIs down the road; it's a little premature to do that now. We'll go public with that sometime this fall, for sure.

There are a lot of things that we can do in regards to mapping applications to the SPEs on the Cell processor. And there's a lot we can do in the evolution of the Cell processor. So for us this is just another integral part of our portfolio that we've got to sort out in the context of our existing technologies, mapped against how we see the development of different market segments. I can understand a small company or niche company saying “Well, IBM has two, three or four things, whatever the case may be.” But our view is that it's a big market that is intrinsically diverse and it's actually what is required if you are really committed to serving the needs of your clients.

Consider really good scale-out applications, for example Qbox, which right now operates at 207 teraflops sustained on Blue Gene at Livermore. Are you going to get better performance if you port it and tune it to Roadrunner? My guess is probably not. And the reason for that is that the architecture of the Qbox application is something that does really well with the kind of memory subsystem characteristic of Blue Gene as well as the scale-out aspects of the networks in Blue Gene. For example Roadrunner doesn't have the multiple network model that Blue Gene has. And as a result there are applications where the scalability won't be there. The important thing, though, is that in the context of the applications that are characteristic of Los Alamos, there is a high degree of confidence that the design of Roadrunner is actually more appropriate for those applications than alternative architectures.

So this brings me back full circle. You have to let the algorithms and the applications dictate the nature of the architectures you deploy.

HPCwire: Your Roadrunner “Hybrid Programming” software model sounds similar to Cray's “Adaptive Computing” vision. How would you compare the two?

Turek: Well, ours is real.

HPCwire: In what sense?

Turek: It exists. We're working on it. The APIs are defined. The programming is underway. We're committed to it as an important and strategic element of what we're doing.

It's hard for me to comment on the “Adaptive Computing” model from Cray. I guess it was meant to be some sort of universal solution, encompassing a broad range of architectures, all under one roof — scalars, vectors, FPGAs, etc. I don't know how that all works. So I would say it was more a statement of intention than a development plan.

With respect to the contract we signed with Los Alamos, we have a development plan. It's outside of the stage of intention. So when I say it's real, I mean the corporation has committed itself to execute on this and it will get done. It's different than making a speech and outlining a vision.

As far as I know, no one has signed a contract with Cray for an “Adaptive Computing” implementation. I don't know how to comment on its existence other than it's a statement of intent. With respect to Roadrunner, we have a contract with deliverables that start this fall. So I know that is concrete and real. And we're committed to it. That is the difference between “easy to say” and “hard to do.” By the way, we're not paying attention to what Cray is doing here. We have a keen understanding of the architectural needs embodied in Roadrunner and we're executing on that in the context of a pretty diverse application portfolio, which we think will help generalize what's embedded in the Roadrunner APIs. That's what we have to worry about; we don't need to worry about the musings of what someone might do sometime in the future.

[Editor's note: See Cray's response to these remarks below.]

HPCwire: You said that the Los Alamos deployment would begin in the fall. Do you think you'll be demonstrating something Roadrunner-like at the Supercomputer Conference in November?

Turek: I wouldn't be surprised. But remember, what we're talking about for 2006 will be heavy on the Opteron deliveries and lighter on Cell because we'll be focusing on the development of the programming model rather than on Cell performance. So in the context of doing demos and getting the “gee golly” kind of attention, I'm not sure that's what we'll be looking for at Supercomputing. I mean we've run demos for some time now at Supercomputing with Cell. And if you show the right visualization applications, people say “Wow, this is pretty cool.” There are going to be a lot of things coming out this fall that are going to demonstrate that Cell is pretty cool. But I think we will do something at Supercomputing and it's going to open the eyes of a lot of people.

HPCwire: Do you think the reaction to this new technology will be different from that of Blue Gene when it first started?

Turek: You've got to remember, two years ago, there were a lot of people in the industry that pooh-poohed Blue Gene. They said: “The microprocessor is not fast enough, there's not enough memory and here's all the things it can't do.” And every time somebody said that to us or one of our clients, we put a little attention on it and without any dramatization, we said “No it really can do these things.”

I would characterize our activities on the Roadrunner project as being entirely pragmatic and empirical. We're moving away from discussions of theory, speculation and vision. So we're just going to build the damn thing and see what it really does.

We've committed a lot of resources to the government to do this and we're going to do everything we can to make it a success. But personally, I'm not going to pay a lot of attention to people sitting on the sidelines giving me theoretical reasons why it won't be good or it can't work or what have you. We paid attention to that in Blue Gene and it turned out that most of those people sitting on the sidelines didn't know what they were talking about. We'll let the facts speak for themselves.


In response to David Turek's remarks about Cray's Adaptive Computing vision, Jan Silverman, Cray senior vice president for corporate strategy and business development, responds:

“Industry experts that have been following Cray's product roadmap and Adaptive Supercomputing vision are aware of both our plans and progress to date – and understand that what Cray is doing is 'real.'

“Cray's Adaptive Supercomputing Vision, which we are implementing through a long-term collaboration with AMD and other technology partners, is exciting to customers and is progressing on schedule. The implementation strategy is to develop, in stages now through 2010, supercomputing products that increasingly adapt to applications by applying the optimal processor type to each application, or portion of an application. These systems will also be more productive, easier to program and more robust than any contemporary HPC system.

“Cray is uniquely qualified to execute on our Adaptive Supercomputing vision, because we have systems in the marketplace today with four processor types (AMD Opteron microprocessors, vector processors, multithreaded processors, FPGAs). We plan to deliver all of these processor capabilities into a single, tightly coupled system by the end of 2007. After 2007, we will add many more advances to make our Adaptive Supercomputing platform adapt to applications more transparently.

“The decision by the DOE Office of Science and Oak Ridge National Laboratory to award Cray the world's first order for a petascale supercomputer was influenced by their excitement about our Adaptive Supercomputing vision and their confidence in our ability to achieve it on time. NERSC, which recently returned to Cray as a customer with an initial order for a 100-teraflop system, is also enthusiastic about Adaptive Supercomputing.

“Cray looks forward to providing HPC users with Adaptive Supercomputing systems; IBM and others seem to be following Cray's lead by recognizing the importance of complementing industry-standard microprocessors with other types of processors. We consider this another proof point that the path Cray's R&D organization has been actively pursuing is the right one.”
