Cray: ‘100 Percent Focused on HPC’

By Christopher Lazou, HiPerCom Consultants

July 8, 2005

There's no doubt about it: Cray is all about HPC. Here, CTO Steve Scott opens up to Chris Lazou about some recent developments at Cray, and elaborates on the company's plans to lead the HPC market.


Christopher Lazou: Steve, it's good that you can spare some time to talk to me. As chief technology officer at Cray, and with so many new products delivered recently to the market, you must be an extremely busy man. Let's briefly discuss what is needed to build successful high productivity HPC systems and, in the process, try to gain insight into your views concerning HPC futures.

At SC04 in Pittsburgh last November, Cray received several [HPCwire] awards for the new Cray XT3 (Red Storm) and Cray XD1 product lines. You got awards for the most important emerging technology, the most innovative HPC hardware technology and the best collaboration between government and industry. Can you sum up the new products and what's happening with them?

Steve Scott: Since SC04, Cray introduced the Cray X1E system, a major upgrade to the successful Cray X1 vector supercomputer. The Cray X1E is a compatible, board-swap upgrade with nearly three times the computational power of the Cray X1 system. We also began shipping the high-end Cray XT3 and larger versions of the mid-range Cray XD1 systems. Both exploit AMD Opteron and HyperTransport technology in Cray architectures designed for strong balance, scalability and reliability. We've sold large-scale Cray XT3 systems in the U.S. and around the world, including two in Japan and our first in Europe, at CSCS in Switzerland. We have sold Cray XD1 systems in many countries, including eight countries in Europe, so our installed base is getting stronger. The Cray XD1 system recently began shipping with dual-core Opteron processors, and we'll begin shipping Cray XT3 dual-core systems early next year.

The Cray X1E product line is also doing well in the market. I believe every one of our Cray X1 sites has upgraded or will soon upgrade to the Cray X1E. We recently announced that a 128-processor Cray X1E with 320 Gbytes of memory is doing production weather forecasting and climate modeling at Spain's National Institute of Meteorology (INM). In December 2004, Warsaw University's Interdisciplinary Center for Mathematical and Computational Modeling (ICM) became the first customer to receive a Cray X1E system. The Korea Meteorological Administration (KMA) will operate one of the largest numerical weather forecasting systems in the world when it upgrades later this year to a 16-teraflop/s Cray X1E supercomputer. Oak Ridge National Laboratory (ORNL) will run a 20-teraflop/s Cray X1E supercomputer along with a 20-teraflop/s Cray XT3 system, as part of the DOE's plan to build the world's most powerful supercomputing capability for open, non-classified scientific research at ORNL.

Lazou: In my January article, “Going for Gold in a Computer Olympiad” (http://www.taborcommunications.com/hpcwire/hpcwireWWW/05/0121/109098.html), which used the HPC Challenge (HPCC) benchmark tests to measure performance, Cray did very well. Cray products won a total of four gold, four silver and one bronze medal. Six months on, how is Cray faring now?

Scott: Newer results show the Cray systems performing even better. As of June 15, the Cray XT3 had the best scores on 7 of 10 condensed results tests, compared to an IBM Blue Gene and an SGI Altix system, the only other systems with posted results for over 1000 processors. We've since posted results for a 3,744-processor Cray XT3. That system has the fastest posted results on all but one of the global metrics. In comparisons of 128-processor scalar systems, the Cray XD1 was first in four tests, more than for any other microprocessor-based product.

Though the High Performance Linpack (HPL) benchmark has made an important contribution over the years by providing a census of the largest HPC systems and trends, it has also been clear for some time that a much more comprehensive set of performance measures was required. As Jack Dongarra has stressed, no single test can accurately predict how an HPC system will perform across a spectrum of real-world problems. Rather than a single tool and a single number, we needed a set of high-level tests that could provide a reasonable indication of how well different kinds of applications might actually run on different systems. This is exactly what the HPCC benchmark suite does. Thanks to Jack and his colleagues, we now have an objective way to identify and reward desired architectural features and overall balance. It's exciting that HPCC is starting to be used for important procurements, like the recent “Horizon” procurement that resulted in CSCS selecting a 5.9-teraflop/s Cray XT3 system.
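
The suite Scott describes spans dense linear algebra (HPL, DGEMM), sustained memory bandwidth (STREAM), parallel matrix transpose (PTRANS), large FFTs, interconnect latency and bandwidth tests, and RandomAccess, which measures giga-updates per second (GUPS) to random memory locations. The sketch below is a simplified, single-node version of the RandomAccess update loop; the table size, update count and starting seed are illustrative, not the official parameters. It shows why the test rewards low memory and network latency rather than peak flops:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define POLY 0x0000000000000007ULL  /* feedback polynomial for the pseudo-random stream */

    int main(void)
    {
        /* Illustrative sizes only; the real benchmark sizes the table to roughly
           half of system memory and performs four updates per table entry. */
        const size_t table_size = (size_t)1 << 24;   /* 16M 64-bit words = 128 MB */
        const size_t n_updates  = 4 * table_size;

        uint64_t *table = malloc(table_size * sizeof *table);
        if (table == NULL) return 1;
        for (size_t i = 0; i < table_size; i++) table[i] = i;

        uint64_t ran = 1;                            /* illustrative seed */
        for (size_t i = 0; i < n_updates; i++) {
            /* 64-bit shift-register pseudo-random sequence */
            ran = (ran << 1) ^ (((int64_t)ran < 0) ? POLY : 0);
            /* Read-modify-write at an essentially random address: hostile to
               caches and prefetchers, so access latency dominates performance. */
            table[ran & (table_size - 1)] ^= ran;
        }

        printf("table[0] = %llu\n", (unsigned long long)table[0]);
        free(table);
        return 0;
    }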

Lazou: Chip manufacturers have managed to refine their process technology to produce denser chips. 90nm processes are in use at present and 65nm is in the pipeline. Can you give some indication of Cray products coming in the near future? What are the engineering challenges to overcome and, since Cray is not a chip vendor, how are you sourcing your needs?

Scott: Cray systems use both commodity and custom-designed chips. Our Cray XT3 and Cray XD1 supercomputers use AMD Opterons as their compute processors, and we can drop the latest processors into these products as they become available. As I mentioned earlier, we're already shipping XD1 systems with dual-core Opterons. We do our own designs for system ASICs, as well as the vector processors in the Cray X1E and follow-on systems. We're currently working on both 90nm and 65nm designs in house, and are sourcing our chips from multiple fabs. Not doing our own fabrication isn't a problem at all.

Lazou: The Red Storm product line uses an AMD Opteron processor and a chipset with high-bandwidth memory and a network interconnect developed by Cray. In what way does this differ from the many systems offered by vendors using the Itanium Processor Family (IPF) grafted onto their own chipsets?

Scott: We chose the AMD Opteron processors because they have a couple of important attributes in their favor. First, AMD integrated the memory controller on-chip. That enables very low memory latency (about half that of competitors), and it provides scalable bandwidth with the addition of more processors, since the processors don't share a front side memory bus. Second, the Opteron's HyperTransport connections provide high bandwidth, low latency connections directly into the memory system. We then connect the HyperTransport link directly into our network, whereas Opteron-based clusters connect this link to an IO bridge and then connect the network through an IO bus like PCI-X.
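
A back-of-envelope model makes both points concrete. The numbers below are hypothetical placeholders, not measured figures for any product; the sketch only illustrates why per-socket memory controllers let aggregate bandwidth grow with the number of processors while a shared front-side bus does not, and why inserting an I/O bridge and a PCI-X hop adds injection latency that a direct HyperTransport attach avoids:

    #include <stdio.h>

    /* Back-of-envelope model of the two points above. All numbers are
       hypothetical placeholders for illustration, not measured figures. */
    int main(void)
    {
        /* 1. Integrated memory controller: per-socket bandwidth adds up,
              whereas a shared front-side bus is a fixed ceiling. */
        const double per_socket_gbs = 6.0;   /* hypothetical GB/s per Opteron socket */
        const double shared_fsb_gbs = 6.0;   /* hypothetical GB/s for a shared bus   */
        for (int sockets = 1; sockets <= 4; sockets++)
            printf("%d sockets: integrated %.0f GB/s vs shared bus %.0f GB/s\n",
                   sockets, sockets * per_socket_gbs, shared_fsb_gbs);

        /* 2. Network attach: an I/O bridge plus a PCI-X hop adds latency that
              a direct HyperTransport attach to the network avoids. */
        const double ht_hop_us   = 0.1;      /* hypothetical */
        const double bridge_us   = 0.3;      /* hypothetical */
        const double pcix_us     = 0.5;      /* hypothetical */
        const double nic_wire_us = 1.0;      /* hypothetical, same for both paths */
        printf("direct HT attach : ~%.1f us injection overhead\n",
               ht_hop_us + nic_wire_us);
        printf("bridge + PCI-X   : ~%.1f us injection overhead\n",
               ht_hop_us + bridge_us + pcix_us + nic_wire_us);
        return 0;
    }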

Lazou: What about performance issues? How are you solving the “memory-CPU” gap? Are memory bandwidth and latency still the key architectural features, or is the architecture going to be turned around, putting the network center stage?

Scott: Memory bandwidth and latency mostly affect single-thread performance, while the network is key for implementing parallelism. In HPC, both single-thread and parallel performance are important.

The “memory gap” is caused by the increasing relative latency of memory accesses. There are several ways to hide latency, including the use of vectors or multithreading. This is an area where Cray has lots of expertise. We also address the “communication wall” in both our custom and Opteron-based systems: they allow the entire system memory to be accessed by any single processor, support high communication concurrency, and reduce communication latency and overhead.
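
Two small loops illustrate the point (a sketch, not Cray code). In the first, every iteration is independent, so a vector unit or a multithreaded processor can keep many memory references in flight and hide DRAM latency behind bandwidth. In the second, each load depends on the previous one, so the full memory latency is exposed at every step, which is exactly the kind of code that feels the memory gap hardest:

    #include <stddef.h>

    /* Independent iterations: a vectorizing compiler or multithreaded processor
       can issue many loads and stores concurrently, hiding memory latency. */
    void triad(double *a, const double *b, const double *c, double s, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }

    /* Dependent loads: each access must wait for the previous one to complete,
       so nothing can be overlapped and every hop pays the full memory latency. */
    typedef struct node { struct node *next; } node;

    size_t chase(const node *p)
    {
        size_t hops = 0;
        while (p != NULL) { p = p->next; hops++; }
        return hops;
    }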

Lazou: What alternatives are there for increasing performance? Are hybrid systems such as Cray Rainier the answer, and what are the main architectural building blocks (components) that make high productivity performance possible? Are vectors, processor-in-memory (PIM) technology and FPGAs part of the new architecture? Vendors are putting several cores on a chip; could this development on its own deliver sustained performance for the very high-end user applications?

Scott: Processor designers have hit an interesting inflection point. You can no longer use all the transistors on a chip for a single thread. This provides a real opportunity for design innovation. Multi-core processors are a good way to increase overall performance at the socket level but provide some additional challenges at the system level. When you move to dual cores, it helps a lot to have strong bandwidth to start with. All of our Cray systems were designed with dual core in mind. They provide enough bandwidth to make the move to dual-core processors rewarding for real-world applications performance. But multi-core designs aggravate the scalability issue and aren't necessarily the right approach for all problems.

Heterogeneous solutions that include more than one processor type, such as vectors, FPGAs and multithreaded processors, are needed to address the full spectrum of applications. For both custom and COTS processor-based systems, there are different approaches to system design and balance that are appropriate for different applications.

Lazou: In my interview [http://www.taborcommunications.com/hpcwire/hpcwireWWW/04/0618/107850.html] with Tadashi Watanabe last year, one of his comments on performance issues was: “Another way is to have the processor and memory on the same chip. Even if they can co-exist, the memory size relative to processor speed becomes a problem. For every one Gigaflop/s we need, 10Gb of memory are needed to maintain a balanced efficient system. For a 100 Gigaflop/s CPU, we will need one trillion bits of memory. One can see that for high performance, the processor and memory idea is not feasible. This of course is possible for small amounts of memory, but not for enough memory to produce a balanced system.” How is Cray addressing this?

Scott: Watanabe-san is right. This need for balance effectively restricts pure PIM machines to specialized applications. The lack of compute-memory balance also applies to systems like MDGRAPE and, to a lesser extent, designs like Blue Gene. In HPC, one size or one architecture is not best for all applications. That's one of the reasons why Cray offers more than one kind of HPC system.
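
Taking the quoted ratio of 10 Gbits of memory per Gflop/s at face value, the arithmetic in the question checks out:

    100 Gflop/s x 10 Gbit/(Gflop/s) = 1,000 Gbit = 10^12 bits, or roughly 125 Gbytes per processor,

far more DRAM than can share a die with a fast processor, which is why pure PIM designs stay confined to small-memory, specialized problems.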

Lazou: In my recent interview [http://news.taborcommunications.com/msgget.jsp?mid=398645&xsl=story.xsl] with Professor Michael Resch, director of HLRS, I asked: “Can you elaborate on why HLRS chose to continue with the vector parallel architecture and on how you arrived at this heterogeneous computing environment?” His reply was: “We did not choose based on architecture. We collected a number of attributes in a basket and looked for a vendor with a strong roadmap containing these components, i.e., we wanted a strong partner. NEC has a strong record in microprocessors and that was very important to us.” How does Cray propose to compete with larger vendors in the partnership stakes?

Scott: Cray has a very successful record in partnering with customers. Good examples are the Red Storm project at Sandia, the leadership-computing project at Oak Ridge, KMA, CSCS, PSC and many others. On the supplier side, we have strong relationships with AMD and others. No one wins every opportunity, but we have a strong track record of success in competitions that include much larger vendors. We don't compete against every division of mega-companies that sell everything from televisions to PCs to HPC systems. We compete against their HPC business units. When you look at it that way, Cray is a fairly large HPC vendor, because everyone at Cray is focused 100 percent on HPC.

Lazou: Shifting our attention back to the U.S. high-end computing scene, some experts, such as the High-End Crusader, believe that the U.S. is slipping in supercomputing competitiveness and that commercial server clusters have dried up the market and increased costs for “true” HPC systems. Assuming that commodity clusters alone are incapable of providing solutions for all HPC applications, what can we expect in the way of future supercomputers from the U.S., and in particular from Cray?

Scott: It's not a binary, win/lose situation. As I said earlier, no one HPC architecture is best for everything. Customers make rational choices. For many customers, especially those with capacity workloads and not a lot of communications-intensive problems, clusters can be the best solution. Other customers have problems and workloads that run fastest and most cost-effectively on the more balanced architectures Cray designs. I think Bill Camp expressed this well when he said Sandia expects to get more real work done, at lower overall cost, on the Red Storm system than on large-scale clusters.

It's important to add, though, that the problems that require more balanced architectures tend to be the most challenging, consequential problems. They're the hardest ones to solve, and solving them would make the biggest difference for these users, whether they're in government or industry, or university research. So, it's entirely possible for the U.S., or any other global region, to win the market battle with clusters and still fall behind in basic science and industrial competitiveness. Several years ago, the Earth Simulator brought home the lesson that performance on real-world applications is more important than peak flops.

Cray is going to continue to design systems for superior performance on challenging real-world applications. Those are the customer requirements we want to focus on. The need for balanced architectures exists at a wide range of price points, not just at the high end of the HPC market, as we've demonstrated with the Cray XD1 system at fairly modest system sizes.

Lazou: One of the concerns about a radical change of architecture is that it may turn out to suffer from a major drawback, namely that it ends up being too specialized. In that case, it will probably achieve good performance on certain codes that match the particular machine architecture, but it is unlikely to be competitive across more general scientific applications. Another difficulty is changing the software to take advantage of these specialized features, especially large application codes provided by ISVs. What does Cray have to say about these concerns?

Scott: Cray systems present a programming model similar to that of commodity clusters, along with more advanced programming models. So, we're within the prevalent programming paradigm. Due to the high cost of software, we can't afford to be different. Porting to systems like ours that have good balance is actually easier, because you have fewer bottlenecks to scaling. It's much easier to program well-balanced systems, since users have to worry less about managing communication. I think the hardest challenge is that of scaling applications to, say, 100,000 slow processors. If you have fast systems with powerful processors, you need fewer of them, which makes things easier for users.
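
In practical terms, the prevalent paradigm Scott refers to is message passing with MPI, alongside the more advanced models he mentions. A minimal sketch of the style of code that moves unchanged between commodity clusters and systems like the Cray XT3 or XD1 (the per-rank work here is just a placeholder):

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank computes a partial result and a single collective combines
       them. Code written to this model recompiles on clusters and on Cray
       systems alike; a better-balanced interconnect shows up as better
       scaling, not as a different program. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;   /* placeholder for real per-rank work */
        double total = 0.0;
        MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %g\n", size, total);

        MPI_Finalize();
        return 0;
    }

Built with the usual mpicc wrapper, the same source runs on either class of machine; the balance Scott describes determines how far it scales, not how it is written.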

Lazou: At present, the U.S. government is providing R&D funding to enable vendors to deliver a high productivity supercomputer with one Petaflop/s sustained performance by 2009. Without this stream of funds, can a small company like Cray (compared to say IBM), support such high R&D outlay to stay competitive in supercomputing?

Scott: The DARPA HPCS program's intention is to support R&D innovation that HPC vendors would not have undertaken on their own. So, none of the three vendors selected for the current Phase 2 (Cray, IBM, Sun) was budgeting R&D funds for the program's specific goals. All of the vendors, including the biggest, IBM, successfully argued that they needed government funding support to meet the program's goals by 2009-10.

We don't know how much large companies like IBM and HP spend specifically on HPC research and development. The vast majority of the R&D spending in those companies is not for HPC, whereas every dollar of Cray's R&D is spent on HPC. You could turn the question around and ask how very large vendors can continue to compete successfully when HPC is too small a market for them to design products for. In most cases, they are leveraging products into the HPC market that were designed for higher-volume server markets with different requirements. We think a company like Cray that focuses 100% on the requirements of HPC users has a competitive advantage and is more likely to produce machines similar to what DARPA is looking for.

Lazou: I think we explored a fair number of issues. Thank you Steve, for your time and frank answers. I am sure our readers would find your views very interesting.

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. July 2005
