SHARED MEMORY & CLUSTERING: SGI’S APPROACHES TO HPC

July 21, 2000

by Tom Woolf

San Diego, CA — A battle is brewing in high-performance computing (HPC): which approach provides better computing performance – cache-coherent, shared-memory systems or compute clusters? As with most things, the answer is not a simple one and can best be summed up as, “It depends.”

Shared-memory systems are those “big-iron” computers with multiple processors working together to deliver lots of data-crunching horsepower. The fact that these systems are cache-coherent and present a single-system image (SSI) is important, since fast access to a large memory reserve is what gives them their power. Clusters, on the other hand, are arrays of standalone systems connected by high-speed networks, which require the user to break computing jobs into separate tasks that can be spread across the systems in the array using a programming model called message passing. Clusters can be scaled to handle larger and larger data processing jobs by adding more nodes. Each approach has its advantages and limitations, depending on the application.

Clustering has been gaining momentum as it becomes more cost/performance competitive. According to Derek Robb, product manager for the Scalable Node Itanium system, SGI’s next-generation scalable node system based on the Intel Itanium microprocessor, “The cost of computer components is dropping, as is the cost of the interconnect, which means the price/performance of compute clusters is more attractive. In addition, more users and independent software vendors are writing applications that accommodate message passing in their programming models so they will run on both clusters and shared-memory systems, and they are changing their algorithms to adapt to a distributed memory model.”

However, clustering is not a magic bullet for HPC. As Ken Jacobsen, director of applications for SGI, notes, “Clustering works best for applications such as animation programs where you render hundreds of similar images to generate film frames.” Shared-memory applications, on the other hand, are designed to draw from a single memory pool and usually don’t have message-passing capability, so they can’t run in a clustering environment. In practice this means that for large computing projects, such as aerodynamic modeling or fluid dynamics, shared-memory offers a better alternative.

According to Jacobsen, the rule of thumb is that if you write your own application code, you are in a better position to rewrite it in a message-passing model. In practice, this means clustering is better suited to scientific applications, where scientists customize their applications or use open source, as opposed to manufacturing environments where companies use off-the-shelf software written for the lowest common platform.

As Ben Passarelli, director of marketing for scalable servers, explains it, the customer who wants to focus on “science rather than computer science” often prefers shared-memory systems.

Jacobsen adds that while more third-party developers are starting to add message-passing structures to their applications, most commercial developers still need to accommodate multiple operating systems, which makes clustering support difficult. Rewriting commercial applications to add message passing is not trivial, so commercially viable clustering software will be slow in coming.

What this means for SGI is that the company will continue to offer both solutions to customers. In fact, clustering shared-memory systems opens up new market possibilities.

“We know that reducing memory access time is desirable, but when is it better to share everything or segment memory to process different jobs? Both solutions are important to different kinds of customers, and since we are in the business of meeting the needs of the technical community, we will continue to supply both solutions,” said Passarelli. “In fact, both architectures are converging. In the not-too-distant future, SGI software and hardware will be able to bring together the best of both worlds into a single HPC platform.”

The Yin and Yang of HPC: A Debate on the Pros and Cons of Capability Clusters and Cache-Coherent Shared-Memory Systems.

The preceding discussion provides a high-level, uncomplicated, and therefore wholly inadequate view of the debate that is raging as to the “right” way to approach high-performance computing. Like Macintosh versus IBM or rocky road versus tutti-frutti, this debate can have metaphysical ramifications in certain quarters. The following discussion is offered as a more complete view of the debate for the technically uninitiated, to raise awareness as to the whys and wherefores of cache-coherent shared-memory systems and capacity and capability clusters, just in case the subject should arise at your next encounter at the coffee machine.

A cluster is a parallel or distributed system whereby you interconnect a collection of separate computers into a single, unified computing resource. There are two basic kinds of clusters: a capacity or throughput cluster, where different jobs are run batch-style on different systems, and a capability cluster, which uses multiple systems to address huge computing problems. In a capability cluster, information is shared with other nodes using a message-passing protocol over high-speed links, such as HIPPI (high-performance parallel interface), GSN (Gigabyte System Network), Gigabit Ethernet, Myrinet, or Giganet.

The objective behind clustering is to take large computing jobs and break them into smaller tasks that can run and communicate effectively across multiple systems. In general, clusters are viewed as superior because they have lower initial cost and can be scaled to large numbers of processors. And since processing is shared among multiple systems, there is no single point of failure.
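The decomposition described above can be sketched on a single machine, with separate worker processes standing in for cluster nodes. The function names here (`clustered_sum`, `partial_sum`) are hypothetical illustrations, not anything from SGI’s software; real cluster codes would distribute the slices over a network rather than to local processes.

```python
# Splitting one large job (summing a big array) into independent chunks,
# each handled by a separate worker process -- the same decomposition idea
# a capability cluster applies across separate machines.
from multiprocessing import Pool

def partial_sum(chunk):
    """The per-node task: reduce one slice of the data."""
    return sum(chunk)

def clustered_sum(data, n_workers=4):
    """Split `data` into n_workers slices, farm them out, combine results."""
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)  # one result "message" per chunk
    return sum(partials)                          # final reduction on the "root" node

if __name__ == "__main__":
    print(clustered_sum(list(range(1_000)), n_workers=4))  # 499500
```

The scaling claim in the paragraph above corresponds to adding workers (nodes): each new worker takes a slice of the data, so larger problems are absorbed by growing the array rather than by growing any single machine.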

Much of SGI’s recent development and marketing efforts have focused on the price/performance offered by computer clustering. When measured in dollars per megaflop, the cost of proprietary computing hardware continues to drop, while the power of less expensive commodity hardware continues to increase. As a result, the cost per megaflop of clustering has fallen substantially, making it an attractive HPC approach for many SGI customers.

SGI recently announced the Advanced Cluster Environment (ACE), an economical clustering solution for both IRIX/MIPS and Linux/IA platforms that leverages the SGI 2100 and SGI 2200 midrange computer systems for cost-effective, compute-intensive applications. The SGI IRIX ACE software is designed to complement the SGI 2100 and SGI 2200 midrange servers and draws on expertise from implementations such as the 1,536-processor cluster for the National Center for Supercomputing Applications (NCSA) and the 6,144-processor cluster for Los Alamos National Labs (LANL). The new SGI 1200 and SGI 1400 product lines make clustering even more affordable, and the pending release of server products built on the Itanium processor will push price/performance further by leveraging high-volume commodity components from Intel as processor costs drop, interconnect bandwidth increases, and associated latency continues to fall.

“It would be ideal if, instead of a cluster, you could use an SSI shared-memory system of any size you want,” says Robb, “but that’s not economically or technologically feasible – you can’t scale the operating system to thousands of processors.” Robb adds that whereas the practical limit today for shared-memory systems is 128 processors (although a few 256-processor systems have been developed for special applications), clusters can continue to scale up as needed.

Robb indicates that as computing platforms and high-performance interconnects become less expensive, clustering becomes even more attractive for HPC applications. In terms of processing costs, the cost of a Linux cluster today is about $5 per megaflop as opposed to hundreds of dollars per megaflop just a few years ago. With more software development work being done on Linux for clustered nodes built using Intel Pentium, IA-32, and IA-64 processors, costs will continue to drop, making clustering even more attractive for HPC applications.

According to Jacobsen, cache-coherent, shared-memory systems, i.e., computer systems where multiple processors are configured in the same machine with a single memory resource, often deliver superior computing performance because they minimize latency, the lag time created by passing data from one point to another for processing.

“We once performed a test where we ran the same Fluent program on a cluster of four machines with four processors each and a 16-processor single-system image machine,” Jacobsen says. “We found that the 16-processor SSI machine gave performance superior to the clustered systems. The only difference was latency.” The close proximity of processors in the same machine, sharing the same memory, speeds performance because it minimizes latency.

In addition to latency, Jacobsen argues that the total cost of computing is dramatically less with a shared-memory solution when matched processor-for-processor. Consider, for example, the cost of administering a cluster of four shared-memory machines with 64 CPUs each, as opposed to administering 64 machines with four CPUs each. Either configuration totals 256 processors, but it is clearly easier to manage and troubleshoot four interconnected systems than 64.

Cache-coherent, shared-memory applications also are easier to engineer since they draw from common memory; clustered applications have to use MPI (the Message Passing Interface) to coordinate data exchange between nodes. MPI serves as the traffic cop that keeps track of the data, which makes the task of pointing to the data more complicated for the programmer. If an application has message passing built into its architecture, it can be readily used in either a clustering environment or a shared-memory system. Applications written for a shared-memory system, however, typically do not incorporate message passing and will run only on shared-memory systems.
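The extra bookkeeping the programmer takes on can be seen in even the smallest message-passing exchange. Real MPI codes are usually written in C or Fortran and launched across nodes with a job launcher; as a rough single-machine stand-in for the send/receive pattern, the sketch below uses Python processes and a pipe (the names `worker` and `round_trip` are illustrative only):

```python
# A minimal send/receive exchange between two processes, standing in for
# point-to-point message passing (MPI_Send/MPI_Recv) between two cluster
# nodes. Note the explicit coordination the programmer must write: who
# sends, who receives, and in what order.
from multiprocessing import Process, Pipe

def worker(conn):
    """The 'remote node': block until a task arrives, send back the result."""
    task = conn.recv()      # blocking receive, like MPI_Recv
    conn.send(task ** 2)    # reply to the sender, like MPI_Send
    conn.close()

def round_trip(value):
    """The 'root node': ship a value out, wait for the answer."""
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(value)        # explicit data movement...
    result = parent_end.recv()    # ...and explicit synchronization
    p.join()
    return result

if __name__ == "__main__":
    print(round_trip(7))  # 49
```

In a shared-memory program, by contrast, both sides would simply read and write the same address; the explicit send/receive choreography is exactly the added complexity the paragraph above describes.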

To highlight the pros and cons of clustered and shared-memory computing, let’s consider a market segment that has become important to SGI – automotive engineering. In the automotive world there are applications that can be categorized as “embarrassingly parallel,” such as running crash test simulations on the same auto body design using minor variations. For this application, a clustered system is practical, since each simulation can be run on a different node using slightly different parameters. However, other computer-aided engineering applications must run within a fixed time frame using off-the-shelf applications and are better suited to shared-memory systems to keep to the production schedule. Few commercial applications have MPI built in to take advantage of message passing in a clustered environment, so the fallback computing platform has to be a shared memory system.
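The “embarrassingly parallel” crash-test scenario reduces to a parameter sweep: identical runs with slightly different inputs and no communication between them. The sketch below illustrates that shape with local worker processes standing in for cluster nodes; `crash_sim` is a hypothetical toy stand-in, not a real simulation code.

```python
# An "embarrassingly parallel" sweep: the same simulation run many times
# with slightly different parameters, one run per worker. No run depends
# on any other, which is what makes clustering a natural fit here.
from multiprocessing import Pool

def crash_sim(impact_speed_kph):
    """Toy stand-in for one crash-test run; a real run is hours of FEA."""
    return round(0.02 * impact_speed_kph ** 2, 1)  # fake 'deformation' metric

def parameter_sweep(speeds, n_workers=4):
    """Farm out one independent simulation per parameter value."""
    with Pool(n_workers) as pool:
        return pool.map(crash_sim, speeds)

if __name__ == "__main__":
    print(parameter_sweep([40, 48, 56, 64]))
```

The fixed-deadline engineering jobs mentioned above lack this independence: their off-the-shelf codes assume one memory pool, so no such decomposition is available without a rewrite.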

Both Robb and Jacobsen agree that SGI customers ultimately will embrace both architectures, deploying shared-memory systems into a larger clustered infrastructure. As Jacobsen notes, an architecture with fewer clustered machines minimizes latency and administration, but by putting shared-memory systems in a compute cluster, you have the best of both worlds – a scalable HPC architecture. Robb adds, “We have to embrace both architectures and make intelligent choices about how to combine them to meet customers’ changing needs.”

Adds Passarelli, “Our customers look to SGI to deliver cost-competitive hardware that has no limits on scalability, is easy to administer, and can be integrated into a single comprehensive solution. They want computing performance without having to worry about the underlying configuration. That’s why SGI is actively working to bring together shared-memory systems and clustering into a single platform. We are committed to meeting the high-performance computing needs for all of our customers, and to do that, we need to continue to actively expand the technology for both shared-memory systems and clustered computing.”

So there is no right way or wrong way to approach HPC. Rocky road or tutti-frutti, clustering or shared-memory systems, Linux or IRIX, or an HPC sundae that includes a little bit of everything – customers can always pick the computing combination to suit their taste.
