SHARED MEMORY & CLUSTERING: SGI’S APPROACHES TO HPC

July 21, 2000

by Tom Woolf

San Diego, CA — Some say a battle is brewing in high-performance computing (HPC): which approach provides better computing performance – cache-coherent, shared-memory systems or compute clusters? As with most things, the answer is not a simple one and can best be summed up as, “It depends.”

Shared-memory systems are those “big-iron” computers with multiple processors working together to deliver lots of data-crunching horsepower. The fact that these systems are cache-coherent and present a single-system image (SSI) is important, since fast access to a large memory reserve is what gives them their power. Clusters, on the other hand, are arrays of standalone systems connected by high-speed networks; the user must break a computing job into separate tasks that are spread across the systems in the array using a programming model called message passing. Clusters can be scaled to handle larger and larger data-processing jobs by adding more nodes. Each approach has its advantages and limitations, depending on the application.

Clustering has been gaining momentum as its price/performance becomes more competitive. According to Derek Robb, product manager for Scalable Node Itanium, SGI’s next-generation scalable node system based on the Intel microprocessor, “The cost of computer components is dropping, as is the cost of the interconnect, which means the price/performance of compute clusters is more attractive. In addition, more users and independent software vendors are writing applications that accommodate message passing in their programming models so they will run on both clusters and shared-memory systems, and they are changing their algorithms to adapt to a distributed memory model.”

However, clustering is not a magic bullet for HPC. As Ken Jacobsen, director of applications for SGI, notes, “Clustering works best for applications such as animation programs where you render hundreds of similar images to generate film frames.” Shared-memory applications, on the other hand, are designed to draw from a single memory pool and usually don’t have message-passing capability, so they can’t run in a clustering environment. In practice, this means that for large computing projects, such as aerodynamic modeling or fluid dynamics, shared memory offers a better alternative.

According to Jacobsen, the rule of thumb is that if you write your own application code, you are in a better position to rewrite it for a message-passing model. In practice, this means clustering is better suited to scientific applications, where scientists customize their applications or use open source, as opposed to manufacturing environments where companies use off-the-shelf software written for the lowest common platform.

As Ben Passarelli, director of marketing for scalable servers, explains it, the customer who wants to focus on “science rather than computer science” often prefers shared-memory systems.

Jacobsen adds that while more third-party developers are starting to add message-passing structures to their applications, most commercial developers still need to accommodate multiple operating systems, which makes clustering support difficult. Rewriting commercial applications to add message passing is not trivial, so commercially viable clustering software will be slow in coming.

What this means for SGI is that the company will continue to offer both solutions to customers. In fact, clustering shared-memory systems opens up new market possibilities.

“We know that reducing memory access time is desirable, but when is it better to share everything or segment memory to process different jobs? Both solutions are important to different kinds of customers, and since we are in the business of meeting the needs of the technical community, we will continue to supply both solutions,” said Passarelli. “In fact, both architectures are converging. In the not-too-distant future, SGI software and hardware will be able to bring together the best of both worlds into a single HPC platform.”

The Yin and Yang of HPC: A Debate on the Pros and Cons of Capability Clusters and Cache-Coherent Shared-Memory Systems.

The preceding discussion provides a high-level, uncomplicated, and therefore wholly inadequate view of the debate that is raging as to the “right” way to approach high-performance computing. Like Macintosh versus IBM or rocky road versus tutti-frutti, this debate can have metaphysical ramifications in certain quarters. The following discussion is offered as a more complete view of the debate for the technically uninitiated, to raise awareness as to the whys and wherefores of cache-coherent shared-memory systems and capacity and capability clusters, just in case the subject should arise at your next encounter at the coffee machine.

A cluster is a parallel or distributed system in which a collection of separate computers is interconnected to form a single, unified computing resource. There are two basic kinds of clusters: a capacity or throughput cluster, where different jobs are run batch-style on different systems, and a capability cluster, which uses multiple systems to address huge computing problems. In a capability cluster, information is shared with other nodes using a message-passing protocol over high-speed links, such as HIPPI (high-performance parallel interface), GSN (Gigabyte System Network), Gigabit Ethernet, Myrinet, or Giganet.

The objective behind clustering is to take large computing jobs and break them into smaller tasks that can run and communicate effectively across multiple systems. In general, clusters are viewed as superior because they have lower initial cost and can be scaled to large numbers of processors. And since processing is shared among multiple systems, there is no single point of failure.
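
To make the idea concrete, here is a minimal sketch of the message-passing model, written in C against the MPI library. The workload (summing a large range of numbers) and its size are illustrative stand-ins, not an application from the article; the point is that each node computes over its own slice of the work, and only small partial results travel over the interconnect.

    /* Minimal message-passing sketch: split one big job across cluster nodes.
       The workload (summing 0..N-1) is a placeholder for a real computation. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000L   /* total amount of work, chosen for illustration */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which node am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many nodes in total? */

        /* Each node works only on its own slice of the problem. */
        long begin = N * rank / size;
        long end   = N * (rank + 1) / size;
        double local = 0.0;
        for (long i = begin; i < end; i++)
            local += (double)i;

        /* Partial results are combined by passing messages between nodes. */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("combined result: %.0f\n", total);

        MPI_Finalize();
        return 0;
    }

Built against an MPI implementation and launched with one process per node (e.g., mpirun -np 16 ./sum), the same program scales simply by adding nodes to the cluster.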

Much of SGI’s recent development and marketing efforts have focused on the price/performance offered by computer clustering. Measured in dollars per megaflop, the cost of proprietary computing hardware continues to drop, while at the same time the power of less expensive commodity hardware continues to increase. As a result, the dollars-per-megaflop cost of clustering has fallen substantially, making it an attractive HPC approach for many SGI customers.

SGI recently announced the Advanced Cluster Environment (ACE), which offers an economical clustering solution for both IRIX/MIPS and Linux/IA platforms, leveraging the SGI 2100 and SGI 2200 midrange computer systems for cost-effective, compute-intensive applications. The SGI IRIX ACE software is designed to complement the SGI 2100 and SGI 2200 midrange servers and draws on expertise gained developing implementations such as the 1,536-processor cluster for the National Center for Supercomputing Applications (NCSA) and the 6,144-processor cluster for Los Alamos National Laboratory (LANL). The new SGI 1200 and SGI 1400 product lines make clustering even more affordable. The pending release of new server products built on the Itanium processor will push price/performance further still by leveraging high-volume commodity components from Intel, as processor costs drop, interconnect bandwidth increases, and associated latency continues to fall.

“It would be ideal if, instead of a cluster, you could use an SSI shared-memory system of any size you want,” says Robb, “but that’s not economically or technologically feasible – you can’t scale the operating system to thousands of processors.” Robb adds that whereas the practical limit today for shared-memory systems is 128 processors (although a few 256-processor systems have been developed for special applications), clusters can continue to scale up as needed.

Robb indicates that as computing platforms and high-performance interconnects become less expensive, clustering becomes even more attractive for HPC applications. In terms of processing costs, a Linux cluster today runs about $5 per megaflop, as opposed to hundreds of dollars per megaflop just a few years ago. With more software development work being done on Linux for clustered nodes built on Intel Pentium, IA-32, and IA-64 processors, costs will continue to drop.
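
For a rough sense of scale (the 100-gigaflop aggregate figure and the $200 point in the “hundreds of dollars” range are assumed here for illustration, not taken from the article), the difference those price points make is stark:

    100,000 megaflops x $5 per megaflop   = about $500,000
    100,000 megaflops x $200 per megaflop = about $20,000,000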

According to Jacobsen, cache-coherent, shared-memory systems, i.e., computer systems where multiple processors are configured in the same machine with a single memory resource, often deliver superior computing performance because they minimize latency, the lag time created by passing data from one point to another for processing.

“We once performed a test where we ran the same Fluent program on a cluster of four machines with four processors each and a 16-processor single-system image machine,” Jacobsen says. “We found that the 16-processor SSI machine gave performance superior to the clustered systems. The only difference was latency.” The close proximity of processors in the same machine, sharing the same memory, speeds performance because it minimizes latency.

In addition to latency, Jacobsen argues that the total cost of computing is dramatically lower with a shared-memory solution when matched processor for processor. Consider, for example, the cost of administering four shared-memory machines with 64 CPUs each in a single cluster, as opposed to administering 64 machines with 4 CPUs each. The total number of processors is 256 in either configuration, but it is clearly easier to manage and troubleshoot four interconnected systems than 64.

Cache-coherent, shared-memory applications are also easier to engineer, since they draw from a common memory; clustered applications have to use MPI (the Message Passing Interface) to coordinate data exchange between nodes. MPI serves as the traffic cop that keeps track of where the data lives, which makes the programmer’s job of pointing to that data more complicated. If an application has message passing built into its architecture, it can readily be used in either a clustering environment or a shared-memory system. Applications written for a shared-memory system, however, typically do not incorporate message passing and will run only on shared-memory systems.
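
For contrast with the message-passing sketch shown earlier, the same toy computation can be written in a shared-memory style, where every processor simply reads and updates the same memory and no explicit messages are needed. OpenMP is used here purely as a common illustration of shared-memory programming; the article itself does not name a specific shared-memory API.

    /* Shared-memory sketch of the same toy sum: the loop is divided among
       processors automatically, and the reduction on 'total' is coordinated
       through shared memory rather than by passing messages. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000L   /* same illustrative problem size as the MPI sketch */

    int main(void)
    {
        double total = 0.0;

    #pragma omp parallel for reduction(+:total)
        for (long i = 0; i < N; i++)
            total += (double)i;

        printf("combined result: %.0f\n", total);
        return 0;
    }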

To highlight the pros and cons of clustered and shared-memory computing, consider a market segment that has become important to SGI – automotive engineering. In the automotive world there are applications that can be categorized as “embarrassingly parallel,” such as running crash-test simulations on the same auto body design with minor variations. For this kind of application a clustered system is practical, since each simulation can be run on a different node using slightly different parameters. Other computer-aided engineering applications, however, must run within a fixed time frame using off-the-shelf software and are better suited to shared-memory systems if they are to keep to the production schedule. Few commercial applications have MPI built in to take advantage of message passing in a clustered environment, so the fallback computing platform has to be a shared-memory system.
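
A minimal sketch of that “embarrassingly parallel” pattern, again in C with MPI: each node runs the same simulation with a slightly different design parameter and never needs to communicate with the others. The function run_crash_test() and the parameter values are hypothetical placeholders for a real solver, not anything described in the article.

    /* Embarrassingly parallel parameter sweep: one independent simulation per node. */
    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical stand-in for invoking a real crash-simulation code. */
    static double run_crash_test(double bumper_thickness_mm)
    {
        return bumper_thickness_mm * 0.97;   /* dummy result, for illustration only */
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each node varies one design parameter according to its rank; no
           inter-node communication is needed while the simulations run. */
        double thickness = 2.0 + 0.1 * rank;
        double result = run_crash_test(thickness);

        printf("node %d: thickness %.1f mm -> result %.3f\n", rank, thickness, result);

        MPI_Finalize();
        return 0;
    }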

Both Robb and Jacobsen agree that SGI customers ultimately will embrace both architectures, deploying shared-memory systems into a larger clustered infrastructure. As Jacobsen notes, an architecture with fewer clustered machines minimizes latency and administration, but by putting shared-memory systems in a compute cluster, you have the best of both worlds – a scalable HPC architecture. Robb adds, “We have to embrace both architectures and make intelligent choices about how to combine them to meet customers’ changing needs.”

Adds Passarelli, “Our customers look to SGI to deliver cost-competitive hardware that has no limits on scalability, is easy to administer, and can be integrated into a single comprehensive solution. They want computing performance without having to worry about the underlying configuration. That’s why SGI is actively working to bring together shared-memory systems and clustering into a single platform. We are committed to meeting the high-performance computing needs for all of our customers, and to do that, we need to continue to actively expand the technology for both shared-memory systems and clustered computing.”

So there is no right way or wrong way to approach HPC. Rocky road or tutti-frutti, clustering or shared-memory systems, Linux or IRIX, or an HPC sundae that includes a little bit of everything – customers can always pick the computing combination to suit their taste.
