SHARED MEMORY & CLUSTERING: SGI’S APPROACHES TO HPC

July 21, 2000

by Tom Woolf

San Diego, CA — Some say a battle is brewing in high-performance computing (HPC). Which approach provides better computing performance – cache-coherent, shared-memory systems or compute clusters? As with most things, the answer is not a simple one and can best be summed up as, “It depends.”

Shared-memory systems are those “big-iron” computers with multiple processors working together to deliver lots of data-crunching horsepower. The fact that these systems are cache-coherent and in a single-system image (SSI) is important, since fast access to a large memory reserve is what gives them their power. Clusters, on the other hand, are arrays of standalone systems connected by high-speed networks that force the user to break up computing jobs into separate tasks that can be spread across the systems in the array using a programming model called message passing. Clusters can be scaled to handle larger and larger data processing jobs by adding more nodes. Each approach has its advantages and limitations, depending on the application.
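To make the contrast concrete, here is a minimal shared-memory sketch in C using OpenMP. The article does not tie either architecture to a particular programming interface, so treat this purely as an illustration of the single-address-space model: every processor works on the same array in the same memory, and no explicit communication is required.

```c
/* Shared-memory model, illustrative only: all threads see the same array
 * in one address space, so no data has to be moved between nodes.
 * Compile with an OpenMP-capable compiler (e.g. -fopenmp). */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double data[N];
    double sum = 0.0;

    /* Every thread reads and writes the same memory directly. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        data[i] = (double)i;
        sum += data[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}
```

The message-passing counterpart of this kind of loop appears in the MPI sketches later in the article.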

Clustering has been gaining momentum as it becomes more cost/performance competitive. According to Derek Robb, product manager for the Scalable Node Itanium, SGI’s next-generation scalable node system based on the Intel Itanium microprocessor, “The cost of computer components is dropping, as is the cost of the interconnect, which means the price/performance of compute clusters is more attractive. In addition, more users and independent software vendors are writing applications that accommodate message passing in their programming models so they will run on both clusters and shared-memory systems, and they are changing their algorithms to adapt to a distributed memory model.”

However, clustering is not a magic bullet for HPC. As Ken Jacobsen, director of applications for SGI, notes, “Clustering works best for applications such as animation programs where you render hundreds of similar images to generate film frames.” Shared-memory applications, on the other hand, are designed to draw from a single memory pool and usually don’t have message-passing capability, so they can’t run in a clustering environment. In practice, this means that for large computing projects, such as aerodynamic modeling or fluid dynamics, shared memory offers a better alternative.

According to Jacobsen, the rule of thumb is that if you write your own application code, you are in a better position to rewrite it for a message-passing model. In practice, this means clustering is better suited to scientific applications, where scientists customize their own code or use open source software, as opposed to manufacturing environments where companies use off-the-shelf software written for the lowest common platform.

As Ben Passarelli, director of marketing for scalable servers, explains it, the customer who wants to focus on “science rather than computer science” often prefers shared-memory systems.

Jacobsen adds that while more third-party developers are starting to add message-passing structures to their applications, most commercial developers still need to accommodate multiple operating systems, which makes clustering support difficult. Rewriting commercial applications to add message passing is not trivial, so commercially viable clustering software will be slow in coming.

What this means for SGI is that the company will continue to offer both solutions to customers. In fact, clustering shared-memory systems opens up new market possibilities.

“We know that reducing memory access time is desirable, but when is it better to share everything or segment memory to process different jobs? Both solutions are important to different kinds of customers, and since we are in the business of meeting the needs of the technical community, we will continue to supply both solutions,” said Passarelli. “In fact, both architectures are converging. In the not-too-distant future, SGI software and hardware will be able to bring together the best of both worlds into a single HPC platform.”

The Yin and Yang of HPC: A Debate on the Pros and Cons of Capability Clusters and Cache-Coherent Shared-Memory Systems

The preceding discussion provides a high-level, uncomplicated, and therefore wholly inadequate view of the debate that is raging as to the “right” way to approach high-performance computing. Like Macintosh versus IBM or rocky road versus tutti-frutti, this debate can have metaphysical ramifications in certain quarters. The following discussion is offered as a more complete view of the debate for the technically uninitiated, to raise awareness as to the whys and wherefores of cache-coherent shared-memory systems and capacity and capability clusters, just in case the subject should arise at your next encounter at the coffee machine.

A cluster is a parallel or distributed system in which a collection of separate computers is interconnected into a single, unified computing resource. There are two basic kinds of clusters: a capacity or throughput cluster, where different jobs are run batch-style on different systems, and a capability cluster, which harnesses multiple systems to attack a single huge computing problem. In a capability cluster, information is shared among nodes using a message-passing protocol over high-speed links such as HIPPI (High-Performance Parallel Interface), GSN (Gigabyte System Network), Gigabit Ethernet, Myrinet, or Giganet.

The objective behind clustering is to take large computing jobs and break them into smaller tasks that can run and communicate effectively across multiple systems. In general, clusters are viewed as superior because they have lower initial cost and can be scaled to large numbers of processors. And since processing is shared among multiple systems, there is no single point of failure.
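A minimal sketch of that decomposition, using the standard MPI C bindings, shows the basic pattern: the root node splits a data set across the cluster, each node works only on its own slice, and the partial results are combined at the end. The array size and the summation here are placeholders for illustration, not any particular SGI workload.

```c
/* Illustrative only: break one large job into per-node pieces with MPI.
 * Compile with an MPI compiler wrapper such as mpicc. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int chunk = 1000;              /* elements per node (made up)  */
    double *local = malloc(chunk * sizeof(double));
    double *full  = NULL;

    if (rank == 0) {                     /* root owns the whole data set */
        full = malloc(chunk * nprocs * sizeof(double));
        for (int i = 0; i < chunk * nprocs; i++)
            full[i] = (double)i;
    }

    /* Split the job: each node receives its own slice. */
    MPI_Scatter(full, chunk, MPI_DOUBLE,
                local, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double partial = 0.0;
    for (int i = 0; i < chunk; i++)      /* work on the local slice only */
        partial += local[i];

    /* Combine the partial results back on the root node. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", total);

    free(local);
    free(full);
    MPI_Finalize();
    return 0;
}
```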

Much of SGI’s recent development and marketing effort has focused on the price/performance offered by computer clustering. Measured in dollars per megaflop, the cost of proprietary computing hardware continues to drop, while at the same time the power of less expensive commodity hardware continues to increase. As a result, the price/performance of clustering has improved substantially, making it an attractive HPC approach for many SGI customers.

SGI recently announced the Advanced Cluster Environment (ACE), which offers an economical clustering solution for both IRIX/MIPS and Linux/IA platforms, leveraging the SGI 2100 and SGI 2200 midrange computer systems for cost-effective, compute-intensive applications. The SGI IRIX ACE software is designed to complement the SGI 2100 and SGI 2200 midrange servers and draws on expertise gained developing implementations such as the 1,536-processor cluster for the National Center for Supercomputing Applications (NCSA) and the 6,144-processor cluster for Los Alamos National Laboratory (LANL). The new SGI 1200 and SGI 1400 product lines make clustering even more affordable. The pending release of new server products built on the Itanium processor will push price/performance further still by leveraging high-volume commodity components from Intel, as processor costs drop, interconnect bandwidth increases, and latency continues to fall.

“It would be ideal if, instead of a cluster, you could use an SSI shared-memory system of any size you want,” says Robb, “but that’s not economically or technologically feasible – you can’t scale the operating system to thousands of processors.” Robb adds that whereas the practical limit today for shared-memory systems is 128 processors (although a few 256-processor systems have been developed for special applications), clusters can continue to scale up as needed.

Robb indicates that as computing platforms and high-performance interconnects become less expensive, clustering becomes even more attractive for HPC applications. In terms of processing costs, a Linux cluster today runs about $5 per megaflop, as opposed to hundreds of dollars per megaflop just a few years ago. With more software development being done on Linux for clustered nodes built on Intel Pentium, IA-32, and IA-64 processors, costs will continue to drop, making clustering still more attractive.

According to Jacobsen, cache-coherent, shared-memory systems, i.e., computer systems where multiple processors are configured in the same machine with a single memory resource, often deliver superior computing performance because they minimize latency, the lag time created by passing data from one point to another for processing.

“We once performed a test where we ran the same Fluent program on a cluster of four machines with four processors each and a 16-processor single-system image machine,” Jacobsen says. “We found that the 16-processor SSI machine gave performance superior to the clustered systems. The only difference was latency.” The close proximity of processors in the same machine, sharing the same memory, speeds performance because it minimizes latency.

In addition to latency, Jacobsen argues that the total cost of computing is dramatically less with a shared memory solution when matched processor-for-processor. Consider, for example, the cost of administering four shared-memory machines with 64 CPUs per machine in a single cluster, as opposed to administering 64 machines with 4 CPUs per machine. The total number of processors in the cluster is 256 in either configuration, but it is clearly easier to manage and troubleshoot four interconnected systems than it is 64 systems.

Cache-coherent, shared-memory applications also are easier to engineer, since they draw from a common memory pool; clustered applications have to use MPI (the Message Passing Interface) to coordinate data exchange between nodes. MPI serves as the traffic cop that keeps track of the data, which makes the task of pointing to the data more complicated for the programmer. If an application has message passing built into its architecture, it can be used readily in either a clustering environment or a shared-memory system. Applications written for a shared-memory system, however, typically do not incorporate message passing and will run only on shared-memory systems.
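The extra bookkeeping looks roughly like this in practice. The sketch below is a hypothetical two-node exchange written against the standard MPI C bindings: a value that a shared-memory program would simply read from common memory must instead be explicitly sent by one node and received by the other.

```c
/* Illustrative only: in a cluster, data must be moved explicitly.
 * Rank 0 hands a locally computed value to rank 1 with MPI_Send/MPI_Recv. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double boundary = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        boundary = 3.14;                          /* value computed locally  */
        MPI_Send(&boundary, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&boundary, 1, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", boundary); /* now usable on this node */
    }

    MPI_Finalize();
    return 0;
}
```

On a shared-memory system, the programmer never writes these send/receive calls; the second processor simply dereferences the same address.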

To highlight the pros and cons of clustered and shared-memory computing, let’s consider a market segment that has become important to SGI – automotive engineering. In the automotive world there are applications that can be categorized as “embarrassingly parallel,” such as running crash test simulations on the same auto body design using minor variations. For this application, a clustered system is practical, since each simulation can be run on a different node using slightly different parameters. However, other computer-aided engineering applications must run within a fixed time frame using off-the-shelf applications and are better suited to shared-memory systems to keep to the production schedule. Few commercial applications have MPI built in to take advantage of message passing in a clustered environment, so the fallback computing platform has to be a shared memory system.
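For the “embarrassingly parallel” case, almost none of that coordination is needed; each node can derive its own input parameters from its rank and run independently. The sketch below is hypothetical (run_crash_simulation() and the parameter values are placeholders, not a real crash code), but it shows why such jobs map so naturally onto a cluster.

```c
/* Illustrative only: an embarrassingly parallel parameter sweep.
 * Each rank runs the same simulation with a slightly different input;
 * run_crash_simulation() is a placeholder, not a real application. */
#include <mpi.h>
#include <stdio.h>

/* Placeholder for a real solver; returns a made-up "peak load" figure. */
static double run_crash_simulation(double impact_speed_kph)
{
    return impact_speed_kph * 42.0;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each node varies one parameter based on its rank; no node needs
     * to talk to another while its simulation runs. */
    double speed  = 50.0 + 5.0 * rank;
    double result = run_crash_simulation(speed);

    printf("rank %d: speed %.1f kph -> peak load %.1f\n",
           rank, speed, result);

    MPI_Finalize();
    return 0;
}
```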

Both Robb and Jacobsen agree that SGI customers ultimately will embrace both architectures, deploying shared-memory systems into a larger clustered infrastructure. As Jacobsen notes, an architecture with fewer clustered machines minimizes latency and administration, but by putting shared-memory systems in a compute cluster, you have the best of both worlds – a scalable HPC architecture. Robb adds, “We have to embrace both architectures and make intelligent choices about how to combine them to meet customers’ changing needs.”

Adds Passarelli, “Our customers look to SGI to deliver cost-competitive hardware that has no limits on scalability, is easy to administer, and can be integrated into a single comprehensive solution. They want computing performance without having to worry about the underlying configuration. That’s why SGI is actively working to bring together shared-memory systems and clustering into a single platform. We are committed to meeting the high-performance computing needs for all of our customers, and to do that, we need to continue to actively expand the technology for both shared-memory systems and clustered computing.”

So there is no right way or wrong way to approach HPC. Rocky road or tutti-frutti, clustering or shared-memory systems, Linux or IRIX, or an HPC sundae that includes a little bit of everything – customers can always pick the computing combination to suit their taste.
