Grid, HPC and SOA: The Real Thing?

By Labro Dimitriou, Contributing Author

May 23, 2005

How do we know when a new technology is the real thing or just a fad? How do we gauge the significance of a new technology, and when is adopting it a tactical rather than a strategic decision? In this article, I will discuss why Grid and SOA are here to stay. I will also describe the technology “product stack” in order to separate the strategic pieces from the tactical ones, and I will propose best-practice techniques for securing ROI and building resilience to change and early-adoption risk.

Some would say that Grid and SOA are not revolutionary concepts, but rather evolutionary steps in enterprise distributed computing. Make no mistake, though: together, these technologies have the potential and the power to bring about a computing revolution. Grid and SOA may seem unrelated, but they are complementary notions with fundamentally the same technology underpinnings and common business goals. Both are service-based approaches supporting the adaptive enterprise.

So, let's talk about the adaptive, or agile, enterprise and its characteristics. The only constant in today's business models is change. The way of doing business changes constantly, whether because the company loses focus or because of new competitive pressures: today we are product-focused, tomorrow we are client-centric. Re-engineering the enterprise is no longer a final state, but an ongoing effort; consider Six Sigma and Business Process Management (BPM) initiatives. Integration is no longer an afterthought; most systems are built with integration as a hard requirement. Changes in underlying technology are apparent across all infrastructures and applications, and the fact that new hardware keeps delivering more power for less money shows that Moore's law is still in effect. Last, and most challenging, are the constantly varying demands for compute power. Clearly, over-provisioning can only lead to underutilization and overspending, both undesirable results.

Information systems have to support the adaptive enterprise. As David Taylor wrote in his book Business Engineering with Object Technology: “Information systems, like the business models they support, must be adaptive in nature.” Simply put, information systems have two layers, software and hardware, supporting and facilitating business requirements.

SOA decouples business requirements and presentation (the user interface) from the core application, shielding the end user from incremental changes and vice versa: localizing the effect of code changes when requirements adapt to new business conditions.

Grid software decouples computing needs from hardware capacity. It inserts the necessary abstraction layer that not only protects the application from hardware change, but also provides horizontal scalability, predictability with guaranteed SLAs, fault tolerance by design and maximum CPU utilization.

SOA gave rise to the notion of the enterprise service bus, which can transform a portfolio of monolithic applications into a pool of highly parameterized, service-based components. A new business application can be designed by orchestrating a set of Web services already in production. Time to market for a new application can be reduced by orders of magnitude. Grid services virtualize compute silos suffering from under-performance or under-utilization and turn them into well-balanced, fully utilized enterprise compute backbones.

SOA provides an optimal path for a minimum-cost re-engineering or integration effort for a legacy system. In many cases, legacy systems gain longevity by replacing a hard-wired interface with a Web services layer. A Grid toolkit can turn a legacy application that has hit the performance boundaries of a large SMP box into an HPC application running on a farm of high-powered, low-cost commodity hardware.

Consider a small to medium enterprise with three or four vertical lines of business (LOBs), each requiring a few turnkey applications. The traditional approach would be to look at the requirements of each application in isolation, design the code and deploy it on hardware managed by the LOB. What is wrong with that approach? Well, lines of business most certainly share a good number of requirements, which means the enterprise spends money doing many of the same things multiple times. And what about addressing the computing demands of running the dozen or so applications? Each LOB has to do its own capacity management.

Keeping a business unit happy is a tightrope walk between under-provisioning and over-spending. SOA is an architectural blueprint that delivers on its promise of application reuse and interoperability. It provides a top-to-bottom approach to developing and maintaining applications: small domains of business requirements turn into code and are made available to the rest of the enterprise as a service.

Grid, on the other hand, is the ultimate cost-saving strategic tool. It can dynamically allocate the right amount of compute fabric to the LOB that needs it the most. In its simplest form, the risk and analytics group can get near-real-time responses to complex “what if” market scenarios during the day, while the back office can meet the demands of the global economy by using most of the compute fabric during the overnight window, which keeps getting smaller.

Next, let's review the product stack. First, I need to make a distinction between High Performance Computing (HPC) and Grid. HPC is all about making applications compute fast (one application at a time, I might add). Grid software, at large, orchestrates application execution and manages the available hardware resources, or compute fabric. There is a further distinction based on the geographic co-location of the compute resources (i.e., desktop computers, workgroup, cluster and Grid). Grid virtualizes one or more clusters, whether they are located on the same floor or halfway around the world. In all cases, the hardware can be heterogeneous, with different computing properties.

In this article, I refer to the available compute fabric as the Grid at large. HPC applications started on supercomputers, vector computers and SMP boxes. Today, Grid offers a very compelling alternative for executing HPC applications. By taking a serially executing application and chunking it into smaller components that can run simultaneously on multiple nodes of the compute fabric, you can potentially improve the performance of an application by a factor of N, where N is the number of CPUs available on the compute fabric. Not bad at all, but admittedly there is a catch. Finding the parallelization opportunity, or chunking, is not always a trivial task and may require major re-engineering. That sounds invasive and costly, and the last thing one wants is to make logic changes to an existing application, adopt a new programming paradigm, hire expensive niche expertise and embark on one-off development cycles that take time away from core business competence.

The good news is that several HPC design patterns are emerging. In short, there are three high-level parallelization patterns: domain decomposition, functional decomposition and algorithmic parallelization. Domain decomposition, also known as “same instructions, different data” or “loop-level parallelization,” provides a simple Grid-enablement process. It requires that the application be adapted to run on smaller chunks of data (e.g., if you have a loop that iterates 1 million times doing the same computation on different data, the adapter can chunk the loop into, say, 1,000 ranges and do the same computation using 1,000 CPUs in parallel). OpenMP's “#pragma omp parallel for” directive is a compiler-level adapter supporting domain decomposition.
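
To make the pattern concrete, below is a minimal sketch of loop-level parallelization with OpenMP in C++; the compute() routine and the data set are hypothetical placeholders for whatever per-element work the real application performs.

    #include <vector>
    #include <cstddef>

    // Hypothetical per-element computation standing in for the real work.
    double compute(double x) { return x * x; }

    int main() {
        const std::size_t n = 1000000;            // one million independent iterations
        std::vector<double> data(n, 1.0), out(n);

        // Domain decomposition: the runtime splits the iteration space into
        // chunks and hands each chunk to a different thread (a Grid adapter
        // would hand the ranges to different nodes instead).
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(n); ++i) {
            out[i] = compute(data[i]);
        }
        return 0;
    }

Built with OpenMP enabled (for example, g++ -fopenmp), the chunking happens automatically; without the flag, the pragma is ignored and the loop runs serially, which is exactly the non-invasive behavior a Grid adapter should preserve.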

Functional decomposition comes in many flavors. The most obvious one is probably already running in your back-office batch cycle: a set of independent executables readily available to run from the command line. In its more complex varieties, it might require minimal instrumentation or adaptation of the serial code.
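
As a minimal sketch of that simplest flavor, the snippet below launches a few independent, already-existing command-line jobs concurrently; the job names are hypothetical, and in practice a Grid broker, rather than local threads, would place each job on an idle node.

    #include <cstdlib>
    #include <future>
    #include <string>
    #include <vector>

    int main() {
        // Hypothetical batch jobs that are already independent executables.
        std::vector<std::string> jobs = {
            "./settle_trades --region=emea",
            "./settle_trades --region=amer",
            "./generate_reports --date=today"
        };

        // Functional decomposition: each independent unit of work runs
        // concurrently; no change to the executables themselves is needed.
        std::vector<std::future<int>> running;
        for (const auto& cmd : jobs)
            running.push_back(std::async(std::launch::async,
                              [cmd] { return std::system(cmd.c_str()); }));

        int failures = 0;
        for (auto& r : running)
            if (r.get() != 0) ++failures;   // collect the exit codes
        return failures;
    }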

Algorithmic parallelization is left for very specific domain problems and usually combines functional and domain decomposition techniques. Examples include HPC solvers for partial differential equations, recombining trees for stochastic models and global unconstrained optimization required for a variety of business problems.

So, here is the first and top layer of the product stack: the adaptation layer. Applications need a non-invasive way to run on a Grid, and this layer provides the means to map serial code to parallel executing components. A number of toolkits with available APIs are coming to market with varying degrees of abstraction and integration effort. Clearly, different types of algorithms and applications might need different approaches, so a tactical solution may be required. Whatever the approach, you want to avoid logic changes to existing code and use a high-level paradigm that encapsulates the rigors of parallelization. In addition, you should look for a toolkit that comes with a repeatable, best-practices process.
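
As an illustration of the level of abstraction worth looking for, here is a toy sketch of a hypothetical grid_map() helper (not any vendor's actual API): the application hands over an unchanged serial function, and the adapter hides how the work is partitioned and dispatched.

    #include <future>
    #include <vector>

    // Toy adaptation layer: fan a serial function out over a set of inputs.
    // A real Grid toolkit would dispatch to remote engines rather than local
    // threads, but the application-facing call would look much the same.
    template <typename Func, typename T>
    auto grid_map(Func f, const std::vector<T>& inputs)
        -> std::vector<decltype(f(inputs[0]))> {
        using R = decltype(f(inputs[0]));
        std::vector<std::future<R>> pending;
        for (const auto& x : inputs)
            pending.push_back(std::async(std::launch::async, f, x));

        std::vector<R> results;
        for (auto& p : pending) results.push_back(p.get());
        return results;
    }

    // Existing serial routine, unchanged (hypothetical pricing example).
    double price_scenario(double shock) { return 100.0 * (1.0 + shock); }

    int main() {
        std::vector<double> shocks = {-0.02, -0.01, 0.0, 0.01, 0.02};
        auto prices = grid_map(price_scenario, shocks);   // no logic change
        return prices.size() == shocks.size() ? 0 : 1;
    }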

To introduce the next two layers, consider the requirements for sharing data and communicating results among the decomposed chunks of work. Shared data can be either static data or intermediate computed results. In the case of static data, a simple NFS-type solution or database access will suffice. But if the parallel workers need to exchange data, distributed shared-memory data services might be required. So, the next layer down the stack provides data transparency and data virtualization across the Grid. Clearly, it is a strategic piece of the puzzle, and high performance and scalability are critical for the few applications that need these qualities of service.

Communication among workers brings us to the classic middleware layer. One word of advice: make sure that your application is not exposed to any direct calls to the middleware, unless, of course, you have time to develop and debug low-level messaging code. Better yet, make sure you have nothing to do with middleware calls at all and that the application stack provides you with a much higher level of API abstraction.

So, you've developed your SOA HPC applications and all the LOBs are lining up to use the compute fabric. How do you make sure that applications compute in a predictable fashion and within predetermined timelines? How do you ensure horizontal scalability, reliability and high availability? This brings us to the most important part of the stack: the Grid software. The Grid software provides all the qualities of service that make the product stack industrial-strength and mission-critical-ready: workload and resource management; SLA-based ownership of resources; fail-over; cost accounting; operational monitoring for 24×7 enterprises; horizontal scalability; and maximum use of compute capacity. The core of this layer implements an open, policy-driven distributed scheduler.
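
To make the policy idea tangible, here is a deliberately tiny toy sketch of SLA-aware dispatch, nothing more than a priority queue keyed on time-to-deadline; real Grid schedulers layer fail-over, accounting, pre-emption and distributed state on top of this, which is precisely why the next piece of advice matters.

    #include <queue>
    #include <string>
    #include <vector>

    // Toy illustration of policy-driven scheduling: each job carries an
    // owning LOB and a time budget before its SLA is breached; the most
    // urgent job is always dispatched first.
    struct Job {
        std::string lob;
        int minutes_to_sla;
    };

    struct MoreUrgent {
        bool operator()(const Job& a, const Job& b) const {
            return a.minutes_to_sla > b.minutes_to_sla;   // min-heap on deadline
        }
    };

    int main() {
        std::priority_queue<Job, std::vector<Job>, MoreUrgent> pending;
        pending.push({"risk-analytics", 30});
        pending.push({"back-office-batch", 240});
        pending.push({"trading-desk", 10});

        // Dispatch loop: hand the next free node to the most urgent job.
        while (!pending.empty()) {
            const Job& next = pending.top();
            (void)next;          // a real scheduler would dispatch(next) here
            pending.pop();
        }
        return 0;
    }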

A word of caution: resist the temptation to roll your own solution. Just answer this: if you were to implement a J2EE application, would you write your own application server? A last word of advice: with standards evolving and products maturing as rapidly as they are, it is important to pick your vendors wisely. Choose a vendor that will be around tomorrow and that has the technical expertise your enterprise will need to extend the product and support your 24×7 operations.

Technologies cannot exist without real business benefits; we tried that back in the dot-bomb days, right? Clearly, SOA and the Grid software stack are mature, deliver real, tangible business benefits, and fully support the adaptive enterprise and the pragmatic reality of change. The beauty of a Grid and SOA implementation is that it does not have to be a big-bang approach to bring benefits. Start with your batch cycle, a time-consuming custom-built market risk application or the Excel spreadsheet on the trader's desk that takes 12 hours to complete. Then instrument your first HPC application to take advantage of idle CPU cycles, or transition an application from an expensive SMP machine to commodity hardware. You will immediately see ROI and business benefits, and you will be prepared for the unpredictable volume spikes that business growth opportunities bring with them.

Until next time: get the Grids crunching.

About Labro Dimitriou

Labro Dimitriou is a subject matter expert in HPC and Grid. He has been in the fields of distributed computing, applied mathematics and operations research for over 23 years, and has developed commercial software for trading, engineering and geosciences. Dimitriou has spent the last four years designing enterprise HPC and Grid solutions in finance and life science. He can be reached via e-mail at [email protected].
