Grid, HPC and SOA: The Real Thing?

By Labro Dimitriou, Contributing Author

May 23, 2005

How do we know when a new technology is the real thing or just a fad? Furthermore, how do we value the significance of a new technology, and when is adopting it a tactical decision versus a strategic one? In this article, I will discuss why Grid and SOA are here to stay. I will also describe the technology “product stack” in order to separate the strategic from the tactical, and I will propose best-practice techniques for securing ROI and building resilience to change and early-adoption risks.

Some would say that Grid and SOA are not revolutionary concepts, but rather evolutionary steps of enterprise distributed computing. Make no mistake, though: together, the technologies have the potential and the power to bring about a computing revolution. Grid and SOA may seem unrelated, but they are complementary notions with fundamentally the same technology underpinning and common business goals. They are service-based entities supporting the adaptive enterprise.

So, let's talk about the adaptive, or agile, enterprise and its characteristics. The only constant in today's business models is change. The way of doing business changes constantly, whether because the company is out of focus or because of new competitive pressures: today we are product-focused, tomorrow we are client-centric. Re-engineering the enterprise is no longer a final state, but an ongoing effort; consider Six Sigma and Business Process Management (BPM) initiatives. Integration is no longer an afterthought; most systems are built with integration as a hard requirement. Changes in the underlying technology are apparent across all infrastructures and applications, and the fact that new hardware delivers more power for less money shows that Moore's law still holds. Last, and most challenging, are the varying demands for compute power. Clearly, over-provisioning can only lead to underutilization and overspending, both undesirable results.

Information systems have to support the adaptive enterprise. As David Taylor wrote in his book Business Engineering with Object Technology: “Information systems, like the business models they support, must be adaptive in nature.” Simply put, information systems have two layers, software and hardware, supporting and facilitating business requirements.

SOA decouples business requirements and presentation (the user interface) from the core application, shielding the end user from incremental changes and, conversely, localizing the effect of code changes when requirements adapt to new business conditions.

Grid software decouples computing needs from hardware capacity. It inserts the necessary abstraction layer that not only protects the application from hardware change, but also provides horizontal scalability, predictability with guaranteed SLAs, fault tolerance by design and maximum CPU utilization.

SOA gave rise to the notion of the enterprise service bus, which can transform a portfolio of monolithic applications into a pool of highly parameterized, service-based components. A new business application can be designed by orchestrating a set of Web services already in production, reducing time to market by orders of magnitude. Grid services virtualize compute silos suffering from under-performance or under-utilization and turn them into well-balanced, fully utilized enterprise compute backbones.
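
To make the orchestration idea concrete, here is a minimal sketch. The service interfaces (a credit-check service and a pricing service) are hypothetical, and the in-process method bodies merely stand in for remote Web service calls; the point is that the “new” application contains no new business logic, only composition of services already in production.

    // Hypothetical services assumed to be published on the enterprise service bus.
    // Their bodies stand in for remote Web service calls.
    #include <iostream>
    #include <string>

    struct CreditCheckService {                    // illustrative interface
        bool approve(const std::string& client, double amount) const {
            return amount < 1000000.0;             // stand-in for a remote call
        }
    };

    struct PricingService {                        // illustrative interface
        double quote(const std::string& product, double notional) const {
            return notional * 0.0125;              // stand-in for a remote call
        }
    };

    // The "new" business application: pure orchestration of existing services.
    int main() {
        CreditCheckService credit;
        PricingService pricing;
        const std::string client = "ACME", product = "FX-forward";
        const double notional = 250000.0;

        if (credit.approve(client, notional))
            std::cout << "Quote for " << client << ": "
                      << pricing.quote(product, notional) << "\n";
        else
            std::cout << "Credit check declined for " << client << "\n";
        return 0;
    }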

SOA provides an optimal path toward a minimum-cost re-engineering or integration effort for a legacy system. In many cases, legacy systems gain longevity by replacing a hard-wired interface with a Web services layer. Likewise, a Grid toolkit can turn a legacy application that has hit the performance boundaries of a large SMP box into an HPC application running on a farm of high-powered, low-cost commodity hardware.

Consider a small-to-medium enterprise with three or four vertical lines of business (LOBs), each requiring a few turnkey applications. The traditional approach would be to look at the requirements of each application in isolation, design the code and deploy it on hardware managed by the LOB. What is wrong with that approach? Well, lines of business almost certainly share a good number of requirements, which means the enterprise spends money doing many of the same things multiple times. And what about addressing the computing demands of running the dozen or so applications? Each LOB has to do its own capacity management.

Keeping a business unit happy is a tightrope walk between under-provisioning and over-spending. SOA is an architectural blueprint that delivers on its promise of application reuse and interoperability. It provides a top-to-bottom approach to developing and maintaining applications: small domains of business requirements turn into code and are made available to the rest of the enterprise as a service.

Grid, on the other hand, is the ultimate cost-saving strategic tool. It can dynamically allocate the right amount of compute fabric to the LOB that needs it the most. In its simplest form, the risk and analytics group can get near-real-time responses to complex “what if” market scenarios during the day, while the back office can meet the demands of a global economy by using most of the compute fabric during the overnight window, which keeps getting smaller.

Next, let's review the product stack. First, I need to make a distinction between High Performance Computing (HPC) and Grid. HPC is all about making applications compute fast — one application at a time, I might add. Grid software, at large, orchestrates application execution and manages the available hardware resources, or compute fabric. A further distinction is based on the geographic co-location of the compute resources (i.e., desktop computers, workgroup, cluster and Grid). Grid virtualizes one or more clusters, whether they are located on the same floor or halfway around the world. In all cases, the hardware can be heterogeneous, with different computing properties.

In this article, I refer to the available compute fabric as the Grid at large. HPC applications started on supercomputers, vector computers and SMP boxes. Today, Grid offers a very compelling alternative for executing HPC applications. By taking a serially executing application and chunking it into smaller components that run simultaneously on multiple nodes of the compute fabric, you can potentially improve the performance of an application by a factor of N, where N is the number of CPUs available on the fabric. Not bad at all, but admittedly there is a catch: finding the parallelization opportunity, or chunking, is not always a trivial task and may require major re-engineering. That sounds invasive and costly, and the last thing one wants is to make logic changes to an existing application, adopt a new programming paradigm, hire expensive niche expertise and embark on one-off development cycles that take time away from core business competence.

The good news is that several HPC design patterns are emerging. In short, there are three high-level parallelization patterns: domain decomposition, functional decomposition and algorithmic parallelization. Domain decomposition, also known as “same instructions, different data” or “loop-level parallelization,” provides a simple Grid-enablement process. It requires that the application be adapted to run on smaller chunks of data (e.g., if you have a loop that iterates 1 million times doing the same computation on different data, the adapter can chunk the loop into, say, 1,000 ranges and do the same computation on 1,000 CPUs in parallel). OpenMP's “#pragma omp parallel” is a pre-compiler adapter supporting domain decomposition.
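
As a concrete illustration of loop-level parallelization, here is a minimal OpenMP sketch (built with an OpenMP-aware compiler, e.g., with -fopenmp); the per-element computation is an invented stand-in for a real kernel. The pragma does the chunking: the same instructions run over different ranges of the data on however many CPUs are available.

    #include <omp.h>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Invented per-element kernel: a stand-in for, say, pricing one scenario.
    static double compute(double x) { return std::exp(std::sin(x)) * std::sqrt(x + 1.0); }

    int main() {
        const long long n = 1000000;               // one million independent work items
        std::vector<double> in(n), out(n);
        for (long long i = 0; i < n; ++i) in[i] = static_cast<double>(i);

        // Domain decomposition: same instructions, different data.
        // OpenMP splits the iteration range across the available CPUs.
        #pragma omp parallel for
        for (long long i = 0; i < n; ++i)
            out[i] = compute(in[i]);

        std::printf("last result: %f (max threads: %d)\n", out[n - 1], omp_get_max_threads());
        return 0;
    }

On a Grid, the same decomposition applies, except that each range of the loop becomes a work unit dispatched to a node of the compute fabric instead of to a local thread.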

Functional decomposition comes in many flavors. The most obvious one is probably running in your back-office batch cycle: a set of independent executables readily available to run from the command line. In its more complex variety, it might require minimal instrumentation or adaptation of the serial code.
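
A minimal sketch of the simple flavor, assuming the batch cycle is a set of independent command-line executables (the program names below are placeholders): each step is launched as its own task and they run concurrently, much as a Grid broker would farm them out to separate nodes.

    #include <cstdlib>
    #include <future>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // Independent batch steps, assumed already runnable from the command line.
        const std::vector<std::string> jobs = {
            "./settle_trades", "./mark_positions", "./reconcile_cash"   // placeholder names
        };

        // Functional decomposition: each executable is a separate unit of work.
        std::vector<std::future<int>> results;
        for (const auto& cmd : jobs)
            results.push_back(std::async(std::launch::async,
                                         [cmd] { return std::system(cmd.c_str()); }));

        for (std::size_t i = 0; i < jobs.size(); ++i)
            std::cout << jobs[i] << " exited with status " << results[i].get() << "\n";
        return 0;
    }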

Algorithmic parallelization is reserved for very specific domain problems and usually combines functional and domain decomposition techniques. Examples include HPC solvers for partial differential equations, recombining trees for stochastic models and the global unconstrained optimization required for a variety of business problems.

So, here is the first and top layer of the product stack: the adaptation layer. Applications need a non-invasive way to run on a Grid, and this layer provides the means to map serial code onto parallel executing components. A number of toolkits with available APIs are coming to market, with varying degrees of abstraction and integration effort. Clearly, different types of algorithms and applications might need different approaches, so a tactical solution may be required. Whatever the approach, you want to avoid logic changes to existing code and use a high-level paradigm that encapsulates the rigors of parallelization. In addition, you should look for a toolkit that comes with a repeatable best-practices process.
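
One way to picture the adaptation layer is as a thin wrapper that takes an unchanged serial kernel and the full input range, cuts the range into work units and hands them to whatever engine is available. The sketch below is hypothetical; local std::async stands in for submission to a Grid scheduler, and run_on_grid is an invented name, not a real toolkit API.

    #include <algorithm>
    #include <functional>
    #include <future>
    #include <iostream>
    #include <vector>

    // Hypothetical adaptation layer: run an unchanged serial kernel over [0, n)
    // by chunking the range and dispatching each chunk. Local std::async stands
    // in for submission to a Grid scheduler; the kernel's logic is untouched.
    double run_on_grid(std::function<double(std::size_t, std::size_t)> serial_kernel,
                       std::size_t n, std::size_t chunks) {
        std::vector<std::future<double>> parts;
        const std::size_t step = (n + chunks - 1) / chunks;
        for (std::size_t lo = 0; lo < n; lo += step) {
            const std::size_t hi = std::min(lo + step, n);
            parts.push_back(std::async(std::launch::async, serial_kernel, lo, hi));
        }
        double total = 0.0;
        for (auto& p : parts) total += p.get();    // gather the partial results
        return total;
    }

    int main() {
        // Existing serial code, untouched: sum of squares over a sub-range.
        auto kernel = [](std::size_t lo, std::size_t hi) {
            double s = 0.0;
            for (std::size_t i = lo; i < hi; ++i) s += double(i) * double(i);
            return s;
        };
        std::cout << run_on_grid(kernel, 10000000, 8) << "\n";
        return 0;
    }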

To introduce the next two layers, consider the requirements for sharing data and communicating results among the decomposed chunks of work. Shared data can be either static data or intermediate computed results. In the case of static data, a simple NFS-type solution or database access will suffice. But if the parallel workers need to exchange data, distributed shared-memory data services might be required. So, the next layer going down the stack provides data transparency and data virtualization across the Grid. Clearly, it is a strategic piece of the puzzle, and high performance and scalability are critical for the few applications that need these qualities of service.
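
The flavor of this layer can be conveyed with a toy interface, assuming a hypothetical distributed cache: one worker puts an intermediate result under a key, another gets it, and neither needs to know which node actually holds the data. A real product would add replication, locality and coherence behind the same two calls; the single locked map below is only a local stand-in.

    #include <map>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <vector>

    // Toy stand-in for a distributed shared-data service. In production the store
    // would span many nodes; here one locked map illustrates the put/get contract.
    class DataFabric {
    public:
        void put(const std::string& key, std::vector<double> value) {
            std::lock_guard<std::mutex> lk(m_);
            store_[key] = std::move(value);
        }
        std::optional<std::vector<double>> get(const std::string& key) const {
            std::lock_guard<std::mutex> lk(m_);
            auto it = store_.find(key);
            if (it == store_.end()) return std::nullopt;
            return it->second;
        }
    private:
        mutable std::mutex m_;
        std::map<std::string, std::vector<double>> store_;
    };

    int main() {
        DataFabric fabric;
        fabric.put("worker-7/partial-sums", {1.5, 2.5, 4.0});   // producer worker
        auto partial = fabric.get("worker-7/partial-sums");     // consumer worker
        return partial ? 0 : 1;
    }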

Communication among workers brings us to the classic middleware layer. One word of advice: make sure your application is not exposed to any direct calls into the middleware, unless, of course, you have time to develop and debug low-level messaging code. Better yet, make sure you have nothing to do with middleware calls at all and that the application stack provides you with a much higher level of API abstraction.
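
The advice can be restated as an interface contract: the application should see something like submit() and wait(), never raw connect/send/receive calls on queues. The sketch below is an invented facade for illustration, not a particular product's API; the local async call inside submit() merely stands in for the messaging plumbing the layer is supposed to hide.

    #include <future>
    #include <map>
    #include <vector>

    // Invented task-level facade: the application submits work and waits for a
    // result; serialization, queues, retries and routing stay inside this layer.
    class TaskService {
    public:
        using TaskId = std::size_t;

        TaskId submit(std::vector<double> inputs) {
            // A real implementation would marshal the inputs onto the middleware;
            // a local async call stands in for that plumbing here.
            const TaskId id = next_++;
            pending_[id] = std::async(std::launch::async, [in = std::move(inputs)] {
                double s = 0.0;
                for (double x : in) s += x * x;    // placeholder computation
                return s;
            });
            return id;
        }

        double wait(TaskId id) { return pending_.at(id).get(); }

    private:
        TaskId next_ = 0;
        std::map<TaskId, std::future<double>> pending_;
    };

    int main() {
        TaskService grid;
        const auto id = grid.submit({1.0, 2.0, 3.0});
        return grid.wait(id) == 14.0 ? 0 : 1;
    }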

So, you've developed your SOA HPC applications and all the LOBs are lining up to use the compute fabric. How do you make sure that applications complete predictably and within predetermined timelines? How do you assure horizontal scalability, reliability and high availability? This brings us to the most important part of the stack — the Grid software. It provides all the qualities of service that make the product stack industrial-strength and mission-critical-ready: workload and resource management; SLA-based ownership of resources; fail-over; cost accounting; operational monitoring for 24×7 enterprises; horizontal scalability; and maximum use of compute capacity. The core of this layer implements an open, policy-driven distributed scheduler.
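
To give a flavor of what “policy-driven” means, the simplified sketch below orders queued jobs by SLA urgency and dispatches a job only while its line of business still has headroom in the share of the fabric it owns. The policy fields and numbers are invented for illustration and do not describe any particular product's scheduler.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // Simplified policy-driven dispatch: each line of business owns a share of the
    // fabric, and queued jobs are ordered by SLA urgency. Fields are illustrative.
    struct Job   { std::string lob; int priority; int cpus_wanted; };
    struct Quota { std::string lob; int cpus_owned; int cpus_in_use; };

    int main() {
        std::vector<Quota> quotas = { {"risk", 600, 100}, {"backoffice", 400, 350} };
        std::vector<Job> queue = {
            {"backoffice", 5, 200}, {"risk", 9, 400}, {"risk", 3, 150}
        };

        // Policy 1: most urgent SLA first.
        std::sort(queue.begin(), queue.end(),
                  [](const Job& a, const Job& b) { return a.priority > b.priority; });

        // Policy 2: a job runs only if its LOB still has headroom in its quota;
        // otherwise it stays queued for the next scheduling pass.
        for (const auto& job : queue) {
            for (auto& q : quotas) {
                if (q.lob == job.lob && q.cpus_in_use + job.cpus_wanted <= q.cpus_owned) {
                    q.cpus_in_use += job.cpus_wanted;
                    std::cout << "dispatch " << job.lob << " job on "
                              << job.cpus_wanted << " CPUs\n";
                    break;
                }
            }
        }
        return 0;
    }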

A word of caution: resist the temptation to roll your own solution. Just answer this: if you were to implement a J2EE application, would you write your own application server? A last word of advice: with standards evolving and products maturing as rapidly as they are, it is important to pick your vendors wisely. Choose a vendor that will be around tomorrow and that has the technical expertise your enterprise will need to extend the product and support your 24×7 operations.

Technologies cannot survive without real business benefits — we tried that back in the dot-bomb days, right? Clearly, SOA and the Grid software stack are mature, address real, tangible business needs, and fully support the adaptive enterprise and the pragmatic reality of change. The beauty of a Grid and SOA implementation is that it does not have to be a big-bang effort to bring benefits. Start with your batch cycle, the time-consuming custom-built market risk application, or the Excel spreadsheet on the trader's desk that takes 12 hours to complete. Then instrument your first HPC application to take advantage of idle CPU cycles, or transition an application from an expensive SMP machine to commodity hardware. You will see ROI and business benefits immediately, and you will be prepared for the unpredictable volume spikes that business growth opportunities bring with them.

Until next time: get the Grids crunching.

About Labro Dimitriou

Labro Dimitriou is a subject matter expert in HPC and Grid. He has been in the fields of distributed computing, applied mathematics and operations research for over 23 years, and has developed commercial software for trading, engineering and geosciences. Dimitriou has spent the last four years designing enterprise HPC and Grid solutions in finance and life science. He can be reached via e-mail at [email protected].
