Grid, HPC and SOA: The Real Thing?

By Labro Dimitriou, Contributing Author

May 23, 2005

How do we know when a new technology is the real thing or just a fad? How do we value the significance of a new technology, and when is adopting it a tactical decision versus a strategic one? In this article, I will discuss why Grid and SOA are here to stay. I will also describe the technology “product stack” in order to distinguish the strategic pieces from the tactical ones, and I will propose best-practice techniques for securing ROI and for building resilience to change and to early-adoption risk.

Some would say that Grid and SOA are not revolutionary concepts, but rather evolutionary steps of enterprise distributed computing. Make no mistake, though: together, the technologies have the potential and the power to bring about a computing revolution. Grid and SOA may seem unrelated, but they are complementary notions with fundamentally the same technology underpinning and common business goals. They are service-based entities supporting the adaptive enterprise.

So, let's talk about the adaptive, or agile, enterprise and its characteristics. The only constant in today's business models is change. The way of doing business changes constantly, whether because the company shifts focus or because of new competitive pressures: today we are product-focused, tomorrow we are client-centric. Re-engineering the enterprise is no longer a final state, but an ongoing effort; consider Six Sigma and Business Process Management (BPM) initiatives. Integration is no longer an afterthought; most systems are built with integration as a hard requirement. Changes in the underlying technology are apparent across all infrastructures and applications, and the fact that new hardware delivers more power for less money shows that Moore's law is still valid. Last, and most challenging, is the constantly varying demand for compute power. Clearly, over-provisioning can only lead to underutilization and overspending, both undesirable results.

Information systems have to support the adaptive enterprise. As David Taylor wrote in his book Business Engineering with Object Technology: “Information systems, like the business models they support, must be adaptive in nature.” Simply put, information systems have two layers, software and hardware, that support and facilitate business requirements.

SOA decouples business requirements and presentation (the user interface) from the core application, shielding the end user from incremental changes and vice versa: it localizes the effect of code changes when requirements adapt to new business conditions.

Grid software decouples computing needs from hardware capacity. It inserts the necessary abstraction layer that not only protects the application from hardware change, but also provides horizontal scalability, predictability with guaranteed SLAs, fault tolerance by design and maximum CPU utilization.

SOA gave rise to the notion of the enterprise service bus, which can transform a portfolio of monolithic applications into a pool of highly parameterized, service-based components. A new business application can be designed by orchestrating a set of Web services already in production. Time to market for a new application can be reduced by orders of magnitude. Grid services virtualize compute silos suffering from under-performance or under-utilization and turn them into well-balanced, fully utilized enterprise compute backbones.

SOA provides an optimal path for a minimum-cost re-engineering or integration effort for a legacy system. In many cases, legacy systems gain longevity by replacing a hard-wired interface with a Web services layer. A Grid toolkit can turn a legacy application that has hit the performance boundaries of a large SMP box into an HPC application running on a farm of high-powered, low-cost commodity hardware.

Consider a small to medium enterprise with three or four vertical lines of business (LOBs), each requiring a few turnkey applications. The traditional approach would be to look at the requirements of each application in isolation, design the code and deploy it on hardware managed by the LOB. What is wrong with that approach? Well, lines of business most certainly share a good number of requirements, which means the enterprise spends money doing many of the same things multiple times. And what about addressing the computing demands of running the dozen or so applications? Each LOB has to do its own capacity management.

Keeping a business unit happy is a tightrope walk between under-provisioning and over-spending. SOA is an architectural blueprint that delivers on its promise of application reuse and interoperability. It provides a top-to-bottom approach to developing and maintaining applications. In this case, small domains of business requirements turn into code and are made available to the rest of the enterprise as a service.

Grid, on the other hand, is the ultimate cost-saving strategic tool. It can dynamically allocate the right amount of compute fabric to the LOB that needs it the most. In Grid's simplest form, the risk and analytics group can get near-real-time responses to complex “what if” market scenarios during the day, and the back office can meet critical global-economy requirements by using most of the compute fabric during the night window, which is getting smaller and smaller.

Next, let's review the product stack. First, I need to make a distinction between High Performance Computing (HPC) and Grid. HPC is all about making applications compute fast — one application at a time, I might add. Grid software, at large, orchestrates application execution and manages the available hardware resources, or the compute fabric. A further distinction can be made based on the geographic co-location of the compute resources (i.e., desktop computers, workgroups, clusters and Grids). Grid virtualizes one or more clusters, whether they are located on the same floor or halfway around the world. In all cases, the hardware can be heterogeneous and have different computing properties.

In this article, I refer to the available compute fabric as the Grid at large. HPC applications started on supercomputers, vector computers and SMP boxes. Today, Grid offers a very compelling alternative for executing HPC applications. By taking a serially executing application and chunking it into smaller components that run simultaneously on multiple nodes of the compute fabric, you can potentially improve the performance of an application by a factor of N, where N is the number of CPUs available on the compute fabric. Not bad at all, but admittedly there is a catch. Finding the parallelization opportunity, or chunking, is not always a trivial task and may require major re-engineering. That sounds invasive and costly, and the last thing one wants is to make logic changes to an existing application, adopt a new programming paradigm, hire expensive niche expertise and embark on one-off development cycles that take time away from core business competence.

The good news is that several HPC design patterns are emerging. In short, there are three high-level parallelization patterns: domain decomposition, functional decomposition and algorithmic parallelization. Domain decomposition, also known as “same instructions, different data” or “loop-level parallelization,” provides a simple Grid-enablement process. It requires that the application be adapted to run on smaller chunks of data (e.g., if you have a loop that iterates 1 million times doing the same computation on different data, the adapter can chunk the loop into, say, 1,000 ranges and do the same computation using 1,000 CPUs in parallel). OpenMP's “#pragma omp parallel” is a pre-compiler adapter supporting domain decomposition.
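To make the loop-level pattern concrete, here is a minimal sketch in C++ using OpenMP's loop work-sharing construct; the computation and data are invented for illustration and are not taken from any particular toolkit.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical per-element computation: the same instructions applied
// to different data, the defining trait of domain decomposition.
double compute(double x) {
    return x * x + 1.0;
}

int main() {
    const int n = 1000000;                        // one million independent iterations
    std::vector<double> data(n, 2.0), result(n);

    // OpenMP chunks the iteration range and runs the chunks in parallel
    // on the available CPUs; the loop body itself is unchanged.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        result[i] = compute(data[i]);
    }

    std::printf("result[0] = %f\n", result[0]);
    return 0;
}
```

Built with an OpenMP-aware compiler (e.g., g++ -fopenmp), the same source runs serially or in parallel depending on the flag, which is exactly the kind of non-invasive adaptation discussed below.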

Functional decomposition comes in many flavors. The most obvious flavor is probably running in your back-office batch cycle: a set of independent executables readily available to run from the command line. In its more complex variety, it might require minimum instrumentation or adaptation of the serial code.
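As a minimal sketch of functional decomposition, assuming three independent work units (the function names are hypothetical), each piece runs as its own task, much as independent batch executables run side by side:

```cpp
#include <cstdio>
#include <future>

// Hypothetical independent work units, analogous to separate batch
// executables with no data dependencies on one another.
double price_portfolio()  { return 1.0; }
double compute_risk()     { return 2.0; }
double reconcile_trades() { return 3.0; }

int main() {
    // Launch each function as an independent asynchronous task;
    // a Grid scheduler would instead place each one on a separate node.
    auto pricing = std::async(std::launch::async, price_portfolio);
    auto risk    = std::async(std::launch::async, compute_risk);
    auto recon   = std::async(std::launch::async, reconcile_trades);

    // Collect the results as each task completes.
    std::printf("%.1f %.1f %.1f\n", pricing.get(), risk.get(), recon.get());
    return 0;
}
```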

Algorithmic parallelization is reserved for very specific domain problems and usually combines functional and domain decomposition techniques. Examples include HPC solvers for partial differential equations, recombining trees for stochastic models and the global unconstrained optimization required for a variety of business problems.

So, here is the first and top layer of the product stack: the adaptation layer. Applications need a non-invasive way to run on a Grid. This layer provides the means to map serial code to parallel-executing components. A number of toolkits with available APIs are coming to market with varying degrees of abstraction and integration effort. Clearly, different types of algorithms and applications might need different approaches, so a tactical solution may be required. Whatever the approach, you want to avoid logic changes to existing code and use a high-level paradigm that encapsulates the rigors of parallelization. In addition, you should look for a toolkit that comes with a repeatable, best-practices process.

To introduce the next two layers, consider the requirements for sharing data and communicating results among the decomposed chunks of work. Shared data can be either static or intermediate computed results. In the case of static data, a simple NFS-type solution or database access will suffice. But if the parallel workers need to exchange data, distributed shared-memory data services might be required. So, the next layer going down the stack provides data transparency and data virtualization across the Grid. Clearly, it is a strategic piece of the puzzle, and high performance and scalability are critical for the few applications that need these qualities of service.
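As a rough illustration of that distinction, here is a minimal sketch, assuming an in-memory stand-in for the shared-data service (the class and key names are hypothetical); a real data-virtualization layer would distribute the store across the Grid and handle consistency and locality.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <map>
#include <mutex>
#include <string>

// Static input data: read once from a shared file system (e.g., NFS).
std::string loadStaticData(const std::string& path) {
    std::ifstream in(path);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Hypothetical stand-in for a distributed shared-memory service:
// workers publish intermediate results by key and read each other's results.
class SharedResults {
public:
    void put(const std::string& key, double value) {
        std::lock_guard<std::mutex> lock(mu_);
        store_[key] = value;
    }
    double get(const std::string& key) {
        std::lock_guard<std::mutex> lock(mu_);
        return store_[key];
    }
private:
    std::mutex mu_;
    std::map<std::string, double> store_;
};

int main() {
    SharedResults results;
    results.put("worker-1/partial-sum", 42.0);                  // worker 1 publishes
    std::printf("%f\n", results.get("worker-1/partial-sum"));   // worker 2 reads
    return 0;
}
```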

Communication among workers brings us to the classic middleware layer. One word of advice: make sure that your application is not exposed to any direct calls to the middleware, unless, of course, you have time to develop and debug low-level messaging code. Better yet, make sure you have nothing to do with middleware calls at all and that the application stack provides you with a much higher level of API abstraction.

So, you've developed your SOA HPC applications and all the LOBs are lining up to use the compute fabric. How do you make sure that applications compute in a predictable fashion and within predetermined timelines? How do you assure horizontal scalability, reliability and high availability? This brings us to the most important part of the stack — the Grid software. The Grid software provides all the qualities of service that make the product stack industrial-strength and mission-critical-ready: workload and resource management; SLA-based ownership of resources; fail-over; cost accounting; operational monitoring for 24×7 enterprises; horizontal scalability; and maximum use of compute capacity. The core of this layer implements an open, policy-driven distributed scheduler.
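To make the scheduling idea concrete, here is a deliberately simplified sketch, assuming per-LOB SLA weights (the LOB names and weights are hypothetical); production Grid schedulers layer on fail-over, cost accounting and dynamic resource discovery.

```cpp
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

// A unit of work submitted by a line of business (LOB).
struct Task {
    std::string lob;
    int slaWeight;   // higher weight = stronger SLA claim on the fabric
};

// Order tasks so the strongest SLA claim is dispatched first.
struct BySla {
    bool operator()(const Task& a, const Task& b) const {
        return a.slaWeight < b.slaWeight;
    }
};

int main() {
    std::priority_queue<Task, std::vector<Task>, BySla> pending;
    pending.push({"risk-analytics", 90});
    pending.push({"back-office",    50});
    pending.push({"reporting",      20});

    // Dispatch loop: hand the highest-priority task to the next free node.
    int node = 0;
    while (!pending.empty()) {
        Task t = pending.top();
        pending.pop();
        std::printf("node %d <- %s (weight %d)\n", node++, t.lob.c_str(), t.slaWeight);
    }
    return 0;
}
```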

A word of caution: resist the temptation to roll your own solution. Just answer this: if you were to implement a J2EE application, would you write your own application server? A last word of advice: as rapidly as standards are evolving and products are maturing, it is important to pick your vendors wisely. Get a vendor that will be around tomorrow and that has the technical expertise your enterprise will need to extend the product and support your 24×7 operations.

Technologies cannot survive without real business benefits — we tried that back in the dot-bomb days, right? Clearly, SOA and the Grid software stack are mature, address real, tangible business needs, and fully support the adaptive enterprise and the pragmatic reality of change. The beauty of a Grid and SOA implementation is that it does not have to be a big-bang approach to bring benefits. Start with your batch cycle, the time-consuming custom-built market risk application or the Excel spreadsheet running at the trader's desk that takes 12 hours to complete. Then, instrument your first HPC application and take advantage of idle CPU cycles, or transition an application from an expensive SMP machine to commodity hardware. You will immediately see ROI and business benefits. And be prepared for the unpredictable volume spikes that business growth opportunities bring with them.

Until next time: get the Grids crunching.

About Labro Dimitriou

Labro Dimitriou is a subject matter expert in HPC and Grid. He has been in the fields of distributed computing, applied mathematics and operations research for over 23 years, and has developed commercial software for trading, engineering and geosciences. Dimitriou has spent the last four years designing enterprise HPC and Grid solutions in finance and life science. He can be reached via e-mail at [email protected].
