MPI Is 25 Years Old!

By Ewing Lusk and Jesper Larsson Träff

May 1, 2017

Has it really been 25 years since the Message Passing Interface standard was born? It has indeed, and at this year’s EuroMPI meeting in September in Chicago, a “birthday” symposium will be held to celebrate the occasion. Speakers from the remote past of MPI, the middle years, and the current time will touch on the ideas that have given MPI its long life and will highlight the impact the standard has had on multiple aspects of parallel computing, from applications to libraries to its multiple implementations.

The concept of a standard for message passing emerged over time. While assorted systems, both commercial and free, competed for “mind share” and commercial success, a small meeting of researchers took place in 1991 at a conference in Oberlech, Austria. There Jack Dongarra, Rolf Hempel, Tony Hey, and David Walker drafted a white paper outlining a proposal for what a standard might look like, borrowing heavily from Marc Snir’s work at IBM. Jack Dongarra, Professor of Computer Science at the University of Tennessee, recalls, “Each of the existing systems had merit, but none had everything needed to move application development forward. We decided to instigate a community effort to address the problem.” It seems reasonable to affix the label “Birth of MPI” to the resulting workshop entitled “Standards for Message Passing in a Distributed Memory Environment” organized by Jack Dongarra and David Walker with funding from the Ken Kennedy Center for Research in Parallel Computation at Rice University in April 1992. That was the first time a wide variety of interested stakeholders gathered in an open meeting dedicated to the topic of a standard for message passing, forecasting the openness of the process that would follow. The result of that workshop, which featured presentations on multiple vendor-specific and portable systems, was a realization that a great diversity of good ideas existed among then-current message-passing libraries but that the lack of a standard was impeding the progress of parallel computing.


At the Supercomputing ’92 conference in November, a committee was formed to define a message-passing standard. At the time of creation, no one knew what the outcome might look like, but the effort was begun with the following objectives:  (1) to define a portable standard for message-passing, which would not be an official, ANSI-like standard but would attract both implementers and users; (2) to operate in a completely open way, allowing anyone to join the discussions, either by attending meetings in person or by monitoring open email discussions; and (3) to be finished in one year.

The MPI effort was a lively one, as a result of the tensions among these three objectives. The committee decided to follow the format used by the High-Performance Fortran Forum, whose procedures had been well received by its community. (It even decided to meet in the same hotel in North Dallas.) An early decision of the MPI Forum was not to adopt any existing system or proposal as a starting point but to start from scratch, with the explicit goals of portability, expressiveness, and performance capability. “Ease of use” was not a primary goal; the idea was that libraries, compilers, and other software layers would provide this aspect of parallel programming, and that applications would rely on their implementations over MPI to provide convenience of programming.

More formal meetings began in January 1993 under the name “MPI Forum,” an extension of the SC ’92 committee, and continued until the following February. Over that time, more than 60 people from 40 organizations participated, although attendance at most meetings was about 30. The procedures for submitting proposals and voting were adopted from those of the HPF Forum, which had worked well. One reason the MPI standardization effort succeeded was that the MPI Forum itself was so broadly based. At the original (MPI-1) Forum the parallel computer vendors were represented by Convex, Cray, IBM, Intel, Meiko, nCUBE, NEC, and Thinking Machines. Members of the groups associated with portable software libraries were also there: PVM, p4, Zipcode, Chameleon, PARMACS, TCGMSG, and Express were all represented, as well as some application groups. One subgroup committed to providing a test implementation of each iteration of the standard as it evolved from meeting to meeting; this proved valuable in uncovering the implementation consequences of API decisions, as well as ensuring that when the standard definition was completed, a prototype implementation was immediately available. Marc Snir, Professor of Computer Science at the University of Illinois and an original Forum member representing IBM, has said, “The MPI Forum was an outstanding example of many companies, research labs, and individuals working together to achieve a common good.”

The first version of the MPI standard was published in May 1994. It included standard versions of many well-known message-passing operations such as blocking and nonblocking sends and receives, together with collective operations such as broadcast, reduce, and scan. It broke new ground with its concept of communicators (essential for the modularity of MPI-based libraries), datatypes (to deal efficiently with structured and noncontiguous messages), and process topologies (ignored by many in those days but becoming more significant on today’s machines). Its inclusion of both Fortran and C bindings (with identical semantics) signaled its desire to be immediately useful to both libraries and end-user scientific applications.
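For readers less familiar with the collective operations named above, the following is a minimal sketch in plain Python of what broadcast, reduce, and scan compute. It does not call any MPI library: each list position stands in for one process (rank), and the function names `broadcast`, `reduce_to_root`, and `scan` are illustrative assumptions, not MPI bindings.

```python
from functools import reduce as fold
from itertools import accumulate
import operator


def broadcast(values, root):
    """Every rank ends up holding the root's value."""
    return [values[root]] * len(values)


def reduce_to_root(values, op, root):
    """Only the root receives the combined result; None marks 'no result here'."""
    combined = fold(op, values)
    return [combined if rank == root else None for rank in range(len(values))]


def scan(values, op):
    """Rank i receives the combination of the contributions of ranks 0..i."""
    return list(accumulate(values, op))


# One contribution per rank, ranks 0..3 (toy data chosen for illustration).
contributions = [3, 1, 4, 1]

print(broadcast(contributions, root=0))                    # [3, 3, 3, 3]
print(reduce_to_root(contributions, operator.add, root=0)) # [9, None, None, None]
print(scan(contributions, operator.add))                   # [3, 4, 8, 9]
```

The sketch captures only the results, not the communication: a real implementation spreads this work across processes and is free to choose whatever message pattern suits the machine.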

MPI also took an innovative approach to the problem of tools for debugging and performance analysis. Rather than designing such a tool into the standard specification itself, MPI provided a mechanism, its “profiling interface,” by which anyone could write a library that intercepted a subset of MPI calls in order to count, measure, or display them in some way, before (and after) passing them to the underlying MPI implementation for actual execution. As expected, this has spawned a wide collection of tools that are completely portable, since the profiling interface is part of the standard rather than the tool itself.
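The interception idea can be illustrated with a small toy model. In real MPI the mechanism works at link time in C or Fortran: every `MPI_` routine is also reachable under a `PMPI_` name, so a tool can define its own `MPI_Send`, do its bookkeeping, and then call `PMPI_Send`. The sketch below only imitates that name shifting with Python classes; `ToyMPI`, `CountingTool`, and `application` are hypothetical names invented for this illustration, and no MPI library is involved.

```python
import time


class ToyMPI:
    """Stand-in for an MPI library (assumption: NOT a real MPI binding)."""

    def __init__(self):
        self.delivered = []

    def PMPI_Send(self, payload, dest, tag):
        # The actual implementation, reachable under the shifted name.
        self.delivered.append((dest, tag, payload))

    def MPI_Send(self, payload, dest, tag):
        # Without a tool, the public name simply forwards to the implementation.
        return self.PMPI_Send(payload, dest, tag)


class CountingTool(ToyMPI):
    """A tool layer: redefines the public name, then calls the shifted name."""

    def __init__(self):
        super().__init__()
        self.send_calls = 0
        self.send_seconds = 0.0

    def MPI_Send(self, payload, dest, tag):
        self.send_calls += 1
        start = time.perf_counter()
        try:
            return self.PMPI_Send(payload, dest, tag)
        finally:
            self.send_seconds += time.perf_counter() - start


def application(mpi):
    # The application is unchanged whether or not a tool is present.
    for step in range(3):
        mpi.MPI_Send(f"step {step}", dest=1, tag=0)


plain = ToyMPI()
application(plain)

profiled = CountingTool()
application(profiled)
print(profiled.send_calls, len(profiled.delivered))
```

The point of the design is visible even in the toy: the application code is identical in both runs, and the tool needs no knowledge of how the underlying send is implemented.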

During the 1993-1994 meetings of the MPI Forum, several issues were postponed in order to reach early agreement on a core of message-passing functionality, which nonetheless included several innovative concepts, such as communicators, datatypes, and topologies. The Forum reconvened during 1995-1997 to extend MPI to include remote memory operations, parallel I/O, and dynamic process management, along with a number of features designed to increase the convenience and robustness of MPI. This effort resulted in the MPI-2 standard, released in 1997. MPI-2 had three major new feature sets:  an extensive interface to efficiently support parallel file I/O to and from MPI programs; support for one-sided (put/get) communication; and dynamic process management, namely, the ability to create additional processes from a running MPI program and the ability for separately started MPI applications to connect to each other and communicate. MPI-2 also introduced other features, such as precisely defined semantics for multithreaded communication that in some way foreshadowed the multiple modes of OpenMP parallelism, bindings for Fortran-90 and C++, and detailed support for mixed language programming (how to send a message from Fortran and have it received in C, for example).

While the MPI-2 standard was finished in 1997, it took a few years for full implementations to appear. In contrast to the MPI-1 effort, there was no hand-in-hand prototype developed for most of the additions of MPI-2, and in retrospect, some of the useful feedback on the standardization process from a co-developed prototype was missing. Nevertheless, over the next decade and a half, MPI filled the needs of most computational science codes that required a high-performance, scalable, portable programming system. The Forum itself disbanded.

The timing of MPI seems to have been about right. Trying to establish such a standard earlier might have failed to benefit from research into multiple approaches. Indeed, some feared that adoption of a standard would shut down research into the message-passing model. In fact, the opposite happened. Having a fairly complete, performance-enabling, portable interface target stimulated a wealth of research into implementation approaches, tool development, and application algorithms. Much of the research appeared in the Proceedings of the Euro-* conferences, underlining the international nature of MPI-based research. These workshops started as PVM (Parallel Virtual Machine) user group meetings, became EuroPVM workshops from 1994 to 1996, EuroPVM/MPI from 1997 to 2009, and EuroMPI from 2010 to 2017. It is telling and amusing that “Euro”MPI 2017 will be held in Chicago this year.

Over the next fifteen years or so, the MPI Forum itself was inactive, the published standard remained unchanged, and MPI was a stable interface for users and implementers alike. Vendors used the open-source prototype implementations (MPICH, and later Open MPI), layered to allow optimizations at multiple levels, to evolve their proprietary implementations over time in order to gradually take advantage of their own evolving specialized hardware.

This was no mean feat. As Bill Gropp, Acting Director and Chief Scientist at the National Center for Supercomputing Applications, says, “One of the hardest things about an MPI implementation is keeping the implementation focused on the future. This requires finding a balance between making engineering decisions based on today’s hardware and designing and implementing for likely directions in the future.” Many message-passing applications, written in customized ways to deal with the portability problem, switched to making direct MPI calls, improving efficiency and maintainability. And library development was unleashed, fulfilling one of MPI’s original goals. Barry Smith, Senior Computer Scientist at Argonne National Laboratory and primary developer of the PETSc library, explains MPI’s contribution to library development as follows: “MPI changed everything, by providing an extensive API for message passing and collectives that allowed portable distributed memory scientific libraries to no longer need to be programmed to the lowest common denominator of message passing systems. Equally important, MPI eliminated the problem of ‘tag collision’ where each library might utilize the same tags for messages, resulting in messages sent from one library being (improperly) received and processed by a different library or the application code. The MPI communicator concept made distributed parallel scientific libraries practical in two ways, it eliminated the tag collision problem and (by the use of subcommunicators) allowed applications to simply utilize scientific libraries to perform needed computations on subsets of processes, for example with ‘divide and conquer’ algorithms.”
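The tag collision problem described in the quotation can be made concrete with a toy model. The sketch below is plain Python and uses no MPI library; the `Mailbox` class, the context names, and the tag value are all assumptions made for illustration. It contrasts matching on the tag alone with matching on a (context, tag) pair, which is roughly the isolation a communicator provides. In real MPI, a library would typically obtain its private context by duplicating the communicator it is handed, for example with `MPI_Comm_dup`.

```python
from collections import deque


class Mailbox:
    """Toy incoming-message queue of one process (NOT a real MPI binding)."""

    def __init__(self):
        self.pending = deque()

    def send(self, context, tag, payload):
        self.pending.append((context, tag, payload))

    def recv_by_tag(self, tag):
        # Pre-communicator style: the tag is the only thing that is matched.
        for msg in self.pending:
            if msg[1] == tag:
                self.pending.remove(msg)
                return msg[2]
        raise LookupError("no matching message")

    def recv(self, context, tag):
        # Communicator style: a message matches only within its own context.
        for msg in self.pending:
            if msg[0] == context and msg[1] == tag:
                self.pending.remove(msg)
                return msg[2]
        raise LookupError("no matching message")


APP, SOLVER = "app-comm", "solver-comm"  # hypothetical context names
TAG = 7  # both the application and the library happened to pick the same tag

# Matching on the tag alone: the library receives the application's message.
box = Mailbox()
box.send(APP, TAG, "application data")
box.send(SOLVER, TAG, "solver internals")
stolen = box.recv_by_tag(TAG)

# Matching on (context, tag): each side sees only its own traffic.
box = Mailbox()
box.send(APP, TAG, "application data")
box.send(SOLVER, TAG, "solver internals")
safe = box.recv(SOLVER, TAG)
remaining = box.recv(APP, TAG)

print(stolen, "|", safe, "|", remaining)
```

With contexts in place, neither the library author nor the application author has to know which tags the other uses.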

For more than a decade after the Forum disbanded in 1997, the MPI specification remained stable, providing a period during which MPI could “sink in” while implementations steadily improved, parallel libraries flourished, and applications, now portable, took advantage of multiple new tera- and petascale machines, challenging those implementations and libraries to become ever more scalable. However, HPC moves fast, and after a dozen years multiple trends had gradually increased community pressure to restart the MPI process, whose inclusiveness and openness had served the community so well in the past.

For one thing, the scale of massively parallel systems had reached more than a million cores. Single-core processors had disappeared, nodes had become symmetric multiprocessors, and defining how a distributed-memory model like MPI’s would interact with threads (specifically, the emerging OpenMP standard) and shared memory became more critical. Remote memory access (put/get) support in networks became mainstream, raising the applicability of efficient remote memory access (RMA) as a programming model. Although MPI-2’s RMA was used by some applications, it had failed to live up to expectations and needed an overhaul. C and Fortran had both evolved, requiring updates to the MPI interfaces. Nonblocking collective operations had been proposed, and some experience with them obtained. At the time of MPI-2, nonblocking collectives had been considered but deliberately left out of the standard because of the expectation that they could be implemented on top of MPI by issuing blocking operations in separate threads. However, threads turned out to be more difficult to use efficiently, and support for threads was uneven. The increase in scale had brought fault tolerance issues to the fore. And finally, a list of (mostly) minor errata had accumulated.
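The thread-based workaround mentioned above, building a nonblocking collective by issuing the blocking one in a separate thread, can be sketched as a toy model. The code below uses only Python's `threading` module, with threads standing in for MPI processes; `ThreadedBarrierRequest`, `worker`, and `run` are hypothetical names, and nothing here calls an MPI library.

```python
import threading


class ThreadedBarrierRequest:
    """Handle for a blocking barrier running in a helper thread (toy model)."""

    def __init__(self, barrier):
        self._thread = threading.Thread(target=barrier.wait)
        self._thread.start()  # returns immediately: the "nonblocking" start

    def wait(self):
        self._thread.join()   # completes once every rank has entered the barrier


def worker(rank, barrier, log):
    request = ThreadedBarrierRequest(barrier)
    log.append(("overlap", rank))  # useful work while other ranks arrive
    request.wait()
    log.append(("after", rank))


def run(nranks=4):
    barrier = threading.Barrier(nranks)
    log = []
    ranks = [threading.Thread(target=worker, args=(r, barrier, log))
             for r in range(nranks)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return log


log = run()
print(len(log))
```

Even this small model hints at the cost: every outstanding operation needs its own helper thread, which is one reason the approach proved harder to use efficiently than expected.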

In response to all this, the MPI Forum reconstituted itself in 2008, at first tidying up MPI-2 and eventually releasing the initial version of MPI-3 in September 2012. Major new features of MPI-3 include the nonblocking collective operations, together with “neighborhood” collectives, useful for stencil computations and relying on the topology functions from MPI-1. (The concept of a nonblocking barrier was considered a joke during the MPI-1 meetings; now MPI has one!) There is an improved one-sided communication interface as well as a tools interface that goes beyond MPI-1’s profiling interface to dynamically access the behavior of an MPI implementation. The Fortran bindings have been updated to take advantage of the Fortran 2008 standard, which was a major step forward in making Fortran work well with libraries in a parallel environment. C bindings were modernized to catch more errors at compile time. Other new features improved interactions with threads and shared memory.

Some topics that the MPI-3 Forum grappled with have not (yet) become part of MPI, such as fault tolerance and more complex support for multithreaded programming, because the Forum decided that current proposals were not quite ready for standardization. The Forum continues to work on these and other issues. Martin Schulz, Computer Scientist at Lawrence Livermore National Laboratory and current chairperson of the MPI-3 Forum, says, “As MPI has established itself as the dominant standard in HPC, it has been exciting and rewarding to see that the members of the MPI forum have not been resting on their laurels. Instead, the Forum continues to drive innovation balanced with the pragmatism necessary for a standards document as we race towards exascale as well as to embrace new commercial application fields and their different requirements.”

Many of the participants in this decades-long effort will speak at the “25 Years of MPI” symposium during the EuroMPI Workshop to be held at Argonne National Laboratory near Chicago on September 25-27, 2017.

About the Authors

Ewing “Rusty” Lusk is Argonne Distinguished Fellow Emeritus at Argonne National Laboratory.

Prof. Jesper Larsson Träff is on the Faculty of Informatics at the Vienna University of Technology.
