MPI Is 25 Years Old!

By Ewing Lusk and Jesper Larsson Träff

May 1, 2017

Has it really been 25 years since the Message Passing Interface standard was born? It has indeed, and at this year’s EuroMPI meeting in September in Chicago, a “birthday” symposium will be held to celebrate the occasion. Speakers from the remote past of MPI, the middle years, and the current time will touch on the ideas that have given MPI its long life and will highlight the impact the standard has had on multiple aspects of parallel computing, from applications to libraries to its multiple implementations.

The concept of a standard for message passing emerged over time. While assorted systems, both commercial and free, competed for “mind share” and commercial success, a small meeting of researchers took place in 1991 at a conference in Oberlech, Austria. There Jack Dongarra, Rolf Hempel, Tony Hey, and David Walker drafted a white paper outlining a proposal for what a standard might look like, borrowing heavily from Marc Snir’s work at IBM. Jack Dongarra, Professor of Computer Science at the University of Tennessee, recalls, “Each of the existing systems had merit, but none had everything needed to move application development forward. We decided to instigate a community effort to address the problem.” It seems reasonable to affix the label “Birth of MPI” to the resulting workshop entitled “Standards for Message Passing in a Distributed Memory Environment” organized by Jack Dongarra and David Walker with funding from the Ken Kennedy Center for Research in Parallel Computation at Rice University in April 1992. That was the first time a wide variety of interested stakeholders gathered in an open meeting dedicated to the topic of a standard for message passing, forecasting the openness of the process that would follow. The result of that workshop, which featured presentations on multiple vendor-specific and portable systems, was a realization that a great diversity of good ideas existed among then-current message-passing libraries but that the lack of a standard was impeding the progress of parallel computing.

Jack Dongarra

At the Supercomputing ’92 conference in November, a committee was formed to define a message-passing standard. At the time of creation, no one knew what the outcome might look like, but the effort was begun with the following objectives:  (1) to define a portable standard for message-passing, which would not be an official, ANSI-like standard but would attract both implementers and users; (2) to operate in a completely open way, allowing anyone to join the discussions, either by attending meetings in person or by monitoring open email discussions; and (3) to be finished in one year.

The MPI effort was a lively one, as a result of the tensions among these three objectives. The committee decided to follow the format used by the High-Performance Fortran Forum, whose procedures had been well received by its community. (It even decided to meet in the same hotel in North Dallas.)  An early decision of the MPI Forum was to not adopt any existing system or proposal as a starting but to start from scratch, with the explicit goals of portability, expressiveness, and performance capability. “Ease of use” was not a primary goal; the idea was that libraries, compilers, and other software layers would provide this aspect of parallel programming, and that applications would rely on their implementations over MPI to provide convenience of programming.

More formal meetings began in January 1993 under the name “MPI Forum,” an extension of the SC ’92 committee, and continued until the following February. Over that time, more than 60 people from 40 organizations participated, although attendance at most meetings was about 30. The procedures for submitting proposals and voting were adopted from those of HPF Forum, which had worked well. One reason the MPI standardization effort succeeded was that the MPI Forum itself was so broadly based. At the original (MPI-1) Forum the parallel computer vendors were represented by Convex, Cray, IBM, Intel, Meiko, nCUBE, NEC, and Thinking Machines. Members of the groups associated with portable software libraries were also there: PVM, p4, Zipcode, Chameleon, PARMACS, TCGMSG, and Express were all represented, as well as some application groups. One subgroup committed to providing a test implementation of each iteration of the standard as it evolved from meeting to meeting; this proved valuable in uncovering the implementation consequences of API decisions, as well as ensuring that when the standard definition was completed, a prototype implementation was immediately available. Marc Snir, Professor of Computer Science at the University of Illinois and an original Forum member representing IBM, has said, “The MPI Forum was an outstanding example of many companies, research labs, and individuals working together to achieve a common good.”

The first version of the MPI standard was published in May 1994. It included standard versions of many well-known message-passing operations such as blocking and nonblocking sends and receives, together with collective operations such as broadcast, reduce, and scan. It broke new ground with its concept of communicators (essential for the modularity of MPI-based libraries), datatypes (to deal efficiently with structured and noncontiguous messages), and process topologies (ignored by many in those days but becoming more significant on today’s machines). Its inclusion of both Fortran and C bindings (with identical semantics) signaled its desire to be immediately useful to both libraries and end-user scientific applications.

MPI also took an innovative approach to the problem of tools for debugging and performance analysis. Rather than designing such a tool into the standard specification itself, MPI provided a mechanism, its “profiling interface,” by which anyone could write a library that intercepted a subset of MPI calls in order to count, measure, or display them in some way, before (and after) passing them to the underlying MPI implementation for actual execution. As expected, this has spawned a wide collection of tools that are completely portable, since the profiling interface is part of the standard rather than the tool itself.

During the 1993-1994 meetings of the MPI Forum, several issues were postponed in order to reach early agreement on a core of message-passing functionality, which nonetheless included several innovative concepts, such as communicators, datatypes, and topologies. The Forum reconvened during 1995-1997 to extend MPI to include remote memory operations, parallel I/O, and dynamic process management, along with a number of features designed to increase the convenience and robustness of MPI. This effort resulted in the MPI-2 standard, released in 1997. MPI-2 had three major new feature sets:  an extensive interface to efficiently support parallel file I/O to and from MPI programs; support for one-sided (put/get) communication; and dynamic process management, namely, the ability to create additional processes from a running MPI program and the ability for separately started MPI applications to connect to each other and communicate. MPI-2 also introduced other features, such as precisely defined semantics for multithreaded communication that in some way foreshadowed the multiple modes of OpenMP parallelism, bindings for Fortran-90 and C++, and detailed support for mixed language programming (how to send a message from Fortran and have it received in C, for example).

While the MPI-2 standard was finished in 1997, it took a few years for full implementations to appear. In contrast to the MPI-1 effort, there was no hand-in-hand prototype developed for most of the additions of MPI-2, and in retrospect, some of the useful feedback on the standardization process from a co-developed prototype was missing. Nevertheless, over the next decade and a half, MPI filled the needs of most computational science codes that required a high-performance, scalable, portable programming system. The Forum itself disbanded.

The timing of MPI seems to have been about right. Trying to establish such a standard earlier might have failed to benefit from research into multiple approaches. Indeed, some feared that adoption of a standard would shut down research into the message-passing model. In fact, the opposite happened. Having a fairly complete, performance-enabling, portable interface target stimulated a wealth of research into implementation approaches, tool development, and application algorithms. Much of the research appeared in the Proceedings of the Euro-* conferences, underlining the international nature of MPI-based research. These workshops started as PVM (Parallel Virtual Machine) user group meetings, became EuroPVM workshops from 1994 to 1996, EuroPVM/MPI from 2007 to 2009, and EuroMPI from 2010 to 2017. It is telling and amusing that “Euro”MPI 2017 will be held in Chicago this year.

Over the next fifteen years or so, the MPI Forum itself was inactive, the published standard remained unchanged, and MPI was a stable interface for users and implementers alike. Vendors used the open-source prototype implementations (MPICH, and later OpenMPI), layered to allow optimizations at multiple levels, to evolve their proprietary implementations over time in order to gradually take advantage of their own evolving specialized hardware.

This was no mean feat. As Bill Gropp, Acting Director and Chief scientist at the National Center for Supercomputing Applications, says, “One of the hardest things about an MPI implementation is keeping the implementation focused on the future. This requires finding a balance between making engineering decisions based on today’s hardware and designing and implementing for likely directions in the future.”  Many message-passing applications, written in customized ways to deal with the portability problem, switched to making direct MPI calls, improving efficiency and maintainability. And library development was unleashed, fulfilling one of MPI’s original goals. Barry Smith, Senior Computer Scientist at Argonne National Laboratory and primary developer of the PETSc library, explains MPI’s contribution to library development as follows:  “MPI changed everything, by providing an extensive API for message passing and collectives that allowed portable distributed memory scientific libraries to no longer need to be programmed to the lowest common denominator of message passing systems. Equally important, MPI eliminated the problem of ‘tag collision’ where each library might utilize the same tags for messages, resulting in messages sent from one library being (improperly) received and processed by a different library or the application code. The MPI communicator concept made distributed parallel scientific libraries practical in two ways, it eliminated the tag collision problem and (by the use of subcommunicators) allowed applications to simply utilize scientific libraries to perform needed computations on subsets of processes, for example with ‘divide and conquer’ algorithms.”

For more than a decade after the Forum disbanded in 1997, the MPI specification remained stable, providing a period during which MPI could “sink in” while implementations steadily improved, parallel libraries flourished, and applications, now portable, took advantage of multiple new tera- and petascale machines, challenging those implementations and libraries to become ever more scalable. However, HPC moves fast, and after a dozen years multiple trends had gradually increased community pressure to restart the MPI process, whose inclusiveness and openness had served the community so well in the past.

For one thing, the scale of massively parallel systems had reached more than a million cores. Single-core processors had disappeared, nodes had become symmetric multiprocessors, and defining how a distributed-memory model like MPI’s would interact with threads (specifically, the emerging OpenMP standard) and shared memory became more critical. Remote memory access (put/get) support in networks became mainstream, raising the applicability of efficient remote memory access (RMA) as a programming model. Although MPI-2’s RMA was used by some applications, it had failed to live up to expectations and needed an overhaul. C and Fortran had both evolved, requiring updates to the MPI interfaces. Nonblocking collective operations had been proposed, and some experience with them obtained. At the time of MPI-2, nonblocking collectives had been considered but deliberately left out of the standard because of the expectation that they could be implemented on top of MPI by issuing blocking operations in separate threads. However, threads turned out to be more difficult to use efficiently, and support for threads was uneven. The increase in scale had brought fault tolerance issues to the fore. And finally, a list of (mostly) minor errata had accumulated.

In response to all this, the MPI Forum reconstituted itself in 2008, at first tidying up MPI-2 and eventually releasing the initial version of MPI-3 in September 2012. Major new features of MPI-3 include the nonblocking collective operations, together with “neighborhood” collectives, useful for stencil computations and relying on the topology functions from MPI-1. (The concept of a nonblocking barrier was considered a joke during the MPI-1 meetings; now MPI has one!) There is an improved one-sided communication interface as well as a tools interface that goes beyond MPI-1’s profiling interface to dynamically access the behavior of an MPI implementation. The Fortran bindings have been updated to take advantage of the Fortran 2008 standard, which was a major step forward in making Fortran work well with libraries in a parallel environment. C bindings were modernized to catch more errors at compile time. Other new features improved interactions with threads and shared memory.

Some topics that the MPI-3 Forum grappled with have not (yet) become part of MPI, such as fault tolerance and more complex support for multithreaded programming, because the Forum decided that current proposals were not quite ready for standardization. The Forum continues to work on these and other issues. Martin Schulz, Computer Scientist at Lawrence Livermore National Laboratory and current chairperson of the MPI-3 Forum, says, “As MPI has established itself as the dominant standard in HPC, it has been exciting and rewarding to see that the members of the MPI forum have not been resting on their laurels. Instead, the Forum continues to drive innovation balanced with the pragmatism necessary for a standards document as we race towards exascale as well as to embrace new commercial application fields and their different requirements.”

Many of the participants in this decades-long effort will speak at the “25 Years of MPI” symposium during the EuroMPI Workshop to be held at Argonne National Laboratory near Chicago on September 25-27, 2017.

About the Authors

Ewing “Rusty” Lusk is Argonne Distinguished Fellow Emeritus at Argonne National Laboratory.

Prof. Jesper Larsson Träff is on the Faculty of Informatics at the Vienna University of Technology.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Talk to Me: Nvidia Claims NLP Inference, Training Records

August 15, 2019

Nvidia says it’s achieved significant advances in conversation natural language processing (NLP) training and inference, enabling more complex, immediate-response interchanges between customers and chatbots. And the co Read more…

By Doug Black

Trump Administration and NIST Issue AI Standards Development Plan

August 14, 2019

Efforts to develop AI are gathering steam fast. On Monday, the White House issued a federal plan to help develop technical standards for AI following up on a mandate contained in the Administration’s AI Executive Order Read more…

By John Russell

Scientists to Tap Exascale Computing to Unlock the Mystery of our Accelerating Universe

August 14, 2019

The universe and everything in it roared to life with the Big Bang approximately 13.8 billion years ago. It has continued expanding ever since. While we have a good understanding of the early universe, its fate billions Read more…

By Rob Johnson

AWS Solution Channel

Efficiency and Cost-Optimization for HPC Workloads – AWS Batch and Amazon EC2 Spot Instances

High Performance Computing on AWS leverages the power of cloud computing and the extreme scale it offers to achieve optimal HPC price/performance. With AWS you can right size your services to meet exactly the capacity requirements you need without having to overprovision or compromise capacity. Read more…

HPE Extreme Performance Solutions

Bring the combined power of HPC and AI to your business transformation

FPGA (Field Programmable Gate Array) acceleration cards are not new, as they’ve been commercially available since 1984. Typically, the emphasis around FPGAs has centered on the fact that they’re programmable accelerators, and that they can truly offer workload specific hardware acceleration solutions without requiring custom silicon. Read more…

IBM Accelerated Insights

Cloudy with a Chance of Mainframes

[Connect with HPC users and learn new skills in the IBM Spectrum LSF User Community.]

Rapid rates of change sometimes result in unexpected bedfellows. Read more…

Argonne Supercomputer Accelerates Cancer Prediction Research

August 13, 2019

In the fight against cancer, early prediction, which drastically improves prognoses, is critical. Now, new research by a team from Northwestern University – and accelerated by supercomputing resources at Argonne Nation Read more…

By Oliver Peckham

Scientists to Tap Exascale Computing to Unlock the Mystery of our Accelerating Universe

August 14, 2019

The universe and everything in it roared to life with the Big Bang approximately 13.8 billion years ago. It has continued expanding ever since. While we have a Read more…

By Rob Johnson

AI is the Next Exascale – Rick Stevens on What that Means and Why It’s Important

August 13, 2019

Twelve years ago the Department of Energy (DOE) was just beginning to explore what an exascale computing program might look like and what it might accomplish. Today, DOE is repeating that process for AI, once again starting with science community town halls to gather input and stimulate conversation. The town hall program... Read more…

By Tiffany Trader and John Russell

Cray Wins NNSA-Livermore ‘El Capitan’ Exascale Contract

August 13, 2019

Cray has won the bid to build the first exascale supercomputer for the National Nuclear Security Administration (NNSA) and Lawrence Livermore National Laborator Read more…

By Tiffany Trader

AMD Launches Epyc Rome, First 7nm CPU

August 8, 2019

From a gala event at the Palace of Fine Arts in San Francisco yesterday (Aug. 7), AMD launched its second-generation Epyc Rome x86 chips, based on its 7nm proce Read more…

By Tiffany Trader

Lenovo Drives Single-Socket Servers with AMD Epyc Rome CPUs

August 7, 2019

No summer doldrums here. As part of the AMD Epyc Rome launch event in San Francisco today, Lenovo announced two new single-socket servers, the ThinkSystem SR635 Read more…

By Doug Black

Building Diversity and Broader Engagement in the HPC Community

August 7, 2019

Increasing diversity and inclusion in HPC is a community-building effort. Representation of both issues and individuals matters - the more people see HPC in a w Read more…

By AJ Lauer

Xilinx vs. Intel: FPGA Market Leaders Launch Server Accelerator Cards

August 6, 2019

The two FPGA market leaders, Intel and Xilinx, both announced new accelerator cards this week designed to handle specialized, compute-intensive workloads and un Read more…

By Doug Black

Upcoming NSF Cyberinfrastructure Projects to Support ‘Long-Tail’ Users, AI and Big Data

August 5, 2019

The National Science Foundation is well positioned to support national priorities, as new NSF-funded HPC systems to come online in the upcoming year promise to Read more…

By Ken Chiacchia, Pittsburgh Supercomputing Center/XSEDE

High Performance (Potato) Chips

May 5, 2006

In this article, we focus on how Procter & Gamble is using high performance computing to create some common, everyday supermarket products. Tom Lange, a 27-year veteran of the company, tells us how P&G models products, processes and production systems for the betterment of consumer package goods. Read more…

By Michael Feldman

Supercomputer-Powered AI Tackles a Key Fusion Energy Challenge

August 7, 2019

Fusion energy is the Holy Grail of the energy world: low-radioactivity, low-waste, zero-carbon, high-output nuclear power that can run on hydrogen or lithium. T Read more…

By Oliver Peckham

Cray, AMD to Extend DOE’s Exascale Frontier

May 7, 2019

Cray and AMD are coming back to Oak Ridge National Laboratory to partner on the world’s largest and most expensive supercomputer. The Department of Energy’s Read more…

By Tiffany Trader

Graphene Surprises Again, This Time for Quantum Computing

May 8, 2019

Graphene is fascinating stuff with promise for use in a seeming endless number of applications. This month researchers from the University of Vienna and Institu Read more…

By John Russell

AMD Verifies Its Largest 7nm Chip Design in Ten Hours

June 5, 2019

AMD announced last week that its engineers had successfully executed the first physical verification of its largest 7nm chip design – in just ten hours. The AMD Radeon Instinct Vega20 – which boasts 13.2 billion transistors – was tested using a TSMC-certified Calibre nmDRC software platform from Mentor. Read more…

By Oliver Peckham

TSMC and Samsung Moving to 5nm; Whither Moore’s Law?

June 12, 2019

With reports that Taiwan Semiconductor Manufacturing Co. (TMSC) and Samsung are moving quickly to 5nm manufacturing, it’s a good time to again ponder whither goes the venerable Moore’s law. Shrinking feature size has of course been the primary hallmark of achieving Moore’s law... Read more…

By John Russell

Deep Learning Competitors Stalk Nvidia

May 14, 2019

There is no shortage of processing architectures emerging to accelerate deep learning workloads, with two more options emerging this week to challenge GPU leader Nvidia. First, Intel researchers claimed a new deep learning record for image classification on the ResNet-50 convolutional neural network. Separately, Israeli AI chip startup Hailo.ai... Read more…

By George Leopold

Cray Wins NNSA-Livermore ‘El Capitan’ Exascale Contract

August 13, 2019

Cray has won the bid to build the first exascale supercomputer for the National Nuclear Security Administration (NNSA) and Lawrence Livermore National Laborator Read more…

By Tiffany Trader

Leading Solution Providers

ISC 2019 Virtual Booth Video Tour

CRAY
CRAY
DDN
DDN
DELL EMC
DELL EMC
GOOGLE
GOOGLE
ONE STOP SYSTEMS
ONE STOP SYSTEMS
PANASAS
PANASAS
VERNE GLOBAL
VERNE GLOBAL

Nvidia Embraces Arm, Declares Intent to Accelerate All CPU Architectures

June 17, 2019

As the Top500 list was being announced at ISC in Frankfurt today with an upgraded petascale Arm supercomputer in the top third of the list, Nvidia announced its Read more…

By Tiffany Trader

Top500 Purely Petaflops; US Maintains Performance Lead

June 17, 2019

With the kick-off of the International Supercomputing Conference (ISC) in Frankfurt this morning, the 53rd Top500 list made its debut, and this one's for petafl Read more…

By Tiffany Trader

A Behind-the-Scenes Look at the Hardware That Powered the Black Hole Image

June 24, 2019

Two months ago, the first-ever image of a black hole took the internet by storm. A team of scientists took years to produce and verify the striking image – an Read more…

By Oliver Peckham

AMD Launches Epyc Rome, First 7nm CPU

August 8, 2019

From a gala event at the Palace of Fine Arts in San Francisco yesterday (Aug. 7), AMD launched its second-generation Epyc Rome x86 chips, based on its 7nm proce Read more…

By Tiffany Trader

Cray – and the Cray Brand – to Be Positioned at Tip of HPE’s HPC Spear

May 22, 2019

More so than with most acquisitions of this kind, HPE’s purchase of Cray for $1.3 billion, announced last week, seems to have elements of that overused, often Read more…

By Doug Black and Tiffany Trader

Chinese Company Sugon Placed on US ‘Entity List’ After Strong Showing at International Supercomputing Conference

June 26, 2019

After more than a decade of advancing its supercomputing prowess, operating the world’s most powerful supercomputer from June 2013 to June 2018, China is keep Read more…

By Tiffany Trader

In Wake of Nvidia-Mellanox: Xilinx to Acquire Solarflare

April 25, 2019

With echoes of Nvidia’s recent acquisition of Mellanox, FPGA maker Xilinx has announced a definitive agreement to acquire Solarflare Communications, provider Read more…

By Doug Black

Qualcomm Invests in RISC-V Startup SiFive

June 7, 2019

Investors are zeroing in on the open standard RISC-V instruction set architecture and the processor intellectual property being developed by a batch of high-flying chip startups. Last fall, Esperanto Technologies announced a $58 million funding round. Read more…

By George Leopold

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This