MPI Is 25 Years Old!

By Ewing Lusk and Jesper Larsson Träff

May 1, 2017

Has it really been 25 years since the Message Passing Interface standard was born? It has indeed, and at this year’s EuroMPI meeting in September in Chicago, a “birthday” symposium will be held to celebrate the occasion. Speakers from the remote past of MPI, the middle years, and the current time will touch on the ideas that have given MPI its long life and will highlight the impact the standard has had on multiple aspects of parallel computing, from applications to libraries to its multiple implementations.

The concept of a standard for message passing emerged over time. While assorted systems, both commercial and free, competed for “mind share” and commercial success, a small meeting of researchers took place in 1991 at a conference in Oberlech, Austria. There Jack Dongarra, Rolf Hempel, Tony Hey, and David Walker drafted a white paper outlining a proposal for what a standard might look like, borrowing heavily from Marc Snir’s work at IBM. Jack Dongarra, Professor of Computer Science at the University of Tennessee, recalls, “Each of the existing systems had merit, but none had everything needed to move application development forward. We decided to instigate a community effort to address the problem.” It seems reasonable to affix the label “Birth of MPI” to the resulting workshop entitled “Standards for Message Passing in a Distributed Memory Environment” organized by Jack Dongarra and David Walker with funding from the Ken Kennedy Center for Research in Parallel Computation at Rice University in April 1992. That was the first time a wide variety of interested stakeholders gathered in an open meeting dedicated to the topic of a standard for message passing, forecasting the openness of the process that would follow. The result of that workshop, which featured presentations on multiple vendor-specific and portable systems, was a realization that a great diversity of good ideas existed among then-current message-passing libraries but that the lack of a standard was impeding the progress of parallel computing.

At the Supercomputing ’92 conference in November, a committee was formed to define a message-passing standard. At the outset, no one knew what the outcome might look like, but the effort was begun with the following objectives: (1) to define a portable standard for message passing, which would not be an official, ANSI-like standard but would attract both implementers and users; (2) to operate in a completely open way, allowing anyone to join the discussions, either by attending meetings in person or by monitoring open email discussions; and (3) to be finished in one year.

The MPI effort was a lively one, as a result of the tensions among these three objectives. The committee decided to follow the format used by the High-Performance Fortran Forum, whose procedures had been well received by its community. (It even decided to meet in the same hotel in North Dallas.) An early decision of the MPI Forum was not to adopt any existing system or proposal as a starting point but to start from scratch, with the explicit goals of portability, expressiveness, and performance capability. “Ease of use” was not a primary goal; the idea was that libraries, compilers, and other software layers would provide this aspect of parallel programming, and that applications would rely on their implementations over MPI to provide convenience of programming.

More formal meetings began in January 1993 under the name “MPI Forum,” an extension of the SC ’92 committee, and continued until the following February. Over that time, more than 60 people from 40 organizations participated, although attendance at most meetings was about 30. The procedures for submitting proposals and voting were adopted from those of the HPF Forum, which had worked well. One reason the MPI standardization effort succeeded was that the MPI Forum itself was so broadly based. At the original (MPI-1) Forum the parallel computer vendors were represented by Convex, Cray, IBM, Intel, Meiko, nCUBE, NEC, and Thinking Machines. Members of the groups associated with portable software libraries were also there: PVM, p4, Zipcode, Chameleon, PARMACS, TCGMSG, and Express were all represented, as well as some application groups. One subgroup committed to providing a test implementation of each iteration of the standard as it evolved from meeting to meeting; this proved valuable in uncovering the implementation consequences of API decisions, as well as ensuring that when the standard definition was completed, a prototype implementation was immediately available. Marc Snir, Professor of Computer Science at the University of Illinois and an original Forum member representing IBM, has said, “The MPI Forum was an outstanding example of many companies, research labs, and individuals working together to achieve a common good.”

The first version of the MPI standard was published in May 1994. It included standard versions of many well-known message-passing operations such as blocking and nonblocking sends and receives, together with collective operations such as broadcast, reduce, and scan. It broke new ground with its concept of communicators (essential for the modularity of MPI-based libraries), datatypes (to deal efficiently with structured and noncontiguous messages), and process topologies (ignored by many in those days but becoming more significant on today’s machines). Its inclusion of both Fortran and C bindings (with identical semantics) signaled its desire to be immediately useful to both libraries and end-user scientific applications.

MPI also took an innovative approach to the problem of tools for debugging and performance analysis. Rather than designing such a tool into the standard specification itself, MPI provided a mechanism, its “profiling interface,” by which anyone could write a library that intercepted a subset of MPI calls in order to count, measure, or display them in some way, before (and after) passing them to the underlying MPI implementation for actual execution. As expected, this has spawned a wide collection of tools that are completely portable, since the profiling interface is part of the standard rather than the tool itself.
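The interception mechanism can be modeled in a few lines. In real MPI, every routine is also callable under a PMPI_ prefix, so a tool can supply its own MPI_Send that does its bookkeeping and then calls PMPI_Send. The sketch below is only a toy in plain Python that uses no MPI library at all; `pmpi_send`, `make_profiled_send`, and the `stats` counter are hypothetical names chosen for illustration.

```python
from collections import Counter


def pmpi_send(buffer, dest, tag):
    """Hypothetical stand-in for the implementation's real send routine."""
    return len(buffer)  # pretend every item was transmitted


def make_profiled_send(real_send, stats):
    """Build the wrapper that a tool would export under the public name."""
    def mpi_send(buffer, dest, tag):
        stats["send calls"] += 1             # count the call
        stats["items sent"] += len(buffer)   # measure the traffic
        return real_send(buffer, dest, tag)  # hand off for actual execution
    return mpi_send


stats = Counter()
mpi_send = make_profiled_send(pmpi_send, stats)

# The application is unchanged: it still calls the public name.
mpi_send([1.0, 2.0, 3.0], dest=1, tag=0)
mpi_send([4.0], dest=2, tag=0)
print(dict(stats))
```

The point of the design is that the application never learns it is being observed: the tool owns the public name, and the implementation stays reachable under the shifted one.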

During the 1993-1994 meetings of the MPI Forum, several issues were postponed in order to reach early agreement on a core of message-passing functionality, which nonetheless included several innovative concepts, such as communicators, datatypes, and topologies. The Forum reconvened during 1995-1997 to extend MPI to include remote memory operations, parallel I/O, and dynamic process management, along with a number of features designed to increase the convenience and robustness of MPI. This effort resulted in the MPI-2 standard, released in 1997. MPI-2 had three major new feature sets: an extensive interface to efficiently support parallel file I/O to and from MPI programs; support for one-sided (put/get) communication; and dynamic process management, namely, the ability to create additional processes from a running MPI program and the ability for separately started MPI applications to connect to each other and communicate. MPI-2 also introduced other features, such as precisely defined semantics for multithreaded communication that in some ways foreshadowed the multiple modes of OpenMP parallelism, bindings for Fortran 90 and C++, and detailed support for mixed-language programming (how to send a message from Fortran and have it received in C, for example).

While the MPI-2 standard was finished in 1997, it took a few years for full implementations to appear. In contrast to the MPI-1 effort, there was no hand-in-hand prototype developed for most of the additions of MPI-2, and in retrospect, some of the useful feedback on the standardization process from a co-developed prototype was missing. Nevertheless, over the next decade and a half, MPI filled the needs of most computational science codes that required a high-performance, scalable, portable programming system. The Forum itself disbanded.

The timing of MPI seems to have been about right. Trying to establish such a standard earlier might have failed to benefit from research into multiple approaches. Indeed, some feared that adoption of a standard would shut down research into the message-passing model. In fact, the opposite happened. Having a fairly complete, performance-enabling, portable interface target stimulated a wealth of research into implementation approaches, tool development, and application algorithms. Much of the research appeared in the Proceedings of the Euro-* conferences, underlining the international nature of MPI-based research. These workshops started as PVM (Parallel Virtual Machine) user group meetings, became EuroPVM workshops from 1994 to 1996, EuroPVM/MPI from 1997 to 2009, and EuroMPI from 2010 to 2017. It is telling and amusing that “Euro”MPI 2017 will be held in Chicago this year.

Over the next decade or so, the MPI Forum itself was inactive, the published standard remained unchanged, and MPI was a stable interface for users and implementers alike. Vendors used the open-source prototype implementations (MPICH, and later Open MPI), layered to allow optimizations at multiple levels, to evolve their proprietary implementations over time in order to gradually take advantage of their own evolving specialized hardware.

This was no mean feat. As Bill Gropp, Acting Director and Chief Scientist at the National Center for Supercomputing Applications, says, “One of the hardest things about an MPI implementation is keeping the implementation focused on the future. This requires finding a balance between making engineering decisions based on today’s hardware and designing and implementing for likely directions in the future.” Many message-passing applications, written in customized ways to deal with the portability problem, switched to making direct MPI calls, improving efficiency and maintainability. And library development was unleashed, fulfilling one of MPI’s original goals. Barry Smith, Senior Computer Scientist at Argonne National Laboratory and primary developer of the PETSc library, explains MPI’s contribution to library development as follows: “MPI changed everything, by providing an extensive API for message passing and collectives that allowed portable distributed memory scientific libraries to no longer need to be programmed to the lowest common denominator of message passing systems. Equally important, MPI eliminated the problem of ‘tag collision’ where each library might utilize the same tags for messages, resulting in messages sent from one library being (improperly) received and processed by a different library or the application code. The MPI communicator concept made distributed parallel scientific libraries practical in two ways, it eliminated the tag collision problem and (by the use of subcommunicators) allowed applications to simply utilize scientific libraries to perform needed computations on subsets of processes, for example with ‘divide and conquer’ algorithms.”
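The tag collision problem Smith describes is easy to reproduce in miniature. The following is a toy model in plain Python, not MPI code: `ToyMailbox`, its methods, and the string context labels are all hypothetical, and a real communicator carries more than a context label. It shows only the matching rule: when receives match on tag alone, the application can pick up a library's message, and when the match also requires a context, as a communicator does, each side sees only its own traffic.

```python
class ToyMailbox:
    """Incoming messages for one process, matched in arrival order."""

    def __init__(self):
        self.pending = []

    def deliver(self, context, tag, payload):
        self.pending.append((context, tag, payload))

    def _take(self, matches):
        # Remove and return the payload of the first matching message.
        for index, message in enumerate(self.pending):
            if matches(message):
                return self.pending.pop(index)[2]
        raise LookupError("no matching message")

    def receive_by_tag(self, tag):
        """Pre-communicator style: the tag is the only selector."""
        return self._take(lambda m: m[1] == tag)

    def receive(self, context, tag):
        """Communicator style: context and tag must both match."""
        return self._take(lambda m: m[0] == context and m[1] == tag)


def filled_mailbox():
    """Library and application both happen to use tag 0."""
    box = ToyMailbox()
    box.deliver("library", 0, "library halo data")
    box.deliver("application", 0, "application result")
    return box


# Tag-only matching: the application asks for tag 0 and gets the wrong message.
print(filled_mailbox().receive_by_tag(0))

# Context-aware matching: each caller gets its own message despite equal tags.
box = filled_mailbox()
print(box.receive("application", 0))
print(box.receive("library", 0))
```

In this model the library author never has to coordinate tag values with anyone, which is the practical benefit the quote points to.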

For more than a decade after the Forum disbanded in 1997, the MPI specification remained stable, providing a period during which MPI could “sink in” while implementations steadily improved, parallel libraries flourished, and applications, now portable, took advantage of multiple new tera- and petascale machines, challenging those implementations and libraries to become ever more scalable. However, HPC moves fast, and after a dozen years multiple trends had gradually increased community pressure to restart the MPI process, whose inclusiveness and openness had served the community so well in the past.

For one thing, the scale of massively parallel systems had reached more than a million cores. Single-core processors had disappeared, nodes had become symmetric multiprocessors, and defining how a distributed-memory model like MPI’s would interact with threads (specifically, the emerging OpenMP standard) and shared memory became more critical. Remote memory access (put/get) support in networks became mainstream, raising the applicability of efficient remote memory access (RMA) as a programming model. Although MPI-2’s RMA was used by some applications, it had failed to live up to expectations and needed an overhaul. C and Fortran had both evolved, requiring updates to the MPI interfaces. Nonblocking collective operations had been proposed, and some experience with them obtained. At the time of MPI-2, nonblocking collectives had been considered but deliberately left out of the standard because of the expectation that they could be implemented on top of MPI by issuing blocking operations in separate threads. However, threads turned out to be more difficult to use efficiently, and support for threads was uneven. The increase in scale had brought fault tolerance issues to the fore. And finally, a list of (mostly) minor errata had accumulated.

In response to all this, the MPI Forum reconstituted itself in 2008, at first tidying up MPI-2 and eventually releasing the initial version of MPI-3 in September 2012. Major new features of MPI-3 include the nonblocking collective operations, together with “neighborhood” collectives, useful for stencil computations and relying on the topology functions from MPI-1. (The concept of a nonblocking barrier was considered a joke during the MPI-1 meetings; now MPI has one!) There is an improved one-sided communication interface as well as a tools interface that goes beyond MPI-1’s profiling interface to dynamically access the behavior of an MPI implementation. The Fortran bindings have been updated to take advantage of the Fortran 2008 standard, which was a major step forward in making Fortran work well with libraries in a parallel environment. C bindings were modernized to catch more errors at compile time. Other new features improved interactions with threads and shared memory.

Some topics that the MPI-3 Forum grappled with have not (yet) become part of MPI, such as fault tolerance and more complex support for multithreaded programming, because the Forum decided that current proposals were not quite ready for standardization. The Forum continues to work on these and other issues. Martin Schulz, Computer Scientist at Lawrence Livermore National Laboratory and current chairperson of the MPI-3 Forum, says, “As MPI has established itself as the dominant standard in HPC, it has been exciting and rewarding to see that the members of the MPI forum have not been resting on their laurels. Instead, the Forum continues to drive innovation balanced with the pragmatism necessary for a standards document as we race towards exascale as well as to embrace new commercial application fields and their different requirements.”

Many of the participants in this decades-long effort will speak at the “25 Years of MPI” symposium during the EuroMPI Workshop to be held at Argonne National Laboratory near Chicago on September 25-27, 2017.

About the Authors

Ewing “Rusty” Lusk is Argonne Distinguished Fellow Emeritus at Argonne National Laboratory.

Prof. Jesper Larsson Träff is on the Faculty of Informatics at the Vienna University of Technology.
