MPI: Maturing, Evolving, and Becoming More Pervasive

By Salvatore Salamone

August 6, 2018

Have you kept pace with changes to the Message Passing Interface (MPI) specification? If you are like most people, probably not, even though your high-performance computing (HPC) workloads are very likely benefiting from the most recent updates and enhancements to MPI.

MPI was conceived and developed by academic and industry researchers in the early 1990s as portable message-passing middleware that runs on a wide variety of parallel computing architectures.

Today, MPI is widely considered the de facto parallel programming standard for the most demanding HPC environments. MPI over InfiniBand is used to accelerate workloads on the most powerful supercomputers, including the top three systems on the Top500 list and four of the top five. MPI also remains the dominant middleware for HPC systems and is pervasive across distributed AI/ML applications.

The latest updates and enhancements are being driven by the demands of new application areas such as AI, distributed machine learning (ML), and the increased use of GPUs in HPC environments.

Introducing the MVAPICH project

The MVAPICH project, led by the Network-Based Computing Laboratory (NBCL) at The Ohio State University, is developing MPI enhancements to meet the performance demands of these new application areas.

The MVAPICH2 software, based on the MPI 3.1 standard, delivers the best performance, scalability, and fault tolerance for high-end computing systems and servers using a wide range of interconnect technologies, including InfiniBand and RoCE. The MVAPICH2 software family is also ABI (application binary interface) compatible with other MPI libraries such as MPICH, Intel MPI, and Cray MPI.

The software is now used by more than 2,925 organizations in 86 countries to extract the potential of emerging networking technologies such as in-network computing. As of July, it had been downloaded more than 482,000 times from the project’s site, and many vendors, including Mellanox, also distribute it as part of their own software distributions.

The MVAPICH project is optimizing its implementation of MPI to keep pace with changing technology demands while remaining faithful to the MPI standard.

MVAPICH2 2.3 is currently the latest version. Its enhancements and new features include:

  • MPI-3.1 standards compliance
  • Single-copy intra-node communication using Linux-supported CMA (Cross Memory Attach)
  • Checkpoint/Restart using LLNL’s Scalable Checkpoint/Restart Library (SCR)
  • High-performance and scalable InfiniBand hardware multicast-based collectives
  • Enhanced shared-memory-aware and intra-node collectives
  • Support for Mellanox SHARP technology for optimized collectives
  • High-performance communication support for NVIDIA GPUs with IPC, collective, and non-contiguous datatype support
  • MPI_T support
  • An integrated hybrid UD-RC/XRC design and support for UD-only mode

A complete list of features and supported platforms is available on the project’s site.

The MVAPICH project’s most recently released libraries are designed to address compute demands and performance requirements of newer HPC workloads and environments.

The libraries deliver specific benefits for different applications or computing needs. They include:

  • MVAPICH2: This library offers support for InfiniBand, RoCE, Ethernet, and other interconnect technologies.
  • MVAPICH2-X: A library that includes advanced MPI features (exploiting the UMR, ODP, and CORE-Direct features of InfiniBand), OSU INAM for network analysis and monitoring, PGAS (OpenSHMEM, UPC, UPC++, and CAF), and MPI+PGAS programming models with a unified communication runtime.
  • MVAPICH2-GDR: This library delivers optimized MPI for clusters with NVIDIA GPUs. It is also designed to deliver high performance and scalability for emerging deep-learning applications.
  • MVAPICH2-Virt: This library offers high-performance and scalable MPI for hypervisor- and container-based HPC cloud applications.
  • MVAPICH2-EA: A library for energy-aware, high-performance MPI.

Selecting a technology partner

Implementing MPI to improve the performance of an HPC environment is rarely done alone. Organizations typically partner with technology companies that have expertise in the area.

Mellanox is a leader in the field and offers high-speed interconnect solutions that are based on open standards. The company works closely with the MVAPICH project to ensure that the project’s latest libraries take full advantage of the improvements and additional offload engines in Mellanox’s latest 100Gb/s EDR and 200Gb/s HDR InfiniBand. Additionally, Mellanox uses the project’s benchmarks to validate performance claims.

At the heart of the Mellanox HPC software offering is HPC-X™, a comprehensive software package that includes MPI, SHMEM, and UPC communications libraries. HPC-X also includes acceleration packages that improve both the performance and scalability of applications running on top of these libraries. These include MXM (Mellanox Messaging), which accelerates the underlying send/receive (or put/get) messages, and FCA (Fabric Collective Accelerator), which accelerates the underlying collective operations used by the MPI/PGAS languages.

Mellanox HPC-X takes full advantage of the Mellanox hardware-based acceleration engines to maximize the performance of MPI, SHMEM/PGAS, and UPC-based applications. These acceleration engines are part of the Mellanox adapter (CORE-Direct engine) and switch (Mellanox SHARP engine) solutions. Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology improves the performance of MPI operations by offloading collective operations from the CPU to the switch network, and by eliminating the need to send data multiple times between endpoints. Because data is combined at each aggregation node it reaches, less data traverses the network, which dramatically reduces the time MPI operations take.

Implementing collective communication algorithms in the network also has additional benefits, such as freeing up valuable CPU resources for computation rather than using them to process communication.
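The data-reduction argument can be made concrete with a toy model. The Python sketch below is an illustration only: it is not SHARP's protocol or any Mellanox or MVAPICH API, and the names `Node`, `build_tree`, `reduce_in_network`, and `reduce_at_root` are hypothetical. It assumes a full switch tree with hosts at the leaves, and counts how many times a host-sized message crosses a link when every switch aggregates its children's data versus when switches only forward data to the root.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A switch (has children) or a host (leaf holding one value)."""
    value: int = 0
    children: List["Node"] = field(default_factory=list)


def build_tree(radix: int, depth: int, values: List[int]) -> Node:
    """Build a full tree: `depth` levels of links, hosts at the leaves."""
    if len(values) != radix ** depth:
        raise ValueError("need exactly radix**depth host values")
    it = iter(values)

    def make(level: int) -> Node:
        if level == depth:
            return Node(value=next(it))          # host
        return Node(children=[make(level + 1) for _ in range(radix)])  # switch

    return make(0)


def reduce_in_network(node: Node) -> Tuple[int, int]:
    """Sum all host values; every switch aggregates before forwarding.

    Returns (total, link_transfers). Each link carries exactly one
    message upward, no matter how many hosts sit below it.
    """
    if not node.children:
        return node.value, 0
    total, transfers = 0, 0
    for child in node.children:
        child_total, child_transfers = reduce_in_network(child)
        total += child_total
        transfers += child_transfers + 1         # one message on child -> node
    return total, transfers


def reduce_at_root(node: Node, hops: int = 0) -> Tuple[int, int]:
    """Sum all host values; switches only forward, the root does the math.

    Returns (total, link_transfers). Each host's message crosses every
    link on its path to the root.
    """
    if not node.children:
        return node.value, hops
    total, transfers = 0, 0
    for child in node.children:
        child_total, child_transfers = reduce_at_root(child, hops + 1)
        total += child_total
        transfers += child_transfers
    return total, transfers


if __name__ == "__main__":
    tree = build_tree(radix=4, depth=3, values=list(range(64)))
    print("aggregated in network:", reduce_in_network(tree))
    print("forwarded to root:    ", reduce_at_root(tree))
```

Both functions produce the same sum, but in this model the forwarding variant moves each host's data across every level of the tree, while the aggregating variant sends one combined message per link. The gap widens as the tree gets deeper, which is the effect the paragraph above attributes to in-network aggregation.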

Mellanox HPC-X allows OEMs and system integrators to meet the needs of their end users by deploying the latest available software, which takes advantage of the features and capabilities in the most recent hardware and firmware.


New HPC application areas, including AI and ML, along with the growing use of GPUs to accelerate compute-intensive applications, require robust and feature-rich message passing.

The MVAPICH project is producing new MPI libraries to enhance the performance of HPC systems and speed the execution of the most demanding workloads.

Today, MPI is supported on virtually all HPC platforms. It is highly portable. There is little or no need to modify your source code when you port your application to a different platform that supports the MPI standard.

Moreover, vendor implementations, such as Mellanox HPC-X, can further exploit native hardware features to optimize performance.

For more information about the MVAPICH project, visit:

For more details about implementing MPI in today’s demanding HPC environments, visit:
