Until a few years ago, bringing a new network or a new programming model onto the scene required an enormous amount of repetitious development that effectively shut out innovation opportunities. Then along came the UCF (Unified Communication Framework) Consortium, an active collaboration between industry, laboratories, and academia. As an umbrella consortium, UCF is a strong example of the collaboration and co-design approach: it includes multiple users of high performance computing (HPC) platforms alongside multiple vendors. Los Alamos National Laboratory (LANL) chairs the consortium, whose active participants include AMD, Argonne, ARM, IBM, Mellanox, NVIDIA, Ohio State University and others, together with additional users, vendors, and the U.S. Government (USG). Much of the original funding for the OpenUCX communication framework was provided by the USG. Open standards are very much in the USG's interest: they create a more open and rich development environment without disadvantaging any particular vendor, and a rich environment to select from is always best, especially for the taxpayer.
Humble Beginnings
According to Jeff Kuehn of LANL, the idea for the UCF-style consortium, and for its eventual project OpenUCX, an open-source framework, was conceived on a car ride from Los Alamos up to Colorado Springs and Denver. “Steve Poole and Rich Graham (both at Oak Ridge at the time, working with Mellanox and Gilad Shainer) were discussing stacked architectures and recognized the problem and the need for a middleware layer. We realized we keep recreating this middle space between the programming language and the network hardware.” As fate would have it, the four (Steve, Rich, Jeff and Pavel Shamis) began to tackle the problem, drawing on Rich and Pavel’s experience developing middleware layers, insights provided by Steve, and Jeff’s application and benchmarking experience. The project was originally named UCCS, the Unified Common Communication Substrate, and was renamed UCX once participation was opened to other collaborators. While all four have since moved on from Oak Ridge, they continue to collaborate.
Because common practice was to build the abstraction into each software stack and the services it provided, Jeff explained, any vendor inventing a new network was forced to build an entire MPI stack on top of it; likewise, a developer setting out to write a new programming library, whether an MPI or a SHMEM, had to code the entire stack all the way down to the networking layer. “This amounted to excessive duplicated effort. Worse, it created a barrier to entry and shut out innovation opportunities.”
The developers all hoped to redraw the boundary between the two and, in doing so, to lower the overall barrier to entry.
Pavel Shamis from ARM spoke about the requirements for an efficient, easy-to-use networking layer that was portable in terms of both network and CPU architectures. “We searched but found nothing on the market we could easily adopt for our programming models without making a substantial investment in new networking layers. That’s when we decided to develop our own layer – UCX.”
Jeff added that “by creating a ‘clever cut’ between the network innovation space and the programming model innovation space, network vendors are required to provide only up to that layer, and every programming model that uses UCX immediately works on that network. The converse is also true for developers of new programming models: if one develops the programming model to use UCX, it will work across all of the networks supporting UCX. Essentially, this clever cut, this delineation between the programming model and the network abstraction, opens the innovation space for both networks and programming models, and permits both network vendors and programming model developers to focus on their areas of expertise.”
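To make that “clever cut” concrete, here is a minimal sketch of what the upper side of the boundary, UCX’s high-level UCP API, looks like to a programming-model developer. It uses real UCP entry points (ucp_config_read, ucp_init, ucp_worker_create), but the feature selection and error handling are deliberately abbreviated; treat it as an illustration rather than a complete program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <ucp/api/ucp.h>

/* Minimal sketch: initialize UCX's high-level UCP layer. A programming
 * model (MPI, OpenSHMEM, ...) codes to this API once; UCX maps it onto
 * whatever network transports are available underneath. */
int main(void)
{
    ucp_config_t        *config;
    ucp_params_t         params;
    ucp_context_h        context;
    ucp_worker_params_t  worker_params;
    ucp_worker_h         worker;

    /* Read transport/tuning configuration (UCX_* environment variables). */
    if (ucp_config_read(NULL, NULL, &config) != UCS_OK) {
        return EXIT_FAILURE;
    }

    /* Request only the features this programming model needs; here, tag
     * matching, the style of semantics MPI builds on. */
    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG;

    if (ucp_init(&params, config, &context) != UCS_OK) {
        ucp_config_release(config);
        return EXIT_FAILURE;
    }
    ucp_config_release(config);

    /* A worker is a communication progress engine / thread context. */
    worker_params.field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
    worker_params.thread_mode = UCS_THREAD_MODE_SINGLE;

    if (ucp_worker_create(context, &worker_params, &worker) != UCS_OK) {
        ucp_cleanup(context);
        return EXIT_FAILURE;
    }

    printf("UCP initialized over whatever transports UCX found\n");

    /* Endpoints to peers (ucp_ep_create) and sends/receives would follow. */
    ucp_worker_destroy(worker);
    ucp_cleanup(context);
    return EXIT_SUCCESS;
}
```

Everything below this API, transport selection, wire protocols, and device specifics, sits on the network vendor’s side of the cut.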
One year into the project, the team decided the effort was too great for a small research team alone, so they sought industry partnerships, reaching out to IBM, Mellanox and NVIDIA, among a network of technology leaders.
A Critical Stepping Stone to Exascale
According to Jeff, one of the things the team discovered through Pavel’s early work on UCCS was how long it takes to get bits onto the wire from the moment one says, ‘I want to send a message.’ During the initial pass, the team was able to drive that figure down to sub-70 nanoseconds. “It’s very difficult to get more efficient than that. That’s very bare-bones in terms of a very lightweight API getting bits on the wire.”
The economic impact involved in rewriting an application for a new machine is daunting. In fact, when the LANL networking team originally looked at InfiniBand as a potential replacement for HIPPI and GSN many years ago, the projections were that the overall cost from concept to commercially viable product would be between $500M and $1B. Vendors and customers alike usually get caught up in a continuous process of rewriting, redeveloping, and improving codes, and the cycle time on that is very long. In fact, the cycle time to develop, deploy, validate, and use new application code is significantly longer than the lifespan of any one machine; this is particularly true for the extremely complex physics simulations at LANL, where it can take a decade of development and validation to field a new simulation code. When targeting code to a specific machine, then, it is critical to maintain code portability: if survival is a goal, a code cannot target only one machine or system. UCX and the UCF Consortium have made this a requirement.
Pavel points out that “developers writing programs for such machines see in UCX an abstraction that is easier to program to, yet still delivers very high performance. Through its low-level network abstraction, UCX provides insulation from low-level changes while still providing high performance. Since MPI and OpenSHMEM are built on top, application developers don’t even have to know it’s there; all they know is that it runs fast.”
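That transparency is easy to illustrate. The toy MPI program below contains no UCX calls at all; if the MPI library underneath was built with UCX support (for example, Open MPI’s ucx PML component), the identical source runs over whichever UCX transport is available. This is an illustrative sketch, not code from the project.

```c
#include <mpi.h>
#include <stdio.h>

/* Plain MPI: nothing here mentions UCX. If the MPI library underneath
 * was built on UCX, this exchange rides on UCX transports unchanged. */
int main(int argc, char **argv)
{
    int rank, peer, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    peer = (rank == 0) ? 1 : 0;
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

With Open MPI, for instance, the UCX-backed path can be requested explicitly with mpirun --mca pml ucx -np 2 ./a.out; the application source does not change.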
Market Adoption
The UCX project is very active, thanks to contributions from developers in the open source GitHub community as well as a growing number of vendor partners. The UCF is seeing adoption in both the MPI and OpenSHMEM spaces, which in turn serve as the basis for many other programming models, and among leading technology vendors who recognize the strengths of UCX as a communications model.
Jeff explains that UCX was designed from the beginning to support high performance platforms, making it ideal for Exascale and distributed machine learning and deep learning, as well as for enterprise networking where scalable performance matters. “We’re excited by how the growing adoption of UCX is driving systems on the path to Exascale. We anticipate that UCX will be used in large-scale infrastructures being built now and in the future.”
The project offers strong integration with several interconnect accelerators, including server-class GPU platforms and GPUDirect, providing GPU-to-GPU communication across multiple nodes as well as within a single node. UCX also provides GPU support across multiple vendors and multiple architectures, e.g., IBM Power, and more.
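As a rough illustration of what that integration means in practice, the hedged sketch below passes a device buffer allocated with cudaMalloc directly to a UCP tagged send. When UCX is built with CUDA support it can detect the memory type and, where the hardware allows, move the data over a GPUDirect path. The endpoint ep and worker are assumed to have been created as in the earlier initialization sketch.

```c
#include <cuda_runtime.h>
#include <ucp/api/ucp.h>

/* Trivial completion callback for the non-blocking send. */
static void send_cb(void *request, ucs_status_t status)
{
    (void)request; (void)status; /* a real app would record completion */
}

/* Sketch: send a GPU-resident buffer through UCP. Assumes 'ep' and
 * 'worker' exist and that UCX was built with CUDA support. */
void send_gpu_buffer(ucp_ep_h ep, ucp_worker_h worker, size_t len)
{
    void             *gpu_buf;
    ucs_status_ptr_t  req;

    cudaMalloc(&gpu_buf, len);       /* device memory, not host memory */

    /* UCX can detect that gpu_buf is CUDA memory and pick a GPU-aware
     * path (e.g. GPUDirect RDMA) when the transport supports it. */
    req = ucp_tag_send_nb(ep, gpu_buf, len, ucp_dt_make_contig(1),
                          0xC0FFEE /* tag */, send_cb);

    if (UCS_PTR_IS_PTR(req)) {
        /* Drive progress until the request completes, then release it. */
        while (ucp_request_check_status(req) == UCS_INPROGRESS) {
            ucp_worker_progress(worker);
        }
        ucp_request_free(req);
    }

    cudaFree(gpu_buf);
}
```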
UCX Development
The UCF has recently released version 1.3.1 of UCX, with support for multiple architectures: it runs on x86, Power, ARM and others, and supports multiple interconnect technologies, including InfiniBand, Ethernet and other protocols, several shared-memory communication types, and proprietary interconnects.
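Because all of these transports sit behind a single API, choosing among them is configuration rather than code. UCX reads UCX_-prefixed environment variables (UCX_TLS, for example, restricts the transport list), and the same knobs can be set programmatically. The sketch below uses the real ucp_config_modify call; the transport names are only examples and depend on the installed hardware.

```c
#include <ucp/api/ucp.h>

/* Sketch: restrict UCX to specific transports before calling ucp_init().
 * Equivalent to setting UCX_TLS=rc,sm,self in the environment. */
ucs_status_t configure_transports(ucp_config_t **config_p)
{
    ucs_status_t status = ucp_config_read(NULL, NULL, config_p);
    if (status != UCS_OK) {
        return status;
    }
    /* "rc" = InfiniBand reliable-connected, "sm" = shared memory,
     * "self" = loopback; example values only. */
    return ucp_config_modify(*config_p, "TLS", "rc,sm,self");
}
```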
Along with UCX’s strong integration with the HPC community, including OpenSHMEM implementations built on UCX and multiple MPI implementations such as Open MPI, the project enjoys growing popularity in the open source community.
Future Impact
The UCF predicts that UCX will simplify the adoption of future architectures by adding features that help developers and programming models manage the increasingly complex hierarchies being built into next-generation designs. A machine with main memory plus NVMe, GPUs or accelerators with attached memory, or a varying number of cores within the CPU itself will be able to benefit directly from UCX to help manage that hierarchy. The UCF points out that these are exactly the kinds of capabilities its vendor partners are interested in and need for greater adoption of their platforms: they would like to use UCX to help abstract away some of the complexity of a hierarchical architecture.
Next Steps
UCF’s next steps will involve creating specifications for UCX, with the goal of making life easier for vendors and for the people who write the programs and applications. For the next release of UCX, the UCF plans to provide many new features, including support for a Java API, in order to grow beyond classical HPC systems and software and open up to the big data, analytics, and IoT communities, among others. In addition to the existing support for InfiniBand, GPUs and other high performance hardware, the UCF will add support for regular TCP layers that can be exploited in today’s enterprises; this means the same networking stack can be used for enterprise-class software. NVMe storage and KVS (Key Value Store) are other targeted markets, though these remain at the experimental stage. UCF is also planning to add more projects, specifically around deep learning frameworks, and many others.
User Community Call to Action
The UCF is calling on users and vendors in the HPC arena to join its efforts to help drive the specifications and to participate in new projects. UCF has asked the community to test and evaluate the layer in their programming environments and to provide feedback on feature additions and other improvements to the API.
For more information, visit openucx.org or the project’s GitHub community.