Helping developers increase ML, AI, and HPC performance and code portability
As organizations increasingly adopt open-source and open-standard solutions, their developers need to write code once and run it on multiple architectures. Many organizations use GPU-accelerated applications and infrastructure that lock them into a single accelerator hardware vendor. Developers must keep up with ports and changing workloads that call for different devices, and they need to move code between devices and architectures so workloads can run across a variety of hardware environments without buying new hardware for a particular codebase. AMD developed a porting solution that allows developers to port code based on NVIDIA’s CUDA® API to run on AMD GPUs.
Introducing AMD ROCm™ Platform and HIPify Tools
The AMD ROCm™ open software platform provides tools to port CUDA-based code to AMD’s native, open-source Heterogeneous-compute Interface for Portability (HIP), which runs on AMD Instinct™ accelerators, including the latest MI200 series products. Developers can write their GPU applications and, with very minimal changes, run the same code in both NVIDIA and AMD environments.
When using ROCm, developers can run their software on the GPU accelerator, CPU, and server platform of their choice. ROCm emphasizes a minimal, modular approach to software development for GPU computing. The ROCm software stack, represented by the black circle in Figure 1, enables a broad ecosystem of tools, libraries, and applications.
The AMD ROCm ecosystem includes support for the most popular open machine learning (ML) and high-performance computing (HPC) frameworks, including PyTorch, TensorFlow, ONNX-RT, RAJA, and a variety of others. ROCm works with a broad range of supporting libraries (BLAS, FFT, RNG, SPARSE, THRUST, MIOpen, and RCCL) as well as third-party libraries. ROCm provides upstreamed Linux® kernel support for major Linux distributions, along with support for popular deployment and management tools, scale-out computing, and toolchain programming models.
Benefits of using AMD HIP to convert code
- HIP uses C++ which is familiar to many developers
- HIP C++ code can be compiled using either AMD `hipcc` or CUDA® `nvcc`
- HIP conversion provides customers with more choice in hardware and development tools
- Saves developer time when moving between architectures, making it easier to try different solutions on various GPUs
- Aids in the development of compute-intensive applications
Porting CUDA to HIP steps
Porting begins by running the `hipexamine-perl.sh` shell script, which scans CUDA-based files (.cu) and reports which CUDA code can be converted to HIP. The HIPIFY tools then perform the translation, producing HIP files (.cpp); most CUDA API calls are converted one-for-one to HIP API calls automatically. Compile the converted application with `hipcc` and run it to verify the port.
Things to be aware of that may cause issues
Be aware of components (such as Makefile and build-system changes) that may require manual intervention after the conversion. Inline PTX assembly, CUDA intrinsics, and inlined functions often have hard-coded dependencies on warp size (32 threads on NVIDIA GPUs versus a 64-wide wavefront on AMD CDNA GPUs) and may need to be modified. A complete example of running a CNN HIP conversion is available from AMD.
TempoQuest Ports AceCAST WRF CUDA-Based Code to AMD HIP
TempoQuest (TQI) developed AceCAST™, a GPU-accelerated version of the Weather Research and Forecasting (WRF) model implemented using a combination of CUDA and OpenACC. TQI partnered with AMD to support AceCAST on AMD Instinct™ MI200 series GPUs, using AMD’s HIP conversion tools to convert the OpenACC-Fortran and CUDA-C based code to HIP so it runs on AMD GPU platforms.
TQI’s development team indicates that converting the code with the HIP conversion tools was trivial, with only a few minor changes required for performance tuning and to accommodate some minor CUDA-to-HIP incompatibilities. According to Gene Pache, TQI Founder and Chief Executive Officer, “Being able to run the accelerated WRF, AceCAST on AMD GPUs will greatly help the compute shortfall. The major benefit of our AceCAST software is the acceleration of the forecast process afforded by running on AMD GPUs.”
ORNL Uses HIP Conversion for Research on Frontier Exascale Supercomputer
The AMD GPU-powered Frontier supercomputer, located at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL), became operational for pre-production testing in May 2022. Application developers on the Particle-in-Cell (PIConGPU) team used AMD HIP to port and optimize the PIConGPU code to run on AMD GPUs. Frontier is powered by the 2nd Generation AMD CDNA™ architecture and AMD Instinct MI200 series accelerators.
The CUDA-to-HIP conversion made the open-source code portable between GPUs. Sunita Chandrasekaran, leader of an international research team, states, “A simulation that took two months on the previous Summit system now takes less than two weeks on the AMD GPU-powered Frontier system while allowing the team to run several 10-million time-step simulations.”
For more information on the AMD CUDA-to-HIP conversion, see the following helpful resources:
- Learn more about our latest AMD Instinct™ accelerators
- ROCm Information Portal, a new one-stop portal for users and developers that posts the latest versions of ROCm along with API and support documentation
- AMD Infinity Hub gives you access to HPC applications and ML frameworks packaged as containers
- ROCm Application Catalog, which includes an up-to-date listing of ROCm-enabled applications
- AMD Accelerator Cloud offers remote access to test code and applications in the cloud, on the latest AMD Instinct™ accelerators and ROCm software.