Research: A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

By Hartwig Anzt and Jack Dongarra

August 5, 2020

Editor’s Note: To take advantage of proliferating hardware advances intended to deliver powerful mixed-precision computing, DoE has started an effort to develop new algorithms that make the most of these capabilities. This new effort is a meaningful and timely pivot from traditional software optimization, say Jack Dongarra and Hartwig Anzt. As a first step, Dongarra, Anzt, and colleagues have surveyed the numerical linear algebra community and pulled their findings into a rich report. Dongarra and Anzt’s brief commentary, presented here, provides a glimpse into the report’s contents and, they hope, an enticement to dig deeper into the full report. Both are familiar figures in HPC. Brief bios are included at the end.

In recent years, hardware vendors have started designing low-precision special function units in response to the machine learning community’s demand for high compute power in low-precision formats. Server-line products increasingly feature such units as well: the Nvidia tensor cores in the Oak Ridge National Laboratory’s Summit supercomputer, for example, provide more than an order of magnitude higher performance than what is available in IEEE double precision.

At the same time, the gap between compute power and memory bandwidth keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. Faced with the choice between ignoring the hardware trends and continuing on the traditional path, or adjusting the software stack to the changing hardware designs, the Department of Energy’s Exascale Computing Project took the aggressive step of building a multiprecision focus effort. Its challenge is to design and engineer novel algorithms that exploit the compute power available in low precision and adjust the communication format to application-specific needs.

To start the multiprecision focus effort, we have surveyed the numerical linear algebra community and summarized all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered “mature technology” but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves. Instead, we focus on how mixed- and multiprecision technology can help to improve the performance of these methods, and we present highlights of applications that significantly outperform the traditional fixed-precision methods.

This report covers low-precision BLAS operations, the solution of linear systems, least squares problems, and eigenvalue computations using mixed precision. These are demonstrated with dense and sparse matrix computations and with direct and iterative methods. The ideas presented try to exploit low-precision computations for the bulk of the compute time and then use mathematical techniques to enhance the accuracy of the solution, bringing it to full-precision accuracy with less time to solution.

On modern architectures, 32-bit operations are often at least twice as fast as 64-bit operations. There are two reasons for this. First, the execution rate of 32-bit floating point arithmetic is usually twice that of 64-bit floating point arithmetic on most modern processors. Second, the number of bytes moved through the memory system is halved. It may also be possible to carry out the computation in even lower precision, say with 16-bit operations.
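The storage half of this argument can be made concrete with a minimal sketch. It uses only Python's standard `struct` module, whose `d`, `f`, and `e` format codes correspond to IEEE 64-, 32-, and 16-bit floating point values. The helper names `round_to` and `vector_bytes` are illustrative assumptions and do not come from the report.

```python
import struct

# IEEE 754 binary formats as exposed by Python's struct module
# ('<' selects the standard, platform-independent sizes).
FORMATS = {"fp64": "<d", "fp32": "<f", "fp16": "<e"}


def round_to(fmt, x):
    """Round a Python float (fp64) to the given format and convert it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def vector_bytes(fmt, n):
    """Bytes needed to store n values in the given format."""
    return n * struct.calcsize(fmt)


if __name__ == "__main__":
    n = 1_000_000
    third = 1.0 / 3.0
    for name, fmt in FORMATS.items():
        stored = round_to(fmt, third)
        rel_err = abs(stored - third) / third
        print(f"{name}: {vector_bytes(fmt, n):>9} bytes for {n} values, "
              f"relative error when storing 1/3: {rel_err:.1e}")
```

Each step down in precision halves the bytes that have to move through the memory system, at the price of a larger rounding error per stored value. That loss is what the refinement techniques discussed next are meant to recover.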

One approach to exploiting the compute power in low precision is motivated by the observation that in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. The refinement can be accomplished, for instance, by means of Newton’s method (see Equation (1)), which computes the zero of a function f(x) according to the iterative formula:

x_{i+1} = x_i - f(x_i) / f'(x_i)    (1)

In general, we would compute a starting point and the derivative f'(x) in single precision arithmetic, and carry out the refinement process in double precision arithmetic. If the refinement process is cheaper than the initial computation of the solution, then double precision accuracy can be achieved at nearly the same speed as single precision accuracy.
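For a linear system Ax = b, applying Newton's method to f(x) = b - Ax gives the update x_{i+1} = x_i + A^{-1} r_i with residual r_i = b - A x_i, which is classical iterative refinement. The following is a minimal sketch of that idea under stated assumptions, not the MAGMA implementation behind the reported results: low precision is emulated by rounding every intermediate result through Python's `struct` module, the function names (`make_rounder`, `lu_factor`, `lu_solve`, `refine`, `demo_system`) are hypothetical, and the test matrix is an arbitrary well-conditioned example. The residual is scaled before each correction solve so that it stays representable in the low-precision format.

```python
import struct


def make_rounder(fmt):
    """Return a function rounding fp64 values to a struct format
    ('<f' = IEEE single, '<e' = IEEE half) to emulate low precision."""
    def rnd(x):
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    return rnd


def lu_factor(A, rnd):
    """LU with partial pivoting; every arithmetic result is rounded by rnd."""
    n = len(A)
    LU = [[rnd(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = rnd(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = rnd(LU[i][j] - rnd(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve(LU, perm, b, rnd):
    """Solve with the low-precision factors, again rounding every result."""
    n = len(b)
    y = [rnd(b[p]) for p in perm]
    for i in range(n):                  # forward substitution (unit lower)
        for j in range(i):
            y[i] = rnd(y[i] - rnd(LU[i][j] * y[j]))
    for i in reversed(range(n)):        # back substitution (upper)
        for j in range(i + 1, n):
            y[i] = rnd(y[i] - rnd(LU[i][j] * y[j]))
        y[i] = rnd(y[i] / LU[i][i])
    return y


def refine(A, b, fmt="<f", tol=1e-12, max_iter=50):
    """Factor in low precision, then refine with fp64 residuals.
    Returns the solution and the number of correction steps applied."""
    rnd = make_rounder(fmt)
    n = len(b)
    LU, perm = lu_factor(A, rnd)        # the expensive O(n^3) part
    x = lu_solve(LU, perm, b, rnd)      # low-precision starting point
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter + 1):
        # Residual in fp64 (plain Python floats).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        r_norm = max(abs(v) for v in r)
        if r_norm <= tol * b_norm:
            return x, it
        # Scale the residual so it stays representable in the low format.
        d = lu_solve(LU, perm, [v / r_norm for v in r], rnd)
        x = [x[i] + r_norm * d[i] for i in range(n)]   # update in fp64
    raise RuntimeError("refinement did not converge")


def demo_system(n=8):
    """Hypothetical test problem: Hilbert matrix plus n*I, solution near 1."""
    A = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(row) for row in A]
    return A, b


if __name__ == "__main__":
    A, b = demo_system()
    for name, fmt in (("fp32", "<f"), ("fp16", "<e")):
        x, steps = refine(A, b, fmt)
        err = max(abs(v - 1.0) for v in x)
        print(f"{name} factorization: {steps} correction steps, "
              f"max error {err:.1e}")
```

The design point is that the factorization, which dominates the cost, runs entirely in the low-precision format, while only the residual and the solution update use double precision. On real hardware the low-precision work would be done by native FP32 or FP16 units rather than by software rounding, so this sketch illustrates the numerics only and says nothing about performance.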

Stunning results can be achieved. Figure 1 compares the solution of a general system of linear equations using a dense solver on an Nvidia V100 GPU, with the factorization performed in 64-, 32-, and 16-bit floating point arithmetic. For the 32- and 16-bit factorizations, refinement techniques are then used to improve the solution to the accuracy achieved with the 64-bit factorization.

Figure 1: Mixed-precision iterative refinement in MAGMA and acceleration vs. FP64 solvers. Note ≈2% overhead per iteration, and more than 2× less overhead in terms of iterations for mixed-precision LU vs. regular FP16 LU (the 3 vs. 7 iterations until FP64 convergence).

The survey report presents much more detail on the methods and approaches using these techniques; see https://www.icl.utk.edu/files/publications/2020/icl-utk-1392-2020.pdf.

Author Bio – Hartwig Anzt

Hartwig Anzt is a Helmholtz-Young-Investigator Group leader at the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology (KIT). He obtained his PhD in Mathematics at the Karlsruhe Institute of Technology, and afterwards joined Jack Dongarra’s Innovative Computing Lab at the University of Tennessee in 2013. Since 2015 he has also held a Senior Research Scientist position at the University of Tennessee. Hartwig Anzt has a strong background in numerical mathematics and specializes in iterative methods and preconditioning techniques for next-generation hardware architectures. His Helmholtz group on Fixed-point methods for numerics at Exascale (“FiNE”) is granted funding until 2022. Hartwig Anzt has a long track record of high-quality software development. He is the author of the MAGMA-sparse open source software package, managing lead and developer of the Ginkgo numerical linear algebra library, and part of the US Exascale Computing Project delivering production-ready numerical linear algebra libraries.

Author Bio – Jack Dongarra

Jack Dongarra received a Bachelor of Science in Mathematics from Chicago State University in 1972 and a Master of Science in Computer Science from the Illinois Institute of Technology in 1973. He received his PhD in Applied Mathematics from the University of New Mexico in 1980. He worked at the Argonne National Laboratory until 1989, becoming a senior scientist. He now holds an appointment as University Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee. He is also a Distinguished Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), a Turing Fellow in the Computer Science and Mathematics Schools at the University of Manchester, and an Adjunct Professor in the Computer Science Department at Rice University.
