Everyone knows that 64-bit floating point arithmetic dominates in HPC. When a new Xeon or high-end GPU comes out, the most interesting spec to an HPCer is probably its peak double-precision flops figure. And yet, along with the democratization of HPC and the rise of accelerators have come new use cases for sub-FP64 and mixed-precision arithmetic.
One of the most pertinent examples is in the deep learning space, where, for neural network training and inference, single-precision or even half-precision is often sufficient and saves on cost, energy and storage space.
For more on this fast-emerging trend, Nick Higham gives a great overview in his blog post, “The Rise of Mixed-Precision Arithmetic.” The Richardson Professor of Applied Mathematics in the School of Mathematics at the University of Manchester is an expert in numerical analysis with a focus on numerical linear algebra. More on that here.
Higham offers concise definitions of double-, single-, half- and even quadruple-precision arithmetic, the last of which he notes was included in the 2008 revision of the IEEE standard and is supported by some compilers.
Then there’s half-precision arithmetic, in which a number occupies 16 bits. This, he says, “is supported by the IEEE standard for storage but not for computation.” With a relative accuracy of about 10^-4, FP16 is increasingly seen as “good enough” for training and running neural networks. NVIDIA, by the way, is very focused on this segment, and the topic was discussed in depth at its GPU Technology Conference in March.
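To make those precision levels concrete, here is a small sketch (using Python and NumPy, which the article itself doesn’t mention, so take the tooling as an assumption) that prints the width and relative accuracy of each IEEE format discussed above; quadruple precision is omitted because NumPy has no native 128-bit IEEE type on most platforms.

```python
import numpy as np

# Machine epsilon (relative accuracy) for the IEEE formats discussed above.
for name, dtype in [("half (FP16)", np.float16),
                    ("single (FP32)", np.float32),
                    ("double (FP64)", np.float64)]:
    info = np.finfo(dtype)
    print(f"{name:14s} bits={info.bits:3d}  eps={info.eps:.3e}")

# Typical output:
#   half (FP16)    bits= 16  eps=9.766e-04
#   single (FP32)  bits= 32  eps=1.192e-07
#   double (FP64)  bits= 64  eps=2.220e-16
```

The FP16 line is where the “relative accuracy of about 10^-4” figure comes from.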
And it’s not a matter of one or the other, either; there are many ways to implement mixed precision to optimize performance.
Higham gives many examples of where “extra precision” is called for, including the case of an iterative algorithm that accepts an arbitrary starting point. He writes “it can be run once at a given precision and the solution used to ‘warm start’ a second run of the same algorithm at higher precision.”
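Higham’s post doesn’t include code, but a minimal sketch of that warm-start pattern, using a plain Jacobi iteration on a diagonally dominant system (my choice purely for illustration; any iterative solver that accepts a starting point would do), might look like this:

```python
import numpy as np

def jacobi(A, b, x0, tol, max_iter=500):
    """Plain Jacobi iteration; working precision follows the dtype of the inputs."""
    D = np.diag(A)                 # diagonal entries
    R = A - np.diagflat(D)         # off-diagonal part
    x = x0.copy()
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(0)
n = 200
# Diagonally dominant system so Jacobi converges.
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# Cheap low-precision pass: run the solver entirely in single precision.
x32 = jacobi(A.astype(np.float32), b.astype(np.float32),
             np.zeros(n, dtype=np.float32), tol=1e-6)

# Warm start: reuse the FP32 solution as the starting point in double precision.
x64 = jacobi(A.astype(np.float64), b.astype(np.float64),
             x32.astype(np.float64), tol=1e-14)

print("FP32 residual:", np.linalg.norm(A @ x32.astype(np.float64) - b))
print("FP64 residual:", np.linalg.norm(A @ x64 - b))
```

The point of the pattern is that the cheap single-precision pass does most of the work, so the double-precision pass starts close to the answer and needs only a few extra iterations to reach full accuracy.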
In this era of diminishing Moore’s law returns, with growing data demands ushering in heterogeneous computing and hierarchical memory/storage, mixed-precision algorithms make increasing sense from an economic perspective in terms of any number of costs (time, money, power, etc.).
You can read more on this here.