The Great 8-bit Debate of Artificial Intelligence

By Waleed Atallah

August 7, 2023

Editor's Note: Users often ask, “What separates HPC from AI? They both do a lot of number crunching.” While that observation is true, one big difference is the precision required for a valid answer. HPC often requires the highest available precision (i.e., 64-bit double-precision floating point), while many AI applications work with 8-bit integers or floating point numbers. Using less precision often allows faster CPU/GPU mathematics and a “good enough” result for many AI applications. The following article explains the trend toward lower-precision computing in AI.

A grand competition of numerical representation is shaping up as some companies promote floating point data types in deep learning, while others champion integer data types.

Artificial Intelligence Is Growing In Popularity And Cost

Artificial intelligence (AI) is proliferating into every corner of our lives. The demand for products and services powered by AI algorithms has skyrocketed alongside the popularity of large language models (LLMs) like ChatGPT, and image generation models like Stable Diffusion. With this increase in popularity, however, comes an increase in scrutiny over the computational and environmental costs of AI, and particularly the subfield of deep learning.

The primary factors influencing the costs of deep learning are the size and structure of the deep learning model, the processor it is running on, and the numerical representation of the data. State-of-the-art models have been growing in size for years now, with the compute requirements doubling every 6-10 months [1] for the last decade. Processor compute power has increased as well, but not nearly fast enough to keep up with the growing costs of the latest AI models. This has led researchers to delve deeper into numerical representation in attempts to reduce the cost of AI. Choosing the right numerical representation, or data type, has significant implications for the power consumption, accuracy, and throughput of a given model. There is, however, no single answer to which data type is best for AI. Data type requirements vary between the two distinct phases of deep learning: the initial training phase and the subsequent inference phase.

Finding the Sweet Spot: Bit by Bit

When it comes to increasing AI efficiency, the method of first resort is quantization of the data type. Quantization reduces the number of bits required to represent the weights of a network. Reducing the number of bits not only makes the model smaller, but reduces the total computation time, and thus reduces the power required to do the computations. This is an essential technique for those pursuing efficient AI.
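The effect of bit width on model size is simple arithmetic, sketched below. The 7-billion-parameter model is a hypothetical example rather than a figure from this article, and the calculation counts weight storage only, ignoring activations and any quantization metadata such as scale factors.

```python
# Weight storage at several bit widths (weights only; an illustrative sketch).
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "BF16": 16, "FP8": 8, "INT8": 8}


def weight_memory_gib(num_params: int, bits: int) -> float:
    """Return the storage needed for num_params weights, in GiB."""
    return num_params * bits / 8 / 2**30


num_params = 7_000_000_000  # hypothetical model size, chosen for illustration
for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_memory_gib(num_params, bits):.1f} GiB")
```

Moving from 32-bit to 8-bit weights cuts the storage by a factor of four, which is the main reason 8-bit formats are attractive for memory-constrained inference.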

AI models are typically trained using single precision 32-bit floating point (FP32) data types. It was found, however, that all 32 bits aren’t always needed to maintain accuracy. Attempts at training models using half precision 16-bit floating point (FP16) data types showed early success, and the race to find the minimum number of bits that maintains accuracy was on. Google came out with their 16-bit brain float (BF16), and models being primed for inference were often quantized to 8-bit floating point (FP8) and integer (INT8) data types. There are two primary approaches to quantizing a neural network: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Both methods aim to reduce the numerical precision of the model to improve computational efficiency, memory footprint, and energy consumption, but they differ in how and when the quantization is applied, and the resulting accuracy.

Post-Training Quantization (PTQ) occurs after training a model with higher-precision representations (e.g., FP32 or FP16). It converts the model’s weights and activations to lower-precision formats (e.g., FP8 or INT8). Although simple to implement, PTQ can result in significant accuracy loss, particularly in low-precision formats, because the model was never trained to tolerate quantization errors. Quantization-Aware Training (QAT) incorporates quantization during training, allowing the model to adapt to reduced numerical precision. Forward and backward passes simulate quantized operations, computing gradients with respect to the quantized weights and activations. Although QAT generally yields better model accuracy than PTQ, it requires modifications to the training process and can be more complex to implement.
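As a minimal sketch of the PTQ idea, the NumPy code below applies symmetric per-tensor INT8 quantization to a randomly generated weight matrix. The scheme, the function names, and the random weights are all assumptions made for illustration; production toolchains typically add calibration data, per-channel scales, and activation quantization. The `fake_quantize` helper shows the quantize-then-dequantize step that QAT commonly inserts into the forward pass, without the gradient handling that a real training framework would supply.

```python
import numpy as np


def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 codes back to approximate real values."""
    return q.astype(np.float32) * scale


def fake_quantize(x: np.ndarray) -> np.ndarray:
    """Quantize then dequantize, simulating INT8 rounding in floating point."""
    q, scale = quantize_int8(x)
    return dequantize(q, scale)


# Hypothetical FP32 weights standing in for a trained layer.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale))
print("scale:", scale)
print("max abs error:", float(error.max()))
```

With round-to-nearest, the error of each weight is bounded by roughly half the scale, so a single large weight that inflates the scale degrades the precision of every other weight in the tensor.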

The 8-bit Debate

The AI industry has begun coalescing around two preferred candidates for quantized data types: INT8 and FP8. Every hardware vendor seems to have taken a side. In mid-2022, a paper by Graphcore and AMD [2] floated the idea of an IEEE standard FP8 data type. A joint paper with a similar proposal from Intel, Nvidia, and Arm [3] followed shortly afterward. Other AI hardware vendors like Qualcomm [4, 5] and Untether AI [6] also wrote papers promoting FP8 and reviewing its merits versus INT8. But the debate is far from settled. While there is no single answer for which data type is best for AI in general, there are superior and inferior data types when it comes to various AI processors and model architectures with specific performance and accuracy requirements.

Integer Versus Floating Point

Floating point and integer data types are two ways to represent and store numerical values in computer memory. There are a few key differences between the two formats that translate to advantages and disadvantages for various neural networks in training and inference.

The differences all stem from their representation. Floating point data types are used to represent real numbers, which include both integers and fractions. These numbers are stored in a form of scientific notation, with a significand (mantissa) and an exponent.

On the other hand, integer data types are used to represent whole numbers (without fractions). The two representations differ greatly in precision and dynamic range. Floating point numbers have a wider dynamic range than their integer counterparts. Integers have a smaller range and can only represent whole numbers, with a fixed step between adjacent values.
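The difference in dynamic range can be checked with NumPy's type metadata. The sketch below is only an illustration: it uses FP32, FP16, INT32, and INT8 because NumPy has no native FP8 type, and the helper names are hypothetical.

```python
import numpy as np


def float_range(dtype) -> dict:
    """Largest value, smallest normal value, and relative step of a float type."""
    info = np.finfo(dtype)
    return {
        "max": float(info.max),
        "smallest_normal": float(info.tiny),
        "relative_step": float(info.eps),
    }


def int_range(dtype) -> dict:
    """Limits of an integer type; the step between adjacent values is always 1."""
    info = np.iinfo(dtype)
    return {"min": int(info.min), "max": int(info.max), "step": 1}


for dtype in (np.float32, np.float16):
    r = float_range(dtype)
    decades = np.log10(r["max"] / r["smallest_normal"])
    print(f"{np.dtype(dtype).name}: max={r['max']:.3e}, "
          f"smallest normal={r['smallest_normal']:.3e}, about {decades:.0f} decades")

for dtype in (np.int32, np.int8):
    r = int_range(dtype)
    decades = np.log10(r["max"])  # largest magnitude relative to the step of 1
    print(f"{np.dtype(dtype).name}: min={r['min']}, max={r['max']}, "
          f"about {decades:.0f} decades")
```

The floating point types span many more orders of magnitude than integers of the same width, at the cost of a step size that grows with the magnitude of the value.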

Integer vs Floating Point for Training

In deep learning, the numerical representation requirements differ between the training and inference phases due to the unique computational demands and priorities of each stage. During the training phase, the primary focus is on updating the model’s parameters through iterative optimization, which typically necessitates a higher dynamic range to ensure the accurate propagation of gradients and the convergence of the learning process. Consequently, floating point representations, such as FP32, FP16, and lately even FP8, are the usual choice during training to maintain sufficient dynamic range. On the other hand, the inference phase is concerned with the efficient evaluation of the trained model on new input data, where the priority shifts toward minimizing computational complexity, memory footprint, and energy consumption. In this context, lower-precision integer representations, such as 8-bit integer (INT8), become an option in addition to FP8. The ultimate decision depends on the specific model and underlying hardware.

Integer vs Floating Point for Inference

The best data type for inference will vary depending on the application and the target hardware. Real-time and mobile inference services tend to use the smaller 8-bit data types to reduce memory footprint, compute time, and energy consumption while maintaining enough accuracy.

FP8 is growing increasingly popular, as every major hardware vendor and cloud service provider has addressed its use in deep learning. There are three primary flavors of FP8, defined by how the bits are split between exponent and mantissa. Having more exponent bits increases the dynamic range of a data type, so FP8 E3M4, consisting of 1 sign bit, 3 exponent bits, and 4 mantissa bits, has the smallest dynamic range of the bunch. This FP8 representation sacrifices range for precision by reserving more bits for the mantissa. FP8 E4M3 has an extra exponent bit, and thus a greater range. FP8 E5M2 has the highest dynamic range of the trio, making it the preferred target for training, which requires greater dynamic range. Having a collection of FP8 representations allows for a tradeoff between dynamic range and precision, as some inference applications would benefit from the increased accuracy offered by an extra mantissa bit.
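The range-versus-precision tradeoff among the three flavors can be worked out from the bit counts alone. The sketch below assumes IEEE 754-style conventions throughout (a bias of 2^(E-1) - 1, the all-ones exponent reserved for infinities and NaNs, and subnormal numbers); this is an assumption for illustration, and published FP8 proposals deviate from it. For example, the E4M3 format in [3] reclaims most of the special encodings and reaches a maximum of 448 rather than the 240 computed here.

```python
def fp8_summary(exp_bits: int, man_bits: int) -> dict:
    """Range and precision of an 8-bit float under assumed IEEE-style rules."""
    assert 1 + exp_bits + man_bits == 8, "sign + exponent + mantissa must be 8 bits"
    bias = 2 ** (exp_bits - 1) - 1
    # Assumption: the all-ones exponent field is reserved for inf/NaN.
    max_exponent = (2 ** exp_bits - 2) - bias
    return {
        "max": (2 - 2.0 ** -man_bits) * 2.0 ** max_exponent,
        "min_normal": 2.0 ** (1 - bias),
        "min_subnormal": 2.0 ** (1 - bias - man_bits),
        "relative_step": 2.0 ** -man_bits,
    }


for exp_bits, man_bits in ((3, 4), (4, 3), (5, 2)):
    s = fp8_summary(exp_bits, man_bits)
    print(f"E{exp_bits}M{man_bits}: max={s['max']}, min normal={s['min_normal']}, "
          f"min subnormal={s['min_subnormal']}, relative step={s['relative_step']}")
```

Under these assumptions the maxima are 15.5 for E3M4, 240 for E4M3, and 57344 for E5M2, while the relative step between adjacent values doubles with each mantissa bit given up (1/16, 1/8, and 1/4).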

INT8, on the other hand, effectively has 1 sign bit, 1 exponent bit, and 6 mantissa bits. This sacrifices much of its dynamic range for precision. Whether or not this translates into better accuracy compared to FP8 depends on the AI model in question. And whether or not it translates into better power efficiency will depend on the underlying hardware. Research from Untether AI [6] shows that FP8 outperforms INT8 in terms of accuracy and, on their hardware, in performance and efficiency as well. In contrast, Qualcomm research [5] found that the accuracy gains of FP8 are not worth the loss of efficiency compared to INT8 on their hardware. Ultimately, the choice of data type when quantizing for inference will often come down to what is best supported in hardware, as well as to the model itself.
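One way to see why the answer depends on the model is to compare rounding error on tensors with different value distributions. The sketch below is a rough proxy only: it measures mean squared error on synthetic data rather than model accuracy, it treats INT8 as a uniform grid from -127 to 127, and it builds an E4M3 grid under assumed IEEE-style conventions (all-ones exponent reserved, subnormals included). The data, the single outlier, and all function names are hypothetical.

```python
import numpy as np


def fp8_grid(exp_bits: int, man_bits: int) -> np.ndarray:
    """All finite values of an 8-bit float under assumed IEEE-style rules."""
    bias = 2 ** (exp_bits - 1) - 1
    fractions = np.arange(2 ** man_bits) / 2 ** man_bits
    subnormals = fractions * 2.0 ** (1 - bias)
    normals = [(1 + fractions) * 2.0 ** (e - bias)
               for e in range(1, 2 ** exp_bits - 1)]
    positive = np.concatenate([subnormals, *normals])
    return np.unique(np.concatenate([-positive, positive]))


def int8_grid() -> np.ndarray:
    """Symmetric INT8 values, uniformly spaced with a step of 1."""
    return np.arange(-127, 128, dtype=np.float64)


def quantization_mse(x: np.ndarray, grid: np.ndarray) -> float:
    """Scale x so its largest magnitude hits the grid maximum, round, and score."""
    scale = np.max(np.abs(x)) / grid.max()
    scaled = x / scale
    idx = np.clip(np.searchsorted(grid, scaled), 1, len(grid) - 1)
    lower, upper = grid[idx - 1], grid[idx]
    nearest = np.where(scaled - lower <= upper - scaled, lower, upper)
    return float(np.mean((x - nearest * scale) ** 2))


rng = np.random.default_rng(0)
gaussian = rng.normal(size=10_000)          # well-behaved synthetic tensor
with_outlier = np.append(gaussian, 100.0)   # same tensor plus one large outlier

for name, data in (("gaussian", gaussian), ("with outlier", with_outlier)):
    print(name,
          "INT8 MSE:", quantization_mse(data, int8_grid()),
          "FP8 E4M3 MSE:", quantization_mse(data, fp8_grid(4, 3)))
```

In this setup the uniform INT8 grid gives the lower error on the well-behaved tensor, while the FP8 grid gives the lower error once a single outlier stretches the range, because its small values keep a fine step size. Real networks fall somewhere between these two cases.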

References

[1] Compute Trends Across Three Eras Of Machine Learning, https://arxiv.org/pdf/2202.05924.pdf
[2] 8-bit Numerical Formats for Deep Neural Networks, https://arxiv.org/abs/2206.02915
[3] FP8 Formats for Deep Learning, https://arxiv.org/abs/2209.05433
[4] FP8 Quantization: The Power of the Exponent, https://arxiv.org/pdf/2208.09225.pdf
[5] FP8 versus INT8 for Efficient Deep Learning Inference, https://arxiv.org/abs/2303.17951
[6] FP8: Efficient AI Inference Using Custom 8-bit Floating Point Data Types, https://www.untether.ai/content-request-form-fp8-whitepaper

About the Author

Waleed Atallah is a Product Manager responsible for silicon, boards, and systems at Untether AI. Currently, he is rolling out Untether AI’s second generation silicon product, the speedAI family of devices. He was previously a Product Manager at Intel, where he was responsible for high-end FPGAs with high bandwidth memory. His interests span all things compute efficiency, particularly the mapping of software to new hardware architectures. He received a B.S. degree in Electrical Engineering from UCLA.

Topics: AI, Hardware, Processors, Software

Sectors: Academia & Research, Community

Tags: AI, ChatGPT, data types, floating point, FP16, FP8, LLMs, Post-Training Quantization, Quantization-Aware Training
