GPUs are often likened to the “gold” of artificial intelligence, vital to today’s generative AI age. This article explains why modern AI would be impractical without GPUs. Let’s start with a simple processor task: displaying an image on the screen (shown below).
As straightforward as it seems, this task involves several steps: geometry transformation, rasterization, fragment processing, framebuffer operations, and output merging. Together, these steps form the GPU pipeline for rendering 3D graphics.
In the GPU pipeline, the object to be drawn is first converted into a polygon mesh representation, as seen below:
A single teapot is broken into a mesh of hundreds of triangles, each of which is processed independently in the same way.
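To make those stages concrete, here is a toy, CPU-only sketch in Python of what happens to one triangle from such a mesh: a geometry transformation, rasterization, simple fragment processing, and a framebuffer write. The names, sizes, and coordinates are invented for illustration; real GPUs implement these stages in dedicated hardware.

```python
# A toy, CPU-only sketch of the pipeline stages named above; illustrative only.
import numpy as np

WIDTH, HEIGHT = 16, 8
framebuffer = np.zeros((HEIGHT, WIDTH))

def transform(vertices, matrix):
    """Geometry transformation: move homogeneous vertices into screen space."""
    return vertices @ matrix.T

def edge(a, b, p):
    """Signed-area test used to decide whether a pixel lies inside a triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri):
    """Rasterization + fragment processing: shade every pixel the triangle covers."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            p = (x + 0.5, y + 0.5)
            inside = (edge(tri[0], tri[1], p) >= 0 and
                      edge(tri[1], tri[2], p) >= 0 and
                      edge(tri[2], tri[0], p) >= 0)
            if inside:
                framebuffer[y, x] = 1.0  # framebuffer operation: write the shaded fragment

# One triangle from the mesh, in homogeneous 2D coordinates (x, y, 1),
# counter-clockwise winding.
triangle = np.array([[2.0, 1.0, 1.0], [13.0, 2.0, 1.0], [7.0, 7.0, 1.0]])
shift_right = np.array([[1.0, 0.0, 1.0],   # translate by +1 pixel in x
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
screen_tri = transform(triangle, shift_right)[:, :2]
rasterize(screen_tri)
print(framebuffer)  # 1.0 marks pixels covered by the triangle
```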
What does a GPU offer that a CPU cannot in handling this “simple” task? A high-end server CPU may have up to 128 cores, so at best it can process 128 of the teapot’s triangles at once. The user would see a partially rendered teapot that slowly fills in as cores finish and pick up new triangles. Imagine playing Grand Theft Auto (GTA) and watching each scene appear in pieces; it would ruin the experience, making even the old Snake game seem more fun.
How does a GPU deliver the full GTA gaming experience? The answer is parallelism: a modern GPU has thousands to tens of thousands of cores, so it can render all of the teapot’s triangles simultaneously, with many threads each working on a triangle in parallel. Essentially, CPUs are built for serial computing, whereas GPUs are built for parallel processing.
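A minimal sketch of that contrast, assuming PyTorch is installed (the triangle count and the stand-in transformation matrix are arbitrary): the loop mimics a core picking up one triangle at a time, while the batched call hands every triangle to the GPU at once.

```python
# Serial vs. parallel execution of the same per-triangle math; illustrative only.
import torch

triangles = torch.randn(10_000, 3, 4)   # 10,000 triangles, 3 vertices, (x, y, z, w)
mvp = torch.eye(4)                       # stand-in model-view-projection matrix

# CPU-style serial processing: one triangle at a time.
out_serial = torch.empty_like(triangles)
for i in range(triangles.shape[0]):
    out_serial[i] = triangles[i] @ mvp.T

# GPU-style processing: every triangle transformed in one batched operation,
# spread across thousands of GPU threads (falls back to CPU if no GPU exists).
device = "cuda" if torch.cuda.is_available() else "cpu"
out_parallel = (triangles.to(device) @ mvp.to(device).T).cpu()

print(torch.allclose(out_serial, out_parallel))  # same result, different execution model
```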
This captivating video showcases GPU computing’s power.
Initially created to boost 3D graphics rendering, GPUs have become more versatile and programmable over time. They gained capabilities for better visual effects and more realistic scenes through advanced lighting and shadowing, revolutionizing gaming. But it didn’t stop there. Developers saw GPUs’ untapped potential. Returning to our teapot example, GPUs perform vector-based mathematical calculations and matrix multiplications to render the image. Rendering a simple teapot involves only about 192 bytes of such matrix data, while a complex GTA scene with 100 objects needs around 10 KB.
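One plausible reading of those figures (an assumption on my part, not the article’s own accounting) is that they refer to 4×4 single-precision transformation matrices: three such matrices occupy exactly 192 bytes, and per-object matrices for a scene full of objects quickly grow into the kilobyte range.

```python
# Back-of-the-envelope check of the byte counts, assuming they refer to 4x4
# float32 transformation matrices; that interpretation is an assumption.
import numpy as np

model = np.eye(4, dtype=np.float32)        # object-to-world transform
view = np.eye(4, dtype=np.float32)         # world-to-camera transform
projection = np.eye(4, dtype=np.float32)   # camera-to-clip-space transform

# 3 matrices x 16 floats x 4 bytes = 192 bytes
print(model.nbytes + view.nbytes + projection.nbytes)

# Transforming one homogeneous vertex is just a chain of matrix products,
# repeated for every vertex of every object in the scene.
vertex = np.array([1.0, 2.0, 3.0, 1.0], dtype=np.float32)
print(projection @ view @ model @ vertex)
```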
GPUs’ built-in parallelism and high throughput led to accelerated computing, prompting researchers to use GPUs for tasks like protein-folding simulations and physics calculations. These early achievements showed that GPUs could speed up computation-heavy tasks beyond graphics rendering, such as the matrix and vector operations used in neural networks. Although neural networks were achievable without GPUs, their capabilities were constrained by the available computational power. The advent of GPUs provided the resources needed to train deep and complex neural networks effectively, driving rapid advancements and widespread adoption of deep learning techniques.
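The connection is easy to see in code. A hedged sketch, assuming PyTorch with an optional CUDA device: the forward pass of one dense neural-network layer is a large matrix multiply plus a vector add, the same kind of arithmetic used to transform the teapot’s vertices, just at a much larger scale.

```python
# One dense layer's forward pass as a matrix multiply; layer sizes are arbitrary.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

inputs = torch.randn(256, 1024, device=device)    # a batch of 256 input vectors
weights = torch.randn(1024, 4096, device=device)  # the layer's weight matrix
bias = torch.randn(4096, device=device)

activations = torch.relu(inputs @ weights + bias)  # one layer of a neural network
print(activations.shape)                           # torch.Size([256, 4096])
```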
To allow GPUs to handle a wide range of tasks effectively, Nvidia has developed different types of GPU cores specialized for various functions:
- CUDA Cores: These are for general-purpose parallel processing, including rendering graphics, scientific computations, and basic machine learning tasks.
- Tensor Cores: Designed for deep learning and AI, they speed up tensor operations like matrix multiplications, which are essential for training and inference in neural networks (see the sketch after this list).
- RT Cores: Focused on real-time ray tracing, these provide realistic lighting, shadows, and reflections in graphics.
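How these core types get used is mostly a job for the driver and libraries rather than application code. As a hedged sketch, assuming PyTorch on a recent NVIDIA GPU: a standard FP32 matrix multiply runs on CUDA cores, while the same multiply in half precision is the kind of operation the libraries can map onto Tensor Cores; RT Cores are reached through ray-tracing APIs rather than code like this.

```python
# CUDA cores vs. Tensor Cores from a framework's point of view; illustrative only.
import torch

if torch.cuda.is_available():
    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")

    c_fp32 = a @ b  # general-purpose FP32 matrix multiply on CUDA cores

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        c_fp16 = a @ b  # half-precision matmul, eligible for Tensor Cores

    print(c_fp32.dtype, c_fp16.dtype)  # torch.float32 torch.float16
```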
Does this mean GPUs can replace CPUs? Absolutely not! The CPU is like the brain of the computer, excelling at handling individual tasks quickly with its fewer but more powerful cores. The CPU is optimized for latency, how quickly a single task completes, while the GPU is optimized for throughput, how much work the system completes per unit of time.

The GPU’s journey from a mere graphics accelerator to a pivotal component of supercomputers is a tale of rapid technological progress and expanding applications. Machine learning was once slow and limited in accuracy, but the integration of GPUs made large neural networks practical, driving advances in fields like autonomous driving and image and object recognition. High-performance computing, now a leading enterprise technology, has largely been propelled by GPUs.
Manasi Rashinkar holds a Master of Science in Electrical Engineering from Santa Clara University and is currently a Senior ASIC Engineer (Timing Lead) at Nvidia. This article is Manasi’s own work and does not represent Nvidia.