
June 17, 2014

Comparing Peak Floating Point Claims

Tiffany Trader
[Figure: Altera white paper (2014), Figure 3]

With Moore’s law and associated silicon transistor performance “laws” winding down, there is renewed interest in accelerators, e.g., digital signal processors (DSPs), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). Measuring the peak floating-point performance of these non-traditional computing architectures is not without challenges, however. A new white paper from Altera’s Michael Parker attempts to shed light on floating-point performance claims.

Parker, Principal DSP Product Planning Manager at Altera, provides a method for calculating and comparing the peak floating-point capabilities of several accelerators. He also examines a real-life floating-point performance claim from Xilinx that rests on a non-standard benchmarking method.

“Given the variety of computing architectures available, designers need a uniform method to compare performance and power efficiency,” writes Parker. “The accepted method is to measure floating-point operations per second (FLOPs), where a FLOP is defined as either an addition or multiplication of single (32 bit) or double (64 bit) precision numbers in conformance with the IEEE 754 standard. All higher order functions, such as divide, square root, and trigonometric operators, can be constructed using adders and multipliers. As these operators, as well as other common functions such as fast Fourier transforms (FFTs) and matrix operators, require both adders and multipliers, there is commonly a 1:1 ratio of adders and multipliers in all these architectures.”

The paper goes on to describe how to arrive at the peak FLOPS rating for DSPs, GPUs and FPGAs: multiply the total number of adders and multipliers by the maximum operating frequency. Of course, this is a theoretical limit that can never be realized in practice. However, the peak rating still serves as a useful point of reference, says Parker.
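As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `peak_flops` and the device figures (1,000 adders, 1,000 multipliers, 500 MHz) are made up for this example and do not come from the white paper or from any real part.

```python
def peak_flops(num_adders: int, num_multipliers: int, max_freq_hz: float) -> float:
    """Theoretical peak FLOPS: (adders + multipliers) * maximum operating frequency.

    This is an upper bound only; it assumes every adder and multiplier
    completes one operation on every clock cycle.
    """
    if num_adders < 0 or num_multipliers < 0 or max_freq_hz < 0:
        raise ValueError("unit counts and frequency must be non-negative")
    return (num_adders + num_multipliers) * max_freq_hz


# Hypothetical device (illustrative numbers, not a real part):
# 1,000 adders and 1,000 multipliers clocked at 500 MHz.
example = peak_flops(num_adders=1000, num_multipliers=1000, max_freq_hz=500e6)
print(f"{example / 1e9:.0f} GFLOPS")  # prints "1000 GFLOPS"
```

The 1:1 adder-to-multiplier split in the example mirrors the ratio Parker describes as common; achieved throughput on a real design would fall below this number.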

Parker notes that floating-point has always been available in FPGAs using their programmable logic. What’s more, it is not restricted to industry-standard single and double precision. Altera offers seven different levels of floating-point precision, he adds, but calculating the peak rating of a given FPGA using a programmable logic implementation is not at all straightforward.

“Therefore,” Parker writes, “the peak floating-point rating of Altera FPGAs is based solely on the capability of the hardened floating-point engines, and assumes that the programmable logic is not used for floating point, but rather for the other parts of a design, such as the data control and scheduling circuits, I/O interfaces, internal and external memory interfaces, and other needed functionality.”

Because it is nearly impossible to determine the floating-point capacity of an FPGA when floating point is implemented in programmable logic, Parker says the best approach is to build benchmark floating-point designs, which include the timing closure process. The FPGA vendor can also supply these designs.

According to the designs that Altera provides on its 28nm FPGAs, “several hundred GFLOPs can be achieved for simpler algorithms such as FFTs, and just over 100 GFLOPs for complex algorithms such as QR and Cholesky decomposition.”

Parker cautions against relying solely on vendor-supplied theoretical GFLOPs figures and advises particular skepticism toward claims of more than 500 GFLOPs that are based on logic implementation. For a more accurate comparison, the vendor should provide a report showing the logic, memory and other resources used, along with the achieved clock rate. Going one step further, a compiled design file would allow the results to be replicated.
