FLOPS Fall Flat for Intelligence Agency
The Intelligence Advanced Research Projects Activity (IARPA) is putting out feelers with a request for information (RFI) in hopes of pushing new boundaries with an HPC program. At the core of its evaluation process, however, is an overt dismissal of today's popular benchmarks, including floating point operations per second (FLOPS).
To uncover some missing pieces for its growing computational needs, IARPA is soliciting “responses that illuminate the breadth of technologies” under the HPC umbrella, particularly the tech that “isn’t already well-represented in today’s HPC benchmarks.”
The RFI acknowledges the general value of benchmarks (Linpack, for instance) as metrics that drive research and development, but argues that HPC benchmarks have “constrained the technology and architecture options for HPC system designers.” In this case, floating point benchmarks matter less to the agency than measurements of data-intensive performance, particularly for the graph and other so-called big data problems it hopes to tackle with HPC systems.
From the document:
In this RFI we seek information about novel technologies that have the potential to enable new levels of computational performance with dramatically lower power, space and cooling requirements than the HPC systems of today. Importantly, we also seek to broaden the definition of high performance computing beyond today’s commonplace floating point benchmarks, which reflect HPC’s origins in the modeling and analysis of physical systems. While these benchmarks have been invaluable in providing the metrics that have driven HPC research and development, they have also constrained the technology and architecture options for HPC system designers. The HPC benchmarking community has already started to move beyond the traditional floating point benchmarks with new benchmarks focused on data intensive analysis of large graphs and on power efficiency.
The grumbling about whether FLOPS are a valid measure of real application performance for large-scale users is nothing new, but the question seems to be coming up more often on the end-user side, at least among those whose work revolves around so-called big data problems: large, complex datasets that create unique programming, memory and other challenges.
As IARPA echoes, there are many technologies that are still maturing that “have the potential to achieve high performance on important computational challenges but are highly unlikely to do well on today’s benchmarks (e.g., quantum computation, molecular/DNA computation, neural computation, optical computation).”
One could go out on a limb and point to the continued development of high performance systems built to tackle data-intensive problems. John Johnson of Pacific Northwest National Laboratory has described this trend in a number of presentations.
On that note, Addison Snell of Intersect360 Research points to the diversity of applications, noting that some “are sensitive to flops, but there are others that require different types of performance. FFTs and sparse linear algebra are examples of applications that are not flop-centric, but rather are much more reliant on the interconnect and system topology.” He added that, as this RFI shows, “[c]ertain sectors of the government are very interested in finding systems that will deliver on these other dimensions of scalability.”
This is certainly not to say that the FLOPS designation is becoming irrelevant; performance benchmarks for top systems remain critical. But for users who face budget constraints on power and cooling (a big part of this RFI), who want to use big iron efficiently, and who need to plow through massive, complex data wells quickly, it is easy to see how FLOPS can become an abstract proxy for actual use, especially when theoretical peak figures are used to estimate real-world application performance.
AMD researcher Josh Mora has studied how well FLOPS predict the performance of real-world applications, including CFD. He asserts that FLOPS, at least theoretical FLOPS, are not “a good indicator of how applications such as CFD and many others will perform.”
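One common way to see why theoretical FLOPS can mislead is the roofline model, a standard performance model swapped in here for illustration; it is not drawn from the RFI or from Mora's work. A kernel's attainable rate is capped by either peak compute or memory bandwidth multiplied by its arithmetic intensity (FLOPs per byte moved). The sketch below assumes a hypothetical node, and the sparse matrix-vector intensity uses the usual rough estimate for double-precision CSR storage (2 FLOPs per nonzero against an 8-byte value plus a 4-byte column index, ignoring vector and row-pointer traffic).

```python
def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak: every core retires its maximum FLOPs on every cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def attainable_flops(peak, mem_bandwidth_bytes_per_s, arithmetic_intensity):
    """Roofline bound: the lesser of peak compute and bandwidth * FLOPs-per-byte."""
    return min(peak, mem_bandwidth_bytes_per_s * arithmetic_intensity)


# Hypothetical node: 16 cores at 2.5 GHz, 8 FLOPs/cycle per core, 50 GB/s memory.
peak = peak_flops(nodes=1, cores_per_node=16, clock_hz=2.5e9, flops_per_cycle=8)
bandwidth = 50e9

# Dense kernel with an assumed intensity of 10 FLOPs/byte: compute-bound.
dense = attainable_flops(peak, bandwidth, 10.0)

# Sparse matrix-vector multiply (CSR, double precision): ~2 FLOPs per ~12 bytes.
spmv = attainable_flops(peak, bandwidth, 2 / 12)

if __name__ == "__main__":
    print(f"theoretical peak: {peak / 1e9:.0f} GFLOPS")
    print(f"dense kernel:     {dense / 1e9:.0f} GFLOPS")
    print(f"sparse SpMV:      {spmv / 1e9:.1f} GFLOPS ({spmv / peak:.1%} of peak)")
```

With these assumed numbers the dense kernel reaches the full 320 GFLOPS peak, while the sparse kernel is held to roughly 2.6 percent of it by memory bandwidth. At cluster scale the same reasoning extends to the interconnect, which is the dependence Snell describes for FFTs and sparse linear algebra.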
As CSS founder (and HPCwire contributor) Gary Johnson has argued, publicly funded high-end computers, including the top machines, are generally placed in environments where they are shared by many users. “Depending on site policies, there may be anywhere from a few hundred to several thousand users on these machines. Furthermore, these computers are seldom devoted in their entirety to a single application run. When they are, that run is likely to be [the] Linpack benchmark to qualify for the next edition of the TOP500 list.”
Gary Johnson says that when you do the math, no one really sees the full strength of a top computer. “Users just get a slice of the machine, one that is probably equivalent to full use of some computer much lower on the TOP500 list (and much cheaper).”
Although IARPA does not want to wade into the benchmark debate with this request, the data-intensive technologies behind the graph analytics needs it hints at are firming up benchmarks of their own. The most obvious example is the Graph 500 list, which ranks systems by traversed edges per second (TEPS). That makes it a natural benchmark for an agency that is likely building massive social graphs to discover previously unseen connections.
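To show what an edges-traversed metric measures, the following is a minimal, single-threaded sketch of a breadth-first search timed to produce a TEPS figure. It is a simplified illustration, not the official Graph 500 reference code. The uniform random graph generator is a hypothetical stand-in for the benchmark's Kronecker generator, and the helper names are illustrative only. The edge count follows the Graph 500 idea of counting input edges in the traversed component.

```python
import random
import time
from collections import deque


def build_graph(num_vertices, num_edges, seed=0):
    """Hypothetical stand-in generator: uniform random undirected edges as adjacency lists."""
    rng = random.Random(seed)
    adj = [[] for _ in range(num_vertices)]
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        adj[u].append(v)  # store each undirected edge in both directions
        adj[v].append(u)
    return adj


def bfs(adj, root):
    """Breadth-first search; returns the parent array and adjacency entries examined."""
    parent = [-1] * len(adj)
    parent[root] = root
    frontier = deque([root])
    examined = 0
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            examined += 1
            if parent[v] == -1:  # first visit: record the BFS tree edge
                parent[v] = u
                frontier.append(v)
    return parent, examined


def teps(adj, root):
    """Time one BFS and return (edges per second, parent array, edges traversed)."""
    start = time.perf_counter()
    parent, examined = bfs(adj, root)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    # Every edge in the reached component sits in two adjacency lists and is
    # examined once from each endpoint, so halve the count.
    traversed = examined // 2
    return traversed / elapsed, parent, traversed


if __name__ == "__main__":
    scale = 12
    graph = build_graph(1 << scale, 16 << scale)  # edge factor 16, as in Graph 500
    rate, _, edges = teps(graph, root=0)
    print(f"traversed {edges} edges at {rate / 1e6:.2f} MTEPS")
```

The score depends on how quickly the machine chases irregular pointers through memory, not on arithmetic. That is why a system can rank very differently on the Graph 500 and on a FLOPS-based list.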