During the recent Intel Developer Forum (IDF), Intel CTO Justin Rattner discussed the swift drive toward many-core computing, noting that this is an important development for HPC as well as many other realms.
Among the demonstrations and previews of the “many-core age” to come, Intel’s CTO touched on the future of extreme scale computing. This topic gave the company a perfect opportunity to discuss its ten-year goal of a 300-fold improvement in energy efficiency, bringing energy consumption down to 20 picojoules per FLOP at the system level.
Intel’s Shekhar Borkar, who works with DARPA’s UHPC project, said that “today’s 100 gigaFLOPs computer uses 200 watts. By 2019, it should use about 2 watts, due to reductions in power required not only by the cores, but by the whole system, including memory and storage.”
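To put Borkar’s figures in the same units as the 20 picojoules per FLOP goal, the arithmetic can be sketched as below. This is a rough illustration only: the function name is hypothetical, and it assumes the quoted wattage is sustained power at the quoted sustained FLOP rate.

```python
def picojoules_per_flop(watts: float, gigaflops: float) -> float:
    """Energy per floating-point operation, in picojoules.

    watts / (gigaflops * 1e9) gives joules per FLOP; multiplying by 1e12
    converts to picojoules, which simplifies to watts / gigaflops * 1000.
    """
    return watts / gigaflops * 1000.0


# Figures quoted by Borkar: a 100 gigaFLOPs machine at 200 W today, 2 W by 2019.
today = picojoules_per_flop(watts=200.0, gigaflops=100.0)
target = picojoules_per_flop(watts=2.0, gigaflops=100.0)

print(f"today:  {today:.0f} pJ/FLOP")
print(f"target: {target:.0f} pJ/FLOP")
print(f"ratio:  {today / target:.0f}x")
```

On these numbers the 2 watt machine lands on the 20 picojoules per FLOP target, and the two quoted data points differ by a factor of 100. The 300-fold goal presumably measures against a different baseline, which is not stated here.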
IDF also gave Intel a window to discuss a concept chip, code-named Claremont, which the company says can operate at near-threshold voltage and can scale from full performance down to a low-power mode drawing less than ten milliwatts.
Rattner stressed that these and related developments at Intel wouldn’t be restricted to HPC. He pointed to a number of applications that showed performance improvements of 30 times or more as the core count rose to 64.
The Knights Corner chip, whose deadline is approaching, will put more than 50 cores on a 22nm process, sparking what Rattner calls a new age of processing and memory capabilities. During the address, Rattner announced new parallel extensions for JavaScript and demonstrated a range of many-core applications and efficient designs that performed well on the power, processing and memory fronts.
CERN Open Lab engineer Andrzej Nowak said that his work at the Large Hadron Collider is made possible by the use of approximately 250,000 Intel cores. CERN has invested in parallelizing its software, with returns in the range of 40 times the performance it saw before.
According to Michael Miller, “CERN has worked with Northeastern University to parallelize its software. The lab has seen a fortyfold performance improvement on a 40-core Xeon implementation. The company uses the compatible MIC architecture. Nowak ran an application on both a single core and on a 32-core MIC, noting that on its heavily vectorized applications, they were getting nearly perfect scaling.”
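One common way to read a phrase like “nearly perfect scaling” is parallel efficiency: the speedup divided by the number of cores, where 1.0 means every added core contributed fully. The sketch below applies that definition to the fortyfold improvement on 40 cores quoted above. The function name is hypothetical, and the article does not say this is the metric CERN used.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return speedup / cores


# Figures from the quote: a fortyfold improvement on a 40-core Xeon implementation.
xeon_efficiency = parallel_efficiency(speedup=40.0, cores=40)
print(f"40x on 40 cores: {xeon_efficiency:.0%} parallel efficiency")
```

The article gives no speedup figure for the 32-core MIC run, so that case is left out rather than guessed.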