Mapping the Energy Envelope of Multicore ARM Chips
Bigger is not always better in the world of supercomputing. Data scientists almost always want more computational throughput, but the key question is how best to deliver it: through traditional, power-hungry x64 processors, or through the cheap, low-power ARM processors that drive smartphones and tablets. The answer is not always clear.
The ARM architecture is estimated to power more than 90 percent of smartphones and a good chunk of the world's tablets. To meet the demand for ever-faster devices, ARM Holdings has poured more resources into its 32-bit ARM architecture, in the hope of boosting performance (memory performance especially) while minimizing power consumption and heat.
That investment has caught the attention of the HPC community, which is always sensitive to power consumption and cooling. Several HPC companies and supercomputing centers have begun moving to the ARM architecture, among them the Barcelona Supercomputing Center, which is developing a supercomputer based on ARM Cortex-A9 processors.
Before diving headfirst into the ARM pool, however, smart HPC system builders need a way to predict whether ARM-based systems will actually deliver the expected savings in power consumption.
To that end, researchers in the Department of Computer Science at the National University of Singapore recently wrote a paper that sheds light on how processing, memory, and network I/O trade off against energy consumption in the latest multicore ARM architectures. The paper was published by ACM SIGMETRICS, the Association for Computing Machinery's special interest group that promotes the evaluation of computer system performance.
First, researchers Bogdan Marius Tudor and Yong Meng Teo developed a model that predicts the execution time and energy usage of an application for different numbers of cores and clock frequencies. This lets users select the configuration that maximizes performance without wasting energy.
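To make the idea concrete, here is a minimal Python sketch of a predict-then-select loop of this kind. It is not Tudor and Teo's model: the Amdahl-style CPU time, the assumption that memory and I/O time fully overlaps with computation and does not depend on core count or clock, the f³ dynamic-power term, and every constant and function name are hypothetical choices made purely for illustration.

```python
from itertools import product

# --- Hypothetical parameters, chosen only to illustrate the idea ---
CPU_GCYCLES = 40.0      # total CPU work, in billions of cycles
SERIAL_FRACTION = 0.05  # Amdahl serial fraction of that work
MEM_IO_SECONDS = 12.0   # memory/I-O time, assumed independent of cores and clock
STATIC_W = 1.5          # assumed static (leakage, uncore) power in watts
K_DYN = 0.3             # assumed per-core dynamic power coefficient, W / GHz^3

CORE_OPTIONS = [1, 2, 3, 4]
FREQ_OPTIONS = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4]  # GHz


def exec_time(cores: int, freq_ghz: float) -> float:
    """Predicted runtime (s).

    CPU time follows Amdahl's law; memory/I-O time is assumed to overlap
    fully with computation, so the slower of the two sets the runtime.
    """
    cpu_s = (CPU_GCYCLES / freq_ghz) * (SERIAL_FRACTION + (1 - SERIAL_FRACTION) / cores)
    return max(cpu_s, MEM_IO_SECONDS)


def power_w(cores: int, freq_ghz: float) -> float:
    """Assumed power: a static floor plus per-core dynamic power growing as f**3."""
    return STATIC_W + cores * K_DYN * freq_ghz ** 3


def energy_j(cores: int, freq_ghz: float) -> float:
    """Energy (J) = predicted power (W) x predicted runtime (s)."""
    return power_w(cores, freq_ghz) * exec_time(cores, freq_ghz)


def best_config(core_opts, freq_opts):
    """Lowest-energy (cores, GHz) pair among those matching the fastest runtime."""
    configs = list(product(core_opts, freq_opts))
    fastest = min(exec_time(c, f) for c, f in configs)
    no_slowdown = [cf for cf in configs if exec_time(*cf) <= fastest * (1 + 1e-9)]
    return min(no_slowdown, key=lambda cf: energy_j(*cf))


print(best_config(CORE_OPTIONS, FREQ_OPTIONS))  # (4, 1.0)
```

With these made-up parameters the workload is bound by memory and I/O: four cores at 1.0 GHz finish in the same 12 seconds as four cores at 1.4 GHz, while the modeled energy drops from about 57.5 J to 32.4 J. The search simply picks the cheapest point among configurations that tie the fastest runtime; a real model would need power and timing terms calibrated on the target chip.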
Second, the researchers tested the model on three types of applications: HPC, Web hosting, and financial workloads. The tests show that the model can identify configurations of core counts and clock frequencies that reduce power consumption by 33 percent without hurting performance.
But in the final analysis, smaller is not always better either. "We observe that low-power multicores may not always deliver energy-efficient executions for server workloads because large imbalances between cores, memory and I/O resources can lead to under-utilized resources and thus contribute to energy wastage," the authors conclude. "Resource imbalances in HPC programs may result in significantly longer execution time and higher energy cost on ARM Cortex-A9 than on a traditional X64 server."
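The energy argument behind that conclusion comes down to one multiplication: energy is average power times execution time, so a low-power chip saves energy only if its slowdown stays below the ratio of the two systems' power draw. The sketch below is illustrative Python; the function names and the 10 W and 100 W figures are hypothetical, not measurements from the paper.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy (J) is average power (W) multiplied by runtime (s)."""
    return avg_power_w * runtime_s


def break_even_slowdown(low_power_w: float, high_power_w: float) -> float:
    """Largest runtime ratio (low-power / high-power) at which the
    low-power system still uses no more energy than the high-power one."""
    return high_power_w / low_power_w


def low_power_saves_energy(low_power_w: float, low_runtime_s: float,
                           high_power_w: float, high_runtime_s: float) -> bool:
    """True if the low-power system finishes the job using less energy."""
    return (energy_joules(low_power_w, low_runtime_s)
            < energy_joules(high_power_w, high_runtime_s))


# Hypothetical numbers: a 10 W ARM node versus a 100 W x64 server.
ARM_W, X64_W = 10.0, 100.0
X64_RUNTIME_S = 100.0

print(break_even_slowdown(ARM_W, X64_W))  # 10.0
# A job that runs 5x slower on the ARM node still saves energy...
print(low_power_saves_energy(ARM_W, 5 * X64_RUNTIME_S, X64_W, X64_RUNTIME_S))  # True
# ...but one starved by memory or I/O that runs 12x slower does not.
print(low_power_saves_energy(ARM_W, 12 * X64_RUNTIME_S, X64_W, X64_RUNTIME_S))  # False
```

The comparison assumes each system draws power only while the job runs; counting idle power before or after the job would shift the break-even point.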
If the ARM architecture is to make significant inroads into the HPC world, the authors say, it will need stronger memory and I/O subsystems, enhancements they expect the ARM Cortex-A15 and the upcoming 64-bit ARM Cortex-A50 family to deliver.