HPC on cloud platforms can be undermined by performance-numbing virtualization layers and slow networks. But a group of European researchers has found that there could be a more fundamental problem: multitenancy.
An article that appeared this week in HPC in the Cloud, written by software consultant Jeff Napper along with Paolo Bientinesi and Roman Iakymchuk of RWTH Aachen University, suggests that competition for resources among multiple applications running on the same nodes can significantly slow HPC workloads. In their testing of a DGEMM (double-precision general matrix multiply) code on a single cloud node, they found that typical run times were much slower than on a dedicated machine:
The fastest execution time of the DGEMM over the 6 hours… is similar to that on a typical HPC cluster node. However, the average execution time on our cloud node is more than 8 times worse with a standard deviation of 33%. The hardware is good, as shown by the best execution time, but the competition among tenants results in diminished average performance with a wide range of possible outcomes. Thus, the expected performance of a simple in-memory matrix-matrix multiply on a multitenant cloud node is not good and fluctuates significantly. Without even using the network, the cloud nodes still cannot be expected to perform as a typical HPC cluster due to the competition from other tenants.
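To make the quoted figures concrete, here is a minimal Python sketch of how such numbers could be gathered: time the same matrix multiply repeatedly, then compare the best run with the average. This is not the authors' benchmark. The pure-Python `dgemm`, the matrix size, and the repeat count are illustrative assumptions, as is reading the "8 times worse" figure as mean over best and the standard deviation as a percentage of the mean.

```python
import statistics
import time


def dgemm(a, b):
    """Naive dense matrix multiply C = A * B on lists of rows (illustrative only)."""
    cols_b = list(zip(*b))  # columns of B, so each entry of C is a dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def time_runs(n=60, repeats=20):
    """Time `repeats` multiplications of two n-by-n matrices; returns seconds."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        dgemm(a, b)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Best time, mean time, mean/best slowdown, and std dev as a % of the mean."""
    best = min(samples)
    mean = statistics.mean(samples)
    return {
        "best": best,
        "mean": mean,
        "slowdown": mean / best,  # assumption: "times worse" relative to the best run
        "stdev_pct": 100.0 * statistics.pstdev(samples) / mean,
    }


if __name__ == "__main__":
    print(summarize(time_runs()))
```

On a quiet dedicated machine the slowdown should stay close to 1; a large slowdown paired with a wide spread is the pattern the researchers attribute to competing tenants.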
What they discovered was that using fewer of the node's cores improved performance. On an 8-core node, using just 2 cores in this particular cloud yielded the lowest (fastest) average execution time. Even when they ran the code across multiple nodes (thus adding the network variable back in), using fewer than the full complement of cores produced faster results.
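A rough Python sketch of that kind of sweep: run the workload at several core counts, average the timings for each, and keep the count with the lowest mean. The `run_once(cores)` callable is hypothetical and must be supplied by the caller; how it limits the workload to a given number of cores is outside the sketch, and the article does not say how the researchers did it.

```python
import statistics


def best_core_count(run_once, core_counts, repeats=10):
    """Return (cores, mean_seconds) for the core count with the lowest mean time.

    `run_once(cores)` is a hypothetical caller-supplied function that runs the
    workload on `cores` cores and returns the elapsed time in seconds.
    """
    means = {}
    for cores in core_counts:
        timings = [run_once(cores) for _ in range(repeats)]
        means[cores] = statistics.mean(timings)
    winner = min(means, key=means.get)  # lowest average execution time wins
    return winner, means[winner]
```

With an OpenMP-threaded BLAS, one common way to implement `run_once` would be to launch the benchmark with the `OMP_NUM_THREADS` environment variable set to the desired count. Because timings on a shared node fluctuate, the sweep compares averages over repeated runs, not single measurements.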
A good read for both prospective HPC cloud users and cloud providers.