The multicore era has been in full swing for a decade now, yet exploiting all that parallel goodness remains a prominent challenge. Ideally, compute efficiency would scale linearly with core count, but in practice it often does not. As core counts are only set to proliferate across the computing spectrum, it’s an issue that merits serious attention.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory are exploring ways to improve the scalability of priority queues, which are essential for applications such as task scheduling and discrete event simulation. Concurrent priority queues scale up to about eight cores, but above single digits, efficiency drops off as multiple threads contend for elements at the head of the queue. By modifying the priority queue so that threads are directed to elements some distance away from the head, the MIT team has demonstrated improved efficiency on up to 80 cores.
With traditional versions of a priority queue, bottlenecks arise as multiple threads attempt to access the front of the queue at the same time. The SprayList algorithm created by the MIT team relieves this pressure by spreading threads’ accesses randomly across elements near the head. The algorithm, based on the skip list, enables processors with many cores to work in unison.
“Roughly, at each SkipList level, a thread flips a random coin to decide how many nodes to skip ahead at that level,” they write in their paper. “In essence, we use local randomness and the random structure of the SkipList to balance accesses to the head of the list. The lengths of jumps at each level are chosen such that the probabilities of hitting nodes among the first O(p log³ p) are close to uniform.”
Figure 1 shows the intuition behind the sprays.
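The coin-flipping walk the authors describe can be sketched roughly as follows. This is an illustrative Python model of a single “spray” over an idealized skip list, represented as a sorted array in which one step at level ℓ spans 2^ℓ base elements; the starting height and per-level jump bounds used here are stand-in parameters chosen for readability, not the tuned constants from the paper.

```python
import math
import random

def spray(n, p, rng):
    """One illustrative 'spray' walk over an idealized skip list of n
    elements, modeled as a sorted array where a step at level l spans
    2**l base elements. Height and jump bounds are stand-ins, not the
    paper's exact constants."""
    height = max(1, int(math.log2(p)))   # start O(log p) levels above the bottom
    max_jump = height                    # per-level jump bound, also O(log p)
    idx = 0
    for level in range(height, -1, -1):
        # flip coins: skip a random number of nodes ahead at this level
        idx += rng.randint(0, max_jump) * (2 ** level)
    return min(idx, n - 1)               # the element this thread tries to remove

# p threads each performing one spray land spread out near the head,
# rather than all colliding on slot 0.
landings = [spray(10_000, 64, random.Random(seed)) for seed in range(64)]
```

With p = 64 the walk starts six levels up, so every landing falls within the first few hundred slots of a 10,000-element list, scattered across many distinct positions. That is the trade the article describes: contention at the exact head is exchanged for a slightly relaxed priority order.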
Justin Kopinsky, an MIT graduate student in electrical engineering and computer science and one of the paper’s co-authors, explains that the approach addresses what can be an order-of-magnitude slowdown or more arising from collisions and logjams associated with the standard priority queue.
The researchers tested the SprayList algorithm on a Fujitsu RX600 S6 server with four 10-core Intel Xeon E7-4870 (Westmere EX) processors, supporting a total of 80 hardware threads (approximating an 80-core setup). Running several benchmarks, which are detailed in their paper, they found that the algorithm achieved a drastic reduction in contention. The researchers note that while the randomized approach does more work per operation than a conventional priority queue, the gain in scalability outweighed that extra work for reasonably parallel workloads. Furthermore, they state that the algorithm’s relaxation parameters can be tuned to further increase performance depending on the workload and specific core count.
The researchers will present their findings this month at the Association for Computing Machinery’s Symposium on Principles and Practice of Parallel Programming.