A relatively new architecture explicitly designed for parallelism – Swarm – based on work at MIT has shown promise for substantially speeding up classes of applications (graphs, for example) and decreasing the programming burden to achieve parallelism. The work, recounted in a recent paper, Unlocking Ordered Parallelism with the Swarm Architecture, bucks conventional wisdom and adopts the ‘ordered’ program execution model.
“Multicore systems are really hard to program. You have to explicitly divide the work that you’re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier,” said project leader Daniel Sanchez in an account of the work on MIT’s website.
In the paper[i], the authors wrote, “First, conventional wisdom says that order constraints limit parallelism. However, we have shown that it is possible to maintain a large speculation window efficiently, so that only true data dependencies limit parallelism. Second, conventional wisdom says that speculation is wasteful, and designers should instead build nonspeculative parallel systems. However, we have shown that, for a broad class of applications, speculation unlocks abundant parallelism for moderate costs.”
Using the SWARM architecture on a 64-core device (see above) to run a set of graph analytics, simulation, and database benchmarks, the authors report SWARM outperforms sequential implementations of these algorithms by 43 to 117 times and state-of-the-art software-only parallel algorithms by 3 to 18 times. SWARM exploits ordered parallelism with these four novel techniques:
- Execution model based on tasks with programmer-specified time stamps that conveys order constraints to hardware.
- Hardware task-management scheme that features speculative task creation and dispatch, drastically reducing task management overheads, and implements a large speculation window.
- Scalable conflict-detection scheme that leverages eager versioning to, upon mispeculation, selectively abort the mispeculated task and its dependents.
- Distributed commit protocol that allows ordered commits without serialization, supporting multiple commits per cycle with modest communication.
Applications with ordered parallelism have three key features, note the authors. First, they comprise tasks that must execute in some total or partial order. Second, tasks are not known in advance. Instead, tasks dynamically create children tasks and schedule them to run at a future time, resulting in different task creation and execution orders. Third, tasks can have data dependencies that are not known a priori.
The approach relies on speculative execution, which works because most of the order constraints are superfluous. Except for the earliest active task, Swarm assumes there are no earlier data-dependent tasks, and it executes the task speculatively. When Swarm detects a dependence has been violated, the offending task is aborted.
Order is enforced with task units that assign each dispatched task a unique virtual identifier comprised of the task’s time stamp and a unique tiebreaker. “To uncover enough parallelism, task units can dispatch any available task to cores, no matter how distant in program order. Tasks also can run even if their parent is still speculative,” note the authors.
“Swarm guarantees that tasks appear to execute in time-stamp order,” they further describe. “If multiple tasks have the same time stamp, Swarm chooses an order among them. Using time stamps decouples task creation and execution orders: software conveys new work to hardware as soon as it is discovered rather than in the order it needs to run, exposing a large amount of parallelism.”
The hybrid approach (software and hardware), say the researchers, not only achieves near-linear scalability on algorithms that are often considered sequential, but also results in “Swarm programs [that] are almost as simple as their sequential counterparts, because they do not use explicit synchronization.”
Programs leverage the Swarm execution model through an API, illustrated below (figure 2).
Link to the summary paper: https://people.csail.mit.edu/mcj/papers/2016.swarm.toppicks.pdf
Sources: MIT; IEEE
[i] Unlocking Ordered Parallelism with the Swarm Architecture, Authors include Mark C. Jeffrey (MIT), Suvinay Subramanian (MIT), Cong Yan, University of Washington, Joel Emer MIT and Nvidia; Daniel Sanchez (MIT); IEEE Micro, May/June 2016.