Managing Memory on Multicore Chips
Moving data in and out of computer memory is a time- and energy-consuming process, so caches evolved as a form of local memory for frequently used data. With the advance of multicore and manycore processors, managing caches has become more difficult. Researchers at MIT suggest that it might make sense to let software, rather than hardware, manage these high-speed on-chip memory banks, as this article at MIT News explains.
Daniel Sanchez, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science, is one of the main proponents of this software-based approach. At last week’s International Conference on Parallel Architectures and Compilation Techniques, Sanchez and his student Nathan Beckmann presented Jigsaw, a cache organization scheme based on a paper they co-authored. Jigsaw both provides isolation and reduces access latency in shared caches.
Jigsaw operates on the last-level cache. In multicore chips, each core has its own small cache, but the last-level cache is shared by all the cores. Shared caches face two fundamental limitations: access latency and interference between the cores sharing them. Prior research has shown that techniques improving one tend to worsen the other. “NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency,” the authors write.
Physically, this cache consists of separate memory banks distributed across the chip so that each core can use the bank closest to it. Most chips assign data to these banks effectively at random, but Jigsaw computes an efficient assignment of data to cache banks. For example, data needed by only a single core is placed near that core, while data used by all the cores is placed near the center of the chip. Minimizing data travel is Jigsaw’s main role, but it also optimizes how cache space is divided, giving more frequently used data a larger allocation.
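The placement idea can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not Jigsaw’s actual algorithm: it assumes a hypothetical 4x4 mesh of cores with one cache bank next to each core, and picks the bank that minimizes total Manhattan distance to the cores accessing the data.

```python
# Toy sketch of distance-aware data placement, NOT Jigsaw's real algorithm.
# The 4x4 mesh, one-bank-per-core layout, and Manhattan-distance cost are
# illustrative assumptions.
from itertools import product

GRID = 4  # assume a 4x4 mesh; bank (x, y) sits next to core (x, y)

def manhattan(a, b):
    """On-chip hop count between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def place(accessing_cores):
    """Choose the bank minimizing total distance to the cores that
    access the data: private data lands next to its core, while widely
    shared data gravitates toward the center of the chip."""
    banks = list(product(range(GRID), range(GRID)))
    return min(banks, key=lambda b: sum(manhattan(b, c) for c in accessing_cores))

# Data used only by the core at (0, 0) goes in that core's local bank:
print(place([(0, 0)]))  # -> (0, 0)
# Data shared by every core ends up in a centrally located bank:
print(place(list(product(range(GRID), range(GRID)))))
```

Under this cost model a widely shared line lands in one of the four central banks, which mirrors the article’s example of globally used data sitting near the middle of the chip.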
In a series of experiments, the duo simulated the execution of hundreds of applications on 16- and 64-core chips. They found that Jigsaw improved performance by up to 2.2x (18 percent on average) over a conventional shared cache, while reducing energy use by as much as 72 percent. Jigsaw even outperforms more sophisticated NUCA and cache-partitioning schemes.
Optimizing cache space allocations can itself be a time-consuming process, but the MIT researchers developed an approximate optimization algorithm that runs efficiently even as the number of cores scales and different data types are used.
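To give a feel for what such an allocation problem looks like, here is a hedged sketch of a simple greedy allocator working from per-application miss curves, in the spirit of utility-based cache partitioning. It is not Jigsaw’s actual algorithm, and the miss-curve numbers are invented for illustration.

```python
# Hedged sketch: greedy cache-space allocation from miss curves, in the
# spirit of utility-based partitioning. This is NOT Jigsaw's algorithm,
# and the example curves below are made up.
def allocate(miss_curves, total_units):
    """miss_curves[i][k] = misses of application i given k cache units.
    Hand out one unit at a time to whichever application would see the
    largest miss reduction from it."""
    alloc = [0] * len(miss_curves)
    for _ in range(total_units):
        def gain(i):
            k = alloc[i]
            if k + 1 >= len(miss_curves[i]):
                return 0  # curve exhausted; no further benefit
            return miss_curves[i][k] - miss_curves[i][k + 1]
        best = max(range(len(miss_curves)), key=gain)
        alloc[best] += 1
    return alloc

# Two hypothetical apps: one benefits steeply from extra cache, the
# other barely at all, so the greedy pass favors the first.
curves = [
    [100, 60, 35, 20, 15, 14, 13, 12, 11],  # cache-sensitive
    [50, 48, 46, 45, 44, 44, 44, 44, 44],   # cache-insensitive
]
print(allocate(curves, 8))  # -> [6, 2]
```

A plain greedy pass like this can be misled when miss curves are not convex, which is one reason the researchers needed a more careful approximation that stays fast as core counts grow.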
Full Story at MIT News