It’s no secret that memory-to-processor bottlenecks have become a chief obstacle to boosting computer performance. Recent efforts to place DRAM onto chip packages have helped mitigate the problem, but the overhead required to retrieve and manage data has kept systems from exploiting DRAM’s full speed. This month, MIT and ETH Zurich researchers reported a novel cache-management scheme that improves the data rate of in-package DRAM caches by 33 to 50 percent.
“The bandwidth in this in-package DRAM can be five times higher than off-package DRAM,” says Xiangyao Yu, a postdoc in MIT’s Computer Science and Artificial Intelligence Laboratory and first author on the new paper. “But it turns out that previous schemes spend too much traffic accessing metadata or moving data between in- and off-package DRAM, not really accessing data, and they waste a lot of bandwidth. The performance is not the best you can get from this new technology.”
The work (Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation) was presented at last week’s IEEE/ACM International Symposium on Microarchitecture, held in Boston, and is nicely summarized in an article posted on the MIT website. A diagram from the paper appears below.
The nature of the problem is well known; it stems from the difference between DRAM and SRAM, the technology used in standard caches. For every bit of data stored, SRAM uses six transistors. As explained in the MIT account, “DRAM uses one, which means that it’s much more space-efficient. But SRAM has some built-in processing capacity, and DRAM doesn’t. If a processor wants to search an SRAM cache for a data item, it sends the tag to the cache. The SRAM circuit itself compares the tag to those of the items stored at the corresponding hash location and, if it gets a match, returns the associated data.”
DRAM, by contrast, can’t do anything but transmit requested data. So the processor requests the first tag stored at a given hash location and, if it’s a match, sends a second request for the associated data. If it’s not a match, it requests the second stored tag; if that’s not a match, the third; and so on, until it either finds the data it wants or gives up and goes to main memory.
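The difference in traffic can be sketched in a few lines of Python. This is an illustrative model, not code from the paper: `WAYS`, the request counting, and the function names are all assumptions, chosen only to show why a DRAM cache lookup costs multiple round trips where an SRAM lookup costs one.

```python
# Hypothetical model of the two lookup styles described above.
# A "cache set" is the list of (tag, data) pairs sharing one hash location.

def sram_lookup(cache_set, tag):
    """SRAM compares the stored tags itself: one request in, data (or a miss) out."""
    requests = 1
    for stored_tag, data in cache_set:
        if stored_tag == tag:
            return data, requests
    return None, requests

def dram_lookup(cache_set, tag):
    """DRAM only transmits what it is asked for: one request per tag probed,
    plus a separate request for the data on a match."""
    requests = 0
    for stored_tag, data in cache_set:
        requests += 1            # fetch the next stored tag
        if stored_tag == tag:
            requests += 1        # second request for the associated data
            return data, requests
    return None, requests        # miss: fall back to main memory

cache_set = [(7, "a"), (42, "b"), (13, "c"), (99, "d")]
print(sram_lookup(cache_set, 13))  # ('c', 1)
print(dram_lookup(cache_set, 13))  # ('c', 4): three tag probes plus one data fetch
```

In the worst case the DRAM path issues one request per way plus one for the data, which is exactly the metadata traffic the quoted numbers blame for wasted bandwidth.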
The researchers’ new data-management scheme reduces this metadata burden by piggybacking on the table the processor already maintains to map virtual addresses to physical ones.
Yu and his colleagues’ new system, dubbed Banshee, adds three bits of data to each entry in the table. One bit indicates whether the data at that virtual address can be found in the DRAM cache, and the other two indicate its location relative to any other data items with the same hash index.
“In the entry, you need to have the physical address, you need to have the virtual address, and you have some other data,” Yu says. “That’s already almost 100 bits. So three extra bits is a pretty small overhead.”
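The three-bit scheme can be made concrete with a small encoding sketch. The exact field layout here is an assumption for illustration, not taken from the paper: one presence bit, plus two bits selecting among up to four items sharing a hash index.

```python
# Illustrative encoding of the three extra bits added to each table entry.
# Bit layout (CACHED_BIT, WAY_MASK) is assumed, not from the paper.

CACHED_BIT = 0b100   # is this data currently in the in-package DRAM cache?
WAY_MASK   = 0b011   # which of up to 4 slots at its hash index holds it

def encode(cached, way):
    assert 0 <= way <= 3, "two bits can distinguish at most four locations"
    return (CACHED_BIT if cached else 0) | way

def decode(bits):
    return bool(bits & CACHED_BIT), bits & WAY_MASK

bits = encode(True, 2)
print(bin(bits))     # 0b110
print(decode(bits))  # (True, 2)
```

With this information in the entry, the processor knows before touching DRAM whether the data is cached and exactly where, so the sequential tag probing described earlier disappears.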
There’s one problem with this approach that Banshee also has to address. If one of a chip’s cores pulls a data item into the DRAM cache, the other cores won’t know about it. Sending messages to all of a chip’s cores every time any one of them updates the cache consumes a good deal of time and bandwidth. So Banshee introduces another small circuit, called a tag buffer, where any given core can record the new location of a data item it caches.
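A minimal sketch of the tag-buffer idea follows. The class name, capacity, and flush policy are all assumptions made for illustration; the point it demonstrates, consistent with the paper’s “lazy TLB coherence,” is that updates are recorded in a small shared structure and merged into the table in batches rather than broadcast to every core on every replacement.

```python
# Minimal sketch of a tag buffer: a small shared structure recording
# remappings so per-core table copies need not be updated immediately.

class TagBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}   # page -> latest (cached?, way) mapping

    def record(self, page, mapping):
        """Called when a core moves data into or out of the DRAM cache.
        Returns True when the buffer is full and should be flushed."""
        self.entries[page] = mapping
        return len(self.entries) >= self.capacity

    def lookup(self, page, stale_table):
        """Every core checks the buffer first; only on a buffer miss does it
        trust its (possibly stale) copy of the table."""
        return self.entries.get(page, stale_table.get(page))

    def flush(self, table):
        """Lazily merge buffered updates into the table, notifying all cores
        once per batch instead of once per cache replacement."""
        table.update(self.entries)
        self.entries.clear()

buf = TagBuffer(capacity=2)
table = {0x1000: (False, 0)}
buf.record(0x1000, (True, 3))
print(buf.lookup(0x1000, table))  # (True, 3): the fresh buffered entry wins
buf.flush(table)
print(table[0x1000])              # (True, 3)
```

Batching the notifications is what saves the time and bandwidth that per-update broadcasts would consume.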
Here is the conclusion from the paper:
“We propose a new DRAM cache design called Banshee. Banshee aims to maximize both in-package and off-package DRAM bandwidth efficiency and performs better than previous latency-optimized DRAM cache designs on memory-bound applications. Banshee achieves this through a software/hardware co-design approach. Specifically, Banshee uses a new, low-overhead lazy TLB coherence mechanism and a bandwidth-aware DRAM cache replacement policy to minimize the memory bandwidth overhead for 1) tracking the DRAM cache contents, and 2) performing DRAM cache replacement. Our extensive experimental results show that Banshee provides significant performance and bandwidth efficiency improvements over three state-of-the-art DRAM cache schemes.”
Link to the MIT article: http://news.mit.edu/2017/new-high-capacity-data-caches-more-efficient-1023