Solid-state drives (SSDs) for servers have been around for a few years now, but there hasn’t been a big rush to re-outfit datacenters, HPC or otherwise, with this new technology. This week’s announcement by the San Diego Supercomputer Center (SDSC) of its new SSD-equipped Appro cluster illustrates the point. In fact, SDSC is touting its supercomputer, called “Dash,” as the first large HPC deployment of SSD technology.
The 5 teraflop Dash machine (it’s fast, get it?) is mostly just a typical, late-model HPC cluster: Nehalem processor blades hooked up with InfiniBand. Getting all the attention is the flash memory, which is of the Intel SATA SSD variety. Sprinkled on top is ScaleMP’s vSMP (virtual SMP) software, which aggregates regular memory and flash memory into virtual “supernodes.” Each supernode is made up of 16 physical nodes and encompasses 768 GB of DRAM and 1 TB of NAND. The whole idea is to support super-sized data mining, in which the big DRAM pool houses large datasets and the NAND flash turbo-charges file I/O and memory swapping. Applications include drug discovery and asteroid-hunting, among others.
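For a sense of scale, the per-blade split implied by those supernode figures works out as below. This is just back-of-the-envelope arithmetic from the numbers quoted above (the binary-TB assumption for the 1 TB of NAND is mine, not SDSC’s):

```python
# Per-node arithmetic for one Dash "supernode": 16 physical nodes,
# 768 GB DRAM and 1 TB NAND in aggregate (figures from the article).
NODES_PER_SUPERNODE = 16
DRAM_GB_TOTAL = 768
NAND_GB_TOTAL = 1024  # assuming a binary TB (1024 GB)

dram_per_node = DRAM_GB_TOTAL / NODES_PER_SUPERNODE  # 48 GB per blade
nand_per_node = NAND_GB_TOTAL / NODES_PER_SUPERNODE  # 64 GB per blade

print(f"DRAM per node: {dram_per_node:.0f} GB")
print(f"NAND per node: {nand_per_node:.0f} GB")
```

In other words, each blade contributes a fairly ordinary 48 GB of DRAM; it’s vSMP’s aggregation that makes the pool look like one big shared-memory machine.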
From the SDSC press release:
“Dash can do random data accesses one order-of-magnitude faster than other machines,” said Allan Snavely, associate director at SDSC. “This means it can solve data-mining problems that are looking for the proverbial ‘needle in the haystack’ more than 10 times faster than could be done on even much larger supercomputers that still rely on older ‘spinning disk’ technology.”
The fact that this is the first large-scale deployment of NAND flash technology in HPC, or maybe in any large-scale cluster, points to the immaturity of the enterprise SSD market. While consumer flash has been mainstream for years in handheld devices and USB thumb drives, server-based flash memory is still in the hype cycle, though probably toward the end of it. There are plenty of server-based flash products and an array of vendors — Intel, STEC, Micron, Texas Memory Systems, Fusion-io, SandForce, and Violin Memory, to name a few — with more on the way. But with no history of product reliability, it’s been a battle to overcome the conservative culture of IT managers and the 50-year inertia of hard disk storage.
A recession that has flattened IT budgets probably hasn’t helped either. Even in a bad economy, though, flash memory has plenty to recommend it. Compared to spinning disks, it’s denser, more energy-efficient, and, measured in cost per IOPS, much less expensive. Although there have been concerns about longevity, the current crop of NAND server devices is being advertised with a lifetime of five-plus years, which is more than enough for most datacenter applications.
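The cost-per-IOPS point is worth making concrete. The figures below are round-number illustrations of my own, not vendor specs or numbers from the article, but they show why the comparison flips in flash’s favor even when the SSD costs several times more per unit:

```python
# Illustrative cost-per-IOPS comparison. The prices and IOPS figures
# are hypothetical round numbers, chosen only to show the shape of
# the argument, not actual 2009 product specs.
def cost_per_iops(price_usd, iops):
    return price_usd / iops

hdd = cost_per_iops(price_usd=100, iops=150)     # a 15K RPM disk, roughly
ssd = cost_per_iops(price_usd=400, iops=10_000)  # an SLC SATA SSD, roughly

print(f"HDD: ${hdd:.3f}/IOPS, SSD: ${ssd:.3f}/IOPS")
```

At four times the unit price, the hypothetical SSD still comes out more than an order of magnitude cheaper per random operation, which is exactly the metric that matters for data-mining workloads like Dash’s.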
One confounding factor is the array of architectural and technology choices. Flash SSDs that plug into existing disk slots are the most commonly offered. They’re also the least disruptive, inasmuch as they act like hard disks, with, of course, the advantage of greater speed. Less common are PCI-based flash products, offered by Fusion-io and a number of other companies. In this case, the flash devices are hooked up directly to the PCI bus, which avoids the overhead of the disk controller and yields higher IOPS with lower latencies.
NAND memory for servers is usually of the SLC (Single-Level Cell) variety, since the less expensive, consumer-grade MLC (Multi-Level Cell) NAND degrades too rapidly for enterprise duty. However, some server flash vendors use a mixture of SLC and MLC, along with intelligent data placement and wear-leveling technology, to take best advantage of both NAND technologies.
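Wear leveling is conceptually simple: since each flash block survives only a limited number of erase cycles, the controller spreads writes so no block wears out prematurely. Here is a minimal sketch of the idea — my own toy illustration, not any vendor’s actual algorithm — that always directs the next write to the least-worn block:

```python
import heapq

# A toy wear-leveling sketch (hypothetical, not a real controller's
# algorithm): route each write to the block with the lowest erase
# count, keeping wear roughly uniform across the device.
class WearLeveler:
    def __init__(self, n_blocks):
        # Min-heap of (erase_count, block_id), so the least-worn
        # block is always at the top.
        self.heap = [(0, b) for b in range(n_blocks)]
        heapq.heapify(self.heap)

    def write(self):
        erases, block = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (erases + 1, block))
        return block

wl = WearLeveler(n_blocks=4)
blocks = [wl.write() for _ in range(8)]
# After 8 writes across 4 blocks, each block has been erased exactly twice.
```

Real controllers layer much more on top — distinguishing hot from cold data, migrating static data off lightly-worn blocks, and (in SLC/MLC hybrids) steering write-heavy data to the more durable SLC region — but the balancing principle is the same.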
Sun Microsystems is planning to introduce its own flash DIMMs and use them in its upcoming 4 TB Flash Array 5100. Nothing has been announced yet, but Andy Bechtolsheim (who still seems to be keeping his Sun hobby alive, despite his Arista Networks gig) has talked up the 5100 product in at least a couple of venues, including the MySQL Conference this past spring.
I’ve embedded Andy’s 18-minute conference presentation below. He just makes a brief mention of the 5100, but the rest of the talk is a nice explanation of where SSD technology sits today in the server space.