The Pittsburgh Supercomputing Center (PSC), home to a number of data-intensive systems and projects, is testing new ropes to scale the memory wall.
More specifically, they’re interested in validating and testing technologies that offer possible improvements in I/O and memory. While they’ve experimented already with a number of technologies on both the hardware and software fronts, the center’s Scientific Director, Michael Levine, told us recently that approaches which pool together memory resources are of particular interest.
PSC has a number of research projects that require more directly addressable memory than they can get from their current multi-socket, large-memory servers. According to Levine, “having large amounts of data in directly-addressable memory avoids very time-consuming disk input/output and allows a much more productive computing paradigm.”
The center already has a stable of systems that attack the problems of “big data” and memory access in different ways. PSC houses the shared memory SGI Blacklight system (4096 cores and 32 TB shared memory) as well as Sherlock, the XMT-based graph analytics appliance from Cray (specifically, its YarcData division). There are also a number of smaller specialty systems, including Salk, an SGI Altix SMP machine, and an innovative data management system called the Data Supercell.
PSC recently announced that it would be putting Norwegian company Numascale to the test. Specifically, PSC is interested in the company’s NumaConnect interconnect technology, which would allow them to build a cache-coherent shared memory system using the company’s hardware-based approach.
Levine says that the experimentation, which has not yet been defined in great detail, will at the very least allow them to compare Numascale’s approach against the different routes SGI and Cray have taken to the same goal. PSC has worked hand-in-hand with both companies during the implementation of other data-focused systems.
SGI takes a different route to opening memory than NumaConnect, but Levine says that following application-relevant testing they’ll be able to report comparisons (and hopefully shed some light on price/performance).
In essence, Numascale’s technology lets users turn a cluster into a shared memory machine by plugging into HyperTransport via the PCI bus using HTX. Numascale is focused on AMD systems for now; the company has yet to sync up with Intel, which limits the choices PSC has. Levine says that one of the attractive elements of their work with SGI is that a great deal of customization is possible.
According to Einar Rustad, CTO and co-founder of Numascale, “The huge and scalable memory capacity in systems with NumaConnect allows users to operate in the familiar programming and runtime environment they are used to…” He further notes that part of the strength of their approach is that it “eliminates the need for explicit message passing and reduces the overall time to solution.”
The key to Numascale’s attractiveness for a deeply memory-focused center like PSC is the ability to tap the company’s interconnect to allow programs to access any memory location or memory-mapped I/O device across the entirety of a system.
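To make the programming-model difference concrete, here is a minimal sketch in Python. It is not Numascale code and says nothing about NumaConnect’s performance; it runs on a single machine and only contrasts the two styles Rustad describes. The function names `sum_shared` and `sum_message_passing` are hypothetical, invented for this illustration.

```python
import queue
import threading


def sum_shared(data, n_workers=4):
    """Shared-memory style: every worker reads the same list in place."""
    partial = [0] * n_workers

    def work(i):
        # Worker i indexes straight into the shared data; nothing is sent.
        partial[i] = sum(data[j] for j in range(i, len(data), n_workers))

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def sum_message_passing(data, n_workers=4):
    """Message-passing style: each worker is explicitly sent its own chunk."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()

    def work(i):
        chunk = inboxes[i].get()   # explicit receive
        results.put(sum(chunk))    # explicit send of the partial result

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_workers):
        inboxes[i].put(data[i::n_workers])  # explicit send of a copied slice
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))
```

Both functions return the same total for the same input. The second needs extra code to distribute the data and collect the results, which is the kind of bookkeeping a single shared address space is meant to remove.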
While NUMA technology is not new by any means, Numascale is focused on using its interconnect as a low-latency clustering interconnect and, according to the company, at a much lower price point than others. We’ve been unable to scare up information on the price range.
During our chat with Levine, he said they have also experimented with other technologies that address memory problems outside of hardware. For instance, he pointed to a similar experiment with ScaleMP’s approach. He didn’t offer details about how it stacked up, but he did say it raised some important questions about solving memory problems in software versus hardware, while noting that all assessments are application-dependent.