Science On the Outskirts of In-Memory
In-memory computing has been an active topic of discussion in commercial “big data” circles, but a recent use case addressing complex computational chemistry demands highlights its potential in scientific computing.
Today, GridGain Systems detailed how Portland State University used the company’s In-Memory HPC offering in an effort to build an adaptive learning system that will be able to detect diseases, offer medical diagnoses, and make therapeutic recommendations based on biocompatible computers that can be embedded into living cells.
To highlight how the technology fits into the broader spectrum of high performance computing applications, we talked to the company’s VP of Product Management, Jon Webster. In addition to covering how this approach to in-memory computing differs from what is being touted in the commercial sector, we hit on how it compares to other modes of handling massive datasets for scientific simulations, and how the ROI case holds up for users who could alternatively just throw more hardware at their problem.
GridGain’s approach to in-memory for HPC applications is somewhat different from what some of the other “in-memory” companies are producing. In fact, the company terms this offering a “real-time, high performance distributed computation framework” which, for those who pay attention to another important movement from the more commercial side of the performance house, sounds a lot like Hadoop/MapReduce.
The two are different, says GridGain, noting that “if Hadoop MapReduce tasks take input from disk, produce intermediate results on disk and then output that result onto disk, GridGain does everything Hadoop does in memory—so in other words, it takes input from memory via direct API calls, produces intermediate results in memory and then creates results in-memory as well.”
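To make the contrast concrete, here is a minimal sketch of a MapReduce-style job whose input, intermediate results, and output all live in memory. It is plain Python and does not use GridGain's API; the function `map_reduce_in_memory`, the mapper `count_mapper`, and the sample data are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_reduce_in_memory(records, mapper, reducer):
    """Run map, shuffle, and reduce entirely on in-memory Python objects."""
    # Map and shuffle: intermediate (key, value) pairs are grouped in a dict,
    # never written to disk.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)
    # Reduce: the final result is also returned as an in-memory dict.
    return {key: reduce(reducer, values) for key, values in intermediate.items()}


# Hypothetical input: a small list of molecular formulas to tally.
molecules = ["H2O", "CO2", "H2O", "CH4"]


def count_mapper(formula):
    # Emit one (formula, 1) pair per record.
    yield formula, 1


counts = map_reduce_in_memory(molecules, count_mapper, lambda a, b: a + b)
print(counts)  # {'H2O': 2, 'CO2': 1, 'CH4': 1}
```

A disk-based job would serialize the input, the grouped pairs, and the result at each stage; here every stage hands the next one a reference to data already in memory.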
In other words, it’s not just about storage, which is what so many vendors on the mainstream side mean when they use the term in-memory, says Webster. He argues that current approaches aren’t enough to handle complex workloads, at least in HPC. What those workloads require, and what GridGain offers, is a purpose-built compute engine embedded across a large partitioned in-memory database (another component of the company’s offering), so that all of the data sits in memory. In essence, instead of grabbing data, processing it and putting it back, users can process data where it resides, keeping data movement to a minimum.
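The locality idea can be sketched as a toy partitioned store in which the computation is sent to each partition and only small partial results travel back. This is a single-process illustration under stated assumptions, not GridGain's implementation: the class `PartitionedStore`, its methods, and the byte-sum partitioning scheme are all hypothetical.

```python
class PartitionedStore:
    """Toy partitioned in-memory key-value store (hypothetical, single process)."""

    def __init__(self, num_partitions):
        # Each dict stands in for the memory of one node.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Deterministic key-to-partition mapping (an assumption for illustration).
        return sum(str(key).encode()) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def run_local(self, task):
        """Apply the task to each partition in place; only its result is returned."""
        return [task(partition) for partition in self.partitions]


store = PartitionedStore(num_partitions=4)
for i in range(1000):
    store.put(f"sample-{i}", float(i))

# Each partition sums its own values; the caller combines four partial sums
# instead of pulling all 1000 entries across to one place.
partials = store.run_local(lambda part: sum(part.values()))
total = sum(partials)
print(total)  # 499500.0
```

In a real cluster the partitions would sit on separate machines, so shipping a small function and receiving one number per partition replaces moving the full dataset over the network.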
GridGain’s framework seeks to support different execution modes, including MapReduce processing, while also offering support for other common HPC-oriented models (MPP, MPI and RPC) to help broaden the company’s base of HPC customers.
The company has served a number of other HPC-oriented users with this approach, including financial firms that use it for fast trade matching and risk management. Other industries that tend to fall outside the purview of what has always been considered traditional HPC, like online gaming and real-time ad targeting, represent a “filtering down” of technologies that have been proven at massive scale.
As you’ll hear in the interview above, this approach allowed Portland State’s researchers to do something that wouldn’t have been possible with their existing infrastructure. Webster also details what in-memory means for other HPC applications and how GridGain’s approach to data-intensive workloads might evolve to meet other hardware and software architectures and frameworks.