Object database maker Versant has done pretty well in its market niche, with a list of 1,500 customers that includes well-known names like AT&T, Alcatel-Lucent, Ericsson, and British Airways. With 80 people, a market cap of $65 million USD, and revenue last year of $25 million, Versant is a small company by almost any measure. But it is in a small industry: while the relational database business is valued around $10 billion a year these days, the object database market is on the order of only a couple hundred million dollars a year.
A niche product for a niche market, Versant’s core technology isn’t needed everywhere, but it is indispensable where it is needed. And the company is hoping to demonstrate that at least some HPC users need it.
Alright, first things first: what’s an object database? Object databases provide persistent storage for, well, objects. Imagine you have a backpack object, and that backpack has a flashlight object and a rope object in it. When you retrieve the backpack object out of the database you get the flashlight and the rope along with it, no extra queries required (well, actually you probably get pointers to those objects, but that’s a detail).
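To make the backpack example concrete, here is a minimal sketch in Python. It is not Versant's API: the `ToyObjectStore` class and its `put` and `get` methods are hypothetical stand-ins, and the "database" is just an in-memory dictionary. The point is only that a single fetch hands back the backpack with its flashlight and rope already attached.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str


@dataclass
class Backpack:
    owner: str
    tools: list = field(default_factory=list)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object database."""

    def __init__(self):
        self._objects = {}

    def put(self, key, obj):
        # Store a deep copy so the saved graph is independent of the caller's objects.
        self._objects[key] = copy.deepcopy(obj)

    def get(self, key):
        # One lookup returns the object together with everything it references.
        return copy.deepcopy(self._objects[key])


store = ToyObjectStore()
store.put("pack-1", Backpack("alice", [Tool("flashlight"), Tool("rope")]))

restored = store.get("pack-1")
tool_names = [tool.name for tool in restored.tools]
print(tool_names)
```

A real object database would persist the objects to disk and, as noted above, would typically hand back references that are resolved on demand rather than eager copies.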
With a relational database, data is stored in rows and columns in (probably many) tables in your database. In our backpack example all backpacks may be listed in a specific table, with each given a unique ID. Another table may store the various camp tools, like ropes and flashlights, that campers may put in backpacks. And yet a third table would put these together, with one row holding the ID for our backpack and the flashlight, and another row holding the ID for our backpack again and the rope.
The mechanics of retrieval offer an important distinction with the relational model: unlike a relational database, where programmers have to structure a database request (query) in a separate language called SQL, an object database works in the context of a regular programming language such as C++, C or Java. So, for the object database, a programmer calls the backpack object into memory and it comes along with (pointers to) the flashlight and rope objects. But a programmer using a relational database would construct a SQL query that first pulls the records from the third table that are associated with our backpack’s ID, and would then have to construct further queries against the camp tool table to find out what kinds of tools are attached to those IDs.
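The relational side of the same example can be sketched with Python's built-in `sqlite3` module. The table and column names here (`backpacks`, `tools`, `backpack_tools`) are assumptions chosen for illustration, not anything from Versant or a particular product. The sketch follows the two-step lookup described above, then shows the single JOIN query that a SQL programmer could write instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE backpacks (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE tools (id INTEGER PRIMARY KEY, name TEXT);
    -- The "third table": one row per (backpack, tool) pairing.
    CREATE TABLE backpack_tools (backpack_id INTEGER, tool_id INTEGER);

    INSERT INTO backpacks VALUES (1, 'alice');
    INSERT INTO tools VALUES (10, 'flashlight'), (11, 'rope'), (12, 'compass');
    INSERT INTO backpack_tools VALUES (1, 10), (1, 11);
""")

backpack_id = 1

# Step 1: find the tool IDs linked to our backpack.
tool_ids = [
    row[0]
    for row in conn.execute(
        "SELECT tool_id FROM backpack_tools WHERE backpack_id = ? ORDER BY tool_id",
        (backpack_id,),
    )
]

# Step 2: look up each tool by ID to find out what it is.
tool_names = [
    conn.execute("SELECT name FROM tools WHERE id = ?", (tool_id,)).fetchone()[0]
    for tool_id in tool_ids
]

# The same answer in one query, using a JOIN across the link table.
joined_names = [
    row[0]
    for row in conn.execute(
        "SELECT t.name FROM tools AS t "
        "JOIN backpack_tools AS bt ON bt.tool_id = t.id "
        "WHERE bt.backpack_id = ? ORDER BY t.id",
        (backpack_id,),
    )
]

print(tool_names, joined_names)
conn.close()
```

Either way, the programmer reassembles the backpack's contents from rows rather than receiving a ready-made object, which is the mapping work the object database avoids.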
Despite the apparent added headaches of working with SQL and a relational database, they can be very (very) fast in a wide variety of applications, and have been proven to scale to enormous sizes. They are ubiquitous in nearly every enterprise, and you probably have a bunch in your own HPC center for managing inventory, user tickets, and so on. On the other hand, there are well-documented situations in which object databases are not only easier for a developer to deal with, they are much faster than the alternatives.
“Complexity and concurrency are the two things that we look for in application profiles that would lend themselves well to an object-oriented database,” says David Ingersoll, Versant’s VP of sales (Americas and APAC).
Of course, in traditional high performance technical computing, the choice isn’t between relational and object databases. The choice is between using any kind of database at all and using flat files. And Ingersoll acknowledges this is a key obstacle they face in talking with clients: “One challenge is just to get people to realize that they need a database and not just a filesystem.”
But he isn’t coming to HPC empty-handed. When he briefed us about Versant’s potential in the HPC space, Ingersoll pointed to traditional HPC users who are already running Versant’s object databases in HPC applications, particularly applications with large streaming data. For example, the Air Force Weather Agency uses a Versant database to store real-time satellite imagery that is then fed into computational models for cloud forecasts. Other similar applications include the European Space Agency’s Herschel Space Observatory, where Versant is the mission database, and Verizon, where real-time call data are streamed into a hierarchical set of databases that are used for near real-time fraud detection.
Exxon Mobil is also using Versant’s technology in its reservoir simulation system, EMPower. In its application, results from large-scale numerical simulations are stored in the database and then subjected to analytics routines that answer questions about where to place wells, when and where to inject fluids, and so forth.
In many of their HPC examples, Versant’s users are storing the data in a large database that itself may be hosted on a cluster. Hundreds or thousands of clients then access the database from the compute nodes of other clusters to process the data and answer mission questions. This is a basic level of parallelism supported by Versant, which also offers multi-threaded and parallel queries baked into the database engine along with a dual cache and object-locking for high concurrency support.
Object databases themselves aren’t new: work started on them in the 1980s and spiked in the early 1990s when all the cool kids were drinking the O-O Kool-Aid. As object-oriented languages have become mainstream (including C++, Java, and C#), programmers have struggled with mapping their languages to relational databases because they wanted to work with what they knew: familiar languages and familiar (relational) databases.
And this points to a key challenge in positioning an object database technology for HPC: if you aren’t using an object-oriented language, you aren’t going to see much benefit. “C and FORTRAN don’t lend themselves well [to object databases] because the domain models are very flat, very procedurally oriented, and they’re not going to have a lot of inter-relationships,” says Ingersoll. “At that point, the benefit of our system really falls down.”
Versant is targeting markets and applications where C++ and Java are already in use for intensive computing, or where the practitioners don’t have a vast store of legacy code in their toolboxes already. Areas like bioinformatics offer a lot of potential, not only because of the very modern nature of many of those codes, but also because the domain data model is inherently object-oriented. According to Ingersoll, “We are at that point where people are just coming [into HPC in these domains], so if we can get in front of that wave then that’s a benefit for us.”
Versant is looking to build partnerships as it tries to wriggle into the HPC market. Ingersoll let us know that they are talking to both Penguin Computing and Panasas about working more closely together. The Panasas opportunity seems particularly apropos given the object-based nature of Panasas’ PanFS file system. In fact, according to Ingersoll, Versant is already being used in the financial services industry on a cluster outfitted with Panasas storage.
Versant doesn’t have the object database market to itself, of course. It competes with specialists like Objectivity and InterSystems, and with Microsoft and Oracle, both of which have a growing interest in the technology. Object databases are an interesting technology, and in twenty years of development Versant has built a robust solution. But getting databases into HPC, even into the developing segments of our community, will be a tall order. Differentiating object databases from relational databases for HPC people layers another challenge on top of that.
This is a challenge that Ingersoll feels Versant is equal to. “We are getting the market to understand that difference,” he says. “If people are investigating what steps to take today, we have a much better shot at educating them than if they are going to be moving that application from C to C++, and you’re really going to be thoughtful about how you’re modeling the application, then we provide orders of magnitude of performance benefits.”