September 29, 2009
Object database maker Versant has done pretty well in its market niche, with a list of 1,500 customers that includes well-known names like AT&T, Alcatel-Lucent, Ericsson, and British Airways. With 80 people, a market cap of $65 million USD, and revenue last year of $25 million, Versant is a small company by most any measure. But it is in a small industry: while the relational database business is valued around $10 billion a year these days, the object database market is on the order of only a couple hundred million dollars a year.
A niche product for a niche market, Versant's core technology isn't needed everywhere, but it is indispensable where it is needed. And the company is hoping to demonstrate that at least some HPC users need it.
Alright, first things first: what's an object database? Object databases provide persistent storage for, well, objects. Imagine you have a backpack object, and that backpack has a flashlight object and a rope object in it. When you retrieve the backpack object out of the database you get the flashlight and the rope along with it, no extra queries required (well, actually you probably get pointers to those objects, but that's a detail).
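To make the backpack example concrete, here is a minimal sketch in Python. It is not Versant's API: `ToyObjectStore`, `Backpack`, and `Tool` are hypothetical names, and the "database" is just an in-memory dictionary. It illustrates only that fetching the backpack yields references to the objects it contains, with no further lookups.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Tool:
    name: str


@dataclass
class Backpack:
    owner: str
    tools: list = field(default_factory=list)  # references to Tool objects


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object database."""

    def __init__(self):
        self._objects = {}
        self._next_id = count(1)

    def put(self, obj):
        # Store the object itself and hand back an object ID.
        oid = next(self._next_id)
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        # Return the stored object; anything it references comes along.
        return self._objects[oid]


store = ToyObjectStore()
flashlight, rope = Tool("flashlight"), Tool("rope")
pack_id = store.put(Backpack("camper-1", [flashlight, rope]))

pack = store.get(pack_id)
tool_names = [tool.name for tool in pack.tools]  # plain navigation, no query
print(tool_names)
```

A real object database would add persistence, transactions, and caching on top of this idea; the sketch leaves all of that out.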
With a relational database, data is stored in rows and columns in (probably many) tables in your database. In our backpack example all backpacks may be listed in a specific table, with each given a unique ID. Another table may store the various camp tools, like ropes and flashlights, that campers may put in backpacks. And yet a third table would put these together, with one row holding the ID for our backpack and the flashlight, and another row holding the ID for our backpack again and the rope.
The mechanics of retrieval mark an important distinction from the relational model: unlike a relational database, where programmers have to structure a database request (query) in a separate language called SQL, an object database works in the context of a regular programming language such as C++, C or Java. So, for the object database, a programmer calls the backpack object into memory and it comes along with (pointers to) the flashlight and rope objects. But a programmer using a relational database would construct a SQL query that first pulled all of the records from the third table to find all the entries that are associated with our backpack's ID. Then he'd have to construct other queries to look in the camp tool tables to find out what kinds of tools were attached to those IDs.
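The relational side of the backpack example can be sketched with Python's built-in `sqlite3` module. The table and column names (`backpacks`, `tools`, `backpack_tools`) are assumptions made for illustration. `tools_two_step` follows the sequence described above: read the linking table, then look up each tool. `tools_joined` shows that SQL can also fold those steps into a single JOIN, which is how such a lookup is commonly written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE backpacks (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE tools (id INTEGER PRIMARY KEY, name TEXT);
    -- The "third table": one row per (backpack, tool) pairing.
    CREATE TABLE backpack_tools (
        backpack_id INTEGER REFERENCES backpacks(id),
        tool_id INTEGER REFERENCES tools(id)
    );
    INSERT INTO backpacks VALUES (1, 'camper-1');
    INSERT INTO tools VALUES (10, 'flashlight'), (11, 'rope');
    INSERT INTO backpack_tools VALUES (1, 10), (1, 11);
""")


def tools_two_step(conn, backpack_id):
    """Query the linking table first, then look up each tool by ID."""
    link_rows = conn.execute(
        "SELECT tool_id FROM backpack_tools "
        "WHERE backpack_id = ? ORDER BY tool_id",
        (backpack_id,),
    ).fetchall()
    names = []
    for (tool_id,) in link_rows:
        (name,) = conn.execute(
            "SELECT name FROM tools WHERE id = ?", (tool_id,)
        ).fetchone()
        names.append(name)
    return names


def tools_joined(conn, backpack_id):
    """Same result with one query that joins the two tables."""
    rows = conn.execute(
        "SELECT t.name FROM backpack_tools AS bt "
        "JOIN tools AS t ON t.id = bt.tool_id "
        "WHERE bt.backpack_id = ? ORDER BY t.id",
        (backpack_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(tools_two_step(conn, 1))
print(tools_joined(conn, 1))
```

Either way, the programmer reassembles the backpack's contents from rows, rather than receiving the contained objects with the backpack itself.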
Despite the apparent added headaches of working with SQL, relational databases can be very (very) fast in a wide variety of applications, and have been proven to scale to enormous sizes. They are ubiquitous in nearly every enterprise, and you probably have a bunch in your own HPC center for managing inventory, user tickets, and so on. On the other hand, there are well-documented situations in which object databases are not only easier for a developer to deal with, but also much faster than the alternatives.
"Complexity and concurrency are the two things that we look for in application profiles that would lend themselves well to an object-oriented database," says David Ingersoll, Versant's VP of sales (Americas and APAC).
Of course, in traditional high performance technical computing, the choice isn't between relational and object databases. The choice is between using any kind of database at all and using flat files. And Ingersoll acknowledges this is a key obstacle they face in talking with clients: "One challenge is just to get people to realize that they need a database and not just a filesystem."
But he isn't coming to HPC empty-handed. When he briefed us about Versant's potential in the HPC space, Ingersoll talked about examples of traditional HPC users using Versant's object databases in HPC applications today, particularly applications with large volumes of streaming data. For example, the Air Force Weather Agency uses a Versant database to store real-time satellite imagery that is then fed into computational models for cloud forecasts. Other similar applications include the European Space Agency's Herschel Space Observatory, where Versant is the mission database, and Verizon, where real-time call data are streamed into a hierarchical set of databases that are used for near real-time fraud detection.
Exxon Mobil is also using Versant's technology in its reservoir simulation system, EMPower. In its application, results from large-scale numerical simulations are stored in the database and then subjected to analytics routines that answer questions about where to place wells, when and where to inject fluids, and so forth.
In many of their HPC examples, Versant's users are storing the data in a large database that itself may be hosted on a cluster. Hundreds or thousands of clients then access the database from the compute nodes of other clusters to process the data and answer mission questions. This is a basic level of parallelism supported by Versant, which also offers multi-threaded and parallel queries baked into the database engine along with a dual cache and object-locking for high concurrency support.
Object databases themselves aren't new: work started on them in the 1980s and spiked in the early 1990s when all the cool kids were drinking the O-O Kool-Aid. As object-oriented languages have become mainstream (including C++, Java, and C#), programmers have struggled with mapping their languages to relational databases because they wanted to work with what they knew: familiar languages and familiar (relational) databases.
And this points to a key challenge in positioning an object database technology for HPC: if you aren't using an object-oriented language, you aren't going to see much benefit. "C and FORTRAN don't lend themselves well [to object databases] because the domain models are very flat, very procedurally oriented, and they're not going to have a lot of inter-relationships," says Ingersoll. "At that point, the benefit of our system really falls down."
Versant is targeting markets and applications where C++ and Java are already in use for intensive computing, or where the practitioners don't have a vast store of legacy code in their toolboxes already. Areas like bioinformatics offer a lot of potential, not only because of the very modern nature of many of those codes, but also because the domain data model is inherently object-oriented. According to Ingersoll, "We are at that point where people are just coming [into HPC in these domains], so if we can get in front of that wave then that's a benefit for us."
Versant is looking to build partnerships as it tries to wriggle into the HPC market. Ingersoll let us know that they are talking to both Penguin Computing and Panasas about working more closely together. The Panasas opportunity seems particularly apropos given the object-based nature of Panasas' PanFS file system. In fact, according to Ingersoll, Versant is already being used in the financial services industry on a cluster outfitted with Panasas storage.
Versant doesn't have the object database market to itself, of course. It competes with companies like Objectivity and InterSystems, and with Microsoft and Oracle, both of which have a growing interest in the technology. Object databases are an interesting technology, and in twenty years of development Versant has structured a robust solution. But getting databases into HPC, even into the developing segments of our community, will be a tall order. Differentiating object databases from relational databases for HPC people layers another challenge on top of that.
This is a challenge that Ingersoll feels Versant is equal to. "We are getting the market to understand that difference," he says. "If people are investigating what steps to take today, we have a much better shot at educating them than if they are going to be moving that application from C to C++, and you're really going to be thoughtful about how you're modeling the application, then we provide orders of magnitude of performance benefits."