June 16, 2011
For all the accolades one hears about German engineering, there are few IT vendors native to that country. Recently though, we got the opportunity to talk with one such company, ParStream, a Cologne-based startup that has developed a bleeding-edge CPU/GPU-based analytics platform that marries high performance computing to big data.
ParStream, whose official company name is empulse GmbH, was founded four years ago by Michael Hummel and Joerg Bienert, who share the title of managing director. The duo funded the venture themselves but were able to subsequently attract some external investment. That was enough to develop the initial software and appliance products, and even snag a couple of paying customers. Right now they are looking for venture capital to move the business into the fast lane.
ParStream was initially formed around the idea of doing IT consulting and application development, much like the work Hummel and Bienert performed at Accenture, where the two had met. But about three years ago, their newly hatched company got a contract from the German tourism industry to build a search engine for a travel package offering. They wanted the application to be able to search through about 6 billion data records against 20 parameters in less than 100 ms. Unfortunately, most of the current database technology, based on decades-old software architectures, doesn't provide anything close to the level of parallelism required to digest these big databases under such strict time constraints. Thus was born ParStream and its new mission: to do big data analytics with an HPC flair.
Hummel and Bienert developed their own database software kernel that was able to handle the tourism industry's search problem on conventional hardware, that is, x86 clusters. According to Bienert, they quickly realized the solution they came up with could be generalized. "Afterward, we looked at other industries and found that this big data challenge was everywhere, so we decided to make a product out of it," he told HPCwire.
Hummel and Bienert figured any business that deals in super-sized datasets and has a need for interactive analysis would be able to use this technology. The main technological challenge is to be able to run many concurrent queries on the data and deliver the results in real-time or near real-time. Candidate applications include web analytics, bioinformatics, intelligent ad serving, algorithmic trading, fraud detection, market research, and smart energy metering, among many others.
As suggested by its name, the ParStream software performs parallel streaming of data structures. In this case they are focused on structured data, but of such a size that they can have thousands of columns and millions, or even billions, of rows. According to the company, their offering performs, on average, about 35 times faster than traditional database products.
The secret is to parallelize each query such that it can be processed simultaneously on many cores spread across multiple nodes. In a cluster environment, the data is stored on individual servers in a "shared nothing" environment. Since there is little interprocess communication, the performance can scale linearly with the cluster size; doubling processors or nodes should double throughput.
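The shared-nothing pattern described above can be illustrated with a minimal Python sketch. It is not ParStream's implementation: the function names (`partition`, `local_count`, `parallel_count`), the travel-package records, and the use of threads to stand in for separate cluster nodes are all assumptions made for illustration. Each "node" scans only the slice of data it owns, and the coordinator merges one small partial result per node.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records round-robin so each hypothetical node owns a disjoint slice."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def local_count(node_records, predicate):
    """Scan only this node's own data; no communication with other nodes."""
    return sum(1 for r in node_records if predicate(r))


def parallel_count(partitions, predicate):
    """Fan a single query out to every node, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: local_count(p, predicate), partitions))
    return sum(partials)


# Made-up travel-package records: 10 price points x 3 trip lengths = 30 rows.
records = [
    {"price": price, "nights": nights}
    for price in range(100, 1100, 100)
    for nights in (3, 7, 14)
]

nodes = partition(records, 4)  # threads stand in for 4 cluster nodes
result = parallel_count(nodes, lambda r: r["price"] <= 500 and r["nights"] >= 7)
print(result)
```

Because the only data crossing node boundaries is one integer per node, adding nodes shrinks each local scan without adding much coordination cost, which is the intuition behind the near-linear scaling claim.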
They haven't tested ParStream on a petabyte-sized system yet, but according to Bienert, there is no inherent limitation in the software that would prevent it from scaling to that level. To be fair, a lot of other analytics engines also operate in parallel, but in many cases that only means multiple queries can run simultaneously, with each query still confined to its own processor.
Newer technologies such as Google's MapReduce and its open-source derivative, Hadoop, are able to decompose a query into many independent pieces, just like the ParStream software. But according to Bienert, the MapReduce technology is more suited for batch-mode processing than for real-time analysis. Three of ParStream's potential clients had tried the MapReduce scheme and encountered those limitations. In fact, last year Google itself abandoned MapReduce for query-type searching in favor of a higher performance technology called Dremel.
It's not just about query parallelization though. ParStream's real secret sauce is their index structure. As in many traditional relational databases, the bitmap index is stored in compressed form to save space in memory. But according to Bienert, the ParStream index can be used while compressed; there's no need for a compute- and memory-intensive decompression step to operate on it. "This is the heart of ParStream and what makes it extremely fast," he says.
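The article does not disclose how ParStream's index is encoded, so the following Python sketch uses simple run-length encoding as a stand-in to show the general idea of operating on a compressed bitmap directly. The functions `rle_encode`, `rle_and`, and `rle_count`, and the two example bitmaps, are hypothetical. A two-condition filter is answered by ANDing the encoded bitmaps run by run, without ever expanding them back to one bit per row.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]


def rle_and(a, b):
    """AND two run-length-encoded bitmaps of equal total length, run by run."""
    out = []
    i = j = 0
    rem_a = a[0][1] if a else 0  # rows left in the current run of a
    rem_b = b[0][1] if b else 0  # rows left in the current run of b
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)  # consume the overlap of both runs at once
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return out


def rle_count(runs):
    """Count matching rows straight from the compressed form."""
    return sum(n for bit, n in runs if bit)


# One bit per row: does the row satisfy the condition? (made-up data)
dest_is_spain = rle_encode([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
price_under_500 = rle_encode([0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1])

both = rle_and(dest_is_spain, price_under_500)
print(both, rle_count(both))
```

The work done is proportional to the number of runs rather than the number of rows, which is why long uniform stretches in a bitmap are cheap to process in this kind of scheme.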
That same technology makes it extremely efficient from a hardware standpoint. Bienert says in a production environment, where the other database solutions would require about 400 servers, ParStream only needs 20 and executes many times faster.
They initially wrote their software to run on generic 64-bit x86-based Linux platforms -- single nodes and clusters. Later they found their parallel approach and bitmap structure were very well-suited to general-purpose GPUs, which provided a speedup of 8-10x compared to the CPU-only version.
Not just any GPU would do though. The ParStream software required error-correcting code (ECC) memory since it was critical to maintain the integrity of the bitmap index and other compressed data structures in memory. Arbitrarily flipping bits would not do. With NVIDIA's Fermi (Tesla 20-series) GPU, ParStream got that critical ECC support.
For the GPU-accelerated version, the company has to provide a custom appliance because the configuration is a little tricky for the software's needs. In fact, each GPU deployment is a custom job at this point. The specific configuration (mix of Fermi cards, x86 processors, and memory capacity) is based on application requirements associated with throughput, database size, and so on. A single node can contain up to four CPUs and eight GPU cards.
At this point, the company is building up proof points for their technology. They have two existing customers in Europe in the eCommerce sector, and five additional prospects across multiple industries running proof-of-concept deployments.
Early results look encouraging. A German customer with a web analytics application originally needed three to five minutes on a "large cluster" to analyze billions of records using their traditional database solution. After some tuning of the ParStream software, the customer was able to perform the same query calculation in 15 ms, and on just four x86 servers. The most difficult part was convincing the customer that the solution was spitting out valid results "instantaneously." The customer is currently in the process of migrating their whole infrastructure to ParStream, says Bienert.
In two other instances where interactive analytics was the driving goal, ParStream delivered impressive performance results. A market research firm with 20 million records (1000 columns apiece) was able to perform 5000 queries in just 5 seconds, and a climate research center in Germany was able to analyze 3 billion records in 100 milliseconds (ms) as part of an effort to identify hurricane risk. Each of these applications was run on a single server using the ParStream offering.
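To put those two single-server figures on a common footing, the short Python calculation below converts them into rates. It uses only the numbers quoted above; the variable names are illustrative, and the results are simple averages rather than measured per-query latencies.

```python
# Market research firm: 5000 queries in 5 seconds over 20 million records.
queries = 5000
elapsed_s = 5
queries_per_second = queries / elapsed_s          # aggregate throughput
avg_ms_per_query = elapsed_s * 1000 / queries     # average, not a measured latency

# Climate research center: 3 billion records analyzed in 100 ms.
records = 3_000_000_000
latency_ms = 100
records_per_second = records * 1000 // latency_ms  # integer math avoids float noise

print(f"{queries_per_second:.0f} queries/s, {avg_ms_per_query:.1f} ms per query on average")
print(f"{records_per_second:,} records/s effective scan rate")
```

The per-query average only bounds the work done per unit time; if queries ran concurrently, an individual query may have taken longer than that average.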
Bienert believes ParStream's high-throughput, low-latency analytics has a significant edge on its competition at this point. Other up-and-coming big data vendors, like Vertica and EXASOL, are also touting highly parallel architectures, but as of today Bienert thinks ParStream is alone in offering GPU-based acceleration and its unique compressed data indexing scheme. The company is hoping that's enough to attract some savvy investors.
In the meantime they'll be hitting the trade show circuit. Hummel introduced the technology last September at NVIDIA's GPU Technology Conference, where the company was selected as "One to Watch" by the GPU maker. ParStream's first exhibition of their offerings will be at the International Supercomputing Conference in Hamburg, Germany next week, where they hope to wow the HPC faithful.