Startup Brings HPC to Big Data Analytics

By Michael Feldman

June 16, 2011

For all the accolades one hears about German engineering, there are few IT vendors native to that country. Recently though, we got the opportunity to talk with one such company, ParStream, a Cologne-based startup that has developed a bleeding-edge CPU/GPU-based analytics platform that marries high performance computing to big data.

ParStream, whose official company name is empulse GmbH, was founded four years ago by Michael Hummel and Joerg Bienert, who share the title of managing director. The duo funded the venture themselves but were able to subsequently attract some external investment. That was enough to develop the initial software and appliance products, and even snag a couple of paying customers. Right now they are looking for venture capital to move the business into the fast lane.

ParStream was initially formed around the idea of doing IT consulting and application development, much like the work Hummel and Bienert performed at Accenture, where the two had met. But about three years ago, their newly hatched company got a contract from the German tourism industry to build a search engine for a travel package offering. They wanted the application to be able to search through about 6 billion data records against 20 parameters in less than 100 ms. Unfortunately, most of the current database technology, based on decades-old software architectures, doesn’t provide anything close to the level of parallelism required to digest these big databases under such strict time constraints. Thus was born ParStream and its new mission: to do big data analytics with an HPC flair.

Hummel and Bienert developed their own database software kernel that was able to handle the tourism industry’s search problem on conventional hardware, that is, x86 clusters. According to Bienert, they quickly realized the solution they came up with could be generalized. “Afterward, we looked at other industries and found that this big data challenge was everywhere, so we decided to make a product out of it,” he told HPCwire.

Hummel and Bienert figured any business that deals in super-sized datasets and has a need for interactive analysis would be able to use this technology. The main technological challenge is to be able to run many concurrent queries on the data and deliver the results in real-time or near real-time. Candidate applications include web analytics, bioinformatics, intelligent ad serving, algorithmic trading, fraud detection, market research, and smart energy metering, among many others.

As suggested by its name, the ParStream software performs parallel streaming of data structures. The focus is on structured data, but in tables large enough to have thousands of columns and millions, or even billions, of rows. According to the company, their offering performs, on average, about 35 times faster than traditional database products.

The secret is to parallelize each query such that it can be processed simultaneously on many cores spread across multiple nodes. In a cluster environment, the data is stored on individual servers in a “shared nothing” environment. Since there is little interprocess communication, the performance can scale linearly with the cluster size; doubling processors or nodes should double throughput.
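The pattern described here can be sketched in a few lines of Python. This is only an illustration of the general scatter-gather idea, not ParStream's code: the record layout and the function names (make_partitions, local_count, parallel_count, matches) are assumptions invented for the example, and threads stand in for separate cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(records, n_nodes):
    """Split the records so that each hypothetical node owns a disjoint slice."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def local_count(partition, predicate):
    """Work done on one node: scan only the locally stored records."""
    return sum(1 for record in partition if predicate(record))


def parallel_count(partitions, predicate):
    """Scatter one query to every partition, then gather the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: local_count(p, predicate), partitions))
    # The only cross-node step is adding up one small number per node.
    return sum(partials)


def matches(record):
    """Example query: trips costing at most 500 and lasting at least 5 nights."""
    return record["price"] <= 500 and record["nights"] >= 5


# Made-up travel records: 10 price points x 7 trip lengths = 70 records.
records = [
    {"price": price, "nights": nights}
    for price in range(100, 1100, 100)
    for nights in range(1, 8)
]

partitions = make_partitions(records, 4)
result = parallel_count(partitions, matches)
print(result)
```

Because the nodes exchange nothing but their partial counts, adding partitions adds scan capacity without adding coordination, which is the property behind the linear-scaling claim. Python threads will not show a real speedup on this CPU-bound scan; the sketch is meant to show the structure only.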

They haven’t tested ParStream on a petabyte-sized system yet, but according to Bienert, there is no inherent limitation in the software that would prevent it from scaling to that level. To be fair, a lot of other analytics engines also operate in parallel. In many cases, though, that only means multiple queries can be run simultaneously, with each query confined to its own processor.

Newer technologies, such as Google’s MapReduce and its open-source Hadoop derivative, are able to decompose the query into many independent pieces, just like the ParStream software. But according to Bienert, the MapReduce technology is more suited for batch-mode processing, rather than real-time analysis. Three of ParStream’s potential clients had tried the MapReduce scheme and encountered those limitations. In fact, last year Google itself abandoned MapReduce for query-type searching in favor of a higher performance technology called Dremel.

It’s not just about query parallelization though. ParStream’s real secret sauce is their index structure. As in many traditional relational databases, the bitmap index is kept in compressed form to save space in memory. But according to Bienert, the ParStream index can be used while compressed; there’s no need for a compute- and memory-intensive decompression step to operate on it. “This is the heart of ParStream and what makes it extremely fast,” he says.
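The article does not disclose how ParStream's index is encoded, so the sketch below substitutes a textbook technique, run-length encoding, to show what operating on a compressed bitmap can look like. The function names and the sample bitmaps are hypothetical. Each bitmap marks which rows satisfy one condition, and the two are ANDed run by run without ever being expanded back to one bit per row.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(bit, length) for bit, length in runs]


def rle_and(a, b):
    """AND two run-length-encoded bitmaps of equal total length, run by run."""
    out = []
    i = j = 0
    rem_a = a[0][1] if a else 0  # rows left in the current run of a
    rem_b = b[0][1] if b else 0  # rows left in the current run of b
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)  # rows covered by both current runs
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return out


def rle_count(runs):
    """Count matching rows directly from the compressed form."""
    return sum(length for bit, length in runs if bit)


# Hypothetical per-row bitmaps for two query conditions over ten rows.
destination_ok = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
price_ok = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

both = rle_and(rle_encode(destination_ok), rle_encode(price_ok))
print(both)
print(rle_count(both))
```

The work done is proportional to the number of runs rather than the number of rows, which is why long stretches of identical bits cost almost nothing to combine. Production schemes are more elaborate, but the principle of skipping the decompression step is the same.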

That same technology makes it extremely efficient from a hardware standpoint. Bienert says in a production environment, where the other database solutions would require about 400 servers, ParStream only needs 20 and executes many times faster.

They initially wrote their software to run on generic 64-bit x86-based Linux platforms, both single nodes and clusters. Later they found their parallel approach and bitmap structure were very well suited to general-purpose GPUs, which provided a speedup of 8-10x compared to the CPU-only version.

Not just any GPU would do though. The ParStream software required error-correcting code (ECC) memory since it was critical to maintain the integrity of the bitmap index and other compressed data structures in memory. Arbitrarily flipping bits would not do. With NVIDIA’s Fermi (Tesla 20-series) GPU, ParStream got that critical ECC support.

For the GPU-accelerated version, the company has to provide a custom appliance because the configuration is a little tricky for the software’s needs. In fact, each GPU deployment is a custom job at this point. The specific configuration (mix of Fermi cards, x86 processors, and memory capacity) is based on application requirements associated with throughput, database size, and so on. A single node can contain up to four CPUs and eight GPU cards.

At this point, the company is building up proof points for their technology. They have two existing customers in Europe in the eCommerce sector, and five additional prospects across multiple industries running proof-of-concept deployments.

Early results look encouraging. A German customer with a web analytics application originally took three to five minutes on a “large cluster” to analyze billions of records using their traditional database solution. After some tuning of the ParStream software, the customer was able to perform the same query calculation in 15 ms, and on just four x86 servers. The most difficult part was convincing the customer that the solution was spitting out valid results “instantaneously.” That customer is currently in the process of migrating their whole infrastructure to ParStream, says Bienert.

In two other instances where interactive analytics was the driving goal, ParStream delivered impressive performance results. A market research firm with 20 million records (1000 columns apiece) was able to perform 5000 queries in just 5 seconds, and a climate research center in Germany was able to analyze 3 billion records in 100 milliseconds (ms) as part of an effort to identify hurricane risk. Each of these applications was run on a single server using the ParStream offering.
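To put the customer figures quoted above on a common scale, the short calculation below converts them into speedup factors and per-second rates. It uses only numbers stated in the article; the variable names are made up for the example, and integer milliseconds are used throughout to avoid floating-point rounding.

```python
# Web analytics customer: three to five minutes before, 15 ms after.
before_ms_low = 3 * 60 * 1000
before_ms_high = 5 * 60 * 1000
after_ms = 15
speedup_low = before_ms_low // after_ms
speedup_high = before_ms_high // after_ms

# Market research firm: 5000 queries in 5 seconds.
queries = 5000
window_ms = 5000
queries_per_second = queries * 1000 // window_ms

# Climate research center: 3 billion records in 100 ms.
records = 3_000_000_000
scan_ms = 100
records_per_second = records * 1000 // scan_ms

print(speedup_low, speedup_high)
print(queries_per_second)
print(records_per_second)
```

By these vendor-supplied numbers, the web analytics query ran 12,000 to 20,000 times faster, the market research workload sustained 1,000 queries per second, and the climate scan covered the equivalent of 30 billion records per second on one server.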

Bienert believes ParStream’s high-throughput, low-latency analytics has a significant edge on its competition at this point. Other up-and-coming big data vendors, like Vertica and EXASOL, are also touting highly parallel architectures, but as of today Bienert thinks ParStream is alone in offering GPU-based acceleration and its unique compressed data indexing scheme. The company is hoping that’s enough to attract some savvy investors.

In the meantime they’ll be hitting the trade show circuit. Hummel introduced the technology last September at NVIDIA’s GPU Technology Conference, where the company was selected as “One to Watch” by the GPU maker. ParStream’s first exhibition of their offerings will be at the International Supercomputing Conference in Hamburg, Germany next week, where they hope to wow the HPC faithful.

