Netezza is a six-year-old company that's been on the edge of my radar screen for a while. That's mostly because they sell data warehousing appliances, which are not exactly my idea of mainstream high performance computing. But what the company really does is marry data warehousing with streaming analytics. And it does it in a sort of sexy way, geek-wise.
In a conventional data warehousing setup, you have a database stored on a SAN, which is connected to a mainframe or, more likely, a compute cluster. Processing takes place after the data is loaded from the storage hardware onto the computing hardware. In a transactional database app, this is fine and dandy, since the data volumes and the amount of processing involved usually don't push the limits of the network's bandwidth and latency.
In streaming applications, lots of data must be processed in real time or close to it. And when I say lots of data here, I’m talking terabytes. This type of software is most commonly associated with “business intelligence,” but streaming apps encompass an even wider range — everything from data mining to financial analytics to intelligence gathering. In this environment, the compute-storage links can easily become a communication bottleneck. Netezza appliances attempt to rectify this by placing the storage and compute pieces in close proximity and by providing a streaming framework for the applications. Here’s how the company describes it:
Rather than shuttling data between disk and memory for processing once a query comes in, which creates the bottleneck, data streams off the disk and through query logic loaded into an FPGA (field programmable gate array). The FPGA and processor (a PowerPC chip), together with 400 GB of disk storage, reside on each of the massively parallel nodes that Netezza calls snippet processing units (SPUs). Each of our Netezza racks contains 112 of these SPUs. Queries are optimized across the SPUs for maximum performance and power efficiency. A Linux host server aggregates SPU results and manages query workload, and the results are returned to the user.
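Stripped to its essentials, the description above is a scatter-gather query with the filter pushed down to where the data lives: each SPU screens its own slice of the table as it streams off disk, and only the survivors travel to the host for aggregation. The toy Python sketch below contrasts that with the conventional "move everything, then filter" approach. It is not Netezza's implementation: the `SnippetProcessingUnit` class, the `(region, sale_amount)` schema, and the `host_query`/`conventional_query` helpers are all hypothetical, a plain Python predicate stands in for the FPGA query logic, and the count of rows moved is only a rough proxy for network traffic.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator

# Hypothetical schema for illustration only: (region, sale_amount).
Row = tuple[str, float]
Predicate = Callable[[Row], bool]


@dataclass
class SnippetProcessingUnit:
    """Toy stand-in for one SPU: it owns one slice of the table on local disk."""
    rows: list[Row] = field(default_factory=list)

    def scan(self) -> Iterator[Row]:
        # Stream rows off the local slice one at a time.
        yield from self.rows

    def run_snippet(self, predicate: Predicate) -> list[Row]:
        # Apply the query's filter as rows stream past (the job the article
        # assigns to the FPGA), so only matching rows leave the node.
        return [row for row in self.scan() if predicate(row)]


def aggregate(rows: list[Row]) -> dict[str, float]:
    # Sum sale_amount per region.
    totals: dict[str, float] = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def host_query(spus: list[SnippetProcessingUnit],
               predicate: Predicate) -> tuple[dict[str, float], int]:
    """Filter on every SPU, aggregate on the host. Returns (totals, rows moved)."""
    matches: list[Row] = []
    for spu in spus:
        matches.extend(spu.run_snippet(predicate))
    return aggregate(matches), len(matches)


def conventional_query(spus: list[SnippetProcessingUnit],
                       predicate: Predicate) -> tuple[dict[str, float], int]:
    """Move every row to the compute tier first, then filter and aggregate there."""
    moved = [row for spu in spus for row in spu.scan()]
    return aggregate([row for row in moved if predicate(row)]), len(moved)


if __name__ == "__main__":
    spus = [
        SnippetProcessingUnit([("east", 10.0), ("west", 5.0), ("east", 2.5)]),
        SnippetProcessingUnit([("west", 7.0), ("north", 1.0)]),
    ]
    big_sales = lambda row: row[1] > 6.0
    print(host_query(spus, big_sales))          # ({'east': 10.0, 'west': 7.0}, 2)
    print(conventional_query(spus, big_sales))  # ({'east': 10.0, 'west': 7.0}, 5)
```

Both paths produce the same answer; the difference is how much data crosses the link before the filter runs. At appliance scale that gap is the whole point: by the figures quoted above, one rack holds 112 × 400 GB, or about 44.8 TB of raw disk, and a selective query that filters at each node avoids hauling most of it to a separate compute tier.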
Not exactly a commodity solution. But unlike a lot of other vendors peddling unique HPC solutions, Netezza has managed to attract some big-name customers, including Amazon, AOL, the American Red Cross, CNET Networks, Nationwide Financial Services, Sandia National Laboratories, and the US Army Corps of Engineers. All told, the company has signed up 58 customers.
On Wednesday, Netezza announced five new applications for its platform:
- Database emulation functions for easier migration to Netezza (Edge Associates).
- Monte Carlo simulation for pricing derivatives (HCL Technologies).
- Fuzzy name matching for intelligence and fraud detection for government agencies (Multi-Threaded, Inc.).
- Dynamic re-pricing for telecommunications providers (RateIntegration).
- Business profitability management for retail and CPG companies (Pi Solutions, a Systech Solutions spinoff).
The new apps are the result of the Netezza Developer Network (NDN), a program the company launched in September 2007. The idea was to attract developers to write analytic applications for the Netezza platform.
Since many data warehousing applications are evolving from an online transaction processing (OLTP) model to an online analytical processing (OLAP) one, Netezza may have hit the market at the right time. The company is not alone, however, and has to deal with larger vendors like IBM, HP, Oracle, and SAS, as well as a posse of smaller firms like Teradata and Greenplum. So far, so good, though. Netezza went public in July 2007 and for the last two quarters has reported profits and growing revenue. In an industry that has mostly punished vendors that dared to offer non-commodity solutions, Netezza may be a refreshing exception.