April 16, 2009
If there is any information technology problem more intractable than parallel programming, it's the data deluge problem. In most cases, demand for storage is growing even faster than demand for more compute cycles, and storage boxes are increasingly taking up a larger share of the budget and real estate in the datacenter. Good news for storage vendors.
Or maybe not. Data by itself is worthless. Here I'm not talking about content destined for entertainment, like movies or music; I'm referring to those special bytes destined to be processed by computers. For the latter category, the value is in the knowledge that can be extracted from the data. The only reason to store it in the first place is that you're not prepared to process it yet, either because you don't have the time or resources in place or because other information needs to arrive before the original data can be digested.
But embracing the data deluge in the moment may be the answer. Stream computing, which processes data on the fly, short-circuits the traditional workflow of storing data, reading it, processing it, and delivering the results. It's basically data processing without the database. This model is especially useful for complex event processing (CEP), which Neal Leavitt wrote about (PDF) in the April issue of IEEE's Computer. CEP software pipes one or more data sources to an analysis engine, so it intrinsically relies on the stream model to do its job. Writes Leavitt:
Vendors build CEP systems so that they initiate processing in response to inbound events, which enables them to function in real time. Traditional event-processing systems wait for a user action to initiate processing.
In fact, it is the real-time response, rather than the simplification of the storage layer, that is usually cited as the most valuable feature of these systems. That's why CEP turns out to be a very attractive platform for applications like options pricing or terrorist threat detection. Although the idea of CEP has been around for a while, general-purpose platforms are just starting to emerge. Software providers like Truviso, Aleri, Realtime Monitoring GmbH, and Event Zero are some of the early players.
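To make the stream model concrete, here is a minimal Python sketch of processing events as they arrive rather than storing them first. It is not drawn from any of the products mentioned here; the function name `detect_spikes`, the `(timestamp, price)` event shape, and the 10 percent threshold are all assumptions made for illustration. The only state it keeps is a short sliding window, which stands in for the "no database" idea.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, price).
Tick = Tuple[float, float]


def detect_spikes(
    ticks: Iterable[Tick], window_s: float, threshold: float
) -> Iterator[Tuple[float, float, float]]:
    """Yield (timestamp, price, window_mean) for each tick whose price
    exceeds the mean of the preceding window by more than `threshold`
    (expressed as a fraction, e.g. 0.1 for 10 percent)."""
    window: Deque[Tick] = deque()  # only recent ticks are retained
    total = 0.0                    # running sum of prices in the window

    for ts, price in ticks:
        # Evict ticks that have aged out of the time window.
        while window and window[0][0] <= ts - window_s:
            _, old_price = window.popleft()
            total -= old_price

        # Processing is triggered by the inbound event itself.
        if window:
            mean = total / len(window)
            if price > mean * (1 + threshold):
                yield ts, price, mean

        window.append((ts, price))
        total += price


if __name__ == "__main__":
    # Made-up sample feed; a real source would be a socket or message queue.
    feed = [(0.0, 100.0), (1.0, 101.0), (2.0, 99.0),
            (3.0, 120.0), (4.0, 100.0), (20.0, 150.0)]
    for ts, price, mean in detect_spikes(feed, window_s=5.0, threshold=0.1):
        print(f"t={ts}: price {price} vs. window mean {mean}")
```

In the sample feed, only the tick at t=3 is flagged. The tick at t=20 arrives after every earlier tick has aged out of the window, so there is nothing to compare it against. Because `detect_spikes` is a generator, each result is available as soon as the triggering event is read, which is the real-time property the CEP vendors emphasize.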
Leavitt's research led him to Aite Group, which predicted that CEP products will "more than double from $180 million in 2008 to $370 million this year and increase to $460 million in 2010." He believes two advancements are helping push CEP into the mainstream: the proliferation of real-time feeds and the modularization of IT infrastructure. To that I would also add the increase in computational performance of hardware. And this is where high performance computing could turn out to be a key enabling technology for such systems.
IBM recently unveiled a prototype of its stream computing technology by porting the company's InfoSphere Streams analytics software to the Blue Gene/P supercomputer. IBM has been working on a stream technology for several years, and started talking about it publicly just a couple of years ago under the name of System S (PDF). According to IBM, "It has some similarity to CEP systems, but it is built to support higher data rates and a broader spectrum of input data modalities. It also provides significant infrastructure support to address needs for scalability and dynamic adaptability such as scheduling, load balancing, and high availability."
The IBM prototype announced this month was done in conjunction with TD Securities to demonstrate what Big Blue thinks could be the model for next-generation trading systems. The press release quotes Nagui Halim, chief scientist of the Stream Computing Project at IBM: "TD Securities could potentially use the new system to analyze and act on information before their competitors can finish ingesting and analyzing, effectively blinding the competition to its actions. We're not talking about 20 percent faster here. We're talking about 20 times faster."
Of course, if this works as advertised, all the big financial firms will buy such a system. The real advantage to organizations comes from combining lots of streams of data in novel ways and asking unique questions. For example, pouring news feeds, weather reports, and maybe even proprietary data streams into the analytic soup could offer a way for companies to gain a real edge on their competition.
In the meantime, it will be interesting to see how this technology makes its way into the world. As much as everyone complains about being inundated with information 24/7, the average person has come to accept it, and the movers and shakers have come to exploit it. In such an environment, buying more hardware to analyze yesterday's data may be a losing proposition.
Posted by Michael Feldman - April 16, 2009 @ 4:20 PM, Pacific Daylight Time
Michael Feldman is the editor of HPCwire.