Platform Tunes Symphony for Big Data Deluge

By Nicole Hemsoth

March 31, 2011

The “big data” topic is wending its way into an increasing number of conversations as data volumes mount, stretching computational resources to their limits. This realm of massive datasets is not confined to business intelligence either — it is increasingly becoming a central component of mission-critical enterprise goals.

Accordingly, the last year has produced a swell of news around companies looking to capitalize on the challenges of managing big data via commercial renditions of popular open source products and the emergence of new open frameworks to further develop the landscape. Newer companies like Cloudera, for instance, seek to bring “big data to the masses” via simplified handling of large messy datasets. And now, industry stalwart Platform Computing is hopping aboard the big data express.

To be more specific, Platform announced this week that it’s seeking to provide distributed computing for the MapReduce programming model, which is one of a short list of ways to extract and map the pesky unstructured data and, a la its moniker, reduce that mess into actionable information.
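For readers new to the model, the map-then-reduce idea can be sketched in a few lines of Python. This is a toy, single-process illustration, not Platform's or Hadoop's implementation; the function names (`map_words`, `reduce_counts`, `run_mapreduce`) and the sample text are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one line of raw text."""
    for word in line.lower().split():
        cleaned = word.strip(".,!?\"'")
        if cleaned:  # skip tokens that were pure punctuation
            yield cleaned, 1


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values collected for one key into a total."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> dict[str, int]:
    # Shuffle: group every value emitted by the mapper under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical "unstructured" input: free-form lines of text.
lines = ["Big data is big", "data is messy"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

In a real deployment the map and reduce calls would be spread across many machines, which is the distribution problem a workload manager takes on; the sketch only shows the shape of the programming model.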

Cloudera (and a host of open source solutions) are all targeting one big problem. At the heart of challenges for those contending with big data (financial services organizations and large-scale business analytics users, among others) is the matter of structured versus unstructured data. To be clear, however, this isn’t just a single-sided issue; unstructured data can be problematic on several fronts, not the least of which is some warranted concern about being “locked in” to specific management tools for all that information.

Platform and others are right to address this and other problems given the continued proliferation of more of this particularly tricky type of data. As it stands, a vast majority of the data filtering in is in an unstructured format — as much as 80 percent if IDC figures are correct. New programming frameworks have stepped into the fray to help manage this complexity and enable distributed computing on large datasets.

On the storage end, new techniques and file systems like the Hadoop Distributed File System (HDFS), which was built to tackle the demands of both structured and unstructured data, have been developed, but in Platform’s view (which we’ll expound on in a moment) this and other models all have some serious weaknesses on one front or another.

An Evolving Platform

For a company that has been in the business of distributed systems for 18 years, this implementation isn’t unexpected. In fact, the only element that does cause some head-scratching is why they took so long to get into the big data boat when much of the needed framework was there.

According to Scott Campbell, Platform’s product manager for enterprise analytics, the process to start adding the tools to “reduce the maps” began around eight months ago even though he noted that the company was seeing some seismic shifts in the analytics sphere over the last few years on the unstructured data front. With massive amounts of data filtering in from any number of new tools, sensors and other collection methods, it was clear that it was becoming impossible to run this data into warehouses or structured databases and there were some serious limitations underlying a number of existing efforts.

Ken Hertzler, vice president of product management for Platform, told us that their customers, especially those on the financial services and analytics side, found that existing big data solutions (including open source tools like Hadoop, companies like Cloudera or data warehousing systems a la Greenplum or Aster Data) had critical flaws. He pointed out that with all of these solutions users might be responsible for managing the software stack (if using open source) and would thus need to increase internal expertise as well as perform regular maintenance to keep big data projects churning.

Another big problem that Hertzler highlighted is that open source solutions rely solely on HDFS, and those who try to avoid this perceived “trap” by going with a data warehousing alternative get a top-to-bottom product that can be very difficult to extricate oneself from.

This isn’t just Hertzler’s own opinion, either; he stated that customers all felt that the alternatives for big data management did a great job of managing the query side of their needs but that they failed on the enterprise-class or production-ready level. He revealed that the main gripes were about poor application compatibility, the lock-in issue, maintaining utilization and SLAs, and concerns about having data on multiple distributed cloud storage systems.

Platform’s distributed MapReduce workload manager and job execution engine is, as both Hertzler and Campbell emphasized repeatedly, enterprise-ready and far more viable due to two key traits in particular: openness and scalability.

The keywords “open” and “scalable” are ferried about in nearly every technological context these days — almost to the point that their meanings are sometimes overlooked. Campbell explained in depth these two angles to highlight how Platform is doing something that isn’t available with the other management alternatives.

The openness and scalability angles are somewhat interesting but require a bit of setting up, more specifically by putting Platform’s announcement in the context of its Symphony product.

This MapReduce capability has been integrated into Platform Symphony, which is something of an SOA approach to workload distribution, in contrast to the company’s other widely-used LSF product, which works from a batch-oriented architecture. Why is this important, you ask…

Well, to take yet another step backwards, the Symphony approach for workload distribution and management is actually a natural fit for what Platform just got around to eight months ago. Symphony was literally built for distributed architectures, which is exactly how MapReduce is deployed. The time-to-market was short (relatively speaking; after all, what’s eight months?) because Campbell and his team simply built the APIs on top of Symphony. With their existing tool in place to provide the distributed management and job execution engine, they layer on specific APIs for different job types (PIG, Hadoop, etc.). Users can manage complexity by using the Symphony framework along with those APIs and, on the back end, using connectors to file systems or databases to serve as I/O for MapReduce jobs.
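The layering described here (job-type APIs on top, an execution engine in the middle, storage connectors underneath) can be pictured with a rough Python sketch. Nothing below is Symphony's actual API: `StorageConnector`, `InMemoryStore`, `WorkloadEngine` and the `"uppercase"` job type are all hypothetical names invented to show the shape of the design.

```python
from typing import Callable, Protocol


class StorageConnector(Protocol):
    """Hypothetical back-end interface: any file system or database that can
    supply input records and accept output records."""

    def read(self, path: str) -> list[str]: ...
    def write(self, path: str, rows: list[str]) -> None: ...


class InMemoryStore:
    """Stand-in connector; a real one might wrap HDFS or a database."""

    def __init__(self, files: dict[str, list[str]]) -> None:
        self.files = files

    def read(self, path: str) -> list[str]:
        return self.files[path]

    def write(self, path: str, rows: list[str]) -> None:
        self.files[path] = rows


class WorkloadEngine:
    """Hypothetical middle layer: it runs jobs but is tied to neither a
    particular job type nor a particular storage system."""

    def __init__(self) -> None:
        self.adapters: dict[str, Callable[[list[str]], list[str]]] = {}

    def register(self, job_type: str, adapter: Callable[[list[str]], list[str]]) -> None:
        # Front-end side: plug in an adapter for a new kind of job.
        self.adapters[job_type] = adapter

    def run(self, job_type: str, store: StorageConnector, src: str, dst: str) -> None:
        # Back-end side: all I/O goes through whichever connector was passed in.
        rows = store.read(src)
        store.write(dst, self.adapters[job_type](rows))


engine = WorkloadEngine()
engine.register("uppercase", lambda rows: [row.upper() for row in rows])

store = InMemoryStore({"in.txt": ["alpha", "beta"]})
engine.run("uppercase", store, "in.txt", "out.txt")
print(store.files["out.txt"])
```

The point of the sketch is that swapping the connector or registering a new job type leaves the engine untouched, which is the kind of independence the company is describing.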

And back to the relatively short process behind this — the company is more or less aggregating interfaces versus tackling the cumbersome mission of rewriting MapReduce like some of the commercial big data companies have done.

In other words, the Symphony was already playing along with the big MapReduce quest to simplify workloads by allowing users to run multiple jobs at a time versus having one job hang out until completion. This could possibly mean a much more nimble big data game for those who — here’s the catch — are under the Symphony license. While the company hasn’t “productized” the new solution yet, it is going to be available within Symphony and is already making its way into financial services organizations.

Campbell asserted that this “rearchitecture of a workload distribution has low latency and operates more like a server than a grid so the workloads that can run on Symphony can run on sub-second time.”

Back to Openness and Scalability…

Remember several paragraphs ago when we hit on the idea that this offering might be something of a game-changer (at least for those with a Symphony license) due to the openness and scalability aspects? Now that there’s sufficient background we can explore that in quick detail. This is where the meat of the announcement is.

The “open” angle is probably the most important differentiator here between the Symphony/MapReduce marriage and other alternatives. As Campbell noted, this capability “sits in the middle of the stack so that we can open up the architecture on both the front-end application layer and the backend database layer. This means we can let customers move from a complete solution and single vendor or select the application or file systems independently.”

Campbell went on to state that “this technology is getting a lot of investment commercially and in the open source form because it’s compatible with Hadoop and fully supports APIs for MapReduce. Right now, everything is almost always coming from a single vendor top to bottom and when open source comes you can’t take advantage of it. As new file systems get created you can leverage and manage those versus being locked in.”

It is also possible to add APIs specifically for MapReduce logic so there is integration of Hadoop, PIG, HIVE and others, as more programming frameworks are likely to emerge over the next several months. Platform’s big story here on the openness front is that when something new comes down the pike, users will actually be able to put it into production versus facing lock-in with very high barriers to moving over.

On that note, the architecture is designed, as noted before, without the requirement for using HDFS as the end-all file system. Users will be able to select file systems based on their specific needs while still maintaining their application type, which might, for example, be written for Hadoop.

In terms of scalability, Campbell affirmed that they will be able to manage thousands or even millions of files varying in size in a short period of time via the proven, existing Symphony product.

On this note, users could get higher resource utilization since they’re getting more than one distributed job at a time — they can have multiple running simultaneously which is unique for MapReduce. This is an important element for HPC folk who are performance conscious because, as Campbell explained, they’ve “eliminated a big issue in terms of startup time on the mappers so single jobs can be fast but overall time also goes way down because it’s not a serial thing any longer; we are running many jobs in parallel across a set of jobs.”
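The difference between serial and simultaneous jobs can be illustrated with a small Python sketch in which several jobs share one pool of workers. This is an analogy only: a local thread pool stands in for a grid, and the job names, the `count_words` task and the pool size are assumptions, not a description of how Symphony schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: str) -> int:
    """One small task: count the words in a chunk of text."""
    return len(chunk.split())


# Three hypothetical jobs, each split into a different number of tasks.
jobs = {
    "job_a": ["one two", "three"],
    "job_b": ["four five six"],
    "job_c": ["seven", "eight nine", "ten"],
}

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit every task of every job up front, so no job has to wait for
    # another job to run to completion before its own tasks can start.
    futures = {
        name: [pool.submit(count_words, chunk) for chunk in chunks]
        for name, chunks in jobs.items()
    }
    # Collect results per job once its tasks are done.
    totals = {name: sum(f.result() for f in fs) for name, fs in futures.items()}

print(totals)
```

With a serial scheduler, idle workers would sit unused while a single job finished; sharing the pool across jobs is what raises utilization in this toy setup.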

When asked about how this Symphony and MapReduce marriage will meld into the HPC user camp, Campbell noted traction in the government and life sciences spheres as well as the more predictable arenas like financial services and large-scale analytics.

He said that while this could represent an improvement for users, there was no core engineering behind the effort; it has been a matter of engineering interfaces to support the MapReduce logic. “We can react to the market,” he declared. “If someone creates another end user application for MapReduce we can simply interface to it.”

As big data gets bigger and more companies come calling for management and data crunching, there’s little doubt Platform’s interface builders will be working overtime.
