Platform Tunes Symphony for Big Data Deluge

By Nicole Hemsoth

March 31, 2011

The “big data” topic is wending its way into an increasing number of conversations as data volumes mount, stretching computational resources to their limits. This realm of massive datasets is not confined to business intelligence either — it is increasingly becoming a central component of mission-critical enterprise goals.

Accordingly, the last year has produced a swell of news around companies looking to capitalize on the challenges of managing big data via commercial renditions of popular open source products and the emergence of new open frameworks to further develop the landscape. Newer companies like Cloudera, for instance, seek to bring “big data to the masses” via simplified handling of large messy datasets. And now, industry stalwart Platform Computing is hopping aboard the big data express.

To be more specific, Platform announced this week that it's seeking to provide distributed computing for the MapReduce programming model, which is one of a short list of ways to extract and map the pesky unstructured data and, a la its moniker, reduce that mess into actionable information.
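For readers who haven't worked with the model, a minimal sketch of the map-and-reduce idea follows: counting words across a few unstructured text records. It is plain Java with no framework dependencies, and the class name and sample records are illustrative inventions, not anything from Platform or the Hadoop project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a toy word count showing the map phase (extract words
// from raw text) and the reduce phase (sum the counts per word).
public class ToyMapReduce {
    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "big data keeps getting bigger",
            "unstructured data is the tricky part",
            "data volumes stretch computational resources");

        // Map: split every record into words. Reduce: group by word and count.
        Map<String, Long> counts = records.stream()
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
            .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

        counts.forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```

In a real deployment the map and reduce phases run as many parallel tasks spread across a cluster, with a framework handling the data movement between them; the appeal of the model is that the programmer only has to write the two phases.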

Cloudera and a host of open source projects are all targeting one big problem. At the heart of the challenge for those contending with big data (financial services organizations and large-scale business analytics users, among others) is the matter of structured versus unstructured data. To be clear, however, this isn't a one-sided issue; unstructured data can be problematic on several fronts, not the least of which is some warranted concern about being "locked in" to specific management tools for all that information.

Platform and others are right to address these problems given the continued proliferation of this particularly tricky type of data. As it stands, the vast majority of the data filtering in is unstructured (as much as 80 percent, if IDC figures are correct). New programming frameworks have stepped into the fray to help manage this complexity and enable distributed computing on large datasets.

On the storage end, new techniques and file systems have been developed, including the Hadoop Distributed File System (HDFS), which was built to tackle the demands of both structured and unstructured data. In Platform's view (which we'll expound on in a moment), however, this and other models all have serious weaknesses on one front or another.

An Evolving Platform

For a company that has been in the business of distributed systems for 18 years, this implementation isn’t unexpected. In fact, the only element that does cause some head-scratching is why they took so long to get into the big data boat when much of the needed framework was there.

According to Scott Campbell, Platform's product manager for enterprise analytics, the process of adding the tools to "reduce the maps" began around eight months ago, although he noted that the company had been watching seismic shifts on the unstructured data side of the analytics sphere for the last few years. With massive amounts of data filtering in from any number of new tools, sensors and other collection methods, it was becoming impossible to funnel all of it into warehouses or structured databases, and a number of existing efforts had serious underlying limitations.

Ken Hertzler, vice president of product management for Platform, told us that their customers, especially those on the financial services and analytics side, found that existing big data solutions (including open source tools like Hadoop, companies like Cloudera or data warehousing systems a la Greenplum or Aster Data) had critical flaws. He pointed out that with all of these solutions users might be responsible for managing the software stack (if using open source) and would thus need to increase internal expertise as well as perform regular maintenance to keep big data projects churning.

Another big problem Hertzler highlighted is that the open source solutions rely solely on the HDFS file system, while those who try to avoid this perceived "trap" by going with a data warehousing alternative end up with a top-to-bottom product that can be very difficult to extricate themselves from.

This isn't just Hertzler's own opinion; he said customers consistently felt that the big data management alternatives did a great job on the query side of their needs but fell short at the enterprise-class, production-ready level. The main gripes, he revealed, were poor application compatibility, the lock-in issue, difficulty maintaining utilization and SLAs, and concerns about having data spread across multiple distributed cloud storage systems.

Platform’s distributed MapReduce workload manager and job execution engine is, as both Hertzler and Campbell emphasized repeatedly, enterprise-ready and far more viable due to two key traits in particular: openness and scalability.

The keywords "open" and "scalable" are bandied about in nearly every technological context these days, almost to the point that their meanings are overlooked. Campbell explained these two angles in depth to highlight how Platform is doing something that isn't available in the other management alternatives.

The openness and scalability angles are interesting, but they require a bit of setup: specifically, putting Platform's announcement in the context of its Symphony product.

This MapReduce capability has been integrated into Platform Symphony, which takes something of an SOA approach to workload distribution, in contrast to the company's other widely used LSF product, which works from a batch-oriented architecture. Why is this important, you ask…

Well, to take yet another step backwards, the Symphony approach to workload distribution and management is actually a natural fit for what Platform just got around to eight months ago. Symphony was built for distributed architectures, which is exactly how MapReduce is deployed. The short time to market (relatively speaking; after all, what's eight months?) is because Campbell and his team simply built the APIs on top of Symphony. With their existing tool in place to provide the distributed management and job execution engine, they layer on specific APIs for different job types (Pig, Hadoop, etc.). Users manage complexity by using the Symphony framework along with those APIs, while on the back end, connectors to file systems or databases serve as the I/O for MapReduce jobs.
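To make that arrangement concrete, here is a rough sketch of the shape such a design might take. It is an assumption on our part rather than Platform's actual API: a thin adapter per job type on the front end, a storage connector on the back end, and the existing distributed engine in between.

```java
import java.util.List;

// Hypothetical illustration of the layering described above; these interfaces
// are ours, not Platform Symphony's real API.

// Front end: one adapter per job type (Hadoop, Pig, etc.) translates that
// framework's jobs into units the workload manager can schedule.
interface JobTypeAdapter {
    List<Runnable> toSchedulableTasks(String jobDefinition);
}

// Back end: a connector abstracts whichever file system or database serves
// as input and output for MapReduce jobs, so the storage layer can be swapped.
interface StorageConnector {
    List<String> readSplits(String inputPath);
    void write(String outputPath, List<String> results);
}

// The engine in the middle sees only tasks and a connector; it does not care
// which framework sits in front or which file system sits behind.
class DistributedEngine {
    void run(JobTypeAdapter frontEnd, StorageConnector backEnd, String job) {
        List<String> splits = backEnd.readSplits("/input/" + job); // illustrative path
        System.out.println("scheduling work for " + splits.size() + " input split(s)");
        // A real engine would fan these tasks out across a grid of hosts.
        frontEnd.toSchedulableTasks(job).forEach(Runnable::run);
    }
}

// Minimal wiring so the sketch runs: a fake Hadoop-style adapter and a fake
// connector standing in for HDFS or any other store.
public class EngineSketch {
    public static void main(String[] args) {
        JobTypeAdapter hadoopLike = jobDef ->
            List.of(() -> System.out.println("ran map/reduce tasks for " + jobDef));
        StorageConnector fakeStore = new StorageConnector() {
            public List<String> readSplits(String in) { return List.of("split-1"); }
            public void write(String out, List<String> results) { /* no-op for the demo */ }
        };
        new DistributedEngine().run(hadoopLike, fakeStore, "word-count");
    }
}
```

The point, as Campbell frames it, is that the scheduling engine stays constant while adapters and connectors accumulate around it; moving off HDFS, for instance, would mean supplying a different connector rather than rewriting the application.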

And back to the relatively short development process behind this: the company is more or less aggregating interfaces rather than tackling the cumbersome mission of rewriting MapReduce as some of the commercial big data companies have done.

In other words, Symphony was already playing along with the big MapReduce quest to simplify workloads by allowing users to run multiple jobs at a time rather than having one job hang around until completion. This could mean a much nimbler big data game for those who (here's the catch) are under the Symphony license. While the company hasn't "productized" the new solution yet, it will be available within Symphony and is already making its way into financial services organizations.

Campbell asserted that this “rearchitecture of a workload distribution has low latency and operates more like a server than a grid so the workloads that can run on Symphony can run on sub-second time.”

Back to Openness and Scalability…

Remember several paragraphs ago when we hit on the idea that this offering might be something of a game-changer (at least for those with a Symphony license) due to the openness and scalability aspects? Now that there's sufficient background, we can explore that in a bit more detail. This is where the meat of the announcement lies.

The "open" angle is probably the most important differentiator between the Symphony/MapReduce marriage and the other alternatives. As Campbell noted, this capability "sits in the middle of the stack so that we can open up the architecture on both the front-end application layer and the backend database layer. This means we can let customers move from a complete solution and single vendor or select the application or file systems independently."

Campbell went on to state that “this technology is getting a lot of investment commercially and in the open source form because it’s compatible with Hadoop and fully supports APIs for MapReduce. Right now, everything is almost always coming from a single vendor top to bottom and when open source comes you can’t take advantage of it. As new file systems get created you can leverage and manage those versus being locked in.”

It is also possible to add APIs specifically for MapReduce logic, so there is integration with Hadoop, Pig, Hive and others, and more programming frameworks are likely to emerge over the next several months. Platform's big story on the openness front is that when something new comes down the pike, users will actually be able to put it into production rather than facing lock-in with very high barriers to switching.

On that note, the architecture is designed, as noted before, without the requirement of using HDFS as the end-all file system. Users will be able to select file systems based on their specific needs while still keeping their application type, which might, for example, be written for Hadoop.

In terms of scalability, Campbell affirmed that users will be able to manage thousands or even millions of files of varying size in a short period of time via the proven, existing Symphony product.

On this note, users could also see higher resource utilization since they are no longer limited to one distributed job at a time; they can have multiple jobs running simultaneously, which is unique for MapReduce. This is an important element for HPC folk who are performance conscious because, as Campbell explained, they've "eliminated a big issue in terms of startup time on the mappers so single jobs can be fast but overall time also goes way down because it's not a serial thing any longer; we are running many jobs in parallel across a set of jobs."
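As a rough illustration of that scheduling point, the sketch below (plain Java on our part, not anything from Symphony's API) submits several independent jobs to a shared pool so they run concurrently instead of queuing behind one another; the job bodies are trivial stand-ins for real MapReduce runs.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: several independent MapReduce-style jobs share one pool
// of workers, so no job has to wait for another to finish before starting.
public class ConcurrentJobs {
    public static void main(String[] args) throws InterruptedException {
        List<Runnable> jobs = List.of(
            () -> System.out.println("analytics job A finished"),
            () -> System.out.println("analytics job B finished"),
            () -> System.out.println("analytics job C finished"));

        ExecutorService pool = Executors.newFixedThreadPool(3); // shared grid stand-in
        jobs.forEach(pool::submit); // all three start without waiting on each other
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Serializing those three jobs would take roughly the sum of their runtimes; running them against a shared pool brings overall completion time closer to that of the longest job, which is the utilization argument Campbell is making.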

When asked about how this Symphony and MapReduce marriage will meld into the HPC user camp, Campbell noted traction in the government and life sciences spheres as well as the more predictable arenas like financial services and large-scale analytics.

He said that while this could represent an improvement for users, there was no core engineering behind the effort; it has been a matter of engineering interfaces to support the MapReduce logic. "We can react to the market," he declared. "If someone creates another end user application for MapReduce we can simply interface to it."

As big data gets bigger and more companies come calling for management and data crunching, there’s little doubt Platform’s interface builders will be working overtime.
