Platform Tunes Symphony for Big Data Deluge

By Nicole Hemsoth

March 31, 2011

The “big data” topic is wending its way into an increasing number of conversations as data volumes mount, stretching computational resources to their limits. This realm of massive datasets is not confined to business intelligence either — it is increasingly becoming a central component of mission-critical enterprise goals.

Accordingly, the last year has produced a swell of news around companies looking to capitalize on the challenges of managing big data via commercial renditions of popular open source products and the emergence of new open frameworks to further develop the landscape. Newer companies like Cloudera, for instance, seek to bring “big data to the masses” via simplified handling of large messy datasets. And now, industry stalwart Platform Computing is hopping aboard the big data express.

modern architectureTo be more specific, Platform announced this week that it’s seeking to provide distributed computing for the MapReduce programming model, which is one of a short list of ways to extract and map the pesky unstructured data and, a la its moniker, reduce that mess into actionable information.

Cloudera (and a host of open source solutions) are all targeting one big problem. At the heart of challenges for those contending with big data (financial services organizations and large-scale business analytics users, among others) is the matter of structured versus unstructured data. To be clear, however, this isn’t just a single-sided issue; unstructured data can be problematic on several fronts, not the least of which is some warranted concern about being “locked in” to specific management tools for all that information.

Platform and others are right to address this and other problems given the continued proliferation of more of this particularly tricky type of data. As it stands, a vast majority of the data filtering in is in an unstructured format — as much as 80 percent if IDC figures are correct. New programming frameworks have stepped into the fray to help manage this complexity and enable distributed computing on large datasets.

On the storage end, new techniques and file systems like the Hadoop file system (HDFS), which was built to tackle the demands of both structured and unstructured data, have been developed, but in Platform’s view (which we’ll expound on in a moment) this and other models all have some serious weaknesses on one front or another.

An Evolving Platform

For a company that has been in the business of distributed systems for 18 years, this implementation isn’t unexpected. In fact, the only element that does cause some head-scratching is why they took so long to get into the big data boat when much of the needed framework was there.

According to Scott Campbell, Platform’s product manager for enterprise analytics, the process to start adding the tools to “reduce the maps” began around eight months ago even though he noted that the company was seeing some seismic shifts in the analytics sphere over the last few years on the unstructured data front. With massive amounts of data filtering in from any number of new tools, sensors and other collection methods, it was clear that it was becoming impossible to run this data into warehouses or structured databases and there were some serious limitations underlying a number of existing efforts.

Ken Hertzler, vice president of product management for Platform, told us that their customers, especially those on the financial services and analytics side, found that existing big data solutions (including open source tools like Hadoop, companies like Cloudera or data warehousing systems a la Greenplum or Aster Data) had critical flaws. He pointed out that with all of these solutions users might be responsible for managing the software stack (if using open source) and would thus need to increase internal expertise as well as perform regular maintenance to keep big data projects churning.

Another big problem that Hertzler highlighted is that open source solutions are reliant only with the HDFS file system and those who try to avoid this perceived “trap” and go with a data warehousing alternative are getting that top-to-bottom product that can be very difficult to extract oneself from.

This isn’t just coming from Hertzler’s own opinion well; he stated that customers all felt that the alternatives for big data management did a great job of managing the query side of their needs but that they failed on the enterprise-class or production-ready level. He revealed that the main gripes were about poor application compatibility, the lock-in issue, maintaining utilization and SLAs and concerns about having data on multiple cloud storage distributed systems.

Platform’s distributed MapReduce workload manager and job execution engine is, as both Hertzler and Campbell emphasized repeatedly, enterprise-ready and far more viable due to two key traits in particular: openness and scalability.

The keywords “open” and “scalable” are ferried about in nearly every technological context these days — almost to the point that their meanings are sometimes overlooked. Campbell explained in depth these two angles to highlight how Platform is doing something that isn’t available with the other management alternatives.

The openness and scalability angles are somewhat interesting but require a bit of setting up, more specifically by putting Platform’s announcement in the context of its Symphony product.

This MapReduce capability has been integrated into Platform Symphony, which is something of an SOA approach to workload distribution, in contrast to the company’s other widely-used LSF product, which works from a batch-oriented architecture. Why is this important, you ask…

Well, to take yet another step backwards, the Symphony approach for workload distribution and management is actually a natural fit for what Platform just got around to eight months ago. Symphony was literally built for distributed architectures, which is exactly how MapReduce is deployed. The short time-to-market for this (relatively — after all, what’s eight months) is because Campbell and his team simply build the APIs on top of Symphony. With their existing tool in place to provide the distributed management and job execution engine, they pile on specific APIs for different job types (PIG, Hadoop, etc.). Users can manage complexity by using the Symphony framework along with those APIs, and on the backside, using connectors to file systems or databases to serve as I/O for MapReduce jobs.

And back to the relatively short process behind this — the company is more or less aggregating interfaces versus tackling the cumbersome mission of rewriting MapReduce like some of the commercial big data companies have done.

In other words, the Symphony was already playing along with the big MapReduce quest to simplify workloads by allowing users to run multiple jobs at a time versus having one job hang out until completion. This could possibly mean a much more nimble big data game for those who — here’s the catch — are under the Symphony license. While the company hasn’t “productized” the new solution yet, it is going to be available within Symphony and is already making its way into financial services organizations.

Campbell asserted that this “rearchitecture of a workload distribution has low latency and operates more like a server than a grid so the workloads that can run on Symphony can run on sub-second time.”

Back to Openness and Scalability…

Remember several paragraphs ago when we hit on the idea that this offering might be something of a game-changer (at least for those with a Symphony license) due to the openness and scalability aspects? Now that there’s sufficient background we can explore that in quick detail. This is where the meat of the announcement is.

The “open” angle is probably the most important differentiator here between the Symphony/MapReduce marriage and other alternatives. As Campbell noted, since this capability “sits in the middle of the stack so that we can open up the architecture on both the front-end application layer and the backend database layer. This means we can let customers move from a complete solution and single vendor or select the application or file systems independently.”

Campbell went on to state that “this technology is getting a lot of investment commercially and in the open source form because it’s compatible with Hadoop and fully supports APIs for MapReduce. Right now, everything is almost always coming from a single vendor top to bottom and when open source comes you can’t take advantage of it. As new file systems get created you can leverage and manage those versus being locked in.”

It is also possible to add APIs specifically for MapReduce logic so there is integration of Hadoop, PIG, HIVE and others, as more programming frameworks are likely to emerge over the next several months. Platform’s big story here on the openness front is that when something new comes down the pike, users will actually be able to put it into production versus facing lock-in with very high barriers to moving over.

On that note, the architecture is designed, as noted before, without the requirement for using HDFS as the end-all file system. Users will be able to select file systems based on their specific needs while still maintaining their application type, which might, for example be written in Hadoop.

In terms of scalability, Campbell affirmed that they will be able to manage thousands or even millions of files varying in size in a short period of time via the proven, existing Symphony product.

On this note, users could get higher resource utilization since they’re getting more than one distributed job at a time — they can have multiple running simultaneously which is unique for MapReduce. This is an important element for HPC folk who are performance conscious because, as Campbell explained, they’ve “eliminated a big issue in terms of startup time on the mappers so single jobs can be fast but overall time also goes way down because it’s not a serial thing any longer; we are running many jobs in parallel across a set of jobs.”

When asked about how this Symphony and MapReduce marriage will meld into the HPC user camp, Campbell noted traction in the government and life sciences spheres as well as the more predictable arenas like financial services and large-scale analytics.

He said that while this could represent an improvement for users, there was no core engineering behind the effort, it’s been a matter of engineering interfaces to support the MapReduce logic. “We can react to the market,” he declared. “If someone creates another end user application for MapReduce we can simply interface to it.”

As big data gets bigger and more companies come calling for management and data crunching, there’s little doubt Platform’s interface builders will be working overtime.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

The EU Human Brain Project Reboots but Supercomputing Still Needed

June 26, 2017

The often contentious, EU-funded Human Brain Project whose initial aim was fixed firmly on full-brain simulation is now in the midst of a reboot targeting a more modest goal – development of informatics tools and data/ Read more…

By John Russell

DOE Launches Chicago Quantum Exchange

June 26, 2017

While many of us were preoccupied with ISC 2017 last week, the launch of the Chicago Quantum Exchange went largely unnoticed. So what is such a thing? It is a Department of Energy sponsored collaboration between the Univ Read more…

By John Russell

UMass Dartmouth Reports on HPC Day 2017 Activities

June 26, 2017

UMass Dartmouth's Center for Scientific Computing & Visualization Research (CSCVR) organized and hosted the third annual "HPC Day 2017" on May 25th. This annual event showcases on-going scientific research in Massach Read more…

By Gaurav Khanna

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “pre-exascale” award), parsed out additional information ab Read more…

By Tiffany Trader

HPE Extreme Performance Solutions

Creating a Roadmap for HPC Innovation at ISC 2017

In an era where technological advancements are driving innovation to every sector, and powering major economic and scientific breakthroughs, high performance computing (HPC) is crucial to tackle the challenges of today and tomorrow. Read more…

Tsinghua Crowned Eight-Time Student Cluster Champions at ISC

June 22, 2017

Always a hard-fought competition, the Student Cluster Competition awards were announced Wednesday, June 21, at the ISC High Performance Conference 2017. Amid whoops and hollers from the crowd, Thomas Sterling presented t Read more…

By Kim McMahon

GPUs, Power9, Figure Prominently in IBM’s Bet on Weather Forecasting

June 22, 2017

IBM jumped into the weather forecasting business roughly a year and a half ago by purchasing The Weather Company. This week at ISC 2017, Big Blue rolled out plans to push deeper into climate science and develop more gran Read more…

By John Russell

Intersect 360 at ISC: HPC Industry at $44B by 2021

June 22, 2017

The care, feeding and sustained growth of the HPC industry increasingly is in the hands of the commercial market sector – in particular, it’s the hyperscale companies and their embrace of AI and deep learning – tha Read more…

By Doug Black

At ISC – Goh on Go: Humans Can’t Scale, the Data-Centric Learning Machine Can

June 22, 2017

I've seen the future this week at ISC, it’s on display in prototype or Powerpoint form, and it’s going to dumbfound you. The future is an AI neural network designed to emulate and compete with the human brain. In thi Read more…

By Doug Black

DOE Launches Chicago Quantum Exchange

June 26, 2017

While many of us were preoccupied with ISC 2017 last week, the launch of the Chicago Quantum Exchange went largely unnoticed. So what is such a thing? It is a D Read more…

By John Russell

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “ Read more…

By Tiffany Trader

Tsinghua Crowned Eight-Time Student Cluster Champions at ISC

June 22, 2017

Always a hard-fought competition, the Student Cluster Competition awards were announced Wednesday, June 21, at the ISC High Performance Conference 2017. Amid wh Read more…

By Kim McMahon

GPUs, Power9, Figure Prominently in IBM’s Bet on Weather Forecasting

June 22, 2017

IBM jumped into the weather forecasting business roughly a year and a half ago by purchasing The Weather Company. This week at ISC 2017, Big Blue rolled out pla Read more…

By John Russell

Intersect 360 at ISC: HPC Industry at $44B by 2021

June 22, 2017

The care, feeding and sustained growth of the HPC industry increasingly is in the hands of the commercial market sector – in particular, it’s the hyperscale Read more…

By Doug Black

At ISC – Goh on Go: Humans Can’t Scale, the Data-Centric Learning Machine Can

June 22, 2017

I've seen the future this week at ISC, it’s on display in prototype or Powerpoint form, and it’s going to dumbfound you. The future is an AI neural network Read more…

By Doug Black

Cray Brings AI and HPC Together on Flagship Supers

June 20, 2017

Cray took one more step toward the convergence of big data and high performance computing (HPC) today when it announced that it’s adding a full suite of big d Read more…

By Alex Woodie

AMD Charges Back into the Datacenter and HPC Workflows with EPYC Processor

June 20, 2017

AMD is charging back into the enterprise datacenter and select HPC workflows with its new EPYC 7000 processor line, code-named Naples, announced today at a “g Read more…

By John Russell

Quantum Bits: D-Wave and VW; Google Quantum Lab; IBM Expands Access

March 21, 2017

For a technology that’s usually characterized as far off and in a distant galaxy, quantum computing has been steadily picking up steam. Just how close real-wo Read more…

By John Russell

Trump Budget Targets NIH, DOE, and EPA; No Mention of NSF

March 16, 2017

President Trump’s proposed U.S. fiscal 2018 budget issued today sharply cuts science spending while bolstering military spending as he promised during the cam Read more…

By John Russell

HPC Compiler Company PathScale Seeks Life Raft

March 23, 2017

HPCwire has learned that HPC compiler company PathScale has fallen on difficult times and is asking the community for help or actively seeking a buyer for its a Read more…

By Tiffany Trader

Google Pulls Back the Covers on Its First Machine Learning Chip

April 6, 2017

This week Google released a report detailing the design and performance characteristics of the Tensor Processing Unit (TPU), its custom ASIC for the inference Read more…

By Tiffany Trader

CPU-based Visualization Positions for Exascale Supercomputing

March 16, 2017

In this contributed perspective piece, Intel’s Jim Jeffers makes the case that CPU-based visualization is now widely adopted and as such is no longer a contrarian view, but is rather an exascale requirement. Read more…

By Jim Jeffers, Principal Engineer and Engineering Leader, Intel

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Nvidia’s Mammoth Volta GPU Aims High for AI, HPC

May 10, 2017

At Nvidia's GPU Technology Conference (GTC17) in San Jose, Calif., this morning, CEO Jensen Huang announced the company's much-anticipated Volta architecture a Read more…

By Tiffany Trader

Facebook Open Sources Caffe2; Nvidia, Intel Rush to Optimize

April 18, 2017

From its F8 developer conference in San Jose, Calif., today, Facebook announced Caffe2, a new open-source, cross-platform framework for deep learning. Caffe2 is the successor to Caffe, the deep learning framework developed by Berkeley AI Research and community contributors. Read more…

By Tiffany Trader

Leading Solution Providers

MIT Mathematician Spins Up 220,000-Core Google Compute Cluster

April 21, 2017

On Thursday, Google announced that MIT math professor and computational number theorist Andrew V. Sutherland had set a record for the largest Google Compute Engine (GCE) job. Sutherland ran the massive mathematics workload on 220,000 GCE cores using preemptible virtual machine instances. Read more…

By Tiffany Trader

Google Debuts TPU v2 and will Add to Google Cloud

May 25, 2017

Not long after stirring attention in the deep learning/AI community by revealing the details of its Tensor Processing Unit (TPU), Google last week announced the Read more…

By John Russell

US Supercomputing Leaders Tackle the China Question

March 15, 2017

Joint DOE-NSA report responds to the increased global pressures impacting the competitiveness of U.S. supercomputing. Read more…

By Tiffany Trader

Russian Researchers Claim First Quantum-Safe Blockchain

May 25, 2017

The Russian Quantum Center today announced it has overcome the threat of quantum cryptography by creating the first quantum-safe blockchain, securing cryptocurrencies like Bitcoin, along with classified government communications and other sensitive digital transfers. Read more…

By Doug Black

Groq This: New AI Chips to Give GPUs a Run for Deep Learning Money

April 24, 2017

CPUs and GPUs, move over. Thanks to recent revelations surrounding Google’s new Tensor Processing Unit (TPU), the computing world appears to be on the cusp of Read more…

By Alex Woodie

DOE Supercomputer Achieves Record 45-Qubit Quantum Simulation

April 13, 2017

In order to simulate larger and larger quantum systems and usher in an age of “quantum supremacy,” researchers are stretching the limits of today’s most advanced supercomputers. Read more…

By Tiffany Trader

Messina Update: The US Path to Exascale in 16 Slides

April 26, 2017

Paul Messina, director of the U.S. Exascale Computing Project, provided a wide-ranging review of ECP’s evolving plans last week at the HPC User Forum. Read more…

By John Russell

Six Exascale PathForward Vendors Selected; DoE Providing $258M

June 15, 2017

The much-anticipated PathForward awards for hardware R&D in support of the Exascale Computing Project were announced today with six vendors selected – AMD Read more…

By John Russell

  • arrow
  • Click Here for More Headlines
  • arrow
Share This