Platform Tunes Symphony for Big Data Deluge

By Nicole Hemsoth

March 31, 2011

The “big data” topic is wending its way into an increasing number of conversations as data volumes mount, stretching computational resources to their limits. This realm of massive datasets is not confined to business intelligence either — it is increasingly becoming a central component of mission-critical enterprise goals.

Accordingly, the last year has produced a swell of news around companies looking to capitalize on the challenges of managing big data via commercial renditions of popular open source products and the emergence of new open frameworks to further develop the landscape. Newer companies like Cloudera, for instance, seek to bring “big data to the masses” via simplified handling of large messy datasets. And now, industry stalwart Platform Computing is hopping aboard the big data express.

To be more specific, Platform announced this week that it is seeking to provide distributed computing for the MapReduce programming model, one of a short list of ways to extract and map pesky unstructured data and, as its name suggests, reduce that mess into actionable information.
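The article doesn't describe Platform's internals, but the general MapReduce pattern it refers to can be illustrated with a minimal, single-process Python sketch. It is a textbook word count, not Platform's or Hadoop's implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. The map step turns unstructured text into key/value pairs, a shuffle groups them by key, and the reduce step collapses each group into a single result.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit a (word, 1) pair for every whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as a framework would between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Collapse each key's values into a single summary value.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big messy data", "data"]))
    # {'big': 2, 'data': 3, 'messy': 1}
```

In a real deployment the map and reduce calls run on many machines and the shuffle moves data across the network; the single-process version only shows the data flow.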

Cloudera and a host of open source projects are all targeting one big problem. At the heart of the challenges facing those who contend with big data (financial services organizations and large-scale business analytics users, among others) is the matter of structured versus unstructured data. To be clear, however, this isn’t a single-sided issue; unstructured data can be problematic on several fronts, not least of which is a warranted concern about being “locked in” to specific management tools for all that information.

Platform and others are right to address this and other problems given the continued proliferation of this particularly tricky type of data. As it stands, the vast majority of incoming data is unstructured, as much as 80 percent if IDC figures are correct. New programming frameworks have stepped into the fray to help manage this complexity and enable distributed computing on large datasets.

On the storage end, new techniques and file systems such as the Hadoop Distributed File System (HDFS), which was built to handle both structured and unstructured data, have been developed, but in Platform’s view (which we’ll expound on in a moment) these and other models all have serious weaknesses on one front or another.

An Evolving Platform

For a company that has been in the distributed systems business for 18 years, this move isn’t unexpected. In fact, the only element that causes some head-scratching is why it took so long to board the big data boat when much of the needed framework was already in place.

According to Scott Campbell, Platform’s product manager for enterprise analytics, work on adding the tools to “reduce the maps” began around eight months ago, although he noted that the company had been seeing seismic shifts in the analytics sphere on the unstructured data front over the last few years. With massive amounts of data flowing in from any number of new tools, sensors and other collection methods, it was clear that funneling all of it into warehouses or structured databases was becoming impossible, and that a number of existing efforts had serious underlying limitations.

Ken Hertzler, vice president of product management for Platform, told us that the company’s customers, especially those on the financial services and analytics side, found that existing big data solutions (including open source tools like Hadoop, companies like Cloudera, or data warehousing systems such as Greenplum or Aster Data) had critical flaws. He pointed out that with open source options, users may be responsible for managing the software stack themselves and would therefore need to build internal expertise and perform regular maintenance to keep big data projects churning.

Another big problem Hertzler highlighted is that open source solutions rely solely on HDFS, while those who try to avoid this perceived “trap” by choosing a data warehousing alternative end up with a top-to-bottom product that can be very difficult to extract oneself from.

This isn’t just coming from Hertzler’s own opinion well; he stated that customers felt the alternatives for big data management did a great job on the query side of their needs but fell short at the enterprise-class, production-ready level. The main gripes, he said, were poor application compatibility, lock-in, maintaining utilization and SLAs, and concerns about having data spread across multiple distributed cloud storage systems.

Platform’s distributed MapReduce workload manager and job execution engine is, as both Hertzler and Campbell emphasized repeatedly, enterprise-ready and far more viable due to two key traits in particular: openness and scalability.

The keywords “open” and “scalable” are ferried about in nearly every technological context these days, almost to the point that their meanings are sometimes overlooked. Campbell explained these two angles in depth to highlight how Platform is doing something that isn’t available with the other management alternatives.

The openness and scalability angles are interesting but require a bit of setup, namely placing Platform’s announcement in the context of its Symphony product.

This MapReduce capability has been integrated into Platform Symphony, which takes something of an SOA approach to workload distribution, in contrast to the company’s other widely used product, LSF, which works from a batch-oriented architecture. Why is this important, you ask…

Well, to take yet another step back, the Symphony approach to workload distribution and management is actually a natural fit for what Platform only got around to eight months ago. Symphony was built for distributed architectures, which is exactly how MapReduce is deployed. The relatively short time-to-market (after all, what’s eight months?) is because Campbell and his team simply built the APIs on top of Symphony. With their existing tool in place to provide the distributed management and job execution engine, they layer on specific APIs for different job types (Pig, Hadoop, etc.). Users can manage complexity by using the Symphony framework along with those APIs and, on the back end, by using connectors to file systems or databases to serve as I/O for MapReduce jobs.
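The layering Campbell describes, with an execution engine in the middle and pluggable connectors serving as job I/O, can be sketched as follows. This is a hypothetical illustration in Python, not Symphony’s API: `StorageConnector`, `InMemoryConnector`, `LocalFileConnector` and `run_job` are invented names, and the two concrete classes merely stand in for connectors to HDFS, other file systems, or databases.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, Iterable, Iterator

Mapper = Callable[[str], Iterable[tuple]]
Reducer = Callable[[object, list], object]


class StorageConnector(ABC):
    """Hypothetical back-end interface: the engine sees records, not storage."""

    @abstractmethod
    def read(self) -> Iterator[str]: ...

    @abstractmethod
    def write(self, results: dict) -> None: ...


class InMemoryConnector(StorageConnector):
    # Stand-in for any back end; keeps input and output in Python objects.
    def __init__(self, records: list[str]) -> None:
        self.records = records
        self.output: dict = {}

    def read(self) -> Iterator[str]:
        yield from self.records

    def write(self, results: dict) -> None:
        self.output = dict(results)


class LocalFileConnector(StorageConnector):
    # Reads one record per line; writes "key<TAB>value" lines.
    def __init__(self, in_path: str, out_path: str) -> None:
        self.in_path, self.out_path = in_path, out_path

    def read(self) -> Iterator[str]:
        with open(self.in_path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

    def write(self, results: dict) -> None:
        with open(self.out_path, "w", encoding="utf-8") as f:
            for key, value in results.items():
                f.write(f"{key}\t{value}\n")


def run_job(mapper: Mapper, reducer: Reducer, connector: StorageConnector) -> None:
    # The engine only calls read() and write(), so the same mapper/reducer
    # pair runs unchanged against whichever connector is plugged in.
    groups: dict = defaultdict(list)
    for record in connector.read():
        for key, value in mapper(record):
            groups[key].append(value)
    connector.write({k: reducer(k, v) for k, v in groups.items()})


if __name__ == "__main__":
    store = InMemoryConnector(["big data", "big messy data"])
    run_job(lambda line: ((w, 1) for w in line.split()), lambda k, vs: sum(vs), store)
    print(store.output)  # {'big': 2, 'data': 2, 'messy': 1}
```

Because `run_job` depends only on the abstract interface, swapping storage back ends leaves the application logic untouched, which is the kind of independence from a single file system the article attributes to Platform’s design.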

And back to the relatively short process behind this: the company is more or less aggregating interfaces rather than tackling the cumbersome mission of rewriting MapReduce, as some of the commercial big data companies have done.

In other words, Symphony was already playing along with the broader MapReduce quest to simplify workloads by allowing users to run multiple jobs at a time rather than having one job sit until completion. This could mean a much more nimble big data game for those who (here’s the catch) hold a Symphony license. While the company hasn’t “productized” the new solution yet, it will be available within Symphony and is already making its way into financial services organizations.

Campbell asserted that this “rearchitecture of a workload distribution has low latency and operates more like a server than a grid so the workloads that can run on Symphony can run on sub-second time.”

Back to Openness and Scalability…

Remember several paragraphs ago when we hit on the idea that this offering might be something of a game-changer (at least for those with a Symphony license) due to its openness and scalability? Now that there’s sufficient background, we can explore that briefly. This is where the meat of the announcement is.

The “open” angle is probably the most important differentiator between the Symphony/MapReduce marriage and other alternatives. As Campbell noted, this capability “sits in the middle of the stack so that we can open up the architecture on both the front-end application layer and the backend database layer. This means we can let customers move from a complete solution and single vendor or select the application or file systems independently.”

Campbell went on to state that “this technology is getting a lot of investment commercially and in the open source form because it’s compatible with Hadoop and fully supports APIs for MapReduce. Right now, everything is almost always coming from a single vendor top to bottom and when open source comes you can’t take advantage of it. As new file systems get created you can leverage and manage those versus being locked in.”

It is also possible to add APIs specifically for MapReduce logic, so Hadoop, Pig, Hive and others can be integrated, and more programming frameworks are likely to emerge over the next several months. Platform’s big story on the openness front is that when something new comes down the pike, users will be able to put it into production rather than facing lock-in with very high barriers to moving over.

On that note, as mentioned earlier, the architecture is designed so that HDFS is not required as the end-all file system. Users will be able to select file systems based on their specific needs while keeping their application type, which might, for example, be written for Hadoop.

In terms of scalability, Campbell affirmed that they will be able to manage thousands or even millions of files varying in size in a short period of time via the proven, existing Symphony product.

On this note, users could get higher resource utilization since they aren’t limited to one distributed job at a time; they can have multiple jobs running simultaneously, which is unique for MapReduce. This is an important element for performance-conscious HPC folk because, as Campbell explained, they’ve “eliminated a big issue in terms of startup time on the mappers so single jobs can be fast but overall time also goes way down because it’s not a serial thing any longer; we are running many jobs in parallel across a set of jobs.”
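The utilization argument rests on running several jobs at once instead of one after another. The following Python sketch assumes a fixed per-job startup cost (simulated with `time.sleep`, standing in for mapper startup) and a hypothetical pool of four slots; it is not Symphony’s scheduler, only an illustration of why overall wall-clock time drops when independent jobs share resources concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_job(job_id: int, startup_s: float = 0.2) -> str:
    # Hypothetical job: a fixed startup delay dominates its runtime.
    time.sleep(startup_s)
    return f"job-{job_id} done"


def run_serial(job_ids: list[int]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = [simulated_job(j) for j in job_ids]  # each waits for the previous
    return results, time.perf_counter() - start


def run_concurrent(job_ids: list[int], slots: int = 4) -> tuple[list[str], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # pool.map runs jobs side by side and returns results in input order.
        results = list(pool.map(simulated_job, job_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [1, 2, 3, 4]
    _, serial_t = run_serial(jobs)
    _, concurrent_t = run_concurrent(jobs)
    print(f"serial: {serial_t:.2f}s, concurrent: {concurrent_t:.2f}s")
```

Threads suffice here because the simulated work is only waiting; CPU-bound jobs would need separate processes or machines to overlap in the same way.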

When asked about how this Symphony and MapReduce marriage will meld into the HPC user camp, Campbell noted traction in the government and life sciences spheres as well as the more predictable arenas like financial services and large-scale analytics.

He said that while this could represent an improvement for users, there was no core engineering behind the effort; it has been a matter of engineering interfaces to support the MapReduce logic. “We can react to the market,” he declared. “If someone creates another end user application for MapReduce we can simply interface to it.”

As big data gets bigger and more companies come calling for management and data crunching, there’s little doubt Platform’s interface builders will be working overtime.
