Platform Tunes Symphony for Big Data Deluge

By Nicole Hemsoth

March 31, 2011

The “big data” topic is wending its way into an increasing number of conversations as data volumes mount, stretching computational resources to their limits. This realm of massive datasets is not confined to business intelligence either — it is increasingly becoming a central component of mission-critical enterprise goals.

Accordingly, the last year has produced a swell of news around companies looking to capitalize on the challenges of managing big data via commercial renditions of popular open source products and the emergence of new open frameworks to further develop the landscape. Newer companies like Cloudera, for instance, seek to bring “big data to the masses” via simplified handling of large messy datasets. And now, industry stalwart Platform Computing is hopping aboard the big data express.

To be more specific, Platform announced this week that it’s seeking to provide distributed computing for the MapReduce programming model, which is one of a short list of ways to extract and map the pesky unstructured data and, a la its moniker, reduce that mess into actionable information.
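For readers who have not met the model before, here is a minimal single-process sketch of the map, shuffle and reduce steps, using the familiar word-count exercise. It is an illustration only: the function names are hypothetical, nothing here is Platform's or Hadoop's API, and a real deployment would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(records):
    """Map: turn each line of unstructured text into (word, 1) pairs."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases together (hypothetical helper name)."""
    return reduce_phase(shuffle(map_phase(records)))
```

Calling `word_count(["big data big", "data deluge"])` maps the two lines into word pairs, groups them by word, and reduces each group to a count.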

Cloudera (and a host of open source solutions) are all targeting one big problem. At the heart of challenges for those contending with big data (financial services organizations and large-scale business analytics users, among others) is the matter of structured versus unstructured data. To be clear, however, this isn’t just a single-sided issue; unstructured data can be problematic on several fronts, not the least of which is some warranted concern about being “locked in” to specific management tools for all that information.

Platform and others are right to address this and other problems given the continued proliferation of more of this particularly tricky type of data. As it stands, a vast majority of the data filtering in is in an unstructured format — as much as 80 percent if IDC figures are correct. New programming frameworks have stepped into the fray to help manage this complexity and enable distributed computing on large datasets.

On the storage end, new techniques and file systems like the Hadoop file system (HDFS), which was built to tackle the demands of both structured and unstructured data, have been developed, but in Platform’s view (which we’ll expound on in a moment) this and other models all have some serious weaknesses on one front or another.

An Evolving Platform

For a company that has been in the business of distributed systems for 18 years, this implementation isn’t unexpected. In fact, the only element that does cause some head-scratching is why they took so long to get into the big data boat when much of the needed framework was there.

According to Scott Campbell, Platform’s product manager for enterprise analytics, the process to start adding the tools to “reduce the maps” began around eight months ago even though he noted that the company was seeing some seismic shifts in the analytics sphere over the last few years on the unstructured data front. With massive amounts of data filtering in from any number of new tools, sensors and other collection methods, it was clear that it was becoming impossible to run this data into warehouses or structured databases and there were some serious limitations underlying a number of existing efforts.

Ken Hertzler, vice president of product management for Platform, told us that their customers, especially those on the financial services and analytics side, found that existing big data solutions (including open source tools like Hadoop, companies like Cloudera or data warehousing systems a la Greenplum or Aster Data) had critical flaws. He pointed out that with all of these solutions users might be responsible for managing the software stack (if using open source) and would thus need to increase internal expertise as well as perform regular maintenance to keep big data projects churning.

Another big problem that Hertzler highlighted is that open source solutions rely solely on the HDFS file system, while those who try to avoid this perceived “trap” by going with a data warehousing alternative get a top-to-bottom product that can be very difficult to extract oneself from.

This isn’t just Hertzler’s own opinion, either; he stated that customers all felt that the alternatives for big data management did a great job of managing the query side of their needs but that they failed on the enterprise-class or production-ready level. He revealed that the main gripes were about poor application compatibility, the lock-in issue, maintaining utilization and SLAs, and concerns about having data on multiple distributed cloud storage systems.

Platform’s distributed MapReduce workload manager and job execution engine is, as both Hertzler and Campbell emphasized repeatedly, enterprise-ready and far more viable due to two key traits in particular: openness and scalability.

The keywords “open” and “scalable” are ferried about in nearly every technological context these days — almost to the point that their meanings are sometimes overlooked. Campbell explained in depth these two angles to highlight how Platform is doing something that isn’t available with the other management alternatives.

The openness and scalability angles are somewhat interesting but require a bit of setting up, more specifically by putting Platform’s announcement in the context of its Symphony product.

This MapReduce capability has been integrated into Platform Symphony, which is something of an SOA approach to workload distribution, in contrast to the company’s other widely-used LSF product, which works from a batch-oriented architecture. Why is this important, you ask…

Well, to take yet another step backwards, the Symphony approach for workload distribution and management is actually a natural fit for what Platform just got around to eight months ago. Symphony was literally built for distributed architectures, which is exactly how MapReduce is deployed. The relatively short time-to-market (after all, what’s eight months?) is because Campbell and his team simply built the APIs on top of Symphony. With their existing tool in place to provide the distributed management and job execution engine, they pile on specific APIs for different job types (PIG, Hadoop, etc.). Users can manage complexity by using the Symphony framework along with those APIs and, on the backside, using connectors to file systems or databases to serve as I/O for MapReduce jobs.

And back to the relatively short process behind this — the company is more or less aggregating interfaces versus tackling the cumbersome mission of rewriting MapReduce like some of the commercial big data companies have done.

In other words, the Symphony was already playing along with the big MapReduce quest to simplify workloads by allowing users to run multiple jobs at a time versus having one job hang out until completion. This could possibly mean a much more nimble big data game for those who — here’s the catch — are under the Symphony license. While the company hasn’t “productized” the new solution yet, it is going to be available within Symphony and is already making its way into financial services organizations.

Campbell asserted that this “rearchitecture of a workload distribution has low latency and operates more like a server than a grid so the workloads that can run on Symphony can run on sub-second time.”

Back to Openness and Scalability…

Remember several paragraphs ago when we hit on the idea that this offering might be something of a game-changer (at least for those with a Symphony license) due to the openness and scalability aspects? Now that there’s sufficient background we can explore that in quick detail. This is where the meat of the announcement is.

The “open” angle is probably the most important differentiator here between the Symphony/MapReduce marriage and other alternatives. As Campbell noted, this capability “sits in the middle of the stack so that we can open up the architecture on both the front-end application layer and the backend database layer. This means we can let customers move from a complete solution and single vendor or select the application or file systems independently.”

Campbell went on to state that “this technology is getting a lot of investment commercially and in the open source form because it’s compatible with Hadoop and fully supports APIs for MapReduce. Right now, everything is almost always coming from a single vendor top to bottom and when open source comes you can’t take advantage of it. As new file systems get created you can leverage and manage those versus being locked in.”

It is also possible to add APIs specifically for MapReduce logic so there is integration of Hadoop, PIG, HIVE and others, as more programming frameworks are likely to emerge over the next several months. Platform’s big story here on the openness front is that when something new comes down the pike, users will actually be able to put it into production versus facing lock-in with very high barriers to moving over.

On that note, the architecture is designed, as noted before, without the requirement for using HDFS as the end-all file system. Users will be able to select file systems based on their specific needs while still maintaining their application type, which might, for example, be written for Hadoop.
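The idea of keeping the application fixed while swapping the storage layer can be sketched in a few lines. Everything below is an assumption made for illustration: `StorageConnector`, `InMemoryConnector`, `LocalFileConnector` and `count_records` are hypothetical names, and none of this reflects how Symphony's connectors are actually written.

```python
from typing import Iterable, Iterator, Protocol


class StorageConnector(Protocol):
    """Hypothetical interface: anything that can hand back lines of input."""

    def read_lines(self) -> Iterable[str]: ...


class InMemoryConnector:
    """Stand-in backend that serves records from a Python list."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def read_lines(self) -> Iterator[str]:
        return iter(self._lines)


class LocalFileConnector:
    """Stand-in backend that serves records from a local text file."""

    def __init__(self, path: str) -> None:
        self._path = path

    def read_lines(self) -> Iterator[str]:
        with open(self._path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


def count_records(connector: StorageConnector) -> int:
    """Application logic: counts non-blank records without naming a backend."""
    return sum(1 for line in connector.read_lines() if line.strip())
```

Because `count_records` depends only on the small interface, moving from one backend to another means passing in a different connector object rather than rewriting the job.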

In terms of scalability, Campbell affirmed that they will be able to manage thousands or even millions of files varying in size in a short period of time via the proven, existing Symphony product.

On this note, users could get higher resource utilization since they’re getting more than one distributed job at a time — they can have multiple running simultaneously which is unique for MapReduce. This is an important element for HPC folk who are performance conscious because, as Campbell explained, they’ve “eliminated a big issue in terms of startup time on the mappers so single jobs can be fast but overall time also goes way down because it’s not a serial thing any longer; we are running many jobs in parallel across a set of jobs.”
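The difference between waiting on one job at a time and keeping several in flight can be shown with Python's standard library. This is a toy sketch under stated assumptions: `run_job`, `run_serially` and `run_concurrently` are hypothetical names, a thread pool stands in for a cluster scheduler, and it says nothing about how Symphony dispatches work or about the speedups Campbell describes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job):
    """Toy job: sum the squares of the numbers attached to a job name."""
    name, numbers = job
    return name, sum(n * n for n in numbers)


def run_serially(jobs):
    """Each job waits for the previous one to finish."""
    return dict(run_job(job) for job in jobs)


def run_concurrently(jobs, max_workers=4):
    """Several jobs are submitted at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_job, jobs))
```

Both functions return the same results; only the scheduling shape differs. In CPython, threads will not speed up CPU-bound work like this, so the sketch illustrates the pattern rather than any performance gain.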

When asked about how this Symphony and MapReduce marriage will meld into the HPC user camp, Campbell noted traction in the government and life sciences spheres as well as the more predictable arenas like financial services and large-scale analytics.

He said that while this could represent an improvement for users, there was no core engineering behind the effort; it has been a matter of engineering interfaces to support the MapReduce logic. “We can react to the market,” he declared. “If someone creates another end user application for MapReduce we can simply interface to it.”

As big data gets bigger and more companies come calling for management and data crunching, there’s little doubt Platform’s interface builders will be working overtime.
