The Revolution in Real-Time Analytics

By Neal Ekkers

May 4, 2018

For decades, the trajectory of data analytics has moved in one simple direction – toward bigger and faster. But the journey to get there has been anything but simple.

At heart, this is the story told in the short book Spark for Dummies, by Robert D. Schneider.

One of the main characters in the story of data analytics is, of course, the data itself – Big Data. It’s growing exponentially and it comes from almost everywhere – phone calls, e-mail, social media, and online shopping, to name only a few. Even driving a car generates data: a new Ford Fusion plug-in hybrid produces 25 GB every hour. In fact, according to Forbes, by the year 2020 about 1.7 megabytes of new information will be created every second for every human being on the planet.

But a key concept to keep in mind is that the growth of data is not the real issue. If all we want to do is store data, plenty of very inexpensive storage is available on tape. Back in April 2015, IBM scientists had already demonstrated a density of 123 billion bits of uncompressed data per square inch on particulate magnetic tape, the equivalent of a 220 terabyte tape cartridge that could fit in the palm of your hand.[1]

Instead, the real driver of data analytics evolution is the simple desire for competitive advantage. Consider the benefits that Big Data analytics brings to all kinds of industries:

  • Financial services:
    • Gain deeper knowledge about customers
    • Discover fraudulent activities
    • Offer new, innovative products and services
    • Make better – and faster – trading decisions
  • Telecommunications:
    • Deliver higher quality service
    • Quickly identify and correct network anomalies
    • Make informed decisions about capital investments
    • Offer highly tailored packages to retain more customers 
  • Retail:
    • Offer smarter up‐sell and cross‐sell recommendations
    • Get a better picture of overall purchasing trends
    • Set optimal pricing and discounts
    • Monitor social media to spot satisfied – or disgruntled – customers

The list is essentially endless, but you get the picture. The question of how to derive the most value possible from Big Data has occupied the attention of some rather impressive organizations. In 2004, Google decided to harness the power of parallel, distributed computing to help digest the enormous amounts of data produced during daily operations. The result was a group of technologies and architectural design philosophies that came to be known as MapReduce, an approach to Big Data analytics built on the proven concept of divide and conquer by using distributed computing and parallel processing. It’s much faster to break a massive task into smaller chunks, allocate them to multiple servers, and process them in parallel.
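The divide-and-conquer idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Google's or Hadoop's implementation: the function names (map_chunk, merge_counts, word_count) are hypothetical, and local threads stand in for the separate servers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, workers=4, chunk_size=2):
    """Split the input into chunks, map them in parallel, then reduce the partial results."""
    # Divide: break the large task into smaller chunks.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Conquer: hand each chunk to a worker (threads here, servers in a real cluster).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Combine: fold the partial counts into a single result.
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["big data", "fast data", "big fast big"]
    print(word_count(sample))
```

Because each chunk is processed without reference to the others, adding workers lets more chunks run at the same time, which is where the speedup described above comes from.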

MapReduce was a great start, but it required a significant amount of developer and technology resources to make it work. That wasn’t feasible for most enterprises, and the relative complexity led to the advent of Hadoop, a popular, standards‐based, open‐source software framework built on the foundation of MapReduce. Hadoop leverages the power of massively parallel processing to take advantage of Big Data, generally by using a lot of inexpensive commodity servers.

In typical fashion, the more we gain, the more we want. The MapReduce / Hadoop paradigm is based on batch processing – amassing large volumes of data, then running it all at once to get results. Though this approach is powerful for many use cases, what if we want results right now, not tomorrow or next week after the batch job runs? Enter Apache Spark, the Big Data solution for real-time analytics.
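The difference between the two styles can be shown with a toy example. The sketch below is plain Python, not Spark code, and its names (batch_average, RunningAverage) are hypothetical. The batch function needs the full data set before it can answer, while the incremental version has an up-to-date answer after every new value.

```python
def batch_average(values):
    """Batch style: wait until all the data has been collected, then compute once."""
    return sum(values) / len(values)


class RunningAverage:
    """Incremental style: refresh the answer each time a new value arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        # Fold the new value into the running state and return the current average.
        self.count += 1
        self.total += value
        return self.total / self.count


if __name__ == "__main__":
    readings = [10, 20, 30]

    # Batch: one answer, available only after the last reading is in.
    print(batch_average(readings))

    # Incremental: an answer is available after every reading.
    average = RunningAverage()
    for reading in readings:
        print(average.update(reading))
```

Both approaches end at the same final figure. What changes is how soon a usable result is available.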

Apache Spark represents a revolutionary new approach to designing, developing, and distributing solutions capable of processing Big Data for real-time results. Spark offers several advantages for developing Big Data solutions, including higher performance, greater simplicity, easier administration, and faster application development. 

Because most Big Data analytics solutions such as Spark are composed of numerous open‐source components, assembling a stable, scalable, manageable environment isn’t straightforward. An integrated solution from a vendor provides a single point of contact to help get your Big Data infrastructure up and running – and to keep it running if you have problems. IBM has made enormous contributions and investments in open‐source Spark. To complement these efforts, IBM also created IBM Spectrum Conductor,[2] an all‐inclusive, turnkey commercial distribution that delivers all of Spark’s advantages, while making it easier for enterprises to build and operate Spark-based solutions.

IBM Spectrum Conductor, a member of the IBM Spectrum Computing family of software-defined solutions, enables organizations to accelerate business insights from all their data by leveraging the most current scale-out applications, open source frameworks, in-memory analytics, NoSQL databases, cloud-native application architectures, and container environments. IBM Spectrum Conductor offers significant advantages over Hadoop. It provides a more powerful resource scheduler that’s been proven in some of the world’s most demanding customer environments, as well as monitoring, reporting, diagnostics, and workload management tools, all managed from a single user interface. And don’t underestimate the value of IBM services and support.

Spark for Dummies provides many pages of explanations about why Spark-driven real-time analytics solutions are revolutionary for business and how all types of enterprises are successfully implementing Spark-based solutions leveraging the advantages of IBM Spectrum Conductor. You don’t need to wait for business insights; IBM Spectrum Conductor can help you gain competitive advantage today.

[1] IBM Press Release: IBM Research Sets New Record for Tape Storage, April 2015 https://www-03.ibm.com/press/us/en/pressrelease/46554.wss

[2] Formerly IBM Spectrum Conductor for Spark
