The Revolution in Real-Time Analytics

By Neal Ekkers

May 4, 2018

For decades, the trajectory of data analytics has moved in one simple direction – toward bigger and faster. But the journey to get there has been anything but simple.

At heart, this is the story told by the short book, Spark for Dummies, by Robert D. Schneider. 

One of the main characters in the story of data analytics is of course the data itself: Big Data. It’s growing exponentially and it comes from almost everywhere, including phone calls, e-mail, social media, and online shopping, to name only a few. Even driving a car produces data: a new Ford Fusion plug-in hybrid generates 25GB of data every hour. In fact, according to Forbes, by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

But a key concept to keep in mind is that the growth of data is not the real issue. If all we want to do is store data, then there’s plenty of very inexpensive storage available on tape. Back in April of 2015, IBM scientists had already demonstrated densities of 123 billion bits of uncompressed data per square inch on particulate magnetic tape, the equivalent of a 220 terabyte tape cartridge that could fit in the palm of your hand.[1]

Instead, the real driver of data analytics evolution is the simple desire for competitive advantage. Consider the benefits that Big Data analytics brings to all kinds of industries:

  • Financial services:
    • Gain deeper knowledge about customers
    • Discover fraudulent activities
    • Offer new, innovative products and services
    • Make better – and faster – trading decisions
  • Telecommunications:
    • Deliver higher quality service
    • Quickly identify and correct network anomalies
    • Make informed decisions about capital investments
    • Offer highly tailored packages to retain more customers 
  • Retail:
    • Offer smarter up‐sell and cross‐sell recommendations
    • Get a better picture of overall purchasing trends
    • Set optimal pricing and discounts
    • Monitor social media to spot satisfied — or disgruntled — customers

The list is essentially endless, but you get the picture. The question of how to derive the most value possible from Big Data has occupied the attention of some rather impressive organizations. In 2004, Google decided to harness the power of parallel, distributed computing to help digest the enormous amounts of data produced during daily operations. The result was a group of technologies and architectural design philosophies that came to be known as MapReduce, an approach to Big Data analytics built on the proven concept of divide and conquer by using distributed computing and parallel processing. It’s much faster to break a massive task into smaller chunks, allocate them to multiple servers, and process them in parallel.
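The divide-and-conquer idea can be sketched in a few lines of Python. This is only an illustration, not Google's implementation or Hadoop's API: the function names (`map_chunk`, `shuffle`, `reduce_groups`, `word_count`) are hypothetical, word counting is chosen as an assumed example task, and a local thread pool stands in for a cluster of servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, workers=3):
    """Split the input into chunks, map the chunks concurrently, then reduce."""
    chunk_size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # The thread pool is a stand-in for separate servers (an assumption of
    # this sketch); a real cluster would ship each chunk to a different node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data", "Big servers", "data data"]
    print(word_count(sample))
```

Each chunk is mapped independently, which is what lets the work spread across many machines; only the shuffle and reduce steps need to see results from more than one chunk.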

MapReduce was a great start, but it required a significant amount of developer and technology resources to make it work. This wasn’t feasible for most enterprises, and the relative complexity led to the advent of Hadoop, a popular, standards‐based, open‐source software framework built on the foundation of MapReduce. Hadoop leverages the power of massive parallel processing to take advantage of Big Data, generally by using a lot of inexpensive commodity servers.

In typical fashion, the more we gain, the more we want. The MapReduce / Hadoop paradigm is based on batch processing – amassing large volumes of data, then running it all at once to get results. Though this approach is powerful for many use cases, what if we want results right now, not tomorrow or next week after the batch job runs? Enter Apache Spark, the Big Data solution for real-time analytics.
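The difference between the two styles can be shown with a small Python sketch. It is not Spark's API; `batch_average` and `RunningAverage` are hypothetical names, and averaging is an assumed example. The batch function needs the whole data set before it can answer, while the running version has a current answer after every new value.

```python
def batch_average(values):
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)


class RunningAverage:
    """Incremental style: keep an up-to-date average as each value arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        """Fold one new value into the state and return the current average."""
        self.count += 1
        self.total += value
        return self.total / self.count


if __name__ == "__main__":
    readings = [10, 20, 30]

    # Batch: one answer, available only after all readings are collected.
    print(batch_average(readings))

    # Incremental: an answer is available after every reading.
    running = RunningAverage()
    for reading in readings:
        print(running.update(reading))
```

Both approaches end at the same final value; what changes is how soon a usable result is available.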

Apache Spark represents a revolutionary new approach to designing, developing, and distributing solutions capable of processing Big Data for real-time results. Spark offers several advantages for developing Big Data solutions, including higher performance, greater simplicity, easier administration, and faster application development. 

Because most Big Data analytics solutions such as Spark are composed of numerous open‐source components, assembling a stable, scalable, manageable environment isn’t straightforward. An integrated solution from a vendor provides a single point of contact to help get your Big Data infrastructure up and running – and to keep it running if you have problems. IBM has made enormous contributions and investments in open‐source Spark. To complement these efforts, IBM also created IBM Spectrum Conductor,[2] an all‐inclusive, turnkey commercial distribution that delivers all of Spark’s advantages, while making it easier for enterprises to build and operate Spark-based solutions.

IBM Spectrum Conductor, a member of the IBM Spectrum Computing family of software-defined solutions, enables organizations to accelerate business insights from all their data by leveraging the most current scale-out applications, open source frameworks, in-memory analytics, NoSQL databases, cloud-native application architectures, and container environments. IBM Spectrum Conductor offers significant advantages over Hadoop. It provides a more powerful resource scheduler that’s been proven in some of the world’s most demanding customer environments, as well as monitoring, reporting, diagnostics, and workload management tools, all managed from a single user interface. And don’t underestimate the value of IBM services and support.

Spark for Dummies explains in detail why Spark-driven real-time analytics solutions are revolutionary for business and how all types of enterprises are successfully implementing Spark-based solutions that leverage the advantages of IBM Spectrum Conductor. You don’t need to wait for business insights; IBM Spectrum Conductor can help you gain competitive advantage today.

[1] IBM Press Release: IBM Research Sets New Record for Tape Storage, April 2015 https://www-03.ibm.com/press/us/en/pressrelease/46554.wss

[2] Formerly IBM Spectrum Conductor for Spark


