HPC and AI – Two Communities Same Future

By Rob Farber

January 25, 2018

In this contributed feature, Rob Farber lays out Intel Fellow Al Gara’s vision for the unification of the “3 Pillars” of HPC currently underway.

According to Al Gara (Intel Fellow, Data Center Group), high performance computing and artificial intelligence will increasingly intertwine as we transition to an exascale future using new computing, storage, and communications technologies as well as neuromorphic and quantum computing chips. Gara observes that, “The convergence of AI, data analytics and traditional simulation will result in systems with broader capabilities and configurability as well as cross pollination.”

Gara sees very aggressive hardware targets being set for this intertwined HPC and AI future, where the hardware will deliver usable double precision performance exceeding one exaflops (and much more for reduced precision arithmetic). He believes a user focus on computation per memory capacity will pay big dividends across architectures and give systems software and user applications the opportunity to stay on the exponential performance growth curve through exascale and beyond, as shown in the performance table below.

Figure 1: Architectural targets for future systems that will support both HPC and AI. Note: PiB is pebibytes of memory capacity

Unification of the “3 Pillars”

The vision Gara presented is based on a unification of the “3 Pillars” of HPC: Artificial Intelligence (AI) and Machine Learning (ML); Data Analytics and Big Data; plus High Performance Computing (HPC). What this means is that users of the future will program using models that leverage each other and that interact through memory.

Figure 2: Unifying the “3 Pillars” (Source Intel)

More concretely, Intel is working towards exascale systems that are highly configurable and can support upgrades to fundamentally new technologies including scalable processors, accelerators, neural network processors, neuromorphic chips, FPGAs, Intel persistent memory, 3D NAND, and custom hardware.

Figure 3: Working towards a highly configurable future (Source Intel)

The common denominator in Gara’s vision is that the same architecture will cover HPC, AI, and Data Analytics through configuration, which means there needs to be a consistent software story across these different hardware backends to address HPC plus AI workloads.

A very real instantiation of Gara’s vision is happening now through the use of the Intel nGraph library in popular machine learning packages such as TensorFlow. Essentially, the Intel nGraph library is being used as an intermediate language (in a manner analogous to LLVM) that can deliver optimized performance across a variety of hardware platforms from CPUs to FPGAs, dedicated neural network processors, and more.
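The intermediate-language idea can be illustrated with a toy sketch. This is not the nGraph API: the `Op` type, the two backends, and the `run` function are hypothetical names invented for illustration. The sketch only shows how one framework-neutral graph can be executed by interchangeable backends.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """One node of a tiny, framework-neutral graph (hypothetical IR, not nGraph)."""
    name: str                 # operation kind, e.g. "add" or "mul"
    inputs: Tuple[str, ...]   # names of the values this op reads
    output: str               # name of the value this op writes


# A backend maps each operation kind to a concrete implementation.
Backend = Dict[str, Callable[[float, float], float]]

# Two stand-in backends. A real system would target CPUs, FPGAs, and so on.
LAMBDA_BACKEND: Backend = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}
OPERATOR_BACKEND: Backend = {
    "add": operator.add,
    "mul": operator.mul,
}


def run(graph: List[Op], feeds: Dict[str, float], backend: Backend) -> Dict[str, float]:
    """Execute the ops in order, looking up each implementation in the backend."""
    values = dict(feeds)
    for op in graph:
        args = [values[name] for name in op.inputs]
        values[op.output] = backend[op.name](*args)
    return values


# The graph for y = (a + b) * c, written once and independent of any backend.
GRAPH = [Op("add", ("a", "b"), "t"), Op("mul", ("t", "c"), "y")]

if __name__ == "__main__":
    feeds = {"a": 1.0, "b": 2.0, "c": 4.0}
    for backend in (LAMBDA_BACKEND, OPERATOR_BACKEND):
        print(run(GRAPH, feeds, backend)["y"])  # prints 12.0 for both backends
```

The frontend (here, the code that builds `GRAPH`) never needs to know which backend will run it, which is the property that lets optimization passes and new hardware targets be added independently.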

Jason Knight (CTO office, Intel Artificial Intelligence Products Group) writes, “We see the Intel nGraph library as the beginning of an ecosystem of optimization passes, hardware backends and frontend connectors to popular deep learning frameworks.”

Figure 4: XLA support for TensorFlow

Overall, Gara noted that “HPC is truly the birthplace of many architectures … and the testing ground” as HPC programmers, researchers, and domain scientists explore the architectural space and map the performance landscape:

  • Data level parallel (from fine grain to coarse grain)
  • Energy efficient accelerators (compute density and energy efficiency often are correlated)
  • Exploiting predictable execution at all levels (cache to coarse grain)
  • Integrated fixed function data flow accelerators
  • General purpose data flow accelerators

Technology Opportunities

HPC and AI scientists will have access to, and the ability to exploit, the performance capabilities of a number of new network, storage, and computing architectures.

In particular, HPC is a big driver of optical technology because fabrics represent one of the most challenging and costly elements of a supercomputer. For this reason, Gara believes that silicon photonics is game changing: the ability to integrate silicon and optical devices will deliver significant economic and performance advantages, including room to grow (in a technology sense) as we transition to linear and ring devices and to optical devices that communicate using multiple wavelengths of light.

New non-volatile storage technologies such as Intel persistent memory are blurring the line between memory and storage. Gara describes a new storage stack for exascale supercomputers, but of course this stack can be implemented on general compute clusters as well.

The key, Gara observes, is that this stack is designed from the ground up to use NVM storage. The result will be high throughput IO operations at arbitrary alignment and transaction sizes because applications can perform ultra-fine grained IO through a new userspace NVMe/pmem software stack. At a systems level, this means that users will be able to manage massively distributed NVM storage using scalable communications and IO operations across homogeneous, shared-nothing servers in a software managed, redundant, self-healing environment. In other words, the result is high-performance, big-capacity scalable storage that supports big-data and in-core algorithms such as logarithmic runtime algorithms and data analytics on sparse and unstructured data sets.

Researchers are exploiting the advances in memory performance and capacity to change the way that we approach AI and HPC problems. Examples of such work range from the University of Utah to King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

For example, Dr. Aaron Knoll (research scientist, Scientific Computing and Imaging Institute at the University of Utah) stresses the importance of logarithmic runtime algorithms in the OSPRay visualization package. Logarithmic runtime algorithms are important for big visualizations and exascale computing because the runtime increases slowly even when the data size increases by orders of magnitude. Otherwise, the runtime growth can prevent computations from finishing in a reasonable time, thus obviating the benefits of a large memory capacity computer.
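A small sketch makes the logarithmic growth concrete. It assumes the data is indexed by a balanced binary tree, a common shape for spatial acceleration structures, and counts how many levels one lookup must descend. The function name and the data sizes are illustrative assumptions, not taken from the Utah work.

```python
def balanced_tree_depth(n_items: int) -> int:
    """Levels a lookup descends in a balanced binary tree over n_items leaves.

    Equals ceil(log2(n_items)), computed with integer arithmetic to avoid
    floating-point rounding for very large inputs.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return (n_items - 1).bit_length()


if __name__ == "__main__":
    for n_items in (10**6, 10**9, 10**12):
        # Prints depths 20, 30, and 40: each thousandfold increase in data
        # adds only about ten more steps per lookup.
        print(f"{n_items:>16,d} items -> depth {balanced_tree_depth(n_items)}")
```

A linear-time scan over the same data would instead do a thousand times more work for each thousandfold increase, which is the growth that stops large in-memory problems from finishing in reasonable time.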

As a result, large memory capacity (i.e., “fat”) compute nodes that provide low latency access to data are the enabling technology that can compete with and beat massively parallel accelerators at their own game. Research at the University of Utah shows a single large memory (three terabyte) workstation can deliver competitive and even superior interactive rendering performance compared to a 128-node GPU cluster. The University of Utah group is also exploring in-situ visualization using P-k-d trees and other fast, in-core approaches to show that large “direct” in-core techniques are viable alternatives to traditional HPC visualization approaches.

In a second example, KAUST has been enhancing the ecosystem of numerical tools for multi-core and many-core processors in collaboration with Intel and the Tokyo Institute of Technology. Think of processing very large matrices, on the order of a billion by a billion, using CPU technology in a mathematically and computationally efficient manner.
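Some back-of-the-envelope arithmetic shows why such matrices demand mathematically efficient methods. The sketch below compares dense double precision storage with a tile low-rank layout, in which each off-diagonal tile is stored as two thin factors. The layout, the tile size, and the rank are assumptions chosen for illustration; they are not figures reported by KAUST or Intel.

```python
BYTES_PER_DOUBLE = 8


def dense_bytes(n: int) -> int:
    """Bytes needed to store a dense n x n matrix of doubles."""
    return BYTES_PER_DOUBLE * n * n


def tile_low_rank_bytes(n: int, tile: int, rank: int) -> int:
    """Bytes for a tile low-rank layout (illustrative assumption).

    Diagonal tiles stay dense (tile x tile). Each off-diagonal tile is kept
    as two tile x rank factors, so it costs 2 * tile * rank values.
    """
    if n % tile != 0:
        raise ValueError("tile must divide n evenly in this sketch")
    tiles_per_side = n // tile
    diagonal_values = tiles_per_side * tile * tile
    off_diagonal_tiles = tiles_per_side * tiles_per_side - tiles_per_side
    off_diagonal_values = off_diagonal_tiles * 2 * tile * rank
    return BYTES_PER_DOUBLE * (diagonal_values + off_diagonal_values)


if __name__ == "__main__":
    n = 10**9  # a billion by a billion
    dense = dense_bytes(n)  # 8 * 10**18 bytes, i.e. 8 exabytes
    compressed = tile_low_rank_bytes(n, tile=10**4, rank=10)  # hypothetical parameters
    print(f"dense:         {dense:.3e} bytes")
    print(f"tile low-rank: {compressed:.3e} bytes")
    print(f"reduction:     {dense / compressed:.0f}x")
```

Dense storage alone would need eight exabytes, far beyond any single system's memory, so any practical approach has to exploit structure in the matrix. How much a compressed layout saves depends entirely on how small the rank can be for a given accuracy.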

The importance of these contributions in linear algebra and Fast Multipole Methods (FMM) can be appreciated by non-HPC scientists, as numerical linear algebra is at the root of nearly all applications in engineering, physics, data science, and machine learning. FMM has been listed as one of the top ten algorithms of the 20th century.

Results show that HPC scientists can now solve larger dense linear algebra and FMM-related numerical problems, and solve them faster, than is possible using current highly optimized libraries such as the Intel Math Kernel Library (Intel MKL) running on the same hardware. These methods have been made available in highly optimized libraries named ExaFMM and HiCMA.

Looking to the future: Neuromorphic and Quantum Computing

The new neuromorphic test chips codenamed Loihi may represent a phase change in AI because they “self-learn”. Currently, data scientists spend a significant amount of time working with data to create training sets that are used to train a neural network to solve a complex problem. Neuromorphic chips eliminate the need for a human to create a training set (i.e., no human in the loop). Instead, humans need to validate the accuracy once the neuromorphic hardware has found a solution.

Succinctly, neuromorphic computing utilizes an entirely different computational model than the traditional neural networks used in machine and deep learning. This model more accurately mimics how biological brains operate, so neuromorphic chips can “learn” in an event-driven fashion simply by observing their environment. Further, they operate in a remarkably energy-efficient manner. Time will tell if and when this provides an advantage. The good news is that neuromorphic hardware is now becoming available.
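To give a feel for the event-driven model, here is a minimal sketch of a leaky integrate-and-fire neuron, a textbook building block of spiking neural networks. It is not a description of how Loihi is implemented, and it shows only the event-driven behavior, not learning. The function name and the `leak` and `threshold` values are illustrative assumptions.

```python
from typing import List, Sequence


def simulate_lif(input_current: Sequence[float],
                 leak: float = 0.9,
                 threshold: float = 1.0) -> List[int]:
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    At each step the membrane potential decays by the factor `leak`, then
    accumulates the incoming current. Crossing `threshold` emits a spike
    (an event) and resets the potential to zero.
    """
    potential = 0.0
    spike_times: List[int] = []
    for step, current in enumerate(input_current):
        potential = leak * potential + current
        if potential >= threshold:
            spike_times.append(step)
            potential = 0.0
    return spike_times


if __name__ == "__main__":
    # Two modest inputs in a row push the potential over the threshold at step 1.
    print(simulate_lif([0.6, 0.6, 0.0, 0.0]))  # prints [1]
    # With no input there are no events, so nothing downstream needs to run.
    print(simulate_lif([0.0] * 5))             # prints []
```

The neuron produces output only when something happens, in contrast to a conventional network layer that computes every activation on every pass. That sparsity of activity is one reason spiking hardware can be energy efficient.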

Gara states that the goal is to create a programmable architecture that delivers more than 100x better energy efficiency than current architectures to solve hard AI problems efficiently. He provided examples such as sparse coding, dictionary learning, constraint satisfaction, pattern matching, and dynamic learning and adaptation.

Finally, Gara described advances in quantum computing that are being made possible through a collaboration with Delft University to make better qubits (quantum bits), improve connectivity between qubits, and develop scalable IO. Quantum computing is non-intuitive because most people don’t intuitively grasp the idea of entanglement or something being in multiple states at the same time. Still, the web contains excellent resources such as Quantum computing 101 at the University of Waterloo to help people make sense of this technology that is rapidly improving and, if realized, will change our computing universe forever.

Quantum computing holds the possibility of solving problems that are currently intractable on general purpose computers. Gara highlighted applications of the current Intel quantum computing efforts in quantum chemistry, microarchitecture and algorithm co-design, and post-quantum secure cryptography.

Summary

We are now seeing the introduction of new computing, storage, and manufacturing technologies that are forcing the AI and HPC communities to rethink their traditional approaches so they can use these ever more performant, scalable, and configurable architectures. As Al Gara pointed out, these technologies are causing a unification of the “3 Pillars” which, in turn, makes the futures of AI and HPC in the data center indistinguishable from each other.

Rob Farber is a global technology consultant and author with an extensive background in HPC and in developing machine learning technology that he applies at national labs and commercial organizations. Rob can be reached at [email protected]
