AI: Scaling Neural Networks Through Cost-Effective Memory Expansion

June 26, 2017

Neural networks offer a powerful new resource for analyzing large volumes of complex, unstructured data. However, most of today’s Artificial Intelligence (AI) deep learning frameworks rely on in-core processing, which means that all the relevant data must fit into main memory. As the size and complexity of a neural network grow, cost becomes a limiting factor. DRAM is simply too expensive.

Of course, memory bottlenecks are hardly new in compute-intensive environments such as High Performance Computing (HPC). Transferring large data sets to large numbers of high-performance cores has been an increasing challenge for decades. Fortunately, that is beginning to change. New Intel memory and storage technologies are being integrated into the Intel® Scalable System Framework (Intel® SSF) to help reverse this trend. They do this by moving high-volume data closer to the processing cores, and by accelerating data movement at each tier of the memory and storage hierarchy.

Moving Data Closer to Compute

To accelerate the flow of data into the compute cores, Intel is integrating high-speed memory directly into Intel® Xeon® Phi™ processors and future Intel® Xeon® processors. By moving memory closer to compute resources, these solutions help to optimize core utilization. They also help to improve workload scaling. Intel Xeon Phi processors, for example, have demonstrated up to 97 percent scaling efficiency for deep learning workloads on up to 32 nodes [1].
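As a rough illustration of what that figure implies, scaling efficiency is commonly defined as the achieved speedup divided by the node count. The sketch below applies that definition to the 97 percent and 32-node numbers quoted above. The function names are hypothetical, and the definition itself is an assumption, since the article does not say how the efficiency was measured.

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal (linear) speedup achieved when running on `nodes` nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


def implied_speedup(efficiency: float, nodes: int) -> float:
    """Speedup over a single node implied by a given scaling efficiency."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return efficiency * nodes


if __name__ == "__main__":
    # Figures quoted in the article: up to 97 percent efficiency on 32 nodes.
    speedup = implied_speedup(0.97, 32)
    print(f"Implied speedup on 32 nodes: {speedup:.2f}x (ideal would be 32x)")
```

Under this definition, 97 percent efficiency on 32 nodes corresponds to roughly 31 times the throughput of a single node.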

Transforming the Economics of Memory

Intel® Optane™ technology provides even more far-reaching advantages for data movement. This groundbreaking, non-volatile memory technology combines the speed of DRAM with the capacity and cost efficiency of NAND. Based on Intel® Optane™ technology, Intel® Optane™ SSDs are designed to provide 5-8x faster performance than Intel’s fastest NAND-based SSDs [2]. Intel Optane SSDs can be combined with Intel® Memory Drive Technology to extend memory and provide cost-effective, large-memory pools.

When connected over the PCIe bus, an Intel Optane SSD provides an efficient extension to system memory. Behind the scenes, the Intel Memory Drive Technology transparently integrates the SSD into the memory subsystem and orchestrates data movement. “Hot” data is automatically pushed onto the DRAM to maximize performance. The OS and applications see a single high-speed memory pool, so no software changes are required.
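The general idea of keeping hot data in a small fast tier and letting colder data settle into a larger, slower tier can be sketched with a toy model. The example below is purely illustrative: the `TwoTierMemory` class, its least-recently-used eviction policy, and the page granularity are all assumptions made for this sketch, and nothing here describes how Intel Memory Drive Technology is actually implemented.

```python
from collections import OrderedDict


class TwoTierMemory:
    """Toy model of a small fast tier (stand-in for DRAM) backed by a
    large slow tier (stand-in for an SSD). Hypothetical, for illustration only."""

    def __init__(self, fast_capacity: int):
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # page -> data, ordered from coldest to hottest
        self.slow = {}             # page -> data, unbounded in this toy model
        self.fast_hits = 0
        self.slow_hits = 0

    def write(self, page, data):
        # Freshly written pages are treated as hot and placed in the fast tier.
        self.slow.pop(page, None)
        self.fast[page] = data
        self.fast.move_to_end(page)
        self._evict_cold_pages()

    def read(self, page):
        if page in self.fast:
            self.fast_hits += 1
            self.fast.move_to_end(page)  # mark as most recently used
            return self.fast[page]
        data = self.slow.pop(page)       # raises KeyError if the page was never written
        self.slow_hits += 1
        self.fast[page] = data           # promote the page to the fast tier
        self._evict_cold_pages()
        return data

    def _evict_cold_pages(self):
        # Demote least-recently-used pages until the fast tier fits its capacity.
        while len(self.fast) > self.fast_capacity:
            cold_page, cold_data = self.fast.popitem(last=False)
            self.slow[cold_page] = cold_data


if __name__ == "__main__":
    mem = TwoTierMemory(fast_capacity=2)
    for name in ("a", "b", "c"):
        mem.write(name, name.upper())
    mem.read("a")  # served from the slow tier, then promoted
    mem.read("a")  # now served from the fast tier
    print(f"fast hits: {mem.fast_hits}, slow hits: {mem.slow_hits}")
```

The caller only ever uses `read` and `write` and never chooses a tier, which mirrors the article's point that applications see a single memory pool.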

Figure 1. You can extend memory cost-effectively using high-speed Intel® Optane™ SSDs and Intel® Memory Drive Technology.

How good is performance? Based on Intel internal testing, the DRAM + Intel Optane SSD combination provides roughly 75 to 80 percent of the performance of a comparable DRAM-only solution [3]. The outlook may be even better for deep learning applications. Intel engineers found that the DRAM + Intel Optane SSD combination can optimize data locality and minimize cross-socket traffic, which could result in better performance [4] than the DRAM-only solution. This can happen when a large dataset is distributed across all system memory and every application thread has access to all of the data. One example is the General Matrix Multiplication (GEMM) benchmark, which represents a portion of the core algorithms used in deep learning.
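To get a feel for when a GEMM problem outgrows DRAM, the sketch below estimates the working set of a single-precision matrix multiplication and compares it with the 128 GB and 768 GB DRAM sizes listed in the test configurations in the footnotes. The matrix dimensions are hypothetical examples, not the sizes Intel tested, and treating the quoted capacities as binary gigabytes (GiB) is an assumption.

```python
GIB = 2**30              # bytes per GiB (assumption: quoted "GB" figures treated as GiB)
BYTES_PER_FLOAT32 = 4    # single precision, as used by SGEMM


def sgemm_working_set_bytes(m: int, n: int, k: int) -> int:
    """Bytes needed to hold A (m x k), B (k x n) and C (m x n) in single precision."""
    return BYTES_PER_FLOAT32 * (m * k + k * n + m * n)


def fits_in_memory(working_set_bytes: int, capacity_gib: int) -> bool:
    """True if the working set fits entirely in the given capacity."""
    return working_set_bytes <= capacity_gib * GIB


if __name__ == "__main__":
    # Hypothetical square problem sizes chosen only to straddle the 128 GiB mark.
    for size in (100_000, 200_000):
        ws = sgemm_working_set_bytes(size, size, size)
        print(
            f"n = {size}: {ws / GIB:.1f} GiB working set, "
            f"fits in 128 GiB DRAM: {fits_in_memory(ws, 128)}, "
            f"fits in 768 GiB DRAM: {fits_in_memory(ws, 768)}"
        )
```

The larger of the two example problems no longer fits in 128 GiB of DRAM, which is the kind of case where an extended memory pool would come into play.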

Accelerating Storage

With today’s exploding data volumes, transferring data from bulk storage to local storage to cluster memory can lead to operational bottlenecks at any point. Intel Optane SSDs can be used as high-speed buffers to break through these barriers. A relatively small number of Intel® Optane™ SSDs can dramatically reduce data transfer times. They can also improve performance for applications that are constrained by excessive storage latency or insufficient storage bandwidth.

Figure 2. Intel® Scalable System Framework simplifies the design of efficient, high-performing clusters that optimize the value of HPC investments.

Simplifying Integration with Intel® Scalable System Framework (Intel® SSF)

By accelerating data movement, Intel Optane SSDs, along with future Intel products based on Intel Optane technology, will help to transform many aspects of HPC and AI. Their inclusion in Intel SSF will make it easier for organizations to take advantage of emerging memory and storage solutions based on this new technology.

As deep learning emerges as a mainstream HPC workload, these balanced, large-memory cluster solutions will help organizations deploy massive neural networks to analyze some of the world’s largest and most complex datasets.

Intel SSF provides a scalable blueprint for efficient clusters that deliver higher value through increased integration and balanced designs. This system-level focus helps Intel synchronize innovation across all layers of the HPC and AI solution stack, so new technologies can be integrated more easily by system vendors and end-user organizations.

Stay tuned for additional articles focusing on the benefits Intel SSF brings to AI at each level of the solution stack through balanced innovation in compute, fabric, storage, and software technologies.

 

[1] https://syncedreview.com/2017/04/15/what-does-it-take-for-intel-to-seize-the-ai-market/

[2] https://www.intel.com/content/www/us/en/solid-state-drives/optane-ssd-dc-p4800x-brief.html

[3] Based on Intel internal testing using SGEMM from the Intel® Math Kernel Library (Intel® MKL). System under test (DRAM + SSD): 2 x Intel® Xeon® processor E5-2699 v4, Intel® Server Board S2600WT, 128 GB DDR4 memory + 4 x Intel® Optane SSD (SSDPED1K375GA), CentOS 7.3.1611. Baseline system (all DRAM): 2 x Intel® Xeon® processor E5-2699 v4, Intel® Server Board S2600WT, 768 GB DDR4 memory, CentOS 7.3.1611.

[4] Achieving higher performance while using less DRAM was made possible by Intel® Memory Drive Technology, which automatically takes advantage of NUMA technology in Intel processors to enhance data placement not only across the hybrid memory space, but also within the available DRAM.

HPCwire