Intel’s Strategy to Free Server Capacity by Pushing AI Inference to PCs

By Agam Shah

January 18, 2024

AI is here to stay and is becoming a larger part of the workload processed on servers and PCs.

That’s why Nvidia is seeing success as a chipmaker, and there is excitement around large language models such as Meta’s open-source Llama. An eager audience wants to get a handle on such models to get answers and automate mundane processes.

HPCwire typically doesn’t cover PCs, but those devices are coming in handy to lighten the AI load on server chips. Remote HPC work (on-prem or cloud) often begins on a laptop or a desktop. Inferencing is largely done on servers today, but PCs can handle it efficiently and could be more power efficient than servers at producing output from accessible open-source models such as Llama.

The current way to process AI on PCs is complicated: loading transformer models requires jumping through hoops. In most cases, people need hardware such as an Nvidia GPU. Users then have to set up a local neural network environment, install PyTorch, and use Python to pull the relevant transformer models from GitHub and Hugging Face.

Many versions of GenAI models are not designed for Nvidia’s GPUs and default to CPUs. PCs also need huge memory capacity. This technique does not guarantee a stable AI environment that runs within a browser interface. Users need to restart the AI environment with every reboot.
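The memory requirement can be put in rough numbers: storing a model's weights takes about the parameter count multiplied by the bytes used per parameter. The sketch below is a back-of-envelope estimate only. It assumes the 7-billion-parameter variant of Llama 2 (the article does not say which variant is meant), ignores activations and cache memory, and uses an illustrative function name.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Assumption: a 7-billion-parameter model at several common precisions.
ASSUMED_PARAMS = 7e9

for bits in (32, 16, 8, 4):
    gib = weight_memory_gib(ASSUMED_PARAMS, bits)
    print(f"{bits:>2}-bit weights: about {gib:.1f} GiB")
```

Lower-precision weights shrink the footprint proportionally, which is one reason quantized models are the usual choice on PCs with limited memory.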

The recent Meteor Lake PCs shown at the CES trade show have automated the entire process. For example, the PCs can run the Llama 2 chatbot locally and lighten the server load. For many months, Hugging Face had the Llama 2 chatbot on its HuggingChat website, with the processing happening on server GPUs. The chatbot now runs the Mixtral 8x7B model, a mixture-of-experts design built from eight 7-billion-parameter experts.

Intel’s Meteor Lake offers heterogeneous AI capabilities: an NPU (low power), a GPU (high throughput), and a CPU (fast response). Source: Intel

Intel’s Meteor Lake has a neural processing unit that can handle inferencing for Llama 2. The HP PCs with the chips and Microsoft’s Windows 11 have a chatbot interface with Llama 2, which loads instantly by clicking an icon. That removes the complexity of using the command line to load the neural network environment, Python, and other environments.

Microsoft’s DirectML driver set, much like the DirectX drivers for graphics, provides the middleware required for inference processing on PCs with Intel’s NPUs. Intel has optimized Llama 2 in advance for the neural processing units in its Meteor Lake chips. The drivers work through ONNX Runtime, a connector that works with software stacks from Nvidia, Intel, and AMD.
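In ONNX Runtime, hardware backends are exposed as named "execution providers," and an application passes an ordered list of them when it opens a model. The sketch below shows only the selection step, using provider names that ONNX Runtime defines. The preference order and the helper name `order_providers` are assumptions for illustration, not a policy from Intel or Microsoft, and the code deliberately avoids importing the library so it runs anywhere.

```python
from typing import Iterable, List

# Assumed preference order for illustration: DirectML first, CPU as the fallback.
PREFERRED_PROVIDERS = [
    "DmlExecutionProvider",       # DirectML
    "OpenVINOExecutionProvider",  # Intel OpenVINO
    "CUDAExecutionProvider",      # Nvidia CUDA
    "CPUExecutionProvider",       # default CPU backend
]


def order_providers(available: Iterable[str],
                    preferred: List[str] = PREFERRED_PROVIDERS) -> List[str]:
    """Return the available providers in preference order, ending with CPU."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    # Keep the CPU provider as a last resort so a session can still be created.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# With the library installed, the available list would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# the providers argument of onnxruntime.InferenceSession.
print(order_providers(["CPUExecutionProvider", "DmlExecutionProvider"]))
```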

Intel can’t optimize closed models, such as OpenAI’s GPT models and Google’s Gemini, for its NPUs. But there are hooks for customers to connect to GPT-4 from the PCs. Customers can choose the preferred model within the AI software, which sends user queries to servers.
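The split between locally run open models and server-hosted closed models amounts to a routing decision keyed on the model the user picks. The sketch below is purely hypothetical: the model keys, the `Backend` type, and the stand-in functions are invented for illustration and do not reflect the actual software on these PCs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Backend:
    location: str               # "local" (on the PC) or "remote" (on a server)
    run: Callable[[str], str]   # function that produces a reply for a prompt


def run_on_pc(prompt: str) -> str:
    # Stand-in for on-device inference; a real app would call a local runtime.
    return f"[local] {prompt}"


def send_to_server(prompt: str) -> str:
    # Stand-in for a network call to a hosted model service.
    return f"[remote] {prompt}"


# Hypothetical registry: open model runs locally, closed model goes to a server.
BACKENDS: Dict[str, Backend] = {
    "llama-2": Backend("local", run_on_pc),
    "gpt-4": Backend("remote", send_to_server),
}


def answer(model: str, prompt: str) -> str:
    """Dispatch the prompt to whichever backend serves the chosen model."""
    if model not in BACKENDS:
        raise ValueError(f"unknown model: {model}")
    return BACKENDS[model].run(prompt)


print(answer("llama-2", "Summarize this document."))
print(answer("gpt-4", "Summarize this document."))
```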

Intel executives said the company is optimizing more models for its NPUs, which will ultimately be available on Meteor Lake PCs. The supported models will likely be open source.

Intel executives said developers could also download models that can be loaded for Intel’s NPUs. The models will be available from ModelZoo, which is Intel’s GitHub site for downloadable models.

That has a trickle-down effect – GPUs and servers will be free to handle more relevant tasks such as training. Offloading inferencing to PCs from servers is a logical next step as generative AI overwhelms server infrastructures. Google, Microsoft, and Meta are building out server capacities to deal with the onslaught of AI.

Intel is trying to build a continuum in which certain processing is being done on its Xeon chips and then kicked off to other chips in the data path, said Lisa Spelman, corporate vice president and general manager for Xeon at Intel, in an interview with HPCwire.

“What we’re actually pursuing…is that customer value of that seamless flow without needing user intervention,” Spelman said.

AI is expensive to run because of the cost and complexity of building out the data center infrastructure. Microsoft is building its server AI capacity around Nvidia GPUs but is also selling GPT-4 services running on Azure servers. The company is spending a lot of money running AI transactions and is trying to balance its Nvidia GPU usage capacity while generating more revenue from each query.

To be sure, offloading AI to client devices isn’t new — it is already happening in Windows 11 PCs with the integration of Bing. Google is integrating its Gemini features in its Pixel phones. AMD is also integrating neural processors in its PC chips and is working with PyTorch and Hugging Face to tune generative AI technologies for its GPUs and AI chips.

AI is taking over more PC functions, and the chipmakers are waking up to the trend. The hurdle is running AI on PCs without depending on Nvidia’s dominant hardware.

Intel is also spending time building the software ecosystem to put its chips to work on AI. The two-fold strategy involves moving developers to oneAPI and the open-source OpenVINO stack, making it easier for developers to write applications for AI chips.

The other component is support for SYCL, which moves users of Nvidia’s software stack (CUDA developers) out of that proprietary environment and onto open-source code. The target hardware doesn’t matter, whether it is Intel chips or AMD GPUs. The SYCL goal is to break Nvidia’s unrelenting hardware and software dominance over the AI market.

Intel is still far from pushing out a coherent hardware and software stack. The company offers a range of hardware options, including the Habana Gaudi 2 AI chip, which is the company’s answer to Nvidia’s H100 and H200 GPUs. The company has built AMX accelerators into server CPUs and offers Movidius AI chips for computer vision. Intel’s high-performance GPU strategy is on a break, with the next major GPU, Falcon Shores, which will come integrated with Habana AI circuits, scheduled for release next year.

Intel offers Developer Cloud, which provides a plethora of its cloud-based chips for developers to test out the performance and functionality of their AI applications. However, the implementation has issues: there is no clear pricing and no clear Jupyter Notebook implementation for developers. The interface isn’t as polished as services like Google Cloud’s Vertex or Google Colab, which provide many AI chip options and use Google Drive for storage.

Intel executives have been receptive to my complaints about the service and understand the importance of Developer Cloud in attracting more developers.

Intel also said it would provide the Cnvrg.io service for developers to compare AI performance across different cloud services. Intel bought Cnvrg.io in 2020, and in late 2022, hinted at making an updated version of Cnvrg.io available in 2023, but no such announcements have been forthcoming.
