The strain on data centers to deliver generative AI answers is mounting, and Lenovo is reverting to a server-client approach to offload some of that computing stress.
Lenovo is partnering with Nvidia to split applications built with Nvidia's AI Enterprise software between the L40S server GPU and RTX 6000 Ada workstation GPUs.
The PC maker’s ThinkStation PX, which will host the RTX 6000 Ada GPU, will “bring expanded AI capability and data center performance to the desktop,” Lenovo wrote in a statement.
The PX workstation is paired with the Lenovo ThinkSystem SR675 V3 server, which will host Nvidia’s L40S GPU.
The Lenovo-Nvidia partnership was announced at the server maker's Tech World show in Austin.
The companies have rebranded this server-client approach as "hybrid AI."
The server-client products target companies looking to deploy their customized AI models on on-premises hardware.
In the process, the electricity costs of AI are also being offloaded from data centers to client workstations. The PX workstation can host up to four RTX 6000 Ada GPUs, with each graphics card drawing up to 300 watts of power.
According to Nvidia's documentation, the RTX 6000 Ada is made using TSMC's 4N process and is about twice as fast as its predecessor at the same power envelope. The card uses a PCIe 4.0 interface.
Most AI applications are hosted in the cloud, but companies in the finance and health care sectors are looking to repatriate systems for cost and security reasons, often by renting data center hardware at colocation providers such as Equinix.
GPU shortages have also constrained AI computing capacity at major cloud providers. Nvidia's H100 is in short supply, and the company is now redirecting customers to L40S GPUs for AI applications.
Lenovo's ThinkSystem servers pair L40S GPUs with Nvidia's BlueField-3 data processing unit.
Chipmakers are also looking to build more AI capacity on client devices to offload the processing stress on cloud providers and intermediary servers. Intel recently talked about AI PCs and bringing more inferencing capabilities to its Meteor Lake chips, which will power next-generation desktops and laptops.
The server-client approach relies on adjustments throughout the AI stack to let servers and clients share the work of generating results. One technique, retrieval-augmented generation, retrieves relevant documents from a local data store and injects them into the model's prompt, packaged as microservices in the inferencing pipeline, so chatbots or applications can ground answers in on-premises data and return results faster.
For example, according to technical documentation on the company's website, Nvidia uses the technique internally for a chatbot that helps employees answer public-relations questions.
“The sample dataset includes the last two years of Nvidia press releases and corporate blog posts. Our development and deployment of that chatbot is the guide to this reference generative AI workflow,” according to the Nvidia documentation.
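The core retrieval-augmented flow described above can be sketched in a few lines. This is a minimal illustration, not Nvidia's actual workflow: the toy document store, the word-overlap scoring, and the prompt template are all assumptions standing in for a real vector database and language model.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Real systems use embedding-based vector search and an LLM;
# here, word overlap ranks documents and the "answer" step is
# just prompt assembly.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q = tokenize(query)
    scored = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Augment the user query with retrieved context before inference."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical on-premises corpus, e.g. press releases and blog posts.
docs = [
    "Press release: Lenovo and Nvidia announce a hybrid AI partnership.",
    "Blog post: the ThinkStation PX supports up to four RTX 6000 Ada GPUs.",
    "Press release: BlueField-3 DPUs accelerate ThinkSystem servers.",
]
prompt = build_prompt("How many GPUs does the ThinkStation PX support?", docs)
print(prompt)
```

The prompt, now grounded in the retrieved local documents, would be sent to a model for generation; because retrieval runs against on-premises data, the model never needs direct access to the raw corpus.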
Lenovo is also partnering with Nvidia to create servers based on its MGX designs for metaverse applications.