Nvidia Releasing Open-Source Optimized TensorRT-LLM Runtime with Commercial Foundational AI Models to Follow Later This Year

By Agam Shah

September 14, 2023

Nvidia’s large language models will become generally available later this year, the company confirmed.

Organizations widely rely on Nvidia’s graphics processors to write AI applications. The company has also created proprietary pre-trained models similar to OpenAI’s GPT-4 and Google’s PaLM-2.

Customers can use their own corpus of data, embed it in Nvidia’s pre-trained large language models, and build their own AI applications. The foundational models cover text, speech, images, and other forms of data.

Nvidia has three foundational models. The most publicized is NeMo, which includes Megatron, with which customers can build ChatGPT-style chatbots. NeMo also includes a TTS component, which converts text to human speech.

The second model, BioNeMo, is a large language model targeted at the biotech industry. Nvidia’s third AI model is Picasso, which can manipulate images and videos. Customers will be able to engage Nvidia’s foundational models through software and services from the company and its partners.

“We’ll be offering our foundational model services a little later this year,” said Dave Salvator, a product marketing director at Nvidia, during a conference call. An Nvidia spokeswoman did not specify availability dates for particular models.

The NeMo and BioNeMo services are currently available to customers in early access via the AI Enterprise software and will likely be the first to become commercially available. Picasso is further from release, and services around that model may not arrive as quickly.

“We are currently working with select customers, and others interested can sign up to get notified for when the service opens up more broadly,” an Nvidia spokeswoman said.

The models will run best on Nvidia’s GPUs, which are in short supply. The company is working to meet the demand, Nvidia CFO Colette Kress said at the Citi Global Technology Conference this week.

The GPU shortage creates a barrier to adoption, but customers can access Nvidia’s software and services through the company’s DGX Cloud or through Amazon Web Services, Google Cloud, Microsoft Azure, or Oracle Cloud, which have H100 installations.

Nvidia’s foundational models are important ingredients in the company’s concept of an “AI factory,” in which customers do not have to worry about coding or hardware. An AI factory can take in raw data and churn it through GPUs and LLMs. The output is actionable data for companies.

The LLMs will be part of the AI Enterprise software suite, which includes frameworks, foundation models, and other AI technologies. The technology stack also includes tools like Tao, a no-code AI programming environment, and NeMo Guardrails, which can analyze and redirect output to make responses more reliable.

Nvidia is relying on its partners to sell AI models such as NeMo and help companies deploy them on its accelerated computing platform.

Nvidia’s partners include software companies Snowflake and VMware and the AI services provider Hugging Face. Nvidia has also partnered with consulting firm Deloitte for larger deployments. Nvidia has already announced it will bring its NeMo LLM to the Snowflake Data Cloud, where major organizations store data. Snowflake Data Cloud users will be able to generate AI-related insights and create AI applications by connecting their data to NeMo and Nvidia’s GPUs.

The partnership with VMware brings the AI Enterprise software to VMware Private Cloud. VMware’s vSphere and Cloud Foundation platforms provide administrative and management tools for AI deployments in virtual machines across Nvidia’s hardware in the cloud. The deployments can also extend to non-Nvidia CPUs.

Nvidia is about 80% a software company, and its software platform is the operating system for AI, said Manuvir Das, vice president for enterprise computing at the company, during Goldman Sachs’ Communacopia + Technology Conference.

Last year, people were still wondering how AI would help, but this year, “customers come to see us now as they already know what the use case is,” Das said. The barrier to entry for AI remains high, and the challenge has been in the development of foundational models such as NeMo, GPT-4, or Meta’s Llama 2.

“You have to find all the data, the right data, you have to curate it. You have to go through this whole training process before you get a usable model,” Das said.

But after millions in investments for development and training, the models are now becoming available to customers.

“Now they’re ready to use. You start from there, you finetune with your own data, and you use the model,” Das said.

Nvidia has projected a $150 billion market opportunity for the AI Enterprise software stack, which is half that of the $300 billion hardware opportunity, which includes GPUs and systems. The company’s CEO, Jensen Huang, has previously talked about AI computing being a radical shift from the old style of computing reliant on CPUs.

Open-Source TensorRT-LLM

Nvidia separately announced TensorRT-LLM, which improves the inference performance of foundational models on its GPUs. The runtime can extract the best inference performance from a wide range of models, such as Bloom, Falcon, and Meta’s latest Llama models.

A heavy-duty H100 is considered best for training models but may be overkill for inference when factoring in the GPU’s power draw and performance. Nvidia offers the lower-power L40S and L4 GPUs for inference but is also making the H100 a viable inference option when the GPUs are not busy with other work.

(Image: Nvidia’s low-power L4 GPU)

TensorRT-LLM is specially optimized to reduce idle time during inference on the H100 and keep the GPU occupied at close to 100%, said Ian Buck, vice president of hyperscale and HPC at Nvidia.

Buck said that the combination of Hopper and the TensorRT-LLM software improved inference performance by eight times compared to the A100 GPU.

“As people develop new large language models, these kernels can be reused to continue to optimize and improve performance and build new models. As the community implements new techniques, we will continue to place them … into this open-source repository,” Buck continued.

TensorRT-LLM has a new kind of scheduler for the GPU called in-flight batching, which allows work to enter and exit the GPU independently of other tasks.

“In the past, batching came in as work requests. The batch was scheduled onto a GPU or processor, and then when that entire batch was completed … the next batch would come in. Unfortunately, in high variability workloads, that would be the longest workload … and we often see GPUs and other things be underutilized,” Buck said.

With TensorRT-LLM and in-flight batching, work can enter and leave the batch independently and asynchronously to keep the GPU 100% occupied.

“This all happens automatically inside the TensorRT-LLM runtime system and it dramatically improves H100 efficiency,” Buck said.
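Buck’s point about static batches waiting on their slowest request can be illustrated with a toy simulation. This is not TensorRT-LLM code; the function names and the example workload below are hypothetical, and each request is reduced to a simple count of decode steps.

```python
# Toy comparison of static vs. in-flight batching.
# Each request needs some number of decode steps. With static batching,
# the whole batch occupies the GPU until its longest request finishes;
# with in-flight batching, a finished request's slot is refilled
# immediately from the queue, keeping the GPU busy.

def static_batching_steps(requests, batch_size):
    """Total GPU steps when each batch runs until its longest member is done."""
    steps = 0
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        steps += max(batch)  # batch is held for its slowest request
    return steps

def inflight_batching_steps(requests, batch_size):
    """Total GPU steps when finished requests are replaced every step."""
    pending = list(requests)
    active = []
    steps = 0
    while pending or active:
        # Refill free slots from the waiting queue.
        while pending and len(active) < batch_size:
            active.append(pending.pop(0))
        steps += 1
        # Advance every active request one decode step; drop finished ones.
        active = [r - 1 for r in active if r > 1]
    return steps

# High-variability workload: one long request per batch of four
# drags the three short ones along with it under static batching.
work = [10, 1, 1, 1, 10, 1, 1, 1]
print(static_batching_steps(work, 4))    # 20 steps
print(inflight_batching_steps(work, 4))  # 11 steps
```

The gap widens as request lengths vary more, which matches Buck’s observation that high-variability workloads leave statically batched GPUs underutilized.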

The runtime is in early access now and will likely be released next month.
