Nvidia is tempting fate with its generous use of the term “super” to describe new products—the latest is a “supermodel” that uses innovative techniques to create fine-looking AI models.
The company this week announced support for Meta’s Llama 3.1 AI model with 405-billion parameters on its GPUs. When used alongside its homegrown model called Nemotron, voila, it produces a “supermodel.”
This supermodel term relates to creating highly customized models using multiple LLMs, fine-tuning, guardrails, and adapters to create an AI application that suits customer requirements.
The “supermodel” may represent how LLMs are customized to meet organizational needs. Nvidia is trying to break away from the one-size-fits-all AI model and move toward complementary AI models and tools that work together.
The Llama 3.1-Nemotron technique resembles a good cop-bad cop routine. Llama 3.1 provides output, which passes through Nemotron, which double-checks if the output is good or bad. The reward is a fine-tuned model with more accurate responses.
“You can use those together to create synthetic data. So … create synthetic data, and the reward model says yes, that’s good data or not,” said Kari Briski, vice president at Nvidia, during a press briefing.
Nvidia is also tacking on more makeup for supermodels to look better. The AI factory backend includes many tools that can be mixed and matched to create a finely tuned model.
The added tooling provides faster responses and efficient use of computing resources.
“We’ve seen almost a 10-point increase in accuracy by simply customizing models,” Briski said.
An important component is NIM (Nvidia inference microservices), a downloadable container that provides the interface for customers to interact with AI. The model fine-tuning with multiple LLMs, guardrails, and optimizations happens in the background as users interact via the NIM.
Developers can now download the Llama 3.1 NIMs and fine-tune them with adapters that can customize the model with local data to generate more customized results.
Creating an AI supermodel is a complicated process. First, users need to figure out the ingredients, which could include Llama 3.1 with adapters to pull their own data into AI inferencing.
Customers can attach guardrails such as LlamaGuard or NeMo Guardrails to ensure chatbot answers remain relevant. In many cases, RAG systems and LoRA adapters help fine-tune models to generate more accurate results.
The model also involves extracting and pushing relevant data to a vector database through which information is evaluated, and responses are funneled to users. Companies typically have such information in databases, and Nvidia provides plugins that can interpret stored data for AI use.
“We’ve got models. We’ve got the compute. We’ve got the tooling and expertise,” Briski said.
Nvidia is partnering with many cloud providers to offer this service. The company is also building a sub-factory within its AI factory, called NIM factory, which provides the tooling for companies to build their own AI models and infrastructure.
The support for Llama 3.1 offers insight into how the company will integrate open-source technology into its proprietary AI offerings. Like with Linux, the company is taking open-source models, tuning them to its GPUs, and then linking them to its proprietary tech, including GPUs and CUDA.