MosaicML, Led by Naveen Rao, Comes Out of Stealth Aiming to Ease Model Training

By Todd R. Weiss

October 15, 2021

With more and more enterprises turning to AI for a myriad of tasks, companies quickly find out that training AI models is expensive, difficult and time-consuming.

Finding a new approach to those cascading challenges is the aim of MosaicML, a new startup that just came out of stealth. Helmed by Naveen Rao, formerly of Nervana and Intel, the company is now preparing to launch a cloud-based neural network training system that aims to attack the problems at the algorithmic and systems levels.

The idea is to make machine learning more efficient through a composition – a mosaic – of methods that together accelerate and improve training, the company announced in an Oct. 13 blog post from its founders.

MosaicML’s core idea is that because training machine learning models is expensive, whether in the cloud, in datacenters or on-premises, the answer lies in eliminating inefficiencies in the learning process.

The startup has built two components that will be part of its future product offering, Naveen Rao, the CEO and co-founder, told EnterpriseAI. Composer is an open-source library of methods for efficient ML training that can be brought together into “recipes,” starting with some 20 different methods curated and rigorously benchmarked for their performance benefits. Additional methods will be added as the product matures.

The other MosaicML component is Explorer, a visualization tool and interface that gives enterprise developers the ability to simulate, map out and choose the best routes for running models by comparing costs, quality and the time that will be needed to run the experiments. Explorer is designed to give users a visualization of the measured trade-offs of cost, time, and quality across thousands of training runs on standard benchmarks. Users can filter by method, cloud and hardware type to reach their optimal operating test protocols.

“The key here is that these techniques actually make the training process more compute efficient,” said Rao.

The idea and need for MosaicML came out of the rise of AI, machine learning and the steps that were initially established to create and test models, he said. The original technologies were established over time, and they worked, but it turns out that there are better ways to do things, he added.

“It is like anything else,” said Rao. “[Data scientists] came up with something that basically works but was pretty inefficient. The deep learning world has been about showing that things can work and not be efficient; it did not really matter because compute was relatively cheap.”

The problem was it was only true when the models were small, said Rao.

“Once models got very big, the compute side of it actually got very expensive,” he said. “Now we are at this inflection point where the models got very big and data sets are very large, so the expenses are now quite big. GPT-3 [the natural language AI model] cost $5 million to train – that was one single experiment that cost $5 million.”

That is where MosaicML began seeing its opportunity in the world of AI and machine learning.

“We are focused on enterprise companies whose core competency is not AI or ML, but they need to be able to use these techniques in a cost-effective manner to extract value from their data,” said Rao. “If you are Facebook or Google, you have a huge team who can do this, and they can spare the expensive computing and manage it on their own. They will eventually probably use these tools as well … but they do not really need us upfront. The enterprise is where we go first.”

MosaicML was incorporated on Dec. 1, 2020, and has raised $37 million from investors so far, including Lux Capital, DCVC, Future Ventures, Playground Global, AME, Correlation, E14 and several angel investors.

Rao said the company is having conversations with customers but that it has not yet made any sales. MosaicML released its open source library so potential customers and developers can use it and get a sense of its capabilities and features.

The company’s product, which has not yet been officially named, is expected to be available at the beginning of 2022 in a free version and in a paid, supported version.

“When you are training a model, all you really care about is cost,” he said. But later you begin to think about other factors, including how long things will take and how it will perform.

“This Explorer visualizer allows you to see the difference,” he said. “If I want to not pay as much and just do a bunch of experiments for cheap, I can do that, and predict very rationally where I will be when they are done. The idea is to give users tools to allow them to understand how much things cost. If they don’t have any idea, they really cannot plan, and it becomes very difficult to run these experiments.”

Initially, MosaicML will work on models that are being trained in the cloud, said Rao, since those variables are easier to measure based on rate costs from each vendor. He said he expects similar capabilities will be available for on-premises uses in the future. “But we are not there yet,” he added.

Rao has been involved with AI for some time. He founded AI chip company Nervana Systems, which was acquired by Intel in 2016. He then joined Intel and started and ran Intel’s AI division, he said. Intel shuttered Nervana in early 2020 and Rao left the company.

Karl Freund, founder and principal analyst at Cambrian AI Research, told EnterpriseAI that MosaicML has chosen a viable approach to helping AI users.

“MosaicML is going after the model optimization problem,” he said. “Some optimizations, such as Nvidia TensorRT, optimize for specific hardware, but MosaicML is going after algorithmic optimization.”

That makes it model-specific and not hardware-specific, he said.

“AI hardware for training is very expensive and exotic technology,” said Freund. “If you can reduce training time by 50 percent, that reduces costs accordingly. And the client does not have to hire the super-high-priced talent, either.”

Another analyst, Addison Snell, CEO of Intersect360 Research, said that the popularity of AI “is bringing more organizations to high-performance computing for the first time, whether they think of it that way or not. And for any organization moving into HPC, model creation and optimization is one of the biggest challenges, certainly more so than simply getting access to hardware.”

HPCwire