Hyping an AI chip is one thing, but proving its usability in the commercial market is a bigger challenge.
Some AI chip companies, still working to prove the viability of their silicon, are establishing their own AI computing infrastructure to educate customers and demonstrate what their chips can do.
Cerebras Systems, which makes the largest chip in the world, is now setting up artificial intelligence data centers that bring its experimental AI mega-processor out of the lab and to commercial customers.
Cerebras last year won the coveted Gordon Bell prize after its hardware aided Covid-19 research, and the company has name recognition in academia and national labs. But the commercial expansion will pit its hardware against a computing infrastructure built on Nvidia's GPUs and offered by major cloud providers, including Google, Amazon, Microsoft, and Oracle.
The company, which has only a few hundred employees, is enlisting the help of G42, a Middle Eastern artificial intelligence and cloud computing company, to create an AI infrastructure. The two are partnering to build three commercial AI data centers on U.S. soil by the end of this year.
Group 42 purchased AI systems from Cerebras only after vetting the startup.
“We had experience in building, managing, and operating large supercomputers. We had experience implementing massive generative AI models. And we had deep expertise in manipulating, cleaning, and managing huge datasets,” Andrew Feldman, CEO of Cerebras, told HPCwire.
But Cerebras faces a daunting road ahead in wooing commercial clients to its systems. It will have a tough time unseating Nvidia, which has a dominant software and hardware foothold in the AI market. Large commercial enterprises, including Microsoft and Facebook, are betting their AI future on Nvidia's GPUs.
Nvidia also runs its own GPU infrastructure, called LaunchPad, where developers can prototype AI applications. Intel has likewise established a cloud service with its own AI chips for developers and customers to prototype and run applications; Intel's Dev Cloud recently added the Data Center GPU Max 1100 for developers to test AI applications.
The three data centers built by Cerebras and G42 will deliver an aggregate 12 exaflops of FP16 AI compute. Cerebras has created three new systems under the Condor Galaxy brand, each of which will deliver 4 exaflops of performance.
The first system, called Condor Galaxy-1 (CG-1), is already being deployed in a California data center and will have 54 million cores. The company will add more data centers over the next year.
“It’s set to expand to… nine exaflops machines, a total of 36 exaflops,” by the end of 2024, Feldman said.
The CG-1 AI mega-cluster brings together 64 CS-2 systems, Cerebras' existing servers built around its AI chip. The CS-2 is already in use at the U.S. Department of Energy's Argonne National Laboratory and the Pittsburgh Supercomputing Center.
G42, which is considered a cloud and AI heavyweight in the Middle East, will sell the compute capacity to companies that want to train large language models. G42 is targeting commercial customers in verticals that include healthcare, financial services, and manufacturing.
The promise of LLMs was demonstrated late last year by OpenAI's ChatGPT, which gained 100 million users within a few months. Since then, Google, Microsoft, and others have scrambled to implement their own large language models in search and productivity applications.
Large companies are building their own models, but compute capacity is scarce amid Nvidia GPU shortages. That has created an opportunity for companies like Cerebras, whose AI chips have been used and cited in academic papers authored by researchers at commercial organizations.
“We support up to 600 billion parameters, extensible to 100 trillion parameters,” Feldman said. Google and Microsoft have not reported the number of parameters in LLMs powering their search and productivity applications.
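Parameter counts of that size translate directly into memory pressure. As a rough back-of-envelope sketch, assuming 2-byte (FP16) weights, which is an assumption rather than a figure from Cerebras:

```python
# Back-of-envelope only: memory needed just to hold the weights,
# assuming 2 bytes per parameter (FP16); the precision is an assumption.

def weight_memory_tb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in terabytes."""
    return num_params * bytes_per_param / 1e12

for params in (600e9, 100e12):  # 600 billion and 100 trillion parameters
    print(f"{params:,.0f} parameters -> ~{weight_memory_tb(params):,.1f} TB of weights")
```

At those sizes the weights alone run into the terabytes, far more than fits in any single accelerator's on-chip memory.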
G42 is backed by Mubadala, which is funded by the UAE government. Mubadala also held an equity stake in AMD before a major sell-off in 2019. Feldman previously worked at AMD after his server startup, SeaMicro, was absorbed by the chip maker in 2012.
A potential UAE government connection to the Cerebras-G42 partnership carries political intrigue, given how semiconductors and AI have been weaponized in trade policy, but Feldman said there were no concerns.
“We built the fastest AI processor, and we built the fastest AI system. Of course, we work with the [U.S.] Department of Commerce and regulators. We are engaged with them. We understand what the rules are,” Feldman said.
Cerebras' CG-1 execution model relies on a technology called “weight streaming,” which disaggregates memory, compute, and networking into separate clusters. Because AI compute requirements are driven largely by model size, the system is designed so memory and compute can scale independently. All data processing is done on Cerebras' main AI chip, the WSE-2, which has 850,000 cores, 2.6 trillion transistors, 40GB of SRAM, and 20 petabits per second of bandwidth.
Feldman said that decoupling allows the CG-1 system to scale linearly as more systems are added. Linear scaling is possible because the memory and compute elements operate independently, unlike large GPU deployments, in which each chip has its own memory and cache. A system-level technology called MemoryX stores the model parameters separately and streams them to the computing cores.
“You have thousands of little GPUs, each of them has a different chunk of the parameters. So, you have taken 100 billion parameters, you have to keep track of where they all are. We have a centralized parameter store,” Feldman said.
A companion technology called SwarmX orchestrates compute and memory management at the cluster level: it takes the parameters from MemoryX and broadcasts them to the multiple CS-2s over the interconnecting fabric, which is built from multiple 100GbE links.
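The layer-by-layer flow Feldman describes can be pictured with a short sketch. This is a toy illustration with hypothetical names (ParameterStore, broadcast, train_step) standing in for MemoryX and SwarmX; it is not Cerebras' actual software:

```python
# Conceptual sketch of weight streaming: a central parameter store feeds
# identical weights to several compute systems, so compute can scale
# without sharding the model. Hypothetical names; not Cerebras' software.
from dataclasses import dataclass

import numpy as np


@dataclass
class ParameterStore:
    """Stands in for MemoryX: one central copy of every layer's weights."""
    layers: dict[str, np.ndarray]


def broadcast(store: ParameterStore, layer_name: str, num_systems: int) -> list[np.ndarray]:
    """Stands in for SwarmX: hand the same layer weights to every compute system."""
    weights = store.layers[layer_name]
    return [weights] * num_systems  # every system sees identical parameters


def train_step(batches: list[np.ndarray], store: ParameterStore) -> None:
    """Each system works on its own slice of data while weights stream in layer by layer."""
    for name in store.layers:
        replicas = broadcast(store, name, num_systems=len(batches))
        # Toy "gradient" shaped like the weights; real systems compute true gradients.
        grads = [batch.T @ (batch @ w) for batch, w in zip(batches, replicas)]
        # Updates flow back to the central store, so adding systems adds compute
        # while the parameters stay in one place.
        store.layers[name] -= 1e-3 * np.mean(grads, axis=0)


# Toy usage: two "CS-2s" sharing one 4x4 weight matrix held in the central store.
store = ParameterStore(layers={"dense_0": np.eye(4)})
train_step([np.random.rand(8, 4), np.random.rand(8, 4)], store)
```

The point of the sketch is the contrast Feldman draws: the parameters live in one central store and are streamed out, rather than being scattered in chunks across thousands of devices.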
Cerebras' AI chip, like GPUs, has many attributes that can accelerate conventional scientific computing. There is a risk that G42 customers will use Cerebras systems for conventional HPC, which could dilute the startup's AI market focus.
But Feldman insisted that the chip is designed for AI computing, not conventional HPC.
“We have built this machine for AI. We do not support 64-bit double precision. We do do some HPC work… and that is right at the intersection of AI and HPC,” Feldman said.
In the U.S., the Department of Energy's National Energy Technology Laboratory is using Cerebras systems for decarbonization initiatives, and the chip also gives the lab an opening to test AI in its computing stack.
“We have some work with them, where they’re doing giant simulations for computational fluid dynamics. But I think we have really stood this up and optimized it for AI,” Feldman said.
Cerebras has also released many open-source large language models as it tries to build an underlying software infrastructure for its chips. Nvidia, meanwhile, has a strong software presence, with much of the AI codebase gravitating toward its proprietary CUDA software stack, which can take advantage of features available only on the A100 and H100 GPUs.
On the hardware front, Cerebras also faces challenges from AMD, which recently launched the MI300X GPU for AI, and Intel, which has an AI accelerator called Gaudi. None of these chips have racked up large commercial sales.
Cerebras Systems received a 2022 Editors' Choice Award for its CS-2 artificial intelligence system.