Cerebras Has Big Plans for Big AI Chips: Build Your Own Cloud

By Agam Shah

July 20, 2023

Hyping an AI chip is one thing, but proving its usability in the commercial market is a bigger challenge.  

Some AI chip companies, which are still proving the viability of their chips, are establishing their own AI computing infrastructure to educate customers and demonstrate what the hardware can do.

Cerebras Systems, which makes the largest chip in the world, is now setting up artificial intelligence data centers that bring its experimental AI mega-processor out of labs to commercial customers.

Cerebras last year won the coveted Gordon Bell prize after its hardware aided in Covid-19 research, and the company’s hardware has name recognition in academia and national labs. But the commercial expansion will pit its hardware against a computing infrastructure built on Nvidia’s GPUs provided by major cloud providers that include Google, Amazon, Microsoft, and Oracle.

The company, which has only a few hundred employees, is enlisting the help of G42, a Middle Eastern artificial intelligence and cloud computing company, to create an AI infrastructure. The companies are partnering to build three commercial AI data centers on U.S. soil by the end of this year.

Group 42 purchased AI systems from Cerebras only after vetting the startup.  

“We had experience in building and managing operating large supercomputers. We had experience implementing massive generative AI models. And we had deep expertise in manipulating cleaning and managing huge datasets,” Andrew Feldman, CEO of Cerebras, told HPCwire. 

But Cerebras faces a daunting road ahead in wooing commercial clients to its systems. It will have a tough time unseating Nvidia, which has a dominant software and hardware foothold in the AI market. Large commercial enterprises that include Microsoft and Facebook are betting their AI future on Nvidia’s GPUs.

Nvidia also has its own GPU data centers called Launchpad where developers can prototype AI applications. Intel has also established a cloud service with its own AI chips for developers and customers to prototype and run applications. Intel’s Dev Cloud recently added the Data Center GPU Max 1100 for developers to test out AI applications.  

The three data centers built by Cerebras and G42 will deliver an aggregate 12 exaflops of FP16 AI compute. Cerebras has created three new systems under the brand Condor Galaxy, each of which will deliver 4 exaflops of performance.

The first system, called Condor Galaxy-1 (CG-1), is already being deployed in a California data center and will have 54 million cores. The company will add more data centers over the next year. 

“It’s set to expand to… nine exaflops machines, a total of 36 exaflops,” by the end of 2024, Feldman said. 

The CG-1 AI mega-cluster brings together 64 CS-2 systems, an existing server offering that runs on Cerebras’ AI chip. The CS-2 is already being used in the U.S. Department of Energy’s Argonne National Laboratory and the Pittsburgh Supercomputing Center.
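The capacity figures quoted so far can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this article (4 exaflops per Condor Galaxy system, 64 CS-2 systems in CG-1, and the 850,000 cores per WSE-2 given in the chip description). It assumes one WSE-2 per CS-2, the variable names are illustrative, and the per-CS-2 share is a derived figure rather than one Cerebras quoted here.

```python
# Back-of-envelope check of the Condor Galaxy figures quoted in the article.
EXAFLOPS_PER_SYSTEM = 4      # FP16 exaflops per Condor Galaxy system
SYSTEMS_THIS_YEAR = 3        # data centers planned by the end of 2023
SYSTEMS_NEXT_YEAR = 9        # machines Feldman cites for the end of 2024
CS2_PER_CG1 = 64             # CS-2 systems in CG-1
CORES_PER_WSE2 = 850_000     # assumption: one WSE-2 chip per CS-2

total_this_year = EXAFLOPS_PER_SYSTEM * SYSTEMS_THIS_YEAR
total_next_year = EXAFLOPS_PER_SYSTEM * SYSTEMS_NEXT_YEAR
cg1_cores = CS2_PER_CG1 * CORES_PER_WSE2
# Derived, not quoted: each CS-2's share of one system's 4 exaflops.
petaflops_per_cs2 = EXAFLOPS_PER_SYSTEM * 1000 / CS2_PER_CG1

print(f"Three systems: {total_this_year} exaflops")
print(f"Nine systems: {total_next_year} exaflops")
print(f"CG-1 cores: {cg1_cores:,}")
print(f"Implied share per CS-2: {petaflops_per_cs2} petaflops")
```

The 54.4 million cores this yields is consistent with the "54 million" quoted for CG-1.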

G42, which is considered a cloud and AI heavyweight in the Middle East, will sell the compute capacity to companies that want to train large language models. G42 is targeting commercial customers in verticals that include healthcare, financial services, and manufacturing.

The promise of LLMs was demonstrated late last year by OpenAI’s ChatGPT, which gained 100 million users in a few months. Since then, Google, Microsoft, and others have scrambled to implement their own large language models in search and productivity applications.

Large companies are building their own models, but compute capacity is scarce because of Nvidia GPU shortages. That has created an opportunity for companies like Cerebras, whose AI chips have been used and cited in many academic papers authored by researchers at commercial organizations.

“We support up to 600 billion parameters, extensible to 100 trillion parameters,” Feldman said. Google and Microsoft have not reported the number of parameters in LLMs powering their search and productivity applications. 

G42 is backed by Mubadala, which is funded by the UAE government. Mubadala also had an equity stake in AMD before a major sell-off in 2019. Feldman previously worked at AMD after the chip maker acquired his server startup, SeaMicro, in 2012.

A potential UAE government connection to the Cerebras-G42 partnership carries political intrigue considering the weaponization of semiconductors and AI in trade and policies, but Feldman said there were no concerns. 

“We built the fastest AI processor, and we built the fastest AI system. Of course, we work with the [U.S.] Department of Commerce and regulators. We are engaged with them. We understand what the rules are,” Feldman said. 

Cerebras’ CG-1 execution model relies on a technology called “weight streaming,” which disaggregates memory, computing, and networking into separate clusters. The resources an AI workload needs depend primarily on the model’s size, so the system lets memory and computing scale separately. All the data processing is done on Cerebras’ main AI chip, the WSE-2, which has 850,000 cores, 2.6 trillion transistors, 40GB of SRAM, and 20 petabytes per second of memory bandwidth.
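A rough calculation suggests why keeping model parameters off the chip matters. The 600 billion parameter and 40GB figures come from this article; the two bytes per parameter (FP16 weights only, ignoring optimizer state and activations) and the decimal units are assumptions made for illustration.

```python
# Rough weight-memory estimate under stated assumptions (FP16 weights only).
PARAMETERS = 600_000_000_000   # "up to 600 billion parameters"
BYTES_PER_PARAMETER = 2        # assumption: FP16 storage, no optimizer state
SRAM_BYTES = 40 * 10**9        # 40GB of on-chip SRAM, decimal gigabytes

weight_bytes = PARAMETERS * BYTES_PER_PARAMETER
ratio = weight_bytes / SRAM_BYTES

print(f"Weights: {weight_bytes / 1e12:.1f} TB, about {ratio:.0f}x the on-chip SRAM")
```

Under these assumptions the weights alone are about 30 times larger than one chip's SRAM, so they have to live somewhere other than the wafer.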

Feldman said decoupling allows the CG-1 system to scale in a linear fashion as more systems are added. Linear scaling is possible because the memory and computing elements operate independently, unlike large deployments of GPUs, in which each chip has its own memory and cache. A system-level technology called MemoryX stores the model parameters separately and streams them to the computing cores.

“You have thousands of little GPUs, each of them has a different chunk of the parameters. So, you have taken 100 billion parameters, you have to keep track of where they all are. We have a centralized parameter store,” Feldman said. 

A similar technology called SwarmX orchestrates computing and memory management at the cluster level. It takes the parameters from MemoryX and broadcasts them to multiple CS-2s over the interconnecting fabric, which consists of multiple 100GbE lanes in the silicon.
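The division of labor described above can be illustrated with a toy sketch. This is not Cerebras' software or API: `ParameterStore`, `broadcast`, and `Worker` are hypothetical names standing in for the roles the article attributes to MemoryX, SwarmX, and the CS-2 systems, and the example covers only a forward pass over tiny matrices.

```python
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Multiply a small matrix by a vector using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ParameterStore:
    """Hypothetical central parameter store (the role the article gives MemoryX)."""

    def __init__(self, layers: List[Matrix]) -> None:
        self.layers = layers

    def stream(self) -> Iterator[Tuple[int, Matrix]]:
        # Hand out one layer at a time; no worker ever holds the whole model.
        for index, weights in enumerate(self.layers):
            yield index, weights


class Worker:
    """Hypothetical compute node (the CS-2 role) that keeps only its own data."""

    def __init__(self, sample: List[float]) -> None:
        self.activations = list(sample)

    def apply(self, weights: Matrix) -> None:
        self.activations = matvec(weights, self.activations)


def broadcast(weights: Matrix, workers: List[Worker]) -> None:
    """Hypothetical fan-out step (the SwarmX role): same weights to every worker."""
    for worker in workers:
        worker.apply(weights)


def run_forward(store: ParameterStore, samples: List[List[float]]) -> List[List[float]]:
    workers = [Worker(sample) for sample in samples]
    for _, weights in store.stream():
        broadcast(weights, workers)
    return [worker.activations for worker in workers]


layers = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[2.0, 0.0], [1.0, 1.0]],
]
samples = [[1.0, 1.0], [2.0, 0.0]]
outputs = run_forward(ParameterStore(layers), samples)
print(outputs)  # [[6.0, 4.0], [4.0, 2.0]]
```

In this sketch, adding workers changes nothing in the parameter store, which is the property the decoupling argument rests on. A real system would also have to handle training, bandwidth limits, and failures, none of which are modeled here.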

Cerebras’ AI chip, like GPUs, has many desirable attributes that can accelerate conventional scientific computing. There is a risk that G42 customers will use Cerebras systems for conventional HPC, which could disrupt the startup’s AI market focus.

But Feldman insisted that the chip is designed for AI computing, not conventional HPC. 

“We have built this machine for AI. We do not support 64-bit double precision. We do do some HPC work… and that is right at the intersection of AI and HPC,” Feldman said. 

In the U.S., the Department of Energy’s National Energy Technology Laboratory is using Cerebras systems for decarbonization initiatives, but the chip gives the lab an excuse to test AI in its computing stack.

“We have some work with them, where they’re doing giant simulations for computational fluid dynamics. But I think we have really stood this up and optimized it for AI,” Feldman said. 

Cerebras has also released many open-source large language models as it tries to build an underlying software infrastructure for its chips. Nvidia, however, has a strong software presence, with a lot of the AI codebase veering in the direction of its proprietary CUDA software stack, which can take advantage of features only available in the A100 and H100 GPUs.

On the hardware front, Cerebras also faces challenges from AMD, which recently launched the MI300X GPU for AI, and Intel, which has an AI accelerator called Gaudi. None of these chips have racked up large commercial sales.  

Cerebras Systems received the 2022 Editors’ Choice Awards for their Cerebras Systems CS-2 Artificial Intelligence system.
