Editor's Note: Additional coverage of the AWS-Nvidia 65-exaflop 'Ultra-Cluster' and Graviton4 can be found on our sister site Datanami.
Amazon Web Services will soon be home to a new Nvidia-built supercomputer that the company claims is one of the world’s fastest AI systems.
The system delivers 65 exaflops of AI performance when measured with the FP8 data type. It is also the world's first Arm-based AI supercomputer in the cloud, said Ian Buck, vice president for hyperscale and HPC at Nvidia, in a press briefing.
The new DGX Cloud Project Ceiba system is built around Nvidia components but is adapted to connect to AWS’s Nitro chips, which provide the underlying networking, storage, and security infrastructure.
Project Ceiba is slated for installation in 2024, when it will become available to customers. The system differs from other Nvidia DGX Cloud deployments, which have been replicated across Microsoft Azure, Oracle Cloud, and Google Cloud.
The 65 exaflops of performance come from 16,384 Grace Hopper Superchips, with the racks interconnected by AWS's Elastic Fabric Adapter, the network fabric used for supercomputing applications.
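As a rough sanity check on those headline figures, dividing the total by the chip count gives the implied per-Superchip FP8 throughput (the assumption here is that the 65 exaflops is a simple aggregate of per-chip peak performance):

```python
# Back-of-the-envelope check (assumptions: 65 exaflops aggregate FP8,
# 16,384 Grace Hopper Superchips, no interconnect overhead).
TOTAL_EXAFLOPS = 65
NUM_SUPERCHIPS = 16_384

# Implied FP8 throughput per Superchip, in petaflops (1 EF = 1,000 PF).
per_chip_petaflops = TOTAL_EXAFLOPS * 1_000 / NUM_SUPERCHIPS
print(f"~{per_chip_petaflops:.2f} PFLOPS FP8 per Superchip")
# -> ~3.97 PFLOPS FP8 per Superchip
```

That works out to roughly 4 petaflops per Superchip, consistent with the FP8 (with sparsity) peak Nvidia quotes for the Hopper GPU in each Superchip.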
“This new supercomputer will be stood up inside the AWS infrastructure… and used by Nvidia’s own research and engineering team to develop new AIs for graphics, large-language model research, [and] new AIs for…digital biology, robotics research, and self-driving cars,” Buck said.
AWS also announced new EC2 instances featuring Nvidia's new H200, L40S, and L4 GPUs. The products were unveiled at the ongoing re:Invent conference taking place in Las Vegas.
AWS also announced its latest CPU, Graviton4, a successor to Graviton3 that will be deployed in EC2 instances. Full specifications were not immediately available; however, in a blog post, Amazon said the CPU has 50% more cores, is 30% faster, and has 75% more memory bandwidth than Graviton3.
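To put those relative claims in absolute terms, they can be applied to a Graviton3 baseline. The baseline figures below (64 cores, roughly 300 GB/s of DDR5 memory bandwidth) are assumptions drawn from prior Graviton3 coverage, not from this announcement:

```python
# Implied Graviton4 specs from Amazon's relative claims, applied to an
# assumed Graviton3 baseline (64 cores, ~300 GB/s memory bandwidth --
# baseline values are assumptions, not from the announcement).
g3_cores = 64
g3_bandwidth_gbs = 300

g4_cores = int(g3_cores * 1.50)             # "50% more cores"
g4_bandwidth_gbs = g3_bandwidth_gbs * 1.75  # "75% more memory bandwidth"

print(f"Graviton4 (implied): ~{g4_cores} cores, ~{g4_bandwidth_gbs:.0f} GB/s")
# -> Graviton4 (implied): ~96 cores, ~525 GB/s
```

If the assumed baseline holds, the claims imply a 96-core part with memory bandwidth in the 500 GB/s range.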
The Graviton4 release was expected, considering the relationship between the Graviton processors and the release cadence of Arm Neoverse core designs, said James Sanders, principal analyst at CCS Insight. Graviton4 is modeled around Neoverse V2, whereas Graviton3 is modeled around Neoverse V1.
“The major microarchitectural change between the two is the step up to Armv9, which improves process isolation and vector extensions. The former is useful in cloud contexts—and something of a reaction to side-channel attacks like Spectre and Meltdown—while the latter is useful for AI inference,” Sanders said.
The most recent AWS CPU before that was the Graviton3E, currently the company's highest-performance chip available for supercomputing. Graviton4 will be available in more instance types across different price points when it reaches general availability, Sanders said.
By comparison, Microsoft's new Cobalt 100 is built on Arm's Neoverse N2 CSS, which is also Armv9. The Neoverse N-series is targeted toward mainstream datacenter use, while the V-series is aimed at higher-performance workloads, which should give Graviton4 an advantage.
Amazon also released its next-generation training chip, Trainium2, which is four times faster at training than the first Trainium chip, released close to three years ago.
“It will be able to be deployed in EC2 UltraClusters of up to 100,000 chips, making it possible to train foundation models and large language models (LLMs) in a fraction of the time while improving energy efficiency up to 2x,” Amazon said in a blog post.
AWS now hosts two Arm-based CPUs: Nvidia's Grace and its own Graviton4. Grace provides access to Nvidia GPUs, while Graviton4 provides access to Trainium. Graviton4 does not yet work with Nvidia GPUs.
Given the increase in model development and customization, demand for such configurations is there, though projects targeting Nvidia's CUDA environment will require some tooling adjustments.
“Curiously, [AWS CEO] Adam Selipsky noted that other cloud companies are only talking about their AI chips when Google is on its fourth generation of TPU,” Sanders said.