Facebook Parent Meta’s New AI Supercomputer Will Be ‘World’s Fastest’

By Oliver Peckham

January 24, 2022

Fresh off its rebrand last October, Meta (née Facebook) is putting muscle behind its vision of a metaversal future with a massive new AI supercomputer called the AI Research SuperCluster (RSC). Meta says that RSC will be used to help build new AI models, develop augmented reality tools, seamlessly analyze multimedia data and more. The supercomputer’s first phase is already operational, and it is scheduled for full build-out by mid-year. HPCwire is estimating that the final system will weigh in at over 220 Linpack petaflops.

RSC as currently built. Image courtesy of Meta.

About the system

RSC’s first phase, already built out and operational, consists of 760 Nvidia DGX A100 compute nodes, totaling some 6,080 Nvidia A100 GPUs, all networked with Nvidia’s Quantum 200Gb/s InfiniBand. For storage, the system is equipped with 175PB of Pure Storage FlashArray, 10PB of Pure Storage FlashBlade and 46PB of cache storage housed in Penguin Computing Altus servers. Meta says that, with just this first phase, they “believe [RSC] is among the fastest AI supercomputers running today[.]”

With the completion of the second phase around July, Meta says, RSC will contain a total of 16,000 GPUs (presumably through an additional 1,240 DGX A100 nodes, which Nvidia believes will make it the largest customer installation of DGX A100 systems) and a full exabyte of storage with the capacity to accommodate 16TB/s of training data. Meta indicated 16,000 GPUs will be the maximum configuration of the system. “This is due to the network configuration to reduce the number of hops, to ensure we provide a 1:1 oversubscription,” a Meta spokesperson told us.
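
For a sense of scale, the back-of-the-envelope sketch below estimates how long a single pass over one exabyte would take at the quoted 16TB/s. It assumes decimal units (1EB = 10^18 bytes, 1TB = 10^12 bytes) and sustained peak throughput, neither of which Meta specifies; the function name is purely illustrative.

```python
# Back-of-the-envelope: time to read a dataset once at a sustained bandwidth.
# Assumptions (not from Meta): decimal units and sustained peak throughput.

EXABYTE = 10**18   # bytes, decimal definition (assumed)
TERABYTE = 10**12  # bytes, decimal definition (assumed)


def seconds_to_stream(dataset_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Return the time in seconds needed to read dataset_bytes once."""
    return dataset_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    secs = seconds_to_stream(1 * EXABYTE, 16 * TERABYTE)
    print(f"{secs:,.0f} s, or about {secs / 3600:.1f} hours per full pass")
```

Under those assumptions, one full pass over an exabyte would take 62,500 seconds, a little over 17 hours.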

Meta says this second phase will increase RSC’s AI training performance by more than 2.5× (tracking with the 2.63× increase in GPUs), cementing it as the single fastest AI supercomputer in the world.
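
The node and GPU figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article (760 nodes, eight GPUs per DGX A100 node, 16,000 GPUs at full build-out); the variable names are illustrative.

```python
# Sanity-check the node and GPU counts quoted in the article.
GPUS_PER_DGX_A100 = 8  # each DGX A100 node holds eight A100 GPUs

phase1_nodes = 760
phase1_gpus = phase1_nodes * GPUS_PER_DGX_A100   # GPUs in phase one

full_gpus = 16_000
full_nodes = full_gpus // GPUS_PER_DGX_A100      # nodes at full build-out
additional_nodes = full_nodes - phase1_nodes     # nodes added in phase two

gpu_ratio = full_gpus / phase1_gpus              # growth factor in GPU count

print(phase1_gpus, full_nodes, additional_nodes, round(gpu_ratio, 2))
```

This prints 6080, 2000, 1240 and 2.63, matching the phase-one GPU count, the presumed number of additional nodes and the GPU growth factor.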

Unlike preceding systems, RSC is intended for use with not just open-source/public datasets, but with real-world, internal production data from Meta. To that end, Meta says, they designed the system to be isolated from the internet, with all connections passing through Meta’s own datacenters. User-generated data—checked for anonymization—is encrypted from the storage systems to the GPUs and only decrypted in-memory immediately prior to its use in model training.

Meta also developed a storage service (called AI Research Store, or AIRStore) to handle the growing bandwidth and capacity requirements of RSC. AIRStore preprocesses training data for AI models and is designed to optimize transfer speeds.

In its announcement of RSC, Meta also quietly detailed the first generation of its AI research supercomputing hardware, launched in 2017. The unnamed cluster, Meta says, has 22,000 Nvidia V100 GPUs and performs 35,000 training jobs per day. Meta says that compared to this previous system, RSC’s early benchmarks show a 20× improvement on computer vision workflows and a 3× improvement in large-scale NLP model training (which, Meta says, translates to weeks of saved time).

So far, Meta has worked with a consistent roster of partners across these systems: Penguin Computing for architecture and managed services; Nvidia for systems, GPUs, networking, and software stack components; and Pure Storage for most of the storage functionality.

The fastest AI supercomputer(s)

In terms of flops, Meta estimates that RSC will deliver nearly five exaflops of mixed-precision AI compute power. Using Nvidia’s Selene supercomputer (also built from eight-GPU Nvidia DGX A100 nodes) as a benchmark, HPCwire estimates that, were Meta to run the HPL benchmark, the full iteration of RSC might deliver around 227 Linpack petaflops of compute power (up from perhaps 86 petaflops right now), though further optimizations made by Nvidia in the interim may make those numbers underestimates.
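
The article does not spell out the estimation method, but a simple linear scaling by GPU count reproduces the figures. The sketch below is one plausible reconstruction, not HPCwire's confirmed approach. It assumes Selene's Top500 entry of 63.46 Linpack petaflops on 560 DGX A100 nodes (4,480 GPUs) and Nvidia's datasheet peak of 312 teraflops per A100 for dense FP16/BF16 tensor operations; neither figure is stated in this article, and the function names are illustrative.

```python
# Rough reconstruction of the estimates, assuming performance scales
# linearly with GPU count. All constants below are assumptions taken from
# public Top500 and Nvidia datasheet figures, not from the article itself.

SELENE_RMAX_PF = 63.46        # Selene's Linpack result in petaflops (assumed)
SELENE_GPUS = 560 * 8         # 560 DGX A100 nodes, eight GPUs each (assumed)
A100_PEAK_TFLOPS = 312        # peak dense FP16/BF16 tensor teraflops (assumed)


def scaled_linpack_pf(gpus: int) -> float:
    """Scale Selene's Linpack result linearly to a system with `gpus` A100s."""
    return SELENE_RMAX_PF * gpus / SELENE_GPUS


def peak_ai_exaflops(gpus: int) -> float:
    """Aggregate peak mixed-precision throughput in exaflops."""
    return gpus * A100_PEAK_TFLOPS / 1e6  # 1 exaflop = 1e6 teraflops


if __name__ == "__main__":
    print(f"Phase one:  ~{scaled_linpack_pf(6_080):.0f} Linpack petaflops")
    print(f"Full build: ~{scaled_linpack_pf(16_000):.0f} Linpack petaflops")
    print(f"Full build: ~{peak_ai_exaflops(16_000):.2f} peak AI exaflops")
```

With these inputs the sketch yields roughly 86 and 227 Linpack petaflops for the two phases and 4.99 peak exaflops for the full system. Real Linpack runs rarely scale perfectly linearly, so these are ballpark figures only.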

That is certainly a powerful system (the first phase of RSC would likely place fifth on November’s Top500 list, and its full form would likely place second), but the race for “fastest AI supercomputer” is crowded. While RSC will almost certainly best current comparable competitors like Selene (63.4 Linpack petaflops) and the similarly A100-based Perlmutter system at NERSC (70.9 Linpack petaflops), the near future boasts much stronger challengers.

The most like-to-like comparison might be EuroHPC’s forthcoming Leonardo system, a pre-exascale Atos-built supercomputer that will also be powered by Nvidia A100s (around 14,000 of them, compared to RSC’s planned 16,000). CINECA, which is slated to launch Leonardo’s GPU-powered booster module this month, expects that module alone to deliver 240.5 Linpack petaflops, and Nvidia has billed the forthcoming system as—you guessed it—the “world’s fastest AI supercomputer” (with an estimated ten exaflops of FP16 AI performance).
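
One rough way to compare the two A100-based systems is Linpack performance per GPU. The sketch below uses only figures quoted in this article: CINECA's expected 240.5 petaflops across roughly 14,000 GPUs for Leonardo's booster module, and HPCwire's estimate of around 227 petaflops across 16,000 GPUs for RSC. Both GPU counts are approximate, the RSC number is an estimate rather than a measurement, and the function name is illustrative.

```python
# Compare expected Linpack throughput per GPU for two A100-based systems.
# Inputs are the approximate figures quoted in the article.


def tflops_per_gpu(linpack_pf: float, gpus: int) -> float:
    """Convert a system-level Linpack figure (petaflops) to teraflops per GPU."""
    return linpack_pf * 1000 / gpus


leonardo = tflops_per_gpu(240.5, 14_000)  # CINECA's expectation
rsc = tflops_per_gpu(227, 16_000)         # HPCwire's estimate

print(f"Leonardo booster: {leonardo:.1f} teraflops per GPU")
print(f"RSC (estimated):  {rsc:.1f} teraflops per GPU")
print(f"Ratio:            {leonardo / rsc:.2f}")
```

On these inputs Leonardo's expected per-GPU figure comes out about 21 percent higher than the RSC estimate, a gap that may owe as much to the rounded inputs and differing assumptions as to any real difference in hardware or tuning.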

Tesla, too, is publicly building an enormous AI supercomputer called Dojo, targeting that system at model training for autonomous vehicle development. Currently, it has an A100-based precursor system that HPCwire previously estimated at around 82 Linpack petaflops, but Dojo itself will be powered by Tesla’s proprietary “D1” chip. Owing to the nontraditional hardware and other uncertainties, it is harder to estimate Dojo’s future Linpack performance, but Tesla says that when Dojo launches (on an as-yet unspecified date) it will be “the fastest AI training computer.”

Two notes: first, HPCwire also estimates that RSC’s V100-based precursor system likely delivers around 135 Linpack petaflops and would probably place third on the current Top500, well above the competition from AI systems like Selene and Perlmutter. This would—at least in terms of the Top500—make it the world’s fastest AI supercomputer. Second: Meta (under the name Facebook) previously submitted a 3.3-Linpack petaflops system to the Top500 in early 2017 (it currently ranks 139th). While that system uses Penguin servers, the specs mention Nvidia Tesla P100s and Quadro GP100s rather than V100s, so it may not be part of the precursor system.

Only time (and benchmarks) will tell who comes out on top.

Into the metaverse

The first phase of RSC is already being used for applications like large-model training for natural language processing (NLP) and computer vision. But the long-term target is the metaverse, the nebulously defined virtual world that Meta (named for the metaverse) clearly believes will constitute a new digital revolution.

Meta has an ambitious vision of RSC for the metaverse, highlighting, as an example, how RSC could train models for real-time voice translation among large groups of people, enabling individuals speaking different languages to collaborate on work or gameplay without a language barrier.

“The experiences we’re building for the metaverse require enormous compute power (quintillions of operations/second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” said Mark Zuckerberg, CEO of Meta.

Building RSC during a pandemic

Meta ties the ideas behind RSC all the way back to the founding of the Facebook AI Research lab in 2013, but says the real inception of the project dates back to early 2020, when they decided a new system was necessary to take advantage of advances in GPU and network fabric technologies. The headline goal: a system capable of training models with more than a trillion parameters on datasets as large as an exabyte.

Rack delivery for RSC. Image courtesy of Meta.

Covid, of course, impeded the development of such a system. Meta says that RSC started as a completely remote project, and the supply chain challenges that emerged later in the pandemic threw even more roadblocks into the path. Meta explained that supply chain disruptions made it difficult to obtain components from chips to GPUs.

“One does not simply buy and power on a supercomputer,” said George Niznik, sourcing manager for Meta. “RSC was designed and executed under extremely compressed timelines without the benefit of a traditional product release cycle. Additionally, the pandemic and a major industry chip supply shortage hit at precisely the wrong moment in time. We had to fully utilize all of our collective skills and experiences to solve these difficult constraints.”

Nevertheless, a year and a half later, the team had delivered a functioning cluster. Meta told HPCwire that the team had been able to mitigate supply chain issues for phase one and that the phased build is continuing according to plan.

“I think what I’m most proud of is doing this with the team completely remotely,” said Shubho Sengupta, an AI researcher at Meta. “I mean, it is insane that you can do this without ever meeting anybody.”

An image from Meta’s video announcing the system, perhaps showing the otherwise-undisclosed location of RSC. (We think the location shown is outside of Richmond, Virginia.)