ACES ‘Composable’ Supercomputer Gets Ready for Phase One Use

By John Russell

April 4, 2022

Later this spring, ACES – the new ‘composable’ supercomputer being stood up at Texas A&M University – will begin granting Phase One access to early users. Unlike traditional systems, whose architecture and components are mostly fixed, ACES will have a variety of nodes with a range of processors – CPUs, GPUs, FPGAs, specialized AI processors – that can be dynamically mixed and matched as needed for particular workflows.

Broadly, the NSF-funded ACES system is the first of its kind – ACES is a software-defined supercomputer whose magic sauce is Liqid’s Matrix software and fabric. Speaking at the HPC User Forum March 2022 meeting, held virtually last week, ACES principal investigator Honggao Liu shared more details of the ACES (Accelerating Computing for Emerging Sciences) system and its deployment plans. Liu is executive director of high performance research computing (HPRC) at Texas A&M. The full system is scheduled to be available early next year.

“[The] goals of the ACES platform are to remove the most significant bottleneck in advanced computing by introducing the flexibility to [combine] different components like a processor, accelerator, or memory as needed as the basis to solve complex problems,” said Liu in his presentation.

Next-generation infrastructure will be dynamically configurable, he said. “In ACES, each server can dynamically pool CPUs or GPUs or storage from the resource pool using the composable fabric and software. Each server is dynamically configurable based on the workload of your research team. We plan to deploy the Dell Omnia software, which will support both Slurm and Kubernetes schedulers, and they will be integrated with the Liqid software and fabric.”
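The pooling idea Liu describes can be pictured with a toy model. The sketch below is illustrative only: `ResourcePool`, `Server`, `compose` and `release` are hypothetical names invented here, not part of Liqid’s Matrix software or Dell Omnia, and the device counts are made up. It shows a server attaching devices from a shared pool for one workload and handing them back afterward.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical shared pool of devices reachable over a composable fabric."""
    free: dict[str, int]

    def take(self, kind: str, count: int) -> None:
        available = self.free.get(kind, 0)
        if count > available:
            raise ValueError(f"only {available} {kind} device(s) free, {count} requested")
        self.free[kind] = available - count

    def give_back(self, kind: str, count: int) -> None:
        self.free[kind] = self.free.get(kind, 0) + count


@dataclass
class Server:
    """Hypothetical server that starts with no accelerators attached."""
    name: str
    attached: dict[str, int] = field(default_factory=dict)

    def compose(self, pool: ResourcePool, request: dict[str, int]) -> None:
        # Check the whole request first so a failure leaves nothing half-attached.
        for kind, count in request.items():
            if count > pool.free.get(kind, 0):
                raise ValueError(f"not enough free {kind} devices for {self.name}")
        for kind, count in request.items():
            pool.take(kind, count)
            self.attached[kind] = self.attached.get(kind, 0) + count

    def release(self, pool: ResourcePool) -> None:
        # Return every attached device to the pool once the workload is done.
        for kind, count in self.attached.items():
            pool.give_back(kind, count)
        self.attached.clear()


if __name__ == "__main__":
    pool = ResourcePool({"gpu": 8, "fpga": 2})  # made-up inventory
    node = Server("node01")
    node.compose(pool, {"gpu": 4, "fpga": 1})
    print(node.attached, pool.free)
    node.release(pool)
    print(node.attached, pool.free)
```

In a real deployment the scheduler (Slurm or Kubernetes, per Liu) would presumably drive this kind of attach and release step through the fabric software rather than through application code.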

The ACES grant was awarded last September to three universities – Texas A&M University, the University of Illinois at Urbana-Champaign and the University of Texas at Austin – which have moved quickly to deploy the new system. Dell is the server supplier. Intel will supply CPUs (48-core Sapphire Rapids), FPGAs, GPUs (Ponte Vecchio), and Optane PCIe SSDs. NEC will supply its Vector Engine. Graphcore will supply its IPU, and NextSilicon will supply its co-processor, whose details are still being kept close to the vest.

Liu declined to elaborate on NextSilicon’s co-processor features. NextSilicon, of course, has been in stealth mode and hasn’t said much about its technology. Liu said, “We don’t have their devices at this point for ACES Phase One and are in the process [of] purchasing some.” Presumably we’ll learn more as ACES Phase Two proceeds. Liu did briefly touch on the different kinds of workflows ACES would be able to accommodate and showed a couple of slides (below) matching accelerators to well-suited workflows.

Besides highlighting the different accelerator types available, he noted, “If you need large memory you can dynamically compose up to three terabytes of the memory and then you can run your application that needs [such] a large memory.”

“As I mentioned earlier, ACES will use Nvidia-Mellanox 400 Gigabits per second InfiniBand and ACES will include two petabytes of usable DDN Lustre storage. ACES will have a couple [of] login nodes, three management nodes, and two data transfer nodes and we will have 100 Gigabits per second network adapters on the data transfer nodes,” said Liu, noting the latter will allow users to transfer large datasets to ACES at high speed.
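For a rough sense of what a 100 Gigabits per second adapter means in practice, the sketch below estimates an idealized transfer time. It is back-of-the-envelope arithmetic, not a measurement of ACES: the function name and the `efficiency` parameter are assumptions introduced here, and real transfers depend on protocol overhead, storage speed and the network path to the user’s site.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a link of link_gbps gigabits per second.

    efficiency is an assumed fraction of line rate actually achieved (0 < efficiency <= 1).
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8                          # bytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12                            # decimal terabyte, in bytes
    seconds = transfer_time_seconds(one_terabyte, link_gbps=100)
    print(f"1 TB at 100 Gb/s line rate: {seconds:.0f} s")
```

At full line rate, a 1 TB (10^12 byte) dataset works out to about 80 seconds; any lower `efficiency` value lengthens that proportionally.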

“The ACES configuration supports a broad range of application hardware preferences, depending on the research workflow or application. User[s] can choose to use different components, [such as] accelerators, that work best for the applications or workflows,” he said.

As on Aurora, the DOE exascale system being built at Argonne National Laboratory that also uses Intel CPUs and GPUs, ACES plans to use Intel’s oneAPI framework as the primary programming development tool. Intel has consistently pitched oneAPI as a vendor-neutral approach to porting code to diverse processor types.

About the ACES software environment, Liu said, “ACES will host all major HPC AI/ML software and frameworks. We will support the most widely used [and] recent application software, with support for JupyterHub. We plan to offer Intel’s oneAPI as the cross-architecture programming framework for CPUs, GPUs and FPGAs. A user can use the same code through oneAPI to run on CPUs, GPUs, and FPGAs.

“We will support Slurm and Kubernetes. We will use Anaconda [and] EasyBuild to build your software, and we will provide Singularity and Charliecloud for container applications. On the system side, we will install the XSEDE software stack and also support HTCondor.” (see slide below)

As explained by Liu, ACES will have two phases. “The Phase One prototype, including the Graphcore IPU, Intel D5005 FPGAs, NEC Vector Engine and Liqid-Intel Optane card, will likely be available for early access in April or sometime in May,” he said.

He noted that ACES is being integrated with Texas A&M’s FASTER computer, which is now finishing installation and will become the center’s fastest system. The NSF FASTER award was announced in August of 2020, and the system shares several attributes with ACES, including use of Liqid’s composable fabric.

Here’s a description of FASTER from the Texas A&M website: “FASTER is a 184-node Intel cluster from Dell with an InfiniBand HDR-100 interconnect. A100 GPUs, A10 GPUs, A30 GPUs, A40 GPUs and T4 GPUs are distributed and composable via Liqid PCIe fabrics. All nodes are based on the Intel Ice Lake processor.” FASTER will become available to users in April.

For ACES, all of the Phase Two hardware is expected to arrive at Texas A&M in the June-July timeframe. “We will finish installation and testing acceptance of Phase Two by September 30, 2022,” said Liu. “Then we will start user testing operations for ACES system in the fourth quarter of this year and we plan to have allocation for users early next year.”

Slides courtesy of Dr. Honggao Liu 
