ASC23: Meet the Teams

By Dan Olds

May 18, 2023

The ASC23 cluster competition was held in a basketball stadium on the campus of the University of Science and Technology of China, located in Hefei, China – a modest town of nine million.

The competition nearly filled the floor of the stadium but still gave the student teams plenty of room for their work tables and systems. Given the size of the room, there wasn’t any problem keeping temperatures at a comfortable level, but with all of the systems running, the sound level was incredible.

We interviewed nearly all the teams and, aided greatly by our trusty translators, we were able to talk about their configuration choices, the competition, and the challenges they anticipated they’d be facing. We put in a couple of long days filming, but it was a lot of fun to meet the students and learn more about them.

Here are the teams we interviewed at the stadium:

Fuzhou University: This is the fourth ASC competition for Fuzhou; they previously competed at ASC18, ASC19, and ASC21. Their final configuration consisted of three nodes and six GPUs. We processed this one in black and white because our camera exposure was set way too high, so it has kind of a noir look.


Jinan University: They are the defending champions from ASC21, but they also participated at ASC19, ISC21 (scoring Bronze), and SC21. Sporting a three-node, six-GPU cluster, they’re looking to repeat their success from the last ASC competition.


Lanzhou University: Competing in their second ASC event, Lanzhou is looking to drive their three-node, six-GPU cluster to glory. Well, no, they didn’t exactly say that; they basically said that they’d try to do their best, but I can read between the lines.


Peking University: This is the eighth cluster competition for Peking University and their third ASC appearance. They’re looking to unseat their Beijing cross-town rival Tsinghua University and take home some trophy hardware. The team is driving a cluster that is a departure from the three-node, six-GPU configurations we’ve been seeing so far. The Peking cluster is three nodes accompanied by nine GPUs, which should certainly give them more performance on several of the tasks – but only if they can precisely control the power draw and stay under the 3,000-watt power cap.
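
For readers wondering what precisely controlling the power draw might look like in practice, here’s a minimal, hypothetical Python sketch of the kind of watchdog a team could run. It assumes NVIDIA GPUs and shells out to nvidia-smi; the safety margin, step size, and floor values are illustrative placeholders, not Peking’s actual settings.

```python
import subprocess
import time

POWER_CAP_W = 3000                       # the hard competition limit
SOFT_THRESHOLD_W = POWER_CAP_W - 200     # illustrative 200 W safety margin
STEP_DOWN_W = 10                         # how much to shave off the hungriest GPU per tick
MIN_GPU_LIMIT_W = 150                    # illustrative floor; don't throttle below this

def gpu_power_draws():
    """Return the current power draw, in watts, of each GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True)
    return [float(line) for line in out.strip().splitlines()]

def set_gpu_power_limit(index, watts):
    """Lower one GPU's enforced power limit (needs root privileges)."""
    subprocess.run(["nvidia-smi", "-i", str(index), "-pl", str(watts)],
                   check=True)

while True:
    draws = gpu_power_draws()
    total = sum(draws)  # GPU-only total; CPUs, fans, and PSU losses also
                        # count against the cap
    if total > SOFT_THRESHOLD_W:
        # Throttle the hungriest GPU a little, then re-check on the next tick.
        hungriest = max(range(len(draws)), key=lambda i: draws[i])
        new_limit = max(MIN_GPU_LIMIT_W, int(draws[hungriest]) - STEP_DOWN_W)
        set_gpu_power_limit(hungriest, new_limit)
    time.sleep(1)
```

A real setup would also meter the whole chassis at the wall, since CPU, fan, and power-supply draw all count against the 3,000-watt cap too.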


Qilu University of Technology: In an extraordinarily washed-out video, we talk to the team from Qilu University of Technology. They’re a first-time competitor but don’t seem intimidated by the pressure at all. The team has configured a four-node, eight-GPU cluster, which is significantly larger than those of the other competitors. Maybe this will give them the edge they need to make some waves at ASC23.


Qinghai University: Qinghai is making their third ASC appearance and is driving a three-node, four-GPU cluster. In the video, we interview the team leader, who discusses pre-competition nerves and the chance that they’re underpowered when it comes to hardware. The team might move to a two-node, four-GPU configuration, which should allow them to run full out on the applications but might not get them to the performance they need to triumph.


Shanghai Jiao Tong University: This is the 12th cluster competition appearance for Shanghai Jiao Tong University, making them one of the most experienced institutions in the competition. They have won three Silver medals plus a Bronze in previous competitions, but haven’t brought home a trophy since ASC18. The team believes that the most difficult part of the challenge for them will be the AI-centric tasks.


Shanghai University: Another first-time competitor in student cluster competitions, Shanghai University has a mountain to climb. Not only are they here for the first time, but they’re also outgunned on the hardware side, with only four GPUs when most of the other teams are sporting six. To compensate, they’re running four nodes to get a bit more CPU power, which should help on some of the applications. What will help more is that some of their team members have real-world HPC experience, which is something most of the other teams lack. We’ll see if it pays off.


ShanghaiTech University: This is the seventh competition for the team from ShanghaiTech University. The team nailed down a Silver medal in their first competition at ASC18. They’re driving four nodes and eight GPUs. At the time of taping, the team was trying to see if they could connect the nodes directly to each other, eliminating the network switch and letting them devote the extra power (as much as 300 watts) to their compute components. But to accomplish this, they’ll need to get their hands on some dual-port IB NICs, which, at this point, doesn’t seem to be in the cards. Gotta like the innovative thinking, right?


Shanxi University: Third-time competitor Shanxi has settled on a four-node, eight-GPU cluster after experimenting with several other configurations. Their biggest challenge, in their minds, will be the YLLM training task, which will require them to train a large language model on 17.88 billion tokens. In the video, we discuss the challenges of power management and how important it is to practice power management techniques before the competition begins.


Southern University of Science and Technology: At the time of filming, the team captain is uneasy about the status of their server. They spent a lot of time correcting a network and GPU configuration error, plus even more time getting their Spack packages up and running. They were relying on being able to download the software they needed off the web but ran into trouble finding the specific packages. Ouch. However, they’re recovering well, as you’d expect from a team that has competed five times before. Good luck, SUSTech.


Taiyuan University of Science & Technology: This is the seventh competition for the team from Taiyuan. When we caught up with them, they were comfortable with their progress and felt that they had everything under control. They face a bit of a hurdle in the competition, though, as they have only two nodes and four GPUs, which could be a little underpowered compared to the rest of the field.


Tsinghua University: This university has competed in a record 25 student cluster competitions worldwide, winning 13 Gold medals, six Silver medals, and three Bronze. In other words, they know their way around a student cluster competition. However, this is a brand-new slate of students, which means you can throw the record book out the window. The team was originally planning to run four nodes and eight GPUs but ended up with a configuration of three nodes and six GPUs. In the video, we talk to the team leader about team preparation and their unique approach to workload management. Rather than strictly splitting up tasks and responsibilities between team members, this edition of Team Tsinghua has decided that everyone is going to work on everything – often at the same time.


University of Science & Technology of China: This team represents the host institution, USTC, located on a beautiful campus here in Hefei. Team USTC has competed in nine previous events, taking home Silver and Bronze awards from the ISC 2014 and 2015 competitions. At the time of filming, the team is driving a cluster with four nodes and eight GPUs. The team believes that their biggest challenge will be controlling the power draw of their cluster, which is certainly true.


Zhejiang University: At their sixth competition, Team Zhejiang radically departed from the rest of the field in their hardware choice: a single node attached to a PCIe expansion box with eight GPUs. Purists will argue that this isn’t really a cluster, since it’s only a single node, but they’re here and competing, so let’s see what happens. While this config will scream on LINPACK, which is probably what the team is gunning for, it’s doubtful that it can perform adequately on the other, more CPU-centric applications. Zhejiang has captured the LINPACK crown before with their “Suicide LINPACK” at ASC16 (they turned off all their fans, crossed their fingers, and ran a scorching-fast HPL). With this system, they have to be the favorite to win HPL again this year.
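
As an aside, the first knob an HPL-focused team like this turns is the problem size N. A common rule of thumb is to fill roughly 80 percent of available memory with HPL’s N x N double-precision matrix. Here’s a back-of-the-envelope Python sketch; the memory size and block size below are hypothetical stand-ins, not Zhejiang’s actual configuration.

```python
import math

# Hypothetical machine: memory size and block size are illustrative
# placeholders, not Zhejiang's actual configuration.
TOTAL_MEM_BYTES = 512 * 1024**3  # 512 GiB of memory visible to HPL
MEM_FRACTION = 0.80              # rule of thumb: ~80% of memory for the matrix
NB = 384                         # HPL block size; teams tune this empirically

# HPL factors a dense N x N matrix of 8-byte doubles, so pick the largest
# N with 8 * N^2 <= usable memory, rounded down to a multiple of NB.
usable_bytes = int(TOTAL_MEM_BYTES * MEM_FRACTION)
n_raw = math.isqrt(usable_bytes // 8)
N = (n_raw // NB) * NB

# Standard HPL operation count: (2/3) * N^3 + 2 * N^2 floating-point ops.
flops = (2.0 / 3.0) * N**3 + 2.0 * N**2
print(f"Problem size N = {N:,}")
print(f"Total work ~= {flops / 1e12:,.0f} TFLOP per run")
```

Divide that operation count by the measured runtime and you get the FLOPS figure that goes on the scoreboard, which is why a memory-stuffed, GPU-dense box like Zhejiang’s is such a LINPACK weapon.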


That’s it for the in-stadium competition, but there’s also a virtual competition (with the same apps but utilizing AWS hardware) that features four teams from outside mainland China. Let’s meet them…

The Chinese University of Hong Kong: We did a quick virtual interview with this school. This is the fourth time the university has entered a team in the ASC competition, but the first time for these students.


Kasetsart University: The pride of Thailand, this is the fifth outing for the team from Kasetsart U. In the interview, we meet all of the team members and talk about their responsibilities in the competition. This is an entirely new team for Kasetsart, but they seem enthusiastic and ready for the fight. Using the cloud is also a new experience for the students, not to mention running HPC applications. So a lot of new experiences are ahead for the Kasetsart team.


National Tsing Hua University: NTHU has competed in a whopping 20 previous student cluster competitions, including the very first one at SC07. Over the years, the team has collected a lot of awards, including four Gold medals, two Silver medals, two Bronze awards, and three Highest LINPACK titles. In our interview, the team says that they are ready for the upcoming challenges. In their minds, the most difficult application will be YLLM (the large language model task) due to the size of the model. But this is an experienced team, with veterans from their ASC21 and ISC21 teams and their championship SC22 team.


Universidad EAFIT: This is the ninth student cluster competition appearance for the team from Colombia, albeit with new members. In addition to learning the applications, learning HPC, and learning how to navigate the cloud, the team also must contend with an 11-hour time zone difference. Yikes. In the interview, we discuss the competition, their experience in HPC, and why only two members of the team are named Santiago (they could have had more, I think). Since this is their first look at HPC, they feel they’ve had a slow start getting familiar with the applications. But that’s a common story in student cluster competitions.


Now that we’ve met the teams, next up are some results from the titanic battles that took place. Stay tuned…
