Here’s the first look at the configurations of the ASC19 Student Cluster Competition teams. The first thing to realize is that all of the hardware is generously provided by Inspur, so this is a lot like a stock car race, where everyone is running mostly the same gear.
So, if everyone has the same systems available to them, then why is there such a wide variety of clusters in the competition? Creativity, that’s why. Students have spent months studying and testing various cluster alternatives given the selections available to them. Some believe that ‘small is beautiful’, particularly when accompanied by a slew of GPUs, while others were looking for a better balance between CPU, node count, and GPU accelerators.
It’s interesting to look at the trends. Systems are getting bigger; the average node count at ASC19 is a full node bigger than the average in 2018, with the median increasing by two nodes. Average and median CPU and CPU core counts have also risen sharply, mainly due to Intel increasing their core counts (nice work, Intel), but also due to the greater number of nodes in the systems this year.
We’re seeing a huge increase in memory per node, from 201 GB per node in 2018 to a whopping 384 GB per node in 2019. This is due to new big memory systems from Inspur – great job on their part. And take a look at the memory per cluster figures – 1017 GB in 2018 and 2688 GB in 2019, a huge increase over the course of a single year.
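As a quick back-of-the-envelope exercise, the per-node and per-cluster memory averages above also imply an average node count. A minimal sketch (the figures are taken straight from the paragraph above; the arithmetic is my own):

```python
# Average memory figures quoted in the article, in GB.
mem_per_node = {2018: 201, 2019: 384}
mem_per_cluster = {2018: 1017, 2019: 2688}

# Dividing per-cluster memory by per-node memory gives the
# implied average number of nodes per cluster for each year.
for year in (2018, 2019):
    implied_nodes = mem_per_cluster[year] / mem_per_node[year]
    print(f"{year}: implied average node count = {implied_nodes:.1f}")
```

This works out to roughly five nodes per cluster in 2018 and seven in 2019, broadly consistent with the node-count growth noted earlier.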
What’s interesting is that the number of accelerators/GPUs per cluster has dropped a little bit. Some competition observers have predicted that the student cluster competition race is primarily decided by how many accelerators you can cram into a system. But this isn’t necessarily true according to my data. It all depends on the applications.
The applications this year are a mixed bag when it comes to GPU-centricity (new phrase on my part). Linpack and HPCG are certainly impacted by how many GPUs you have in a system, but these benchmarks aren’t weighted very highly when it comes to the overall scores.
Nvidia is touting their PyTorch GPU projects, and GPUs can probably be very helpful in the SR challenge, but according to the students, you don’t really need a whole brace of GPUs to complete this task.
Which brings us to CESM, which isn’t GPU-centric at all according to the students. Since this is a major part of the scoring, students in some cases have eschewed GPUs in their clusters in favor of more CPUs. There’s also the fact that Nvidia V100s are pretty expensive and in short supply – although Inspur is providing four V100s to any team that is interested. Good job Inspur.
Now that we know the configurations, we’re going to move on to meeting the students via video interview and getting our first real results of the competition. Stay tuned!