When it comes to hardware, the ASC Student Cluster Competitions are markedly different from what you see at ISC and SC. At the ISC and SC competitions, each team is sponsored by one or more hardware vendors and builds its cluster out of whatever those vendors offer. At ASC, the entire field is generously sponsored by Inspur, so everyone builds their clusters out of the same Inspur nodes. This makes the ASC competition more like stock car racing, compared to the SC/ISC Formula 1 style.
You’d think this would mean the team hardware configurations would all be about the same, but that’s an incorrect assumption. Take a look at the hardware configuration chart above. Total node count ranges from three nodes on the low end all the way up to 10 nodes, with an average node count of just over five. The CPU core count shows a similar disparity, ranging from 576 to 1,920.
Students come into the competition with different ideas. Some teams are aiming at winning the LINPACK prize and design their clusters with only a few nodes, then jam GPUs into them like there’s no tomorrow. These teams can be at a disadvantage when it comes to the scientific applications, since those apps might not be as GPU-friendly as LINPACK, and their low CPU count can really hurt them there. It’s a rare team that can field a system that’s competitive at both LINPACK and the scientific applications – although it has happened.
Other teams are looking to win the whole thing, the Overall Championship, so they configure systems that are more balanced between CPUs and GPUs. They might still configure as many as 16 GPUs in their clusters, but spread across six or eight nodes rather than just four. They would also have to mercilessly throttle down the power consumption of their CPUs and GPUs, depending on which application they’re running at the time. This level of power control takes serious planning, testing, and skill.
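To get a feel for the planning involved, here’s a minimal sketch of the kind of budgeting a team might do before deciding where to set its power caps. The 3,000 W figure is the cluster power limit ASC uses; everything else (the function, the per-component wattages, the application profiles) is an illustrative assumption, not an actual team’s settings:

```python
# Hypothetical helper for splitting a fixed competition power budget
# between GPUs and CPU headroom, per application profile.
# ASC caps cluster power at 3,000 W; all other figures are assumptions.

CLUSTER_BUDGET_W = 3000

def plan_power_caps(nodes, gpus_per_node, node_base_w, gpu_max_w, gpu_share):
    """Return (per-GPU power cap in W, per-node CPU headroom in W).

    gpu_share: fraction of the budget left after base platform draw
               that this application profile allocates to the GPUs.
    """
    base = nodes * node_base_w                 # idle/platform draw
    headroom = CLUSTER_BUDGET_W - base
    if headroom <= 0:
        raise ValueError("base draw alone exceeds the budget")
    total_gpus = nodes * gpus_per_node
    per_gpu_cap = min(gpu_max_w, headroom * gpu_share / total_gpus)
    cpu_headroom = (headroom - per_gpu_cap * total_gpus) / nodes
    return per_gpu_cap, cpu_headroom

# A LINPACK-style run leans heavily on the GPUs...
print(plan_power_caps(nodes=4, gpus_per_node=4, node_base_w=250,
                      gpu_max_w=300, gpu_share=0.8))   # (100.0, 100.0)

# ...while a CPU-bound science app flips the split the other way.
print(plan_power_caps(nodes=4, gpus_per_node=4, node_base_w=250,
                      gpu_max_w=300, gpu_share=0.3))   # (37.5, 350.0)
```

In practice, caps like these would then be applied with the usual tools, e.g. `nvidia-smi --power-limit` for the GPUs and CPU frequency/power controls on the hosts.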
Before the competition, teams had to tell the ASC18 committee how many nodes they wanted. Some teams underestimated their requirements and had to try to get more nodes during the testing phase. A few were able to get additional servers; unfortunately, some were not.
The most on-the-ball teams overprovisioned their clusters like a hungry fat man at the grocery store. They requested ten or even twelve nodes, knowing all along that they were probably only going to run six or eight in the actual competition. The craftiest teams took the memory out of the nodes they weren’t going to use and added it to the nodes in their final cluster, giving themselves a 2x memory advantage. Nice move.
Each team was loaned four NVIDIA Tesla V100 GPUs for use during the event, which was very generous of NVIDIA and their Chinese partner LeadTek. More than half of the teams brought their own GPUs along to augment the four from NVIDIA, which is perfectly within the rules of the competition.
It’s interesting to note how the clusters of today compare to the clusters from just a year ago. Notice that the average node count has dropped from 6.5 to 5.5 today. Average CPU core count per system has dropped from 182 to 144, and we’ve even seen a drop in the average number of accelerators. What’s happening?
What we’re seeing is two trends coming together. The students now have access to much more powerful systems, CPUs, and accelerators, and this new tech can accomplish much more work than the components from even 12 months ago. However, there’s a price to pay for that speed: the impact on the power budget. These new components are power hungry, which forces the teams into smaller configurations.
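The arithmetic behind the shrinking node counts is straightforward: under a fixed competition budget, the node count is roughly the budget divided by the per-node draw, so hungrier nodes mean fewer of them. A quick sketch (the per-node wattages below are illustrative assumptions, not measured figures):

```python
# Under a fixed cluster power cap (ASC uses 3,000 W), more power per
# node means fewer nodes. The wattages here are assumed for illustration.

BUDGET_W = 3000

def max_nodes(per_node_w):
    """Largest whole number of nodes that fits under the power cap."""
    return BUDGET_W // per_node_w

# An older, lower-power node (dual CPUs plus a couple of modest GPUs)
print(max_nodes(450))   # -> 6

# A newer node packed with current CPUs and V100-class GPUs
print(max_nodes(700))   # -> 4
```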
I’ve seen the same trend at the SC and ISC competitions. Back in the earlier days of those competitions, it wasn’t unusual to see a ten-node configuration, and the average was probably around eight nodes. But today, the students can get much more performance out of much smaller systems – all while using the same amount of power. Ain’t technology grand?
Stay tuned for more exciting updates from ASC18…