As you read this article, 82 university students from 11 countries are working feverishly on a cluster located at the National Supercomputing Centre of Singapore to try to win the ISC 2020 Student Cluster Competition golden crown. OK, there isn’t an actual golden crown, but there are trophies, including a big one for the Overall Champion.
One of these teams is from the Centre for High Performance Computing located in South Africa. This is their seventh appearance in the ISC cluster wars and they’ve built up an incredible record of four gold medals, two silver medals and a bronze. In other words, they have made the podium every single time they’ve competed.
This achievement is all the more impressive because each of their teams is a unique set of undergrads – no repeats allowed. Some teams have the same students appearing in every competition until they lose their eligibility and go pro. Not so with South Africa: it’s one and done for them. Former team members mentor new members but can’t compete more than once in the big dance.
Little Dance Then Big Dance
The CHPC is the only organization that has a ‘play in’ round to select their ISC team. Early in the competition year, the word goes out to universities all over South Africa: Put together your cluster teams. It’s go time.
The organization provides training materials and classes to help prepare the HPC beginners to compete at the CHPC HPC forum that occurs every December. At the forum, ten student cluster teams from various universities gather to duke it out to see who will be selected for the national team.
I had the privilege of attending the 2019 CHPC cluster competition and covering the three student competitions that took place: the cluster competition, the cyber-security competition, and the AI competition. In this article, I’m going to take you through the cluster competition in detail.
Ten Teams – One Winner
Each team is composed of four undergraduate students. They are assisted by mentors from past CHPC cluster competition teams, which is very cool. The overall winning team will form the foundation of the national team, joined by two outstanding competitors from the non-winning teams plus two alternates.
Through the miracle of video and extra airline luggage fees to haul the equipment to Johannesburg, South Africa, I was able to interview each of the teams twice: once to meet them, then again towards the end of the competition as a check-in. Let’s take a look…
Team Alt F4: Named after the shut down command, this team is looking to shut down the other competitors. When we first check in, on the second day, the team is doing well but is already tired. This is one of those teams where everyone does everything without a lot of specialization.
When we check back in on the team, it’s a bit of a different story. When we ask how they’re doing, the mood is definitely different – they’re in crunch time. They’ve been having problems compiling some of the applications, which is typical for these competitions.
Team It’s Spelt Bolognese: This team has one of the more unusual names in the competition, a real head scratcher for me. So that’s, of course, my first question for them. Explanation? Watch the video to see.
The team is driving a three-node cluster with a switch that is supposedly on the way but hasn’t arrived yet. (As it turns out, none of the teams get their switches in time, so they all go with point-to-point interconnects – old school, love it.) The whole team is from Cape Town, so Johannesburg is, according to them, a real treat. When we check in with the team on the last day, they’re struggling to get some results to submit. Like some of the other teams, it’s the compilers that are the issue – trying to find the right compiler for each app. This is, as we’ve seen, a common story and one that we’ll hear again.
Team Ketamine: Ketamine is a horse tranquilizer, which kind of goes with the motif of their booth. It’s a tranquil place with mood lighting and a laid-back style. When we catch up to the team early on, their three-node cluster is working well and the team is working on getting their benchmarks compiled.
According to the team, it’s “vibe first, Germany second” meaning that their mood is more important than winning and getting the coveted trip to Frankfurt for the ISC finals. They have a ‘different concept’ about what winning should mean in this context. To them, having a great time with their friends while at the CHPC conference is the ultimate win. We get into a bit of a dispute about how well this attitude will serve them in the big picture. I can’t tell if they’re just yanking me or being serious, although the team says they are serious. Check out the video and see what I mean.
Team Send Nodes: Send Nodes is learning the fine art of building switchless interconnects as we catch up with them on the first day. They’re soldiering through and getting the hang of it. The team is running what seems to be the standard three-node configuration with each node being a compute node – no need to have a dedicated head node in clusters this small, right?
The team has appointed a “Compiler Tsar” who is responsible for finding and selecting just the right compiler for the job – sort of like an HPC sous chef. When we interview the team on the last day, we find them busily putting the finishing touches on their applications and trying to get the best results possible. They’re still getting plenty of error messages, some of them unique to their team, which is a bit troubling. While they’ve gotten to the point where they get to use the NVIDIA V100 GPU nodes in the cloud, they’re having trouble getting Quantum ESPRESSO to compile so that they can run it on the cloudy infrastructure.
Team Vision 404: Another interesting name. Combining “file not found” with “vision” could be interpreted as a bad thing. The team sees it as hopeful, although I’m not sure why. Team 404 hasn’t really divided up their work to a great degree, but on further questioning, it seems like one guy is responsible for most of the applications/benchmarks. The team also has a ‘Designated Google Guy’, a surfer dude who does all of the team research and provides answers back to the other students. Good division of labor.
On the last day of the competition, Vision 404 is fired up. They’re tired, sure, but they know this is the time to drive hard. As we comment, “don’t hate the player, hate the game,” and at this point they’re resigned to competing against themselves and for posterity. Great attitude, love their passion and drive to learn.
Team SomberSystem: Kind of a sad name that was picked out of the blue by the team. They’re not all that somber, which is a good thing. Their system is three huge workstations connected by a point-to-point interconnect through their head node. On the first day, they’re having some problems getting their cluster to scale. It sounds like an MPI problem: they can run on a single node, but can’t get the app to scale and use memory on other nodes. I have some inane potential solutions for them, which are discarded instantly.
They have a team morale officer who tells jokes to keep the team loose and having fun. This is always a good thing as student clustering is tense business.
Team Nova Tech: Imagine my shock when I approached the team and found that they only had two members instead of four. This cluster competition puts a huge workload on a four-person team; it’s doubly huge for two (that’s just simple math, right?). This is the only team that has more nodes, at three, than team members. We’ll see how they hold up as the competition goes forward.
On our last day update, Team Nova Tech is still fighting. These guys are bone tired and it shows in the interview. They’ve completed three benchmarks but are still optimizing two of them to get a better final score. The biggest thing they’ve learned is to never, ever, rename library files. Hard-won wisdom for the short-handed team. Team Nova Tech also recommends reading the installation files and readme files – good advice in any context. These guys could have given up at any time, but they didn’t. They drove on and really impressed both the judges and other competitors.
Witts Team One: Witts University fielded two teams for this competition. This looks to be one of the better prepared teams, having put in lots of practice on a test cluster at their university. The team seemed pretty conventional in the interview until I got to Donald. Donald is in charge of compiling and optimizing the HPCC benchmark, which is an amalgamation of many benchmarks. He doesn’t see this as much of a challenge, which impressed me.
But what really impressed me about Donald was his confidence. When I asked the team how they felt about their chances to win, Donald responded “99.9%. I would have said 100% but nothing is ever for sure.” He also said, “we should start learning German now.” In my 10 years of Student Cluster Competition experience, I’ve never seen a player call his shot like Donald. In the student cluster world, he’s like Joe Namath, Muhammad Ali, Larry Bird and Michael Jordan all rolled into one. I love the whole team’s attitude and they’re obviously highly skilled.
Donald was particularly expressive in our follow-up interview. He complimented his teammates expansively and had some advice for the other teams: “pack up and go home.” Damn, I love this kid and his whole team! You gotta watch the video to see what I mean…
Witts Team A: The second team from Witts looks to be solid as well. They were looking to containerize their applications but gave that up early on in order to get some solid results before optimizing to dial in their best possible numbers. When we meet the team, they’re down a member, but have compiled all of their benchmarks and are just starting the optimization process.
This is also a very confident team, like the other Witts team. Like Witts One, Witts A also guaranteed that they would be the winning team and make it to Germany. When we check back in on the final day, the team isn’t quite as confident. Overnight they had a node go down with a blown motherboard. This has definitely hurt team morale, but they’re hopeful that the scores they submitted previously might be enough to put them over the top. But all is not well with Witts A, despite their great attitudes. It’s just an unlucky blow that seems to happen every once in a while. Ouch.
Team Two Nodes, One Cup: Edgy name for a fun team. A name that made me stop in my tracks and read it two or three times before believing it. They’re truly a delightful team, with a great sense of humor, and highly skilled. The team has divided up their workload well and seems to have a good grasp on the tasks.
But they might be a little outgunned when it comes to hardware. The team is sporting dual workstations, each with 48 Xeon Silver cores and 92 GB of RAM. Where they might be ahead of the game is in their choice of network cards: they have selected high-end network cards and might be driving double the bandwidth of other teams. We’ll see if that is enough as the competition unfolds. But this is a team that just won’t quit, despite running into some problems. Check them out in the video below…
Winners? All of Them
The winning team and the rest of the CHPC national team were announced at a gala closing banquet. Great food was served, entertainers entertained, and dignitaries delivered rousing speeches. But I, at least, was waiting impatiently for the awards for the Cyber, AI, and Student Cluster competitions to be handed out. (More details on the Cyber and AI competitions in upcoming stories.)
The Intel Award
Before the final student cluster team was named, there was some other business. Intel had very generously contributed a $5,000 scholarship for the most outstanding male and female competitors. I know that most of you probably haven’t been to South Africa, but let me tell you, injecting $5,000 into a college student’s life is a game changer for that student. Most of these kids are just getting by when it comes to finances and this award can make the difference between finishing college in four years vs. dropping out or taking much longer to complete their degree.
The Intel Award for this year went to our pal Donald on the male side and Sivenathi Madlokazi on the female side.
Finally, the moment was at hand. The winning team and the foundation of the CHPC national team was… wait for it… Witts Team One – the team with our pal Donald Alungile. Now it was time to name the two other team members and the alternates. I’m going to let the video do the talking now…
There’s a Dell in Their Future
We’d be remiss if we didn’t mention that Dell is supporting the entire South Africa CHPC Student Cluster Competition with equipment, technical support, and money. This cluster competition has been supported from the start by Dell and they do a fantastic job. But Dell isn’t stopping there.
The next step for the team is to travel to Austin, Texas, on a Dell-sponsored trip to get additional training from both Dell and the Texas Advanced Computing Center (TACC). Dell engineers will advise and collaborate with the team to design their ISC20 cluster, making sure that the CHPC students have the finest hardware available in the industry.
Cluster Competition, Meet COVID-19
The COVID crisis has forced the ISC20 Student Cluster Competition to go to a virtual format this year. This means that every team will be using the exact same cluster, a two-node system located at the National Supercomputing Centre of Singapore. While this is certainly a disappointment for the CHPC team, not to mention Dell, there isn’t anything anyone can do about it and all of the teams are facing the same conditions. We’ll see if CHPC can adapt and overcome, as they’ve done in the past.