Making the Team South Africa: Defending the Crown

By Dan Olds

June 15, 2020

As you read this article, 82 university students from 11 countries are working feverishly on a cluster located at the National Supercomputing Centre of Singapore to try to win the ISC 2020 Student Cluster Competition golden crown. Ok, there isn’t an actual golden crown, but there are trophies, including a big one for the Overall Champion.

One of these teams is from the Centre for High Performance Computing located in South Africa. This is their seventh appearance in the ISC cluster wars and they’ve built up an incredible record of four gold medals, two silver medals and a bronze. In other words, they have made the podium every single time they’ve competed.

This achievement is all the more impressive because each of their teams is a unique set of undergrads – no repeats allowed. Some teams have the same students appearing in every competition until they lose their eligibility and go pro. Not so with South Africa: it’s one and done for them. Former team members mentor new members but can’t compete more than once in the big dance.

Little Dance Then Big Dance

The CHPC is the only organization that has a ‘play in’ round to select their ISC team. Early in the competition year, the word goes out to universities all over South Africa:  Put together your cluster teams. It’s go time.

The organization provides training materials and classes to help prepare the HPC beginners to compete at the CHPC HPC forum that occurs every December. At the forum, ten student cluster teams from various universities gather to duke it out to see who will be selected for the national team.

I had the privilege of attending the 2019 CHPC cluster competition and covering the three student competitions that took place:  the cluster competition, the cyber-security competition, and the AI competition. In this article, I’m going to take you through the cluster competition in detail.

Ten Teams – One Winner

Each team is composed of four undergraduate students. They are assisted by mentors from past CHPC cluster competition teams, which is very cool. The overall winning team will form the foundation of the national team, joined by two outstanding competitors from the non-winning teams and two alternates.

Through the miracle of video and extra airline luggage fees to haul the equipment to Johannesburg, South Africa, I was able to interview each of the teams twice, once to meet them, then again towards the end of the competition as a check in. Let’s take a look…

Team Alt F4:  Named after the shut down command, this team is looking to shut down the other competitors. When we first check in on the team, they’re doing well, but are already tired when we reach them on the second day. This is one of those teams where everyone does everything without a lot of specialization.

When we check back in on the team, it’s a bit of a different story. When I ask how they’re doing, the mood is definitely different – they’re in crunch time. They’ve been having problems compiling some of the applications, which is typical for these competitions.

Team It’s Spelt Bolognese:  This team has one of the more unusual names in the competition, a real head scratcher for me. So that’s, of course, my first question for them. Explanation? Watch the video to see.

The team is driving a three-node cluster with a switch that is supposedly on the way but hasn’t arrived yet. (As it turns out, none of the teams get their switches in time, so they all go with point to point interconnects – old school, love it.)  The whole team is from Cape Town, so Johannesburg is, according to them, a real treat. When we check in with the team on the last day, they’re struggling to get some results to submit. Like some of the other teams, it’s the compilers that are the issue – trying to find the right compiler for each app. This is, as we’ve seen, a common story and one that we’ll hear again.

Team Ketamine:  Ketamine is a horse tranquilizer which kind of goes with the motif of their booth. It’s a tranquil place with mood lighting and a laid-back style. When we catch up to the team early on, their three-node cluster is working well and the team is working on getting their benchmarks compiled.

According to the team, it’s “vibe first, Germany second” meaning that their mood is more important than winning and getting the coveted trip to Frankfurt for the ISC finals. They have a ‘different concept’ about what winning should mean in this context. To them, having a great time with their friends while at the CHPC conference is the ultimate win. We get into a bit of a dispute about how well this attitude will serve them in the big picture. I can’t tell if they’re just yanking me or being serious, although the team says they are serious. Check out the video and see what I mean.

Team Send Nodes:  Send Nodes is learning the fine art of building switchless interconnects as we catch up with them on the first day. They’re soldiering through and getting the hang of it. The team is running what seems to be the standard three-node configuration with each node being a compute node – no need to have a dedicated head node in clusters this small, right?

The team has appointed a “Compiler Tsar” who is responsible for finding and selecting just the right compiler for the job – sort of like an HPC sous chef. When we interview the team on the last day, we find them busily putting the finishing touches on their applications and trying to get the best results possible. They’re still getting plenty of error messages, some of them unique to their team, which is a bit troubling. While they’ve gotten to the point where they get to use the NVIDIA V100 GPU nodes in the cloud, they’re having trouble getting Quantum ESPRESSO to compile so that they can run it on the cloudy infrastructure.

Team Vision 404:  Another interesting name. Combining “file not found” with “vision”, could be interpreted as a bad thing. The team sees it as hopeful, although I’m not sure why. Team 404 hasn’t really divided up their work to a great degree, but on further questioning, it seems like one guy is responsible for most of the applications/benchmarks. The team also has a ‘Designated Google Guy’, a surfer dude who does all of the team research and provides answers back to the other students. Good division of labor.

On the last day of the competition, Vision 404 is fired up. They’re tired, sure, but they know this is the time to drive hard. As we comment, “don’t hate the player, hate the game,” and at this point they’re resigned to competing against themselves and for posterity. Great attitude, love their passion and drive to learn.

Team SomberSystem:  Kind of a sad name that was picked out of the blue by the team. They’re not all that somber, which is a good thing. Their system is three huge workstations connected by a point to point interconnect through their head node. On the first day, they’re having some problems getting their cluster to scale. It sounds like an MPI problem; they can run on a single node, but can’t get the app to scale and use memory on other nodes. I have some inane potential solutions for them, which are discarded instantly.

They have a team morale officer who tells jokes to keep the team loose and having fun. This is always a good thing as student clustering is tense business.

Team Nova Tech:  Imagine my shock when I approached the team and found that they only had two members instead of four. This cluster competition puts a huge workload on a four-person team; it’s doubly huge for two (that’s just simple math, right?). This is the only team that has more nodes, at three, than team members. We’ll see how they hold up as the competition goes forward.

In our last-day update, Team Nova Tech is still fighting. These guys are bone tired and it shows in the interview. They’ve completed three benchmarks but are still optimizing two of them to get a better final score. The biggest thing they’ve learned is to never, ever, rename library files. Hard-won wisdom for the short-handed team. Team Nova Tech also recommends reading the installation files and readme files – good advice in any context. These guys could have given up at any time, but they didn’t; they drove on and really impressed both the judges and other competitors.

Witts Team One:  Witts University fielded two teams for this competition. This looks to be one of the better prepared teams, having put in lots of practice on a test cluster at their university. The team seemed pretty conventional in the interview until I got to Donald. Donald is in charge of compiling and optimizing the HPCC benchmark, which is an amalgamation of many benchmarks. He doesn’t see this as much of a challenge, which impressed me.

But what really impressed me about Donald was his confidence. When I asked the team how they felt about their chances to win, Donald responded “99.9%. I would have said 100% but nothing is ever for sure.” He also said, “we should start learning German now.” In my 10 years of Student Cluster Competition experience, I’ve never seen a player call his shot like Donald. In the student cluster world, he’s like Joe Namath, Muhammad Ali, Larry Bird and Michael Jordan all rolled into one. I love the whole team’s attitude and they’re obviously highly skilled.

Donald was particularly expressive in our follow-up interview. He complimented his teammates expansively and had some advice for the other team:  “pack up and go home.” Damn, I love this kid and his whole team! You gotta watch the video to see what I mean….

Witts Team A:  The second team from Witts looks to be solid as well. They were looking to containerize their applications but gave that up early on in order to get some solid results before optimizing to dial in their best possible numbers. When we meet the team, they’re down a member, but have compiled all of their benchmarks and were just starting the optimization process.

This is also a very confident team, like the other Witts team. Like Witts One, Witts A also guaranteed that they would be the winning team and make it to Germany. When we check back in on the final day, the team isn’t quite as confident. Overnight they had a node go down with a blown-up motherboard. This has definitely hurt team morale, but they’re hopeful that the scores they submitted previously might be enough to put them over the top. But all is not well with Witts A, despite their great attitudes. It’s just an unlucky blow that seems to happen every once in a while. Ouch.

Team Two Nodes, One Cup:  Edgy name for a fun team. A name that made me stop in my tracks and read it two or three times before believing it. They’re truly a delightful team, great sense of humor and highly skilled. The team has divided up their workload well and seems to have a good grasp on the tasks.

But they might be a little outgunned when it comes to hardware. The team is sporting dual workstations, each with 48 Xeon Silver cores and 92 GB of RAM. Where they might be ahead of the game is in their choice of network cards: they have selected high-end network cards and might be driving double the bandwidth of other teams. We’ll see if that is enough as the competition unfolds. But this is a team that just won’t quit, despite running into some problems. Check them out in the video below…

Winners? All of Them

The winning team and the rest of the CHPC national team were announced at a gala closing banquet. Great food was served, entertainers entertained, and dignitaries delivered rousing speeches. But I, at least, was waiting impatiently for the awards for the Cyber, AI, and Student Cluster competitions to be handed out. (More details on the Cyber and AI competitions in upcoming stories.)

The Intel Award

Before the final student cluster team was named, there was some other business. Intel had very generously contributed a $5,000 scholarship for the most outstanding male and female competitors. I know that most of you probably haven’t been to South Africa, but let me tell you, injecting $5,000 into a college student’s life is a game changer for that student. Most of these kids are just getting by when it comes to finances and this award can make the difference between finishing college in four years vs. dropping out or taking much longer to complete their degree.

This year’s Intel Award went to our pal Donald on the male side and Sivenathi Madlokazi on the female side.

Finally, the moment was at hand. The winning team and the foundation of the CHPC national team was….wait for it…Witts Team One – the team with our pal Donald Alungile. Now it was time to name the two other team members and the alternates. I’m going to let the video do the talking now….

There’s a Dell in Their Future

We’d be remiss if we didn’t mention that Dell is supporting the entire South Africa CHPC Student Cluster Competition with equipment, technical support, and money. This cluster competition has been supported from the start by Dell and they do a fantastic job. But Dell isn’t stopping there.

The next step for the team is to travel to Austin, Texas, on a Dell sponsored trip to get additional training from both Dell and the Texas Advanced Computing Center (TACC). Dell engineers will advise and collaborate with the team to design their ISC20 cluster, making sure that the CHPC students have the finest hardware available in the industry.

Cluster Competition, Meet COVID-19

The COVID crisis has forced the ISC20 Student Cluster Competition to go to a virtual format this year. This means that every team will be using the exact same cluster, a two-node system located in the Singapore National Supercomputing Center. While this is certainly a disappointment for the CHPC team, not to mention Dell, there isn’t anything anyone can do about it and all of the teams are facing the same conditions. We’ll see if CHPC can adapt and overcome, as they’ve done in the past.  
