Making the Team South Africa: Defending the Crown

By Dan Olds

June 15, 2020

As you read this article, 82 university students from 11 countries are working feverishly on a cluster located at the National Supercomputing Centre of Singapore to try to win the ISC 2020 Student Cluster Competition golden crown. Ok, there isn’t an actual golden crown, but there are trophies, including a big one for the Overall Champion.

One of these teams is from the Centre for High Performance Computing located in South Africa. This is their seventh appearance in the ISC cluster wars and they’ve built up an incredible record of four gold medals, two silver medals and a bronze. In other words, they have made the podium every single time they’ve competed.

This achievement is all the more impressive because each of their teams is a unique set of undergrads – no repeats allowed. Some teams have the same students appearing in every competition until they lose their eligibility and go pro. Not the case with South Africa, it’s one and done for them. Former team members mentor new members but can’t compete more than once in the big dance.

Little Dance Then Big Dance

The CHPC is the only organization that has a ‘play in’ round to select their ISC team. Early in the competition year, the word goes out to universities all over South Africa:  Put together your cluster teams. It’s go time.

The organization provides training materials and classes to help prepare the HPC beginners to compete at the CHPC HPC forum that occurs every December. At the forum, ten student cluster teams from various universities gather to duke it out to see who will be selected for the national team.

I had the privilege of attending the 2019 CHPC cluster competition and covering the three student competitions that took place:  the cluster competition, the cyber-security competition, and the AI competition. In this article, I’m going to take you through the cluster competition in detail.

Ten Teams – One Winner

Each team is composed of four undergraduate students. They are assisted by mentors from past CHPC cluster competition teams, which is very cool. The overall winning team will form the foundation of the national team, joined by two outstanding competitors from the non-winning teams and two alternates.

Through the miracle of video and extra airline luggage fees to haul the equipment to Johannesburg, South Africa, I was able to interview each of the teams twice, once to meet them, then again towards the end of the competition as a check in. Let’s take a look…

Team Alt F4:  Named after the shut down command, this team is looking to shut down the other competitors. When we first check in on the team, they’re doing well, but are already tired when we reach them on the second day. This is one of those teams where everyone does everything without a lot of specialization.

When we check back in on the team, it’s a bit of a different story. When we asked how they were doing, the mood was definitely different – they were in crunch time. They’ve been having problems compiling some of the applications, which is typical for these competitions.

Team It’s Spelt Bolognese:  This team has one of the more unusual names in the competition, a real head scratcher for me. So that is, of course, my first question for them. Explanation? Watch the video to see.

The team is driving a three-node cluster with a switch that is supposedly on the way but hasn’t arrived yet. (As it turns out, none of the teams get their switches in time, so they all go with point to point interconnects – old school, love it.)  The whole team is from Cape Town, so Johannesburg is, according to them, a real treat. When we check in with the team on the last day, they’re struggling to get some results to submit. Like some of the other teams, it’s the compilers that are the issue – trying to find the right compiler for each app. This is, as we’ve seen, a common story and one that we’ll hear again.

Team Ketamine:  Ketamine is a horse tranquilizer which kind of goes with the motif of their booth. It’s a tranquil place with mood lighting and a laid-back style. When we catch up to the team early on, their three-node cluster is working well and the team is working on getting their benchmarks compiled.

According to the team, it’s “vibe first, Germany second” meaning that their mood is more important than winning and getting the coveted trip to Frankfurt for the ISC finals. They have a ‘different concept’ about what winning should mean in this context. To them, having a great time with their friends while at the CHPC conference is the ultimate win. We get into a bit of a dispute about how well this attitude will serve them in the big picture. I can’t tell if they’re just yanking me or being serious, although the team says they are serious. Check out the video and see what I mean.

Team Send Nodes:  Send Nodes is learning the fine art of building switchless interconnects as we catch up with them on the first day. They’re soldiering through and getting the hang of it. The team is running what seems to be the standard three-node configuration with each node being a compute node – no need to have a dedicated head node in clusters this small, right?

The team has appointed a “Compiler Tsar” who is responsible for finding and selecting just the right compiler for the job – sort of like an HPC sous chef. When we interview the team on the last day, we find them busily putting the finishing touches on their applications and trying to get the best results possible. They’re still getting plenty of error messages, some of them unique to their team, which is a bit troubling. While they’ve gotten to the point where they get to use the NVIDIA V100 GPU nodes in the cloud, they’re having trouble getting Quantum ESPRESSO to compile so that they can run it on the cloudy infrastructure.

Team Vision 404:  Another interesting name. Combining “file not found” with “vision”, could be interpreted as a bad thing. The team sees it as hopeful, although I’m not sure why. Team 404 hasn’t really divided up their work to a great degree, but on further questioning, it seems like one guy is responsible for most of the applications/benchmarks. The team also has a ‘Designated Google Guy’, a surfer dude who does all of the team research and provides answers back to the other students. Good division of labor.

On the last day of the competition, Vision 404 is fired up. They’re tired, sure, but they know this is the time to drive hard. As we comment, “don’t hate the player, hate the game,” and at this point they’re resigned to competing against themselves and for posterity. Great attitude, love their passion and drive to learn.

Team SomberSystem:  Kind of a sad name that was picked out of the blue by the team. They’re not all that somber, which is a good thing. Their system is three huge workstations connected by a point to point interconnect through their head node. On the first day, they’re having some problems getting their cluster to scale. It sounds like an MPI problem; they can run on a single node, but can’t get the app to scale and use memory on other nodes. I have some inane potential solutions for them, which are discarded instantly.

They have a team morale officer who tells jokes to keep the team loose and having fun. This is always a good thing as student clustering is tense business.

Team Nova Tech:  Imagine my shock when I approached the team and found that they only had two members instead of four. This cluster competition puts a huge workload on a four-person team; it’s doubly huge for two (that’s just simple math, right?). This is the only team that has more nodes, at three, than team members. We’ll see how they hold up as the competition goes forward.

On our last day update, Team Nova Tech is still fighting. These guys are bone tired and it shows in the interview. They’ve completed three benchmarks but are still optimizing two of them to get a better final score. The biggest thing they’ve learned is to never, ever, rename library files. Hard-won wisdom for the short-handed team. Team Nova Tech also recommends reading the installation files and readme files – good advice in any context. These guys could have given up at any time, but they didn’t; they drove on and really impressed both the judges and other competitors.

Witts Team One:  Witts University fielded two teams for this competition. This looks to be one of the better prepared teams, having put in lots of practice on a test cluster at their university. The team seemed pretty conventional in the interview until I got to Donald. Donald is in charge of compiling and optimizing the HPCC benchmark, which is an amalgamation of many benchmarks. He doesn’t see this as much of a challenge, which impressed me.

But what really impressed me about Donald was his confidence. When I asked the team how they felt about their chances to win, Donald responded “99.9%. I would have said 100% but nothing is ever for sure.” He also said, “we should start learning German now.” In my 10 years of Student Cluster Competition experience, I’ve never seen a player call his shot like Donald. In the student cluster world, he’s like Joe Namath, Muhammad Ali, Larry Bird and Michael Jordan all rolled into one. I love the whole team’s attitude and they’re obviously highly skilled.

Donald was particularly expressive in our follow-up interview. He complimented his teammates expansively and had some advice for the other team:  “pack up and go home.” Damn, I love this kid and his whole team! You gotta watch the video to see what I mean….

Witts Team A:  The second team from Witts looks to be solid as well. They were looking to containerize their applications but gave that up early on in order to get some solid results before optimizing to dial in their best possible numbers. When we meet the team, they’re down a member, but have compiled all of their benchmarks and were just starting the optimization process.

This is also a very confident team, like the other Witts team. Like Witts One, Witts A also guaranteed that they would be the winning team and make it to Germany. When we check back in on the final day, the team wasn’t quite as confident. Overnight they had a node go down with a blown-up motherboard. This has definitely hurt team morale, but they’re hopeful that the scores they submitted previously might be enough to put them over the top. But all is not well with Witts A, despite their great attitudes. It’s just an unlucky blow that seems to happen every once in a while. Ouch.

Team Two Nodes, One Cup:  Edgy name for a fun team. A name that made me stop in my tracks and read it two or three times before believing it. They’re truly a delightful team, great sense of humor and highly skilled. The team has divided up their workload well and seem like they have a good grasp on the tasks.

But they might be a little outgunned when it comes to hardware. The team is sporting dual workstations, each with 48 Xeon Silver cores and 92 GB of RAM. Where they might be ahead of the game is in their choice of network cards: they have selected high-end network cards and might be driving double the bandwidth of other teams. We’ll see if that is enough as the competition unfolds. But this is a team that just won’t quit, despite running into some problems. Check them out in the video below…

Winners? All of Them

The winning team and the rest of the CHPC national team were announced at a gala closing banquet. Great food was served, entertainers entertained, and dignitaries delivered rousing speeches. But, for me at least, I was waiting impatiently for the awards for the Cyber, AI, and Student Cluster competitions to be handed out. (More details on the Cyber and AI competitions in upcoming stories.)

The Intel Award

Before the final student cluster team was named, there was some other business. Intel had very generously contributed a $5,000 scholarship for the most outstanding male and female competitors. I know that most of you probably haven’t been to South Africa, but let me tell you, injecting $5,000 into a college student’s life is a game changer for that student. Most of these kids are just getting by when it comes to finances and this award can make the difference between finishing college in four years vs. dropping out or taking much longer to complete their degree.

The Intel Award for this year went to our pal Donald on the male side and Sivenathi Madlokazi on the female side.

Finally, the moment was at hand. The winning team and the foundation of the CHPC national team was….wait for it…Witts Team One – the team with our pal Donald Alungile. Now it was time to name the two other team members and the alternates. I’m going to let the video do the talking now….

There’s a Dell in Their Future

We’d be remiss if we didn’t mention that Dell is supporting the entire South Africa CHPC Student Cluster Competition with equipment, technical support, and money. This cluster competition has been supported from the start by Dell and they do a fantastic job. But Dell isn’t stopping there.

The next step for the team is to travel to Austin, Texas, on a Dell sponsored trip to get additional training from both Dell and the Texas Advanced Computing Center (TACC). Dell engineers will advise and collaborate with the team to design their ISC20 cluster, making sure that the CHPC students have the finest hardware available in the industry.

Cluster Competition, Meet COVID-19

The COVID crisis has forced the ISC20 Student Cluster Competition to go to a virtual format this year. This means that every team will be using the exact same cluster, a two-node system located at the National Supercomputing Centre of Singapore. While this is certainly a disappointment for the CHPC team, not to mention Dell, there isn’t anything anyone can do about it and all of the teams are facing the same conditions. We’ll see if CHPC can adapt and overcome, as they’ve done in the past.

HPCwire