Finally! SC19 Competitors Live and in Color!

By Dan Olds

December 10, 2019

You know the saying “better late than never”? That’s how my cluster competition coverage is faring this year. With SC19 coming late in November, quickly followed by my annual trip to South Africa to cover their cluster competition, I’ve been running behind. But I’m back and I’m going to provide all of the deep analysis and competition coverage that you’ve all become accustomed to over the years.

Now let’s take an up close and personal look at our SC19 teams. Using the miracle of video, we’ve interviewed as many teams as we could given the accessibility constraints. We apologize to the teams that we couldn’t get to, but we were under the gun to get as many teams as we could during our limited access time. We managed to snare 12 out of 16, which isn’t too bad, I guess, but far from our usual 100% coverage, damn it.

Team Washington:  Representing the great Pacific Northwest, we have Team Washington or Team Husky or Team Udub. This team is driving a slim configuration with two nodes, but they’re also packing eight NVIDIA V100 GPUs, so they have plenty of processing power. This is a team that can adapt on the fly, for example:  For some reason, teams have to have official data center racks for their cluster or else they’re disqualified. Back in the day, before we had all of these nitpicky rules, you used to be able to use about anything to hold your cluster. But today, you have to have an expensive rack to house your couple of nodes.

Anyway, the Udub students weren’t provided a rack by their sponsor and thus had to scramble to find one by Monday morning at 9:30 am or else face expulsion. They combed Craigslist and Facebook Marketplace and came up with a $100 42U rack. But it was in Boulder, not Denver. So they had to rent a truck, head to Boulder to pick it up, return the truck, and get it all set up by early Monday morning. Nice work, guys, great job.

https://youtu.be/kmr_PeH2RZo

Watch the video to see and hear more about the Washington team. Both my cluster competition color commentator Jessi Lanum and I were highly impressed by this first-time team. Let’s see how they do.

Team Warsaw:  Jessi and I interview Team Warsaw to see how this now-veteran team is handling the pressure of the SC19 cluster competition. The students from Warsaw have one of their best configurations with five nodes, eight GPUs, and a beefy Mellanox EDR interconnect. The team this year is very solid and experienced, with great skills. Could this be the year that Team Warsaw breaks out of the pack?

It’s also a closely-knit team. When we were interviewing them, one of their team members was off sleeping, so they showed her picture to the camera just to make sure that she was included in the video.

Wake Forest:  When Jessi and I check in on them, Wake Forest seems to be happy with their performance so far in the competition. They’ve established a good division of labor and are using their machine well. We run into an anomaly in the team, a finance major! Well, a finance and computer science major, but it’s the first one we’ve run into in ten years of covering competitions.

On the reproducibility challenge, the Daemon Deacons found that the paper is valid. One of the students on this app is like the most chilled out competitor we’ve seen. Kicked back, easy going, relaxed, he’s the picture of happiness, which is nice to see. Check out the video to check him out.

One of the team’s network cards went out, which is unfortunate. Under the rules, the team can’t do a restart without taking a penalty, which, to me, is sort of unfair when it’s a hardware problem that is clearly outside of student control. But rules are rules, right?

University of Illinois Urbana-Champaign:  Team UIUC is doing well when we catch up with them, with some caveats. They’re driving an older cluster that seems like it’s become a bit crotchety in its old age. As the team captain said to us, if they’re not on top of it all the time, it tends to get out of hand and overheat. To me, this sounds a bit like a nuclear pile back in the old days.

The team has two NVMe drives on each of their four nodes, plus a grand total of eight NVIDIA V100 GPUs. They’re also using IBM’s Spectrum Scale (formerly GPFS) file system and tossed out some love to IBM by mentioning it.

Check out the video to get details on their various challenges and how they got over them.

UIUC had a $700 Azure Cloud budget that they managed to blow through pretty quickly. When we talked to them, they only had $6 left in their budget. Jessi and I offered to toss in $10 each to help them get a little breathing room, but that’s against the rules. Plus, I didn’t have the sawbuck on me anyway, so it all worked out well.

Team Tennessee:  This team is an amalgamation of students from the University of Tennessee, Maryville College and Pellissippi State Community College. These are all first-time participants, so they have their work cut out for them. I give them a bit of grief over the unsuccessful Tennessee Volunteer football team, which was kind of fun.

While we’re interviewing the team, both Shanghai teams went over the power limit, causing sirens and lights to go off, which was also fun.

The team is realistic about their chances to take home the Championship Trophy (unfortunately, there is no real trophy). While they’re doing well, they know that it’s an uphill climb and that the most important thing about the competition is how much they’re learning. They hope to come back in subsequent years and mount another quest for cluster competition glory.

ETH Zurich:  This is the second outing for the Swiss team. Backed by the CSCS, this is a team that has proven they can compete with the top-tier competitors. How? In their first competition, they took home third place and the Highest LINPACK award at ISC19 – which is almost an unprecedented level of success for first timers. We hadn’t seen that kind of debut since the South African CHPC won the whole ISC shooting match in their first year back at ISC13.

The team is making good progress with the applications, with no apparent problems, when we find them on the competition floor. The stupid video is in and out of focus as the camera struggles to figure out where to focus.

During our conversation we discuss the differences between the ISC and SC competition. More rules at SC, plus plenty of sleep deprivation, which is a marked difference from ISC. One of the team members said that the SC competition was “more competitive” than the ISC competition, begging the question (which I asked) “how can you say it’s more competitive when you didn’t actually win the ISC19 competition?” Mean question? Yeah, it was, but I hadn’t slept much either.

The team had a bit of a letdown on their LINPACK score, which was slightly lower than their championship LINPACK at ISC, but there’s a good explanation for the discrepancy. Check out the video for the details.

ShanghaiTech:  This is the third competition for a new university, ShanghaiTech. They were a powerful new competitor at ASC18, finishing in second place and punching their ticket for ISC18. They had a bit of a sophomore slump at ISC18, doing well, but not taking home any major prizes, although they were first in HPCG.

The first team member we interviewed talked about his past experience in FPGA design and claimed that his youth (he’s the youngest on the team) gives him an edge in productivity and creativity. The team has a solid complement of skills, ranging from traditional HPC drivers to computer architecture and AI specialists.

ShanghaiTech is pushing a large-ish cluster with six nodes and a whopping 16 NVIDIA V100 GPUs. That’s a whole hell of a lot of computing power, but it requires rigorous control and power throttling in order to keep it within the 3,000 watt limit. Can ShanghaiTech control this beast and get the most out of it? We’ll find out.

Purdue:  As an institution, Purdue has sponsored 14 cluster teams in worldwide major competitions. While they haven’t come home with any trophies, they’ve gained a lot of knowledge and have even built a curriculum around the events – which is a very good thing.

They’re running a system with very sporty AMD 32-core Rome processors arranged in five single-node systems. Unfortunately, their motherboards don’t support GPUs, which is a huge disadvantage in modern cluster competitions. It was unclear whether or not this configuration was intentional, thinking it could win, or if it was a technical oversight. But either way, they’re trying their best and giving it the old Purdue try – which is what you do when you’re in a cluster competition.

Team NTHU:  This is another team that has been around the block in Student Cluster Competitions, logging an astounding 17 major events over the last 12 years. They’ve amassed an enviable record of Gold Medals and LINPACK Awards, with their most recent win coming at ASC19 in Dalian, China.

They’re in a bit of trouble when we catch up to them. They have a GPU down and they can’t fix it due to cabling problems. They do have seven other GPUs, but that might not be enough to get them over the hump.

However, like most all NTHU teams, they’ve done a great job in optimizing the apps and getting them to run. NTHU almost never submits a zero score, no matter what. In the video, I tell a story about how NTHU outwitted all the other teams during their New Orleans 2010 win – a story that is now referred to as the “Super Sort.” It’s good watching.

Nanyang Technological University:  Team Nanyang, the pride of Singapore, has become a top echelon team over the past few years and is always a threat to walk away with multiple trophies. They’re a pioneer in the “small is beautiful” cluster movement and are at it again with a two node, 16 GPU system. As we heard in the interview, the team has notched another LINPACK award. We’ll have details on that in our next story.

As we meet with the team, they’re coming down to the wire on turning in applications – but, as we note in the video, Nanyang has never not turned in a result, which is an incredible feat in modern competition history.

Team FAU:  This German team has had a long and storied history. They’ve won two LINPACK awards along with a Bronze Medal in their seven-year history. This year, they’re driving an NEC Aurora vector machine, which is a whole different deal for the team, who are used to driving more conventional clusters.

One of the problems they have is that their vector engines broke down during the benchmarking phase of the competition. They had to pull them from the cluster, which means they can only run on CPU power. This won’t give them enough processing power to compete with the other teams, unfortunately. But the plucky Germans are continuing to push and will certainly finish the competition and never give up. There just isn’t any quit in this team.

Shanghai Jiao Tong:  This is one of my favorite teams. Their coach was a long-time competitor for the school, and I must have interviewed him ten times over the years at multiple venues. He’s a hard charger, highly competitive, but more interested in what his team can take from the competition knowledge-wise than in taking home trophies.

Jessi and I catch up with Shanghai Jiao Tong and ask them about their competition so far. While Shanghai has had some hardware problems in the past, everything is running at 100% today. The team is driving one of the larger clusters in the competition with six nodes, eight V100 GPUs and some of the fastest CPUs in the competition at 2.6 GHz. To me, this team has been poised on the edge of moving into the top tier of cluster competition teams but hasn’t quite gotten into the groove yet. This could be their year.

Next up, we’re going to take an in-depth look at the LINPACK and HPCG results, then reveal the detailed overall scoring. Following that, we’ll provide our patent-pending “Power Ranking Analysis,” which shows which teams are getting the most performance out of their systems. Stay tuned to this channel for all the latest. If you want to catch up on your Student Cluster Competition history, check out the new Student Cluster Competition website.

 
