Ten Years of Delivering HPC Leadership Training 

By Andrew Jones

November 7, 2023

SC23 will mark the tenth anniversary of the HPC Leadership Tutorials. A decade ago, providing training for the HPC community other than for users, programmers, or system administrators was almost unheard of. Today, this series of HPC Leadership classes, delivered in various guises from official SC tutorials to executive-style dedicated classes, remains uniquely positioned as the community’s only recognized impartial source of training on the skills essential to plan, procure, and deliver the highest quality HPC services. The training covers topics spanning HPC strategy, procurement, cost models (including Total Cost of Ownership, TCO), value models, business cases, risk, metrics, and much more, building on the significant combined experience of the tutors. 

SC Tutorials 

The first ever tutorial in the series was “Effective Procurement of Supercomputers” at SC13 in Denver. The tutorial has grown significantly in scope since then and has been run at SC16 (2x half-day tutorials), SC17 (3x half-day tutorials), SC18 (2x half-day tutorials), and SC19 (1x full-day combined tutorial), although it was not selected for SC23. It was also run at ISC19 (½-day) and the 2017 Rice Oil & Gas HPC Workshop. The training has also been delivered in numerous private settings.

TACC HPC Leadership Institute 

However, it was the TACC leadership’s belief in the value of this training that helped launch the “executive-style” 3-day format, providing interactive training with the greatest breadth of content and fulfilling a new demand from the community. The TACC HPC Leadership Institute became an annual September fixture in Austin, TX, for 2017, 2018, 2019, 2021 (an online edition that sold out in a week!), and 2022. In 2023, Stanford hosted the TACC-affiliated Institute for the first time.  

Dan Stanzione, Director of TACC, explains: “Like so many parts of the science and engineering enterprise, there is a persistent belief in the supercomputing world that if you just master all the technical skills, you are ready to move up in management. Then you get hired to run a research computing group somewhere, and you realize that those technical skills, while an essential foundation, are just a tiny part of what you need to know. These courses have made me think hard about codifying the ‘other duties as assigned’ part of my job, which is most of my job. It’s been gratifying seeing the thoughtful responses and the light going on for so many of the attendees who realize – often during the course – that the things that have made their life difficult are parts of the job they hadn’t ever really thought about, from procurements to managing, to understanding how to advocate for the resources they need.”

Popular and highly-regarded 

The classes are consistently popular, both in the number of attendees and in attendee feedback. The tutorials have attracted over 100 attendees every time they have been run at SC. Indeed, the aggregate attendance across the various SC tutorials and other venues has been estimated at over 1,000 participants. Attendee feedback is consistently outstanding, from top scores among all SC tutorials across the full range of feedback metrics to direct comments from attendees.

Keith Gray, Oil and Gas HPC Architect at Intel and retired head of HPC at BP, recalls: “The BP HPC team sent almost all of our early career people to the TACC HPC Leadership Institute. They gained a much better understanding of how to evaluate and acquire new systems, how HPC creates value for the organization, and they began to develop a network of friends in the HPC world who will help them succeed in their careers.” 

A collaboration of many people 

Andrew Jones (also known on social media as “hpcnotes,” working at NAG at the time) created the HPC leadership training concept along with Terry Hewitt (then at STFC). Jonathan Follows (also STFC) was part of the original creation team but sadly never got to deliver a tutorial. While Andrew remains the only ever-present tutor from the beginning, the line-up of co-tutors has been refreshed multiple times over the years. Owen Thomas (Red Oak Consulting) added significant experience and some key sections of content, Ingrid Barcena Roig (KU Leuven) was a past attendee who joined the tutor team and injected a major visual update into the slides, and Dan Stanzione (TACC) added some experiences from his journey to becoming an HPC leader. Other tutors included Dairsie Latimer (Red Oak Consulting), Mike Croucher (then at NAG), Branden Moore (at NAG and later at AMD), Sierra Koehler (then at Key Government Finance), and Ruth Marinshaw (Stanford).

The current co-tutor and joint leader is Christine Harvey (MITRE Corp.), who comments: “Joining in teaching this course has been a wonderful professional and educational experience. There is so much knowledge and expertise in the stories (IYKYK) contained in the tutorial, and I’m incredibly appreciative of everything done to develop the content for the course and the flexibility to add my spin on things. Having co-taught the class thrice, every experience has been completely different. The participants, hosts, environment, and guest lecturers create a distinctive and familiar experience. I’ve thoroughly enjoyed getting to know all of my co-tutors while working through the slides, every class participant during Q&A and over shared meals, and getting to focus on the material myself while observing and teaching the course. I think Andy and everyone that’s helped build the course has done a fantastic job of taking practical and sometimes stuffy topics and providing engaging and informative sessions and community.”

Guest speakers have included Bill Barth (TACC), Tommy Minyard (TACC), Melyssa Fratkin (TACC), and Chris Dagdigian (Bioteam). Many others have contributed to this success, whether through helping to manage the TACC Institute (Lucas Wilson, Charlie Dey, Melyssa Fratkin, and others), supporting the SC tutorials (NAG and Red Oak Consulting colleagues), or simply believing in us – TACC leadership, Andrew’s managers at NAG and Microsoft, Nages Sieslack at ISC, the SC tutorial chairs, Keith Gray, Ken Odegard (then at Rice), and the late Rich Brueckner, among others. Some of those contributors will be gathering on Monday at SC23 in Denver to celebrate and reminisce. 

Ultimately rewarding 

It is great to see our original vision still thriving ten years later, continually improving and evolving, including adapting to changes in technology over the decade and addressing diversity and inclusivity among tutors and attendees. There have been many rewarding moments, from realizing that things like TCO models, metrics, or value considerations, which were rare a decade ago, are now commonplace in HPC conversations, to seeing our classes reflected in subsequent HPC procurements and service delivery, to delightful attendee feedback such as “one of the best tutorials I ever attended.” 

Thoughts from a selection of other contributors, past and present 

Terry Hewitt (now retired) 

As a community, we should strive to raise the standards of our work. One of the ways Andrew and I thought we could do this was through sharing our expertise in supercomputing procurement with colleagues at SC, on the basis that we would also learn from the experiences of attendees, and we certainly have. All supercomputing procurements need to demonstrate ‘good value for money’; one element of that is evaluating how our peers do it. Sharing, critical appraisal, and constructive criticism are good for us all (both presenters and attendees). We also had ideas at the time of expanding the scope, and Andrew has developed the tutorials in many ways since I retired.

Jonathan Follows (now retired) 

I was pleased to be involved, as I have always found the topics covered interesting personally, and I had some good recent experiences at the time of writing my part, which I think was useful, and I was happy to share. I think that HPC procurements can often be run better than they are, and I have always felt a strong duty of openness, transparency, and honesty when spending money. I’m glad that something I was briefly involved in at its start continues to make a difference and helps people. 

Owen Thomas (Red Oak Consulting) 

I feel truly honored to have been able to co-deliver these tutorials, which were stressful to write (focusing on so few pages) but fun to present. The attendees were always fully engaged, and we always engaged in interesting and challenging discussions. Probably hundreds of attendees have gone on to be directly involved in procurements, and the benefits to them (as reported by them) have been huge. One calculation we share estimates that the potential impact of each week’s delay in starting value generation from an HPC system can be around 1% of the capital purchase price. Our attendees’ ability to better manage procurement time and risks means the cumulative benefit across them all far exceeds all the tutorial costs (by some very large factor)!

Melyssa Fratkin (TACC – and one of the early believers) 

I remember participating in the ‘write the requirements’ exercise for sunglasses – I remember it being really challenging but fun. I also recall getting called on to answer a question because Andy noticed that I was playing Scrabble in the back of the room… and getting the answer right anyway. I say it all the time – this class condenses everything I learned in Business School into three days, and it applies to the real world in an HPC center! 

Ingrid Barcena Roig (KU Leuven) 

I am happy to be part of the celebration of the ten years of HPC Leadership Tutorials, and I hope they will continue for many years more, as I strongly believe they are needed in our community. My first contact with the HPC leadership tutorials was as an attendee of the very first tutorial of the series at SC13. It was inspiring, and what I learned helped me to manage HPC procurements in my work in a more systematic and professional way. Years later, I was honored to be a co-tutor in the SC18 and ISC19 editions. I not only had the opportunity to collaborate and learn from a team with deep expertise on the different aspects of HPC management, but I also enjoyed the interactions and the sharing of experiences with very motivated attendees. 

Dairsie Latimer (Red Oak Consulting) 

My favorite memory came up after presenting in Denver in 2019. Despite how often the course had been run previously, there would always be a receptive audience because as HPC changes, so do the top tips we can impart. 

Mike Croucher (now at Matlab) 

I co-presented the workshop at ISC 2019, where my topic was Total Cost of Ownership models. Not only was it the first ISC I had presented at, it was the first ISC I had ever attended! I learned so much working alongside Andrew and Ingrid that year, and it also proved to be the perfect introduction to the ISC community. The ultimate networking session! I met so many people that day who I’ve stayed friends with ever since. It was a privilege to be part of this long-running tradition in HPC.  

Branden Moore (now at AMD) 

Ten years! Maybe one day, we will reach everyone who needs help with their procurements and metrics! The course was always interesting and different every time. I would learn something new each class, usually from the attendees. Their perspectives helped to shape how I viewed HPC Leadership, which continues to assist me today. 

Ruth Marinshaw (Stanford, 2023 host and speaker) 

After participating in one of the recent SC-based tutorials, I left wishing there had been learning opportunities like this when I began my HPC leadership journey decades earlier. Though I have been leading HPC programs for years, I still learned a lot. This fall, I was fortunate to host and present at the 2023 HPC Leadership Institute, seeing Andy and Christine in action over three days. New insights, new ideas, and great reinforcement of important lessons learned. I greatly appreciate the contributions Andy and his various partners have made to community workforce development through these leadership tutorials over the last decade. I hope the series continues for years to come. 

 
