STEM-Trek, a nonprofit that supports scholarly travel, mentoring, and advanced skills training for science, technology, engineering, and mathematics, hosted a pre-conference workshop ahead of the annual Supercomputing Conference, SC23, in Denver, Colorado, on November 9-11, 2023. This year’s workshop, NRG@SC23, showcased a range of energy-related research projects that are developing best practices, toolkits, and novel strategies to tackle global grand challenges. Applications were especially encouraged from U.S. National Science Foundation (NSF) ACCESS Campus Champions from NSF EPSCoR (Established Program to Stimulate Competitive Research) states and territories and demographics that are underrepresented in Research Computing and Data science (RCD) academics and careers.
The pre-conference experience is a great way for RCD professionals to meet colleagues worldwide who serve in similar roles. They coalesce as a cohort before entering the full conference, which, due to its size (over 14K attendees), can be overwhelming for newcomers. Their affinity garment (a red jacket this year) helps them find each other in the crowd.
NRG@SC23 was the fifth such workshop sponsored by STEM-Trek. The first, held before SC15 in Austin, Texas, in collaboration with Executive Director Dan Stanzione at the Texas Advanced Computing Center (TACC), was for systems administrators and research facilitators. The second, “HPC On Common Ground,” or OCG@SC16, held in collaboration with Dana Brunson (then Oklahoma State U, now Internet2) and Henry Neeman (U-Oklahoma), focused on food security science. The third, URISC@SC17, provided training on high-speed networks and best practices associated with cybersecurity in coordination with Von Welch, then Principal Investigator (PI) of the NSF Trusted-CI program at Indiana University. COVID-19 forced everything online; STEM-Trek hosted two “ScienceSlam” remote competitions in 2020/21. In 2022, we held the first in-person workshop in five years: EarthSci@SC22.
A “jetlag” day for NRG@SC23 international delegates helped them adjust to time zone and elevation changes. We visited the National Center for Atmospheric Research (NCAR) in Boulder upon the invitation of Daniel Howard (NCAR HPC Systems Engineer) and Wenfu Tang (NCAR Project Scientist). Dr. Tang arranged six amazing talks by NCAR scientists covering HPC-enabled research in agriculture, hydrology, air quality, urbanization, and early-warning systems in Africa.
NSF funding was historically available for U.S. delegates who attended our workshops, and STEM-Trek finds financial support for international participation. Google came through again this year; Amazon Web Services and VAST Data sponsored meals. SC23 General Chair Dorian Arnold donated technical program registrations with workshops and tutorials for 18 delegates, the room for our meeting, and audio-visual support.
Thirty-eight attended NRG@SC23 – our largest cohort to date. Twenty-seven percent were female, also a record. Funding was tighter than ever – our NSF proposal fell through, but some NSF funds from a 2022 grant submitted with Boise State University, which supported three Idahoans, were rolled over. In October (after tickets were purchased), the South African government enacted cost containment measures that limited the number of delegates who could attend a single conference. Tech companies could only sponsor about half of what they previously donated. Even so, nine were supported (partially or fully) by their institutions, and one was entirely self-funded. It was gratifying that they found the pre-conference experience valuable enough to justify the time and expense to attend, even in a tight budget year. We’ve always emphasized the importance of self-advocacy, and it must be working!
About half of the delegates were from the African HPC Ecosystems project led by the South African Centre for HPC (CHPC). This year, others from Nepal and Germany joined. Here’s the breakdown by region: U.S. (25 from 12 states; 6 EPSCoR, 3 female U.S. Air Force Academy Cadets (future Space Force?), and 18 ACCESS Campus Champions); South Africa (8 from 3 provinces); Mozambique (2 from their research and education network, MoRENet); Botswana (1); Nepal (1); and Germany (1).
More, faster horses!!!
STEM-Trek Director and Founder Elizabeth Leake opened the workshop with a quote by Henry Ford, who said that if he had asked customers what they wanted, they would have said, “faster horses.” She added that the practice of adding more acceleration to high-performance computing (HPC) isn’t sustainable from energy and cost perspectives – it’s time to redirect.
The IEEE Floating Point standard was registered in 1985. Since then, Moore’s Law ensured that we doubled computing power roughly every two years until the data deluge broke it. When the pandemic affected supply chains, countries that didn’t make their own chips were disadvantaged – everyone needed technology for education, healthcare, commerce, and more. In response, the U.S. CHIPS and Science Act of 2022 provided $52.7B in research and development (R&D) funding, which influenced U.S. agency program offerings aimed at improving U.S. competitiveness in chip production and supply-chain resilience. The E.U. passed a comparable act that year allocating €43B in public investment. Since then, there has been an estimated $200B investment in the private sector. As the U.S. renewed its commitment to R&D, the White House proposed historic increases for all science-serving agencies (NSF, National Aeronautics and Space Administration/NASA, National Institute of Standards and Technology/NIST, Department of Energy, and others) totaling $2B over the next decade.
With some overlap, the data center construction industry is projected to exceed $400B by 2032. In the future, we will likely see small modular reactor (SMR) innovation powering data centers, which are traditionally built in regions with cheap real estate and an abundance of alternative energy. Many who attend our workshops live and work in such places. Idaho National Laboratory leads R&D in SMR nuclear innovation (e.g., Natrium reactors). We tried to engage SMR innovators for our workshop, but most are held to non-disclosure agreements as they pursue Nuclear Regulatory Commission approval – environmental review and licensing can take years. Meanwhile, keep an eye on Bill Gates, TerraPower, and Kemmerer, Wyoming.
More money – unintended consequences
The unprecedented public/private investment in R&D has unintended consequences for academia. University RCD talent that historically trained the workforce is being siphoned away at an alarming rate. Continuing a trend that began in 2018, more big RCD employers waived degree requirements in 2022 to attract candidates. From 2020 to 2022, many students dropped out of or failed to enter universities, opting instead to pursue careers – unwilling to accrue student debt for a virtual experience. Big tech reduced its footprint during COVID, so many companies are no longer wrangling huge office facilities. They’re building manufacturing facilities with processes driven by artificial intelligence (AI) that require less manual labor. Many can now offer skilled RCD talent full remote employment (for some RCD roles, not all). Remote work is attractive to those who grew accustomed to it during the pandemic, and it reduces their carbon footprint. Commercial entities can offer higher salaries than academia, especially public schools, whose wages are governed by rigid state policies. Remote attendance is difficult for many universities to defend. Iconic architecture and the social experience gained through the traditional post-secondary experience built a strong destination-reliant legacy, although hybrid is now more common and likely here to stay.
Open-source culture may be affected
Historically, most research funding has been sponsored directly by public investment or philanthropies in the form of grants to PIs, which helped foster a rich open-source culture. Intellectual property derived from commercial R&D, by contrast, is protected. The shift toward more private than public spending could impact those from resource-constrained colleges and universities who can’t afford to purchase licenses. Less data could be FAIR (Findable, Accessible, Interoperable, and Reusable). It’s important that the communities of practice that rely on open-source innovation continue to serve as advocates. To illuminate the importance of preserving the open-source culture, Alex Scammon (Head of Open-Source Development at G-Research) was invited to speak at NRG@SC23.
Maybe we need unicorns instead of horses?
With greater emphasis on generative AI and quantum computing, workflows and information are growing in scale and complexity. Data must be managed more judiciously; software, hardware, and networks are being customized to achieve the highest precision, move data faster, and use the fewest bits – often at the edge. Custom computing environments, some employing exotic math, are designed to require much less (and much different) storage that consumes less energy. To address these demands, software, new interconnects, storage innovation, and methods of securing data from end to end are being developed.
IBM released its quantum roadmap on Dec. 4, 2023, which has them on track for full potential – hardware, theory, and software – in 2033. Unlike classical computers, whose energy use grows with their computational capacity, quantum computers, due to the physics involved, require significantly less power, with the potential to offer far superior results.
Quantum hasn’t escaped the attention of the Colorado School of Mines in Golden, Colorado. At NRG@SC23, Mines Grad Student Sean Feeney presented on quantum software, and Liwen Shih (UHouston-Clear Lake/visiting research faculty at Oak Ridge National Laboratory) emphasized the importance of quantum literacy. Showcasing a computational challenge for the National Renewable Energy Laboratory (NREL), also based in Golden, Judith Vidal (NREL with a Mines joint appointment) presented her work with the NREL Thermal Storage Materials Laboratory, which measures the full range of thermophysical material properties and evaluates material degradation – work that quantum technologies could greatly enhance in the future.
Processes that are designed for energy efficiency tend to be quieter – a quality that is essential for the Square Kilometer Array (SKA) and other radio astronomy instrumentation being installed in radio-quiet regions of South Africa and Australia. When it’s operational, the SKA will share data (an estimated 11 exabytes daily) with institutions around the globe, including 19 ground-based telescopes supported by the NSF. Such globally distributed data challenges were among the science drivers that justified the NSF’s investment in the South Atlantic Cable System (SACS) via the Americas-Africa Lightpaths Express and Protect (AmLight-Exp) project based at Florida International University’s Center for Internet Augmented Research and Assessment (CIARA). SACS delivers 100G end-to-end connectivity to three continents. AmLight’s Vasilka Chergarova presented at NRG@SC23 about relationships they’re fostering in the U.S., pan-Africa, and Brazil.
Composable computing, which underpins instruction-set architecture, is increasingly common. It’s no longer necessary to amass a commercial case for the production of a single-chip design (new fabrication facilities being built with the CHIPS investment will likely feature composable hardware and interconnects). Many composable options exist in commercial clouds where industrial users can pay as they go. Amazon Web Services (AWS), for example, features 275 EC2 HPC instances with varied architecture, memory, bandwidth, etc. AWS Graviton, a series of 64-bit ARM-based CPUs designed by AWS subsidiary Annapurna Labs, launched in late 2018; Graviton 4 was released on Nov. 28, 2023 (four versions in five years!).
Commercial cloud’s diversity is appealing but difficult for universities to adopt with a grant-funded financial model. Industry partners who can pay on demand are now more important, and they can also sponsor internships and workshops, like NRG@SC23. While a university could never afford to host as many HPC instances as AWS, NSF ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support) features a diverse portfolio of options that are available to U.S. researchers and their collaborators at no cost. Its composable system, ACES (Accelerating Computing for Emerging Sciences) at Texas A&M University, features CPU, GPU, (Graphcore) IPU, and Field Programmable Gate Array (FPGA) nodes that can be used to design novel instruction-set architecture. ACCESS PI Shelley Knuth presented an overview of the U.S. federated program, and ACES PI Honggao Liu (Texas A&M University) shared ACES highlights during the NRG@SC23 workshop.
U.S. agencies are making a greater investment in commercial cloud. Academic RCD facilitators must, therefore, make an intentional effort to onboard constituents. Each agency cloud has slightly different rules of engagement and eligibility requirements. NSF, through the ACCESS program, features CloudBank. Additionally, the U.S. National Institutes of Health (NIH) hosts the STRIDES program, the Department of Defense (DOD) has the Joint Warfighting Cloud Capability, and NASA underwrites the Earthdata Cloud.
Because architecture and standards have remained somewhat stable for decades, it will be difficult for mid- and late-career RCD professionals to keep up with the rapid changes; the Academy tends to move at a snail’s pace. RCD training models in the U.S. and pan-Africa have historically re-employed decommissioned clusters. But in today’s landscape, once hardware is out of warranty, it grows obsolete quickly and is more vulnerable to attack since it’s not patched as aggressively. That said, on-prem hardware is extremely useful for hands-on teaching of the basics to prepare future systems, cybersecurity, electrical, and network engineers. Recommissioned hardware is still useful if it’s carefully maintained.
As for talent, intellectual curiosity, tenacity, and creativity are prized as exotic math and software-defined architecture make their debut. Students must be exposed to theoretical and quantum computing – the ability to think outside of the box is critically important if we hope to solve big problems in novel ways. They need access to modern resources, such as NSF ACES and the Isango platform, envisioned as a small, affordable, composable, and portable training sled under warranty and backed by its global community of support.
Ideally, RCD pros dedicated to teaching and training will remain in academic roles to prepare the workforce. It’ll take a concerted effort to keep up – they should plan to allocate 20 percent of their time toward learning new skills. Adequate funding for scholarly pursuits must be budgeted (conference travel and professional association memberships). Universities must offer more vocational training and ally with industry partners to support internships.
We appreciate all volunteers who helped with NRG@SC23 planning: Bryan Johnston (South African CHPC), Shannon Beck (USAF Academy), Daniel Howard (NCAR), Wenfu Tang (NCAR), and Kurt Keville (UMass-Boston); a special thanks to Kurt for driving the van to NCAR!
Thank you, sponsors!