A Farewell to XSEDE: A Retrospective & Introduction to the ACCESS Program

By Ken Chiacchia, Pittsburgh Supercomputing Center

July 22, 2022

John Towns, principal investigator of the Extreme Science and Engineering Discovery Environment (XSEDE), gave an overview of the soon-to-complete NSF-funded cyberinfrastructure project by first reviewing his own path as a CI leader.

It’s all about failure, he explained – in particular, how lessons learned plant the seeds of something bigger and better. Towns presented his talk, “… To XSEDE and Beyond, or How Did We Get Here and Where Are We Going?”, on July 14, the final day of the PEARC22 conference.

The Association for Computing Machinery (ACM) Practice and Experience in Advanced Research Computing (PEARC) Conference Series is a community-driven effort built on the successes of the past, with the aim to grow and increase inclusivity by involving additional local, regional, national, and international cyberinfrastructure and research computing partners spanning academia, government, and industry. ACM PEARC22, which took place last week in Boston, explored current practice and experience in advanced research computing, including workforce development, training, diversity, applications and software, and systems and software.

Moving toward success

Describing himself as, sequentially, a “failed” general-relativity physicist, computational scientist, network-applications specialist, and builder of large-scale computational environments, Towns drew a direct line – invisible to himself at the time – leading to his 11-year leadership of one of the largest, most geographically distributed efforts to expand access to NSF-funded advanced computing resources and services to an ever-growing community. XSEDE did this by creating a distributed, inclusive, and innovative environment matching that community to rapidly evolving technologies.

“I was also working on building community … in those days, even if I didn’t realize I was doing it,” he said.

Another step toward making XSEDE a success, he said, was his stint as forum chair of the TeraGrid project, XSEDE’s predecessor organization, from 2008 to 2011. While governance of TeraGrid, which was led by a council of often-competing NSF-funded resource providers, was sometimes contentious, it laid the groundwork for the more successful follow-on program.

TeraGrid’s final report reviewed how the organization had offered computing services to a wider range of users (including the initiation of the Campus Champions program in 2008, which would go on to become a popular onramp to XSEDE), innovated student programs to educate the next generation of HPC professionals, and co-founded the ARCC conferences. Another success of TeraGrid, continued in XSEDE, was that the NSF-funded program did not support only NSF-funded research; discussions with NSF leadership led to a policy to support all open research, whether or not it is funded by the NSF. That decision gave both organizations an impact far beyond the NSF ecosystem.

John Towns (center)

Perhaps more importantly, the report described the challenges with which the organization struggled.

The report’s recommendations were incorporated into the NSF’s Extreme Digital Competition, which led to XSEDE being funded in July 2011. The five-year project would be funded to the tune of $121 million, not counting the $4.6 million in supplements eventually awarded.

XSEDE is born

Not that XSEDE was an instant success.

“It wasn’t pretty to start out with,” Towns said. “The fact of the matter is that XSEDE has been very, very successful, but in the first year or so, it didn’t look like it does today … There were deep divides among the participants; it was a challenging process to get through … I wonder how we were able to operate as well as we did when we look at it today.”

The first three years of the project then became an intense team-building effort featuring expansive and ongoing strategic planning. Towns’ talent for mixing delegation with authority proved central to that team building.

While Towns was always there to resolve differences when needed, XSEDE was composed of service providers (HPC centers more or less synonymous with the NSF-funded, XSEDE-allocated computers they hosted) that became competitors the moment the next NSF call for proposals came out. This dual nature of collaboration and competition created complex relationships that necessitated “leadership by influence,” he said.

A “balanced governance” model offered strong central management with decision-making “pushed to as far down in the structure as possible.” Key to that were delegation and decentralization, with genuine stakeholder participation and formal risk management and change control to make governance decisions concrete and defensible.

“How did we get here? Trust,” Towns said. With managers who did not report to him in the usual manner and Service Provider Partners with independent subawards, finding confluences of interest proved key to making the teamwork work. In turn, the complementary capabilities of the partners leveraged the strengths of the smaller institutions and offered a diversity of viewpoints and contributions.

“We stubbed our toes on that a number of times, but we’ve gotten pretty good at it.”

XSEDE was followed by XSEDE 2.0 on Sept. 1, 2016, a non-competitive grant that renewed the organization, modified with new lessons learned, for a further five years. After that came a final one-year extension to allow the project to wrap up in an organized fashion and to ease the transition to the ACCESS program that will follow it.

“There was not a competition, so the review was very rigorous,” Towns said of the XSEDE 2.0 granting process. “That was painful but useful.”

XSEDE’s successes

Towns reviewed XSEDE’s successes, which included high marks from user reviews for virtually every function of the organization.

XSEDE has provided one-stop-shop allocations, user support, training and education, cybersecurity, and other infrastructure for NSF-funded high-performance computers across the U.S. During year six of the project, the program had more than 17,500 unique active users, over half of whom were graduate or undergraduate students. Even more interesting, more than 3,800 of these users weren’t associated with an XSEDE allocation, reflecting the popularity and usefulness of XSEDE’s training programs for researchers who aren’t even using XSEDE machines. These investigators’ work has produced, conservatively counted, more than 19,600 verified publications that have been collectively cited more than 730,000 times over both the XSEDE and XSEDE 2.0 projects.

XSEDE offered user support as well as allocations and computation. To date, XSEDE 2.0 has answered more than 74,000 service requests, with an average time to resolution of just over 15 hours today. A more intensive support program, the Extended Collaborative Support Service (ECSS), provided a deeper level of collaboration, supplying users who had subject expertise with the skills they needed. It also included a Novel and Innovative Projects program that eased entry for users with little or no computational experience. To date ECSS has completed 249 projects, with an average productivity gain of 12.9 months.

In the workforce development sphere, XSEDE has also proved to be essential for the computational science community. To date, XSEDE has delivered more than 260,000 participant hours of live training (in addition to substantial use of online, self-paced training offerings). In Year 6, more than a third of those taking advantage of XSEDE training were women or racial/ethnic minority members — “which I think is impressive,” Towns said. “Is it where we need to be? Absolutely not.”

Nor is XSEDE’s success only visible from within the organization. This year, 700 user-survey respondents ranked the organization at 4.0 or better out of 5 in all areas, with an overall satisfaction score of 4.54. Users ranked the importance of the resources to their work at 4.42, near the historical high of 4.43; 58 percent of respondents reported XSEDE to be “essential” to their work.

“Maintaining that level of overall satisfaction at this point in the life of any project is significant,” Towns said. As for the importance scores, Towns said, that reflects well on XSEDE “but also is a reflection of how dependent so many fields of science are on computation” today.

Towns ended his talk with a discussion of where the academic HPC field is going in a broad sense. One issue, he said, will be finding better ways of supporting the professionals who do what XSEDE did.

“Naming is part of the problem,” he said. “We need to figure out what we want to call ourselves.” The issue goes deeper than terminology, he added: Identity as a profession, including possibly a professional organization as well as official recognition of individuals’ contributions, will be needed to crack problems such as the lack of paths to professional advancement that do not require effectively leaving one’s area of expertise for a management position.

Advancing to ACCESS

On Sept. 1, 2022, the new Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program will replace XSEDE. Towns’ presentation was followed by a panel discussion of ACCESS and how it will operate.

Tom Gulbransen, program manager of the NSF’s Office of Advanced Cyberinfrastructure, kicked off the panel with a high-level description of ACCESS’s emphasis on continuity of services, along with innovations to expand the set of ecosystem participants and efforts to leverage related NSF programs. NSF awards thus far to ACCESS teams total $52 million over five years.

Gulbransen mentioned highlights from PEARC meeting sessions to reinforce NSF’s Office of Advanced Cyberinfrastructure’s emphasis on recognizing and promoting the value produced by cyberinfrastructure professionals from a variety of institutions; ensuring that resource providers and ACCESS leadership listen to stakeholders ranging from new researchers to institutional leaders who invest in cyberinfrastructure; and coordinating leadership that occurs through interfaces and exchange with varied community leaders throughout the advanced cyberinfrastructure ecosystem.

Allocations

David Hart of the National Center for Atmospheric Research and Laura Herriott of the National Center for Supercomputing Applications will be co-PIs of the allocations segment of ACCESS. Stephen Deems of the Pittsburgh Supercomputing Center will be PI for the project. Called Resource Allocations Marketplace and Platform Services (RAMPS), it will provide tiered methods of entry to researchers with different needs and levels of HPC expertise. The new structure will be focused on increasing ease of access for new and existing communities as well as democratizing access to the resource environment so that researchers can choose resource environments that can best meet their needs. In addition, the service providers of XSEDE/resource providers of ACCESS will be able to tailor their resource descriptions to attract relevant researchers to their machines.

The ACCESS allocation environment will offer a welcoming gateway that inspires collaboration and participation in the pursuit of scientific discovery while continuing to provide an essential gatekeeping function when necessary to balance demand for resources with the available supply, Hart explained. Diversity, equity, and inclusion (DEI) in the allocations environment is a central focus of the project, along with a simplified request and accelerated review framework. Supporting the allocations marketplace, the allocations software platform will be built with an emphasis on modularity, extensibility, and decentralization, with enough flexibility to accommodate allocations for current and future resources and services. Finally, through a suite of Innovative Pilots, the RAMPS team will introduce disruptive features to the ecosystem and allocations marketplace to further expand coverage for data resources and workflow-based needs.

End-User Support

Shelley Knuth of the University of Colorado Boulder presented the user support services segment – the Multi-tier Assistance, Training, and Computational Help (MATCH) program – for which she will serve as PI. The program includes the University of Colorado Boulder, the University of Southern California, the Massachusetts Green High Performance Computing Center, the University of Kentucky, and the Ohio Supercomputer Center.

Like RAMPS, MATCH will use tiers to offer support to users in a way that helps them find the precise level of support they need as quickly as possible. Tier 1 will offer a set of user-friendly tools that make it easy for researchers to use ACCESS resources. Tier 2, a curated knowledge base, will provide documentation and examples, as well as input from community experts designed to answer a wide range of researcher questions; an innovation in this tier will be funding to support experts who contribute. Tier 3, MATCH-Plus, will provide short-term support partnerships, connecting research projects with students who have the needed expertise as well as a mentor from MATCH. The highest tier, MATCH-Premier, will offer connections to pertinent specialists best-suited to provide long-term embedded support.

Operations

Amy Schuele of NCSA presented ACCESS’s operations component, the COre National Ecosystem for CyberinfrasTructure (CONECT), which she will lead as PI along with other experts at the Argonne National Laboratory/University of Chicago, PSC, Indiana University, Florida International University, San Diego Supercomputer Center, and the National Center for Supercomputing Applications.

CONECT will integrate and support resource providers through operations, data and networking, and cybersecurity. Through the Student Training and Engagement Program (STEP), it will also provide students with high-value, marketable skills in those areas. Roadmaps will be used to systematically describe the steps associated with Resource Provider integration.

Monitoring and Measurement

Thomas Furlani of the University at Buffalo described how the Monitoring and Measurement Service of ACCESS will employ and expand the popular XDMoD tool to improve overall system and application performance. The system will offer analytics of workload performance, network traffic, application diagnosis and optimization, a cyberinfrastructure for predictive analysis, and a pilot program to measure and assess the energy and application performance of novel computing architectures.

An important new facet of the track will be an XDMoD analytical framework that a wider community can employ to study CI issues both within ACCESS and on other CI ecosystems. A new open data analytics framework, using Jupyter Notebook, Python, and other tools, will allow creative and ad hoc analytics to discover insights from the vast amount of system usage data available and to improve the overall understanding of CI. Open XDMoD will be offered for installation at campuses.

Coordination

Towns wrapped up the presentations with an overview of the OpenCI ACCESS Coordination Office. Towns will serve as the PI, with support from Co-PIs at the San Diego Supercomputer Center and Georgia Tech.

Leveraging deep knowledge of XSEDE and its lessons learned, the coordination office will emphasize a seamless transition to ACCESS for users and resource providers. It will develop collaboration tools to support inter-track communications and coordinate outreach, DEI, and evaluation efforts across the program. It will also communicate with stakeholders to optimize ACCESS’s impact and take advantage of opportunities to improve the program.

Accessing ACCESS

Users with a current project and allocations awarded via XSEDE should notice no changes to their resource access after August. ACCESS and the Service/Resource Providers have agreed to honor all XSEDE-awarded allocations until they expire. This includes projects awarded at the upcoming August XRAC meeting with a start date of Oct. 1, 2022.

Users with an XSEDE-awarded research project that expires on Dec. 31, 2022, should continue to plan for a proposal submission window of Sept. 15 to Oct. 15, 2022, under the existing XRAC guidelines. ACCESS is supporting this “bridge” meeting to make sure no project falls through the cracks. However, users with needs at the smaller end of the scale should check out the ACCESS Allocations Marketplace tiers before starting a proposal, as they may offer an expedited means of getting computing time.

The coordination office is building a website that will offer entry to ACCESS in much the same way that xsede.org has; that site is yet to be launched but will be found at access-ci.org starting Aug. 1, 2022.
