John Towns, principal investigator of the Extreme Science and Engineering Discovery Environment (XSEDE), gave an overview of the soon-to-complete NSF-funded cyberinfrastructure project by first reviewing his own path as a CI leader.
It’s all about failure, he explained – in particular, how lessons learned plant the seeds of something bigger and better. Towns presented his talk, “… To XSEDE and Beyond, or How Did We Get Here and Where Are We Going?,” on July 14, the final day of the PEARC22 conference.
The Association for Computing Machinery (ACM) Practice and Experience in Advanced Research Computing (PEARC) Conference Series is a community-driven effort built on the successes of the past, with the aim to grow and increase inclusivity by involving additional local, regional, national, and international cyberinfrastructure and research computing partners spanning academia, government, and industry. ACM PEARC22, which took place last week in Boston, explored current practice and experience in advanced research computing, including workforce development, training, diversity, applications and software, and systems and software.
Moving toward success
Describing himself as, sequentially, a “failed” general-relativity physicist, computational scientist, network-applications specialist, and builder of large-scale computational environments, Towns drew a direct line – invisible to himself at the time – leading to his 11-year leadership of one of the largest, most geographically distributed efforts to expand access to NSF-funded advanced computing resources and services to an ever-growing community. XSEDE did this by creating a distributed, inclusive, and innovative environment matching that community to rapidly evolving technologies.
“I was also working on building community … in those days, even if I didn’t realize I was doing it,” he said.
Another step toward making XSEDE a success, he said, was his stint as forum chair of the TeraGrid project, XSEDE’s predecessor organization, from 2008 to 2011. While governance of TeraGrid, which was led by a council of often-competing NSF-funded resource providers, was sometimes contentious, it laid the groundwork for the more successful follow-on program.
TeraGrid’s final report reviewed how the organization had offered computing services to a wider range of users (including the initiation of the Campus Champions program in 2008, which would go on to become a popular onramp to XSEDE), innovated student programs to educate the next generation of HPC professionals, and co-founded the ARCC conferences. Another success of TeraGrid, continued in XSEDE, was that the NSF-funded program did not only support NSF-funded research; discussions with NSF leadership led to a policy of supporting all open research, whether or not it is funded by the NSF. That decision gave both organizations an impact far beyond the NSF ecosystem.
Perhaps more importantly, the report described the challenges with which the organization struggled.
The report’s recommendations were incorporated into the NSF’s Extreme Digital Competition, which led to XSEDE being funded in July 2011. The five-year project would be funded to the tune of $121 million, not counting the $4.6 million in supplements eventually awarded.
XSEDE is born
Not that XSEDE was an instant success.
“It wasn’t pretty to start out with,” Towns said. “The fact of the matter is that XSEDE has been very, very successful, but in the first year or so, it didn’t look like it does today … There were deep divides among the participants; it was a challenging process to get through … I wonder how we were able to operate as well as we did when we look at it today.”
The first three years of the project then became an intense team-building effort featuring expansive and ongoing strategic planning. Towns’ talent for mixing delegation with authority proved central to that team building.
While Towns was always there to resolve differences when needed, XSEDE was composed of service providers (HPC centers more or less synonymous with the NSF-funded, XSEDE-allocated computers they hosted) that became competitors the moment the next NSF call for proposals came out. This dual nature of collaboration and competition created complex relationships that necessitated “leadership by influence,” he said.
A “balanced governance” model offered strong central management with decision-making “pushed to as far down in the structure as possible.” Key to that were delegation and decentralization, with genuine stakeholder participation and formal risk management and change control to make governance decisions concrete and defensible.
“How did we get here? Trust,” Towns said. With managers who did not report to him in the usual manner and Service Provider Partners with independent subawards, finding confluences of interest proved key to making the teamwork work. In turn, the complementary capabilities of the partners leveraged the strengths of the smaller institutions and offered a diversity of viewpoints and contributions.
“We stubbed our toes on that a number of times, but we’ve gotten pretty good at it.”
On Sept. 1, 2016, XSEDE was followed by XSEDE 2.0, a non-competitive grant that renewed the organization, modified with new lessons learned, for a further five years. After that came a final one-year extension to allow the project to wrap up in an organized fashion and to ease the transition to ACCESS, the program that follows it.
“There was not a competition, so the review was very rigorous,” Towns said of the XSEDE 2.0 granting process. “That was painful but useful.”
XSEDE’s successes
Towns reviewed XSEDE’s successes, which included high marks from user reviews for virtually every function of the organization.
XSEDE has provided one-stop-shop allocations, user support, training and education, cybersecurity, and other infrastructure for NSF-funded high-performance computers across the U.S. During year six of the project, the program had more than 17,500 unique active users, over half of whom were graduate or undergraduate students. Even more interesting, more than 3,800 of these users weren’t associated with an XSEDE allocation, reflecting the popularity and usefulness of XSEDE’s training programs for researchers who aren’t even using XSEDE machines. These investigators’ work has produced, conservatively counted, more than 19,600 verified publications that have been collectively cited more than 730,000 times over both the XSEDE and XSEDE 2.0 projects.
XSEDE offered user support as well as allocations and computation. To date, XSEDE 2.0 has answered more than 74,000 service requests, with an average time to resolution today of just over 15 hours. A more intensive support program, the Extended Collaborative Support Service (ECSS), provided a deeper level of collaboration to supply users who had subject expertise with the skills they needed, as well as a Novel and Innovative Projects program that eased entry for users with little or no computational experience. To date, ECSS has completed 249 projects, with an average productivity gain of 12.9 months.
In the workforce development sphere, XSEDE has also proved to be essential for the computational science community. To date, XSEDE has delivered more than 260,000 participant hours of live training (in addition to substantial use of online, self-paced training offerings). In Year 6, more than a third of those taking advantage of XSEDE training were women or racial/ethnic minority members — “which I think is impressive,” Towns said. “Is it where we need to be? Absolutely not.”
Nor is XSEDE’s success only visible from within the organization. This year, 700 user-survey respondents ranked the organization at 4.0 or better out of 5 in all areas, with an overall satisfaction score of 4.54. Users ranked the importance of the resources to their work at 4.42, near the historical high of 4.43; 58 percent of respondents reported XSEDE to be “essential” to their work.
“Maintaining that level of overall satisfaction at this point in the life of any project is significant,” Towns said. As for the importance scores, Towns said, that reflects well on XSEDE “but also is a reflection of how dependent so many fields of science are on computation” today.
Towns ended his talk with a discussion of where the academic HPC field is going in a broad sense. One issue, he said, will be finding better ways of supporting the professionals who do what XSEDE did.
“Naming is part of the problem,” he said. “We need to figure out what we want to call ourselves.” The issue goes deeper than terminology, he added: Identity as a profession, including possibly a professional organization as well as official recognition of individuals’ contributions, will be needed to crack problems such as the lack of paths to professional advancement that do not require effectively leaving one’s expertise behind for management positions.
Advancing to ACCESS
On Sept. 1, 2022, the new Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program will replace XSEDE. Towns’ presentation was followed by a panel discussion of ACCESS and how it will operate.
Tom Gulbransen, program manager of the NSF’s Office of Advanced Cyberinfrastructure, kicked off the panel with a high-level description of ACCESS’s emphasis on continuity of services, on innovations to expand participation in the ecosystem, and on leveraging related NSF programs. NSF awards thus far to ACCESS teams total $52 million over five years.
Gulbransen mentioned highlights from PEARC meeting sessions to reinforce NSF’s Office of Advanced Cyberinfrastructure’s emphasis on recognizing and promoting the value produced by cyberinfrastructure professionals from a variety of institutions; ensuring that resource providers and ACCESS leadership listen to stakeholders ranging from new researchers to institutional leaders who invest in cyberinfrastructure; and coordinating leadership that occurs through interfaces and exchange with varied community leaders throughout the advanced cyberinfrastructure ecosystem.
Allocations
David Hart of the National Center for Atmospheric Research and Laura Herriott of the National Center for Supercomputing Applications will be co-PIs of the allocations segment of ACCESS. Stephen Deems of the Pittsburgh Supercomputing Center will be PI for the project. Called Resource Allocations Marketplace and Platform Services (RAMPS), it will provide tiered methods of entry to researchers with different needs and levels of HPC expertise. The new structure will be focused on increasing ease of access for new and existing communities as well as democratizing access to the resource environment so that researchers can choose resource environments that can best meet their needs. In addition, the service providers of XSEDE/resource providers of ACCESS will be able to tailor their resource descriptions to attract relevant researchers to their machines.
The ACCESS allocation environment will offer a welcoming gateway that inspires collaboration and participation in the pursuit of scientific discovery while continuing to provide an essential gatekeeping function when necessary to balance demand for resources with the available supply, Hart explained. Diversity, equity, and inclusion (DEI) in the allocations environment is a central focus of the project, along with a simplified request and accelerated review framework. Supporting the allocations marketplace, the allocations software platform will be built with an emphasis on modularity, extensibility, and decentralization, with enough flexibility to accommodate allocations for current and future resources and services. Finally, through a suite of Innovative Pilots, the RAMPS team will introduce disruptive features to the ecosystem and allocations marketplace to further expand coverage for data resources and workflow-based needs.
End-User Support
Shelley Knuth of the University of Colorado Boulder presented the user support services segment – the Multi-tier Assistance, Training, and Computational Help (MATCH) program – for which she will serve as PI and which includes the University of Colorado Boulder, the University of Southern California, the Massachusetts Green High Performance Computing Center, the University of Kentucky, and the Ohio Supercomputer Center.
Like RAMPS, MATCH will use tiers to offer support to users in a way that helps them find the precise level of support they need as quickly as possible. Tier 1 will offer a set of user-friendly tools that make it easy for researchers to use ACCESS resources. Tier 2, a curated knowledge base, will provide documentation and examples, as well as input from community experts designed to answer a wide range of researcher questions; an innovation in this tier will be funding to support experts who contribute. Tier 3, MATCH-Plus, will provide short-term support partnerships, connecting research projects with students who have the needed expertise as well as a mentor from MATCH. The highest tier, MATCH-Premier, will offer connections to pertinent specialists best-suited to provide long-term embedded support.
Operations
Amy Schuele of NCSA presented ACCESS’s operations component, the COre National Ecosystem for CyberinfrasTructure (CONECT), which she will lead as PI along with other experts at the Argonne National Laboratory/University of Chicago, PSC, Indiana University, Florida International University, San Diego Supercomputer Center, and the National Center for Supercomputing Applications.
CONECT will integrate and support resource providers through operations, data and networking, and cybersecurity. Through the Student Training and Engagement Program (STEP), it will also provide students with high-value, marketable skills in those areas. Roadmaps will be used to systematically describe the steps associated with Resource Provider integration.
Monitoring and Measurement
Thomas Furlani of the University at Buffalo described how the Monitoring and Measurement Service of ACCESS will employ and expand the popular XDMoD tool to improve overall system and application performance. The system will offer analytics of workload performance, network traffic, application diagnosis and optimization, a cyberinfrastructure for predictive analysis, and a pilot program to measure and assess the energy and application performance of novel computing architectures.
An important new facet of the track will be an XDMoD analytical framework that allows a wider community to employ the tool to study CI issues both within ACCESS and on other CI ecosystems. A new open data analytics framework, using Jupyter Notebook, Python, and other tools, will allow creative and ad hoc analytics to discover insights from the vast amount of system usage data available and to improve the overall understanding of CI. Open XDMoD will be offered for installation at campuses.
Coordination
Towns wrapped up the presentations with an overview of the OpenCI ACCESS Coordination Office. Towns will serve as the PI, with support from Co-PIs at the San Diego Supercomputer Center and Georgia Tech.
Leveraging deep knowledge of XSEDE and its lessons learned, the coordination office will emphasize a seamless transition to ACCESS for users and resource providers. It will develop collaboration tools to support inter-track communications and coordinate outreach, DEI, and evaluation efforts across the program. It will also communicate with stakeholders to optimize ACCESS’s impact and take advantage of opportunities to improve the program.
Accessing ACCESS
Users with a current project and allocations awarded via XSEDE should notice no changes to their resource access after August. ACCESS and the Service/Resource Providers have agreed to honor all XSEDE-awarded allocations until they expire. This includes projects awarded at the upcoming August XRAC meeting with a start date of Oct. 1, 2022.
Users with an XSEDE-awarded research project that expires on Dec. 31, 2022, should continue to plan for a proposal submission window of Sept. 15 to Oct. 15, 2022, under the existing XRAC guidelines. ACCESS is supporting this “bridge” meeting to make sure no project falls through the cracks. However, users with needs at the smaller end of the scale should check out the ACCESS Allocations Marketplace tiers before starting a proposal, as they may offer an expedited means of getting computing time.
The coordination office is building a website that will offer entry to ACCESS in much the same way that xsede.org has; that site is yet to be launched but will be found at access-ci.org starting Aug. 1, 2022.