The Good, the Bad and the Ugly: Reflections on the NSF Supercomputer Center Program

By Nicole Hemsoth

January 4, 2010

In a position paper for community input at NSF’s Future of High Performance Computing Workshop in early December, Calit2 Director Larry Smarr reviewed the successes, failures and continuing challenges of the NSF supercomputing program that he helped create. In 1983, Smarr (then at the University of Illinois at Urbana-Champaign) was the first to propose what would later become known as the NSF Supercomputer Centers program, followed shortly by a proposal from UCSD’s Sid Karin. The two went on to become the founding directors of the first two NSF supercomputer centers — Larry Smarr, of the National Center for Supercomputing Applications (NCSA) at UIUC; and Sid Karin of the San Diego Supercomputer Center (SDSC) at UC San Diego. For the past 10 years, Smarr has been the founding director of Calit2 at UCSD, and in that capacity he has worked very closely with SDSC. Following is the bulk of Larry Smarr’s position paper submitted to the NSF HPC Workshop, which took place at the National Institute for Computational Sciences in Arlington, Virginia.

Larry Smarr: I believe there are some important lessons to be drawn from the institutional and cultural successes and failures of the last 25 years. I offer these reflections for your consideration as you think about how best to organize the NSF HPC program going forward. I have divided my thoughts into three sections: The Good, the Bad, and the Ugly. The Good are the accomplishments of the NSF SC centers, many of which were unexpected in 1985. The Bad are the cultural and institutional shortcomings of that program. The Ugly are the missed opportunities, largely caused by the Bad.

The Good

Increased the number of academic supercomputer users. It was estimated that before the 1985 launch of the NSF SC centers there were ~100 academic supercomputer users. After the first five years of the centers program, the national academic HPC human resource pool had grown by two orders of magnitude, as measured by the number of people who logged on to one or another of the centers' machines. This vastly increased the scale of academic research using HPC and provided a pool for industry and the labs to hire from.

Stimulated use of HPC simulation in industry. Each of the centers recruited industrial partners and trained them on the use of HPC. NCSA developed an industrial partner program which attracted leading companies from over a dozen categories of the Fortune 500 classification. One notable example is Eli Lilly, which trained over 200 of their staff by total immersion sessions at NCSA, then became the first pharmaceutical company to purchase their own supercomputer (Cray-2), and within a year most major pharmas had followed and acquired HPC resources.

Brought an HPC Garden of Architectures to the community. In a short period of time the NSF centers, working jointly with DARPA and NSF, acquired almost all major HPC parallel architectures and made them available to the academic HPC community. This drove rapid exploration of new algorithms for key applications that ran most efficiently on the new hardware architectures.

Incubated the global Internet and Web. Although the Internet protocols were over a decade old when the NSF centers program began, the decision of the networking section of the Office of Advanced Scientific Computing to support only TCP/IP led to the NSFnet backbone, the buildout of the regionals, and extension to early adopter campuses. The NSF networking division, formed after CISE was created, continued to aggressively upgrade the NSFnet. The vBNS program brought high speed shared Internet to many campuses. These activities led directly to today’s global Internet. NCSA Mosaic, developed only three years after Tim Berners-Lee created the WWW protocols, exponentially grew the nascent Web community. Indeed, in 1994 NCSA was the most hit Web site on the planet, and as a result we were forced to invent the first parallel Web server. The NCSA Mosaic programmers left UIUC to form Netscape, Microsoft licensed Mosaic to form the basis of Internet Explorer, and Apache moved the Mosaic server code through open source to form the Apache server. Together this led to one of the largest NSF-induced transformations of the global economy in the history of NSF grants.

Drove Scientific Visualization. The need for visualization of the massive datasets generated by the NSF centers drove the development of computer graphics teams at a number of centers. The concept of data-driven scientific visualization quickly swept the academic community, but also had a major impact, largely through SIGGRAPH, on Hollywood and later the gaming community. For instance, Stefen Fangmeier, who was NCSA scientific visualization project manager in 1987, went on to spend over 15 years as a visual effects supervisor at Industrial Light and Magic, working on such films as Terminator 2, Jurassic Park, Dreamcatcher, Perfect Storm, and Master and Commander.

Pioneered Collaboration Technologies. Because HPC applications often involve teams with members spread across multiple institutions, the NSF SC centers were natural locations for the development of collaborative technologies. There was also a need for center consulting staff to collaboratively analyze complex data output and code with remote users. As a result, in 1990 the NCSA Software Development Group developed NCSA Collage, one of the first cross-platform (Windows, Mac, Unix) synchronous desktop collaboration software systems, focused on collaborative data analysis. Five years later this was replaced by NCSA Habanero, one of the largest Java applications yet written at the time, which was automatically cross-platform. Under the PACI NCSAlliance, ANL led development of the Access Grid, which enabled many remote sites to share real-time video conferencing over the Internet and became widely used around the world. In addition, high-end experiments in novel collaboration technologies were carried out, such as linking CAVEs or PowerWalls so that avatars represented the location of remote collaborators in a shared data space. This foreshadowed the use of the OptIPuter to link scalable OptIPortals with HD video streams, which is becoming commonplace today.

The Bad

Lack of institutionalization of the centers. In spite of constant requests from the centers, NSF never institutionalized the centers program as it had NCAR, NRAO, NOAO, etc. Those centers are, respectively, where the nation computes atmospheric sciences, observes with radio waves, and observes at optical wavelengths. The SC centers should be the sites where the academic community computes and where the staff support for things computational is housed. That is, NSF should select a few sites and give them the same multi-decadal guarantee of existence, with periodic reviews to maintain quality and user responsiveness. This would greatly reduce the endless rounds of existential worry and report writing that characterized the centers, at least during my 15 years as a director.

NSF induced a competitive culture between centers. A corollary of the above point is that the centers, by NSF design, were forced into a secretive and competitive posture relative to one another. Because one never knew when the next competition would come down from NSF, one hoarded any possible advantage to use in that next round. If the centers had been institutionalized they could relax and afford to be open and sharing. As one example of the disincentive to collaborate, it took me several years to convince the other centers to come together to form a joint national peer review board, because it undercut the ability of centers to recruit application “stars” and claim exclusivity with them. I believe the country would have seen the emergence of a national cyberinfrastructure during the PACI era if the centers had been institutionalized and incentives had been put in place for sharing and joint projects.

Narrowing Rather than Broadening Mission. One of the reasons that so many of the Good things happened was the flexibility that was inherently part of the original SC centers mandate. Yes, first and foremost the mission was centered on acquiring, installing, operating, and user consulting for HPC resources, but in addition there was funding to hire application domain experts, software tools developers, computer graphics and digital arts wizards, etc. In the PACI era this was broadened even more by partnering with many other universities, national labs, and industrial partners. However, it seems to me that in the last decade the NSF has drastically narrowed the scope of the SC centers, to the point that the centers now seem to be treated as if they were contractors for installing and operating machines only. This has naturally led to a systematic “brain drain” away from the centers and a major lowering of their innovation opportunity space. I think it highly unlikely today that many of the successes of the first decade could occur in the centers as they are funded and reviewed currently.

The Ugly

Lack of balanced user-to-HPC architecture. From the beginning of the NSF centers program, a basic architectural concept was building a balanced end-to-end system connecting the end user with the HPC resource. Essentially, this was what drove the NSFnet build-out and the strong adoption of NCSA Telnet, which allowed end users with Macs or PCs to open multiple windows on their desktops, including windows onto the supercomputer and mass storage systems. Similarly, during the first five years of the PACI, both NPACI and the Alliance devoted much of their software and infrastructure development to connecting the end-user to the HPC resources. But it seems that during the TeraGrid era, the end-users only have access to the TG resources over the shared Internet, with no local facilities for compute, storage, and visualization that scale up in proportion with the capability of the TG resources. This sets up an exponentially growing data isolation of the end users as the HPC resources get exponentially faster (thus exponentially increasing the size of data sets the end-user needs access to), while the shared Internet throughput grows slowly if at all.
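The scaling argument in this paragraph can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the starting dataset size, link speed, and yearly growth factors are assumptions chosen for the example, not figures from the position paper. It shows that when dataset size grows geometrically and end-user bandwidth stays flat, the time to move one dataset to the user's desktop grows geometrically as well.

```python
def transfer_hours(dataset_gb, link_mbps):
    """Hours needed to move dataset_gb (decimal gigabytes) over a link of link_mbps megabits/s."""
    bits = dataset_gb * 8e9               # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)    # megabits/s -> bits/s
    return seconds / 3600.0


def isolation_gap(years, data_growth, net_growth,
                  dataset_gb=1000.0, link_mbps=100.0):
    """Transfer time in hours for each year 0..years.

    data_growth and net_growth are hypothetical yearly multipliers for
    dataset size and end-user throughput; the defaults for dataset_gb and
    link_mbps are likewise assumed starting points.
    """
    return [
        transfer_hours(dataset_gb * data_growth ** y, link_mbps * net_growth ** y)
        for y in range(years + 1)
    ]


if __name__ == "__main__":
    # Assumed scenario: datasets double every year, shared throughput stays flat.
    for year, hours in enumerate(isolation_gap(5, data_growth=2.0, net_growth=1.0)):
        print(f"year {year}: {hours:.1f} hours per dataset")
```

Setting net_growth equal to data_growth keeps the transfer time constant, which is the balanced end-to-end case the paragraph describes.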

NSF drops support for national networking. After 15 years of leadership in increasing Internet backbone speed and connectivity to campuses, NSF has essentially removed itself from supporting the needed growth in capability of the Internet for the increasing data-intensive requirements of the end-users of the TG resources, with the notable exception of the IRNC. This is in spite of the creation and growth of the National LambdaRail and more recently the Internet2 Dynamic Circuits, both of which provide clear channel IP fiber optic connections at 10,000 Mbps. Although the NSF did support several 10G connections BETWEEN the TG sites, the NSF has essentially withdrawn from the national backbone, regional, and local support for dedicated or on-demand large data pipes to the end-users of the TG. Imagine that NSF had only supported the Internet links between the five centers in the late 1980s and hadn’t supported the build-out of the regionals and the access to the early adopting campuses!

No systemic cyberinfrastructure plan with centers having key role. In spite of 15 years of development of components of CI, there is still no NSF-wide layered CI defined and being used broadly. MREFCs (NEES, OOI, NEON) are individually defining and building their own CI, as are Division-level grants (e.g., iPLANT). I have always believed that the NSF SC centers, as the original data-intensive generators, would be in the ideal position to come together with the CS and applications communities (the intersection they have always worked at) to define a national CI system and support it for the major NSF opportunities. However, to have done this would have required The Bad not to have existed. Namely, defining and supporting an NSF-wide national CI would be natural if the SC centers had institutional stability and longevity, a collaborative rather than competitive culture, and a broadening rather than narrowing mandate. With the formation of an Office of CI, there is a chance to try and change all this, but without a robust and flexible set of NSF SC centers, there are no obvious sites to house the software engineers and consultants to support a national NSF CI program.

My hope is that these remarks can help inform the discussions of the NSF HPC Workshop. I am happy to engage with the process in the future if it would be helpful.

—–

Reprinted with permission from Calit2 and Larry Smarr.
