The Good, the Bad and the Ugly: Reflections on the NSF Supercomputer Center Program

January 4, 2010

In a position paper for community input at NSF’s Future of High Performance Computing Workshop in early December, Calit2 Director Larry Smarr reviewed the successes, failures and continuing challenges of the NSF supercomputing program that he helped create. In 1983, Smarr (then at the University of Illinois at Urbana-Champaign) was the first to propose what would later become known as the NSF Supercomputer Centers program, followed shortly by a proposal from UCSD’s Sid Karin. The two went on to become the founding directors of the first two NSF supercomputer centers — Larry Smarr, of the National Center for Supercomputing Applications (NCSA) at UIUC; and Sid Karin of the San Diego Supercomputer Center (SDSC) at UC San Diego. For the past 10 years, Smarr has been the founding director of Calit2 at UCSD, and in that capacity he has worked very closely with SDSC. Following is the bulk of Larry Smarr’s position paper submitted to the NSF HPC Workshop, which took place at the National Institute for Computational Sciences in Arlington, Virginia.

Larry Smarr: I believe there are some important lessons to be drawn from the institutional and cultural successes and failures of the last 25 years. I offer these reflections for your consideration as you think about how best to organize the NSF HPC program going forward. I have divided my thoughts into three sections: The Good, the Bad, and the Ugly. The Good are the accomplishments of the NSF SC centers, many of which were unexpected in 1985. The Bad are the cultural and institutional shortcomings of that program. The Ugly are the missed opportunities, largely caused by the Bad.

The Good

Increased the number of academic supercomputer users. It was estimated that before the 1985 launch of the NSF SC centers there were ~100 academic supercomputer users. After the first five years of the centers program, the national academic HPC human resource pool had grown by two orders of magnitude, as measured by the number of people who logged onto one or another of the centers’ machines. This vastly increased the scale of academic research using HPC and provided a pool for industry and the labs to hire from.

Stimulated use of HPC simulation in industry. Each of the centers recruited industrial partners and trained them in the use of HPC. NCSA developed an industrial partner program which attracted leading companies from over a dozen categories of the Fortune 500 classification. One notable example is Eli Lilly, which trained over 200 of its staff through total-immersion sessions at NCSA and then became the first pharmaceutical company to purchase its own supercomputer (a Cray-2); within a year most major pharmas had followed and acquired HPC resources.

Brought an HPC Garden of Architectures to the community. In a short period of time the NSF centers, working jointly with DARPA and NSF, acquired almost all major HPC parallel architectures and made them available to the academic HPC community. This drove rapid exploration of new algorithms that ran key applications most efficiently on the new hardware architectures.

Incubated the global Internet and Web. Although the Internet protocols were over a decade old when the NSF centers program began, the decision of the networking section of the Office of Advanced Scientific Computing to support only TCP/IP led to the NSFnet backbone, the buildout of the regionals, and extension to early adopter campuses. The NSF networking division, formed after CISE was created, continued to aggressively upgrade the NSFnet. The vBNS program brought high speed shared Internet to many campuses. These activities led directly to today’s global Internet. NCSA Mosaic, developed only three years after Tim Berners-Lee created the WWW protocols, exponentially grew the nascent Web community. Indeed, in 1994 NCSA was the most hit Web site on the planet, and as a result we were forced to invent the first parallel Web server. The NCSA Mosaic programmers left UIUC to form Netscape, Microsoft licensed Mosaic to form the basis of Internet Explorer, and Apache moved the Mosaic server code through open source to form the Apache server. Together this led to one of the largest NSF-induced transformations of the global economy in the history of NSF grants.

Drove Scientific Visualization. The need for visualization of the massive datasets generated by the NSF centers drove the development of computer graphics teams at a number of centers. The concept of data-driven scientific visualization quickly swept the academic community, but also had a major impact, largely through SIGGRAPH, on Hollywood and later the gaming community. For instance, Stefen Fangmeier, who was NCSA scientific visualization project manager in 1987, went on to spend over 15 years as a visual effects supervisor at Industrial Light and Magic, working on such films as Terminator 2, Jurassic Park, Dreamcatcher, Perfect Storm, and Master and Commander.

Pioneered Collaboration Technologies. Because HPC applications often involve teams with members spread across multiple institutions, the NSF SC centers were natural locations for the development of collaborative technologies. There was also a need for center consulting staff to collaboratively analyze complex data output and code with remote users. As a result, the NCSA Software Development Group developed one of the first cross-platform (Windows, Mac, Unix) synchronous desktop collaboration software systems, NCSA Collage, focused on collaborative data analysis in 1990. Five years later this was replaced by NCSA Habanero, one of the largest Java applications yet written at the time, which was automatically cross-platform. Under the PACI NCSAlliance, ANL led development of the Access Grid, which enabled many remote sites to share real-time video conferencing over the Internet and became widely used around the world. In addition, high-end experiments in novel collaboration technologies were explored, such as linking CAVEs or PowerWalls so that avatars represented the location of remote collaborators in a shared data space. This foreshadowed the use of the OptIPuter to link scalable OptIPortals with HD video streams, which is becoming commonplace today.

The Bad

Lack of institutionalization of the centers. In spite of constant requests from the centers, NSF never institutionalized the centers program as it had NCAR, NRAO, NOAO, etc. Those centers are, respectively, where the nation computes atmospheric sciences, observes with radio waves, and observes at optical wavelengths. The SC centers should be the sites where the academic community computes and where the staff support for things computational is housed. That is, select a few sites and give them the same multi-decadal guarantee of existence, with periodic reviews to maintain quality and user responsiveness. This would eliminate a great deal of the endless rounds of existential worry and report writing which characterized the centers, at least during my 15 years as a director.

NSF induced a competitive culture between centers. A corollary of the above point is that the centers, by NSF design, were forced into a secretive and competitive posture relative to one another. Because one never knew when the next competition would come down from NSF, one hoarded any possible advantage to use in that next round. If the centers had been institutionalized they could relax and afford to be open and sharing. As one example of the disincentive to collaborate, it took me several years to convince the other centers to come together to form a joint national peer review board, because it undercut the ability of centers to recruit application “stars” and claim exclusivity with them. I believe the country would have seen the emergence of a national cyberinfrastructure during the PACI era if the centers had been institutionalized and incentives had been put in place for sharing and joint projects.

Narrowing Rather than Broadening Mission. One of the reasons that so many of the Good things happened was the flexibility that was inherently part of the original SC centers mandate. Yes, first and foremost the mission was centered on acquiring, installing, operating, and user consulting for HPC resources, but in addition there was funding opportunity to hire application domain experts, software tools developers, computer graphics and digital arts wizards, etc. In the PACI era this was broadened even more by the partnering with many other universities, national labs, and industrial partners. However, it seems to me that in the last decade the NSF has drastically narrowed the scope of the SC centers, until finally the centers seem to be treated as if they were contractors for installing and operating machines only. This has naturally led to a systematic “brain drain” away from the centers and a major lowering of their innovation opportunity space. I think it highly unlikely today that many of the successes of the first decade could occur in the centers as they are funded and reviewed currently.

The Ugly

Lack of balanced user-to-HPC architecture. From the beginning of the NSF centers program, a basic architectural concept was building a balanced end-to-end system connecting the end user with the HPC resource. Essentially, this was what drove the NSFnet build-out and the strong adoption of NCSA Telnet, which allowed end users with Macs or PCs to open up multiple windows on their desktops, including ones on the supercomputer and mass storage systems. Similarly, during the first five years of the PACI, both NPACI and the Alliance devoted much of their software and infrastructure development to connecting the end user to the HPC resources. But it seems that during the TeraGrid era, the end users only have access to the TG resources over the shared Internet, with no local facilities for compute, storage, and visualization that scale up in proportion with the capability of the TG resources. This sets up an exponentially growing data isolation of the end users as the HPC resources get exponentially faster (thus exponentially increasing the size of data sets the end user needs access to), while the shared Internet throughput grows slowly if at all.
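
The data-isolation argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the starting dataset size, the starting throughput, and both annual growth factors are hypothetical values chosen for the example, not figures from the position paper. It shows how the time to move one dataset to the end user keeps growing whenever dataset size grows faster than network throughput.

```python
def transfer_hours(dataset_gb: float, throughput_mbps: float) -> float:
    """Hours needed to move dataset_gb gigabytes at throughput_mbps megabits/s."""
    megabits = dataset_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / throughput_mbps / 3600


def isolation_trend(years, dataset_gb=1000.0, throughput_mbps=100.0,
                    data_growth=2.0, net_growth=1.1):
    """Return (year, hours) pairs. All default values are hypothetical assumptions."""
    rows = []
    for year in range(years + 1):
        rows.append((year, transfer_hours(dataset_gb, throughput_mbps)))
        dataset_gb *= data_growth        # datasets grow with HPC capability
        throughput_mbps *= net_growth    # shared Internet grows slowly
    return rows


if __name__ == "__main__":
    for year, hours in isolation_trend(5):
        print(f"year {year}: {hours:8.1f} hours to move one dataset")
```

Under these assumed rates the transfer time is multiplied by data_growth / net_growth each year, so the gap between what the centers can produce and what an end user can retrieve widens geometrically.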

NSF drops support for national networking. After 15 years of leadership in increasing Internet backbone speed and connectivity to campuses, NSF has essentially removed itself from supporting the needed growth in capability of the Internet for the increasing data-intensive requirements of the end-users of the TG resources, with the notable exception of the IRNC. This is in spite of the creation and growth of the National LambdaRail and more recently the Internet2 Dynamic Circuits, both of which provide clear channel IP fiber optic connections at 10,000 Mbps. Although the NSF did support several 10G connections BETWEEN the TG sites, the NSF has essentially withdrawn from the national backbone, regional, and local support for dedicated or on-demand large data pipes to the end-users of the TG. Imagine that NSF had only supported the Internet links between the five centers in the late 1980s and hadn’t supported the build-out of the regionals and the access to the early adopting campuses!

No systemic cyberinfrastructure plan with centers having key role. In spite of 15 years of development of components of CI, there is still no NSF-wide layered CI defined and being used broadly. MREFCs are individually defining and building their own CI (NEES, OOI, NEON), as are Division-level grants (e.g., iPLANT). I have always believed that the NSF SC centers, as the original data-intensive generators, would be in the ideal position to come together with the CS and applications communities (the intersection they have always worked at) to define a national CI system and support it for the major NSF opportunities. However, to have done this would have required The Bad not to have existed. Namely, defining and supporting an NSF-wide national CI would be natural if the SC centers had institutional stability and longevity, a collaborative rather than competitive culture, and a broadening rather than narrowing mandate. With the formation of an Office of CI, there is a chance to try to change all this, but without a robust and flexible set of NSF SC centers, there are no obvious sites to house the software engineers and consultants to support a national NSF CI program.

My hope is that these remarks can help inform the discussions of the NSF HPC Workshop. I am happy to engage with the process in the future if it would be helpful.

—–

Reprinted with permission from Calit2 and Larry Smarr.
