The Good, the Bad and the Ugly: Reflections on the NSF Supercomputer Center Program

By Nicole Hemsoth

January 4, 2010

In a position paper for community input at NSF’s Future of High Performance Computing Workshop in early December, Calit2 Director Larry Smarr reviewed the successes, failures and continuing challenges of the NSF supercomputing program that he helped create. In 1983, Smarr (then at the University of Illinois at Urbana-Champaign) was the first to propose what would later become known as the NSF Supercomputer Centers program, followed shortly by a proposal from UCSD’s Sid Karin. The two went on to become founding directors of two of the first NSF supercomputer centers: Larry Smarr at the National Center for Supercomputing Applications (NCSA) at UIUC, and Sid Karin at the San Diego Supercomputer Center (SDSC) at UC San Diego. For the past 10 years, Smarr has been the founding director of Calit2 at UCSD, and in that capacity he has worked closely with SDSC. Following is the bulk of Larry Smarr’s position paper submitted to the NSF HPC Workshop, which took place at the National Institute for Computational Sciences in Arlington, Virginia.

Larry Smarr: I believe there are some important lessons to be drawn from the institutional and cultural successes and failures of the last 25 years. I offer these reflections for your consideration as you think about how best to organize the NSF HPC program going forward. I have divided my thoughts into three sections: the Good, the Bad, and the Ugly. The Good are the accomplishments of the NSF SC centers, many of which were unexpected in 1985. The Bad are the cultural and institutional shortcomings of that program. The Ugly are the missed opportunities, largely caused by the Bad.

The Good

Increased the number of academic supercomputer users. It was estimated that before the 1985 launch of the NSF SC centers there were ~100 academic supercomputer users. After the first five years of the centers program, the national academic HPC human resource pool had grown by two orders of magnitude, as measured by the number of researchers who logged onto one or another of the centers’ machines. This vastly increased the scale of academic research using HPC and provided a pool for industry and the labs to hire from.

Stimulated use of HPC simulation in industry. Each of the centers recruited industrial partners and trained them in the use of HPC. NCSA developed an industrial partner program that attracted leading companies from more than a dozen Fortune 500 sectors. One notable example is Eli Lilly, which trained over 200 of its staff through total-immersion sessions at NCSA and then became the first pharmaceutical company to purchase its own supercomputer (a Cray-2); within a year most major pharmaceutical companies had followed suit and acquired HPC resources of their own.

Brought an HPC Garden of Architectures to the community. In a short period of time the NSF centers, working jointly with DARPA and NSF, acquired almost all major HPC parallel architectures and made them available to the academic HPC community. This drove a rapid exploration of new algorithms for key applications, tuned to run most efficiently on the new hardware architectures.

Incubated the global Internet and Web. Although the Internet protocols were over a decade old when the NSF centers program began, the decision of the networking section of the Office of Advanced Scientific Computing to support only TCP/IP led to the NSFnet backbone, the build-out of the regional networks, and extension to early-adopter campuses. The NSF networking division, formed after CISE was created, continued to aggressively upgrade the NSFnet. The vBNS program brought high-speed shared Internet to many campuses. These activities led directly to today’s global Internet. NCSA Mosaic, developed only three years after Tim Berners-Lee created the WWW protocols, exponentially grew the nascent Web community. Indeed, in 1994 NCSA was the most heavily trafficked Web site on the planet, and as a result we were forced to invent the first parallel Web server. The NCSA Mosaic programmers left UIUC to form Netscape, Microsoft licensed Mosaic to form the basis of Internet Explorer, and Apache moved the Mosaic server code through open source to form the Apache server. Together, these developments led to one of the largest NSF-induced transformations of the global economy in the history of NSF grants.

Drove Scientific Visualization. The need to visualize the massive datasets generated by the NSF centers drove the development of computer graphics teams at a number of centers. The concept of data-driven scientific visualization quickly swept the academic community, but it also had a major impact, largely through SIGGRAPH, on Hollywood and later the gaming community. For instance, Stefen Fangmeier, who was NCSA’s scientific visualization project manager in 1987, went on to spend over 15 years as a visual effects supervisor at Industrial Light and Magic, working on such films as Terminator 2, Jurassic Park, Dreamcatcher, The Perfect Storm, and Master and Commander.

Pioneered Collaboration Technologies. Because HPC applications often involve teams spread across multiple institutions, the NSF SC centers were natural locations for the development of collaboration technologies. There was also a need for center consulting staff to analyze complex data output and code collaboratively with remote users. As a result, in 1990 the NCSA Software Development Group developed NCSA Collage, one of the first cross-platform (Windows, Mac, Unix) synchronous desktop collaboration systems, focused on collaborative data analysis. Five years later it was replaced by NCSA Habanero, one of the largest Java applications written at the time, which was automatically cross-platform. Under the NCSA-led Alliance in the PACI era, ANL led development of the Access Grid, which enabled many remote sites to share real-time video conferencing over the Internet and became widely used around the world. High-end experiments in novel collaboration technologies were also explored, such as linking CAVEs or PowerWalls so that avatars represented the locations of remote collaborators in a shared data space. This foreshadowed the use of the OptIPuter to link scalable OptIPortals with HD video streams, which is becoming commonplace today.

The Bad

Lack of institutionalization of the centers. In spite of constant requests from the centers, NSF never institutionalized the centers program as it had NCAR, NRAO, NOAO, etc. Those centers are, respectively, where the nation computes the atmospheric sciences, observes at radio wavelengths, and observes at optical wavelengths. The SC centers should likewise be the sites where the academic community computes and where the staff support for things computational is housed. That is, select a few sites and give them the same multi-decadal guarantee of existence, with periodic reviews to maintain quality and responsiveness to users. This would eliminate a great deal of the endless rounds of existential worry and report writing that characterized the centers, at least during my 15 years as a director.

NSF induced a competitive culture between centers. A corollary of the above point is that the centers, by NSF design, were forced into a secretive and competitive posture relative to one another. Because one never knew when the next competition would come down from NSF, one hoarded any possible advantage to use in that next round. If the centers had been institutionalized, they could have relaxed and afforded to be open and sharing. As one example of the disincentive to collaborate, it took me several years to convince the other centers to come together to form a joint national peer review board, because doing so undercut the ability of centers to recruit application “stars” and claim exclusivity with them. I believe the country would have seen the emergence of a national cyberinfrastructure during the PACI era if the centers had been institutionalized and incentives had been put in place for sharing and joint projects.

Narrowing Rather than Broadening Mission. One of the reasons that so many of the Good things happened was the flexibility inherent in the original SC centers mandate. Yes, first and foremost the mission centered on acquiring, installing, operating, and providing user consulting for HPC resources, but there was also funding flexibility to hire application domain experts, software tools developers, computer graphics and digital arts wizards, and so on. In the PACI era this was broadened even further through partnering with many other universities, national labs, and industrial partners. However, it seems to me that in the last decade NSF has drastically narrowed the scope of the SC centers, until the centers are now treated as if they were contractors whose only job is installing and operating machines. This has naturally led to a systematic “brain drain” away from the centers and a major shrinking of their innovation opportunity space. I think it highly unlikely that many of the successes of the first decade could occur in the centers as they are funded and reviewed today.

The Ugly

Lack of balanced user-to-HPC architecture. From the beginning of the NSF centers program, a basic architectural concept was building a balanced end-to-end system connecting the end user with the HPC resource. Essentially, this is what drove the NSFnet build-out and the strong adoption of NCSA Telnet, which let end users with Macs or PCs open multiple windows on their desktops, including sessions on the supercomputers and mass storage systems. Similarly, during the first five years of the PACI program, both NPACI and the Alliance spent much of their software and infrastructure development effort on connecting the end user to the HPC resources. But in the TeraGrid era, end users reach the TG resources only over the shared Internet, with no local facilities for compute, storage, and visualization that scale in proportion to the capability of the TG resources. This sets up an exponentially growing data isolation of the end users: the HPC resources get exponentially faster, exponentially increasing the size of the datasets the end user needs to access, while shared Internet throughput grows slowly if at all.
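To make the scale of that mismatch concrete, here is a minimal back-of-the-envelope sketch. The dataset size and throughput figures are illustrative assumptions, not measurements from the TeraGrid; they simply show what a dedicated multi-gigabit path means for the end user compared with a shared Internet path.

    # Back-of-the-envelope sketch (illustrative numbers only, not TeraGrid measurements):
    # how long an end user needs to pull one large simulation output home
    # over a shared Internet path versus a dedicated 10 Gbps lambda.

    def transfer_time_hours(dataset_terabytes, throughput_gbps):
        """Hours needed to move a dataset at a given sustained throughput."""
        bits = dataset_terabytes * 1e12 * 8        # terabytes -> bits
        seconds = bits / (throughput_gbps * 1e9)   # Gbps -> bits per second
        return seconds / 3600.0

    dataset_tb = 10.0  # hypothetical single simulation output of 10 TB

    # Shared Internet: assume ~0.3 Gbps sustained end-to-end to a campus user.
    print("Shared Internet (~0.3 Gbps): %.0f hours" % transfer_time_hours(dataset_tb, 0.3))

    # Dedicated 10 Gbps circuit of the NLR / Internet2 Dynamic Circuits class.
    print("Dedicated 10 Gbps lambda:    %.1f hours" % transfer_time_hours(dataset_tb, 10.0))

With these assumed numbers, the 10 TB transfer takes roughly three days over the shared path but a couple of hours over a dedicated 10 Gbps circuit, and the gap widens each time the machines, and hence the datasets, grow.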

NSF drops support for national networking. After 15 years of leadership in increasing Internet backbone speed and connectivity to campuses, NSF has essentially removed itself from supporting the growth in Internet capability needed for the increasingly data-intensive requirements of the end users of the TG resources, with the notable exception of the IRNC. This is in spite of the creation and growth of the National LambdaRail and, more recently, the Internet2 Dynamic Circuits, both of which provide clear-channel IP fiber optic connections at 10 Gbps (10,000 Mbps). Although NSF did support several 10G connections BETWEEN the TG sites, it has essentially withdrawn from national backbone, regional, and local support for dedicated or on-demand large data pipes to the end users of the TG. Imagine if NSF had supported only the Internet links between the five centers in the late 1980s and hadn’t supported the build-out of the regionals and the connections to the early-adopting campuses!

No systemic cyberinfrastructure plan with the centers in a key role. In spite of 15 years of development of CI components, there is still no NSF-wide layered CI defined and in broad use. MREFCs are individually defining and building their own CI (NEES, OOI, NEON), as are Division-level grants (e.g., iPlant). I have always believed that the NSF SC centers, as the original data-intensive generators, would be in the ideal position to come together with the CS and applications communities (the intersection at which they have always worked) to define a national CI system and support it for the major NSF opportunities. However, to have done this would have required the Bad not to have existed. Namely, defining and supporting an NSF-wide national CI would be natural if the SC centers had institutional stability and longevity, a collaborative rather than competitive culture, and a broadening rather than narrowing mandate. With the formation of an Office of CI, there is a chance to try to change all this, but without a robust and flexible set of NSF SC centers, there are no obvious sites to house the software engineers and consultants needed to support a national NSF CI program.

My hope is that these remarks can help inform the discussions of the NSF HPC Workshop. I am happy to engage with the process in the future if it would be helpful.


Reprinted with permission from Calit2 and Larry Smarr.
