Berkeley Lab Highlights ‘the Little Computer Cluster That Could’

May 3, 2019

May 3, 2019 — Decades before “big data” and “the cloud” were a part of our everyday lives and conversations, a custom computer cluster based at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) enabled physicists around the world to remotely and simultaneously analyze and visualize data.

The PDSF computer cluster in 2003. (Credit: Berkeley Lab)

The Parallel Distributed Systems Facility (PDSF) cluster, which had served as a steady workhorse in supporting groundbreaking and even Nobel-winning research around the world since the 1990s, switched off last month.

During its lifetime the cluster and its dedicated support team racked up many computing achievements and innovations in support of large collaborative efforts in nuclear physics and high-energy physics. Some of these innovations have persisted and evolved in other systems.

The cluster handled data for experiments that produce a primordial “soup” of subatomic particles to teach us about the makings of matter, search for intergalactic particle signals deep within Antarctic ice, and hunt for dark matter in a mile-deep tank of liquid xenon at a former mine site. It also handled data for a space observatory mapping the universe’s earliest light, and for Earth-based observations of supernovas.

It supported research leading to the discoveries of the morphing abilities of ghostly particles called neutrinos, the existence of the Higgs boson and the related Higgs field that generates mass through particle interactions, and the accelerating expansion rate of the universe that is attributed to a mysterious force called dark energy.

Some of PDSF’s collaboration users have transitioned to the Cori supercomputer at Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC), with other participants moving to other systems. The transition to Cori gives users access to more computing power in an era of increasingly hefty and complex datasets and demands.

“A lot of great physics and science was done at PDSF,” said Richard Shane Canon, a project engineer at NERSC who served as a system lead for PDSF from 2003-05. “We learned a lot of cool things from it, and some of those things even became part of how we run our supercomputers today. It was also a unique partnership between experiments and a supercomputing facility – it was the first of its kind.”

PDSF was small when compared to its supercomputer counterparts that handle a heavier load of computer processors, data, and users, but it had developed a reputation for being responsive and adaptable, and its support crew over the years often included physicists who understood the science as well as the hardware and software capabilities and limitations.

“It was ‘The Little Engine That Could,’” said Iwona Sakrejda, a nuclear physicist who supported PDSF and its users for over a decade in a variety of roles at NERSC and retired from Berkeley Lab in 2015. “It was the ‘boutique’ computer cluster.”

PDSF, because it was small and flexible, offered an R&D environment that allowed researchers to test out new ideas for analyzing and visualizing data. Such an environment may have been harder to find on larger systems, she said. Its size also afforded a personal touch.

“When things didn’t work, they had more handholding,” she added, recalling the numerous researchers that she guided through the PDSF system – including early career researchers working on their theses.

“It was gratifying. I developed a really good relationship with the users,” Sakrejda said. “I understood what they were trying to do and how their programs worked, which was important in creating the right architecture for what they were trying to accomplish.”

She noted that because the PDSF system was constantly refreshed, it sometimes led to an odd assortment of equipment put together from different generations of hardware, in sharp contrast to the largely homogeneous architecture of today’s supercomputers.

PDSF participants included collaborations for the Sudbury Neutrino Observatory (SNO) in Canada, the Solenoid Tracker at Brookhaven National Laboratory’s Relativistic Heavy Ion Collider (STAR), IceCube near the South Pole, Daya Bay in China, the Cryogenic Underground Observatory for Rare Events (CUORE) in Italy, the Large Underground Xenon (LUX), LUX-ZEPLIN (LZ), and MAJORANA experiments in South Dakota, the Collider Detector at Fermilab (CDF), and the ATLAS Experiment and A Large Ion Collider Experiment (ALICE) at Europe’s CERN laboratory, among others. The most data-intensive experiments use a distributed system of clusters like PDSF.

The STAR collaboration was the original participant and had by far the highest overall use of PDSF, and the ALICE collaboration had grown to become one of the largest PDSF users by 2010. Both experiments have explored the formation and properties of an exotic superhot particle soup known as the quark-gluon plasma by colliding heavy particles.

SNO researchers’ findings about neutrinos’ mass and ability to change into different forms or flavors led to the 2015 Nobel Prize in physics, and PDSF played a notable role in the early analyses of SNO data.

Art McDonald, who shared that Nobel as director of the SNO Collaboration, said, “The PDSF computing facility was used extensively by the SNO Collaboration, including our collaborators at Berkeley Lab.”

He added, “This resource was extremely valuable in simulations and data analysis over many years, leading to our breakthroughs in neutrino physics and resulting in the award of the 2015 Nobel Prize and the 2016 Breakthrough Prize in Fundamental Physics to the entire SNO Collaboration. We are very grateful for the scientific opportunities provided to us through access to the PDSF facility.”

PDSF’s fast processing of data from the Daya Bay nuclear reactor-based experiment was also integral in precise measurements of neutrino properties.

The cluster was a trendsetter for a so-called condo model in shared computing. This model allowed collaborations to buy a share of computing power and dedicated storage space that was customized for their own needs, and a participant’s allocated computer processors on the system could also be temporarily co-opted by other cluster participants when they were not active.

In this condo analogy, “You could go use your neighbor’s house if your neighbor wasn’t using it,” said Canon, a former experimental physicist. “If everybody else was idle you could take advantage of the free capacity.” Canon noted that many universities have adopted this kind of model for their computer users.
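The sharing rule in the condo model can be illustrated with a minimal sketch. This is not PDSF's actual scheduler: the group names, core counts, and the `allocate` function are all hypothetical, and the sketch ignores details a real system would need, such as returning borrowed cores when an owner's demand comes back. It shows only the basic idea that each owner gets first claim on its own share and that idle cores are lent to groups with unmet demand.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One collaboration's share of the cluster (hypothetical model)."""
    name: str
    owned_cores: int  # cores the group has bought into
    demand: int       # cores the group's queued jobs want right now


def allocate(groups: list[Group]) -> dict[str, int]:
    """Grant each group its own share first, then lend out idle cores."""
    # Step 1: every owner gets first claim on the cores it owns.
    granted = {g.name: min(g.demand, g.owned_cores) for g in groups}
    # Step 2: cores their owners are not using form a shared idle pool.
    idle = sum(g.owned_cores for g in groups) - sum(granted.values())
    # Step 3: lend idle cores to groups with unmet demand, in list order.
    for g in groups:
        extra = min(g.demand - granted[g.name], idle)
        granted[g.name] += extra
        idle -= extra
    return granted


# Invented numbers for illustration only.
groups = [
    Group("group_a", owned_cores=800, demand=1200),
    Group("group_b", owned_cores=500, demand=100),
    Group("group_c", owned_cores=200, demand=0),
]
print(allocate(groups))
```

In this made-up case, group_a asks for more than it owns and borrows the 400 extra cores it needs from capacity that group_b and group_c have left idle.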

Importantly, the PDSF system was also designed to provide easy access and support for individual collaboration members rather than requiring access to be funneled through one account per project or experiment. “If everybody had to log in to submit their jobs, it just wouldn’t work in these big collaborations,” Canon said.

The original PDSF cluster, called the Physics Detector Simulation Facility, was launched in March 1991 to support analyses and simulations for a planned U.S. particle collider project known as the Superconducting Super Collider. It was set up in Texas, the planned home for the collider, though the collider project was ultimately canceled in 1993.

A 1994 retrospective report on the collider project notes that the original PDSF had been built up to perform a then-impressive 7 billion instructions per second and that the science need for PDSF to simulate complex particle collisions had driven “substantial technological advances” in the nation’s computer industry.

At the time, PDSF was “the world’s most powerful high-energy physics computing facility,” the report also noted, and was built using non-proprietary systems and equipment from different manufacturers “at a fraction of the cost” of supercomputers.

Longtime Berkeley Lab physicist Stu Loken, who had led the Lab’s Information and Computing Sciences Division from 1988-2000, had played a pivotal role in PDSF’s development and in siting the cluster at Berkeley Lab.

PDSF moved to Berkeley Lab in 1996 with a new name and a new role. It was largely rebuilt with new hardware and was moved to a computer center in Oakland, Calif., in 2000 before returning once again to the Berkeley Lab site.

“A lot of the tools that we deployed to facilitate the data processing on PDSF are now being used by data users at NERSC,” said Lisa Gerhardt, a big-data architect at NERSC who worked on the PDSF system. She previously had served as a neutrino astrophysicist for the IceCube experiment.

Gerhardt noted that the cluster was nimble and responsive because of its focused user community. “Having a smaller and cohesive user pool made it easier to have direct relationships,” she said.

And Jan Balewski, a computing systems engineer at NERSC who worked to transition PDSF users to the new system, said the scientific background of PDSF staff through the years was beneficial for the cluster’s users.

Balewski, a former experimental physicist, said, “Having our background, we were able to discuss with users what they really needed. And maybe, in some cases, what they were asking for was not what they really needed. We were able to help them find a solution.”

R. Jefferson “Jeff” Porter, a computer systems engineer and physicist in Berkeley Lab’s Nuclear Science Division who began working with the PDSF cluster and users as a postdoctoral researcher at Berkeley Lab in the mid-1990s, said, “PDSF was a resource that dealt with big data – many years before big data became a big thing for the rest of the world.”

It had always used off-the-shelf hardware and was steadily upgraded – typically twice a year. Even so, it was dwarfed by its supercomputer counterparts. About seven years ago the PDSF cluster had about 1,500 computer cores, compared to about 100,000 on a neighboring supercomputer at NERSC at the time. A core is the part of a computer processor that performs calculations.

Porter was later hired by NERSC to support grid computing, a distributed form of computing in which computers in different locations can work together to perform larger tasks. He returned to the Nuclear Science Division to lead the ALICE USA computing project, which established PDSF as one of about 80 grid sites for CERN’s ALICE experiment. Use of PDSF by ALICE was an easy fit, since the PDSF community “was at the forefront of grid computing,” Porter said.

In some cases, the unique demands of PDSF cluster users would also lead to the adoption of new tools at supercomputer systems. “Our community would push NERSC in ways they hadn’t been thinking,” he said. CERN developed a system to distribute software that was adopted by PDSF about five years ago, and that has also been adopted by many scientific collaborations. NERSC put in a big effort, Porter said, to integrate this system into larger machines: Cori and Edison.

Supporting multiple projects on a single system was a challenge for PDSF since each project had unique software needs, so Canon led the development of a system known as Chroot OS (CHOS) to enable each project to have a custom computing environment.

Porter explained that CHOS was an early form of “container computing” that has since enjoyed widespread adoption.

PDSF was run by a Berkeley Lab-based steering committee that typically had a member from each participating experiment and a member from NERSC, and Porter had served for about five years as the committee chair. He had been focused for the past year on how to transition users to the Cori supercomputer and other computing resources, as needed.

Balewski said that the leap of users from PDSF to Cori brings them access to far greater computing power, and allows them to “ask questions they could never ask on a smaller system.”

He added, “It’s like moving from a small town – where you know everyone but resources are limited – to a big city that is more crowded but also offers more opportunities.”

About Lawrence Berkeley National Laboratory

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.

Source: Lawrence Berkeley National Laboratory
