Berkeley Lab Highlights ‘the Little Computer Cluster That Could’

May 3, 2019

May 3, 2019 — Decades before “big data” and “the cloud” were a part of our everyday lives and conversations, a custom computer cluster based at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) enabled physicists around the world to remotely and simultaneously analyze and visualize data.

The PDSF computer cluster in 2003. (Credit: Berkeley Lab)

The Parallel Distributed Systems Facility (PDSF) cluster, which had served as a steady workhorse in supporting groundbreaking and even Nobel-winning research around the world since the 1990s, switched off last month.

During its lifetime the cluster and its dedicated support team racked up many computing achievements and innovations in support of large collaborative efforts in nuclear physics and high-energy physics. Some of these innovations have persevered and evolved in other systems.

The cluster handled data for experiments that produce a primordial “soup” of subatomic particles to teach us about the makings of matter, search for intergalactic particle signals deep within Antarctic ice, and hunt for dark matter in a mile-deep tank of liquid xenon at a former mine site. It also handled data for a space observatory mapping the universe’s earliest light, and for Earth-based observations of supernovas.

It supported research leading to the discoveries of the morphing abilities of ghostly particles called neutrinos, the existence of the Higgs boson and the related Higgs field that generates mass through particle interactions, and the accelerating expansion rate of the universe that is attributed to a mysterious force called dark energy.

Some of PDSF’s collaboration users have transitioned to the Cori supercomputer at Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC), with other participants moving to other systems. The transition to Cori gives users access to more computing power in an era of increasingly hefty and complex datasets and demands.

“A lot of great physics and science was done at PDSF,” said Richard Shane Canon, a project engineer at NERSC who served as a system lead for PDSF from 2003-05. “We learned a lot of cool things from it, and some of those things even became part of how we run our supercomputers today. It was also a unique partnership between experiments and a supercomputing facility – it was the first of its kind.”

PDSF was small when compared to its supercomputer counterparts that handle a heavier load of computer processors, data, and users, but it had developed a reputation for being responsive and adaptable, and its support crew over the years often included physicists who understood the science as well as the hardware and software capabilities and limitations.

“It was ‘The Little Engine That Could,’” said Iwona Sakrejda, a nuclear physicist who supported PDSF and its users for over a decade in a variety of roles at NERSC and retired from Berkeley Lab in 2015. “It was the ‘boutique’ computer cluster.”

PDSF, because it was small and flexible, offered an R&D environment that allowed researchers to test out new ideas for analyzing and visualizing data. Such an environment may have been harder to find on larger systems, she said. Its size also afforded a personal touch.

“When things didn’t work, they had more handholding,” she added, recalling the numerous researchers that she guided through the PDSF system – including early career researchers working on their theses.

“It was gratifying. I developed a really good relationship with the users,” Sakrejda said. “I understood what they were trying to do and how their programs worked, which was important in creating the right architecture for what they were trying to accomplish.”

She noted that because the PDSF system was constantly refreshed, it sometimes led to an odd assortment of equipment put together from different generations of hardware, in sharp contrast to the largely homogeneous architecture of today’s supercomputers.

PDSF participants included collaborations for the Sudbury Neutrino Observatory (SNO) in Canada, the Solenoid Tracker at Brookhaven National Laboratory’s Relativistic Heavy Ion Collider (STAR), IceCube near the South Pole, Daya Bay in China, the Cryogenic Underground Observatory for Rare Events (CUORE) in Italy, the Large Underground Xenon (LUX), LUX-ZEPLIN (LZ), and MAJORANA experiments in South Dakota, the Collider Detector at Fermilab (CDF), and the ATLAS Experiment and A Large Ion Collider Experiment (ALICE) at Europe’s CERN laboratory, among others. The most data-intensive experiments use a distributed system of clusters like PDSF.

The STAR collaboration was the original participant and had by far the highest overall use of PDSF, and the ALICE collaboration had grown to become one of the largest PDSF users by 2010. Both experiments have explored the formation and properties of an exotic superhot particle soup known as the quark-gluon plasma by colliding heavy particles.

SNO researchers’ findings about neutrinos’ mass and ability to change into different forms or flavors led to the 2015 Nobel Prize in physics, and PDSF played a notable role in the early analyses of SNO data.

Art McDonald, who shared that Nobel as director of the SNO Collaboration, said, “The PDSF computing facility was used extensively by the SNO Collaboration, including our collaborators at Berkeley Lab.”

He added, “This resource was extremely valuable in simulations and data analysis over many years, leading to our breakthroughs in neutrino physics and resulting in the award of the 2015 Nobel Prize and the 2016 Breakthrough Prize in Fundamental Physics to the entire SNO Collaboration. We are very grateful for the scientific opportunities provided to us through access to the PDSF facility.”

PDSF’s fast processing of data from the Daya Bay nuclear reactor-based experiment was also integral in precise measurements of neutrino properties.

The cluster was a trendsetter for a so-called condo model in shared computing. This model allowed collaborations to buy a share of computing power and dedicated storage space that was customized for their own needs, and a participant’s allocated computer processors on the system could also be temporarily co-opted by other cluster participants when they were not active.

In this condo analogy, “You could go use your neighbor’s house if your neighbor wasn’t using it,” said Canon, a former experimental physicist. “If everybody else was idle you could take advantage of the free capacity.” Canon noted that many universities have adopted this kind of model for their computer users.
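The borrowing rule Canon describes can be illustrated with a small sketch. This is not PDSF's actual scheduler: the function name, the group names, and the core counts below are all hypothetical, and the round-robin lending order is an assumption chosen for simplicity. Each group first receives up to the share it owns, and any cores left idle are then lent to groups that want more.

```python
def allocate_condo(shares, demand):
    """Illustrative condo-style allocation (hypothetical, not PDSF's scheduler).

    shares: cores each group owns.
    demand: cores each group currently wants to use.
    Returns the number of cores each group gets right now.
    """
    # Step 1: every group gets up to its own purchased share.
    alloc = {g: min(shares[g], demand.get(g, 0)) for g in shares}

    # Step 2: cores that owners are not using form a shared idle pool.
    idle = sum(shares.values()) - sum(alloc.values())

    # Step 3: lend idle cores, one at a time in round-robin order,
    # to groups whose demand exceeds what they have been given.
    waiting = [g for g in shares if demand.get(g, 0) > alloc[g]]
    while idle > 0 and waiting:
        for g in list(waiting):
            if idle == 0:
                break
            alloc[g] += 1
            idle -= 1
            if alloc[g] >= demand[g]:
                waiting.remove(g)
    return alloc


# Hypothetical example: one group is idle, two want more than they own.
shares = {"star": 600, "alice": 300, "sno": 100}
demand = {"star": 900, "alice": 0, "sno": 150}
print(allocate_condo(shares, demand))
```

In this example the idle group's 300 cores are split between the two busy groups, and no group ever receives less than the smaller of its own share and its demand. A real system would also need a way for an owner to reclaim lent cores when it becomes active again, which this snapshot does not model.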

Importantly, the PDSF system was also designed to provide easy access and support for individual collaboration members rather than requiring access to be funneled through one account per project or experiment. “If everybody had to log in to submit their jobs, it just wouldn’t work in these big collaborations,” Canon said.

The original PDSF cluster, called the Physics Detector Simulation Facility, was launched in March 1991 to support analyses and simulations for a planned U.S. particle collider project known as the Superconducting Super Collider. It was set up in Texas, the planned home for the collider, though the collider project was ultimately canceled in 1993.

A 1994 retrospective report on the collider project notes that the original PDSF had been built up to perform a then-impressive 7 billion instructions per second and that the science need for PDSF to simulate complex particle collisions had driven “substantial technological advances” in the nation’s computer industry.

At the time, PDSF was “the world’s most powerful high-energy physics computing facility,” the report also noted, and was built using non-proprietary systems and equipment from different manufacturers “at a fraction of the cost” of supercomputers.

Longtime Berkeley Lab physicist Stu Loken, who had led the Lab’s Information and Computing Sciences Division from 1988-2000, had played a pivotal role in PDSF’s development and in siting the cluster at Berkeley Lab.

PDSF moved to Berkeley Lab in 1996 with a new name and a new role. It was largely rebuilt with new hardware and was moved to a computer center in Oakland, Calif., in 2000 before returning once again to the Berkeley Lab site.

“A lot of the tools that we deployed to facilitate the data processing on PDSF are now being used by data users at NERSC,” said Lisa Gerhardt, a big-data architect at NERSC who worked on the PDSF system. She previously had served as a neutrino astrophysicist for the IceCube experiment.

Gerhardt noted that the cluster was nimble and responsive because of its focused user community. “Having a smaller and cohesive user pool made it easier to have direct relationships,” she said.

And Jan Balewski, a computing systems engineer at NERSC who worked to transition PDSF users to the new system, said the scientific background of PDSF staff through the years was beneficial for the cluster’s users.

Balewski, a former experimental physicist, said, “Having our background, we were able to discuss with users what they really needed. And maybe, in some cases, what they were asking for was not what they really needed. We were able to help them find a solution.”

R. Jefferson “Jeff” Porter, a computer systems engineer and physicist in Berkeley Lab’s Nuclear Science Division who began working with the PDSF cluster and users as a postdoctoral researcher at Berkeley Lab in the mid-1990s, said, “PDSF was a resource that dealt with big data – many years before big data became a big thing for the rest of the world.”

It had always used off-the-shelf hardware and was steadily upgraded – typically twice a year. Even so, it was dwarfed by its supercomputer counterparts. About seven years ago the PDSF cluster had about 1,500 computer cores, compared to about 100,000 on a neighboring supercomputer at NERSC at the time. A core is the part of a computer processor that performs calculations.

Porter was later hired by NERSC to support grid computing, a distributed form of computing in which computers in different locations can work together to perform larger tasks. He returned to the Nuclear Science Division to lead the ALICE USA computing project, which established PDSF as one of about 80 grid sites for CERN’s ALICE experiment. Use of PDSF by ALICE was an easy fit, since the PDSF community “was at the forefront of grid computing,” Porter said.

In some cases, the unique demands of PDSF cluster users would also lead to the adoption of new tools at supercomputer systems. “Our community would push NERSC in ways they hadn’t been thinking,” he said. CERN developed a system to distribute software that was adopted by PDSF about five years ago, and that has also been adopted by many scientific collaborations. NERSC put in a big effort, Porter said, to integrate this system into larger machines: Cori and Edison.

Supporting multiple projects on a single system was a challenge for PDSF since each project had unique software needs, so Canon led the development of a system known as Chroot OS (CHOS) to enable each project to have a custom computing environment.

Porter explained that CHOS was an early form of “container computing” that has since enjoyed widespread adoption.

PDSF was run by a Berkeley Lab-based steering committee that typically had a member from each participating experiment and a member from NERSC, and Porter had served for about five years as the committee chair. He had been focused for the past year on how to transition users to the Cori supercomputer and other computing resources, as needed.

Balewski said that the leap of users from PDSF to Cori brings them access to far greater computing power, and allows them to “ask questions they could never ask on a smaller system.”

He added, “It’s like moving from a small town – where you know everyone but resources are limited – to a big city that is more crowded but also offers more opportunities.”

About Lawrence Berkeley National Laboratory

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.


Source: Lawrence Berkeley National Laboratory
