Brookhaven Lab Refreshes Its Data-Intensive Computing Capabilities

By Kerstin Kleese van Dam

October 27, 2016

Building on its experience in operating world-leading computer systems—including New York Blue, an IBM Blue Gene/L system that ranked fifth among the world’s Top500 supercomputers in June 2007, and an IBM Blue Gene/Q that ranked sixth on the 2012 Graph 500 list—the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory is refreshing its data-intensive computing capabilities. This refresh comes at a time when the rates, volumes, and variety of data that scientists are collecting are growing exponentially, driven by new technological developments at large-scale experimental facilities. Thousands of academic and industrial users are supported annually by Brookhaven’s DOE Office of Science User Facilities: the Relativistic Heavy Ion Collider (RHIC), National Synchrotron Light Source II (NSLS-II), and Center for Functional Nanomaterials (CFN).

The new institutional cluster from Hewlett Packard Enterprise.

In response, Brookhaven’s Computational Science Initiative (CSI) was launched in 2014 to consolidate the lab’s data-centric activities under one umbrella. Through CSI, scientists are carrying out leading-edge research into highly scalable data analysis and interpretation solutions, from hardware to new analytical methods, and CSI provides an extended, collaborative user support program together with a state-of-the-art data and computing infrastructure. Alongside its world-leading high-throughput computing and data archival capabilities, CSI is now commissioning two new systems:

  • “Annie,” an institutional data-intensive computing cluster from Hewlett Packard Enterprise, named after the computer scientist, mathematician, and rocket scientist Annie J. Easley. The cluster consists of
    • 108 compute nodes (an upgrade to 200 is planned for November 2016), each with two Intel Xeon E5-2695 v4 (Broadwell) CPUs for a total of 36 physical cores; two NVIDIA K80 GPUs; and a total of 256 GB of error-correcting code RAM
    • EDR InfiniBand network in a nonblocking configuration
    • A storage system with 1 PB of usable RAID 6 capacity, managed by IBM’s General Parallel File System (GPFS), delivering a minimum of 24 GB/sec read/write performance and expandable to a maximum of 2 PB and 45 GB/sec read/write performance
  • “Frances,” an Intel Knights Landing (KNL) cluster from Koi Computers, named after Frances Spence, one of the original programmers of ENIAC, one of the first electronic digital computers. The system is the first of several to be procured over the coming years for Brookhaven’s Novel Architecture Testbed Facility, which is dedicated to hardware exploration for data-intensive applications. It consists of:
    • 144 Intel Xeon Phi 7230 (KNL) processors, each with 64 physical cores and a clock speed of 1.3 GHz
    • 2 x 512 GB solid-state drives per node
    • 192 GB of error-correcting code RAM per node
    • Dual-rail Intel Omni-Path Fabric 100 Series network in a nonblocking configuration

The institutional computing cluster will support a range of high-profile projects, including near-real-time data analysis at the CFN and NSLS-II. This analysis will help scientists understand the structures of biological proteins, the real-time operation of batteries, and other complex systems.

This figure shows the computer-assisted catalyst design for the oxygen reduction reaction (ORR), which is one of the key challenges to advancing the application of fuel cells in clean transportation. Theoretical calculations based on nanoparticle models provide a way to not only speed up this reaction on conventional platinum (Pt) catalysts and enhance their durability, but also to lower the cost of fuel cell production by alloying (combining) Pt catalysts with the less expensive elements nickel (Ni) and gold (Au).

This cluster will also be used for exascale numerical model development efforts, such as those of the new Center for Computational Design of Strongly Correlated Materials and Theoretical Spectroscopy. Led by Brookhaven Lab and Rutgers University with partners from the University of Tennessee and DOE’s Ames Laboratory, this center is developing next-generation methods and software to accurately describe electronic correlations in high-temperature superconductors and other complex materials, along with a companion database for predicting targeted properties, with energy-related applications such as thermoelectric materials. Brookhaven scientists collaborating on two exascale computing application projects that were recently awarded full funding by DOE—“NWChemEx: Tackling Chemical, Materials and Biomolecular Challenges in the Exascale Era” and “Exascale Lattice Gauge Theory Opportunities and Requirements for Nuclear and High Energy Physics”—will also access the institutional cluster.

Another user of the cluster will be theorists in CFN’s Theory and Computation Group, who use high-performance computing to solve the fundamental equations of quantum mechanics needed by their experimental colleagues to identify the atomic structures that will have a particular desired functionality, such as the ability to catalyze reactions. A new application area will be opened up by the Robust Extreme-scale Multimodal Structured Learning Project that was recently funded by DOE. This project is focused on the development of highly scalable machine learning approaches to enable the advanced analysis of spatio-temporal data.

Members of the commissioning team—(from left to right) Imran Latif, David Free, Mark Lukasczyk, Shigeki Misawa, Tejas Rao, Frank Burstein, and Costin Caramarcu—in front of the newly installed institutional computing cluster at Brookhaven Lab’s Scientific Data and Computing Center.

The KNL cluster will be integrated into Brookhaven’s Novel Architecture Testbed Facility, where scientists will research the interplay of novel, highly scalable data analysis algorithms, enhanced programming models, and next-generation hardware architectures for extreme-scale, data-intensive applications. A key focus for these types of applications will be the analysis of large-scale, high-velocity experimental data from Brookhaven’s DOE Office of Science User Facilities. The testbed facility is set to become available to a wider user community in mid-2017, after the acquisition of two further experimental architectures.

At the same time, CSI scientists will be increasing their engagement in key standardization groups for leading parallel programming models. Brookhaven Lab is awaiting approval to join the OpenMP Architecture Review Board (ARB), which manages the shared-memory parallel programming model commonly used today, and it became a member of the OpenACC consortium in June 2016.
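
For readers unfamiliar with the model, OpenMP expresses shared-memory parallelism through compiler directives added to ordinary C, C++, or Fortran code. The short C++ sketch below is purely illustrative (it is not code from Brookhaven’s projects); it simply spreads a loop’s iterations across the cores of a single node:

    // Minimal OpenMP sketch (illustrative only): a parallel reduction on one
    // shared-memory node. Build with an OpenMP compiler, e.g., g++ -fopenmp
    #include <omp.h>
    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 1 << 24;
        std::vector<double> data(n, 1.0);

        double sum = 0.0;
        // Loop iterations are divided among threads; the reduction clause
        // combines each thread's partial sum without data races.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            sum += data[i];
        }

        std::printf("threads available: %d, sum = %.0f\n",
                    omp_get_max_threads(), sum);
        return 0;
    }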

As part of the OpenACC community of more than 20 research institutions, supercomputing centers, and technology developers, Brookhaven will help incorporate features of the latest C++ programming language standard into OpenACC software. This effort will directly support the new institutional cluster that Brookhaven purchased from Hewlett Packard Enterprise by making it easier to transfer data from main system memory to the local memory of an accelerator such as a GPU.
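
To illustrate the kind of host-to-accelerator data movement described above, the following generic C++ sketch (again illustrative, not code from the standardization effort) uses OpenACC data clauses to copy arrays from main system memory into GPU memory, run a simple kernel there, and copy the result back:

    // Minimal OpenACC sketch of moving data between main system memory and a
    // GPU (illustrative only). Build with an OpenACC compiler, e.g., nvc++ -acc
    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 1 << 20;
        std::vector<float> x(n, 1.0f), y(n, 2.0f);
        float* px = x.data();
        float* py = y.data();

        // copyin: transfer x to accelerator memory when the region starts;
        // copy:   transfer y to the accelerator and back to the host at the end.
        #pragma acc data copyin(px[0:n]) copy(py[0:n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; ++i) {
                py[i] += 2.0f * px[i];   // axpy-style update executed on the GPU
            }
        }

        std::printf("y[0] = %.1f\n", py[0]);   // expect 4.0 after the copy back
        return 0;
    }

Richer support for modern C++ in OpenACC software would make patterns like this work more naturally with standard containers and templated code.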

These standards development efforts and technology upgrades coincide with CSI’s move to bring all of its computer science and applied mathematics research under one roof. In October 2016, CSI moved into a building that will accommodate a rapidly growing team and will include collaborative spaces where Brookhaven scientists and facility users can work with CSI experts to make data-driven scientific discoveries. 

About the Author

Kerstin Kleese van Dam is the director of the Computational Science Initiative at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory. In this role, she oversees a multidisciplinary team of computer scientists, mathematicians, and science domain experts as they develop new tools to tackle the big data challenges at the frontiers of scientific discovery—particularly at the DOE Office of Science User Facilities that attract thousands of scientific users each year. Key CSI partners include Stony Brook University, Columbia University, New York University, Yale University, and IBM Research. She has more than 25 years of experience in data infrastructure services, data management, and analysis applications for experimental and observational facilities, and has coauthored more than 100 publications. She is a member of the DOE Advanced Scientific Computing Research Advisory Committee’s standing subcommittee on Science, Technology, and Information, and regularly co-organizes and participates in DOE Office of Science workshops and meetings.
