Brookhaven Lab Refreshes Its Data-Intensive Computing Capabilities

By Kerstin Kleese van Dam

October 27, 2016

The U.S. Department of Energy’s (DOE) Brookhaven National Laboratory is refreshing its data-intensive computing capabilities, building on its experience in operating world-leading computer systems. These include New York Blue, an IBM Blue Gene/L system that ranked fifth on the Top500 list of the world’s supercomputers in June 2007, and Brookhaven’s Blue Gene/Q system, which ranked sixth on the 2012 Graph 500 list. This refresh comes at a time when the rates, volumes, and variety of data that scientists collect are growing exponentially, driven by new technological developments at large-scale experimental facilities. Each year, thousands of academic and industrial users rely on Brookhaven’s DOE Office of Science User Facilities: the Relativistic Heavy Ion Collider (RHIC), National Synchrotron Light Source II (NSLS-II), and Center for Functional Nanomaterials (CFN).

The new institutional cluster from Hewlett Packard Enterprise.

In response, Brookhaven’s Computational Science Initiative (CSI) was launched in 2014 to consolidate the lab’s data-centric activities under one umbrella. Through the CSI, scientists are carrying out leading-edge research into highly scalable data analysis and interpretation solutions, from hardware to new analytical methods, as well as providing an extended, collaborative user support program and a state-of-the-art data and computing infrastructure. Alongside its world-leading high-throughput computing and data archival capabilities, CSI is now commissioning two new systems:

  • “Annie,” an institutional data-intensive computing cluster from Hewlett Packard Enterprise, named after the computer scientist, mathematician, and rocket scientist Annie J. Easley. The cluster consists of:
    • 108 compute nodes (with an upgrade to 200 planned for November 2016), each with two Intel Xeon E5-2695 v4 (Broadwell) CPUs providing 36 physical cores in total, two NVIDIA K80 GPUs, and 256 GB of error-correcting code (ECC) RAM
    • EDR InfiniBand network in a nonblocking configuration
    • A storage system with 1 PB of usable RAID 6 capacity, managed by IBM General Parallel File System (GPFS), with a minimum read/write performance of 24 GB/sec and expandable to a maximum of 2 PB and 45 GB/sec
  • “Frances,” an Intel Knights Landing (KNL) cluster from Koi Computers, named after Frances Spence, one of the original programmers of ENIAC, the first general-purpose electronic digital computer. The system is the first of several systems to be procured over the coming years for Brookhaven’s Novel Architecture Testbed Facility, which is dedicated to hardware exploration for data-intensive applications. It consists of:
    • 144 Intel Xeon Phi 7230 (KNL) processors, each with 64 physical cores and a clock speed of 1.3 GHz
    • 2 x 512 GB solid-state drives per node
    • 192 GB of error-correcting code RAM per node
    • Dual-rail Intel Omni-Path Fabric 100 Series network in a nonblocking configuration
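
The per-node figures in the lists above can be rolled up into system-wide totals with simple arithmetic. The sketch below does this in C. It assumes one Xeon Phi 7230 processor per Frances node (the usual single-socket layout for self-hosted KNL systems) and decimal units (1 PB = 1,000,000 GB) for the storage estimate; the struct and function names are hypothetical, and the results are derived arithmetic rather than figures reported by Brookhaven.

```c
#include <stdio.h>

/* Per-node figures taken from the specification lists.
   Struct and function names are hypothetical, for illustration only. */
struct cluster_spec {
    const char *name;
    long nodes;
    long cores_per_node;   /* physical cores per node */
    long ram_gb_per_node;  /* ECC RAM per node, in GB */
};

/* Total physical cores across all nodes. */
long total_cores(const struct cluster_spec *c)
{
    return c->nodes * c->cores_per_node;
}

/* Total RAM in GB across all nodes. */
long total_ram_gb(const struct cluster_spec *c)
{
    return c->nodes * c->ram_gb_per_node;
}

/* Hours needed to read a file system of capacity_pb petabytes once at a
   sustained bandwidth of gb_per_sec (assumes 1 PB = 1,000,000 GB). */
double full_scan_hours(double capacity_pb, double gb_per_sec)
{
    return capacity_pb * 1.0e6 / gb_per_sec / 3600.0;
}

/* Print a one-line summary for a cluster. */
void print_summary(const struct cluster_spec *c)
{
    printf("%s: %ld nodes, %ld cores, %ld GB RAM\n",
           c->name, c->nodes, total_cores(c), total_ram_gb(c));
}
```

Called from a `main` with `struct cluster_spec annie = {"Annie", 108, 36, 256};`, these functions give 3,888 cores and 27,648 GB of RAM for Annie (7,200 cores and 51,200 GB after the planned upgrade to 200 nodes). `{"Frances", 144, 64, 192}` gives 9,216 cores and, coincidentally, the same 27,648 GB of RAM. `full_scan_hours(1.0, 24.0)` comes to roughly 11.6 hours to read the initial 1 PB once at the minimum quoted bandwidth.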

The institutional computing cluster will support a range of high-profile projects, including near-real-time data analysis at the CFN and NSLS-II. This analysis will help scientists understand the structures of biological proteins, the real-time operation of batteries, and other complex problems.

This figure shows computer-assisted catalyst design for the oxygen reduction reaction (ORR), one of the key challenges in advancing the use of fuel cells in clean transportation. Theoretical calculations based on nanoparticle models offer a way not only to speed up this reaction on conventional platinum (Pt) catalysts and enhance their durability, but also to lower the cost of fuel cell production by alloying (combining) Pt catalysts with the less expensive elements nickel (Ni) and gold (Au).

This cluster will also be used for exascale numerical model development efforts, such as those of the new Center for Computational Design of Strongly Correlated Materials and Theoretical Spectroscopy. Led by Brookhaven Lab and Rutgers University with partners from the University of Tennessee and DOE’s Ames Laboratory, this center is developing next-generation methods and software to accurately describe electronic correlations in high-temperature superconductors and other complex materials, along with a companion database for predicting targeted properties relevant to energy applications such as thermoelectric materials. Brookhaven scientists collaborating on two exascale computing application projects recently awarded full funding by DOE (“NWChemEx: Tackling Chemical, Materials and Biomolecular Challenges in the Exascale Era” and “Exascale Lattice Gauge Theory Opportunities and Requirements for Nuclear and High Energy Physics”) will also access the institutional cluster.

Other users of the cluster will include theorists in CFN’s Theory and Computation Group, who use high-performance computing to solve the fundamental equations of quantum mechanics. Their results help experimental colleagues identify atomic structures with a particular desired functionality, such as the ability to catalyze reactions. A new application area will be opened up by the Robust Extreme-scale Multimodal Structured Learning Project, recently funded by DOE, which focuses on developing highly scalable machine learning approaches for the advanced analysis of spatio-temporal data.

Members of the commissioning team—(from left to right) Imran Latif, David Free, Mark Lukasczyk, Shigeki Misawa, Tejas Rao, Frank Burstein, and Costin Caramarcu—in front of the newly installed institutional computing cluster at Brookhaven Lab’s Scientific Data and Computing Center.

The KNL cluster will be integrated into Brookhaven’s Novel Architecture Testbed Facility, where scientists will research the interplay of novel, highly scalable data analysis algorithms, enhanced programming models, and next-generation hardware architectures for extreme-scale, data-intensive applications. A key focus for this type of application will be the analysis of large-scale, high-velocity experimental data from Brookhaven’s DOE Office of Science User Facilities. The testbed facility is set to become available to a wider user community in mid-2017, after the acquisition of two further experimental architectures.

At the same time, CSI scientists will be increasing their engagement in key standardization groups for leading parallel programming models. Brookhaven Lab became a member of the OpenACC consortium in June 2016 and is awaiting approval to join the OpenMP Architecture Review Board (ARB), which manages OpenMP, the shared-memory parallel programming model in common use today.

As part of the OpenACC community of more than 20 research institutions, supercomputing centers, and technology developers, Brookhaven will help bring features of the latest C++ language standard into OpenACC. This effort will directly support the new institutional cluster that Brookhaven purchased from Hewlett Packard Enterprise by making it easier to transfer data from main system memory to the local memory of an accelerator such as a GPU.
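
To illustrate the kind of host-to-accelerator data movement described above, here is a minimal C sketch using standard OpenACC data clauses. It is not Brookhaven’s code and does not show the C++ features under discussion; the function names are hypothetical. In OpenACC, `copyin` copies an array from main memory to accelerator memory on entry to a data region, `copyout` copies results back on exit, `copy` does both, and `present` states that the data is already resident on the device, so kernels inside the region do not re-transfer it. Built without OpenACC support (for example, plain `gcc` without `-fopenacc`), the pragmas are ignored and the loops run serially on the host with the same results.

```c
#include <stddef.h>

/* y = a * x, computed on an accelerator when compiled with OpenACC.
   copyin: x is copied host -> device before the loop.
   copyout: y is allocated on the device and copied device -> host after it.
   Function names are hypothetical, for illustration only. */
void scale_vector(size_t n, double a, const double *restrict x,
                  double *restrict y)
{
    #pragma acc data copyin(x[0:n]) copyout(y[0:n])
    {
        #pragma acc parallel loop present(x[0:n], y[0:n])
        for (size_t i = 0; i < n; ++i) {
            y[i] = a * x[i];
        }
    }
}

/* Multiply y by a, `steps` times. The enclosing data region keeps y
   resident in accelerator memory for every step, so it crosses the
   host-device link once in and once out, not once per step. */
void scale_in_place_repeatedly(size_t n, double a, int steps,
                               double *restrict y)
{
    #pragma acc data copy(y[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            #pragma acc parallel loop present(y[0:n])
            for (size_t i = 0; i < n; ++i) {
                y[i] *= a;
            }
        }
    }
}
```

The second function shows the design choice that matters most on GPU nodes: wrapping repeated kernels in one data region keeps transfer volume proportional to the data size rather than to the number of steps, which is the kind of host-to-accelerator movement that better language support aims to simplify.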

These standards development efforts and technology upgrades coincide with CSI’s move to bring all of its computer science and applied mathematics research under one roof. In October 2016, CSI moved into a building that will accommodate a rapidly growing team and will include collaborative spaces where Brookhaven scientists and facility users can work with CSI experts to make data-driven scientific discoveries. 

About the Author

Kerstin Kleese van Dam is the director of the Computational Science Initiative at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory. In this role, she oversees a multidisciplinary team of computer scientists, mathematicians, and science domain experts as they develop new tools to tackle the big data challenges at the frontiers of scientific discovery, particularly at the DOE Office of Science User Facilities that attract thousands of scientific users each year. Key CSI partners include Stony Brook University, Columbia University, New York University, Yale University, and IBM Research. She has more than 25 years of experience in data infrastructure services, data management, and analysis applications for experimental and observational facilities, and has coauthored more than 100 publications. She is a member of the DOE Advanced Scientific Computing Research Advisory Committee’s standing subcommittee on Science, Technology, and Information, and regularly co-organizes and participates in DOE Office of Science workshops and meetings.

HPCwire