Building on its experience operating world-leading computer systems—including New York Blue, an IBM Blue Gene/L that ranked fifth on the Top500 list of the world’s supercomputers in June 2007, and a Blue Gene/Q that ranked sixth on the 2012 Graph 500 list—the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory is refreshing its data-intensive computing capabilities. The refresh comes as the rates, volumes, and variety of data that scientists collect grow exponentially, driven by new technological developments at large-scale experimental facilities. Brookhaven’s DOE Office of Science User Facilities—the Relativistic Heavy Ion Collider (RHIC), National Synchrotron Light Source II (NSLS-II), and Center for Functional Nanomaterials (CFN)—support thousands of academic and industrial users each year.
In response, Brookhaven launched its Computational Science Initiative (CSI) in 2014 to consolidate the lab’s data-centric activities under one umbrella. Through CSI, scientists carry out leading-edge research into highly scalable data analysis and interpretation, spanning hardware through new analytical methods, while also providing an extended, collaborative user support program and a state-of-the-art data and computing infrastructure. Alongside its world-leading high-throughput computing and data archival capabilities, CSI is now commissioning two new systems:
- “Annie,” an institutional data-intensive computing cluster from Hewlett Packard Enterprise, named after the computer scientist, mathematician, and rocket scientist Annie J. Easley. The cluster consists of:
- 108 compute nodes (an upgrade to 200 is planned for November 2016), each with two 18-core Intel Xeon E5-2695 v4 (Broadwell) CPUs for a total of 36 physical cores; two NVIDIA K80 GPUs; and 256 GB of error-correcting code (ECC) RAM
- EDR InfiniBand network in a nonblocking configuration
- A storage system with 1 PB of usable RAID 6 capacity, managed by the IBM General Parallel File System (GPFS), with a minimum of 24 GB/sec read/write performance and a maximum expansion to 2 PB and 45 GB/sec read/write performance
- “Frances,” an Intel Knights Landing (KNL) cluster from Koi Computers, named after Frances Spence, one of the original programmers of ENIAC, the first general-purpose electronic digital computer. The system is the first of a number of systems to be procured over the coming years in Brookhaven’s Novel Architecture Testbed Facility, which is dedicated to hardware exploration for data-intensive applications. It consists of:
- 144 Intel Xeon Phi 7230 (KNL) processors, each with 64 physical cores and a clock speed of 1.3 GHz
- 2 x 512 GB solid-state drives per node
- 192 GB of error-correcting code RAM per node
- Dual-rail Intel Omni-Path Fabric 100 Series network in a nonblocking configuration
The institutional computing cluster will support a range of high-profile projects, including near-real-time data analysis at the CFN and NSLS-II. This analysis will help scientists understand the structures of biological proteins, the real-time operation of batteries, and other complex problems.
This cluster will also be used for exascale numerical model development efforts, such as for the new Center for Computational Design of Strongly Correlated Materials and Theoretical Spectroscopy. Led by Brookhaven Lab and Rutgers University with partners from the University of Tennessee and DOE’s Ames Laboratory, this center is developing next-generation methods and software to accurately describe electronic correlations in high-temperature superconductors and other complex materials and a companion database to predict targeted properties with energy-related application to thermoelectric materials. Brookhaven scientists collaborating on two exascale computing application projects that were recently awarded full funding by DOE—“NWChemEx: Tackling Chemical, Materials and Biomolecular Challenges in the Exascale Era” and “Exascale Lattice Gauge Theory Opportunities and Requirements for Nuclear and High Energy Physics”—will also access the institutional cluster.
Another user of the cluster will be theorists in CFN’s Theory and Computation Group, who use high-performance computing to solve the fundamental equations of quantum mechanics needed by their experimental colleagues to identify the atomic structures that will have a particular desired functionality, such as the ability to catalyze reactions. A new application area will be opened up by the Robust Extreme-scale Multimodal Structured Learning Project that was recently funded by DOE. This project is focused on the development of highly scalable machine learning approaches to enable the advanced analysis of spatio-temporal data.
The KNL cluster will be integrated into Brookhaven’s Novel Architecture Testbed Facility, where scientists will research the interplay of novel, highly scalable data analysis algorithms, enhanced programming models, and next-generation hardware architectures for extreme-scale, data-intensive applications. A key focus for these types of applications will be the analysis of large-scale, high-velocity experimental data from Brookhaven’s DOE Office of Science User Facilities. The testbed facility is set to become available to a wider user community in mid-2017, after the acquisition of two further experimental architectures.
At the same time, CSI scientists will be increasing their engagement in key standardization groups for leading parallel programming models. Brookhaven Lab became a member of the OpenACC consortium in June 2016 and is awaiting approval to join the OpenMP Architecture Review Board (ARB), which manages the shared-memory parallel programming model in common use today.
As part of the OpenACC community of more than 20 research institutions, supercomputing centers, and technology developers, Brookhaven will help bring features of the latest C++ language standard into OpenACC. This effort will directly support the new institutional cluster that Brookhaven purchased from Hewlett Packard Enterprise by making it easier to move data between main system memory and the local memory of an accelerator such as a GPU.
These standards development efforts and technology upgrades coincide with CSI’s move to bring all of its computer science and applied mathematics research under one roof. In October 2016, CSI moved into a building that will accommodate a rapidly growing team and will include collaborative spaces where Brookhaven scientists and facility users can work with CSI experts to make data-driven scientific discoveries.
About the Author
Kerstin Kleese van Dam is the director of the Computational Science Initiative at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory. In this role, she oversees a multidisciplinary team of computer scientists, mathematicians, and science domain experts as they develop new tools to tackle the big data challenges at the frontiers of scientific discovery—particularly at the DOE Office of Science User Facilities that attract thousands of scientific users each year. Key CSI partners include Stony Brook University, Columbia University, New York University, Yale University, and IBM Research. She has more than 25 years of experience in data infrastructure services, data management, and analysis applications for experimental and observational facilities, and has coauthored more than 100 publications. She is a member of the DOE Advanced Scientific Computing Research Advisory Committee’s standing subcommittee on Science, Technology, and Information, and regularly co-organizes and participates in DOE Office of Science workshops and meetings.