Brookhaven Lab Xeon Phi Hackathon Expands Possibilities for Scientific Breakthroughs

April 6 — From February 26 through March 2, nearly 20 users of Xeon Phi–based supercomputers came together at Brookhaven Lab to be mentored by computing experts from Brookhaven and Lawrence Berkeley national labs, Indiana University, Princeton University, University of Bielefeld in Germany, and University of California–Berkeley. The hackathon organizing committee selected the mentors based on their experience in Xeon Phi optimization and shared-memory parallel programming with the OpenMP (for Multi-Processing) industry standard.

At the Brookhaven Lab-hosted Xeon Phi hackathon (left to right) mentor Bei Wang, a high-performance-computing software engineer at Princeton University; mentor Hideki Saito, a principal engineer at Intel; and participant Han Aung, a graduate student in the Department of Physics at Yale University.

Participants did not need to have prior Xeon Phi experience to attend. Several weeks prior to the hackathon, the teams were assigned to mentors with scientific backgrounds relevant to the respective application codes. The mentors and teams then held a series of meetings to discuss the limitations of their existing codes and goals at the hackathon. In addition to their specific mentors, the teams had access to four Intel technical experts with backgrounds in programming and scientific domains. These Intel experts served as floating mentors during the event to provide expertise in hardware architecture and performance optimization.

“The hackathon provided an excellent opportunity for application developers to talk and work with Intel experts directly,” said mentor Bei Wang, a HPC software engineer at Princeton University. “The result was a significant speed up in the time it takes to optimize code, thus helping application teams achieve their science goals at a faster pace. Events like this hackathon are of great value to both scientists and vendors.”

The five codes that were optimized cover a wide variety of applications:

A code for tracking particle-device and particle-particle interactions that has the potential to be used as the design platform for future particle accelerators
A code for simulating the evolution of the quark-gluon plasma (a hot, dense state of matter thought to have been present for a few millionths of a second after the Big Bang) produced through high-energy collisions at Brookhaven’s Relativistic Heavy Ion Collider(RHIC)—a DOE Office of Science User Facility
An algorithm for sorting records from databases, such as DNA sequences to identify inherited genetic variations and disorders
A code for simulating the formation of structures in the universe, particularly galaxy clusters
A code for simulating the interactions between quarks and gluons in real time

“Large-scale numerical simulations are required to describe the matter created at the earliest times after the collision of two heavy ions,” said team member Mark Mace, a PhD candidate in the Nuclear Theory Group in the Physics and Astronomy Department at Stony Brook University and the Nuclear Theory Group in the Physics Department at Brookhaven Lab. “My team had a really successful week—we were able to make our code run much faster (20x), and this improvement is a game changer as far as the physics we can study with the resources we have. We will now be able to more accurately describe the matter created after heavy-ion collisions, study a larger array of macroscopic phenomena observed in such collisions, and make quantitative predictions for experiments at RHIC and the Large Hadron Collider in Europe.”

Tinmin Tian, a senior principal engineer at Intel, gives a presentation on vector programming to help the teams optimize their scientific codes for the Xeon Phi processors.

“With the new memory subsystem recently released by Intel, we can order a huge number of elements faster than with conventional memory because more data can be transferred at a time,” said team member Sergey Madaminov, who is pursuing his PhD in computer science in the Computer Architecture at Stony Brook (COMPAS) Lab at Stony Brook University. “However, this high-bandwidth memory is physically located close to the processor, limiting its capacity. To mitigate this limitation, we apply smart algorithms that split data into smaller chunks that can then fit into high-bandwidth memory and be sorted inside it. At the hackathon, our goal was to demonstrate our theoretical results—our algorithms speed up sorting—in practice. We ended up finding many weak places in our code and were able to fix them with the help of our mentor and experts from Intel, improving our initial code more than 40x. With this improvement, we expect to sort much larger datasets faster.”

According to CSI computational scientist Meifeng Lin, one one of the organizers, the hackathon was highly successful—all five teams improved the performance of their codes, achieving from 2x to 40x speedups.

This is an excerpt of a longer feature article: https://www.bnl.gov/newsroom/news.php?a=212743

Source: Ariana Tantillo, Brookhaven National Laboratory