November 11, 2013

Alternatives to x86 for Physics Processing

Tiffany Trader

The global distributed computing system known as the Worldwide LHC Computing Grid (WLCG) brings together resources from more than 150 computing centers in nearly 40 countries. Its mission is to store, distribute and analyze the 25 petabytes of data generated each year by the Large Hadron Collider (LHC), based at the European Laboratory for Particle Physics (CERN) in Geneva, Switzerland. Projects of this magnitude require significantly more computational resources than any one facility can deliver, hence the need for a multi-organizational, international grid computing system. This infrastructure supports the science that makes discoveries like the Higgs boson possible.

Even more capacity will be required going forward. It is predicted that datasets must increase by 2-3 orders of magnitude to realize the full potential of this scientific instrument. Keeping LHC computing relevant in the coming years will require significant advances on the hardware side. Starting around 2005, processors began hitting their scaling limits, owing mostly to their tremendous power demand. This challenge has driven interest in processor architectures other than general-purpose x86-64.

This situation has inspired an international team of distinguished scientists to examine the viability of the ARM processor and the Intel Xeon Phi coprocessor for scientific computing. They’ve written a paper describing their experience porting software to these processors and running benchmarks using real physics applications. Their goal is to assess the potential of these processors for use in production physics processing.

For the ARM investigation, the test setup included two low-cost development boards, the ODROID-U2 and the ODROID-XU+E, each sporting eMMC and microSD slots, multiple USB ports and 10/100Mbps Ethernet with an RJ-45 port. Each uses a 5V DC power adaptor.

The authors write that “the processor on the U2 board is an Exynos 4412 Prime, a System-on-Chip (SoC) produced by Samsung for use in mobile devices. It is a quad-core Cortex A9 ARMv7 processor operating at 1.7GHz with 2GB of LP-DDR2 memory. The processor also contains an ARM Mali-400 quad-core GPU accelerator, although that was not used for the work described in this paper.”

They continue: “The XU+E board has a more recent Exynos 5410 processor, with 4 Cortex-A15 cores at 1.6GHz and 4 Cortex-A7 cores at 1.2GHz, in ARM’s big.LITTLE configuration, with 2GB of LPDDR3 memory, as well as a PowerVR SGX544MP3 GPU (also not used in this work).”

For the Phi investigations, the team created a basic HEP software development environment to support application and benchmark tests which can run directly on the Phi card. The setup employed a Xeon Phi 7110P card attached to an Intel Xeon box with 32 logical cores.

The paper delves further into the hardware and software specifics for each test environment as well as the various challenges and limitations that arose. There is also a discussion of experimental results and general tools support. The authors make the point that “when comparing and optimizing for various architectures, understanding the performance obtained in detail is as important as obtaining overall benchmark numbers.”

As could be predicted, single-core performance is much lower for ARMv7 processors than for traditional x86 processors, but performance per watt is much better on the ARM chips. The authors conclude that “the potential for use in scientific (general purpose) computing is clear.” They also report “successful ports of both the IgProf profiler and the DMTCP checkpointing package to ARMv7.” Despite these positive initial tests, more work is needed before there is a clear answer on the benefits of these alternative architectures for HEP computing.
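
To make the trade-off between single-core speed and energy efficiency concrete, here is a minimal Python sketch of how a performance-per-watt comparison might be computed. Every figure in it is a made-up placeholder chosen for illustration, not a measurement from the paper, and the `Chip` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """A processor described by hypothetical, illustrative numbers."""
    name: str
    events_per_sec_per_core: float  # single-core throughput (placeholder)
    cores: int
    watts: float  # power draw under load (placeholder)

    def throughput(self) -> float:
        # Total events processed per second across all cores.
        return self.events_per_sec_per_core * self.cores

    def events_per_joule(self) -> float:
        # Performance per watt: events/s divided by joules/s.
        return self.throughput() / self.watts


# Placeholder values only; they are NOT results reported by the authors.
arm = Chip("ARM-like SoC", events_per_sec_per_core=0.25, cores=4, watts=5.0)
x86 = Chip("x86-like server CPU", events_per_sec_per_core=1.0, cores=8, watts=100.0)

for chip in (arm, x86):
    print(f"{chip.name}: {chip.throughput():.2f} events/s, "
          f"{chip.events_per_joule():.3f} events/J")
```

With these placeholder inputs, the slower chip still comes out ahead on events per joule, which is the shape of the result the authors describe. Real comparisons would need measured throughput and measured power for the whole system, not just the processor.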

The paper describing this research has been submitted to the proceedings of the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13), Amsterdam.