CERN openlab Explores New CPU/FPGA Processing Solutions

By Linda Barney

April 14, 2017

Editor’s note: In this contributed feature, Linda Barney describes the ongoing technical collaboration between CERN and Intel to develop a co-packaged Xeon/FPGA processor.

At CERN, the European Organization for Nuclear Research, physicists and engineers are probing the fundamental structure of the universe. The Large Hadron Collider (LHC), which began working in 2008, is the world’s largest and most powerful particle accelerator; it is housed in an underground tunnel at CERN. Niko Neufeld is a deputy project leader at CERN who works on the Large Hadron Collider beauty (LHCb) experiment, which explores what happened after the Big Bang that allowed matter to survive and build the Universe we inhabit today.

“CERN experiments produce an enormous amount of data with forty million proton collisions every second, which leads to primary data rates of terabits per second,” says Neufeld when speaking on a recent FPGA vs. CPU panel. “This is an enormous amount of data and there are a number of technical challenges in our work. We use a number of processing solutions including central processing units (CPUs), field-programmable gate arrays (FPGAs), and graphic processing units (GPUs), but each of these solutions have some limitations. We are collaborating with Intel in experimenting with a co-packaged Intel Xeon processor plus FPGA Quick Path Interconnect (QPI) processor in our LHCb research to try to determine which technology provides the best results.”

CERN collaborates with leading ICT companies and other research institutes through a unique public-private partnership known as ‘CERN openlab’. Its goal is to accelerate the development of cutting-edge solutions for the worldwide LHC community and wider scientific research. Through a CERN openlab project known as the ‘High-Throughput Computing Collaboration,’ researchers are investigating the use of various Intel technologies in data filtering and data acquisition systems.

Figure 1. CERN researchers shown in the Large Hadron Collider tunnel in front of the LHCb detector. Courtesy of CERN (courtesty CERN).

Introducing the co-packaged Intel CPU / FPGA Processor

Today the CPU and FPGA are used as discrete chips in a solution – with an Intel Xeon processor and an FPGA which is typically attached via a PCIe interconnect to the CPU. And the development environment is also discrete using independent development tools from Intel and tools such as OpenCL and C++. Intel is working toward a common workflow and development flow to better integrate FPGAs.

“FPGA typically uses a higher level machine abstraction language (such as Verilog and VHDL) which have a painful low-level hardware programming model for most people. As a next step, Intel has a solution that co-packages the CPU and FPGA in the same Multichip Chip Product (MCP) package to deliver higher performance and lower latency than a discrete solution,” states Bill Jenkins, Intel Senior AI Marketing Manager. The Intel MCP is supported by a cross-platform development framework like OpenCL that can be used to develop applications for both the CPU and FPGA. The Intel solution includes a fully unified intellectual property (IP) and development suite, including languages, libraries and development environments. The roadmap to a unified development flow leverages common tools and libraries to support both FPGA and Intel Xeon processor + FPGA systems along with an expansive ecosystem network of Intel and vendors working on independent development tools for demanding workloads such as HPC, imaging identification, security and big data.

Abstracting away FPGA Coding

Intel is building an abstraction layer (as part of the product containing the Intel CPU and FPGA in the same MCP package), called the Orchestration Software layer. This layer and the higher level IP and software models help make development less complex so that developers don’t need to code specifically to the FPGA. The FPGA-enabled Orchestration software layer abstracts away the API to communicate with the FPGA as shown in the following example.

Figure 2. Example of Intel implementation of user IP implemented into FPGA via an abstraction Orchestration software layer

There is a cloud-based library of functions and end-user IP that have been pre-compiled and built that is loaded into the FPGA at runtime. The user first launches a workload from the host and it goes into the Orchestration software which pushes a function into the FPGA. This produces a bitstream that is pre-compiled on the FPGA to bring the data in—it is almost like a fixed architecture I/O interface.

In the example scenario, users simply download the image from the abstraction Orchestration software layer to the FPGA and it is ready to run without compilation. “With the abstraction Orchestration software layer,” Jenkins explained, “Intel is abstracting away all the difficulties of FPGA programming using machine language tools while enabling all the higher level Intel frameworks including the Intel Trusted Analytics Platform (TAP) and Intel Scalable System Framework (SSI) and tying the FPGA into the frameworks. Intel is developing this approach for a variety of markets including visual understanding, analytics, enterprise, Network Function Virtualization (NFV), VPN, genomics, HPC and storage.”

Large Hadron Collider High-Energy Physics Research at CERN

Neufeld indicates that the experiments at CERN — through what they refer to as ‘online computing’ — require a first-level data-filtering to reduce the data to an amount that can be stored and processed on more traditional processing units such as Intel Xeon processors. Figure 3 shows a schematic view of the future LHCb readout system. At the top level, there is a detector and optical fiber links, which transfer data out of the detector. CERN uses FPGAs to acquire data from the detector. There are also large switching fabrics, as well as clusters of processing elements including CPUs, FPGAs, and GPUs to reduce the amount of data. One of the questions the CERN team is testing is “Which technologies should we use and which provide the best performance and lowest energy usage results?”

Figure 3. Schematic diagram showing future LHCb first-level data-filtering system. Courtesy of CERN.

CERN Tests Complex Cherenkov Angle Reconstruction Calculation

CERN has extensive experience using FPGAs in their research work. “We typically use FPGAs in our research to run algorithms looking for simple integer signatures, or for other less complicated calculations. When we heard about the Intel Xeon / FPGA combined processor, we chose a test using a complex algorithm to do a Cherenkov angle reconstruction of light emission in a particle detector, which is not typically performed on an FPGA. This involves tracing a light particle — photon — through a complex arrangement of optical reflection and deflection systems. Our test case used a rich PID algorithm to calculate the Cherenkov angle for each track and detection point. This is a complex mathematical calculation that involves hyperbolic functions, roots, square roots, etc., as shown in Figure 4. It is one of the most costly calculations done in online reconstruction,” states Neufeld.

Figure 4. Test case running Rich PID algorithm to calculate Cherenkov angle. Courtesy of CERN.

Coding the Cherenkov Angle Reconstruction in Verilog versus OpenCL

The CERN team first implemented the Cherenkov angle reconstruction by coding it in the Verilog HDL. The team wrote a 748 clock-cycle long pipeline in Verilog, along with additional blocks developed for the test including: cubic root, complex square root, rotational matrix, and cross/scalar product. It was a lengthy task doing this coding in Verilog with 3,400 lines of code. With all test benches, the implementation took 2.5 months.

Next, the team recoded the Cherenkov angle code using the OpenCL and the BSP (board support package) designed to work across a variety of hardware platforms. Because OpenCL is an abstraction language, it required only 250 lines of code and took two weeks of coding. Not only was coding in OpenCL much faster but the performance results were similar. Figure 5 shows the results of the Verilog versus OpenCL implementation.

Figure 5. Result of Verilog (CQRT) versus OpenCL (RICH) code and performance. Courtesy of CERN.

CERN Compares Co-packaged Intel Xeon – FPGA Processor against Nallatech PCIe Stratix V FPGA Board

To test performance of the Verilog code, the CERN team used a commercially available Stratix V GXA7 FPGA board / Nallatech 385 board for testing. They achieved an acceleration of a factor up to six with the Stratix – Nallatech PCIe board. However, they found a bottleneck in data transfer—they could not keep the pipeline busy because the PCIe card was limited to an eight-lane interface. Next, the CERN team did tests with the Cherenkov angle code comparing a Nallatech FPGA Board with the co-packaged Intel Xeon/FPGA QPI processor.

Finally, the CERN team tested an Intel Xeon CPU, PCIe Stratix V FPGA and Intel Xeon processor/Stratix V QPI (where only the interconnect was different). As shown in Figure 6, there was a factor of 9 speed up for the PCIe Stratix V FPGA and a 26 factor speed up for the Intel Xeon processor/Stratix V QPI with the faster interconnect.

Figure 6. Test results from the CERN team comparing Intel Xeon CPU, PCIe Stratix V FPGA and Intel Xeon processor/FPGA QPI. Courtesy of CERN.

CERN Plans to do Future Testing using co-packaged Intel Xeon/ Intel Arria10 FPGA Processor

“Our CERN team found the results of using the co-packaged Intel Xeon processor/Stratix V QPI processors to be very encouraging. In addition, we find the programming model with OpenCL attractive and it will be mandatory for the High-Energy Physics (HEP) field. Intel will be launching a co-packaged Intel Xeon processor / Intel Arria 10 FPGA processor in the future. We want to do other experiments with the co-packaged Intel Xeon processor/ Arria 10 FPGA. We expect that the high-bandwidth interconnect and modern Arria 10 FPGA card will provide high performance and performance per Joule for HEP algorithms,” states Neufeld.

Linda Barney is the founder and owner of Barney and Associates, a technical/marketing writing, training and web design firm in Beaverton, OR.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

What’s New in HPC Research: Rabies, Smog, Robots & More

October 14, 2019

In this bimonthly feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here. Read more…

By Oliver Peckham

Crystal Ball Gazing: IBM’s Vision for the Future of Computing

October 14, 2019

Dario Gil, IBM’s relatively new director of research, painted a intriguing portrait of the future of computing along with a rough idea of how IBM thinks we’ll get there at last month’s MIT-IBM Watson AI Lab’s AI Read more…

By John Russell

Summit Simulates Braking – on Mars

October 14, 2019

NASA is planning to send humans to Mars by the 2030s – and landing on the surface will be considerably trickier than landing a rover like Curiosity. To solve the problem, NASA researchers are using the world’s fastes Read more…

By Staff report

Chaminade University’s Immersion Program Builds Capacity for Data Science in Hawaii, Pacific Region

October 10, 2019

Kuleana is a uniquely Hawaiian value and practice which embodies responsibility to self, community, and the ‘aina' (land). At Chaminade University, a federally designated Native Hawaiian serving university in Hawai‘i Read more…

By Faith Singer-Villalobos

Trovares Drives Memory-Driven, Property Graph Analytics Strategy with HPE

October 10, 2019

Trovares, a high performance property graph analytics company, has partnered with HPE and its Superdome Flex memory-driven servers on a cybersecurity capability the companies say “routinely” runs near-time workloads on 24TB-capacity systems... Read more…

By Doug Black

AWS Solution Channel

Making High Performance Computing Affordable and Accessible for Small and Medium Businesses with HPC on AWS

High performance computing (HPC) brings a powerful set of tools to a broad range of industries, helping to drive innovation and boost revenue in finance, genomics, oil and gas extraction, and other fields. Read more…

HPE Extreme Performance Solutions

Intel FPGAs: More Than Just an Accelerator Card

FPGA (Field Programmable Gate Array) acceleration cards are not new, as they’ve been commercially available since 1984. Typically, the emphasis around FPGAs has centered on the fact that they’re programmable accelerators, and that they can truly offer workload specific hardware acceleration solutions without requiring custom silicon. Read more…

IBM Accelerated Insights

HPC in the Cloud: Avoid These Common Pitfalls

[Connect with LSF users and learn new skills in the IBM Spectrum LSF User Community.]

It seems that everyone is experimenting about cloud computing. Read more…

Intel, Lenovo Join Forces on HPC Cluster for Flatiron

October 9, 2019

An HPC cluster with deep learning techniques will be used to process petabytes of scientific data as part of workload-intensive projects spanning astrophysics to genomics. AI partners Intel and Lenovo said they are providing... Read more…

By George Leopold

Crystal Ball Gazing: IBM’s Vision for the Future of Computing

October 14, 2019

Dario Gil, IBM’s relatively new director of research, painted a intriguing portrait of the future of computing along with a rough idea of how IBM thinks we’ Read more…

By John Russell

Summit Simulates Braking – on Mars

October 14, 2019

NASA is planning to send humans to Mars by the 2030s – and landing on the surface will be considerably trickier than landing a rover like Curiosity. To solve Read more…

By Staff report

Trovares Drives Memory-Driven, Property Graph Analytics Strategy with HPE

October 10, 2019

Trovares, a high performance property graph analytics company, has partnered with HPE and its Superdome Flex memory-driven servers on a cybersecurity capability the companies say “routinely” runs near-time workloads on 24TB-capacity systems... Read more…

By Doug Black

Intel, Lenovo Join Forces on HPC Cluster for Flatiron

October 9, 2019

An HPC cluster with deep learning techniques will be used to process petabytes of scientific data as part of workload-intensive projects spanning astrophysics to genomics. AI partners Intel and Lenovo said they are providing... Read more…

By George Leopold

Optimizing Offshore Wind Farms with Supercomputer Simulations

October 9, 2019

Offshore wind farms offer a number of benefits; many of the areas with the strongest winds are located offshore, and siting wind farms offshore ameliorates many of the land use concerns associated with onshore wind farms. Some estimates say that, if leveraged, offshore wind power... Read more…

By Oliver Peckham

Harvard Deploys Cannon, New Lenovo Water-Cooled HPC Cluster

October 9, 2019

Harvard's Faculty of Arts & Sciences Research Computing (FASRC) center announced a refresh of their primary HPC resource. The new cluster, called Cannon after the pioneering American astronomer Annie Jump Cannon, is supplied by Lenovo... Read more…

By Tiffany Trader

NSF Announces New AI Program; Plans $120M in Funding Next Year

October 8, 2019

As the saying goes, when you’re hot, you’re hot. Right now, AI is scalding. Today the National Science Foundation announced a new AI initiative – The National Artificial Intelligence Research Institutes program – with plans to invest about “$120 million in grants next year... Read more…

By Staff report

DOE Sets Sights on Accelerating AI (and other) Technology Transfer

October 3, 2019

For the past two days DOE leaders along with ~350 members from academia and industry gathered in Chicago to discuss AI development and the ways in which industr Read more…

By John Russell

Supercomputer-Powered AI Tackles a Key Fusion Energy Challenge

August 7, 2019

Fusion energy is the Holy Grail of the energy world: low-radioactivity, low-waste, zero-carbon, high-output nuclear power that can run on hydrogen or lithium. T Read more…

By Oliver Peckham

DARPA Looks to Propel Parallelism

September 4, 2019

As Moore’s law runs out of steam, new programming approaches are being pursued with the goal of greater hardware performance with less coding. The Defense Advanced Projects Research Agency is launching a new programming effort aimed at leveraging the benefits of massive distributed parallelism with less sweat. Read more…

By George Leopold

Cray Wins NNSA-Livermore ‘El Capitan’ Exascale Contract

August 13, 2019

Cray has won the bid to build the first exascale supercomputer for the National Nuclear Security Administration (NNSA) and Lawrence Livermore National Laborator Read more…

By Tiffany Trader

AMD Launches Epyc Rome, First 7nm CPU

August 8, 2019

From a gala event at the Palace of Fine Arts in San Francisco yesterday (Aug. 7), AMD launched its second-generation Epyc Rome x86 chips, based on its 7nm proce Read more…

By Tiffany Trader

Ayar Labs to Demo Photonics Chiplet in FPGA Package at Hot Chips

August 19, 2019

Silicon startup Ayar Labs continues to gain momentum with its DARPA-backed optical chiplet technology that puts advanced electronics and optics on the same chip Read more…

By Tiffany Trader

Chinese Company Sugon Placed on US ‘Entity List’ After Strong Showing at International Supercomputing Conference

June 26, 2019

After more than a decade of advancing its supercomputing prowess, operating the world’s most powerful supercomputer from June 2013 to June 2018, China is keep Read more…

By Tiffany Trader

D-Wave’s Path to 5000 Qubits; Google’s Quantum Supremacy Claim

September 24, 2019

On the heels of IBM’s quantum news last week come two more quantum items. D-Wave Systems today announced the name of its forthcoming 5000-qubit system, Advantage (yes the name choice isn’t serendipity), at its user conference being held this week in Newport, RI. Read more…

By John Russell

A Behind-the-Scenes Look at the Hardware That Powered the Black Hole Image

June 24, 2019

Two months ago, the first-ever image of a black hole took the internet by storm. A team of scientists took years to produce and verify the striking image – an Read more…

By Oliver Peckham

Leading Solution Providers

ISC 2019 Virtual Booth Video Tour

CRAY
CRAY
DDN
DDN
DELL EMC
DELL EMC
GOOGLE
GOOGLE
ONE STOP SYSTEMS
ONE STOP SYSTEMS
PANASAS
PANASAS
VERNE GLOBAL
VERNE GLOBAL

Intel Confirms Retreat on Omni-Path

August 1, 2019

Intel Corp.’s plans to make a big splash in the network fabric market for linking HPC and other workloads has apparently belly-flopped. The chipmaker confirmed to us the outlines of an earlier report by the website CRN that it has jettisoned plans for a second-generation version of its Omni-Path interconnect... Read more…

By Staff report

Kubernetes, Containers and HPC

September 19, 2019

Software containers and Kubernetes are important tools for building, deploying, running and managing modern enterprise applications at scale and delivering enterprise software faster and more reliably to the end user — while using resources more efficiently and reducing costs. Read more…

By Daniel Gruber, Burak Yenier and Wolfgang Gentzsch, UberCloud

Intel Debuts Pohoiki Beach, Its 8M Neuron Neuromorphic Development System

July 17, 2019

Neuromorphic computing has received less fanfare of late than quantum computing whose mystery has captured public attention and which seems to have generated mo Read more…

By John Russell

Rise of NIH’s Biowulf Mirrors the Rise of Computational Biology

July 29, 2019

The story of NIH’s supercomputer Biowulf is fascinating, important, and in many ways representative of the transformation of life sciences and biomedical res Read more…

By John Russell

Quantum Bits: Neven’s Law (Who Asked for That), D-Wave’s Steady Push, IBM’s Li-O2- Simulation

July 3, 2019

Quantum computing’s (QC) many-faceted R&D train keeps slogging ahead and recently Japan is taking a leading role. Yesterday D-Wave Systems announced it ha Read more…

By John Russell

With the Help of HPC, Astronomers Prepare to Deflect a Real Asteroid

September 26, 2019

For years, NASA has been running simulations of asteroid impacts to understand the risks (and likelihoods) of asteroids colliding with Earth. Now, NASA and the European Space Agency (ESA) are preparing for the next, crucial step in planetary defense against asteroid impacts: physically deflecting a real asteroid. Read more…

By Oliver Peckham

ISC Keynote: Thomas Sterling’s Take on Whither HPC

June 20, 2019

Entertaining, insightful, and unafraid to launch the occasional verbal ICBM, HPC pioneer Thomas Sterling delivered his 16th annual closing keynote at ISC yesterday. He explored, among other things: exascale machinations; quantum’s bubbling money pot; Arm’s new HPC viability; Europe’s... Read more…

By John Russell

Argonne Team Makes Record Globus File Transfer

July 10, 2019

A team of scientists at Argonne National Laboratory has broken a data transfer record by moving a staggering 2.9 petabytes of data for a research project.  The data – from three large cosmological simulations – was generated and stored on the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF)... Read more…

By Oliver Peckham

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This