Shining a Light on SKA’s Massive Data Processing Requirements

By Tiffany Trader

June 4, 2015

One of the many highlights of the fourth annual Asia Student Supercomputer Challenge (ASC15) was the MIC optimization test, which this year required students to optimize a gridding algorithm used in the world’s largest international astronomy effort, the Square Kilometre Array (SKA) project.

Gridding is one of the most time-consuming steps in radio telescope data processing. To reconstruct a sky image from the data collected by the radio telescope, scientists need to take the irregularly sampled data and map it onto a regular 2-D mesh. This process of adding the sampled data from the telescopes to a grid is called gridding. Once the data is on the grid, it can be Fourier transformed to produce a sky image.
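To make the process concrete, here is a minimal, purely illustrative sketch of convolutional gridding in Python/NumPy. The function name, grid size and convolution kernel are assumptions made for the example; this is not taken from the SKA pipelines or the contest code.

```python
import numpy as np

def grid_visibilities(u, v, vis, grid_size, kernel, support):
    """Scatter irregularly sampled visibilities onto a regular 2-D grid.

    u, v    : sample coordinates, already scaled to grid units
    vis     : complex visibility samples from the telescope
    kernel  : (2*support+1) x (2*support+1) anti-aliasing kernel
    """
    grid = np.zeros((grid_size, grid_size), dtype=np.complex128)
    for x, y, value in zip(u, v, vis):
        ix, iy = int(round(x)), int(round(y))
        # Each sample updates a small neighbourhood of grid cells:
        # many scattered memory reads and writes for very little arithmetic.
        grid[iy - support:iy + support + 1,
             ix - support:ix + support + 1] += value * kernel
    return grid

# Once gridded, a 2-D inverse FFT turns the visibilities into a sky image:
# image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real
```

Note how little arithmetic each sample triggers relative to the number of grid cells it reads and updates; that imbalance is what makes gridding so demanding at SKA scale.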

To say that radio astronomy is pushing the limits of data processing is an understatement. Consider that the data produced by the SKA is expected to exceed 12 TB per second, and nearly 50 percent of this astronomy data needs to be processed through gridding. In a 2012 paper, Netherlands Institute for Radio Astronomy (ASTRON) researcher John W. Romein placed SKA phase one image processing requirements in the petaflops range; the full-scale project will cross into exaflops territory.

Unlike the other five ASC15 test applications (LINPACK, NAMD, WRF-CHEM, Palabos and the surprise app HPCC), which run on Inspur-provided racks with a maximum power consumption limit of 3,000 watts, the gridding app runs on a separate platform provided by the committee, consisting of one login node and 16 computing nodes. The 16 nodes are each outfitted with two CPUs (Intel Xeon E5-2670 v3, 12-core, 2.30 GHz, 64 GB memory) and one MIC card (Intel Xeon Phi 7110P, 61 cores, 1.1 GHz, 8 GB memory), connected over InfiniBand FDR.

The gridding portion of the ASC15 challenge is worth 20 points out of a possible 100, and the team with the fastest run time receives the e-Prize award, which comes with $4,380 in prize money. At the awards ceremony held Friday, May 22, Sun Yat-sen University was declared the winner of this challenge. This was not Sun Yat-sen’s first time being honored in an ASC competition: last year, the team broke the LINPACK record by achieving a peak performance of 9.272 teraflops within the 3,000-watt power budget.

Sun Yat-sen University was victorious in this effort, but it was not alone in its ability to impress the judges, a panel of HPC experts that included ASTRON researcher Chris Broekema. As compute platform lead for the SKA Science Data Processor (essentially the HPC arm of the SKA), Broekema told HPCwire that while the solutions the students came up with were not entirely new ideas, the quality of the teams’ work exceeded his expectations.

The 16 teams that competed in the ASC15 finals were allowed to research the application in advance, but the way they tackled the problem showed creativity and an understanding of the main issues involved in optimizing this I/O-bound algorithm. In fact, they managed to get fairly close to the state of the art in just a couple of weeks, according to Broekema.

While the various teams employed different optimization techniques, Broekema said the best results came from those that completely reordered the way the data was handled and altered the structure of the different loops. This led to a result that was essentially one step short of the most successful optimization developed by the SKA community.

One of the primary challenges of this algorithm relates to memory accesses, something that was correctly identified by most of the teams. Gridding involves many memory reads and writes for very little compute. The current state-of-the-art in addressing this imbalance is to sacrifice compute for reduced memory accesses. Implementing this solution takes a while, and requires a complete rethink of the way you go through your data.

“Even though it’s a bit more expensive in terms of compute, the fact that it’s far more efficient in going through memory makes it a far more efficient implementation of gridding altogether,” Broekema explained.
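As a hedged illustration of that trade-off, and emphatically a schematic sketch rather than the SKA implementation, the samples can first be binned by the grid cell they centre on. That extra pass costs compute, and the example assumes every sample in a bin shares the same kernel, but each grid neighbourhood is then written only once:

```python
import numpy as np
from collections import defaultdict

def grid_with_binning(u, v, vis, grid_size, kernel, support):
    """Trade extra compute (a binning pass) for far fewer grid writes."""
    grid = np.zeros((grid_size, grid_size), dtype=np.complex128)

    # Pass 1: extra work up front -- accumulate samples per grid cell.
    buckets = defaultdict(complex)
    for x, y, value in zip(u, v, vis):
        buckets[(int(round(x)), int(round(y)))] += value

    # Pass 2: touch each grid neighbourhood once per bucket instead of once
    # per sample, so scattered writes to the large grid are minimized.
    for (ix, iy), total in buckets.items():
        grid[iy - support:iy + support + 1,
             ix - support:ix + support + 1] += total * kernel
    return grid
```

Real implementations go considerably further, but the principle is the same: spend cycles reorganizing the data so that the grid is traversed in a memory-friendly order.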

According to the ASC15 committee, the application selected for the MIC optimization test should be “practical, challenging and interesting.” Asked why this application was a good fit for the contest, Broekema responded that the shortness of the code snippet allowed a much more detailed analysis of what is happening in the actual code than the other applications do; as established and somewhat bulky code bases, those can be very difficult for students to fully penetrate. While the snippet made for a more meaningful challenge in some ways, Broekema is already thinking about ways to fine-tune the test code to further enrich the student experience. He wants to make it more like real-world implementations so students can get a feel for how it is used in practice.

MIC optimization is one of many projects that Broekema and his colleagues are working on. Several of the SKA processing workloads, including the gridding algorithm, have been optimized for GPUs, he said, but they can be adapted to other platforms as well, including MIC, FPGAs and ASICs. Each of these necessitates a different approach to data handling. A number of benchmarking efforts have already been completed and others are underway as the SKA ramps up to its 2017 Phase 1 launch.

Broekema’s next point drove home just how integral platform evaluation is to the greater SKA effort. “One of the undertakings of the SKA community in general is looking at the various platforms that are currently available and the various algorithms important to the work to see how they map on those platforms,” he said. “This isn’t confined to the Science Data Processor [the high-performance computing component of the SKA].”

“Before data is sent to the Science Data Processor, which does the gridding, Fourier transforming, etc., there’s the central signal processor, essentially the correlator, which involves a very large amount of fairly simple algorithms – correlation, filtering, and also Fourier Transforms, probably on fixed integer size data – and those may well be done in FPGAs or ASICs, although it’s also possible to use accelerators like GPUs or Phi. So there’s a range of algorithms, correlators, Fourier transforms, gridding, convolutions, filters, etc., that are analyzed for different kinds of platforms, to see what is the best combination of platform and implementation.”
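For readers unfamiliar with that signal chain, the sketch below shows, in deliberately simplified NumPy, the shape of an FX-style correlation: an FFT per antenna stream followed by cross-multiplication and accumulation over every antenna pair. It illustrates the kind of arithmetic involved, not the SKA central signal processor design.

```python
import numpy as np

def fx_correlate(voltages, fft_len):
    """Toy FX correlator: channelize each antenna stream ("F" stage),
    then cross-multiply and accumulate every antenna pair ("X" stage).

    voltages : (n_antennas, n_samples) array of sampled voltages
    returns  : (n_antennas, n_antennas, fft_len) visibility spectra
    """
    n_ant, n_samp = voltages.shape
    n_blocks = n_samp // fft_len
    # "F" stage: block-wise FFT of each antenna's stream.
    spectra = np.fft.fft(
        voltages[:, :n_blocks * fft_len].reshape(n_ant, n_blocks, fft_len),
        axis=-1)
    # "X" stage: correlate every antenna pair, integrating over blocks.
    return np.einsum('aif,bif->abf', spectra, np.conj(spectra))
```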

Asked whether FPGAs/ASICs wouldn’t be the best choice in terms of highest performance and performance per watt, Broekema said they are still very hard to program, which increases the risk of a late implementation. It is also his opinion that the performance gap between GPUs and FPGAs is narrowing fairly quickly. It used to be several factors of discrepancy, but now it is just a couple of dozen percent, he reported, and implementation times of months (with GPUs) rather than years (with FPGAs) are a great advantage as well.

After a slight pause, however, Broekema began laying out the factors that could turn the tide toward FPGAs, starting with Intel’s purchase of Altera, announced on Monday. The February announcement that Altera FPGAs would be manufactured on Intel’s 14 nm tri-gate transistor technology was cited as another reason to believe that FPGAs will maintain their energy-efficiency edge over GPUs. And the fact that the reconfigurable chips can now be programmed using OpenCL promises to ease one of their main weaknesses. Just how much OpenCL support changes the FPGA programming paradigm is something the SKA HPC group will be exploring in a new pilot project.

In summary, Broekema characterized the boundary between different kinds of programmable accelerators as fuzzy, which is why they are taking a look at all of them. “FPGAs are getting easier to integrate,” he stated. “There’s the Xeon Phi, which has the advantage of being easier to program and looking more like a regular Xeon, but they are a little late to the party and performance is not optimal at the moment. We did benchmarks on DSPs as well, and found them to be even more difficult to program than FPGAs.”

After all this benchmarking, GPUs remain the preferred accelerator within the SKA community and the one the group has deployed in production environments.

While the research into different platforms is being carried out by and for the benefit of the radio astronomy community in preparation for the immense SKA radio telescope, the value does not end there. “There’s an obvious parallel with medical imaging,” Broekema told HPCwire. “The data from large MRI machines, they do fairly similar work; then there’s the multimedia sector, streaming video has very similar data rates.”

More significant is the potential for shared lessons going forward as HPC and even general computing become ever more data-laden. Radio astronomy knows all about these extremely I/O-bound algorithms, where data rates far outstrip the compute requirements. That already-skewed ratio between I/O and compute is set to tilt even further in the future, according to Broekema, and not just in radio astronomy.

“The problems that we face now are probably indicative of the problems that everyone is going to face in the next few years,” he commented. “So in that sense, I believe that the problems that we solve are useful for pretty much the entire HPC community and possibly even computer science in general.”

The ASTRON scientist recalled an example of this synergistic cross-HPC pollination from several years ago. The systems software team at Argonne National Lab built an operating system extension intended for high-performance computing on its Blue Gene systems, and the radio astronomy community co-opted it with great success for the Blue Gene that performed the data processing for LOFAR.

“Many of the optimizations that we come up with are equally valuable and equally useful for other HPC and other computer science applications,” he stated.
