Shining a Light on SKA’s Massive Data Processing Requirements

By Tiffany Trader

June 4, 2015

One of the many highlights of the fourth annual Asia Student Supercomputer Challenge (ASC15) was the MIC optimization test, which this year required students to optimize a gridding algorithm used in the world’s largest international astronomy effort, the Square Kilometre Array (SKA) project.

Gridding is one of the most time-consuming steps in radio telescope data processing. To reconstruct a sky image from the data collected by the radio telescope, scientists need to take the irregularly sampled data and map it onto a standardized 2-D mesh. The process of adding sampled data from the telescopes to a grid is called gridding. After this step, the grid can be Fourier transformed to create a sky image.
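The two steps described above can be sketched in a few lines of numpy. This is a hypothetical toy illustration, not SKA code: the Gaussian kernel, grid size, and random samples are stand-ins for the carefully designed anti-aliasing kernels and real visibility data a production gridder would use.

```python
# Toy sketch of convolutional gridding: irregular (u, v) samples are
# spread onto a regular 2-D grid, which is then Fourier transformed
# into a sky image. Kernel and sizes are illustrative only.
import numpy as np

def grid_visibilities(u, v, vis, n=256, support=3):
    """Scatter complex visibilities onto an n x n grid.

    Each sample is spread over a (2*support+1)^2 neighborhood with a
    simple Gaussian convolution kernel (real gridders use purpose-built
    anti-aliasing kernels).
    """
    grid = np.zeros((n, n), dtype=complex)
    for uk, vk, sample in zip(u, v, vis):
        iu, iv = int(round(uk)), int(round(vk))
        for du in range(-support, support + 1):
            for dv in range(-support, support + 1):
                x, y = iu + du, iv + dv
                if 0 <= x < n and 0 <= y < n:
                    w = np.exp(-(du * du + dv * dv) / 2.0)
                    grid[y, x] += w * sample  # many reads/writes, little compute
    return grid

# Toy usage: random samples, then an inverse FFT to form an "image".
rng = np.random.default_rng(0)
u = rng.uniform(10, 246, 100)
v = rng.uniform(10, 246, 100)
vis = rng.standard_normal(100) + 1j * rng.standard_normal(100)
image = np.fft.ifft2(grid_visibilities(u, v, vis)).real
```

Note the inner loop: each sample triggers dozens of scattered memory accesses for a handful of multiply-adds, which is exactly the imbalance discussed later in the article.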

To say that radio astronomy is pushing the limits of data processing is an understatement. Consider that the data produced by SKA per second is expected to exceed 12TB, and nearly 50 percent of this astronomy data needs to pass through gridding. In a 2012 paper, Netherlands Institute for Radio Astronomy (ASTRON) researcher John W. Romein placed SKA phase one image processing requirements in the petaflops range; the full-scale project will cross into exaflops territory.

Unlike the other five ASC15 test applications (LINPACK, NAMD, WRF-CHEM, Palabos and the surprise-app HPCC), which run on Inspur-provided racks with a maximum power consumption limit of 3,000 watts, the gridding app is run on a separate platform provided by the committee consisting of one login node and 16 computing nodes. Each of the 16 nodes is outfitted with two CPUs (Intel Xeon E5-2670 v3, 12-core, 2.30GHz, 64GB memory) and one MIC card (Intel Xeon Phi 7110P, 61 cores, 1.1GHz, 8GB memory), connected over InfiniBand FDR.

The gridding portion of the ASC15 challenge is worth 20 points out of a possible 100, and the team with the fastest run time receives the e-Prize award, which comes with $4,380 in prize money. During the awards ceremony held Friday, May 22, the winner of this challenge was declared to be Sun Yat-sen University. This was not Sun Yat-sen's first time being honored in an ASC competition. Last year, the team set a new LINPACK record by achieving a peak performance of 9.272 teraflops within the 3,000-watt power budget.

Sun Yat-sen University was victorious in this effort, but they were not alone in their ability to impress the judges, a panel of HPC experts that included ASTRON researcher Chris Broekema, compute platform lead for the SKA Science Data Processor (essentially the HPC arm of SKA). Broekema shared with HPCwire that while the solutions the students came up with were not entirely new ideas, the quality of the teams' work exceeded his expectations.

The 16 teams who competed in the ASC15 finals were allowed to research the application in advance, but the way they tackled the problem showed creativity and an understanding of the main issues involved in optimizing this I/O-bound algorithm. In fact, they managed to get fairly close to the state of the art in just a couple of weeks, according to Broekema.

While the various teams employed different optimization techniques, Broekema said that the best results came from those that completely reordered the way the data was handled and altered the structure of the different loops. This led to a result that was essentially one step short of the most successful optimization developed by the SKA community.

One of the primary challenges of this algorithm relates to memory accesses, something that was correctly identified by most of the teams. Gridding involves many memory reads and writes for very little compute. The current state-of-the-art in addressing this imbalance is to sacrifice compute for reduced memory accesses. Implementing this solution takes a while, and requires a complete rethink of the way you go through your data.
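The reordering idea can be illustrated with a toy sketch: pre-sorting samples by their destination grid cell costs extra compute (the sort), but turns scattered grid writes into mostly sequential ones, which is far kinder to caches and memory bandwidth. This is a hypothetical illustration of the principle, not the SKA community's actual implementation.

```python
# Toy illustration of trading compute for memory locality: pay for a
# sort up front so that grid updates proceed in ascending memory order.
import numpy as np

def grid_sorted(u, v, vis, n=256):
    """Accumulate samples into an n x n grid, visiting cells in order."""
    iu = np.clip(np.rint(u).astype(int), 0, n - 1)
    iv = np.clip(np.rint(v).astype(int), 0, n - 1)
    cell = iv * n + iu                  # linear index of each target cell
    order = np.argsort(cell)            # the extra compute we deliberately pay
    grid = np.zeros(n * n, dtype=complex)
    np.add.at(grid, cell[order], vis[order])  # writes now hit ascending addresses
    return grid.reshape(n, n)
```

In a compiled, multithreaded gridder the same reordering also reduces write conflicts between threads, since each thread can own a contiguous slab of the grid.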

“Even though it’s a bit more expensive in terms of compute, the fact that it’s far more efficient in going through memory makes it a far more efficient implementation of gridding altogether,” Broekema explained.

According to the ASC15 committee, the application selected for the MIC optimization test should be “practical, challenging and interesting.” Asked why this application was a good fit for the contest, Broekema responded that the shortness of the code snippet engendered a much more detailed analysis of what’s happening in the actual code, compared to the other applications, which, being established and somewhat bulky code bases, can be very difficult for students to fully penetrate. While the snippet allowed for a more meaningful challenge in some ways, Broekema is already thinking about ways to fine-tune the test code to further enrich the student experience. He wants to make it more like real-world implementations so students can get a feel for how it is used in practice.

MIC optimization is one of many projects that Broekema and his colleagues are working on. Several of the SKA processing workloads, including the gridding algorithm, have been optimized for GPUs, he said, but it can work for other platforms as well, including MIC, FPGAs and ASICs. Each of these necessitates a different approach to data handling. A number of benchmarking efforts have already been completed and others are underway as the SKA ramps up to its 2017 Phase 1 launch.

Broekema’s next point drove home just how integral platform evaluation is to the greater SKA effort. “One of the undertakings of the SKA community in general is looking at the various platforms that are currently available and the various algorithms important to the work to see how they map on those platforms,” he said. “This isn’t confined to the Science Data Processor [the high-performance computing component of the SKA].”

“Before data is sent to the Science Data Processor, which does the gridding, Fourier transforming, etc., there’s the central signal processor, essentially the correlator, which involves a very large amount of fairly simple algorithms – correlation, filtering, and also Fourier transforms, probably on fixed integer size data – and those may well be done in FPGAs or ASICs, although it’s also possible to use accelerators like GPUs or Phi. So there’s a range of algorithms, correlators, Fourier transforms, gridding, convolutions, filters, etc., that are analyzed for different kinds of platforms, to see what is the best combination of platform and implementation.”
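The correlation step Broekema describes can be sketched as a so-called FX correlator: Fourier transform each antenna's voltage stream (F), then cross-multiply every antenna pair per frequency channel and integrate over time (X). This toy version uses floating-point numpy rather than the fixed-point FPGA/ASIC pipelines he mentions; all sizes and names are illustrative.

```python
# Hypothetical FX-correlator sketch: FFT each antenna stream, then
# cross-multiply all antenna pairs per channel and integrate over time.
import numpy as np

def fx_correlate(streams, nchan=64):
    """streams: (n_ant, n_samples) complex voltages, n_samples a
    multiple of nchan. Returns (n_ant, n_ant, nchan) visibilities."""
    n_ant, n_samp = streams.shape
    # F step: channelize each stream into blocks of nchan samples.
    spectra = np.fft.fft(streams.reshape(n_ant, n_samp // nchan, nchan), axis=-1)
    # X step: V_ij(f) = sum over time blocks of A_i(f) * conj(A_j(f))
    return np.einsum('itf,jtf->ijf', spectra, np.conj(spectra))
```

The structure makes the simplicity Broekema alludes to visible: the heavy lifting is just FFTs and complex multiply-accumulates, which is why fixed-function hardware is a credible option for this stage.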

Asked whether FPGAs/ASICs wouldn't be the best choice in terms of highest performance and performance-per-watt, Broekema said they are still very hard to program, which increases the risk of a late implementation. It's also his opinion that the performance gap between GPUs and FPGAs is narrowing fairly quickly. It used to be a discrepancy of several factors, but now it's just a couple of dozen percent, he reported, and implementation times of months (with GPUs) rather than years (with FPGAs) are a great advantage as well.

After a slight pause, however, Broekema began laying out the factors that could turn the tide toward FPGAs, starting with Intel’s purchase of Altera on Monday. The February announcement that Altera FPGAs will be manufactured on Intel’s 14 nm tri-gate transistor technology was cited as another reason to believe that FPGAs will continue to maintain their energy-efficiency edge over GPUs. And the fact that the reconfigurable chips can now be programmed using OpenCL promises to ease one of their main weaknesses. Just how much having this OpenCL support changes the FPGA programming paradigm is something that the SKA HPC group will be exploring with a new pilot project.

In summary, Broekema characterized the boundary between different kinds of programmable accelerators as fuzzy, which is why they are taking a look at all of them. “FPGAs are getting easier to integrate,” he stated. “There’s the Xeon Phi, which has the advantage of being easier to program and looking more like a regular Xeon, but they are a little late to the party and performance is not optimal at the moment. We did benchmarks on DSPs as well, and found them to be even more difficult to program than FPGAs.”

For all this benchmarking, GPUs are currently the preferred accelerator within the SKA community and the one it has deployed in production environments.

While the research into different platforms is being carried out by and for the benefit of the radio astronomy community in preparation for the immense SKA radio telescope, the value does not end there. “There’s an obvious parallel with medical imaging,” Broekema told HPCwire. “The data from large MRI machines, they do fairly similar work; then there’s the multimedia sector, streaming video has very similar data rates,” he said.

More significant is the potential for shared lessons going forward as HPC and even general computing become ever more data-laden. Radio astronomy knows all about these extremely I/O-bound algorithms, where data rates far outstrip the compute involved. The skewed ratio between I/O and compute is set to skew even further in the future, according to Broekema, and not just in radio astronomy.

“The problems that we face now are probably indicative of the problems that everyone is going to face in the next few years,” he commented. “So in that sense, I believe that the problems that we solve are useful for pretty much the entire HPC community and possibly even computer science in general.”

The ASTRON scientist recalled an example of this synergistic cross-HPC pollination from several years ago. The systems software team at Argonne National Lab built an operating system extension intended for high-performance computing on their Blue Gene systems, and the radio astronomy community co-opted it with great success for the Blue Gene that performed the data processing for LOFAR.

“Many of the optimizations that we come up with are equally valuable and equally useful for other HPC and other computer science applications,” he stated.
