Emerging Architectures Boost Geospatial Application Performance

By Chenggang Lai, Miaoqing Huang, Xuan Shi, and Haihang You

January 23, 2014

Geospatial data is critical in a variety of applications, including transportation planning, hydrological network and watershed analysis, environmental modeling and surveillance, emergency response, and military operations. As geospatial data has become more widely available, its volume has grown rapidly, creating challenges and complexities that leave traditional desktop-based geographical information systems (GIS) and remote-sensing software unable to provide the requisite processing power.

Intel’s Many Integrated Core (MIC) architecture and the graphics processing unit (GPU) employ parallelism to achieve scalability with high performance for data-intensive computing over high-resolution spatial data. Our research has demonstrated that hybrid computer clusters equipped with the latest Intel MIC processors and NVIDIA GPUs can achieve a significant performance improvement for a range of typical geospatial applications, with Kriging interpolation, ISODATA, and Cellular Automata as examples. Details of our study are contained in a paper titled “Accelerating Geospatial Applications on Hybrid Architectures” in the proceedings of the 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing. The co-authors of the paper were Chenggang Lai, Miaoqing Huang, and Xuan Shi of the University of Arkansas, and Haihang You of the National Institute for Computational Sciences.

Coprocessor architecture

GPU architecture has been evolving for many years. NVIDIA is a case in point, having gone through many generations, from G80 to GT200, Fermi, and today’s Kepler. The Kepler GPU architecture contains 15 streaming multiprocessors (SMXes), each of which consists of 192 single-precision cores and 64 double-precision cores. The Kepler architecture provides three advanced features: Hyper-Q, which efficiently shares GPU resources among multiple host threads or processes; dynamic parallelism, which lets a GPU flexibly create new kernels; and GPUDirect, which reduces communication overhead across GPUs. GPUs are normally used as accelerators in high-performance computer clusters. In a typical MPI-based parallel application, the MPI process executes on a host CPU, which in turn offloads the computation to one or more client GPUs.

Figure 1. NVIDIA’s Kepler GPU architecture. Image source: Lai et al., “Accelerating Geospatial Applications on Hybrid Architectures,” Proceedings of the 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 1545–1552, 2013.

The first commercially available Intel coprocessor based on the MIC architecture is the Xeon Phi. It contains up to 61 scalar processors with vector processing units. Direct communication between MIC coprocessors across different nodes is also supported through MPI. The following figures show two approaches to parallelizing applications on computer clusters equipped with MIC processors. The first approach is to treat the MIC processors as clients to the host CPUs. The MPI processes are hosted by the CPUs, which offload the computation to the MIC processors. Multithreading programming models such as OpenMP can be used to allocate many cores for data processing. The second approach is to let each MIC core directly host one MPI process. In this way, the 60 cores on the same die are treated as 60 independent processors while sharing the 8 GB on-board memory on the Xeon Phi 5110P.

Figure 2. Offloading approach to implementing parallelism on the MIC cluster. Image source: Lai et al., “Accelerating Geospatial Applications on Hybrid Architectures,” Proceedings of the 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 1545–1552, 2013.

Figure 3. Direct-host approach to implementing parallelism on the MIC cluster. Image source: Lai et al., “Accelerating Geospatial Applications on Hybrid Architectures,” Proceedings of the 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 1545–1552, 2013.

Benchmarks

Three different types of use cases served as the benchmarks for this study: Kriging interpolation (embarrassingly parallel), the Iterative Self-organizing Data-analysis Technique Algorithm (ISODATA) (loose communication in the computation), and Cellular Automata (intense communication).

Kriging is a geostatistical estimator that infers the value of a random field at an unobserved location, and can be viewed as a point interpolation that reads input point data and returns a raster grid with calculated estimations for each cell.

ISODATA is one of the most frequently used algorithms for unsupervised image classification in remote sensing applications. In general, it can be implemented in three steps: (1) calculate the initial mean value of each class; (2) classify each pixel to the nearest class; and (3) calculate the new class means based on all pixels in one class. The second and third steps are repeated until the change between two iterations is small enough. When multiple processors are used, only one summation from all processors is required in each iteration.
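The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the study: the initial class means are passed in rather than computed, each "process" is simulated by one chunk of pixels, and all function names are hypothetical. The point it tries to show is that each chunk only contributes per-class sums and counts, so a single global summation per iteration is enough.

```python
def nearest_class(pixel, means):
    """Step 2: index of the class mean closest to the pixel (squared distance)."""
    return min(range(len(means)),
               key=lambda k: sum((p - m) ** 2 for p, m in zip(pixel, means[k])))


def local_sums(chunk, means):
    """Work done by one (simulated) process: per-class band sums and pixel counts."""
    bands = len(means[0])
    sums = [[0.0] * bands for _ in means]
    counts = [0] * len(means)
    for pixel in chunk:
        k = nearest_class(pixel, means)
        counts[k] += 1
        for b in range(bands):
            sums[k][b] += pixel[b]
    return sums, counts


def isodata_sketch(chunks, initial_means, tol=1e-6, max_iter=100):
    """Repeat steps 2 and 3 until the class means change by less than tol."""
    means = [list(m) for m in initial_means]
    bands = len(means[0])
    for _ in range(max_iter):
        partials = [local_sums(chunk, means) for chunk in chunks]
        # The one global summation per iteration (an all-reduce in an MPI setting).
        total_sums = [[sum(p[0][k][b] for p in partials) for b in range(bands)]
                      for k in range(len(means))]
        total_counts = [sum(p[1][k] for p in partials) for k in range(len(means))]
        # Step 3: new class means; an empty class keeps its previous mean.
        new_means = [
            [s / total_counts[k] for s in total_sums[k]] if total_counts[k] else means[k]
            for k in range(len(means))
        ]
        change = max(abs(a - b)
                     for old, new in zip(means, new_means)
                     for a, b in zip(old, new))
        means = new_means
        if change < tol:
            break
    return means
```

In a real MPI program, the list comprehension over `chunks` would disappear: every rank would call `local_sums` on its own block of the image and the two totals would come from a reduction across ranks.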

Cellular Automata are commonly used in a variety of geospatial modeling and simulation applications. Game of Life (GOL), invented by British mathematician John Conway, is a well-known generic Cellular Automaton that consists of a collection of cells that can live, die, or multiply based on a few mathematical rules. The universe of the GOL is a two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive (‘1’) or dead (‘0’). Every cell interacts with its eight neighbors, which are the cells that are horizontally, vertically, or diagonally adjacent.
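A single GOL update can be sketched as follows. The sketch uses Conway's standard rules, which the description above does not spell out: a live cell survives with two or three live neighbors, and a dead cell becomes alive with exactly three. Treating cells outside the grid as dead is an assumption made here for simplicity, and `life_step` is an illustrative name.

```python
def life_step(grid):
    """Return the next generation of a grid of 0/1 cells (cells outside the grid count as dead)."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        # Sum the eight horizontally, vertically, and diagonally adjacent cells.
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                    total += grid[r + dr][c + dc]
        return total

    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = live_neighbors(r, c)
            # Birth with exactly 3 neighbors; survival with 2 or 3.
            new_grid[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new_grid
```

Because every cell reads its eight neighbors, a process that owns only part of the grid needs the boundary cells of adjacent parts before each update, which is why this benchmark is the communication-intensive one.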

Experiment setup

We conducted our experiments on two platforms, the National Science Foundation-sponsored Keeneland and Beacon supercomputers. The Keeneland Initial Delivery System (KIDS) is a 201-teraflop, 120-node HP SL390 system with 240 Intel Xeon X5660 CPUs and 360 NVIDIA Fermi GPUs, with the nodes connected by a QDR InfiniBand network. Each node has two 6-core 2.8 GHz Xeon CPUs and three Tesla M2090 GPUs. The NVIDIA M2090 GPU contains 512 CUDA cores and 6 GB of GDDR5 on-board memory. The Beacon system (a Cray CS300-AC Cluster Supercomputer) offers access to 48 compute nodes and 6 I/O nodes joined by an FDR InfiniBand interconnect providing 56 Gb/s of bi-directional bandwidth. Each compute node is equipped with two Intel Xeon E5-2670 8-core 2.6 GHz processors, four Intel Xeon Phi (MIC) 5110P coprocessors, 256 GB of RAM, and 960 GB of SSD storage. Each I/O node provides access to an additional 4.8 TB of SSD storage. For each benchmark, we had three parallel implementations on the two clusters: MPI+CPU, MPI+MIC, and MPI+GPU.

Results

Figure 4. Performance of benchmarks on four different configurations: (a) Kriging, (b) ISODATA, (c) GOL. Image source: Lai et al., “Accelerating Geospatial Applications on Hybrid Architectures,” Proceedings of the 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 1545–1552, 2013.

We want to show the strong scalability of the parallel implementations. The problem size is therefore fixed for each benchmark while the number of participating MPI processes is increased.
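Strong scaling is usually summarized with two numbers, speedup and parallel efficiency, both computed from run times at a fixed problem size. The helper names below are illustrative and are not taken from the study.

```python
def speedup(t_base, t_p):
    """Speedup of a run taking t_p seconds relative to a baseline run taking t_base seconds."""
    return t_base / t_p


def parallel_efficiency(t_base, t_p, p_base, p):
    """Speedup divided by the increase in process count (1.0 means ideal strong scaling)."""
    return speedup(t_base, t_p) * p_base / p
```

For instance, a run that takes 100 s on one process and 25 s on eight processes has a speedup of 4 and an efficiency of 0.5. These are made-up numbers for illustration, not measurements from the study.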

In the Kriging interpolation benchmark, the source dataset is evenly partitioned by rows among all MPI processes. The computation in each MPI process is purely local, i.e., there is no cross-process communication. The problem size of this benchmark is 171 MB, consisting of 4 datasets. The output raster grid for each dataset has a consistent dimension of 1,440×720. The performance of the GPU cluster with K20 is projected from the speedup of a single K20 over a single M2090, and we assume that the other specifications of the K20 GPU cluster are the same as those of Keeneland KIDS. The figure shows that all hybrid implementations easily outperform the parallel CPU implementation, with the GPU performing better than the MIC.
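An even row-wise partition of this kind can be sketched as below. The function name and the choice to give any leftover rows to the lowest-ranked processes are assumptions for illustration; the study does not describe how remainders are handled.

```python
def row_range(rank, size, n_rows):
    """Half-open range [start, stop) of rows owned by process `rank` out of `size` processes."""
    base, extra = divmod(n_rows, size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

Since every process works only on its own rows, no data needs to be exchanged during the computation; the per-process results only have to be gathered at the end.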

The input to the ISODATA benchmark is a high-resolution, three-band image of 18 GB with dimensions of 80,000×80,000. The objective of this benchmark is to classify the image into 15 classes. For this benchmark, the gap between the MIC processors and the GPUs becomes quite small. One reason is that the FDR InfiniBand network on Beacon provides much higher bandwidth than the QDR InfiniBand network on Keeneland KIDS. The advantage of the more efficient communication network on Beacon is further demonstrated when the number of participating processors is increased from 100 to 120.
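As a quick sanity check on the image size, the arithmetic can be written out. The one-byte-per-sample figure is an assumption, since the pixel depth is not stated; under that assumption the image comes to about 17.9 GiB, which is consistent with the quoted 18 GB.

```python
def image_size(width, height, bands, bytes_per_sample=1):
    """Return (total bytes, size in GiB); bytes_per_sample=1 is an assumed 8-bit pixel depth."""
    total_bytes = width * height * bands * bytes_per_sample
    return total_bytes, total_bytes / 2**30


total_bytes, gib = image_size(80_000, 80_000, 3)
```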

In the Game of Life benchmark, the grid size is 32,768×32,768. The status of each cell in the grid is updated for 100 iterations. The performance results demonstrate strong scalability for the MPI implementations on both CPUs and GPUs. The MPI+MIC implementation, however, does not scale as well because of the communication overhead among MPI processes. It is therefore critical to balance computation and communication to achieve the best performance.
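One way to see the computation/communication balance is to count cells. The sketch below assumes a one-dimensional decomposition into horizontal strips, with each interior process receiving one boundary row from each of its two neighbors per iteration. The decomposition actually used in the study is not described here, so both the layout and the function name are hypothetical.

```python
def strip_workload(n, p):
    """Per-iteration (cells updated, ghost cells received) for an interior process.

    Assumes an n x n grid split into p horizontal strips, with p dividing n evenly.
    """
    rows_per_process = n // p
    cells_updated = rows_per_process * n
    ghost_cells = 2 * n  # one row from the strip above, one from the strip below
    return cells_updated, ghost_cells
```

Under this assumption, `strip_workload(32768, 64)` gives 16,777,216 updated cells against 65,536 received cells. Doubling the process count halves the first number while the second stays the same, so communication takes a growing share of each iteration as more processes are added.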

Conclusion

In our study, we have shown the potential for accelerating geospatial applications using parallel implementations on hybrid computer clusters. MPI+GPU and MPI+MIC parallel implementations of representative geospatial applications achieve significant performance improvements compared with the traditional MPI+CPU parallel implementation. We also found that the simple MPI-direct-host programming model on an Intel MIC cluster can achieve performance equivalent to the MPI+GPU model on GPU clusters when the same number of processors is allocated. An efficient cross-node communication network is still the key to achieving strong scalability for parallel applications running on multiple nodes. In general, geospatial computation consists of functional modules that process (1) vector geometric data, (2) network and graph data, (3) raster grid data, and (4) imagery data. A variety of research challenges remain in deploying heterogeneous computer architectures and systems to handle different data structures and geospatial computation problems in the future.

The paper on this research can be accessed at http://www.csce.uark.edu/~mqhuang/papers/2013_gis_hpcc.pdf.

Research Team Bios

Miaoqing Huang is an Assistant Professor in the Department of Computer Science and Computer Engineering, University of Arkansas. His research interests include operating system and infrastructure design for manycore computer systems, hardware acceleration technologies (such as FPGA and GPU), and on-board cache design in nonvolatile memory-based solid-state drives (SSDs). He earned his doctoral degree in computer engineering from The George Washington University in 2009. He can be reached at mqhuang@uark.edu.

Xuan Shi is an Assistant Professor in the Department of Geosciences, University of Arkansas. His research interests include geoinformatics, geospatial cyberinfrastructure, and high-performance geocomputation, among others. He earned his doctoral degree in geography from West Virginia University in 2007. He can be reached at xuanshi@uark.edu.

Haihang You is a Computational Scientist at the National Institute for Computational Sciences, University of Tennessee. Prior to joining NICS, he was a research associate at the Innovative Computing Laboratory, Dept. of Electrical Engineering and Computer Science, University of Tennessee. His research interests are High Performance Computing, Performance Analysis and Evaluation, Compiler & Automatic Tuning and Optimization System, Linear Algebra, Iterative Adaptive Discontinuous Galerkin Finite Element Methods, Parallel I/O Tuning on Lustre, and System Utilization Analysis and Improvement on a Supercomputer. He can be reached at hyou@utk.edu.
