DownUnder GeoSolutions Leverages Xeon Phi for Improved Seismic Processing

By Phil Schwan

March 23, 2016

It’s a given that modern oil and gas exploration couldn’t exist without increasingly powerful supercomputers. Today’s machines enable new capabilities, improve image resolutions, lower costs, and decrease risk. Indeed, many wells, some costing tens of millions of dollars, have missed their targets due to inadequate or incorrect data analysis. This article explores the impact of high-performance computing (HPC) and highly tuned software in DownUnder GeoSolutions (DUG) seismic processing and imaging workflows and how that in turn improves oil and gas exploration.

Fundamentally, seismic processing – turning raw, noisy data into a useful, accurate image of subsurface geology – is a complicated, computation-intensive task. To understand how HPC adds value, it helps to know roughly what the process entails. Our clients use purpose-built marine or land survey teams to acquire seismic data. That process uses an energy source (such as an air-gun array or dynamite) to generate acoustic waves, which propagate through the earth, reflect off geological layers, and are measured at the surface by a recording system.

The raw data can be very noisy, and must be processed to enhance the signal-to-noise ratio of the reflections. Moreover, the true subsurface location of these reflections isn’t initially known. A compute-intensive process called migration is used to reposition reflections to their correct subsurface positions. Prior to migration, the apparent position can be literally miles away from the actual subsurface location.

Because our clients subsequently use this data to identify likely hydrocarbon accumulations – and ultimately to decide where to drill wells – accurate subsurface positioning is paramount. Enhanced processing and migration analysis increases the fidelity of the seismic images, which in turn allows for a significantly better understanding of the subsurface geology.

Images on the left show the effect that modern processing and imaging can have on seismic data. Both images were created from the same input data; the DUG result is on the bottom. The hugely increased fidelity allows geoscientists to more accurately evaluate the prospect of hydrocarbons. This kind of processing is necessary for every dataset.

Because seismic data processing is so fundamental to success and so extraordinarily compute-intensive, the modern petroleum industry could not exist without supercomputers and the software that drives them. All of the easy hydrocarbons were found a long time ago; we need more-advanced technology to discover and reach the energy reserves that remain. To accurately image complex subsurface geology, our clients acquire more and better-quality data, which then needs to be processed at higher resolution and with faster turnaround. These factors keep pushing compute demands higher, which is why DUG doubles its HPC power every 12-18 months.

Indeed, while DUG is a geosciences company with a broad portfolio of offerings (details below), it is also an HPC software- and hardware-integration company. We operate a 6 petaflop/s cluster in Perth, another of the same size in Houston, and smaller clusters in London, Kuala Lumpur, Brisbane, and Jakarta – all of which are busy 24 hours a day, 365 days a year. Worldwide, DUG makes extensive use of Intel Xeon processors and Intel Xeon Phi coprocessors, and its custom system from SGI is one of the world's larger commercial Intel Xeon Phi coprocessor deployments.

Since 2004, DUG has invested heavily in developing state-of-the-art, highly parallel processing codes, well tuned to successive generations of multi-threaded CPUs. We use sophisticated libraries for the basic infrastructure shared amongst all our codes (to manage communication, queues, caches, and so on), along with explicit threading and vectorization where necessary.

When interacting with the Xeon Phi coprocessors, we begin with Intel's Language Extensions for Offload (LEO); however, to decouple execution on the host and coprocessor – and to share as much code as possible between them – LEO use is kept to a minimum. It is used initially to spawn persistent threads on the coprocessor, along with their associated workflow infrastructure. Subsequently, offload is used purely to feed work packets and input data to this autonomous compute engine on each coprocessor; no useful work occurs within the execution of an offload code block.
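The pattern above can be sketched roughly as follows. This is a minimal illustration, not DUG's actual code: the names (`packet_t`, `work_queue`, `enqueue_packet`, `engine_drain`) are hypothetical, and the `#pragma offload` lines use Intel LEO syntax, which non-Intel compilers ignore, so the sketch simply runs on the host there. The key point it demonstrates is that the offload block itself does nothing but hand a work packet to the persistent engine.

```c
/* Sketch of the offload pattern described above: after a one-time
 * offload has spawned a persistent compute engine on the coprocessor,
 * later offloads only enqueue work packets.  All names are
 * hypothetical; the pragma is Intel LEO syntax and is ignored by
 * compilers without offload support. */
#define QUEUE_LEN 256

typedef struct { int id; float data[8]; } packet_t;

static packet_t work_queue[QUEUE_LEN];
static int queue_tail = 0;

/* Persistent engine: in the real pattern this loops forever in its
 * own threads on the coprocessor; here it just drains the queue. */
static float engine_drain(void)
{
    float sum = 0.0f;
    for (int i = 0; i < queue_tail; i++)
        for (int j = 0; j < 8; j++)
            sum += work_queue[i].data[j];
    queue_tail = 0;
    return sum;
}

static void enqueue_packet(const packet_t *p)
{
    /* The offload block only copies the packet in; no useful compute
     * happens inside it, matching the description above. */
    #pragma offload target(mic:0) in(p[0:1])
    {
        work_queue[queue_tail++] = *p;
    }
}
```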

From a code structure perspective, the use of offload turns out to be reasonably non-invasive, as most of the offload code segments can be isolated and ignored by algorithm developers. In cases where host memory is at a premium, it’s sometimes necessary to subvert the LEO framework and explicitly manage address mappings and data transfers between the host and coprocessor.

The offload model has many practical advantages in terms of software bundling and execution. For example, the single-executable, single-host model allows existing packaging and scripting tools to remain unchanged. Because our code was already heavily threaded, relatively few changes to the Kirchhoff time migration kernel were required to achieve scalability on the Xeon Phi coprocessor. When the offloaded compute engine is up and running, all 236 hardware thread contexts are fully utilized. In simpler applications, the addition of two OpenMP parallel loop pragmas is enough to fully utilize a whole (four-thread) core for each task.
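A two-pragma parallelisation of the kind mentioned above might look like the following sketch (the `stack_traces` kernel and its loop structure are illustrative assumptions, not DUG's code): one pragma spreads the outer loop across hardware threads, and a second vectorises the inner reduction.

```c
/* Illustrative two-pragma OpenMP kernel: sum ntraces seismic traces
 * into one output trace.  Parallelising over output samples means
 * each thread owns its own out[s], so no synchronisation is needed;
 * the inner reduction is vectorised with a simd pragma. */
#include <stddef.h>

void stack_traces(const float *in, float *out,
                  size_t ntraces, size_t nsamples)
{
    #pragma omp parallel for
    for (size_t s = 0; s < nsamples; s++) {
        float acc = 0.0f;
        #pragma omp simd reduction(+:acc)
        for (size_t t = 0; t < ntraces; t++)
            acc += in[t * nsamples + s];
        out[s] = acc;
    }
}
```

With four OpenMP threads pinned to one coprocessor core, this shape of loop keeps all four hardware thread contexts busy.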

DUG supercomputer

We found that the Xeon processors and Xeon Phi coprocessors were a natural combination when looking to increase our per-node HPC compute density. As important as high-thread-count scalability is to driving performance, it’s really the HPC-oriented 512-bit Single Instruction, Multiple Data (SIMD) instruction set on the coprocessors that removes many constraints associated with GPUs and enables non-trivial algorithms and optimizations.

For example, efficient seismic processing involves seismic and model data interacting on different resolution (generally non-uniform) grids. The gather-scatter instructions are a natural fit for these vector operations, but they don’t greatly relieve memory traffic overheads. Often a better fit for these algorithms are generic permutation vector instructions that can be used to express vectorization over both grids simultaneously. In the latter instance, mapping and interactions between the grids can occur entirely within vector registers, which significantly reduces memory traffic and improves performance.
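The two-grid access pattern can be illustrated in scalar form (the function and the 4:1 grid ratio below are illustrative assumptions): each output sample on the fine seismic grid reads a value from a coarser model grid. On the coprocessor, the inner mapping is what gets expressed with gather or permutation vector instructions.

```c
/* Scalar illustration of the two-grid interaction discussed above:
 * samples on a fine seismic grid are scaled by values from a coarser
 * (here decimation:1) model grid.  With a small decimation factor,
 * long runs of s map into one vector's worth of model values, which
 * is what allows the grid mapping to live entirely in vector
 * registers on the coprocessor. */
void scale_by_model(const float *trace, const float *model,
                    float *out, int nsamples, int decimation)
{
    for (int s = 0; s < nsamples; s++) {
        int m = s / decimation;      /* index into the coarse grid */
        out[s] = trace[s] * model[m];
    }
}
```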

Moving to the Intel Xeon Phi Coprocessor
Even before we considered using coprocessors, the DUG software suite was heavily optimized for modern NUMA architectures with relatively high core counts. Most of the classical HPC optimization techniques used in DUG software still apply and, indeed, the benefits tend to be amplified on the Intel Xeon Phi because of the scale of the coprocessor.

As an example of classical tuning techniques at work, we achieved large performance gains by ensuring that the 30 MB L2 cache of each coprocessor actually holds close to 30 MB of unique data. Careful data reordering and vectorization allowed us to keep the memory footprint of even very complex nested loops within the 128 KB of L2 available to each thread.
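The blocking idea can be sketched as follows. The tile size and kernel below are illustrative choices, not DUG's actual values: the point is simply that all passes over a tile complete before the next tile is touched, so each thread's working set stays within its 128 KB share of L2.

```c
/* Sketch of cache blocking for a per-thread 128 KB share of L2:
 * process data in tiles sized to fit, halved to leave room for a
 * second operand stream.  Tile size is an illustrative assumption. */
#include <stddef.h>

#define TILE_FLOATS (128 * 1024 / sizeof(float) / 2)   /* 16384 floats */

void scale_all(float *data, size_t n, float gain)
{
    for (size_t base = 0; base < n; base += TILE_FLOATS) {
        size_t end = base + TILE_FLOATS;
        if (end > n) end = n;
        /* Finish all work on this tile before moving on, keeping it
         * resident in L2 for the whole time it is needed. */
        for (size_t i = base; i < end; i++)
            data[i] *= gain;
    }
}
```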

Other features of the vector instruction set simplified the existing optimized code. For example, the availability of fast, 23-bit-accurate floating-point square-root and division instructions removes the need to choose between fast and accurate results for operations that are endemic to seismic processing. We also make use of the comprehensive set of instructions for floating-point rounding and conversion to/from fixed-point to express operations such as interpolation.

Our existing, sophisticated C and C++ library infrastructure required virtually no changes to function on the coprocessor. The integrated compiler support and execution model means that deployment is reasonably transparent.

One noticeable difference with the coprocessor implementation of Kirchhoff time migration was the need to rely heavily on up-front allocations of large memory blocks and manual memory management. Between the overheads of pinning memory for DMA transfers, and the Linux kernel bottlenecks exposed by 236 threads simultaneously allocating and deallocating, it was important to remove as much dynamic allocation as possible. In practice, we use most of the coprocessor memory as a cache.
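A minimal sketch of the up-front allocation strategy is a pool with a bump allocator: one large block is reserved at startup and the hot path hands out aligned pieces from it, never touching the system allocator. The sizes, names, and 64-byte alignment below are illustrative assumptions, not DUG's implementation.

```c
/* Minimal bump-allocator sketch of the up-front allocation strategy
 * described above.  One malloc at startup; pool_alloc on the hot path
 * is a pointer bump, and pool_reset recycles the whole pool between
 * work packets.  All names and sizes are illustrative. */
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    char  *base;
    size_t size;
    size_t used;
} pool_t;

int pool_init(pool_t *p, size_t size)
{
    p->base = malloc(size);       /* the one and only allocation */
    p->size = size;
    p->used = 0;
    return p->base != NULL;
}

void *pool_alloc(pool_t *p, size_t n)
{
    n = (n + 63) & ~(size_t)63;   /* 64-byte align for DMA/SIMD use */
    if (p->used + n > p->size) return NULL;
    void *ptr = p->base + p->used;
    p->used += n;
    return ptr;
}

void pool_reset(pool_t *p) { p->used = 0; }   /* reuse between packets */
```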

DUG has a number of applications, including the workhorse Kirchhoff time migration, which run up to 1,000-way thread parallelism per compute node, on hundreds of cluster nodes concurrently. Within a single compute node, adaptive load balancing across multiple coprocessors achieves essentially linear performance gains. Each coprocessor delivers between 1.5 and 2 times the performance of the high-end hosts (which typically have 2 x 10-core Intel Xeon E5-2660 v2 processors), allowing us to increase the compute density of our nodes by roughly 8x.
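One simple way to get this kind of adaptive balancing, sketched below under assumed names, is a shared atomic work counter: each device's feeder thread claims the next chunk when it finishes the previous one, so faster devices naturally pull more work without any explicit scheduling.

```c
/* Sketch of adaptive per-node load balancing via work stealing from a
 * shared counter.  Each coprocessor's feeder thread calls claim_chunk
 * when idle; faster devices call it more often and so receive more
 * work.  Names are hypothetical; this is not DUG's implementation. */
#include <stdatomic.h>

static atomic_int next_chunk = 0;

/* Returns the next chunk index, or -1 once all nchunks are taken. */
int claim_chunk(int nchunks)
{
    int c = atomic_fetch_add(&next_chunk, 1);
    return (c < nchunks) ? c : -1;
}
```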

Future
Accurate subsurface imaging, made possible by highly specialized HPC solutions, is a fundamental part of oil and gas exploration and production. In these extremely difficult times for the industry, our success is due in large part to our total focus on very high-density computing, in which Intel Xeon Phi coprocessors play an important role. The more FLOPs we can pack into a single compute node at full bandwidth, the better the results, the faster the turnaround, and the better the prices we can offer our clients. We’re always working on cutting-edge petroleum exploration problems, and supercomputing is as fundamental a part of those solutions as geophysics.

Company Description:
DownUnder GeoSolutions (DUG) is a geosciences company headquartered in Australia with offices in Perth, Brisbane, London, Houston, Kuala Lumpur, and Jakarta. Services include seismic acquisition design and implementation, seismic data processing, depth imaging, petrophysical processing and interpretation, quantitative interpretation services, geostatistical depth conversion, and a complete range of DUG software. DUG operates a 6 petaflop/s cluster in Perth, another of the same size in Houston, and smaller clusters in London, Kuala Lumpur, Brisbane, and Jakarta – all of which are busy 24 hours a day, 365 days a year. Worldwide, DUG makes extensive use of Intel Xeon processors and Intel Xeon Phi coprocessors.

Author Bio:
Phil Schwan is Head of Software at DownUnder GeoSolutions.
