NVIDIA Highlights GPU Progress on Titan Supercomputer

By Nicole Hemsoth

March 27, 2014

The GPU Technology Conference this week in San Jose offered plenty of material for the supercomputing set, with a number of presentations focused on specific programming challenges for large-scale scientific and enterprise HPC applications. The Titan system at Oak Ridge National Lab served as a unifying thread across many of the talks, helping to put massive-scale use of GPUs in better context.

Jim Rogers, Director of Operations at the National Center for Computational Sciences at Oak Ridge National Laboratory, described in detail how the 27-petaflop Titan system has been making use of its 18,688 NVIDIA Tesla K20 GPUs. Oak Ridge is able to track efficiency metrics through recent changes in the Kepler device driver and Cray’s software that allow for sophisticated reporting of GPU usage metrics for both memory use and scheduled work. Rogers used the data from these metrics to point to some specific operational benefits of using GPUs over a multicore-only approach, estimating that their use of GPUs at such scale has offered over 5x the efficiency of a CPU-only system.

The efficiency and performance message seems to be resonating with an increasing number of users requesting allocations on Titan, says Fernanda Foertter, HPC User Support Specialist at Oak Ridge National Lab. In her GTC presentation about GPU interest and user needs on Titan, she highlighted the demand for GPU acceleration for a growing number of applications. Foertter collected several perspectives from users of Titan about their experiences porting applications and making use of the accelerators, and pointed to the role of acceleration for the future of exascale-class systems. Her presentation set the stage for a number of topics around GPU usage on Titan, particularly in terms of the coding support required for complex scientific and commercial codes.

Aside from details about general production and operation of the system, a number of users of the Titan system were present to share experiences about porting and altering their codes as well as gauging performance against CPU-only systems. Among such users was Evghenii Gaburov, HPC Advisor at SURFsara, who described how his team was able to leverage Titan to simulate the evolution of the Milky Way on a star-by-star basis in just over a week. While he made no secret of the challenges in parallelizing an advanced hierarchical GPU tree-code for use on Titan, after some significant workarounds, they were able to redesign the communication strategy to maximize both CPU and GPU use and allow their application to scale to over 8,000 of Titan’s GPUs.

Others shared war stories about getting their codes primed to run on Titan and other GPU-powered supercomputers, including James Phillips, a senior research programmer at the University of Illinois. His team had already worked with the NAMD molecular dynamics code on Blue Waters before they began to tap into Titan. Again, while there were significant software challenges, once the team overcame some of the core barriers of their legacy application using CUDA 5.5 and Kepler features, they were able to improve their time to result—one that allows researchers to model the complete atomic structure of the HIV capsid.

Weather modeling efforts on Titan were a prime use case that opened the doors for researchers to talk about the use of GPUs at large scale to continue improving model resolution. Dag Lohmann, co-founder of catastrophe modeling company KatRisk, described how his company, which was recently selected by Oak Ridge National Lab to use Titan for specific flooding events, was enthusiastic about the performance boost offered by GPUs. In addition to providing a great overview of catastrophe modeling in the context of global flood risk models, he detailed the challenges of getting their CUDA-based fluid mechanics code to run on the Keplers (in terms of code, data assimilation, data volume, etc.). The end result of their work allows KatRisk to create probabilistic flood models and maps at high resolution.

Also on the weather and climate front, Mark Govett, Chief of the Advanced Computing division at NOAA, discussed the development, parallelization and performance of the NIM next-gen weather model for the Titan system, which will allow the weather agency to improve weather prediction accuracy. Specifically, Govett talked about NOAA’s experiences using OpenACC compilers—an important element since NOAA’s parallelization path has relied on a homegrown directive-based Fortran-to-CUDA compiler to get the application ready to run at full resolution across 5,000 Titan nodes.

Others shared specific thoughts on code-related issues at Titan scale. For instance, Alan Gray, a research architect at EPCC at the University of Edinburgh, described work with a highly complex application that allowed his team to scale their soft matter physics code to over 8,000 GPUs on Titan. Specifically, he talked about the challenges and ultimate success of blending CUDA and MPI and shared details about their communication library, which can be adopted by others. Interestingly, with code that supports both GPU and CPU-only versions, they were able to demonstrate a performance enhancement of 3.5-5x using the GPU variant against the same code running on fully utilized CPUs.

More researchers, including Mathias Wagner of Bielefeld University and Indiana University, shared how GPUs are advancing quantum chromodynamics following his team’s preparation of complex code for Titan via the QUDA library. In a similar vein, Justin Foley, a developer at Microway and NVIDIA, described QUDA in more detail for the same research area, which rounded out the picture for Lattice Quantum Chromodynamics on Titan GPUs.

Researchers from GE Global Research were on hand as well to talk about scaling their codes to meet the GPU capabilities on Titan for gas turbine modeling and accelerating three-body molecular dynamics codes, and others shared details about scaling to Titan heights for seismic and medical research applications.

On the code front, OpenACC was a hot topic among the HPC set. Rob Farber did an excellent job of highlighting some of the key trends in programming and optimizing for GPUs at large scale. He presented new results that extend machine learning and big data analysis to 13 petaflops of average sustained performance across 16,384 GPUs on Titan—a very popular topic.

As we noted earlier in the week, this GTC event didn’t seem to emphasize the gaming and entertainment crowd. Large-scale analytics, cognitive computing, computer vision and, of course, scientific computing topped the charts in terms of sessions and posters. Jack Wells from Oak Ridge, who chaired the “Extreme Scale Supercomputing with the Titan Supercomputer” series for GTC, was able to gather a representative sample of leading researchers to put real-world use and challenge context into the Titan story.
