RENCI/Dell Supercomputer Charts Hurricane Matthew’s Storm Surge

By John Russell

October 6, 2016

Hurricane Matthew, now headed into Florida having already hammered Haiti and other parts of the Caribbean, is a stark reminder of the importance of computer modeling, not only in predicting a storm’s strength and path but also in predicting and plotting the storm surge, which is often its most destructive component. Right now, the Hatteras supercomputer (Dell) at the Renaissance Computing Institute (RENCI) in North Carolina is doing just that for Hurricane Matthew.

Named after North Carolina’s famous Outer Banks lighthouse, the Hatteras supercomputer is a 150-node M420 Dell cluster (full specs at the end of article) that runs the ADCIRC storm surge model every six hours when a hurricane is active. Visualizations of the models appear on the Coastal Emergency Risks Assessment website. The outputs from these runs are incorporated into guidance information by the National Weather Service, the National Hurricane Center, and agencies such as the U.S. Coast Guard, the U.S. Army Corps of Engineers, FEMA, and local and regional emergency management divisions.

The models are a tool used to help make decisions about evacuations and about where to position supplies and response personnel. In Florida, Governor Rick Scott has urged about 1.5 million Floridians in the storm’s path to evacuate. Hurricane Matthew, whose winds have again reached 140 miles per hour as it nears the Florida coast, making it a Category 4 storm, has already killed more than 200 people.

The work to apply high-performance computing and data analysis to understanding dangerous storm surges is part of a long-term collaboration involving RENCI, the Coastal Resilience Center at UNC-Chapel Hill, and UNC’s Institute of Marine Sciences (IMS). Over the last 10 years, Brian Blanton, a coastal oceanographer and director of RENCI environmental initiatives, has worked closely with Rick Luettich, lead principal investigator of the Coastal Resilience Center and director of IMS, and others to enhance and improve the ADCIRC coastal circulation and storm surge model.

“We model the way the ocean moves and particularly the ocean and coastal areas and so we are trying to always predict that. It moves because of tides, because of rivers that flow into it, it also moves because of the wind and so when we get these severe storms whether they are winter Nor’easters or hurricanes like Matthew, they blow the wind around if you will, in particularly when they blow it up onto shore then it causes flooding and we have what [we] typically refer to as storm surge,” said Luettich.

Every time the Dell system at RENCI computes another storm surge model for use by the emergency response community, Blanton is busy running a series of at least nine possible storm surge scenarios on the same HPC system. The process is much like ensemble weather forecasting, where meteorologists run a large number of weather models using slightly different initial conditions in order to account for the uncertainty in such a dynamic system.

The model output available on the web for Matthew can resolve the detail of coastal storm surge to a level of less than 200 meters. And the team’s current research could mean that storm surge models next year will provide even more detail and accuracy.  “We are working on doing storm surge predictions the same way that meteorologists develop predictions for rain and wind speeds,” said Blanton. “It will provide high-resolution storm surge probabilities that account for uncertainty in the track and intensity of hurricane forecasts.” Blanton said the research team plans to acquire enough test simulations this year to be able to produce ensemble models regularly for hurricane season 2017.
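To illustrate the ensemble idea, here is a minimal Python sketch. It is not ADCIRC and not the RENCI workflow: the `toy_surge` function, the perturbation sizes, and the 2-meter threshold are hypothetical stand-ins, chosen only to show how a set of runs with perturbed track and intensity can be turned into a probability of exceeding a given surge height.

```python
import random

def toy_surge(track_offset_km, wind_mph):
    """Hypothetical stand-in for one surge model run (NOT ADCIRC).

    Surge grows with wind speed and shrinks as the storm track
    moves away from the point of interest.
    """
    proximity = max(0.0, 1.0 - abs(track_offset_km) / 150.0)
    return 0.0003 * wind_mph ** 2 * proximity  # meters, illustrative only

def exceedance_probability(n_members, threshold_m, seed=0):
    """Return the fraction of members whose surge exceeds threshold_m."""
    rng = random.Random(seed)
    surges = []
    for _ in range(n_members):
        offset = rng.gauss(0.0, 50.0)   # assumed track error, km
        wind = rng.gauss(140.0, 10.0)   # assumed intensity error, mph
        surges.append(toy_surge(offset, wind))
    exceed = sum(1 for s in surges if s > threshold_m)
    return exceed / n_members, surges

prob, surges = exceedance_probability(n_members=200, threshold_m=2.0)
print(f"P(surge > 2.0 m) ~ {prob:.2f} from {len(surges)} members")
```

In a real system each call to `toy_surge` would be a full model run on hundreds of processors, which is why the number of ensemble members is limited by available compute.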

ADCIRC, a system of computer programs for solving time-dependent, free-surface circulation and transport problems in two and three dimensions, was developed by Luettich and researchers at the University of Notre Dame. These programs use the finite element method in space, allowing the use of highly flexible, unstructured grids. The researchers and developers who maintain the software and develop the visual models represent universities on the East and Gulf coasts as well as agencies such as the National Oceanic and Atmospheric Administration, the National Weather Service, the National Science Foundation, and the Department of Homeland Security.

In one sense, storm surge forecasting is lower on the HPC totem pole than weather forecasting in terms of access to necessary resources. The major weather forecasting services often have access to bigger machines and modernized codes, and sometimes can be the dominant user of the resource. These agencies use ensemble modeling, sometimes looking at thousands of models as well as other data sources such as data from hurricane hunter aircraft, to “develop with a hand-created forecast.” Even then, as the forecast extends out a couple of days, its uncertainty grows significantly.

During an event such as Hurricane Matthew, the National Weather Service uses its substantial resources to update its forecast every six hours. Keeping pace is a challenge for the storm surge forecasters. “If it takes us five and a half hours to do a run and process it and get everything displayed and out there for the public to see, then it is pretty much useless. Its relevancy window has left. I typically think two hours is the maximum amount of time we have to stay relevant and I am much happier if we can get results done in an hour,” said Luettich.

Luettich’s team starts with the basic forecast provided by the National Hurricane Center and runs that through its model: “It’s the hurricane center forecast and it’s the first thing we want to go out because that’s our best estimate of what’s likely to occur. The next question is what’s the range of things that could occur. The only way we can address that issue of range [is] using ensembles. At that point we have to do multiple runs to try to bracket and depending on what we have for resources we can do this either heuristically, just picking a couple of storms or a few storms to give us kind of a sensitivity study, or ideally we can get into the dozens or hundreds [of] storms to give us truly a statistically valid population that we can then compute statistics from and whatnot. In a nutshell that’s the challenge,” he said.

A single run on several hundred to one-thousand processors may take hours. “The challenge for us, as the ocean modelers, as storm surge modelers, is to properly account for that uncertainty in the way in which we deliver forecasts of the ocean’s response. So right now we do the forecast which is right smack down the middle of that cone of uncertainty and then we will do a few runs which kind of bracket either the possible track variations over time or changes to the predictive intensity of the storm.”

Hatteras Supercomputer by Dell at RENCI

Perhaps not surprisingly, access to sufficient compute horsepower is a bottleneck. “We are fortunate if we can get enough computer horsepower either at RENCI and RENCI is our go-to-place for in-house HPC but realistically we can get enough processors there to do more than one or two runs each compute cycle. We collaborate with folks at LSU and TACC and other places so we can typically add in a few more runs but we are still only at the phase of being able to do the primary forecast and a few sensitivity runs around it.”

The need for speed, emphasizes Luettich, is critical. However, it is important to note that the ADCIRC tools are also used extensively in design and hazard assessment, which are generally not time-constrained projects.

“By far these models are used, [maybe] 100X more often than for active storms, for design purposes. For example a model we developed was used by the Army Corps to design the hurricane protection system that is now around New Orleans. [It’s] also being used to design a major levee system (so-called Ike Dike) that might protect the Houston-Galveston area in the future. So it is very much a design tool and gets used extensively for that purpose.”

Secondly, the models are used to define what the hazards of storm surge are in coastal regions. “FEMA uses it for 100-year flood levels and where those are for insurance purposes,” he noted. Recently the Nuclear Regulatory Commission has been using it to define the threats to coastal nuclear power plants. All of that work goes on outside the context of an actual event.

“It’s very HPC intensive. We may end up having to run many, many hundreds or thousands of storms to get a full sweep of the design or the hazard situation that exists. But time is not nearly as big a constraint. If it takes a run one hour or five hours or ten hours to do as long as you can stack up the hundreds or thousands of runs you need and get them done over a reasonable time, a few months or a year or whatever your study length, it’s [acceptable].”

That said, Luettich and his colleagues are actively pushing to advance ADCIRC on at least three fronts. Luettich notes that the code, though old, is already very parallelizable and scales well on existing architectures, but not on newer ones. Moreover, rigid code parallelization isn’t always the best approach. He singled out the following three areas of active effort:

  • Parallelization. “In these modeling applications we need very high resolution in these areas where the storm is impacting but in other areas we can use very low resolution. Yet to automate the process in the parallelization, the leading parallelization paradigm middleware that is out there is very challenging. So we have an NSF-funded project that is looking into new parallelization strategies that will allow us to optimize our calculations and consequently be much more efficient and faster.”
  • Modern Hardware. The ADCIRC developers have started looking into manycore chips such as Intel’s newly released Knights Landing Phi. “It looks like it is going to take some code reengineering to optimize the code for use on that hardware but that is something that we are starting to think about at RENCI. In the last month or so, [we have] gotten [KNL-based system] that will give us at least the opportunity to test some of [the] software re-engineering we have to do to see how extensive it is and to what extent we can get performance increases.”
  • More Computers. “The third direction is looking for other partners and in fact our colleagues at RENCI have been extremely helpful. One of their fortes is the iRODS systems and ability to move data around between HPC centers, distributed HPC. We wouldn’t want to necessarily distribute a single run among centers at various locations but again thinking back to the ensemble approach if we can farm out X number of runs to different machines at different locations and compile the information back efficiently then that may help us considerably, and that may even include a cloud type application.”
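The "farm out X number of runs" idea in the third item can be sketched as a simple scheduling problem. The Python below is an illustration under stated assumptions, not the team’s method: the per-site capacities are invented, no iRODS or batch-scheduler API is used, and `assign_runs` is a hypothetical helper. Each ensemble run stays whole on one site; only the set of runs is spread across sites, with each new run going to the site that currently has the least queued work relative to its capacity.

```python
import heapq

def assign_runs(n_runs, slots_per_site):
    """Assign whole ensemble runs to sites, least-loaded site first.

    slots_per_site maps a site name to how many runs it can execute
    concurrently (hypothetical numbers). A single run is never split
    across sites.
    """
    # Heap entries: (estimated load in run-rounds, site name)
    heap = [(0.0, site) for site in sorted(slots_per_site)]
    heapq.heapify(heap)
    plan = {site: [] for site in slots_per_site}
    for run_id in range(n_runs):
        load, site = heapq.heappop(heap)
        plan[site].append(run_id)
        # One more run adds 1/capacity rounds of work to this site.
        heapq.heappush(heap, (load + 1.0 / slots_per_site[site], site))
    return plan

sites = {"RENCI": 2, "LSU": 1, "TACC": 3}  # assumed concurrent-run capacity
plan = assign_runs(12, sites)
for site, runs in plan.items():
    print(site, runs)
```

Collecting the results back efficiently, the part Luettich highlights, is the harder problem and is not modeled here.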

Interestingly, the ADCIRC code has not performed well on GPUs. “It is predominantly because of the way the algorithms are written; they are not terribly compatible with GPU acceleration,” said Luettich.

Without doubt, a certain amount of inertia exists in the code, says Luettich, and a massive rewrite to take advantage of the next generation of hardware may be necessary. Funding is always an issue for projects such as ADCIRC. Luettich noted, “Think about how much damage is going to result from this Hurricane Matthew. Imagine if you took one percent of that and invested it in computer resources, whether hardware or software, what advances we could make and what the returns in lessened damage in the future would be.”

Hatteras Supercomputer Profile (from RENCI web site)

Deployed in summer 2013 and expanded in early 2014, Hatteras is a 5168-core cluster running CentOS Linux. Hatteras is not fully MPI interconnected, and is instead segmented into several independent sub-clusters with varying architectures. Hatteras is capable of concurrently running nine 512-way ensemble members. Hatteras uses Dell’s densest blade enclosure to allow for maximum core-count within each chassis.

Hatteras’ sub-clusters have the following configurations:

  • Chassis 0-3 (512 interconnected cores per chassis)
    • 32 x Dell M420 quarter-height blade server
      • Two Intel Xeon E5-2450 CPUs (2.1GHz, 8-core)
      • 96GB 1600MHz RAM
      • 50GB SSD for local I/O
    • 40Gb/s Mellanox FDR-10 Interconnect
  • Chassis 4-7 (640 interconnected cores per chassis)
    • 32 x Dell M420 Quarter-Height Blade Server
      • Two Intel Xeon E5-2470v2 CPUs (2.4GHz, 10-core)
      • 96GB 1600MHz RAM
      • 50GB SSD for local I/O
    • 40Gb/s Mellanox FDR-10 Interconnect
  • Hadoop (560 interconnected cores)
    • 30 x Dell R720xd 2U Rack Server
      • Two Intel Xeon E5-2670 processors (16 cores total @ 2.6GHz)
      • 256GB RDIMM RAM @ 1600MHz
      • 36 Terabytes (12 x 3TB) of raw local disk dedicated to the node
      • 146GB RAID-1 volume dedicated for OS
      • 10Gb/s Dedicated Ethernet NAS Connectivity
    • 2 x Dell R820 2U Rack Server (LargeMem)
      • Four Intel Xeon E5-4640v2 processors (40 cores total @ 2.2GHz)
      • 1.5TB LRDIMM RAM @ 1600MHz
      • 9.6 Terabytes (8 x 1.2TB) of raw local disk dedicated to the node
      • 10Gb/s Dedicated Ethernet NAS Connectivity
    • 56Gb/s Mellanox FDR Infiniband Interconnect
    • 40Gb/s Mellanox Ethernet Interconnect
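As a quick consistency check, the core counts in the listing above can be added up. The short Python calculation below uses only numbers from the profile; the variable names are ours. It reproduces the stated 5168-core total, and it shows that the blade chassis together hold nine times 512 cores. That matches the "nine 512-way ensemble members" figure arithmetically, though the profile does not say how runs are actually placed on the chassis.

```python
# Core counts taken from the sub-cluster listing above.
chassis_0_3 = 4 * 32 * 2 * 8    # 4 chassis, 32 blades, two 8-core CPUs
chassis_4_7 = 4 * 32 * 2 * 10   # 4 chassis, 32 blades, two 10-core CPUs
hadoop_r720 = 30 * 16           # 30 servers, 16 cores each
hadoop_r820 = 2 * 40            # 2 large-memory servers, 40 cores each

blade_cores = chassis_0_3 + chassis_4_7
hadoop_cores = hadoop_r720 + hadoop_r820
total_cores = blade_cores + hadoop_cores

print("Blade chassis cores:", blade_cores)
print("Hadoop sub-cluster cores:", hadoop_cores)
print("Total cores:", total_cores)
print("512-core blocks in blade chassis:", blade_cores // 512)
```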

Related Links
ADCIRC Website
Coastal Resilience Center Website
Institute of Marine Sciences Website
