Cray Supers at Forefront of Earth Science Research

By Christopher Lazou

April 1, 2009

…and then there are the environmental consequences of our fossil fuel-based economy. Just about every scientist outside the White House (Bush administration) believes climate change is real, is serious, and is accelerated by the continued release of carbon dioxide. If the prospect of melting ice caps, rising sea levels, changing weather patterns, more frequent hurricanes, more violent tornadoes, endless dust storms, decaying forests, dying coral reefs, and increases in respiratory illness and insect-borne diseases – if all that does not constitute a serious threat, I don’t know what does….

Quote from p. 168 of Barack Obama, "The Audacity of Hope" (Three Rivers Press, USA, 2006).

Climate change is, of course, global and no respecter of national boundaries. The heatwave in Spain and the extreme flooding in the UK in 2008 are but two examples of extreme weather in Europe.

For HPC vendors the earth sciences segment provides a great business opportunity across the globe. Both improved predictability of severe weather events and climate change assessments for policymakers are high on national governments’ agendas. As the saying goes: “Every cloud has a silver lining.”

I caught up with Per Nyberg, director of marketing and business development for earth sciences at Cray Inc., following the 11th International Specialist Meeting on the Next Generation Models on Climate Change and Sustainability for Advanced High-Performance Computing Facilities held at Oak Ridge National Lab (ORNL).

ORNL is no stranger to being in the vanguard of HPC facilities. In the last few years ORNL has implemented one segment of the high-productivity petaflops initiative spearheaded by DARPA funding. Its choice was the Cray XT product line, and the latest upgrade, to a Cray XT5 system named "Jaguar," increased the system's computing power to a peak of 1.64 petaflops, making Jaguar the world's first petaflops system dedicated to open research.

Indeed, an ORNL research team recorded an unprecedented 1.35 petaflops sustained performance when running a superconductivity application used in nanotechnology and materials science research. The team’s simulation ran on over 150,000 of Jaguar’s 180,000-plus processing cores. The latest simulations on Jaguar were the first in which the team had enough computing power to move beyond ideal, perfectly-ordered materials to the imperfect materials that typify the real world.

The petaflops barrier was broken on a second application, with 1.05 petaflops of sustained performance. The new performance levels for this application, a first principles material science computer model used to perform studies involving the interactions between a large number of atoms, are expected to support advancements in magnetic storage.
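
As a rough back-of-the-envelope check (my arithmetic, not figures from the article), the numbers quoted above can be related in a few lines of Python; the per-core rate uses the 150,000-core figure for the superconductivity run:

peak_pflops = 1.64                 # Jaguar XT5 peak performance quoted above
superconductivity_pflops = 1.35    # sustained rate of the superconductivity run
materials_pflops = 1.05            # sustained rate of the materials science run
cores_used = 150_000               # cores used by the superconductivity run

print(f"Superconductivity run: {superconductivity_pflops / peak_pflops:.0%} of peak")  # ~82%
print(f"Materials run:         {materials_pflops / peak_pflops:.0%} of peak")          # ~64%
print(f"~{superconductivity_pflops * 1e6 / cores_used:.1f} GF/s sustained per core")   # ~9 GF/s

Sustaining roughly 80 percent of peak across 150,000 cores is what makes the result remarkable; most real applications sit far below that fraction.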

One swallow may not signify summer, but two swallows hint of things to come.

Christopher Lazou: Per, it’s good that you can spare some time to talk to me. Let’s briefly explore some of the work done at Cray’s current customer base, discuss what products Cray has to offer earth sciences and in the process try to gain insight into your views concerning the benefits of using Cray systems in this important field. Tell me about Cray’s history in this area and where this community fits into Cray’s future.

Per Nyberg: Cray has a rich history of long and successful relationships with the weather, climate and oceanographic communities, and is committed to providing the best possible solutions to this marketplace. It is a key business area for Cray and the demands of complex, computationally-intense earth system models factor heavily into our ongoing research and development programmes.

The needs of scientists studying the earth’s system are consistently cited in both defining and justifying the need for sustained petaflops computing. A significant percentage of Cray’s revenue is invested in research and development, and the needs of this community play a central role in defining our future products and technologies.

Lazou: Who are Cray’s significant customers in the earth sciences?

Nyberg: This is a key application area that spans nearly the entire spectrum of our customer base, ranging from those whose core business is earth system modelling to multi-disciplinary HPC centres.

In recent years Cray has sold large systems to national meteorological and hydrological services in countries such as Switzerland, Spain, India and South Korea. Our two most recent installations are Cray XT5 systems at the Danish Meteorological Institute (DMI) and the U.S. Naval Oceanographic Office.

As an example, MeteoSwiss uses a Cray XT4 located at the Swiss Centre for Scientific Computing (CSCS) for their operational requirements. With this capability, MeteoSwiss has been able to implement one of the highest resolution regional models in Europe, a key requirement for accurate forecasts in their challenging mountainous terrain.

Nearly every large scientific HPC site in government and academia uses some of their computing resources for earth system modelling. Examples of these include the HECToR system at Edinburgh Parallel Computing Centre (EPCC) in the UK, the Bergen Centre for Computational Science (BCCS) in Norway, the Centre for Scientific Computing (CSC) in Finland, the University of Tennessee / National Science Foundation, the National Energy Research Scientific Computing Center (NERSC) and Oak Ridge National Laboratory (ORNL).

The recent 11th International Specialist Meeting on the Next Generation Models on Climate Change and Sustainability for Advanced High-Performance Computing Facilities was very aptly held at ORNL. The Cray petaflops system at ORNL, “Jaguar,” is the only open science petaflops system in the world and the first such system available to the climate community. This is obviously a milestone in high performance computing in general, but also specifically for the climate community, which has been reiterating the importance of such systems for many years.

Lazou: What examples can you cite of ground-breaking science that is being done at these centres?

Nyberg: Two examples that come to mind are the Climate Science Computational End Station usage of the Cray XT systems at ORNL and NERSC, and the U.S. National Oceanic and Atmospheric Administration (NOAA) Hazardous Weather Testbed Spring 2008 experiment conducted on the Cray XT3 at the Pittsburgh Supercomputing Center (PSC).

In preparation for the fifth IPCC assessment, the U.S. Department of Energy, National Science Foundation, National Aeronautics and Space Administration and university researchers have partnered in a Climate Science Computational End Station Development and Grand Challenge Team. The aim is to achieve unprecedented simulations and coordinated model development on the next-generation climate model. With millions of hours of access to the Cray systems at ORNL and the 380-plus teraflops Cray XT4 at NERSC, IPCC researchers will be able to apply greater computational resources to climate problems than ever before. This is a groundbreaking capability.

This past spring the University of Oklahoma’s Center for Analysis and Prediction of Storms (CAPS) used the Cray XT3 system at PSC to incorporate real-time radar data into its high-resolution thunderstorm forecasting model for the first time. Observational data from more than 120 weather radars enabled the most realistic storm predictions to date. This was part of the annual NOAA Hazardous Weather Testbed Spring Experiment and was a key step toward predicting storms more accurately and with improved lead time.

We are now seeing real applications scaling to tens of thousands of cores at our largest customers, enabling simulations that were not previously possible. One example from ORNL is a 5 km semi-hemispheric run of the WRF (Weather Research and Forecasting) model on 150,000 cores sustaining over 50 teraflops. This level of scaling and sustained performance has never been seen before on such an application.
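
To put that figure in perspective, here is a minimal sketch of the arithmetic; the per-core peak is my assumption (a 2.3 GHz quad-core Opteron at 4 flops per cycle), not a figure from the interview:

sustained_tflops = 50.0      # sustained WRF rate quoted above
cores = 150_000              # cores used in the run

per_core_gflops = sustained_tflops * 1e3 / cores      # ~0.33 GF/s per core
assumed_peak_per_core_gflops = 9.2                    # ASSUMPTION: 2.3 GHz x 4 flops/cycle

print(f"~{per_core_gflops:.2f} GF/s sustained per core")
print(f"~{per_core_gflops / assumed_peak_per_core_gflops:.1%} of assumed per-core peak")  # ~3.6%

Single-digit percentages of peak are typical for memory-bandwidth-bound atmospheric models, which is why sustained scaling across the whole machine, rather than peak flops, is the headline number here.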

Lazou: You mentioned a recent Cray XT5 installation at DMI. Can you tell me a little about the system that has been installed?

Nyberg: We are very excited about the installation at DMI. In fact the DMI “system” is composed of two identical Cray XT5s. Like many numerical weather prediction centres today, the DMI design calls for two identical systems, one for operations and one for a dual research and failover role. The two systems are integrated as clients of a single shared Lustre global file system. This provides maximum flexibility and resiliency while maintaining the highest levels of performance.

Of course it is the performance of the HIRLAM weather model on the XT5 systems that is crucial, but the reality today is that system performance is just one dimension of the buying decision. The overall environment needs to meet the centre’s objectives for on-time delivery of meteorological products and cost-of-ownership criteria including electrical consumption, system utilization and management.

Lazou: Beyond the role of HPC supplier, how is Cray involved in this community?

Nyberg: Cray’s driving mission is to help customers solve their most challenging computational science problems. Cray has been and continues to be engaged in a number of activities that support advanced science by achieving greater performance with earth system models. Efforts range from working directly with application developers to involvement in community efforts and fostering the greater use of HPC in academia. These engagements are often carried out through our Centers of Excellence, such as those at HECToR and ORNL. An example of a more extensive partnership is the Earth System Research Centre (ESRC), which was jointly established by the Korean Meteorological Administration (KMA) and Cray to advance the science of earth-system modelling over the East-Asia Pacific region. The third round of ESRC-sponsored projects was recently announced.

Lazou: You mentioned the Cray “petascale” system installed at ORNL. The requirement for petaflops computing has been a stated objective by the climate community for some time, and there continues to be efforts worldwide to secure access to this capability. From a general perspective, can you comment on the challenges involved in petaflops computing?

Nyberg: Securing the highest possible performance capabilities has always been a key requirement in advancing the state of climate science. This was a clear message at the World Modelling Summit for Climate Prediction earlier this summer and in the recommendations of U.S. weather and climate leaders in August to greatly increase computing power available to the weather and climate community. It is also important to note that computing power was just one of the areas addressed by these recommendations.

From a computing perspective, the successful realization of a sustained petaflops will depend on the convergence of multiple disciplines and stakeholders. Let’s be realistic about the scale and complexity of these systems. There has always been a tendency to over-simplify to a single metric, peak flops being the most obvious. The reality, however, is that sustaining a petaflops will require all system aspects to be petascale. Application software, system software, system I/O, external peripherals, scheduling, RAS, management and so on are all on the critical path. Even when a system is used for a throughput-oriented workload, such as a many-member ensemble, the resulting I/O and scheduling challenges remain petascale.
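
The ensemble point can be made concrete with a small sketch; every number below is hypothetical and chosen only to illustrate the scale of the I/O problem, not taken from the interview:

members = 1_000              # hypothetical ensemble members running concurrently
output_gb_per_member = 2.0   # hypothetical output per member per write step, GB
output_interval_s = 900      # hypothetical write step every 15 minutes of wallclock

aggregate_gb = members * output_gb_per_member       # 2,000 GB per write step
bandwidth_gb_s = aggregate_gb / output_interval_s   # ~2.2 GB/s sustained

print(f"{aggregate_gb:,.0f} GB written per output step")
print(f"~{bandwidth_gb_s:.1f} GB/s of sustained file-system bandwidth")

Even with modest per-member output, the aggregate file-system bandwidth and the scheduling of a thousand concurrent jobs are system-wide, petascale problems, which is exactly the point being made above.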

Lazou: With global warming upon us, and energy security high on every national government’s agenda, how energy efficient are Cray systems?

Nyberg: The issues of energy costs and efficient power usage are foremost in the minds of HPC centres worldwide. Modern HPC demands call for computer systems to do more computing in less space, and Moore’s Law has kept processing power growing to meet that demand. However, this means individual compute racks are now pushing 40 kW or more, and there is a need to rapidly adapt cooling at both the facility and rack level.

Cray has been a leader in power and cooling technologies, including liquid cooling, since the Cray-1 in 1976. With the drive towards ever increasing system sizes, we are concerned with addressing all the requirements that will ultimately define their success and usability.

We recently announced a novel, non-invasive approach to heat removal that brings the refrigeration to the cabinet, transferring heat with a patented “flooded coil” cycle. This technology, termed ECOphlex (Phase-change Liquid Exchange), is designed to be “room air neutral,” meaning that the temperature of the air entering the system is roughly the same as the temperature of the air exiting the system. In a recent test at a government site, ECOphlex technology removed 100 percent of the heat.

ECOphlex uses efficient air flow to remove heat from the base components, and a phase-change refrigerant system to remove heat from the air prior to leaving the cabinet. The technology’s phase-change coil is more than 10 times as efficient at removing heat from the compute cabinets as a water coil of similar size. There is also the flexibility to use chilled or un-chilled water at various temperatures. This promotes energy savings by enabling greater system density, reducing the need for expensive air cooling and air conditioners, and limiting the need for chilled water.
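
As a rough back-of-the-envelope comparison (all property values and temperature rises are my assumptions, not Cray figures), the appeal of phase-change cooling for a 40 kW rack comes down to how much heat each kilogram of coolant can carry:

rack_heat_w = 40_000                   # rack heat load quoted earlier, watts

# Sensible heating of room air: ASSUMED 15 K allowable temperature rise
cp_air, dt_air = 1005.0, 15.0          # J/(kg*K), K
air_flow = rack_heat_w / (cp_air * dt_air)          # ~2.7 kg/s

# Sensible heating of water in a coil: ASSUMED 10 K rise
cp_water, dt_water = 4186.0, 10.0      # J/(kg*K), K
water_flow = rack_heat_w / (cp_water * dt_water)    # ~0.96 kg/s

# Phase change: ASSUMED latent heat of ~190 kJ/kg for an R-134a-class refrigerant
h_fg = 190_000.0                       # J/kg
refrigerant_flow = rack_heat_w / h_fg               # ~0.21 kg/s

print(f"Air:          ~{air_flow:.2f} kg/s")
print(f"Water coil:   ~{water_flow:.2f} kg/s")
print(f"Phase change: ~{refrigerant_flow:.2f} kg/s")

The latent heat absorbed as the refrigerant boils dwarfs the sensible heat carried per kilogram of air or water, which is the physical intuition behind the coil-efficiency claim above (the quoted "more than 10 times" figure also reflects heat-transfer behaviour, not just mass flow).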

Lazou: I think we explored a fair number of issues. Thank you, Per, for your time and frank answers. I am sure our readers will find your views very interesting.

—–

Note: For those interested in the development of meteorology in the last fifty years as seen through the eyes of a tireless worker from NCAR, and laced with a human touch, I recommend the excellent autobiographical book, “Odyssey in climate modelling, global warming and advising five presidents,” by Dr Warren M. Washington, edited by his wife Mary and published by Lulu (http://www.lulu.com).

The ISC’09, to be held in Hamburg, Germany, June 23-26, 2009, is organizing a special in-depth session on earth sciences on Tuesday, June 23, featuring four hours of detailed presentations and discussions. In addition, Hans Meuer is planning a great party in Hamburg with all the HPC vendors and practitioners in attendance, so try to get there — do not miss out.

Copyright (c) Christopher Lazou. April 2009. Brands and names are the property of their respective owners.
