After six months of tweaking that produced a 20 percent reduction in time-to-solution for weather forecasting, MeteoSwiss, the Federal Office of Meteorology and Climatology, today reported that its next-generation COSMO-1 forecasting system is now operational. COSMO-1 requires 20 times the computing power of COSMO-2 and runs on Piz Kesch, a hybrid CPU-GPU supercomputer operated by the Swiss National Supercomputing Centre (CSCS) and custom-built in collaboration with Cray and NVIDIA.
COSMO-1 was put into service last September (see “Today’s Outlook: GPU-accelerated Weather Forecasting,” HPCwire) and improves resolution from COSMO-2’s 2.2 km to 1.1 km. This is an important advance, particularly for Alpine topography, where high spatial resolution is required to accurately predict local weather events such as thunderstorms and thermally induced mountain and valley wind systems.
Each node on the MeteoSwiss system has eight GPU accelerators, and across the full system (48 CPUs and 192 K80s) the GPUs deliver 90 percent of the flops. The two tightly packed cabinets of the Cray CS-Storm supercomputer at CSCS together deliver 282.5 teraflops (LINPACK), sufficient to give the system TOP500 standing. Each cabinet consists of 12 hybrid computing nodes, for a total per cabinet of 96 NVIDIA Tesla K80 GPU accelerators and 24 Intel Haswell CPUs. CSCS says the hybrid system runs simulations twice as fast and three times more energy-efficiently than conventional CPUs.
Weather forecasting has always been computationally intensive. Generally speaking, a weather model samples the state of the atmosphere at a given time and uses the equations of fluid motion and thermodynamics to predict the state of the atmosphere at some time in the future. The model divides a forecast region into a grid, and the equations are solved within each grid cell, taking into account interactions with neighboring cells, to compute a prediction. The closer the grid points are to one another, the higher the overall model resolution, which leads to increased realism in the final forecast.
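To make the grid idea concrete, here is a minimal Python sketch of one time step on a small 2D grid, where each cell is updated from its four neighbors. It is a toy diffusion-style update, not the COSMO equations; the function names `step` and `make_grid`, the coefficient `alpha`, and the fixed-boundary treatment are all assumptions chosen for illustration.

```python
# Illustrative only: a toy 2D grid update, not the COSMO model equations.

def step(field, alpha=0.1):
    """Advance a 2D field by one time step using a 5-point neighbor stencil.

    Each interior cell moves toward the average of its four neighbors.
    Boundary cells are held fixed (an assumed, simplistic boundary condition).
    """
    rows, cols = len(field), len(field[0])
    new = [row[:] for row in field]  # copy so every update reads the old state
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = (field[i - 1][j] + field[i + 1][j]
                         + field[i][j - 1] + field[i][j + 1])
            new[i][j] = field[i][j] + alpha * (neighbors - 4.0 * field[i][j])
    return new


def make_grid(n, hot=100.0):
    """Build an n x n grid of zeros with a single warm cell in the center."""
    grid = [[0.0] * n for _ in range(n)]
    grid[n // 2][n // 2] = hot
    return grid


if __name__ == "__main__":
    state = make_grid(5)
    for _ in range(3):
        state = step(state)
    for row in state:
        print(" ".join(f"{value:6.2f}" for value in row))
```

Because every cell depends only on the previous state of its immediate neighbors, all cells can be updated independently within a time step, which is the kind of parallelism that suits GPUs. Halving the grid spacing, as in the move from 2.2 km to 1.1 km, quadruples the number of horizontal cells covering the same region.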
Optimizing the COSMO-1 code, even before going live in September, was a significant effort and important to achieving the initial performance gains. Weather forecasting applications are typically 10 to 20 years old or more and tend to be written in Fortran, according to Roy Kim, group product manager of accelerated computing at NVIDIA. A combination of CUDA and OpenACC was used to optimize and port the code for GPUs.
In labeling the new COSMO-1 system as now operational, “we have contractual constraints that we can actually deliver with a certain reliability and within a certain time frame,” said Oliver Fuhrer, a team leader at MeteoSwiss. The spec requires a single-day forecast to be completed in half an hour. When fired up in September, COSMO-1 was slightly above that limit. The tweaks included speeding up inter-GPU communication (including the MPI communication), adopting asynchronous communication, and increasing the parallelism in the code. The hardware, he reports, was very stable.
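The benefit of asynchronous communication can be sketched in a few lines of Python: start the exchange of boundary ("halo") values, do the work that does not depend on them while the exchange is in flight, and wait only when the boundary cells are needed. This is a generic illustration using a thread and a sleep to mimic latency, not MeteoSwiss's MPI or CUDA code; `exchange_halo`, `smooth_interior`, and `step_overlapped` are hypothetical names, and the single-rank periodic domain is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def exchange_halo(left_edge, right_edge, delay=0.05):
    """Stand-in for a neighbor exchange (hypothetical; sleeps to mimic latency).

    With a single rank and a periodic domain, the left halo is our own
    right edge and the right halo is our own left edge.
    """
    time.sleep(delay)
    return right_edge, left_edge


def smooth_interior(cells):
    """Three-point average for the cells that need no halo data."""
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(1, len(cells) - 1)]


def step_overlapped(cells):
    """One smoothing step that overlaps the halo exchange with interior work."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_halo, cells[0], cells[-1])  # start exchange
        interior = smooth_interior(cells)         # compute while it is in flight
        left_halo, right_halo = pending.result()  # block only when halos are needed
    first = (left_halo + cells[0] + cells[1]) / 3.0
    last = (cells[-2] + cells[-1] + right_halo) / 3.0
    return [first] + interior + [last]


if __name__ == "__main__":
    print(step_overlapped([3.0, 0.0, 0.0, 0.0]))
```

In a real MPI code, the same pattern is usually expressed with nonblocking sends and receives followed by a wait, so that communication time is hidden behind computation rather than added to it.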
“This is the first time we’ve run on such a hybrid system. We didn’t have a lot of experience with how the system behaves, [for example] if you have failures, what the procedures to cope with those are. These nodes are fat and when one node goes down you basically lose a lot of the computational power in your system,” said Fuhrer. “We updated the software environment, updated the libraries, and updated the SLURM scheduler. On the hardware side things stayed pretty constant.”
As a rule, weather forecasting today uses complex programs, so-called numerical models, which simulate developments in the atmosphere based on numerical formulae. MeteoSwiss uses the COSMO model, which has been developed in collaboration with the international Consortium for Small-scale Modeling (COSMO).
Over the last five years, the complex software has been steadily upgraded in preparation for the switchover to a GPU-based computer system. In this effort, MeteoSwiss collaborated closely with ETH Zurich researchers, C2SM and CSCS, under the umbrella of the HP2C (High Performance and High Productivity Computing) and PASC (Platform for Advanced Scientific Computing) initiatives.
Now in full production mode, COSMO-1 will be run every three hours, producing forecasts of up to 33 hours into the future. For warnings concerning the following day, simulations will be run once a day looking 45 hours ahead, according to MeteoSwiss. Operation of COSMO-2 is scheduled to cease in autumn 2016.
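The run cadence is easy to work out by hand, but a short Python sketch shows what it implies: eight runs a day, each covering a 33-hour window. The three-hour interval and 33-hour lead time come from MeteoSwiss's description; the midnight start hour and the function name `daily_runs` are assumptions, since no actual run times are given.

```python
from datetime import datetime, timedelta

RUN_INTERVAL_HOURS = 3   # from the article: one run every three hours
LEAD_TIME_HOURS = 33     # from the article: forecasts up to 33 hours ahead


def daily_runs(day, first_hour=0):
    """List (start, end_of_forecast) pairs for one day of runs.

    first_hour=0 is an assumption; the article does not give start times.
    """
    start = datetime(day.year, day.month, day.day, first_hour)
    runs = []
    for k in range(24 // RUN_INTERVAL_HOURS):
        init = start + timedelta(hours=k * RUN_INTERVAL_HOURS)
        runs.append((init, init + timedelta(hours=LEAD_TIME_HOURS)))
    return runs


if __name__ == "__main__":
    for init, end in daily_runs(datetime(2016, 4, 1)):
        print(f"{init:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Under these assumptions, consecutive forecast windows overlap by 30 hours, so any given hour in the near future is covered by several successive runs.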