Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

August 19, 2014

ALCF Optimizes I/O with Innovative ‘Cache’

Tiffany Trader

The Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility, is on track to host the fastest GPFS file system in the world. The storage upgrade project is aimed primarily at reducing the time users spend managing the massive amounts of data generated by the facility’s supercomputers.

Ask a room of computational scientists about their day-to-day challenges and chances are that data management will rank near the top. Transferring files and moving or storing data can be time-consuming. Optimization efforts seek to reduce this “distraction” so users can spend more time on their core work.

“I/O is generally considered overhead because it’s time not spent doing computations,” said ALCF Director of Operations Bill Allcock, who is heading up the storage upgrade. “The goal is to have a system that moves the data as fast as possible, and as easily as possible so users can focus on the science.”

[Figure: ALCF storage diagram]

The first phase of the upgrade, already completed by the ALCF’s operations team, added a second system to complement the primary disk storage system, an IBM General Parallel File System (GPFS) that offers 20 petabytes (PB) of usable space and a maximum transfer speed of 240 gigabytes per second (GB/s). The second GPFS configuration provides an additional 7 PB of storage and 90 GB/s of transfer speed. Although there are two file systems, users access project data through what appears to be a single project root directory.

According to the ALCF team, the next phase of the storage upgrade is where the real innovation lies. The first step was to install 30 GPFS Storage Servers (GSS) between the compute system and the two storage systems. IBM is helping the operations crew to customize and test the system’s Active File Management (AFM) feature, which will enable it to be used like a cache.

The ALCF explains:

In essence, this GSS system will serve as an extremely large and extremely fast cache, offering 13 PB of space and 400 GB/s of transfer speed. The idea is that it will act as a buffer to prevent the compute system from slowing down due to defensive I/O (also known as checkpointing), analysis and visualization efforts, and delays caused by data being written to storage.
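To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The bandwidth and capacity values come from the article. The 100 TB checkpoint size is a hypothetical number chosen only for illustration, and the calculation assumes decimal units and sustained peak transfer rates, which real workloads rarely achieve.

```python
# Figures quoted in the article (decimal units: 1 PB = 1,000,000 GB).
CACHE_BANDWIDTH_GB_S = 400     # GSS cache layer
PRIMARY_BANDWIDTH_GB_S = 240   # primary GPFS file system
CACHE_CAPACITY_GB = 13 * 1_000_000

# Hypothetical checkpoint size (an assumption, not from the article).
CHECKPOINT_GB = 100_000  # 100 TB


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at a sustained peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# Time the compute system would spend writing one checkpoint.
cache_time = transfer_seconds(CHECKPOINT_GB, CACHE_BANDWIDTH_GB_S)      # 250 s
primary_time = transfer_seconds(CHECKPOINT_GB, PRIMARY_BANDWIDTH_GB_S)  # about 417 s

# Time to fill the whole cache at its peak rate.
fill_time = transfer_seconds(CACHE_CAPACITY_GB, CACHE_BANDWIDTH_GB_S)   # 32,500 s, about 9 hours

print(f"checkpoint to cache:   {cache_time:.0f} s")
print(f"checkpoint to primary: {primary_time:.0f} s")
print(f"time to fill cache:    {fill_time / 3600:.1f} h")
```

Under these assumptions, the same checkpoint would stall the compute system for roughly 40 percent less time when written to the cache, and even at full speed the cache would take about nine hours to fill.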

“We’re basically developing a storage system that looks like a processor,” Allcock said. “To the best of my knowledge, no other facility is doing anything like this yet.”

Projects will write to the cache, and then the AFM software will copy the data to the project storage systems. Files will be removed from the cache according to utilization and retention rules, but users will still be able to access those files seamlessly without having to know whether they are still on the cache or in storage.

“They will have the option to check where the data is located,” said Allcock, “but because the cache is so huge, odds are they will never need to stage the data back into the cache after it has been evicted.”
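The behavior described above (write to the cache, copy to project storage in the background, evict by rule, read transparently from either place) can be illustrated with a toy model in Python. This is a minimal sketch and not IBM’s AFM implementation or API: the `WriteBackCache` class and its method names are hypothetical, a plain dictionary stands in for the project file systems, and least-recently-used eviction is an assumed stand-in for the unspecified “utilization and retention rules.”

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache in front of a slower backing store (hypothetical)."""

    def __init__(self, capacity_bytes, backing_store):
        self.capacity = capacity_bytes
        self.backing = backing_store   # dict: path -> bytes (stands in for project storage)
        self.cached = OrderedDict()    # path -> bytes, least recently used first
        self.dirty = set()             # paths not yet copied to the backing store
        self.used = 0

    def write(self, path, data):
        """Writes land in the cache only; the copy to storage happens later."""
        if len(data) > self.capacity:
            raise ValueError("file larger than cache")
        if path in self.cached:        # overwriting: drop the old copy first
            self.used -= len(self.cached.pop(path))
            self.dirty.discard(path)
        self._make_room(len(data))
        self.cached[path] = data
        self.used += len(data)
        self.dirty.add(path)

    def flush(self):
        """Copy every dirty file to the backing store (the background copy step)."""
        for path in self.dirty:
            self.backing[path] = self.cached[path]
        self.dirty.clear()

    def read(self, path):
        """Serve from the cache if present, otherwise stage back from storage."""
        if path in self.cached:
            self.cached.move_to_end(path)   # mark as recently used
            return self.cached[path]
        data = self.backing[path]           # KeyError if the file exists nowhere
        if len(data) <= self.capacity:
            self._make_room(len(data))
            self.cached[path] = data
            self.used += len(data)
        return data

    def location(self, path):
        """Optional check of where a file currently lives."""
        if path in self.cached:
            return "cache"
        return "storage" if path in self.backing else "missing"

    def _make_room(self, needed):
        """Evict least recently used files until `needed` bytes fit."""
        while self.used + needed > self.capacity:
            victim = next(iter(self.cached))
            if victim in self.dirty:        # never evict data that was not copied out
                self.backing[victim] = self.cached[victim]
                self.dirty.discard(victim)
            self.used -= len(self.cached.pop(victim))


storage = {}
cache = WriteBackCache(capacity_bytes=10, backing_store=storage)
cache.write("a", b"12345")
cache.write("b", b"12345")
cache.flush()                    # both files now also exist in storage
cache.write("c", b"1234")        # cache is full, so "a" is evicted
where_a = cache.location("a")    # "storage"
data_a = cache.read("a")         # still readable; staged back, evicting "b"
```

In the usage lines at the bottom, the caller reads file "a" with the same call whether it sits in the cache or in storage, which mirrors the seamless access the article describes. With a cache as large as 13 PB, the stage-back path would be exercised far less often than in this ten-byte toy.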

The cache-like configuration is scheduled to come online this fall.

