Spider Up and Spinning Connections to All Computing Platforms at ORNL

By Agatha Bardoel

July 9, 2009

Spider, the world’s biggest Lustre-based, centerwide file system, has been fully tested to support Oak Ridge National Laboratory’s (ORNL’s) new petascale Cray XT4/XT5 Jaguar supercomputer and is now offering early access to scientists.

An extremely high-performance file system, Spider has 10.7 petabytes of disk space and can move data at more than 240 gigabytes a second. “It is the largest-scale Lustre file system in existence,” said Galen Shipman, Technology Integration Group leader at ORNL’s National Center for Computational Sciences (NCCS). “What makes Spider different [from large file systems at other centers] is that it is the only file system for all our major simulation platforms, both capable of providing peak performance and globally accessible.”

Ultimately, it will connect to all of ORNL’s existing and future supercomputing platforms as well as off-site platforms across the country via GridFTP (a protocol that transports large data files), making data files accessible from any site in the system.

Shipman said Spider has demonstrated stability on the XT5 and XT4 partitions of Jaguar, on Smoky (the center’s development cluster), and on Lens (the center’s visualization and data analysis cluster). “We’ve had all these systems running on the file system concurrently, with over 26,000 compute nodes (clients) mounting the file system and performing I/O [input and output]. It’s the largest demonstration of Lustre scalability in terms of client count ever achieved.”

Shipman said the file system is designed to support the latest incarnation of Jaguar, which is capable of 1.64 quadrillion calculations a second (1.64 petaflops). “When they told us they needed a file system to support it, we could not just pick up the phone and order one,” he said. “No vendor could deliver such a system, so we essentially trail-blazed.”

It was a phased approach. ORNL computer scientists and technicians (David Dillow, Jason Hill, Ross Miller, Sarp Oral, Feiyi Wang, and James Simmons) worked in close collaboration with partners Cray Inc., DataDirect Networks (DDN), Sun Microsystems, and Dell to bring Spider online. Cray provided the expertise to make the file system available on both Jaguar XT4 and Jaguar XT5. DDN provided 48 DDN 9900 storage arrays, Sun provided the Lustre parallel file system software, and Dell provided 192 I/O servers. The vendors’ collaboration has produced a system that manages 13,000 disks and provides over 240 GB/s of throughput, a file system cluster that rivals the computational capability of many high-performance compute clusters.
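As a rough illustration of how that aggregate figure breaks down, the short Python sketch below divides the quoted 240 GB/s evenly across the 48 storage arrays, 192 I/O servers, and 13,000 disks. It assumes perfectly even load and decimal units, neither of which the article states. Real Lustre I/O is rarely balanced this neatly, and because 240 GB/s is a lower bound, the per-component numbers are lower bounds too.

```python
# Back-of-envelope shares of Spider's aggregate bandwidth, using the figures
# quoted in the article. Assumes the load is spread perfectly evenly (an
# illustrative assumption; real I/O rarely is) and decimal units.
AGGREGATE_GB_PER_S = 240.0   # "more than 240 GB/s" (a lower bound)
STORAGE_ARRAYS = 48          # DDN 9900 storage arrays
IO_SERVERS = 192             # Dell I/O servers
DISKS = 13_000               # disks managed by the system


def share(total: float, parts: int) -> float:
    """Evenly divide an aggregate rate across identical components."""
    return total / parts


per_array = share(AGGREGATE_GB_PER_S, STORAGE_ARRAYS)   # GB/s per array
per_server = share(AGGREGATE_GB_PER_S, IO_SERVERS)      # GB/s per server
per_disk_mb = share(AGGREGATE_GB_PER_S, DISKS) * 1000   # MB/s per disk

print(f"per array:  {per_array:.2f} GB/s")
print(f"per server: {per_server:.2f} GB/s")
print(f"per disk:   {per_disk_mb:.1f} MB/s")
```

Run as-is, it prints 5.00 GB/s per array, 1.25 GB/s per server, and about 18.5 MB/s per disk.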

The Spider parallel file system is similar to the disk in a conventional laptop — multiplied 13,000 times. A file system cluster sits in front of the storage arrays to manage the system and project a parallel file system to the computing platforms. A large-scale InfiniBand-based system area network connects Spider to each NCCS system, making data on Spider instantly available to them all.

“As new systems are deployed at the NCCS, we just plug them into our system area network; it is really about a backplane of services,” Shipman said. “Once they are plugged into the backplane, they have access to Spider and to HPSS [the center’s high-performance storage system] for data archival.  Users can access this file system from anywhere in the center. It really decouples data access and storage from individual systems.”
 
Before Spider, each computing platform had its own file system. Once a project ran an application on Jaguar, it then had to move the data to the Lens visualization platform for analysis. Any problem encountered along the way meant repeating the whole cumbersome process. With Spider connected to both Jaguar and Lens, however, this headache is avoided. “You can think of it as eliminating islands of data. Instead of having to multiply file systems all within the NCCS, one for each of our simulation platforms, we have a single file system that is available anywhere. If you are using extremely large data sets on the order of 200 terabytes, it could save you hours and hours.”
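To put the 200-terabyte figure in perspective, here is a back-of-envelope Python estimate of how long a single copy of such a data set takes at a given sustained rate. Apart from Spider's quoted 240 GB/s peak, the rates are hypothetical placeholders, not NCCS measurements, and decimal units are assumed. With a shared file system the copy step disappears entirely, so its full duration is the time saved.

```python
# Rough transfer-time estimate for a 200 TB data set.
# Decimal units assumed: 1 TB = 1e12 bytes, 1 GB = 1e9 bytes.
DATASET_TB = 200.0


def transfer_hours(size_tb: float, rate_gb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate in GB/s."""
    seconds = (size_tb * 1e12) / (rate_gb_per_s * 1e9)
    return seconds / 3600.0


# Spider's quoted peak versus hypothetical copy rates between two separate
# file systems (illustrative assumptions, not measured figures).
rates = [
    ("Spider peak, 240 GB/s", 240.0),
    ("hypothetical copy, 10 GB/s", 10.0),
    ("hypothetical copy, 1 GB/s", 1.0),
]
for label, rate in rates:
    print(f"{label}: {transfer_hours(DATASET_TB, rate):.1f} h")
```

At the hypothetical 10 GB/s, the copy alone takes about 5.6 hours; even at Spider's full peak rate, it would take roughly 14 minutes.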

“Spider is one of the most important steps the NCCS has taken toward increasing the scientific productivity of our users,” said Bronson Messer, of the Scientific Computing Group and a participant in the “Three-Dimensional Model of SN1987A Frontier” early science project. “Sophisticated users have been asking for this, while new users I have spoken with immediately see the advantages and become very excited.”

Spider will have both scratch space (short-term storage for files involved in simulations, data analysis, etc.) and long-term storage for each user. Shipman said the technology integration team is now working with Sun to prepare for future NCCS platforms with even more daunting requirements.
