Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

October 12, 2007

Panasas Invents ‘Tiered Parity’

by Michael Feldman

In 1988 Garth Gibson at the University of California, Berkeley, co-authored a paper titled “A Case for Redundant Arrays of Inexpensive Disks (RAID),” which outlined the basic principles of using arrays of small, cheap disks to increase data reliability and I/O performance. RAID went on to become a widely adopted storage technology throughout the industry, while Gibson co-founded Panasas Inc., a storage cluster vendor for high performance computing applications.

This week, Gibson and company claim that they have implemented the most significant extension to disk array data reliability since the original RAID paradigm was developed. Their new architecture is called “tiered parity.” In this model, Panasas has built “vertical parity” and “network parity” on top of their existing RAID 5 “horizontal parity” implementation.

The RAID 5 approach, as it was outlined in the original paper, consists of striping data and parity across multiple disks. It enables error recovery for single disk failures and increases performance via parallel reads and writes. This technology is widely used in storage systems today. Panasas’ own implementation of RAID 5, called “ObjectRAID,” is based on storage objects rather than blocks. The added intelligence is designed to reduce reconstruction times when a disk failure occurs.
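
To make the horizontal parity idea concrete, here is a minimal sketch of generic block-level RAID 5, not Panasas' object-based ObjectRAID. The function names and the rotating-parity layout are illustrative assumptions; the only thing taken from the scheme itself is that the parity unit is the XOR of the data units in a stripe, so any single missing unit can be recomputed from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_units, stripe_index):
    """Lay out one stripe across len(data_units) + 1 disks.

    The parity unit rotates to a different disk on each stripe
    (an illustrative layout, not a specific vendor's).
    """
    n_disks = len(data_units) + 1
    parity_disk = stripe_index % n_disks
    stripe = list(data_units)
    stripe.insert(parity_disk, xor_blocks(data_units))
    return stripe


def rebuild_unit(stripe, failed_disk):
    """Recompute the unit that lived on the failed disk from the survivors."""
    survivors = [unit for i, unit in enumerate(stripe) if i != failed_disk]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = write_stripe(data, stripe_index=1)   # parity lands on disk 1
recovered = rebuild_unit(stripe, failed_disk=2)
print(recovered == stripe[2])
```

Note that the rebuild has to read every surviving unit in the stripe, which is why a single unreadable sector on any surviving disk is enough to stall reconstruction.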

But no RAID 5 technology can handle a media error, also known as an unrecoverable read error (URE), if it occurs during reconstruction of a failed disk. When this occurs, the RAID data cannot be rebuilt from disk; a backup (usually on tape) has to be used to recover the entire array. Ten years ago, this wasn’t a serious problem. With 50 GB SATA disk drives, a media error was very unlikely to occur while reading a single disk, since the rate of failure is about one error every 10^14 bits (12.5 terabytes), a rate that has remained constant for over a decade. And when a media error did happen to occur during reconstruction, a 50 GB disk took only a few hours to recover from tape.

Times have changed. Disks have become much bigger and denser. Capacities of 500 to 750 GB are common today, and one terabyte disks will soon be the norm. That means when a disk goes south, the odds of hitting a media error during recovery are much greater, and recovery from tape can take days or weeks.

Imagine a RAID array of seven 1 TB disks. When one disk fails, the chances of hitting a URE while recovering the data from the six remaining disks are now about 50/50. When two-terabyte disks hit the market in 2009, the disk failure plus media error scenario becomes almost a sure bet. Recovering the storage array from backup tape could take a month. For high end computing applications that use tens or hundreds of terabytes of data, this would be a disaster.
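
As a rough check on these odds, the sketch below assumes read errors strike independently at the quoted rate of one per 10^14 bits. The function name and the independence assumption are illustrative and do not come from Panasas. Under that simple model a six-disk rebuild comes out somewhat below an even chance at 1 TB per disk and above it at 2 TB, so the round figures quoted here are best read as approximations of the same trend.

```python
import math

URE_RATE_PER_BIT = 1e-14  # one unrecoverable read error per 10^14 bits read


def p_ure_during_rebuild(surviving_disks, disk_tb, rate=URE_RATE_PER_BIT):
    """Probability of at least one URE while reading every surviving disk in full.

    Assumes independent bit errors (a simplification) and decimal terabytes.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - rate) ** bits_read, written in a numerically stable form
    return -math.expm1(bits_read * math.log1p(-rate))


print(p_ure_during_rebuild(1, 0.05))  # one 50 GB disk: well under 1 percent
print(p_ure_during_rebuild(6, 1.0))   # six surviving 1 TB disks: roughly 0.38
print(p_ure_during_rebuild(6, 2.0))   # six surviving 2 TB disks: roughly 0.62
```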

“I think what people are becoming aware of is that the data integrity provided by RAID 5 is basically no longer sufficient,” says Robin Harris, senior analyst at Data Mobility Group. “RAID 5 will only protect across a single disk failure, so it’s going away as a [standalone] data protection strategy.”

To address this problem, Panasas invented vertical parity. Essentially, they’ve added RAID within each disk, by generating a parity sector from the other sectors. The local parity sector can be used to recompute the missing data in case of a media error. According to Panasas, vertical parity gets the error rate down to between one in 10^18 and one in 10^19, which is 10,000 to 100,000 times better than the one-in-10^14 URE rate. The extra parity information uses 10 percent of the disk capacity, but Panasas claims there is no performance hit. So scalability is built in.
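
The article gives no details of the on-disk layout, so the following is only a sketch of the general idea: sectors on a single disk are grouped, and each group carries one XOR parity sector. The group size of nine data sectors per parity sector is an assumption chosen to match the quoted 10 percent overhead, and all names are hypothetical.

```python
from functools import reduce

SECTOR_SIZE = 512
DATA_SECTORS_PER_GROUP = 9  # assumption: 1 parity sector in every 10 = 10% of capacity


def xor_sectors(sectors):
    """Byte-wise XOR of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def add_vertical_parity(data_sectors):
    """Split one disk's data sectors into groups and append a parity sector to each."""
    groups = []
    for start in range(0, len(data_sectors), DATA_SECTORS_PER_GROUP):
        group = data_sectors[start:start + DATA_SECTORS_PER_GROUP]
        groups.append(group + [xor_sectors(group)])
    return groups


def recover_sector(group, bad_index):
    """Recompute one unreadable sector from the rest of its group, parity included."""
    survivors = [s for i, s in enumerate(group) if i != bad_index]
    return xor_sectors(survivors)


disk = [bytes([i]) * SECTOR_SIZE for i in range(18)]
groups = add_vertical_parity(disk)
print(recover_sector(groups[0], bad_index=4) == disk[4])
```

Because the repair uses only sectors from the same disk, a media error can be corrected locally without involving the other disks in the array.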

A word should be said here about RAID 6 technology (also known as double parity), which some vendors use for an additional level of data protection. This scheme was designed to guard against a double disk failure, which it does. Sort of. The problem is that RAID 6 doesn’t protect against subsequent media errors after the second disk goes down, which, as discussed above, is becoming increasingly likely. Here, it has the same problem as RAID 5. However, RAID 6 can be used to recover from the single disk failure plus media error scenario. But the performance hit for dual parity compared to single parity is significant. So it’s a mixed bag and doesn’t directly address the media error problem.

On top of its horizontal and vertical parity schemes, Panasas has added an additional layer of network parity protection. At this level, parity checking is done on the client side, to make sure the data delivered by the storage system wasn’t corrupted on its way to the user. Because of increasing I/O bandwidth and the number of hardware and software components between the external data and the application, there are increasing opportunities for good data to go bad. Firmware, server hardware, server software, network components and transmission media can all potentially mangle valid data unbeknownst to the application. With network parity, the client receives an error notification when bad data is detected.
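
How Panasas computes network parity is not described here, so this is a hedged sketch of one way a client-side check could work: the client is assumed to receive a stripe's data units together with its parity unit, recompute the XOR itself, and raise an error on a mismatch. `NetworkParityError` and `verify_stripe` are hypothetical names.

```python
from functools import reduce


class NetworkParityError(Exception):
    """Raised when data received by the client does not match its parity."""


def xor_units(units):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def verify_stripe(data_units, parity_unit):
    """Client-side check: recompute parity over the received data and compare."""
    if xor_units(data_units) != parity_unit:
        raise NetworkParityError("received data does not match parity")
    return b"".join(data_units)


sent = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_units(sent)
print(verify_stripe(sent, parity))           # intact data passes through

corrupted = [b"AAAA", b"BxBB", b"CCCC"]      # one byte mangled in transit
try:
    verify_stripe(corrupted, parity)
except NetworkParityError as err:
    print("error reported to client:", err)
```

A check like this detects corruption anywhere between the disks and the application, which is the point of doing it on the client rather than inside the storage system.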

The tiered parity technology will be included in the next version of Panasas’ ActiveScale operating environment, version 3.2. The beta will be out next month, with general availability expected by the end of the year. The additional parity levels can be turned off if the user believes they’re not needed for a particular environment. According to Panasas, the tiered parity technology doesn’t exact a performance hit on top of the existing RAID 5 implementation, but, as stated above, the vertical scheme does eat an additional 10 percent of the storage, on top of the 10 percent used by the RAID 5 implementation.

Although the overall concepts of the three-tiered architecture are fairly general, Panasas is attempting to protect its new invention. “We actually have a patent pending on this tiered parity concept, particularly the vertical parity,” says Larry Jones, VP of Marketing at Panasas. “Could someone copy it? Who knows? But we are trying to protect this specific idea.”
