The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
October 12, 2007
In 1988 Garth Gibson at the University of California, Berkeley, co-authored a paper titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)," which outlined the basic principles of using arrays of inexpensive disks to increase data reliability and I/O performance. RAID went on to become a widely adopted storage technology throughout the industry, while Gibson co-founded Panasas Inc., a storage cluster vendor for high performance computing applications.
This week, Gibson and company claim that they have implemented the most significant extension to disk array data reliability since the original RAID paradigm was developed. Their new architecture is called "tiered parity." In this model, Panasas has built "vertical parity" and "network parity" on top of their existing RAID 5 "horizontal parity" implementation.
The RAID 5 approach, as it was outlined in the original paper, consists of striping data and parity across multiple disks. It enables error recovery for single disk failures and increases performance via parallel reads and writes. This technology is widely used in storage systems today. Panasas' own implementation of RAID 5, called "ObjectRAID," is based on storage objects rather than blocks. The added intelligence is designed to reduce reconstruction times when a disk failure occurs.
But no RAID 5 technology can handle a media error, also known as an unrecoverable read error (URE), if it occurs during reconstruction of a failed disk. When this occurs, the RAID data cannot be rebuilt from disk; a backup (usually on tape) has to be used to recover the entire array. Ten years ago, this wasn't a serious problem. With 50 GB SATA disk drives, a media error was very unlikely to occur while reading a single disk, since the rate of failure is about one error every 10^14 bits (12.5 terabytes), a rate that has remained constant for over a decade. And when a media error did happen to occur during reconstruction, a 50 GB disk took only a few hours to recover from tape.
Times have changed. Disks have become much bigger and denser. Capacities of 500 to 750 GB are common today, and one terabyte disks will soon be the norm. That means when a disk goes south, the odds of hitting a media error during recovery are much greater, and recovery from tape can take days or weeks.
Imagine a RAID array of seven 1 TB disks. When one disk fails, the chance of hitting a URE while recovering the data from the six remaining disks is now about 50/50. When two terabyte disks hit the market in 2009, the disk failure plus media error scenario becomes almost a sure bet. Recovering the storage array from backup tape could take a month. For high end computing applications that use tens or hundreds of terabytes of data, this would be a disaster.
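The odds in this scenario can be checked with a back-of-the-envelope calculation. The sketch below is only a simplified model: it assumes every bit read fails independently at the quoted rate of one error per 10^14 bits, and that a terabyte means 10^12 bytes. The function name `ure_probability` is made up for illustration.

```python
import math

URE_RATE = 1e-14  # quoted rate: one unrecoverable read error per 10^14 bits


def ure_probability(bytes_read: float, rate: float = URE_RATE) -> float:
    """Probability of at least one URE while reading `bytes_read` bytes.

    Assumes independent bit errors: P = 1 - (1 - rate)^bits.
    """
    bits = bytes_read * 8
    # log1p/expm1 keep precision when the per-bit rate is tiny.
    return -math.expm1(bits * math.log1p(-rate))


GB, TB = 1e9, 1e12
for label, size in [("one 50 GB disk", 50 * GB),
                    ("six 1 TB disks", 6 * TB),
                    ("six 2 TB disks", 12 * TB)]:
    print(f"{label}: {ure_probability(size):.1%}")
```

Under these assumptions the model gives roughly 0.4 percent for a single 50 GB disk, 38 percent for six 1 TB disks and 62 percent for six 2 TB disks. That is somewhat lower than the rough odds quoted here, which presumably rest on different assumptions, but the trend is the same: the risk was negligible a decade ago and is substantial at terabyte capacities.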
"I think what people are becoming aware of is that the data integrity provided by RAID 5 is basically no longer sufficient," says Robin Harris, senior analyst at Data Mobility Group. "RAID 5 will only protect across a single disk failure, so it's going away as a [standalone] data protection strategy."
To address this problem, Panasas invented vertical parity. Essentially, they've added RAID within each disk, by generating a parity sector from the other sectors. The local parity sector can be used to recompute the missing data in case of a media error. According to Panasas, vertical parity gets the error rate down to between one in 10^18 and one in 10^19, which is 1000 to 10,000 times better than the URE rate. The extra parity information uses 10 percent of the disk capacity, but Panasas claims there is no performance hit. So scalability is built in.
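The idea of a per-disk parity sector can be illustrated with plain XOR parity. This is a minimal sketch, not Panasas' implementation: the 512-byte sector size, the grouping of nine data sectors per parity sector (chosen so parity takes one tenth of each group) and the function names are all assumptions made for the example.

```python
from functools import reduce

SECTOR_SIZE = 512  # bytes; an assumed sector size for illustration


def xor_sectors(sectors):
    """Byte-wise XOR of equally sized sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def rebuild_sector(group, parity, lost_index):
    """Recompute the sector at `lost_index` from the survivors and parity."""
    survivors = [s for i, s in enumerate(group) if i != lost_index]
    return xor_sectors(survivors + [parity])


# Nine data sectors protected by one parity sector (an assumed layout).
group = [bytes([i]) * SECTOR_SIZE for i in range(1, 10)]
parity = xor_sectors(group)

# Pretend sector 4 returned a media error and rebuild it locally.
recovered = rebuild_sector(group, parity, lost_index=4)
print(recovered == group[4])
```

Because XOR is its own inverse, combining the surviving sectors with the parity sector yields the missing one, so a single unreadable sector can be repaired without touching the other disks in the array.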
A word here should be said about RAID 6 technology (also known as double parity), which some vendors use for an additional level of data protection. This scheme was designed to guard against a double disk failure, which it does. Sort of. The problem is that RAID 6 doesn't protect against subsequent media errors after the second disk goes down, which, as discussed above, is becoming increasingly likely. Here, it has the same problem as RAID 5. However, RAID 6 can be used to recover from the single disk failure plus media error scenario. But the performance hit for dual parity compared to single parity is significant. So it's a mixed bag and doesn't directly address the media error problem.
On top of its horizontal and vertical parity schemes, Panasas has added a further layer of network parity protection. At this level, parity checking is done on the client side, to make sure the data delivered by the storage system wasn't corrupted on its way to the user. Because of increasing I/O bandwidth and the number of hardware and software components between the external data and the application, there are increasing opportunities for good data to go bad. Firmware, server hardware, server software, network components and transmission media can all potentially mangle valid data unbeknownst to the application. With network parity, the client receives an error notification when bad data is detected.
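A client-side check of this kind can be sketched as follows. The article does not describe Panasas' actual protocol, so everything here is an assumption for illustration: the client is assumed to receive the data blocks of a stripe together with their XOR parity block, and `verify_stripe` and `NetworkParityError` are hypothetical names.

```python
from functools import reduce


class NetworkParityError(Exception):
    """Raised when data received by the client fails its parity check."""


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def verify_stripe(data_blocks, parity_block):
    """Return the joined payload if parity matches; otherwise raise."""
    if xor_blocks(data_blocks) != parity_block:
        raise NetworkParityError("stripe failed client-side parity check")
    return b"".join(data_blocks)


# What the storage system sent.
sent = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(sent)

# One block is corrupted somewhere between the disk and the application.
received = [b"AAAA", b"BxBB", b"CCCC"]

try:
    verify_stripe(received, parity)
except NetworkParityError as err:
    print("error:", err)
```

The point of checking at the client is that the check covers the whole path, so corruption introduced by any intermediate component surfaces as an error rather than as silently wrong data.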
The tiered parity technology will be included in the next version of Panasas' ActiveScale operating environment, version 3.2. A beta is due out next month, with general availability expected by the end of the year. The additional parity levels can be turned off if the user believes they're not needed for a particular environment. According to Panasas, the tiered parity technology doesn't exact a performance hit on top of the existing RAID 5 implementation, but, as stated above, the vertical scheme does eat an additional 10 percent of the storage -- that's in addition to the 10 percent used by the RAID 5 implementation.
Although the overall concepts of the three-tiered architecture are fairly general, Panasas is attempting to protect its new invention. "We actually have a patent pending on this tiered parity concept, particularly the vertical parity," says Larry Jones, VP of Marketing at Panasas. "Could someone copy it? Who knows? But we are trying to protect this specific idea."