Reposted from: http://www.lascon.co.uk/d008005.htm
These descriptions are based on the original RAID definitions from the Berkeley paper by Patterson, Gibson and Katz. RAID originally stood for Redundant Array of Inexpensive Disks, but the disk vendors did not like that, as it had cost implications. They changed it to mean Redundant Array of Independent Disks.
Now this page has turned out to be a lot more popular than I ever thought it would, and needs a bit more explanation, as a lot of people are coming in from the home PC angle. I'm from a big systems background: IBM mainframes, big Unix servers, Windows and Netware clusters, that sort of stuff, and that biases my opinions on RAID. If you want to put RAID onto your home PC, then in my opinion, RAID1 is the best way to go. It's simple, it works and it only needs two disks. It will even perform if it is a software implementation.
If you run big storage systems with gigabytes of cache and hundreds of physical disks, then I would definitely go for RAID5. Why? It is cheaper because it uses fewer disks for a given capacity, and it performs just as well as RAID1. If you have eighty 500GB disks, you can only store 20 Terabytes of data on them with RAID1, but you will get 35 TB on them in a 7+1 RAID5 implementation. That's why I claim that RAID5 is cheaper than RAID1. It is for big systems, but not for small systems, say less than a couple of terabytes.
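The capacity figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `usable_tb` and its parameters are made up for illustration, and it assumes 1 TB = 1000 GB and that the disks divide evenly into groups.

```python
def usable_tb(disks, disk_gb, data_per_group, group_size):
    """Usable capacity in TB for disks split into equal RAID groups.

    data_per_group is the number of disks in each group that hold data;
    the rest of the group holds mirror copies or parity.
    """
    groups = disks // group_size
    return groups * data_per_group * disk_gb / 1000


# Eighty 500GB disks, as in the example above.
raid1 = usable_tb(80, 500, data_per_group=1, group_size=2)  # mirrored pairs
raid5 = usable_tb(80, 500, data_per_group=7, group_size=8)  # 7+1 groups

print(raid1, raid5)  # 20.0 35.0
```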
I had an animated discussion (which is one way of describing it) with a DBA last year who insisted that Oracle databases had to have RAID1 or they would not perform. We bought a DMX and ran some tests with the same database on RAID1 and RAID5, and the RAID5 setup actually performed better, I suspect, because it was pulling the data off more spindles.
However, I would never touch a software implementation of RAID5 as the write penalty will kill performance.
So there you go: RAID1 for PCs and small systems, RAID5 for big systems, but at the end of the day it is your money.
RAID can be implemented by software in the host, but this is not usually successful. It is best implemented by microcode in the storage subsystem controller. The various types of RAID are explained below. In the diagrams, the square box represents the controller and the cache.
Parity is a means of adding extra data, so that if one of the bits of data is lost, it can be recreated from the parity. For example, suppose a binary halfword consists of the bits 1011. The total number of '1's in the halfword is odd, so we make the parity bit a 1. The halfword then becomes 10111. Suppose the third bit is lost; the halfword is then 10?11. We know from the last bit that there should be an odd number of '1's in the data bits, but the number of recognisable '1's in the data bits is even, so the missing bit must be a '1'. This is a very simplistic explanation. In practice, disk parity is calculated on blocks of data using XOR hardware functions. The advantage of parity is that it is possible to recover data from errors. The disadvantage is that more storage space is required.
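The 1011 example can be played through in Python, along with the same idea applied to whole blocks with XOR. This is a minimal sketch: the function names and the block contents are invented for illustration, and real controllers do this in hardware.

```python
from functools import reduce


def parity_bit(bits):
    """Return 1 if the number of 1s is odd, else 0 (the XOR of all bits)."""
    return reduce(lambda a, b: a ^ b, bits, 0)


def recover_bit(bits_with_gap, parity):
    """Work out the single missing bit (marked None) from the parity bit."""
    known = [b for b in bits_with_gap if b is not None]
    return parity_bit(known) ^ parity


data = [1, 0, 1, 1]
p = parity_bit(data)                       # three 1s, so the parity bit is 1
missing = recover_bit([1, 0, None, 1], p)  # the lost third bit


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Three data disks and one parity disk, one small block each.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If the disk holding d1 is lost, XOR the survivors with the parity block.
rebuilt = xor_blocks(d0, d2, parity)
```

Here `missing` comes out as 1 and `rebuilt` is identical to `d1`, which is the same recovery the text describes, just done on bytes instead of single bits.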
In the gif above, the right hand disk is dedicated parity, the other three disks are data disks.
The gif below illustrates the RAID5 write overhead. If a block of data on a RAID5 disk is updated, then all the unchanged data blocks from the RAID stripe have to be read back from the disks, then new parity calculated before the new data block and new parity block can be written out. This means that a RAID5 write operation requires 4 IOs. The performance impact is usually masked by a large subsystem cache.
As Nat Makarevitch pointed out, more efficient RAID-5 implementations hang on to the original data and use that to generate the parity according to the formula new_parity = old_data XOR new_data XOR old_parity. If the old data block is retained in cache, and it often is, then this just requires one extra IO to fetch the old parity. In the worst case it has to read two extra blocks, not four.
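To see that the two ways of calculating the new parity agree, here is a small Python sketch on a 3+1 stripe. The block contents and variable names are made up for illustration, and it only models the parity calculation, not the IO or the cache.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# A 3+1 stripe: three data blocks plus their parity block.
stripe = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
old_parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# The first data block is about to be overwritten with this.
new_block = b"\xaa\xbb"

# Method 1: read back the unchanged blocks and recalculate from scratch.
parity_full = xor(xor(new_block, stripe[1]), stripe[2])

# Method 2: new_parity = old_data XOR new_data XOR old_parity.
parity_rmw = xor(xor(stripe[0], new_block), old_parity)

print(parity_full == parity_rmw)  # True
```

The second method never touches the unchanged data blocks, which is why it scales better as the number of disks in the stripe goes up.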
RAID5 often gets a bad press, due to potential data loss on hardware errors and poor performance on random writes. Some database manufacturers will positively tell you to avoid RAID5. The truth is, it depends on the implementation. Avoid software implemented RAID5, it will not perform. RAID5 on smaller subsystems will not perform unless the subsystem has a large amount of cache. However, RAID5 is fine on enterprise class subsystems like the EMC DMX, the HDS USP or the IBM DDS devices. They all have large, gigabyte-sized caches and force all write IOs to be written to cache, thus guaranteeing performance and data integrity.
Most manufacturers will let you have some control over the RAID5 configuration now. You can select your block stripe size and the number of volumes in an array group.
A smaller stripe size is more efficient for a heavy random write workload, while a larger stripe size works better for sequential writes. A smaller number of disks in an array will perform better, but has a bigger parity overhead. Typical configurations are 3+1 (25% parity) and 7+1 (12.5% parity).
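The parity percentages quoted above are just the parity disks as a fraction of the whole array. A quick sketch, with a made-up function name:

```python
def parity_overhead(data_disks, parity_disks=1):
    """Fraction of the array's raw capacity that is used for parity."""
    return parity_disks / (data_disks + parity_disks)


print(parity_overhead(3))  # 3+1 -> 0.25
print(parity_overhead(7))  # 7+1 -> 0.125
```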
The problem with RAID6 is that there is no standard method of implementation; every manufacturer has their own method. In fact there are two distinct architectures, RAID6 P+Q and RAID6 DP.
DP, or Double Parity RAID, uses a mathematical method to generate two independent parity bits for each block of data, and several mathematical methods are used. P+Q generates a horizontal P parity block, then combines those disks into a second, vertical RAID stripe and generates a Q parity, hence P+Q. One way to visualise this is to picture three standard four-disk RAID5 arrays, then take a fourth array and stripe again to construct a second set of RAID arrays, each consisting of one disk from each of the first three arrays plus a fourth disk from the fourth array. The consequence is that those sixteen disks will only contain nine disks' worth of data.
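The sixteen-disk picture can be sketched in Python as a 4x4 grid, with plain XOR parity across each row and down each column. This follows only the visualisation described above and is not any manufacturer's actual method; the function names are invented, and each disk is reduced to a single integer "block" to keep it short.

```python
from functools import reduce
from operator import xor

N = 3  # data disks per row and per column; the grid is (N+1) x (N+1)


def build_grid(data):
    """Add a P parity disk to each row, then a Q parity row at the bottom."""
    grid = [row[:] + [reduce(xor, row)] for row in data]
    grid.append([reduce(xor, col) for col in zip(*grid)])
    return grid


def rebuild(grid):
    """Keep fixing any failed disk (None) that is alone in its row or column."""
    progress = True
    while progress:
        progress = False
        for r in range(N + 1):
            for c in range(N + 1):
                if grid[r][c] is not None:
                    continue
                row = [grid[r][k] for k in range(N + 1) if k != c]
                col = [grid[k][c] for k in range(N + 1) if k != r]
                for line in (row, col):
                    if None not in line:
                        grid[r][c] = reduce(xor, line)
                        progress = True
                        break
    return grid


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # nine disks' worth of data
grid = build_grid(data)
original = [row[:] for row in grid]

# Two disks fail in the same horizontal array. Row parity alone cannot
# cope with that, but each failed disk is still alone in its column.
grid[0][0] = None
grid[0][1] = None
rebuild(grid)

print(grid == original)  # True
```

The grid holds sixteen disks but only the nine in `data` carry user data, which is the capacity cost mentioned above.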
P+Q architectures tend to perform better than DP architectures and are more flexible in the number of disks that can be in each RAID array. DP architectures usually insist that the number of disks is prime, something like 4+1, 6+1 or 10+1. This can be a problem as the physical disks usually come in units of eight, and so do not easily fit a prime number scheme.