NAS RAID type

To set up a RAID consisting of 6+ disks there are some choices to be made.

RAID Controller

The question is which type of RAID controller to use. There are software implementations and hardware controllers.

Mainboard integrated RAID Controller

With RAID controller chips now included on many mainboards, this option stands out as the obvious one. Its charm is that no additional hardware is required, which makes it cheap.
The downsides of this option outweigh that one advantage:

  • no option to migrate the RAID array to an upgraded system
  • only short-term driver support
  • if the mainboard fails, the attached RAID array is most likely lost as well
  • no advanced features like online array expansion or online RAID level change

Separate Hardware RAID Controller

There are many RAID controllers available, but on closer inspection most of them offload the work to the CPU. Only a few controllers actually perform the XORing on their own. With the dedicated DSP of current controller cards, RAID performance is superior to a software solution. After a bluescreen shutdown, verification of the RAID status takes only minutes for a 1TB RAID1. Depending on the manufacturer, advanced features are also available.

  • array migration strategy from controller to controller
  • no OS-to-OS migration strategy is necessary, as the array is attached through the controller; only a current driver is needed to move to an updated OS
  • long-term driver support
  • replacing a defective controller does not affect the array
  • online RAID level change (e.g. mirror --> RAID5)
  • online storage pool expansion
  • assignment of hot swap drives

The hardware I've had good experience with so far is the range of RAID controllers made by 3Ware. Currently I'm using the 9650SE-2LP and the 9650SE-8LPML controllers.

Software Controller

The software controller assigns the RAID XORing to the CPU. Depending on the operating system used, software controllers have different advantages and disadvantages.

Feature                                              Linux        Opensolaris   Windows
migration controller - controller                    yes          yes           no
migration OS - OS (of same type)                     yes          yes           uncertain
longterm support                                     yes          yes           yes
RAID levels                                          0-1-5-6-10   0-1-5-6       0-1-5
online RAID level change                             no           no            no
online storage pool expansion                        no (*)       no (*)        no
assignment of hot swap drives                        no           yes           no
RAID verify necessary after bluescreen / power out   yes          no            yes

- Controller - controller migration means attaching the harddrives to a different controller, e.g. on a new mainboard.
- OS - OS migration means a change of OS within the same type, e.g. Opensuse --> Ubuntu or Opensolaris 0609 --> Opensolaris Build 125.
- (*) The storage pool can be increased within Linux and Opensolaris by adding new harddrives, but the existing RAID pool can't be expanded. E.g. RAID5 3hdd can't be modified to RAID5 5hdd. You can however have RAID5 3hdd + RAID1 2hdd.

There are some quite awesome software RAID implementations out there. However, some of the advanced features offered by hardware RAID are not available. After the 4th bluescreen-triggered re-mirroring of my 1TB RAID1, I decided that the software implementation in Windows is not what I was looking for, as every re-verify takes between 12 and 24h for 1TB.
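
The XORing that a software controller leaves to the CPU can be illustrated with a minimal sketch for single parity (RAID5 style). The block contents and the helper name xor_blocks are made up for illustration; real implementations work on much larger stripes and rotate the parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks, one per data disk.
data = [b"disk", b"one!", b"two?"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# If one data disk is lost, XORing the survivors with the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine serves for writing the parity and for rebuilding a lost block, which is the work a software RAID puts on the CPU.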

RAID Controller Conclusion

Based on the experience gathered, the answer is not simple. For a RAID1 setup in Windows my choice is a dedicated hardware RAID controller. However, for the NAS I'm using a software controller with the Opensolaris OS.

RAID parity

There are several RAID parity levels available. Also listed below are some statistical sensitivities that I used to choose the parity type for the NAS.

RAID level overview

Linux    Opensolaris   type       available space       redundancy                  single disk readable
RAID 0   same          striping   100%                  none                        no
RAID 1   same          mirror     50%                   half may fail               yes
RAID 5   RAIDZ1        -          (n-1)/n (83% for 6)   1 may fail                  no
RAID 6   RAIDZ2        -          (n-2)/n (67% for 6)   2 may fail                  no
-        RAIDZ3        -          (n-3)/n (50% for 6)   only the paranoid survive   no
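
The available-space column follows directly from the number of parity disks. A minimal sketch, assuming all disks are the same size (the function name usable_fraction is only for illustration):

```python
def usable_fraction(disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity left for data: (n - p) / n."""
    if not 0 <= parity_disks < disks:
        raise ValueError("parity_disks must be between 0 and disks - 1")
    return (disks - parity_disks) / disks


# The three parity levels from the table, for an array of 6 disks.
for name, parity in [("RAID 5 / RAIDZ1", 1), ("RAID 6 / RAIDZ2", 2), ("RAIDZ3", 3)]:
    print(f"{name}: {usable_fraction(6, parity):.0%}")
```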

Harddrive reliability overview

This sensitivity contains Annual Failure Rates (AFR) provided by several studies.
Note: the failure probability for this NAS is lower because of its low POH (power on hours).
The Google study covers disks running 24/7, i.e. a POH of 8760hrs.
The Seagate study gives a correction factor of approx. 0,7 for 1100 POH, which is the usage predicted for this NAS.

Probability sensitivity
                      year 1   year 2   year 3
Seagate AFR           0,012    0,020    0,024
Google AFR            0,020    0,080    0,085
this NAS AFR (est.)   0,020    0,060    0,065

0 - Disk failures
                      year 1   year 2   year 3
Seagate               0,930    0,886    0,864
Google                0,886    0,606    0,587
this NAS              0,886    0,690    0,668

1 - Disk failure
                      year 1   year 2   year 3
Seagate               0,056    0,090    0,106
Google                0,090    0,264    0,273
this NAS              0,090    0,220    0,232

2 - Disk failures
                      year 1   year 2   year 3
Seagate               0,002    0,006    0,008
Google                0,006    0,069    0,076
this NAS              0,006    0,042    0,048

3 - Disk failures
                      year 1   year 2   year 3
Seagate               0,000    0,000    0,000
Google                0,000    0,008    0,009
this NAS              0,000    0,004    0,004

4 or more - Disk failures
                      year 1   year 2   year 3
Seagate               0,011    0,018    0,021
Google                0,018    0,053    0,055
this NAS              0,018    0,044    0,047
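
The 0, 2 and 3 failure rows above can be reproduced, to rounding, with a binomial model. The sketch below assumes 6 disks that fail independently with the same AFR; that model is an assumption, as the formula behind the tables is not stated here. Under the same model the one-failure row comes out higher than listed (about 0,068 instead of 0,056 for Seagate year 1) and the "4 or more" remainder correspondingly smaller, so those two rows are best read with some caution.

```python
from math import comb


def failure_distribution(afr: float, disks: int = 6) -> list[float]:
    """P(exactly k of `disks` fail within one year) for k = 0..disks.

    Assumes independent failures with the same annual failure rate per disk.
    """
    return [
        comb(disks, k) * afr**k * (1 - afr) ** (disks - k)
        for k in range(disks + 1)
    ]


# AFR values taken from the table above.
for label, afr in [("Seagate year 1", 0.012), ("Google year 2", 0.080), ("this NAS year 3", 0.065)]:
    dist = failure_distribution(afr)
    print(label, " ".join(f"{p:.3f}" for p in dist[:4]))
```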

From these numbers, calculated for a RAID array of 6 harddisks, several conclusions can be drawn.

  • The statistical difference between tolerating 2 disk failures and tolerating 3 is less than 0,5% (the probability of exactly 3 failures), which calls into question the gain of triple parity on a small array.
  • There is a 2% chance of 3 or more HDDs failing. This does not mean that data loss will occur with double parity (RAID6 / RAIDZ-2), as the disks can fail at different times during the year, but I had previously thought this number to be phenomenally small.
  • The probability of surviving year 1 without any failure is 89%.
  • The probability of surviving 2 consecutive years without any failure is 89% x 69% = 61%
  • The probability of surviving 3 consecutive years without any failure is 89% x 69% x 67% = 41%
  • Extrapolating these numbers shows year 4 to be 27% and year 5 to be 18%.
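
The consecutive-year figures above are a running product of the yearly no-failure probabilities for this NAS. A minimal sketch, assuming the extrapolation to years 4 and 5 simply reuses the year 3 value (this reproduces the 27% and 18% quoted, but the method is an assumption):

```python
def cumulative_survival(yearly: list[float], years: int) -> list[float]:
    """Running product of yearly no-failure probabilities.

    Years beyond the given data reuse the last known yearly value.
    """
    result = []
    total = 1.0
    for year in range(years):
        total *= yearly[min(year, len(yearly) - 1)]
        result.append(total)
    return result


# "this NAS" row of the 0 - Disk failures table.
print([round(p, 2) for p in cumulative_survival([0.886, 0.690, 0.668], 5)])
```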

RAID parity conclusion

For the NAS I've chosen a RAID6 setup, where 2 disks may fail at the same time without losing the array. Also note that, unlike with a mirror (RAID1) setup, the contents of a single disk cannot be read. This matters because a drive failure is likely at some point (see above), and one may not be able to erase the contents of the disk before sending it in for repair.

Reference data

Seagate: Gerry Cole: Estimating Drive Reliability in Desktop Computers and Consumer Electronics Systems
Google: Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz Andre Barroso: Failure Trends in a Large Disk Drive Population