RAID, It’s Not Just for Killing Bugs! | Birds on a Cable

We’ve spoken about backups before, and in truth it’s impossible to talk too much about them. The best rule of thumb is to follow the 3-2-1 rule:

3 – Keep three copies of all important data

2 – Keep data on at least two different types of media (hard drives, flash drives, tape, cloud)

1 – Keep at least one copy offsite.
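To make the rule concrete, here is a minimal Python sketch that checks a list of copies against 3-2-1. The `Copy` class and `meets_3_2_1` function are hypothetical names invented for this illustration. The sketch assumes the live data on the server counts as one of the three copies.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    """One copy of a dataset (hypothetical model for illustration)."""
    media: str      # e.g. "hard drive", "tape", "cloud"
    offsite: bool   # True if stored away from the primary location

def meets_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                   # 3 copies
    enough_media = len({c.media for c in copies}) >= 2  # 2 media types
    has_offsite = any(c.offsite for c in copies)        # 1 offsite
    return enough_copies and enough_media and has_offsite

# The live server plus a local NAS backup: both hard drives, nothing offsite.
local_only = [Copy("hard drive", False), Copy("hard drive", False)]
# Adding a cloud copy satisfies all three parts of the rule.
with_cloud = local_only + [Copy("cloud", True)]

print(meets_3_2_1(local_only))  # False
print(meets_3_2_1(with_cloud))  # True
```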

This is a fairly bulletproof way to secure your data, but it doesn’t prevent downtime. If a large server holding terabytes of data dies, restoring it from backup can take hours or even days. During that time, any work that relies on the server has to be put on hold, which can cost a business a lot of money. One technology that helps mitigate this is RAID.
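To get a rough sense of why restores take so long, you can divide the data size by the sustained throughput. The sketch below assumes a hypothetical 10 TB server restored at 200 MB/s. Real restores are often slower once verification, many small files, and network limits come into play.

```python
def restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Estimate hours to copy data_tb terabytes at throughput_mb_s MB/s.

    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    Ignores verification, seek overhead and reconfiguration time.
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / throughput_mb_s
    return seconds / 3600

# Hypothetical example: 10 TB restored at a steady 200 MB/s.
print(f"{restore_hours(10, 200):.1f} hours")  # 13.9 hours
```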

RAID stands for “redundant array of inexpensive disks” (often also expanded as “independent disks”), but really, it’s just a fancy way of saying “treat a bunch of hard drives like one big one.” There are several RAID levels, with the standard ones numbered 0 through 6 plus combinations such as RAID 10. RAID 5 and 6 are the most popular, and RAID 6 and 10 are the most recommended. In RAID 5, a group of three or more drives is treated as one, and each drive is broken into blocks. The blocks are then grouped into “stripes,” with one block from each drive in each stripe. In every stripe, one drive’s block is set aside to hold what’s called parity (a kind of checksum). The system computes the parity by combining the other blocks in the stripe with a simple operation called XOR. The drive that holds the parity rotates from stripe to stripe, so no single drive carries all of it. The math is clever enough that if any one drive fails, the system can combine the parity with the surviving blocks to work out exactly what data was on the failed drive. RAID 6 improves on this by storing two independent parity blocks per stripe.
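The parity math is easier to see in code. Below is a simplified Python sketch of one RAID 5 stripe using XOR parity. It assumes a hypothetical four-drive array with tiny 4-byte blocks, while real arrays use much larger blocks and rotate the parity position between stripes. XOR works because any value XORed with itself cancels out, so XORing the parity with the surviving blocks leaves exactly the missing one. RAID 6’s second parity block uses more involved Galois-field math that isn’t shown here.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5: three data blocks + one parity block.
data = [b"RAID", b"is  ", b"cool"]
parity = xor_blocks(data)

# Simulate losing drive 1: XOR the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'is  '
```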

Hopefully you didn’t stop reading. What this means is that, in RAID 5, any one hard drive can fail and the system keeps running. If you’re willing to sacrifice a bit more of the total drive space by using RAID 6, you can lose any two hard drives and still keep running. On hardware that supports hot-swapping, you can even replace the failed drive while the system stays up, and it will use its magical math to rebuild all the data onto the new drive.
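The space trade-off between RAID 5 and RAID 6 comes down to simple arithmetic. With n identical drives, RAID 5 gives up one drive’s worth of capacity to parity and RAID 6 gives up two. The sketch below uses a hypothetical `raid_summary` helper and a six-drive array of 4 TB disks. It ignores the small amount of space real controllers reserve for metadata.

```python
def raid_summary(level: int, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of drive failures survived).

    Assumes identical drives; real arrays lose a little extra to metadata.
    """
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        parity_drives = 1
    elif level == 6:
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        parity_drives = 2
    else:
        raise ValueError("only RAID 5 and 6 are modeled here")
    # Parity consumes one (RAID 5) or two (RAID 6) drives' worth of space.
    return (drives - parity_drives) * drive_tb, parity_drives

for level in (5, 6):
    usable, survives = raid_summary(level, drives=6, drive_tb=4)
    print(f"RAID {level}: {usable:g} TB usable, survives {survives} failure(s)")
```

With these numbers, RAID 6 trades 4 TB of usable space (16 TB instead of 20 TB) for the ability to survive a second drive failure.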

Now, this hardware redundancy is often mistaken for a backup. If a hard drive can fail and be replaced without the system going down and without losing data, why would anyone pay extra for backup? The answer is that the data still exists in only one place: on that server. If the server’s drive controller dies, if CryptoLocker-style ransomware encrypts all the files, or if the building is struck by lightning and everything gets fried, you won’t have a way to get your data back. So the two go hand in hand: RAID keeps you running during a minor crisis, and backups protect you from everything else. There are even further ways to expand this for higher uptime guarantees, but that’s a topic for another time!