What is RAID and what are the different RAID modes

When companies want to build fault-tolerant server infrastructure, they cannot do it without RAID arrays. Relying on a single HDD for storage is not safe: like any other piece of machinery, it can break in the middle of work, at the most inconvenient time. What should a company do if it loses all its data? Restore from backup, you might say, and you are right, but retrieving lost data from a backup is a long process. That is why RAID arrays were invented: they keep the data available even when a disk fails.


What is RAID?

First, we need to define what RAID is. In a nutshell, RAID stands for Redundant Array of Independent Disks: an array of individual hard drives arranged so that, as data is interleaved and stored, redundant data ends up on a physically different drive.

 

In the event of a physical disk failure, the data in a properly configured array stays safe. RAID cannot save you from virus attacks; it only keeps your storage, and the data on it, working when a drive breaks. In other words, RAID protects data against mechanical failure, while a backup protects it at the software level, for example against deletion or corruption. It is always better to invest in more disks up front than to rely on a single drive, where a failure carries a high risk that all data will be irretrievably lost.

 

What Are the Types of RAID?

Over the 30 years that RAID has existed, businesses have asked engineers for different kinds of fault tolerance, which has resulted in different RAID types suited to different tasks.

 

RAID 0

A RAID 0 array is not a true RAID because it contains no redundant information and therefore provides no protection for the stored data (the failure of one member means data loss). The individual devices are simply combined into one logical whole whose capacity is the sum of all members. They can be combined in two ways: by concatenation (i.e. linearly) or by interleaving (striping):

 

  • Concatenation (JBOD)

In concatenation, data is stored sequentially on multiple disks. When the first disk fills up, data continues on the second, then on the third, and so on. The advantages are that capacity is easy to increase by adding another member and that some files may be unaffected when a member fails.

  • Interleaving

With interleaving, data is stored on the disks cyclically (alternately, see the figure on the right). The space is divided into fixed-size chunks, so that writing or reading a longer section of data involves multiple disks. In the event of a disk failure, it is unlikely that any file will remain undamaged. Interleaving can speed up the reading and writing of larger blocks of data because one block can be read (or written) on one disk while the next block is read (or written) on another. Theoretically, the read speedup should be smaller than with RAID 1, but in real-world use both reading and writing in RAID 0 are significantly faster than in RAID 1. In a home environment the gain for sequential reads tends to be around 50% (for example, two disks that each read sequentially at 100 MB/s will typically give the array a read speed of approximately 150 MB/s). Note that this gain applies to throughput only: RAID 0 does not reduce access times.
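
The difference between the two layouts comes down to how a logical block number is translated into a disk and a position on that disk. The following is a minimal sketch of that translation, not the code of any real RAID driver; the function names and the two 4-block disks are assumptions made for illustration.

```python
from typing import List, Tuple


def locate_concat(block: int, disk_blocks: List[int]) -> Tuple[int, int]:
    """Concatenation: fill disk 0 completely, then disk 1, and so on."""
    for disk, size in enumerate(disk_blocks):
        if block < size:
            return disk, block
        block -= size  # skip past this disk and try the next one
    raise IndexError("logical block is beyond the end of the array")


def locate_striped(chunk: int, num_disks: int) -> Tuple[int, int]:
    """Interleaving: consecutive chunks rotate across the disks."""
    return chunk % num_disks, chunk // num_disks


# Hypothetical array: two disks with 4 blocks each, logical blocks 0..7.
concat_layout = [locate_concat(b, [4, 4]) for b in range(8)]
striped_layout = [locate_striped(b, 2) for b in range(8)]

print("concatenated:", concat_layout)
print("striped:     ", striped_layout)
```

In the concatenated layout the first four blocks all sit on disk 0, so a file stored there survives the loss of disk 1. In the striped layout neighbouring blocks always sit on different disks, which is what allows parallel access and also why a single failure damages almost every file.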

 

In short: 1 TB + 1 TB = 2 TB, plus improved read/write speed. If one disk goes down, everything goes down.

 

RAID 1

RAID 1 is the simplest yet quite effective form of data protection: mirroring. The same content is written to two disks simultaneously, so if one disk fails, a copy is immediately available. A similar technique can be applied one level up by using two separate controllers; this is called duplexing and also protects against controller failure. In theory, mirroring can significantly increase read speed and slightly reduce response time, but this depends on the specific controller (some software implementations do not read from both disks at all). Writing may be slower because the same data has to be stored on two disks. The technique greatly improves protection against data loss due to hardware failure. The disadvantage is that it needs twice the disk capacity.
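
The behaviour of a mirror can be sketched in a few lines. This is a toy model under stated assumptions, not a real controller: each "disk" is a Python dict, and the class and method names are invented for this example.

```python
class Mirror:
    """Toy mirror: each 'disk' is a dict mapping block number to bytes."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]
        self.failed = set()

    def write(self, block, data):
        # Every write goes to all healthy members, which is why a mirror
        # cannot write faster than a single disk.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index):
        # Simulate a hardware failure: the member and its contents are gone.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block):
        # Any healthy member can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all mirror members have failed")


mirror = Mirror()
mirror.write(0, b"payroll")
mirror.fail(0)                   # first disk dies
survivor_data = mirror.read(0)   # still served from the second disk
print(survivor_data)
```

The sketch always reads from the first healthy disk. A controller that balances reads across both members is what would deliver the read speedup mentioned above.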

 

  • RAID 0+1 

This is a combination of RAID 0 and RAID 1. Data is striped (interleaved) across two disks (A, B), and the same is done with two other disks (C, D). This gives two logical disks, AB and CD, which hold identical contents. (If a file splits into two halves when striped, the first half is on disks A and C, the second on disks B and D.) The advantage is that the read and write load is spread across multiple disks while the data is also stored redundantly, so it can be easily recovered after a failure. The disadvantages are that only 50% of the total disk capacity is usable and that redundancy is lost as soon as one of the four disks fails.

  • RAID 1+0

This is again a combination of RAID 0 and RAID 1, but built the other way around. First the same data is stored on disks A and B, then on disks C and D. This gives two mirrored logical disks, AB and CD, across which the data is striped. (If a file splits into two halves when striped, the first half is on disks A and B, the second on disks C and D, unlike RAID 0+1.) The benefits are similar to RAID 0+1, but RAID 1+0 is more resilient to multiple disk failures and recovers much faster after a failure. The disadvantage is again that only 50% of the capacity is usable.
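
The claim that RAID 1+0 tolerates multiple failures better than RAID 0+1 can be checked by enumerating every possible two-disk failure in the four-disk layout described above. The sketch below assumes the common behaviour in which a RAID 0+1 stripe set is treated as lost as soon as one of its members fails; the function names are made up for this illustration.

```python
from itertools import combinations

DISKS = "ABCD"
GROUPS = [{"A", "B"}, {"C", "D"}]  # the two logical disks AB and CD


def survives_raid01(failed):
    # AB and CD are stripe sets that mirror each other. A stripe set is lost
    # when any member fails, so at least one set must be fully intact.
    return any(not (group & failed) for group in GROUPS)


def survives_raid10(failed):
    # AB and CD are mirrored pairs that are striped together. Every pair
    # needs at least one working member.
    return all(group - failed for group in GROUPS)


pairs = [set(p) for p in combinations(DISKS, 2)]
raid01_ok = sum(survives_raid01(p) for p in pairs)
raid10_ok = sum(survives_raid10(p) for p in pairs)

print(f"RAID 0+1 survives {raid01_ok} of {len(pairs)} double failures")
print(f"RAID 1+0 survives {raid10_ok} of {len(pairs)} double failures")
```

Under these assumptions RAID 0+1 survives only the two cases where both failed disks belong to the same stripe set, while RAID 1+0 fails only in the two cases where both halves of one mirror are lost.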

 

RAID 5 and RAID 6

RAID 5 requires at least three hard drives. The equivalent of one drive's capacity is taken up by parity (self-correcting codes), which is distributed across the drives in rotation. The advantage is parallel data access: a longer stretch of data is spread across multiple disks, so reads are much faster. The disadvantage is slower writing, because the parity has to be calculated on every write. RAID 5 is resilient to the failure of a single disk. RAID 6 works the same way but stores a second, independent parity block, so it requires at least four drives and survives the failure of two disks.
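
The self-correcting code used by RAID 5 is XOR parity: the parity block is the XOR of the data blocks in a stripe, and any one missing block equals the XOR of all the remaining ones. Here is a minimal sketch for a single stripe on three disks. The helper name and sample data are assumptions, and the rotation of parity between disks is left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three disks: two data blocks plus one parity block.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b fails: rebuild its block from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b)
```

The same function covers both directions, computing parity on write and reconstructing a lost block on rebuild. It also shows why writes cost more than reads: every write has to update the parity block as well.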

 

RAID 10

RAID 10 (another notation for RAID 1+0) differs from RAID 0+1 in that the data is first mirrored within each pair of disks, and the mirrored pairs are then combined into a RAID 0 array for faster transfer rates. At most one disk in each mirrored pair can fail without consequences. This type is often used for heavily loaded database applications, because no parity has to be calculated, which makes everything faster (or cheaper).
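
To compare the capacity cost of the levels covered so far, the usual rules of thumb can be written as one small function. This is a sketch assuming equal-sized disks, a RAID 1 array in which every disk holds the same copy, and a RAID 10 array built from two-disk mirrors; the function name and the minimum disk counts in the table are illustrative choices.

```python
def usable_capacity(level, disks, disk_tb):
    """Usable capacity in TB for an array of equal-sized disks."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return disks * disk_tb        # no redundancy, all space is usable
    if level == "1":
        return disk_tb                # every disk holds the same copy
    if level == "5":
        return (disks - 1) * disk_tb  # one disk's worth of parity
    if level == "6":
        return (disks - 2) * disk_tb  # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID 10 built from mirrored pairs needs an even disk count")
    return disks // 2 * disk_tb       # half of the disks are mirror copies


# Four 1 TB disks under each level.
summary = {level: usable_capacity(level, 4, 1) for level in ("0", "1", "5", "6", "10")}
print(summary)
```

With four 1 TB disks, RAID 5 leaves 3 TB usable while RAID 10 leaves 2 TB, which is the price RAID 10 pays for avoiding parity calculations.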

 

Hardware RAID vs. Software RAID

Software RAID

This is RAID that the operating system itself creates, mounts, and manages on a set of physical disks. The OS marks the partitions to be used in the array, creates the virtual device, and makes the resulting space available for use.

 

The disadvantage of this approach is that server resources must be dedicated to all RAID-related tasks; on the other hand, it is a reliable system, almost as reliable as hardware RAID. Most NAS manufacturers, such as Synology, QNAP, and Thecus, use this type of system (specifically on Linux) in their SOHO and SMB ranges. This is why, for example, when your NAS is handling large file transfers you may notice other processes slowing down as the CPU load rises considerably.

 

Hardware RAID

This is the RAID found in high-performance servers. These systems are built around controllers or cards dedicated to managing the RAID, with their own processors and memory, which takes the work off the CPU of the machine acting as the data server. Hardware RAID is available for SATA, SAS, and SCSI disks, the last of which is used less and less.

 

Logically, this implementation is the most reliable, the fastest, and therefore notably more expensive. You can almost certainly rule out that the RAID in your own computer or NAS works this way, since we are talking about costs of thousands of euros.

 

In RAID 0, 1, and 10 modes, hardware RAID offers no great advantage over software RAID, since the card has no major tasks to offload from the system.

 
