As humanity delves deeper into the digital age, the need for faster, more reliable server and storage systems continues to grow. To give a sense of the scale involved, Google reportedly handles about 20 petabytes of data every day on its servers. To avoid catastrophic failures, companies set up server arrays, which are vast networks of storage drives that handle digital information. Should one drive fail, another is there to pick up the slack while the broken drive is replaced. So how exactly is this accomplished without losing data or incurring downtime? With RAID technology and RAID levels.
What Exactly is RAID and How Does it Work?
A Redundant Array of Independent Disks, or RAID, is a data storage method that reduces the risk of data loss and improves read/write performance by storing your information on multiple hard drives in what is called an array. Most RAID technology does this with data striping, which breaks sequential data such as files, videos, or documents into consecutive segments. These segments are then stored in order across your drives.
Video File Striping Example in a 3-Hard Drive RAID Array
For instance, if a video file is broken down into 4 consecutive segments, A1 through A4, and you have 3 hard drives in your RAID array, A1 would be stored on hard drive #1, A2 on hard drive #2, A3 on hard drive #3, and A4 back on hard drive #1.
When you go to retrieve the file, the computer asks the first hard drive to load A1 and then immediately moves on to the next hard drive for segment A2 while hard drive #1 is busy. This allows the computer to load A1-A3 at the same time, followed by A4. When you’re trying to load a 40-minute video, the computer loads four 10-minute segments and then presents them as a single file, making the retrieval process much quicker.
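The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `assign_segments` and the dictionary layout are made up for this example and do not correspond to any real RAID controller interface.

```python
def assign_segments(segments, drive_count):
    """Distribute segments round-robin across drives (drives are numbered from 1)."""
    layout = {drive: [] for drive in range(1, drive_count + 1)}
    for index, segment in enumerate(segments):
        # Segment 0 goes to drive 1, segment 1 to drive 2, and so on,
        # wrapping back to drive 1 after the last drive.
        drive = index % drive_count + 1
        layout[drive].append(segment)
    return layout


layout = assign_segments(["A1", "A2", "A3", "A4"], drive_count=3)
for drive, stored in layout.items():
    print(f"hard drive #{drive}: {stored}")
```

With four segments and three drives, hard drive #1 ends up holding A1 and A4, while drives #2 and #3 hold A2 and A3.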
Some RAID Levels Use Mirroring Technology. What Is It?
To protect against data loss, some RAID arrays use mirroring, which is the duplication of information as it’s created and stored. Where striping focuses on speed, mirroring focuses on ensuring there will be a copy of your data if one hard drive breaks down. Mirroring accomplishes this by keeping every secondary drive an exact copy of your primary one.
As an example, if you have a computer with two hard drives, and the primary hard drive simply stops working, the system will automatically switch over to the secondary hard drive, possibly without you even noticing. The operating system, any personal settings, and every file you’ve created and saved will appear to be exactly where you left them. In a non-RAID setup, a hard drive failing means ALL of the data on that drive may become irretrievable.
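A toy model can make the failover behavior concrete. The sketch below assumes a hypothetical `MirroredArray` class in which each drive is just a Python dictionary; real mirroring happens at the block level inside a controller or operating system, so treat this purely as an illustration.

```python
class MirroredArray:
    """Toy mirror: every write goes to all healthy drives."""

    def __init__(self, drive_count=2):
        self.drives = [dict() for _ in range(drive_count)]
        self.failed = set()

    def write(self, name, data):
        # Duplicate the data onto every drive that is still working.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[name] = data

    def fail(self, index):
        # Simulate a drive breaking down and losing its contents.
        self.failed.add(index)
        self.drives[index].clear()

    def read(self, name):
        # Serve the read from the first healthy drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[name]
        raise IOError("all drives have failed")


array = MirroredArray(drive_count=2)
array.write("report.txt", b"quarterly numbers")
array.fail(0)                    # the primary drive stops working
print(array.read("report.txt"))  # still served from the secondary drive
```

The read after the simulated failure succeeds because the secondary drive already holds an identical copy.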
3 Different RAID Types or Implementations
Before we jump into the different RAID levels, it is important to explain that there are a few different ways that a RAID setup can be implemented in your computer’s system. These implementations can be categorized as hardware-based, software-based, or firmware-based, and each makes use of a corresponding RAID controller.
We cover the main differences between hardware RAID vs software RAID controllers in a separate article, and go into great detail about their specific benefits, drawbacks, and typical use cases.
What is a RAID controller?
A RAID controller is what manages and directs the flow of information into and out of your hard drive array. Without a RAID controller, your RAID array would simply be a collection of hard drives. You can implement a RAID controller in one of the following three ways.
1. Hardware RAID Controller
A hardware RAID controller is usually a dedicated physical chip or card that manages the flow of information in and out of your RAID array directly, without relying on the computer’s CPU. It is most commonly used in data centers, or for systems that use remote servers.
2. Software RAID Controller
A software RAID controller is a piece of software that works directly with the computer’s operating system. It works by using the computer’s existing hardware resources like the central processing unit (CPU) to direct and manage the flow of information. This is often seen in home computer RAID array setups.
3. Firmware-Based RAID Controller
A firmware-based RAID controller is a chip pre-installed on the computer’s motherboard that requires a driver in order to function. The chip activates during the boot process but then passes all control to the corresponding drivers once the operating system loads, moving all operations over to the CPU. This type of RAID controller is also called hardware-assisted software RAID, and while it is less expensive than a hardware RAID controller, it puts more strain on your system.
Standard RAID Levels from 0 to 6
Regardless of the RAID controller used, RAID levels refer to the specific architecture used to distribute data across the drives. The performance you are looking for and the kind of fault tolerance you want will determine which RAID level you use.
RAID 0 needs at least two hard drives and works by segmenting sequential data and storing it across them. This level is all about optimizing the performance and speed of your hard drives, with no provision for preventing data loss, as RAID 0 uses neither mirroring nor parity.
It works by taking your stored data, breaking it down into stripe units, or segments, and spreading them across the hard drives in your array. This allows a RAID 0 setup to write and read your data quickly because more than one hard drive is working to process the data simultaneously. However, if one of the drives fails in this setup, all data in the array is lost.
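To see why a single failure is fatal, here is a minimal sketch that stripes a byte string across drives and reads it back. The helper names `stripe` and `reassemble` and the tiny stripe size are assumptions chosen for readability, not values any real RAID 0 implementation uses.

```python
def stripe(data, drive_count, stripe_size):
    """Split data into stripe units and deal them out to the drives in turn."""
    drives = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), stripe_size):
        unit = data[offset:offset + stripe_size]
        drives[(offset // stripe_size) % drive_count].append(unit)
    return drives


def reassemble(drives):
    """Read the units back row by row, visiting the drives in order."""
    units = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                units.append(drive[row])
    return b"".join(units)


data = b"ABCDEFGHIJ"
drives = stripe(data, drive_count=2, stripe_size=3)
print(drives)                      # [[b'ABC', b'GHI'], [b'DEF', b'J']]
print(reassemble(drives) == data)  # True

drives[1] = []                     # simulate losing the second drive
print(reassemble(drives) == data)  # False: every other unit is gone
```

Because consecutive units live on different drives, losing one drive leaves holes throughout every file rather than removing a few whole files.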
RAID level 1 is entirely focused on data redundancy, ensuring the safety of information even when a drive fails. In order to accomplish this, RAID 1 does not stripe any data but instead duplicates (mirrors) information onto the second drive.
Read performance improves because the computer can access either drive at any given time, since both hold the same information. If one drive is busy, it can access the second. With this level, you get good read performance along with fault tolerance. The speed of write operations, however, is not improved compared to a single-drive setup because all information has to be written to both disks in a RAID 1 array.
RAID 2 uses striping technology, but rather than breaking down a file’s data into block-sized segments, it breaks it down at the bit level. Some of the hard drives in this type of array store error-correcting code (ECC) information, based on the Hamming code, which is used to detect and correct errors in the data. This RAID level is rarely used today as it has been superseded by better setups.
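For readers curious about how a Hamming code detects and corrects an error, the sketch below implements the textbook Hamming(7,4) code, in which four data bits are protected by three check bits. Treating each of the seven list positions as one drive is an assumption made for illustration; the exact code and drive counts used by actual RAID 2 hardware are not covered here.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1, 2 and 4 are check bits)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_correct(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1   # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


codeword = hamming_encode([1, 0, 1, 1])
codeword[4] ^= 1                     # corrupt the bit at position 5
print(hamming_correct(codeword))     # ([1, 0, 1, 1], 5)
```

The three check results together spell out the position of the flipped bit, which is what lets the data be repaired without knowing in advance which bit went bad.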
RAID 3 builds on RAID level 2. This array uses byte-level striping to increase performance, dedicating one drive to storing parity information and spreading the data across the rest. However, any I/O command addresses every hard drive at once, so the array cannot service multiple requests simultaneously, and every write also has to go through the single parity disk.
Similar to RAID level 3, RAID level 4 dedicates a single drive to parity, but it stripes data in larger, block-sized segments, allowing the computer to read data from any single drive independently. As a result, RAID level 4 allows multiple read I/O operations to be active at once. However, the system is still limited to a single write I/O operation at a time, since every write needs to update the parity drive. Because of this, level 4 is only a slight upgrade from level 3.
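The reason every write touches the parity drive can be shown with XOR parity, the scheme commonly used for single-parity RAID. In this sketch the helper names `xor_bytes` and `small_write` and the two-byte blocks are hypothetical; the point is only that updating one data block requires reading and rewriting the parity block as well.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_drives, parity, drive_index, new_block):
    """Overwrite one data block and return the updated parity block.

    new parity = old parity XOR old data XOR new data, so the parity
    drive is involved in every single write.
    """
    old_block = data_drives[drive_index]
    new_parity = xor_bytes(xor_bytes(parity, old_block), new_block)
    data_drives[drive_index] = new_block
    return new_parity


data_drives = [b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"]
parity = reduce(xor_bytes, data_drives)

parity = small_write(data_drives, parity, drive_index=1, new_block=b"\xaa\x55")

# The incrementally updated parity matches a full recalculation.
print(parity == reduce(xor_bytes, data_drives))  # True
```

Two writes to different data drives can proceed on those drives independently, but both still have to queue for the same parity drive.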
For most use cases, RAID levels 3 and 4 are both superseded by RAID 5.
RAID 5 is the most commonly used RAID level, as it offers a good balance between speed and data security. It requires at least three hard drives. To increase performance, data is striped across the drives, as it is with previous RAID levels. However, parity information is also distributed across every disk rather than kept on one dedicated drive, creating what’s called a rotating parity array. This removes the single-parity-drive bottleneck and lets the computer perform multiple read and write I/O operations simultaneously, while leaving enough information to rebuild the data should one drive fail. RAID level 5 doesn’t boast the best performance (RAID 0 is faster), but is great for systems where a balance between performance and redundancy is needed.
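A small sketch shows what a rotating parity layout can look like. Real implementations use several different rotation orders, so the particular order below (parity starting on the last drive and moving one drive to the left with each stripe) and the function name `raid5_layout` are assumptions for illustration only.

```python
def raid5_layout(drive_count, stripe_count):
    """Return one row per stripe, marking data blocks (D) and parity blocks (P)."""
    rows = []
    block = 1
    for stripe in range(stripe_count):
        # Parity starts on the last drive and moves one drive left per stripe.
        parity_drive = (drive_count - 1 - stripe) % drive_count
        row = []
        for drive in range(drive_count):
            if drive == parity_drive:
                row.append(f"P{stripe + 1}")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(drive_count=3, stripe_count=3):
    print(row)
# ['D1', 'D2', 'P1']
# ['D3', 'P2', 'D4']
# ['P3', 'D5', 'D6']
```

Because each stripe keeps its parity on a different drive, parity updates are spread across the whole array instead of queuing on one disk.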
RAID level 6 works the same way as RAID level 5, but also maintains a second, independent set of rotating parity information spread across the drives, and therefore needs at least four drives. This means that a RAID level 6 system can handle up to two drives failing at once, whereas a level 5 system can only handle losing one at a time.
Having double the amount of parity information means that there is some performance penalty on write operations compared to RAID 5. Read operations are not impacted; they are just as fast as on RAID 5.
What is Nested RAID?
In the RAID level section above, we mentioned three key terms that explain a RAID’s function: striping, mirroring, and parity. When two RAID levels are combined in a single array, for example by striping data across mirrored sets of drives, this is called a nested RAID. Before we jump into an example of this, let us briefly explain what these three key terms mean.
RAID striping is when some segments of your data are stored on one drive and other segments of that data are stored on another.
RAID mirroring is when data is copied from one drive to another for redundancy. It protects your data from loss.
RAID parity stores extra information calculated from your data, commonly with the XOR operation, which is used to reconstruct data that is lost during a hard drive failure.
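As a minimal sketch of the parity idea, the example below assumes simple XOR parity over three equal-sized data blocks (the block contents and the helper name `xor_blocks` are made up for illustration). XOR-ing the surviving blocks with the parity block recovers the missing one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"RAID", b"DISK", b"DATA"]  # blocks stored on three data drives
parity = xor_blocks(data)           # block stored as parity

# The drive holding b"DISK" fails: rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)                      # b'DISK'
```

This works because XOR-ing a value with itself cancels it out, so everything except the missing block drops away.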
RAID 10 or RAID 1+0
A fantastic example of nested RAID is RAID level 10 or RAID 1+0. It combines RAID 0 and RAID 1 and requires a minimum of four hard drives. Unlike RAID levels that rely solely on striping or mirroring, RAID 10 combines both functions for better performance together with redundancy, although at a much higher cost. With this RAID level, you get the speed that comes with disk striping, but you also get the data redundancy of disk mirroring.
In RAID 1+0, the data gets mirrored and then these mirrors are striped. A related but distinct layout is RAID 0+1, where the data is organized in stripes across your hard drives and then these striped sets are mirrored. If you are trying to protect against a single drive failure in either set of drives, or a simultaneous failure of one drive in each set, RAID 10 is what you want.
This RAID level is commonly used on servers or applications that need to be up 24/7. The downside with RAID 10 is that if both drives in the same mirrored pair fail, you will lose all the data in the array.
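Which combinations of failures a RAID 10 array survives depends on which mirrored pair each failed drive belongs to. The sketch below models a four-drive array as two mirrored pairs; the function name `raid10_survives` and the drive numbering are hypothetical.

```python
def raid10_survives(mirror_pairs, failed_drives):
    """The array survives as long as every mirrored pair keeps at least one working drive."""
    failed = set(failed_drives)
    return all(not set(pair) <= failed for pair in mirror_pairs)


pairs = [(0, 1), (2, 3)]  # drives 0+1 mirror each other, as do drives 2+3

print(raid10_survives(pairs, {0}))        # True: one drive lost
print(raid10_survives(pairs, {0, 2}))     # True: one drive lost in each pair
print(raid10_survives(pairs, {0, 1}))     # False: a whole pair is gone
```

Two failed drives can therefore be harmless or fatal, depending on whether they sit in different pairs or in the same one.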
Each RAID level comes with its own set of benefits and drawbacks, but levels 0, 1, 5, and 10 are the most common. Levels 0 and 5 offer higher speeds and greater storage, but with less safety and greater cost, respectively. Level 1 offers greater protection, but does not improve write speed and sacrifices storage. Level 10 is probably the most balanced, as it offers the best of both levels 0 and 1, but it is much more costly, as it requires more hard drives.
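The storage trade-off between levels can be put into numbers. The sketch below uses the standard usable-capacity formulas for arrays of identical drives; the function name `usable_capacity_gb` and the example of four 1000 GB drives are assumptions for illustration, and RAID 1 is modeled as every drive holding a full copy.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Usable space for common RAID levels, assuming identical drives."""
    minimum_drives = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum_drives[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_drives[level]} drives")
    if level == 0:
        return drive_count * drive_size_gb        # no redundancy at all
    if level == 1:
        return drive_size_gb                      # every drive is a full copy
    if level == 5:
        return (drive_count - 1) * drive_size_gb  # one drive's worth of parity
    if level == 6:
        return (drive_count - 2) * drive_size_gb  # two drives' worth of parity
    if drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drive_count // 2 * drive_size_gb       # half the drives hold mirror copies


for level in (0, 1, 5, 6, 10):
    print(f"RAID {level}: {usable_capacity_gb(level, 4, 1000)} GB usable")
```

With four 1000 GB drives this gives 4000 GB for RAID 0, 1000 GB for RAID 1, 3000 GB for RAID 5, and 2000 GB for both RAID 6 and RAID 10.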
Yes, drives of different sizes can be combined in a RAID array, however you risk sacrificing storage space. For example, using RAID level 1 with a 256 GB primary drive and a 512 GB secondary drive means that the secondary drive will only make use of half of its storage. Switching the drives around does not help: only half of a 512 GB primary drive could be mirrored onto the 256 GB drive, which defeats the purpose.
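The arithmetic behind the 256 GB and 512 GB example can be sketched as follows. The function name `mirror_usable_and_unused` is hypothetical, and the calculation assumes a plain mirror in which every drive can only be used up to the size of the smallest one.

```python
def mirror_usable_and_unused(drive_sizes_gb):
    """Return (usable GB, unused GB) for a mirror built from the given drives."""
    usable = min(drive_sizes_gb)  # the smallest drive sets the limit
    unused = sum(size - usable for size in drive_sizes_gb)
    return usable, unused


print(mirror_usable_and_unused([256, 512]))  # (256, 256)
print(mirror_usable_and_unused([512, 512]))  # (512, 0)
```

The order of the drives makes no difference to the result, which is why matching drive sizes is the simplest way to avoid wasted space.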
Yes, any hard drive can be used in a RAID array. However, it is recommended that you use similar hard drives, as speed and storage are typically limited by the slowest and smallest drive in the array. Using similar or identical drives avoids this limitation.
Yes, as with other drives, NVMe drives can be used in RAID arrays and will boost performance significantly. However, this type of setup is rarely necessary for consumers, as these drives are already plenty powerful on their own, averaging data transfer speeds more than six times faster than a typical SATA SSD.
RAID Level Comparisons
If you are trying to decide between two specific RAID levels for your environment, we have created several in-depth comparisons to make the decision easier: