People often ask: Should I raid my disks?
The question is simple; unfortunately, the answer is not. So here I'm going to give you another guide to help you decide when a RAID array is advantageous and how to go about it. Note that this guide also applies to SSDs, with the exception of the parts about mechanical failure.
What is a RAID?
RAID is the acronym for "Redundant Array of Inexpensive Disks". The concept originated at the University of California, Berkeley in 1987 and was intended to create large storage capacity from smaller disks, without the need for the large, highly reliable disks that were very expensive at that time, often ten times the price of smaller disks. Today prices of hard disks have fallen so much that it is often more attractive to buy a single 1 TB disk than two 500 GB disks. That is why RAID is now often described as "Redundant Array of Independent Disks".
The idea behind RAID is to have a number of disks co-operate in such a way that it looks like one big disk. Note that 'Spanning' is not in any way comparable to RAID; it is just a way, like inverse partitioning, to extend the base partition to use multiple disks, without changing the method of reading and writing to that extended partition.
Why use a RAID?
With today's lower disk prices, why would a video editor consider a RAID array? There are two reasons:
1. Redundancy (or security)
2. Performance
Note that it can be a combination of both reasons; it is not an 'either/or' choice.
Does a video editor need RAID?
No, if the above two reasons, redundancy and performance, are not relevant. Yes, if either or both reasons are relevant.
Re 1. Redundancy
Every mechanical disk will eventually fail, sometimes on the first day of use, sometimes only after several years of usage. When that happens, all data on that disk are lost and the only solution is to get a new disk and recreate the data from a backup (if you have one) or through tedious and time-consuming work. If that does not bother you and you can spare the time to recreate the data that were lost, then redundancy is not an issue for you. Keep in mind that disk failures often occur at inconvenient moments, on a weekend when the shops are closed and you can't get a replacement disk, or when you have a tight deadline.
Re 2. Performance
Opponents of RAID will often say that any modern disk is fast enough for video editing and they are right, but only to a certain extent. As fill rates of disks go up, performance goes down, sometimes by 50%. As the number of simultaneous activities on the disk goes up, like accessing (reading or writing) the pagefile, media cache, previews, media, project file and output file, performance goes down the drain. The more tracks you have in your project, the more strain is put on your disk. 10 tracks require 10 times the bandwidth of a single track. The more applications you have open, the more your pagefile is used. This is especially apparent on systems with limited memory.
The following chart shows how fill rates on a single disk will impact performance:
Remember that I said previously the idea behind RAID is to have a number of disks co-operate in such a way that it looks like one big disk. That means a RAID will not fill up as fast as a single disk and not experience the same performance degradation.
RAID basics
Now that we have established the reasons why people may consider RAID, let's have a look at some of the basics.
Single or Multiple?
There are three methods to configure a RAID array: mirroring, striping and parity check. These are called levels, and levels are subdivided into single or multiple levels, depending on the method used. A single level RAID0 is striping only and a multiple level RAID15 is a combination of mirroring (1) and parity check (5). Multiple levels are designated by combining two single levels, like a multiple RAID10, which is a combination of single level RAID0 with a single level RAID1.
Hardware or Software?
The difference is quite simple: hardware RAID controllers have their own processor and usually their own cache. Software RAID controllers use the CPU and the RAM on the motherboard. Hardware controllers are faster but also more expensive. For RAID levels without parity check like Raid0, Raid1 and Raid10 software controllers are quite good with a fast PC.
The common Promise and Highpoint cards are all software controllers that (mis)use the CPU and RAM. Real hardware RAID controllers all use their own IOP (I/O Processor) and cache (ever wondered why these hardware controllers are expensive?).
There are two kinds of software RAIDs. One is controlled by the BIOS/drivers (like Promise/Highpoint) and the other is solely OS dependent. The first kind can be booted from, the second one can only be accessed after the OS has started. In performance terms they do not differ significantly.
For the technically inclined: Cluster size, Block size and Chunk size
In short: Cluster size applies to the partition and Block or Stripe size applies to the array.
With a cluster size of 4 KB, data are distributed across the partition in 4 KB parts. Suppose you have a 10 KB file: three full clusters will be occupied, 4 KB - 4 KB - 2 KB. The remaining 2 KB is called slack space and cannot be used by other files. With a block size (stripe) of 64 KB, data are distributed across the array disks in 64 KB parts. Suppose you have a 200 KB file: the first 64 KB is located on disk A, the second 64 KB on disk B, the third 64 KB on disk C and the remaining 8 KB on disk D. Here there is no slack space, because the block size is subdivided into clusters. When working with audio/video material a large block size is faster than a smaller block size. When working with smaller files a smaller block size is preferred.
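The two worked examples above can be reproduced with a small Python sketch. The function names and the four disk labels are only illustrative assumptions; real file systems and controllers do more bookkeeping than this.

```python
import math

def cluster_usage(file_size_kb, cluster_kb=4):
    """Return (clusters occupied, slack space in KB) for a single file."""
    clusters = math.ceil(file_size_kb / cluster_kb)
    slack_kb = clusters * cluster_kb - file_size_kb
    return clusters, slack_kb

def stripe_layout(file_size_kb, stripe_kb=64, disks=("A", "B", "C", "D")):
    """Return (disk, KB written) pairs, filling one stripe per disk in turn."""
    layout = []
    remaining = file_size_kb
    index = 0
    while remaining > 0:
        part = min(stripe_kb, remaining)
        layout.append((disks[index % len(disks)], part))
        remaining -= part
        index += 1
    return layout

print(cluster_usage(10))   # (3, 2): three clusters, 2 KB slack
print(stripe_layout(200))  # [('A', 64), ('B', 64), ('C', 64), ('D', 8)]
```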
Sometimes you have an option to set 'Chunk size', depending on the controller. It is the minimal size of a data request from the controller to a disk in the array and is only useful when striping is used. Suppose you have a block size of 16 KB and you want to read a 1 MB file. The controller needs to read a 16 KB block 64 times. With a chunk size of 32 KB the first two blocks will be read from the first disk, the next two blocks from the next disk, and so on. If the chunk size is 128 KB, the first 8 blocks will be read from the first disk, the next 8 blocks from the second disk, etcetera. Smaller chunks are advisable with smaller files, larger chunks are better for larger (audio/video) files.
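As a rough illustration of the chunk example, this Python sketch maps each 16 KB block of a 1 MB file to a disk. The 4 disk array and the function name are assumptions made for the example; actual controllers may lay data out differently.

```python
def block_to_disk_map(file_kb, block_kb, chunk_kb, disks):
    """Return, for every block of the file, the index of the disk it is read from.

    Blocks are grouped into chunks and chunks are handed out to the disks in turn.
    """
    total_blocks = file_kb // block_kb
    blocks_per_chunk = chunk_kb // block_kb
    return [(block // blocks_per_chunk) % disks for block in range(total_blocks)]

# 1 MB file, 16 KB blocks, hypothetical 4 disk array
small_chunks = block_to_disk_map(1024, 16, 32, disks=4)
large_chunks = block_to_disk_map(1024, 16, 128, disks=4)

print(len(small_chunks))   # 64 blocks in total
print(small_chunks[:6])    # [0, 0, 1, 1, 2, 2]: two blocks per disk
print(large_chunks[:16])   # eight blocks from disk 0, then eight from disk 1
```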
RAID Levels
For a full explanation of various RAID levels, look here: http://www.acnc.com/04_01_00/html
What are the benefits of each RAID level for video editing and what are the risks and benefits of each level to help you achieve better redundancy and/or better performance? I will try to summarize them below.
RAID0
The Band AID of RAID. There is no redundancy! The risk of losing all data is multiplied by the number of disks in the array. A 2 disk array carries twice the risk of a single disk, an X disk array carries X times the risk of losing it all.
A RAID0 is perfectly OK for data that you will not worry about losing, like pagefile, media cache, previews or rendered files. It may be a hassle if you have media files on it, because it requires recapturing, but that is not the end of the world. It will be disastrous for project files.
Performance-wise a RAID0 is almost X times as fast as a single disk, X being the number of disks in the array.
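The "X times the risk" rule of thumb can be checked with a short Python sketch. It assumes that disks fail independently and uses a made-up 3% failure probability per disk purely for illustration; it is not a measured figure.

```python
def raid0_loss_probability(p_disk, disks):
    """Chance of losing a RAID0 array: it is lost as soon as any one disk fails."""
    return 1 - (1 - p_disk) ** disks

p = 0.03  # hypothetical failure probability of one disk over some period
for disks in (1, 2, 4, 8):
    risk = raid0_loss_probability(p, disks)
    print(f"{disks} disk(s): {risk:.2%} risk, {risk / p:.2f} times a single disk")
```

For small failure probabilities the exact figure stays close to X times the single disk risk, which is why the rule of thumb works well in practice.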
RAID1
The RAID level for the paranoid. It gives no performance gain whatsoever. It gives you redundancy, at the cost of a disk. If you are meticulous about backups and make them all the time, RAID1 may be a better solution, because you can never forget to make a backup, you can restore instantly. Remember backups require a disk as well. This RAID1 level can only be advised for the C drive IMO if you do not have any trust in the reliability of modern-day disks. It is of no use for video editing.
RAID3
The RAID level for video editors. There is redundancy! Thanks to the dedicated parity disk, there is only a small performance hit when rebuilding an array after a disk failure. There is quite a performance gain achievable, but the drawback is that it requires a hardware controller from Areca. You could do worse, but apart from being the Rolls-Royce amongst the hardware controllers, it is also expensive like the car.
Performance-wise it will achieve around 85% x (X-1) on reads and 60% x (X-1) on writes relative to a single disk, with X being the number of disks in the array. So with a 6 disk array in RAID3, you get around 0.85 x (6-1) = 425% of the performance of a single disk on reads and 0.60 x (6-1) = 300% on writes.
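The rule of thumb above is easy to apply to other array sizes. A minimal Python sketch, using only the 85% and 60% factors quoted in the text (the function name is an assumption and the result is an estimate, not a benchmark):

```python
def raid3_estimate(disks, read_factor=0.85, write_factor=0.60):
    """Estimate RAID3 read and write speed as multiples of a single disk."""
    data_disks = disks - 1  # one disk is dedicated to parity
    return read_factor * data_disks, write_factor * data_disks

for disks in (4, 6, 8):
    reads, writes = raid3_estimate(disks)
    print(f"{disks} disks: reads about {reads:.0%}, writes about {writes:.0%} of a single disk")
```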
RAID5 & RAID6
The RAID level for non-video applications with distributed parity. This makes for a somewhat severe hit in performance in case of a disk failure. The double parity in RAID6 makes it ideal for NAS applications.
The performance gain is slightly lower than with a RAID3. RAID6 requires a dedicated hardware controller, RAID5 can be run on a software controller but the CPU overhead negates to a large extent the performance gain.
RAID10
The RAID level for paranoids in a hurry. It delivers the same redundancy as RAID1 but, since it is a multilevel RAID combined with a RAID0, it delivers twice the performance of a single disk at four times the cost, apart from the controller. The main advantage is that you can survive two disk failures at the same time without losing data, as long as the two failed disks are not in the same mirrored pair, but what are the chances of that happening?
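Whether a RAID10 survives two simultaneous failures depends on which disks fail: if both halves of one mirror go, the data is gone. The Python sketch below simply enumerates all two disk combinations, assuming every combination is equally likely (an assumption, not something measured).

```python
from itertools import combinations

def raid10_two_failure_survival(pairs):
    """Fraction of all two disk failure combinations that a RAID10 survives."""
    disks = [(pair, side) for pair in range(pairs) for side in (0, 1)]
    combos = list(combinations(disks, 2))
    # The array survives when the two failed disks belong to different mirrors.
    survived = sum(1 for a, b in combos if a[0] != b[0])
    return survived / len(combos)

for pairs in (2, 3, 6):
    share = raid10_two_failure_survival(pairs)
    print(f"{2 * pairs} disk RAID10: survives {share:.0%} of double failures")
```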
RAID30, 50 & 60
These are just striped arrays of RAID3, 5 or 6, which doubles the speed while keeping redundancy at the same level.
EXTRAS
RAID level 0 is striping, RAID level 1 is mirroring and RAID levels 3, 5 & 6 are parity check methods. For parity check methods, dedicated controllers offer the possibility of defining a hot-spare disk. A hot-spare disk is an extra disk that does not belong to the array, but is instantly available to take over from a failed disk in the array. Suppose you have a 6 disk RAID3 array with a single hot-spare disk and assume one disk fails. What happens? The data on the failed disk can be reconstructed to the hot-spare in the background, while you keep working with negligible impact on performance. In mere minutes your system is back at the performance level you had before the disk failure. Some time later you take out the failed drive, replace it with a new drive and define that as the new hot-spare.
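How a parity check method can reconstruct a failed disk is easiest to see with XOR parity, the usual basis for RAID3 and RAID5. The Python sketch below is a toy model with three data "disks" of eight bytes each; the block contents are invented, and a real controller works on whole stripes in hardware.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 8 byte block
data = [b"VIDEO001", b"AUDIO002", b"TITLE003"]
parity = xor_blocks(data)  # what the parity disk would store

# The second disk fails: rebuild its block from the survivors plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same calculation is what writes the missing data onto a hot-spare during a rebuild.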
As stated earlier, dedicated hardware controllers use their own IOP and their own cache instead of using the memory on the mobo. The larger the cache on the controller, the better the performance, but the main benefits of cache memory show up when handling random read and write activities. For sequential activities, like video editing, it does not pay to use more than 2 GB of cache.
REDUNDANCY
(or security)
Not using RAID entails the risk of a drive failing and losing all data. The same applies to using RAID0 (or better said AID0), only multiplied by the number of disks in the array.
RAID1 or 10 overcomes that risk by offering a mirror, an instant backup in case of failure at high cost.
RAID3, 5 or 6 offers protection for disk failure by reconstructing the lost data in the background (1 disk for RAID3 & 5, 2 disks for RAID6) while continuing your work. This is even enhanced by the use of hot-spares (a double assurance).
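The cost of each kind of redundancy can be put side by side with a small Python sketch. It lists usable capacity and the number of disk failures each level is guaranteed to survive, assuming equal-sized disks; the function name and the example disk counts are assumptions for illustration.

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable capacity in TB, guaranteed survivable disk failures)."""
    if level == "RAID0":
        return disks * disk_tb, 0
    if level == "RAID1":
        return disk_tb, disks - 1
    if level in ("RAID3", "RAID5"):
        return (disks - 1) * disk_tb, 1
    if level == "RAID6":
        return (disks - 2) * disk_tb, 2
    if level == "RAID10":
        # A second failure is only survived if it hits a different mirror.
        return (disks // 2) * disk_tb, 1
    raise ValueError(f"unknown level: {level}")

examples = [("RAID0", 4), ("RAID1", 2), ("RAID3", 4),
            ("RAID5", 4), ("RAID6", 4), ("RAID10", 4)]
for level, disks in examples:
    capacity, failures = raid_summary(level, disks, 1)
    print(f"{level:7s} {disks} x 1 TB: {capacity} TB usable, survives {failures} failure(s)")
```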
PERFORMANCE
RAID0 offers the best performance increase over a single disk, followed by RAID3, then RAID5 and finally RAID6. RAID1 does not offer any performance increase.
Hardware RAID controllers offer the best performance and the best options (like adjustable block/stripe size and hot-spares), but they are costly.
SUMMARY
If you only have 3 or 4 disks in total, forget about RAID. Set them up as individual disks, or the better alternative, get more disks for better redundancy and better performance. What does it cost today to buy an extra disk when compared to the downtime you have when a single disk fails?
If you have room for 4 or more disks, apart from the OS disk, consider a RAID3 if you have an Areca controller, otherwise consider a RAID5.
If you have even more disks, consider a multilevel array by striping a parity check array to form a RAID30, 50 or 60.
If you can afford the investment, get an Areca controller with battery backup module (BBM) and 2 GB of cache. Avoid software RAIDs as much as possible, especially under Windows.
RAID, if properly configured, will give you added redundancy (or security) to protect you from disk failure while you can continue working, and will give you increased performance.
Look carefully at this chart to see what a properly configured RAID can do to performance and compare it to the earlier single disk chart to see the performance difference, while taking into consideration that you can have one disk (in each array) fail at the same time without data loss:
Hope this helps in deciding whether RAID is worthwhile for you.
WARNING: If you have a power outage without a UPS, all bets are off.
A power outage can destroy the contents of all your disks if you don't have a proper UPS. A BBM may not be sufficient to help in that case.
Another well written article, Harm.
Dear Harm
For months I have been reading all your posts, comments and articles. Congratulations, very well done and very, very interesting; they give "non-professionals" useful and helpful information. Thank you for spending your time on the community. I don't think users can learn more anywhere else.
If your time allows, may I hear your opinion?
My system:
Vista Ultimate 64-bit, ProdPremium 4.1
I use HDV MPEG-2 files captured at 1440x1080.
drive C: 1 TB Samsung 7200 rpm ... only used for the programs
drive D: 1 TB Samsung 7200, Raid0 ... only used for the project and all project files
drive E: 1 TB Samsung 7200 ... only used for daily backups of drive D
After the project is finished, I export the project, all used and unused files and the encoded results to a QNAP server, 8 TB Raid5.
Am I on a safe road, or what should I change? My number one priority is to have as much speed as possible.
Having read your posts and looked at the benchmarks listed, I will go on to rebuild the "Harm's Best" Vista 64-bit system.
Maybe you will allow me to ask you a question or two about this later.
Again, best regards from klfi, Austria, Europe
If I understand you correctly, you have two 1 TB disks and two 500 GB disks, the latter being in a Raid0, correct?
I would change your disk allocation, with the current number of disks, as follows:
C: 500 GB for OS & Programs
D: 500 GB for pagefile, media cache, previews
E: 1 TB for project files, indexed and conformed/peak files
F: 1 TB for media
No raid at the moment, unless you are willing to get more disks.
This setup will better use the space on C, which is currently largely wasted. It also separates project files from media files. You can use the space on the D drive to make backups of the project files from E, until you move everything over to the QNAP.
BTW, which QNAP do you have, the TS809 Pro and is it populated with 1 TB disks?
C: 500 GB for OS & Programs
D: 500 GB for pagefile, media cache, previews
E: 1 TB for project files, indexed and conformed/peak files
F: 1 TB for media
I would change that up slightly myself.
C: 500 GB for OS & Programs (leave the page file alone)
D: 500 GB for Project and all Scratch files
E: 1 TB for Media
F: 1 TB for Exports
Jim,
First, it is better to use a fixed pagefile (min=max) on a separate disk to avoid file fragmentation and improve performance. Second, for exports disk performance is completely irrelevant; you could export to any USB device without penalty, and using 1 TB just for non-time-critical writes (the CPU is what is critical) seems like a waste of space and resources. Performance-wise you are better off exporting either to the project file or to an external disk.
Harm,
thanks for your comment.
Your understanding was absolutely right.
Sorry, my mistake: it is a QNAP 639 Pro, populated with five 1 TB disks; one bay is empty.
So, for my understanding: in my configuration you suggest NOT to use RAID0. I'm not willing to have more drives in my workstation, because when my projects are finished I archive them on the QNAP or on another external drive.
My only intention is to have as much speed and performance as possible while developing a project.
BTW, I also use the QNAP as a media center in combination with a Sony PS3 to play the encoded files.
For my final understanding:
C: I understand
D: I understand
E and F: does it mean that when I create a project on E, all my captured and project-used MPEG files should be located on F? Or which media on F do you mean?
Following your suggestions, I want to rebuild the Harm's Best Vista64 benchmark computer to reach maximum speed and performance. Can I in general use those hardware components (except for so many HD drives and the Areca RAID controller) with my drive configuration C to F? Or would you suggest some changes in my situation?
I suggest NOT to use Raid with the number of disks you have.
Yes, with Media on the F drive I mean all your captured material and your projects on E.
If you say you want no more internal disks, but also want to achieve results somewhat comparable to my benchmarks, you are in for a disappointment. My results are rather exceptional just because I use a 12 disk raid30 on an Areca controller. That is the most important difference between other scores and my score.
The results Bill showed above are very impressive, especially for a single disk: nearly 180 MB/s average read rate. Compared to my average read rate of 853 MB/s that is still somewhat off, but that is only because of my massive array. Had I had those disks, I guess I would have had over 1,000 MB/s in a similar array, maybe even more, although the limitations of the PCI-e bus come into the picture as well.
With PCIe there has been an improvement: most motherboards that are less than a year old now support PCIe version 2.0, which doubled the initial rate, and some day we will have PCIe version 3.
Here is some information on transfer speeds:
Capacity Per lane:
* v1.x: 250 MB/s
* v2.0: 500 MB/s
* v3.0: 1 GB/s
16 lane slot (x16):
* v1.x: 4 GB/s
* v2.0: 8 GB/s
* v3.0: 16 GB/s
The Areca controllers that are out now are only 8 lanes (x8) wide, so yes, the present generation of Areca boards would definitely be bottlenecked at 1 GB/s, but I think the Intel IOP 341 chip would also be a bottleneck.
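For other slot widths, the per-lane figures listed above can simply be multiplied by the number of lanes. A small Python sketch of that arithmetic follows; it gives theoretical ceilings only, and the function name is an assumption. Real throughput will be lower because of protocol overhead and the controller's own processor, which this does not model.

```python
# Per-lane rates in MB/s, taken from the list above
PER_LANE_MB_S = {"1.x": 250, "2.0": 500, "3.0": 1000}

def slot_bandwidth_mb_s(version, lanes):
    """Theoretical bandwidth of a PCIe slot: per-lane rate times lane count."""
    return PER_LANE_MB_S[version] * lanes

for version in ("1.x", "2.0", "3.0"):
    for lanes in (8, 16):
        print(f"PCIe v{version} x{lanes}: {slot_bandwidth_mb_s(version, lanes)} MB/s")
```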
Dear Harm
dear other authors,
thanks a lot for your comments
I will follow your suggestions, which means in my system:
C: 500 GB for OS & Programs
D: 500 GB for pagefile, media cache, previews
E: 1 TB for project files, indexed and conformed/peak files
F: 1 TB for media
For my general understanding:
Which drive in general has to be the fastest to get the best speed and performance while working in a project?
Which one is the most important for performance and which is not? Can you explain the order in a few words, and why?
best regards klfi
C: 500 GB for OS & Programs
D: 500 GB for pagefile, media cache, previews
E: 1 TB for project files, indexed and conformed/peak files
F: 1 TB for media
Harm, where is any redundancy in this setup? Seems to me no more secure from data loss than a single, large drive! For performance, it's good...but for security, it's not much.
Why not a seemingly more secure and faster solution, like:
C: small SSD for OS & Programs (with built-in RAID1)
D: small SSD for pagefile, media cache, previews (RAID0 or no RAID)
E: larger HDD RAID5 for project files, indexed and conformed/peak files and all the media.
Seems to be only a little more expensive, yet potentially much faster and much more secure.
Where are the holes in my logic? One HDD less, less power draw and it'd all fit in a 4 bay case.
Thanks sarmour2 ,
My priorities for my system are not backups and security, because those are done regularly and automatically every day, on external drives or the QNAP, outside working hours.
My number one priority is only speed and performance while working in a project, as fast as possible.
Files which I captured and which are not used in a project are parked on an external drive or the QNAP. They will be imported when they are needed for a project. On the external drives or the QNAP, speed and performance are secondary and not necessary.
For me it is better to wait a few minutes for an import than to have no speed and performance during project work. If you follow my thoughts, would you stay with your opinion or would you change something?
Thanks for your thoughts, klf
Klfi, I guess for your setup, as Harm observed, what you stated seems right for what you have. Your workflow is very different from ours though.
Harm, I guess the best thing at the moment is to wait out the bleeding edge and see where the SSDs are in a couple of months. From what I've read, they have already resolved (or are very close to finalizing) those problems. If they remove that barrier, it would seem a very good setup for a new ws and something we may try early next year. One of our quad Intel boards is rolling past the 2 yr mark and needs to be replaced soon.
Since the controller would only be relevant for the RAID1 boot/mirror and for the RAID3 or 5 setup, which Areca board would be a good tradeoff of cost/benefit/performance? I know you like their equipment.
I personally would not try to pick out an Areca controller now for a future application. Current boards are relatively old designs and new designs are most likely imminent. They need to upgrade the designs to accommodate PCIe version 2 and SATA 6 Gbit/s to keep up with current technology and the competition.
Stephen,
There is one problem with your approach: KLFI has expressly stated he does not want any more internal disks in his system, so he has to do with the four disks he has. I only suggested a different allocation of the disks to the various tasks.
SSDs are still in their infancy and initial benchmark results do not show significant performance gains from using SSDs. Theoretically you are quite correct that your setup may be faster with the SSDs and a dedicated Raid5 (or Raid3, which may be even better), but that may mean that in addition to a number of SSDs, KLFI would also need to get at least 2 or 3 extra 1 TB disks and possibly a good controller. The sum total of that investment may be prohibitive for his workflow, since he already has the QNAP.
A practical limitation at this moment is that Intel is still working on the required firmware upgrade to fully support the trim function, which is essential to limit the write performance degradation that SSDs were plagued with. It may take 1 or 2 months to be released.
Great article on RAID. But I have another suggestion for those with very deep pockets. How about a 1 TB SSD in a 3.5-inch form factor for a mere $3500? It has a read rate of up to 260 MB/s, a write rate of up to 260 MB/s and a sustained write rate of up to 230 MB/s.
Imagine that in a 12 disk RAID30....
Harm Millaard wrote:
Imagine that in a 12 disk RAID30....
Areca might have to do some new homework to make it effective.
I mentioned to Harm in a PE that I was very impressed with results of the "Poath Junction" PPBM4 AVI encoding score with just two disk drives. I assumed that they were the newer Seagate 15K.7 drives. I ordered one and here are the HDtach results:
The PPBM results are very much like the results I get for two Seagate 7200.12 1 TB disks in RAID 0
All this RAID talk reminded me of a video with 24 SSD in raid.
http://www.youtube.com/watch?v=96dWOEa4Djs
Fun to see how fast everything ran.
Enjoy: Glenn
I'm not sure this is the right place, but it seems you have the knowledge, so I'll try:
I'm considering 3 disks of 2TB from the following:
Hitachi Deskstar 7K2000 SATA II - 2.0 TB
Seagate Barracuda LP SATA II - 2.0 TB
WD Caviar Green SATA II - 2.0 TB (32MB)
They are listed in ascending order of price.
I wanted to know if there are any recommendations and/or if experience shows that some of them tend to fail.
10x
Green is unsuitable for editing.
Seagate Barracuda is a disaster if you are talking about the 7200.11 series; the 7200.12 is better.
Hitachi I don't know.
For single disk use look at the WD Caviar Black, for RAID usage look at the WD RE4.
This is so great! Yesterday, I had decided I needed to learn more about RAID and whether or not I was taking full advantage of it...
I did a little search and Voilà!
Wonderful article - right there waiting for me.
Thanks Harm. I really appreciate it.
Jessica Vecchione
Hi,
I followed your advice and configured my media system with WD Black 2TB disks, but the data transfer still seems slow...
As everything else worked well with your configuration and my MacBook Pro, I have to conclude that the problem is the external case.
I use a Raidsonic dual bay with USB 2.0.
Can you recommend any other external case (2-4 bay) with FireWire 800 that I could use (as eSATA to a MacBook does not work, to my understanding)?
Cheers!
The slowness is caused by using USB2, that is about the worst connection to use when you want speed.
Isn't there a way to get eSATA on a Mac by using a PCI Express card with an eSATA connector? Or would this be your solution: http://www.g-technology.com/products/g-raid.cfm
Harm, just a comment on that particular 4 TB external drive with the two big Hitachis. I have a new one on a workstation (Win 7 x64) with the eSATA connection and am not really impressed so far. It tested at 160 MB/s with HD Tach, but in real life it seems to be much slower. Transfers to it from another internal 1 TB SATA drive were only 64 MB/s, which is much slower than going from my other internal drives to each other, or across our 1 Gb network.
I'm not sure if something else is wrong somewhere in this setup, but he probably should get other opinions before deciding. If it's something else on this particular ws, then it still could be a very good solution. But it would be good to hear from others on that drive too...