Raid Performance and Rebuild Issues

January 17, 2012
5 replies
22261 views

Rebuilding a Raid array

What happens when you have a Raid array and one (or more) disk(s) fail?

First let's consider the work-flow impact of using a Raid array or not. You may want to refresh your memory about Raids, by reading Adobe Forums: To RAID or not to RAID, that is the... again.

Sustained transfer rates are a major factor in determining how 'snappy' your editing experience will be when editing multiple tracks. For single track editing most modern disks are fast enough, but when editing complex codecs like AVCHD, DSLR, RED or EPIC, when using uncompressed or AVC-Intra 100 Mbps codecs, or using multi-cam or multiple tracks the sustained transfer speed can quickly become a bottleneck and limit the 'snappy' feeling during editing.

For that reason many use raid arrays to remove that bottleneck from their systems, but this also raises the question:

What happens when one or more of my disks fail?

Actually, it is simple. Single disks or single level striped arrays will lose all data. And that means that you have to replace the failed disk and then restore the lost data from a backup before you can continue your editing. This situation can become extremely bothersome if you consider the following scenario:

At 09:00 you start editing and you finish editing by 17:00 and have a planned backup scheduled at 21:00, like you do every day. At 18:30 one of your disks fails, before your backup has been made. All your work from that day is lost, including your auto-save files, so a complete day of editing is irretrievably lost. You only have the backup from the previous day to restore your data, but that can not be done before you have installed a new disk.

This kind of scenario is not unheard of and even worse, this usually happens at the most inconvenient time, like on Saturday afternoon before a long weekend and you can only buy a new disk on Tuesday...(sigh).

That is the reason many opt for a mirrored or parity array, despite the much higher cost (dedicated raid controller, extra disks and lower performance than a striped array). They buy safety, peace-of-mind and a more efficient work-flow.

Consider the same scenario as above and again one disk fails. No worry, be happy!! No data lost at all and you could continue editing, making the last changes of the day. Your planned backup will proceed as scheduled and the next morning you can continue editing, after having the failed disk replaced. All your auto-save files are intact as well.

The chances of two disks failing simultaneously are extremely slim, but if cost is no object and safety is everything, some consider using a raid6 array to cover that eventuality. See the article quoted at the top.

Rebuilding data after a disk failure

In the case of a single disk or striped arrays, you have to use your backup to rebuild your data. If the backup is not current, you lose everything you did after your last backup.

In the case of a mirrored array, the raid controller will write all data on the mirror to the newly installed disk. Consider it a disk copy from the mirror to the new disk. This is a fast way to get back to full speed. No need to get out your (possibly older) backup and restore the data. Since the controller does this in the background, you can continue working on your time-line.

In the case of parity raids (3/5/6) one has to make a distinction between distributed parity raids (5/6) and dedicated parity raid (3).

Dedicated parity, raid3

If a disk fails, the data can be rebuilt by reading all remaining disks (all but the failed one) and writing the rebuilt data only to the newly replaced disk. So writing to a single disk is enough to rebuild the array. Two cases affect how a degraded array is rebuilt. If the dedicated parity drive failed, rebuilding is a matter of recalculating the parity info (relatively easy) by reading all remaining data and writing the parity to the new dedicated disk. If a data disk failed, then the data needs to be rebuilt from the remaining data and the parity, and this is the most time-consuming part of rebuilding a degraded array.
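As a rough illustration of the two cases, here is a minimal Python sketch that assumes the parity is a plain byte-wise XOR of the data blocks, which is the usual scheme for single-parity raids. The block contents and the helper name xor_blocks are made up for the example, and it only shows the arithmetic; it says nothing about how long either case takes on a real controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three data disks, plus one dedicated parity disk.
# The contents are arbitrary placeholder bytes.
data_disks = [b"clip", b"0001", b".mxf"]
parity_disk = xor_blocks(data_disks)

# Case 1: the parity disk failed. Recalculate parity from the data disks.
rebuilt_parity = xor_blocks(data_disks)

# Case 2: data disk 1 failed. XOR the surviving data blocks with the parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt_data = xor_blocks(survivors)

print(rebuilt_parity == parity_disk, rebuilt_data == data_disks[1])
```

In both cases every surviving disk is read and only the replacement disk is written.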

Distributed parity, raid5 or raid6

If a disk fails, the data can be rebuilt by reading all remaining disks (all but the failed one), rebuilding the data, recalculating the parity information and writing both the data and the parity information to the replacement disk. This is always time-consuming.
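To see why a replacement disk in a distributed parity array receives both data and parity, the sketch below prints a simple rotating parity layout for a 4-disk raid5. The rotation rule in parity_disk_for_stripe is an assumption chosen for illustration; real controllers use their own layouts.

```python
def parity_disk_for_stripe(stripe, num_disks):
    """Disk index holding parity for a stripe (assumed simple rotation)."""
    return (num_disks - 1 - stripe) % num_disks

def layout(num_stripes, num_disks):
    """Return one row per stripe, marking each disk as D (data) or P (parity)."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk_for_stripe(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows

rows = layout(num_stripes=4, num_disks=4)
for stripe, row in enumerate(rows):
    print(stripe, " ".join(row))

# What a replacement for disk 0 would have to hold, stripe by stripe.
disk0_blocks = [row[0] for row in rows]
print(disk0_blocks)
```

Whichever disk fails, its replacement ends up with a mix of rebuilt data blocks and recalculated parity blocks.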

The impact of 'hot-spares' and other considerations

When an array is protected by a hot spare and a disk drive in that array fails, the hot spare is automatically incorporated into the array and takes over for the failed drive. When an array is not protected by a hot spare and a disk drive fails, you have to remove and replace the failed disk drive yourself. The controller then detects the new disk drive and begins to rebuild the array.

If you have hot-swappable drive bays, you do not need to shut down the PC, you can simply slide out the failed drive and replace it with a new disk. Remember, when a drive has failed and the raid is running in 'degraded' mode, there is no further protection against data loss, so it is imperative that you replace the failed disk at the earliest moment and rebuild the array to a 'healthy' state.

Rebuilding a 'degraded' array can be done automatically or manually, depending on the controller in use and often you can set the priority of the rebuilding process higher or lower, depending on the need to continue regular work versus the speed required to repair the array to its 'healthy' status.
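The trade-off between rebuild priority and rebuild time can be estimated with simple arithmetic. The sketch below assumes the rebuild is limited by how fast the replacement disk can be written, and the 1.5 TB capacity and the 100 and 25 MB/s rates are purely hypothetical values, not measurements from any controller.

```python
def rebuild_hours(disk_capacity_gb, rebuild_rate_mb_s):
    """Rough time to write one full replacement disk at a steady rate."""
    seconds = disk_capacity_gb * 1000 / rebuild_rate_mb_s
    return seconds / 3600

# Hypothetical rates: a high rebuild priority versus a low one
# that leaves most of the bandwidth for editing.
high_priority = rebuild_hours(1500, 100)
low_priority = rebuild_hours(1500, 25)

print(f"high priority: {high_priority:.1f} h, low priority: {low_priority:.1f} h")
```

With these assumed numbers the array stays degraded four times longer at the low priority, which is the period without protection against a further failure.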

What are the performance gains to be expected from a raid and how long will a rebuild take?

The most important column in the table below is the sustained transfer rate. It is indicative and no guarantee that your raid will achieve exactly the same results. That depends on the controller, the on-board cache and the disks in use. The more tracks you use in your editing, the higher the resolution you use, the more complex your codec, the more you will need a high sustained transfer rate and that means more disks in the array.

Sidebar: While testing a new time-line for the PPBM6 benchmark, using a large variety of source material, including RED and EPIC 4K, 4:2:2 MXF, XDCAM HD and the like, the required sustained transfer rate for simple playback of a pre-rendered time-line was already over 300 MB/s, even with 1/4 resolution playback, because of the 4:4:4:4 full quality deBayering of the 4K material.
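To get a feel for how quickly expanded 4:4:4:4 frames add up, here is a back-of-the-envelope sketch. The frame size (4096 x 2160), 4 bytes per channel (32-bit float) and 24 fps are assumptions for illustration only, and 1/4 resolution is taken to mean a quarter of the width and a quarter of the height. It gives the size of the expanded frames, not a measured disk load.

```python
def frames_mb_per_s(width, height, channels, bytes_per_channel, fps):
    """Data rate of uncompressed frames in MB/s (1 MB = 1,000,000 bytes)."""
    return width * height * channels * bytes_per_channel * fps / 1e6

# Assumed: 4 channels (BGRA), 32-bit float per channel, 24 frames per second.
full = frames_mb_per_s(4096, 2160, 4, 4, 24)
quarter = frames_mb_per_s(4096 // 4, 2160 // 4, 4, 4, 24)

print(f"full resolution: {full:.0f} MB/s, 1/4 resolution: {quarter:.0f} MB/s")
```

Under these assumptions a single 4K stream at 1/4 resolution already comes to a few hundred MB/s once expanded.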

Final thoughts

With the increasing popularity of file based formats, the importance of backups of your media cannot be stressed enough. In the past one always had the original tape if disaster struck, but no longer. You need regular backups of your media and projects. With single disks and (R)aid0 you take risks of complete data loss, because of the lack of redundancy. Backups cost extra disks and extra time to create and restore in case of disk failure.

The need for backups in case of mirrored raids is far less, since there is complete redundancy. Sure, mirrored raids require double the number of disks but you save on the number of backup disks and you save time to create and restore backups.

In the case of parity raids, the need for backups is greater than with mirrored arrays, but less than with single disks or striped arrays, and with 'hot-spares' the need for backups is further reduced. Initially, a parity array may look like a costly endeavor. The raid controller and the number of disks make it expensive, but consider what you get: more speed, more storage space, easier administration, fewer backups required, less time for those backups, and continued working in case of a drive failure, even though somewhat sluggish. Together with the peace-of-mind it brings, that is often worth the cost compared to continuing with single disks or striped arrays.


Simo90

Participant

Have you updated the chart you gave here (http://forums.adobe.com/thread/662972?start=0&tstart=0) that lists how you would configure disks if someone only had 1 disk, then 2 disks, then 4 disks, etc.? I edit HD video once or twice a month, so I don't want a system that is really slow, but I also don't need a lightning-fast one. I'm thinking about getting maybe 3 or 4 hard drives . . . maybe five.

UlfLaursen

Inspiring

Hi Brad

There are several recent threads regarding disc setup, for example this one:

http://forums.adobe.com/thread/1190731?tstart=0

I think the bottom line or summary is to spread the different things out over several discs to gain performance, avoid so-called "green" discs and external discs (if you are on a desktop), and get discs as fast and big as you can afford.

Ulf

nofxspam02

Participant

Drive Failure: I was wondering if anyone could help me with my RAID 10 setup. This build was my first time ever using RAID. I bought the hdd in August 2011. I bought 4 HGST Deskstar 7K3000 HDS723015BLA642 (0F12114) 1.5TB 7200 RPM 64MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drives.

This morning I turned on the computer and had a S.M.A.R.T. warning on boot. When I booted the computer, the Intel® Rapid Storage Technology application popped up with a warning sign on the first disc in my array. It doesn't offer any details when I click on Manage. Port: Unknown, Status: At risk Reset disk to normal, Type: Hard Disk, Usage: Array disk, Size: 0MB, Serial number: (has my serial), Model: (blank), Firmware: (blank).

Under advanced: Password protected: No, Disk data cache: Enabled, Native command queuing: No, SATA transfer rate: Inactive, Physical sector size: 512 Bytes, Logical sector size: 512 Bytes.

So now basically I need to know how to proceed. I thought in RAID you needed to have the exact same hard drives. The hard drive is no longer available for purchase. I heard Hitachi's are weird sizes so getting another brand with same specs may be an issue?

Can anyone help me out please?

StarMarc

Participating Frequently

Harm, as I'm still trying to understand RAID as it relates to video editing, please correct me (or anyone, for that matter) if I'm wrong.

RAID 3 writes parity (data it calculates from existing incoming data, and uses in a rebuild) to a dedicated disk and RAID 5 writes parity in distribution across all disks. Essentially they are doing the same things but in different ways.

From your chart, RAID 3 has a 5% advantage over RAID 5 in speed, yet both have the same capacity and both tolerate 1 disk failure. So why would anyone choose RAID 5 over 3??

In addition, provided everything was the same, would RAID 3 have a faster rebuild time than RAID 5???

When I do build my system (just waiting for the 3930k's to finally be in stock), I will definitely run these sort of scenarios for my own reference, but would love your insight into this.

ALSO, I've found the Canon 5D's data rate to be around 5MB/sec, and the CineForm codec I transcode it to is around 20MB/sec. Does this mean even a normal SATA 2 HDD sustained transfer speed of 128MB/sec is more than enough to handle this type of source footage?? Or is it not as simple as that??

Harm_Millaard (Author)

Inspiring

Raid3 is better suited for video editing work, because it is more efficient when using large files, as clips usually are. Raid5 is better suited in high I/O environments, where lots of small files need to be accessed all the time, like news sites, webshops and the like. Raid3 will usually have a better rebuild time than raid5.

But, and there is always a but, raid3 requires an Areca controller. LSI and other controller brands do not support raid3. And Areca is not exactly cheap...

Keep in mind that a single disk shows declining performance when the fill rate increases. See the example below:

A Raid3 or Raid30 will not show that behavior. The performance remains nearly constant even if fill rates go up:

Note that both charts were created with Samsung Spinpoint F1 disks, an older and slower generation of disks and with an older generation Areca ARC-1680iX-12.

StarMarc

Participating Frequently

Not necessarily. In most (but not all) cases, the smaller-capacity hard drives tend to be older, lower-density designs that deliver significantly slower sequential transfer speeds than their larger-capacity, higher-density siblings or cousins. You see, both drives use a single platter - but if both the 250GB and the 500GB drives use the entire area of both sides of the platter, the 250GB drive will be significantly slower in sequential speed than its 500GB cousin (only about 90 MB/s versus 130 MB/s). Put together, the six-disk array of the older 250GB drives will be only slightly faster than the four-disk array of newer-design 500GB drives, but you'll lose significant total capacity - only about 1.25TB total with the six-disk array of 250GB disks versus 1.5TB with the four-disk array of 500GB disks.

Got it, thanks. Following your logic (and what I've learned from reading my computer building books), that's why the higher density disks like 2TB HDDs have higher sustained transfer speeds... I didn't realize the speed difference was that great between 250 and 500 though.
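The numbers in the quoted comparison can be checked with a little arithmetic. This sketch assumes a single-parity array (raid3 or raid5, one disk's worth of capacity lost to parity) and an idealized sequential rate that scales linearly with the number of data disks; the function names are made up for the example and real arrays will come in lower.

```python
def usable_tb(num_disks, disk_gb, parity_disks=1):
    """Usable capacity in TB after setting aside the parity disks."""
    return (num_disks - parity_disks) * disk_gb / 1000

def ideal_rate_mb_s(num_disks, per_disk_mb_s, parity_disks=1):
    """Idealized sequential rate: per-disk speed times the data disks."""
    return (num_disks - parity_disks) * per_disk_mb_s

# Six older 250GB disks at about 90 MB/s each.
old_array = (usable_tb(6, 250), ideal_rate_mb_s(6, 90))
# Four newer 500GB disks at about 130 MB/s each.
new_array = (usable_tb(4, 500), ideal_rate_mb_s(4, 130))

print(old_array, new_array)  # (1.25, 450) (1.5, 390)
```

So the six-disk array is only modestly faster on paper (450 versus 390 MB/s) while offering less space (1.25TB versus 1.5TB).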

illucine

Inspiring

Great article Harm! Thanks for your work on this. I have a couple of questions.

1) You say:

Sustained transfer rates are a major factor in determining how 'snappy' your editing experience will be when editing multiple tracks. For single track editing most modern disks are fast enough, but when editing complex codecs like AVCHD, DSLR, RED or EPIC, when using uncompressed or AVC-Intra 100 Mbps codecs, or using multi-cam or multiple tracks the sustained transfer speed can quickly become a bottleneck and limit the 'snappy' feeling during editing.

I thought that complex codecs actually have a lower data rate and thus should not require any faster sustained transfer speed. The "complexity" of them requires a faster CPU to decode in real-time but I don't understand why they would need a faster disk system. I do understand why uncompressed or lightly-compressed video would require a faster disk system.

2) I'm curious where you got the 5% risk of data loss number for single disks.

Roy

Harm_Millaard (Author)

Inspiring

1. Take for instance a RED 4K clip. It already requires a huge bandwidth to process with that resolution, but that is made much worse by the on-the-fly expansion to 4:4:4:4 BGRA or BGRX format, effectively multiplying the storage requirements. Take note of my sidebar in the article:

Sidebar: While testing a new time-line for the PPBM6 benchmark, using a large variety of source material, including RED and EPIC 4K, 4:2:2 MXF, XDCAM HD and the like, the required sustained transfer rate for simple playback of a pre-rendered time-line was already over 300 MB/s, even with 1/4 resolution playback, because of the 4:4:4:4 full quality deBayering of the 4K material.

2. An arbitrary number I chose, although 1 out of 20 giving a failure does not seem unreasonable. With the Seagate 7200.11, a notoriously bad disk, I got 12 failures out of 7 disks (5 were exchanged under warranty, but the replacements failed again), so you could say a 100% failure rate or even a 171% failure rate. In comparison, 5% is very low.
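For what it is worth, here is how such a per-disk figure translates to a striped array, where losing any one disk loses everything. The sketch takes the arbitrary 5% as the chance that a single disk fails and assumes disks fail independently of each other, which is a simplification.

```python
def p_any_disk_fails(p_single, num_disks):
    """Chance that at least one of num_disks fails, assuming independence."""
    return 1 - (1 - p_single) ** num_disks

# 5% is the arbitrary per-disk figure discussed above.
for n in (1, 2, 4, 8):
    print(n, f"{p_any_disk_fails(0.05, n):.1%}")
```

Under that assumption the risk of total data loss grows with every disk added to a raid0.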

illucine

Inspiring

Thanks for the clarification. I would have thought 4:4:4:4 expansion would be written to the cache disk, although if you are just using one big fast volume of RAID for everything then I guess it would be the same disk. I'm still not sure why something like AVCHD would require greater disk bandwidth than something like HDV.

StarMarc

Participating Frequently

Very cool stuff and again a great tutorial!

I was wondering if you (or anyone else for that matter) could give some general Sustained Transfer Rate guidelines in regards to editing popular codecs comfortably (i.e. in another thread someone mentioned 4K needing at least 700+ MB/sec).

For example in my case, I edit mostly Canon 5D/7D footage. So when building a RAID within a certain budget, what sort of Sustained Transfer Rate should I be shooting for??? At what speed threshold would you NOT see noticeable improvement? (For example, would 1000 MB/sec be overkill for DSLR footage and the money be better spent on more RAM?)

So for example (again), if the answer is 300+ MB/sec, then I know, based on your tables above, how to calculate the necessary variables to achieve this target plus provide for redundancy.

Finally, in your experience/opinion, how much more improvement in performance do you see between 512 MB of RAID controller cache and say 2GB?? 50% faster? 25%???

Thanks

Harm_Millaard (Author)

Inspiring

Marc,

Very valid questions, but to answer them requires quite an effort. Even if one were to limit themselves to single source material per test, say only HDV or only AVCHD or only RED 4K, it would require tests with 1 track up to say 7 tracks to determine the necessary transfer rates. That means 21 runs for these three formats, better run several times to avoid measurement errors, so in all it will probably be more like 63 or 84 runs. Then collecting the data and presenting them in a handy format, with all the caveats that go with such experiments and you can imagine the time and effort that would go into this. Nevertheless, your request is well noted and when I have the time, I'll keep it foremost in my mind.

I can tell you the performance gain of a 2 GB cache over a 512 MB cache is no more than around 10%. I haven't tested a 4 GB cache, but would expect the gain to be around 15% overall compared to 512 MB, and at a cost of € 40 for that memory it appears very worthwhile to me. In the past, when DDR2-ECC 667 or 800 was used, it was way more expensive, but now it only requires less expensive and faster DDR3-ECC 1333 memory, so it looks much more attractive than it did.
