Why Modern RAID 5 is Ideal for Oracle Databases

There is a school of thought among Oracle DBAs that databases should never be installed on disks configured into a RAID 5 array. The argument goes that since Oracle reads from and writes to random points within relatively large files, the overhead of constantly calculating block-level parity on those files is substantial, resulting in serious performance degradation. They suggest that RAID 1 (mirroring) is the ideal disk configuration, since no parity needs to be calculated, and Oracle is more than happy to divide its database over many smaller mount points.
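That objection can be made concrete by counting physical I/Os. Below is a minimal sketch using the textbook read-modify-write model for small random writes; it deliberately ignores controller caching, which is exactly the factor examined later in this article:

```python
# Textbook I/O counts for one small random write, with no controller
# cache assumed. These are model figures, not measurements.

def ios_per_random_write(raid_level: str) -> int:
    if raid_level == "raid1":
        return 2   # write the block to both mirror members
    if raid_level == "raid5":
        return 4   # read old data + old parity, write new data + new parity
    if raid_level == "raid10":
        return 2   # two mirrored writes; striping adds no parity work
    raise ValueError(raid_level)

for level in ("raid1", "raid5", "raid10"):
    print(level, ios_per_random_write(level))
```

By this model, RAID 5 issues twice the physical I/Os of a mirror for every small random write, which is the entire basis of the DBAs' objection.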

This way of thinking has largely been correct over the years because most systems have traditionally used software RAID. With software RAID, the server's own CPU has the job of performing all of those parity calculations, and it really did slow down both the server and the disks when RAID 5 configurations were used. Oracle, in particular, had a hard time with these configurations for exactly the reasons the DBAs point to.

In many cases, software RAID is still used, and in those environments it remains wholly inappropriate to deploy RAID 5. However, it is increasingly common to find IT departments using a SAN-type architecture where the RAID type and configuration are invisible to the host operating system. In these environments, the disk array has a dedicated controller that is singly tasked with handling all read, write, and parity operations. The RAID controller is no longer software running on a generic CPU, but rather firmware optimized to handle parity calculations. The result is a system where parity is calculated so quickly by the dedicated controller that any difference in speed between RAID 1 and RAID 5 should be virtually nonexistent.
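For reference, the parity being calculated is a plain XOR across the data blocks of each stripe, which is also what lets the array rebuild a failed member. A toy illustration (the two-byte "blocks" are obviously hypothetical):

```python
from functools import reduce

def parity(blocks):
    """XOR corresponding bytes of the stripe's blocks into a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

# A 4-drive RAID 5 stripe holds 3 data blocks plus 1 parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = parity([d0, d1, d2])

# If the drive holding d1 dies, XORing the surviving data blocks with
# the parity block reconstructs it.
recovered = parity([d0, d2, p])
assert recovered == d1
```

A dedicated controller does exactly this in firmware, one stripe at a time, which is why the per-write cost has become so small.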

To test this, I carved up our new Infortrend EonStor A12F-G2221 into three arrays: a RAID 5, a RAID 1, and a RAID 10. I then set out to run some benchmarks on these different arrays to see what, if any, differences there would be.

The hardware used was as follows:

  • The RAID 5 LUN consisted of 4 drives
  • The RAID 1 LUN consisted of 2 drives
  • The RAID 10 LUN consisted of 4 drives

I then identified the iozone tests that most accurately simulate Oracle disk activity. What I really wanted to do was simulate select and update queries on files of various sizes and see how the different RAID types held up under the load. To do this, I ran iozone, a well-respected benchmark utility, with the following arguments:

/opt/iozone/bin/iozone -Ra -g 2G -b /home/sysop/new/raid5-2G-1.wks

This put the disk through its paces, running the iozone tests in automatic mode on a 2 GB file. In the end, I was interested in analyzing the following tests, because they were the ones our DBA team suggested would most closely represent database activity.

Random Read (select queries)

This test measures the performance of reading a file with accesses made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors, such as the size of the operating system's cache, the number of disks, and seek latencies.

Random Write (update queries)

This test measures the performance of writing a file with accesses made to random locations within the file. Again, the performance of a system under this type of activity can be impacted by several factors, such as the size of the operating system's cache, the number of disks, and seek latencies.

Strided Read (more complex select queries)

This test measures the performance of reading a file with a strided access pattern. For example: read 4 Kbytes at offset zero, seek ahead 200 Kbytes, read another 4 Kbytes, seek ahead 200 Kbytes, and so on.
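The strided pattern can be sketched in a few lines. This toy version assumes the offset simply advances by one stride per read (iozone's exact stride mechanics may differ) and uses a small scratch file rather than the 2 GB benchmark file:

```python
import os
import tempfile

RECORD = 4 * 1024     # read 4 KB at each position...
STRIDE = 200 * 1024   # ...then jump ahead 200 KB, per the description above

# Small scratch file standing in for the real benchmark file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(2 * 1024 * 1024))   # 2 MB here, not 2 GB
    path = f.name

reads = 0
size = os.path.getsize(path)
with open(path, "rb") as f:
    offset = 0
    while offset + RECORD <= size:
        f.seek(offset)
        f.read(RECORD)
        reads += 1
        offset += STRIDE   # assumed: next record begins one stride later

os.remove(path)
print(f"{reads} strided reads of {RECORD} bytes each")
```

This access pattern is a reasonable stand-in for index range scans, where Oracle touches small blocks at widely spaced offsets.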

I ran several instances of the same tests using the same command line to ensure that there were no anomalies, and the machine was doing nothing else during the tests besides running the host OS. The results were pretty much as I expected, and I found little to no variation between the RAID types on this disk subsystem.

Random Read Tests:

In this test, there seems to be the slightest advantage to the mirror-type RAID arrays when it comes to very small files. This, I suspect, can be attributed to actual drive-head latency, as in RAID 5 volumes the correct block needs to be found across a larger number of disks. This advantage quickly falls off, however, as the file size grows, meaning that this slight edge would not be seen in an Oracle database.

Random Write Tests:

In this test, both RAID 5 and RAID 10 seem to hold a slight advantage over the direct mirror. This, I would imagine, can be attributed to the fact that the writes are spread over a larger number of spindles. It also indicates that the controller is calculating parity faster than the 2 Gb/s connection to the disk subsystem can deliver data. Again, the variation is incredibly small, so there is no arguable performance advantage to using one type of RAID over another when using a hardware controller.

Strided Read Tests:

Here again we see no real advantage to one RAID type over any other. It could be said that the RAID 10 volume held up ever-so-slightly better on this test, but any edge is so slight that it would be hard to imagine how this could translate into a noticeable performance gain in an Oracle database.

In the end, these tests confirmed my suspicion that hardware RAID controllers have become so efficient and fast that it no longer makes any real difference what type of RAID you decide to use for your Oracle database. Largely gone are the days when your disk space and RAID volumes were inexorably tied to the server itself. So long as you are using hardware RAID, and the LUNs are abstracted from your operating system, you can feel free to make the most of your storage dollar by using RAID 5 in your production database environments.

16 thoughts on “Why Modern RAID 5 is Ideal for Oracle Databases”

  1. Huh. As it turns out, the MySQL gurus recommend RAID 5:

    From a performance standpoint, RAID 5, which is striping (RAID 0) with distributed parity blocks, can be beneficial. There are two disks involved in every operation, so it’s not substantially faster than RAID 1 until you have more than three disks total. Even then, its other benefit, size, shines through. Using RAID 5, you can create rather large volumes without spending a lot of cash because you sacrifice only a single disk. By using more smaller disks, such as eight 36-GB disks instead of four 72-GB disks, you increase the number of spindles in the array and therefore boost seek performance and throughput.

  2. Hi,
    I just came across your blog entry as I was looking for a way to use IOZone to simulate Oracle. However, after running the command you suggested, I discovered that once IOZone sets the file size to 32 MB in the automated run, it starts using 64k record sizes and above and does not test 4k or 8k block sizes, which I believe are closer to the record sizes that Oracle typically uses. Any ideas?

  3. the real question is whether the author had write-caching enabled on his/her disks and his/her RAID controller. if so, he/she should try to replicate these results with caching disabled.

  4. The real problem is that there are really no conclusions that one can draw from this test. I/O transfer rates are completely independent of file size, and are strictly a function of block size and number of spindles.

    The fact that the graphs fall off at about 256MB indicates that all that is being measured is the Linux kernel buffer cache behavior, not the I/O subsystem behavior.

    iozone is not representative of Oracle in any way.

    Oracle generally uses raw devices on traditional Unix, and O_DIRECT on Linux, to bypass the kernel buffer cache entirely.

    There is no possible way to get 2.5Gbytes/sec from a pair of 2Gbits/sec fiber channel ports, so the graphs are either measuring buffer cache performance or are mislabeled.

  5. That is an interesting test, but I do have a question: what type of drives were being used? The Infortrend SAN, I believe, takes only SATA drives. The speed of a system is always defined by its slowest subsystem, and I agree that if you plan to use SATA drives, then RAID 5 or even RAID 6 will probably keep up with the speed of the drives, as the new controllers are highly optimized for RAID 6 performance. But put in 4Gb FC and faster FC or SAS (duplex) drives compared to SATA (single-channel) drives, and the bottleneck will, I believe, move back to the controller. I would love to see that test with concurrent reads and writes as well. On the other hand, this article did answer my concern about whether the Infortrend SAN would handle my I/O demands, and I guess it will. With cheap SAS-drive SANs available I'll stay with RAID 10 for the time being, but I am open to RAID 5. Still, like the old DBAs of the past when drives were expensive, I have seen database performance brought to its knees by RAID 5; once bitten, twice shy.

  6. Vic,

    That’s a very interesting point… The drives were, indeed, SATA, so it would be very interesting to see how the tests would have come out had they been faster FC drives over a 4GB link. We just got an EMC CX310, so perhaps I can set up a RAID 10 with some of our faster drives to see how things work out. Thanks for the comment.

  7. When you say that the RAID 5 LUN consisted of 4 drives and the RAID 10 LUN consisted of 4 drives, you’re not really comparing like with like.

    I would like to see you run a mixture of highly concurrent random reads and writes on the following setup:

    A RAID 5 LUN consisting of 3 drives
    A RAID 10 LUN consisting of 4 drives

    Then, halfway through your test, pull one disk out of the array and measure your performance thereafter.

    Whichever way you look at it, RAID 5 is cheap and nasty. You get what you pay for, and when RAID 5 goes into degraded mode your SLA goes down the toilet.

    For a fuller understanding of RAID 5 and why it is NOT ideal for Oracle databases, visit http://www.baarf.com/

  8. Richard,
    With all due respect, I think my understanding of RAID 5 is quite full enough, thank you. You make the point that a RAID 5 LUN consisting of 4 drives and a RAID 10 LUN consisting of 4 drives is not entirely apples to apples, because the RAID 5 is writing to 4 spindles while the RAID 10 is writing to 2. It is, however, a fair comparison, given that the same number of drives per LUN is used. In order to write to 4 drives in a RAID 10, I would need 8 drives, which only illustrates my point. Most storage experts agree that performance is gained by adding spindles, and unless you are doing massively random writes to small files, there is very little to be gained by using RAID 10 over RAID 5 in this day and age.

    As for degraded mode, you are correct, although higher-end arrays like the CLARiiON will fail over to your hot spare even while the failing drive is still generating errors, generally preventing you from ever running in degraded mode. Degraded mode should not be thought of as an operational norm, but rather as a way to prevent data loss. If your environment is such that you can’t afford the degraded speed while the hot spare is brought online, then by all means, use RAID 10.

    The days of S.A.M.E. (“Stripe And Mirror Everything”) are gone, and DBAs who still adhere to it as if it were the holy grail should step back and really think about the types of writes they need to do. This is not to say that RAID 10 does not have its place, but rather that the type of RAID should not be decided entirely by the application, but rather by the type of writes the application will be doing.

  9. Cliff,

    Thank you for your quick reply to my comments.

    My point was that a RAID 5 LUN consisting of 3 (three) drives and a RAID 10 LUN consisting of 4 drives IS comparing apples to apples because the RAID 5 is writing to 2 spindles while the RAID 10 is also writing to 2 spindles.

    I totally agree that more spindles equals better performance, but the temptation to use RAID 5 just to save a few disks is very naive. A RAID 1+0 LUN with the usable capacity of 8 disks requires 16 disks, while its RAID 5 counterpart requires only 9 for the same capacity. But disks are really cheap these days, so why not use the additional 7 disks and have excellent performance with or without losing a disk?
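    To put numbers on that claim (assuming equal-size drives, a 2x mirroring overhead for RAID 1+0, and one drive's worth of distributed parity for RAID 5):

```python
def drives_needed(usable_drive_equivalents: int, raid: str) -> int:
    """Drives required for the usable capacity of N drives; equal sizes assumed."""
    if raid == "raid10":
        return 2 * usable_drive_equivalents   # every data drive is mirrored
    if raid == "raid5":
        return usable_drive_equivalents + 1   # one drive's worth of parity
    raise ValueError(raid)

# The comment's example: the usable capacity of 8 drives.
print(drives_needed(8, "raid10"))  # 16
print(drives_needed(8, "raid5"))   # 9
```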

    Juan Loaiza summarises RAID 5 very eloquently at http://www.miracleas.com/BAARF/Juan_Loaiza_on_raid_V.html

    “If you look at all the vendors that implement RAID-5 you will find that they all set the stripe size to a relatively small value, usually 32K or 64K.
    The result of this is that a large sequential read (like 1M) will span a lot of disks (like 16).
    Because of this, a lot of disk spindles will be made busy for a single IO.
    In mixed mode scenarios where there are random IOs going on, it will slow down the whole IO subsystem by using up lots of disk drives.
    Why don’t they set the stripe size bigger?
    Because in normal file systems, people tend to write a whole file sequentially.
    The RAID-5 vendors want to take advantage of this sequential write to eliminate the RAID-5 penalty.
    Any time you can write across a full stripe set, you can avoid the extra reads and writes that are required for random IOs in RAID-5.
    This is because you can calculate the parity values directly without having to read the old ones from disk.
    Small stripe units also help to eliminate the RAID-5 penalty when a large NVRAM cache is used.
    Locality of reference is more likely to create a full stripe set of blocks all present in the cache if the stripe set is small.
    So, RAID-5 creates a tradeoff where you want to have small stripe units to avoid the RAID-5 penalty for sequential
    writes, but this hurts you any time you have a mixed workload environment in which there are some scans going on in parallel with random access OLTP activity.”
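    The full-stripe-write optimization Loaiza describes can be counted out directly. A sketch using the textbook no-cache model, with partial writes shown only for the single-block read-modify-write case:

```python
def raid5_write_ios(blocks_written: int, data_disks: int) -> int:
    """Physical I/Os for one RAID 5 write, textbook model, no cache assumed."""
    if blocks_written == data_disks:
        # Full-stripe write: parity is computed from the new data alone,
        # so no old blocks need to be read back first.
        return data_disks + 1          # N data writes + 1 parity write
    # Single-block read-modify-write: read old data + old parity,
    # write new data + new parity.
    return 4 * blocks_written

print(raid5_write_ios(4, 4))       # full stripe across 4 data blocks: 5 I/Os
print(4 * raid5_write_ios(1, 4))   # four scattered single-block writes: 16 I/Os
```

    This is the tradeoff in the quote: sequential writes that fill a stripe avoid the penalty entirely, while random single-block writes pay it in full.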

    You would be mistaken to believe that the days of S.A.M.E. “Stripe And Mirror Everything” are gone. Oracle’s ASM is a prime example of why S.A.M.E. is very much alive and here to stay.

  10. Lol, I can imagine a DBA would lose their job if they took your RAID 5 benchmark at face value.

    The biggest flaw of this benchmark: it doesn’t indicate how many concurrent logical users are making the queries/updates. You can keep crunching the block size for 1-10 users and see no difference in I/O performance for any database on RAID 5 vs RAID 10.

    Try to convince me with a multiuser benchmark.

  11. 1: You need to test a MUCH larger file size so you’re getting out of your memory cache. On a machine with 256Meg, that means something in the 1G or more range. Since your test is invalid and you didn’t seem to notice, I’d have a hard time taking advice from you.

    2: RAID-5 is NOT a match for RAID-10. I’ve run plenty of my own tests, and under heavy concurrent load RAID-10 with the same number of disks performs much much faster.

    3: I’m a pgsql DBA, have done some Oracle DBA work. Both benefit quite a bit from a good RAID-10 setup.

    • There are a lot of people who don’t seem to like what I’m saying, but nobody is posting numbers or graphs. Since you come to the table with nothing more than anecdotal reports of RAID 10’s superiority, I have a hard time taking advice from you.

      I think you’ll find that most modern storage subsystems make great use of cache coalescing in an attempt to make full-stripe writes. This is a benefit regardless of file size. In fact, RAID 5 often benefits more, since it generally spans more disks in real-world examples.

      My point here is that performance is much more dependent on storage subsystem type than it is on RAID type. It is wrong to say that RAID 5 is inappropriate for ALL Oracle databases.

  12. Sorry, but the burden of evidence is on you, the proclaimer that RAID-5 is just peachy keen to run Oracle on. I’ve been building big beefy database servers for about 10 years.

    With a big RAID-10 I can get incredible throughput and parallel performance, both read and write. With RAID-5 or RAID-6 I am lucky to get 20 to 30% of the same performance with the same number of drives. I’ve tested it; I know it.

    You have created an unrealistic test with too small a data set and refuse to redo your tests to see the error. That’s your fault, not ours. Any realistic test with a large enough dataset to not fit in RAM is going to show you just how bad RAID-5 is. For good measure, pull a drive from each RAID-5 and RAID-10 while testing, and start a rebuild to see how horrifically slow RAID-5 can really be.

    You are wrong that performance is more dependent on storage subsystem than RAID type. Last company I was at had a RAID-6 14 drive system under Solaris for Oracle. My 4 disk SATA RAID-10 routinely beat it soundly with Pgsql on top, both in synthetic benchmarks and in regular reporting applications.

    Make a realistic test, then come back. For now your lack of homework gets you a D-.

  13. Let me make this one point perfectly clear. Due to caching effects, the only parts of the graphs you have posted that matter are the bits to the right of the 256Meg mark, and those are so small and compressed as to be unreadable.

    You need to run tests on larger data sets and for longer periods so caching isn’t what you’re testing, but the RAID set itself.

    My machines have 32Gigs of memory, so the databases I test on are routinely 64G or larger to make sure I’m testing the actual drive subsystem where it counts, when the dataset is bigger than memory. It’s a very very very basic tenet of performance testing that your dataset has to exceed RAM to be useful, unless you’re planning on working with a dataset that will always be smaller than RAM. At which point you can store your data on tape drives and get good performance.
