• Why Modern RAID 5 is Ideal for Oracle Databases

    There is a convention of thought amongst Oracle DBA’s that databases should never be installed on disks that are configured into a RAID 5 array. The argument goes, that since Oracle accesses and writes to random points within relatively large files, the overhead of constantly calculating block-level parity on these files is substantial, resulting in serious performance degradation. They suggest that RAID 1 (mirroring) is the ideal disk configuration since no parity needs to be calculated, and Oracle is more than happy to divide up its database over many smaller mount points.

    This way of thinking has largely been correct over the years because most systems have traditionally used software RAID. This means that the CPU of the server itself had the job of doing all those parity calculations, and it really did slow down both the server and the disk when RAID 5 configurations were used. Oracle, in particular, had a hard time with these configurations for the exact reasons the DBA’s point to.

    In many cases, software RAID is still used, and to be sure, it is wholly inappropriate to deploy RAID 5 in these environments. However, it is increasingly common to find IT departments using a SAN-type architecture where the RAID type and configuration are invisible to the host operating system. In these environments, the the disk array has a dedicated controller that is singly tasked with handling all read, write, and parity operations. The RAID controller is no longer software running on a generic CPU, but rather firmware that is optimized to handle parity calculations. This results in a system where parity is calculated so quickly by the dedicated controller that differences in speed between RAID 1 and Raid 5 should be virtually nonexistent.

    To prove this, I carved up our new InfoTrend EonStor A12F-G2221 into three arrays – a RAID 5, a RAID 1, and a RAID 10. I then set out to run some benchmarks on these different arrays to see what, if any, the differences would be.

    The hardware used was as follows:

    • The RAID 5 LUN consisted of 4 drives
    • The RAID 1 LUN consisted of 2 drives
    • The RAID 10 LUN consisted of 4 drives

    I then identified the iozone tests that most accurately simulated Oracle disk activity. What I really wanted to do was to simulate select and update queries on various sized files and see how the different RAID types held up under the load. To do this, I ran iozone, a well-respected benchmark utility, with the following arguments:

    /opt/iozone/bin/iozone -Ra -g 2G -b /home/sysop/new/raid5-2G-1.wks

    This put the disk through its paces, as it ran the iozone tests in automatic mode on a 2 Gb file, but in the end, I was interested in analyzing the following tests because they were the ones our DBA team suggested would most closely represent database activity.

    Random Read (select queries)

    This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: Size of operating system’s cache, number of disks, seek latencies, and others.

    Random Write (update queries)

    This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: Size of operating system’s cache, number of disks, seek latencies, and others.

    Strided Read (more complex select queries)

    This test measures the performance of reading a file with a strided access behavior. An example would be: Read at offset zero for a length of 4 Kbytes, then seek 200 Kbytes, and then read for a length of 4 Kbytes, then seek 200 Kbytes and so on. Here the pattern is to read 4 Kbytes and then %%[Page: 3]%%

    I ran several instances of the same tests using the same command line to ensure that there were no anomalies, and the machine was doing nothing else during the tests besides running the host OS. The results were pretty much as I expected, and I found little to no variation between the raid types on this disk subsystem.

    Random Read Tests:
    Raid5-random-read.jpg
    Raid1-random-read.jpg
    Raid10-random-read.jpg

    In this test, there seems to be the slightest advantage to the mirror-type RAID arrays when it comes to very small files. This, I suspect can be attributed to actual drive head latency as, in RAID 5 volumes, the correct block needs to be found on a larger number of disks. This advantage quickly falls off, however as the file size grows, meaning that this slight advantage would not be seen in an Oracle database.

    Random Write Tests:
    Raid5-random-write.jpg
    Raid1-random-write.jpg
    Raid10-random-write.jpg

    In this test, both RAID 5 and RAID 10 seem to hold a slight advantage over the direct mirror. This, I would imagine can be attributed to the fact that the writes are happening over a larger number of spindles. This indicates that the controller is calculating the parity faster than the 2Gb connection speed to the disk subsystem. Again, the variation is incredibly small, so there is no arguable performance advantage to using one type of RAID over another when using a hardware controller.

    Stride Read Tests:
    Raid5-strided-read.jpg
    Raid1-stride-read.jpg
    Raid10-stride-read.jpg

    Here again we see no real advantage to one RAID type over any other. It could be said that the RAID 10 volume held up ever-so-slightly better on this test, but any edge is so slight that it would be hard to imagine how this could translate into a noticeable performance gain in an Oracle database.

    In the end, these tests proved my suspicion that hardware RAID controllers have become so efficient and fast that it no longer makes any real difference what type of RAID you decide to use for your Oracle database. Largely gone are the days when your disk space and RAID volumes were inexorably tied to the server itself. So long as you are using hardware RAID, and the LUNS are abstracted from your operating system, you can largely feel free to make the most of your storage dollar by using RAID 5 in your production database environments.

    This entry was posted on Friday, September 8th, 2006 at 2:28 pm and is filed under Data and Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.
  • 16 Comments

    Take a look at some of the responses we've had to this article.

    1. Nov 5th
      Reply

      Huh. As it turns out, the MySQL gurus recommend RAID 5:

      From a performance standpoint, RAID 5, which is striping (RAID 0) with distributed parity blocks, can be beneficial. There are two disks involved in every operation, so it’s not substantially faster than RAID 1 until you have more than three disks total. Even then, its other benefit, size, shines through. Using RAID 5, you can create rather large volumes without spending a lot of cash because you sacrifice only a single disk. By using more smaller disks, such as eight 36-GB disks instead of four 72-GB disks, you increase the number of spindles in the array and therefore boost seek performance and throughput.

    2. Charlie
      Jan 14th
      Reply

      Hi,
      I just came across your blog entry as I was looking for a way to use IOZone to simulate Oracle. However, after running the command you suggested, I discovered that once IOZone sets the file size to 32 MB in the automated run, it starts using 64k record sizes and above and does not test 4k or 8k block sizes, which I believe are closer to the record sizes that Oracle typically uses. Any ideas?

    3. paul w
      Apr 5th
      Reply

      the real question is whether the author had write-caching enabled on his/her disks and his/her RAID controller. if so, he/she should try to replicate these results with caching disabled.

    4. Joe
      Jun 12th
      Reply

      The real problem is that there are really no conclusions that one can draw from this test. I/O transfer rates are completely independent of file size, and are strictly a function of block size and number of spindles.

      The fact that the graphs fall off at about 256MB indicates that all that is being measured is the linux kernel buffer cache behavior, not the I/O subsystem behavior.

      iozone is not representative of oracle in any way.

      oracle generally uses raw devices on traditional unix, and O_DIRECT on linux to bypass the kernel buffer cache entirely.

      There is no possible way to get 2.5Gbytes/sec from a pair of 2Gbits/sec fiber channel ports, so the graphs are either measuring buffer cache performance or are mislabeld.

    5. Vic
      Oct 5th
      Reply

      That is an interesting test but I do have a question – what type of drives were being used the Infortrend SAN used I believe take only SATA drives. The speed of the System is always defined by the slowest subsystem and I agree that if you plan to use SATA drives then RAID 5 or even RAID 6 will probably keep up with the speed of the drives as the new controllers are highly optimized for RAID 6 performance. But put in 4GB FC and faster FC or SAS (duplex) compared to SATA(single channel) drives and the bottle neck will I believe move back to the controller. Now I would love to see that test with concurrent read and writes as well. On the other hand this article did answer my concern if the Infortrend SAN would handle my I/O demands and I guess it will. With the cheap SAS drive SAN available I’ll stay with RAID 10 for the time being but I am open to RAID 5 except like old DBA’s of the past when drives were expensive I have seen database performance come to its knees with RAID 5 – once bitten twice shy.

    6. Oct 5th
      Reply

      Vic,

      That’s a very interesting point… The drives were, indeed, SATA, so it would be very interesting to see how the tests would have come out had they been faster FC drives over a 4GB link. We just got an EMC CX310, so perhaps I can set up a RAID 10 with some of our faster drives to see how things work out. Thanks for the comment.

    7. Richard
      Oct 24th
      Reply

      When you say that the RAID 5 LUN consisted of 4 drives and the RAID 10 LUN consisted of 4 drives, you’re not really comparing llike with like.

      I would like to see you run a mixture of highly concurrent random reads and writes on the following set up:-

      A RAID 5 LUN consisted of 3 drives
      A RAID 10 LUN consisted of 4 drives

      Then, halfway through your test, pull one disk out of the array and measure your performance thereafter.

      Whichever way you look at it RAID 5 is cheap and nasty. You get what you pay for and when RAID 5 goes into degraded mode your SAL goes down the toilet.

      For a fuller understanding of RAID 5 and why it is NOT ideal for Oracle databases, visit:- http://www.baarf.com/

    8. Oct 24th
      Reply

      Richard,
      With all due respect, I think my understanding of RAID 5 is quite full enough thank you. You make a point that a RAID 5 LUN consisting of 5 drives and a RAID 10 LUN consisting of 4 drives is not entirely apples to apples because the RAID 5 is writing to 4 spindles while the RAID 10 is writing to 2 spindles. It is, however a fair comparison, given that the same number of drives per LUN are used. In order to write to 4 drives in a RAID 10, I would need 8 drives, which only illustrates my point. Most storage experts agree that performance is gained by adding spindles, and unless you are doing massively random writes to small files, there is very little to be gained by using RAID 10 over 5 in this day and age.

      As for degraded mode, you are correct, although higher end arrays like the CLARiiON will fail over to your hot spare even if the failing drive is generating errors, generally preventing you from ever running in degraded mode. Degraded mode should not be though of as an operational norm, but rather a way to prevent data loss. If you environment is such that you can’t afford the degraded speed while the hot spare is brought on-line, then by all means, use RAID 10.

      The days of S.A.M.E. “Stripe And Mirror Everything” are gone, and DBA’s who still adheare to it as if it’s the holly grail should step back and really think about the types of writes they need to do. This is not to say that RAID 10 does not have its place, but rather that the type of RAID should not be decided entirely by the application, but rather they type of writes the application will be doing.

    9. Richard
      Oct 24th
      Reply

      Cliff,

      Thank you for you quick reply to my comments.

      My point was that a RAID 5 LUN consisting of 3 (three) drives and a RAID 10 LUN consisting of 4 drives IS comparing apples to apples because the RAID 5 is writing to 2 spindles while the RAID 10 is also writing to 2 spindles.

      I totally agree that more spindles equals better performance but the temptation to use RADI5 just to save a few disks is very naiive. An 8 disk RAID 1+0 LUN will require 16 disks and its RAID 5 counterpart only requires 9 for the same capacity. But disks are really cheap these days so why not use the additional 7 disks and have excellent performance with/without losing a disk?

      Juan Loaiza summarises RAID 5 very eloquently at http://www.miracleas.com/BAARF/Juan_Loaiza_on_raid_V.html

      “If you look at all the vendors that implement RAID-5 you will find that they all set the stripe size to a relatively small value, usually 32K or 64K.
      The result of this is that a large sequential read (like 1M) will span a lot of disks (like 16).
      Because of this, a lot of disk spindles will be made busy for a single IO.
      In mixed mode scenarios where there are random IOs going on, it will slow down the whole IO subsystem by using up lots of disk drives.
      Why don’t they set the stripe size bigger?
      Because in normal file systems, people tend to write a whole file sequentially.
      The RAID-5 vendors want to take advantage of this sequential write to eliminate the RAID-5 penalty.
      Any time you can write across a full stripe set, you can avoid the extra reads and writes that are required for random IOs in RAID-5.
      This is because you can calculate the parity values directly without having to read the old ones from disk.
      Small stripe units also help to eliminate the RAID-5 penalty when a large NVRAM cache is used.
      Locality of reference is more likely to create a full stripe set of blocks all present in the cache if the stripe set is small.
      So, RAID-5 creates a tradeoff where you want to have small stripe units to avoid the RAID-5 penalty for sequential
      writes, but this hurts you any time you have a mixed workload environment in which there are some scans going on in parallel with random access OLTP activity. .”

      You would be mistaken to believe that the days of S.A.M.E. “Stripe And Mirror Everything” are gone.Oracle’s ASM is a prime example why S.A.M.E. is very much alive and here to stay.

    10. Jun 2nd
      Reply

      Lol, I can imagine a DBA will lose the job if they take your RAID 5 benchmark.

      The biggest flaw of this benchmark : it doesn’t indicate how many concurrent logical user are making the query/updates. You can keep crunching the block size for 1-10 user and see no different of IO performance for any database in RAID5 vs RAID10.

      Try convince me with a multiuser benchmark.

    11. RAID 5 is not safe. Disks from the same batch tend to fail together, so a second disk may fail before your RAID recovers after a failure. This will cause total RAID loss. Also, recovery is slow.

    12. Paul
      Jan 7th
      Reply

      Oracle control files are small and popular. Maybe they should not be on raid 5?

    13. Scott Marlowe
      Sep 17th
      Reply

      1: You need to test a MUCH larger file size so you’re getting out of your memory cache. On a machine with 256Meg, that means something in the 1G or more range. since your test is invalid and you didn’t seem to notice, I’d have a hard time taking advice from you.

      2: RAID-5 is NOT a match for RAID-10. I’ve run plenty of my own tests, and under heavy concurrent load RAID-10 with the same number of disks performs much much faster.

      3: I’m a pgsql DBA, have done some Oracle DBA work. Both benefit quite a bit from a good RAID-10 setup.

      • Sep 21st
        Reply

        There are a lot of people who don’t seem to like what I’m saying, but nobody is positing numbers or graphs. Since you come to the table with nothing more than anecdotal reports of RAID 10′s superiority, I have a hard time taking advice from you.

        I think you’ll find that most modern storage subsystems make great use of cache coalescing in the attempt to make corona writes. This is a benefit regardless of file size. In fact, RAID 5 often benefits more since it generally spans more disks in real-world examples.

        My point here is that performance is much more dependent on storage subsystem type than it is on RAID type. It is wrong to say that RAID 5 is inappropriate for ALL Oracle databases.

    14. Scott Marlowe
      Oct 20th
      Reply

      Sorry, but the burden of evidence is on you, the proclaimer that RAID-5 is just peachy keen to run Oracle on. I’ve been building big beefy database servers for about 10 years.

      With a big RAID-10 I can get incredible throughput and parallel performance, both read and write. With RAIF-5 or RAID-6 I am lucky to get 20 to 30% of the same performance with the same number of drives. I’ve tested it, I know it.

      You have created an unrealistic test with too small a data set and refuse to redo your tests to see the error. That’s your fault, not ours. Any realistic test with a large enough dataset to not fit in RAM is going to show you just how bad RAID-5 is. For good measure, pull a drive from each RAID-5 and RAID-10 while testing, and start a rebuild to see how horrifically slow RAID-5 can really be.

      You are wrong that performance is more dependent on storage subsystem than RAID type. Last company I was at had a RAID-6 14 drive system under Solaris for Oracle. My 4 disk SATA RAID-10 routinely beat it soundly with Pgsql on top, both in synthetic benchmarks and in regular reporting applications.

      Make realistic test then come back. For now your lack of homework gets you a D-.

    15. Scott Marlowe
      Oct 28th
      Reply

      Let me make this one point perfectly clear. Due to caching effects, the only part of the graphs you have posted are the bits to the right of the 256Meg portion, which are so small and compressed as to be unreadable.

      You need to run tests on larger data sets and for longer periods so caching isn’t what you’re testing, but the RAID set itself.

      My machines have 32Gigs of memory, so the databases I test on are routinely 64G or larger to make sure I’m testing the actual drive subsystem where it counts, when the dataset is bigger than memory. It’s a very very very basic tenet of performance testing that your dataset has to exceed RAM to be useful, unless you’re planning on working with a dataset that will always be smaller than RAM. At which point you can store your data on tape drives and get good performance.

  • Leave a Reply

    Let us know what you thought.

  • Name (required):

    Email (required):

    Website:

    Message:

Visitors have tagged this post: t (191) - oracle raid (180) - oracle RAID 5 (136) - raid 5 oracle (114) - oracle raid5 (104) - iozone oracle (83) - oracle raid-5 (80) - oracle raid recommendations (56) - Oracle RAID 5 performance (36) - oracle raid configuration (33) - raid 5 vs raid 10 oracle (31) - oracle (29) - oracle and raid 5 (27) - When is a smaller stripe size beneficial in a RAID 5 set (26) - oracle and raid (25) - raid5 oracle (25) - RAID oracle (25) - When is a smaller stripe size beneficial in a RAID 5 set? (25) - raid 5 database (24) - oracle raid performance (24) - oracle on raid 5 (24) - oracle raid recommendation (21) - When is a larger stripe size beneficial in a RAID 1 set (21) - oracle database RAID (20) - when is a smaller stripe size beneficial in a raid 5 se (20) - raid 1 vs raid 5 oracle (20) - Oracle raid 10 (19) - raid 5 spindles (19) - smaller stripe size beneficial in a RAID 5 set (19) - raid 5 benchmark (18) - raid for databases (18) - oracle raid stripe size (18) - smaller stripe size beneficial in a RAID 5 (16) - oracle asm raid 5 (16) - f (15) - oracle asm vs raid (15) - raid 5 and oracle (13) - raid 5 database performance (13) - oracle raid 10 vs raid 5 (13) - Raid 5 speed (12) - oracle on RAID (12) - oracle RAID 6 (12) - oracle IOzone (11) - raid 5 speed test (11) - raid 10 benchmark (11) - raid5 benchmark (11) - mysql "raid 5" (11) - oracle raid 5 raid 10 (11) - Oracle stripe and mirror everything (11) - database RAID5 (10) - RAID 5 LUN (10) - database RAID 5 (10) - oracle performance raid (9) - linux raid 5 benchmark (9) - oracle raid 1 raid 5 (9) - raid 10 vs raid 5 oracle (9) - oracle san raid (8) - Oracle RAID SAN (8) - RAID for oracle (8) - raid and oracle (8) - oracle raid5 raid1 (8) - ORACLE SPINDLES (8) - oracle database raid 5 (8) - raid 5 stripe size oracle (8) - raid stripe size oracle (8) - raid 5 oracle database (8) - raid 5 (7) - oracle performance RAID 5 (7) - mysql RAID configuration (7) - raid 1 oracle (7) - mysql raid5 (7) - raid 5 mysql (7) - oracle on raid5 (7) - oracle raid 1 (7) - RAID configuration for Oracle (7) - raid databases (7) - oracle raid 5 configuration (7) - oracle random read (7) - RAID spindles (7) - mysql raid5 raid10 (7) - raid 5 impacted (7) - raid5 stride (7) - raid5 databases (7) - Oracle RAID 5 or RAID 10 (7) - RAID 5 for DAtabases (7) - raid 6 oracle (7) - oracle raid5 raid10 (7) - raid5 database (7) - When is a larger stripe size beneficial in a RAID 1 set? (7) - \"raid 5\ small-write penalty (7) - raid 10 benchmarks (6) - raid 5 performance benchmark (6) - oracle raid benchmark (6) - mysql raid 5 raid 10 (6) - What is faster? RAID 1 or RAID 5? (6) - raid 5 performance spindles (6) - raid 5 1 oracle (6) - oracle lun layout (6) - oracle raid 5 vs raid 10 (6) - oracle raid 5 vs raid 1 (6) - raid 5 speed tests (6) - raid5 penalty (6) - mysql raid stripe size (6) - oracle stripe size (6) - oracle lun (6) - RAID and databases (6) - IS RAID BENEFICIAL IN ORACLE (6) - Oracle RAID levels (6) - About RAID levels for databases (6) - software raid 5 benchmark (5) - RAID 5 databases (5) - mysql "raid 1" (5) - "RAID 10" Oracle (5) - raid10 raid5 performance (5) - oracle raid1 (5) - iozone database (5) - ORACLE server RAID (5) - mysql raid benchmark (5) - mysql raid (5) - oracle raid 5 san (5) - EMC cx310 (5) - oracle disk raid (5) - iozone and oracle (5) - oracle linux raid (5) - Oracle raid5 stripe size (5) - raid 5 stripe size (5) - oracle asm raid (5) - raid 10 vs raid 5 benchmark (5) - install oracle on raid (5) - Oracle & raid (5) - database san raid-5 (5) - larger stripe size beneficial in a RAID 1 set (5) - oracle does support RAID 5 (5) - ASM and RAID 5 (5) - create LUN for oracle database (5) - all (4) - raid10 oracle (4) - oracle database raid performance (4) - raid performance benchmarks (4) - Oracle RAID 1 or RAID 5 (4) - which is faster raid 1 or raid 5 (4) - RAID benchmark (4) - raid 5 stride (4) - oracle db raid (4) - oracle raid 10 raid 5 (4) - raid 5 in oracle (4) - raid5 speed (4) - raid type for database (4) - database on RAID5 (4) - RAID 5 and databases (4) -