Replace Failed SVM Mirror Drive

So you have used SVM to mirror your disk, and one of the two drives fails. Aren’t you glad you mirrored them! You don’t have to do a restore from tape, but you are going have to replace the failed drive.

Many modern RAID arrays just require you to take out the bad drive and plug in the new one, while everything else is taken care of automatically. It’s not quite that easy on a Sun server, but it’s really just a few simple steps. I just had to do this, so I thought I would write down the procedure here.

Basically, the process boils down to the following steps:

  • Detach the failed meta devices from the failed drive
  • Delete the meta devices from the failed drive
  • Delete the meta databases from the failed drive
  • Unconfigure the failed drive
  • Remove and replace the failed drive
  • Configure the new drive
  • Copy the remaining drive’s partition table to the new drive
  • Re-create the meta databases on the new drive
  • Install the bootblocks on the new drive
  • Recreate the meta devices
  • Attach the meta devices

Let’s look at each step individually. In my case, c0t1d0 has failed, so, I detach all meta devices on that disk and then delete them:


# metadetach -f d0 d2
# metadetach -f d10 d12
# metadetach -f d40 d42
# metaclear d2
# metaclear d12
# metaclear d42

Next I take a look at the status of my meta databases. Below we can see the the replicas on that disk have write errors:

# metadb -i
        flags           first blk       block count
     a m  p  luo        16               8192            /dev/dsk/c0t0d0s3
     a    p  luo        8208             8192            /dev/dsk/c0t0d0s3
     W    p  luo        16                8192            /dev/dsk/c0t1d0s3
     W    p  luo        8208            8192            /dev/dsk/c0t1d0s3
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

The replicas on c0t1d0s3 are dead to us, so let’s wipe them out!


# metadb -d c0t1d0s3
# metadb -i

        flags           first blk       block count
     a m  p  luo        16               8192            /dev/dsk/c0t0d0s3
     a    p  luo        8208             8192            /dev/dsk/c0t0d0s3

The only replicas we have left are on c0t0d0s3, so I’m all clear to unconfigure the device. I run cfgadm to get the c0 path:


# cfgadm -al

Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c0::dsk/c0t2d0                 disk         connected    configured   unknown
c0::dsk/c0t3d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 CD-ROM       connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1.1                       unknown      empty        unconfigured ok
usb1/1.2                       unknown      empty        unconfigured ok
usb1/1.3                       unknown      empty        unconfigured ok
usb1/1.4                       unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok

I run the following command to unconfigure the failed drive:


# cfgadm -c unconfigure c0::dsk/c0t1d0

The drive light turns blue
Pull the failed drive out
Insert the new drive

Configure the new drive:


# cfgadm -c configure c0::dsk/c0t1d0

Now that the drive is configured and visible from within the format command, we can copy the partition table from the remaining mirror member:


# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

Next, I install the bootblocks onto the new drive:


# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s2

Create the state replicas:


metadb -a -c 2 c0t1d0s3

Recreate the meta devices:

metainit -f d2 1 1 c0t1d0s0
metainit -f d12 1 1 c0t1d0s1
metainit -f d42 1 1 c0t1d0s4

And finally, reattach the metadevices which will sync them up with the mirror.


metattach d0 d2
metattach d10 d12
metattach d40 d42

10 thoughts on “Replace Failed SVM Mirror Drive

  1. Nice post, I just used it to replace a failed drive on my SUN Fire V440. I think you might have left off one step. You need to put back the metadbs on the new disk.

    Example:
    metadb -a -c 2 c0t0d0s3

  2. prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s – /dev/rdsk/c1t0d0s2

    should be

    prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s – /dev/rdsk/c1t0d0s2

  3. You have “metadb -a -c 3 c0t0d0s3″ but you probably mean “metadb -a -c 2 c1t0d0s3″ if the device names you’ve been working with in the article should stay consistent.

  4. all this worked GREAT for me! one minor hickup though. the disk I replaced(came from Sun) had an EFI disk label on it which causes fmthards to barf. ‘fdisk -B /dev/rdsk/c*t*d*p0″ fixed that though.

  5. One minor issue.

    The bootblock install should be

    installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

    You show it going on slice 2 (whole disk).
    Often they are the same, but if the root slice isn’t at the start of the disk, then putting it on slice 2 won’t work.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>