How to Replace a Failed Drive in a ZFS Pool

So you have a failed disk in a ZFS pool and you want to fix it? Routine disk failures are really a non-event with ZFS because the volume management makes replacing them so dang easy. In many cases, unlike hardware RAID or older volume management solutions, the replacement disk doesn’t even need to be exactly the same as the original. So let’s get started replacing our failed disk. These instructions will be for a Solaris 10 system, so a few of the particulars related to unconfiguring the disk and device paths will vary with different flavors of UNIX.

First, take a look at the zpools to see if there are any errors. The -x flag will only display status for pools that are exhibiting errors or are otherwise unavailable.
Note: If the disk is actively failing (a process that sometimes takes a while as the OS offlines it), any commands that use storage-related system calls will hang and take a long time to return. These include “zpool” and “format”, so just be patient; they will eventually return.

# zpool status -x

 pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  FAULTED      1    81     0  too many errors
          mirror-1  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

So we can easily see that c1t5d0 has failed. Take a look at the “format” output to get the particulars about the disk:
# format

Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 
          /pci@0/pci@0/pci@2/scsi@0/sd@0,0
       1. c1t1d0 
          /pci@0/pci@0/pci@2/scsi@0/sd@1,0
       2. c1t2d0 
          /pci@0/pci@0/pci@2/scsi@0/sd@2,0
       3. c1t3d0 
          /pci@0/pci@0/pci@2/scsi@0/sd@3,0
       4. c1t4d0 
          /pci@0/pci@0/pci@2/scsi@0/sd@4,0
       5. c1t5d0 <SEAGATE-ST914602SSUN146G-0603-136.73GB>
          /pci@0/pci@0/pci@2/scsi@0/sd@5,0
Specify disk (enter its number): 

Get your hands on a replacement disk that is as similar as possible to a SEAGATE-ST914602SSUN146G-0603-136.73GB. I was only able to dig up a HITACHI-H103014SCSUN146G-A2A8-136.73GB, so I’ll be using that instead of a direct replacement.
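
ZFS will refuse a replacement that is smaller than the device it replaces, so if you are substituting a different model it is worth comparing capacities first. A minimal check with prtvtoc, using this example’s device names (the new disk needs a label before prtvtoc can read it):

# prtvtoc /dev/rdsk/c1t4d0s2
# prtvtoc /dev/rdsk/c1t5d0s2

Compare the sector counts on slice 2 (conventionally the whole disk) in the two partition maps; the replacement needs at least as many sectors as the survivor.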

Next, use “cfgadm” to look at the disks you have and their configuration status:

# cfgadm -al

Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             scsi-sata    connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c1::dsk/c1t2d0                 disk         connected    configured   unknown
c1::dsk/c1t3d0                 disk         connected    configured   unknown
c1::dsk/c1t4d0                 disk         connected    configured   unknown
c1::dsk/c1t5d0                 disk         connected    configured   unknown

We want to replace t5, so we prepare it for removal by unconfiguring it:

# cfgadm -c unconfigure c1::dsk/c1t5d0

The “safe to remove” LED should turn on and you can pull the disk, remembering to allow it several seconds to spin down. Replace it with the new disk and take a look at “cfgadm -al” output again to ensure that it has been automatically configured. If it has not, you can manually configure it as shown below:

# cfgadm -c configure c1::dsk/c1t5d0

Now, it’s a simple matter of a quick “zpool replace” to get things rebuilding:

# zpool replace data c1t5d0
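
The single-argument form works here because the new disk took the old one’s place and kept the same device name. If the replacement had come up at a different target, say c1t6d0 (a hypothetical name for this example), you would give zpool replace both the old and the new device:

# zpool replace data c1t5d0 c1t6d0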

You can use the output of zpool status to watch the resilver process…
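
On Solaris 10 the scrub line of the status output tracks the rebuild; the times and percentage below are purely illustrative:

# zpool status data
 ...
 scrub: resilver in progress for 0h8m, 23.15% done, 0h27m to go

Once it reports “resilver completed” and the pool state returns to ONLINE, you’re done.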

Replace Failed SVM Mirror Drive

So you have used SVM to mirror your disk, and one of the two drives fails. Aren’t you glad you mirrored them! You don’t have to do a restore from tape, but you are going to have to replace the failed drive.

Many modern RAID arrays just require you to take out the bad drive and plug in the new one, while everything else is taken care of automatically. It’s not quite that easy on a Sun server, but it’s really just a few simple steps. I just had to do this, so I thought I would write down the procedure here.

Basically, the process boils down to the following steps:

  • Detach the failed meta devices from the failed drive
  • Delete the meta devices from the failed drive
  • Delete the meta databases from the failed drive
  • Unconfigure the failed drive
  • Remove and replace the failed drive
  • Configure the new drive
  • Copy the remaining drive’s partition table to the new drive
  • Re-create the meta databases on the new drive
  • Install the bootblocks on the new drive
  • Recreate the meta devices
  • Attach the meta devices

Let’s look at each step individually. In my case, c0t1d0 has failed, so I detach all metadevices on that disk and then delete them:


# metadetach -f d0 d2
# metadetach -f d10 d12
# metadetach -f d40 d42
# metaclear d2
# metaclear d12
# metaclear d42
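
In case you don’t remember which submirrors sit on a failed disk, metastat -p prints the configuration in md.tab format, one metadevice per line, which makes them easy to pick out before you start detaching (device names from this example):

# metastat -p | grep c0t1d0
d2 1 1 c0t1d0s0
d12 1 1 c0t1d0s1
d42 1 1 c0t1d0s4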

Next, I take a look at the status of my meta databases. Below we can see that the replicas on that disk have write errors:

# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s3
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s3
     W    p  luo        16              8192            /dev/dsk/c0t1d0s3
     W    p  luo        8208            8192            /dev/dsk/c0t1d0s3
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

The replicas on c0t1d0s3 are dead to us, so let’s wipe them out!


# metadb -d c0t1d0s3
# metadb -i

        flags           first blk       block count
     a m  p  luo        16               8192            /dev/dsk/c0t0d0s3
     a    p  luo        8208             8192            /dev/dsk/c0t0d0s3

The only replicas we have left are on c0t0d0s3, so I’m all clear to unconfigure the device. I run cfgadm to get the c0 path:


# cfgadm -al

Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c0::dsk/c0t2d0                 disk         connected    configured   unknown
c0::dsk/c0t3d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 CD-ROM       connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1.1                       unknown      empty        unconfigured ok
usb1/1.2                       unknown      empty        unconfigured ok
usb1/1.3                       unknown      empty        unconfigured ok
usb1/1.4                       unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok

I run the following command to unconfigure the failed drive:


# cfgadm -c unconfigure c0::dsk/c0t1d0

  • The drive light turns blue
  • Pull the failed drive out
  • Insert the new drive

Configure the new drive:


# cfgadm -c configure c0::dsk/c0t1d0

Now that the drive is configured and visible from within the format command, we can copy the partition table from the remaining mirror member:


# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
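
To confirm the copy took, run prtvtoc against the new disk; it should print the same partition map you just read off the survivor:

# prtvtoc /dev/rdsk/c0t1d0s2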

Next, I install the bootblocks onto the new drive:


# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s2

Create the state database replicas (-a adds them, and -c 2 puts two copies on the slice):


# metadb -a -c 2 c0t1d0s3

Recreate the meta devices:

# metainit -f d2 1 1 c0t1d0s0
# metainit -f d12 1 1 c0t1d0s1
# metainit -f d42 1 1 c0t1d0s4

And finally, reattach the metadevices, which will resync them with their mirrors.


# metattach d0 d2
# metattach d10 d12
# metattach d40 d42
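
The resync runs in the background and can take a while on large slices. You can watch it with metastat (the percentage is illustrative, and d1 is assumed to be the surviving submirror of d0 in this example):

# metastat d0
d0: Mirror
    Submirror 0: d1
      State: Okay
    Submirror 1: d2
      State: Resyncing
    Resync in progress: 37 % done
    ...

Repeat for d10 and d40, or just run metastat with no arguments to see everything.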

Solaris 8 SAN Frustrations

Getting Solaris 8 to light up a Qlogic QLA2310 Fibre Channel card using the SUNWqlc and SUNWqlcx drivers can be frustrating enough, but the headaches are only beginning if you want to connect it to a SAN and you don’t have all the right packages installed.

Last week, I installed the QLA2310 in a Sun Fire V210 running Solaris 8. I installed the latest versions of SUNWqlc, SUNWqlcx and SUNWsan. After doing a reboot -- -r, the system came up and attached the driver to the card. I zoned it in the fabric and logged into Navisphere, where the WWN showed up, but neither PowerPath nor the Navisphere host agent could communicate with the CLARiiON. I also could not see any of the LUNs I had presented.

I thought it was strange that the CLARiiON could see the host, but the host could not see the CLARiiON.

I ran:

luxadm -e port

Which returned:

Found path to 1 HBA ports

/devices/pci@1d,700000/SUNW,qlc@1/fp@0,0:devctl                    CONNECTED

Clearly, it could see the HBA.

I ran:

ls -l /dev/cfg
total 8
lrwxrwxrwx 1 root  root   38 Nov 30 14:31 c0 ->
../../devices/pci@1e,600000/ide@d:scsi
lrwxrwxrwx 1 root  root   39 Nov 30 14:31 c1 ->
../../devices/pci@1c,600000/scsi@2:scsi
lrwxrwxrwx 1 root  root   41 Nov 30 14:31 c2 ->
../../devices/pci@1c,600000/scsi@2,1:scsi
lrwxrwxrwx 1 root  root   48 Dec  4 13:49 c3 ->
../../devices/pci@1d,700000/SUNW,qlc@1/fp@0,0:fc

The card was c3… This becomes useful later when we have to configure it.

I ran:

cfgadm -al -o show_FCP_dev

Which returned:

cfgadm: Configuration administration not supported

There it was… I didn’t have the complete SAN package installed. I hadn’t done this in a few years, so I had forgotten all the packages I had to add to get the Sun SAN package working correctly… There are many.

Happily, Sun has now packaged them in a nice “SAN_4.4.12_install_it.tar.Z”, which you can get from their website if you have a username. It installs everything for you in the right order.
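
Installation is just unpack-and-run; from memory the bundle carries its own install script (install_it, going by the tarball’s name — verify what actually lands in the extraction directory):

uncompress SAN_4.4.12_install_it.tar.Z
tar xvf SAN_4.4.12_install_it.tar
./install_it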

The only thing left to do was another reboot -- -r, followed by cfgadm -c configure c3 to configure the device. After that, everything started working nicely.

Registering Solaris CLARiiON Hosts With QLA 2310 HBAs

Sun Microsystems likes the QLA 2310 Fibre Channel HBA. It’s only a 2Gb card, but it works with the Sun native driver, which makes it wonderful for us Solaris administrators. Unfortunately, it does not integrate perfectly with EMC CLARiiON SANs because it does not register properly with Navisphere. Even if you manually register the host, the LUNs will not be presented to the host because the agent can’t pass commands to the array.

To remedy this situation on my Solaris 8 host, I used the following procedure:

Edit the /etc/system file and add the following line:

set fcp:ssfcp_enable_auto_configuration=1

Next, I rebooted my Solaris host with the “-r” (reconfiguration) flag:

reboot -- -r

Next, I checked Navisphere to make sure my paths had logged in. They had, so I logged into the Solaris host and ran the following commands:

cfgadm
devfsadm
format

cfgadm configured the attachment points, devfsadm rebuilt the device links, and format then showed the storage presented to my host. Finally, I restarted the Navisphere agent and started using my new LUNs.
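
For the record, the Navisphere host agent on my Solaris boxes is controlled through an init script, so the restart amounted to the following (the path may vary with your agent version, so check what your install dropped in /etc/init.d):

/etc/init.d/agent stop
/etc/init.d/agent start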