This document illustrates swapping a drive in an Intel ISW fakeraid array. The
BIOS utility is aware of the RAID sets, and Linux interfaces with them via the
older dmraid utility. In this example the machine has a RAID 10 setup.
Note that RAID 10 is implemented as a stripe of two mirrors, with the stripe
visible as the "superset" device and the mirrors visible as separate
sub-devices, each of which has a pair of actual physical drives.
Note also that the /dev/sda, /dev/sdb, etc. names correspond to actual
physical drives (not RAID volumes), and that they are assigned to the drives
in the order that they initialize, so they don't necessarily correspond to the
slot numbers.
Useful commands for interrogating the arrays:
To check what the drive letters correspond to:
ls -l /dev/disk/by-path
(see example output in transcript below -- scsi-0:0:0:0 indicates slot 0,
scsi-1:0:0:0 is slot 1, etc.)
See which drives are part of fakeraid sets and subsets, with lots of
verbosity:
dmraid -s -s -vv
Brief listing of which drives are part of which fakeraid superset:
dmraid -r
Monitor rebuild status, and show which sets are mirrors, stripes, etc.:
dmsetup status
List raid volumes & partitions handled by device mapper / dmraid:
dmsetup ls
or:
ls -l /dev/mapper/
List which drives are part of each dmraid subset / superset (the dm-? numbers
correspond to the second numbers in the output from "dmsetup ls"):
ls /sys/block/dm-?/slaves
Tell fakeraid to start rebuilding the specified array using the specified new
disk. First check which subset is degraded (in "dmraid -s -s -vv" it will be
the one with fewer drives than it is supposed to have), and make sure you know
which drive is definitely the new one:
dmraid -R isw_dxifadxaii_Volume0-0 /dev/sdd
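The "check which device is degraded first" step can be scripted. The following
is only a sketch: find_degraded_mirrors is a hypothetical helper name, and it
assumes that every mirror subset should have exactly 2 drives, as in the RAID 10
layout described here. It reads "dmraid -s -s -vv" style output on stdin and
prints the name of each mirror subset with fewer than 2 devices.

```shell
# Sketch: list mirror subsets that are missing a member drive.
# Reads "dmraid -s -s -vv" style output on stdin.
# Assumption (not something dmraid reports): every mirror subset should
# have exactly 2 drives, as in the RAID 10 layout in this document.
find_degraded_mirrors() {
    awk -F' *: *' '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim the line
        $1 == "name" { name = $2 }                    # remember current set
        $1 == "type" { type = $2 }
        $1 == "devs" && type == "mirror" && $2 + 0 < 2 { print name }
    '
}
```

Possible usage: dmraid -s -s -vv | find_degraded_mirrors
The printed name is the one to pass to "dmraid -R", after double-checking it by
eye.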
================================================================================
Below is a transcript of adding a new drive to a degraded RAID 10 array on
the server 'fishie'. The new drive was zeroed first, to get rid of any
stray RAID signatures that may have been on it.
In the transcript, the system had been booted with one array component
degraded (the drive that was in slot 0 had failed), so sda, sdb, sdc were
slots 1, 2, and 3 respectively, and slot 0 was not responding, so it did not
get a device name at boot time. A new drive was then added in slot 0, which
was the fourth SATA drive the OS saw initialize, so slot 0 became /dev/sdd.
This was then re-added to the failed array to start a rebuild.
================================================================================
Verify which drive is dead:
dmesg | less
Look for the drive initialization messages, e.g. shortly before the first
mention of sda. This reveals:
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: both IDENTIFYs aborted, assuming NODEV
So the bad drive is on ata1. See which SCSI addresses the good drives got:
sd 1:0:0:0: Attached scsi disk sda
sd 2:0:0:0: Attached scsi disk sdb
sd 3:0:0:0: Attached scsi disk sdc
The first disk usually gets "sd 0:0:0:0", so this implies that the missing
drive is the first one.
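To avoid scanning dmesg by eye, the attached disks can be listed with their
SCSI host numbers, so that a gap (here, host 0) stands out. This is a sketch
only: attached_disks is a hypothetical helper name, and it assumes the kernel
prints lines in the form shown above ("sd 1:0:0:0: Attached scsi disk sda").
Other kernel versions may word this message differently.

```shell
# Sketch: print "<scsi-host> <device>" for each attached disk in dmesg text.
# Reads dmesg output on stdin.
# Assumption: lines look like "sd 1:0:0:0: Attached scsi disk sda",
# as in this transcript; other kernels may format them differently.
attached_disks() {
    awk '/Attached scsi disk/ {
        for (i = 1; i < NF; i++) {
            if ($i == "sd" && $(i + 1) ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+:$/) {
                split($(i + 1), addr, ":")   # addr[1] is the SCSI host number
                print addr[1], $NF           # last field is the device name
                break
            }
        }
    }'
}
```

Possible usage: dmesg | attached_disks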
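Confirm by generating disk activity so that all the working drive lights blink
(the dd below has no count, so interrupt it with Ctrl-C and delete testfile
afterwards):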
[root@fishie tmp]# dd if=/dev/zero of=testfile bs=1M conv=fdatasync
...yep, first one is dark, all three others are lit. Pull the first drive and
insert the replacement.
Now look at dmesg to verify that the OS can see the new disk:
[root@fishie tmp]# dmesg | tail
ata1: hard resetting link
ata1: SATA link down (SStatus 0 SControl 300)
ata1: EH complete
ata1: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata1: irq_stat 0x00400040, connection status changed
ata1: SError: { PHYRdyChg CommWake DevExch }
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset failed (device not ready)
ata1: hard resetting link
[root@fishie tmp]# dmesg | tail
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
sdd: unknown partition table
sd 0:0:0:0: Attached scsi disk sdd
sd 0:0:0:0: Attached scsi generic sg4 type 0
OK, the new disk is sdd. We can see that it's on "sd 0:0:0:0" as expected;
we can also see this from /dev/disk/by-path:
[root@fishie tmp]# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Feb 17 15:36 pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sdd
lrwxrwxrwx 1 root root 9 Feb 15 16:32 pci-0000:00:1f.2-scsi-1:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 9 Feb 15 16:32 pci-0000:00:1f.2-scsi-2:0:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 9 Feb 15 16:32 pci-0000:00:1f.2-scsi-3:0:0:0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 15 16:32 pci-0000:00:1f.2-scsi-5:0:0:0 -> ../../sr0
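The slot-to-device mapping in the listing above can be reduced to a short
table. This is a sketch under assumptions: slot_map is a hypothetical helper
name, and it relies on the by-path link names containing "scsi-N:" as they do
on this machine. Other udev versions may name the links differently.

```shell
# Sketch: print "<slot> <device>" from "ls -l /dev/disk/by-path" output.
# Reads the listing on stdin.
# Assumption: link names contain "scsi-<host>:", as in this transcript,
# and the host number corresponds to the slot number.
slot_map() {
    awk 'NF >= 3 && $(NF - 1) == "->" {
        link = $(NF - 2); target = $NF
        if (match(link, /scsi-[0-9]+:/)) {
            # digits sit between "scsi-" (5 chars) and the trailing ":"
            host = substr(link, RSTART + 5, RLENGTH - 6)
            sub(/^.*\//, "", target)         # "../../sdd" becomes "sdd"
            print host, target
        }
    }'
}
```

Possible usage: ls -l /dev/disk/by-path | slot_map
Note that the optical drive (sr0) is listed as well, since it sits on the same
controller.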
Now look at dmraid status:
[root@fishie tmp]# dmraid -s -s -vv
NOTICE: /dev/sda: asr discovering
NOTICE: /dev/sda: ddf1 discovering
NOTICE: /dev/sda: hpt37x discovering
NOTICE: /dev/sda: hpt45x discovering
NOTICE: /dev/sda: isw discovering
NOTICE: /dev/sda: isw metadata discovered
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi discovering
NOTICE: /dev/sda: nvidia discovering
NOTICE: /dev/sda: pdc discovering
NOTICE: /dev/sda: sil discovering
NOTICE: /dev/sda: via discovering
NOTICE: /dev/sdb: asr discovering
NOTICE: /dev/sdb: ddf1 discovering
NOTICE: /dev/sdb: hpt37x discovering
NOTICE: /dev/sdb: hpt45x discovering
NOTICE: /dev/sdb: isw discovering
NOTICE: /dev/sdb: isw metadata discovered
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi discovering
NOTICE: /dev/sdb: nvidia discovering
NOTICE: /dev/sdb: pdc discovering
NOTICE: /dev/sdb: sil discovering
NOTICE: /dev/sdb: via discovering
NOTICE: /dev/sdc: asr discovering
NOTICE: /dev/sdc: ddf1 discovering
NOTICE: /dev/sdc: hpt37x discovering
NOTICE: /dev/sdc: hpt45x discovering
NOTICE: /dev/sdc: isw discovering
NOTICE: /dev/sdc: isw metadata discovered
NOTICE: /dev/sdc: jmicron discovering
NOTICE: /dev/sdc: lsi discovering
NOTICE: /dev/sdc: nvidia discovering
NOTICE: /dev/sdc: pdc discovering
NOTICE: /dev/sdc: sil discovering
NOTICE: /dev/sdc: via discovering
NOTICE: /dev/sdd: asr discovering
NOTICE: /dev/sdd: ddf1 discovering
NOTICE: /dev/sdd: hpt37x discovering
NOTICE: /dev/sdd: hpt45x discovering
NOTICE: /dev/sdd: isw discovering
NOTICE: /dev/sdd: jmicron discovering
NOTICE: /dev/sdd: lsi discovering
NOTICE: /dev/sdd: nvidia discovering
NOTICE: /dev/sdd: pdc discovering
NOTICE: /dev/sdd: sil discovering
NOTICE: /dev/sdd: via discovering
NOTICE: added /dev/sda to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdb to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdc to RAID set "isw_dxifadxaii"
ERROR: isw: wrong number of devices in RAID set "isw_dxifadxaii_Volume0-0" [1/2] on /dev/sda
*** Group superset isw_dxifadxaii
--> Active Superset
name : isw_dxifadxaii_Volume0
size : 1953535744
stride : 128
type : raid01
status : ok
subsets: 2
devs : 3
spares : 0
--> *Inconsistent* Active Subset
name : isw_dxifadxaii_Volume0-0
size : 976767872
stride : 128
type : mirror
status : inconsistent
subsets: 0
devs : 1
spares : 0
--> *Inconsistent* Active Subset
name : isw_dxifadxaii_Volume0-1
size : 976767872
stride : 128
type : mirror
status : inconsistent
subsets: 0
devs : 2
spares : 0
OK, dmraid sees /dev/sdd, and it's not part of any arrays, and it looks as if
"isw_dxifadxaii_Volume0-0" is the sub-array with the missing disk (it shows
"devs: 1"), i.e. where we want to add it. Another way of looking at the array:
[root@fishie tmp]# dmsetup status
isw_dxifadxaii_Volume0: 0 1953535744 striped 2 253:0 253:1 1 AA
isw_dxifadxaii_Volume0-1: 0 976767995 mirror 2 8:32 8:16 7453/7453 1 AA 1 core
isw_dxifadxaii_Volume0p3: 0 12578895 linear
isw_dxifadxaii_Volume0-0: 0 976767995 linear
isw_dxifadxaii_Volume0p2: 0 12578895 linear
isw_dxifadxaii_Volume0p1: 0 256977 linear
isw_dxifadxaii_Volume0p5: 0 1928105172 linear
Volume0-1 shows as "mirror", but Volume0-0 shows as "linear", which is
consistent with it having only one drive. Let's add the new drive to Volume0-0
with the rebuild command:
[root@fishie tmp]# dmraid -R isw_dxifadxaii_Volume0-0 /dev/sdd
ERROR: isw: wrong number of devices in RAID set "isw_dxifadxaii_Volume0-0" [1/2] on /dev/sda
isw: drive to rebuild: /dev/sdd
RAID set "isw_dxifadxaii_Volume0" already active
device "isw_dxifadxaii_Volume0-0" is now registered with dmeventd for monitoring
device "isw_dxifadxaii_Volume0-1" is now registered with dmeventd for monitoring
device "isw_dxifadxaii_Volume0" is now registered with dmeventd for monitoring
Error: Unable to write to descriptor!
Error: Unable to execute set command!
Error: Unable to write to descriptor!
Error: Unable to execute set command!
Somewhat alarming error messages, but it did work:
[root@fishie tmp]# dmraid -s -s -vv
NOTICE: /dev/sda: asr discovering
NOTICE: /dev/sda: ddf1 discovering
NOTICE: /dev/sda: hpt37x discovering
NOTICE: /dev/sda: hpt45x discovering
NOTICE: /dev/sda: isw discovering
NOTICE: /dev/sda: isw metadata discovered
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi discovering
NOTICE: /dev/sda: nvidia discovering
NOTICE: /dev/sda: pdc discovering
NOTICE: /dev/sda: sil discovering
NOTICE: /dev/sda: via discovering
NOTICE: /dev/sdb: asr discovering
NOTICE: /dev/sdb: ddf1 discovering
NOTICE: /dev/sdb: hpt37x discovering
NOTICE: /dev/sdb: hpt45x discovering
NOTICE: /dev/sdb: isw discovering
NOTICE: /dev/sdb: isw metadata discovered
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi discovering
NOTICE: /dev/sdb: nvidia discovering
NOTICE: /dev/sdb: pdc discovering
NOTICE: /dev/sdb: sil discovering
NOTICE: /dev/sdb: via discovering
NOTICE: /dev/sdc: asr discovering
NOTICE: /dev/sdc: ddf1 discovering
NOTICE: /dev/sdc: hpt37x discovering
NOTICE: /dev/sdc: hpt45x discovering
NOTICE: /dev/sdc: isw discovering
NOTICE: /dev/sdc: isw metadata discovered
NOTICE: /dev/sdc: jmicron discovering
NOTICE: /dev/sdc: lsi discovering
NOTICE: /dev/sdc: nvidia discovering
NOTICE: /dev/sdc: pdc discovering
NOTICE: /dev/sdc: sil discovering
NOTICE: /dev/sdc: via discovering
NOTICE: /dev/sdd: asr discovering
NOTICE: /dev/sdd: ddf1 discovering
NOTICE: /dev/sdd: hpt37x discovering
NOTICE: /dev/sdd: hpt45x discovering
NOTICE: /dev/sdd: isw discovering
NOTICE: /dev/sdd: isw metadata discovered
NOTICE: /dev/sdd: jmicron discovering
NOTICE: /dev/sdd: lsi discovering
NOTICE: /dev/sdd: nvidia discovering
NOTICE: /dev/sdd: pdc discovering
NOTICE: /dev/sdd: sil discovering
NOTICE: /dev/sdd: via discovering
NOTICE: added /dev/sda to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdb to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdc to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdd to RAID set "isw_dxifadxaii"
*** Group superset isw_dxifadxaii
--> Active Superset
name : isw_dxifadxaii_Volume0
size : 1953535744
stride : 128
type : raid01
status : ok
subsets: 2
devs : 4
spares : 0
--> Active Subset
name : isw_dxifadxaii_Volume0-0
size : 976767872
stride : 128
type : mirror
status : nosync
subsets: 0
devs : 2
spares : 0
--> Active Subset
name : isw_dxifadxaii_Volume0-1
size : 976767872
stride : 128
type : mirror
status : nosync
subsets: 0
devs : 2
spares : 0
Looks like it was added successfully nonetheless. Both 0-0 and 0-1 show 2 devs
and "mirror" now, and "nosync" implies that it hasn't synched up yet. We can
monitor rebuild progress with:
[root@fishie tmp]# dmsetup status
isw_dxifadxaii_Volume0: 0 1953535744 striped 2 253:0 253:1 1 AA
isw_dxifadxaii_Volume0-1: 0 976767995 mirror 2 8:32 8:16 1005/7453 1 AA 1 core
isw_dxifadxaii_Volume0p3: 0 12578895 linear
isw_dxifadxaii_Volume0-0: 0 976767995 mirror 2 8:0 8:48 1065/7453 1 AA 1 core
isw_dxifadxaii_Volume0p2: 0 12578895 linear
isw_dxifadxaii_Volume0p1: 0 256977 linear
isw_dxifadxaii_Volume0p5: 0 1928105172 linear
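The two numbers separated by a slash on each mirror line (e.g. 1065/7453) are
the regions in sync and the total regions, so the rebuild progress can be
turned into a percentage. The following is a sketch: mirror_progress is a
hypothetical helper name, and it assumes the mirror status line has the layout
shown above (target type, number of legs, one device per leg, then the ratio).

```shell
# Sketch: print "<name> <percent in sync>" for each mirror target.
# Reads "dmsetup status" output on stdin.
# Assumption: mirror lines look like
#   name: <start> <len> mirror <legs> <dev>... <synced>/<total> ...
# as in this transcript.
mirror_progress() {
    LC_ALL=C awk '$4 == "mirror" {
        n = $5                         # number of mirror legs
        split($(6 + n), r, "/")        # r[1] = synced, r[2] = total regions
        name = $1; sub(/:$/, "", name) # strip the trailing colon
        if (r[2] > 0)
            printf "%s %.1f%%\n", name, 100 * r[1] / r[2]
    }'
}
```

Possible usage: dmsetup status | mirror_progress
For the Volume0-0 line above this prints "isw_dxifadxaii_Volume0-0 14.3%".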
The following commands show some of the same information in other forms:
[root@fishie tmp]# dmraid -r
/dev/sda: isw, "isw_dxifadxaii", GROUP, ok, 976773165 sectors, data@ 0
/dev/sdb: isw, "isw_dxifadxaii", GROUP, ok, 976773165 sectors, data@ 0
/dev/sdc: isw, "isw_dxifadxaii", GROUP, ok, 976773165 sectors, data@ 0
/dev/sdd: isw, "isw_dxifadxaii", GROUP, ok, 976773165 sectors, data@ 0
[root@fishie tmp]# dmsetup ls
isw_dxifadxaii_Volume0 (253, 2)
isw_dxifadxaii_Volume0-1 (253, 1)
isw_dxifadxaii_Volume0p3 (253, 5)
isw_dxifadxaii_Volume0-0 (253, 0)
isw_dxifadxaii_Volume0p2 (253, 4)
isw_dxifadxaii_Volume0p1 (253, 3)
isw_dxifadxaii_Volume0p5 (253, 6)
[root@fishie tmp]# ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 63 Feb 15 16:31 control
brw-rw---- 1 root disk 253, 2 Feb 15 16:31 isw_dxifadxaii_Volume0
brw-rw---- 1 root disk 253, 0 Feb 15 16:31 isw_dxifadxaii_Volume0-0
brw-rw---- 1 root disk 253, 1 Feb 15 16:31 isw_dxifadxaii_Volume0-1
brw-rw---- 1 root disk 253, 3 Feb 15 16:53 isw_dxifadxaii_Volume0p1
brw-rw---- 1 root disk 253, 4 Feb 15 16:32 isw_dxifadxaii_Volume0p2
brw-rw---- 1 root disk 253, 5 Feb 15 16:31 isw_dxifadxaii_Volume0p3
brw-rw---- 1 root disk 253, 6 Feb 15 16:53 isw_dxifadxaii_Volume0p5
The device minor numbers (second column, after "253,") correspond to the dm-N
names under /sys/block:
[root@fishie tmp]# ls /sys/block/dm-?/slaves
/sys/block/dm-0/slaves:
sda sdd
/sys/block/dm-1/slaves:
sdb sdc
/sys/block/dm-2/slaves:
dm-0 dm-1
/sys/block/dm-3/slaves:
dm-2
/sys/block/dm-4/slaves:
dm-2
/sys/block/dm-5/slaves:
dm-2
/sys/block/dm-6/slaves:
dm-2
...which shows quite nicely how the disks are grouped.
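To read the /sys/block listing without cross-referencing by hand, the minor
numbers from "dmsetup ls" can be turned into dm-N names. This is a sketch:
dm_names is a hypothetical helper name, and it assumes each line ends in a
"(major, minor)" pair as shown above. As a defensive guess it also accepts a
colon between the two numbers.

```shell
# Sketch: print "dm-<minor> <name>" for each device-mapper device,
# sorted by minor number.
# Reads "dmsetup ls" output on stdin.
# Assumption: each line ends with "(<major>, <minor>)" as in this
# transcript; a "(<major>:<minor>)" form is tolerated as well.
dm_names() {
    awk 'NF >= 2 {
        minor = $NF
        sub(/\)$/, "", minor)          # drop the closing parenthesis
        sub(/^.*[(,:]/, "", minor)     # drop anything before the minor
        print minor, $1
    }' | sort -n | awk '{ print "dm-" $1, $2 }'
}
```

Possible usage: dmsetup ls | dm_names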
Finally, after the rebuild is done, everything is well again:
[root@fishie tmp]# dmraid -s -s -vv
NOTICE: /dev/sda: asr discovering
NOTICE: /dev/sda: ddf1 discovering
NOTICE: /dev/sda: hpt37x discovering
NOTICE: /dev/sda: hpt45x discovering
NOTICE: /dev/sda: isw discovering
NOTICE: /dev/sda: isw metadata discovered
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi discovering
NOTICE: /dev/sda: nvidia discovering
NOTICE: /dev/sda: pdc discovering
NOTICE: /dev/sda: sil discovering
NOTICE: /dev/sda: via discovering
NOTICE: /dev/sdb: asr discovering
NOTICE: /dev/sdb: ddf1 discovering
NOTICE: /dev/sdb: hpt37x discovering
NOTICE: /dev/sdb: hpt45x discovering
NOTICE: /dev/sdb: isw discovering
NOTICE: /dev/sdb: isw metadata discovered
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi discovering
NOTICE: /dev/sdb: nvidia discovering
NOTICE: /dev/sdb: pdc discovering
NOTICE: /dev/sdb: sil discovering
NOTICE: /dev/sdb: via discovering
NOTICE: /dev/sdc: asr discovering
NOTICE: /dev/sdc: ddf1 discovering
NOTICE: /dev/sdc: hpt37x discovering
NOTICE: /dev/sdc: hpt45x discovering
NOTICE: /dev/sdc: isw discovering
NOTICE: /dev/sdc: isw metadata discovered
NOTICE: /dev/sdc: jmicron discovering
NOTICE: /dev/sdc: lsi discovering
NOTICE: /dev/sdc: nvidia discovering
NOTICE: /dev/sdc: pdc discovering
NOTICE: /dev/sdc: sil discovering
NOTICE: /dev/sdc: via discovering
NOTICE: /dev/sdd: asr discovering
NOTICE: /dev/sdd: ddf1 discovering
NOTICE: /dev/sdd: hpt37x discovering
NOTICE: /dev/sdd: hpt45x discovering
NOTICE: /dev/sdd: isw discovering
NOTICE: /dev/sdd: isw metadata discovered
NOTICE: /dev/sdd: jmicron discovering
NOTICE: /dev/sdd: lsi discovering
NOTICE: /dev/sdd: nvidia discovering
NOTICE: /dev/sdd: pdc discovering
NOTICE: /dev/sdd: sil discovering
NOTICE: /dev/sdd: via discovering
NOTICE: added /dev/sda to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdb to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdc to RAID set "isw_dxifadxaii"
NOTICE: added /dev/sdd to RAID set "isw_dxifadxaii"
*** Group superset isw_dxifadxaii
--> Active Superset
name : isw_dxifadxaii_Volume0
size : 1953535744
stride : 128
type : raid01
status : ok
subsets: 2
devs : 4
spares : 0
--> Active Subset
name : isw_dxifadxaii_Volume0-0
size : 976767872
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
--> Active Subset
name : isw_dxifadxaii_Volume0-1
size : 976767872
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
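The "everything is well" check can be automated by looking at the status of
every set, not just the superset (in the degraded listing earlier, the
superset reported "ok" while both subsets were "inconsistent"). This is a
sketch only: all_sets_ok is a hypothetical helper name, and it treats any
status other than "ok" as a problem.

```shell
# Sketch: succeed only if every RAID set and subset reports status "ok".
# Reads "dmraid -s -s -vv" style output on stdin; prints each set whose
# status is something else (e.g. "nosync", "inconsistent").
# Assumption: "ok" is the only healthy status value.
all_sets_ok() {
    awk -F' *: *' '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim the line
        $1 == "name" { name = $2 }
        $1 == "status" && $2 != "ok" { print name ": " $2; bad = 1 }
        END { exit bad + 0 }
    '
}
```

Possible usage: dmraid -s -s -vv | all_sets_ok && echo "all sets ok"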
Note that after next boot, the device letters will be different, and slot 0
will be /dev/sda again. The RAID BIOS and dmraid driver should have no problem
with this, hopefully.
================================================================================