[K12OSN] RAID1 failure: need help

Les Mikesell les at futuresource.com
Sat Oct 1 01:38:06 UTC 2005


On Fri, 2005-09-30 at 19:23, Robert Arkiletian wrote:
> Wondering what I should do first. Any advice? Never been in this
> situation before. System is still working fine on 1 drive. BTW
> md0 is /
> md1 is /home
> md2 is /var
> md3 is swap
> 
> [ark at server ~]$ cat /proc/mdstat
> Personalities : [raid1]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      20482752 blocks [2/1] [U_]
> 
> md2 : active raid1 sdb3[1] sda3[0]
>      2048192 blocks [2/2] [UU]
> 
> md3 : active raid1 sdb4[1] sda4[0]
>      1020032 blocks [2/2] [UU]
> 
> md0 : active raid1 sdb1[2](F) sda1[0]
>      12289600 blocks [2/1] [U_]
> 

Assuming you can shut the machine down, swap in a good drive for
sdb - preferably one with no linux filesystem labels or raid
partitions that might confuse things on the initial boot.
The machine should boot normally with all of the md devices
'broken' but working anyway.
Do an 'fdisk -l /dev/sda' to see the partition setup on
the working disk.  Run 'fdisk /dev/sdb' and duplicate it, setting
the partition types to 'fd' (Linux raid autodetect).
Then for each md device, add the corresponding partition:
  mdadm /dev/md0 --add /dev/sdb1  (etc.)
  cat /proc/mdstat
will show the resync status.
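
As a rough sketch of the steps above, assuming the layout in your
mdstat output (md0..md3 built from sdb1..sdb4): the script below only
prints the commands so you can review them before running anything as
root. It uses 'sfdisk -d' to copy the partition table in one step,
which is a substitute for duplicating it by hand in fdisk. The
function and variable names are just for illustration.

```shell
#!/bin/sh
# Print (do not run) the commands to rebuild the mirrors on a new sdb.
# Assumes md0..md3 map to partitions 1..4, as in the mdstat output above.
GOOD_DISK=/dev/sda
NEW_DISK=/dev/sdb

print_rebuild_plan() {
    # Copy the partition table, including the 'fd' types, to the new disk.
    echo "sfdisk -d $GOOD_DISK | sfdisk $NEW_DISK"
    md=0
    for part in 1 2 3 4; do
        # Add the matching partition on the new disk back into each array.
        echo "mdadm /dev/md$md --add $NEW_DISK$part"
        md=$((md + 1))
    done
    # Watch the resync progress.
    echo "cat /proc/mdstat"
}

print_rebuild_plan
```

Check the printed device names against your own 'fdisk -l' output
before typing any of it in; copying a partition table onto the wrong
disk is not something you can undo.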

If you have hot swap disk carriers you can do this without
shutting down, but you need a few more steps to fail and
remove the still-working partitions from the raid, and then
to remove the scsi device and add one back.
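
The extra hot-swap steps might look something like this, again only
printed for review. It assumes md0 and md1 already show sdb as failed
(so they only need --remove) while md2 and md3 still have working sdb
members that must be failed first. The SCSI address "0 0 1 0"
(host channel id lun) is a made-up placeholder; read the real one
from 'cat /proc/scsi/scsi'. This also assumes your kernel has the
/proc/scsi/scsi interface.

```shell
#!/bin/sh
# Print (do not run) the steps to detach sdb before a hot swap.
# SCSI_ADDR is a hypothetical "host channel id lun"; check /proc/scsi/scsi.
SCSI_ADDR="0 0 1 0"

print_hotswap_plan() {
    # Members already marked (F) in mdstat only need to be removed.
    echo "mdadm /dev/md0 --remove /dev/sdb1"
    echo "mdadm /dev/md1 --remove /dev/sdb2"
    # Still-working members must be failed before they can be removed.
    echo "mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3"
    echo "mdadm /dev/md3 --fail /dev/sdb4 --remove /dev/sdb4"
    # Tell the kernel to drop the old disk, swap it, then add the new one.
    echo "echo 'scsi remove-single-device $SCSI_ADDR' > /proc/scsi/scsi"
    echo "# ...physically swap the drive here..."
    echo "echo 'scsi add-single-device $SCSI_ADDR' > /proc/scsi/scsi"
}

print_hotswap_plan
```

After the new disk shows up, the partitioning and 'mdadm --add' steps
are the same as in the shutdown case.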

If you are using grub and want to be able to boot with a
failed first drive, you need to repeat the grub installation
on the new drive.
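
One common way to do that with the grub shell is sketched below. It
assumes the replacement is /dev/sdb, that /boot lives on its first
partition (md0 is / and there is no separate /boot in your list), and
that md0 has finished resyncing so grub's files are actually on sdb1.
The 'device' line maps sdb to (hd0) so that it will boot as the first
BIOS disk if sda is gone. The function name is only for illustration.

```shell
#!/bin/sh
# Emit grub shell commands that install the boot loader on /dev/sdb.
# Assumes /boot is on the first partition of the disk, i.e. (hd0,0).
grub_commands() {
    cat <<'EOF'
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF
}

grub_commands
```

To apply it, run 'grub_commands | grub --batch' as root, or just type
the same lines at the interactive 'grub>' prompt.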

And if the drive is still under warranty, go to the manufacturer's
web site, put in the serial number, and get an RMA.

-- 
  Les Mikesell
    les at futuresource.com




