RAID-1 (mirroring) disk failed; now what?
Will Partain
will.partain at verilab.com
Tue Aug 30 13:21:11 UTC 2005
FC3 machine, two disks, one mirrors the other. Software RAID.
Lovely. (Time passes.) The first disk (a Hitachi Deskstar 7K250, if
anyone cares) dies suddenly. The RAID software does the right thing
(more or less: the machine was unusable until after a reboot).
But now what? (Hint: this would make a nice Fedora Docs topic, since
it's sorely underdocumented in general :-) This article
[http://mark.foster.cc/articles/raid-rebuild.html] is an exception.)
Please advise if I've got any of the following wrong...
* Get RAID to "stand down" re the dead disk.
(Run "mdadm --query --detail /dev/md<whatever>" first to get the facts.)
Then something like...
# mdadm /dev/md0 --set-faulty /dev/sda1 --remove /dev/sda1
# mdadm /dev/md1 --set-faulty /dev/sda2 --remove /dev/sda2
# mdadm /dev/md2 --set-faulty /dev/sda5 --remove /dev/sda5
# mdadm /dev/md3 --set-faulty /dev/sda6 --remove /dev/sda6
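Before removing anything, it may be worth checking which members md
itself has already flagged. A minimal sketch, assuming the usual
/proc/mdstat layout in which a failed member is tagged "(F)"; the
function name failed_members is made up for illustration, and it only
reads, it changes nothing:

```shell
# Print "<array> <device>" for every member that md has marked failed,
# i.e. tagged "(F)" on the array's line in mdstat.
# Reads the file named by $1, defaulting to /proc/mdstat.
failed_members() {
    awk '/^md[0-9]+ :/ {
        # Member fields look like "sda1[0](F)"; other fields never end in "(F)".
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)    # strip "[0](F)" to leave "sda1"
                print $1, "/dev/" dev
            }
    }' "${1:-/proc/mdstat}"
}
# Usage: failed_members
```

If a dead partition does not show up here, md has not yet noticed the
failure, which is the case the --set-faulty step above covers.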
* Which physical disk is the guilty party?
(Oooh, shoulda thought about this _much_ earlier...) OK, I've gotta
yank one disk out, now which one is it? They're identical; oops.
Well, it's the first one (/dev/sda, also logged as 'ata1'), so if I
look in the motherboard manual and find where SATA channel 0 is
connected, and follow that wire... that'll be right, won't it?
A better plan? -- take the cover off, fire the machine up, it's only
using one disk, right? -- so I feel which one is doing something,
and take out the other.
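Another way to tell identical disks apart is by serial number, which is
normally printed on the drive's label. A rough sketch, assuming the
system's udev populates /dev/disk/by-id with names that embed model and
serial (this may well not be the case on FC3); ids_for_disk is a
hypothetical helper name. Where by-id is missing, "smartctl -i
/dev/sdb" reports the serial of the surviving disk, and the dead one is
then the disk whose label does not match.

```shell
# Print the name of every symlink in directory $2 (default
# /dev/disk/by-id) that resolves to the block device $1.
ids_for_disk() {
    dev=$(readlink -f "$1")
    for link in "${2:-/dev/disk/by-id}"/*; do
        [ -e "$link" ] || continue    # skip dangling links and an empty dir
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            basename "$link"
        fi
    done
    return 0
}
# Usage: ids_for_disk /dev/sdb
```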
* Will the machine's GRUBness work when the disk is yanked and replaced?
I am about to replace what was /dev/sda with a fresh disk. Has GRUB
got hidden secrets, e.g. in the MBR, such that it won't boot without
the expected first disk?
Even though the machine was RAID-1'd from day one, its grub.conf
includes the comment...
#boot=/dev/sda1
... which is worrying.
Or should I expect to boot in with the rescue CD, get the RAID stuff
tidied up, and only then get back to normal booting?
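One possible precaution, sketched here under assumptions rather than
offered as the known answer: put GRUB into the MBR of the surviving
disk as well, so that the machine can boot from it alone. This assumes
GRUB legacy (as shipped with FC3), that /boot is the first partition
(the md0 mirror), and that the BIOS will present the surviving disk as
its first disk once the dead one is gone. grub_mbr_commands is a
made-up helper that only prints the GRUB shell commands, so they can be
reviewed before being fed to grub.

```shell
# Emit GRUB (legacy) shell commands that install the boot loader into the
# MBR of disk $1, mapping that disk to the BIOS's first disk (hd0).
# $2 is the zero-based number of the partition holding /boot (default 0).
# ASSUMPTION: /boot lives on the first partition of the disk.
grub_mbr_commands() {
    printf 'device (hd0) %s\n' "$1"
    printf 'root (hd0,%s)\n' "${2:-0}"
    printf 'setup (hd0)\n'
    printf 'quit\n'
}
# Usage (as root, after reading the output):
#   grub_mbr_commands /dev/sdb | grub --batch
```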
* Yank old, put in new disk.
(*Mark* it as '/dev/sda' or whatever, for future reference!)
* Getting the right partitioning on the replacement disk.
Seems straightforward:
# sfdisk -d /dev/sdb > /tmp/my-disk-partitioning
# edit /tmp/my-disk-partitioning, replacing 'sdb' with 'sda'
# sfdisk /dev/sda < /tmp/my-disk-partitioning
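The three steps above can be collapsed into a pipeline. A minimal
sketch, assuming the dump names partitions as /dev/sdbN; retarget_dump
is a hypothetical helper name. Getting the two device names the wrong
way round would overwrite the good disk's partition table, so the
usage line deserves a second look before running it.

```shell
# Rewrite an "sfdisk -d" dump read from stdin so that partitions of
# disk $1 (e.g. sdb) are named as partitions of disk $2 (e.g. sda).
retarget_dump() {
    sed "s|/dev/$1|/dev/$2|g"
}
# Usage (as root; this OVERWRITES the partition table on /dev/sda):
#   sfdisk -d /dev/sdb | retarget_dump sdb sda | sfdisk /dev/sda
```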
* Telling the RAID gubbins of its new friends:
This should be plain sailing...
# mdadm /dev/md0 --add /dev/sda1
# mdadm /dev/md1 --add /dev/sda2
# mdadm /dev/md2 --add /dev/sda5
# mdadm /dev/md3 --add /dev/sda6
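After the --add commands, md resyncs each mirror in the background, and
the arrays stay degraded until that finishes. A small sketch for
checking on it, assuming the usual /proc/mdstat layout where a missing
or still-rebuilding member shows as "_" in the "[UU]" status field;
degraded_arrays is a made-up name:

```shell
# Print the name of every md array whose member status field, e.g.
# "[_U]", still contains a "_" (a missing or not-yet-synced member).
# Reads the file named by $1, defaulting to /proc/mdstat.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /blocks/ && /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}
# Usage: degraded_arrays    (no output means every mirror is whole)
```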
Any looming disasters in all of that? Assuming non-trivial feedback,
I will summarize back to the list. Thanks!
Will