[K12OSN] RAID1 failure: need help

Robert Arkiletian robark at gmail.com
Sat Oct 1 06:00:22 UTC 2005


On 9/30/05, Les Mikesell <les at futuresource.com> wrote:
>
> Assuming you can shut the machine down, swap in a good drive for
> sdb - preferably one with no linux filesystem labels or raid
> partitions that might confuse things on the initial boot.
> The machine should boot normally with all of the md devices
> 'broken' but working anyway.

Thanks for the quick replies guys. I have a few more comments and questions.

After looking at the logs, it seems this happened when a full class was
logging on. It's not a power failure situation. Last year (when I was
running 3.1.2 without RAID) I had this same drive flake out on me with
I/O errors. So I ran multiple checks on it with the Seagate diagnostic
utils and also the SCSI controller's built-in BIOS tests. Everything
came up negative. So I figured the drive was okay and that it must
have been the fault of the SCSI controller driver. My controller is
the Adaptec aic7902w. Do you guys have the same controller?

Anyway, now I don't trust this drive any more, even if the diagnostics
say it's okay again. I'm going to buy another hard drive. Question: Do
I need to get the exact same model? I know they have to be the same
size, but I can't get the older model with 2 heads / 1 platter. The new
ones come with 1 head for the same 36 GB size, so they're higher density.

BTW, have you guys seen this?

sfdisk -d /dev/sda > partition.sda
sfdisk -d /dev/sdb > partition.sdb
then to restore
sfdisk /dev/sdb < partition.sdb
or
sfdisk /dev/sda < partition.sda

Is it better to do it manually with fdisk?
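
For a replacement disk, the dump would come from the healthy drive rather
than from a saved copy of the failed one. The following is a minimal
sketch of that idea, not a tested procedure: the function name
clone_layout_cmds is hypothetical, and /dev/sda (healthy) and /dev/sdb
(replacement) are assumptions taken from this thread. It only prints the
sfdisk commands so they can be reviewed before anything is written.

```shell
#!/bin/sh
# Sketch: copy the partition layout from a healthy RAID1 member to its
# replacement. Device names are assumptions; verify them first, because
# writing a partition table to the wrong disk destroys its layout.

# Print the commands for review instead of running them.
clone_layout_cmds() {
    src=$1   # healthy disk, e.g. /dev/sda (assumed)
    dst=$2   # blank replacement, e.g. /dev/sdb (assumed)
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are the same disk" >&2
        return 1
    fi
    # Dump the healthy disk's table, then feed that dump to the new disk.
    echo "sfdisk -d $src > partition.$(basename "$src")"
    echo "sfdisk $dst < partition.$(basename "$src")"
}

clone_layout_cmds /dev/sda /dev/sdb
```

The sfdisk dump records each partition's type along with its start and
size, so the restored table should carry the same partition types as the
source disk.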

> Do an 'fdisk -l /dev/sda' to see the partition setup on
> the working disk.  fdisk /dev/sdb and duplicate it, setting
> the partition types to 'FD' (linux raid).
> Then for each md device, add the corresponding partition:
> mdadm /dev/md0 --add /dev/sdb1  (etc.)
>   cat /proc/mdstat
> will show the resync status.

Will do. Thanks Les.
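
With several md devices, the "(etc.)" step could be written out roughly
as below. This is only a sketch: the helper name readd_cmds and the
md0:1 / md1:2 pairing are hypothetical, and the real mapping of arrays
to partitions has to be read from /proc/mdstat on the machine. The
function prints the mdadm commands rather than running them.

```shell
#!/bin/sh
# Sketch: print one "mdadm --add" per array. The md-to-partition mapping
# passed in is hypothetical; read the real one from /proc/mdstat.

readd_cmds() {
    disk=$1; shift          # replacement disk, e.g. /dev/sdb (assumed)
    for pair in "$@"; do    # each pair is "mdN:partition-number"
        md=${pair%%:*}      # text before the colon, e.g. md0
        part=${pair##*:}    # text after the colon, e.g. 1
        echo "mdadm /dev/$md --add $disk$part"
    done
}

readd_cmds /dev/sdb md0:1 md1:2
```

Once the printed commands have been checked and run by hand,
cat /proc/mdstat shows the resync progress for each array.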

>
> If you have hot swap disk carriers you can do this without
> shutting down, but you need a few more steps to fail and
> remove the still-working partitions from the raid, and then
> to remove the scsi device and add one back.
>

No hot swapping.

> If you are using grub and want to be able to boot with a
> failed first drive you need to repeat the operation to
> install grub on it.

Thanks. I would have forgotten this. I still have the notes from when
you helped me do this (not that long ago).

>
> And if the drive is still under warranty, go to the mfg's web
> site, put in the serial number, and get an RMA.

It is. But I'm afraid it may show up as good. I'll try doing
diagnostics on it again.

--
Robert Arkiletian
C++ GUI tutorial http://fltk.org/links.php?V19



