SATA + RAID + advice sought

Aleksandar Milivojevic amilivojevic at pbl.ca
Wed Nov 24 16:35:13 UTC 2004


Jon Hill wrote:
> The system ran ok for a while but we started to get disk IO errors reporting 
> from one of the drives. It looks like a hardware failure but I am wondering 
> if I may have misconfigured the software RAID which could have caused the IO 
> errors. I set up a single partition but put a SWAP partition on one drive 
> only; I am wondering if this could have been the cause of my disk errors.

No.  Putting swap on only one drive cannot cause disk errors. 
If you have two disks, you could also partition them identically 
and have two swap partitions.  Another option would be placing swap on 
RAID 1 too.  Putting swap on a RAID 1 device might slow 
things down marginally, but it is doable and also gives you some system 
stability if the kernel survives the death of a disk drive at run time (in 
my experience, Linux usually hangs if an ATA drive fails, so that is not 
much use anyhow).
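
As a rough sketch of the two swap layouts mentioned above, /etc/fstab 
could look something like this.  The device names (/dev/hda2, /dev/hde2, 
/dev/md1) are assumptions for illustration, not taken from your setup; 
/dev/md1 would have to exist as a RAID 1 array and be initialized with 
mkswap first.

```
# Option 1: one swap partition per disk.  Equal pri= values let the
# kernel spread swap pages across both areas.
/dev/hda2   swap   swap   defaults,pri=1   0 0
/dev/hde2   swap   swap   defaults,pri=1   0 0

# Option 2 (instead of option 1): swap on a RAID 1 device.
#/dev/md1   swap   swap   defaults         0 0
```

With option 1, losing either disk still takes down whatever was swapped 
out to it; only option 2 mirrors the swap contents.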

[snip]

> Most recent kernel updates have not been applied because the boot loader is 
> sat on the disk that failed! The machine has managed to boot from the failed 
> disk but then this drive gets disabled once the kernel has loaded.
> 
> I have now got a boot loader onto the other drive and am using a single drive. 

Hm, yeah, Grub can be a pain in the ass to configure properly for mirrored 
boot drives.  I did it some time ago, and it wasn't pretty.  Then I 
gave up on Grub and switched back to LILO.  I find LILO much better for 
this kind of configuration.  It handles mirrored boot partitions 
automatically and installs itself correctly into the MBR of *all* drives 
that contain sub-mirrors (Grub tends to install only on the drive with 
the first sub-mirror).  As I said, this is possible with Grub too, but it 
takes manual work, while LILO does the right thing out of the box.  So, 
if you are not into pretty graphics in your boot loader, and you think 
you can remember to run /sbin/lilo each time you change lilo.conf, I'd 
suggest switching to LILO.
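
For illustration, a minimal lilo.conf for a mirrored boot setup might 
look roughly like this.  The array name /dev/md0, the kernel path and 
the label are assumptions, not your actual values.  The raid-extra-boot 
line is optional: as far as I understand the lilo.conf man page, the 
default ("auto") already covers the usual case, and "mbr-only" just 
asks explicitly for a boot record in the MBR of each disk in the array.

```
# /etc/lilo.conf (sketch only; adjust devices and paths to your system)

# RAID 1 array that holds the kernel images (assumed name)
boot=/dev/md0
# Write boot records to the MBR of every disk in the array
raid-extra-boot=mbr-only
root=/dev/md0

# Hypothetical kernel entry
image=/boot/vmlinuz
    label=linux
    read-only
```

After editing the file, run /sbin/lilo as root and check its output to 
see which devices it wrote boot records to.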

> WARNING:  Kernel Errors Present
>    Buffer I/O error on device hde, l...:  18 Time(s)
>    end_request: I/O error, dev fd0, sector...:  2 Time(s)
>    end_request: I/O error, dev hde, sector...:  18 Time(s)
>    hde: dma_intr: error=0x40 { Uncorrect...:  18 Time(s)
>    hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }...:  18
> Time(s)

These usually mean your hard drive went south.  Get a new one from your 
local PC components store and a sledgehammer from your local hardware 
store.  Replace the drive and destroy the data on the old drive using the 
sledgehammer.

-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7
