Re: file system, kernel or hardware raid failure?

Vegard Svanberg wrote:
> I had a busy mailserver fail on me the other day. Below is what was
> printed in dmesg. We first suspected a hardware failure (raid controller
> or something else), so we moved the drives to another (identical
> hardware) machine and ran fsck. Fsck complained ("short read while
> reading inode") and asked if I wanted to ignore and rewrite (which I
> did). 
> After booting up again, the problem came back immediately and root was
> remounted read only. We moved the data from the read only drive to a new
> machine. While copying the data, we got this message from time to time
> (on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
> unable to read inode block - inode=22561891, block=90243144".
> I need to find the cause(s) of the problems. So far I have these
> questions/concerns:
> - Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
> - Filesystem bug/failure?
> - Did the RAID controller fail to detect a failing drive? This is an
>   Adaptec aoc-usas-s4ir running on a Supermicro motherboard.
> I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
> sure what to do from here.
> Any ideas? Thanks in advance.
> dmesg:
> [   38.907730] end_request: I/O error, dev sda, sector 284688831

The drive hardware behind sda is failing; I'd run the SMART tools
(smartctl from smartmontools) or the vendor diagnostics to be sure.
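
If you want to see whether the errors sit on one spot or are spread
across the disk, you could tally them from a saved copy of dmesg. What
follows is only a rough sketch: the function name failing_sectors is my
own invention, and the regex assumes the "end_request: I/O error, dev X,
sector N" wording that your 2.6.27 kernel printed above. Other kernels
word this line differently.

```python
import re
from collections import defaultdict

# Assumed message format, taken from the dmesg line quoted above:
#   [   38.907730] end_request: I/O error, dev sda, sector 284688831
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text):
    """Map each device to the sorted list of distinct sectors that
    reported an I/O error in the given log text."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.groups()
            found[device].add(int(sector))
    return {device: sorted(sectors) for device, sectors in found.items()}
```

Feed it the text of a saved log, e.g. the result of reading a file you
wrote with "dmesg > dmesg.txt" (the file name is just an example). A
handful of neighbouring sectors points at a localized media defect; a
long, scattered list points at something worse.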


> [   45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=D
> [   45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]

I can't speak to whether the raid controller should have detected this.


