file system, kernel or hardware raid failure?

Vegard Svanberg vegard at svanberg.no
Wed Mar 4 10:53:11 UTC 2009


I had a busy mail server fail on me the other day. Below is what was
printed in dmesg. We first suspected a hardware failure (RAID controller
or something else), so we moved the drives to another machine with
identical hardware and ran fsck. fsck complained ("short read while
reading inode") and asked if I wanted to ignore the error and rewrite
(which I did).

After booting up again, the problem came back immediately and root was
remounted read-only. We moved the data from the read-only drive to a new
machine. While copying the data, we got this message from time to time
(on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
unable to read inode block - inode=22561891, block=90243144".

I need to find the cause(s) of the problems. So far I have these
questions/concerns:

- Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
- Filesystem bug/failure?
- Did the RAID controller fail to detect a failing drive? This is an
  Adaptec aoc-usas-s4ir running on a Supermicro motherboard.

I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
sure what to do from here.

Any ideas? Thanks in advance.

dmesg:

[   38.907730] end_request: I/O error, dev sda, sector 284688831
[   38.907802] EXT3-fs error (device dm-0): read_block_bitmap: Cannot read block bitmap - block_group = 1086, block_bitmap = 35586048
[   38.907956] Aborting journal on device dm-0.
[   38.919742] ext3_abort called.
[   38.919798] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
[   38.919942] Remounting filesystem read-only
[   38.925855] __journal_remove_journal_head: freeing b_committed_data
[   38.925915] journal commit I/O error
[   38.925935] journal commit I/O error
[   38.925953] journal commit I/O error
[   38.943245] Remounting filesystem read-only
[   38.958907] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[   38.958988] EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
[   38.959051] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[   38.959137] EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
[   38.959222] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[   39.024087] journal commit I/O error
[   39.024103] journal commit I/O error
[   39.024117] journal commit I/O error
[   39.024124] journal commit I/O error
[   39.024181] journal commit I/O error
[   39.024201] journal commit I/O error
[   39.024208] journal commit I/O error
[   39.024258] journal commit I/O error
[   39.024275] journal commit I/O error
[   39.024284] journal commit I/O error
[   39.024330] journal commit I/O error
[   39.024358] journal commit I/O error
[   39.024384] journal commit I/O error
[   39.024432] journal commit I/O error
[   39.024481] journal commit I/O error
[   45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[   45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[   45.750012] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[   45.750017] end_request: I/O error, dev sda, sector 721945599
[   45.750079] Buffer I/O error on device dm-0, logical block 90243144
[   45.750137] lost page write due to I/O error on dm-0
[   87.970284] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[   87.970292] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[   87.970296] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[   87.970302] end_request: I/O error, dev sda, sector 83324999
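
As a sanity check, the block numbers in the EXT3-fs messages can be
compared with the sda sector numbers in the I/O errors. The sketch below
is only arithmetic on the numbers printed above. It assumes 512-byte
sectors and a 4096-byte ext3 block size (the block size is not stated in
the log; it is inferred from block_bitmap 35586048 being exactly
1086 * 32768). The helper names are made up for illustration.

```python
SECTOR_SIZE = 512      # assumed sector size in bytes
BLOCK_SIZE = 4096      # assumed ext3 block size, not stated in the log
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def sector_offset(sda_sector, fs_block):
    """Distance in sectors between an sda sector and the start of a dm-0 block."""
    return sda_sector - fs_block * SECTORS_PER_BLOCK


def sector_to_block(sda_sector, offset):
    """Map an sda sector back to (dm-0 block, sector within that block)."""
    return divmod(sda_sector - offset, SECTORS_PER_BLOCK)


# (sda sector from end_request, dm-0 block from the neighbouring message)
pairs = [
    (284688831, 35586048),   # block bitmap of block group 1086
    (721945599, 90243144),   # logical block 90243144 / inode block
]
offsets = [sector_offset(s, b) for s, b in pairs]
print(offsets)  # [447, 447]

# With 4096-byte blocks a block group holds 8 * 4096 = 32768 blocks,
# so group 1086 starts at the reported block_bitmap value.
print(1086 * 8 * BLOCK_SIZE == 35586048)  # True

# The third failing sector has no matching ext3 message in the log;
# under the same offset it would fall on this dm-0 block.
print(sector_to_block(83324999, offsets[0]))  # (10415569, 0)
```

Both pairs give the same offset of 447 sectors, which suggests the ext3
errors are a direct consequence of the read and write failures reported
for sda rather than a separate problem. A fixed offset like this would be
consistent with a partition table plus LVM metadata sitting in front of
dm-0, but that layout is a guess and is not shown in the log.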

-- 
Vegard Svanberg <vegard at svanberg.no> [*Takapa at IRC (EFnet)]



