Kernel bug or disk failure
Sam Varshavchik
mrsam at courier-mta.com
Sun Jul 13 14:51:14 UTC 2008
Chris Snook writes:
> Sam Varshavchik wrote:
>> Every other week or so, I get a disk kicked out of my RAID, with this:
>>
>> Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun
>> (status 10) on 0:0:0
>> Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in
>> phase, 1 SCBs aborted, PRGMCNT == 0x22f
>> Jul 6 04:05:38 commodore kernel: >>>>>>>>>>>>>>>>>> Dump Card State
>> Begins <<<<<<<<<<<<<<<<<
>> Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program
>> address 0x22d Mode 0x22
>> Jul 6 04:05:38 commodore kernel: Card was paused
>>
>> … followed by a rather dry dump of the HBA's registers. This is aic79xx.
>>
>> This does not look like a disk error to me. I re-add the drive into the
>> array, and rebuild with no downtime. SMART shows 0 in the defect list on
>> this drive, and over the disk's lifetime 0 uncorrectable reads and 1
>> uncorrectable write -- but this kernel barf already happened 4-5 times
>> now, and it's getting rather annoying.
>>
>
> Looks more like a controller problem than a drive problem. Do you have a spare
> HBA to test?
No, but I have one on order now. I reseated the cable, but that didn't help:
the card dumped again about 12 hours later. That dump was apparently
non-fatal, because the RAID did not degrade.
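
To tell whether the dumps really recur "every other week or so", and whether a cable reseat or HBA swap changes anything, the log can be tallied by day. The following is only a minimal sketch, not anything from the driver or the thread: the function name count_card_dumps is made up for illustration, and it assumes the traditional syslog prefix (month, day, time, host) seen in the excerpt quoted above.

```python
import re
from collections import Counter

# Matches the banner line that opens a register dump, as seen in the quoted
# log excerpt. Assumption: traditional syslog prefix "Mon DD HH:MM:SS host".
# "Dumping Card State at program address ..." does not match this pattern,
# so each dump is counted once.
DUMP_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:.*Dump Card State"
)


def count_card_dumps(lines):
    """Return a Counter mapping (month, day) to the number of dump banners."""
    per_day = Counter()
    for line in lines:
        match = DUMP_RE.match(line)
        if match:
            per_day[(match.group("month"), int(match.group("day")))] += 1
    return per_day
```

It takes any iterable of log lines, so an open file object works; the usual location would be something like /var/log/messages, but that path is an assumption about the system's syslog configuration.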