Kernel bug or disk failure

Sam Varshavchik mrsam at courier-mta.com
Mon Jul 14 22:30:36 UTC 2008


Todd Denniston writes:

> Sam Varshavchik wrote, On 07/13/2008 10:51 AM:
>> Chris Snook writes:
>> 
>>> Sam Varshavchik wrote:
>>>> Every other week or so, I get a disk kicked out of my RAID, with this:
>>>>
>>>> Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device 
>>>> overrun (status 10) on 0:0:0
>>>> Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in 
>>>> phase, 1 SCBs aborted, PRGMCNT == 0x22f
>>>> Jul  6 04:05:38 commodore kernel: >>>>>>>>>>>>>>>>>> Dump Card State 
>>>> Begins <<<<<<<<<<<<<<<<<
>>>> Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at 
>>>> program address 0x22d Mode 0x22
>>>> Jul  6 04:05:38 commodore kernel: Card was paused
>>>>
>>>> … followed by a rather dry dump of the HBA's registers. The driver is 
>>>> aic79xx.
>>>>
>>>> This does not look like a disk error to me. I re-add the drive into 
>>>> the array, and rebuild with no downtime. SMART shows 0 in the defect 
>>>> list on this drive, and over the disk's lifetime 0 uncorrectable 
>>>> reads and 1 uncorrectable write -- but this kernel barf has already 
>>>> happened 4-5 times now, and it's getting rather annoying.
>>>>
>>>
>>> Looks more like a controller problem than a drive problem.  Do you 
>>> have a spare HBA to test?
>> 
>> No, but I have one on order now. I reseated the cable, but that didn't 
>> help -- the card dumped again about 12 hours later. That time it was, 
>> apparently, non-fatal, because the RAID did not degrade.
>> 
> 
> May I suggest that, when it is convenient to do so, you:
> 1) Reboot.
> 2) Press Ctrl-A to enter the SCSI card's setup utility when the aic79xx boot 
> text shows up during BIOS initialization.
> 3) Set the speed of the SCSI bus to that drive a little slower.
> 4) If you still get the fault, or the drive is not recognized, repeat until 
> you get the desired result. Some drives do not work at all of the speeds 
> below their rated speed; a Promise array rated for U160 communicated only 
> at 160, 80, 66, 16 & 6.

I'll try that if the replacement card still trips like this. These drives 
have been spinning away, 24x7, for a few years, with nary a hiccup.
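
To check whether the dumps really do recur "every other week or so", one could count them in the syslog. What follows is only a minimal sketch: it assumes traditional syslog timestamps (which omit the year, so the caller supplies it), an English locale for month names, and that the marker text appears on one line in the real log rather than wrapped as in the quoted mail. The names `find_card_dumps`, `LINE_RE` and `MARKER` are made up for illustration and are not part of any existing tool.

```python
import re
from datetime import datetime

# Traditional syslog prefix: "Jul  6 04:05:38 host kernel: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s+(.*)$")

# Assumed marker, taken from the log excerpt quoted in this thread.
MARKER = "Dump Card State Begins"


def find_card_dumps(lines, year):
    """Return a list of (timestamp, host) for each card-state dump in lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or MARKER not in m.group(3):
            continue
        # Collapse the padding in "Jul  6" before parsing the date.
        stamp_text = " ".join(m.group(1).split())
        stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
        events.append((stamp, m.group(2)))
    return events
```

Feeding it the lines of the system log (the path varies by distribution) gives one entry per dump, and the gaps between consecutive timestamps show how regular the failures are.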




More information about the fedora-list mailing list