Kernel bug or disk failure

Chris Snook csnook at redhat.com
Fri Jul 11 21:43:15 UTC 2008


Sam Varshavchik wrote:
> Every other week or so, I get a disk kicked out of my RAID, with this:
> 
> Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
> Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
> Jul  6 04:05:38 commodore kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
> Jul  6 04:05:38 commodore kernel: Card was paused
> 
> … followed by a rather dry dump of the HBA's registers. This is aic79xx.
> 
> This does not look like a disk error to me. I re-add the drive into the 
> array, and rebuild with no downtime. SMART shows 0 in the defect list on 
> this drive, and over the disk's lifetime 0 uncorrectable reads and 1 
> uncorrectable write -- but this kernel barf already happened 4-5 times 
> now, and it's getting rather annoying.
> 

Looks more like a controller problem than a drive problem.  Do you have a spare 
HBA to test?

-- Chris