bad blocks... random death

Thierry ITTY thierry.itty at besancon.org
Fri Aug 13 11:32:45 UTC 2004


this continues the discussions about bad disk blocks that are not really bad
and redhat 9 dying randomly.

several of us on this list are now seeing various symptoms (dma errors, bad
blocks on disks, system freezes or deaths) that look like hardware problems.
after talking together, we can now say that these are purely OS problems.

the disks with bad blocks actually work fine elsewhere (in my case I ran the
manufacturer's low-level diagnostics and no disk had any problem. and isn't
it very strange that 10 disks get the same problems at the same time?!)

the problem happens on a wide range of hardware (gigabyte and asus boards,
athlon and pentium cpus, maxtor and western digital disks...).

it seems to be related to periods of high load (in my case a heavily used
file server).

we've been advised to change the disks' dma settings. I tried various things
(no dma at all, forcing mdma0 or udma2). the system behaves differently
(either no errors, or other errors such as dma timeouts), but it still does
not work well: for example, deactivating dma on the disks lowers the average
network throughput from 50 MB/s to 1.5 MB/s, more than 30 times slower!
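
for anyone who wants to repeat those three dma settings, here is a rough
sketch. it assumes hdparm is the tool used (the post does not say which tool
was used) and that the disk is /dev/hda, which is only a placeholder. it
builds the command lines without running them, relying on hdparm's -X
numbering (32+n for multiword dma mode n, 64+n for ultra dma mode n).

```python
# Sketch only: build (but do not run) hdparm command lines for the DMA
# settings mentioned above. Using hdparm and the /dev/hda path are
# assumptions, not details taken from the post.

MDMA_BASE = 32  # hdparm -X value for multiword DMA mode 0
UDMA_BASE = 64  # hdparm -X value for UltraDMA mode 0


def hdparm_args(device, setting):
    """Return the argv list for one DMA setting; nothing is executed."""
    if setting == "off":
        # -d0 turns DMA off for the drive
        return ["hdparm", "-d0", device]
    kind, mode = setting[:4], int(setting[4:])
    if kind == "mdma":
        xfer = MDMA_BASE + mode
    elif kind == "udma":
        xfer = UDMA_BASE + mode
    else:
        raise ValueError(f"unknown setting: {setting}")
    # -d1 turns DMA on, -X selects the transfer mode
    return ["hdparm", "-d1", f"-X{xfer}", device]


if __name__ == "__main__":
    for s in ("off", "mdma0", "udma2"):
        print(" ".join(hdparm_args("/dev/hda", s)))
```

printing the commands first, rather than running them, leaves room to check
each one against the hdparm man page before touching a live file server.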

we really need help investigating this problem, which causes io errors and
fs corruption!

tia




