bad blocks... random death

Kenneth Goodwin kgoodwin at datamarktech.com
Fri Aug 13 16:44:38 UTC 2004


My two cents, offered without having read the earlier discussion in
this thread.
See my comments in-line below.

>  -----Original Message-----
>  From: redhat-list-bounces at redhat.com
>  [mailto:redhat-list-bounces at redhat.com] On Behalf Of Thierry ITTY
>  Sent: Friday, August 13, 2004 11:33 AM
>  To: redhat-list at redhat.com
>  Subject: bad blocks... random death
>
>
>  this continues discussions about bad disk blocks not really
>  bad and redhat 9 dying randomly
>
>  we're now a few on this list experiencing various symptoms
>  (dma errors, bad blocks on disks, system freeze or death)
>  that look like hardware problems.
>  after talking together we can now say that those problems
>  are pure OS problems.

If all of these are SMP systems, then perhaps there is a spinlock
conflict (multi-CPU contention) problem in the disk driver.
But I doubt that the disk drivers in the kernel have changed
in years.
I am running RH9 on several heavily used SCSI-based Compaq
multi-CPU machines with no problems.
So, based on my experience, I don't believe this is a software
issue.

>
>  the disks with bad blocks work actually fine elswhere (in my
>  case I ran the manufacturer low-level diags and no disk had
>  any problem. and, ain't it very strange that 10 disks get
>  the same problems at the same time ?!!!)

Not if you have an EMI (electromagnetic interference)
shielding issue. The drives themselves are fine.
They might be cross-polluting each other, the cables and/or
the controllers with EMI.
That will corrupt the bit stream between the drives and the
controller and give you errors.

The more heavily you use the drives, the more the
magnetic coils that move the heads are used. Those coils
put out an EMI field.
The more you use the drives, the more consistent that EMI
field is, and without good grounding
it "leaks" into whatever copper ground path is available,
including your drive cables,
power cables, etc.
Normally EMI is drained off through the drive's ground to
the chassis: the drive is
grounded to the chassis, and through the chassis to the
ground line on the power supply, and from there to earth.

Check the following, if you haven't already, as it applies to
your system:

1) Get an electrical outlet tester at your local Home
Depot/Lowe's, et al.

2) Check the outlets your systems are plugged into. (If you
use outlets other than NEMA 5-15R/5-20R (household type),
then get a suitable tester or an electrical testing service in to check
your grounds.)

3) Make sure you have a good, reliable earth ground at the
outlet. If you don't, get it fixed.
You would be surprised at how many outlets don't have valid
earth grounds.
If you are in a commercial building, your data center
outlets should have been installed with
ISOLATED grounds, that is, a separate ground wire between
the power panel and the receptacle.
Most commercial electrical work uses the metal jacket as the ground
path, and that tends to come apart over time
(i.e. NO MORE GROUND).

4) Check the power supply. Make sure you are not
overloading it past its rated maximum output. Make sure
that it is grounded to the chassis and to the earth ground.
Normally it grounds the chassis through its case,
but some have separate ground connections; look for ground
screw connections.

5) If your drives have ground screws or tabs on them,
connect them to a reliable chassis ground point.
Don't assume they have a good ground through the drive
mounting screws.

6) Use round shielded cables and watch the grounds on them.
If the shields are grounded at one end only,
make sure that the connected end is connected to a valid
ground source.

7) Grounds are normally connected at a single end to prevent
ground loops; that is, you don't want more than one
ground path here if you can help it. Multiple ground paths
won't help and can hurt under the wrong circumstances. Drives
with ground tabs don't generally ground through the mounting
screws, but check the drive specs. A cable with the shield
connected at both ends is also expected to ground the
drive; the cable
should connect to a ground pin on the drive's
interface.

8) If you have these drives "dense packed" in your chassis,
you might want to consider putting
grounded shields between them if all else fails: grounded
copper plates, for example.

9) Make sure that you route the power cables away from the
drive controller cables within the chassis.

10) Look for any other ways that EMI could be crossing between components.

11) You might just have one really EMI noisy drive. There
are EMI meters that can be used
to measure EMI levels.

12) You can also be subject to a different wavelength of
radiation known as RFI, or Radio Frequency
Interference.
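
For the overload check in item 4, a rough power budget can be sketched
as below. This is only an illustration: every wattage in it is an
assumed placeholder (read the real numbers off your component labels
and the power supply's rating plate), the function names are made up
for this example, and the 80% headroom threshold is my own assumption,
not a vendor figure.

```python
# Hypothetical power budget check for item 4. All wattages are
# illustrative assumptions, not measurements.

def psu_load_fraction(component_watts, psu_rated_watts):
    """Return the total estimated draw as a fraction of the PSU's rated output."""
    if psu_rated_watts <= 0:
        raise ValueError("psu_rated_watts must be positive")
    return sum(component_watts.values()) / psu_rated_watts


def is_overloaded(component_watts, psu_rated_watts, headroom=0.8):
    """True if the estimated draw exceeds `headroom` of the rated output.

    The 0.8 default is an assumed safety margin.
    """
    return psu_load_fraction(component_watts, psu_rated_watts) > headroom


# Assumed example build: ten drives, as in the original report.
example_build = {
    "cpu": 70.0,
    "motherboard": 30.0,
    "disks": 10 * 12.0,  # ten drives at an assumed 12 W each
    "fans_and_cards": 25.0,
}

if __name__ == "__main__":
    load = psu_load_fraction(example_build, psu_rated_watts=300.0)
    print(f"estimated load: {load:.0%} of rated output")
    if is_overloaded(example_build, 300.0):
        print("over the assumed headroom")
    else:
        print("within budget")
```

Keep in mind that drives draw noticeably more at spin-up than when
idle, so a budget built from idle figures will be optimistic.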




>
>  the problem happens on various machines (gigabyte, asus,
>  athlon, pentium,
>  maxtor, western...).
>
>  it seems it is related to high load periods (in my case a
>  heavily used file
>  server).
>
>  we've been advised to change dma disks settings. I tried
>  various things (no dma at all, forcing mdma0 or udma2).
>  the system behave differently (either no errors or other
>  errors as dma timeouts), but it's not working quite well
>  (for example deactivating dma on disks lowers the average
>  network throughput from 50 MB/s to 1.5 !!! almost 40 times
>  slower !!!
>
>  we really need help to investigate this problem which causes
>  io errors and fs corruption !
>
>  tia
>
>




