[Linux-cluster] GFS hangs after several hours

Brynnen R Owen owen at isrl.uiuc.edu
Thu Nov 11 18:07:18 UTC 2004


Hi all,

My setup:

5 Athlon servers

Red Hat 9.0 (yes, I still haven't upgraded)

kernel-2.6.9 from kernel.org, patched with gfs/ccs/dlm from the
.tar.gz repository.

Using lock_dlm

Using Apple XServe RAIDs with Apple FC cards (mptscsih driver).

  I thought I had everything running properly.  I had two machines
hammering a GFS partition at the same time and pulled the power cord
on one.  fence_vixel kicked in, and the rest of the cluster carried
on.  I could repeat this reliably.

  I then set up two machines, each writing to a different GFS
filesystem overnight.  In the morning there were no errors, but one
process was hung in uninterruptible sleep ("D" state).  The fence
system showed no activity, and nothing was logged anywhere on the
cluster.  'df' hung on every machine in the cluster when it reached
one of the GFS partitions.  I took down the Ethernet interface on one
of the machines, but it was never fenced.  It seems that something
died silently, but with no errors written anywhere I don't know where
to begin looking.  Does anyone have any ideas?
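One possible starting point for "where to begin looking" is to list every task stuck in the "D" state on each node, together with the kernel function it is sleeping in. The following is a minimal Python 3 sketch, not part of the setup described above. It assumes a Linux /proc filesystem in which /proc/<pid>/stat has the form "pid (comm) state ..." and /proc/<pid>/wchan holds the wait channel. The helper names parse_state and find_d_state are hypothetical.

```python
import os


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # ')', so split on the LAST ')' before reading the state field.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def find_d_state(proc_root: str = "/proc"):
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            # The process exited between listdir() and open(), or access denied.
            continue
        if parse_state(line) == "D":
            comm = line[line.index("(") + 1:line.rindex(")")]
            stuck.append((int(entry), comm))
    return sorted(stuck)


if __name__ == "__main__":
    if not os.path.isdir("/proc"):
        print("no /proc filesystem; this sketch is Linux-only")
    else:
        for pid, comm in find_d_state():
            # wchan names the kernel function the task is blocked in, which may
            # hint at whether it is waiting inside GFS or lock_dlm code.
            try:
                with open("/proc/%d/wchan" % pid) as f:
                    wchan = f.read().strip()
            except OSError:
                wchan = "?"
            print("%d\t%s\t%s" % (pid, comm, wchan))
```

Running this on each node while 'df' is hung would show which tasks are stuck. If the kernel exports symbol names, it would also show which kernel function each task is waiting in.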

  The only other thing I noticed is that CCSD appeared to have trouble
determining whether the cluster had quorum.

-- 
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>  Brynnen Owen            (     this space for rent                      )<>
<>  owen at uiuc.edu           (                                              )<>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>