[Linux-cluster] GFS hangs after several hours
Brynnen R Owen
owen at isrl.uiuc.edu
Fri Nov 12 16:06:49 UTC 2004
More information.
I may have had an old version of ccsd which allowed me to get the
cluster running in the first place. I can't get that far now.
I have IPv6 compiled into the kernel, but no IPv6 interfaces are defined.
I've given ccsd the -4 flag.
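For anyone who wants to double-check the same condition on their own nodes, here is a minimal sketch that lists whatever IPv6 addresses the kernel currently has configured. It assumes the kernel exposes /proc/net/if_inet6; the function names are my own and are not part of ccsd or any cluster tool. Note that the loopback device may still list ::1 even when no other IPv6 interface is set up.

```python
from pathlib import Path


def parse_if_inet6(text):
    """Parse /proc/net/if_inet6 content into (device, address, prefix_len) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        raw_addr, _ifindex, prefix_hex, _scope, _flags, device = fields
        # The address is 32 hex digits with no colons; regroup into 8 blocks.
        address = ":".join(raw_addr[i:i + 4] for i in range(0, 32, 4))
        entries.append((device, address, int(prefix_hex, 16)))
    return entries


def configured_ipv6(path="/proc/net/if_inet6"):
    """Return parsed entries, or an empty list if the file is absent."""
    p = Path(path)
    if not p.exists():
        return []
    return parse_if_inet6(p.read_text())


if __name__ == "__main__":
    for device, address, prefix_len in configured_ipv6():
        print(f"{device}: {address}/{prefix_len}")
```

An empty result (or only the loopback entry) would be consistent with "no IPv6 interfaces defined".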
Checking logs after "ccs_test connect" shows that ccsd does not
believe the cluster is quorate.
/proc/cluster/status says that the cluster has reached quorum. The IP
addresses are appropriate (I have dual-NIC hosts).
I recompiled ccsd with "DEBUG=1" and found that the "quorate" variable
was never set in ccsd. I further found that cluster_communicator()
never received a valid fd from clu_connect() and was therefore stuck in
a loop. clu_connect() appears to be a magma call.
Any advice on how to proceed?
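As a starting point for the silent hang described in my original message below (a process stuck in "D" state with nothing logged), one option is to list every task in uninterruptible sleep straight from /proc. This is only a rough sketch, assuming the usual Linux /proc/<pid>/stat layout; the function names are hypothetical and not taken from any GFS or cluster tool.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the file was unreadable
        if state == "D":
            hung.append((pid, comm))
    return hung


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A task that shows up in this list on repeated runs is a candidate for the hang; its /proc/<pid>/wchan entry may then hint at where in the kernel it is blocked.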
On Thu, Nov 11, 2004 at 12:07:18PM -0600, Brynnen R Owen wrote:
> Hi all,
>
> My setup:
>
> 5 Athlon servers
>
> Red Hat 9 (yeah, I still haven't upgraded)
>
> kernel-2.6.9 from kernel.org, patched with gfs/ccs/dlm from the
> .tar.gz repository.
>
> using lock_dlm
>
> Using Apple XServe RAIDs with Apple FC cards (mptscsih driver).
>
> I thought I had everything running properly. I had two machines
> hammering a GFS partition at the same time. I pulled the power cord
> on one. fence_vixel kicked in, and the rest of the cluster
> continued. I could repeat this over and over.
>
> I set up two machines, each writing to a different GFS overnight.
> In the morning, there were no errors but one process was hung in a "D"
> state. The fence system did not show any activity. No errors were
> logged anywhere on the cluster. 'df' hung on any machine in the
> cluster when it came to one of the GFS partitions. I shut down the
> ethernet on one of the machines, but it didn't get fenced. It seems
> that something silently died, but I don't really know where to begin
> looking, as I don't see any errors written anywhere. Anyone got any
> ideas?
>
> The only other note is that ccsd appeared to have trouble
> determining whether the cluster had quorum.
--
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Brynnen Owen ( this space for rent )<>
<> owen at uiuc.edu ( )<>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>