[Linux-cluster] cluster suite crashing
chris at cmiware.com
Thu Aug 2 00:55:22 UTC 2007
I am again attempting a 2-node cluster (two_node=1). This time we have
power fencing, and I created the cluster config from scratch.
Unplug network cables on Node A. Node B still plugged in. (Expected B to
Node B does not attempt fencing, claims to have lost quorum (???). (
Plug Node A back in.
Node A fences Node B.
On reboot, Node B reboots itself right after fencing Node A.
clurgmgrd: <crit> *Watchdog: Daemon died, rebooting
*Various things appear directly ahead of this in the log. Most of the
time it was a service script failing a stop operation, but correcting
the script did not resolve the issue:
[/var/log/messages on Node B]
clurgmgrd: <notice> Resource Group Manager Starting
clurgmgrd: <crit> Watchdog: Daemon died, rebooting...
kernel: md: stopping all md devices.
fenced: fence "[Node A]" success
[some pertinent lines from cluster.conf - they are identical on each node]
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
<cman expected_votes="1" two_node="1"/>
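For what it's worth, the two cman attributes above can be sanity-checked with a few lines of Python. This is only a minimal sketch: check_two_node is a hypothetical helper, not part of the cluster suite, and the <cluster> wrapper is added here only so the quoted fragment parses. It verifies nothing more than that two_node="1" is paired with expected_votes="1", which is what cman expects in two-node mode; it says nothing about why rgmanager dies.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the cluster.conf lines quoted above. The <cluster>
# wrapper is an assumption added so the two elements parse as one document.
FRAGMENT = """
<cluster>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
  <cman expected_votes="1" two_node="1"/>
</cluster>
"""


def check_two_node(xml_text):
    """Return a list of problems found in the <cman> element (empty if none).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    cman = root.find("cman")  # direct child of the root element
    if cman is None:
        return ["no <cman> element found"]
    problems = []
    # In two-node mode cman expects expected_votes to be 1.
    if cman.get("two_node") == "1" and cman.get("expected_votes") != "1":
        problems.append('two_node="1" requires expected_votes="1"')
    return problems


if __name__ == "__main__":
    print(check_two_node(FRAGMENT) or "cman two_node settings look consistent")
```

Run against the lines quoted above, the check finds nothing wrong, so the quorum complaint on Node B does not appear to come from a mismatch between these two attributes.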
Meanwhile, Node A comes up and fences B when it gets a chance.
I'm really at a loss as to what to do. We are running the RHEL 5 rpms from
RHN. Googling the error message yields some results on crashes in
RGManager which were allegedly fixed in version 4. I have seen some
other squirrelly behavior out of RGManager at various points, but
reboots seemed to fix those so I figured proper fencing might render
Any advice is welcome.