[Linux-cluster] two node cluster, 2nd node hangs in join

Wed May 4 07:33:51 UTC 2005

Hello, hopefully someone has run into this and it's a quick fix. I'm using
a vanilla 2.6.9 kernel and the newest (as of tonight) code from the RHEL4
CVS branch (-rRHEL4). My sequence is to start ccsd on both nodes and then
have both of them join (with a brief wait before the 2nd one tries).
Here's cman_tool's view of the nodes:

phung # cman_tool nodes
Node  Votes Exp Sts  Name
   3    1    1   J   blade03
   4    1    1   M   blade04
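
The Sts column is the telling part here: blade04 is M (member) while
blade03 stays at J (joining). If you want a test script to flag that
state, a minimal sketch could parse the pasted text. It assumes the
five-column layout shown above, which other versions may print
differently. parse_cman_nodes and not_yet_members are hypothetical
helper names.

```python
# Output of `cman_tool nodes` exactly as pasted above.
SAMPLE = """\
Node  Votes Exp Sts  Name
   3    1    1   J   blade03
   4    1    1   M   blade04
"""


def parse_cman_nodes(text):
    """Parse `cman_tool nodes` output into (node_id, votes, expected, status, name).

    Assumption: the five-column layout shown in SAMPLE; other lines are skipped.
    """
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 5:
            continue  # blank or unexpected line
        node_id, votes, expected, status, name = fields
        rows.append((int(node_id), int(votes), int(expected), status, name))
    return rows


def not_yet_members(rows):
    """Names of nodes whose status is anything other than 'M' (member)."""
    return [name for _, _, _, status, name in rows if status != "M"]


if __name__ == "__main__":
    print(not_yet_members(parse_cman_nodes(SAMPLE)))
```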

And in /var/log/messages I see this:
  CMAN: sending membership request

followed by many lines of:
  last message repeated 7 times

In addition, I ran tcpdump, and UDP packets are going back and forth
between the nodes on port 6809, so the network seems fine.
How would I debug this further? What tools are people using
to debug their config/setup?

Here's my config:

<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="3">
        <fencedevices>
          <fencedevice name="blade_san" agent="fence_manual"/>
        </fencedevices>

        <fence_daemon clean_start="0">
        </fence_daemon>

        <cman two_node="1" expected_votes="1">
          <multicast addr="224.0.0.1"/>
        </cman>

        <clusternodes>
          <clusternode name="blade03" nodeid="3" votes="1">
             <multicast addr="224.0.0.1" interface="eth0"/>
             <fence>
               <method name="human">
                 <device name="last_resort" ipaddr="blade03"/>
               </method>
             </fence>
          </clusternode>

          <clusternode name="blade04" nodeid="4" votes="1">
             <multicast addr="224.0.0.1" interface="eth0"/>
             <fence>
               <method name="human">
                 <device name="last_resort" ipaddr="blade04"/>
               </method>
             </fence>
          </clusternode>
        </clusternodes>
</cluster>
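
One low-tech way to sanity-check a config like this is a small script.
The sketch below is hypothetical (not a tool shipped with the cluster
suite) and uses only Python's standard xml.etree.ElementTree. It checks
internal consistency only: fence <device> names against <fencedevices>,
unique nodeids, per-node <multicast> addresses matching the <cman> one
(an assumption about how they should line up), and two_node="1" paired
with expected_votes="1" and exactly two nodes. On the config above it
reports that both nodes reference a fence device "last_resort" while
only "blade_san" is declared. That mismatch matters for fencing; whether
it has anything to do with the join hang is not established here.

```python
import xml.etree.ElementTree as ET

# The config from the post, trimmed to the elements checked below.
CONFIG = """<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="3">
  <fencedevices>
    <fencedevice name="blade_san" agent="fence_manual"/>
  </fencedevices>
  <cman two_node="1" expected_votes="1">
    <multicast addr="224.0.0.1"/>
  </cman>
  <clusternodes>
    <clusternode name="blade03" nodeid="3" votes="1">
      <multicast addr="224.0.0.1" interface="eth0"/>
      <fence><method name="human"><device name="last_resort" ipaddr="blade03"/></method></fence>
    </clusternode>
    <clusternode name="blade04" nodeid="4" votes="1">
      <multicast addr="224.0.0.1" interface="eth0"/>
      <fence><method name="human"><device name="last_resort" ipaddr="blade04"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>"""


def check_cluster_conf(text):
    """Return a list of consistency problems found in a cluster config string."""
    root = ET.fromstring(text)
    nodes = root.findall("clusternodes/clusternode")
    problems = []

    # Each <device name=...> in a node's <fence> block should refer to a
    # <fencedevice> declared under <fencedevices>.
    declared = {fd.get("name") for fd in root.iter("fencedevice") if fd.get("name")}
    for node in nodes:
        for dev in node.iter("device"):
            if dev.get("name") not in declared:
                problems.append(
                    f"{node.get('name')}: fence device {dev.get('name')!r} "
                    f"is not declared (declared: {sorted(declared)})"
                )

    # Node IDs should be unique.
    ids = [node.get("nodeid") for node in nodes]
    if len(ids) != len(set(ids)):
        problems.append(f"duplicate nodeid values: {ids}")

    # Assumption: each node's multicast addr should match the <cman> one.
    cman = root.find("cman")
    cman_mc = cman.find("multicast") if cman is not None else None
    if cman_mc is not None:
        for node in nodes:
            mc = node.find("multicast")
            if mc is None or mc.get("addr") != cman_mc.get("addr"):
                problems.append(f"{node.get('name')}: multicast addr differs from <cman>")

    # two_node="1" is normally paired with expected_votes="1" and two nodes.
    if cman is not None and cman.get("two_node") == "1":
        if cman.get("expected_votes") != "1":
            problems.append('two_node="1" but expected_votes is not "1"')
        if len(nodes) != 2:
            problems.append(f'two_node="1" but {len(nodes)} clusternodes are defined')

    return problems


if __name__ == "__main__":
    for problem in check_cluster_conf(CONFIG):
        print(problem)
```

To check the real file instead, read it into a string first
(/etc/cluster/cluster.conf is where ccsd normally looks).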

regards,
Dan

--