[Linux-cluster] cman_tool join causes other nodes to kernel panic

Dan B. Phung phung at cs.columbia.edu
Mon May 16 07:02:49 UTC 2005


Yes, I updated cluster.conf while the cluster was running, by adding
more nodes.

Specifically, I added six more blocks like this one, one per new node:

        <clusternode name="blade06" nodeid="1" votes="1">
          <multicast addr="224.0.0.18" interface="eth0"/>
          <fence>
            <method name="single">
              <device name="human" ipaddr="129.58.15.6"/>
            </method>
          </fence>
        </clusternode>

I first updated the file (incrementing the version), and then ran:
 ccs_tool update cluster.conf
 cman_tool version -r 3
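
Before pushing an edited cluster.conf, the version number and node IDs
can be read back from the file as a sanity check. The following is only
a minimal sketch in Python, not a diagnosis of the panic. It assumes the
version is stored in a config_version attribute on the root <cluster>
element. The cluster name "example" and the second node "blade07" are
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down cluster.conf. Only blade06 comes from the
# message above; the cluster name and blade07 are illustrative.
SAMPLE = """
<cluster name="example" config_version="3">
  <clusternodes>
    <clusternode name="blade06" nodeid="1" votes="1"/>
    <clusternode name="blade07" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>
""".strip()


def config_version(xml_text):
    """Return config_version from the root <cluster> element as an int."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("root element is <%s>, expected <cluster>" % root.tag)
    return int(root.attrib["config_version"])


def duplicate_node_ids(xml_text):
    """Map each nodeid used by more than one <clusternode> to the node names."""
    root = ET.fromstring(xml_text)
    names_by_id = {}
    for node in root.iter("clusternode"):
        names_by_id.setdefault(node.get("nodeid"), []).append(node.get("name"))
    return {nid: names for nid, names in names_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    print("config_version:", config_version(SAMPLE))
    print("duplicate node IDs:", duplicate_node_ids(SAMPLE))
```

The number reported as config_version is presumably the one to pass to
'cman_tool version -r', as with the 3 above.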

These commands completed without incident. The failure occurred when
running 'cman_tool join -w' on the new node.


On 16, May, 2005, David Teigland declared:

> On Sun, May 15, 2005 at 11:06:23AM -0400, Dan B. Phung wrote:
> > I was adding another node to my cluster, so I updated the configurations
> > and did cman_tool join -w, which caused all the other nodes to kernel
> > panic, which prompted reboot of the cluster.  I pasted the syslog of the
> > blade I just added and the kernel panic message from the other blades
> > below.  I've done this same procedure several times before, so I don't
> > know why this time it caused this assertion.
> > 
> > on the other machines, I see this:
> > 
> > SM:  Assertion failed on line 52 of file /usr/src/cluster-2.6.9/cman-kernel/src/sm_misc.c
> > SM:  assertion:  "!error"
> > SM:  time = 272181619
> 
> This means there's some sort of internal consistency error within cman.
> If you could explain in more detail the steps you took prior to this I'll
> try to reproduce it.  It sounds like you may have been updating
> cluster.conf while the cluster was running.  If so, what exactly did you
> change?
> 
> Dave
> 
