[Linux-cluster] cman_tool join causes other nodes to kernel panic

Dan B. Phung phung at cs.columbia.edu
Sun May 15 15:06:23 UTC 2005


I was adding another node to my cluster, so I updated the configuration
and ran cman_tool join -w. This caused all the other nodes to kernel
panic, which forced a reboot of the cluster. I have pasted the syslog of
the blade I just added and the kernel panic message from the other blades
below. I've done this same procedure several times before, so I don't
know why it triggered this assertion this time.

On the other machines, I see this:

SM:  Assertion failed on line 52 of file /usr/src/cluster-2.6.9/cman-kernel/src/sm_misc.c
SM:  assertion:  "!error"
SM:  time = 272181619

Kernel panic - not syncing: SM:  Record message above and reboot.

On the just-added blade, I see this:

May 15 10:44:58 localhost kernel: device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm at uk.sistina.com
May 15 10:45:02 localhost kernel: Lock_Harness <CVS> (built May 15 2005 10:28:33) installed
May 15 10:45:02 localhost kernel: GFS <CVS> (built May 15 2005 10:28:54) installed
May 15 10:45:29 localhost kernel: CMAN <CVS> (built May 15 2005 10:28:17) installed
May 15 10:45:29 localhost kernel: NET: Registered protocol family 30
May 15 10:45:29 localhost kernel: dlm: no version for "kcl_register_service" found: kernel tainted.
May 15 10:45:29 localhost kernel: DLM <CVS> (built May 15 2005 10:28:29) installed
May 15 10:45:29 localhost kernel: Lock_DLM (built May 15 2005 10:28:36) installed
May 15 10:55:06 localhost ccsd[3815]: Starting ccsd DEVEL.1115264594:
May 15 10:55:06 localhost ccsd[3815]:  Built: May  4 2005 23:48:37
May 15 10:55:06 localhost ccsd[3815]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
May 15 10:55:07 localhost ccsd[3815]: cluster.conf (cluster name = blade_cluster, version = 3) found.
May 15 10:55:07 localhost ccsd[3815]: Remote copy of cluster.conf is from quorate node.
May 15 10:55:07 localhost ccsd[3815]:  Local version # : 3
May 15 10:55:07 localhost ccsd[3815]:  Remote version #: 3
May 15 10:55:07 localhost kernel: CMAN: Waiting to join or form a Linux-cluster
May 15 10:55:08 localhost ccsd[3815]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
May 15 10:55:08 localhost ccsd[3815]: Initial status:: Inquorate
May 15 10:55:39 localhost kernel: CMAN: forming a new cluster
May 15 10:55:39 localhost kernel: CMAN: quorum regained, resuming activity
May 15 10:55:39 localhost ccsd[3815]: Cluster is quorate.  Allowing connections.
May 15 10:55:45 localhost fenced[3853]: blade01 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade02 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade04 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade09 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade10 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade11 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade12 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: fencing node "blade01"
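
In case it helps anyone going through the log above, here is a minimal
sketch (plain Python, not part of the cluster tools) that pulls out the
nodes fenced reported as non-members. The function name non_members is
just something I made up for illustration, and the pattern is based only
on the message format seen in this log, so it may need adjusting for
other fenced versions.

```python
import re

# Matches fenced's "not a cluster member" notices as they appear in the
# log above. \s+ between words tolerates lines that a mail client
# wrapped in the middle of the message.
NOT_MEMBER = re.compile(
    r"fenced\[\d+\]:\s+(\S+)\s+not\s+a\s+cluster\s+member\s+after"
    r"\s+(\d+)\s+sec\s+post_join_delay"
)

def non_members(log_text):
    """Return (node, delay_seconds) pairs in the order they appear."""
    return [(node, int(delay)) for node, delay in NOT_MEMBER.findall(log_text)]
```

Feeding it the syslog text as one string gives back the list of blades
the new node considered absent when post_join_delay expired.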

- -

Let me know if any other information would be helpful.

regards,
dan
