[Linux-cluster] cman_tool join causes other nodes to kernel panic
Dan B. Phung
phung at cs.columbia.edu
Sun May 15 15:06:23 UTC 2005
I was adding another node to my cluster, so I updated the configurations
and ran cman_tool join -w. This caused all the other nodes to kernel
panic, which forced a reboot of the cluster. I've pasted the syslog from
the blade I just added and the kernel panic message from the other blades
below. I've done this same procedure several times before, so I don't
know why it triggered this assertion this time.
On the other machines, I see this:
SM: Assertion failed on line 52 of file /usr/src/cluster-2.6.9/cman-kernel/src/sm_misc.c
SM: assertion: "!error"
SM: time = 272181619
Kernel panic - not syncing: SM: Record message above and reboot.
On the blade I just added, I see this:
May 15 10:44:58 localhost kernel: device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm at uk.sistina.com
May 15 10:45:02 localhost kernel: Lock_Harness <CVS> (built May 15 2005 10:28:33) installed
May 15 10:45:02 localhost kernel: GFS <CVS> (built May 15 2005 10:28:54) installed
May 15 10:45:29 localhost kernel: CMAN <CVS> (built May 15 2005 10:28:17) installed
May 15 10:45:29 localhost kernel: NET: Registered protocol family 30
May 15 10:45:29 localhost kernel: dlm: no version for "kcl_register_service" found: kernel tainted.
May 15 10:45:29 localhost kernel: DLM <CVS> (built May 15 2005 10:28:29) installed
May 15 10:45:29 localhost kernel: Lock_DLM (built May 15 2005 10:28:36) installed
May 15 10:55:06 localhost ccsd[3815]: Starting ccsd DEVEL.1115264594:
May 15 10:55:06 localhost ccsd[3815]: Built: May 4 2005 23:48:37
May 15 10:55:06 localhost ccsd[3815]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
May 15 10:55:07 localhost ccsd[3815]: cluster.conf (cluster name = blade_cluster, version = 3) found.
May 15 10:55:07 localhost ccsd[3815]: Remote copy of cluster.conf is from quorate node.
May 15 10:55:07 localhost ccsd[3815]: Local version # : 3
May 15 10:55:07 localhost ccsd[3815]: Remote version #: 3
May 15 10:55:07 localhost kernel: CMAN: Waiting to join or form a Linux-cluster
May 15 10:55:08 localhost ccsd[3815]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
May 15 10:55:08 localhost ccsd[3815]: Initial status:: Inquorate
May 15 10:55:39 localhost kernel: CMAN: forming a new cluster
May 15 10:55:39 localhost kernel: CMAN: quorum regained, resuming activity
May 15 10:55:39 localhost ccsd[3815]: Cluster is quorate. Allowing connections.
May 15 10:55:45 localhost fenced[3853]: blade01 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade02 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade04 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade09 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade10 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade11 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade12 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: fencing node "blade01"
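To make the syslog above easier to skim, here is a minimal Python sketch that pulls out three things: whether CMAN reported forming a new cluster, which nodes fenced listed as non-members, and which node it started fencing. The sample text is a few lines copied from the log above with the wrapped lines rejoined. The function name summarize and the regular expressions are illustrative assumptions based only on the message wording seen here, not on any documented log format.

```python
import re

# A few lines copied from the syslog above (hard-wrapped lines rejoined).
LOG = """\
May 15 10:55:39 localhost kernel: CMAN: forming a new cluster
May 15 10:55:45 localhost fenced[3853]: blade01 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: blade02 not a cluster member after 6 sec post_join_delay
May 15 10:55:45 localhost fenced[3853]: fencing node "blade01"
"""

# Patterns are guesses derived from the wording of the messages above.
NOT_MEMBER = re.compile(r"fenced\[\d+\]: (\S+) not a cluster member")
FENCING = re.compile(r'fenced\[\d+\]: fencing node "([^"]+)"')


def summarize(log_text):
    """Return the join-related events found in a syslog excerpt."""
    return {
        # True if the node started its own cluster rather than joining one.
        "formed_new_cluster": "CMAN: forming a new cluster" in log_text,
        # Nodes fenced reported as missing after post_join_delay.
        "not_members": NOT_MEMBER.findall(log_text),
        # Nodes fenced began fencing.
        "fencing": FENCING.findall(log_text),
    }


summary = summarize(LOG)
print(summary)
```

Run against the full log, this would show that the new blade formed its own cluster and then treated every existing blade as a non-member.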
Let me know if any other info would be helpful.
regards,
dan