[Linux-cluster] Unable to connect to cluster infrastructure - cluster died

Matteo Catanese m.catanese at kinetikon.com
Fri Oct 13 12:03:18 UTC 2006


Hi all,
I had a perfectly working 2-node cluster.

I saw that kernel security updates and a cluster bugfix update were
available, so I waited 2 weeks and decided, today, to apply them.

I disabled my cluster service (Oracle), patched both machines, and
rebooted.

After the reboot I got:

[root@lvzbe1 kernel]# clustat
Could not connect to cluster service

and a bunch of log messages like these:
Oct 13 13:51:55 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3840 seconds.
Oct 13 13:52:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3870 seconds.
Oct 13 13:52:56 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3900 seconds.
Oct 13 13:53:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3930 seconds.

The cluster DIED.

I investigated and discovered that someone _forgot_ to
compile dlm-smp and cman-smp for the latest Red Hat kernel.

This is the "old" kernel:

[root@lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.2.ELsmp/kernel/
[root@lvzbe1 kernel]# ls -la
total 44
drwxr-xr-x  10 root root 4096 Sep  4 10:17 .
drwxr-xr-x   3 root root 4096 Oct 13 12:56 ..
drwxr-xr-x   3 root root 4096 Sep  4 10:17 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:56 cluster
drwxr-xr-x   2 root root 4096 Sep  4 10:17 crypto
drwxr-xr-x  29 root root 4096 Sep  4 10:17 drivers
drwxr-xr-x  22 root root 4096 Sep  4 10:17 fs
drwxr-xr-x   3 root root 4096 Sep  4 10:17 lib
drwxr-xr-x  13 root root 4096 Sep  4 10:17 net
drwxr-xr-x  10 root root 4096 Sep  4 10:17 sound
[root@lvzbe1 kernel]#


And this is the "new" one:

[root@lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.3.ELsmp/kernel/
[root@lvzbe1 kernel]# ls -la
total 36
drwxr-xr-x   9 root root 4096 Oct 13 12:20 .
drwxr-xr-x   3 root root 4096 Oct 13 12:31 ..
drwxr-xr-x   3 root root 4096 Oct 13 12:20 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:20 crypto
drwxr-xr-x  29 root root 4096 Oct 13 12:20 drivers
drwxr-xr-x  22 root root 4096 Oct 13 12:20 fs
drwxr-xr-x   3 root root 4096 Oct 13 12:20 lib
drwxr-xr-x  13 root root 4096 Oct 13 12:20 net
drwxr-xr-x  10 root root 4096 Oct 13 12:20 sound
[root@lvzbe1 kernel]#


As you can see, the latest kernel does not have the "cluster" directory.
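
The same comparison can be run over every installed kernel at once. This is only a minimal sketch, assuming the module trees live under /lib/modules as in the listings above; the function name kernels_without_cluster is made up for illustration and is not part of any cluster tool.

```shell
# Sketch: print each kernel version whose module tree has no kernel/cluster
# directory. The first argument is the modules root (assumed default:
# /lib/modules). The function name is hypothetical.
kernels_without_cluster() {
    modroot="${1:-/lib/modules}"
    for dir in "$modroot"/*/; do
        # Skip the literal pattern if the glob matched nothing.
        [ -d "$dir" ] || continue
        if [ ! -d "${dir}kernel/cluster" ]; then
            basename "$dir"
        fi
    done
}
```

Calling kernels_without_cluster with no argument checks /lib/modules. Any kernel it prints has no cman.ko or dlm.ko under kernel/cluster.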

This is the latest cman:

[root@lvzbe1 kernel]# rpm -qil cman-kernel-smp-2.6.9-45.5
Name        : cman-kernel-smp              Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 45.5                          Build Date: Fri 18 Aug 2006 07:05:34 PM CEST
Install Date: Fri 13 Oct 2006 12:56:36 PM CEST      Build Host: hs20-bc1-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: cman-kernel-2.6.9-45.5.src.rpm
Size        : 340198                           License: GPL
Signature   : DSA/SHA1, Tue 22 Aug 2006 09:51:57 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : cman-kernel-smp - The Cluster Manager kernel smp modules
Description :
cman-kernel-smp - The Cluster Manager kernel smp modules
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.symvers
[root@lvzbe1 kernel]#


And this is the latest dlm:

rpm -qil dlm-kernel-smp-2.6.9-44.2
Name        : dlm-kernel-smp               Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 44.2                          Build Date: Tue 26 Sep 2006 10:49:24 PM CEST
Install Date: Fri 13 Oct 2006 12:20:35 PM CEST      Build Host: hs20-bc2-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: dlm-kernel-2.6.9-44.2.src.rpm
Size        : 329858                           License: GPL
Signature   : DSA/SHA1, Thu 28 Sep 2006 09:44:31 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : dlm-kernel-smp - The Distributed Lock Manager kernel modules.
Description :
dlm-kernel-smp - The Distributed Lock Manager kernel-smp modules.
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.symvers
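
Both file lists point only at the 2.6.9-42.0.2.ELsmp tree. A rough sketch of how that can be read off a package file list automatically: the helper below takes a file list on stdin (for example from `rpm -ql`) and prints the kernel versions it ships modules for. The name module_kernels is hypothetical, and the /lib/modules/VERSION/ path layout is assumed from the output above.

```shell
# Sketch: read a package file list on stdin and print each kernel version
# that appears as /lib/modules/VERSION/... in it. Hypothetical helper name.
module_kernels() {
    sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p' | sort -u
}
```

For example, `rpm -ql cman-kernel-smp | module_kernels` can be compared against the output of `uname -r`. If the running kernel is not in the list, the package carries no modules for it.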

Luckily this is not (yet) a production system, and I REALLY hope I
did something wrong, even if I'm sure I did not.

Can I download the cman-kernel and dlm-kernel source RPMs and compile
them myself while waiting for your answers?


Matteo





