[Linux-cluster] Unable to connect to cluster infrastructure - cluster died

David Brieck Jr. dbrieck at gmail.com
Fri Oct 13 12:16:00 UTC 2006


On 10/13/06, Matteo Catanese <m.catanese at kinetikon.com> wrote:
> Hi all,
> I had a perfectly working 2-node cluster.
>
> I saw kernel security updates and a cluster bugfix update, so I waited
> 2 weeks and decided, today, to apply the updates.
>
> I disabled my cluster service (oracle), patched both machines, and
> rebooted.
>
> After the reboot I had:
>
> [root at lvzbe1 kernel]# clustat
> Could not connect to cluster service
>
> and a bunch of
> Oct 13 13:51:55 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3840 seconds.
> Oct 13 13:52:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3870 seconds.
> Oct 13 13:52:56 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3900 seconds.
> Oct 13 13:53:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3930 seconds.
>
> Cluster DIED.
>
> I investigated and discovered that someone _forgot_ to
> compile dlm-smp and cman-smp for the latest Red Hat kernel.
>
> this is the "old" kernel:
>
> [root at lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.2.ELsmp/kernel/
> [root at lvzbe1 kernel]# ls -la
> total 44
> drwxr-xr-x  10 root root 4096 Sep  4 10:17 .
> drwxr-xr-x   3 root root 4096 Oct 13 12:56 ..
> drwxr-xr-x   3 root root 4096 Sep  4 10:17 arch
> drwxr-xr-x   2 root root 4096 Oct 13 12:56 cluster
> drwxr-xr-x   2 root root 4096 Sep  4 10:17 crypto
> drwxr-xr-x  29 root root 4096 Sep  4 10:17 drivers
> drwxr-xr-x  22 root root 4096 Sep  4 10:17 fs
> drwxr-xr-x   3 root root 4096 Sep  4 10:17 lib
> drwxr-xr-x  13 root root 4096 Sep  4 10:17 net
> drwxr-xr-x  10 root root 4096 Sep  4 10:17 sound
> [root at lvzbe1 kernel]#
>
>
> and this is the "new" one:
>
> [root at lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.3.ELsmp/kernel/
> [root at lvzbe1 kernel]# ls -la
> total 36
> drwxr-xr-x   9 root root 4096 Oct 13 12:20 .
> drwxr-xr-x   3 root root 4096 Oct 13 12:31 ..
> drwxr-xr-x   3 root root 4096 Oct 13 12:20 arch
> drwxr-xr-x   2 root root 4096 Oct 13 12:20 crypto
> drwxr-xr-x  29 root root 4096 Oct 13 12:20 drivers
> drwxr-xr-x  22 root root 4096 Oct 13 12:20 fs
> drwxr-xr-x   3 root root 4096 Oct 13 12:20 lib
> drwxr-xr-x  13 root root 4096 Oct 13 12:20 net
> drwxr-xr-x  10 root root 4096 Oct 13 12:20 sound
> [root at lvzbe1 kernel]#
>
>
> As you can see, the latest kernel does not have the "cluster" directory.
>
> This is the latest cman:
>
> [root at lvzbe1 kernel]# rpm -qil cman-kernel-smp-2.6.9-45.5
> Name        : cman-kernel-smp              Relocations: (not relocatable)
> Version     : 2.6.9                             Vendor: Red Hat, Inc.
> Release     : 45.5                          Build Date: Fri 18 Aug 2006 07:05:34 PM CEST
> Install Date: Fri 13 Oct 2006 12:56:36 PM CEST      Build Host: hs20-bc1-3.build.redhat.com
> Group       : System Environment/Kernel     Source RPM: cman-kernel-2.6.9-45.5.src.rpm
> Size        : 340198                           License: GPL
> Signature   : DSA/SHA1, Tue 22 Aug 2006 09:51:57 PM CEST, Key ID 219180cddb42a60e
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> Summary     : cman-kernel-smp - The Cluster Manager kernel smp modules
> Description :
> cman-kernel-smp - The Cluster Manager kernel smp modules
> /lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster
> /lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.ko
> /lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.symvers
> [root at lvzbe1 kernel]#
>
>
> and this is the latest dlm:
>
> rpm -qil dlm-kernel-smp-2.6.9-44.2
> Name        : dlm-kernel-smp               Relocations: (not relocatable)
> Version     : 2.6.9                             Vendor: Red Hat, Inc.
> Release     : 44.2                          Build Date: Tue 26 Sep 2006 10:49:24 PM CEST
> Install Date: Fri 13 Oct 2006 12:20:35 PM CEST      Build Host: hs20-bc2-3.build.redhat.com
> Group       : System Environment/Kernel     Source RPM: dlm-kernel-2.6.9-44.2.src.rpm
> Size        : 329858                           License: GPL
> Signature   : DSA/SHA1, Thu 28 Sep 2006 09:44:31 PM CEST, Key ID 219180cddb42a60e
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> Summary     : dlm-kernel-smp - The Distributed Lock Manager kernel modules.
> Description :
> dlm-kernel-smp - The Distributed Lock Manager kernel-smp modules.
> /lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.ko
> /lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.symvers
>
> Luckily this is not (yet) a production system, and I REALLY hope I
> did something wrong, even if I'm sure I did not.
>
> Can I download cman-kernel.src.rpm and dlm-kernel.src.rpm and compile
> them myself while waiting for answers from you?
>
>
> Matteo
>
>
>

The cluster packages are kernel-specific and lag behind normal kernel
updates. I'm not sure whether they release cluster updates outside the
regular update cycle, though; I have only been using them for two updates.
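
Since the modules are tied to a kernel version, it may help to check which
installed kernels actually have them before rebooting into a new one. The
following is only a rough sketch, not an official tool: the function name
check_cluster_modules is made up for illustration, and it assumes the
modules live under <modules root>/<kernel version>/kernel/cluster/ as in the
listings quoted above. It only looks for cman.ko and dlm.ko.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat package).
# Report, for each kernel under a modules root, whether the cluster
# modules named in this thread (cman.ko, dlm.ko) are installed for it.
# Usage: check_cluster_modules [modules_root]   (default: /lib/modules)
check_cluster_modules() {
    root="${1:-/lib/modules}"
    missing=0
    for kdir in "$root"/*; do
        # Skip non-directories (and the unexpanded pattern if root is empty).
        [ -d "$kdir" ] || continue
        kver=$(basename "$kdir")
        for mod in cman dlm; do
            # Assumed layout, taken from the rpm file lists quoted above.
            if [ -f "$kdir/kernel/cluster/$mod.ko" ]; then
                echo "$kver: $mod.ko present"
            else
                echo "$kver: $mod.ko MISSING"
                missing=1
            fi
        done
    done
    # Return non-zero if any kernel lacks one of the modules.
    return "$missing"
}
```

Calling check_cluster_modules with no argument scans /lib/modules. If the
kernel you are about to boot shows MISSING, the cluster packages have
probably not caught up with it yet, and staying on the previous kernel until
they do is likely the safer choice.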



