[Linux-cluster] Using GULM with CLVM
Ugo PARSI
ugo.parsi at gmail.com
Thu Jun 29 22:53:41 UTC 2006
>
> clvmd should log errors to syslog. Check that the gulm cluster is quorate as
> clvmd won't do anything without a quorate cluster. You might also like to run
> it with -d and see if any errors appear on stderr.
>
Yes, you're right, thanks: clvmd and lvm are stuck because gulm is
inquorate and simply doesn't work at all.
But I still can't figure out how to make it work. I've spent many
hours on it now, and all of my problems seem to be
IPv4/IPv6/hostname related (I guess)...
To simplify the configuration and the trial process, I've reduced my
cluster.conf to the simplest case (I guess): just 2 client nodes and
1 gulm master. All of them are on the 10.x.x.x private
IPv4 subnet.
Again, I don't know whether I can trust the 'documentation': it says
that gulm works on IPv6 sockets only, while the man page
(man lock_gulmd) suggests that both IPv4 and IPv6 sockets are
handled by GULM.
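At least the dual-stack behaviour itself is easy to reproduce outside gulm: a listener bound to the IPv6 wildcard address accepts IPv4 clients, but reports their addresses in IPv4-mapped form, which would explain the ::ffff: addresses below. A quick Python sketch (plain sockets, nothing gulm-specific; assumes the usual Linux dual-stack behaviour):

```python
import socket

# A dual-stack IPv6 listener: IPv4 peers show up as ::ffff:a.b.c.d
srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)  # accept IPv4 too
srv.bind(("::", 0))             # wildcard address, ephemeral port
srv.listen(1)
port = srv.getsockname()[1]

# Connect with a plain IPv4 socket to the same port
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
conn, peer = srv.accept()
print(peer[0])                  # -> ::ffff:127.0.0.1
conn.close(); cli.close(); srv.close()
```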
My first problem is this:
venus:/etc/init.d# lock_gulmd --use_ccs
Warning! You didn't specify a cluster name before --use_ccs
Letting ccsd choose which cluster we belong to.
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0
venus:/etc/init.d# I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0
Apparently, GULM forces some kind of IPv4-mapped IPv6 address
that it can't find anywhere on the system.
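For what it's worth, ::ffff:10.1.1.5 is just the IPv4-mapped IPv6 form of 10.1.1.5 (RFC 4291); Python's ipaddress module shows the equivalence:

```python
import ipaddress

# The ::ffff:0:0/96 prefix embeds a plain IPv4 address in IPv6 form
addr = ipaddress.IPv6Address("::ffff:10.1.1.5")
print(addr.ipv4_mapped)   # -> 10.1.1.5
```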
Here's my /etc/hosts:
----------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.1.1.5 venus
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
------------------------------------------
And here's my cluster.conf
-----------------------------------------------------------
venus:/etc/init.d# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="iliona" config_version="1">
  <gulm>
    <lockserver name="10.1.1.5"/>
  </gulm>
  <clusternodes>
    <clusternode name="mars">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.1.1.3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="venus">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.1.1.5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="triton">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.1.1.6"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
---------------------------------------------------------
So, in order to force its host-matching process, I've modified my
/etc/hosts as follows:
------------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.1.1.5 venus
::ffff:10.1.1.5 venus
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
-------------------------------------------
With that one, it *SEEMED* to work (it no longer prints messages at
startup and silently forks the daemon), but the logs show it still
cannot find its IP or hostname:
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am (venus) with ip (::) :(
Here's the whole log excerpt:
Jun 30 00:19:01 venus lock_gulmd_main[26211]: Forked lock_gulmd_core.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: Starting lock_gulmd_core
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
All rights reserved.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am running in Standard mode.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am (venus) with ip (::)
Jun 30 00:19:01 venus lock_gulmd_core[26215]: This is cluster iliona
Jun 30 00:19:01 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198
::1 idx:1 fd:6)
Jun 30 00:19:02 venus lock_gulmd_main[26211]: Forked lock_gulmd_LT.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: Starting lock_gulmd_LT
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
All rights reserved.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am running in Standard mode.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am (venus) with ip (::)
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: This is cluster iliona
Jun 30 00:19:02 venus lock_gulmd_LT000[26219]: Not serving locks from
this node.
Jun 30 00:19:02 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198
::1 idx:1 fd:6)
Jun 30 00:19:03 venus lock_gulmd_main[26211]: Forked lock_gulmd_LTPX.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: Starting lock_gulmd_LTPX
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
All rights reserved.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am running in Standard mode.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am (venus) with ip (::)
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: This is cluster iliona
Jun 30 00:19:03 venus ccsd[26197]: Connected to cluster infrastruture
via: GuLM Plugin v1.0.4
Jun 30 00:19:03 venus ccsd[26197]: Initial status:: Inquorate
And indeed it's acting as a 'Client' rather than a 'Server/Master':
venus:/etc/init.d# gulm_tool getstats venus
I_am = Client
quorum_has = 1
quorum_needs = 1
rank = -1
quorate = false
GenerationID = 0
run time = 128
pid = 27456
verbosity = Default
failover = disabled
venus:/etc/init.d#
Of course the other 2 nodes behave the same way, and with no
master the cluster stays inquorate and unusable, hence my
problems with clvmd/lvm.
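As far as I can tell from the "Gulm requires 1,3,4, or 5 nodes" message, gulm quorum is a majority of the configured lock servers, which would match the quorum_needs = 1 shown above for my single lockserver. A tiny sketch of that arithmetic (gulm_quorum is my own hypothetical helper, not a gulm API):

```python
# Hypothetical helper: gulm quorum as a majority of the servers list
def gulm_quorum(n_servers):
    assert n_servers in (1, 3, 4, 5)   # the only list sizes gulm accepts
    return n_servers // 2 + 1

print(gulm_quorum(1), gulm_quorum(3), gulm_quorum(5))  # -> 1 2 3
```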
I've tried many other things, like putting the node names in
cluster.conf instead of IPs (with the names resolved via /etc/hosts
or DNS only, etc.), but I still get the same error.
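For example, one of the name-based variants I tried (with venus resolvable via /etc/hosts) was:

```xml
<gulm>
  <lockserver name="venus"/>
</gulm>
```

but it fails with the same "cannot find the name" error.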
I'm getting really confused by the whole system, and as a
cluster-suite beginner the lack of documentation makes it really
painful to find my mistakes :/
If you have any ideas :),
Thanks a lot,
Ugo PARSI
--
An apple a day, keeps the doctor away