[Linux-cluster] some questions about setting up GFS

Sergey serge at triumvirat.ru
Thu Jan 13 13:40:23 UTC 2005


>> I have 2 nodes - hp1 and hp2. Each node has Integrated Lights-Out
>> with ROM Version: 1.55 - 04/16/2004.
>> 

> The nodes in the servers config line for gulm form a mini-cluster of
> sorts.  There must be a quorum (51%) of nodes present in this mini-cluster
> for things to continue.

> You must have two of the three servers up and running so that the
> mini-cluster has quorum, which will then allow the other nodes to
> connect.
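
If I understand correctly, the servers list in question is the one in
cluster.ccs; with a hypothetical third lock server it would look roughly
like this (assuming GFS 6.0 CCS syntax; the cluster name and "hp3" are
placeholders, "hp3" does not exist in my setup):

cluster {
    name = "gfs_cluster"
    lock_gulm {
        servers = ["hp1", "hp2", "hp3"]
    }
}

With only hp1 and hp2 in that list, losing either one drops the
mini-cluster below 51%.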

I have only 2 nodes, so I can't get quorum. Should I use the Single Lock
Manager (SLM), where one node is the master and the other is a slave?

But in this case, if the master goes down, the slave loses access to the
shared file system, and its system log looks like this:

Jan 13 15:56:59 hp2 kernel: lock_gulm: Checking for journals for node "hp1"
Jan 13 15:56:59 hp2 lock_gulmd_core[2935]: Master Node has logged out.
Jan 13 15:56:59 hp2 kernel: lock_gulm: Checking for journals for node "hp1"
Jan 13 15:56:59 hp2 lock_gulmd_core[2935]: In core_io.c:410 (v6.0.0) death by: Lost connection to SLM Master (hp1),
stopping. node reset required to re-activate cluster operations.
Jan 13 15:56:59 hp2 kernel: lock_gulm: ERROR Got an error in gulm_res_recvd err: -71
Jan 13 15:56:59 hp2 lock_gulmd_LTPX[2941]: EOF on xdr (_ core _:0.0.0.0 idx:1 fd:5)
Jan 13 15:56:59 hp2 lock_gulmd_LTPX[2941]: In ltpx_io.c:335 (v6.0.0) death by: Lost connection to core, cannot
continue. node reset required to re-activate cluster operations.
Jan 13 15:56:59 hp2 kernel: lock_gulm: ERROR gulm_LT_recver err -71
Jan 13 15:57:02 hp2 kernel: lock_gulm: ERROR Got a -111 trying to login to lock_gulmd.  Is it running?


status of lock_gulmd:

[root at hp2 root]# /etc/init.d/lock_gulmd status
lock_gulmd dead but subsys locked

If the master boots up again after some time, nothing happens - the slave
does not try to reconnect.

What should happen next, and in what order?
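
Based on the "node reset required" message above, my guess at the recovery
sequence on the slave is roughly the following (unconfirmed - this may not
be the intended procedure):

[root at hp2 root]# /etc/init.d/lock_gulmd start

and, if the lock space still does not come back, a full reboot of hp2,
which is what "node reset" seems to imply.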


> You really should test that fencing works by running 
> fence_node <node name> for each node in your cluster before running
> lock_gulmd.  This makes sure that fencing is set up and working
> correctly.

> Do that, and once you've verified that fencing is correct (without
> lock_gulmd running) try things again with lock_gulmd.

Running the command
fence_node NODENAME
results in a reboot of NODENAME. Is that correct?
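
If I understand the fencing test correctly, it is run once in each
direction, from the node that is staying up (node names from my setup):

[root at hp1 root]# fence_node hp2
[root at hp2 root]# fence_node hp1

In both cases the fenced node is power-cycled, which I assume happens via
its iLO and matches the reboot I see.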


--
Sergey



