[Linux-cluster] client doesn't start when lock master is not ready

Adam Manthei amanthei at redhat.com
Thu Feb 24 17:38:19 UTC 2005


On Thu, Feb 24, 2005 at 04:45:27PM -0000, Raj Kumar wrote:
> Hi All,
> 
> We have a two-node system using GFS. One of them is the lock server and
> the other is just a client. We restarted our servers recently and brought up
> the lock client before bringing up the lock master. lock_gulmd is set to
> start at runlevels 3, 4 and 5. The lock client system just hung with the
> message "Starting lock_gulmd..." in the boot process. Clearly this
> happened because the lock master server wasn't available at the time. When
> the lock master server started, the lock client system started successfully.

This is the desired behavior. Adjust the following value in
/etc/sysconfig/gfs if you don't like its behavior.

# GULM_QUORUM_TIMEOUT -- amount of time to wait for there to be a master
#     before giving up.  If GULM_QUORUM_TIMEOUT is positive, then we will
#     wait GULM_QUORUM_TIMEOUT seconds before giving up and failing when
#     a master server is not found.  If GULM_QUORUM_TIMEOUT is zero, then
#     wait indefinitely for a master server.  If GULM_QUORUM_TIMEOUT is
#     negative, just start lock_gulmd and not worry about whether it is
#     quorate.
GULM_QUORUM_TIMEOUT=300
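
To make the three cases in that comment concrete, here is a minimal Python sketch of the decision logic, assuming a simple poll loop. It is not the actual init.d/lock_gulmd script; the function name `wait_for_quorum`, the `is_quorate` callback and the `poll_interval` parameter are hypothetical, and only the meaning of GULM_QUORUM_TIMEOUT comes from the comment above.

```python
import time
from typing import Callable


def wait_for_quorum(timeout: int,
                    is_quorate: Callable[[], bool],
                    poll_interval: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Hypothetical model of GULM_QUORUM_TIMEOUT handling.

    Returns True if startup should continue, False if it should fail.
    """
    if timeout < 0:
        # Negative: start lock_gulmd without checking for quorum at all.
        return True
    # Zero: no deadline, wait indefinitely for a master server.
    # Positive: give up after `timeout` seconds.
    deadline = None if timeout == 0 else clock() + timeout
    while True:
        if is_quorate():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval)
```

With the default of 300 the loop gives up after five minutes, 0 corresponds to the indefinite hang described in the question, and a negative value returns immediately without ever asking about quorum.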

> I noticed that previously the client system started even when the lock master
> was not available, and the status of lock_gulmd on the client was set to
> "pending". But now the system doesn't start until the master server is also
> started.

Did you have the system mounting GFS automatically?  Apparently not, since it
would have "hung" there too.  The client node should have eventually timed
out after 5 minutes without a master server to log into.

> Has this changed recently? 

Define recently... sort of need the version information you are using :)

My guess is that since you are complaining about this behavior, you just
upgraded from GFS-6.0.0-15 to GFS-6.0.2-24.  From the rpm change log:

* Mon Nov 15 2004 Chris Feist <cfeist at redhat.com> 6.0.2-0
- init.d/lock_gulmd will not start if quorum is not established after
  a specified time (rbz135732).
- init.d/lock_gulmd will not stop if GFS is mounted (rbz135730).
- pool init.d scripts no longer hang on startup until console input
  is provided (rbz137382).


> It is possible that other administrators in the
> group may have to restart the systems at times. If they start the client
> before the master (or worse, don't start the master at all), then the system
> will not complete its boot process and other services will remain unavailable.

Your nodes won't be able to mount GFS if the gulm servers in the cluster
aren't quorate, so what's the problem?

> I'd like
> the system to complete its boot process and have lock_gulmd stay in the
> pending state until the master comes back. Is there any trick to achieve this
> behavior?

GULM_QUORUM_TIMEOUT=-1 

One other suggestion.  I usually start sshd immediately after networking on
my machines so that I can get into them as soon as possible.  This often
helps when dealing with complaints of this nature.
-- 
Adam Manthei  <amanthei at redhat.com>
