[Linux-cluster] Question about GFS

Michael Conrad Tadpol Tilstra mtilstra at redhat.com
Thu Mar 24 16:06:18 UTC 2005


On Wed, Mar 23, 2005 at 02:11:00PM +0200, Oved Ourfali wrote:
> I have GFS version 6 installed on RHEL ES3 update 3.
> The GFS cluster has 3 nodes: a, b and c.
> 
> The three nodes run the lock_gulm daemon, and thus it runs in RLM mode.
> 
> I have done some tests to check that GFS works correctly, and I
> ran into something very weird:
> Let's assume the master is A, and B and C are slaves.
> Disconnecting B or C from the network works fine.
> 
> Disconnecting A causes a problem. Let's assume B tries to become the new
> master. B detects that A is down, but for some reason it also thinks
> that C is down, so it waits for enough slaves to contact it, and that
> never happens. I tried increasing the timeout, and now it sometimes
> works and sometimes doesn't.
> 
> Does anyone have a clue why this is happening?

For some reason C isn't finding B in time to let it know that it is
still alive.  So, first question: what values are you using for
heartbeat_rate and allowed_misses? Were you seeing this with the
defaults (before you increased the timeout), or with something else?
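
To make the timing question concrete, here is a rough sketch of how the two
settings combine. It assumes a node is expired after roughly
heartbeat_rate * allowed_misses seconds of silence; the exact accounting
inside gulm may differ by a heartbeat. The function names and the 15 s / 2
values in the usage note are illustrative assumptions, not taken from your
configuration, so check them against your own cluster settings.

```python
def expiry_window(heartbeat_rate: float, allowed_misses: int) -> float:
    """Approximate seconds of silence before a node is marked expired.

    Assumption: the window is simply heartbeat_rate * allowed_misses.
    """
    if heartbeat_rate <= 0:
        raise ValueError("heartbeat_rate must be positive")
    if allowed_misses < 0:
        raise ValueError("allowed_misses must not be negative")
    return heartbeat_rate * allowed_misses


def logs_in_before_expiry(relogin_seconds: float,
                          heartbeat_rate: float,
                          allowed_misses: int) -> bool:
    """True if a slave that needs relogin_seconds to find the new master
    would reach it before the (approximate) expiry window closes."""
    return relogin_seconds < expiry_window(heartbeat_rate, allowed_misses)
```

For example, with a heartbeat_rate of 15 and allowed_misses of 2, the window
is about 30 seconds, so a slave that takes 45 seconds to locate the new
master would already be considered down. That would match C being marked
dead only some of the time.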

Also, you can add LoginLoops to the verbosity setting to have gulm print
out much more detail when it is trying to connect and find the master
server. 

-- 
Michael Conrad Tadpol Tilstra
BE ALERT!!!!  (The world needs more lerts ...)