[Linux-cluster] Lock Resources

Fri May 2 12:41:00 UTC 2008

Ja S wrote:
> Hi, Christine:
> 
> Really appreciate your prompt and kind reply. 
> 
> I have some further questions.
> 
> 
>>> 1. Is the kernel on each server/node going to initialize a number
>>> of empty lock resources after the cluster is completely rebooted?
>>>
>>> 2. If so, what is the default value of the number of empty lock
>>> resources? Is it configurable?
>> There is no such thing as an "empty" lock resource. Lock resources are
>> allocated from kernel memory as required. That does mean that the number
>> of resources that can be held on a node is limited by the amount of
>> physical memory in the system.
> 
> Does that mean the memory used for the disk IO cache will be reduced to
> make room for more lock resources?
> 
> If so, on an extremely busy node, shrinking the cache will increase
> physical disk IO, which in turn increases processing time (as disk IO
> is much slower than accessing cache), which in turn lengthens the time
> lock resources are held, which makes the kernel take still more memory
> away from the cache to create new lock resources for new requests, and
> so on, until eventually there is no cache left at all. Could this ever
> happen?

I suppose it could happen, yes. There are tuning values for GFS you can
use to make it flush unused locks more frequently but if the locks are
needed then they are needed!
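To make the flushing point concrete, here is a toy Python model (not GFS code). A node caches lock resources, and a periodic flush drops only the entries that no local process holds and that have been idle for at least `flush_after` seconds. `LockCache`, `flush_after` and the time-based policy are hypothetical stand-ins; the real GFS tunables are not named in this thread. Entries that are still held survive every flush, which is the "if the locks are needed then they are needed" case.

```python
from dataclasses import dataclass


@dataclass
class LockEntry:
    holders: int      # number of local users currently holding the lock
    last_used: float  # time of the most recent acquire or release


class LockCache:
    """Toy per-node lock cache; flush_after is a hypothetical tunable."""

    def __init__(self, flush_after: float) -> None:
        self.flush_after = flush_after
        self.entries: dict[str, LockEntry] = {}

    def acquire(self, name: str, now: float) -> None:
        entry = self.entries.setdefault(name, LockEntry(holders=0, last_used=now))
        entry.holders += 1
        entry.last_used = now

    def release(self, name: str, now: float) -> None:
        entry = self.entries[name]
        entry.holders -= 1
        entry.last_used = now

    def flush(self, now: float) -> list[str]:
        """Drop entries nobody holds that have been idle for >= flush_after."""
        victims = [name for name, e in self.entries.items()
                   if e.holders == 0 and now - e.last_used >= self.flush_after]
        for name in victims:
            del self.entries[name]
        return victims


cache = LockCache(flush_after=10.0)
cache.acquire("inode-1", now=0.0)   # still held when we flush
cache.acquire("inode-2", now=0.0)
cache.release("inode-2", now=1.0)   # unused from t=1 onwards
print(cache.flush(now=20.0))        # prints ['inode-2']; inode-1 stays cached
```

Flushing more often (a smaller `flush_after`) frees memory sooner, but it can never reclaim a lock that is still in use.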

> 
>> I think this addresses 3 & 4.
> 
> Yes, your answer does address them. Thank you.
> However, what will happen if an extremely busy application needs to
> write more new files, so the kernel needs to allocate more lock
> resources, but the physical memory limit has been reached and none of
> the existing lock resources can be released? I guess the kernel will
> simply force the application into an uninterruptible sleep until some
> lock resources are released or some memory is freed. Am I right?

I think so, yes. The VMM is not my speciality.

> 
> 
>>> 3. Is the number of lock resources fixed regardless of the load on
>>> the server?
>>>
>>> 4. If not, how will the number of lock resources be expanded under
>>> a heavy load?
>>>
>>> 5. The lock manager maintains a cluster-wide directory of the
>>> locations of the master copy of all the lock resources within the
>>> cluster and evenly divides the content of the directory across all
>>> nodes. How can I check the content held by a node (what command or
>>> API)?
>> On RHEL4 (cluster 1) systems the lock directory is viewable in
>> /proc/cluster/dlm_dir. I don't think there is currently any equivalent
>> in RHEL5 (cluster 2).
> 
> Thanks. Very helpful. Below are the first several lines of dlm_dir from
> the busiest node, A. How should I interpret them?
> 
> DLM lockspace 'data'
>        5         2f06768 1
>        5          114d15 1
>        5          120b13 1
>        5         5bd1f04 1
>        3          6a02f8 2
>        5          cb7604 1
>        5          ca187b 1
> 

The first two numbers are the lock name. Don't ask me what they mean,
that's a GFS question! (actually, I think inode numbers might be
involved) The last number is the nodeID on which the lock is mastered.
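As a rough illustration of reading output like the sample above, here is a small Python sketch. It assumes only what the sample and the reply show: a `DLM lockspace 'name'` header, then lines whose first two fields form the lock name and whose last field is the nodeID of the master. The name fields are kept as opaque strings, since their GFS meaning is not established here. `parse_dlm_dir` and `masters_per_node` are hypothetical helper names, and because the /proc format is not a recognised API, the parser skips any line it does not understand rather than guessing.

```python
from collections import Counter
from typing import NamedTuple


class DirEntry(NamedTuple):
    name: tuple[str, str]  # the two fields that make up the lock name
    master_nodeid: int     # node on which the resource is mastered


def parse_dlm_dir(text: str) -> dict[str, list[DirEntry]]:
    """Group dlm_dir-style lines by lockspace, skipping unrecognised lines."""
    result: dict[str, list[DirEntry]] = {}
    lockspace = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DLM lockspace"):
            lockspace = line.split("'")[1]
            result[lockspace] = []
            continue
        fields = line.split()
        if lockspace is None or len(fields) != 3 or not fields[2].isdigit():
            continue  # blank line or a layout this sketch does not know
        result[lockspace].append(DirEntry((fields[0], fields[1]), int(fields[2])))
    return result


def masters_per_node(entries: list[DirEntry]) -> Counter:
    """Count how many directory entries point at each master node."""
    return Counter(e.master_nodeid for e in entries)


sample = """DLM lockspace 'data'
       5         2f06768 1
       5          114d15 1
       3          6a02f8 2
"""
print(masters_per_node(parse_dlm_dir(sample)["data"]))
```

Counting masters per node this way gives a quick view of how lopsided mastering is, e.g. when one busy node masters almost everything.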

> Also, there are many files under /proc/cluster. Could you please point
> me to a place where I can find what these files are for and a
> description of their content?

They are not well documented, mainly because they are subject to change
and are not a recognised API. Maybe something could be put onto the
cluster wiki at some point.

>>> 6. If only node A is busy while the other nodes are idle all the
>>> time, does that mean node A holds the master copies of a very large
>>> number of lock resources and the other nodes have nothing?
>> That's correct. There is no point in mastering locks on a remote node
>> as it will just slow down access for the only node using those locks.
>>
>>> 7. For the above case, what would be the content of the cluster-wide
>>> directory? Only one entry, since only node A is really doing IO, or
>>> many entries, as many as the lock resources in use on node A? If the
>>> latter is true, will the lock manager still divide the content
>>> evenly across the other nodes? If so, would it cost node A extra time
>>> to find the location of lock resources that are mastered on itself,
>>> by messaging other nodes?
>> You're correct that the lock directory will still be distributed
>> around the cluster in this case and that it causes network traffic.
>> It isn't a lot of network traffic (and there needs to be some way of
>> determining where a resource is mastered; a node does not know,
>> initially, if it is the only node that is using a resource).
>
>> That lookup only happens the first time a resource is used by a node.
>> Once the node knows where the master is, it does not need to look it
>> up again, unless it releases all locks on the resource.
>>
> 
> Oh, I see. Just to clarify further, does that mean that if the same
> lock resource is required again by an application on node A, node A
> will go straight to the known node (i.e. node B) that held the master
> previously, but will need to look it up again if node B has already
> released the lock resource?

Not quite. A resource is mastered on a node for as long as there are
locks for it. If node A gets the lock (which is mastered on node B) then
it always knows to go to node B until all locks on node A are released.
When that happens the local copy of the resource on node A is released
including the reference to node B. If all the locks on node B are
released (but A still has some) then the resource will stay mastered on
node B and nodes that still have locks on that resource will know where
to find it without a directory lookup.
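The lookup rules described here can be summarised as a toy Python model. It is not the DLM implementation: the `Cluster` class is hypothetical, the directory node is assumed to be chosen by hashing the resource name (`zlib.crc32` modulo the node count, purely for illustration), and the first node to lock an unmastered resource is assumed to become its master, which matches the single-busy-node case discussed earlier. What it does take from the thread: a node consults the directory only on first use, remembers the master while it holds any lock on the resource, forgets it when its own last lock goes, and the resource stays mastered where it is for as long as any node still holds locks on it.

```python
import zlib
from collections import defaultdict


class Cluster:
    """Toy model of resource mastering and directory lookups (hypothetical)."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.directory = {}               # resource name -> master nodeid
        self.local = defaultdict(dict)    # nodeid -> {name: [master, nlocks]}
        self.remote_lookups = 0           # lookups sent to another node

    def dir_node(self, name):
        # Assumption: directory entries are spread across nodes by name hash.
        return self.node_ids[zlib.crc32(name.encode()) % len(self.node_ids)]

    def lock(self, node, name):
        rec = self.local[node].get(name)
        if rec is None:
            # First use on this node: ask the directory where the master is.
            if self.dir_node(name) != node:
                self.remote_lookups += 1
            master = self.directory.setdefault(name, node)
            rec = self.local[node][name] = [master, 0]
        rec[1] += 1
        return rec[0]                     # master nodeid, cached locally

    def unlock(self, node, name):
        rec = self.local[node][name]
        rec[1] -= 1
        if rec[1] == 0:
            # Last local lock gone: forget the resource and its master.
            del self.local[node][name]
            if not any(name in res for res in self.local.values()):
                # No node holds locks any more, so nothing is mastered.
                del self.directory[name]


c = Cluster([1, 2, 3])
c.lock(2, "res")         # node B (2) is first, so it becomes the master
print(c.lock(1, "res"))  # node A (1) looks the master up and gets 2
c.lock(1, "res")         # cached: no second directory lookup
c.unlock(2, "res")       # B drops its lock, but A still holds two
print(c.directory)       # the resource is still mastered on node 2
```

The cached master reference is what lets node A skip the directory after its first lookup, at the cost of one lookup again if it ever releases every lock it holds on that resource.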

-- 

Chrissie