[Linux-cluster] Lock Resources

Christine Caulfield ccaulfie at redhat.com
Wed May 7 06:58:04 UTC 2008


Ja S wrote:
> --- Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
>> Ja S wrote:
>>> --- Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>
>>>>> DLM lockspace 'data'
>>>>>        5         2f06768 1
>>>>>        5          114d15 1
>>>>>        5          120b13 1
>>>>>        5         5bd1f04 1
>>>>>        3          6a02f8 2
>>>>>        5          cb7604 1
>>>>>        5          ca187b 1
>>>>>
>>>> The first two numbers are the lock name. Don't ask me what they
>>>> mean, that's a GFS question! (actually, I think inode numbers
>>>> might be involved) The last number is the nodeID on which the
>>>> lock is mastered.
>>>
>>> Great, thanks again!
>>>
>>>
>>>>>> That lookup only happens the first time a resource is used by
>>>>>> a node; once the node knows where the master is, it does not
>>>>>> need to look it up again, unless it releases all locks on the
>>>>>> resource.
>>>>>>
>>>>> Oh, I see. Just to clarify further: does it mean that if the
>>>>> same lock resource is required again by an application on node
>>>>> A, node A will go straight to the known node (ie node B) which
>>>>> previously held the master, but needs to look it up again if
>>>>> node B has already released the lock resource?
>>>> Not quite. A resource is mastered on a node for as long as
>>>> there are locks for it. If node A gets the lock (which is
>>>> mastered on node B) then it always knows to go to node B until
>>>> all locks on node A are released. When that happens the local
>>>> copy of the resource on node A is released, including the
>>>> reference to node B. If all the locks on node B are released
>>>> (but A still has some) then the resource will stay mastered on
>>>> node B, and nodes that still have locks on that resource will
>>>> know where to find it without a directory lookup.
>>>>
>>> Aha, I think I missed another important concept -- a local copy
>>> of lock resources. I did not realise that local copies of lock
>>> resources existed. Which file should I check to figure out how
>>> many local copies a node has and what the local copies are?
>> All the locks are displayed in /proc/cluster/dlm_locks, which
>> shows you which are local copies and which are masters.
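
Incidentally, the dump lines at the top of this thread are easy to
pull apart once you know the columns. Here is a small, hypothetical
and untested C sketch that splits each "<type> <number> <nodeid>"
line, guessing that the first and last columns are decimal and the
middle one (which looks like an inode number) is hex:

#include <stdio.h>

int main(void)
{
	unsigned int type, number, nodeid;
	char line[256];

	/* Feed the triple lines from the dump on stdin. */
	while (fgets(line, sizeof(line), stdin)) {
		if (sscanf(line, "%u %x %u", &type, &number, &nodeid) == 3)
			printf("name=(%u,%x) mastered on node %u\n",
			       type, number, nodeid);
	}
	return 0;
}

That is only a reading aid; the exact meaning of the two name fields
is a GFS question, as above.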
> 
> 
> A couple of further questions about the master copy of
> lock resources.
> 
> The first one:
> =============
> 
> Again, assume:
> 1) Node A is extremely busy and handles all requests
> 2) the other nodes are idle and have never handled any requests
> 
> According to the documents, Node A will initially hold all the
> master copies. What I am unclear about is whether the lock manager
> will evenly distribute the master copies on Node A to other nodes
> when it decides that Node A holds too many of them.

Locks are only remastered when a node leaves the cluster. In that case
all of the resources it mastered will be moved to another node. We do
not do dynamic remastering - a resource that is mastered on one node
will stay mastered on that node regardless of traffic or load, until
all users of the resource have been freed.
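
So the lifetime of a master is tied purely to lock references. A
rough sketch of that, against the userspace libdlm API (assuming
dlm_lock_wait/dlm_unlock_wait and a running cluster; untested, build
with -ldlm):

#include <stdio.h>
#include <string.h>
#include <libdlm.h>

int main(void)
{
	struct dlm_lksb lksb;
	const char *name = "example-resource"; /* made-up name */
	int status;

	memset(&lksb, 0, sizeof(lksb));

	/* Taking any lock creates (or references) the resource. The
	 * first node to use it triggers the directory lookup; after
	 * that the master's location is cached locally. */
	status = dlm_lock_wait(LKM_PRMODE, &lksb, LKF_NOQUEUE,
			       name, strlen(name), 0, NULL, NULL, NULL);
	if (status != 0 || lksb.sb_status != 0) {
		fprintf(stderr, "lock failed\n");
		return 1;
	}

	/* ... the resource stays mastered wherever it is for as long
	 * as any node holds a lock on it ... */

	/* Dropping the last local lock frees the local copy and its
	 * cached reference to the master node. */
	dlm_unlock_wait(lksb.sb_lkid, 0, &lksb);
	return 0;
}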

> The second one:
> ==============
> 
> Assume the master copy of a lock resource is on Node A and Node B
> holds a local copy of that lock resource. When the lock queues
> change on the local copy on Node B, will the master copy on Node A
> be updated simultaneously? If so, when more than one node has a
> local copy of the same lock resource, how does the lock manager
> handle updates to the master copy? Does it use another locking
> mechanism to prevent corruption of the master copy?
> 

All locking happens on the master node. The local copy is just that, a
copy. It is updated when the master confirms what has happened. The
local copy is there mainly for rebuilding the resource table when a
master leaves the cluster, and to keep track of the locks that exist
on the local node. The local copy is NOT complete: it only contains
local users of a resource.
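
Conceptually (this is an illustration I just made up, not the actual
kernel structures) the difference looks something like this in C:

#include <stdint.h>

struct lock {
	uint32_t lkid;        /* lock ID */
	int      mode;        /* NL/CR/CW/PR/PW/EX */
	int      owner_node;  /* node that requested it */
	struct lock *next;
};

/* Master copy: the authoritative record, holding the queues for
 * every node's locks on the resource. */
struct resource_master {
	char name[64];
	struct lock *grant_queue;    /* all granted locks */
	struct lock *convert_queue;  /* all pending conversions */
	struct lock *wait_queue;     /* all blocked requests */
};

/* Local copy: only this node's locks, plus the cached location of
 * the master. It is dropped when the last local lock is released. */
struct resource_local_copy {
	char name[64];
	int  master_nodeid;
	struct lock *local_locks;
};

Because all requests are decided at the master, and the copies merely
mirror the outcome for local locks, no extra locking mechanism is
needed to protect the master copy.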


-- 

Chrissie



