[Linux-cluster] Lock Resources

Ja S jas199931 at yahoo.com
Wed May 7 10:34:31 UTC 2008


> > 
> > A couple of further questions about the master copy of
> > lock resources.
> > 
> > The first one:
> > =============
> > 
> > Again, assume:
> > 1) Node A is extremely busy and handles all requests
> > 2) the other nodes are idle and have never handled any requests
> > 
> > According to the documents, Node A will initially hold all the
> > master copies. What I am unclear about is whether the lock
> > manager will evenly redistribute the master copies on Node A to
> > other nodes when it decides that Node A holds too many.
> 
> Locks are only remastered when a node leaves the cluster. In that
> case all of its resources will be moved to another node. We do not
> do dynamic remastering - a resource that is mastered on one node
> will stay mastered on that node regardless of traffic or load,
> until all users of the resource have released it.


Thank you very much.


> 
> > The second one:
> > ==============
> > 
> > Assume the master copy of a lock resource is on Node A, and Node
> > B holds a local copy of the same resource. When the lock queues
> > change on the local copy on Node B, will the master copy on Node
> > A be updated simultaneously? If so, when more than one node holds
> > a local copy of the same lock resource, how does the lock manager
> > handle updates to the master copy? Does it use another locking
> > mechanism to prevent corruption of the master copy?
> > 
> 
> All locking happens on the master node. The local copy is just
> that, a copy. It is updated when the master confirms what has
> happened. The local copy is there mainly for rebuilding the
> resource table when a master leaves the cluster, and to keep track
> of the locks that exist on the local node. The local copy is NOT
> complete; it only contains local users of a resource.
> 

Thanks again for the kind and detailed explanation. 
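
Just to check my understanding, here is how I picture that split (a
toy Python model I wrote to convince myself, not DLM code; every
name in it is my own): the master keeps the complete grant queue for
every node, while each local copy records only the locks held by its
own node, and is updated only after the master confirms.

    # Toy model of the master/local-copy split described above.
    # NOT DLM source code; all names here are illustrative.

    class MasterResource:
        """Master copy: holds the complete grant queue for all nodes."""
        def __init__(self, name):
            self.name = name
            self.granted = []   # (node_id, lock_id, mode) for every holder

        def grant(self, node_id, lock_id, mode):
            # All locking decisions happen here, on the master node.
            self.granted.append((node_id, lock_id, mode))
            return True         # the master confirms the operation

    class LocalCopy:
        """Local copy: tracks only this node's own locks on the resource."""
        def __init__(self, master, node_id):
            self.master = master
            self.node_id = node_id
            self.granted = []   # local users only -- incomplete by design

        def lock(self, lock_id, mode):
            # The local copy is updated only after the master confirms.
            if self.master.grant(self.node_id, lock_id, mode):
                self.granted.append((lock_id, mode))

    master = MasterResource("       3         5fafc85")
    node2 = LocalCopy(master, node_id=2)
    node4 = LocalCopy(master, node_id=4)
    node2.lock(0x00010304, "NL")
    node4.lock(0x1ff5036d, "NL")
    print(len(master.granted), len(node2.granted))  # 2 1: master sees all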


I am sorry to bother you again, but I have more questions. I
analysed /proc/cluster/dlm_dir and dlm_locks and found some strange
things. Please see below:


From /proc/cluster/dlm_dir:

In lock space [ABC]:
This node (node 2) has 445 lock resources in total, of which
--328   are master lock resources
--117   are local copies of lock resources mastered on other nodes.

===============================
===============================


From /proc/cluster/dlm_locks:

In lock space [ABC]:
There are 1678 lock resources in use, of which
--1674  are mastered by this node (node 2)
--4     are mastered by other nodes, namely:
----1 lock resource mastered on node 1
----1 lock resource mastered on node 3
----1 lock resource mastered on node 4
----1 lock resource mastered on node 5

A typical master lock resource in /proc/cluster/dlm_locks is:
Resource 000001000de4fd88 (parent 0000000000000000).
Name (len=24) "       3         5fafc85"
Master Copy
LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted Queue
1ff5036d NL Remote:   4 000603e8
80d2013f NL Remote:   5 00040214
00240209 NL Remote:   3 0001031d
00080095 NL Remote:   1 00040197
00010304 NL
Conversion Queue
Waiting Queue


After searching for local copies in /proc/cluster/dlm_locks, I got:
Resource 000001002a273618 (parent 0000000000000000).
Name (len=16) "withdraw 3......"
Local Copy, Master is node 3
Granted Queue
0004008d PR Master:     0001008c
Conversion Queue
Waiting Queue

--
Resource 000001003fe69b68 (parent 0000000000000000).
Name (len=16) "withdraw 5......"
Local Copy, Master is node 5
Granted Queue
819402ef PR Master:     00010317
Conversion Queue
Waiting Queue

--
Resource 000001002a2732e8 (parent 0000000000000000).
Name (len=16) "withdraw 1......"
Local Copy, Master is node 1
Granted Queue
000401e9 PR Master:     00010074
Conversion Queue
Waiting Queue

--
Resource 000001004a32e598 (parent 0000000000000000).
Name (len=16) "withdraw 4......"
Local Copy, Master is node 4
Granted Queue
1f5b0317 PR Master:     00010203
Conversion Queue
Waiting Queue

These four local copies of lock resources have been sitting in
/proc/cluster/dlm_locks for several days.
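
For reference, I tallied the dlm_locks figures above with a small
script along these lines (a rough sketch of my own; it relies only
on the "Master Copy" / "Local Copy, Master is node N" lines shown in
the output above):

    #!/usr/bin/env python
    # Rough sketch: count master vs. local-copy resources in a dump
    # of /proc/cluster/dlm_locks, using only the "Master Copy" and
    # "Local Copy, Master is node N" lines shown above.
    import re
    import sys
    from collections import defaultdict

    masters = 0
    locals_by_master = defaultdict(int)
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/cluster/dlm_locks"
    with open(path) as f:
        for line in f:
            if line.startswith("Master Copy"):
                masters += 1
                continue
            m = re.match(r"Local Copy, Master is node (\d+)", line)
            if m:
                locals_by_master[int(m.group(1))] += 1

    print("mastered by this node: %d" % masters)
    for node, count in sorted(locals_by_master.items()):
        print("mastered on node %d:   %d" % (node, count))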

Now my questions:
1. In my case, for the same lock space, the number of master lock
resources reported by dlm_dir is much SMALLER than the number
reported in dlm_locks. My understanding is that the master lock
resources listed in dlm_dir should be a superset of (or at least
equal in number to) those reported in dlm_locks, so what I see on
this node makes no sense to me. Am I missing something? Can you help
me clarify this?

2. What can cause "withdraw ...." to appear as a lock resource
name?

3. As far as I know, these four local copies of lock resources have
not been released for at least several days. How can I find out
whether they are stuck in some dead situation or are still waiting
for the lock manager to release them? And how do I change the
timeout?


Thank you very much in advance for your further help.

Jas






