[Linux-cluster] DLM behavior after lockspace recovery

Thu Oct 7 11:26:35 UTC 2004

Thursday, October 7, 2004, 7:07:22 AM, David Teigland wrote:

> On Wed, Oct 06, 2004 at 09:15:01AM -0400, Jeff wrote:

>> When the value block not valid flag is set any subsequent lock requests
>> which read the lock value block complete with a value-block-not-valid
>> status. This is a success state. This status continues to be returned
>> until the value block is written or the lock is forgotten because it has
>> been released by all cluster members.

> If VALNOTVALID is returned, do you care what the lvb contents are?

My preference would be that it has the most current copy from
the surviving members. If the nodes keep track of the change count,
this would be the copy with the highest value. An alternative,
although I suspect this is more difficult to implement, would be for
each surviving node to return the VALNOTVALID status until it writes
the lock value block. In this case, after one node has written the value
block, it would be important that the current, valid value is used.

Here's the problem with simply resetting the value block to zero.
We're using the value block as a counter to track whether a block
on disk has changed or not. Each cluster member keeps a copy of the
value block counter in memory along with the associated disk block.
When a process converts an NL lock to a higher mode it reads the
current copy of the value block to decide whether it needs to re-read
the block from disk.
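
To make the check concrete, here is a rough sketch in Python. It is not
DLM code: CachedBlock and needs_reread are names made up for this
illustration, and the counter is assumed to be a plain integer stored
in the value block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedBlock:
    """In-memory copy of a disk block (hypothetical structure)."""
    data: bytes
    counter: int  # value block counter seen when data was last read


def needs_reread(cached: Optional[CachedBlock], lvb_counter: int) -> bool:
    """Return True when the cached copy cannot be trusted.

    Called after an NL lock has been converted to a higher mode and the
    value block counter has been read.
    """
    if cached is None:
        return True  # nothing cached yet
    return cached.counter != lvb_counter  # someone changed the block
```

A node whose cached counter equals the one in the value block keeps
using its copy; any other node goes back to disk.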

When the lock request completes with VALNOTVALID as a status, the
process knows that it needs to re-read the block from disk. The big
question, though, is what it should write into the lock value block at
that point so the other systems will know this as well. If the lock
value block is guaranteed to have the most recent value seen by the
existing nodes then the process can simply increment the value and
it will know that the result will not match what any other system has
cached. If the lock value block is zeroed or set to an arbitrary
value from any one of the surviving nodes, then it might be a value
which is lower than what one or more of the nodes have cached. There
are ways we can deal with this, but it means more bookkeeping.
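
A small simulation of the three cases, again only a sketch in Python
with invented node names and counter values. It assumes the writer
simply increments whatever the recovered value block contains, and it
lists the nodes whose cached counter would then match the new value
even though their cached block is stale.

```python
def nodes_fooled(recovered_lvb, cached_counters):
    """Nodes whose cached counter equals the counter written after recovery."""
    new_value = recovered_lvb + 1  # writer increments what recovery gave it
    return sorted(n for n, c in cached_counters.items() if c == new_value)


# Hypothetical counters cached on the surviving nodes.
cached = {"nodeA": 7, "nodeB": 9, "nodeC": 8, "nodeD": 1}

# Recovery returns the highest surviving value: nobody can match 10.
highest = nodes_fooled(max(cached.values()), cached)

# Recovery returns an arbitrary survivor's value (nodeA's 7): nodeC holds 8.
arbitrary = nodes_fooled(cached["nodeA"], cached)

# Recovery zeroes the value block: nodeD still holds 1.
zeroed = nodes_fooled(0, cached)
```

Only the first case leaves the list empty without any extra state on
the nodes.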

>> I'm not sure that implementing this as a flag returned in the lock value
>> block is really a good idea as it would mean that interested
>> applications would have to perform an extra memory reference for a
>> fairly uncommon situation. Should this be implemented as a property of
>> the lockspace which is defined when the lockspace is created?

> Let's try returning VALNOTVALID in the lksb flags field first and see how
> it works.

ok.