[Linux-cluster] DLM behavior after lockspace recovery

Jeff jeff at intersystems.com
Fri Oct 15 11:49:42 UTC 2004


Friday, October 15, 2004, 12:32:38 AM, Daniel Phillips wrote:

> Hi Jeff,

> On Thursday 07 October 2004 07:26, Jeff wrote:
>> Here's the problem with simply resetting the value block to zero.
>> We're using the value block as a counter to track whether a block
>> on disk has changed or not. Each cluster member keeps a copy of the
>> value block counter in memory along with the associated disk block.
>> When a process converts a NL lock to a higher mode it reads the
>> current copy of the value block to decide whether it needs to re-read
>> the block from disk.
>>
>> When the lock request completes with VALNOTVALID as a status the
>> process knows that it needs to re-read the block from disk. The big
>> question though is what does it write into the lock value block at
>> that point so the other systems will know this as well. If the lock
>> value block is guaranteed to have the most recent value seen by the
>> existing nodes then the process can simply increment the value and
>> it will know that the result will not match what any other system has
>> cached. If the lock value block is zeroed or set to an arbitrary
>> value from any one of the surviving nodes, then it might be a value
>> which is lower than exists on one or more of the nodes. There are
>> ways we can deal with this but it means more bookkeeping.
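(To keep this concrete, the scheme described above amounts to roughly
the following. dlm_convert_sync() and the MYDLM_* status codes are
made-up stand-ins for whatever synchronous convert call and status
reporting the real DLM API provides; only the counter comparison
matters.)

#include <stdint.h>
#include <string.h>

#define MYDLM_OK          0
#define MYDLM_VALNOTVALID 1   /* LVB lost during lockspace recovery */

/* Per-block cache entry kept by each cluster member. */
struct cache_entry {
    uint32_t seq;             /* LVB counter seen when block was cached */
    char     data[4096];      /* cached copy of the disk block */
    int      valid;
};

/* Hypothetical synchronous convert: raises the NL lock to 'mode'
 * and copies the current lock value block into 'lvb'. */
extern int dlm_convert_sync(uint32_t lockid, int mode, char lvb[32]);

extern void read_block_from_disk(struct cache_entry *ce);

/* Validate a cached block before using it: convert NL -> higher mode,
 * then compare the counter in the LVB with the one cached locally. */
int acquire_and_validate(uint32_t lockid, int mode, struct cache_entry *ce)
{
    char lvb[32];
    int status = dlm_convert_sync(lockid, mode, lvb);

    if (status == MYDLM_VALNOTVALID) {
        /* The LVB was lost in recovery; what to write back is the
         * question being discussed here.  Either way, re-read. */
        ce->valid = 0;
    } else {
        uint32_t seq;
        memcpy(&seq, lvb, sizeof(seq));
        if (!ce->valid || seq != ce->seq) {
            ce->seq   = seq;  /* block on disk has changed */
            ce->valid = 0;
        }
    }

    if (!ce->valid) {
        read_block_from_disk(ce);
        ce->valid = 1;
    }
    return status;
}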

> But do you really think the dlm should pretend that a potentially 
> corrupt value is in fact good?  This seems like a very bad idea to me.
> In return for saving some bookkeeping in the very special case where you
> have an incrementing lvb, you suggest imposing extra overhead on every
> lvb update and having the dlm make false promises about data integrity.
> I don't think this is a good trade.

> I'd suggest biting the bullet and initiating application level recovery
> as soon as you see VALNOTVALID.  In this case it means you have to tell
> every node to reset its cached sequence number.

> Regards,

> Daniel

Why is the LVB "corrupt" when its marked VALNOTVALID...I'm
suggesting that it have the "last known value" as far as the
surviving nodes in the cluster are concerned.

We have various types of locks, and each of them deals with
VALNOTVALID errors differently. One example is a lock which is
sometimes held exclusively but whose lock value block contents never
change once the lock has been initialized. For this lock we ignore
VALNOTVALID errors because we know the contents are still correct.
If the DLM is going to zero the LVB, we'd have to update our code to
regenerate this value, as sketched below.

For locks which contain sequence numbers to protect cached data, we
treat a VALNOTVALID error as meaning that the cached buffer is
invalid and the block needs to be re-read from disk. Preserving the
last known good value allows us to pick a new sequence number which
we know is greater than anything the other cluster members have
cached. If the last known good value can't be provided by the DLM,
we'd deal with this by adding a "VALNOTVALID" counter to the
sequence number. That might mean extending it to a 64-bit number, or
perhaps taking some number of bits from the existing 32.
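The bookkeeping difference between the two behaviors comes down to
something like this (plain C; new_seq_zeroed() is just one way the
recovery counter could be folded in):

#include <stdint.h>

/* If the DLM preserves the last value known to any surviving node,
 * incrementing it yields a value no node can still have cached. */
uint32_t new_seq_preserved(uint32_t last_known_good)
{
    return last_known_good + 1;
}

/* If the DLM zeroes the LVB instead, fold a per-lock "VALNOTVALID
 * counter" (bumped once per recovery event) into the value so the
 * result still compares greater than anything cached -- here as a
 * 64-bit number with the recovery count in the high half. */
uint64_t new_seq_zeroed(uint32_t valnotvalid_count, uint32_t seq)
{
    return ((uint64_t)valnotvalid_count << 32) | seq;
}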







