[Linux-cluster] DLM behavior after lockspace recovery

Sat Oct 16 21:07:23 UTC 2004

Saturday, October 16, 2004, 4:40:19 PM, Daniel Phillips wrote:

> On Friday 15 October 2004 07:49, Jeff wrote:
>> Why is the LVB "corrupt" when it's marked VALNOTVALID... I'm
>> suggesting that it have the "last known value" as far as the
>> surviving nodes in the cluster are concerned.

> Staleness is a kind of corruption.  If you prefer, I'll call it stale.

Not really. It depends on how you design your code. If you
know that an LVB you see may not hold the latest and greatest
value, but that you get notified when it doesn't (e.g. VALNOTVALID
status), and you design around those premises, 'staleness' is
not corruption. If the recovery mechanisms are designed so that
the LVB has the most recent value seen by any of the surviving
cluster members, it's not necessarily stale; it may be perfectly
valid. Of course, in some situations the current value needs
to be recalculated.
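
To make "design around those premises" concrete, here is a minimal
sketch of a per-lock-class policy. Everything in it is hypothetical:
the flag name SB_VALNOTVALID and its value, the LVB_LEN size, the
struct lock_status layout and the lock class names are placeholders
for illustration, not the DLM's actual interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVB_LEN 32            /* assumed LVB size for this sketch */
#define SB_VALNOTVALID 0x02u  /* hypothetical "LVB may be stale" flag bit */

/* Hypothetical lock classes, named after the two cases in this thread. */
enum lock_class {
    LOCK_WRITE_ONCE,  /* LVB never changes once initialized */
    LOCK_CACHE_SEQ    /* LVB holds a sequence # guarding cached data */
};

/* Hypothetical status block; not the DLM's own structure. */
struct lock_status {
    uint32_t flags;
    unsigned char lvb[LVB_LEN];
};

/* Returns true if the caller may keep trusting what the LVB protects. */
static bool lvb_usable(enum lock_class cls, const struct lock_status *sb)
{
    if (!(sb->flags & SB_VALNOTVALID))
        return true;              /* no warning raised: use the value */

    switch (cls) {
    case LOCK_WRITE_ONCE:
        return true;              /* contents cannot have changed */
    case LOCK_CACHE_SEQ:
        return false;             /* drop the cache, re-read from disk */
    }
    return false;                 /* unknown class: be conservative */
}
```

The first case only works if the DLM leaves the old LVB contents in
place; if it zeroes them there is nothing left to trust.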

>> We have various different types of locks and each of them
>> deals with VALNOTVALID errors differently. One example is
>> we have a lock which is sometimes held exclusively but the
>> contents of the lock value block never change once the lock
>> has been initialized. For this lock we ignore VALNOTVALID errors
>> because we know the contents are correct. If the DLM is going to
>> zero the LVB we'd have to update our code to regenerate this value.

> How hard is that?

It could be difficult as it introduces new race conditions.
If a node starts to join the cluster as another node fails
it could be that it sees the zeroed LVB before the recovery
process gets a chance to reset the value.

I don't understand the philosophy behind zeroing the LVB.
Zero is a perfectly valid value, so you're not clearing the
LVB when you zero it, you're just changing it to something else.
Why is zero preferable to any other value?  Why not simply
leave the value alone? If an application wants to zero it
when the value is questionable, it can do so when it gets
the VALNOTVALID status.

>> For locks which contain sequence #'s to protect cached data we
>> treat a VALNOTVALID error as meaning that the cached buffer is
>> invalid and the block needs to be read from disk. Preserving the last
>> known good value allows us to pick a new value for the sequence
>> # which we know is greater than what any of the other cluster
>> members know. If the last known good value can't be provided by
>> the DLM we'd deal with this by adding a "VALNOTVALID" counter to
>> the sequence #. This might mean extending it to a 64-bit # or perhaps
>> we could take some # of bits from the existing 32-bits.
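
A rough sketch of the "take some # of bits" variant quoted above,
assuming 8 of the 32 bits are given to a VALNOTVALID counter and that
each node keeps track of the last counter value it saw. The 8/24
split and all names are assumptions for illustration only, and
counter wraparound is not handled.

```c
#include <stdint.h>

#define EPOCH_BITS 8u                      /* assumed split: 8-bit counter */
#define SEQ_BITS   (32u - EPOCH_BITS)      /* leaves a 24-bit sequence # */
#define SEQ_MASK   ((1u << SEQ_BITS) - 1u)

/* Pack a VALNOTVALID counter (high bits) and a sequence # (low bits). */
static uint32_t seq_pack(uint32_t epoch, uint32_t seq)
{
    return (epoch << SEQ_BITS) | (seq & SEQ_MASK);
}

static uint32_t seq_epoch(uint32_t v) { return v >> SEQ_BITS; }
static uint32_t seq_value(uint32_t v) { return v & SEQ_MASK; }

/*
 * On VALNOTVALID: bump the counter and restart the sequence, so the
 * new value compares greater than anything issued under the previous
 * counter value, even though the old sequence # itself was lost.
 */
static uint32_t seq_after_valnotvalid(uint32_t last_epoch)
{
    return seq_pack(last_epoch + 1u, 0u);
}
```

With the last known good value preserved, none of this is needed:
the new sequence # is simply the old one plus one.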

> Don't you really just want an efficient way of invalidating the stale
> lvb values out there?  Would it work for you if the lock master issues
> asts to tell everybody the lvb is now invalid?

Where are you going to get the AST routine to deliver to? You can't
deliver the completion or blocking ASTs associated with the
locks because the LVB can be invalidated on locks that are
granted, which may not have a blocking AST established.

> Regards,

> Daniel