[Linux-cluster] DLM behavior after lockspace recovery

Sat Oct 16 20:40:19 UTC 2004

On Friday 15 October 2004 07:49, Jeff wrote:
> Why is the LVB "corrupt" when it's marked VALNOTVALID... I'm 
> suggesting that it have the "last known value" as far as the
> surviving nodes in the cluster are concerned.

Staleness is a kind of corruption.  If you prefer, I'll call it stale.

> We have various types of locks and each of them
> deals with VALNOTVALID errors differently. One example is
> we have a lock which is sometimes held exclusively but the
> contents of the lock value block never change once the lock
> has been initialized. For this lock we ignore VALNOTVALID errors
> because we know the contents are correct. If the DLM is going to
> zero the LVB we'd have to update our code to regenerate this value.

How hard is that?

> For locks which contain sequence #'s to protect cached data we
> treat a VALNOTVALID error as meaning that the cached buffer is
> invalid and the block needs to be read from disk. Preserving the last
> known good value allows us to pick a new value for the sequence
> # which we know is greater than what any of the other cluster
> members know. If the last known good value can't be provided by
> the DLM we'd deal with this by adding a "VALNOTVALID" counter to
> the sequence #. This might mean extending it to a 64-bit # or perhaps
> we could take some # of bits from the existing 32-bits.
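
A minimal sketch of the fallback Jeff describes, taking some bits of the
existing 32-bit value for a "VALNOTVALID" counter. The 8/24 bit split and
every function name here are assumptions made for illustration, not part
of any DLM interface. It also assumes every node's cached value carries
the current counter, and it ignores wraparound of the counter itself.

```python
# Hypothetical layout: the high 8 bits count VALNOTVALID events (the
# "epoch"), the low 24 bits hold the ordinary sequence number.
EPOCH_BITS = 8
SEQ_BITS = 24
EPOCH_MASK = (1 << EPOCH_BITS) - 1
SEQ_MASK = (1 << SEQ_BITS) - 1


def pack(epoch, seq):
    """Combine an epoch and a sequence number into one 32-bit value."""
    return ((epoch & EPOCH_MASK) << SEQ_BITS) | (seq & SEQ_MASK)


def unpack(value):
    """Split a 32-bit value into (epoch, seq)."""
    return (value >> SEQ_BITS) & EPOCH_MASK, value & SEQ_MASK


def next_value(current):
    """Normal update: add one, letting a full sequence carry into the epoch."""
    return (current + 1) & 0xFFFFFFFF


def after_invalidation(last_seen):
    """Value to write when the LVB came back VALNOTVALID and was zeroed.

    Bumping the epoch gives a value larger than anything issued in the
    previous epoch, without knowing the last sequence number in it.
    """
    epoch, _ = unpack(last_seen)
    return pack(epoch + 1, 0)


def cache_is_valid(cached_value, lvb_value):
    """A cached buffer is usable only if its stamp matches the LVB."""
    return cached_value == lvb_value
```

A node that gets VALNOTVALID would write after_invalidation() of its own
last cached value; buffers stamped with the old epoch then fail
cache_is_valid() and are reread from disk.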

Don't you really just want an efficient way of invalidating the stale 
LVB values out there?  Would it work for you if the lock master issued 
ASTs to tell everybody the LVB is now invalid?
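
To make the question concrete, here is a toy model of that idea. It is
not the DLM API: the Holder and LockMaster classes and their methods are
hypothetical, and a plain method call stands in for AST delivery.

```python
class Holder:
    """A node holding the lock, with a locally cached copy of the LVB."""

    def __init__(self, name):
        self.name = name
        self.cached_lvb = None
        self.lvb_valid = False

    def on_lvb_invalid(self):
        # Stand-in for an AST from the lock master: stop trusting the
        # cached LVB, but keep the bytes around as the last known value.
        self.lvb_valid = False


class LockMaster:
    """Toy lock master that tracks holders and the current LVB."""

    def __init__(self, lvb=b""):
        self.lvb = lvb
        self.holders = []

    def attach(self, holder):
        """Register a holder and hand it the current LVB."""
        self.holders.append(holder)
        holder.cached_lvb = self.lvb
        holder.lvb_valid = True

    def recover(self):
        """After recovery the master cannot vouch for the LVB, so it
        notifies every holder. Returns the number of holders notified."""
        for holder in self.holders:
            holder.on_lvb_invalid()
        return len(self.holders)
```

In this model each holder still has its last known bytes but knows not
to rely on them, which is the invalidation half of what Jeff asks for.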

Regards,

Daniel