[Linux-cluster] DLM behavior after lockspace recovery

Daniel Phillips phillips at redhat.com
Sat Oct 16 23:04:57 UTC 2004


On Saturday 16 October 2004 17:07, Jeff wrote:
> Saturday, October 16, 2004, 4:40:19 PM, Daniel Phillips wrote:
> > On Friday 15 October 2004 07:49, Jeff wrote:
> >> Why is the LVB "corrupt" when it's marked VALNOTVALID... I'm
> >> suggesting that it have the "last known value" as far as the
> >> surviving nodes in the cluster are concerned.
> >
> > Staleness is a kind of corruption.  If you prefer, I'll call it
> > stale.
>
> Not really. It depends on how you design your code. If you
> know that an LVB you see may not be the latest and greatest
> value, but that when it isn't you get notified (e.g. the
> VALNOTVALID status), and you design around those premises,
> 'staleness' is not corruption.

Perhaps you'd prefer the term "data loss"?

> If the recovery mechanisms are designed so that
> the LVB has the most recent value seen by any of the surviving
> cluster members, it's not necessarily stale; it may be perfectly
> valid. Of course, in some situations the current value needs
> to be recalculated.
>
> >> We have various different types of locks and each of them
> >> deals with VALNOTVALID errors differently. One example is
> >> we have a lock which is sometimes held exclusively but the
> >> contents of the lock value block never change once the lock
> >> has been initialized. For this lock we ignore VALNOTVALID errors
> >> because we know the contents are correct. If the DLM is going to
> >> zero the LVB we'd have to update our code to regenerate this
> >> value.
> >
> > How hard is that?
>
> It could be difficult, as it introduces new race conditions.
> If a node starts to join the cluster as another node fails,
> it could see the zeroed LVB before the recovery process
> gets a chance to reset the value.
>
> I don't understand the philosophy behind zeroing the LVB.
> Zero is a perfectly valid value, so you're not clearing the
> LVB when you zero it, you're just changing it to something else.
> Why is zero preferable to any other value?  Why not simply
> leave the value alone, and if an application wants to zero it
> when the value is questionable, it can do so when it gets
> the VALNOTVALID status?

I agree.
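
For what it's worth, here is roughly what that looks like from the
application side.  A sketch only: DLM_LKF_VALBLK and
DLM_SBF_VALNOTVALID are the real flag names, but treat dlm_lock_wait()
and its argument order as pseudocode, and regenerate_lvb() is a
made-up helper.

    #include <string.h>
    #include <libdlm.h>     /* struct dlm_lksb, dlm_lock_wait() */

    int grab_lock_and_check_lvb(char lvb[DLM_LVB_LEN])
    {
            struct dlm_lksb lksb;
            int status;

            memset(&lksb, 0, sizeof(lksb));
            lksb.sb_lvbptr = lvb;

            /* acquire the lock and ask for the value block */
            status = dlm_lock_wait(DLM_LOCK_PR, &lksb, DLM_LKF_VALBLK,
                                   "my-resource", strlen("my-resource"),
                                   0, NULL, NULL, NULL);
            if (status)
                    return status;

            if (lksb.sb_flags & DLM_SBF_VALNOTVALID) {
                    /* the master lost the LVB during recovery; the
                       application, not the DLM, decides what goes
                       in the block now */
                    regenerate_lvb(lvb);    /* made-up helper */
            }
            return 0;
    }

The point being that the caller, which knows what the contents mean,
gets to choose between "ignore it", "regenerate it" and "reread from
disk", instead of the DLM picking zero for everyone.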


>
> >> For locks which contain sequence #'s to protect cached data we
> >> treat a VALNOTVALID error as meaning that the cached buffer is
> >> invalid and the block needs to be read from disk. Preserving the
> >> last known good value allows us to pick a new value for the
> >> sequence # which we know is greater than what any of the other
> >> cluster members know. If the last known good value can't be
> >> provided by the DLM we'd deal with this by adding a "VALNOTVALID"
> >> counter to the sequence #. This might mean extending it to a
> >> 64-bit # or perhaps we could take some # of bits from the existing
> >> 32-bits.
> >
> > Don't you really just want an efficient way of invalidating the
> > stale lvb values out there?  Would it work for you if the lock
> > master issues asts to tell everybody the lvb is now invalid?
>
> Where are you going to get the ast routine to deliver? You can't
> deliver the completion or blocking asts associated with the
> locks, because the LVB can be invalidated on granted locks
> that may not have a blocking AST established.
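
(As an aside, the 64-bit counter scheme you describe might pack
together along these lines.  Just a sketch of the arithmetic; the
struct and the names are mine, not anything in the DLM:

    #include <stdint.h>

    struct seq_lvb {
            uint32_t epoch;         /* bumped on each VALNOTVALID */
            uint32_t seq;           /* ordinary 32-bit sequence # */
    };

    /* the value nodes actually compare: with the epoch in the
       high half, anything issued after an invalidation compares
       greater than any value another node may still have cached */
    static uint64_t seq_value(const struct seq_lvb *s)
    {
            return ((uint64_t)s->epoch << 32) | s->seq;
    }

    static void seq_invalidate(struct seq_lvb *s)
    {
            s->epoch++;
            s->seq = 0;
    }

The comparison never goes backwards even though the low 32 bits
restart from zero, which sounds like exactly the property you need.)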

As for where the ast routine comes from: it's your application, so
you'd make sure you supply the appropriate trap.  No such trap exists
at the moment, but my question is: supposing there were such an
invalidation trap, would it solve your problem?
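
Concretely, I'm imagining something like the fragment below.  To be
clear, none of this exists today: the invalidation callback and the
extra dlm_lock() argument are pure invention, there to make the
question concrete.

    struct my_resource {
            int lvb_suspect;
            /* ... cached state guarded by the lock ... */
    };

    /* hypothetical: a third callback, registered alongside the
       completion and blocking asts, that the master fires on every
       granted lock whose LVB it invalidates during recovery */
    void my_lvb_invalidated(void *astarg)
    {
            struct my_resource *res = astarg;

            /* don't trust the cached value again until it has been
               reread or regenerated under the lock */
            res->lvb_suspect = 1;
    }

    status = dlm_lock(DLM_LOCK_PR, &lksb, DLM_LKF_VALBLK,
                      name, namelen, 0,
                      my_ast, res,            /* completion ast */
                      my_bast,                /* blocking ast   */
                      my_lvb_invalidated);    /* invented "trap" */

Since the callback would be registered independently, no blocking AST
needs to be established for it to fire, which I think answers the
objection above.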

Regards,

Daniel



