[Linux-cluster] DLM behavior after lockspace recovery

Jeff jeff at intersystems.com
Sun Oct 17 00:03:34 UTC 2004


Saturday, October 16, 2004, 7:04:57 PM, Daniel Phillips wrote:

> On Saturday 16 October 2004 17:07, Jeff wrote:
>> Saturday, October 16, 2004, 4:40:19 PM, Daniel Phillips wrote:
>> > On Friday 15 October 2004 07:49, Jeff wrote:
>> >> Why is the LVB "corrupt" when it's marked VALNOTVALID... I'm
>> >> suggesting that it have the "last known value" as far as the
>> >> surviving nodes in the cluster are concerned.
>> >
>> > Staleness is a kind of corruption.  If you prefer, I'll call it
>> > stale.
>>
>> Not really. It depends on how you design your code. If you
>> know that an LVB you see may not be the latest and greatest
>> value, but that you will be notified when it isn't (e.g. the
>> VALNOTVALID status), and you design around those premises,
>> 'staleness' is not corruption.

> Perhaps you'd prefer the term "data loss"?

Why do you say that? The lock is marked not valid. The
application is told the lock is not valid. It's up to
the application to decide what this means. Perhaps
it does mean data loss, perhaps it doesn't. How is the
DLM to know?
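
To make that concrete, here is a minimal sketch of the application
side, assuming status-block and flag names along the lines of the
DLM headers (struct dlm_lksb, DLM_SBF_VALNOTVALID); the exact
spellings may differ, and regenerate_lvb() is just a stand-in for
application-specific code:

#include <libdlm.h>       /* assumed header name; provides struct dlm_lksb */

#define MY_LVB_LEN 32     /* LVB size this application uses */

struct my_lock {
        struct dlm_lksb lksb;     /* DLM status block, sb_lvbptr -> lvb */
        char lvb[MY_LVB_LEN];     /* lock value block storage */
        int  lvb_is_constant;     /* LVB never changes once initialized */
};

static void regenerate_lvb(struct my_lock *lk);   /* application-specific */

/* Completion AST: the DLM reports the granted request here and the
 * application decides what VALNOTVALID means for this lock type. */
static void completion_ast(void *arg)
{
        struct my_lock *lk = arg;

        if (lk->lksb.sb_status != 0)
                return;                          /* request failed */

        if (lk->lksb.sb_flags & DLM_SBF_VALNOTVALID) {
                if (lk->lvb_is_constant) {
                        /* contents are known correct; ignore the flag */
                } else {
                        /* drop cached state and rebuild the LVB */
                        regenerate_lvb(lk);
                }
        }
        /* ... normal grant processing ... */
}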

>> If the recovery mechanisms are designed so that
>> the LVB has the most recent value seen by any of the surviving
>> cluster members, it's not necessarily stale; it may be perfectly
>> valid. Of course, in some situations the current value needs
>> to be recalculated.
>>
>> >> We have various different types of locks and each of them
>> >> deals with VALNOTVALID errors differently. One example is
>> >> we have a lock which is sometimes held exclusively but the
>> >> contents of the lock value block never change once the lock
>> >> has been initialized. For this lock we ignore VALNOTVALID errors
>> >> because we know the contents are correct. If the DLM is going to
>> >> zero the LVB we'd have to update our code to regenerate this
>> >> value.
>> >
>> > How hard is that?
>>
>> It could be difficult as it introduces new race conditions.
>> If a node starts to join the cluster as another node fails
>> it could be that it sees the zeroed LVB before the recovery
>> process gets a chance to reset the value.
>>
>> I don't understand the philosophy behind zeroing the LVB.
>> Zero is a perfectly valid value so you're not clearing the
>> LVB when you zero it, you're just changing it to something else.
>> Why is zero preferable to any other value?  Why not simply
>> leave the value alone and if an application wants to zero it
>> when the value is questionable it can do so when it gets
>> the VALNOTVALID status.

> I agree, 


>>
>> >> For locks which contain sequence #'s to protect cached data we
>> >> treat a VALNOTVALID error as meaning that the cached buffer is
>> >> invalid and the block needs to be read from disk. Preserving the
>> >> last known good value allows us to pick a new value for the
>> >> sequence # which we know is greater than what any of the other
>> >> cluster members know. If the last known good value can't be
>> >> provided by the DLM we'd deal with this by adding a "VALNOTVALID"
>> >> counter to the sequence #. This might mean extending it to a
>> >> 64-bit # or perhaps we could take some # of bits from the existing
>> >> 32-bits.
>> >
>> > Don't you really just want an efficient way of invalidating the
>> > stale lvb values out there?  Would it work for you if the lock
>> > master issues asts to tell everybody the lvb is now invalid?
>>
>> Where are you going to get the AST routine to deliver? You can't
>> deliver the completion or blocking ASTs associated with the
>> locks because the LVB can be invalidated on locks that are
>> granted which may not have a blocking AST established.

> But it's your application, you'd make sure you supply the appropriate
> trap.  No such trap exists at the moment, but my question is: supposing
> there were such an invalidation trap, would it solve your problem?

> Regards,

> Daniel

If there were some mechanism whereby each holder of
a lock gets notified that its LVB is no longer valid,
that would suffice. The notification would have to be
delivered before any completion ASTs on that lock. A
single trap which gets delivered when any locks have been
invalidated will not suffice, because the node cannot
decide on its own which LVBs are still valid and which
are not.
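
For illustration only, here is one shape such a per-lock trap could
take from the application side. Nothing like this exists in the DLM
today; lvb_invalidate_ast() and mark_lvb_suspect() are purely
hypothetical names:

struct my_lock;                              /* application lock state */
void mark_lvb_suspect(struct my_lock *lk);   /* hypothetical helper */

/*
 * Hypothetical "LVB invalidated" AST, registered per lock alongside
 * the completion and blocking ASTs.  The DLM would queue one of these
 * for every holder of a recovered lock, and it would have to run
 * before any completion AST already queued on that lock.
 */
static void lvb_invalidate_ast(void *arg)
{
        struct my_lock *lk = arg;

        /* per-lock decision: this holder now knows exactly which LVB
         * is suspect, instead of guessing from one global trap */
        mark_lvb_suspect(lk);
}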

I'm not sure why you're looking to invent a new mechanism
though when a useful model already exists.
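
As a footnote to the sequence-number locks discussed above: if the
DLM will not preserve the last known value, the fallback of folding
a "VALNOTVALID counter" into a wider sequence number might look like
this (the names and the 32/32 split are illustrative only):

#include <stdint.h>

/* Illustrative layout: an invalidation count plus the ordinary
 * per-update sequence number, carried together in the LVB. */
struct seq_lvb {
        uint32_t invalidations;   /* bumped each time VALNOTVALID is seen */
        uint32_t seq;             /* normal per-update sequence number */
};

/* Any bump of 'invalidations' makes the combined value greater than
 * every value handed out before the failure, so surviving nodes can
 * safely treat their cached copies as out of date. */
static inline uint64_t seq_order(const struct seq_lvb *v)
{
        return ((uint64_t)v->invalidations << 32) | v->seq;
}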
