[Linux-cluster] DLM behavior after lockspace recovery

Sun Oct 17 01:52:02 UTC 2004

On Saturday 16 October 2004 20:03, Jeff wrote:
> > Perhaps you'd prefer the term "data loss"?
>
> Why do you say that? The lock is marked not valid. The
> application is told the lock is not valid. It's up to
> the application to decide what this means. Perhaps
> it does mean data loss, perhaps it doesn't. How is the
> DLM to know?

Quibble territory, Jeff ;-)

> >> >> If the DLM is
> >> >> going to zero the LVB we'd have to update our code to
> >> >> regenerate this value.
> >> >
> >> > How hard is that?
> >>
> >> It could be difficult as it introduces new race conditions.
> >> If a node starts to join the cluster as another node fails
> >> it could be that it sees the zeroed LVB before the recovery
> >> process gets a chance to reset the value.

But VALNOTVALID would tell the new node it has to participate in 
recovery, eliminating the race.
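
To illustrate the point, here is a minimal sketch of a joining node that keys off the status rather than the LVB contents. It is a toy model in Python, not DLM code: `Grant`, `on_grant`, `rebuild` and the 32-byte `LVB_LEN` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable

LVB_LEN = 32  # assumed LVB size, for illustration only


@dataclass
class Grant:
    """Toy model of what a lock holder sees when a request completes."""
    lvb: bytes
    valnotvalid: bool  # models the VALNOTVALID status on the grant


def on_grant(grant: Grant, rebuild: Callable[[], bytes]) -> bytes:
    """Return a usable value for the caller.

    An all-zero LVB is a legitimate value, so the contents alone say
    nothing. Only the status tells the node whether it has to take
    part in recovery (modelled here by calling rebuild()).
    """
    if grant.valnotvalid:
        return rebuild()
    return grant.lvb
```

A node that joins while another node is failing sees `valnotvalid` set and rebuilds, instead of mistaking the zeroed block for real data.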

> >> I don't understand the philosophy behind zeroing the LVB.
> >> Zero is a perfectly valid value so you're not clearing the
> >> LVB when you zero it, you're just changing it to something else.
> >> Why is zero preferable to any other value?  Why not simply
> >> leave the value alone and if an application wants to zero it
> >> when the value is questionable it can do so when it gets
> >> the VALNOTVALID status.

Define "leave the value alone".  The situation is, a new master is 
taking over the lock because the old master died.  It doesn't have a 
value to leave alone.

> If there was some mechanism whereby each holder of
> a lock gets notified that its LVB is no longer valid,
> that would suffice. This would have to be delivered before
> any completion ASTs on that lock. A single trap
> which gets delivered when any locks have been invalidated
> will not suffice because the node cannot decide on its own
> which LVBs are still valid or not.
>
> I'm not sure why you're looking to invent a new mechanism
> though...

That was a "suppose".  The additional INVALIDATED flag[1] I proposed is 
a lot lighter and would also do the job, _I think_.

[1] A user receives an INVALIDATED flag the first time it reads an LVB 
that couldn't be recovered, whether or not the LVB was subsequently 
updated.  The user may use this flag to know if it should perform a 
recovery action of its own.
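
A rough sketch of the semantics in [1], as a toy Python model rather than anything resembling the real DLM interface. `Resource`, `recovery_lost_lvb`, `pending` and `LVB_LEN` are hypothetical, and the sketch assumes one pending bit per holder that a write never clears.

```python
LVB_LEN = 32  # assumed LVB size, for illustration only


class Resource:
    """Toy model of one lock resource and its LVB."""

    def __init__(self):
        self.lvb = bytes(LVB_LEN)
        self.pending = set()  # holders not yet told about the loss

    def recovery_lost_lvb(self, holders):
        """Called when recovery could not reconstruct the LVB."""
        self.lvb = bytes(LVB_LEN)
        self.pending = set(holders)

    def write(self, holder, value):
        """Update the LVB; pending flags are deliberately left alone."""
        self.lvb = value

    def read(self, holder):
        """Return (lvb, invalidated); the flag is reported only once."""
        invalidated = holder in self.pending
        self.pending.discard(holder)
        return self.lvb, invalidated
```

If holders `a` and `b` survive a failed recovery and `a` then writes a fresh value, the first read by `b` still returns the flag alongside the new value, and the second read does not.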

> ...when a useful model already exists. 

LVB sequencing that we suspect VAXcluster might have had?  I regard it 
as a misguided model because it adds code and data bloat that benefits 
a small minority of applications.  We should resist the bloat-up 
temptation strongly, because if we don't, Linus will do it for us.

By the way, thanks muchly for your efforts, and please keep it up, this 
is useful.

Regards,

Daniel