[Linux-cluster] DLM behavior after lockspace recovery

David Teigland teigland at redhat.com
Sat Oct 16 06:40:43 UTC 2004


On Fri, Oct 15, 2004 at 12:41:16PM -0400, Daniel Phillips wrote:
> On Friday 15 October 2004 10:20, David Teigland wrote:
> > On Fri, Oct 15, 2004 at 12:32:38AM -0400, Daniel Phillips wrote:
> > > But do you really think the dlm should pretend that a potentially
> > > corrupt value is in fact good?  This seems like a very bad idea to
> > > me. In return for saving some bookkeeping in the very special case
> > > where you have an incrementing lvb, you suggest imposing extra
> > > overhead on every lvb update and having the dlm make false promises
> > > about data integrity. I don't think this is a good trade.
> >
> > Incorrect.  Nothing is corrupt, there's no "false promise", there's
> > no overhead to speak of, and restoring the last available value is
> > standard behavior.
> 
> A standard bug, you mean.  Suppose, in my application, I were to store 
> the number of nodes currently writing to a disk region in an lvb.  Each 
> time a node initiates a write, it takes the lock in PW and increments 
> the value.  When the write completes, it decrements the value.  When 
> the value reaches zero, the region is marked as "in sync".
> 
> Now the lock master dies and an older lvb value is quietly substituted 
> for the actual value.  You see that a region is now going to be marked 
> as "in sync" when there are still one or more writes outstanding to it.
> 
> The correct behavior is obviously to mark the lvb invalid and have the 
> application respond by contacting each node to see which of them are 
> still writing to the region.
> 
> I hope you see why it is, in general, bad to lie about the integrity of 
> data.

Still incorrect.  Check the facts and try again.
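
For reference, a minimal, self-contained sketch of the counter-in-lvb
pattern Daniel describes above.  The lock calls here are hypothetical
stand-ins, not the real libdlm API; only the bookkeeping around the value
block and the invalid-lvb case is the point.

/*
 * Sketch only: acquire_pw()/release_pw() are hypothetical wrappers that
 * stand in for taking and dropping the region lock in PW mode.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define LVB_LEN 32                  /* typical DLM lock value block size */

struct region_lock {
	unsigned char lvb[LVB_LEN]; /* value block as seen under the PW lock */
	int lvb_valid;              /* 0 if recovery could not restore the lvb */
};

/* Hypothetical: take the region lock in PW mode and read back the lvb. */
static void acquire_pw(struct region_lock *rl) { (void)rl; }

/* Hypothetical: release the lock, writing the updated lvb back. */
static void release_pw(struct region_lock *rl) { (void)rl; }

/* Node starts a write: bump the writer count stored in the lvb. */
static void region_write_begin(struct region_lock *rl)
{
	uint32_t writers;

	acquire_pw(rl);
	memcpy(&writers, rl->lvb, sizeof(writers));
	writers++;
	memcpy(rl->lvb, &writers, sizeof(writers));
	release_pw(rl);
}

/* Node finishes a write: decrement, and mark the region in sync at zero. */
static void region_write_end(struct region_lock *rl)
{
	uint32_t writers;

	acquire_pw(rl);
	if (!rl->lvb_valid) {
		/*
		 * The master died and the count cannot be trusted.  In the
		 * scenario above, the application would rebuild the count
		 * (e.g. ask the other nodes) rather than decrement a
		 * possibly stale value.
		 */
		printf("lvb invalid after recovery: resync writer count\n");
		release_pw(rl);
		return;
	}
	memcpy(&writers, rl->lvb, sizeof(writers));
	if (writers > 0)
		writers--;
	memcpy(rl->lvb, &writers, sizeof(writers));
	if (writers == 0)
		printf("region in sync\n");
	release_pw(rl);
}

int main(void)
{
	struct region_lock rl = { .lvb = {0}, .lvb_valid = 1 };

	region_write_begin(&rl);
	region_write_end(&rl);
	return 0;
}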

-- 
Dave Teigland  <teigland at redhat.com>