[Linux-cluster] DLM behavior after lockspace recovery

Daniel Phillips phillips at redhat.com
Sun Oct 17 03:42:04 UTC 2004


On Saturday 16 October 2004 21:58, Jeff wrote:
> I don't see this as all that much bloat. We're talking
> about one 32-bit counter per resource, not per lock. The sequence #
> only has to travel with lock requests which request value blocks so
> we're talking about 33x32-bit #'s instead of 32 (discounting everything
> else in the packet).

32-byte lvbs, not 32 ints.  I look at it as roughly 10% bloat.  I'd 
rather we head in the other direction and shrink the lvb by encoding, 
say, a one-byte length.
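
To make the arithmetic concrete (the layouts below are only a sizing 
sketch with invented names, not the actual gdlm structures): a 32-bit 
sequence number tacked onto a 32-byte lvb adds 4 bytes to 32, call it 
10%, on every value block that travels, whereas a one-byte length lets 
short lvbs get cheaper instead.

    /* Illustration only: sizing sketch, not the real gdlm wire format. */

    #include <stdint.h>

    #define LVB_LEN 32                  /* 32-byte lock value block */

    struct lvb_with_seq {               /* the proposal under discussion */
            uint32_t seq;               /* per-resource sequence number */
            uint8_t  lvb[LVB_LEN];
    };                                  /* 36 bytes instead of 32 */

    struct lvb_length_prefixed {        /* the direction I'd rather head */
            uint8_t  len;               /* 0..32 bytes actually in use */
            uint8_t  lvb[LVB_LEN];      /* only the first 'len' bytes need
                                           to travel on the wire */
    };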

> As for benefiting a small subset of lvb applications, how many
> do you think exist today? Of those that exist, what model do you
> think they are organized around?

I think that there are very few distributed applications around using 
dlms today, and of those, even fewer could be ported to another dlm 
without changes.  Maybe I'm wrong about that; perhaps you have some 
examples.

All the dlms around seem to have been inspired by Vaxcluster, but 
they've all made their own improvements.  Shades of Unix, except that 
there is no official dlm standard or even any standards body sniffing 
around, which is a broad hint about how important portability is to a 
dlm application.

This makes me think that we should take the good parts, leave out the 
bad parts, and work hard to improve the api now, while there is really 
only one application using gdlm, and none in production.

> > It seems to me that the VALNOTVALID flag by itself isn't enough for
> > you because one of your nodes might update the lvb, and
> > consequently some other node may not ever see the VALNOTVALID flag,
> > and therefore not know that it should reset its cached counter.  So
> > how about an additional flag, say, INVALIDATED, that the lock
> > master hands out to any lvb reader the first time it reads an lvb
> > for which recovery was not possible, whether the lvb was
> > subsequently written or not.  Your application looks at INVALIDATED
> > to know that it has to reset its counter and ignores VALNOTVALID. 
> > Does this work for you?
>
> This works for me assuming that the INVALIDATED flag is only passed
> out to nodes which hold locks at the time of the failure.

Yes, intended but not stated.

> A node which requests a new lock should only see the VALNOTVALID flag
> if the LVB is still invalid.

Yes, that's why INVALIDATED is a separate flag.
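
To spell out how an application like yours would consume the two flags 
(a sketch only: the structure and flag names are made up, and 
INVALIDATED is the proposed flag, nothing that exists today):

    #include <stdint.h>

    /* Sketch: invented names, not the gdlm API. */
    #define LKF_VALNOTVALID  0x01   /* lvb lost in recovery, not rewritten yet */
    #define LKF_INVALIDATED  0x02   /* first grant to this node after a recovery
                                       that lost the lvb, rewritten or not */

    struct grant {
            unsigned int  flags;    /* status flags returned with the grant */
            unsigned char lvb[32];  /* current contents of the value block */
    };

    static void on_grant(const struct grant *g, uint32_t *cached_counter)
    {
            if (g->flags & LKF_INVALIDATED) {
                    /* Some lvb generations may have been lost while we
                     * held our cached counter, so restart the sequence. */
                    *cached_counter = 0;
            }
            /* VALNOTVALID is deliberately ignored: being told once, via
             * INVALIDATED, that history was lost is all this scheme needs. */
    }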

> With this scheme it would be best if 
> any one of the current LVB's was picked to be the LVB after failover
> rather than zeroing it.

That would work for the class of users that never rewrite the lvb to a 
different value, which I would think is a rather larger class than the 
class that can tolerate losing the last N versions of the lvb.

So, "randomly pick one of the stale values and mark it invalid" sounds 
ok to me, on the grounds that it comes for free and it helps at least 
some applications.  Whereas "unilaterally clear to zero" doesn't 
accomplish anything that an application can't do for itself given the 
NOTVALID flag.  Like you, I'd prefer we rehabilitate all-zero as a 
valid lvb state, on grounds of cleanliness if nothing else.
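
In recovery-code terms, that policy would look roughly like this (again 
a sketch with invented names, standing in for whatever the lock master 
does for a resource whose lvb could not be recovered):

    #include <string.h>

    /* Sketch: invented names, not the actual recovery path. */
    struct resource {
            unsigned char lvb[32];
            int lvb_valid;          /* cleared when recovery loses lvb history */
    };

    static void recover_lvb(struct resource *r,
                            unsigned char copies[][32], int ncopies)
    {
            if (ncopies > 0)
                    /* Pick any surviving stale copy; for users that never
                     * rewrite the lvb to a different value it is as good
                     * as the lost one. */
                    memcpy(r->lvb, copies[0], sizeof(r->lvb));

            /* Mark it invalid either way: readers see VALNOTVALID (and,
             * under the proposal, INVALIDATED once each) rather than a
             * silently zeroed block. */
            r->lvb_valid = 0;
    }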

> This also avoids the nasty problem of what the cluster does to
> deal with rollover in the VALSEQNUM field.

I was saving that one ;-)

Anyway, there are a few minor issues I'm fretting about:

  1) Are there any other applications besides yours that could benefit
     from an INVALIDATED flag, as defined?

  2) What about breaking gfs?  Fortunately, it's all in-kernel; it's
     just some fiddling with the harness plugin.

  3) What about breaking userspace?  I'd say go ahead and do it now
     while pain is limited, if it's for a good cause.

Regards,

Daniel



