[Cluster-devel] [GFS2] Fix ordering bug in lock_dlm

Wed May 21 18:10:30 UTC 2008

On Wed, May 21, 2008 at 06:09:24PM +0100, Steven Whitehouse wrote:
> From 317b0076b8b1a27b51a8eb47a64d495fdb956ac5 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <swhiteho at redhat.com>
> Date: Wed, 21 May 2008 17:21:42 +0100
> Subject: [PATCH] [GFS2] Fix ordering bug in lock_dlm
> 
> This looks like a lot of change, but in fact it's not. Mostly it's
> things moving from one file to another. The change is just that
> instead of queuing lock completions and callbacks from the DLM
> we now pass them directly to GFS2.
> 
> This gives us a net loss of two list heads per glock (a fair
> saving in memory) plus a reduction in the latency of delivering
> the messages to GFS2, plus we now have one thread fewer as well.
> There was a bug where callbacks and completions could be delivered
> in the wrong order due to this unnecessary queuing which is fixed
> by this patch.

Several things:

1. These are very significant changes.  There's nothing terribly wrong
   with that, but it's important to get that straight.

2. Moving large chunks of code while also making significant changes
   makes it impossible to see what changed and what didn't.  In relation
   to point 1, a small number of changed lines doesn't make a patch
   insignificant; what matters is what those changes *do*.

3. These changes require us to fork the lock modules for gfs1 and gfs2.
   That's fine, it's been coming for quite a while anyway. (more below)

4. I'll continue to maintain the original lock_dlm for gfs1.  You and
   other gfs folks can own the new one and do whatever you like with it
   without me getting in the way.

Now, how to fork the lock modules.  There shouldn't be much trouble
adapting gfs_controld to cope with the two different lock_dlm's.  The one
main problem I see is that the name of the module "lock_dlm" can't really
be changed; it's a long standing part of the API/ABI, user interface,
documentation, ...  But I don't think it's feasible to have two different
files named lock_dlm.ko on the system.

The only solution I've been able to come up with is for the upstream
lock_dlm module to be merged into the gfs2 module, along with the
lock_nolock module.  We'd still be able to use gfs2 in the same old way,
referring to lock_dlm and lock_nolock, but it just wouldn't be a separate
module.  This has been the plan for a long time anyway.  Initially, nothing
functional would need to change between gfs2 and lock_dlm even though
they're in the same module (it's the same thing we did with the
lock_harness).  Breaking down the barrier between them could then begin,
though, and be done incrementally.

So, when gfs2 looks for "lock_dlm" or "lock_nolock" it would look within
the scope of its own kernel module.  For gfs1, it would continue to look
for separate modules named lock_dlm and lock_nolock.