[Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

Wed Dec 2 10:23:18 UTC 2015

Hi,

On 01/12/15 15:42, Bob Peterson wrote:
> ----- Original Message -----
>> Hi,
>>
>> On 25/11/15 14:22, Bob Peterson wrote:
>>> ----- Original Message -----
>>>> Hi,
>>>>
>>>> On 19/11/15 18:42, Bob Peterson wrote:
>>>>> This patch changes function gfs2_clear_inode() so that instead
>>>>> of calling gfs2_glock_put() directly most of the time, it queues
>>>>> the glock to the delayed work queue. That avoids a possible
>>>>> deadlock where it calls dlm during a fence operation:
>>>>> dlm waits for a fence operation, the fence operation waits for
>>>>> memory, the shrinker waits for gfs2 to free an inode from memory,
>>>>> but gfs2 waits for dlm.
>>>>>
>>>>> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
>>>>> ---
>>>>>     fs/gfs2/glock.c | 34 +++++++++++++++++-----------------
>>>>>     fs/gfs2/glock.h |  1 +
>>>>>     fs/gfs2/super.c |  5 ++++-
>>>>>     3 files changed, 22 insertions(+), 18 deletions(-)
>>>> [snip]
>>>> Most of the patch seems to just rename the workqueue, which makes it
>>>> tricky to spot the other changes. However, the code below seems to be
>>>> the new bit:
>>>>
>>>>> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
>>>>> index 9d5c3f7..46e5004 100644
>>>>> --- a/fs/gfs2/super.c
>>>>> +++ b/fs/gfs2/super.c
>>>>> @@ -24,6 +24,7 @@
>>>>>     #include <linux/crc32.h>
>>>>>     #include <linux/time.h>
>>>>>     #include <linux/wait.h>
>>>>> +#include <linux/workqueue.h>
>>>>>     #include <linux/writeback.h>
>>>>>     #include <linux/backing-dev.h>
>>>>>     #include <linux/kernel.h>
>>>>> @@ -1614,7 +1615,9 @@ out:
>>>>>     	ip->i_gl->gl_object = NULL;
>>>>>     	flush_delayed_work(&ip->i_gl->gl_work);
>>>>>     	gfs2_glock_add_to_lru(ip->i_gl);
>>>>> -	gfs2_glock_put(ip->i_gl);
>>>>> +	if (queue_delayed_work(gfs2_glock_workqueue,
>>>>> +			       &ip->i_gl->gl_work, 0) == 0)
>>>>> +		gfs2_glock_put(ip->i_gl);
>>>>>     	ip->i_gl = NULL;
>>>>>     	if (ip->i_iopen_gh.gh_gl) {
>>>>>     		ip->i_iopen_gh.gh_gl->gl_object = NULL;
>>>> which replaces a put with a queue, plus a put if the queue fails (due to
>>>> the work already being on the queue). That doesn't look quite right to me,
>>>> since if calling gfs2_glock_put() was not safe before, then calling it
>>>> conditionally like this is still no safer, I think?
>>>>
>>>> Steve.
>>> Hi,
>>>
>>> The call to gfs2_glock_put() in this case should be safe.
>>>
>>> If queuing the delayed work fails, it means the glock reference count is
>>> greater than 1, to be decremented when the glock state machine runs.
>>> Which means this can't be the final glock_put().
>>> Which means we can't possibly call into DLM, which means we can't block.
>>> Which means it's safe.
>>>
>>> Regards,
>>>
>>> Bob Peterson
>>> Red Hat File Systems
>> There is no reason that this cannot be the final glock put, since there
>> is no synchronization with the work that has been queued, so it might
>> well have run and decremented the ref count before we return from the
>> queuing function. It is unlikely that will be the case, but it is still
>> possible.
>>
>> Steve.
>>
> Hi Steve,
>
> It's kind of an ugly hack, but can we do something like the patch below instead?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
> ---
> commit 1949050b4b13c1b32ea45987fbf2936ae779609e
> Author: Bob Peterson <rpeterso at redhat.com>
> Date:   Thu Nov 19 12:06:31 2015 -0600
>
> GFS2: Make gfs2_clear_inode() not block on final glock put
>
> This patch changes function gfs2_clear_inode() so that instead
> of calling gfs2_glock_put, it calls a new gfs2_glock_put_noblock
> function that avoids a possible deadlock that would occur should
> it call dlm during a fence operation: dlm waits for a fence
> operation, the fence operation waits for memory, the shrinker
> waits for gfs2 to free an inode from memory, but gfs2 waits for
> dlm. The new non-blocking glock_put does this:
>
> 1. It acquires the lockref to ensure no one else is messing with it.
> 2. If the lockref is put (not locked) it can safely return because
>     it is not the last reference to the glock.
> 3. If this is the last reference, it tries to queue delayed work for
>     the glock.
> 4. If it was able to queue the delayed work, it's safe to return
>     because the glock_work_func will run in another process, so
>     this one cannot block.
> 5. If it was unable to queue the delayed work, it needs to schedule
>     and start the whole process again.
>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index a4ff7b5..22870c6 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -178,6 +178,27 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>   }
>   
>   /**
> + * gfs2_glock_put_noblock() - Decrement reference count on glock
> + * @gl: The glock to put
> + *
> + * This is the same as gfs2_glock_put() but it's not allowed to block
> + */
> +
> +void gfs2_glock_put_noblock(struct gfs2_glock *gl)
> +{
> +	while (1) {
> +		if (lockref_put_or_lock(&gl->gl_lockref))
> +			break;
> +
> +		spin_unlock(&gl->gl_lockref.lock);
That just drops the ref count without doing anything.
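
For reference, the kernel documents lockref_put_or_lock() as decrementing
the count and returning 1 when the count was greater than 1, and otherwise
returning 0 with the spinlock held and the count left unchanged. A minimal
single-threaded userspace sketch of that contract follows; struct
model_lockref and the model_* helpers are hypothetical names used only for
illustration, not kernel API, and a plain flag stands in for the spinlock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct lockref. */
struct model_lockref {
	bool locked;	/* stands in for the spinlock */
	int count;	/* reference count */
};

/*
 * Models the documented contract of lockref_put_or_lock():
 * returns 1 after decrementing when count > 1; otherwise returns 0
 * with the lock held and the count unchanged.
 */
static int model_put_or_lock(struct model_lockref *ref)
{
	assert(!ref->locked);
	if (ref->count > 1) {
		ref->count--;
		return 1;
	}
	ref->locked = true;
	return 0;
}

/* Releases the lock taken by the 0-return path above. */
static void model_unlock(struct model_lockref *ref)
{
	assert(ref->locked);
	ref->locked = false;
}
```

In this model, unlocking after a 0 return leaves the count exactly where it
was, so the caller has to decide explicitly what happens to that last
reference.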

> +		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) != 0)
> +			break;
You can't call queue_delayed_work on a glock for which you don't have a 
ref count - it might not exist any more. Please take a look at this 
again and figure out what the problematic cycle of events is, and then 
work out how to avoid that happening in the first place. There is no 
point in replacing one problem with another one, particularly one which 
would likely be very tricky to debug.
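
To illustrate only the reference-ownership rule (this is not a fix for the
dlm/fence deadlock itself), here is a minimal single-threaded userspace
sketch. It assumes the usual convention that a queued work item owns a
reference, which is taken before queuing and given back if the work was
already pending. All of struct model_obj and the model_* functions are
hypothetical names, not GFS2 or workqueue API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted object with one work item. */
struct model_obj {
	int refs;	/* reference count */
	bool queued;	/* true while the work item is pending */
	bool freed;	/* set when the last reference is dropped */
};

static void model_hold(struct model_obj *o)
{
	assert(!o->freed && o->refs > 0);
	o->refs++;
}

static void model_put(struct model_obj *o)
{
	assert(!o->freed && o->refs > 0);
	if (--o->refs == 0)
		o->freed = true;	/* a real object would be released here */
}

/* Like queue_delayed_work(): returns false if the work was already pending. */
static bool model_queue(struct model_obj *o)
{
	assert(!o->freed);	/* queuing a freed object is the bug to avoid */
	if (o->queued)
		return false;
	o->queued = true;
	return true;
}

/* The work function consumes the reference taken on its behalf. */
static void model_run_work(struct model_obj *o)
{
	assert(o->queued);
	o->queued = false;
	model_put(o);
}

/* Take a reference for the work item first; give it back if nothing was queued. */
static void model_queue_with_ref(struct model_obj *o)
{
	model_hold(o);
	if (!model_queue(o))
		model_put(o);
}
```

Because the reference is taken before queuing, the object in this model
cannot reach zero references between the queue call and the work running,
whichever side drops its reference first.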

Steve.

> +
> +		cond_resched();
> +	}
> +}
> +
> +/**
>    * may_grant - check if its ok to grant a new lock
>    * @gl: The glock
>    * @gh: The lock request which we wish to grant
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index 46ab67f..d786446 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -182,6 +182,7 @@ extern int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
>   			  const struct gfs2_glock_operations *glops,
>   			  int create, struct gfs2_glock **glp);
>   extern void gfs2_glock_put(struct gfs2_glock *gl);
> +extern void gfs2_glock_put_noblock(struct gfs2_glock *gl);
>   extern void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state,
>   			     u16 flags, struct gfs2_holder *gh);
>   extern void gfs2_holder_reinit(unsigned int state, u16 flags,
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 03fa155..188f2a5 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1613,7 +1613,7 @@ out:
>   	ip->i_gl->gl_object = NULL;
>   	flush_delayed_work(&ip->i_gl->gl_work);
>   	gfs2_glock_add_to_lru(ip->i_gl);
> -	gfs2_glock_put(ip->i_gl);
> +	gfs2_glock_put_noblock(ip->i_gl);
>   	ip->i_gl = NULL;
>   	if (ip->i_iopen_gh.gh_gl) {
>   		ip->i_iopen_gh.gh_gl->gl_object = NULL;
>