[Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

Mark Syms Mark.Syms at citrix.com
Tue Oct 9 08:50:15 UTC 2018


We think we have, just making a build to test.

Will follow up later.

Mark.

-----Original Message-----
From: Steven Whitehouse <swhiteho at redhat.com> 
Sent: 09 October 2018 09:41
To: Mark Syms <Mark.Syms at citrix.com>; Tim Smith <tim.smith at citrix.com>
Cc: cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall at citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock



On 09/10/18 09:13, Mark Syms wrote:
> Having swapped the line below around we still see the timeout on 
> schedule fire, but only once in a fairly mega stress test. This is why 
> we weren't worried about the timeout being HZ, the situation is hardly 
> ever hit as having to wait is rare and normally we are woken from 
> schedule and without a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't seem too bad.
>
> Mark.
We should still get to the bottom of why the wake up is missing though, since without that fix we won't know if there is something else wrong somewhere,

Steve.

>
> -----Original Message-----
> From: Tim Smith <tim.smith at citrix.com>
> Sent: 08 October 2018 14:27
> To: Steven Whitehouse <swhiteho at redhat.com>
> Cc: Mark Syms <Mark.Syms at citrix.com>; cluster-devel at redhat.com; Ross 
> Lagerwall <ross.lagerwall at citrix.com>
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in 
> find insert glock
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
>> Hi,
>>
>> On 08/10/18 14:10, Tim Smith wrote:
>>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>>>> On 08/10/18 13:59, Mark Syms wrote:
>>>>> That sounds entirely reasonable so long as you are absolutely sure 
>>>>> that nothing is ever going to mess with that glock, we erred on 
>>>>> the side of more caution not knowing whether it would be guaranteed safe or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> 	Mark
>>>> We should have a look at the history to see how that wait got added.
>>>> However the "dead" flag here means "don't touch this glock" and is 
>>>> there so that we can separate the marking dead from the actual 
>>>> removal from the list (which simplifies the locking during the 
>>>> scanning procedures)
>>> You beat me to it :-)
>>>
>>> I think there might be a bit of a problem inserting a new entry with 
>>> the same name before the old entry has been fully destroyed (or at 
>>> least removed), which would be why the schedule() is there.
>> If the old entry is marked dead, all future lookups should ignore it.
>> We should only have a single non-dead entry at a time, but that 
>> doesn't seem like it should need us to wait for it.
> On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah.
>
>> If we do discover that the wait is really required, then it sounds 
>> like as you mentioned above there is a lost wakeup, and that must 
>> presumably be on a code path that sets the dead flag and then fails 
>> to send a wake up later on. If we can drop the wait in the first 
>> place, that seems like a better plan,
> Ooooh, I wonder if these two lines:
>
> 	wake_up_glock(gl);
> 	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?
>
> --
> Tim Smith <tim.smith at citrix.com>
>
>





More information about the Cluster-devel mailing list