[Cluster-devel] [GFS2 PATCH 09/15] gfs2: fix deadlock in gfs2_ail1_empty withdraw

Bob Peterson rpeterso at redhat.com
Wed Jul 28 13:30:06 UTC 2021


On 7/28/21 12:38 AM, Andreas Gruenbacher wrote:
> Hi Bob,
> 
> On Tue, Jul 27, 2021 at 7:37 PM Bob Peterson <rpeterso at redhat.com> wrote:
>> Before this patch, function gfs2_ail1_empty could issue a file system
>> withdraw when IO errors were discovered. However, there are several
>> callers, including gfs2_flush_revokes() which holds the gfs2_log_lock
>> before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw
>> it would leave the gfs2_log_lock held, which resulted in a deadlock
>> due to other processes that needed the log_lock.
>>
>> Another problem discovered by Christoph Hellwig is that we cannot
>> withdraw from the log_flush process because it may be called from
>> the glock workqueue, and the withdraw process waits for that very
>> workqueue to be flushed. So the withdraw must be ignored until it may
>> be handled by a more appropriate context like the gfs2_logd daemon.
>>
>> This patch moves the withdraw out of function gfs2_ail1_empty and
>> makes each of the callers check for a withdraw by calling new function
>> check_ail1_withdraw.
> 
>> Function gfs2_flush_revokes now does this check
>> after releasing the gfs2_log_lock to avoid the deadlock.
> 
> I don't see that in the code.

Yeah, the comment was wrong. I noticed the problem and already removed 
the paragraph after the patch set was sent out.

Bob
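
For readers following the thread, the general idea in the patch description
(report the error while the lock is held, and act on it only after the lock
is dropped) can be illustrated with a small userspace sketch. This is not the
GFS2 code and not what the patch does line for line: it assumes a pthread
mutex in place of the gfs2_log_lock, and struct log_state, ail_scan_locked(),
withdraw() and flush_and_check() are hypothetical names made up for the
example.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the log state; not a GFS2 structure. */
struct log_state {
    pthread_mutex_t lock;   /* plays the role of the log lock */
    bool io_error;          /* set when a write-back error was seen */
    bool withdrawn;         /* set once the withdraw has run */
};

/*
 * Scan step: must be called with st->lock held. It only reports the
 * error. It never withdraws, because the caller still holds the lock.
 */
static bool ail_scan_locked(struct log_state *st)
{
    return st->io_error;
}

/*
 * Withdraw step: takes the lock itself, so calling it with the lock
 * already held would self-deadlock on a non-recursive mutex.
 */
static void withdraw(struct log_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->withdrawn = true;
    pthread_mutex_unlock(&st->lock);
}

/* Caller: scan under the lock, act on the result after dropping it. */
static bool flush_and_check(struct log_state *st)
{
    bool need_withdraw;

    pthread_mutex_lock(&st->lock);
    need_withdraw = ail_scan_locked(st);
    pthread_mutex_unlock(&st->lock);

    if (need_withdraw)
        withdraw(st);
    return need_withdraw;
}
```

The point of the split is that the scan function has no way to block on
anything its caller holds. Where the deferred check should actually run in
GFS2 (for example from gfs2_logd rather than from a glock workqueue context)
is a separate question that the sketch does not address.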



