[Cluster-devel] [GFS2 PATCH 09/15] gfs2: fix deadlock in gfs2_ail1_empty withdraw

Andreas Gruenbacher agruenba at redhat.com
Wed Jul 28 05:38:39 UTC 2021


Hi Bob,

On Tue, Jul 27, 2021 at 7:37 PM Bob Peterson <rpeterso at redhat.com> wrote:
> Before this patch, function gfs2_ail1_empty could issue a file system
> withdraw when IO errors were discovered. However, there are several
> callers, including gfs2_flush_revokes() which holds the gfs2_log_lock
> before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw
> it would leave the gfs2_log_lock held, which resulted in a deadlock
> due to other processes that needed the log_lock.
>
> Another problem discovered by Christoph Helwig is that we cannot
> withdraw from the log_flush process because it may be called from
> the glock workqueue, and the withdraw process waits for that very
> workqueue to be flushed. So the withdraw must be ignored until it may
> be handled by a more appropriate context like the gfs2_logd daemon.
>
> This patch moves the withdraw out of function gfs2_ail1_empty and
> makes each of the callers check for a withdraw by calling new function
> check_ail1_withdraw.

> Function gfs2_flush_revokes now does this check
> after releasing the gfs2_log_lock to avoid the deadlock.

I don't see that in the code.

>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> ---
>  fs/gfs2/log.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index f0ee3ff6f9a8..c687a8c4e044 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -364,11 +364,6 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp, int max_revokes)
>         ret = list_empty(&sdp->sd_ail1_list);
>         spin_unlock(&sdp->sd_ail_lock);
>
> -       if (test_bit(SDF_WITHDRAWING, &sdp->sd_flags)) {
> -               gfs2_lm(sdp, "fatal: I/O error(s)\n");
> -               gfs2_withdraw(sdp);
> -       }
> -
>         return ret;
>  }
>
> @@ -786,6 +781,15 @@ void gfs2_glock_remove_revoke(struct gfs2_glock *gl)
>         }
>  }
>
> +static void check_ail1_withdraw(struct gfs2_sbd *sdp)
> +{
> +       if (!test_bit(SDF_WITHDRAWING, &sdp->sd_flags))
> +               return;
> +
> +       gfs2_lm(sdp, "fatal: I/O error(s)\n");
> +       gfs2_withdraw(sdp);
> +}
> +
>  /**
>   * gfs2_flush_revokes - Add as many revokes to the system transaction as we can
>   * @sdp: The GFS2 superblock
> @@ -1317,6 +1321,7 @@ int gfs2_logd(void *data)
>
>                 if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
>                         gfs2_ail1_empty(sdp, 0);
> +                       check_ail1_withdraw(sdp);
>                         gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
>                                                   GFS2_LFC_LOGD_JFLUSH_REQD);
>                 }
> @@ -1325,6 +1330,7 @@ int gfs2_logd(void *data)
>                         gfs2_ail1_start(sdp);
>                         gfs2_ail1_wait(sdp);
>                         gfs2_ail1_empty(sdp, 0);
> +                       check_ail1_withdraw(sdp);
>                         gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
>                                                   GFS2_LFC_LOGD_AIL_FLUSH_REQD);
>                 }
> --
> 2.31.1
>

Thanks,
Andreas




More information about the Cluster-devel mailing list