[Cluster-devel] [GFS2 PATCH] gfs2: Abort gfs2_freeze if io error is seen

Thu Nov 14 20:44:45 UTC 2019

On Thu, Nov 14, 2019 at 6:13 PM Bob Peterson <rpeterso at redhat.com> wrote:
> Before this patch, an io error, such as -EIO writing to the journal
> would cause function gfs2_freeze to go into an infinite loop,
> continuously retrying the freeze operation. But nothing ever clears
> the -EIO except unmount after withdraw, which is impossible if the
> freeze operation never ends (fails). Instead you get:
>
> [ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
> [ 6499.773058] gfs2: fsid=dm-32.0: retrying...
> [ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
> [ 6500.797015] gfs2: fsid=dm-32.0: retrying...
>
> This patch adds a check for -EIO in gfs2_freeze, and if seen, it
> dequeues the freeze glock, aborts the loop and returns the error.
> Also, there's no need to pass the freeze holder to function
> gfs2_lock_fs_check_clean since it's only called in one place and
> it's a well-known superblock pointer, so this simplifies that.
>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> ---
>  fs/gfs2/super.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 8154c38e488b..eb1fbd533e6d 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -399,8 +399,7 @@ struct lfcc {
>   * Returns: errno
>   */
>
> -static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
> -                                   struct gfs2_holder *freeze_gh)
> +static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp)
>  {
>         struct gfs2_inode *ip;
>         struct gfs2_jdesc *jd;
> @@ -425,7 +424,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
>         }
>
>         error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_EXCLUSIVE,
> -                                  GL_NOCACHE, freeze_gh);
> +                                  GL_NOCACHE, &sdp->sd_freeze_gh);

An error check is missing here:

        if (error)
                goto out;

>         list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
>                 error = gfs2_jdesc_check(jd);
> @@ -441,7 +440,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
>         }
>
>         if (error)
> -               gfs2_glock_dq_uninit(freeze_gh);
> +               gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
>
>  out:
>         while (!list_empty(&list)) {
> @@ -767,15 +766,20 @@ static int gfs2_freeze(struct super_block *sb)
>                         goto out;
>                 }
>
> -               error = gfs2_lock_fs_check_clean(sdp, &sdp->sd_freeze_gh);
> +               error = gfs2_lock_fs_check_clean(sdp);
>                 if (!error)
>                         break;
>
>                 if (error == -EBUSY)
>                         fs_err(sdp, "waiting for recovery before freeze\n");
> -               else
> +               else if (error == -EIO) {
> +                       fs_err(sdp, "Fatal IO error: cannot freeze gfs2 due "
> +                              "to recovery error.\n");
> +                       gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);

Instead of this, gfs2_lock_fs_check_clean should make sure it doesn't
keep sd_freeze_gh held when it fails.

> +                       goto out;
> +               } else {
>                         fs_err(sdp, "error freezing FS: %d\n", error);
> -
> +               }
>                 fs_err(sdp, "retrying...\n");
>                 msleep(1000);
>         }
>
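
Roughly what I have in mind for the error handling, as a userspace
sketch rather than kernel code. The struct and mock_* names are made up
for illustration; they only stand in for struct gfs2_sbd,
gfs2_glock_nq_init and gfs2_glock_dq_uninit, and the real function also
has the list cleanup under the out label:

```c
#include <assert.h>
#include <errno.h>	/* -EIO / -EBUSY style error codes */
#include <stdbool.h>

/* Hypothetical stand-ins, not the real GFS2 types. */
struct holder {
	bool held;
};

struct sbd {
	struct holder freeze_gh;
	int nq_error;		/* error the mock enqueue should return */
	int check_error;	/* error the mock journal checks should return */
};

/* Stands in for gfs2_glock_nq_init(): holds the lock only on success. */
static int mock_nq_init(struct sbd *sdp)
{
	if (sdp->nq_error)
		return sdp->nq_error;
	sdp->freeze_gh.held = true;
	return 0;
}

/* Stands in for gfs2_glock_dq_uninit(): releasing an unheld holder is a bug. */
static void mock_dq_uninit(struct sbd *sdp)
{
	assert(sdp->freeze_gh.held);
	sdp->freeze_gh.held = false;
}

static int lock_fs_check_clean(struct sbd *sdp)
{
	int error;

	error = mock_nq_init(sdp);
	if (error)
		goto out;	/* nothing was acquired, nothing to release */

	error = sdp->check_error;	/* stands in for the per-journal checks */
	if (error)
		mock_dq_uninit(sdp);	/* never return an error with the lock held */

out:
	return error;
}
```

With that invariant (holder held if and only if the function returns 0),
the caller can bail out on -EIO without dequeuing anything itself.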

Thanks,
Andreas