[Cluster-devel] [GFS2 PATCH] gfs2: Abort gfs2_freeze if io error is seen
Andreas Gruenbacher
agruenba at redhat.com
Thu Nov 14 20:44:45 UTC 2019
On Thu, Nov 14, 2019 at 6:13 PM Bob Peterson <rpeterso at redhat.com> wrote:
> Before this patch, an io error, such as -EIO writing to the journal
> would cause function gfs2_freeze to go into an infinite loop,
> continuously retrying the freeze operation. But nothing ever clears
> the -EIO except unmount after withdraw, which is impossible if the
> freeze operation never ends (fails). Instead you get:
>
> [ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
> [ 6499.773058] gfs2: fsid=dm-32.0: retrying...
> [ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
> [ 6500.797015] gfs2: fsid=dm-32.0: retrying...
>
> This patch adds a check for -EIO in gfs2_freeze, and if seen, it
> dequeues the freeze glock, aborts the loop and returns the error.
> Also, there's no need to pass the freeze holder to function
> gfs2_lock_fs_check_clean since it's only called in one place and
> it's a well-known superblock pointer, so this simplifies that.
>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> ---
> fs/gfs2/super.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 8154c38e488b..eb1fbd533e6d 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -399,8 +399,7 @@ struct lfcc {
> * Returns: errno
> */
>
> -static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
> - struct gfs2_holder *freeze_gh)
> +static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp)
> {
> struct gfs2_inode *ip;
> struct gfs2_jdesc *jd;
> @@ -425,7 +424,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
> }
>
> error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_EXCLUSIVE,
> - GL_NOCACHE, freeze_gh);
> + GL_NOCACHE, &sdp->sd_freeze_gh);
An error check is missing here:

	if (error)
		goto out;

> list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
> error = gfs2_jdesc_check(jd);
> @@ -441,7 +440,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
> }
>
> if (error)
> - gfs2_glock_dq_uninit(freeze_gh);
> + gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
>
> out:
> while (!list_empty(&list)) {
> @@ -767,15 +766,20 @@ static int gfs2_freeze(struct super_block *sb)
> goto out;
> }
>
> - error = gfs2_lock_fs_check_clean(sdp, &sdp->sd_freeze_gh);
> + error = gfs2_lock_fs_check_clean(sdp);
> if (!error)
> break;
>
> if (error == -EBUSY)
> fs_err(sdp, "waiting for recovery before freeze\n");
> - else
> + else if (error == -EIO) {
> + fs_err(sdp, "Fatal IO error: cannot freeze gfs2 due "
> + "to recovery error.\n");
> + gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
Instead of this, gfs2_lock_fs_check_clean should make sure it doesn't
keep sd_freeze_gh held when it fails.
> + goto out;
> + } else {
> fs_err(sdp, "error freezing FS: %d\n", error);
> -
> + }
> fs_err(sdp, "retrying...\n");
> msleep(1000);
> }
>
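
To illustrate the two comments above, here is a rough userspace sketch, not kernel code. The types and helpers (struct holder, struct sbd, holder_acquire, holder_release) are hypothetical stand-ins for struct gfs2_holder, struct gfs2_sbd, gfs2_glock_nq_init and gfs2_glock_dq_uninit, and the retry loop is bounded (no msleep) only so the example terminates. The point is that the callee never returns an error with the holder still held, so the caller can bail out on -EIO without any cleanup of its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a glock holder; not struct gfs2_holder. */
struct holder {
	bool held;
};

/* Hypothetical stand-in for struct gfs2_sbd. */
struct sbd {
	struct holder freeze_gh;
	int acquire_error;	/* simulated result of taking the freeze lock */
	int journal_error;	/* simulated result of the journal checks */
};

/* Takes the holder unless a simulated acquire error is configured. */
static int holder_acquire(struct sbd *sdp)
{
	assert(!sdp->freeze_gh.held);
	if (sdp->acquire_error)
		return sdp->acquire_error;
	sdp->freeze_gh.held = true;
	return 0;
}

static void holder_release(struct sbd *sdp)
{
	assert(sdp->freeze_gh.held);
	sdp->freeze_gh.held = false;
}

/* Models gfs2_lock_fs_check_clean: the holder is never held on failure. */
static int lock_fs_check_clean(struct sbd *sdp)
{
	int error;

	error = holder_acquire(sdp);
	if (error)
		goto out;	/* nothing was acquired, nothing to release */

	error = sdp->journal_error;
	if (error)
		holder_release(sdp);
out:
	return error;
}

/* Models the loop in gfs2_freeze, with a bounded number of tries. */
static int freeze(struct sbd *sdp, int max_tries)
{
	int error = -EBUSY;

	for (int i = 0; i < max_tries; i++) {
		error = lock_fs_check_clean(sdp);
		if (!error)
			break;
		if (error == -EIO) {
			fprintf(stderr, "fatal I/O error: cannot freeze\n");
			break;	/* the callee already dropped the holder */
		}
		fprintf(stderr, "error freezing: %d, retrying...\n", error);
	}
	return error;
}
```

With this shape, the -EIO branch in the caller only needs to log and leave the loop; the release stays in one place, next to the acquire.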
Thanks,
Andreas