[Cluster-devel] [PATCH 0/2] gfs2: improvements to recovery and withdraw process

Tue Nov 20 19:23:11 UTC 2018

Hi,

On 08/11/18 20:25, Bob Peterson wrote:
> Hi,
>
> This is a first draft of a two-patch set to fix some of the nasty
> journal recovery problems I've found lately.
>
> The problems have to do with file system corruption caused when recovery
> replays a journal after the resource group blocks have been unlocked
> by the recovery process. In other words, when no cluster node takes
> responsibility to replay the journal of a withdrawing node, then it
> gets replayed later on, after the blocks' contents have been changed.
>
> The first patch prevents gfs2 from attempting recovery if the file system
> is withdrawn or has journal IO errors. Trying to recover your own journal
> from either of these unstable conditions is dangerous and likely to corrupt
> the file system.
That sounds sensible to me.

> The second patch is more extensive. When a node withdraws from a file system
> it first empties out all outstanding pages in the ail lists, then it
How are we doing this? Since the disk can no longer be written to, there
are two cases we need to cover. One is for dirty but not yet written
pages. The other is for pages in flight; these will need to either time
out or complete somehow.

> signals all other nodes with the file system mounted to perform recovery
> on its journal since it cannot safely recover its own journal. This is
> accomplished by a new non-disk callback glop used exclusively by the
> "live" glock, which sets up an lvb in the glock to indicate which
> journal(s) need to be replayed. This system makes it necessary to prevent
> recursion, since the journal operations themselves (i.e. the ones that
> empty out the ail list on withdraw) can also withdraw. Thus, the withdraw
I think we should ignore any further I/O errors after we have withdrawn,
since we know that no further disk writes can take place anyway. These
will be completed with EIO by dm. As you say, we definitely don't want the
node that is withdrawing to replay its own journal. That should be done
by the remaining nodes in the cluster.
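
To illustrate ignoring later errors and avoiding a recursive withdraw,
here is a minimal userspace sketch in C. It is not taken from the patch:
struct fs_state, fs_begin_withdraw() and fs_io_done() are hypothetical
names, and the C11 atomic flag only stands in for whatever bit the real
code keeps in the superblock (in the kernel this would more likely be a
test_and_set_bit() on a flags word).

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-filesystem state; not the real gfs2_sbd. */
struct fs_state {
	atomic_bool withdrawn;	/* set once, never cleared */
	int errors_ignored;	/* I/O errors seen after the withdraw */
};

/*
 * Returns true only for the first caller, which then owns the withdraw.
 * Any later caller (for example an error raised while emptying the ail
 * lists during the withdraw itself) gets false and must not recurse.
 */
static bool fs_begin_withdraw(struct fs_state *fs)
{
	return !atomic_exchange(&fs->withdrawn, true);
}

/*
 * Called on I/O completion with 0 or a negative errno.
 * Returns true if this call is the one that started the withdraw.
 */
static bool fs_io_done(struct fs_state *fs, int status)
{
	if (status == 0)
		return false;
	if (!fs_begin_withdraw(fs)) {
		/* Already withdrawn: count the error and otherwise ignore it. */
		fs->errors_ignored++;
		return false;
	}
	return true;
}
```

With this shape, the first -EIO triggers the withdraw and every later
-EIO is only counted, so the error path cannot re-enter the withdraw code.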

The other question is whether we should just use the "normal" recovery
process, which would fence the withdrawn node, or have a different system
which avoids the fencing, since we have effectively self-fenced from the
storage. Looking at the patch, I assume that it implements the latter?

Steve.

> system is now separated into "journal" and "non-journal" withdraws.
> Also, the "withdraw" flag is now replaced by a superblock bit because
> once the file system withdraws in this way, it needs to remember that from
> that point on.
>
> Regards,
>
> Bob Peterson
> ---
> Bob Peterson (2):
>    gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn
>    gfs2: initiate journal recovery as soon as a node withdraws
>
>   fs/gfs2/glock.c    |  5 ++-
>   fs/gfs2/glops.c    | 47 +++++++++++++++++++++++
>   fs/gfs2/incore.h   |  3 ++
>   fs/gfs2/lock_dlm.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++
>   fs/gfs2/log.c      | 62 ++++++++++++++++--------------
>   fs/gfs2/super.c    |  5 ++-
>   fs/gfs2/super.h    |  1 +
>   fs/gfs2/util.c     | 84 ++++++++++++++++++++++++++++++++++++++++
>   fs/gfs2/util.h     | 13 +++++++
>   9 files changed, 282 insertions(+), 33 deletions(-)
>