[Linux-cluster] Found unlinked inode

Borgström Jonas jobot at wmdata.com
Wed Sep 26 14:49:41 UTC 2007


From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bob Peterson
Sent: 26 September 2007 16:01
To: linux clustering
Subject: Re: [Linux-cluster] Found unlinked inode

> Hi Jonas,
>
> Well, I can think of one possible explanation.  I can't be sure because
> I don't know your test scenario, but this is my theory.  First, a bit
> of background:
>
> When a node gets "shot", as you say, the metadata for some of its
> recent operations is likely to still be in the journal it was using.
> Depending on when the node gets shot, that metadata may exist only in
> the journal, never having been written to its final destination on
> disk.
>
> Ordinarily, that's not a big deal, because the next time the file
> system is mounted the journal is replayed, which writes the metadata
> to its proper place on disk, and all is well.  That's the same for
> most journaling file systems, as far as I know.
>
> A couple of years ago, one of my predecessors (before I started) made
> an executive decision to have gfs_fsck *clear* the system journals
> rather than replay them.  I don't know offhand whether the replay
> code was once there and got taken out, or was never written.  At any
> rate, it seemed like a good idea at the time, and there were several
> good reasons to justify that decision:
>
> First, if the user is running gfs_fsck, they must already suspect
> file system corruption.  If (and this is a big if) that corruption
> was caused by recent operations on the file system, then replaying
> the journal can only compound the damage, because what is in the
> journal may itself be based on the corruption.  This was more of a
> concern when GFS detected corruption and, suspecting that its own
> recent operations had somehow caused it, bailed out and "withdrew"
> from the file system.
>
> Second, if the user is running gfs_fsck because of corruption, we may
> not be able to assume that the journal contains good metadata worth
> replaying.
>
> Third, the user always has the option of replaying the journal before
> running gfs_fsck (see the command sketch after these steps):
>
> 1. mount the file system after the crash (to replay the journal)
> 2. unmount the file system
> 3. run gfs_fsck
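>
> As a rough sketch, that sequence might look like the following, where
> the device path and mount point are just placeholders for your actual
> GFS volume:
>
>   mount -t gfs /dev/your_vg/your_lv /mnt/gfs   # mounting replays any dirty journals
>   umount /mnt/gfs
>   gfs_fsck /dev/your_vg/your_lv                # fsck now sees already-replayed journals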
>
> The decision to have gfs_fsck clear the journals was probably made
> many years ago, before gfs was stable, when these "withdraw"
> situations were more common.
>
> Some people believe that this was a bad decision.  I believe it makes
> more sense to trust the journal and replay it before doing the rest
> of the fsck operations, because in "normal" cases where a node dies
> for some reason unrelated to gfs (getting shot, fenced, losing power,
> blowing up a power supply, etc.) you have the potential to lose
> metadata unless the journal is replayed.
>
> Other journaling file systems either replay their journals during
> fsck, or inform the user and ask them to take steps to replay the
> journal (as above), or give them the option to clear it.  So far,
> gfs_fsck does none of that.  It just clears the journals.
>
> To remedy the situation, I've got an open bugzilla, 291551 (which may
> be marked "private" because it was opened internally--sorry), at
> least for the gfs2_fsck case (gfs_fsck will likely be done too).
> With that bugzilla I intend to either ask the user whether they want
> the journals replayed, replay them automatically, or try to detect
> problems with them first.
>
> I'm not certain that this is the cause of your corruption, but it's
> the only one I can think of at the moment.
>
Hi Bob,

This sounds like a reasonable explanation except for one thing: the filesystem was cleanly unmounted on both nodes before I ran gfs_fsck.  So there shouldn't be any journal to replay, right?

Anyway, I've restarted the test, and if I'm able to recreate this error I'll first take a copy of the filesystem and then check whether running "mount + umount" makes this gfs_fsck error go away, along the lines sketched below.
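
Something like this, where the device and paths are placeholders for my actual setup:

  dd if=/dev/my_vg/my_gfs of=/backup/gfs-corrupt.img bs=1M   # keep a copy of the corrupt fs first
  mount -t gfs /dev/my_vg/my_gfs /mnt/gfs                    # mounting replays any journals
  umount /mnt/gfs
  gfs_fsck /dev/my_vg/my_gfs                                 # check whether the error is gone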

Regards,
Jonas
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



