[Linux-cluster] GFS2 filesystem consistency error
daniel.dehennin at baby-gnu.org
Wed Feb 24 09:05:06 UTC 2016
Bob Peterson <rpeterso at redhat.com> writes:
> Hi Daniel,
> I took a look at that metadata you sent me, but I didn't find any evidence
> relating to the problem you posted. Either the corruption happened a long
> time prior to your saving of the metadata, or else the metadata was saved
> after an fsck.gfs2 fixed (or attempted to fix) the problem?
- When I first encountered the problem, I ran an fsck on the filesystem
with version 3.1.6 from Ubuntu.
- Several days later, the same “dirty_inode: glock -5” messages
started showing up on the same node as the first time.
- I ran an fsck with 3.1.8 built from git.
- A few days later, the same node showed the “dirty_inode” messages
again; I shut down that node and then ran “gfs2_edit savemeta”.
All nodes have the same hardware and the same OS/kernel/pacemaker versions.
> One thing's for sure: I don't see any evidence of wild file system corruption;
> certainly nothing that can account for those errors.
> You said the problem seemed to revolve around a gfs2_grow operation,
Not exactly. I grew the fs live 6 months ago and ran into some
trouble; I ran an fsck at that time and the fs ran fine for months.
Then the “dirty_inode” trouble started on Feb 9.
> Can you make sure the lvm2 volume group has the clustered bit set?
> Please do the "vgs" command and see if that volume has "c" listed in its
> flags. If not, it could have caused problems for the gfs2_grow.
Yes, it has the clustered flag.
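
For anyone who wants to script that check, here is a minimal Python
sketch. It assumes the usual vg_attr layout, in which the sixth
character is "c" for a clustered volume group. The function names are
mine and purely illustrative, not part of any LVM tooling.

```python
import subprocess


def is_clustered(vg_attr: str) -> bool:
    """Return True if the sixth vg_attr character is 'c' (clustered).

    Assumption: vg_attr is the six-character attribute string printed
    by vgs, e.g. 'wz--nc' for a clustered VG.
    """
    attr = vg_attr.strip()
    return len(attr) >= 6 and attr[5] == "c"


def clustered_flags(vgs_output: str) -> dict:
    """Map VG name -> clustered flag.

    Expects the output of: vgs --noheadings -o vg_name,vg_attr
    """
    flags = {}
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            name, attr = fields
            flags[name] = is_clustered(attr)
    return flags


def query_vgs() -> dict:
    """Run vgs (needs LVM installed and sufficient privileges)."""
    out = subprocess.run(
        ["vgs", "--noheadings", "-o", "vg_name,vg_attr"],
        capture_output=True, text=True, check=True,
    ).stdout
    return clustered_flags(out)
```

On a cluster node, printing query_vgs() should show True next to the
volume group that backs the GFS2 filesystem.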
> I've seen problems like this very rarely. Once was a legitimate bug in
> GFS2 that we fixed in RHEL5, but I assume your kernel is newer than
We have 3.13.0-78-generic from Ubuntu.
> My only working theory is this:
> This might be related to the transition between "unlinked" dinodes and
> "free". After a file is deleted, it goes to "unlinked" and has to be
> transitioned to "free". This sometimes goes wrong because of the way
> it needs to check what other nodes in the cluster are doing.
> Maybe: If you have three nodes, and a file was unlinked on node 1, then
> maybe the internode communication got confused and nodes 2 and 3 both
> tried to transition it from Unlinked to Free. That is only a theory, and
> there is absolutely no proof. However, I have a set of patches that are
> experimental, and not even in the upstream kernel yet (hopefully soon!)
> that try to tighten up and fix problems like this. It's much more common
> for multiple nodes to try to transition from Unlinked to Free, and they
> all fail, leaving the file in an "Unlinked" state.
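
To make sure I follow the theory above, here is a toy Python model of
the Unlinked to Free transition. It is not GFS2 code: the Dinode class
is made up for illustration, threads stand in for cluster nodes, and a
threading.Lock stands in for whatever cluster-wide locking the real
code uses. It only shows the intended invariant, namely that when
several nodes race to free the same unlinked dinode, exactly one of
them should win.

```python
import threading
from enum import Enum


class State(Enum):
    UNLINKED = "unlinked"
    FREE = "free"


class Dinode:
    """Hypothetical stand-in for an on-disk inode; not a GFS2 structure."""

    def __init__(self):
        self.state = State.UNLINKED
        self.free_count = 0             # how many times it was freed
        self._lock = threading.Lock()   # stands in for a cluster-wide lock

    def try_free(self) -> bool:
        """Check and transition under the lock, so only one caller wins."""
        with self._lock:
            if self.state is not State.UNLINKED:
                return False            # another node already freed it
            self.state = State.FREE
            self.free_count += 1
            return True


def race(n_nodes: int = 3):
    """Let n_nodes 'nodes' (threads) all try to free the same dinode."""
    dinode = Dinode()
    results = []

    def node():
        results.append(dinode.try_free())

    threads = [threading.Thread(target=node) for _ in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dinode, results
```

If the check and the state change were not done under the same lock,
two nodes could both see "unlinked" and both free the dinode, which is
the kind of double transition the theory describes.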
Thanks for the explanations. I will try to re-add the down node to the
cluster and see what happens.
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF