[Linux-cluster] GFS2 filesystem consistency error

Shreekant Jena shreekant.jena at gmail.com
Wed Feb 24 08:08:21 UTC 2016


Hi,
I'm having a problem with a two-node cluster. The secondary node shows as
offline after a reboot, and CMAN is not starting.
Below are the logs from the offline node:

[root@EI51SPM1 cluster]# clustat
msg_open: Invalid argument
Member Status: Inquorate

Resource Group Manager not running; no service information available.

Membership information not available
[root@EI51SPM1 cluster]# tail -10 /var/log/messages
Feb 24 13:36:23 EI51SPM1 ccsd[25487]: Error while processing connect: Connection refused
Feb 24 13:36:23 EI51SPM1 kernel: CMAN: sending membership request
Feb 24 13:36:27 EI51SPM1 ccsd[25487]: Cluster is not quorate.  Refusing connection.
Feb 24 13:36:27 EI51SPM1 ccsd[25487]: Error while processing connect: Connection refused
Feb 24 13:36:28 EI51SPM1 kernel: CMAN: sending membership request
Feb 24 13:36:32 EI51SPM1 ccsd[25487]: Cluster is not quorate.  Refusing connection.
Feb 24 13:36:32 EI51SPM1 ccsd[25487]: Error while processing connect: Connection refused
Feb 24 13:36:32 EI51SPM1 ccsd[25487]: Cluster is not quorate.  Refusing connection.
Feb 24 13:36:32 EI51SPM1 ccsd[25487]: Error while processing connect: Connection refused
Feb 24 13:36:33 EI51SPM1 kernel: CMAN: sending membership request
[root@EI51SPM1 cluster]#
[root@EI51SPM1 cluster]# cman_tool status
Protocol version: 5.0.1
Config version: 166
Cluster name: IVRS_DB
Cluster ID: 9982
Cluster Member: No
Membership state: Joining
[root@EI51SPM1 cluster]# cman_tool nodes
Node  Votes Exp Sts  Name
[root@EI51SPM1 cluster]#
[root@EI51SPM1 cluster]#



Thanks & Regards,
Shreekanta Jena


On Tue, Feb 23, 2016 at 11:30 PM, Bob Peterson <rpeterso at redhat.com> wrote:

> ----- Original Message -----
> > Bob Peterson <rpeterso at redhat.com> writes:
> >
> >
> > [...]
> >
> > > Hi Daniel,
> > >
> > > I'm downloading the metadata now. I'll let you know what I find.
> > > It may take a while because my storage is a bit in flux at the moment.
> >
> > Ok, thanks a lot for looking at our problems.
> >
> > Regards.
> > --
> > Daniel Dehennin
> > Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> > Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>
> Hi Daniel,
>
> I took a look at that metadata you sent me, but I didn't find any evidence
> relating to the problem you posted. Either the corruption happened long
> before you saved the metadata, or else the metadata was saved after
> fsck.gfs2 fixed (or attempted to fix) the problem.
>
> One thing's for sure: I don't see any evidence of wild file system
> corruption, and certainly nothing that can account for those errors.
>
> You said the problem seemed to revolve around a gfs2_grow operation, right?
> Can you make sure the lvm2 volume group has the clustered bit set?
> Please run the "vgs" command and check whether that volume group has "c"
> listed in its attributes. If not, that could have caused problems for
> gfs2_grow.
>
> I've seen problems like this only rarely. One was a legitimate bug in
> GFS2 that we fixed in RHEL5, but I assume your kernel is newer than that.
> The other we weren't able to solve because there was no evidence of what
> went wrong.
>
> My only working theory is this:
>
> This might be related to the transition of dinodes from "unlinked" to
> "free". After a file is deleted, its dinode goes to "unlinked" and has to
> be transitioned to "free". This sometimes goes wrong because the
> transition has to check what the other nodes in the cluster are doing.
>
> For example: if you have three nodes and a file was unlinked on node 1,
> maybe the internode communication got confused and nodes 2 and 3 both
> tried to transition it from Unlinked to Free. That is only a theory, and
> there is absolutely no proof. However, I have a set of experimental
> patches, not even in the upstream kernel yet (hopefully soon!), that try
> to tighten up and fix problems like this. It's much more common for
> multiple nodes to try to transition a file from Unlinked to Free and all
> fail, leaving the file in the "Unlinked" state.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
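The "vgs" check suggested in the quoted message can be scripted. Below is a minimal POSIX shell sketch, assuming the clustered flag is the sixth character of the vg_attr field printed by `vgs -o vg_attr` (the fifth character is the allocation policy, where "c" means contiguous, so simply searching the string for "c" can give a false positive). The helper name is_clustered is hypothetical, and vgs normally needs to be run as root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an LVM vg_attr string has the
# clustered bit set, i.e. its 6th character is 'c' (e.g. "wz--nc").
is_clustered() {
    case "$1" in
        ?????c*) return 0 ;;   # 6th attribute character is 'c'
        *) return 1 ;;         # '-' or anything else: not clustered
    esac
}

# Report each volume group and whether its clustered bit is set.
# Skipped entirely if the LVM tools are not installed.
if command -v vgs >/dev/null 2>&1; then
    vgs --noheadings -o vg_name,vg_attr | while read -r name attr; do
        if is_clustered "$attr"; then
            echo "$name: clustered"
        else
            echo "$name: NOT clustered"
        fi
    done
fi
```

Matching on the character position rather than on the letter alone keeps a contiguous-allocation VG such as "wz--c-" from being reported as clustered.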

