[Linux-cluster] GFS2 fatal: filesystem consistency error

Tue Jun 21 14:46:07 UTC 2011

Hi,

On Tue, 2011-06-21 at 09:57 -0400, Nicolas Ross wrote:
> 8-node cluster, Fibre Channel HBAs and disks accessed through a QLogic fabric.
> 
> I've been hit 3 times with this error on different nodes:
> 
> GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
> GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
> GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system
> GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
> GFS2: fsid=CyberCluster:GizServer.1: withdrawn
> Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T 2.6.32-131.2.1.el6.x86_64 #1
> Call Trace:
> [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
> [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
> [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
> [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
> [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
> [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
> [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
> [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
> [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
> [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
> [<ffffffff8118bf82>] ? iput+0x62/0x70
> [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
> [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
> [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
> [<ffffffff8108dd96>] ? kthread+0x96/0xa0
> [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
> [<ffffffff8108dd00>] ? kthread+0x0/0xa0
> [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
> no_formal_ino = 9582
> no_addr = 6698267
> i_disksize = 6838
> blocks = 0
> i_goal = 6698304
> i_diskflags = 0x00000000
> i_height = 1
> i_depth = 0
> i_entries = 0
> i_eattr = 0
> GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
> gdlm_unlock 5,66351b err=-22
> 
> 
> The same error each time, only with different inodes.
> 
> After that event, services running on that filesystem are marked failed and 
> not moved over to another node. Any access to that fs yields an I/O error. The 
> server needed to be rebooted to work properly again.
> 
> I ran an fsck last night on that filesystem, and it did find some errors, 
> but nothing serious. Lots (really lots) of those:
> 
> Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
> Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
> Metadata type is 0 (free)
> Fix bitmap for block 5771602 (0x581152) ? (y/n)
> 
> And after completing the fsck, I started some services back up, and I got the 
> same error on another filesystem that is practically empty and used for small 
> utilities used throughout the cluster...
> 
> What should I do to find the source of this problem?
> 

I suspect that this is a known problem, bz #712139, if you have access to
the Red Hat bugzilla. There is a fix available via our usual support
channels. Note that this particular bug is highly version-specific, so it
only applies to RHEL 6.1 and no other version (either RHEL or upstream).
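
If it helps to compare the three incidents side by side before opening a
support case, something along these lines could pull the key fields out of
each withdraw report. This is only a rough sketch: the function name
summarize_withdraw and the dictionary keys are made up for illustration, the
trailing number in the fsid is assumed to be the journal index, and the
regular expressions are based solely on the message layout quoted above.

```python
import re

# Matches "GFS2: fsid=<cluster>:<fs>.<n>: <message>" anywhere in a line, so
# syslog timestamps or "> " quote prefixes in front of it are tolerated.
PREFIX = re.compile(
    r"GFS2: fsid=(?P<cluster>[^:\s]+):(?P<fs>[^.\s]+)\.(?P<journal>\d+): (?P<msg>.*)"
)


def summarize_withdraw(log_text):
    """Collect the identifying fields of one GFS2 consistency-error report."""
    summary = {}
    for line in log_text.splitlines():
        m = PREFIX.search(line)
        if not m:
            continue
        summary.setdefault("cluster", m.group("cluster"))
        summary.setdefault("fs", m.group("fs"))
        # Assumption: the number after the dot is the journal index.
        summary.setdefault("journal", int(m.group("journal")))
        msg = m.group("msg").strip()
        inode = re.match(r"inode = (\d+) (\d+)", msg)
        if inode:
            # The two numbers correspond to no_formal_ino and no_addr
            # in the dump printed further down in the same report.
            summary["no_formal_ino"] = int(inode.group(1))
            summary["no_addr"] = int(inode.group(2))
        func = re.match(r"function = (\w+)", msg)
        if func:
            summary["function"] = func.group(1)
    return summary


SAMPLE = """\
> GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
> GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file =
"""

if __name__ == "__main__":
    print(summarize_withdraw(SAMPLE))
```

Running it over the report from each node should show whether the failing
function is always the same and only the inode differs.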

Steve.