[Linux-cluster] GFS2 fatal: filesystem consistency error

Tue Jun 21 14:42:40 UTC 2011

----- Original Message -----
| 8-node cluster, Fibre Channel HBAs and disks accessed through a QLogic
| fabric.
| 
| I've been hit 3 times with this error, on different nodes:
| 
| GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
| GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
| GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
| GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system
| GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
| GFS2: fsid=CyberCluster:GizServer.1: withdrawn
| Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T 2.6.32-131.2.1.el6.x86_64 #1
| Call Trace:
| [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
| [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
| [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
| [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
| [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
| [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
| [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
| [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
| [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
| [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
| [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
| [<ffffffff8118bf82>] ? iput+0x62/0x70
| [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
| [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
| [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
| [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
| [<ffffffff8108dd96>] ? kthread+0x96/0xa0
| [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
| [<ffffffff8108dd00>] ? kthread+0x0/0xa0
| [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
| no_formal_ino = 9582
| no_addr = 6698267
| i_disksize = 6838
| blocks = 0
| i_goal = 6698304
| i_diskflags = 0x00000000
| i_height = 1
| i_depth = 0
| i_entries = 0
| i_eattr = 0
| GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
| gdlm_unlock 5,66351b err=-22
| 
| 
| The inode was different each time.
| 
| After that event, services running on that filesystem are marked failed
| and are not moved to another node. Any access to that fs yields an I/O
| error. The server had to be rebooted to work properly again.
| 
| I ran an fsck last night on that filesystem, and it did find some errors,
| but nothing serious. Lots (really lots) of these:
| 
| Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
| Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
| Metadata type is 0 (free)
| Fix bitmap for block 5771602 (0x581152) ? (y/n)
| 
| After completing the fsck, I restarted some services, and I got the same
| error on another filesystem that is practically empty and used for small
| utilities used throughout the cluster...
| 
| What should I do to find the source of this problem?
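
When gathering material for a support case like this one, it can help to pull the key fields out of each withdraw report so the events from different nodes can be compared side by side. The following is only a minimal sketch, assuming the kernel messages have been saved as text in the format quoted above. The function name summarize_withdraws and the regular expressions are illustrative assumptions, not part of any GFS2 tool.

```python
import re

# Matches lines such as:
#   GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
INODE_RE = re.compile(
    r"GFS2: fsid=(?P<fsid>\S+): inode = (?P<formal>\d+) (?P<addr>\d+)"
)
# Matches the "function = <name>" part of the following report line.
FUNC_RE = re.compile(r"GFS2: fsid=\S+: function = (?P<func>\w+)")


def summarize_withdraws(log_text):
    """Return a list of (fsid, no_formal_ino, no_addr, function) tuples.

    Hypothetical helper: one tuple per "inode = ..." line found, paired
    with the first "function = ..." line that follows it.
    """
    events = []
    current = None
    for raw in log_text.splitlines():
        # Drop mail quote markers ("| ") so quoted and raw logs parse alike.
        line = re.sub(r"^\|\s?", "", raw)
        m = INODE_RE.search(line)
        if m:
            current = [m["fsid"], int(m["formal"]), int(m["addr"]), None]
            events.append(current)
            continue
        m = FUNC_RE.search(line)
        if m and current is not None and current[3] is None:
            current[3] = m["func"]
    return [tuple(e) for e in events]


if __name__ == "__main__":
    sample = (
        "| GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267\n"
        "| GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc,\n"
    )
    for event in summarize_withdraws(sample):
        print(event)
```

Running it over the saved logs from each affected node gives one line per withdraw, which makes it easy to confirm that the inode differs each time while the reporting function stays the same.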

Hi,

I believe this is a GFS2 bug we've already solved.
Please contact Red Hat Support.

Regards,

Bob Peterson
Red Hat File Systems