[Linux-cluster] GFS2 fatal: invalid metadata block

Steven Whitehouse swhiteho at redhat.com
Mon Sep 21 12:53:48 UTC 2009


Hi,

On Sat, 2009-09-19 at 05:16 -0600, Kai Meyer wrote:
> I have a 5 node cluster running kernel 2.6.18-128.1.6.el5xen and 
> gfs2-utils-0.1.53-1.el5_3.3. Twice in 10 days, each node in my cluster 
> has failed with the same message in /var/log/messages. dmesg reports the 
> same errors, and on some nodes there are no other entries before the 
> invalid metadata block error.
> 
> I would like to know what issues can trigger such an event. If it would 
> help for me to provide more information, I will be happy to. I'm just 
> not sure what other information you would consider relevant.
> 
> Thank you for your time,
> -Kai Meyer
> 
It means that the kernel was looking for an indirect block, but instead
found something that was not an indirect block. The only way to fix this
is to run fsck after unmounting the filesystem on all nodes. Otherwise
the issue is likely to recur each time you access the affected inode.
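
As a rough sketch of what that check amounts to (this is not the kernel
code): every GFS2 metadata block begins with a header holding a magic
number and a block type, and an indirect block must carry both the GFS2
magic and the indirect type. The constants below are quoted from memory
of the kernel's gfs2_ondisk.h and should be treated as assumptions;
check_indirect_block is a hypothetical helper name.

```python
import struct

# Assumed values, recalled from gfs2_ondisk.h; verify against your kernel source.
GFS2_MAGIC = 0x01161970
GFS2_METATYPE_IN = 5  # metadata type used for indirect blocks


def check_indirect_block(block: bytes) -> str:
    """Classify a raw block by the first two header fields (big-endian)."""
    if len(block) < 8:
        return "short block"
    magic, mh_type = struct.unpack(">II", block[:8])
    if magic != GFS2_MAGIC:
        # Not GFS2 metadata at all
        return "magic number"
    if mh_type != GFS2_METATYPE_IN:
        # GFS2 metadata, but some other kind of block
        return "wrong type"
    return "ok"
```

A block that fails the first comparison would be the "magic number"
case, which appears to be what the "bh = ... (magic number)" line in the
log quoted below is reporting.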

There have been a couple of reports of this (or a very similar) issue
recently. The problem in each case is that the original issue probably
happened some time before it triggered the message you have seen, which
makes it very tricky to figure out exactly what the cause is.

I'd be very interested to know whether this filesystem was newly
created as gfs2 or upgraded from gfs1. I'd also like to know whether
there have been any other issues, however minor, which might have caused
a node to be rebooted or fenced since the filesystem was created. Any
other background information about the type of workload being run on
the filesystem would be helpful too.

Steve.


> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: fatal: invalid metadata block
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1:   bh = 567447963 (magic number)
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: about to withdraw this file system
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: telling LM to withdraw
> Sep 19 02:02:07 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: withdrawn
> Sep 19 02:02:07 192.168.100.104 kernel:
> Sep 19 02:02:07 192.168.100.104 kernel: Call Trace:
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885154ce>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80262907>] __wait_on_bit+0x60/0x6e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80215788>] sync_buffer+0x0/0x3f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80262981>] out_of_line_wait_on_bit+0x6c/0x78
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8029a01a>] wake_bit_function+0x0/0x23
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8021a7f1>] submit_bh+0x10a/0x111
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885284a7>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff88518d30>] :gfs2:gfs2_meta_indirect_buffer+0x104/0x160
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff88509fc3>] :gfs2:gfs2_block_map+0x1dc/0x33e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8021a821>] poll_freewait+0x29/0x6a
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8850a199>] :gfs2:gfs2_extent_map+0x74/0xac
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8850a2ce>] :gfs2:gfs2_write_alloc_required+0xfd/0x122
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885128d5>] :gfs2:gfs2_glock_nq+0x248/0x273
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851a27c>] :gfs2:gfs2_write_begin+0x99/0x36a
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851bd1b>] :gfs2:gfs2_file_buffered_write+0x14b/0x2e5
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8020d3a5>] file_read_actor+0x0/0xfc
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c151>] :gfs2:__gfs2_file_aio_write_nolock+0x29c/0x2d4
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c2f4>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8022eca0>] __wake_up+0x38/0x4f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8022fbe4>] pipe_readv+0x38e/0x3a2
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80263bce>] lock_kernel+0x1b/0x32
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c444>] :gfs2:gfs2_file_write+0x49/0xa7
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80216da9>] vfs_write+0xce/0x174
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff802175e1>] sys_write+0x45/0x6e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8025f2f9>] tracesys+0xab/0xb6
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
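
When the same failure shows up on several nodes, it can help to pull the
block number and reporting function out of each node's syslog and
compare them. The following is only a minimal sketch: it assumes
unwrapped syslog lines in the format quoted above, and
parse_withdraw_events is a hypothetical helper, not part of gfs2-utils.

```python
import re

# Matches "GFS2: fsid=<cluster>:<fs>.<n>: <rest of message>"
PREFIX_RE = re.compile(r"GFS2: fsid=(?P<fsid>\S+):\s+(?P<rest>.*)$")


def parse_withdraw_events(lines):
    """Collect one dict per 'fatal:' message, with its key = value details."""
    events = []
    current = None
    for line in lines:
        match = PREFIX_RE.search(line)
        if not match:
            continue  # call trace and unrelated kernel lines
        rest = match.group("rest").strip()
        if rest.startswith("fatal:"):
            current = {
                "fsid": match.group("fsid"),
                "reason": rest[len("fatal:"):].strip(),
            }
            events.append(current)
        elif current is not None and " = " in rest:
            # e.g. "function = ..., file = ..., line = 334"
            for part in rest.split(", "):
                key, _, value = part.partition(" = ")
                current[key.strip()] = value.strip()
    return events
```

Running this over /var/log/messages from each node and comparing the
"bh" values would show whether every node tripped over the same block.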



