[Linux-cluster] Bug inquiry (#831330)

Wed Nov 14 07:09:38 UTC 2012

Hi, Steven. 
Thank you for the reply.

Below is the portion of the syslog where the problem appears. I hope it is of some help.
The kernel version is 2.6.18-308.11.1.el5PAE.

Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: fatal: invalid metadata block 
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2:   bh = 151918444 (magic number) 
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2:   function = get_leaf, file = fs/gfs2/dir.c, line = 763 
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: about to withdraw this file system 
Nov 12 15:50:16 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: telling LM to withdraw 
Nov 12 15:50:17 blahblah6 kernel: GFS2: fsid=blahblah:data023.2: withdrawn 
Nov 12 15:50:17 blahblah6 kernel:  [<f95f76f6>] gfs2_lm_withdraw+0x8d/0xb0 [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f960a98e>] gfs2_meta_check_ii+0x28/0x33 [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95ed682>] get_leaf+0x5e/0x9d [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95edccb>] get_first_leaf+0x24/0x2a [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95edd52>] gfs2_dirent_search+0x81/0x180 [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95ee07f>] gfs2_dirent_find+0x0/0x4c [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95f4344>] run_queue+0xbd/0x18a [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95ef448>] gfs2_dir_search+0x1d/0x7f [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<c04833e2>] permission+0xa2/0xb5 
Nov 12 15:50:17 blahblah6 kernel:  [<f95f5aa0>] gfs2_lookupi+0x116/0x14f [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95f5a5a>] gfs2_lookupi+0xd0/0x14f [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f9602136>] gfs2_lookup+0x1b/0x8e [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<f95f3b6c>] gfs2_glock_put+0xcf/0xe7 [gfs2] 
Nov 12 15:50:17 blahblah6 kernel:  [<c048c807>] d_alloc+0x151/0x17f 
Nov 12 15:50:17 blahblah6 kernel:  [<c04831c8>] do_lookup+0x102/0x1b6 
Nov 12 15:50:17 blahblah6 kernel:  [<c0484b45>] __link_path_walk+0x318/0xd1d 
Nov 12 15:50:17 blahblah6 kernel:  [<c0485584>] link_path_walk+0x3a/0x99 
Nov 12 15:50:17 blahblah6 kernel:  [<c0485961>] do_path_lookup+0x231/0x297 
Nov 12 15:50:17 blahblah6 kernel:  [<c04860bb>] __user_walk_fd+0x29/0x3a 
Nov 12 15:50:17 blahblah6 kernel:  [<c047f4b9>] vfs_stat_fd+0x15/0x3c 
Nov 12 15:50:17 blahblah6 kernel:  [<c047f525>] sys_stat64+0xf/0x23 
Nov 12 15:50:17 blahblah6 kernel:  [<c06258e8>] do_page_fault+0x356/0x653 
Nov 12 15:50:17 blahblah6 kernel:  [<c047795f>] __fput+0x15c/0x184 
Nov 12 15:50:17 blahblah6 kernel:  [<c0625592>] do_page_fault+0x0/0x653 
Nov 12 15:50:17 blahblah6 kernel:  [<c0404ee1>] sysenter_past_esp+0x56/0x79

We have 5 servers accessing a shared GFS2 file system that consists of 24 virtual disks on top of multiple HDDs. Once this problem occurs on a virtual disk, we can no longer write to it, although the other virtual disks keep working without any problem. Running fsck seems to fix the affected virtual disk temporarily, but after a while it breaks again. Is there any way to fix this problem, or at least to reduce how often it happens (it is happening almost every day on our system), without having to install an older kernel version?

Best regards,

> Hi,
> 
> On Mon, 2012-11-12 at 15:24 +0900, Antonio Castellano wrote:
> > Hi,
> > 
> > I'd like to know the status of bug 831330 and the schedule for its fix. Our system is hitting it, and I don't have enough permissions to access its Bugzilla page. It is urgent.
> > 
> > This is the link related to the text reported in our log:
> > https://access.redhat.com/knowledge/ja/node/141203
> > 
> > And this is the bugzilla link:
> > https://bugzilla.redhat.com/show_bug.cgi?id=831330
> > 
> > Is there anybody out there who can help me? Any help will be greatly appreciated.
> > 
> > Thank you very much!
> > 
> Assuming that you are a Red Hat customer, please open a ticket. The bug
> mostly contains customers' private data, so I don't think opening
> this one up would help much, as there would be little that we could
> share.
> 
> This is, though, our highest priority bug at the moment (when I say our,
> I mean the GFS2 team). There is a simple workaround (just use a slightly
> older kernel), which is one reason why we've had trouble tracing this:
> people are (understandably) using that rather than running the
> kernel we've built to debug this issue.
> 
> We've been unable to reproduce this internally, despite trying many
> different workloads. If you are in a position to help us debug the
> issue, then any assistance would be very gratefully received.
> 
> Steve.

--
Antonio Castellano [DEV at SDD.jp]
   Seventh Dimension Design, Inc.
   http://www.SDD.jp
   VOICE: +81-78-252-8855, FAX: +81-78-252-8856