[Linux-cluster] 'ls' makes GFS2 to withdraw

Mon Mar 16 15:24:15 UTC 2009

Hi,

Please do not use GFS2 on CentOS 5.2; the GFS2 version it ships is
rather old. Did you try running fsck.gfs2?

The results you see suggest that the readdir() call worked, but that
the stat() call on the affected directory entries failed. I'd suggest
using Fedora, at least until CentOS 5.3 is available,
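
To check how many entries are affected, something along these lines
might help. It is only a rough sketch, not what ls itself does
internally: it lists a directory and then tries lstat() on each name,
mirroring the readdir()/stat() split described above. The function
name split_by_lstat and its stat_fn parameter are made up for this
illustration.

```python
import os


def split_by_lstat(directory, stat_fn=os.lstat):
    """Split the names in `directory` by whether the stat call works.

    Returns (ok, failed): `ok` is a sorted list of names whose stat
    call succeeded, `failed` is a sorted list of (name, error text)
    pairs for names that could be listed but not stat'ed.
    `stat_fn` is a hypothetical hook so the stat step can be swapped
    out; by default it is os.lstat.
    """
    ok, failed = [], []
    # Listing step: only the names are read from the directory.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # Per-entry step: this needs the entry's own inode.
            stat_fn(path)
        except OSError as exc:
            failed.append((name, exc.strerror))
        else:
            ok.append(name)
    return sorted(ok), sorted(failed)
```

Calling split_by_lstat() on the problem directory returns the two
lists, so len(failed) against the total gives the affected fraction.
Entries in the second list are the ones ls prints with question marks
in place of the mode, owner and size.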

Steve.

On Mon, 2009-03-16 at 17:23 +0200, Theophanis Kontogiannis wrote:
> Hello all,
>
> I am running CentOS 5.2, kernel 2.6.18-92.1.22.el5.centos.plus,
> gfs2-utils-0.1.44-1.el5_2.1
>
> The cluster has two nodes, using DRBD 8.3.2 as the shared block device,
> with CLVM on top of it and GFS2 on top of that.
>
> After an ls in a directory within the GFS2 file system I got the
> following errors.
>
> …………………
> GFS2: fsid=tweety:gfs2-00.0: fatal: invalid metadata block
> GFS2: fsid=tweety:gfs2-00.0:   bh = 522538 (magic number)
> GFS2: fsid=tweety:gfs2-00.0:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 332
> GFS2: fsid=tweety:gfs2-00.0: about to withdraw this file system
> GFS2: fsid=tweety:gfs2-00.0: telling LM to withdraw
> GFS2: fsid=tweety:gfs2-00.0: withdrawn
>
> Call Trace:
>  [<ffffffff885c2146>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
>  [<ffffffff800639de>] __wait_on_bit+0x60/0x6e
>  [<ffffffff80014f46>] sync_buffer+0x0/0x3f
>  [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
>  [<ffffffff8009d0ca>] wake_bit_function+0x0/0x23
>  [<ffffffff885d3f7f>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
>  [<ffffffff885c5a06>] :gfs2:gfs2_meta_indirect_buffer+0x104/0x15e
>  [<ffffffff885c095a>] :gfs2:gfs2_inode_refresh+0x22/0x2ca
>  [<ffffffff8009d0ca>] wake_bit_function+0x0/0x23
>  [<ffffffff885bfd9c>] :gfs2:inode_go_lock+0x29/0x57
>  [<ffffffff885bef04>] :gfs2:glock_wait_internal+0x1d4/0x23f
>  [<ffffffff885bf11d>] :gfs2:gfs2_glock_nq+0x1ae/0x1d4
>  [<ffffffff885cb053>] :gfs2:gfs2_lookup+0x58/0xa7
>  [<ffffffff885cb04b>] :gfs2:gfs2_lookup+0x50/0xa7
>  [<ffffffff800226dd>] d_alloc+0x174/0x1a9
>  [<ffffffff8000cbff>] do_lookup+0xe5/0x1e6
>  [<ffffffff80009fac>] __link_path_walk+0xa01/0xf42
>  [<ffffffff800c4fe7>] zone_statistics+0x3e/0x6d
>  [<ffffffff8000e7cd>] link_path_walk+0x5c/0xe5
>  [<ffffffff885bdd6f>] :gfs2:gfs2_glock_put+0x26/0x133
>  [<ffffffff8000c99e>] do_path_lookup+0x270/0x2e8
>  [<ffffffff80012336>] getname+0x15b/0x1c1
>  [<ffffffff80023741>] __user_walk_fd+0x37/0x4c
>  [<ffffffff8003ed91>] vfs_lstat_fd+0x18/0x47
>  [<ffffffff8002a9d3>] sys_newlstat+0x19/0x31
>  [<ffffffff8005d229>] tracesys+0x71/0xe0
>  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> …………………….
>
> Obviously ls was not the cause of the problem, but it triggered the
> events.
>
> From the other node I can still access the directory on which the
> ‘ls’ triggered the above. The directory is full of entries like
> this:
>
> ?--------- ? ?     ?          ?            ? sched_reply
>
> Almost 50% of the files are shown like that by ls.
>
> The questions are:
>
> 1. Is this a (new) GFS2 bug?
> 2. Is this a recoverable problem (and if so, how)?
> 3. After a GFS2 file system has been withdrawn, how do we make the
>    node use it again without rebooting?
>
> Thank you all for your time.
>
> Theophanis Kontogiannis