[Linux-cluster] Failed gfs_grow causing corrupt volume

Ben Yarwood ben.yarwood at juno.co.uk
Fri Jan 25 12:08:07 UTC 2008


While trying to grow a 15TB file system to 20TB this morning on RHEL 4.4, I got an error and the grow failed.  The file system
will still mount, but when accessed it gives the following error and withdraws:

Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: fatal: invalid metadata block
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0:   bh = 465407847 (type: exp=4, found=3)
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0:   function = gfs_get_meta_buffer
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/dio.c, line = 1223
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0:   time = 1201260769
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: about to withdraw from the cluster
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: waiting for outstanding I/O
Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: telling LM to withdraw
Jan 25 11:32:50 jrmedia-c kernel: lock_dlm: withdraw abandoned memory
Jan 25 11:32:50 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: withdrawn


A gfs_fsck doesn't work either:

Initializing fsck
Initializing lists...
Initializing special inodes...
Validating Resource Group index.
Level 1 check.
20148 resource groups found.
Block #36354331 (0x22ab91b) (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.
Resource group or index is corrupted.
Unable to read in rgrp descriptor.
(failed--trying again at level 2)
Level 2 check.
The middle RG is not on an even boundary (fs has grown?)
(failed--trying again at level 3)
Level 3 check.
RG 1000 is damaged: recomputing RG dist from index: 0x10085
Section 1: 0x11 - 0x3e57fff
  RG 1 at block 0x11 intact [length 0x2881]
  RG 2 at block 0x2892 intact [length 0x2881]
* RG 3 at block 0x5113 *** DAMAGED *** [length 0x2881]
* RG 4 at block 0x7994 *** DAMAGED *** [length 0x2881]
* RG 5 at block 0xA215 *** DAMAGED *** [length 0x2881]
* RG 6 at block 0xCA96 *** DAMAGED *** [length 0x2881]
Error: too many bad RGs.
(failed--giving up)
Unable to fill in resource group information.
Freeing buffers.
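
As a sanity check on the output above, the RG start blocks that gfs_fsck lists are evenly spaced by the reported length (0x2881). The following is only a minimal Python sketch of that arithmetic, not part of gfs_fsck; the names FIRST_RG_BLOCK, RG_LENGTH and expected_start are made up for illustration, and the uniform-spacing assumption is only checked against the six RGs shown.

```python
# Values copied from the gfs_fsck output above.
FIRST_RG_BLOCK = 0x11
RG_LENGTH = 0x2881
listed_starts = [0x11, 0x2892, 0x5113, 0x7994, 0xA215, 0xCA96]


def expected_start(rg_number):
    """Start block of a 1-based RG, ASSUMING uniform spacing of RG_LENGTH."""
    return FIRST_RG_BLOCK + (rg_number - 1) * RG_LENGTH


# Collect any listed RG whose start block does not match the assumed spacing.
mismatches = [
    (n, hex(start))
    for n, start in enumerate(listed_starts, start=1)
    if start != expected_start(n)
]
print("mismatches:", mismatches)

# The block gfs_fsck complained about at level 1, in both notations.
print(int("22ab91b", 16))
```

An empty mismatch list only means the addresses fsck printed are internally consistent; it says nothing about whether the on-disk RG headers at those addresses are intact.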


Does anyone know if this file system can be fixed?  It'll take a long time to restore from the backup.


Cheers
Ben






