[Linux-cluster] failure during gfs2_grow caused node crash & data loss

Bob Peterson rpeterso at redhat.com
Mon Mar 22 13:52:21 UTC 2010


----- bergman at merctech.com wrote:
| I just had a serious problem with gfs2_grow which caused a loss of
| data and a cluster node reboot.
| 
| I was attempting to grow a gfs2 volume from 50GB => 145GB. The volume
| was mounted on both cluster nodes at the start of running "gfs2_grow".
| When I umounted the volume from _one_ node (not where gfs2_grow was
| running), the machine running gfs2_grow rebooted and the filesystem
| is damaged.
| 
| The sequence of commands was as follows. Each command was successful
| until the "umount".
(snip)
| Mark

Hi Mark,

There's a good chance this was caused by Bugzilla bug #546683, the fix
for which is scheduled for release in 5.5.  However, I've also seen
problems like this when a logical volume in LVM isn't marked as
clustered.  Make sure it is with the "vgs" command (check whether the
attribute flags end with a "c"), and if not, run vgchange -cy <volgrp>.
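For example, the check and the fix would look roughly like this (the
volume group name "myvg" is just a placeholder; substitute your own):

  # vgs -o vg_name,vg_attr
    VG     Attr
    myvg   wz--n-        <- no trailing "c": not marked clustered
  # vgchange -cy myvg
  # vgs -o vg_name,vg_attr
    VG     Attr
    myvg   wz--nc        <- trailing "c": clustered flag is now set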

As for fsck.gfs2, it should never segfault.  IMHO, this is a bug,
so please open a Bugzilla record: product "Red Hat Enterprise Linux 5",
component "gfs2-utils".  Assign it to me.

As for recovering your volume, you can try the following, but it's not
guaranteed to work (a rough command sketch follows the list of steps):
(1) Reduce the volume to its size from before the gfs2_grow.
(2) Mount it from one node only, if you can (it may crash).
(3) If it lets you mount it, run gfs2_grow again.
(4) Unmount the volume.
(5) Mount the volume from both nodes.
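As a rough sketch only, those steps might look like this on the command
line.  The volume group and logical volume names ("myvg" and "gfs2lv"),
the 50GB pre-grow size, and the mount point are assumptions here;
substitute your own values, and double-check the size before running
lvreduce, because reducing below the filesystem's pre-grow size will
destroy data:

  # lvreduce -L 50G /dev/myvg/gfs2lv          (step 1: back to the pre-grow size)
  # mount -t gfs2 /dev/myvg/gfs2lv /mnt/gfs2  (step 2: one node only)
  # gfs2_grow /mnt/gfs2                       (step 3: grow it again)
  # umount /mnt/gfs2                          (step 4)
  then mount the volume from both nodes       (step 5)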

If that doesn't work, or if the system can't properly mount the volume,
your choices are to either (1) reformat the volume and restore from
backup, or (2) use gfs2_edit to patch the i_size field of the rindex
file to a fairly small multiple of 96 (each rindex entry is 96 bytes
on disk), then repeat steps 1 through 4.
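If you go the gfs2_edit route, you can at least look at the rindex
first.  For example (the device name is again just a placeholder),
savemeta keeps a copy of the metadata in case further analysis is
needed, and "-p rindex" prints the rindex system file, including its
size:

  # gfs2_edit savemeta /dev/myvg/gfs2lv /tmp/gfs2lv.meta
  # gfs2_edit -p rindex /dev/myvg/gfs2lv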

Regards,

Bob Peterson
Red Hat File Systems
