[Linux-cluster] failure during gfs2_grow caused node crash & data loss

Mon Mar 22 17:18:31 UTC 2010

----- bergman at merctech.com wrote:
(snip)
| Do you mean "di_size"?

Yes.

| => to be a fairly small multiple of 96 then repeat steps 1 through 4.
| 
| According to "gfs2_edit -p rindex", the initial value of di_size is:
| 
| 	di_size               8192                0x2000
| 
| Does that give any indication of an appropriate "fairly small
| multiple"?
| 
| Thanks,
| 
| Mark

Hi Mark,

The big question is: Was this file system created with mkfs.gfs2
originally?  Or was it created with gfs_mkfs (gfs1) and converted to
gfs2 by gfs2_convert?  If it was created by gfs_mkfs and converted
then there's not much hope of recovering because fsck.gfs2 isn't
currently smart enough to handle oddly-spaced rgrps left behind by gfs1.

Here's the problem: fsck.gfs2 seems to report only six intact rgrps,
each around 1GB.  Since the file system was originally much bigger,
I'd expect there to be more.  Each rindex entry is 96 bytes, so six
entries take 6*96 = 576 bytes, or 0x240 in hex.  So you could try
setting di_size to 0x240 with gfs2_edit, then mount the file system and
run gfs2_grow.  After that, unmount and run fsck.gfs2.
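
If you want to double-check the arithmetic, a few lines of Python will
do it.  This is only a sketch: the function names are made up for
illustration, and the only inputs are the 96-byte rindex entry size and
the six intact rgrps mentioned above, plus the 8192 you quoted from
"gfs2_edit -p rindex".

```python
# Size in bytes of one rindex entry, as stated above.
RINDEX_ENTRY_SIZE = 96


def rindex_di_size(intact_rgrps):
    """Hypothetical helper: di_size covering exactly this many rindex entries."""
    return intact_rgrps * RINDEX_ENTRY_SIZE


def whole_entries(di_size):
    """Hypothetical helper: (whole entries, leftover bytes) for a given di_size."""
    return divmod(di_size, RINDEX_ENTRY_SIZE)


size = rindex_di_size(6)
print(size, hex(size))      # 576 0x240
print(whole_entries(8192))  # (85, 32)
```

Note that the current di_size of 8192 is not a whole multiple of 96
(85 entries with 32 bytes left over), whereas 0x240 covers exactly six.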

As I said, if the file system was originally gfs1, this won't work.

If the file system was gfs2 from the start, hopefully gfs2_grow
will rewrite the rgrps starting with the first damaged one, and
then fsck.gfs2 will take care of finding out what is allocated and not
allocated and fix the bitmaps.  If the file system was full before
gfs2_grow, you could lose a lot of data.  It's a long shot, really,
but I guess you've got nothing more to lose.

Regards,

Bob Peterson
Red Hat File Systems