[Linux-cluster] Failed gfs_grow causing corrupt volume

Fri Jan 25 15:59:54 UTC 2008

On Fri, 2008-01-25 at 14:56 +0000, Ben Yarwood wrote:
> I will try and find more information on the errors in the logs but I think the problem was that I was using a 32bit system and tried
> to expand over 16TB.  I didn't realize this was the size limit until I read the FAQ afterwards.  Is this the root cause of the
> problem?
> 
> I now have the file system mounted again and am copying what I can off it by moving the files by name.  So far we have copied over
> 250GB of files and not a single file has failed to copy or caused the file system to withdraw.  It is fortunate that we knew the
> name of every file on the file system.  Not really understanding the structure of the file system myself, do you think it's possible
> we will recover all the files using this method?
> 
> 
> Thanks
> Ben

Hi Ben,

Yes, using a 32-bit system to expand beyond 16TB might cause that
problem.  I added some smarts to gfs_fsck in late 2006 so that it
refuses to run on a 32-bit system if your file system is larger than
16TB.  Perhaps gfs_grow needs more smarts regarding the file system
size as well.  That seems like an easy fix that's well worth making.
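
To make the idea concrete, here is a rough sketch of the kind of size
check described above.  It is not the gfs_fsck or gfs_grow code; the
function names are made up for illustration, the limit is taken to be
16 TiB, and the word size is that of the Python interpreter, which is
only a stand-in for the real platform check.

```python
import struct

# Limit mentioned in this thread for 32-bit systems.
# Assumption: "16TB" is taken to mean 16 TiB (16 * 2**40 bytes).
LIMIT_32BIT_BYTES = 16 * 2**40


def host_word_bits():
    """Pointer width of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def grow_is_safe(new_size_bytes, word_bits=None):
    """Return False when a 32-bit host would grow past the limit.

    Hypothetical helper: a 64-bit host is always allowed, a 32-bit
    host is allowed only up to LIMIT_32BIT_BYTES.
    """
    if word_bits is None:
        word_bits = host_word_bits()
    return word_bits >= 64 or new_size_bytes <= LIMIT_32BIT_BYTES
```

A tool would call something like grow_is_safe() with the proposed new
file system size before touching the disk, and refuse to continue when
it returns False.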

I suspect you can get most (or all) of your data back by copying it
off like you're doing.  That's because even if the new resource group
(RG) information is corrupt, you have probably not allocated any new
data into those RGs, so the copy will hopefully not encounter the
corruption.  That's just a guess, since I don't know what those RGs
really look like.
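
If it helps, copying by name from a known list of files could be
scripted along these lines.  This is only a sketch: copy_by_name and
its arguments are hypothetical, and it assumes you have the relative
path of every file.  It records each failure and moves on, so one bad
file does not stop the rest of the copy.

```python
import os
import shutil


def copy_by_name(names, src_root, dst_root):
    """Copy each named file from src_root to dst_root.

    names holds paths relative to src_root.  Returns two lists:
    the names that copied, and (name, error text) pairs that failed.
    """
    copied, failed = [], []
    for name in names:
        src = os.path.join(src_root, name)
        dst = os.path.join(dst_root, name)
        try:
            # Recreate the directory layout on the destination side.
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            shutil.copy2(src, dst)  # copies data and metadata
            copied.append(name)
        except OSError as exc:
            # Note the failure and continue with the next file.
            failed.append((name, str(exc)))
    return copied, failed
```

Keeping the failed list means you can retry or investigate just those
files later instead of rescanning the whole file system.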

Regards,

Bob Peterson
Red Hat GFS