[Linux-cluster] Unformatting a GFS cluster disk

Bob Peterson rpeterso at redhat.com
Wed Mar 26 14:22:14 UTC 2008


On Wed, 2008-03-26 at 10:05 +0000, DRand at amnesty.org wrote:
> 
> Hi Bob, 
> 
> Great, thanks for your info! The disk was previously a GFS disk and we
> reformatted it with exactly the same mkfs command both times. Here are
> more details. We are running the cluster on a Netapp SAN device. 
> 
> 1) mkfs.gfs -J 1024 -j 4 -p lock_gulm -t aicluster:cmsgfs /dev/sda
> [100Gb device] 
> 2) Copy lots of files to the disk 
> 3) gfs_grow /san   [Extra 50Gb extension added to device] 
> 4) Copy lots of files to the disk 
> 5) mkfs.gfs -J 1024 -j 4 -p lock_gulm -t aicluster:cmsgfs /dev/sda 

This is likely to be a problem.  The first mkfs would have calculated
where to put the resource groups based on the original size of the
device.  The gfs_grow would have added new resource groups, but it
would calculate them based on the size of the extension.  It uses
a somewhat different set of calculations than mkfs, so the first
"new" resource group added by gfs_grow will normally be in a location
different from where mkfs would have put it, had the device been the
new size to begin with.
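
To picture it, here's a toy illustration.  This is NOT the real
mkfs.gfs or gfs_grow arithmetic, and the RG sizes below are made-up
round numbers; it only shows how a one-pass layout and a grown layout
end up putting their first "new" RG at different offsets:

#include <stdio.h>
#include <stdint.h>

#define GB (1024ULL * 1024 * 1024)

int main(void)
{
    /* made-up RG sizes; the real tools compute them from the device */
    uint64_t rg_mkfs = 3 * GB;     /* what a one-pass mkfs might use */
    uint64_t old_end = 100 * GB;   /* original device size           */

    /* one-pass mkfs over 150GB: RGs at 0, 3GB, 6GB, ... so the first
       one past the old 100GB mark lands at the next multiple of 3GB */
    uint64_t mkfs_first = ((old_end + rg_mkfs - 1) / rg_mkfs) * rg_mkfs;

    /* gfs_grow: new RGs start right at the old end of the device     */
    uint64_t grow_first = old_end;

    printf("first RG past 100GB, one-pass mkfs: %lluGB\n",
           (unsigned long long)(mkfs_first / GB));
    printf("first RG past 100GB, gfs_grow:      %lluGB\n",
           (unsigned long long)(grow_first / GB));
    return 0;
}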

What that means is that the second mkfs would have recalculated new
locations for the new RGs based on the bigger device size and put
them in different locations.  That would likely have overwritten
some data from the old file system.

If you were using lvm2 for your device, you may have been able to
recover some data (with possible corruption interspersed from the second
mkfs's RGs as stated above) by doing something like an lvremove of the
bigger device, lvcreate of the 100Gb device, mkfs, lvresize +50G, then
gfs_grow to place the RGs in their former locations.  With a device
name of /dev/sda, it sounds like you did not use lvm2, so I don't know
if it's possible to re-shrink the gfs partition to its original size
and re-grow it again.
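
For the archives, the lvm2 sequence I had in mind would have looked
roughly like this (the volume group and LV names are made up, and it
only helps if lvm2 hands you back the same physical extents):

lvremove /dev/your_vg/your_lv
lvcreate -L 100G -n your_lv your_vg
mkfs.gfs -J 1024 -j 4 -p lock_gulm -t aicluster:cmsgfs /dev/your_vg/your_lv
mount -t gfs /dev/your_vg/your_lv /san
lvresize -L +50G /dev/your_vg/your_lv
gfs_grow /san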

> I have now read about resource groups and the GFS on-disk structure
> here: 
> 
> http://www.redhat.com/archives/cluster-devel/2006-August/msg00324.html 
> 
> A couple more questions if you don't mind... 
> 
> What exactly would the mkfs command have done? Would the mkfs command
> have overwritten the resource group headers from the previous disk
> structure? Or does it just wipe the superblock and journals? 

The mkfs command does this:
1. Rewrite the superblock, which is no great loss.
2. Rewrite the internal "journal index" file, which is not very
   destructive.
3. Rewrite the resource group index file (if the device has grown,
   the file might be slightly larger, so it may overwrite a few blocks).
4. Rewrite the root directory.  If gfs_fsck were then tricked into
   thinking there was valid data on all blocks, it would toss all
   the files it finds into lost+found.  So the root directory
   information would be lost.
5. Rewrite the quota file, which is no great loss.
6. Rewrite the license file, which is no great loss.  The license file
   hasn't been used since Red Hat open-sourced the gfs code, but lately
   it's been re-used for the "fast statfs" feature.
7. Initialize the Resource Groups and the bitmaps that follow them.
   Again, if the file system had grown, these new RGs and bitmaps
   would be located in different places and overwrite blocks of data.
8. Initialize the journals to an empty state (again, if the device
   had grown, I guarantee it would put the new journals in a different
   place, thereby overwriting data from the first file system).

> If the resource group headers still exist shouldn't they have a
> characteristic structure we could identify enabling us to put 0xFF in
> only the correct places on disk? 

The gfs2_edit tool will show the structure header in one color and
the data in another.  It will say something like "(rsrc grp hdr)"
at the top of the screen for the RGs, and something like
"(rsrc grp bitblk)" for the bitmap blocks.  All the data that follows
the headers for both structures are considered bits in the bitmap.
The bitmaps are in the blocks immediately following the RGs.
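
For example (this is from memory, so double-check the options against
your version of gfs2_edit):

gfs2_edit -p sb /dev/sda        (print the superblock)
gfs2_edit -p rindex /dev/sda    (print the resource group index)
gfs2_edit /dev/sda              (interactive viewer; page through blocks)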

> Also, is there any way we can usefully depend on this information?  Or
> would mkfs have wiped these special inodes too? 

The internal files, or special inodes, would have been rewritten by
the mkfs.

> In particular there is one 11Gb complete backup tar.gz on the disk
> somewhere. I'm wondering if we could write some custom utility that
> recognizes the gfs on-disk structure and extracts very large files
> from it? 

This sounds like a fun project.  I would love it if I had time for
such a thing. If you only wanted to try to recover that one file, this
would likely be much safer than using gfs_mkfs, gfs_grow and gfs_fsck.

You could start with the source to gfs2_edit and make it do a
block-by-block search for gfs disk inodes, or gfs_dinode structures in
each block.  For every dinode found, you could read it in using the
function gfs_dinode_in, then check to see if the di_size (file size) is
close to what you want.
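
Off the top of my head, the scan loop might look something like the
sketch below.  I've done it standalone with raw offsets just to show
the idea; the metatype constant and the di_size offset are from
memory, so check them against gfs_ondisk.h, and the real tool should
go through gfs_dinode_in and the rest of libgfs2 instead:

#define _FILE_OFFSET_BITS 64   /* the device is bigger than 2GB */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>         /* ntohl(); GFS on-disk data is big endian */

#define BLOCK_SIZE       4096  /* default GFS block size; check your sb   */
#define GFS_MAGIC        0x01161970
#define GFS_METATYPE_DI  4     /* dinode; verify against gfs_ondisk.h     */
#define DI_SIZE_OFFSET   56    /* assumed offset of di_size in
                                  struct gfs_dinode; verify               */

static uint64_t be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <device> <approx file size in bytes>\n",
                argv[0]);
        return 1;
    }
    uint64_t want = strtoull(argv[2], NULL, 0);
    FILE *dev = fopen(argv[1], "rb");
    if (!dev) { perror(argv[1]); return 1; }

    unsigned char buf[BLOCK_SIZE];
    uint64_t blk = 0;
    while (fread(buf, BLOCK_SIZE, 1, dev) == 1) {
        uint32_t magic, type;
        memcpy(&magic, buf, 4);
        memcpy(&type, buf + 4, 4);
        if (ntohl(magic) == GFS_MAGIC && ntohl(type) == GFS_METATYPE_DI) {
            uint64_t di_size = be64(buf + DI_SIZE_OFFSET);
            /* report dinodes whose size is within 1% of the target */
            if (di_size > want - want / 100 && di_size < want + want / 100)
                printf("candidate dinode at block %llu, di_size %llu\n",
                       (unsigned long long)blk, (unsigned long long)di_size);
        }
        blk++;
    }
    fclose(dev);
    return 0;
}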

If it finds the right file, you would have it mark that block in the
bitmap as being "used metadata".  Again, there are functions in
libgfs2 to accomplish that.  I'm not sure offhand if gfs_fsck would
re-link the file into lost+found with only that much information.  If
not, you would have to create a directory entry for that dinode in the
empty root directory.  Then you'd want to run gfs_fsck so it could find
all the blocks from the file and mark them as "in use" as well.
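
Conceptually the bitmap part is just flipping a two-bit field: GFS
keeps four block states, two bits per block, in the bitmaps that
follow each RG header.  Something like the sketch below, although the
state value and the bit ordering within the byte are from memory, so
use the libgfs2 helpers rather than trusting this:

#include <stdint.h>

#define GFS_BLKST_USEDMETA 3   /* assumed value; verify in gfs_ondisk.h */

static void set_bitmap_state(unsigned char *bitmap, uint64_t blk_in_rg,
                             int state)
{
    uint64_t byte = blk_in_rg / 4;       /* 4 two-bit entries per byte   */
    unsigned shift = (blk_in_rg % 4) * 2;
    bitmap[byte] &= ~(0x3 << shift);     /* clear the old two-bit state  */
    bitmap[byte] |= (state & 0x3) << shift;
}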

Still, some of your blocks may have been overwritten by bits from
the new journals or new resource groups.  If so, it will likely
confuse gunzip and/or tar, and you'll still not be able to get at
the data.  But hey, you might get lucky.

If you do decide to go this route, please send your changes to the
public cluster-devel mailing list so everyone can benefit, okay?

Regards,

Bob Peterson
Red Hat GFS




