[Linux-cluster] gfs2-utils source for recovery purpose of a corrupt gfs2 partition

Fri Apr 2 08:00:22 UTC 2010

Hi Cluster/GFS Experts,
Hi Bob,

As I got no response concerning my recovery issue, I would like to
summarize my activities, which could help someone else running into such
a problem with gfs2.

The corrupted gfs2 (12TB before the grow, 25TB after) is hosted on an
SE6540 disk array, and the master is a Sun X4150 machine with 4GB of RAM
running CentOS 5.3 (i686/PAE), so I ran into an out-of-memory problem
while running fsck.gfs2.

No matter what I did, I was not even able to use the temporary swap
file suggested in some postings.

As the OS installation was done by other people and they insist on this
configuration, I booted an x86_64 rescue DVD in order to overcome the
memory restriction.

In addition, I was lucky to have some spare memory to increase
the RAM to 16GB.

As I did not want to run the lvm/cman software and, honestly speaking,
do not have much experience with it, I created and mounted a large
xfs partition on the disk array to hold a temporary swap file and to
store the files I hope to recover from the corrupted gfs2 partition.

An investigation via dd | od -c of the first MB of the gfs2 partition
revealed that the gfs2 sb (super block) starts after the lvm2 block,
which has a size of 192k.

Creating a loopback device with an offset of 196608 bytes (192 * 1024)
lets me access the file system via fsck without dlm/clvm etc.

losetup /dev/loop4 /dev/sdb  -o 196608

/sbin/fsck.gfs2 -f -p -y -v /dev/loop4

The index of the loop device depends on what the rescue system already
uses. Check with losetup -a and pick a number that is not currently in use.
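
To double check the offset before running losetup, one could search the
dumped first MB for the GFS2 super block header. The following is only a
minimal sketch under assumptions: the magic number 0x01161970, the super
block type 1 and the super block address of 128 basic blocks (64k into
the file system) are taken from gfs2_ondisk.h, and the function names
find_gfs2_sb and gfs2_fs_start are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define GFS2_MAGIC        0x01161970u  /* mh_magic, stored big-endian on disk */
#define GFS2_METATYPE_SB  1u           /* mh_type of the super block */
#define GFS2_BASIC_BLOCK  512u         /* size of a basic block in bytes */
#define GFS2_SB_ADDR      128u         /* super block address in basic blocks */

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32_at(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return the byte offset of the first 512-byte
 * aligned position in buf that carries a GFS2 super block meta header,
 * or -1 if there is none.
 */
static long find_gfs2_sb(const unsigned char *buf, size_t len)
{
	size_t off;

	for (off = 0; off + 8 <= len; off += GFS2_BASIC_BLOCK) {
		if (be32_at(buf + off) == GFS2_MAGIC &&
		    be32_at(buf + off + 4) == GFS2_METATYPE_SB)
			return (long)off;
	}
	return -1;
}

/*
 * Hypothetical helper: the file system is assumed to start
 * GFS2_SB_ADDR basic blocks before its super block. Returns -1 if
 * the super block offset is too small for that to be possible.
 */
static long gfs2_fs_start(long sb_off)
{
	long sb_rel = (long)(GFS2_SB_ADDR * GFS2_BASIC_BLOCK);

	return sb_off < sb_rel ? -1 : sb_off - sb_rel;
}
```

With a file system starting at 196608, the header would be expected at
196608 + 65536 = 262144, and gfs2_fs_start(262144) gives back the value
to pass to losetup -o.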

After several fsck attempts on the gfs2 that ran into the OOM again, my
temporary swap space is now about 0.7TB (no joke).

I started with 20GB of swap space and doubled the size after every OOM
abort of fsck.

Now I was lucky to pass the first check and get into the second one:

Initializing fsck
Initializing lists...
jid=0: Looking at journal...
jid=0: Journal is clean.
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Journal is clean.
jid=3: Looking at journal...
jid=3: Journal is clean.
jid=4: Looking at journal...
jid=4: Journal is clean.
jid=5: Looking at journal...
jid=5: Journal is clean.
jid=6: Looking at journal...
jid=6: Journal is clean.
jid=7: Looking at journal...
jid=7: Journal is clean.
Initializing special inodes...
Validating Resource Group index.
Level 1 RG check.
Level 2 RG check.
Existing resource groups:
1: start: 17 (0x11), length = 529563 (0x8149b)
2: start: 529580 (0x814ac), length = 524241 (0x7ffd1)
3: start: 1053821 (0x10147d), length = 524241 (0x7ffd1)
4: start: 1578062 (0x18144e), length = 524241 (0x7ffd1)
...
9083643: start: 4762017571061 (0x454be5da0f5), length = 524241 (0x7ffd1)
9083644: start: 4762018095302 (0x454be65a0c6), length = 524241 (0x7ffd1)
9083645: start: 4762018619543 (0x454be6da097), length = 524241 (0x7ffd1)
9083646: start: 4762019143784 (0x454be75a068), length = 524241 (0x7ffd1)
...
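
A rough plausibility check of this listing can be done with a little
arithmetic. The sketch below assumes a block size of 4096 bytes, which
is not shown in the output above; it is only a guess that fits a
resource group length of 524241 blocks being close to 2GB. The function
names are made up for this illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of resource groups needed to cover
 * fs_bytes, rounded up. Returns 0 if the resource group size is 0.
 */
static uint64_t expected_rgrps(uint64_t fs_bytes, uint64_t rg_blocks,
			       uint32_t bsize)
{
	uint64_t rg_bytes = rg_blocks * (uint64_t)bsize;

	if (rg_bytes == 0)
		return 0;
	return (fs_bytes + rg_bytes - 1) / rg_bytes;
}

/*
 * Hypothetical helper: byte offset just past the end of a resource
 * group, given its start and length in file system blocks.
 */
static uint64_t rg_end_bytes(uint64_t start_block, uint64_t length_blocks,
			     uint32_t bsize)
{
	return (start_block + length_blocks) * (uint64_t)bsize;
}
```

Under that assumption a 25TB file system would need on the order of
12,800 resource groups of this length, while the listing above runs past
number 9 million, and the end of resource group 9083646 computed by
rg_end_bytes(4762019143784, 524241, 4096) lies far beyond 25TB. Whether
this is related to the memory consumption of fsck is only a guess.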

In addition, I started to explore the code of gfs2-utils
(folders libgfs2 and fsck) and was able to list the super block
info.

As mentioned in my previous posting, I was able to list all the file
names of interest located in a 7TB image created from the dd output.

All the files I'm looking for (about 16 thousand) can be found in the
directory structure and can be seen with a simple od -s (string mode) or
with the xxd command.

xxd -a -u -c 64 -s 671088640 dev_oa_vg_storage1_oa_lv_storage1.bin | less

The first snippet of code I have been playing around with is listed
below; it is just a plain cut and paste of the utils code.
The code only shows some information from the super block.

#include <stdio.h>	/* printf, fprintf */
#include <stdlib.h>	/* exit */
#include <string.h>	/* strerror, memset */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <limits.h>
#include <errno.h>
#include <ctype.h>
#include <libintl.h>
#define _(String) gettext(String)
#include "gfs2structure.h"

int main(int argc, char *argv[])
{
	int fd;
	char *device, *field;

	unsigned char buf[GFS2_BASIC_BLOCK];
	unsigned char input[256];
	unsigned char output[256];

	struct gfs2_sb sb;
	struct gfs2_buffer_head dummy_bh;
	struct gfs2_dirent dirent, *dentp;

	//struct gfs2_inum  sbmd;
	//struct gfs2_inum  sbrd;

	/* only b_data is set here; all other members stay zeroed */
	memset(&dummy_bh, 0, sizeof(dummy_bh));
	dummy_bh.b_data = (char *)buf;

	//memset(&dirent, 0, sizeof(struct gfs2_dirent));

	if (argc < 2) {
		fprintf(stderr, _("usage: %s <device or image>\n"), argv[0]);
		exit(-1);
	}

	device = argv[1];

	fd = open(device, O_RDONLY);

	if (fd < 0) {
		fprintf(stderr, _("can't open %s: %s\n"), device,
			strerror(errno));
		exit(-1);
	}

	if (lseek(fd, GFS2_SB_ADDR * GFS2_BASIC_BLOCK, SEEK_SET) !=
	    GFS2_SB_ADDR * GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad seek: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}
	if (read(fd, buf, GFS2_BASIC_BLOCK) != GFS2_BASIC_BLOCK) {
		fprintf(stderr, _("bad read: %s from %s:%d: superblock\n"),
			strerror(errno), __FUNCTION__, __LINE__);
		exit(-1);
	}

	gfs2_sb_in(&sb, &dummy_bh);

	if (sb.sb_header.mh_magic != GFS2_MAGIC ||
	    sb.sb_header.mh_type != GFS2_METATYPE_SB) {
		fprintf(stderr, _("there isn't a GFS2 filesystem on %s\n"),
			device);
		exit(-1);
	}

	printf( _("current lock protocol name = \"%s\"\n"),sb.sb_lockproto);	

	printf( _("current lock table name = \"%s\"\n"),sb.sb_locktable);

	printf( _("current ondisk format = %u\n"),sb.sb_fs_format);

	printf( _("current multihost format = %u\n"),sb.sb_multihost_format);

	//printf( _("current uuid = %s\n"), str_uuid(sb.sb_uuid));

	printf( _("current block size = %u\n"), sb.sb_bsize);

	printf( _("current block size shift = %u\n"), sb.sb_bsize_shift);

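	/* no_addr and no_formal_ino are 64 bit wide, so print them as such */
	printf( _("masterdir-addr = %llu\n"),
		(unsigned long long)sb.sb_master_dir.no_addr);
	printf( _("masterdir-fino = %llu\n"),
		(unsigned long long)sb.sb_master_dir.no_formal_ino);
	printf( _("rootdir-addr = %llu\n"),
		(unsigned long long)sb.sb_root_dir.no_addr);
	printf( _("rootdir-fino = %llu\n"),
		(unsigned long long)sb.sb_root_dir.no_formal_ino);

	/*
	 * dummy_bh.sdp is never set in this program, so dereferencing it
	 * would crash or print garbage; the lines are left here disabled.
	 */
	//printf( _("dummy_bh.sdp = %p\n"), dummy_bh.sdp);
	//printf( _("sdp->blks_alloced = %u\n"), dummy_bh.sdp->blks_alloced);
	//printf( _("sdp->blks_total = %u\n"), dummy_bh.sdp->blks_total);
	//printf( _("sdp->device_name = %s\n"), dummy_bh.sdp->device_name);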

	//gfs2_dirent_in(&dirent, (char *)dentp);

	//gfs2_dirent_print(&dirent, output);

        //gfs2_dinode_print(struct gfs2_dinode *di);

	close(fd);

	return 0;
}

I will keep you all informed on the progress of this story.

My next step, depending on the further progress of the fsck (whether it
fails or not), will be to overwrite the "lock_" and/or "fsck_" flags
in the image and to mount the gfs2 image to see what happens.

Meanwhile, during the fsck run, which I was told could take a while
(used swap space is now more than 510GB), I hope someone can show
me how to walk through the inodes using libgfs2 to collect data from
them, or point me in the right direction.
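
One low-level way to approach this without libgfs2 might be to step
through the image block by block and look at the meta header of each
block. The sketch below is not the libgfs2 way of doing it and rests on
assumptions: the magic number, the dinode type 4 and the position of
di_size at byte 56 are a reading of gfs2_ondisk.h, and the helper names
are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define GFS2_MAGIC        0x01161970u  /* mh_magic, stored big-endian on disk */
#define GFS2_METATYPE_DI  4u           /* mh_type of a dinode block */
#define DI_SIZE_OFFSET    56u          /* assumed offset of di_size */

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value from a byte buffer. */
static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | (uint64_t)get_be32(p + 4);
}

/*
 * Hypothetical helper: return 1 if the block starts with a GFS2 meta
 * header of type dinode and is long enough to hold di_size, else 0.
 */
static int is_gfs2_dinode(const unsigned char *blk, size_t len)
{
	return len >= DI_SIZE_OFFSET + 8 &&
	       get_be32(blk) == GFS2_MAGIC &&
	       get_be32(blk + 4) == GFS2_METATYPE_DI;
}

/*
 * Hypothetical helper: file size in bytes as stored in the dinode.
 * Only call this for blocks accepted by is_gfs2_dinode().
 */
static uint64_t dinode_size(const unsigned char *blk)
{
	return get_be64(blk + DI_SIZE_OFFSET);
}
```

A scanner built on this would read the image in steps of the file system
block size, call is_gfs2_dinode() on each block and note the block
number and dinode_size() of every hit for later inspection.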

Many Thanks in Advance
and a nice Easter weekend.

Bye
Markus

*******************************************************
Markus Wolfgart
DLR Oberpfaffenhofen
German Remote Sensing Data Center
.
.
.
e-mail: markus.wolfgart at dlr.de
**********************************************************

Hi Bob,

thanks for the prompt reply!

The fs was originally 12.4TB big (6TB used).
After a resize attempt to 25TB with gfs2_grow (a very, very old version,
gfs2-utils 1.62), the fs was expanded and the first impression looked
good, as df reported a size of 25TB.
But looking at the fs from the second node (two-node system), ls -r and
ls -R threw IO errors and the gfs2 mount froze (a reboot of the machine
was performed).
As no shrinking of the gfs2 was possible to roll back, the additional
physical volume was removed from the logical volume (lvresize to the
original size & pvremove).
My thought was that this hard cut of the unfenced gfs2 partition would
hopefully be repaired by fsck.gfs2 (newest version).
Whether or not that would work, I could not run fsck.gfs2 because of an
"Out of memory in compute_rgrp_layout" message.

see strace output:

write(1, "9098813: start: 4769970307031 (0"..., 73
9098813: start: 4769970307031 (0x4569862bfd7), length = 524241 (0x7ffd1)
) = 73
write(1, "9098814: start: 4769970831272 (0"..., 73
9098814: start: 4769970831272 (0x456986abfa8), length = 524241 (0x7ffd1)
) = 73
write(1, "9098815: start: 4769971355513 (0"..., 73
9098815: start: 4769971355513 (0x4569872bf79), length = 524241 (0x7ffd1)
) = 73
write(1, "9098816: start: 4769971879754 (0"..., 73
9098816: start: 4769971879754 (0x456987abf4a), length = 524241 (0x7ffd1)
) = 73
write(1, "9098817: start: 4769972403995 (0"..., 73
9098817: start: 4769972403995 (0x4569882bf1b), length = 524241 (0x7ffd1)
) = 73
brk(0xb7dea000)                         = 0xb7dc9000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory in compute_rgrp_la"..., 37
Out of memory in compute_rgrp_layout
) = 37
exit_group(-1)                          = ?

As I had already increased my swap space

swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       8385920 0       -3
/var/swapfile.bin                       file            33554424        144     1

and ran into the same situation as before, I decided to start
extracting the lost files with a C program.

Now I have created a big image (7TB) on an xfs partition and would like
to recover my files of interest with a program using libgfs2 or parts of
the source from gfs2-utils, as mentioned in my previous posting.
As I can see nearly all of the files in the directory structure and can
get their positions in the image with a simple strings command, I hope
to extract them in a simpler way.

The RG size was set to the max value of 2GB, and each file I'm looking
for is about 250MB big.
The number of files to be recovered is more than 16k.
Every file has a header with its file name and the total size, so it
should be easy to check whether its recovery was successful.

So that's my theory, but this could become an Easter vacation project
without the right knowledge of gfs2.
As I'm lucky to have the gfs2-utils source, I hope it can be done.
But if there is a simpler way to do a recovery with the installed gfs2
programs like gfs2_edit or gfs2_tool, or with other tools, it would be
nice if someone could show me the proper way.

Many Thanks in advance

Markus

 ----- "Markus Wolfgart" <markus wolfgart dlr de> wrote:
| Hello Cluster and GFS Experts,
|
| I'm a new subscriber to this mailing list and apologise
| in case my posting is off-topic.
|
| I'm looking for help concerning a corrupt gfs2 file system
| which I could not recover with fsck.gfs2 (Ver. 3.0.9)
| due to too little physical memory (4GB), even after increasing it
| with additional swap space (now about 35GB).
|
| I would like to parse an image created from the lost fs (the first 6TB)
| with the code provided in the new gfs2-utils release.
|
| Given this circumstance, I hope to find on this mailing list some
| hints concerning an automated step-by-step recovery of lost data.
|
| Many thanks in advance for your help
|
| Markus

Hi Markus,

You said that fsck.gfs2 is not working but you did not say what
messages it gives you when you try.  This must be a very big
file system.  How big is it?  Was it converted from gfs1?

Regards,

Bob Peterson
Red Hat File Systems