[Linux-cluster] Re: corrupted GFS filesystem

Bob Peterson rpeterso at redhat.com
Thu Aug 14 17:47:36 UTC 2008


Hi David,

On Thu, 2008-08-14 at 10:44 -0500, David Potterveld wrote:
> The system in question is RHEL4
> 
> The gfs2_edit tool is quite informative. The root directory (block 26)
> in pointer mode displays the correct directory pointers, and I can
> navigate down that tree. My guess is it's OK.

That's good.

> Block 23 (rindex) seems to be damaged. In structure mode, it is listed
> as unknown block type. Here is the listing in Hex mode:
> 
> Block #23    (0x17)           of 393214976 (0x176FFC00)
> (p.1 of 16)--------------------- rindex file -------------------
> 00017000 00000000 00000000 00000000 00000000 [................]
> 00017010 000002B0 00000002 000025DE 3E6DF93C [..........%.>m.<]
> 00017020 3E6E2420 000061AC 00000279 00000279 [>n$ ..a....y...y]
> 00017030 00B46690 48A30A6D 00000000 00000000 [..f.H..m........]
> 00017040 00000000 00000000 00000000 000002B0 [................]
> 00017050 00000002 000025DF 3E6DF4C8 3E6E2420 [......%.>m..>n$ ]
> 00017060 000093D2 00000235 00000235 00B46B6C [.......5...5..kl]
> 00017070 48A30A6D 00000000 00000000 00000000 [H..m............]
> 00017080 00000000 00000000 000004E2 00000002 [................]
> 00017090 000025E0 3E6DF4C8 3E6E2420 000093D2 [..%.>m..>n$ ....]
> 000170A0 00000235 00000235 00B46FE0 48A30A6D [...5...5..o.H..m]
> 000170B0 00000000 00000000 00000000 00000000 [................]
> 000170C0 00000000 000004E2 00000002 000025E1 [..............%.]
> 000170D0 3E6DF48C 3E6E2420 00009423 00000201 [>m..>n$ ...#....]
> 000170E0 00000201 00B47454 48A30A6D 00000000 [......tTH..m....]
> 000170F0 00000000 00000000 00000000 00000000 [................]
>          *** This seems to be a GFS-1 file system ***

That block is definitely trashed.  It should look like a disk inode
and it's not even close; it doesn't resemble any GFS metadata I'm
used to seeing.  It looks like ordinary file data: I'm guessing the
hex values such as 3E6DF48C and 48A30A6D are some kind of time
stamps.
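
If you want to sanity-check that guess, a quick throwaway like this
(not a GFS tool, just plain C) decodes a couple of the 32-bit values
copied from your dump as Unix times:

#include <stdio.h>
#include <time.h>

/* Quick throwaway check: decode two 32-bit values copied from the
 * damaged block as Unix timestamps.  If they print as plausible
 * dates, that supports the "this is file data" theory. */
int main(void)
{
        time_t vals[] = { 0x3E6DF48C, 0x48A30A6D };
        int i;

        for (i = 0; i < 2; i++)
                printf("0x%08lX -> %s", (unsigned long)vals[i],
                       ctime(&vals[i]));
        return 0;
}

For what it's worth, those two come out as dates in early 2003 and
mid-August 2008, which is what you'd expect from ordinary file time
stamps, not from GFS metadata.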

I've got a bugzilla record open so I can make some improvements to
the rindex repair code for RHEL5.  The bug is:

https://bugzilla.redhat.com/show_bug.cgi?id=442271

Right now it's slotted for 5.4.  It's been suggested that I add
an option to gfs_fsck to force a rindex repair.  In your case,
you would need a complete rindex rebuild, which the code currently
doesn't do.

If your file system has ever been extended with gfs_grow, it becomes
extremely difficult for gfs_fsck to figure out how the rindex should
look.  Most of the code to rebuild it is already in gfs_fsck, but as
a safety measure it won't overwrite your current rindex if it finds
more than five problems (iirc).

So it would take a while to get this working properly, and even then
it's aimed at RHEL5.4, not RHEL4.

I'm guessing that the resource groups themselves are intact, so
rebuilding the rindex from them should be possible, but there are
obstacles to overcome: you would need a special program or a special
version of gfs_fsck to do it for you.  It would be faster and more
reliable to mkfs the file system and restore from backups if you
have them.
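
Just to illustrate what such a special program would be doing:
assuming the GFS1 on-disk layout I remember (a big-endian u32 magic
of 0x01161970 followed by a big-endian u32 metadata type, with type
2 meaning "resource group header"), the scanning step is basically
this.  Treat it as a rough, untested sketch of the idea, not a
supported tool:

#include <stdio.h>
#include <stdint.h>

#define GFS_MAGIC        0x01161970
#define GFS_METATYPE_RG  2            /* rg header type, as I recall */
#define BLOCK_SIZE       4096

/* Read the device block by block and report every block whose
 * metadata header claims to be a resource group header.  A real
 * rebuild would then derive one rindex entry per hit. */
static uint32_t be32(const unsigned char *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(int argc, char **argv)
{
        unsigned char buf[BLOCK_SIZE];
        unsigned long long blk = 0;
        FILE *dev;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <device>\n", argv[0]);
                return 1;
        }
        dev = fopen(argv[1], "rb");
        if (!dev) {
                perror("fopen");
                return 1;
        }
        while (fread(buf, 1, BLOCK_SIZE, dev) == BLOCK_SIZE) {
                if (be32(buf) == GFS_MAGIC &&
                    be32(buf + 4) == GFS_METATYPE_RG)
                        printf("possible rg header at block %llu\n", blk);
                blk++;
        }
        fclose(dev);
        return 0;
}

The real job is harder than that, of course: each rindex entry also
needs the rg's data block count and bitmap size, which is what the
rebuild code in gfs_fsck works out.  This only shows the locating
step.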

> There should be nearly 6000 resource groups.
> 
> Perhaps I'll learn more if I can change the field for the block type
> so it's recognized. Can you tell me where it is and what should be in
> it?

No, that rindex block looks totally trashed.  You would have to
patch in the whole block.  It should look something like this:

Block #23    (0x17)           of 97255424 (0x5CC0000)  (disk inode)
(p.1 of 8)--------------------- rindex file -------------------
00017000 01161970 00000004 00000000 00000000 [...p............]
00017010 00000190 00000000 00000000 00000017 [................]
00017020 00000000 00000017 00000180 00000000 [................]
00017030 00000000 00000001 00000000 00022C80 [..............,.]
00017040 00000000 00000024 00000000 48A44678 [.......$....H.Fx]
00017050 00000000 48A44678 00000000 48A44678 [....H.Fx....H.Fx]
00017060 00000000 00000000 00000000 00000000 [................]
00017070 00000000 00000000 00000000 00000000 [................]
00017080 00000001 0000044C 00010001 00000000 [.......L........]
00017090 00000000 00000000 00000000 00000000 [................]
000170A0 00000000 00000000 00000000 00000000 [................]
000170B0 00000000 00000000 00000000 00000000 [................]
000170C0 00000000 00000000 00000000 00000000 [................]
000170D0 00000000 00000000 00000000 00000000 [................]
000170E0 00000000 00000000 00000000 0000001B [................] 
000170F0 00000000 0000001C 00000000 0000001D [................] 
00017100 00000000 0000001E 00000000 0000001F [................] 
00017110 00000000 00000020 00000000 00000021 [....... .......!] 
00017120 00000000 00000022 00000000 00000023 [.......".......#] 
00017130 00000000 00000024 00000000 00000025 [.......$.......%] 
00017140 00000000 00000026 00000000 00000027 [.......&.......'] 
00017150 00000000 00000028 00000000 00000029 [.......(.......)] 
00017160 00000000 0000002A 00000000 0000002B [.......*.......+] 
00017170 00000000 0000002C 00000000 0000002D [.......,.......-] 
00017180 00000000 0000002E 00000000 0000002F [.............../] 
00017190 00000000 00000030 00000000 00000031 [.......0.......1] 
000171A0 00000000 00000032 00000000 00000033 [.......2.......3] 
000171B0 00000000 00000034 00000000 00000035 [.......4.......5] 
000171C0 00000000 00000036 00000000 00000037 [.......6.......7] 
000171D0 00000000 00000038 00000000 00000039 [.......8.......9] 
000171E0 00000000 0000003A 00000000 0000003B [.......:.......;] 
000171F0 00000000 0000003C 00000000 0000003D [.......<.......=] 
         *** This seems to be a GFS-1 file system ***

The first pointer, at 000170e8, is "00000000 0000001B"; that is the
first indirect block pointer.  That block should look something like
this:

Block #27    (0x1b)           of 97255424 (0x5CC0000)  (journal data)
(p.1 of 8)
0001B000 01161970 00000007 00000000 00000000 [...p............]
0001B010 000002BC 00000000 00000000 00000011 [................] 
0001B020 00000005 00000000 00000000 00000016 [................] 
0001B030 000101D8 00004076 00000000 00000000 [......@v........]
0001B040 00000000 00000000 00000000 00000000 [................] 
0001B050 00000000 00000000 00000000 00000000 [................] 
0001B060 00000000 00000000 00000000 00000000 [................] 
0001B070 00000000 00000000 00000000 000101EF [................] 
0001B080 00000005 00000000 00000000 000101F4 [................] 
0001B090 0000FFB8 00003FEE 00000000 00000000 [......?.........] 
0001B0A0 00000000 00000000 00000000 00000000 [................] 
0001B0B0 00000000 00000000 00000000 00000000 [................] 
0001B0C0 00000000 00000000 00000000 00000000 [................] 
etc.

If the first 0x20 bytes look like that, namely,
0001B000 01161970 00000007 00000000 00000000 [...p............]
0001B010 000002BC 00000000 00000000 00000011 [................] 

then that's good news and the indirect blocks are okay.
Each rindex record should be 0x60 bytes long.
The second entry in my example (0001b078) starts with the pointer
to "00000000 000101EF".  On your system it will be different
because it's a different file system.
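
For reference, each of those 0x60-byte records is a struct
gfs_rindex.  As best I recall from gfs_ondisk.h it lays out like
this (all fields stored big-endian on disk; double-check against
your headers before trusting the offsets):

#include <stdint.h>

/* On-disk rindex record, 0x60 (96) bytes, as I remember it. */
struct gfs_rindex {
        uint64_t ri_addr;          /* block # of the rg header */
        uint32_t ri_length;        /* # of rg header/bitmap blocks */
        uint32_t ri_pad;
        uint64_t ri_data1;         /* first data block in the rg */
        uint32_t ri_data;          /* # of data blocks in the rg */
        uint32_t ri_bitbytes;      /* # of bytes of rg bitmap */
        char     ri_reserved[64];
};                                 /* 8+4+4+8+4+4+64 = 96 = 0x60 */

That's why, in the dump above, the first record starts right after
the 24-byte metadata header at 0001b018 (ri_addr 0x11) and the
second one 0x60 bytes later at 0001b078.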

If your block #27 looks like that example, and you don't mind playing
with fire, you can try to patch rindex block 23 to look like I've got
it above, then page down and fill in all the block pointers starting
with 1b, 1c, 1d, 1e, and so forth.  I'd use the 'f' key on block 1b
to page forward until you find the end of the rindex data, so you
know where to stop.  If you've really got nearly 6000 resource
groups, it will take well over a hundred data blocks before you hit
the end of them.

You'll also need to patch the rindex inode's file size, at offsets
0x38 through 0x3f, to your rg count times 0x60 bytes per rindex
record.
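
Just to spell the arithmetic out (the rg count here is only the
approximate figure you mentioned; plug in your real number):

#include <stdio.h>

/* Back-of-the-envelope numbers for the patch job. */
int main(void)
{
        unsigned long rgs        = 6000;    /* approx. resource groups */
        unsigned long entry_size = 0x60;    /* bytes per rindex record */
        unsigned long block_size = 0x1000;  /* 4K blocks, per the dumps */
        unsigned long per_block  = (block_size - 0x18) / entry_size;

        printf("di_size at offset 0x38: 0x%lx bytes\n", rgs * entry_size);
        printf("roughly %lu records per data block, so about %lu blocks\n",
               per_block, (rgs + per_block - 1) / per_block);
        return 0;
}

For 6000 rg's that works out to a di_size of 0x8ca00 and somewhere
around 140-odd data blocks of rindex, assuming the records don't
straddle block boundaries.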

So you can try to patch them in; then I'd save the metadata with
gfs2_edit savemeta, and then run gfs_fsck to see whether it can
figure things out from there.

This is just something you can try if you're desperate and have
no backups to restore from; I can't be held accountable for
problems, nor can Red Hat.

> Thanks,
> David Potterveld

Regards,

Bob Peterson
Red Hat Clustering & GFS




