[Linux-cluster] panic after "jid=0, already locked for use"

Thu Dec 11 17:04:13 UTC 2008

----- "Nathan Stratton" <nathan at robotics.net> wrote:
| I know I screwed up, I should never have used GFS2, but is there any
| way to pull the data off?

Hi Nathan,

You're not seeing the "---" because you're running an older version
of the code.  The newer gfs2-utils versions of gfs2_edit print the
"---" markers to help identify where the journal wrapped.

At any rate, if you grep out only the Log headers from the journal,
the problem can be seen at this point in the journal:

Block #53b7: Log header: Seq= 0x81be, tail = 0x53a4, blk = 0x53b7
Block #53c6: Log header: Seq= 0x81c1, tail = 0x53be, blk = 0x53c6
Block #53c8: Log header: Seq= 0x81c1, tail = 0x53c8, blk = 0x53c8
Block #53d3: Log header: Seq= 0x81c5, tail = 0x53ce, blk = 0x53d3
Block #53d4: Log header: Seq= 0x81c6, tail = 0x53d4, blk = 0x53d4
Block #53d5: Log header: Seq= 0x81c7, tail = 0x53d5, blk = 0x53d5
Block #53d6: Log header: Seq= 0x81c3, tail = 0x53c9, blk = 0x53d6
Block #53d8: Log header: Seq= 0x81c4, tail = 0x53c9, blk = 0x53d8
Block #53d9: Log header: Seq= 0x81c5, tail = 0x53d9, blk = 0x53d9
Block #53e1: Log header: Seq= 0x81c6, tail = 0x53da, blk = 0x53e1

Notice how the sequence numbers are, well, out of sequence.
The numbers should be consecutive.  It should read something like:

Block #53b7: Log header: Seq= 0x81be, tail = 0x53a4, blk = 0x53b7
Block #53c6: Log header: Seq= 0x81bf, tail = 0x53be, blk = 0x53c6
Block #53c8: Log header: Seq= 0x81c0, tail = 0x53c8, blk = 0x53c8
Block #53d3: Log header: Seq= 0x81c1, tail = 0x53ce, blk = 0x53d3
Block #53d4: Log header: Seq= 0x81c2, tail = 0x53d4, blk = 0x53d4
Block #53d5: Log header: Seq= 0x81c3, tail = 0x53d5, blk = 0x53d5
Block #53d6: Log header: Seq= 0x81c4, tail = 0x53c9, blk = 0x53d6
Block #53d8: Log header: Seq= 0x81c5, tail = 0x53c9, blk = 0x53d8
Block #53d9: Log header: Seq= 0x81c6, tail = 0x53d9, blk = 0x53d9
Block #53e1: Log header: Seq= 0x81c7, tail = 0x53da, blk = 0x53e1
And so on.
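
If you'd rather not eyeball the whole dump, a check like this can be
scripted.  What follows is only a rough sketch in Python, not part of
gfs2-utils: it assumes the dump lines look exactly like the ones quoted
above, and the function names are made up for illustration.  Note that
the real point where the journal wraps will also show up as a break,
since the sequence legitimately drops there.

```python
import re

# Matches log header lines in the format quoted above, e.g.
# "Block #53b7: Log header: Seq= 0x81be, tail = 0x53a4, blk = 0x53b7"
# (assumed format; newer gfs2_edit output may need a different pattern).
HEADER_RE = re.compile(
    r"Block #([0-9a-fA-F]+): Log header: Seq=\s*0x([0-9a-fA-F]+)"
)

def parse_log_headers(text):
    """Return (block, seq) integer pairs for each log header line in text."""
    return [(int(m.group(1), 16), int(m.group(2), 16))
            for m in HEADER_RE.finditer(text)]

def find_sequence_breaks(headers):
    """Return (prev_block, block, expected_seq, actual_seq) for each header
    whose sequence number is not the previous header's number plus one."""
    breaks = []
    for (prev_blk, prev_seq), (blk, seq) in zip(headers, headers[1:]):
        if seq != prev_seq + 1:
            breaks.append((prev_blk, blk, prev_seq + 1, seq))
    return breaks

# The ten headers from the corrupt journal, copied from the listing above.
SAMPLE = """\
Block #53b7: Log header: Seq= 0x81be, tail = 0x53a4, blk = 0x53b7
Block #53c6: Log header: Seq= 0x81c1, tail = 0x53be, blk = 0x53c6
Block #53c8: Log header: Seq= 0x81c1, tail = 0x53c8, blk = 0x53c8
Block #53d3: Log header: Seq= 0x81c5, tail = 0x53ce, blk = 0x53d3
Block #53d4: Log header: Seq= 0x81c6, tail = 0x53d4, blk = 0x53d4
Block #53d5: Log header: Seq= 0x81c7, tail = 0x53d5, blk = 0x53d5
Block #53d6: Log header: Seq= 0x81c3, tail = 0x53c9, blk = 0x53d6
Block #53d8: Log header: Seq= 0x81c4, tail = 0x53c9, blk = 0x53d8
Block #53d9: Log header: Seq= 0x81c5, tail = 0x53d9, blk = 0x53d9
Block #53e1: Log header: Seq= 0x81c6, tail = 0x53da, blk = 0x53e1
"""

if __name__ == "__main__":
    headers = parse_log_headers(SAMPLE)
    for prev_blk, blk, want, got in find_sequence_breaks(headers):
        print(f"after block {prev_blk:x}: block {blk:x} has "
              f"seq {got:#x}, expected {want:#x}")
```

On the ten headers quoted above this flags four breaks, the first one
at block 53c6.  In practice you would feed it the full journal dump
instead of SAMPLE.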

The only way to fix it is to patch it by hand with gfs2_edit or
write a tool in C.  The job would be a lot easier if you had a
newer version of gfs2_edit because instead of dumping out the
relative block number in the journal, the newer code gives the
actual block address.  That way you can identify where the
blocks really are, and patch them by hand.

Since the sequence numbers are overlapping, you'll have to
renumber all of them (in hex), starting with the log header
after journal block 0x53b7 and continuing until journal block
0x5563.  There are only about 50 log headers that need to be
patched to fix the problem and get your data back.
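
To keep the hex arithmetic straight while patching, it may help to
generate the list of new sequence numbers up front.  Here is a minimal
sketch in Python, again assuming the dump format quoted earlier and
that the headers appear in ascending block order; renumber_plan is a
hypothetical helper, not a gfs2-utils function.  It only prints a plan
and never touches the device.

```python
import re

# Assumed line format, as in the journal dump quoted earlier:
# "Block #53c6: Log header: Seq= 0x81c1, tail = 0x53be, blk = 0x53c6"
HEADER_RE = re.compile(
    r"Block #([0-9a-fA-F]+): Log header: Seq=\s*0x([0-9a-fA-F]+)"
)

def renumber_plan(dump_text, first_block, last_block):
    """Plan consecutive sequence numbers for headers in the given range.

    The header at first_block keeps its sequence number; every later
    header up to last_block gets the previous number plus one.
    Returns a list of (block, old_seq, new_seq) tuples.
    """
    plan = []
    next_seq = None
    for m in HEADER_RE.finditer(dump_text):
        blk, old_seq = int(m.group(1), 16), int(m.group(2), 16)
        if blk < first_block or blk > last_block:
            continue
        if next_seq is None:
            next_seq = old_seq  # anchor on the first header in range
        plan.append((blk, old_seq, next_seq))
        next_seq += 1
    return plan

# Headers copied from the corrupt journal listing in this message.
SAMPLE = """\
Block #53b7: Log header: Seq= 0x81be, tail = 0x53a4, blk = 0x53b7
Block #53c6: Log header: Seq= 0x81c1, tail = 0x53be, blk = 0x53c6
Block #53c8: Log header: Seq= 0x81c1, tail = 0x53c8, blk = 0x53c8
Block #53d3: Log header: Seq= 0x81c5, tail = 0x53ce, blk = 0x53d3
Block #53d4: Log header: Seq= 0x81c6, tail = 0x53d4, blk = 0x53d4
Block #53d5: Log header: Seq= 0x81c7, tail = 0x53d5, blk = 0x53d5
Block #53d6: Log header: Seq= 0x81c3, tail = 0x53c9, blk = 0x53d6
Block #53d8: Log header: Seq= 0x81c4, tail = 0x53c9, blk = 0x53d8
Block #53d9: Log header: Seq= 0x81c5, tail = 0x53d9, blk = 0x53d9
Block #53e1: Log header: Seq= 0x81c6, tail = 0x53da, blk = 0x53e1
"""

if __name__ == "__main__":
    for blk, old_seq, new_seq in renumber_plan(SAMPLE, 0x53b7, 0x5563):
        note = "" if old_seq == new_seq else "  <- patch"
        print(f"block {blk:x}: seq {old_seq:#x} -> {new_seq:#x}{note}")
```

Run against the full dump rather than SAMPLE, this gives a checklist to
work through in gfs2_edit, and the same list can be compared against a
fresh dump afterwards.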

So my advice is:

1. Install git
2. Grab the latest source code for gfs2-utils from the cluster
   git repository.  To compile the gfs2 directory you'll also need
   ncurses-devel, libvolume_id-devel and kernel-devel, if they're
   not already installed.
3. git checkout RHEL5 .
4. Compile libgfs2 and gfs2_edit from source.
5. Run gfs2_edit savemeta to save a copy of the (corrupt) metadata.
6. Rerun gfs2_edit -p journal0 to get a better list of the
   blocks that need patching.
7. Use gfs2_edit to jump from block to block, patching only
   the sequence numbers, renumbering them sequentially
   starting at 0x81bf until the log wraps at block 0x5563,
   which is about 49 headers.
8. Use gfs2_edit -p journal0 to verify you renumbered them correctly.
9. Try to mount the file system.

This worked for the other guy who had a similar problem.
Please approach gfs2_edit with caution; it is not for the timid.

Regards,

Bob Peterson
Red Hat GFS