[Linux-cluster] errors with GFS2 and DRBD. Please help..

Mon Mar 15 15:19:44 UTC 2010

----- "Koustubha Kale" <koustubha_kale at yahoo.com> wrote:
| Hi all,
| We have a three node GFS2 cluster on a CentOS 5.4 output of uname -a

| GFS2 errors and file system withdrawals, nodes restarting. The error in
| the log is as shown below..

Hi,

What version of fsck.gfs2 did you use to fix these errors?

Not long ago, I discovered that fsck.gfs2 does not always
clean up everything it should on the first pass.
Sometimes it finds and fixes more inconsistencies on a second
run.  This should be much better once the 5.5 release is out,
but I've found some serious problems even in the 5.5 version.
For example, when orphaned dinodes are moved into lost+found,
it can sometimes get the block accounting wrong.

I've got a better, faster fsck.gfs2 on my people page for
people to try.  This one is more thorough, does better block
accounting, and has added error checking, so it should do a much
better job of cleaning things up.  It's had a lot of testing and
has gotten a lot of positive feedback from other people too:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2

This is an x86_64 version.  I recommend these steps:

1. Download this experimental fsck.gfs2 to some directory
2. Unmount the file system from all nodes
3. Save off a copy of the file system metadata:
   gfs2_edit savemeta /dev/device /some/file.meta
   This saved copy means you can always go back if fsck.gfs2
   makes some kind of mistake.
4. Run the new fsck.gfs2 on the file system
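
The steps above could be scripted roughly as follows.  This is only a
sketch, not a tested procedure: DEVICE, META, FSCK and DRY_RUN are
hypothetical names, /dev/device and /some/file.meta are the same
placeholders used in step 3, and the mount check only looks at the
local node, so the file system still has to be unmounted on every
other node by hand.  By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 1-4. All variable names and defaults are placeholders.
DEVICE="${DEVICE:-/dev/device}"        # GFS2 block device (placeholder)
META="${META:-/some/file.meta}"        # where to save the metadata (placeholder)
FSCK="${FSCK:-./fsck.gfs2}"            # local path for the experimental binary
URL="http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2"
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_gfs2() {
    # Step 1: download the experimental fsck.gfs2 and make it executable.
    run wget -O "$FSCK" "$URL" || return 1
    run chmod +x "$FSCK" || return 1

    # Step 2: refuse to continue if the device is mounted on THIS node.
    # Other nodes are not checked here.
    if grep -q -F -- "$DEVICE " /proc/mounts 2>/dev/null; then
        echo "$DEVICE is still mounted on this node" >&2
        return 1
    fi

    # Step 3: save the metadata so there is something to go back to.
    run gfs2_edit savemeta "$DEVICE" "$META" || return 1

    # Step 4: run the new fsck.gfs2 on the file system.
    run "$FSCK" "$DEVICE"
}

repair_gfs2
```

Set DRY_RUN=0 to actually execute the commands once the printed
command lines look right for your device and paths.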

See if that helps the situation.

Regards,

Bob Peterson
Red Hat File Systems