[Linux-cluster] gfs_grow

Tue Aug 28 15:54:18 UTC 2007

Sorry for the blank message.

Thanks for the advice.  Unfortunately, the file system is part of the back end for a website that is permanently in use, so taking
it offline overnight is not really an option.  We do have a virtually live backup copy, so I will get this fully synchronised and
then try the gfs_fsck process you suggested.

As a quick solution, I think I'll just unmount the file system from all nodes and run gfs_fsck until I see pass one start, then
kill it.  Hopefully the RG repair will have solved the problem by then.  If this doesn't work, I'll mount the backup and run the
full gfs_fsck.  If that doesn't work, I'll rebuild the whole file system.
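
For reference, the sequence I have in mind looks roughly like the sketch below.  The device and mount point names are made-up
placeholders, not our real ones, and the script only prints the commands unless DRY_RUN=0 is set.  Watching for pass one and
killing gfs_fsck would still be done by hand; I have not tried to script that part.

```shell
#!/bin/sh
# Rough sketch only.  DEVICE and MOUNTPOINT are hypothetical placeholders.
DEVICE="${DEVICE:-/dev/example_vg/example_lv}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/example_gfs}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands, run nothing

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repair_plan() {
    # The umount has to be repeated on EVERY node before gfs_fsck is started.
    run umount "$MOUNTPOINT"
    # gfs_fsck may prompt for answers; watch its output for the RG repair
    # messages and for the start of pass one.
    run gfs_fsck "$DEVICE"
    # Remount and check whether df now shows the extra space.
    run mount -t gfs "$DEVICE" "$MOUNTPOINT"
    run df -h "$MOUNTPOINT"
}

repair_plan
```

With the defaults this just lists the four commands in order, which is handy for checking the names before setting DRY_RUN=0.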

Regarding gfs_grow, in my experience it often, if not always, fails when the file system being grown is in use.  Previously I have
just waited until it fails and then run it again once I have stopped any process that is accessing that file system.  I'm not sure
what possessed me to hit <ctrl-c>; not my finest moment!

Regards
Ben

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bob Peterson
> Sent: 28 August 2007 15:31
> To: linux clustering
> Subject: Re: [Linux-cluster] gfs_grow
> 
> On Tue, 2007-08-28 at 10:08 +0100, Ben Yarwood wrote:
> > I am using a 3 Node cluster using RHEL4U4.
> >
> > I ran a gfs_grow yesterday on one of our filesystems but stupidly missed a process that was using the same file system.  The grow
> > process hung and when I got it to exit, the file system is now reporting as having grown to the larger size but no extra space has
> > appeared.  Basically my file system grew from 14TB to 15TB and my usage also grew from 13TB to 14TB.
> >
> > Does anyone know if it's possible to get this space back?  I know I could probably do a gfs_fsck but given the size of the file
> > system, this would take a few days according to some previous reports.
> >
> > Thanks
> > Ben
> 
> Hi Ben,
> 
> The fact that there was a process using the file system shouldn't have
> been a problem and gfs_grow should have been able to work around it.
> It would have been interesting to see where gfs_grow was "hung" but it's
> too late for that now.  My guess is that you killed gfs_grow before it
> was able to update the resource group index properly.
> 
> In RHEL4U4 there is a feature to gfs_fsck to change and repair damaged
> RGs and RG indexes.  Things get tricky for the code once the file system
> has been extended though, so although you probably don't want to hear
> this, you should probably make a backup of your data first, just to be
> safe.
> 
> Running gfs_fsck will take a while on a file system that big, but it
> depends on the speed of your hardware.  I'd expect it to take less than
> a day to complete.  If you can't afford the down time, it might be
> helpful to know that the RG repair is done before any of the passes, so
> in theory you could probably try to use it to repair the RGs and then
> kill the gfs_fsck.  Newer versions of gfs_fsck will catch <ctrl-c>
> interrupts and give you options to skip around parts, but I don't think
> that's in RHEL4U4 (I think it got into RHEL4.5).
> 
> So I guess my recommendation is:
> 
> 1. Make a backup of your data
> 2. Wait until most people have gone home for the day
> 3. Unmount the file system from ALL nodes.
> 4. Run gfs_fsck.
> 5. Watch the gfs_fsck output for messages about finding and fixing
>    RG damage just so you know it did something.
> 6. Let gfs_fsck run overnight.
> 7. If you need the file system back and it's still running by morning,
>    you could kill it manually.  It would be better to let it run, but
>    it shouldn't do any harm to kill it prematurely if necessary.
> 8. Remount the file system and see if df shows the correct values.
> 
> I hope this helps.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 