ben.yarwood at juno.co.uk
Tue Aug 28 15:54:18 UTC 2007
Sorry for the blank message.
Thanks for the advice. Unfortunately, the file system is part of the back end for a website that is permanently in use, so taking it
offline overnight is not really an option. We do have a virtually live backup copy, so I will get this fully synchronised and then
try the gfs_fsck process you suggested.
As a quick solution, I think I'll just unmount the file system from all nodes and run gfs_fsck until I see pass one start, then kill
it. Hopefully the RG repair will have fixed the problem by that point. If this doesn't work, I'll mount the backup and run the full
gfs_fsck. If that doesn't work, I'll rebuild the whole file system.
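
For anyone wanting to script the "run until pass one starts, then kill it" idea rather than watch the terminal, here is a minimal sketch. It assumes gfs_fsck prints a line containing the marker text when pass one begins; the exact wording is an assumption, so check the output of your version first. The helper name run_until_marker is hypothetical, not part of any GFS tool.

```python
import subprocess
import sys


def run_until_marker(cmd, marker):
    """Run cmd, echo its output, and stop it once a line contains marker.

    Returns True if the marker was seen and the process was stopped early,
    False if the command finished on its own before the marker appeared.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no message is missed
        text=True,
        bufsize=1,                 # line-buffered reads on our side
    )
    seen = False
    try:
        for line in proc.stdout:
            sys.stdout.write(line)     # keep the output visible to the operator
            if marker in line:
                seen = True
                proc.terminate()       # send SIGTERM, as a manual kill would
                break
    finally:
        proc.stdout.close()
        proc.wait()                    # reap the child before returning
    return seen
```

A call might look like run_until_marker(["gfs_fsck", "-y", "/dev/your_vg/your_lv"], "pass1"), where the device path is a placeholder and the file system has already been unmounted from all nodes. The -y option answers yes to every repair question, so only use it if that is acceptable.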
Regarding gfs_grow, in my experience it often (if not always) fails when the file system being grown is in use. Previously I have
just waited until it failed and then run it again once I had stopped any process that was accessing that file system. I'm
not sure what possessed me to hit <ctrl-c>, not my finest moment!
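
A rough way to confirm that a grow really added usable space is to compare df-style figures taken before and after. The sketch below uses Python's shutil.disk_usage; the helper names, the Usage tuple and the 5% tolerance are all assumptions for illustration.

```python
import shutil
from collections import namedtuple

# Sizes in bytes, matching the fields returned by shutil.disk_usage().
Usage = namedtuple("Usage", "total used free")


def snapshot(mount_point):
    """Record total/used/free for a mounted file system."""
    total, used, free = shutil.disk_usage(mount_point)
    return Usage(total, used, free)


def grow_looks_healthy(before, after, tolerance=0.05):
    """Return True if the added capacity shows up as free space.

    The tolerance allows for normal writes happening during the grow.
    """
    grown = after.total - before.total
    if grown <= 0:
        return False                      # the file system did not grow at all
    gained_free = after.free - before.free
    return gained_free >= grown * (1 - tolerance)
```

With the figures from my case (total 14TB to 15TB, used 13TB to 14TB, free unchanged) this check returns False, which is the symptom described in the original message.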
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bob
> Sent: 28 August 2007 15:31
> To: linux clustering
> Subject: Re: [Linux-cluster] gfs_grow
> On Tue, 2007-08-28 at 10:08 +0100, Ben Yarwood wrote:
> > I am using a 3 Node cluster using RHEL4U4.
> > I ran a gfs_grow yesterday on one of our filesystems but stupidly missed a process that was using
> > the same file system. The grow process hung and when I got it to exit, the file system is now
> > reporting as having grown to the larger size but no extra space has appeared. Basically my file
> > system grew from 14TB to 15TB and my usage also grew from 13TB to 14TB.
> > Does anyone know if it's possible to get this space back? I know I could probably do a gfs_fsck
> > but given the size of the file system, this would take a few days according to some previous
> > reports.
> > Thanks
> > Ben
> Hi Ben,
> The fact that there was a process using the file system shouldn't have
> been a problem and gfs_grow should have been able to work around it.
> It would have been interesting to see where gfs_grow was "hung" but it's
> too late for that now. My guess is that you killed gfs_grow before it
> was able to update the resource group index properly.
> In RHEL4U4 there is a feature to gfs_fsck to change and repair damaged
> RGs and RG indexes. Things get tricky for the code once the file system
> has been extended though, so although you probably don't want to hear
> this, you should probably make a backup of your data first, just to be
> safe.
> Running gfs_fsck will take a while on a file system that big, but it
> depends on the speed of your hardware. I'd expect it to take less than
> a day to complete. If you can't afford the down time, it might be
> helpful to know that the RG repair is done before any of the passes, so
> in theory you could probably try to use it to repair the RGs and then
> kill the gfs_fsck. Newer versions of gfs_fsck will catch <ctrl-c>
> interrupts and give you options to skip around parts, but I don't think
> that's in RHEL4U4 (I think it got into RHEL4.5).
> So I guess my recommendation is:
> 1. Make a backup of your data
> 2. Wait until most people have gone home for the day
> 3. Unmount the file system from ALL nodes.
> 4. Run gfs_fsck.
> 5. Watch the gfs_fsck output for messages about finding and fixing
> RG damage just so you know it did something.
> 6. Let gfs_fsck run overnight.
> 7. If you need the file system back and it's still running by morning,
> you could kill it manually. It would be better to let it run, but
> it shouldn't do any harm to kill it prematurely if necessary.
> 8. Remount the file system and see if df shows the correct values.
> I hope this helps.
> Bob Peterson
> Red Hat Cluster Suite