[Linux-cluster] Cluster Project FAQ - GFS tuning section

Jon Erickson erickson.jon at gmail.com
Thu Jan 11 18:48:38 UTC 2007


Robert,

> What version of the gfs_mkfs code were you running to get this?
gfs_mkfs -V produced the following results:

gfs_mkfs 6.1.6 (built May 9 2006 17:48:45)
Copyright (C) Red Hat, Inc.  2004-2005 All rights reserved

Thanks,
Jon

On 1/11/07, Robert Peterson <rpeterso at redhat.com> wrote:
> Jon Erickson wrote:
> > I have a couple of questions regarding the Cluster Project FAQ - GFS
> > tuning section (http://sources.redhat.com/cluster/faq.html#gfs_tuning).
> >
> > First:
> > -    Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems.
> > I noticed that when I used the -r 2048 switch while creating my file
> > system, it ended up creating the file system with a 256MB resource
> > group size.  When I omitted the -r flag, the file system was created
> > with a 2048MB resource group size.  Is there a problem with the -r
> > flag, and does gfs_mkfs dynamically come up with the best resource
> > group size based on your file system size?  Another thing I did that
> > ended up causing a problem was executing the gfs_mkfs command while
> > my current GFS file system was mounted.  The command completed
> > successfully, but when I went into the mount point all the old files
> > and directories still showed up.  When I attempted to remove files,
> > bad things happened: I believe I received an "invalid metadata
> > blocks" error, and the cluster went into an infinite loop trying to
> > restart the service.  I ended up fixing this problem by un-mounting
> > my file system, re-creating the GFS file system, and re-mounting.
> > This problem was caused by my own user error, but maybe there should
> > be some sort of check that determines whether the file system is
> > currently mounted.
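> >
> > (For reference, the invocation was along these lines; the lock table,
> > journal count, and device path here are placeholders rather than the
> > exact values I used:)
> >
> >   # create a GFS file system with 2048MB resource groups
> >   gfs_mkfs -p lock_dlm -t mycluster:gfs01 -j 4 -r 2048 /dev/vg0/gfs01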
> >
> > Second:
> > -    Break file systems up when huge numbers of files are involved.
> > This FAQ entry states that there is a certain amount of overhead when
> > dealing with lots (millions) of files.  What is the recommended limit
> > on the number of files in a file system?  The theoretical limit of 8
> > exabytes for a file system does not seem at all realistic if you
> > can't have millions of files in a file system.
> >
> > I'm just curious to see what everyone thinks about this.  Thanks.
> >
> >
> Hi Jon,
>
> The newer gfs_mkfs (gfs1) and mkfs.gfs2 (gfs2) in CVS HEAD will choose
> the RG size based on the size of the file system if "-r" is not
> specified, so that would explain why it used 2048 in the case where
> you didn't specify -r.  The previous versions just always assumed
> 256MB unless -r was specified.
>
> If you specified -r 2048 and it used 256 for its RG size, that would
> be a bug.  What version of the gfs_mkfs code were you running to get
> this?
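>
> If you want to double-check what size actually ended up on disk, the
> gfs_tool df output for the mounted file system should tell you (I'm
> going from memory on the exact field names, so treat this as a rough
> sketch; the mount point is a placeholder):
>
>   # shows block size, journal count, and resource group information
>   gfs_tool df /mnt/gfs01
>
> Dividing the file system size by the number of resource groups
> reported there should give roughly the RG size that mkfs used.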
>
> I agree that it would be very nice if all the userspace GFS-related
> tools could make sure the file system is not mounted anywhere before
> running.
> We even have a bugzilla from long ago about this regarding gfs_fsck:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156012
>
> It's easy enough to check whether the local node (the one running mkfs
> or fsck) has it mounted, but it's harder to figure out whether other
> nodes do, because the userland tools can't assume access to the
> cluster infrastructure the way the kernel code can.  So I guess we
> haven't thought of an elegant solution to this yet; we would almost
> need to query every node and check its cman_tool services output to
> see if it is using resources pertaining to the file system.  That
> would require some kind of socket or connection (e.g. ssh), and what
> should it do when it can't contact a node that's powered off?
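>
> The local half is trivial; just as a sketch of what I mean (the device
> path is a placeholder):
>
>   # refuse to run if this node already has the device mounted
>   if grep -q "^/dev/vg0/gfs01 " /proc/mounts; then
>       echo "device is mounted locally, aborting" >&2
>       exit 1
>   fi
>
> The cluster-wide half is the ugly part; about the best userland could
> do today is loop over the nodes with ssh and grep the cman_tool
> services output for the file system's name, which of course falls
> apart as soon as a node is unreachable (node and file system names
> below are placeholders):
>
>   for node in node1 node2 node3; do
>       ssh "$node" cman_tool services | grep -q '"gfs01"' \
>           && echo "$node appears to be using gfs01"
>   done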
>
> Regarding the number of files in a GFS file system:  I don't have any kind
> of recommendations because I haven't studied the exact performance impact
> based on the number of inodes.  It would be cool if someone could do some
> tests and see where the performance starts to degrade.
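>
> If anyone does want to try it, even something crude like the following
> would give a first-order idea of where metadata operations start to
> hurt; the mount point and file counts are placeholders, and it should
> obviously be run on a scratch file system:
>
>   cd /mnt/gfs01/scratch
>   for n in 100000 500000 1000000; do
>       mkdir set_$n
>       # time file creation and directory enumeration as the count grows
>       time seq 1 $n | sed "s|^|set_$n/f|" | xargs touch
>       time ls -f set_$n > /dev/null
>   done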
>
> The cluster team at Red Hat can work toward improving the performance
> of GFS (in fact, we are; hence the change to gfs_mkfs for the rg size),
> but many of the performance issues are already addressed with GFS2,
> and since GFS2 has been accepted into the upstream Linux kernel, I
> think it makes more sense to focus more of our efforts there.
>
> One thing I thought about doing was trying to use btrees instead of
> linked lists for some of our more critical resources, like the RGs and
> the glocks.  We'd have to figure out the impact of doing that, though;
> the extra overhead might itself hurt performance.  Just my $0.02.
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite
>


-- 
Jon



