[Linux-cluster] Locking and performance questions regarding GFS1/2

Mon Jan 14 14:18:54 UTC 2008

Hi,

On Mon, 2008-01-14 at 15:06 +0100, Mathieu Avila wrote:
> Hello GFS developers,
> 
> I have a few questions regarding how locking is performed in GFS, and
> the improvements brought by GFS2.
> 
> When I perform "ls" on the root directory of a freshly mounted GFS1,
> it takes time linear in the size of the FS. Nevertheless, it appears
> that the number of locks taken by GFS is always the same. When I
> perform this a second time, the command returns almost immediately.
> What's the problem? Was it solved in GFS2?
> 
I suspect that the time is proportional to the maximum number of entries
that the directory has ever contained at one time. This is the same
under both GFS1 and GFS2, although bugzilla #223783 is intended to
address the main part of the problem.

> When I perform "mkdir" or "touch" on either the root of a freshly
> mounted GFS1 or a subdirectory, it takes time linear in the size of
> the FS. I understand that it must determine the best RG to put the
> dinode into, therefore reading a number of RGs linear in the size of
> the FS (if I don't play with the RG size), and taking a number of
> locks also linear in the size of the FS. This is the same behaviour
> as when I perform "df", I guess.
Basically yes. It currently reads all the RGs, although in the
allocation case it would only need to read some of them to work out
where to put newly allocated blocks. That also needs to be fixed at
some stage in the future.
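
To illustrate the difference between reading every RG and reading only
some of them, here is a minimal sketch. It is not the GFS allocator:
the ResourceGroup class, both function names, and the "most free
space" policy are hypothetical and exist only to count how many RGs
each strategy has to read.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    """Hypothetical stand-in for an RG: just an index and a free count."""
    index: int
    free_blocks: int


def pick_rg_full_scan(rgs, needed):
    """Read every RG, then return the one with the most free blocks."""
    reads = 0
    best = None
    for rg in rgs:
        reads += 1  # each iteration stands for one RG read (and one lock)
        if rg.free_blocks >= needed and (
            best is None or rg.free_blocks > best.free_blocks
        ):
            best = rg
    return best, reads


def pick_rg_first_fit(rgs, needed):
    """Stop at the first RG that has enough free blocks."""
    reads = 0
    for rg in rgs:
        reads += 1
        if rg.free_blocks >= needed:
            return rg, reads
    return None, reads


# A freshly formatted filesystem: every RG is equally empty.
fresh = [ResourceGroup(i, 1000) for i in range(64)]
full_choice, full_reads = pick_rg_full_scan(fresh, 10)
fit_choice, fit_reads = pick_rg_first_fit(fresh, 10)
```

On the freshly formatted case both strategies pick the same RG, but
the full scan touches all 64 RGs while first fit touches one.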

> Is this behaviour different in GFS2? Wouldn't there be a possibility
> of better behaviour, for example taking the first free RG if we
> encounter one (which is the case when the FS has just been
> formatted)? Or maintaining fuzzy data about RGs in an inode (just as
> is done for the fuzzy statfs)? Or maybe this is useless, since it
> happens only the first time after the FS is mounted on the first
> node, and you consider that a FS is not mounted/unmounted frequently?
> However, has this been changed in GFS2?
> 
Not yet, but watch this space :-)

> When I read this:
> http://sourceware.org/cluster/faq.html#gfs_tuning
> I understand that I should increase the size of the RGs on a big FS.
> However, the code says that some data structures are loaded in memory
> for each RG that's being locked (notably 2 bitmaps). So there's a
> memory overhead when I increase the size of the RGs. I also understand
> that increasing the size of the RGs increases the risk of having 2 or
> more nodes working in the same RGs (is this right?). What is the
> maximum RG size I should be using?
> 
RGs are limited to 2^32 blocks, including the RG header. Generally you
want to use a number of RGs >> number of nodes. Provided this is true
then you can make the RGs as large as you like (up to the 2^32 block
limit) without compromising performance.
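
As a rough way to check that rule of thumb, the sketch below computes
how many RGs a given RG size produces and how many that leaves per
node. The 2^32 block limit comes from the paragraph above; the
function names, the 4 KiB block size, and the example filesystem and
node count are assumptions made for illustration.

```python
MAX_RG_BLOCKS = 2**32  # per-RG limit stated above, header included


def rg_count(fs_blocks, rg_blocks):
    """Number of RGs needed to cover fs_blocks (last RG may be short)."""
    if not 0 < rg_blocks <= MAX_RG_BLOCKS:
        raise ValueError("RG size must be between 1 and 2^32 blocks")
    return -(-fs_blocks // rg_blocks)  # ceiling division


def rgs_per_node(fs_blocks, rg_blocks, nodes):
    """Average number of RGs available to each node."""
    return rg_count(fs_blocks, rg_blocks) / nodes


# Assumed example: 10 TiB filesystem, 4 KiB blocks, 2 GiB RGs, 8 nodes.
BLOCK_SIZE = 4096
fs_blocks = 10 * 2**40 // BLOCK_SIZE
rg_blocks = 2 * 2**30 // BLOCK_SIZE
count = rg_count(fs_blocks, rg_blocks)
per_node = rgs_per_node(fs_blocks, rg_blocks, 8)
```

With these assumed numbers there are 5120 RGs, or 640 per node, so
the RGs could be made considerably larger before the number of RGs
stops being much greater than the number of nodes.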

There is not a lot to worry about so far as memory overhead goes. Either
you have fewer larger RGs or more smaller RGs for a given filesystem
size. There is a lot of overhead at the moment, but that's something we
need to address and not something that's really within the user's
control.
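
A back-of-the-envelope sketch of why the RG size makes little
difference to the total bitmap size: the bitmaps cover the same blocks
however they are divided up. The figure of 2 bitmap bits per block is
an assumption here, the function name is hypothetical, and per-RG
header structures are ignored.

```python
def total_bitmap_bytes(fs_blocks, rg_blocks, bits_per_block=2):
    """Sum of bitmap sizes over all RGs, each rounded up to whole bytes.

    bits_per_block=2 is an assumed value, not taken from the text.
    """
    full_rgs, leftover = divmod(fs_blocks, rg_blocks)
    per_full_rg = -(-rg_blocks * bits_per_block // 8)  # ceiling division
    last_rg = -(-leftover * bits_per_block // 8)
    return full_rgs * per_full_rg + last_rg


fs_blocks = 2**28  # 1 TiB with an assumed 4 KiB block size
small_rgs = total_bitmap_bytes(fs_blocks, 2**16)  # 4096 RGs
large_rgs = total_bitmap_bytes(fs_blocks, 2**20)  # 256 RGs
```

Under these assumptions both layouts need the same 64 MiB of bitmap
in total; what changes is how much of it is read in when a single RG
is locked.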

> More generally, is there a list of the hard points in GFS1 that
> you've been trying to solve with GFS2 somewhere accessible on the
> web? Also, what is the maximum size that GFS2 is actually known to
> work at (both in terms of nodes and real size)?
> 
Really only what's in our bugzilla. Just search for all the bugs with
GFS2 somewhere in the title. I don't know what the max size of any
current GFS2 system is, I'm afraid.

Steve.