[Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

David Teigland teigland at redhat.com
Mon Oct 10 14:43:20 UTC 2011


On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote:
> ----- Original Message -----
> | On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote:
> | > Hi,
> | > 
> | > This upstream patch changes the way DLM keeps track of RSBs.
> | > Before, they were in a linked list off a hash table.  Now,
> | > they're an rb_tree off the same hash table.  This speeds up
> | > DLM lookups greatly.
> | > 
> | > Today's DLM is faster than older DLMs (e.g. the one in
> | > RHEL5) for many file systems, due to the larger hash
> | > table size.  However,
> | > this rb_tree implementation scales much better.  For my
> | > 1000-directories-with-1000-files test, the patch doesn't
> | > show much of an improvement.  But when I scale the file system
> | > to 4000 directories with 4000 files (16 million files), it
> | > helps greatly. The time to do rm -fR /mnt/gfs2/* drops from
> | > 42.01 hours to 23.68 hours.
> | 
> | How many hash table buckets were you using in that test?
> | If it was the default (1024), I'd be interested to know how
> | 16k compares.
> 
> Hi,
> 
> Interestingly, on the stock 2.6.32-206.el6.x86_64 kernel
> and 16K hash buckets, the time was virtually the same as
> with my patch: 1405m46.519s (23.43 hours). So perhaps we
> should re-evaluate whether to use the rb_tree
> implementation or just increase the number of hash buckets
> as needed. I guess the question now mainly comes down to
> scaling and memory usage for all those hash buckets.

I'm still interested in possibly using an rbtree with fewer hash buckets.
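
As a rough sketch (not the exact code from the patch), a per-bucket
rb_tree keyed on the resource name could look something like the
following, using the kernel rbtree API. The dlm_rsb fields here
(res_hashnode, res_name, res_length) are simplified stand-ins for
illustration:

#include <linux/rbtree.h>
#include <linux/string.h>

/* Simplified rsb: just the rb_node and the name used as the sort key. */
struct dlm_rsb {
	struct rb_node res_hashnode;	/* links the rsb into its bucket's tree */
	int res_length;			/* length of res_name */
	char res_name[64];		/* resource name, the comparison key */
};

/* Compare a search key (name, len) against an rsb already in the tree. */
static int rsb_cmp(const char *name, int len, struct dlm_rsb *r)
{
	int minlen = (len < r->res_length) ? len : r->res_length;
	int rv = memcmp(name, r->res_name, minlen);

	if (rv)
		return rv;
	return len - r->res_length;
}

/* O(log n) search of one hash bucket's tree instead of a list walk. */
static struct dlm_rsb *search_rsb_tree(struct rb_root *tree,
				       const char *name, int len)
{
	struct rb_node *node = tree->rb_node;

	while (node) {
		struct dlm_rsb *r = rb_entry(node, struct dlm_rsb, res_hashnode);
		int rv = rsb_cmp(name, len, r);

		if (rv < 0)
			node = node->rb_left;
		else if (rv > 0)
			node = node->rb_right;
		else
			return r;
	}
	return NULL;
}

/* Insert keeps the tree ordered; rb_insert_color rebalances it. */
static void insert_rsb_tree(struct rb_root *tree, struct dlm_rsb *rsb)
{
	struct rb_node **p = &tree->rb_node, *parent = NULL;

	while (*p) {
		struct dlm_rsb *r = rb_entry(*p, struct dlm_rsb, res_hashnode);

		parent = *p;
		if (rsb_cmp(rsb->res_name, rsb->res_length, r) < 0)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&rsb->res_hashnode, parent, p);
	rb_insert_color(&rsb->res_hashnode, tree);
}

The point is that each lookup becomes O(log n) within a bucket instead
of a linear scan, which is why fewer, larger buckets could still
perform well.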

At the same time, I think the bigger problem may be why gfs2 is caching so
many locks in the first place, especially for millions of unlinked files
whose locks will never benefit you again.



