[Linux-cluster] GFS: more simple performance numbers

Daniel McNeil daniel at osdl.org
Fri Oct 29 21:40:33 UTC 2004


On Thu, 2004-10-21 at 05:06, David Teigland wrote:
>[snip]
> 
> I've found part of the problem by running the following tests.  (I have
> more modest hardware: 256MB memory, Dual Pentium III 700 MHz)
> 
> Here's the test I ran on just a single node:
> 
> > time tar xf /tmp/linux-2.6.8.1.tar;
>   time du -s linux-2.6.8.1/;
>   time du -s linux-2.6.8.1/
> 
> 1. lock_nolock
> 
> tar: real    1m6.859s
> du1: real    0m45.952s
> du2: real    0m1.934s
> 
> 2. lock_dlm, this is the only node mounted
> 
> tar: real    1m20.130s
> du1: real    0m52.483s
> du2: real    1m4.533s
> 
> Notice that the problem is not the first du, which looks normal compared to
> the nolock results, but the second du, which is definitely bad.
> 
> 3. lock_dlm, this is the only node mounted
>    * changed lock_dlm.h DROP_LOCKS_COUNT from 10,000 to 100,000
> 
> tar: real    1m16.028s
> du1: real    0m48.636s
> du2: real    0m2.332s
> 
> No more problem.
> 
> 
> Commentary:
> 
> When gfs is holding over DROP_LOCKS_COUNT locks (locally), lock_dlm tells
> gfs to "drop locks".  When gfs drops locks, it invalidates the cached data
> they protect.  du in the linux src tree requires gfs to acquire some
> 16,000 locks.  Since this exceeded 10,000, lock_dlm was having gfs toss
> the cached data from the previous du.  If we raise the limit to 100,000,
> there's no "drop locks" callback and everything remains cached.
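
The mechanism described above boils down to a counter, a threshold, and a
callback.  Here is a minimal sketch of that shape in plain C; the names
(lm_state, drop_locks_cb, fs_drop_locks) are hypothetical, and this is not
the real gfs/lock_dlm code, just an illustration of the throttle:

#include <stdio.h>

/* Illustrative only -- hypothetical names, not the real gfs/lock_dlm code.
 * The lock module counts the locks held locally and, once the count
 * crosses the drop_locks threshold, calls back into the filesystem,
 * which sheds locks and invalidates the cached data they protected. */

struct lm_state {
	unsigned int lock_count;        /* locks currently held locally */
	unsigned int drop_locks_count;  /* e.g. 10,000 or 100,000 */
	void (*drop_locks_cb)(void);    /* callback into the filesystem */
};

static void fs_drop_locks(void)
{
	/* In gfs this is where unused locks would be released and the
	 * pages/inodes they protect tossed from the cache. */
	printf("fs: dropping locks, cached data invalidated\n");
}

static void lm_lock_granted(struct lm_state *lm)
{
	lm->lock_count++;
	if (lm->lock_count > lm->drop_locks_count) {
		lm->drop_locks_cb();
		lm->lock_count = 0;     /* simplification: assume all dropped */
	}
}

int main(void)
{
	struct lm_state lm = {
		.lock_count = 0,
		.drop_locks_count = 10000,
		.drop_locks_cb = fs_drop_locks,
	};
	int i;

	/* A du over the kernel source needs roughly 16,000 locks, so a
	 * 10,000 threshold fires the callback and the cache is lost;
	 * with drop_locks_count = 100000 it never fires. */
	for (i = 0; i < 16000; i++)
		lm_lock_granted(&lm);

	return 0;
}

Raising the threshold, as in test 3 above, keeps the callback from firing
during the second du, which is why everything stays cached.
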
> 
> This "drop locks" callback is a way for the lock manager to throttle
> things when it begins reaching its own limitations.  10,000 was picked
> pretty arbitrarily because there's no good way for the dlm to know when
> it's reaching its limitations.  This is because the main limitation is
> free memory on remote nodes.
> 
> The dlm can get into a real problem if gfs holds "too many" locks.  If a
> gfs node fails, it's likely that some of the locks the dlm mastered on
> that node need to be remastered on remaining nodes.  Those remaining nodes
> may not have enough memory to remaster all the locks -- the dlm recovery
> process eats up all the memory and hangs.
> 
> Part of a solution would be to have gfs free a bunch of locks at this
> point, but that's not a near-term option.  So, we're left with the
> tradeoff:  favoring performance and increasing the risk of too little memory
> for recovery, or vice versa.
> 
> Given my machines and the test I was running, 10,000 solved the recovery
> problem.  256MB is obviously behind the times, which makes a default of
> 10,000 probably too low for machines today.  I'll increase the constant and
> make it configurable through /proc.
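
Once the constant is exported through /proc, tuning it is just a read and a
write of the proc file (a cat and an echo from the shell).  A small C sketch
of the same thing, assuming a hypothetical path -- the
/proc/cluster/lock_dlm/drop_count name below is a guess, not the actual
location:

#include <stdio.h>

/* Hypothetical path -- where the tunable actually appears depends on
 * how lock_dlm exports it; adjust to match your kernel. */
#define DROP_COUNT_PATH "/proc/cluster/lock_dlm/drop_count"

int main(void)
{
	unsigned long count;
	FILE *f;

	/* Read the current threshold. */
	f = fopen(DROP_COUNT_PATH, "r");
	if (!f) {
		perror(DROP_COUNT_PATH);
		return 1;
	}
	if (fscanf(f, "%lu", &count) == 1)
		printf("current drop_count: %lu\n", count);
	fclose(f);

	/* Raise it: more caching, less memory headroom for dlm recovery. */
	f = fopen(DROP_COUNT_PATH, "w");
	if (!f) {
		perror(DROP_COUNT_PATH);
		return 1;
	}
	fprintf(f, "%lu\n", 100000UL);
	fclose(f);

	return 0;
}
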

David,

Can you explain what gfs does with the callback?
Does it drop the locks for all gfs inodes or just the
ones that are not actively being used?
Does gfs still cache the inode without holding the dlm lock?
How much memory does a dlm lock take?  10,000 seems very small
for machines today.

Would it make sense to limit the number of gfs inodes, which would,
in turn, limit the number of dlm locks?

It seems to me the number of inodes and number of dlm locks
should scale together.

Thanks,

Daniel
