[Linux-cluster] Slowness above 500 RRDs

David Teigland teigland at redhat.com
Mon May 21 14:50:58 UTC 2007


On Mon, May 21, 2007 at 12:55:13PM +0200, Ferenc Wagner wrote:
> Hi,
> 
> I installed the new OpenAIS infrastructure (cluster-2.00.00) on my
> cluster.  Here are the results of the usual tests:
> 
> == Using PLOCKS ==
> filecount=500
>   iteration=0 elapsed time=10.294059 s
>   iteration=1 elapsed time=9.416584 s
>   iteration=2 elapsed time=10.30608 s
>   iteration=3 elapsed time=10.294692 s
>   iteration=4 elapsed time=10.280316 s
> total elapsed time=50.591731 s
> filecount=501
>   iteration=0 elapsed time=11.170378 s
>   iteration=1 elapsed time=10.312731 s
>   iteration=2 elapsed time=10.308767 s
>   iteration=3 elapsed time=10.308905 s
>   iteration=4 elapsed time=10.308703 s
> total elapsed time=52.409484 s
> 
> == Using FLOCKS ==
> filecount=500
>   iteration=0 elapsed time=5.311825 s
>   iteration=1 elapsed time=7.030903 s
>   iteration=2 elapsed time=5.23619 s
>   iteration=3 elapsed time=5.229282 s
>   iteration=4 elapsed time=5.235798 s
> total elapsed time=28.043998 s
> filecount=501
>   iteration=0 elapsed time=5.03941 s
>   iteration=1 elapsed time=5.278866 s
>   iteration=2 elapsed time=4.93929 s
>   iteration=3 elapsed time=5.271063 s
>   iteration=4 elapsed time=5.133244 s
> total elapsed time=25.661873 s
> 
> The magic limit of 500 disappeared, and flocks are indeed twice as
> fast as plocks, which you probably expected.  However, in the first
> test the old infrastructure was much, much faster, doing an iteration
> in about 0.2 s, thanks to some caching effect.  This is the kind of
> performance I'm after.  How can I tune the new infrastructure, e.g.
> adjust cache sizes as in the old one?  Or should I look for something
> else?

The new code has much better caching in the dlm, which benefits
flocks; look at these flock numbers I sent before:

# fplockperf 
flock: filecount=100 iteration=0 elapsed time=0.098 s
flock: filecount=100 iteration=1 elapsed time=0.007 s
flock: filecount=100 iteration=2 elapsed time=0.007 s
flock: filecount=100 iteration=3 elapsed time=0.008 s
flock: filecount=100 iteration=4 elapsed time=0.007 s
total elapsed time=0.129 s
flock: filecount=500 iteration=0 elapsed time=0.483 s
flock: filecount=500 iteration=1 elapsed time=0.037 s
flock: filecount=500 iteration=2 elapsed time=0.039 s
flock: filecount=500 iteration=3 elapsed time=0.037 s
flock: filecount=500 iteration=4 elapsed time=0.037 s
total elapsed time=0.634 s
flock: filecount=1000 iteration=0 elapsed time=0.523 s
flock: filecount=1000 iteration=1 elapsed time=0.077 s
flock: filecount=1000 iteration=2 elapsed time=0.076 s
flock: filecount=1000 iteration=3 elapsed time=0.076 s
flock: filecount=1000 iteration=4 elapsed time=0.076 s
total elapsed time=0.830 s
flock: filecount=2000 iteration=0 elapsed time=1.064 s
flock: filecount=2000 iteration=1 elapsed time=0.151 s
flock: filecount=2000 iteration=2 elapsed time=0.151 s
flock: filecount=2000 iteration=3 elapsed time=0.146 s
flock: filecount=2000 iteration=4 elapsed time=0.147 s
total elapsed time=1.661 s
flock: filecount=5000 iteration=0 elapsed time=3.505 s
flock: filecount=5000 iteration=1 elapsed time=0.405 s
flock: filecount=5000 iteration=2 elapsed time=0.407 s
flock: filecount=5000 iteration=3 elapsed time=0.405 s
flock: filecount=5000 iteration=4 elapsed time=0.405 s
total elapsed time=5.128 s

This is testing raw flock performance.  The dlm locks for normal file
operations should also be cached and locally mastered, so I'm not sure
what's causing the long times.  Make sure that drop_count is zero
again; it now lives in sysfs:
  echo 0 > /sys/fs/gfs/<foo>:<bar>/lock_module/drop_count
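
If there are several gfs mounts, a quick loop covers them all (just a
sketch; the <foo>:<bar> directory names are per-filesystem, and this
assumes the attribute reads back the value you wrote):

  # set drop_count to 0 on every mounted gfs, then read it back
  for d in /sys/fs/gfs/*/lock_module; do
      echo 0 > "$d/drop_count"
      cat "$d/drop_count"
  done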

Also, mount debugfs so we can check some stuff later:
  mount -t debugfs none /sys/kernel/debug
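(If you're not sure whether it's already mounted, something like
"grep debugfs /proc/mounts" will tell you.)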

Then run some tests:
- mount on nodeA
- run the test on nodeA
- count locks on nodeA
  (cat /sys/kernel/debug/dlm/<bar> | grep Master | wc -l)
- mount on nodeB (don't do anything on this node)
- run the test again on nodeA
- count locks on nodeA and nodeB (see above)
- mount on nodeC (don't do anything on nodes B or C)
- run the test again on nodeA
- count locks on nodes A, B and C (see above)
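
For example, the counting step could be run on all nodes in one go
with something like this (only a sketch: it assumes ssh access to the
other nodes, debugfs mounted at /sys/kernel/debug everywhere, and
<bar> is the same placeholder fs name as above):

  # how many dlm locks for <bar> each node masters
  for n in nodeA nodeB nodeC; do
      echo -n "$n: "
      ssh "$n" "grep -c Master /sys/kernel/debug/dlm/<bar>"
  done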

We're basically trying to produce the best-case performance from one
node, nodeA.  That means making sure that nodeA masters all the locks
and gets maximum caching.  That's why it's important not to do
anything at all that accesses the fs on nodes B or C, and not to do
any extra mounts/unmounts.


Plocks will be much slower and are probably not that interesting to
test, but I'm curious whether you added the "-l0" option to
gfs_controld.  That option turns off the code that intentionally
limits the rate of plocks.  Here are the old results again (with a
note after them on checking the daemon's options):

without -l0:

plock: filecount=500 iteration=0 elapsed time=10.519 s
plock: filecount=500 iteration=1 elapsed time=10.350 s
plock: filecount=500 iteration=2 elapsed time=10.457 s
plock: filecount=500 iteration=3 elapsed time=10.178 s
plock: filecount=500 iteration=4 elapsed time=10.164 s

with -l0:

plock: filecount=500 iteration=0 elapsed time=3.010 s
plock: filecount=500 iteration=1 elapsed time=3.008 s
plock: filecount=500 iteration=2 elapsed time=2.993 s
plock: filecount=500 iteration=3 elapsed time=3.013 s
plock: filecount=500 iteration=4 elapsed time=3.006 s
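
If you do re-test plocks, it's worth confirming that gfs_controld is
actually running with that option; a standard procps invocation such
as the following shows the daemon's command line (any other way of
checking the running arguments is just as good):

  ps -o args= -C gfs_controld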

Dave



