[Linux-cluster] Slowness above 500 RRDs
Ferenc Wagner
wferi at niif.hu
Tue Jun 12 14:01:04 UTC 2007
Hi David,
Sorry if all that follows is misguided nonsense. I'm eager to learn...
David Teigland <teigland at redhat.com> writes:
> The new code has much better caching in the dlm which will benefit flocks,
> look at these flock numbers I sent before: [...]
>
> This is testing raw flock performance. The dlm locks for normal file
> operations should be cached and locally mastered also, so I'm not sure
> what's causing the long times. Make sure that drop_count is zero again,
> now it's in sysfs:
> echo 0 > /sys/fs/gfs/<foo>:<bar>/lock_module/drop_count
>
> Also, mount debugfs so we can check some stuff later:
> mount -t debugfs none /sys/kernel/debug
>
> Then run some tests:
> - mount on nodeA
> - run the test on nodeA
> - count locks on nodeA
> (cat /sys/kernel/debug/dlm/<bar> | grep Master | wc -l)
> - mount on nodeB (don't do anything on this node)
> - run the test again on nodeA
> - count locks on nodeA and nodeB (see above)
> - mount on nodeC (don't do anything on nodes B or C)
> - run the test again on nodeA
> - count locks on nodes A, B and C (see above)
>
> We're basically trying to produce the best-case performance from one node,
> nodeA. That means making sure that nodeA is mastering all locks and doing
> maximum caching. That's why it's important that we not do anything at all
> that accesses the fs on nodes B or C, or do any extra mounts/unmounts.
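The lock-counting step in the procedure above could be scripted along these lines. This is only a sketch: it assumes the debugfs file is plain text in which each mastered resource produces a line containing the word "Master" (which is all the quoted grep relies on), and the function name is made up for illustration.

```python
def count_master_locks(path):
    """Count lines containing 'Master', like `grep Master | wc -l`."""
    # errors="replace" keeps the count going even if the file holds
    # bytes that are not valid text in the default encoding.
    with open(path, errors="replace") as f:
        return sum(1 for line in f if "Master" in line)
```

On a node it would be called with the path from the quoted instructions, /sys/kernel/debug/dlm/<bar>, with <bar> replaced by the actual lockspace name.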
I ran all the above tests and composed this reply a long time ago.
Now, getting back to it after all that time, I decided to satisfy your
curiosity. Behold...
> Plocks will be much slower and are probably not interesting to test, but
> I'm curious if you added the "-l0" option to gfs_controld? That option
> turns off the code that intentionally limits the rate of plocks. See the
> old results again: [...]
Now, that switch makes ALL the difference. With a single node
switched on, I get results like this (with abbreviated strace -c
output appended):
without -l0:
filecount=500
iteration=0 elapsed time=10.444446 s
iteration=1 elapsed time=9.693618 s
iteration=2 elapsed time=10.520073 s
iteration=3 elapsed time=10.521504 s
iteration=4 elapsed time=10.520183 s
total elapsed time=51.699824 s
Process 5265 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.27    0.048525           6      7551           read
  6.73    0.003923           2      2502           fcntl64
  4.47    0.002606           1      2528           close
  3.09    0.001801           1      2551        23 open
  0.74    0.000432           0      2507           write
  0.71    0.000415           0      5033           mmap2
  0.41    0.000237           0     12528         3 _llseek
  0.31    0.000178           0      5001           munmap
  0.18    0.000107           0      5015           fstat64
  0.08    0.000049           0      2506           gettimeofday
  0.00    0.000000           0        16        14 ioctl
  0.00    0.000000           0       202       182 stat64
------ ----------- ----------- --------- --------- ----------------
100.00    0.058273                 47974       229 total
with -l0:
filecount=500
iteration=0 elapsed time=5.966146 s
iteration=1 elapsed time=0.582058 s
iteration=2 elapsed time=0.528272 s
iteration=3 elapsed time=0.936438 s
iteration=4 elapsed time=0.528147 s
total elapsed time=8.541061 s
Process 10030 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.17    0.016527           2      7551           read
 21.49    0.006213           2      2528           close
  8.16    0.002358           1      2502           fcntl64
  6.59    0.001904           1      2551        23 open
  2.21    0.000638           0      2507           write
  1.46    0.000421           0      5033           mmap2
  0.86    0.000249         249         1           execve
  0.73    0.000212           0      5001           munmap
  0.65    0.000187           0     12528         3 _llseek
  0.57    0.000165           0      5015           fstat64
  0.12    0.000034           0      2506           gettimeofday
  0.00    0.000000           0        16        14 ioctl
  0.00    0.000000           0       202       182 stat64
------ ----------- ----------- --------- --------- ----------------
100.00    0.028908                 47974       229 total
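To put a number on the difference, the two runs can be compared with a few lines of arithmetic. The timings below are copied from the output above. Treating iteration 0 as a warm-up and excluding it from the "warm" average is an assumption made here for illustration, not part of the original test.

```python
# Per-iteration elapsed times in seconds, copied from the two runs above.
without_l0 = [10.444446, 9.693618, 10.520073, 10.521504, 10.520183]
with_l0 = [5.966146, 0.582058, 0.528272, 0.936438, 0.528147]
FILECOUNT = 500


def summarize(times):
    """Return (total, mean of iterations 1..n, files per second when warm)."""
    total = sum(times)
    warm = times[1:]  # assumption: iteration 0 is a cold start
    warm_mean = sum(warm) / len(warm)
    return total, warm_mean, FILECOUNT / warm_mean


total_a, warm_a, rate_a = summarize(without_l0)
total_b, warm_b, rate_b = summarize(with_l0)

print(f"without -l0: total={total_a:.6f} s, warm mean={warm_a:.3f} s, {rate_a:.0f} files/s")
print(f"with -l0:    total={total_b:.6f} s, warm mean={warm_b:.3f} s, {rate_b:.0f} files/s")
print(f"speedup: overall {total_a / total_b:.1f}x, warm {warm_a / warm_b:.1f}x")
```

The totals reproduce the "total elapsed time" lines, and the ratio comes out at roughly 6x overall and roughly 16x once the first iteration is left out.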
Looks like the bottleneck isn't the explicit locking (be it plock or
flock), but something else, like the built-in GFS locking.
A similarly dramatic speedup can be achieved (again with a single node
switched on) by the lockproto=lock_nolock mount option, even if used
together with ignore_local_fs. If I understand it right, this
combination leaves the cluster-wide [pf]locks alone and just eliminates
the GFS internal locking, which guards the internal consistency of the
file system (please correct me if I'm wrong).
What's strange is that gfs_controld -l0 seems like a perfectly safe
invocation (what's the catch, i.e. why was the artificial limit
introduced?), yet it achieves almost the same speedup as using
lock_nolock, which would be a disaster with more than one node
mounting the fs. (Also, this trick scales pretty well to 4000 files.)
Again, the above tests were done with a single node switched on, and
I'm not sure whether the results carry over to the real cluster setup;
I will test it soon. I didn't touch drop_count either: everything was
left as default, except for the mount options and the -l option.
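The test program itself is not included in this message. For anyone wanting to repeat the comparison, a rough stand-in could look like the sketch below. Everything in it is an assumption: the file names, the 4096-byte file size, the function names, and the lock/read/rewrite/unlock sequence are only guessed from the fcntl64, read and write calls in the strace summaries, and it does not reproduce the mmap2 or _llseek pattern of the real workload.

```python
import fcntl
import os
import time


def run_iteration(paths):
    """Lock, read, rewrite and unlock each file once; return elapsed seconds."""
    start = time.monotonic()
    for path in paths:
        with open(path, "r+b") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)  # POSIX lock (plock) on the whole file
            data = f.read()
            f.seek(0)
            f.write(data)  # rewrite in place, a stand-in for an RRD update
            f.flush()
            fcntl.lockf(f, fcntl.LOCK_UN)
    return time.monotonic() - start


def benchmark(directory, filecount=500, iterations=5):
    """Create `filecount` small files, then time `iterations` passes over them."""
    paths = []
    for i in range(filecount):
        path = os.path.join(directory, f"file{i:04d}.dat")
        with open(path, "wb") as f:
            f.write(b"\0" * 4096)  # hypothetical file size
        paths.append(path)
    times = [run_iteration(paths) for _ in range(iterations)]
    for i, t in enumerate(times):
        print(f"iteration={i} elapsed time={t:.6f} s")
    print(f"total elapsed time={sum(times):.6f} s")
    return times
```

Calling benchmark() with a directory on the GFS mount, once with and once without -l0 on gfs_controld, would give output in the same shape as the runs above.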
Also, I can send the results of the scenario suggested by you, if it's
still relevant. In short: the locks are always mastered on node A
only, but the performance is poor nevertheless.
--
Regards,
Feri.