[Linux-cluster] Slowness above 500 RRDs

Ferenc Wagner wferi at niif.hu
Wed Apr 25 15:47:53 UTC 2007


David Teigland <teigland at redhat.com> writes:

> On Mon, Apr 23, 2007 at 04:17:18PM -0500, David Teigland wrote:
>> > Also, what's that new infrastructure?  Do you mean GFS2?  I read it
>> > was not production quality yet, so I didn't mean to try it.  But again
>> > you may have got something else in your head...
>> 
>> GFS1 and GFS2 both run on the new openais-based cluster infrastructure.
>> (in the cluster-2.00.00 release, and the RHEL5 and HEAD cvs branches).
>
> I've attached a little flock/plock performance test that emulates what
> you're doing; could you run it on your cluster and send the results?

Here you go.  This is with three nodes mounting the FS, one running
the test.  /proc/cluster/lock_dlm/drop_count was set to 0 before
the mounts.
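
The source of fplockperf is not included in this message, so what follows is only a rough Python sketch of the kind of loop its output suggests: for each file count, make several passes over the files, taking and releasing one lock per file, and time each pass. The function names, the file-naming pattern, the use of exclusive locks, and the default counts are all assumptions made for illustration and are not taken from David's program.

```python
import fcntl
import os
import time


def lock_pass(directory, filecount, kind):
    """Open, lock, unlock and close each of `filecount` files once.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.monotonic()
    for i in range(filecount):
        # Hypothetical naming scheme; the real test's file names may differ.
        path = os.path.join(directory, "file%010d" % i)
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            if kind == "flock":
                fcntl.flock(fd, fcntl.LOCK_EX)  # BSD-style whole-file lock
                fcntl.flock(fd, fcntl.LOCK_UN)
            else:
                fcntl.lockf(fd, fcntl.LOCK_EX)  # POSIX (fcntl) lock
                fcntl.lockf(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)
    return time.monotonic() - start


def run(directory, filecounts=(100, 500), iterations=5):
    """Time `iterations` passes per lock type and file count."""
    results = {}
    for kind in ("flock", "plock"):
        for n in filecounts:
            times = [lock_pass(directory, n, kind) for _ in range(iterations)]
            for it, t in enumerate(times):
                print("%s: filecount=%d iteration=%d elapsed time=%.3f s"
                      % (kind, n, it, t))
            print("total elapsed time=%.3f s" % sum(times))
            results[(kind, n)] = times
    return results
```

Calling run() on a directory inside the GFS mount would print lines in the same shape as the output below, though the absolute numbers from an interpreted sketch need not match those of the real test.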

wferi at rs20:/mnt/rrdtest/david/test$ ../fplockperf 
flock: filecount=100 iteration=0 elapsed time=1.163 s
flock: filecount=100 iteration=1 elapsed time=1.200 s
flock: filecount=100 iteration=2 elapsed time=1.200 s
flock: filecount=100 iteration=3 elapsed time=1.200 s
flock: filecount=100 iteration=4 elapsed time=1.200 s
total elapsed time=5.963 s
flock: filecount=500 iteration=0 elapsed time=6.997 s
flock: filecount=500 iteration=1 elapsed time=6.998 s
flock: filecount=500 iteration=2 elapsed time=7.235 s
flock: filecount=500 iteration=3 elapsed time=6.999 s
flock: filecount=500 iteration=4 elapsed time=6.999 s
total elapsed time=35.228 s
flock: filecount=1000 iteration=0 elapsed time=13.797 s
flock: filecount=1000 iteration=1 elapsed time=14.006 s
flock: filecount=1000 iteration=2 elapsed time=13.798 s
flock: filecount=1000 iteration=3 elapsed time=14.010 s
flock: filecount=1000 iteration=4 elapsed time=13.798 s
total elapsed time=69.408 s
flock: filecount=2000 iteration=0 elapsed time=28.888 s
flock: filecount=2000 iteration=1 elapsed time=28.883 s
flock: filecount=2000 iteration=2 elapsed time=28.675 s
flock: filecount=2000 iteration=3 elapsed time=28.879 s
flock: filecount=2000 iteration=4 elapsed time=28.879 s
total elapsed time=144.205 s
flock: filecount=5000 iteration=0 elapsed time=71.272 s
flock: filecount=5000 iteration=1 elapsed time=68.620 s
flock: filecount=5000 iteration=2 elapsed time=68.668 s
flock: filecount=5000 iteration=3 elapsed time=68.664 s
flock: filecount=5000 iteration=4 elapsed time=68.676 s
total elapsed time=345.901 s
plock: filecount=100 iteration=0 elapsed time=1.515 s
plock: filecount=100 iteration=1 elapsed time=1.480 s
plock: filecount=100 iteration=2 elapsed time=1.480 s
plock: filecount=100 iteration=3 elapsed time=1.480 s
plock: filecount=100 iteration=4 elapsed time=1.480 s
total elapsed time=7.434 s
plock: filecount=500 iteration=0 elapsed time=6.156 s
plock: filecount=500 iteration=1 elapsed time=6.011 s
plock: filecount=500 iteration=2 elapsed time=6.279 s
plock: filecount=500 iteration=3 elapsed time=6.043 s
plock: filecount=500 iteration=4 elapsed time=6.039 s
total elapsed time=30.528 s
plock: filecount=1000 iteration=0 elapsed time=12.382 s
plock: filecount=1000 iteration=1 elapsed time=12.709 s
plock: filecount=1000 iteration=2 elapsed time=12.318 s
plock: filecount=1000 iteration=3 elapsed time=12.471 s
plock: filecount=1000 iteration=4 elapsed time=12.609 s
total elapsed time=62.488 s
plock: filecount=2000 iteration=0 elapsed time=26.784 s
plock: filecount=2000 iteration=1 elapsed time=30.671 s
plock: filecount=2000 iteration=2 elapsed time=30.563 s
plock: filecount=2000 iteration=3 elapsed time=30.407 s
plock: filecount=2000 iteration=4 elapsed time=30.443 s
total elapsed time=148.867 s
plock: filecount=5000 iteration=0 elapsed time=81.813 s
plock: filecount=5000 iteration=1 elapsed time=79.037 s
plock: filecount=5000 iteration=2 elapsed time=77.407 s
plock: filecount=5000 iteration=3 elapsed time=77.771 s
plock: filecount=5000 iteration=4 elapsed time=78.243 s
total elapsed time=394.271 s

=======================================================

After umount on all nodes, rmmod gfs lock_dlm, mount on single
node, rm file000000*:

wferi at rs20:/mnt/rrdtest/david/test$ ../fplockperf 
flock: filecount=100 iteration=0 elapsed time=0.008 s
flock: filecount=100 iteration=1 elapsed time=0.008 s
flock: filecount=100 iteration=2 elapsed time=0.008 s
flock: filecount=100 iteration=3 elapsed time=0.008 s
flock: filecount=100 iteration=4 elapsed time=0.008 s
total elapsed time=0.039 s
flock: filecount=500 iteration=0 elapsed time=0.039 s
flock: filecount=500 iteration=1 elapsed time=0.038 s
flock: filecount=500 iteration=2 elapsed time=0.038 s
flock: filecount=500 iteration=3 elapsed time=0.038 s
flock: filecount=500 iteration=4 elapsed time=0.038 s
total elapsed time=0.192 s
flock: filecount=1000 iteration=0 elapsed time=0.077 s
flock: filecount=1000 iteration=1 elapsed time=0.076 s
flock: filecount=1000 iteration=2 elapsed time=0.076 s
flock: filecount=1000 iteration=3 elapsed time=0.077 s
flock: filecount=1000 iteration=4 elapsed time=0.077 s
total elapsed time=0.383 s
flock: filecount=2000 iteration=0 elapsed time=0.153 s
flock: filecount=2000 iteration=1 elapsed time=0.153 s
flock: filecount=2000 iteration=2 elapsed time=0.154 s
flock: filecount=2000 iteration=3 elapsed time=0.153 s
flock: filecount=2000 iteration=4 elapsed time=0.150 s
total elapsed time=0.763 s
flock: filecount=5000 iteration=0 elapsed time=0.377 s
flock: filecount=5000 iteration=1 elapsed time=0.373 s
flock: filecount=5000 iteration=2 elapsed time=0.378 s
flock: filecount=5000 iteration=3 elapsed time=0.381 s
flock: filecount=5000 iteration=4 elapsed time=0.385 s
total elapsed time=1.895 s
plock: filecount=100 iteration=0 elapsed time=0.017 s
plock: filecount=100 iteration=1 elapsed time=0.015 s
plock: filecount=100 iteration=2 elapsed time=0.015 s
plock: filecount=100 iteration=3 elapsed time=0.016 s
plock: filecount=100 iteration=4 elapsed time=0.015 s
total elapsed time=0.079 s
plock: filecount=500 iteration=0 elapsed time=0.089 s
plock: filecount=500 iteration=1 elapsed time=0.081 s
plock: filecount=500 iteration=2 elapsed time=0.081 s
plock: filecount=500 iteration=3 elapsed time=0.080 s
plock: filecount=500 iteration=4 elapsed time=0.081 s
total elapsed time=0.412 s
plock: filecount=1000 iteration=0 elapsed time=0.182 s
plock: filecount=1000 iteration=1 elapsed time=0.179 s
plock: filecount=1000 iteration=2 elapsed time=0.178 s
plock: filecount=1000 iteration=3 elapsed time=0.178 s
plock: filecount=1000 iteration=4 elapsed time=0.177 s
total elapsed time=0.894 s
plock: filecount=2000 iteration=0 elapsed time=0.437 s
plock: filecount=2000 iteration=1 elapsed time=0.446 s
plock: filecount=2000 iteration=2 elapsed time=0.455 s
plock: filecount=2000 iteration=3 elapsed time=0.457 s
plock: filecount=2000 iteration=4 elapsed time=0.460 s
total elapsed time=2.255 s
plock: filecount=5000 iteration=0 elapsed time=1.136 s
plock: filecount=5000 iteration=1 elapsed time=1.159 s
plock: filecount=5000 iteration=2 elapsed time=1.151 s
plock: filecount=5000 iteration=3 elapsed time=1.153 s
plock: filecount=5000 iteration=4 elapsed time=1.171 s
total elapsed time=5.770 s

======================================================

After mount on another node, rm file000000*:

wferi at rs20:/mnt/rrdtest/david/test$ ../fplockperf 
flock: filecount=100 iteration=0 elapsed time=0.013 s
flock: filecount=100 iteration=1 elapsed time=0.013 s
flock: filecount=100 iteration=2 elapsed time=0.013 s
flock: filecount=100 iteration=3 elapsed time=0.013 s
flock: filecount=100 iteration=4 elapsed time=0.013 s
total elapsed time=0.066 s
flock: filecount=500 iteration=0 elapsed time=0.067 s
flock: filecount=500 iteration=1 elapsed time=0.067 s
flock: filecount=500 iteration=2 elapsed time=0.067 s
flock: filecount=500 iteration=3 elapsed time=0.067 s
flock: filecount=500 iteration=4 elapsed time=0.067 s
total elapsed time=0.333 s
flock: filecount=1000 iteration=0 elapsed time=0.134 s
flock: filecount=1000 iteration=1 elapsed time=0.132 s
flock: filecount=1000 iteration=2 elapsed time=0.137 s
flock: filecount=1000 iteration=3 elapsed time=0.134 s
flock: filecount=1000 iteration=4 elapsed time=0.133 s
total elapsed time=0.670 s
flock: filecount=2000 iteration=0 elapsed time=0.274 s
flock: filecount=2000 iteration=1 elapsed time=0.282 s
flock: filecount=2000 iteration=2 elapsed time=0.284 s
flock: filecount=2000 iteration=3 elapsed time=0.285 s
flock: filecount=2000 iteration=4 elapsed time=0.284 s
total elapsed time=1.408 s
flock: filecount=5000 iteration=0 elapsed time=0.716 s
flock: filecount=5000 iteration=1 elapsed time=0.716 s
flock: filecount=5000 iteration=2 elapsed time=0.694 s
flock: filecount=5000 iteration=3 elapsed time=0.705 s
flock: filecount=5000 iteration=4 elapsed time=0.839 s
total elapsed time=3.671 s
plock: filecount=100 iteration=0 elapsed time=0.029 s
plock: filecount=100 iteration=1 elapsed time=0.021 s
plock: filecount=100 iteration=2 elapsed time=0.021 s
plock: filecount=100 iteration=3 elapsed time=0.021 s
plock: filecount=100 iteration=4 elapsed time=0.021 s
total elapsed time=0.114 s
plock: filecount=500 iteration=0 elapsed time=0.144 s
plock: filecount=500 iteration=1 elapsed time=0.114 s
plock: filecount=500 iteration=2 elapsed time=0.111 s
plock: filecount=500 iteration=3 elapsed time=0.111 s
plock: filecount=500 iteration=4 elapsed time=0.111 s
total elapsed time=0.591 s
plock: filecount=1000 iteration=0 elapsed time=0.271 s
plock: filecount=1000 iteration=1 elapsed time=0.235 s
plock: filecount=1000 iteration=2 elapsed time=0.236 s
plock: filecount=1000 iteration=3 elapsed time=0.235 s
plock: filecount=1000 iteration=4 elapsed time=0.234 s
total elapsed time=1.212 s
plock: filecount=2000 iteration=0 elapsed time=0.657 s
plock: filecount=2000 iteration=1 elapsed time=0.843 s
plock: filecount=2000 iteration=2 elapsed time=0.794 s
plock: filecount=2000 iteration=3 elapsed time=0.701 s
plock: filecount=2000 iteration=4 elapsed time=0.794 s
total elapsed time=3.789 s
plock: filecount=5000 iteration=0 elapsed time=1.941 s
plock: filecount=5000 iteration=1 elapsed time=1.946 s
plock: filecount=5000 iteration=2 elapsed time=2.062 s
plock: filecount=5000 iteration=3 elapsed time=1.972 s
plock: filecount=5000 iteration=4 elapsed time=1.964 s
total elapsed time=9.886 s

======================================================

After mount on the third node, rm file000000* again:

wferi at rs20:/mnt/rrdtest/david/test$ ../fplockperf 
flock: filecount=100 iteration=0 elapsed time=1.273 s
flock: filecount=100 iteration=1 elapsed time=1.360 s
flock: filecount=100 iteration=2 elapsed time=1.360 s
flock: filecount=100 iteration=3 elapsed time=1.360 s
flock: filecount=100 iteration=4 elapsed time=1.360 s
total elapsed time=6.712 s
flock: filecount=500 iteration=0 elapsed time=6.479 s
flock: filecount=500 iteration=1 elapsed time=6.479 s
flock: filecount=500 iteration=2 elapsed time=6.671 s
flock: filecount=500 iteration=3 elapsed time=6.479 s
flock: filecount=500 iteration=4 elapsed time=6.479 s
total elapsed time=32.586 s
flock: filecount=1000 iteration=0 elapsed time=13.546 s
flock: filecount=1000 iteration=1 elapsed time=13.790 s
flock: filecount=1000 iteration=2 elapsed time=13.598 s
flock: filecount=1000 iteration=3 elapsed time=13.598 s
flock: filecount=1000 iteration=4 elapsed time=13.790 s
total elapsed time=68.321 s
flock: filecount=2000 iteration=0 elapsed time=27.852 s
flock: filecount=2000 iteration=1 elapsed time=27.906 s
flock: filecount=2000 iteration=2 elapsed time=28.159 s
flock: filecount=2000 iteration=3 elapsed time=28.147 s
flock: filecount=2000 iteration=4 elapsed time=28.099 s
total elapsed time=140.164 s
flock: filecount=5000 iteration=0 elapsed time=66.564 s
flock: filecount=5000 iteration=1 elapsed time=66.401 s
flock: filecount=5000 iteration=2 elapsed time=66.217 s
flock: filecount=5000 iteration=3 elapsed time=66.401 s
flock: filecount=5000 iteration=4 elapsed time=66.413 s
total elapsed time=331.996 s
plock: filecount=100 iteration=0 elapsed time=1.520 s
plock: filecount=100 iteration=1 elapsed time=1.520 s
plock: filecount=100 iteration=2 elapsed time=1.520 s
plock: filecount=100 iteration=3 elapsed time=1.520 s
plock: filecount=100 iteration=4 elapsed time=1.520 s
total elapsed time=7.599 s
plock: filecount=500 iteration=0 elapsed time=6.911 s
plock: filecount=500 iteration=1 elapsed time=6.615 s
plock: filecount=500 iteration=2 elapsed time=6.491 s
plock: filecount=500 iteration=3 elapsed time=6.519 s
plock: filecount=500 iteration=4 elapsed time=6.519 s
total elapsed time=33.055 s
plock: filecount=1000 iteration=0 elapsed time=13.906 s
plock: filecount=1000 iteration=1 elapsed time=13.458 s
plock: filecount=1000 iteration=2 elapsed time=13.554 s
plock: filecount=1000 iteration=3 elapsed time=13.774 s
plock: filecount=1000 iteration=4 elapsed time=13.405 s
total elapsed time=68.096 s
plock: filecount=2000 iteration=0 elapsed time=31.127 s
plock: filecount=2000 iteration=1 elapsed time=31.475 s
plock: filecount=2000 iteration=2 elapsed time=31.883 s
plock: filecount=2000 iteration=3 elapsed time=31.799 s
plock: filecount=2000 iteration=4 elapsed time=32.067 s
total elapsed time=158.349 s
plock: filecount=5000 iteration=0 elapsed time=86.233 s
plock: filecount=5000 iteration=1 elapsed time=79.686 s
plock: filecount=5000 iteration=2 elapsed time=78.507 s
plock: filecount=5000 iteration=3 elapsed time=80.542 s
plock: filecount=5000 iteration=4 elapsed time=79.151 s
total elapsed time=404.119 s

That is, we are basically back at the first result.  The third 
node mounting the FS causes much more slowdown than I expected.
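
To put the jump in per-file terms, the short Python calculation below divides the iteration-0 flock timings at filecount=5000 from the last three runs by the file count. The labels and the helper name are only for illustration; the timings are copied from the output above.

```python
# Iteration-0 flock timings (seconds) at filecount=5000, copied from the
# single-node, two-node and three-node runs above.
runs = {
    "one node mounted": 0.377,
    "two nodes mounted": 0.716,
    "three nodes mounted": 66.564,
}
FILECOUNT = 5000


def per_file_ms(elapsed_s, filecount=FILECOUNT):
    """Average time per file for one pass, in milliseconds."""
    return elapsed_s * 1000.0 / filecount


for label, elapsed in runs.items():
    print("%-20s %8.3f ms per file" % (label, per_file_ms(elapsed)))

# Ratio between the three-node and two-node passes.
slowdown = runs["three nodes mounted"] / runs["two nodes mounted"]
print("two -> three nodes: about %.0fx slower" % slowdown)
```

This works out to roughly 0.08 ms per file with one node, 0.14 ms with two, and 13 ms with three, so adding the third node makes a pass about 93 times slower than with two.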

It's possible that I can live with only two nodes for this task, but
more will be needed for the next, so I'm absolutely willing to
investigate further or hear possible ways out of this issue.
-- 
Thanks,
Feri.



