[Linux-cluster] Slowness above 500 RRDs
Ferenc Wagner
wferi at niif.hu
Wed Jun 13 14:38:40 UTC 2007
David Teigland <teigland at redhat.com> writes:
>>>> But looks like nodeA feels obliged to communicate its locking
>>>> process around the cluster.
>>>
>>> I'm not sure what you mean here. To see the amount of dlm locking traffic
>>> on the network, look at port 21064. There should be very little in the
>>> test above... and the dlm locking that you do see should mostly be related
>>> to file i/o, not flocks.
>>
>> There was much traffic on port 21064. Possibly related to file I/O
>> and not flocks, I can't tell. But that agrees with my speculation,
>> that it's not the explicit [pf]locks that take much time, but
>> something else.
>
> Could you comment the fcntl/flock calls out of the application entirely
> and try it?
Let's see. A typical test run looks like this (first with fcntl
locking; running tcpdump slows the first iteration down from about 6 s
to the 20 s shown):
filecount=500
iteration=0 elapsed time=20.196318 s
iteration=1 elapsed time=0.323969 s
iteration=2 elapsed time=0.319929 s
iteration=3 elapsed time=0.361738 s
iteration=4 elapsed time=0.399365 s
total elapsed time=21.601319 s
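
For readers who want to reproduce this kind of measurement, a loop of this
shape can be sketched as follows. This is only a minimal Python sketch, not
the actual test program: the function names, the file naming scheme, and the
one-byte write per file are assumptions made for illustration. The mode
argument selects fcntl locking, flock locking, or no locking at all.

```python
import fcntl
import os
import time


def run_iteration(paths, mode):
    """Open every file, optionally lock it, write one byte, unlock, close."""
    for path in paths:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            if mode == "fcntl":
                fcntl.lockf(fd, fcntl.LOCK_EX)  # POSIX lock via fcntl(2)
            elif mode == "flock":
                fcntl.flock(fd, fcntl.LOCK_EX)  # BSD-style lock via flock(2)
            os.write(fd, b"x")                  # stand-in for the real update
            if mode == "fcntl":
                fcntl.lockf(fd, fcntl.LOCK_UN)
            elif mode == "flock":
                fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)


def benchmark(directory, filecount=500, iterations=5, mode="fcntl"):
    """Return the elapsed time in seconds of each pass over the files."""
    # Hypothetical file names; any set of files on the GFS mount would do.
    paths = [os.path.join(directory, "file%04d.rrd" % i)
             for i in range(filecount)]
    timings = []
    for _ in range(iterations):
        start = time.monotonic()
        run_iteration(paths, mode)
        timings.append(time.monotonic() - start)
    return timings
```

Calling benchmark() on a directory on the GFS mount with mode set to
"fcntl", "flock" or "none" and printing each returned value gives
per-iteration timings comparable to the output above.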
During the first (slow) iteration, there's much traffic on port 21064.
During the next (fast) iterations there's no traffic at all on that port.
If I rerun the test immediately, there's still no traffic.
5 minutes later, without any action on my part, there's a couple of
packets again, then 20 s later a bigger bunch (around 30).
After this, the first iteration generates much traffic again, GOTO 10.
If I use flock instead, the beginning is similar, but about 10 s
after the test finishes, some small traffic appears by itself, and
if I rerun the test after this, it generates traffic again, although
much less than after 5 minutes. The traffic generated 5 minutes after
the test run consists of a couple of packets followed by a much bigger
bunch 5 s later.
If I don't use any locking at all, then the situation is the same as
with fcntl locking, but the "automatic" traffic consists of a small
burst (a couple of packets) 4 min 51 s after the finish, then about 30
packets 25 s later.
Does this tell you anything? The timings are perhaps somewhat off
because of the 20 s runtime. If you can make some sense out of this,
I'd be glad to hear it. Also, I'd like to tweak the 5-minute
timeout: where does it come from? Is it settable via sysfs or
gfs_tool?
--
Thanks,
Feri.