[Linux-cluster] Slowness above 500 RRDs
Ferenc Wagner
wferi at niif.hu
Tue Jun 12 16:39:34 UTC 2007
David Teigland <teigland at redhat.com> writes:
> On Tue, Jun 12, 2007 at 05:06:56PM +0200, Ferenc Wagner wrote:
>
>> Here is the old mail I haven't sent before. Meanwhile, I'm switching
>> in other nodes to continue the tests in my previous mail.
[...]
>> But looks like nodeA feels obliged to communicate its locking
>> process around the cluster.
>
> I'm not sure what you mean here. To see the amount of dlm locking traffic
> on the network, look at port 21064. There should be very little in the
> test above... and the dlm locking that you do see should mostly be related
> to file i/o, not flocks.
There was a lot of traffic on port 21064. It may well be related to
file I/O rather than flocks; I can't tell. But that agrees with my
speculation that it's not the explicit [pf]locks that take much time,
but something else.
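To separate the two kinds of traffic discussed here, captured packets could be tallied by class. The following is only a rough sketch: the record layout (destination IP, source port, destination port, length) and the sample values are made up for illustration, and only the port number 21064 comes from this thread.

```python
import ipaddress
from collections import Counter

DLM_PORT = 21064  # dlm port mentioned in this thread


def classify(dst_ip, src_port, dst_port):
    """Return 'multicast', 'dlm', or 'other' for one packet."""
    if ipaddress.ip_address(dst_ip).is_multicast:
        return "multicast"
    if DLM_PORT in (src_port, dst_port):
        return "dlm"
    return "other"


def tally(packets):
    """Sum bytes per class; each packet is (dst_ip, src_port, dst_port, length)."""
    totals = Counter()
    for dst_ip, src_port, dst_port, length in packets:
        totals[classify(dst_ip, src_port, dst_port)] += length
    return totals


# Hypothetical sample records, only to show the call.
sample = [
    ("10.0.0.2", 40000, 21064, 200),
    ("10.0.0.1", 21064, 40000, 120),
    ("239.192.0.1", 5149, 5405, 90),
    ("10.0.0.3", 22, 51000, 64),
]
totals = tally(sample)
print(dict(totals))
```

Comparing the per-class byte counts with the node alone and with other members joined would show whether multicast traffic is really absent or just dwarfed by the dlm traffic.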
>> What confuses me is that he emits multicast packets even when he's the
>> only member. Otherwise, it passes tokens around the cluster, which
>> makes more sense, though still unnecessary, as he is the lock master (if
>> I get the lock master concept right).
>
> I think you're confusing the multicast network traffic from openais
> (related to cluster membership) and the point-to-point network traffic
> from the dlm (related to gfs locking). The two types of traffic are not
> related.
I didn't notice any multicast traffic when the node wasn't alone, but
maybe it was simply dwarfed by the locking traffic. I can check that
again later, but...
>> # cman_tool services
>> type   level  name     id        state
>> fence  0      default  00010001  none
>> [1 2 3]
>> dlm    1      clvmd    00020001  none
>> [1 2 3]
>> dlm    1      test     000a0001  none
>> [1 2]
>> gfs    2      test     00090001  none
>> [1 2]
>
> !?!? but now you're using the old RHEL4 generation stuff -- gfs_controld
> is completely irrelevant there. The analysis completely changes between
> the RHEL4/RHEL5 (old/new) generations of infrastructure.
To the best of my knowledge, I'm using the new infrastructure. There's
no cman kernel module loaded, there's no cman process running, there
is an aisexec process running, and syslog contains messages like:
openais[4374]: [CLM ] CLM CONFIGURATION CHANGE
openais[4374]: [CLM ] New Configuration:
openais[4374]: [CLM ] ^Ir(0) ip(XXX.XXX.XXX.XXX)
openais[4374]: [CLM ] ^Ir(0) ip(XXX.XXX.XXX.XXX)
openais[4374]: [CLM ] ^Ir(0) ip(XXX.XXX.XXX.XXX)
openais[4374]: [CLM ] Members Left:
openais[4374]: [CLM ] Members Joined:
openais[4374]: [CLM ] ^Ir(0) ip(XXX.XXX.XXX.XXX)
openais[4374]: [SYNC ] This node is within the primary component and will provide service.
openais[4374]: [TOTEM] entering OPERATIONAL state.
openais[4374]: [CLM ] got nodejoin message XXX.XXX.XXX.XXX
openais[4374]: [CLM ] got nodejoin message XXX.XXX.XXX.XXX
openais[4374]: [CLM ] got nodejoin message XXX.XXX.XXX.XXX
openais[4374]: [CPG ] got joinlist message from node 1
openais[4374]: [CPG ] got joinlist message from node 2
kernel: dlm: connecting to 3
kernel: dlm: got connection from 3
lsmod gives:
gfs 256964 1
lock_nolock 4480 0
lock_dlm 20684 2
gfs2 328076 3 gfs,lock_nolock,lock_dlm
dlm 92340 17 lock_dlm
configfs 25616 2 dlm
How could I be running the old stuff? Am I totally confused?
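The check I described above can be written down as a small script. This is only a sketch of my own reasoning (no cman module, no cman process, aisexec running), not an authoritative test for which generation is installed; the function names are made up, and the process set is passed in by hand here.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names (first column) from lsmod-style text."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        names.add(fields[0])
    return names


def looks_like_new_stack(modules, processes):
    """Heuristic from this message: no cman module or process, aisexec running."""
    return (
        "cman" not in modules
        and "cman" not in processes
        and "aisexec" in processes
    )


# The lsmod output quoted in this message.
lsmod_text = """\
gfs 256964 1
lock_nolock 4480 0
lock_dlm 20684 2
gfs2 328076 3 gfs,lock_nolock,lock_dlm
dlm 92340 17 lock_dlm
configfs 25616 2 dlm
"""
modules = loaded_modules(lsmod_text)
print(looks_like_new_stack(modules, {"aisexec"}))
```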
--
Feri.