[Linux-cluster] Strange behaviours in two-node cluster

Javier Vela jvdiago at gmail.com
Tue Aug 7 22:08:02 UTC 2012


Hi,

Sorry for reopening the thread, but I have new information that may point
us to a solution.

I've read in the documentation that cman uses three UDP ports. Assuming
the default configuration:

MULTICAST_ADDR:5405
LOCAL_ADDR:5405
LOCAL_ADDR:5404
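
A quick way to see which of these endpoints a node has actually bound is to
read the kernel's UDP socket table in /proc/net/udp. The following is a
minimal Python 3 sketch, not part of cman or openais: the function names are
made up for illustration, and the address decoding assumes a little-endian
host such as x86.

```python
import socket
import struct

CLUSTER_PORTS = (5404, 5405)  # default ports named in this post


def decode_endpoint(field):
    """Turn 'A5F0C0EF:151D' from /proc/net/udp into ('239.192.240.165', 5405)."""
    hex_addr, hex_port = field.split(":")
    # Assumption: little-endian host (x86), where the kernel prints the
    # four address bytes in reversed order.
    addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return addr, int(hex_port, 16)


def bound_udp_endpoints(proc_text, ports=CLUSTER_PORTS):
    """Return sorted (address, port) pairs bound on any of the given ports."""
    found = set()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the slot number, fields[1] is the local address:port
        addr, port = decode_endpoint(fields[1])
        if port in ports:
            found.add((addr, port))
    return sorted(found)
```

On a cluster node this could be called as
bound_udp_endpoints(open("/proc/net/udp").read()); a port missing from the
result has no UDP socket bound to it on that node.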

The problem is that in my cluster port 5404 isn't up. Netstat shows port
5405 on both addresses (multicast and local), but not port 5404 on the
local address. I've also run tcpdump:


[root at node1]# tcpdump -i eth0 -nn -vv port 5405
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
16:36:35.340412 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.10.5149 > 15.15.1.11.5405: UDP, length 106
16:36:35.340872 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.11.5149 > 15.15.1.10.5405: UDP, length 106
16:36:35.549397 IP (tos 0x0, ttl   1, id 0, offset 0, flags [DF], proto: UDP (17), length: 146) 15.15.1.10.5149 > 239.192.240.165.5405: UDP, length 118
16:36:37.399398 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.10.5149 > 15.15.1.11.5405: UDP, length 106
16:36:37.399845 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.11.5149 > 15.15.1.10.5405: UDP, length 106
16:36:37.608384 IP (tos 0x0, ttl   1, id 0, offset 0, flags [DF], proto: UDP (17), length: 146) 15.15.1.10.5149 > 239.192.240.165.5405: UDP, length 118
16:36:39.458357 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.10.5149 > 15.15.1.11.5405: UDP, length 106
16:36:39.458780 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: UDP (17), length: 134) 15.15.1.11.5149 > 15.15.1.10.5405: UDP, length 106
16:36:39.667355 IP (tos 0x0, ttl   1, id 0, offset 0, flags [DF], proto: UDP (17), length: 146) 15.15.1.10.5149 > 239.192.240.165.5405: UDP, length 118

[root at node1]# tcpdump -i eth0 -nn -vv port 5404
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes

As you can see, there is UDP traffic between the two nodes (15.15.1.10 and
15.15.1.11) and between the node and the multicast address
(239.192.240.165), but there is no traffic on port 5404. Is this normal?
Should this port be visible? What could cause this?
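
To rule out the network itself, for instance multicast being dropped
somewhere between the two virtual machines, one option is to send numbered
datagrams to the same multicast address from one node and look for gaps on
the other. A minimal Python 3 sketch follows. TEST_PORT and all function
names are hypothetical, the port is chosen only to stay clear of 5404/5405,
and the probe only approximates cluster traffic; it says nothing about how
openais behaves internally.

```python
import socket
import struct
import sys
import time

GROUP = "239.192.240.165"  # multicast address seen in the capture above
TEST_PORT = 5499           # hypothetical test port, away from 5404/5405


def missing_sequences(received):
    """Return sequence numbers absent between the lowest and highest seen."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def send_probes(local_ip, count=100, interval=0.1):
    """Multicast `count` numbered datagrams from the interface owning local_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(local_ip))
    for seq in range(count):
        sock.sendto(struct.pack("!I", seq), (GROUP, TEST_PORT))
        time.sleep(interval)
    sock.close()


def receive_probes(local_ip, idle_timeout=10.0):
    """Join GROUP on local_ip; collect sequence numbers until traffic stops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", TEST_PORT))
    mreq = socket.inet_aton(GROUP) + socket.inet_aton(local_ip)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(idle_timeout)
    received = []
    try:
        while True:
            data, _ = sock.recvfrom(64)
            if len(data) == 4:  # ignore anything that is not one of our probes
                received.append(struct.unpack("!I", data)[0])
    except socket.timeout:
        pass  # no datagram for idle_timeout seconds: assume the sender is done
    finally:
        sock.close()
    return received


if __name__ == "__main__" and len(sys.argv) == 3:
    mode, local_ip = sys.argv[1], sys.argv[2]
    if mode == "send":
        send_probes(local_ip)
    else:
        got = receive_probes(local_ip)
        print("received:", len(got), "missing:", missing_sequences(got))
```

Saved under a name such as probe.py (hypothetical), this would be started
with "recv 15.15.1.11" on one node and then, within the idle timeout, with
"send 15.15.1.10" on the other, assuming any firewall in between allows the
test port. Gaps are only detected between the first and last datagram
received, so the received count should also be compared with the number sent.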

Thank you in advance, Javi

2012/7/17 Javier Vela <jvdiago at gmail.com>

> Hi,
>
> Thank you for the quick reply. I'm going to ask if we can upgrade to Red
> Hat 5.8.
>
> Also, the machines don't have performance problems right now (we are still
> in pre-production). But everything is virtual, running under VMware, so a
> transient problem in the VMware infrastructure could affect us. Do you know
> of a way to test for network problems that could affect RHCS?
>
> I tried tcpdump and iperf, but haven't seen anything.
>
> Regards, Javi.
>
>
> 2012/7/17 Digimer <lists at alteeve.ca>
>
>> On 07/17/2012 03:30 AM, Javier Vela wrote:
>> > Hi, I'm also seeing a lot of entries like this in the logs:
>> >
>> > openais[4264]: [TOTEM] Retransmit List: 34 35 36 37 38 39 3a 3b 3c
>> >
>> > I've searched the internet, and this happens when there is some
>> > delay between the nodes, but openais is supposed to recover gracefully.
>> > Can this be a problem?
>> >
>> > 2012/7/16 Javier Vela <jvdiago at gmail.com <mailto:jvdiago at gmail.com>>
>>
>> I saw this happen with a bug in RHEL 6.1 when the nodes were too slow.
>> I'm wondering if a) you have network problems somewhere or b) you have
>> insufficient performance on your nodes.
>>
>> Usually it recovers on its own, but I have seen it run away to the
>> point where I had to stop the cluster. That was on modest hardware in a
>> test environment. On all production machines I've seen, it recovered on
>> its own.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com
>>
>>
>>
>