[Linux-cluster] Problem with second ring config

Fri Apr 26 13:58:00 UTC 2013

Hello,

We have a two-node cluster running CentOS 6.4 (fully patched: corosync-1.4.1-15,
cman-3.0.12.1-49).
When I configure a second ring (passive mode) for the cluster interconnect, I get
the following messages in corosync.log (around every 2 minutes):

...
Apr 26 15:34:54 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:34:55 corosync [TOTEM ] Automatically recovered ring 1
Apr 26 15:36:52 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:36:53 corosync [TOTEM ] Automatically recovered ring 1
Apr 26 15:38:50 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:38:51 corosync [TOTEM ] Automatically recovered ring 1
...
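
To check how regular the fault/recover cycle is, here is a minimal Python
sketch (not part of corosync) that measures the time between successive FAULTY
markings. It assumes the log format shown above and the year 2013, since the
syslog timestamps carry no year; the function names are made up for
illustration.

```python
import re
from datetime import datetime

# Excerpt copied from corosync.log above.
LOG = """\
Apr 26 15:34:54 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:34:55 corosync [TOTEM ] Automatically recovered ring 1
Apr 26 15:36:52 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:36:53 corosync [TOTEM ] Automatically recovered ring 1
Apr 26 15:38:50 corosync [TOTEM ] Marking ringid 1 interface 192.168.216.24 FAULTY
Apr 26 15:38:51 corosync [TOTEM ] Automatically recovered ring 1
"""

# Matches only the "Marking ringid N interface X FAULTY" lines.
FAULTY_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) corosync \[TOTEM\s*\] "
    r"Marking ringid (\d+) interface (\S+) FAULTY"
)


def faulty_times(text, year=2013):
    """Return the timestamps of all FAULTY markings in the log text.

    The year is an assumption passed in by the caller, because the
    log lines themselves do not contain one.
    """
    times = []
    for line in text.splitlines():
        m = FAULTY_RE.match(line)
        if m:
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def intervals(times):
    """Return the gaps in whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(intervals(faulty_times(LOG)))
```

On the excerpt above, both intervals come out at 118 seconds, so the ring is
marked faulty on a fixed period rather than at random.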

This looks related to bug https://bugzilla.redhat.com/show_bug.cgi?id=850757
(although that bug should already be fixed in corosync-1.4.1-15).

Also, "corosync-cfgtool -s" reports both rings as active with no faults:

> corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.5
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.216.22
        status  = ring 1 active with no faults

> corosync-objctl | grep rrp
cluster.totem.rrp_mode=passive
totem.rrp_mode=passive

When I change rrp_mode to active (<totem rrp_mode="active" secauth="off"/>),
these messages no longer appear.

Any comments?

Thanks and best regards,
Ralf