[Linux-cluster] corosync ring failure

Digimer lists at alteeve.ca
Wed Jul 23 16:16:48 UTC 2014


Any logs in the switch? Is the multicast group being deleted/recreated?

On 23/07/14 11:53 AM, C. Handel wrote:
> hi,
>
> i run a cluster with two corosync rings. One of the rings is marked
> faulty every forty seconds, then recovers immediately, a second later.
> The other ring is stable.
>
> i have no idea how i should debug this.
>
>
> we are running sl6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1
> The cluster consists of three machines. Ring1 is running on 10 Gigabit
> interfaces, Ring0 on 1 Gigabit interfaces. Neither ring leaves its
> respective switch.
>
> corosync communication is udpu, rrp_mode is passive
>
> cluster.conf:
>
> <cluster config_version="30" name="aslfile">
>
> <cman transport="udpu">
> </cman>
>
> <fence_daemon post_join_delay="120" post_fail_delay="30"/>
>
> <fencedevices>
>          <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
> </fencedevices>
>
> <quorumd
>     cman_label="qdisk"
>     device="/dev/mapper/mpath-091quorump1"
>     min_score="1"
>     votes="2"
>     >
> </quorumd>
>
> <clusternodes>
> <clusternode name="asl430m90" nodeid="430">
>          <altname name="asl430"/>
>          <fence>
>                  <method name="pcmk-redirect">
>                          <device name="pcmk" port="asl430m90"/>
>                  </method>
>          </fence>
> </clusternode>
> <clusternode name="asl431m90" nodeid="431">
>          <altname name="asl431"/>
>          <fence>
>                  <method name="pcmk-redirect">
>                          <device name="pcmk" port="asl431m90"/>
>                  </method>
>          </fence>
> </clusternode>
> <clusternode name="asl432m90" nodeid="432">
>          <altname name="asl432"/>
>          <fence>
>                  <method name="pcmk-redirect">
>                          <device name="pcmk" port="asl432m90"/>
>                  </method>
>          </fence>
> </clusternode>
> </clusternodes>
> </cluster>
>
>
> syslog
>
>
> Jul 23 17:48:34 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:14 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:15 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1
>
>
>
> Greetings
>     Christoph
>
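
To check whether the fault really recurs on a fixed period (the syslog
excerpt above suggests about forty seconds), a small script along these
lines might help. It is only a sketch: the function name faulty_intervals
and the year argument are made up for illustration, the regular expression
assumes the exact message wording shown in the quoted log (with each entry
on a single line), and %b parsing assumes an English locale.

```python
import re
from datetime import datetime

# Matches the corosync 1.4 message quoted in this thread; the exact
# wording is an assumption and may differ in other versions.
FAULTY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Marking ringid (?P<ring>\d+) interface (?P<addr>\S+) FAULTY"
)


def faulty_intervals(lines, year=2014):
    """Return {ring: [seconds between consecutive FAULTY markings]}.

    Syslog timestamps carry no year, so one is supplied by the caller.
    """
    last_seen = {}
    intervals = {}
    for line in lines:
        m = FAULTY_RE.search(line)
        if not m:
            continue  # recoveries and unrelated messages are ignored
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        ring = int(m["ring"])
        if ring in last_seen:
            delta = (when - last_seen[ring]).total_seconds()
            intervals.setdefault(ring, []).append(delta)
        last_seen[ring] = when
    return intervals


sample = [
    "Jul 23 17:48:34 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY",
    "Jul 23 17:48:35 asl431 corosync[3254]:   [TOTEM ] Automatically recovered ring 1",
    "Jul 23 17:49:14 asl431 corosync[3254]:   [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY",
]
print(faulty_intervals(sample))  # {1: [40.0]}
```

Feeding it the full log from each node (for example the lines of
/var/log/messages, if that is where syslog writes) would show whether the
interval is constant, which usually points at a timer somewhere rather
than at random packet loss.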


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
