[Linux-cluster] corosync ring failure
Digimer
lists at alteeve.ca
Wed Jul 23 16:16:48 UTC 2014
Any logs in the switch? Is the multicast group being deleted/recreated?
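
To line the faults up against whatever the switch logs show, it may help to pull the exact FAULTY timestamps out of syslog and check how regular the interval really is. Below is a minimal sketch, not something from the original thread: the function names `faulty_events` and `intervals` are made up for illustration, the year is an assumption (syslog timestamps carry none), and it expects each log entry on a single unwrapped line, in the format quoted below.

```python
import re
from datetime import datetime

# Matches corosync "Marking ringid N interface A.B.C.D FAULTY" syslog entries.
FAULTY = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"corosync\[\d+\]:\s+\[TOTEM\s*\]\s+"
    r"Marking ringid (?P<ring>\d+) interface (?P<addr>\S+) FAULTY"
)


def faulty_events(lines, year=2014):
    """Return (timestamp, ring id, interface address) for each FAULTY entry.

    The year is an assumption: classic syslog timestamps do not include one.
    """
    events = []
    for line in lines:
        m = FAULTY.search(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, int(m.group("ring")), m.group("addr")))
    return events


def intervals(events):
    """Seconds between consecutive FAULTY events."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]


# Entries taken from the syslog excerpt quoted below, with wrapped lines rejoined.
sample = [
    "Jul 23 17:48:34 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY",
    "Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1",
    "Jul 23 17:49:14 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY",
]

print(intervals(faulty_events(sample)))  # prints [40.0]
```

A very regular interval would point towards something periodic (a timer on the switch or on a host) rather than random packet loss, but that is only a hint about where to look next.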
On 23/07/14 11:53 AM, C. Handel wrote:
> hi,
>
> i run a cluster with two corosync rings. One of the rings is marked
> faulty every forty seconds, only to recover a second later.
> the other ring is stable.
>
> i have no idea how i should debug this.
>
>
> we are running SL 6.5 with pacemaker 1.1.10, cman 3.0.12, corosync 1.4.1.
> The cluster consists of three machines. Ring1 is running on 10 Gigabit
> interfaces, Ring0 on 1 Gigabit interfaces. Neither ring leaves its
> respective switch.
>
> corosync communication is udpu, rrp_mode is passive
>
> cluster.conf:
>
> <cluster config_version="30" name="aslfile">
>
> <cman transport="udpu">
> </cman>
>
> <fence_daemon post_join_delay="120" post_fail_delay="30"/>
>
> <fencedevices>
> <fencedevice name="pcmk" agent="fence_pcmk" action="off"/>
> </fencedevices>
>
> <quorumd
> cman_label="qdisk"
> device="/dev/mapper/mpath-091quorump1"
> min_score="1"
> votes="2"
> >
> </quorumd>
>
> <clusternodes>
> <clusternode name="asl430m90" nodeid="430">
> <altname name="asl430"/>
> <fence>
> <method name="pcmk-redirect">
> <device name="pcmk" port="asl430m90"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="asl431m90" nodeid="431">
> <altname name="asl431"/>
> <fence>
> <method name="pcmk-redirect">
> <device name="pcmk" port="asl431m90"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="asl432m90" nodeid="432">
> <altname name="asl432"/>
> <fence>
> <method name="pcmk-redirect">
> <device name="pcmk" port="asl432m90"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> </cluster>
>
>
> syslog
>
>
> Jul 23 17:48:34 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
> Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
> Jul 23 17:48:35 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:14 asl431 corosync[3254]: [TOTEM ] Marking ringid 1 interface 140.181.134.212 FAULTY
> Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
> Jul 23 17:49:15 asl431 corosync[3254]: [TOTEM ] Automatically recovered ring 1
>
>
>
> Greetings
> Christoph
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?