[Linux-cluster] Corosync fails to start using cman

Tue Aug 2 01:56:47 UTC 2011

On 08/01/2011 09:50 PM, David wrote:
> I have the RHCS installed on CentOS6 x86_64.
> 
> One of the nodes in a 3 node cluster won't start after I moved the nodes
> to a new vlan.
> 
> When I start cman this is what I get:
> 
> Starting cluster:
>    Checking Network Manager...                             [  OK  ]
>    Global setup...                                         [  OK  ]
>    Loading kernel modules...                               [  OK  ]
>    Mounting configfs...                                    [  OK  ]
>    Starting cman... Aug 02 01:45:17 corosync [MAIN  ] Corosync Cluster
> Engine ('1.2.3'): started and ready to provide service.
> Aug 02 01:45:17 corosync [MAIN  ] Corosync built-in features: nss rdma
> Aug 02 01:45:17 corosync [MAIN  ] Successfully read config from
> /etc/cluster/cluster.conf
> Aug 02 01:45:17 corosync [MAIN  ] Successfully parsed cman config
> Aug 02 01:45:17 corosync [TOTEM ] Token Timeout (10000 ms) retransmit
> timeout (2380 ms)
> Aug 02 01:45:17 corosync [TOTEM ] token hold (1894 ms) retransmits
> before loss (4 retrans)
> Aug 02 01:45:17 corosync [TOTEM ] join (60 ms) send_join (0 ms)
> consensus (12000 ms) merge (200 ms)
> Aug 02 01:45:17 corosync [TOTEM ] downcheck (1000 ms) fail to recv const
> (2500 msgs)
> Aug 02 01:45:17 corosync [TOTEM ] seqno unchanged const (30 rotations)
> Maximum network MTU 1402
> Aug 02 01:45:17 corosync [TOTEM ] window size per rotation (50 messages)
> maximum messages per rotation (17 messages)
> Aug 02 01:45:17 corosync [TOTEM ] missed count const (5 messages)
> Aug 02 01:45:17 corosync [TOTEM ] send threads (0 threads)
> Aug 02 01:45:17 corosync [TOTEM ] RRP token expired timeout (2380 ms)
> Aug 02 01:45:17 corosync [TOTEM ] RRP token problem counter (2000 ms)
> Aug 02 01:45:17 corosync [TOTEM ] RRP threshold (10 problem count)
> Aug 02 01:45:17 corosync [TOTEM ] RRP mode set to none.
> Aug 02 01:45:17 corosync [TOTEM ] heartbeat_failures_allowed (0)
> Aug 02 01:45:17 corosync [TOTEM ] max_network_delay (50 ms)
> Aug 02 01:45:17 corosync [TOTEM ] HeartBeat is Disabled. To enable set
> heartbeat_failures_allowed > 0
> Aug 02 01:45:17 corosync [TOTEM ] Initializing transport (UDP/IP).
> Aug 02 01:45:17 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Aug 02 01:45:17 corosync [IPC   ] you are using ipc api v2
> Aug 02 01:45:18 corosync [TOTEM ] Receive multicast socket recv buffer
> size (262142 bytes).
> Aug 02 01:45:18 corosync [TOTEM ] Transmit multicast socket send buffer
> size (262142 bytes).
> corosync: totemsrp.c:3091: memb_ring_id_create_or_load: Assertion `res
> == sizeof (unsigned long long)' failed.
> Aug 02 01:45:18 corosync [TOTEM ] The network interface [10.50.3.70] is
> now up.
> corosync died with signal: 6 Check cluster logs for details
> 
> 
> Any idea what the issue could be?
> 
> Thanks
> David

Can you post your cluster.conf file (obscuring only the passwords)?
Also, what does `uname -n` return, and what is your network
configuration (interface names and IPs)?
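
A rough sketch of one way to gather those three items in one go. The
function names `mask_passwords` and `collect_cluster_info` are made up
for illustration, and the sed pattern assumes passwords appear as
passwd="..." or password="..." attributes, so check the output by eye
before posting it.

```shell
#!/bin/sh
# Mask password-like attributes in cluster.conf-style XML read from stdin.
# Assumption: secrets are stored as passwd="..." or password="..." attributes.
mask_passwords() {
    sed -E 's/(passwd|password)="[^"]*"/\1="REDACTED"/g'
}

# Print the node name, the IPv4 interface addresses, and the masked config.
# The config path defaults to the one shown in the quoted log.
collect_cluster_info() {
    conf="${1:-/etc/cluster/cluster.conf}"
    echo "== uname -n =="
    uname -n
    echo "== interfaces =="
    # Prefer iproute2; fall back to ifconfig if ip is not installed.
    ip -4 -o addr show 2>/dev/null || ifconfig -a
    echo "== $conf =="
    mask_passwords < "$conf"
}
```

After sourcing the file, something like `collect_cluster_info >
cluster-info.txt` on each node would produce a file that can be pasted
into a reply.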

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"