[Linux-cluster] VM service fence

Paul M. Dyer pmdyer at ctgcentral2.com
Wed Mar 17 15:04:42 UTC 2010


Good Morning Cluster Experts,

I have a 3-node cluster running Virtual Machine services. During the full-OS backup window (heavy I/O activity), one of the VMs receives a shutdown request. This has happened 3 times in 8 weeks, to 3 different VMs. I assume the cluster is sending the shutdown request. The VM restarts immediately afterwards, presumably because the cluster's service monitoring brings it back up.

I checked the messages log. It appears that we are not using a heartbeat (the log reports "HeartBeat is Disabled"), since I did not add a <totem/> element to cluster.conf. This version of the cluster does not use the openais.conf file; instead, cman is started as a service of aisexec (cman 2.0).

Does anyone have suggestions about what to do?

Who is sending the shutdown request? Is it groupd?

I have two NICs configured on each node. Is only one of the IP subnets used for the multicast traffic, or both? If only one, which one?

Thanks,

Paul Dyer

P.S.
Here is the messages log from a node startup, showing the openais/totem portion:
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2006 Red Hat, Inc. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service: started and ready to provide service. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Using default multicast address of 239.192.48.228 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] send threads (0 threads) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token expired timeout (495 ms) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token problem counter (2000 ms) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP threshold (10 problem count) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP mode set to none. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] heartbeat_failures_allowed (0) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] max_network_delay (50 ms) 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] The network interface [198.62.216.73] is now up. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Created or loaded sequence id 660.198.62.216.73 for this ring. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] entering GATHER state from 15. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [CMAN ] CMAN 2.0.115 (built Nov 19 2009 10:37:31) started 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Service initialized 'openais CMAN membership service 2.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais extended virtual synchrony service' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster membership service B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais availability management framework B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais checkpoint service B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais event service B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais distributed locking service B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais message service B.01.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais configuration service' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster closed process group service v1.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster config database access v1.01' 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SYNC ] Not using a virtual synchrony filter. 
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Creating commit token because I am the rep. 
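
For anyone who wants to compare these values across nodes, the sketch below pulls the totem timers, the multicast address, and the bound interface address out of a log like the one above. It is only an illustration: the function name summarize_totem and the regular expressions are my own assumptions, not part of cman or openais, and the sample lines are copied from the log above with trailing spaces trimmed.

```python
import re

# A few lines copied from the startup log above (trailing spaces trimmed).
SAMPLE_LOG = """\
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Using default multicast address of 239.192.48.228
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] The network interface [198.62.216.73] is now up.
"""

# Splits "... [TAG ] message" into the tag and the message text.
TAGGED = re.compile(r"\[(\w+)\s*\] (.*)$")
# Matches "name (N ms)" pairs such as "Token Timeout (10000 ms)".
MS_PAIR = re.compile(r"([A-Za-z_][A-Za-z_ ]*?)\s*\((\d+) ms\)")
MCAST = re.compile(r"multicast address of (\d+\.\d+\.\d+\.\d+)")
IFACE = re.compile(r"The network interface \[(\d+\.\d+\.\d+\.\d+)\] is now up")


def summarize_totem(log_text):
    """Collect timers (ms), multicast address, and interface addresses."""
    summary = {"timers_ms": {}, "multicast": None, "interfaces": []}
    for line in log_text.splitlines():
        tagged = TAGGED.search(line)
        if not tagged:
            continue  # not an openais-style "[TAG ] message" line
        message = tagged.group(2)

        mcast = MCAST.search(message)
        if mcast:
            summary["multicast"] = mcast.group(1)

        iface = IFACE.search(message)
        if iface:
            summary["interfaces"].append(iface.group(1))

        # Normalise names: "Token Timeout" becomes "token_timeout".
        for name, value in MS_PAIR.findall(message):
            key = name.strip().lower().replace(" ", "_")
            summary["timers_ms"][key] = int(value)
    return summary


if __name__ == "__main__":
    result = summarize_totem(SAMPLE_LOG)
    print("multicast address:", result["multicast"])
    print("interfaces up:", ", ".join(result["interfaces"]))
    for key, value in sorted(result["timers_ms"].items()):
        print(f"{key}: {value} ms")
```

Running the same function over /var/log/messages from each node would show whether all three nodes report the same interface subnet and the same token timeout; it does not by itself say who issued the shutdown request.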



