[Linux-cluster] VM service fence
Paul M. Dyer
pmdyer at ctgcentral2.com
Wed Mar 17 15:04:42 UTC 2010
Good Morning Cluster Experts,
I have a 3-node cluster with Virtual Machine services. During the full-OS backup window (heavy I/O activity), one of the VMs receives a shutdown request. This has happened 3 times in 8 weeks, each time to a different VM. I assume the cluster is sending this shutdown message. The VM restarts immediately afterwards, likely as a result of cluster monitoring.
I checked the messages log. It appears that we are not using a heartbeat (the log reports "HeartBeat is Disabled"), since I did not add a <totem/> element to cluster.conf. This version of the cluster does not use the openais.conf file; instead, cman is started as a service of aisexec (cman 2.0).
Does anyone have suggestions about what to do?
Who is sending the shutdown request; is it groupd?
I have two NICs configured on the nodes. Are one or both IP subnets used for the multicast? If only one, which one?
Thanks,
Paul Dyer
P.S.
Here is the messages log from a node startup, showing the openais/totem portion:
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service: started and ready to provide service.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Using default multicast address of 239.192.48.228
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] send threads (0 threads)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token expired timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token problem counter (2000 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP threshold (10 problem count)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP mode set to none.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] heartbeat_failures_allowed (0)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] max_network_delay (50 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] The network interface [198.62.216.73] is now up.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Created or loaded sequence id 660.198.62.216.73 for this ring.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] entering GATHER state from 15.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [CMAN ] CMAN 2.0.115 (built Nov 19 2009 10:37:31) started
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais event service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais message service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais configuration service'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SYNC ] Not using a virtual synchrony filter.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Creating commit token because I am the rep.