[Linux-cluster] Cluster logging issues + rgmanager doesn't notice failed vms

Bart Verwilst lists at verwilst.be
Mon Aug 20 12:11:39 UTC 2012


Hello again ;)

My cluster seems to be logging only to /var/log/syslog, and even then 
only from the corosync daemon; the logs under /var/log/cluster are all empty:

root@vm01-test:~# ls -al /var/log/cluster/*.log
-rw------- 1 root root 0 Aug 16 06:50 /var/log/cluster/corosync.log
-rw------- 1 root root 0 Aug 20 06:39 /var/log/cluster/dlm_controld.log
-rw------- 1 root root 0 Aug 20 06:39 /var/log/cluster/fenced.log
-rw------- 1 root root 0 Aug  7 06:27 /var/log/cluster/fence_na.log
-rw------- 1 root root 0 Aug 16 06:50 /var/log/cluster/gfs_controld.log
-rw------- 1 root root 0 Aug 20 06:39 /var/log/cluster/qdiskd.log
-rw------- 1 root root 0 Aug 20 06:39 /var/log/cluster/rgmanager.log
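
If I read cluster.conf(5) correctly, debug="on" alone may not be enough to 
send the daemons to their own files, and I may need explicit to_logfile / 
per-daemon entries. This is only a sketch of what I think the <logging> 
block should look like -- the to_logfile/logfile_priority attributes and the 
logging_daemon names are just my reading of the man page, so please correct 
me if they're wrong:

    <logging debug="on" to_syslog="yes" to_logfile="yes" logfile_priority="debug">
        <!-- per-daemon overrides; names/attributes are my assumption from cluster.conf(5) -->
        <logging_daemon name="rgmanager" logfile="/var/log/cluster/rgmanager.log"/>
        <logging_daemon name="qdiskd"    logfile="/var/log/cluster/qdiskd.log"/>
        <logging_daemon name="fenced"    logfile="/var/log/cluster/fenced.log"/>
    </logging>

Is something along those lines needed, or should debug="on" on its own 
already fill /var/log/cluster?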

Also, I've shut down both of my VMs, either from virt-manager or with halt 
from the CLI on the guest itself.
virsh list on all 3 nodes shows no running guests. However:

root@vm01-test:~# clustat
Cluster Status for kvm @ Mon Aug 20 14:10:20 2012
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 vm01-test                                   1 Online, Local, rgmanager
 vm02-test                                   2 Online, rgmanager
 vm03-test                                   3 Online, rgmanager
 /dev/mapper/iscsi_cluster_quorum            0 Online, Quorum Disk

 Service Name                             Owner (Last)                   State
 ------- ----                             ----- ------                   -----
 vm:intux_firewall                        vm02-test                      started
 vm:intux_zabbix                          vm02-test                      started
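
To make the mismatch concrete, this is roughly how I'm comparing libvirt's 
view with rgmanager's on each node. I'm assuming here that the vm resource 
agent checks status through virsh and that the libvirt domain names match 
the name= attributes of the <vm> resources:

    root@vm01-test:~# virsh list --all     # the guests show as shut off on every node
    root@vm01-test:~# clustat | grep vm:   # both vm services are still reported as started on vm02-test

Shouldn't rgmanager's periodic status check notice that the domains are 
gone and restart (or at least mark) the services?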


My config:

<cluster name="kvm" config_version="14">
    <logging debug="on"/>
    <clusternodes>
        <clusternode name="vm01-test" nodeid="1">
            <fence>
                <method name="apc">
                    <device name="apc01" port="1" action="off"/>
                    <device name="apc02" port="1" action="off"/>
                    <device name="apc01" port="1" action="on"/>
                    <device name="apc02" port="1" action="on"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="vm02-test" nodeid="2">
            <fence>
                <method name="apc">
                    <device name="apc01" port="8" action="off"/>
                    <device name="apc02" port="8" action="off"/>
                    <device name="apc01" port="8" action="on"/>
                    <device name="apc02" port="8" action="on"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="vm03-test" nodeid="3">
            <fence>
                <method name="apc">
                    <device name="apc01" port="2" action="off"/>
                    <device name="apc02" port="2" action="off"/>
                    <device name="apc01" port="2" action="on"/>
                    <device name="apc02" port="2" action="on"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_apc" ipaddr="apc01" secure="on" login="device" name="apc01" passwd="xxx"/>
        <fencedevice agent="fence_apc" ipaddr="apc02" secure="on" login="device" name="apc02" passwd="xxx"/>
    </fencedevices>
    <rm log_level="5">
        <failoverdomains>
            <failoverdomain name="any_node" nofailback="1" ordered="0" restricted="0"/>
        </failoverdomains>
        <vm domain="any_node" max_restarts="2" migrate="live" name="firewall" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
        <vm domain="any_node" max_restarts="2" migrate="live" name="zabbix" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
    </rm>
    <totem rrp_mode="none" secauth="off"/>
    <quorumd interval="2" tko="4" device="/dev/mapper/iscsi_cluster_quorum"/>
</cluster>
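
For completeness: I can force rgmanager back in sync by hand with clusvcadm 
(disable, then re-enable each vm service), roughly as below, but I'd expect 
the status check to catch a dead guest without that:

    # workaround, not a fix: tell rgmanager the service is down, then let it start it again
    root@vm01-test:~# clusvcadm -d vm:intux_firewall
    root@vm01-test:~# clusvcadm -e vm:intux_firewall
    root@vm01-test:~# clusvcadm -d vm:intux_zabbix
    root@vm01-test:~# clusvcadm -e vm:intux_zabbix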

I hope you guys can shed some light on this. Versions: CMAN, rgmanager, ... = 
3.1.7-0ubuntu2.1; corosync = 1.4.2-2.

Kind regards,

Bart



