[Linux-cluster] Problems with relocation of apache and fence_vmware

Thu Aug 30 08:07:04 UTC 2012

*Background:*
I am using two VMs hosted in my internal lab. Each VM has two interfaces:
one configured with a valid IP address and the other down. The VIP is in
the same network. My intention is to configure Apache as a cluster service
on these two nodes and have it fail over when a node or its interface goes
down. I am trying to use fence_vmware as the fencing device. Both VMs run on
an ESX 4.1 host, and the guest OS on each VM is RHEL 6.0 32-bit.

I am currently seeing the following problems in my setup:

1. When I start the Apache service from luci, it starts fine on one node. But
if I kill the httpd process on that node manually, the cluster does not
detect that the service is down, so it neither restarts nor relocates it.
2. Same case if I run "ip addr del <VIP>": the cluster only detects that the
node is down, but it does not restart or relocate the service.
3. Whenever I reboot the nodes, they come back online, the service starts
properly on one of them, and both nodes are quorate. However, fail-over
never happens if I stop the active node.
4. I am not sure what fence format I must put in cluster.conf, since I have
no way to test whether it actually works.
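For reference, this is roughly how I understand fence_vmware is wired into a
RHEL 6 cluster.conf (a fragment of the <cluster> element, not my actual
file). It is only a sketch under assumptions: the node names, the VM names,
the device name esx_fence and the LOGIN/PASSWORD values are placeholders,
and I am assuming the ipaddr, login, passwd and port attributes correspond
to the --ip, --username, --password and --plug options from my manual
status test. As far as I know, rgmanager will not recover a service from a
failed node until that node has been fenced successfully, so this part may
also matter for items 2 and 3.

```xml
<!-- Sketch only: node names, VM names and credentials are placeholders -->
<clusternodes>
  <clusternode name="node1.example.com" nodeid="1">
    <fence>
      <method name="vmware">
        <!-- port is the VM name as ESX knows it (the plug value) -->
        <device name="esx_fence" port="node1-vm-name"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2.example.com" nodeid="2">
    <fence>
      <method name="vmware">
        <device name="esx_fence" port="node2-vm-name"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- ipaddr, login and passwd carry the ESX address and credentials -->
  <fencedevice agent="fence_vmware" name="esx_fence"
               ipaddr="10.72.145.145" login="LOGIN" passwd="PASSWORD"/>
</fencedevices>
```

With something like this in place, ccs_config_validate should check the
file against the schema, and running fence_node node2.example.com from
node1 should exercise the configured device (which really power-cycles the
target VM).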

Manual tests:
1. Running the following command manually works fine on both nodes:
"fence_vmware --action=status --ip=10.72.145.145 --username=<login>
--password=<password> --plug=<vm-name>"
2. Apache starts and stops perfectly fine on both nodes when I run
"rg_test test /etc/cluster/cluster.conf start service WEB"
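The WEB service that rg_test exercises lives under <rm> in cluster.conf.
Below is a sketch of a typical RHEL 6 definition, assuming a failover domain
named web_domain, a placeholder VIP_ADDRESS and the stock /etc/httpd
layout; none of these names are taken from my attached file. The
recovery="relocate" attribute asks rgmanager to move the service to the
other node when its status check fails (the default policy is "restart"),
and monitor_link="yes" makes the ip resource's status check fail when the
interface link goes down.

```xml
<!-- Sketch only: web_domain, VIP_ADDRESS and node names are placeholders -->
<rm>
  <failoverdomains>
    <failoverdomain name="web_domain" ordered="0" restricted="1" nofailback="0">
      <failoverdomainnode name="node1.example.com"/>
      <failoverdomainnode name="node2.example.com"/>
    </failoverdomain>
  </failoverdomains>
  <service autostart="1" domain="web_domain" name="WEB" recovery="relocate">
    <!-- floating VIP; link monitoring enabled explicitly -->
    <ip address="VIP_ADDRESS" monitor_link="yes" sleeptime="10"/>
    <!-- Apache resource using the default RHEL httpd paths -->
    <apache config_file="conf/httpd.conf" name="httpd"
            server_root="/etc/httpd" shutdown_wait="0"/>
  </service>
</rm>
```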

My cluster.conf and rgmanager.log are attached.

Please let me know any specific debug commands that I can run manually to
find out what is going on here, particularly with service relocation and
fencing; both fail consistently.

Please help. I have spent more than 10 days trying to set this up in my
internal lab as a proof of concept, to convince my business heads that RHEL
Cluster really works for our production requirements and is worth buying.

-Param
Attachments (scrubbed from the archive):
- cluster.conf (application/octet-stream, 2089 bytes):
  <http://listman.redhat.com/archives/linux-cluster/attachments/20120830/5485e3dd/attachment.obj>
- rgmanager.log (application/octet-stream, 23147 bytes):
  <http://listman.redhat.com/archives/linux-cluster/attachments/20120830/5485e3dd/attachment-0001.obj>