I don't use fencing because with HA-LVM I thought I didn't need it, and
also because both nodes are VMs in VMware. I know there is a fence agent
for VMware, but I would prefer to avoid it: I'm not in control of the
VMware infrastructure, and the VMware admins probably won't give me the
access I would need to use it.

Regards, Javi
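(For reference, if the VMware admins ever do hand over an account, a rough
sketch of what such a fence device could look like in cluster.conf, assuming
the fence_vmware_soap agent is available on this release; the vCenter address,
credentials and VM name below are only placeholders:

    <fencedevices>
        <fencedevice agent="fence_vmware_soap" name="vcenter"
                     ipaddr="vcenter.example.com" login="fence_user"
                     passwd="secret" ssl="on"/>
    </fencedevices>

    <clusternode name="node1-hb" nodeid="1" votes="1">
        <fence>
            <method name="1">
                <!-- port is the VM name as the vCenter inventory knows it -->
                <device name="vcenter" port="node1-vm"/>
            </method>
        </fence>
    </clusternode>

A restricted account that can only query power state and power VMs on and off
is usually enough for fencing.)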

> Fencing is critical, and running a cluster without fencing, even with
> qdisk, is not supported. Manual fencing is also not supported. The
> *only* way to have a reliable cluster, testing or production, is to use
> fencing.
>
> Why do you not wish to use it?
>
> On 06/20/2012 09:43 AM, Javier Vela wrote:
>> As I read it, if you use HA-LVM you don't need fencing because of VG
>> tagging. Is it absolutely mandatory to use fencing with qdisk?
>>
>> If it is, I suppose I can use manual fencing, but in production I also
>> won't use fencing.
>>
>> Regards, Javi.
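(The VG tagging referred to here is the HA-LVM setup in /etc/lvm/lvm.conf;
roughly, and using this cluster's node name plus a placeholder root VG purely
as an illustration:

    # only the local root VG and VGs tagged with this node's name may activate
    volume_list = [ "VolGroup00", "@node1-hb" ]

The rgmanager lvm resource then moves the tag on vg_postgres between nodes,
and the initrd normally has to be rebuilt after changing lvm.conf. Tagging
keeps both nodes from activating the VG at once, but it does not replace
fencing.)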
>>
>> Date: Wed, 20 Jun 2012 14:45:28 +0200
>> From: emi2fast@gmail.com
>> To: linux-cluster@redhat.com
>> Subject: Re: [Linux-cluster] Node can't join already quorated cluster
>>
>> If you don't want to use a real fence device, because you are only
>> running some tests, you have to use the fence_manual agent.
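(A minimal sketch of what that might look like in this cluster.conf; the
method and device names are just placeholders:

    <fencedevices>
        <fencedevice agent="fence_manual" name="manual"/>
    </fencedevices>

    <clusternode name="node1-hb" nodeid="1" votes="1">
        <fence>
            <method name="1">
                <device name="manual" nodename="node1-hb"/>
            </method>
        </fence>
    </clusternode>

After a failure the fence has to be acknowledged by hand with
fence_ack_manual on the surviving node before recovery continues, which is
one reason manual fencing is not supported for production.)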
>>
>> 2012/6/20 Javier Vela <jvdiago@gmail.com>:
>>
>> Hi, I have a very strange problem, and after searching through a lot
>> of forums I haven't found the solution. This is the scenario:
>>
>> Two-node cluster with Red Hat 5.7, HA-LVM, no fencing and a quorum
>> disk. I start qdiskd, cman and rgmanager on one node. After 5 minutes
>> the fencing step finally finishes and the cluster becomes quorate with
>> 2 votes:
>>
>> [root@node2 ~]# clustat
>> Cluster Status for test_cluster @ Wed Jun 20 05:56:39 2012
>> Member Status: Quorate
>>
>>  Member Name                           ID   Status
>>  ------ ----                           ---- ------
>>  node1-hb                                 1 Offline
>>  node2-hb                                 2 Online, Local, rgmanager
>>  /dev/mapper/vg_qdisk-lv_qdisk            0 Online, Quorum Disk
>>
>>  Service Name              Owner (Last)             State
>>  ------- ----              ----- ------             -----
>>  service:postgres          node2                    started
>>
>> Now, I start the second node. When cman reaches fencing, it hangs
>> for about 5 minutes, and finally fails. clustat says:
>>
>> [root@node1 ~]# clustat
>> Cluster Status for test_cluster @ Wed Jun 20 06:01:12 2012
>> Member Status: Inquorate
>>
>>  Member Name                           ID   Status
>>  ------ ----                           ---- ------
>>  node1-hb                                 1 Online, Local
>>  node2-hb                                 2 Offline
>>  /dev/mapper/vg_qdisk-lv_qdisk            0 Offline
>>
>> And in /var/log/messages I can see these errors:
>>
>> Jun 20 06:02:12 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>> Jun 20 06:02:12 node1 openais[6098]: [CLM ] got nodejoin message 15.15.2.10
>> Jun 20 06:02:13 node1 dlm_controld[5386]: connect to ccs error -111, check ccsd or cluster status
>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:13 node1 ccsd[6090]: Initial status:: Inquorate
>> Jun 20 06:02:13 node1 gfs_controld[5392]: connect to ccs error -111, check ccsd or cluster status
>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:14 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 0.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Creating commit token because I am the rep.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Storing new sequence id for ring 15c
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering COMMIT state.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering RECOVERY state.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] position [0] member 15.15.2.10:
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] previous ring seq 344 rep 15.15.2.10
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] aru e high delivered e received flag 1
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Did not need to originate any messages in recovery.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Sending initial ORF token
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:18 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>
>> And the quorum disk:
>>
>> [root@node2 ~]# mkqdisk -L -d
>> mkqdisk v0.6.0
>> /dev/mapper/vg_qdisk-lv_qdisk:
>> /dev/vg_qdisk/lv_qdisk:
>>         Magic:                eb7a62c2
>>         Label:                cluster_qdisk
>>         Created:              Thu Jun 7 09:23:34 2012
>>         Host:                 node1
>>         Kernel Sector Size:   512
>>         Recorded Sector Size: 512
>>
>> Status block for node 1
>>         Last updated by node 2
>>         Last updated on Wed Jun 20 06:17:23 2012
>>         State: Evicted
>>         Flags: 0000
>>         Score: 0/0
>>         Average Cycle speed: 0.000500 seconds
>>         Last Cycle speed: 0.000000 seconds
>>         Incarnation: 4fe1a06c4fe1a06c
>>
>> Status block for node 2
>>         Last updated by node 2
>>         Last updated on Wed Jun 20 07:09:38 2012
>>         State: Master
>>         Flags: 0000
>>         Score: 0/0
>>         Average Cycle speed: 0.001000 seconds
>>         Last Cycle speed: 0.000000 seconds
>>         Incarnation: 4fe1a06c4fe1a06c
>>
>> On the other node I don't see any errors in /var/log/messages. One
>> strange thing is that if I start cman on both nodes at the same time,
>> everything works fine and both nodes become quorate (until I reboot
>> one node and the problem reappears). I've checked that multicast is
>> working properly: with iperf I can send and receive multicast packets,
>> and with tcpdump I've seen the packets that openais sends while cman
>> is trying to start. I've read about a bug in RH 5.3 with the same
>> behaviour, but it was fixed in RH 5.4.
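(For anyone wanting to repeat that kind of multicast test, a rough sketch;
the multicast group and interface here are only placeholders, not the ones
cman actually derived for this cluster:

    # receiver on node1: join a test multicast group over UDP
    iperf -s -u -B 239.192.100.1 -i 1

    # sender on node2: push UDP traffic to that group with a TTL > 1
    iperf -c 239.192.100.1 -u -T 32 -t 10 -i 1

    # watch all multicast traffic on the heartbeat interface
    tcpdump -i eth0 -n ip multicast
)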
>>
>> I don't have SELinux enabled, and iptables is also disabled. Here is
>> the simplified cluster.conf (with fewer services and resources). I
>> want to point out one thing: I have allow_kill="0" in order to avoid
>> fencing errors when the quorum daemon tries to fence a failed node.
>> As <fence/> is empty, before adding this setting I got a lot of
>> failed-fencing messages in /var/log/messages.
>>
>> <?xml version="1.0"?>
>> <cluster alias="test_cluster" config_version="15" name="test_cluster">
>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="-1"/>
>>     <clusternodes>
>>         <clusternode name="node1-hb" nodeid="1" votes="1">
>>             <fence/>
>>         </clusternode>
>>         <clusternode name="node2-hb" nodeid="2" votes="1">
>>             <fence/>
>>         </clusternode>
>>     </clusternodes>
>>     <cman two_node="0" expected_votes="3"/>
>>     <fencedevices/>
>>
>>     <rm log_facility="local4" log_level="7">
>>         <failoverdomains>
>>             <failoverdomain name="etest_cluster_fo" nofailback="1"
>>                             ordered="1" restricted="1">
>>                 <failoverdomainnode name="node1-hb" priority="1"/>
>>                 <failoverdomainnode name="node2-hb" priority="2"/>
>>             </failoverdomain>
>>         </failoverdomains>
>>         <resources/>
>>         <service autostart="1" domain="test_cluster_fo" exclusive="0"
>>                  name="postgres" recovery="relocate">
>>             <ip address="172.24.119.44" monitor_link="1"/>
>>             <lvm name="vg_postgres" vg_name="vg_postgres" lv_name="postgres"/>
>>             <fs device="/dev/vg_postgres/postgres" force_fsck="1"
>>                 force_unmount="1" fstype="ext3" mountpoint="/var/lib/pgsql"
>>                 name="postgres" self_fence="0"/>
>>             <script file="/etc/init.d/postgresql" name="postgres"/>
>>         </service>
>>     </rm>
>>     <totem consensus="4000" join="60" token="20000"
>>            token_retransmits_before_loss_const="20"/>
>>     <quorumd allow_kill="0" interval="1" label="cluster_qdisk" tko="10" votes="1">
>>         <heuristic program="/usr/share/cluster/check_eth_link.sh eth0"
>>                    score="1" interval="2" tko="3"/>
>>     </quorumd>
>> </cluster>
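(As far as I understand cman's quorum math, with expected_votes="3" the
cluster needs floor(3/2) + 1 = 2 votes to be quorate, so one node (1 vote)
plus the quorum disk (1 vote) reaches quorum, while a lone node that cannot
talk to qdiskd sits at 1 vote and stays inquorate, which matches the clustat
output above.)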
>>
>> The /etc/hosts:
>> 172.24.119.10   node1
>> 172.24.119.34   node2
>> 15.15.2.10      node1-hb   node1-hb.localdomain
>> 15.15.2.11      node2-hb   node2-hb.localdomain
>>
>> And the versions:
>> Red Hat Enterprise Linux Server release 5.7 (Tikanga)
>> cman-2.0.115-85.el5
>> rgmanager-2.0.52-21.el5
>> openais-0.80.6-30.el5
>>
>> I don't know what else I should try, so if you can give me some ideas
>> I would really appreciate it.
>>
>> Regards, Javi.
><br>><br>><br>><br>> --<br>> esta es mi vida e me la vivo hasta que dios quiera<br>><br>> -- Linux-cluster mailing list <a href="mailto:Linux-cluster@redhat.com" target="_blank">Linux-cluster@redhat.com</a><br>
> <a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a><br>><br> <span class="HOEnZb"><font color="#888888"><br> <br>-- <br>Digimer<br>
> Papers and Projects: https://alteeve.com