I don't use fencing because with HA-LVM I thought I didn't need it, and
also because both nodes are VMs in VMware. I know there is a fence agent
for VMware, but I would prefer to avoid it: I'm not in control of the
VMware infrastructure, and the VMware admins probably won't give me the
access I would need to use it.

Regards, Javi
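(For reference, if the VMware admins ever do hand over an account, a rough
sketch of what such a fence device could look like in cluster.conf, assuming
the fence_vmware_soap agent is available on this release; the vCenter address,
credentials and VM name below are only placeholders:

    <fencedevices>
        <fencedevice agent="fence_vmware_soap" name="vcenter"
                     ipaddr="vcenter.example.com" login="fence_user"
                     passwd="secret" ssl="on"/>
    </fencedevices>

    <clusternode name="node1-hb" nodeid="1" votes="1">
        <fence>
            <method name="1">
                <!-- port is the VM name as the vCenter inventory knows it -->
                <device name="vcenter" port="node1-vm"/>
            </method>
        </fence>
    </clusternode>

A restricted account that can only query power state and power VMs on and off
is usually enough for fencing.)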

> Fencing is critical, and running a cluster without fencing, even with
> qdisk, is not supported. Manual fencing is also not supported. The
> *only* way to have a reliable cluster, testing or production, is to use
> fencing.
>
> Why do you not wish to use it?
>
> On 06/20/2012 09:43 AM, Javier Vela wrote:
>> As I read it, if you use HA-LVM you don't need fencing because of VG
>> tagging. Is it absolutely mandatory to use fencing with qdisk?
>>
>> If it is, I suppose I can use manual fencing, but in production I also
>> won't use fencing.
>>
>> Regards, Javi.
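(The VG tagging referred to here is the HA-LVM setup in /etc/lvm/lvm.conf;
roughly, and using this cluster's node name plus a placeholder root VG purely
as an illustration:

    # only the local root VG and VGs tagged with this node's name may activate
    volume_list = [ "VolGroup00", "@node1-hb" ]

The rgmanager lvm resource then moves the tag on vg_postgres between nodes,
and the initrd normally has to be rebuilt after changing lvm.conf. Tagging
keeps both nodes from activating the VG at once, but it does not replace
fencing.)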
>>
>> Date: Wed, 20 Jun 2012 14:45:28 +0200
>> From: emi2fast@gmail.com
>> To: linux-cluster@redhat.com
>> Subject: Re: [Linux-cluster] Node can't join already quorated cluster
>>
>> If you don't want to use a real fence device, because you are only
>> running some tests, you have to use the fence_manual agent.
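(A minimal sketch of what that might look like in this cluster.conf; the
method and device names are just placeholders:

    <fencedevices>
        <fencedevice agent="fence_manual" name="manual"/>
    </fencedevices>

    <clusternode name="node1-hb" nodeid="1" votes="1">
        <fence>
            <method name="1">
                <device name="manual" nodename="node1-hb"/>
            </method>
        </fence>
    </clusternode>

After a failure the fence has to be acknowledged by hand with
fence_ack_manual on the surviving node before recovery continues, which is
one reason manual fencing is not supported for production.)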
>>
>> 2012/6/20 Javier Vela <jvdiago@gmail.com>:
>>
>> Hi, I have a very strange problem, and after searching through a lot
>> of forums I haven't found the solution. This is the scenario:
>>
>> Two-node cluster with Red Hat 5.7, HA-LVM, no fencing and a quorum
>> disk. I start qdiskd, cman and rgmanager on one node. After 5 minutes
>> the fencing step finally finishes and the cluster becomes quorate with
>> 2 votes:
>>
>> [root@node2 ~]# clustat
>> Cluster Status for test_cluster @ Wed Jun 20 05:56:39 2012
>> Member Status: Quorate
>>
>>  Member Name                           ID   Status
>>  ------ ----                           ---- ------
>>  node1-hb                                 1 Offline
>>  node2-hb                                 2 Online, Local, rgmanager
>>  /dev/mapper/vg_qdisk-lv_qdisk            0 Online, Quorum Disk
>>
>>  Service Name              Owner (Last)             State
>>  ------- ----              ----- ------             -----
>>  service:postgres          node2                    started
>>
>> Now, I start the second node. When cman reaches fencing, it hangs
>> for about 5 minutes, and finally fails. clustat says:
>>
>> [root@node1 ~]# clustat
>> Cluster Status for test_cluster @ Wed Jun 20 06:01:12 2012
>> Member Status: Inquorate
>>
>>  Member Name                           ID   Status
>>  ------ ----                           ---- ------
>>  node1-hb                                 1 Online, Local
>>  node2-hb                                 2 Offline
>>  /dev/mapper/vg_qdisk-lv_qdisk            0 Offline
>>
>> And in /var/log/messages I can see these errors:
>>
>> Jun 20 06:02:12 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>> Jun 20 06:02:12 node1 openais[6098]: [CLM ] got nodejoin message 15.15.2.10
>> Jun 20 06:02:13 node1 dlm_controld[5386]: connect to ccs error -111, check ccsd or cluster status
>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:13 node1 ccsd[6090]: Initial status:: Inquorate
>> Jun 20 06:02:13 node1 gfs_controld[5392]: connect to ccs error -111, check ccsd or cluster status
>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:14 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 0.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Creating commit token because I am the rep.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Storing new sequence id for ring 15c
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering COMMIT state.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering RECOVERY state.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] position [0] member 15.15.2.10:
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] previous ring seq 344 rep 15.15.2.10
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] aru e high delivered e received flag 1
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Did not need to originate any messages in recovery.
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Sending initial ORF token
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>> Jun 20 06:02:18 node1 ccsd[6090]: Error while processing connect: Connection refused
>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>
>> And the quorum disk:
>>
>> [root@node2 ~]# mkqdisk -L -d
>> mkqdisk v0.6.0
>> /dev/mapper/vg_qdisk-lv_qdisk:
>> /dev/vg_qdisk/lv_qdisk:
>>         Magic:                eb7a62c2
>>         Label:                cluster_qdisk
>>         Created:              Thu Jun 7 09:23:34 2012
>>         Host:                 node1
>>         Kernel Sector Size:   512
>>         Recorded Sector Size: 512
>>
>> Status block for node 1
>>         Last updated by node 2
>>         Last updated on Wed Jun 20 06:17:23 2012
>>         State: Evicted
>>         Flags: 0000
>>         Score: 0/0
>>         Average Cycle speed: 0.000500 seconds
>>         Last Cycle speed: 0.000000 seconds
>>         Incarnation: 4fe1a06c4fe1a06c
>>
>> Status block for node 2
>>         Last updated by node 2
>>         Last updated on Wed Jun 20 07:09:38 2012
>>         State: Master
>>         Flags: 0000
>>         Score: 0/0
>>         Average Cycle speed: 0.001000 seconds
>>         Last Cycle speed: 0.000000 seconds
>>         Incarnation: 4fe1a06c4fe1a06c
>>
>> On the other node I don't see any errors in /var/log/messages. One
>> strange thing is that if I start cman on both nodes at the same time,
>> everything works fine and both nodes become quorate (until I reboot
>> one node and the problem reappears). I've checked that multicast is
>> working properly: with iperf I can send and receive multicast packets,
>> and with tcpdump I've seen the packets that openais sends while cman
>> is trying to start. I've read about a bug in RH 5.3 with the same
>> behaviour, but it was fixed in RH 5.4.
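(For anyone wanting to repeat that kind of multicast test, a rough sketch;
the multicast group and interface here are only placeholders, not the ones
cman actually derived for this cluster:

    # receiver on node1: join a test multicast group over UDP
    iperf -s -u -B 239.192.100.1 -i 1

    # sender on node2: push UDP traffic to that group with a TTL > 1
    iperf -c 239.192.100.1 -u -T 32 -t 10 -i 1

    # watch all multicast traffic on the heartbeat interface
    tcpdump -i eth0 -n ip multicast
)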
>>
>> I don't have SELinux enabled, and iptables is also disabled. Here is
>> the simplified cluster.conf (with fewer services and resources). I
>> want to point out one thing: I have allow_kill="0" in order to avoid
>> fencing errors when the quorum daemon tries to fence a failed node.
>> As <fence/> is empty, before adding this setting I got a lot of
>> failed-fencing messages in /var/log/messages.
>>
>> <?xml version="1.0"?>
>> <cluster alias="test_cluster" config_version="15" name="test_cluster">
>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="-1"/>
>>     <clusternodes>
>>         <clusternode name="node1-hb" nodeid="1" votes="1">
>>             <fence/>
>>         </clusternode>
>>         <clusternode name="node2-hb" nodeid="2" votes="1">
>>             <fence/>
>>         </clusternode>
>>     </clusternodes>
>>     <cman two_node="0" expected_votes="3"/>
>>     <fencedevices/>
>>
>>     <rm log_facility="local4" log_level="7">
>>         <failoverdomains>
>>             <failoverdomain name="etest_cluster_fo" nofailback="1"
>>                             ordered="1" restricted="1">
>>                 <failoverdomainnode name="node1-hb" priority="1"/>
>>                 <failoverdomainnode name="node2-hb" priority="2"/>
>>             </failoverdomain>
>>         </failoverdomains>
>>         <resources/>
>>         <service autostart="1" domain="test_cluster_fo" exclusive="0"
>>                  name="postgres" recovery="relocate">
>>             <ip address="172.24.119.44" monitor_link="1"/>
>>             <lvm name="vg_postgres" vg_name="vg_postgres" lv_name="postgres"/>
>>             <fs device="/dev/vg_postgres/postgres" force_fsck="1"
>>                 force_unmount="1" fstype="ext3" mountpoint="/var/lib/pgsql"
>>                 name="postgres" self_fence="0"/>
>>             <script file="/etc/init.d/postgresql" name="postgres"/>
>>         </service>
>>     </rm>
>>     <totem consensus="4000" join="60" token="20000"
>>            token_retransmits_before_loss_const="20"/>
>>     <quorumd allow_kill="0" interval="1" label="cluster_qdisk" tko="10" votes="1">
>>         <heuristic program="/usr/share/cluster/check_eth_link.sh eth0"
>>                    score="1" interval="2" tko="3"/>
>>     </quorumd>
>> </cluster>
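(As far as I understand cman's quorum math, with expected_votes="3" the
cluster needs floor(3/2) + 1 = 2 votes to be quorate, so one node (1 vote)
plus the quorum disk (1 vote) reaches quorum, while a lone node that cannot
talk to qdiskd sits at 1 vote and stays inquorate, which matches the clustat
output above.)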
>>
>> The /etc/hosts:
>> 172.24.119.10   node1
>> 172.24.119.34   node2
>> 15.15.2.10      node1-hb   node1-hb.localdomain
>> 15.15.2.11      node2-hb   node2-hb.localdomain
>>
>> And the versions:
>> Red Hat Enterprise Linux Server release 5.7 (Tikanga)
>> cman-2.0.115-85.el5
>> rgmanager-2.0.52-21.el5
>> openais-0.80.6-30.el5
>>
>> I don't know what else I should try, so if you can give me some ideas
>> I would really appreciate it.
>>
>> Regards, Javi.
><br>><br>><br>><br>> --<br>> esta es mi vida e me la vivo hasta que dios quiera<br>><br>> -- Linux-cluster mailing list <a href="mailto:Linux-cluster@redhat.com" target="_blank">Linux-cluster@redhat.com</a><br>
> <a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a><br>><br> <span class="HOEnZb"><font color="#888888"><br> <br>-- <br>Digimer<br>
> Papers and Projects: https://alteeve.com