From gounini.geekarea at gmail.com  Wed Aug  1 08:28:07 2012
From: gounini.geekarea at gmail.com (GouNiNi)
Date: Wed, 1 Aug 2012 10:28:07 +0200 (CEST)
Subject: [Linux-cluster] Quorum device brain the cluster when master lose network
In-Reply-To: 
Message-ID: <841982910.2925.1343809687840.JavaMail.root@geekarea.fr>

I ran this test one more time and got the same result, with more detail:

When I shut down the network on 2 nodes, including the master, the master stays
alive while the 2 online nodes fence the offline non-master node. The cluster
goes inquorate afterwards.
When the fenced node comes back, it joins the cluster and the cluster becomes
quorate again. A new master is chosen and the old master is fenced.

# cman_tool status
Version: 6.2.0
Config Version: 144
Cluster Name: cluname
Cluster Id: 57462
Cluster Member: Yes
Cluster Generation: 488
Membership state: Cluster-Member
Nodes: 4
Expected votes: 5
Quorum device votes: 1
Total votes: 5
Quorum: 3
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177
Node name: nodename
Node ID: 2
Multicast addresses: ZZ.ZZ.ZZ.ZZ
Node addresses: YY.YY.YY.YY

--
 .`'`.   GouNiNi
: ': :
`. ` .`  GNU/Linux
  `'`    http://www.geekarea.fr

----- Mail original -----
> De: "emmanuel segura"
> À: "linux clustering"
> Envoyé: Lundi 30 Juillet 2012 17:35:39
> Objet: Re: [Linux-cluster] Quorum device brain the cluster when master lose network
>
> Can you send me the output from cman_tool status while the cluster
> is running?
>
> 2012/7/30 GouNiNi < gounini.geekarea at gmail.com >
>
> > ----- Mail original -----
> > De: "Digimer" < lists at alteeve.ca >
> > À: "linux clustering" < linux-cluster at redhat.com >
> > Cc: "GouNiNi" < gounini.geekarea at gmail.com >
> > Envoyé: Lundi 30 Juillet 2012 17:10:10
> > Objet: Re: [Linux-cluster] Quorum device brain the cluster when
> > master lose network
> >
> > On 07/30/2012 10:43 AM, GouNiNi wrote:
> > > Hello,
> > >
> > > I did some tests on a 4-node cluster with a quorum device and found
> > > a bad situation with one test, so I need your knowledge to correct
> > > my configuration.
> > >
> > > Configuration:
> > > 4 nodes, each with 1 vote
> > > quorum device with 1 vote (to keep services up with a minimum of 2 nodes)
> > > cman expected votes 5
> > >
> > > Situation:
> > > I shut down the network on 2 nodes, one of them the master.
> > >
> > > Observation:
> > > Fencing of one node (the master)... Quorum device offline, quorum
> > > dissolved! Services stopped.
> > > The fenced node reboots, the cluster is quorate, the 2nd offline node is
> > > fenced. Services restart.
> > > The 2nd offline node reboots.
> > >
> > > My cluster was not quorate for 8 min (very long hardware boot :-)
> > > and my services were offline.
> > >
> > > Do you know how to prevent this situation?
> > >
> > > Regards,
> >
> > Please tell us the name and version of the cluster software you are
> > using. Please also share your configuration file(s).
> > > > -- > > Digimer > > Papers and Projects: https://alteeve.com > > > > Sorry, RHEL5.6 64bits > > # rpm -q cman rgmanager > cman-2.0.115-68.el5 > rgmanager-2.0.52-9.el5 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > login="xxxx" name="fenceIBM_307" passwd="yyyy"/> > login="xxxx" name="fenceIBM_308" passwd="yyyy"/> > > > > > > <...> > > > post_join_delay="300"/> > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > esta es mi vida e me la vivo hasta que dios quiera > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gounini.geekarea at gmail.com Wed Aug 1 08:29:02 2012 From: gounini.geekarea at gmail.com (GouNiNi) Date: Wed, 1 Aug 2012 10:29:02 +0200 (CEST) Subject: [Linux-cluster] How to change quorumd intervel and tko online? In-Reply-To: <1277138514.2193.1343659611647.JavaMail.root@geekarea.fr> Message-ID: <1145411303.2926.1343809742361.JavaMail.root@geekarea.fr> Infos: RHEL5.6 64bits # rpm -q cman rgmanager cman-2.0.115-68.el5 rgmanager-2.0.52-9.el5 Any idea? -- .`'`. GouNiNi : ': : `. ` .` GNU/Linux `'` http://www.geekarea.fr ----- Mail original ----- > De: "GouNiNi" > ?: "linux clustering" > Envoy?: Lundi 30 Juillet 2012 16:46:51 > Objet: [Linux-cluster] How to change quorumd intervel and tko online? > > Re, > > Juste two little questions. > How to change quorumd intervel and tko **online**? > How to check these values on online cluster? > > Thanks > Regards, > > -- > .`'`. GouNiNi > : ': : > `. ` .` GNU/Linux > `'` http://www.geekarea.fr > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From emi2fast at gmail.com Wed Aug 1 08:58:59 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Wed, 1 Aug 2012 10:58:59 +0200 Subject: [Linux-cluster] Quorum device brain the cluster when master lose network In-Reply-To: <841982910.2925.1343809687840.JavaMail.root@geekarea.fr> References: <841982910.2925.1343809687840.JavaMail.root@geekarea.fr> Message-ID: Hello Gounini Sorry but it told you, remove and reboot the cluster Let the cluster calculate the expected votes 2012/8/1 GouNiNi > I do this test one more time and I got same result with more precisions: > > When I shutdown network on 2 nodes including the master, master stay alive > while the 2 online nodes are fencing the offline non-master node. The > cluster goes Inquorate after. > When fenced node came back, he joins cluster and cluster becomes quorate. > New master is chose and the old master is fenced. > > # cman_tool status > Version: 6.2.0 > Config Version: 144 > Cluster Name: cluname > Cluster Id: 57462 > Cluster Member: Yes > Cluster Generation: 488 > Membership state: Cluster-Member > Nodes: 4 > Expected votes: 5 > Quorum device votes: 1 > Total votes: 5 > Quorum: 3 > Active subsystems: 9 > Flags: Dirty > Ports Bound: 0 177 > Node name: nodename > Node ID: 2 > Multicast addresses: ZZ.ZZ.ZZ.ZZ > Node addresses: YY.YY.YY.YY > > -- > .`'`. GouNiNi > : ': : > `. ` .` GNU/Linux > `'` http://www.geekarea.fr > > > ----- Mail original ----- > > De: "emmanuel segura" > > ?: "linux clustering" > > Envoy?: Lundi 30 Juillet 2012 17:35:39 > > Objet: Re: [Linux-cluster] Quorum device brain the cluster when master > lose network > > > > > > can you send me the ouput from cman_tool status? 
when the cluster > > it's running > > > > > > 2012/7/30 GouNiNi < gounini.geekarea at gmail.com > > > > > > > > > > > ----- Mail original ----- > > > De: "Digimer" < lists at alteeve.ca > > > > ?: "linux clustering" < linux-cluster at redhat.com > > > > Cc: "GouNiNi" < gounini.geekarea at gmail.com > > > > Envoy?: Lundi 30 Juillet 2012 17:10:10 > > > Objet: Re: [Linux-cluster] Quorum device brain the cluster when > > > master lose network > > > > > > On 07/30/2012 10:43 AM, GouNiNi wrote: > > > > Hello, > > > > > > > > I did some tests on 4 nodes cluster with quorum device and I find > > > > a > > > > bad situation with one test, so I need your knowledges to correct > > > > my configuration. > > > > > > > > Configuation: > > > > 4 nodes, all vote for 1 > > > > quorum device vote for 1 (to hold services with minimum 2 nodes > > > > up) > > > > cman expected votes 5 > > > > > > > > Situation: > > > > I shut down network on 2 nodes, one of them is master. > > > > > > > > Observation: > > > > Fencing of one node (the master)... Quorum device Offline, Quorum > > > > disolved ! Services stopped. > > > > Fenced node reboot, cluster is quorate, 2nd offline node is > > > > fenced. > > > > Services restart. > > > > 2nd node offline reboot. > > > > > > > > My cluster is not quorate for 8 min (very long hardware boot :-) > > > > and my services were offline. > > > > > > > > Do you know how to prevent this situation? > > > > > > > > Regards, > > > > > > Please tell us the name and version of the cluster software you are > > > using, Please also share your configuration file(s). > > > > > > -- > > > Digimer > > > Papers and Projects: https://alteeve.com > > > > > > > Sorry, RHEL5.6 64bits > > > > # rpm -q cman rgmanager > > cman-2.0.115-68.el5 > > rgmanager-2.0.52-9.el5 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > login="xxxx" name="fenceIBM_307" passwd="yyyy"/> > > > login="xxxx" name="fenceIBM_308" passwd="yyyy"/> > > > > > > > > > > > > <...> > > > > > > > post_join_delay="300"/> > > > > > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > esta es mi vida e me la vivo hasta que dios quiera > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From piotr.pietrzak at hp.com Wed Aug 1 10:37:15 2012 From: piotr.pietrzak at hp.com (Pietrzak, Piotr (CMS rtBSS)) Date: Wed, 1 Aug 2012 10:37:15 +0000 Subject: [Linux-cluster] Named pipes not working on GFS2 in Redhat 5.x, In-Reply-To: <3ABFB3D87EB6904F9F6FB45C90E24F4D0808A4@G1W3650.americas.hpqcorp.net> References: <3ABFB3D87EB6904F9F6FB45C90E24F4D0808A4@G1W3650.americas.hpqcorp.net> Message-ID: <3ABFB3D87EB6904F9F6FB45C90E24F4D0808C9@G1W3650.americas.hpqcorp.net> Hello Bob, The problem has been reported by one of my customer, but I was not able to set up a real cluster so I have built a cluster with single machine, all software cluster and GFS2 installed and set up just one node. It allows me to mount GFS2 filesystem and conduct tests with application and shell steps. 
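For reference, the application-side check boils down to an FIONREAD ioctl on the
FIFO descriptor. A minimal standalone sketch of that check, assuming a FIFO on the
GFS2 mount (the path below is only an example, not the customer's code):

/* fifo_fionread.c - create a FIFO and ask how many bytes are queued in it */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(void)
{
    const char *path = "/gfs_test/msg_act_0.pipe";  /* example path on the GFS2 mount */
    int nbytes = 0;

    /* Create the named pipe if it does not exist yet. */
    if (mkfifo(path, 0644) == -1 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    /* O_NONBLOCK so the open does not wait for a writer. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    /* The same call the application makes; this is where the reported failure shows up. */
    if (ioctl(fd, FIONREAD, &nbytes) == -1)
        fprintf(stderr, "ioctl(FIONREAD): %s\n", strerror(errno));
    else
        printf("%d byte(s) queued in the FIFO\n", nbytes);

    close(fd);
    return 0;
}

On a local filesystem this reports the number of queued bytes; the behaviour on the
GFS2 mount is what the rest of this mail describes.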
In the real system, the following piece of code crashes application start-up:

if ( ioctl( nFifoDescr, FIONREAD, &lSize) == -1 )
{
    errNo = errno;

However, when I built the test system I saw that when I create pipes on the GFS2
filesystem and try to use them from the shell, I get an error message.
For instance, when I try to write to the pipe the following error shows up immediately:

[root at erm4 gfs_test]# echo test > msg_act_0.pipe
-bash: echo: write error: Invalid argument

When I try to read from the pipe the following error comes up:

[root at erm4 gfs_test]# cat

From queszama at yahoo.in  Wed Aug  1 11:56:46 2012
From: queszama at yahoo.in (Zama Ques)
Date: Wed, 1 Aug 2012 19:56:46 +0800 (SGT)
Subject: [Linux-cluster] Creating two different cluster using same set of nodes.
Message-ID: <1343822206.68654.YahooMailNeo@web193005.mail.sg3.yahoo.com>

Hi All,

I need clarification on whether it is possible to create two different clusters
using the same set of nodes.

It looks like Redhat Cluster Suite does not support creating different clusters
using the same nodes. I am getting the following error while building the second
cluster using the same nodes through the luci interface:

====
[dismiss]

The following errors occurred:
    * Host system3.example.com is already a member of the cluster named "ClusterA"
    * Host system4.example.com is already a member of the cluster named "ClusterA"
===

My query is: does Redhat Cluster Suite allow, in any way, creating two different
clusters using the same nodes? If not, is there any reason for not allowing this
feature?

Thanks in Advance
Zaman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at alteeve.ca  Wed Aug  1 12:23:39 2012
From: lists at alteeve.ca (Digimer)
Date: Wed, 01 Aug 2012 08:23:39 -0400
Subject: [Linux-cluster] Creating two different cluster using same set of nodes.
In-Reply-To: <1343822206.68654.YahooMailNeo@web193005.mail.sg3.yahoo.com>
References: <1343822206.68654.YahooMailNeo@web193005.mail.sg3.yahoo.com>
Message-ID: <50191FCB.7000804@alteeve.ca>

On 08/01/2012 07:56 AM, Zama Ques wrote:
> Hi All ,
>
> Need clarifications whether it is possible to create two different
> cluster using the same set of nodes.
>
> Looks like Redhat Cluster Suite does not support creating different
> clusters using the same nodes. I am getting the following
> error while building the second cluster using the same nodes using luci
> interface .
>
> ====
> [dismiss]
>
> The following errors occurred:
>     * Host system3.example.com is already a member of the cluster named
> "ClusterA"
>     * Host system4.example.com is already a member of the cluster named
> "ClusterA"
> ===
>
> My query is that does Redhat Cluster Suite allows in any way to create
> two different clusters using same nodes. If not , any reason for not
> allowing this feature?.
>
> Thanks in Advance
> Zaman

It is not possible, no. A node must be in one cluster only. May I ask
why you're trying to do this?

--
Digimer
Papers and Projects: https://alteeve.com

From gianluca.cecchi at gmail.com  Wed Aug  1 14:10:48 2012
From: gianluca.cecchi at gmail.com (Gianluca Cecchi)
Date: Wed, 1 Aug 2012 16:10:48 +0200
Subject: [Linux-cluster] clvmd problems with centos 6.3 or normal clvmd behaviour?
Message-ID: 

Hello,
I am testing a three-node cluster + quorum disk and clvmd.
I was at CentOS 6.2 and I seem to remember being able to start a
single node. Correct?
Then I upgraded to CentOS 6.3 and had a working environment.
My config has At the moment two nodes are in another site that is powered down and I need to start a single node config. When the node starts it gets waiting for quorum and when quorum disk becomes master it goes ahead: # cman_tool nodes Node Sts Inc Joined Name 0 M 0 2012-08-01 15:41:58 /dev/block/253:4 1 X 0 intrarhev1 2 X 0 intrarhev2 3 M 1420 2012-08-01 15:39:58 intrarhev3 But the process hangs at clvmd start up. In particular at the step vgchange -aly Pid of "service clvmd start" command is 9335 # pstree -alp 9335 S24clvmd,9335 /etc/rc3.d/S24clvmd start ??vgchange,9363 -ayl # ll /proc/9363/fd/ total 0 lrwx------ 1 root root 64 Aug 1 15:44 0 -> /dev/console lrwx------ 1 root root 64 Aug 1 15:44 1 -> /dev/console lrwx------ 1 root root 64 Aug 1 15:44 2 -> /dev/console lrwx------ 1 root root 64 Aug 1 15:44 3 -> /dev/mapper/control lrwx------ 1 root root 64 Aug 1 15:44 4 -> socket:[1348167] lr-x------ 1 root root 64 Aug 1 15:44 5 -> /dev/dm-3 # lsof -p 9363 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME vgchange 9363 root cwd DIR 104,3 4096 2 / vgchange 9363 root rtd DIR 104,3 4096 2 / vgchange 9363 root txt REG 104,3 971464 132238 /sbin/lvm vgchange 9363 root mem REG 104,3 156872 210 /lib64/ld-2.12.so vgchange 9363 root mem REG 104,3 1918016 569 /lib64/libc-2.12.so vgchange 9363 root mem REG 104,3 22536 593 /lib64/libdl-2.12.so vgchange 9363 root mem REG 104,3 24000 832 /lib64/libdevmapper-event.so.1.02 vgchange 9363 root mem REG 104,3 124624 750 /lib64/libselinux.so.1 vgchange 9363 root mem REG 104,3 272008 2060 /lib64/libreadline.so.6.0 vgchange 9363 root mem REG 104,3 138280 2469 /lib64/libtinfo.so.5.7 vgchange 9363 root mem REG 104,3 61648 1694 /lib64/libudev.so.0.5.1 vgchange 9363 root mem REG 104,3 251112 1489 /lib64/libsepol.so.1 vgchange 9363 root mem REG 104,3 229024 1726 /lib64/libdevmapper.so.1.02 vgchange 9363 root mem REG 253,7 99158576 17029 /usr/lib/locale/locale-archive vgchange 9363 root mem REG 253,7 26060 134467 /usr/lib64/gconv/gconv-modules.cache vgchange 9363 root 0u CHR 5,1 0t0 5218 /dev/console vgchange 9363 root 1u CHR 5,1 0t0 5218 /dev/console vgchange 9363 root 2u CHR 5,1 0t0 5218 /dev/console vgchange 9363 root 3u CHR 10,58 0t0 5486 /dev/mapper/control vgchange 9363 root 4u unix 0xffff880879b309c0 0t0 1348167 socket vgchange 9363 root 5r BLK 253,3 0t143360 10773 /dev/dm-3 # strace -p 9363 Process 9363 attached - interrupt to quit read(4, multipath seems ok in general and for md=3 in particular # multipath -l /dev/mapper/mpathd mpathd (3600507630efe0b0c0000000000001181) dm-3 IBM,1750500 size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='round-robin 0' prio=0 status=active | |- 0:0:0:3 sdd 8:48 active undef running | `- 1:0:0:3 sdl 8:176 active undef running `-+- policy='round-robin 0' prio=0 status=enabled |- 0:0:1:3 sdq 65:0 active undef running `- 1:0:1:3 sdy 65:128 active undef running Currently I have lvm2-2.02.95-10.el6.x86_64 lvm2-cluster-2.02.95-10.el6.x86_64 startup is stuck as in image attached Logs messages: Aug 1 15:46:14 udevd[663]: worker [9379] unexpectedly returned with status 0x0100 Aug 1 15:46:14 udevd[663]: worker [9379] failed while handling '/devices/virtual/block/dm-15' dmesg DLM (built Jul 20 2012 01:56:50) installed dlm: Using TCP for communications qdiskd Aug 01 15:41:58 qdiskd Score sufficient for master operation (1/1; required=1); upgrading Aug 01 15:43:03 qdiskd Assuming master role corosync.log Aug 01 15:41:58 corosync [CMAN ] quorum device registered Aug 01 15:43:08 corosync [CMAN ] quorum regained, 
resuming activity Aug 01 15:43:08 corosync [QUORUM] This node is within the primary component and will provide service. Aug 01 15:43:08 corosync [QUORUM] Members[1]: 3 fenced.log Aug 01 15:43:09 fenced fenced 3.0.12.1 started Aug 01 15:43:09 fenced failed to get dbus connection dlm_controld.log Aug 01 15:43:10 dlm_controld dlm_controld 3.0.12.1 started gfs_controld.log Aug 01 15:43:11 gfs_controld gfs_controld 3.0.12.1 started Do I miss anything simple? Is it correct to say that clvmd can start only when one node is active, given that it has quorum under the cluster configuration rules set up? Or am I hitting any known bug/problem? Thanks in advance, Gianluca -------------- next part -------------- A non-text attachment was scrubbed... Name: clvms stuck.png Type: image/png Size: 21666 bytes Desc: not available URL: From sdake at redhat.com Wed Aug 1 14:14:48 2012 From: sdake at redhat.com (Steven Dake) Date: Wed, 01 Aug 2012 07:14:48 -0700 Subject: [Linux-cluster] Need HA for VMs on OpenStack? check out Heat V5 Message-ID: <501939D8.9080209@redhat.com> Hi folks, A few developers from HA community have been hard at work on a project called heat which provides native HA for OpenStack virtual machines. Heat provides a template based system with API matching AWS CloudFormation semantics specifically for OpenStack. In v5, instance heatlhchecking has been added. To get started on Fedora 16+ check out the getting started guide: https://github.com/heat-api/heat/blob/master/docs/GettingStarted.rst#readme or on Ubuntu Precise check out the devstack guide: https://github.com/heat-api/heat/wiki/Getting-Started-with-Heat-using-Master-on-Ubuntu An example template with instance HA features is here: https://github.com/heat-api/heat/blob/master/templates/WordPress_Single_Instance_With_IHA.template An example template with applicatoin HA features that includes escalation is here: https://github.com/heat-api/heat/blob/master/templates/WordPress_Single_Instance_With_HA.template Our website is here: http://www.heat-api.org The software can be downloaded from: https://github.com/heat-api/heat/downloads Enjoy -steve From emi2fast at gmail.com Wed Aug 1 14:26:38 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Wed, 1 Aug 2012 16:26:38 +0200 Subject: [Linux-cluster] clvmd problems with centos 6.3 or normal clvmd behaviour? In-Reply-To: References: Message-ID: Hello GianLuca Why you don't remove expected_votes=3 and let the cluster automatic calculate that I told you be cause i had some many problems with that setting 2012/8/1 Gianluca Cecchi > Hello, > testing a three node cluster + quorum disk and clvmd. > I was at CentOS 6.2 and I seem to remember to be able to start a > single node. Correct? > Then I upgraded to CentOS 6.3 and had a working environment. > My config has > > > At the moment two nodes are in another site that is powered down and I > need to start a single node config. > > When the node starts it gets waiting for quorum and when quorum disk > becomes master it goes ahead: > > # cman_tool nodes > Node Sts Inc Joined Name > 0 M 0 2012-08-01 15:41:58 /dev/block/253:4 > 1 X 0 intrarhev1 > 2 X 0 intrarhev2 > 3 M 1420 2012-08-01 15:39:58 intrarhev3 > > But the process hangs at clvmd start up. 
In particular at the step > vgchange -aly > Pid of "service clvmd start" command is 9335 > > # pstree -alp 9335 > S24clvmd,9335 /etc/rc3.d/S24clvmd start > ??vgchange,9363 -ayl > > > # ll /proc/9363/fd/ > total 0 > lrwx------ 1 root root 64 Aug 1 15:44 0 -> /dev/console > lrwx------ 1 root root 64 Aug 1 15:44 1 -> /dev/console > lrwx------ 1 root root 64 Aug 1 15:44 2 -> /dev/console > lrwx------ 1 root root 64 Aug 1 15:44 3 -> /dev/mapper/control > lrwx------ 1 root root 64 Aug 1 15:44 4 -> socket:[1348167] > lr-x------ 1 root root 64 Aug 1 15:44 5 -> /dev/dm-3 > > # lsof -p 9363 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > vgchange 9363 root cwd DIR 104,3 4096 2 / > vgchange 9363 root rtd DIR 104,3 4096 2 / > vgchange 9363 root txt REG 104,3 971464 132238 > /sbin/lvm > vgchange 9363 root mem REG 104,3 156872 210 > /lib64/ld-2.12.so > vgchange 9363 root mem REG 104,3 1918016 569 > /lib64/libc-2.12.so > vgchange 9363 root mem REG 104,3 22536 593 > /lib64/libdl-2.12.so > vgchange 9363 root mem REG 104,3 24000 832 > /lib64/libdevmapper-event.so.1.02 > vgchange 9363 root mem REG 104,3 124624 750 > /lib64/libselinux.so.1 > vgchange 9363 root mem REG 104,3 272008 2060 > /lib64/libreadline.so.6.0 > vgchange 9363 root mem REG 104,3 138280 2469 > /lib64/libtinfo.so.5.7 > vgchange 9363 root mem REG 104,3 61648 1694 > /lib64/libudev.so.0.5.1 > vgchange 9363 root mem REG 104,3 251112 1489 > /lib64/libsepol.so.1 > vgchange 9363 root mem REG 104,3 229024 1726 > /lib64/libdevmapper.so.1.02 > vgchange 9363 root mem REG 253,7 99158576 17029 > /usr/lib/locale/locale-archive > vgchange 9363 root mem REG 253,7 26060 134467 > /usr/lib64/gconv/gconv-modules.cache > vgchange 9363 root 0u CHR 5,1 0t0 5218 > /dev/console > vgchange 9363 root 1u CHR 5,1 0t0 5218 > /dev/console > vgchange 9363 root 2u CHR 5,1 0t0 5218 > /dev/console > vgchange 9363 root 3u CHR 10,58 0t0 5486 > /dev/mapper/control > vgchange 9363 root 4u unix 0xffff880879b309c0 0t0 1348167 socket > vgchange 9363 root 5r BLK 253,3 0t143360 10773 > /dev/dm-3 > > > # strace -p 9363 > Process 9363 attached - interrupt to quit > read(4, > > multipath seems ok in general and for md=3 in particular > # multipath -l /dev/mapper/mpathd > mpathd (3600507630efe0b0c0000000000001181) dm-3 IBM,1750500 > size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw > |-+- policy='round-robin 0' prio=0 status=active > | |- 0:0:0:3 sdd 8:48 active undef running > | `- 1:0:0:3 sdl 8:176 active undef running > `-+- policy='round-robin 0' prio=0 status=enabled > |- 0:0:1:3 sdq 65:0 active undef running > `- 1:0:1:3 sdy 65:128 active undef running > > Currently I have > lvm2-2.02.95-10.el6.x86_64 > lvm2-cluster-2.02.95-10.el6.x86_64 > > startup is stuck as in image attached > > Logs > messages: > Aug 1 15:46:14 udevd[663]: worker [9379] unexpectedly returned with > status 0x0100 > Aug 1 15:46:14 udevd[663]: worker [9379] failed while handling > '/devices/virtual/block/dm-15' > > dmesg > DLM (built Jul 20 2012 01:56:50) installed > dlm: Using TCP for communications > > > qdiskd > Aug 01 15:41:58 qdiskd Score sufficient for master operation (1/1; > required=1); upgrading > Aug 01 15:43:03 qdiskd Assuming master role > > corosync.log > Aug 01 15:41:58 corosync [CMAN ] quorum device registered > Aug 01 15:43:08 corosync [CMAN ] quorum regained, resuming activity > Aug 01 15:43:08 corosync [QUORUM] This node is within the primary > component and will provide service. 
> Aug 01 15:43:08 corosync [QUORUM] Members[1]: 3
>
> fenced.log
> Aug 01 15:43:09 fenced fenced 3.0.12.1 started
> Aug 01 15:43:09 fenced failed to get dbus connection
>
> dlm_controld.log
> Aug 01 15:43:10 dlm_controld dlm_controld 3.0.12.1 started
>
> gfs_controld.log
> Aug 01 15:43:11 gfs_controld gfs_controld 3.0.12.1 started
>
>
> Do I miss anything simple?
> Is it correct to say that clvmd can start only when one node is
> active, given that it has quorum under the cluster configuration rules
> set up?
>
> Or am I hitting any known bug/problem?
>
> Thanks in advance,
> Gianluca
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gounini.geekarea at gmail.com  Wed Aug  1 14:32:17 2012
From: gounini.geekarea at gmail.com (GouNiNi)
Date: Wed, 1 Aug 2012 16:32:17 +0200 (CEST)
Subject: [Linux-cluster] reasons for sporadic token loss?
In-Reply-To: <5017E44D.2010702@itechnical.de>
Message-ID: <861018155.3131.1343831537507.JavaMail.root@geekarea.fr>

Hello,

These answers are only gut feelings; I have too little experience with RHCS 6.

A. Your loss is a token loss; the consensus timeout only comes into play after
   the token is lost.
B. Maybe your problem is in the network, not in the cluster tuning.
C. No idea.
D. I think it doesn't. Token multicast uses the network address that your node
   name resolves to.

I think you should run the interconnect link on a single interface for testing,
without bonding. If your problem disappears, your bond mode 5 is the culprit.

Regards,

--
 .`'`.   GouNiNi
: ': :
`. ` .`  GNU/Linux
  `'`    http://www.geekarea.fr

----- Mail original -----
> De: "Heiko Nardmann"
> À: linux-cluster at redhat.com
> Envoyé: Mardi 31 Juillet 2012 15:57:33
> Objet: [Linux-cluster] reasons for sporadic token loss?
>
> Hi together!
>
> I am experiencing sporadic problems with my cluster setup. Maybe
> someone has an idea? But first some facts:
>
> Type: RHEL 6.1 two node cluster (corosync 1.2.3-36) on two Dell R610,
> each with a quad port NIC
>
> NICs:
> - interfaces em1/em2 are bonded using mode 5; these interfaces are
> cross connected (intended to be used for the cluster housekeeping
> communication) - no network element in between
> - interfaces em3/em4 are bonded using mode 1; these interfaces are
> connected to two switches
>
> Cluster configuration:
>
> vboxhost="vboxhost.private" login="test" vmname="RHEL 6.1 x86_64
> DF-System Server 1" />
> vboxhost="vboxhost.private" login="test" vmname="RHEL 6.1 x86_64
> DF-System Server 2" />
>
> sleeptime="10"/>
>