From swap_project at yahoo.com Thu Sep 1 14:57:18 2011 From: swap_project at yahoo.com (Srija) Date: Thu, 1 Sep 2011 07:57:18 -0700 (PDT) Subject: [Linux-cluster] vm guest migrating through clusvcadm In-Reply-To: <1314702563.2694.13.camel@menhir> References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir> Message-ID: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com> Hi, I have configured a guest in the cluster environment to restart on another node when the node on which the guest resides goes down. The guest has no issue when I live migrate it with the 'xm migrate' command, but I am having issues when trying to migrate it with 'clusvcadm'. The cluster is not under kvm. The error is: Trying to migrate service:guest1 to node1...Service does not exist. Here is the configuration of the cluster: (cluster.conf excerpt not preserved in the archive) I also tried changing the cluster configuration, placing the guest name as a service, but that did not work either. Can anyone please confirm whether kvm is needed to use the 'clusvcadm' command? If not, what kind of modifications are needed to migrate the guest with clusvcadm? Thanks

From mmorgan at dca.net Thu Sep 1 20:58:18 2011 From: mmorgan at dca.net (Michael Morgan) Date: Thu, 1 Sep 2011 16:58:18 -0400 Subject: [Linux-cluster] "Invalid resource" starting KVM guest with clusvcadm In-Reply-To: <20110825214529.GF7305@staff.dca.net> References: <20110825214529.GF7305@staff.dca.net> Message-ID: <20110901205818.GD545@staff.dca.net> On Thu, Aug 25, 2011 at 05:45:29PM -0400, Michael Morgan wrote: > Hello, > > I have a 2 node KVM cluster under Scientific Linux 6.1. Starting guests > works fine through virsh, virt-manager, and even rg_test. When I try to > use clusvcadm however: > > [root at node1 ~]# clusvcadm -e vm:test > Local machine trying to enable vm:test...Invalid operation for resource > After poring through vm.sh and adding some logging I see that clusvcadm on the bad cluster is running "vm.sh status" and fails after "virsh domstate test". Both rg_test on the bad cluster and clusvcadm on a working cluster run "vm.sh start" which correctly follows up with "virsh create /mnt/shared/xml/test.xml". I can't think of any reason why this would be happening though. -- Michael Morgan mmorgan at dca.net

From rhayden.public at gmail.com Fri Sep 2 13:38:25 2011 From: rhayden.public at gmail.com (Robert Hayden) Date: Fri, 2 Sep 2011 08:38:25 -0500 Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying Message-ID: Has anyone experienced the following error/hang/loop when attempting to stop rgmanager or cman on the last node of a two node cluster?

groupd[4909]: cpg_leave error retrying

Basic scenario:
RHEL 5.7 with the latest errata for cman.
Create a two node cluster with qdisk and higher totem token=70000
start cman on both nodes, wait for qdisk to become online with master determined
stop cman on node1, wait for it to complete
stop cman on node2
error "cpg_leave" seen in logging output

Observations:
The "service cman stop" command hangs at "Stopping fencing" output
If I cycle openais service with "service openais restart", then the "service cman stop" will complete (need to manually stop the openais service afterwards).
When hung, the command "group_tool dump" hangs (any group_tool command hangs). The hang is inconsistent which, in my mind, implies a timing issue. Inconsistent meaning that every once in a while, then shutdown will complete (maybe 20% of the time). I have seen the issue with the stopping of rgmanager and cman. The below example has been stripped down to show the hang with cman. I have tested with varying the length of time to wait before stopping the second node with no difference (hang still occurs periodically). I have tested with commenting out the totem token and the quorum_dev_poll and still experienced the hang. (we use the longer timeouts to help survive network and san blips)/ I have dug through some of the source code. The message appears in group's cpg.c as function do_cpg_leave( ). This calls the cpg_leave function located in the openais package. If I attach to the groupd process with gdb, I get the following stack. Watching with strace, groupd is just in a looping state. (gdb) where #0 0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6 #1 0x000000341409a364 in sleep () from /lib64/libc.so.6 #2 0x000000000040a410 in time () #3 0x000000000040bd09 in time () #4 0x000000000040e2cb in time () #5 0x000000000040ebe0 in time () #6 0x000000000040f394 in time () #7 0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6 #8 0x00000000004018f9 in time () #9 0x00007fff04a671c8 in ?? () #10 0x0000000000000000 in ?? () If I attach to the aisexec process with gdb, I see the following: (gdb) where #0 0x00000034140cb696 in poll () from /lib64/libc.so.6 #1 0x0000000000405c50 in poll_run () #2 0x0000000000418aae in main () As you can see in the cluster.conf example below, I have attempted many different ways to create more debug logging. I do see debug messages from openais in the cpg.c component during startup, but nothing is logged on the shutdown hang scenario. I would appreciate any guidance on how to troubleshoot further, especially with increasing the tracing of the openais calls in cpg.c. Thanks Robert Example cluster.conf: From rhayden.public at gmail.com Fri Sep 2 14:33:17 2011 From: rhayden.public at gmail.com (Robert Hayden) Date: Fri, 2 Sep 2011 09:33:17 -0500 Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying In-Reply-To: References: Message-ID: I modified the /etc/init.d/cman script to use the -D flag on the groupd start and re-direct the output to a file in /tmp. During the hang, I see groupd looping through the cpg_leave function. When I restart openais, it appears that groupd will get an error code "2" and then break out of the loop. Looks like I need to dig into the openais cpg_leave function.... Here is the output of the groupg -D output with the openais restart at the very end. 
1314973495 cman: our nodeid 2 name node2-priv quorum 1 1314973495 setup_cpg groupd_handle 6b8b456700000000 1314973495 groupd confchg total 2 left 0 joined 1 1314973495 send_version nodeid 2 cluster 2 mode 2 compat 1 1314973495 client connection 3 1314973495 got client 3 setup 1314973495 setup fence 0 1314973495 client connection 4 1314973495 got client 4 setup 1314973495 setup dlm 1 1314973495 client connection 5 1314973495 got client 5 setup 1314973495 setup gfs 2 1314973496 got client 3 join 1314973496 0:default got join 1314973496 0:default is cpg client 6 name 0_default handle 79e2a9e300000001 1314973496 0:default cpg_join ok 1314973496 0:default waiting for first cpg event 1314973496 client connection 7 1314973496 0:default waiting for first cpg event 1314973496 0:default confchg left 0 joined 1 total 2 1314973496 0:default process_node_join 2 1314973496 0:default cpg add node 1 total 1 1314973496 0:default cpg add node 2 total 2 1314973496 0:default make_event_id 200020001 nodeid 2 memb_count 2 type 1 1314973496 0:default queue join event for nodeid 2 1314973496 0:default process_current_event 200020001 2 JOIN_BEGIN 1314973496 0:default app node init: add 2 total 1 1314973496 0:default app node init: add 1 total 2 1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2 1314973496 got client 7 get_group 1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2 1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2 1314973496 0:default mark node 1 stopped 1314973496 0:default set global_id 10001 from 1 1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STOPPED 1314973496 0:default action for app: setid default 65537 1314973496 0:default action for app: start default 1 2 2 1 2 1314973496 0:default mark node 1 started 1314973496 client connection 7 1314973496 got client 7 get_group 1314973496 client connection 7 1314973496 got client 7 get_group 1314973496 got client 3 start_done 1314973496 0:default send started 1314973496 0:default mark node 2 started 1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STARTED 1314973496 0:default action for app: finish default 1 1314973497 client connection 7 1314973497 got client 7 get_group 1314973557 cman: node 0 added 1314973580 0:default confchg left 1 joined 0 total 1 1314973580 0:default confchg removed node 1 reason 2 1314973580 0:default process_node_leave 1 1314973580 0:default cpg del node 1 total 1 1314973580 0:default make_event_id 100010002 nodeid 1 memb_count 1 type 2 1314973580 0:default queue leave event for nodeid 1 1314973580 0:default process_current_event 100010002 1 LEAVE_BEGIN 1314973580 0:default action for app: stop default 1314973580 got client 3 stop_done 1314973580 0:default send stopped 1314973580 0:default waiting for 2 more stopped messages before LEAVE_ALL_STOPPED 1 1314973580 0:default mark node 1 stopped 1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1 1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1 1314973580 0:default mark node 2 stopped 1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STOPPED 1314973580 0:default app node leave: del 1 total 1 1314973580 0:default action for app: start default 2 3 1 2 1314973580 got client 3 start_done 1314973580 0:default send started 1314973580 0:default mark node 2 started 1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STARTED 1314973580 0:default action for app: finish 
default 2 1314973583 cman: node 1 removed 1314973583 add_recovery_set_cman nodeid 1 1314973591 got client 3 leave 1314973591 0:default got leave 1314973591 cpg_leave error retry 1314973592 cpg_leave error retry 1314973593 cpg_leave error retry 1314973594 cpg_leave error retry 1314973595 cpg_leave error retry 1314973596 cpg_leave error retry 1314973597 cpg_leave error retry 1314973598 cpg_leave error retry 1314973599 cpg_leave error retry 1314973600 cpg_leave error retry 1314973601 0:default cpg_leave error retrying 1314973601 cpg_leave error retry 1314973602 cpg_leave error retry 1314973603 cpg_leave error retry 1314973604 cpg_leave error retry 1314973605 cpg_leave error retry 1314973606 cpg_leave error retry 1314973607 cpg_leave error retry 1314973608 cpg_leave error retry 1314973609 cpg_leave error retry 1314973610 cpg_leave error retry 1314973611 0:default cpg_leave error retrying 1314973611 cpg_leave error retry 1314973612 cpg_leave error retry 1314973613 cpg_leave error retry 1314973614 cpg_leave error retry 1314973615 cpg_leave error retry 1314973616 cpg_leave error retry 1314973617 cpg_leave error retry 1314973618 cpg_leave error retry 1314973619 cpg_leave error retry 1314973620 cpg_leave error retry 1314973621 0:default cpg_leave error retrying 1314973621 cpg_leave error retry 1314973622 cpg_leave error retry 1314973623 cpg_leave error retry 1314973624 cpg_leave error retry 1314973625 cpg_leave error retry 1314973626 cpg_leave error retry 1314973627 cpg_leave error retry 1314973628 cpg_leave error retry 1314973629 cpg_leave error retry 1314973630 cpg_leave error retry 1314973631 0:default cpg_leave error retrying 1314973631 cpg_leave error retry 1314973632 cpg_leave error retry 1314973633 cpg_leave error retry 1314973634 cpg_leave error retry 1314973635 cpg_leave error retry 1314973636 cpg_leave error retry 1314973637 cpg_leave error retry 1314973640 0:default cpg_leave error 2 1314973640 client connection 7 1314973640 cluster is down, exiting On Fri, Sep 2, 2011 at 8:38 AM, Robert Hayden wrote: > Has anyone experienced the following error/hang/loop when attempting > to stop rgmanager or cman on the last node of a two node cluster? > > groupd[4909]: cpg_leave error retrying > > Basic scenario: > RHEL 5.7 with the latest errata for cman. > Create a two node cluster with qdisk and higher totem token=70000 > start cman on both nodes, wait for qdisk to become online with master determined > stop cman on node1, wait for it to complete > stop cman on node2 > error "cpg_leave" seen in logging output > > Observations: > The "service cman stop" command hangs at "Stopping fencing" output > If I cycle openais service with "service openais restart", then the > "service cman stop" will complete (need to manually stop the openais > service afterwards). > When hung, the command "group_tool dump" hangs (any group_tool command hangs). > The hang is inconsistent which, in my mind, implies a timing issue. > Inconsistent meaning that every once in a while, then shutdown will > complete (maybe 20% of the time). > I have seen the issue with the stopping of rgmanager and cman. ?The > below example has been stripped down to show the hang with cman. > I have tested with varying the length of time to wait before stopping > the second node with no difference (hang still occurs periodically). > I have tested with commenting out the totem token and the > quorum_dev_poll and still experienced the hang. 
(we use the longer > timeouts to help survive network and san blips)/ > > > I have dug through some of the source code. ?The message appears in > group's cpg.c as function do_cpg_leave( ). ?This calls the cpg_leave > function located in the openais package. > > If I attach to the groupd process with gdb, I get the following stack. > ?Watching with strace, groupd is just in a looping state. > (gdb) where > #0 ?0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6 > #1 ?0x000000341409a364 in sleep () from /lib64/libc.so.6 > #2 ?0x000000000040a410 in time () > #3 ?0x000000000040bd09 in time () > #4 ?0x000000000040e2cb in time () > #5 ?0x000000000040ebe0 in time () > #6 ?0x000000000040f394 in time () > #7 ?0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6 > #8 ?0x00000000004018f9 in time () > #9 ?0x00007fff04a671c8 in ?? () > #10 0x0000000000000000 in ?? () > > If I attach to the aisexec process with gdb, I see the following: > (gdb) where > #0 ?0x00000034140cb696 in poll () from /lib64/libc.so.6 > #1 ?0x0000000000405c50 in poll_run () > #2 ?0x0000000000418aae in main () > > > As you can see in the cluster.conf example below, I have attempted > many different ways to create more debug logging. ?I do see debug > messages from openais in the cpg.c component during startup, but > nothing is logged on the shutdown hang scenario. > > I would appreciate any guidance on how to troubleshoot further, > especially with increasing the tracing of the openais calls in cpg.c. > > Thanks > Robert > > > Example cluster.conf: > > > ? ? ? ? timestamp="on" debug="on"> > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? post_fail_delay="10" post_join_delay="60"/> > ? ? ? ? log_level="7" min_score="1" tko="60" votes="1"> > ? ? ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? ? ? ? ? ipaddr="X.X.X.X" login="node1_fence" name="iLO_node1" > passwd="password" power_wait="10" lanplus="1"/> > ? ? ? ? ? ? ? ? ipaddr="X.X.X.X" login="node2_fence" name="iLO_node2" > passwd="password" power_wait="10" lanplus="1"/> > ? ? ? ? > ? ? ? ? > > From rhayden.public at gmail.com Fri Sep 2 16:15:15 2011 From: rhayden.public at gmail.com (Robert Hayden) Date: Fri, 2 Sep 2011 11:15:15 -0500 Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying In-Reply-To: References: Message-ID: I search the openais forums and ran across two recent threads and a couple of potential patches that sounds interesting. Unfortunately, I do not have enough experience to determine if it is related to my issue. "[Openais] Problems forming cluster on corosync startup" at http://marc.info/?l=openais&m=131234252917259&w=2 "[Openais] CPG client can lockup if the local node is in the downlist" at http://marc.info/?l=openais&m=131354417212931&w=2 The above threads refer to a patch from Steven Drake at http://marc.info/?l=openais&m=131274060602528&w=2 Thanks Robert -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thomas at sjolshagen.net Fri Sep 2 22:34:29 2011 From: thomas at sjolshagen.net (Thomas Sjolshagen) Date: Fri, 02 Sep 2011 18:34:29 -0400 Subject: [Linux-cluster] =?utf-8?q?dlm=3A_dev=5Fwrite_no_op_48479213_18508?= Message-ID: I've been getting: dlm: dev_write no op 48479213 18508 in dmesg output after I've upgraded to the latest Fedora 15 cluster packages. After a while, my GFS2 file system(s) stop responding. I can't prove a connection between the two, but was wondering if there is any reason to believe there could be? Packages: cluster-glue-1.0.6-2.fc15.1.x86_64 gfs2-cluster-3.1.1-2.fc15.x86_64 cluster-glue-libs-1.0.6-2.fc15.1.x86_64 clusterlib-3.1.5-1.fc15.x86_64 cman-3.1.5-1.fc15.x86_64 kernel-2.6.40.3-0.fc15.x86_64 corosync-1.4.1-1.fc15.x86_64 corosynclib-1.4.1-1.fc15.x86_64 openaislib-1.1.4-2.fc15.x86_64 openais-1.1.4-2.fc15.x86_64 -- Read my blog(s) [1] - occasionally updated!: Follow me on Twitter [2] Links: ------ [1] http://www.sjolshagen.net/ [2] http://www.twitter.com/NotFitEnough -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Sat Sep 3 05:09:11 2011 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 03 Sep 2011 07:09:11 +0200 Subject: [Linux-cluster] dlm: dev_write no op 48479213 18508 In-Reply-To: References: Message-ID: <4E61B677.3080702@redhat.com> On 09/03/2011 12:34 AM, Thomas Sjolshagen wrote: > I've been getting: > > dlm: dev_write no op 48479213 18508 > > in dmesg output after I've upgraded to the latest Fedora 15 cluster > packages. > We already have a fix for this message. It is a miscommunication between kernel and dlm_controld. My understanding is that it is harmless. (see bz731775 for more details) > After a while, my GFS2 file system(s) stop responding. I can't prove a > connection between the two, but was wondering if there is any reason to > believe there could be? It is probably unrelated but I strongly recommend you file a bug against gfs2-utils in fedora so that the gfs2 maintainers can look at it. Fabio > > Packages: > > cluster-glue-1.0.6-2.fc15.1.x86_64 > gfs2-cluster-3.1.1-2.fc15.x86_64 > cluster-glue-libs-1.0.6-2.fc15.1.x86_64 > clusterlib-3.1.5-1.fc15.x86_64 > cman-3.1.5-1.fc15.x86_64 > kernel-2.6.40.3-0.fc15.x86_64 > > corosync-1.4.1-1.fc15.x86_64 > corosynclib-1.4.1-1.fc15.x86_64 > > openaislib-1.1.4-2.fc15.x86_64 > openais-1.1.4-2.fc15.x86_64 > > -- > > Read my blog(s) - occasionally updated!: > > Follow me on Twitter > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rh-cluster at menole.net Tue Sep 6 09:51:11 2011 From: rh-cluster at menole.net (Michael Mende) Date: Tue, 6 Sep 2011 11:51:11 +0200 Subject: [Linux-cluster] vm guest migrating through clusvcadm In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com> References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir> <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com> Message-ID: <20110906095111.GA16237@menole.dyndns.org> Maybe Digimer's tutorial will help: https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial -- Mit freundlichen Gr??en, Michael Mende http://www.menole.net/ On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote: > ????Hi, > ? > ? > ? I have confgured a guest? in the cluster environment to restart?in another? node?, when the node on which the guest resides if goes down. > ? > ? The guest? has no issue when i am live migrating it with 'xm migrate' command.? But having issues? 
when trying to migrate it with > > 'clusvcadm'. The cluster is not under kvm. > > The error is: > > Trying to migrate service:guest1 to node1...Service does not exist. > > Here is the configuration of the cluster: > > (cluster.conf excerpt not preserved in the archive) > > I also tried changing the cluster configuration, placing the guest name as a service, but that did not work either. > > Can anyone please confirm whether kvm is needed to use the 'clusvcadm' command? If not, what kind of modifications are needed > > to migrate the guest with clusvcadm? > > Thanks > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster >

From hal at elizium.za.net Tue Sep 6 10:06:35 2011 From: hal at elizium.za.net (Hugo Lombard) Date: Tue, 6 Sep 2011 12:06:35 +0200 Subject: [Linux-cluster] vm guest migrating through clusvcadm In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com> References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir> <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com> Message-ID: <20110906100635.GI3298@squishy.elizium.za.net> On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote: > > The error is : > > Trying to migrate service:guest1 to node1...Service does not exist. > What was the command you tried? That 'service:guest1' looks suspect, I think it should rather be 'vm:guest1'. It should match the name of the service in the clustat output. As an example, we'd use: clusvcadm -M vm:guest1 -m srv2 to migrate the virtual machine 'guest1' to the cluster node 'srv2'. HTH -- Hugo Lombard

From mark at thermeon.com Wed Sep 7 18:37:52 2011 From: mark at thermeon.com (Mark Olliver) Date: Wed, 7 Sep 2011 19:37:52 +0100 Subject: [Linux-cluster] kvm shared disk space Message-ID: <012b01cc6d8d$44c21c50$ce4654f0$@thermeon.com> Hi, I have two kvm guests A and B which live on two different hosts. Both of the hosts have a partition which is DRBD synced in Active/Active mode between the hosts, and this is then mounted on each host using gfs. I now need to allow access to the data on the shared gfs disk by the two kvm guests, but I am unsure what I need to do for that. I have looked at the libvirt options but do not see anything that would make sense for the config file. Ideally each of the guests should mount the data at /mnt/data, as it can then be served out by both of them at the same time. I should note I do not need the data mounted on the hosts; I have just done that at the moment to test getting gfs2 working over active/active drbd. I do, however, need locks to work correctly, as the application that uses the shared data needs locking, so any mounting or exporting option needs to respect that. Any help or ideas gratefully received. Regards Mark
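One way to expose the same gfs-backed device to both kvm guests is to attach it to each guest as a raw block disk marked shareable in the libvirt domain XML (edited with 'virsh edit'). This is only a sketch -- the source path is illustrative -- and note that if both guests mount the device read-write at the same time they still need a cluster-aware filesystem inside the guests (e.g. GFS2 with a small cluster between the two VMs) or an NFS export from the hosts, otherwise locking will not be honoured:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/drbd_vg/shared_data'/>  <!-- illustrative path to the shared device -->
      <target dev='vdb' bus='virtio'/>
      <shareable/>
    </disk>

The <shareable/> element tells libvirt the disk is expected to be attached to more than one domain at once, so it is not treated as exclusively owned, and cache='none' keeps the host page cache out of the data path, which is what you want for any shared or cluster filesystem.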
From pradhanparas at gmail.com Tue Sep 13 16:40:02 2011 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 13 Sep 2011 11:40:02 -0500 Subject: [Linux-cluster] replacing HBA Message-ID: Hi, I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 cluster. Apart from changing wwn in the SAN, what else do I need to change in Linux (centos). will the change be reflected automatically? Thanks! Paras.

From rpeterso at redhat.com Tue Sep 13 18:04:08 2011 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 13 Sep 2011 14:04:08 -0400 (EDT) Subject: [Linux-cluster] replacing HBA In-Reply-To: Message-ID: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> ----- Original Message ----- | Hi, | | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 | cluster. Apart from changing wwn in the SAN, what else do I need to | change in Linux (centos). will the change be reflected automatically? | | | Thanks! | Paras. Hi Paras, The GFS2 file system doesn't care what HBA you're using. So as long as your kernel has a good device driver for that HBA you shouldn't need to do anything else. Regards, Bob Peterson Red Hat File Systems

From pradhanparas at gmail.com Tue Sep 13 19:46:26 2011 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 13 Sep 2011 14:46:26 -0500 Subject: [Linux-cluster] replacing HBA In-Reply-To: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> References: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: Thanks Bob. Another question. What about replacing single port HBA with a dual port. After configuring the multipathd, can I reconfigure physical volume without destroying the vg, lv and clvm ? I am kinddda lost here. Thanks Paras.
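For what it's worth, moving an existing LVM setup under dm-multipath normally does not require recreating the PV, VG or LVs, because LVM identifies physical volumes by UUID rather than by device name. A rough sequence, with illustrative device names and assuming the LUN keeps the same WWID after the HBA swap (on RHEL/CentOS 5 the shipped /etc/multipath.conf blacklists every device by default, so that blacklist has to be relaxed first):

    chkconfig multipathd on
    service multipathd start
    multipath -ll                    # both paths should appear under a single mpath device
    kpartx -a /dev/mapper/mpath0     # only needed if the PV sits on a partition, not the whole LUN
    pvs -o pv_name,vg_name,pv_uuid   # the PV should now be reported on the multipath device
    vgs; lvs                         # VG and LVs are intact, no pvcreate/vgcreate required

It can also help to add a filter to /etc/lvm/lvm.conf so that LVM scans the /dev/mapper devices in preference to the underlying /dev/sd* paths, which avoids duplicate PV warnings.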
On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote: > ----- Original Message ----- > | Hi, > | > | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 > | cluster. Apart from changing wwn in the SAN, what else do I need to > | change in Linux (centos). will the change be reflected automatically? > | > | > | Thanks! > | Paras. > > Hi Paras, > > The GFS2 file system doesn't care what HBA you're using. > So as long as your kernel has a good device driver for that HBA > you shouldn't need to do anything else. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From keith.schincke at gmail.com Tue Sep 13 21:30:33 2011 From: keith.schincke at gmail.com (Keith Schincke) Date: Tue, 13 Sep 2011 16:30:33 -0500 Subject: [Linux-cluster] replacing HBA In-Reply-To: References: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: How many paths doe you currently have to your disk? Does your LVM use the multipath name (mpath0)? Sent from my iPhone On Sep 13, 2011, at 14:46, Paras pradhan wrote: > Thanks Bob. > > Another question. What about replacing single port HBA with a dual > port. After configuring the multipathd, can I reconfigure physical > volume without destroying the vg, lv and clvm ? I am kinddda lost > here. > > Thanks > Paras. > > On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote: >> ----- Original Message ----- >> | Hi, >> | >> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 >> | cluster. Apart from changing wwn in the SAN, what else do I need to >> | change in Linux (centos). will the change be reflected automatically? >> | >> | >> | Thanks! >> | Paras. >> >> Hi Paras, >> >> The GFS2 file system doesn't care what HBA you're using. >> So as long as your kernel has a good device driver for that HBA >> you shouldn't need to do anything else. >> >> Regards, >> >> Bob Peterson >> Red Hat File Systems >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From pradhanparas at gmail.com Tue Sep 13 22:11:03 2011 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 13 Sep 2011 17:11:03 -0500 Subject: [Linux-cluster] replacing HBA In-Reply-To: References: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke wrote: > How many paths doe you currently have to your disk? > Does your LVM use the multipath name (mpath0)? Right now only one path with no multipath configured so LVM is not using mpath0. Ideas? Thanks! Paras. > > Sent from my iPhone > > On Sep 13, 2011, at 14:46, Paras pradhan wrote: > >> Thanks Bob. >> >> Another question. What about replacing single port HBA with a dual >> port. After configuring the multipathd, can I reconfigure physical >> volume without destroying the vg, lv and clvm ? I am kinddda lost >> here. >> >> Thanks >> Paras. >> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote: >>> ----- Original Message ----- >>> | Hi, >>> | >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 >>> | cluster. Apart from changing wwn in the SAN, what else do I need to >>> | change in Linux (centos). will the change be reflected automatically? >>> | >>> | >>> | Thanks! 
>>> | Paras. >>> >>> Hi Paras, >>> >>> The GFS2 file system doesn't care what HBA you're using. >>> So as long as your kernel has a good device driver for that HBA >>> you shouldn't need to do anything else. >>> >>> Regards, >>> >>> Bob Peterson >>> Red Hat File Systems >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From keith.schincke at gmail.com Tue Sep 13 22:50:56 2011 From: keith.schincke at gmail.com (Keith Schincke) Date: Tue, 13 Sep 2011 17:50:56 -0500 Subject: [Linux-cluster] replacing HBA In-Reply-To: References: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: Hmmm. The UUID of the physical volume should be written to disk (sdj) or partition (sdj1) depending on your design. kpartx should not care about the data on the disk (ie your UUID) when it makes the mpathXpY entries. Hopefully what will happen will be - install your hba and zone the SAN as necessary - enable multipathd and restart. This should create the mpathX entries. multipath -ll will list the paths and disks - run kpartx -a to add needed mpathXpY entries. I do not know if this runs on startup. - reboot and see if you can mount the LVM. If all goes right, pvdisplay should display the multipath devices of your PVs. On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan wrote: > On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke > wrote: > > How many paths doe you currently have to your disk? > > Does your LVM use the multipath name (mpath0)? > > Right now only one path with no multipath configured so LVM is not > using mpath0. Ideas? > > Thanks! > Paras. > > > > > > Sent from my iPhone > > > > On Sep 13, 2011, at 14:46, Paras pradhan wrote: > > > >> Thanks Bob. > >> > >> Another question. What about replacing single port HBA with a dual > >> port. After configuring the multipathd, can I reconfigure physical > >> volume without destroying the vg, lv and clvm ? I am kinddda lost > >> here. > >> > >> Thanks > >> Paras. > >> > >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson > wrote: > >>> ----- Original Message ----- > >>> | Hi, > >>> | > >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 > >>> | cluster. Apart from changing wwn in the SAN, what else do I need to > >>> | change in Linux (centos). will the change be reflected automatically? > >>> | > >>> | > >>> | Thanks! > >>> | Paras. > >>> > >>> Hi Paras, > >>> > >>> The GFS2 file system doesn't care what HBA you're using. > >>> So as long as your kernel has a good device driver for that HBA > >>> you shouldn't need to do anything else. 
> >>> > >>> Regards, > >>> > >>> Bob Peterson > >>> Red Hat File Systems > >>> > >>> -- > >>> Linux-cluster mailing list > >>> Linux-cluster at redhat.com > >>> https://www.redhat.com/mailman/listinfo/linux-cluster > >>> > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pradhanparas at gmail.com Thu Sep 15 16:50:01 2011 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 15 Sep 2011 11:50:01 -0500 Subject: [Linux-cluster] replacing HBA In-Reply-To: References: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: Thanks Keith. I will try and let all know how it goes. Paras. On Tue, Sep 13, 2011 at 5:50 PM, Keith Schincke wrote: > Hmmm. The UUID of the physical volume should be written to disk (sdj) or > partition (sdj1) depending on your design. > kpartx should not care about the data on the disk (ie your UUID) when it > makes the mpathXpY entries. > > Hopefully what will happen will be > - install your hba and zone the SAN as necessary > - enable multipathd and restart. This should create the mpathX entries. > multipath -ll will list the paths and disks > - run kpartx -a to add needed mpathXpY entries. I do not know if this runs > on startup. > - reboot and see if you can mount the LVM. > > If all goes right, pvdisplay should display the multipath devices of your > PVs. > > > On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan > wrote: >> >> On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke >> wrote: >> > How many paths doe you currently have to your disk? >> > Does your LVM use the multipath name (mpath0)? >> >> Right now only one path with no multipath configured so LVM is not >> using mpath0. Ideas? >> >> Thanks! >> Paras. >> >> >> > >> > Sent from my iPhone >> > >> > On Sep 13, 2011, at 14:46, Paras pradhan wrote: >> > >> >> Thanks Bob. >> >> >> >> Another question. What about replacing single port HBA with a dual >> >> port. After configuring the multipathd, can I reconfigure physical >> >> volume without destroying the vg, lv and clvm ? I am kinddda lost >> >> here. >> >> >> >> Thanks >> >> Paras. >> >> >> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson >> >> wrote: >> >>> ----- Original Message ----- >> >>> | Hi, >> >>> | >> >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2 >> >>> | cluster. Apart from changing wwn in the SAN, what else do I need to >> >>> | change in Linux (centos). will the change be reflected >> >>> automatically? >> >>> | >> >>> | >> >>> | Thanks! >> >>> | Paras. >> >>> >> >>> Hi Paras, >> >>> >> >>> The GFS2 file system doesn't care what HBA you're using. >> >>> So as long as your kernel has a good device driver for that HBA >> >>> you shouldn't need to do anything else. 
>> >>> >> >>> Regards, >> >>> >> >>> Bob Peterson >> >>> Red Hat File Systems >> >>> >> >>> -- >> >>> Linux-cluster mailing list >> >>> Linux-cluster at redhat.com >> >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >>> >> >> >> >> -- >> >> Linux-cluster mailing list >> >> Linux-cluster at redhat.com >> >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> > -- >> > Linux-cluster mailing list >> > Linux-cluster at redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From carlopmart at gmail.com Fri Sep 16 08:22:21 2011 From: carlopmart at gmail.com (carlopmart) Date: Fri, 16 Sep 2011 10:22:21 +0200 Subject: [Linux-cluster] Corosync goes cpu to 95-99% In-Reply-To: <4E2D940B.5020803@redhat.com> References: <4DD29D03.9080901@gmail.com> <4DD2BAC3.50509@redhat.com> <4DD2BD7D.5070704@gmail.com> <4DD2CA90.6090802@redhat.com> <3B50BA7445114813AE429BEE51A2BA52@versa> <4DD78908.2030801@gmail.com> <0B1965C8-9807-42B6-9453-01BE0C0B1DCB@cybercat.ca><4DD80D5D.10004@gmail.com> <4DD873C7.8080402@cybercat.ca> <22E7D11CD5E64E338A66811F31F06238@versa> <4DE545D7.1080703@redhat.com> <4DE69786.5010204@gmail.com><4DE6CAF6.4000002@cybercat.ca> <4DE75602.1000408@gmail.com> <51BB988BCCF547E69BF222BDAF34C4DE@versa> <4E04B61B.9070208@cybercat.ca> <4E2D63DD.4050007@gmail.com> <4E2D7329.6050607@redhat.com> <4E2D7425.4070801@gmail.com> <4E2D8ECB.6020305@redhat.com> <4E2D8F87.30508@gmail.com> <4E2D940B.5020803@redhat.com> Message-ID: <4E73073D.8010209@gmail.com> On 07/25/2011 06:04 PM, Steven Dake wrote: > On 07/25/2011 08:45 AM, carlopmart wrote: >> On 07/25/2011 05:42 PM, Steven Dake wrote: >>>>>>> are caused by this issue. >>>>>>> >>>>>>> So, as a temporary work-around for this time, woule be (at your own >>>>>>> risks) to downgrade to 2.6.32-71.29.1.el6 kernel : >>>>>>> >>>>>>> yum install kernel-2.6.32-71.29.1.el6.x86_64 >>>>>>> >>>>>>> Regards, >>>>>> >>>>>> Hi Steven and Nicolas, >>>>>> >>>>>> Is this bug resolved in RHEL6.1 with all updates applied?? Do I >>>>>> need to >>>>>> use some specific kernel version 2.6.32-131.2.1 or 2.6.32-131.6.1? >>>>>> >>>>>> Thanks. >>>>>> >>>>> >>>>> the corosync portion is going through QE. The kernel portion remains >>>>> open. >>>>> >>>>> Regards >>>>> -steve >>>>> >>>> >>>> Thanks Steve, then, Can I use last corosync version provided with >>>> RHEL6.1 and last RHEL6.0's kernel version without problems?? >>>> >>>> >>>> >>> >>> I recommend not mixing without a support signoff. >>> >> >> Then, how can I install rhcs under rhel6.x and prevent this bug?? >> >> > get a support signoff. Also the corosync updates have not finished > through our validation process. Only hot fixes (from support) are available > > Regards > -steve > Sorry to re-open this thread ... But exists any news about this problem?? -- CL Martinez carlopmart {at} gmail {d0t} com From ext.thales.jean-daniel.bonnetot at sncf.fr Fri Sep 16 12:54:02 2011 From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES)) Date: Fri, 16 Sep 2011 14:54:02 +0200 Subject: [Linux-cluster] Luci can't install packages Message-ID: Hello, Usually I used manal installation but I need to process throu Luci. My problem is present with RHEL 5.7 and RHEL 6.0 (luci and ricci), with RHEL 5.6 it works correctly. 
I used "Create" new cluster and add my nodes (options arenot important, the problem is always here) and submit... "Please wait..." Creating node "node1" for cluster "clutest": installing packages Creating node "node2" for cluster "clutest": installing packages I waited ;) but nothing. My process list on nodes says : 4166 ? Ss 0:00 /usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 22343 ? S 0:00 \_ ricci-modrpm 22355 ? S 0:01 \_ /usr/bin/python /usr/bin/yum -y list all 4221 ? S From chekov at ucla.edu Fri Sep 16 22:26:00 2011 From: chekov at ucla.edu (Alan Wood) Date: Fri, 16 Sep 2011 15:26:00 -0700 (PDT) Subject: [Linux-cluster] shared disk with virsh migration In-Reply-To: References: Message-ID: Hi all, I'm trying to decide whether I really need a cluster implementation to do what I want to do and I figured I'd solicit opinions. Essentially I want to have two machines running as virtualization hosts with libvirt/kvm. I have shared iSCSI storage available to both hosts and have to decide how to configure the storage for use with libvirt. Right now I see three possibilities: 1. Setting an iSCSI storage pool in libvirt Pros: Migration seems painless, including live migration Cons: Need to pre-allocate LUNs on iSCSI box. Does not seem to take advantage of iSCSI offloading or multipathing 2. Setting up a two-node cluster and running CLVM Pros: Very flexible storage management (is snapshotting supported yet in clvm?) Automatic failover Cons: Cluster infrastructure adds complexity, more potential for bugs Possible split brain issues? 3. A single iSCSI block device with partitions for each VM mounted on both hosts Pros: Easy migration, setup Cons: Two hosts accessing the same block device outside of a cluster seems like it might lead to disaster Right now I actually like option 3 but I'm wondering if I really am asking for trouble accessing a block device simultaneously on two hosts without a clustering infrastructure. I did this a while back with a shared-SCSI box and it seemed to work. I would never be accessing the same partition on both hosts and I understand that all partitioning has to be done while the other host is off, but is there something else I'm missing here? Also, are people out there running option 2? Does it make sesne to set up a cluster as small as 2-nodes for HA virtualization or do I really need more nodes for it to be worthwhile? I do have all the fencing infrastructure I might need (PDUs and Dracs). any help would be appreciated. thanks -alan From ext.thales.jean-daniel.bonnetot at sncf.fr Mon Sep 19 08:02:41 2011 From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES)) Date: Mon, 19 Sep 2011 10:02:41 +0200 Subject: [Linux-cluster] shared disk with virsh migration In-Reply-To: References: Message-ID: Hello, I don't use KVM and libvirt but my experiment concerne clustering storage : 1. Don't know 2. Snapshotting is supported in clvm (since 5.7 I think) Complexity... yes Bugs... yes Split brain... yes 2 nodes is sufficient for HA, juste think what happens if 1 node shuts down and your VMs are very loded (needs 3rd nodes ?) 3. 
No experiment too but it sounds like it's not the right usage Best regards -- JD -----Message d'origine----- De?: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] De la part de Alan Wood Envoy??: samedi 17 septembre 2011 00:26 ??: linux-cluster at redhat.com Objet?: [Linux-cluster] shared disk with virsh migration Hi all, I'm trying to decide whether I really need a cluster implementation to do what I want to do and I figured I'd solicit opinions. Essentially I want to have two machines running as virtualization hosts with libvirt/kvm. I have shared iSCSI storage available to both hosts and have to decide how to configure the storage for use with libvirt. Right now I see three possibilities: 1. Setting an iSCSI storage pool in libvirt Pros: Migration seems painless, including live migration Cons: Need to pre-allocate LUNs on iSCSI box. Does not seem to take advantage of iSCSI offloading or multipathing 2. Setting up a two-node cluster and running CLVM Pros: Very flexible storage management (is snapshotting supported yet in clvm?) Automatic failover Cons: Cluster infrastructure adds complexity, more potential for bugs Possible split brain issues? 3. A single iSCSI block device with partitions for each VM mounted on both hosts Pros: Easy migration, setup Cons: Two hosts accessing the same block device outside of a cluster seems like it might lead to disaster Right now I actually like option 3 but I'm wondering if I really am asking for trouble accessing a block device simultaneously on two hosts without a clustering infrastructure. I did this a while back with a shared-SCSI box and it seemed to work. I would never be accessing the same partition on both hosts and I understand that all partitioning has to be done while the other host is off, but is there something else I'm missing here? Also, are people out there running option 2? Does it make sesne to set up a cluster as small as 2-nodes for HA virtualization or do I really need more nodes for it to be worthwhile? I do have all the fencing infrastructure I might need (PDUs and Dracs). any help would be appreciated. thanks -alan -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ------- Ce message et toutes les pi?ces jointes sont ?tablis ? l'intention exclusive de ses destinataires et sont confidentiels. L'int?grit? de ce message n'?tant pas assur?e sur Internet, la SNCF ne peut ?tre tenue responsable des alt?rations qui pourraient se produire sur son contenu. Toute publication, utilisation, reproduction, ou diffusion, m?me partielle, non autoris?e pr?alablement par la SNCF, est strictement interdite. Si vous n'?tes pas le destinataire de ce message, merci d'en avertir imm?diatement l'exp?diteur et de le d?truire. ------- This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it. 
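For option 2 (CLVM on a two-node cluster), the storage side is fairly small once cman, fencing and clvmd are in place. A sketch, with illustrative device and volume group names, assuming lvm2-cluster is installed on both nodes:

    lvmconf --enable-cluster             # sets locking_type = 3 (clustered locking) in /etc/lvm/lvm.conf
    service clvmd start                  # on every node, after cman is up
    pvcreate /dev/sdb                    # the shared iSCSI LUN (illustrative name)
    vgcreate -c y vg_guests /dev/sdb     # -c y marks the volume group as clustered
    lvcreate -L 20G -n guest1_disk vg_guests
    lvs vg_guests                        # the new LV is visible on both nodes

Each LV can then be handed to a guest as a raw block disk, and an rgmanager vm resource takes care of failover or live migration between the two hosts. Snapshots of clustered LVs are possible in recent releases, but only while the LV is activated exclusively on one node.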
From carlopmart at gmail.com Mon Sep 19 09:09:40 2011 From: carlopmart at gmail.com (carlopmart) Date: Mon, 19 Sep 2011 11:09:40 +0200 Subject: [Linux-cluster] Rotating apache logs when is configured as a resource under RHCS Message-ID: <4E7706D4.2070201@gmail.com> Hi all, I have configured an apache resource under cluster.conf like this: (both nodes are RHEL6.1) My question is: which is the best form to rotate apache logs using logrotate configuration?? Is this a possible solution: /var/log/httpd/*log { missingok notifempty sharedscripts delaycompress postrotate if [ -f /var/run/cluster/apache/apache:httpd-mirror.pid ]; then clusvcadm -R httpd-mirror fi endscript } -- CL Martinez carlopmart {at} gmail {d0t} com From pmshehzad at yahoo.com Mon Sep 19 06:07:40 2011 From: pmshehzad at yahoo.com (pmshehzad at yahoo.com) Date: Mon, 19 Sep 2011 06:07:40 Subject: [Linux-cluster] hi cluster Message-ID: eca75ac689a87dca0122ec482a6b9609@[192.168.1.1] hows it going this is really interesting http://blog.news7ifinance.com/ see you around From harry.sutton at hp.com Mon Sep 19 12:34:53 2011 From: harry.sutton at hp.com (Sutton, Harry (HAS GSE)) Date: Mon, 19 Sep 2011 08:34:53 -0400 Subject: [Linux-cluster] shared disk with virsh migration In-Reply-To: References: Message-ID: <4E7736ED.9000607@hp.com> I'd have to do some research to verify, but I'm guessing that iSCSI (in option 3) would use the traditional SCSI reservation mechanism to prevent problems associated with multiple access. /Harry On 09/16/2011 06:26 PM, Alan Wood wrote: > Hi all, > > I'm trying to decide whether I really need a cluster implementation to do > what I want to do and I figured I'd solicit opinions. > Essentially I want to have two machines running as virtualization hosts > with libvirt/kvm. I have shared iSCSI storage available to both hosts and > have to decide how to configure the storage for use with libvirt. Right > now I see three possibilities: > 1. Setting an iSCSI storage pool in libvirt > Pros: Migration seems painless, including live migration > Cons: Need to pre-allocate LUNs on iSCSI box. > Does not seem to take advantage of iSCSI offloading or multipathing > 2. Setting up a two-node cluster and running CLVM > Pros: Very flexible storage management (is snapshotting supported yet in clvm?) > Automatic failover > Cons: Cluster infrastructure adds complexity, more potential for bugs > Possible split brain issues? > 3. A single iSCSI block device with partitions for each VM mounted on both hosts > Pros: Easy migration, setup > Cons: Two hosts accessing the same block device outside of a > cluster seems like it might lead to disaster > > Right now I actually like option 3 but I'm wondering if I really am asking > for trouble accessing a block device simultaneously on two hosts without a > clustering infrastructure. I did this a while back with a shared-SCSI box > and it seemed to work. I would never be accessing the same partition on > both hosts and I understand that all partitioning has to be done while the > other host is off, but is there something else I'm missing here? > > Also, are people out there running option 2? Does it make sesne to set up > a cluster as small as 2-nodes for HA virtualization or do I really need > more nodes for it to be worthwhile? I do have all the fencing > infrastructure I might need (PDUs and Dracs). > > any help would be appreciated. 
thanks > -alan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5069 bytes Desc: S/MIME Cryptographic Signature URL: From jmd_singhsaini at yahoo.com Tue Sep 20 05:40:55 2011 From: jmd_singhsaini at yahoo.com (Harvinder Singh Binder) Date: Tue, 20 Sep 2011 11:10:55 +0530 (IST) Subject: [Linux-cluster] Rotating apache logs when is configured as a resource under RHCS In-Reply-To: <4E7706D4.2070201@gmail.com> Message-ID: <1316497255.25429.YahooMailClassic@web94809.mail.in2.yahoo.com> how i configure media player in linux operation system please tell me about configure procedure(Commands). Harvinder Singh S/O Baldev Raj, VPO Barwa Teh. Anandpur Sahib, Dist. Ropar, PunjabE-Mail ID:- ? ? jmd_singhsaini at yahoo.com --- On Mon, 19/9/11, carlopmart wrote: > From: carlopmart > Subject: [Linux-cluster] Rotating apache logs when is configured as a resource under RHCS > To: linux-cluster at redhat.com > Date: Monday, 19 September, 2011, 2:09 AM > Hi all, > > I have configured an apache resource under cluster.conf > like this: (both nodes are RHEL6.1) > > config_file="/data/config/etc/httpd/conf/httpd-mirror.conf" > name="httpd-mirror" server_root="/data/config/etc/httpd" > shutdown_wait="3"/> > > My question is: which is the best form to rotate apache > logs using logrotate configuration?? > > Is this a possible solution: > > /var/log/httpd/*log { > ? ? missingok > ? ? notifempty > ? ? sharedscripts > ? ? delaycompress > ? ? postrotate > ? ? ? ? if [ -f > /var/run/cluster/apache/apache:httpd-mirror.pid ]; then > ? ? ? ? ? ? clusvcadm -R > httpd-mirror > ? ? ? ? fi > ? ? endscript > } > -- > CL Martinez > carlopmart {at} gmail {d0t} com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From sdake at redhat.com Tue Sep 20 18:13:45 2011 From: sdake at redhat.com (Steven Dake) Date: Tue, 20 Sep 2011 11:13:45 -0700 Subject: [Linux-cluster] New Corosync Mailing list - Please register for it! Message-ID: <4E78D7D9.7060001@redhat.com> Hi, Over the past several years, we have been sharing a mailing list with the openais project. I have made a new mailing list specifically for corosync: This will be the permanent new list for corosync. Please register at: http://lists.corosync.org/mailman/listinfo The list is called "discuss" Q Why are we making this change now? A Several weeks ago Linux Foundation was hacked into (see http://www.linuxfoundation.org). They hosted our mailing list service. During this event, the mailing list has been unusable. The Linux Foundation staff is busy rebuilding their network, but in the interim this seems like a good opportunity to move everything to our core infrastructure at corosync.org. Q What about the archives? A I hope to restore the archives once I can get the records from Linux Foundation. There is no guarantee I can get a restored copy of the archive however. Fortunately several services over the years have archived our mailing list. Q What about my registration on the openais mailing list? A I don't have the records to transfer the registrations to the corosync list, so you will have to sign up for the mailing list again. Q Is my password that I used to register on the openais mailing list compromised? 
A I do not know what extent the systems were hacked, but I'd recommend treating the password as compromised. If you shared this password with other services, please change it. Mailman stores passwords in plaintext so that it can mail them to you once a month. Always use unique passwords on mailman mailing lists. Regards -steve From laszlo at beres.me Thu Sep 22 14:57:27 2011 From: laszlo at beres.me (Laszlo Beres) Date: Thu, 22 Sep 2011 16:57:27 +0200 Subject: [Linux-cluster] Lost connection to storage - what happens? Message-ID: Hi, just a theoretical question: let's assume we have a cluster with GFS2 filesystem (not as a managed resource). What happens exactly if all paths to backend device get lost? It's not a cluster event, so I assume cluster operates normally, but what does GFS2/DLM do? Regards, -- L?szl? B?res? ? ? ? ? ? Unix system engineer http://www.google.com/profiles/beres.laszlo From carlopmart at gmail.com Mon Sep 26 09:18:11 2011 From: carlopmart at gmail.com (carlopmart) Date: Mon, 26 Sep 2011 11:18:11 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? Message-ID: <4E804353.1040605@gmail.com> Hi all, Due to continuous problems with corosync (https://bugzilla.redhat.com/show_bug.cgi?id=709758, https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html) under rhel6.x (I have a trial subscription, that I will convert to permanent subscription when all works ok), I would like to know when corosync-1.4.1-3.el6, will be released for rhel6.1. Any?? Thanks ... -- CL Martinez carlopmart {at} gmail {d0t} com From ajb2 at mssl.ucl.ac.uk Mon Sep 26 10:01:09 2011 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 26 Sep 2011 11:01:09 +0100 Subject: [Linux-cluster] Lost connection to storage - what happens? In-Reply-To: References: Message-ID: <4E804D65.6090808@mssl.ucl.ac.uk> Laszlo Beres wrote: > Hi, > > just a theoretical question: let's assume we have a cluster with GFS2 > filesystem (not as a managed resource). What happens exactly if all > paths to backend device get lost? GFS2 withdraws that filesystem and you'll have to reboot all the withdrawn machines to get it back, once the paths are restored. GFS doesn't require a reboot. Redhat argue this is not a regression as GFS2 is not GFS From jfriesse at redhat.com Mon Sep 26 10:31:41 2011 From: jfriesse at redhat.com (Jan Friesse) Date: Mon, 26 Sep 2011 12:31:41 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? In-Reply-To: <4E804353.1040605@gmail.com> References: <4E804353.1040605@gmail.com> Message-ID: <4E80548D.1070904@redhat.com> carlopmart napsal(a): > Hi all, > > Due to continuous problems with corosync > (https://bugzilla.redhat.com/show_bug.cgi?id=709758, > https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html) > under rhel6.x (I have a trial subscription, that I will convert to > permanent subscription when all works ok), I would like to know when > corosync-1.4.1-3.el6, will be released for rhel6.1. Any?? We are not doing rebases in Z streams, so Corosync 1.4.1 will be never released for RHEL 6.1. It will be available in RHEL 6.2. Regards, Honza > > Thanks ... > From carlopmart at gmail.com Mon Sep 26 10:51:20 2011 From: carlopmart at gmail.com (carlopmart) Date: Mon, 26 Sep 2011 12:51:20 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? 
In-Reply-To: <4E80548D.1070904@redhat.com> References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> Message-ID: <4E805928.3020009@gmail.com> On 09/26/2011 12:31 PM, Jan Friesse wrote: > carlopmart napsal(a): >> Hi all, >> >> Due to continuous problems with corosync >> (https://bugzilla.redhat.com/show_bug.cgi?id=709758, >> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html) >> under rhel6.x (I have a trial subscription, that I will convert to >> permanent subscription when all works ok), I would like to know when >> corosync-1.4.1-3.el6, will be released for rhel6.1. Any?? > > We are not doing rebases in Z streams, so Corosync 1.4.1 will be never > released for RHEL 6.1. It will be available in RHEL 6.2. > > Regards, > Honza > >> But can be released a version that solves the bugs for rhel6.1 before rhel6.2? -- CL Martinez carlopmart {at} gmail {d0t} com From jfriesse at redhat.com Mon Sep 26 11:34:22 2011 From: jfriesse at redhat.com (Jan Friesse) Date: Mon, 26 Sep 2011 13:34:22 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? In-Reply-To: <4E805928.3020009@gmail.com> References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> Message-ID: <4E80633E.8020409@redhat.com> carlopmart napsal(a): > On 09/26/2011 12:31 PM, Jan Friesse wrote: >> carlopmart napsal(a): >>> Hi all, >>> >>> Due to continuous problems with corosync >>> (https://bugzilla.redhat.com/show_bug.cgi?id=709758, >>> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html) >>> under rhel6.x (I have a trial subscription, that I will convert to >>> permanent subscription when all works ok), I would like to know when >>> corosync-1.4.1-3.el6, will be released for rhel6.1. Any?? >> >> We are not doing rebases in Z streams, so Corosync 1.4.1 will be never >> released for RHEL 6.1. It will be available in RHEL 6.2. >> >> Regards, >> Honza >> >>> > > But can be released a version that solves the bugs for rhel6.1 before > rhel6.2? > Please take your time to read how RHEL release process works, but basically and shortly. Ya, it's called EUS (Z-stream), and primary purpose is for really hard/security bugs. To be honest, 709758 may be annoying bug, but it doesn't fit to Z-stream very well, especially because it can be seen only in very special conditions/broken environments. Regards, Honza From carlopmart at gmail.com Mon Sep 26 11:55:47 2011 From: carlopmart at gmail.com (carlopmart) Date: Mon, 26 Sep 2011 13:55:47 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? In-Reply-To: <4E80633E.8020409@redhat.com> References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> <4E80633E.8020409@redhat.com> Message-ID: <4E806843.6060202@gmail.com> On 09/26/2011 01:34 PM, Jan Friesse wrote: > Please take your time to read how RHEL release process works, but > basically and shortly. Ya, it's called EUS (Z-stream), and primary > purpose is for really hard/security bugs. To be honest, 709758 may be > annoying bug, but it doesn't fit to Z-stream very well, especially > because it can be seen only in very special conditions/broken environments. But problem described in 709758 appears in my enviroment: One RHEL6.1 kvm host with two, only two with single CPUs, rhel6.1 guests running RHCS ... 
See this: a) running top on a rhel6.1 guest: top - 13:50:02 up 4:25, 4 users, load average: 5.91, 5.99, 6.71 Tasks: 132 total, 5 running, 127 sleeping, 0 stopped, 0 zombie Cpu(s): 96.7%us, 3.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1289092k total, 259524k used, 1029568k free, 24692k buffers Swap: 1309688k total, 0k used, 1309688k free, 110376k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1260 root RT 0 88572 84m 57m R 94.3 6.7 132:46.40 corosync 10475 root 19 -1 18704 1468 732 R 2.3 0.1 2:01.54 clulog 10454 root 19 -1 18704 1512 764 R 2.0 0.1 2:01.93 clulog 10654 root 20 0 5352 1688 1244 S 0.3 0.1 0:06.76 rgmanager 11681 root 20 0 2672 1132 864 S 0.3 0.1 0:03.43 top b) trying to stop rgmanager under rhel6.1 kvm guest, never stops: [root at rhelclunode01 tmp]# time service rgmanager stop Stopping Cluster Service Manager: c) running top under rhel6.1 kvm host: top - 13:52:00 up 4:32, 1 user, load average: 1.00, 1.00, 0.93 Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie Cpu(s): 26.4%us, 1.5%sy, 0.0%ni, 72.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 5088504k total, 3656212k used, 1432292k free, 57832k buffers Swap: 5242872k total, 0k used, 5242872k free, 1240980k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2659 qemu 20 0 1526m 1.2g 3880 S 100.1 25.3 182:17.81 qemu-kvm 2445 qemu 20 0 1350m 592m 3960 S 6.0 11.9 13:55.74 qemu-kvm 2203 root 20 0 683m 15m 4904 S 3.0 0.3 7:56.55 libvirtd 2524 root 20 0 0 0 0 S 1.0 0.0 1:01.55 kvm-pit-wq 2279 qemu 20 0 852m 534m 3900 S 0.7 10.8 1:31.42 qemu-kvm d) ps ax |grep qemu-kvm, under rhel6.1 kvm host: 2659 ? Sl 183:01 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -cpu qemu32 -enable-kvm -m 1280 -smp 1,sockets=1,cores=1,threads=1 -name rhelclunode01 -uuid 5f0c1503-34a0-771b-1cde-bbe257447590 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhelclunode01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:17:ad:8f,bus=pci.0,addr=0x3,bootindex=1 -netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:50:56:36:59:a7,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 Then, what could be the solution if not fix will be released until rhel6.2?? disable all rhcs services and don't install RHCS netither on virtual or physical enviroments?? Thanks. -- CL Martinez carlopmart {at} gmail {d0t} com From jfriesse at redhat.com Mon Sep 26 13:17:15 2011 From: jfriesse at redhat.com (Jan Friesse) Date: Mon, 26 Sep 2011 15:17:15 +0200 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? In-Reply-To: <4E806843.6060202@gmail.com> References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> <4E80633E.8020409@redhat.com> <4E806843.6060202@gmail.com> Message-ID: <4E807B5B.5000606@redhat.com> carlopmart napsal(a): > On 09/26/2011 01:34 PM, Jan Friesse wrote: >> Please take your time to read how RHEL release process works, but >> basically and shortly. Ya, it's called EUS (Z-stream), and primary >> purpose is for really hard/security bugs. 
To be honest, 709758 may be >> annoying bug, but it doesn't fit to Z-stream very well, especially >> because it can be seen only in very special conditions/broken >> environments. > > But problem described in 709758 appears in my environment: One RHEL6.1 Please contact GSS (Global Support Service). They can help you to: - Check if your configuration is valid - Check if the architecture is valid - Give you a "not yet released" package and/or hot fix - Propose a backport to Z-stream for the given bug -> Basically, everything that you are/will be paying them for. Thanks, Honza From matthew.painter at kusiri.com Mon Sep 26 15:55:11 2011 From: matthew.painter at kusiri.com (Matthew Painter) Date: Mon, 26 Sep 2011 16:55:11 +0100 Subject: [Linux-cluster] Manual multicasting address for CMAN bug Message-ID: Hi all, I have been trying to set up a cluster of 3 on Red Hat 6.1 using a Cisco switch, and therefore a fixed multicast address - 239.192.15.224 in this case. All the docs etc. say to add to the cluster.conf: This seems to work and a cman_tool status brings back the correct multicast address, but has a Quorum status of "Activity Blocked", because the cluster nodes never join. *However* if I manually run "cman_tool leave" and then "cman_tool join -m 239.192.15.224", the nodes can see each other. Does anyone know if this is a known issue? I can't find any information about it. Thanks for all your help :) Matt From rhayden.public at gmail.com Mon Sep 26 16:20:35 2011 From: rhayden.public at gmail.com (Robert Hayden) Date: Mon, 26 Sep 2011 11:20:35 -0500 Subject: [Linux-cluster] Manual multicasting address for CMAN bug In-Reply-To: References: Message-ID: You might try to add the multicast stanza inside the stanza as well. You can specify a specific interface as well. For example, I have gotten this to work internally, but your environment may be different. Robert From matthew.painter at kusiri.com Mon Sep 26 16:40:21 2011 From: matthew.painter at kusiri.com (Matthew Painter) Date: Mon, 26 Sep 2011 17:40:21 +0100 Subject: [Linux-cluster] Manual multicasting address for CMAN bug In-Reply-To: References: Message-ID: Hi Robert, Thanks for your suggestion. I had tried this, and it gave an error when starting cman due to incorrect configuration - it turns out it is a 5.x option, not needed for 6.x because it works out the interface based on the cluster IP address. Thanks anyway :) You might try to add the multicast stanza inside the stanza as well. You can specify a specific interface as well. For example, I have gotten this to work internally, but your environment may be different. Robert On Mon, Sep 26, 2011 at 4:55 PM, Matthew Painter wrote: > Hi all, > > I have been trying to set up a cluster of 3 on Red Hat 6.1 using a Cisco > switch, and therefore a fixed multicast address - 239.192.15.224 in this > case. > > All the docs etc. say to add to the cluster.conf: > > > > > > This seems to work and a cman_tool status brings back the correct multicast > address, but has a Quorum status of "Activity Blocked", because the cluster > nodes never join. > > *However* if I manually run "cman_tool leave" and then "cman_tool join -m > 239.192.15.224", the nodes can see each other. > > Does anyone know if this is a known issue? I can't find any > > information about it.
> > Thanks for all your help :) > > Matt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Mon Sep 26 17:53:36 2011 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 26 Sep 2011 19:53:36 +0200 Subject: [Linux-cluster] Manual multicasting address for CMAN bug In-Reply-To: References: Message-ID: <4E80BC20.50507@redhat.com> On 09/26/2011 06:20 PM, Robert Hayden wrote: > You might try to add the multicast stanza inside the > stanza as well. You can specify an specific interface as well. > > For example, > > > > > > > > > > I have gotten this to work internally, but your environment may be > different. this definitely doesn't not work in RHEL6.1. multicast is never parsed in that config section. Fabio From fdinitto at redhat.com Mon Sep 26 17:55:00 2011 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 26 Sep 2011 19:55:00 +0200 Subject: [Linux-cluster] Manual multicasting address for CMAN bug In-Reply-To: References: Message-ID: <4E80BC74.9090601@redhat.com> For all RHEL related problems you need to contact GSS. You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345 to track your issue. Please provide the requested info. Fabio On 09/26/2011 05:55 PM, Matthew Painter wrote: > Hi all, > > I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco > switch, and therefore a fixed multicast address - 239.192.15.224 in this > case. > > All the docs etc. say to add to the cluster.conf: > > > > > > This seems to work and a cman_tool status brings back the correct > multicast address, but has a Quorum status of "Activity Blocked", > because the culster nodes never join. > > *However* if I manually run "cman_tool leave" and then "cman_tool join > -m 239.192.15.224", the nodes can see each other. > > Does anyone know if this is this a known issue? I can't find any > information about it. > > Thanks for all your help :) > > Matt > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From matthew.painter at kusiri.com Mon Sep 26 17:59:46 2011 From: matthew.painter at kusiri.com (Matthew Painter) Date: Mon, 26 Sep 2011 18:59:46 +0100 Subject: [Linux-cluster] Manual multicasting address for CMAN bug In-Reply-To: <4E80BC74.9090601@redhat.com> References: <4E80BC74.9090601@redhat.com> Message-ID: Indeed, I also opened a bug. The issue is a dupe of a known issue - I have updated the bug accordingly. Thank you Fabio for helping me find a work around in setting the TTL manually :) Matt On Mon, Sep 26, 2011 at 6:55 PM, Fabio M. Di Nitto wrote: > For all RHEL related problems you need to contact GSS. > > You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345 > > to track your issue. > > Please provide the requested info. > > Fabio > > On 09/26/2011 05:55 PM, Matthew Painter wrote: > > Hi all, > > > > I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco > > switch, and therefore a fixed multicast address - 239.192.15.224 in this > > case. > > > > All the docs etc. say to add to the cluster.conf: > > > > > > > > > > > > This seems to work and a cman_tool status brings back the correct > > multicast address, but has a Quorum status of "Activity Blocked", > > because the culster nodes never join. > > > > *However* if I manually run "cman_tool leave" and then "cman_tool join > > -m 239.192.15.224", the nodes can see each other. > > > > Does anyone know if this is this a known issue? 
I can't find any > > information about it. > > > > Thanks for all your help :) > > > > Matt > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremy.Lyon at us.ibm.com Mon Sep 26 18:56:24 2011 From: Jeremy.Lyon at us.ibm.com (Jeremy Lyon) Date: Mon, 26 Sep 2011 12:56:24 -0600 Subject: [Linux-cluster] display and release gfs locks Message-ID: Hi, We have an 8 node cluster running SASgrid. We have the core components of SAS under RHCS (rgmanager) control, but there are user/client jobs that are initiated manually and by cron outside of RHCS. We have run into an issue a few times where it seems that when the gfs init script is called to unmount all the file systems and it kills off all the processes using the gfs file systems, the gfs on the other nodes locks up and hangs. The node leaving the cluster via a reboot appears to have left cleanly (cman_tool services doesn't show any *WAIT* states) but everything is hung and requires a complete reboot of the cluster to get things going. We are wondering if the killing of the processes by the gfs init script, which uses fuser to try to kill gracefully but then uses a -9, could be issuing the -9 and thus leaving locks in DLM that could be causing this issue. Is this possible? I would think that if a node has properly/cleanly left the cluster, locks that were held by that node would be released. Is there a way to display locks that may be still existing for that node that is down? And lastly, is there a way to force the release of those locks with out the reboot of the cluster? I've been searching the linux-cluster archives with little success. RHEL 5.6 cman-2.0.115-68.el5_6.3 gfs-utils-0.1.20-8.el5 kmod-gfs-0.1.34-12.el5 Thanks Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkovachev at varna.net Tue Sep 27 07:41:17 2011 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 27 Sep 2011 10:41:17 +0300 Subject: [Linux-cluster] display and release gfs locks In-Reply-To: References: Message-ID: Hi, > Is this possible? I would think that if a node has properly/cleanly left > the cluster, locks that were held by that node would be released. Is there > a way to display locks that may be still existing for that node that is > down? And lastly, is there a way to force the release of those locks with > out the reboot of the cluster? I've been searching the linux-cluster > archives with little success. The best thing is to fix the initial problem, but as a workaround you may try to fence_node from some of the other machines in the cluster even it has left cleanly - this should cleanup the locks held from that node about seeing the locks you may use "gfs(2)_tool lockdump " or via debugfs by mounting it somewhere From fdinitto at redhat.com Tue Sep 27 09:39:03 2011 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 27 Sep 2011 11:39:03 +0200 Subject: [Linux-cluster] cluster 3.1.7 release Message-ID: <4E8199B7.20608@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Welcome to the cluster 3.1.7 release. This release addresses several bugs and especially a serious problem introduced in the 3.1.6 release. If you are currently running 3.1.6, it is highly recommended to upgrade to 3.1.7 as soon as possible. 
The new source tarball can be downloaded here: https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.7.tar.xz ChangeLog: https://fedorahosted.org/releases/c/l/cluster/Changelog-3.1.7 To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Thanks/congratulations to all people that contributed to achieve this great milestone. Happy clustering, Fabio -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBCAAGBQJOgZm2AAoJEAgUGcMLQ3qJSMgP+wZN2YUyTLSmD7AK/EgeJPxf q00EHAa7r0gReiSqwEkuGTTNNxwEkmEUoVlGUR2+Hu9jx6aYjPs+Z+KoCCrzjUGh y4iSxcje1F2tjLwtswlNbL6itjglwfEHpskcyBRW2DiVDNX3zyUa4E1BE2zfnkOW 1PmxNnMJPQ+N0JDS9+RGho5qNvM+dll/paupl5kH76HY11j3vSY+1ugX5xhnxA4V FAHxHw3lx7y5/ihqVK1OMBg7lIRzduo82eGJGy62p0VWm2+8VKX8z8YkfgBYfLj4 lWfsk8VHGiajGhA/5bBNphKwQY34NdmsOWJ4X5ksUFiDGJLZ+H400janmiMaheR2 m5T5Hs6ouOGoBIQm5jQxiA9JbeEyzZkl4crpjwQiRJLXJt4t0FHpwrzRIrCUTuPy 7LmIi3WJv2Q4EwDoRRhdOC/9j8WqAMrBoSq72P1b/hHZnRBkDh9X0z/w9tjNvF8C RnfB6QBxEKnT27qkRyspLwfRx8DQXEGnjJbK6uDYu+m5Et5YJllDmvNKDe/BOjzt nVw8egqgXKT0fumEFGxfwjmYVeWSpIazEAu5JyoKVddWiWKO2jUj8efgCkrAbZBh CBKBoCQAVJjTGNsKL6a6xXYFHVjMhE5hsYH1/pT3rx+OiNOT6zQMF+r6MjOa/vyV MrAP3GokgFOehsCMJhx4 =eiKh -----END PGP SIGNATURE----- From rsajnove at cisco.com Tue Sep 27 21:29:54 2011 From: rsajnove at cisco.com (Ruben Sajnovetzky) Date: Tue, 27 Sep 2011 17:29:54 -0400 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 Message-ID: Hello, I?m in the process of design a solution replacement to a Veritas implementation and have to find similar functionalities, not sure if this is doable in Red Hat Clutser: We have a distributed application that runs in several servers simultaneously and that application must run in a cluster environment. The summary is as follows: 1. Application has two different roles for the Servers, one we could call ?Central Server? and the others ?Collectors?. 2. Application has one Central Server and X Collector Servers. 3. Central Server + Collector Servers represents a set of servers that must be running all time and we want to implement two sets in order to implement failovers between them. 4. First issue I have: Application is installed in all servers at same location, let us say ?/opt/app? and I want to monitor it in all them (i.e.: different, separated, independent instances in separated servers). In Veritas we had ?fscentral? and ?fscollector?, both with same device name and mounting point and that worked fine, (of course, both resources were part of different service groups and running in different servers). I tried to do the same here and got an error: >>> clurgmgrd[9374]: Unique attribute collision. type=fs attr=mountpoint >>> value=/opt >>> clurgmgrd[9374]: Error storing fs resource >>> >> Then, I assume should be a different way to implement this resource? Notice >> that the number of Collectors is variable so I >> can?t say ?collector 1 will be mounted as /opt1? or ?collector 1 will have >> volume name as vol1?. >> > 5. Second issue I have: > > How I can run the ?service? ?app collector? in more than one server > simultaneously (in parallel)? > Again, the option to have ?X? services for ?X? Collectors is not a > real option here. > Any idea will be appreciated!!! 
> > Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From linux at alteeve.com Tue Sep 27 23:18:15 2011 From: linux at alteeve.com (Digimer) Date: Tue, 27 Sep 2011 16:18:15 -0700 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: References: Message-ID: <4E8259B7.6090204@alteeve.com> On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote: > > Hello, > > I?m in the process of design a solution replacement to a Veritas > implementation and have to find similar functionalities, not > sure if this is doable in Red Hat Clutser: > > We have a distributed application that runs in several servers > simultaneously and that application must run in a cluster environment. > The summary is as follows: > > 1. Application has two different roles for the Servers, one we > could call ?Central Server? and the others ?Collectors?. > 2. Application has _one_ Central Server and _X_ Collector Servers. > 3. Central Server + Collector Servers represents a set of > servers that must be running all time and we want to implement > two sets in order to implement failovers between them. > 4. _First issue I have_: > Application is installed in _all servers_ at same > location, let us say ?/opt/app? and I want to monitor it in all them (i.e.: > different, separated, independent instances in separated > servers). > In Veritas we had ?fscentral? and ?fscollector?, both > with same device name and mounting point and that worked fine, > (of course, both resources were part of different > service groups and running in different servers). > I tried to do the same here and got an error: > > > clurgmgrd[9374]: Unique attribute collision. type=fs > attr=mountpoint value=/opt > clurgmgrd[9374]: Error storing fs resource > > Then, I assume should be a different way to implement this > resource? Notice that the number of Collectors is variable so I > can?t say ?collector 1 will be mounted as /opt1? or ?collector > 1 will have volume name as vol1?. > > 5. Second issue I have: > > How I can run the ?service? ?app collector? in more than one > server simultaneously (in parallel)? > Again, the option to have ?X? services for ?X? Collectors is > not a real option here. > > Any idea will be appreciated!!! > > > Thanks I've not read this carefully (at work, sorry), but if I grasped your question; For services you want to run on all servers; - Defined a unique failoverdomain containing each node to run the parallel services. - Create the a service multiple times, each using the failoverdomain containing the single target node. For services to run on one node, but move on failure, create another failover domain (ordered, if you want to set preferences) with the candidate nodes as members. Then create a service and assign it to this domain. To provide your cluster.conf (or as much as you've crafted so far). Please only obfuscate passwords if possible. -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "At what point did we forget that the Space Shuttle was, essentially, a program that strapped human beings to an explosion and tried to stab through the sky with fire and math?" 
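As a rough illustration of the per-node failover-domain layout Digimer describes above, the rm section of cluster.conf might look something like the sketch below. This is a minimal, untested sketch: the node names, service names, and the app_central/app_collector script resources are hypothetical placeholders and are not taken from this thread.

<rm>
  <failoverdomains>
    <!-- one restricted, single-node domain per node that must run a collector -->
    <failoverdomain name="only_node1" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/>
    </failoverdomain>
    <failoverdomain name="only_node2" restricted="1">
      <failoverdomainnode name="node2.example.com" priority="1"/>
    </failoverdomain>
    <!-- ordered domain for the single central service that may fail over -->
    <failoverdomain name="prefer_node1" ordered="1" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/>
      <failoverdomainnode name="node2.example.com" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <!-- hypothetical init scripts for the two application roles -->
    <script name="app_central" file="/etc/init.d/app_central"/>
    <script name="app_collector" file="/etc/init.d/app_collector"/>
  </resources>
  <!-- one collector service per node, each bound to its single-node domain -->
  <service name="collector_node1" domain="only_node1" autostart="1" recovery="restart">
    <script ref="app_collector"/>
  </service>
  <service name="collector_node2" domain="only_node2" autostart="1" recovery="restart">
    <script ref="app_collector"/>
  </service>
  <!-- the central service can relocate within its ordered domain -->
  <service name="central" domain="prefer_node1" autostart="1" recovery="relocate">
    <script ref="app_central"/>
  </service>
</rm>

With a layout like this, each collector instance stays pinned to its own node by a restricted single-node domain (started with, for example, clusvcadm -e collector_node1), while the central service sits in an ordered domain and should relocate to the surviving node on failure.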
From linux at alteeve.com Tue Sep 27 23:25:14 2011 From: linux at alteeve.com (Digimer) Date: Tue, 27 Sep 2011 16:25:14 -0700 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: <4E8259B7.6090204@alteeve.com> References: <4E8259B7.6090204@alteeve.com> Message-ID: <4E825B5A.3030206@alteeve.com> Forgot to include an example; This link shows RGManager/cluster.conf configured with two single-node failoverdomains (for managing the storage services needed to be running on both nodes in a 2-node cluster) and two failoverdomains used for a service that can migrate (a VM, specifially). It will hopefully be useful as a template for what you are trying to do. https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_Failover_Domains -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "At what point did we forget that the Space Shuttle was, essentially, a program that strapped human beings to an explosion and tried to stab through the sky with fire and math?" From rsajnove at cisco.com Wed Sep 28 00:04:53 2011 From: rsajnove at cisco.com (Ruben Sajnovetzky) Date: Tue, 27 Sep 2011 20:04:53 -0400 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: <4E825B5A.3030206@alteeve.com> Message-ID: Good example, thanks. Not sure if is doable because we could have 10 servers and the idea to have 10 service instances could be tricky to admin :( What about the other q, related with the usage of same name of devices and mounting points? -- Sent from my PDP-11 On 27-Sep-2011 7:25 PM, "Digimer" wrote: > Forgot to include an example; > > This link shows RGManager/cluster.conf configured with two single-node > failoverdomains (for managing the storage services needed to be running > on both nodes in a 2-node cluster) and two failoverdomains used for a > service that can migrate (a VM, specifially). It will hopefully be > useful as a template for what you are trying to do. > > https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_ > Failover_Domains On 27-Sep-2011 7:18 PM, "Digimer" wrote: > On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote: >> >> Hello, >> >> I?m in the process of design a solution replacement to a Veritas >> implementation and have to find similar functionalities, not >> sure if this is doable in Red Hat Clutser: >> >> We have a distributed application that runs in several servers >> simultaneously and that application must run in a cluster environment. >> The summary is as follows: >> >> 1. Application has two different roles for the Servers, one we >> could call ?Central Server? and the others ?Collectors?. >> 2. Application has _one_ Central Server and _X_ Collector Servers. >> 3. Central Server + Collector Servers represents a set of >> servers that must be running all time and we want to implement >> two sets in order to implement failovers between them. >> 4. _First issue I have_: >> Application is installed in _all servers_ at same >> location, let us say ?/opt/app? and I want to monitor it in all them (i.e.: >> different, separated, independent instances in separated >> servers). >> In Veritas we had ?fscentral? and ?fscollector?, both >> with same device name and mounting point and that worked fine, >> (of course, both resources were part of different >> service groups and running in different servers). 
>> I tried to do the same here and got an error: >> >> >> clurgmgrd[9374]: Unique attribute collision. type=fs >> attr=mountpoint value=/opt >> clurgmgrd[9374]: Error storing fs resource >> >> Then, I assume should be a different way to implement this >> resource? Notice that the number of Collectors is variable so I >> can?t say ?collector 1 will be mounted as /opt1? or ?collector >> 1 will have volume name as vol1?. >> >> 5. Second issue I have: >> >> How I can run the ?service? ?app collector? in more than one >> server simultaneously (in parallel)? >> Again, the option to have ?X? services for ?X? Collectors is >> not a real option here. >> >> Any idea will be appreciated!!! >> >> >> Thanks > > I've not read this carefully (at work, sorry), but if I grasped your > question; > > For services you want to run on all servers; > - Defined a unique failoverdomain containing each node to run the > parallel services. > - Create the a service multiple times, each using the failoverdomain > containing the single target node. > > For services to run on one node, but move on failure, create another > failover domain (ordered, if you want to set preferences) with the > candidate nodes as members. Then create a service and assign it to this > domain. > > To provide your cluster.conf (or as much as you've crafted so far). > Please only obfuscate passwords if possible. > > -- > Digimer > E-Mail: digimer at alteeve.com > Freenode handle: digimer > Papers and Projects: http://alteeve.com > Node Assassin: http://nodeassassin.org > "At what point did we forget that the Space Shuttle was, essentially, > a program that strapped human beings to an explosion and tried to stab > through the sky with fire and math?" From linux at alteeve.com Wed Sep 28 00:19:19 2011 From: linux at alteeve.com (Digimer) Date: Tue, 27 Sep 2011 17:19:19 -0700 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: References: Message-ID: <4E826807.5030408@alteeve.com> On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote: > > Good example, thanks. > Not sure if is doable because we could have 10 servers and the idea to have > 10 service instances could be tricky to admin :( Oh? How so? The file would be a bit long, but even with ten definitions it should still be manageable. Particularly so if you use a tool like luci. > What about the other q, related with the usage of same name of devices and > mounting points? I didn't follow that question. Rather, that sounds like a much bigger question... If '/opt/app' is local to each node, containing separate installs of the application, it should be fine. However, I expect this is not the case, of you'd not be asking. If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount, GFS2 partition, etc) then it should still be fine. Look again at that link and search for '/xen_shared'. That is a common chunk of space (using clvmd and gfs2) which is un/mounted by the cluster and it is mounted in the same place on all nodes (and uses the same LV device name). If I am not answering your question, please ask again. :) -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "At what point did we forget that the Space Shuttle was, essentially, a program that strapped human beings to an explosion and tried to stab through the sky with fire and math?" 
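For the shared-storage case Digimer mentions above, the pattern in the linked tutorial is a clustered filesystem resource that every node mounts at the same path. Below is a minimal sketch of that pattern, reusing the single-node domains from the earlier sketch; the device path and names are assumptions, not the original poster's configuration, and it only applies if /opt/app really lives on shared clvmd/GFS2 storage rather than on a per-node local ext3 partition.

<rm>
  <resources>
    <!-- hypothetical GFS2 filesystem on a clustered LV, mounted at the same
         point on every node that runs a storage service -->
    <clusterfs name="shared_fs" device="/dev/shared_vg0/shared_lv" mountpoint="/shared" fstype="gfs2" force_unmount="0"/>
  </resources>
  <!-- one storage service per node, pinned with the restricted single-node domains -->
  <service name="storage_node1" domain="only_node1" autostart="1" exclusive="0" recovery="restart">
    <clusterfs ref="shared_fs"/>
  </service>
  <service name="storage_node2" domain="only_node2" autostart="1" exclusive="0" recovery="restart">
    <clusterfs ref="shared_fs"/>
  </service>
</rm>

A plain fs resource, by contrast, describes a filesystem that rgmanager mounts on one node at a time, and its mountpoint attribute is treated as unique, which is what produces the "Unique attribute collision" error quoted earlier in this thread.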
From rsajnove at cisco.com Wed Sep 28 00:33:23 2011 From: rsajnove at cisco.com (Ruben Sajnovetzky) Date: Tue, 27 Sep 2011 20:33:23 -0400 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: <4E826807.5030408@alteeve.com> Message-ID: I might be doing something wrong, because you say "you are fine" but didn't work :( All servers have "/opt/app" mounted in same internal disk partition. They are not shared, it is just that all have identical layout. I tried to create: Resource name: Central_FS Device: /dev/mapper/VolGroup00-optvol FS Type: ext3 Mount point: /opt And Resource name: Collector_FS Device: /dev/mapper/VolGroup00-optvol FS Type: ext3 Mount point: /opt When I tried to save it I found in the /var/log/messages: clurgmgrd[4174]: Reconfiguring clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint value=/opt clurgmgrd[4174]: Error storing fs resource Thanks for your help and ideas! On 27-Sep-2011 8:19 PM, "Digimer" wrote: > On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote: >> >> Good example, thanks. >> Not sure if is doable because we could have 10 servers and the idea to have >> 10 service instances could be tricky to admin :( > > Oh? How so? The file would be a bit long, but even with ten definitions > it should still be manageable. Particularly so if you use a tool like luci. > >> What about the other q, related with the usage of same name of devices and >> mounting points? > > I didn't follow that question. Rather, that sounds like a much bigger > question... > > If '/opt/app' is local to each node, containing separate installs of the > application, it should be fine. However, I expect this is not the case, > of you'd not be asking. > > If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount, > GFS2 partition, etc) then it should still be fine. Look again at that > link and search for '/xen_shared'. That is a common chunk of space > (using clvmd and gfs2) which is un/mounted by the cluster and it is > mounted in the same place on all nodes (and uses the same LV device name). > > If I am not answering your question, please ask again. :) From linux at alteeve.com Wed Sep 28 00:45:34 2011 From: linux at alteeve.com (Digimer) Date: Tue, 27 Sep 2011 17:45:34 -0700 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: References: Message-ID: <4E826E2E.5000507@alteeve.com> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote: > > I might be doing something wrong, because you say "you are fine" but didn't > work :( > > All servers have "/opt/app" mounted in same internal disk partition. > They are not shared, it is just that all have identical layout. > I tried to create: > > Resource name: Central_FS > Device: /dev/mapper/VolGroup00-optvol > FS Type: ext3 > Mount point: /opt > > And > > Resource name: Collector_FS > Device: /dev/mapper/VolGroup00-optvol > FS Type: ext3 > Mount point: /opt > > When I tried to save it I found in the /var/log/messages: > > clurgmgrd[4174]: Reconfiguring > clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint > value=/opt > clurgmgrd[4174]: Error storing fs resource > > Thanks for your help and ideas! Please post your cluster.conf file (and obfuscate only passwords, please). Also post a sample /etc/fstab and the outputs of 'pvscan', 'vgscan' and 'lvscan'. 
-- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "At what point did we forget that the Space Shuttle was, essentially, a program that strapped human beings to an explosion and tried to stab through the sky with fire and math?" From amit.jathar at alepo.com Wed Sep 28 11:47:59 2011 From: amit.jathar at alepo.com (Amit Jathar) Date: Wed, 28 Sep 2011 11:47:59 +0000 Subject: [Linux-cluster] corosync crashes after firing crm configuration command on any one node Message-ID: Hi, I am facing a weird issue in corosync's behavior. I have configured a two-node cluster. The cluster is working fine & the crm_mon command is showing proper output. The command cibadmin -Q is also working properly on both nodes. The issue starts when I run any crm configuration command. As soon as I run a crm configuration command, I can see the following output: [root at AAA02 corosync]# crm configure property no-quorum-policy=ignore Could not connect to the CIB: Remote node did not respond ERROR: creating tmp shadow __crmshell.12274 failed [root at AAA02 corosync]# At the same time, the logs in /var/log/messages say: Sep 28 13:38:40 localhost cibadmin: [12295]: info: Invoked: cibadmin -Ql Sep 28 13:38:40 localhost cibadmin: [12296]: info: Invoked: cibadmin -Ql Sep 28 13:38:40 localhost crm_shadow: [12298]: info: Invoked: crm_shadow -c __crmshell.12274 I have attached a file which has the cib.xml & corosync.conf file contents on both the nodes. Please guide me to troubleshoot this error. Thanks in advance. Thanks, Amit From raju.rajsand at gmail.com Wed Sep 28 12:49:50 2011 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Wed, 28 Sep 2011 18:19:50 +0530 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: References: <4E826807.5030408@alteeve.com> Message-ID: Greetings, On Wed, Sep 28, 2011 at 6:03 AM, Ruben Sajnovetzky wrote: > > FS Type: ext3 Shouldn't it be GFS/GFS2? -- Regards, Rajagopal From rhayden.public at gmail.com Wed Sep 28 12:52:52 2011 From: rhayden.public at gmail.com (Robert Hayden) Date: Wed, 28 Sep 2011 07:52:52 -0500 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: <4E826E2E.5000507@alteeve.com> References: <4E826E2E.5000507@alteeve.com> Message-ID: > On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote: > > > > I might be doing something wrong, because you say "you are fine" but > didn't > > work :( > > > > All servers have "/opt/app" mounted in same internal disk partition. > > They are not shared, it is just that all have identical layout.
> > I tried to create: > > > > Resource name: Central_FS > > Device: /dev/mapper/VolGroup00-optvol > > FS Type: ext3 > > Mount point: /opt > > > > And > > > > Resource name: Collector_FS > > Device: /dev/mapper/VolGroup00-optvol > > FS Type: ext3 > > Mount point: /opt > > > My suggestion here is theoretical and not tested.... I think you want to have a single "resource" with different service names. For example, > > When I tried to save it I found in the /var/log/messages: > > > > clurgmgrd[4174]: Reconfiguring > > clurgmgrd[4174]: Unique attribute collision. type=fs > attr=mountpoint > > value=/opt > > clurgmgrd[4174]: Error storing fs resource > > > > Thanks for your help and ideas! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rsajnove at cisco.com Wed Sep 28 13:09:13 2011 From: rsajnove at cisco.com (Ruben Sajnovetzky) Date: Wed, 28 Sep 2011 09:09:13 -0400 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: Message-ID: This approach didn?t work either :( First server started service the second couldn?t start On 28-Sep-2011 8:52 AM, "Robert Hayden" wrote: > >> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote: >>> > >>> > I might be doing something wrong, because you say "you are fine" but >>> didn't >>> > work :( >>> > >>> > All servers have "/opt/app" mounted in same internal disk partition. >>> > They are not shared, it is just that all have identical layout. >>> > I tried to create: >>> > >>> > ? ? Resource name: Central_FS >>> > ? ? Device: /dev/mapper/VolGroup00-optvol >>> > ? ? FS Type: ext3 >>> > ? ? Mount point: /opt >>> > >>> > And >>> > >>> > ? ? Resource name: Collector_FS >>> > ? ? Device: /dev/mapper/VolGroup00-optvol >>> > ? ? FS Type: ext3 >>> > ? ? Mount point: /opt >>> > > > My suggestion here is theoretical and not tested.... I think you want to have > a single "resource" with different service names.? For example, > > > ? > ?????? > ? ? > ?????? > ???????????? > ?????? > ?????? > ???????????? > ?????? > > > ? >>> > When I tried to save it I found in the /var/log/messages: >>> > >>> > ?clurgmgrd[4174]: Reconfiguring >>> > ?clurgmgrd[4174]: Unique attribute collision. type=fs >>> attr=mountpoint >>> > value=/opt >>> > ?clurgmgrd[4174]: Error storing fs resource >>> > >>> > Thanks for your help and ideas! > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From rsajnove at cisco.com Wed Sep 28 13:20:39 2011 From: rsajnove at cisco.com (Ruben Sajnovetzky) Date: Wed, 28 Sep 2011 09:20:39 -0400 Subject: [Linux-cluster] How to run same service in parallel in RedHat Cluster 5.0 In-Reply-To: <4E826E2E.5000507@alteeve.com> Message-ID: Here is the cluster.conf (didn't get access to run other commands yet) :