From swap_project at yahoo.com Thu Sep 1 14:57:18 2011
From: swap_project at yahoo.com (Srija)
Date: Thu, 1 Sep 2011 07:57:18 -0700 (PDT)
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314702563.2694.13.camel@menhir>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
Message-ID: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Hi,

I have configured a guest in the cluster environment to restart on
another node when the node on which the guest resides goes down.

The guest has no issue when I live migrate it with the 'xm migrate'
command, but I am having issues when trying to migrate it with
'clusvcadm'. The cluster is not under KVM.

The error is:

            Trying to migrate service:guest1 to node1...Service does not exist.
Here is the configuration of the cluster:

[the cluster.conf snippet was stripped when the message was archived]
I also tried changing the cluster configuration, placing the guest name
as a service, but that did not work either.

Can anyone please confirm whether KVM is needed to use the 'clusvcadm'
command? If not, what kind of modifications are needed to migrate the
guest with clusvcadm?

Thanks
From mmorgan at dca.net Thu Sep 1 20:58:18 2011
From: mmorgan at dca.net (Michael Morgan)
Date: Thu, 1 Sep 2011 16:58:18 -0400
Subject: [Linux-cluster] "Invalid resource" starting KVM guest with
clusvcadm
In-Reply-To: <20110825214529.GF7305@staff.dca.net>
References: <20110825214529.GF7305@staff.dca.net>
Message-ID: <20110901205818.GD545@staff.dca.net>
On Thu, Aug 25, 2011 at 05:45:29PM -0400, Michael Morgan wrote:
> Hello,
>
> I have a 2 node KVM cluster under Scientific Linux 6.1. Starting guests
> works fine through virsh, virt-manager, and even rg_test. When I try to
> use clusvcadm however:
>
> [root@node1 ~]# clusvcadm -e vm:test
> Local machine trying to enable vm:test...Invalid operation for resource
>
After poring through vm.sh and adding some logging I see that clusvcadm
on the bad cluster is running "vm.sh status" and fails after "virsh
domstate test". Both rg_test on the bad cluster and clusvcadm on a
working cluster run "vm.sh start" which correctly follows up with "virsh
create /mnt/shared/xml/test.xml". I can't think of any reason why this
would be happening though.
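For anyone wanting to reproduce this kind of tracing, a minimal logging shim is one way to do it. Everything below (the log path, format, and function name) is my own choice, not part of the stock vm.sh agent:

```shell
# Hypothetical logging shim, dropped near the top of a resource agent such
# as /usr/share/cluster/vm.sh. Log path and format are arbitrary choices.
AGENT_LOG=${AGENT_LOG:-/tmp/vm-agent.log}

agent_log() {
    # record a timestamp, the agent name, and whatever was passed in
    echo "$(date +%s) vm.sh $*" >> "$AGENT_LOG"
}

agent_log "called with op=${1:-status} OCF_RESKEY_name=${OCF_RESKEY_name:-unset}"
```

Tailing the log while running clusvcadm then shows which operation (start, status, migrate) rgmanager actually hands the agent.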
--
Michael Morgan
mmorgan at dca.net
From rhayden.public at gmail.com Fri Sep 2 13:38:25 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 08:38:25 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
Message-ID:
Has anyone experienced the following error/hang/loop when attempting
to stop rgmanager or cman on the last node of a two node cluster?
groupd[4909]: cpg_leave error retrying
Basic scenario:
RHEL 5.7 with the latest errata for cman.
Create a two node cluster with qdisk and higher totem token=70000
start cman on both nodes, wait for qdisk to become online with master determined
stop cman on node1, wait for it to complete
stop cman on node2
error "cpg_leave" seen in logging output
Observations:
The "service cman stop" command hangs at "Stopping fencing" output
If I cycle openais service with "service openais restart", then the
"service cman stop" will complete (need to manually stop the openais
service afterwards).
When hung, the command "group_tool dump" hangs (any group_tool command hangs).
The hang is inconsistent which, in my mind, implies a timing issue.
Inconsistent meaning that every once in a while, the shutdown will
complete (maybe 20% of the time).
I have seen the issue with the stopping of rgmanager and cman. The
below example has been stripped down to show the hang with cman.
I have tested with varying the length of time to wait before stopping
the second node with no difference (hang still occurs periodically).
I have tested with commenting out the totem token and the
quorum_dev_poll and still experienced the hang. (We use the longer
timeouts to help survive network and SAN blips.)
I have dug through some of the source code. The message appears in
group's cpg.c as function do_cpg_leave( ). This calls the cpg_leave
function located in the openais package.
If I attach to the groupd process with gdb, I get the following stack.
Watching with strace, groupd is just in a looping state.
(gdb) where
#0 0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x000000341409a364 in sleep () from /lib64/libc.so.6
#2 0x000000000040a410 in time ()
#3 0x000000000040bd09 in time ()
#4 0x000000000040e2cb in time ()
#5 0x000000000040ebe0 in time ()
#6 0x000000000040f394 in time ()
#7 0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6
#8 0x00000000004018f9 in time ()
#9 0x00007fff04a671c8 in ?? ()
#10 0x0000000000000000 in ?? ()
If I attach to the aisexec process with gdb, I see the following:
(gdb) where
#0 0x00000034140cb696 in poll () from /lib64/libc.so.6
#1 0x0000000000405c50 in poll_run ()
#2 0x0000000000418aae in main ()
As you can see in the cluster.conf example below, I have attempted
many different ways to create more debug logging. I do see debug
messages from openais in the cpg.c component during startup, but
nothing is logged on the shutdown hang scenario.
I would appreciate any guidance on how to troubleshoot further,
especially with increasing the tracing of the openais calls in cpg.c.
Thanks
Robert
Example cluster.conf:
From rhayden.public at gmail.com Fri Sep 2 14:33:17 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 09:33:17 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
In-Reply-To:
References:
Message-ID:
I modified the /etc/init.d/cman script to use the -D flag on the
groupd start and re-direct the output to a file in /tmp. During the
hang, I see groupd looping through the cpg_leave function. When I
restart openais, it appears that groupd will get an error code "2" and
then break out of the loop. Looks like I need to dig into the openais
cpg_leave function....
Here is the groupd -D output, with the openais restart at
the very end.
1314973495 cman: our nodeid 2 name node2-priv quorum 1
1314973495 setup_cpg groupd_handle 6b8b456700000000
1314973495 groupd confchg total 2 left 0 joined 1
1314973495 send_version nodeid 2 cluster 2 mode 2 compat 1
1314973495 client connection 3
1314973495 got client 3 setup
1314973495 setup fence 0
1314973495 client connection 4
1314973495 got client 4 setup
1314973495 setup dlm 1
1314973495 client connection 5
1314973495 got client 5 setup
1314973495 setup gfs 2
1314973496 got client 3 join
1314973496 0:default got join
1314973496 0:default is cpg client 6 name 0_default handle 79e2a9e300000001
1314973496 0:default cpg_join ok
1314973496 0:default waiting for first cpg event
1314973496 client connection 7
1314973496 0:default waiting for first cpg event
1314973496 0:default confchg left 0 joined 1 total 2
1314973496 0:default process_node_join 2
1314973496 0:default cpg add node 1 total 1
1314973496 0:default cpg add node 2 total 2
1314973496 0:default make_event_id 200020001 nodeid 2 memb_count 2 type 1
1314973496 0:default queue join event for nodeid 2
1314973496 0:default process_current_event 200020001 2 JOIN_BEGIN
1314973496 0:default app node init: add 2 total 1
1314973496 0:default app node init: add 1 total 2
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 got client 7 get_group
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 0:default mark node 1 stopped
1314973496 0:default set global_id 10001 from 1
1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STOPPED
1314973496 0:default action for app: setid default 65537
1314973496 0:default action for app: start default 1 2 2 1 2
1314973496 0:default mark node 1 started
1314973496 client connection 7
1314973496 got client 7 get_group
1314973496 client connection 7
1314973496 got client 7 get_group
1314973496 got client 3 start_done
1314973496 0:default send started
1314973496 0:default mark node 2 started
1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STARTED
1314973496 0:default action for app: finish default 1
1314973497 client connection 7
1314973497 got client 7 get_group
1314973557 cman: node 0 added
1314973580 0:default confchg left 1 joined 0 total 1
1314973580 0:default confchg removed node 1 reason 2
1314973580 0:default process_node_leave 1
1314973580 0:default cpg del node 1 total 1
1314973580 0:default make_event_id 100010002 nodeid 1 memb_count 1 type 2
1314973580 0:default queue leave event for nodeid 1
1314973580 0:default process_current_event 100010002 1 LEAVE_BEGIN
1314973580 0:default action for app: stop default
1314973580 got client 3 stop_done
1314973580 0:default send stopped
1314973580 0:default waiting for 2 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default mark node 1 stopped
1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default mark node 2 stopped
1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STOPPED
1314973580 0:default app node leave: del 1 total 1
1314973580 0:default action for app: start default 2 3 1 2
1314973580 got client 3 start_done
1314973580 0:default send started
1314973580 0:default mark node 2 started
1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STARTED
1314973580 0:default action for app: finish default 2
1314973583 cman: node 1 removed
1314973583 add_recovery_set_cman nodeid 1
1314973591 got client 3 leave
1314973591 0:default got leave
1314973591 cpg_leave error retry
1314973592 cpg_leave error retry
1314973593 cpg_leave error retry
1314973594 cpg_leave error retry
1314973595 cpg_leave error retry
1314973596 cpg_leave error retry
1314973597 cpg_leave error retry
1314973598 cpg_leave error retry
1314973599 cpg_leave error retry
1314973600 cpg_leave error retry
1314973601 0:default cpg_leave error retrying
1314973601 cpg_leave error retry
1314973602 cpg_leave error retry
1314973603 cpg_leave error retry
1314973604 cpg_leave error retry
1314973605 cpg_leave error retry
1314973606 cpg_leave error retry
1314973607 cpg_leave error retry
1314973608 cpg_leave error retry
1314973609 cpg_leave error retry
1314973610 cpg_leave error retry
1314973611 0:default cpg_leave error retrying
1314973611 cpg_leave error retry
1314973612 cpg_leave error retry
1314973613 cpg_leave error retry
1314973614 cpg_leave error retry
1314973615 cpg_leave error retry
1314973616 cpg_leave error retry
1314973617 cpg_leave error retry
1314973618 cpg_leave error retry
1314973619 cpg_leave error retry
1314973620 cpg_leave error retry
1314973621 0:default cpg_leave error retrying
1314973621 cpg_leave error retry
1314973622 cpg_leave error retry
1314973623 cpg_leave error retry
1314973624 cpg_leave error retry
1314973625 cpg_leave error retry
1314973626 cpg_leave error retry
1314973627 cpg_leave error retry
1314973628 cpg_leave error retry
1314973629 cpg_leave error retry
1314973630 cpg_leave error retry
1314973631 0:default cpg_leave error retrying
1314973631 cpg_leave error retry
1314973632 cpg_leave error retry
1314973633 cpg_leave error retry
1314973634 cpg_leave error retry
1314973635 cpg_leave error retry
1314973636 cpg_leave error retry
1314973637 cpg_leave error retry
1314973640 0:default cpg_leave error 2
1314973640 client connection 7
1314973640 cluster is down, exiting
On Fri, Sep 2, 2011 at 8:38 AM, Robert Hayden wrote:
> Has anyone experienced the following error/hang/loop when attempting
> to stop rgmanager or cman on the last node of a two node cluster?
>
> groupd[4909]: cpg_leave error retrying
>
> Basic scenario:
> RHEL 5.7 with the latest errata for cman.
> Create a two node cluster with qdisk and higher totem token=70000
> start cman on both nodes, wait for qdisk to become online with master determined
> stop cman on node1, wait for it to complete
> stop cman on node2
> error "cpg_leave" seen in logging output
>
> Observations:
> The "service cman stop" command hangs at "Stopping fencing" output
> If I cycle openais service with "service openais restart", then the
> "service cman stop" will complete (need to manually stop the openais
> service afterwards).
> When hung, the command "group_tool dump" hangs (any group_tool command hangs).
> The hang is inconsistent which, in my mind, implies a timing issue.
> Inconsistent meaning that every once in a while, the shutdown will
> complete (maybe 20% of the time).
> I have seen the issue with the stopping of rgmanager and cman. The
> below example has been stripped down to show the hang with cman.
> I have tested with varying the length of time to wait before stopping
> the second node with no difference (hang still occurs periodically).
> I have tested with commenting out the totem token and the
> quorum_dev_poll and still experienced the hang. (We use the longer
> timeouts to help survive network and SAN blips.)
>
>
> I have dug through some of the source code. The message appears in
> group's cpg.c as function do_cpg_leave( ). This calls the cpg_leave
> function located in the openais package.
>
> If I attach to the groupd process with gdb, I get the following stack.
> Watching with strace, groupd is just in a looping state.
> (gdb) where
> #0  0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6
> #1  0x000000341409a364 in sleep () from /lib64/libc.so.6
> #2  0x000000000040a410 in time ()
> #3  0x000000000040bd09 in time ()
> #4  0x000000000040e2cb in time ()
> #5  0x000000000040ebe0 in time ()
> #6  0x000000000040f394 in time ()
> #7  0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6
> #8  0x00000000004018f9 in time ()
> #9  0x00007fff04a671c8 in ?? ()
> #10 0x0000000000000000 in ?? ()
>
> If I attach to the aisexec process with gdb, I see the following:
> (gdb) where
> #0  0x00000034140cb696 in poll () from /lib64/libc.so.6
> #1  0x0000000000405c50 in poll_run ()
> #2  0x0000000000418aae in main ()
>
>
> As you can see in the cluster.conf example below, I have attempted
> many different ways to create more debug logging. I do see debug
> messages from openais in the cpg.c component during startup, but
> nothing is logged on the shutdown hang scenario.
>
> I would appreciate any guidance on how to troubleshoot further,
> especially with increasing the tracing of the openais calls in cpg.c.
>
> Thanks
> Robert
>
>
> Example cluster.conf:
> [the XML tags were stripped when the message was archived; surviving
> attributes include timestamp="on" debug="on", post_fail_delay="10"
> post_join_delay="60", log_level="7" min_score="1" tko="60" votes="1",
> and two iLO fence devices (login="node1_fence"/"node2_fence",
> power_wait="10", lanplus="1")]
From rhayden.public at gmail.com Fri Sep 2 16:15:15 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 11:15:15 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
In-Reply-To:
References:
Message-ID:
I searched the openais forums and ran across two recent threads and a couple
of potential patches that sound interesting. Unfortunately, I do not have
enough experience to determine if they are related to my issue.
"[Openais] Problems forming cluster on corosync startup" at
http://marc.info/?l=openais&m=131234252917259&w=2
"[Openais] CPG client can lockup if the local node is in the downlist" at
http://marc.info/?l=openais&m=131354417212931&w=2
The above threads refer to a patch from Steven Dake at
http://marc.info/?l=openais&m=131274060602528&w=2
Thanks
Robert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From thomas at sjolshagen.net Fri Sep 2 22:34:29 2011
From: thomas at sjolshagen.net (Thomas Sjolshagen)
Date: Fri, 02 Sep 2011 18:34:29 -0400
Subject: [Linux-cluster] dlm: dev_write no op 48479213 18508
Message-ID:
I've been getting:
dlm: dev_write no op 48479213 18508
in
dmesg output after I've upgraded to the latest Fedora 15 cluster
packages.
After a while, my GFS2 file system(s) stop responding. I
can't prove a connection between the two, but was wondering if there is
any reason to believe there could be?
Packages:
cluster-glue-1.0.6-2.fc15.1.x86_64
gfs2-cluster-3.1.1-2.fc15.x86_64
cluster-glue-libs-1.0.6-2.fc15.1.x86_64
clusterlib-3.1.5-1.fc15.x86_64
cman-3.1.5-1.fc15.x86_64
kernel-2.6.40.3-0.fc15.x86_64
corosync-1.4.1-1.fc15.x86_64
corosynclib-1.4.1-1.fc15.x86_64
openaislib-1.1.4-2.fc15.x86_64
openais-1.1.4-2.fc15.x86_64
--
Read my blog(s) [1] - occasionally updated!:
Follow me on Twitter
[2]
Links:
------
[1] http://www.sjolshagen.net/
[2]
http://www.twitter.com/NotFitEnough
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fdinitto at redhat.com Sat Sep 3 05:09:11 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Sat, 03 Sep 2011 07:09:11 +0200
Subject: [Linux-cluster] dlm: dev_write no op 48479213 18508
In-Reply-To:
References:
Message-ID: <4E61B677.3080702@redhat.com>
On 09/03/2011 12:34 AM, Thomas Sjolshagen wrote:
> I've been getting:
>
> dlm: dev_write no op 48479213 18508
>
> in dmesg output after I've upgraded to the latest Fedora 15 cluster
> packages.
>
We already have a fix for this message. It is a miscommunication between
kernel and dlm_controld. My understanding is that it is harmless. (see
bz731775 for more details)
> After a while, my GFS2 file system(s) stop responding. I can't prove a
> connection between the two, but was wondering if there is any reason to
> believe there could be?
It is probably unrelated but I strongly recommend you file a bug against
gfs2-utils in fedora so that the gfs2 maintainers can look at it.
Fabio
>
> Packages:
>
> cluster-glue-1.0.6-2.fc15.1.x86_64
> gfs2-cluster-3.1.1-2.fc15.x86_64
> cluster-glue-libs-1.0.6-2.fc15.1.x86_64
> clusterlib-3.1.5-1.fc15.x86_64
> cman-3.1.5-1.fc15.x86_64
> kernel-2.6.40.3-0.fc15.x86_64
>
> corosync-1.4.1-1.fc15.x86_64
> corosynclib-1.4.1-1.fc15.x86_64
>
> openaislib-1.1.4-2.fc15.x86_64
> openais-1.1.4-2.fc15.x86_64
>
> --
>
> Read my blog(s) - occasionally updated!:
>
> Follow me on Twitter
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From rh-cluster at menole.net Tue Sep 6 09:51:11 2011
From: rh-cluster at menole.net (Michael Mende)
Date: Tue, 6 Sep 2011 11:51:11 +0200
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
<1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Message-ID: <20110906095111.GA16237@menole.dyndns.org>
Maybe Digimer's tutorial will help:
https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial
--
Mit freundlichen Grüßen,
Michael Mende
http://www.menole.net/
On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote:
> Hi,
>
> I have configured a guest in the cluster environment to restart on
> another node when the node on which the guest resides goes down.
>
> The guest has no issue when I live migrate it with the 'xm migrate'
> command, but I am having issues when trying to migrate it with
> 'clusvcadm'. The cluster is not under KVM.
>
> The error is:
>
>             Trying to migrate service:guest1 to node1...Service does not exist.
>
> Here is the configuration of the cluster:
>
> [the cluster.conf snippet was stripped when the message was archived]
>
> I also tried changing the cluster configuration, placing the guest name
> as a service, but that did not work either.
>
> Can anyone please confirm whether KVM is needed to use the 'clusvcadm'
> command? If not, what kind of modifications are needed to migrate the
> guest with clusvcadm?
>
> Thanks
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From hal at elizium.za.net Tue Sep 6 10:06:35 2011
From: hal at elizium.za.net (Hugo Lombard)
Date: Tue, 6 Sep 2011 12:06:35 +0200
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
<1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Message-ID: <20110906100635.GI3298@squishy.elizium.za.net>
On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote:
>
> The error is :
>
> Trying to migrate service:guest1 to node1...Service does not exist.
>
What was the command you tried?
That 'service:guest1' looks suspect, I think it should rather be
'vm:guest1'. It should match the name of the service in the clustat
output.
As an example, we'd use:
clusvcadm -M vm:guest1 -m srv2
to migrate the virtual machine 'guest1' to the cluster node 'srv2'.
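A quick way to confirm the exact service name is to read it out of the clustat output. The sample below is made up, but the awk pattern shows the idea:

```shell
# Made-up, abridged clustat output; check your real output instead.
sample='Service Name   Owner (Last)   State
vm:guest1      node2          started'

# Extract the first vm: service name exactly as rgmanager knows it.
name=$(printf '%s\n' "$sample" | awk '$1 ~ /^vm:/ {print $1; exit}')
echo "$name"    # prints: vm:guest1
```

Against a live cluster you would pipe clustat itself into the same awk and hand the result to `clusvcadm -M "$name" -m srv2`.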
HTH
--
Hugo Lombard
From mark at thermeon.com Wed Sep 7 18:37:52 2011
From: mark at thermeon.com (Mark Olliver)
Date: Wed, 7 Sep 2011 19:37:52 +0100
Subject: [Linux-cluster] kvm shared disk space
Message-ID: <012b01cc6d8d$44c21c50$ce4654f0$@thermeon.com>
Hi,
I have two KVM guests, A and B, which live on two different hosts. Both
hosts have a separate partition which is DRBD-synced in Active/Active mode
between them; this is then mounted on each host using GFS.
I now need to allow access to the data on the shared gfs disk by the two kvm
guests but I am unsure what I need to do to do that. I have looked at the
libvirt options but do not see anything that would make sense for the config
file.
Ideally each of the guests should mount the data at /mnt/data, as it can
then be served out by both of them at the same time.
I should note that I do not need the data mounted on the hosts; I have only
done that for now to test getting GFS2 working over Active/Active DRBD. I
do, however, need locks to work correctly: the application that uses the
shared data relies on locking, so any mounting or exporting
option needs to respect that.
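As a sanity check for that locking requirement, one can probe advisory locking on the mount with the util-linux flock(1) tool. The temp file below is a stand-in for a path on the shared mount such as /mnt/data:

```shell
# Probe advisory locking: hold an exclusive lock on a file, then verify
# that a second non-blocking attempt from another process is refused. On a
# cluster file system, run the inner attempt from the other guest against
# the same shared path.
probe=$(mktemp)   # stand-in for a file on the shared mount

result=$(flock -n "$probe" -c \
    "flock -n '$probe' -c 'echo not-blocked' || echo blocked")
echo "$result"    # prints: blocked
```

If the second attempt is not refused across the two guests, the locking protocol (e.g. GFS2 via DLM vs. an NFS re-export) is not being honored end to end.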
Any help or ideas gratefully received.
Regards
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ntoughe at hotmail.com Thu Sep 8 09:06:40 2011
From: ntoughe at hotmail.com (Guy-Serge NTOUGHE)
Date: Thu, 8 Sep 2011 09:06:40 +0000 (UTC)
Subject: [Linux-cluster] Invitation to connect on LinkedIn
Message-ID: <589992105.4907891.1315472800009.JavaMail.app@ela4-app0128.prod>
I'd like to add you to my professional network on LinkedIn.
- Guy-Serge
Guy-Serge NTOUGHE
Linux Expert at Michelin TravelPartners
Paris Area, France
Confirm that you know Guy-Serge NTOUGHE:
https://www.linkedin.com/e/-odgn7o-gsbijgnb-3y/isd/4129701855/GsprLpRo/?hs=false&tok=3oiUnw96j2yAU1
--
You are receiving Invitation to Connect emails. Click to unsubscribe:
http://www.linkedin.com/e/-odgn7o-gsbijgnb-3y/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/goo/linux-cluster%40redhat%2Ecom/20061/I1425010746_1/?hs=false&tok=08LlmS2DH2yAU1
(c) 2011 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From pradhanparas at gmail.com Tue Sep 13 16:40:02 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 11:40:02 -0500
Subject: [Linux-cluster] replacing HBA
Message-ID:
Hi,
I am replacing a 2 Gbit Qlogic HBA with a 4 Gbit Qlogic HBA in my GFS2
cluster. Apart from changing the WWN in the SAN, what else do I need to
change in Linux (CentOS)? Will the change be reflected automatically?
Thanks!
Paras.
From rpeterso at redhat.com Tue Sep 13 18:04:08 2011
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 13 Sep 2011 14:04:08 -0400 (EDT)
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
Message-ID: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
----- Original Message -----
| Hi,
|
| I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
| cluster. Apart from changing wwn in the SAN, what else do I need to
| change in Linux (centos). will the change be reflected automatically?
|
|
| Thanks!
| Paras.
Hi Paras,
The GFS2 file system doesn't care what HBA you're using.
So as long as your kernel has a good device driver for that HBA
you shouldn't need to do anything else.
Regards,
Bob Peterson
Red Hat File Systems
From pradhanparas at gmail.com Tue Sep 13 19:46:26 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 14:46:26 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Thanks Bob.
Another question: what about replacing a single-port HBA with a dual-port
one? After configuring multipathd, can I reconfigure the physical volume
without destroying the VG, LV, and CLVM? I am kind of lost
here.
Thanks
Paras.
On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
> ----- Original Message -----
> | Hi,
> |
> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
> | cluster. Apart from changing wwn in the SAN, what else do I need to
> | change in Linux (centos). will the change be reflected automatically?
> |
> |
> | Thanks!
> | Paras.
>
> Hi Paras,
>
> The GFS2 file system doesn't care what HBA you're using.
> So as long as your kernel has a good device driver for that HBA
> you shouldn't need to do anything else.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From keith.schincke at gmail.com Tue Sep 13 21:30:33 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Tue, 13 Sep 2011 16:30:33 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
How many paths do you currently have to your disk?
Does your LVM use the multipath name (mpath0)?
Sent from my iPhone
On Sep 13, 2011, at 14:46, Paras pradhan wrote:
> Thanks Bob.
>
> Another question. What about replacing single port HBA with a dual
> port. After configuring the multipathd, can I reconfigure physical
> volume without destroying the vg, lv and clvm ? I am kinddda lost
> here.
>
> Thanks
> Paras.
>
> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
>> ----- Original Message -----
>> | Hi,
>> |
>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>> | change in Linux (centos). will the change be reflected automatically?
>> |
>> |
>> | Thanks!
>> | Paras.
>>
>> Hi Paras,
>>
>> The GFS2 file system doesn't care what HBA you're using.
>> So as long as your kernel has a good device driver for that HBA
>> you shouldn't need to do anything else.
>>
>> Regards,
>>
>> Bob Peterson
>> Red Hat File Systems
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From pradhanparas at gmail.com Tue Sep 13 22:11:03 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 17:11:03 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
wrote:
> How many paths doe you currently have to your disk?
> Does your LVM use the multipath name (mpath0)?
Right now only one path with no multipath configured so LVM is not
using mpath0. Ideas?
Thanks!
Paras.
>
> Sent from my iPhone
>
> On Sep 13, 2011, at 14:46, Paras pradhan wrote:
>
>> Thanks Bob.
>>
>> Another question. What about replacing single port HBA with a dual
>> port. After configuring the multipathd, can I reconfigure physical
>> volume without destroying the vg, lv and clvm ? I am kinddda lost
>> here.
>>
>> Thanks
>> Paras.
>>
>> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
>>> ----- Original Message -----
>>> | Hi,
>>> |
>>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>>> | change in Linux (centos). will the change be reflected automatically?
>>> |
>>> |
>>> | Thanks!
>>> | Paras.
>>>
>>> Hi Paras,
>>>
>>> The GFS2 file system doesn't care what HBA you're using.
>>> So as long as your kernel has a good device driver for that HBA
>>> you shouldn't need to do anything else.
>>>
>>> Regards,
>>>
>>> Bob Peterson
>>> Red Hat File Systems
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From keith.schincke at gmail.com Tue Sep 13 22:50:56 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Tue, 13 Sep 2011 17:50:56 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Hmmm. The UUID of the physical volume should be written to disk (sdj) or
partition (sdj1) depending on your design.
kpartx should not care about the data on the disk (ie your UUID) when it
makes the mpathXpY entries.
Hopefully, the procedure will be:
- install your HBA and zone the SAN as necessary
- enable multipathd and restart; this should create the mpathX entries
(multipath -ll will list the paths and disks)
- run kpartx -a to add the needed mpathXpY entries (I do not know if this
runs on startup)
- reboot and see if you can mount the LVM.
If all goes right, pvdisplay should display the multipath devices of your
PVs.
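If multipathd has never been configured on these hosts, a minimal /etc/multipath.conf sketch to go with the steps above might look like this (the blacklist pattern and option values are illustrative assumptions, not taken from the thread; array-specific device sections may also be needed):

```conf
defaults {
    # produce mpathX names instead of raw WWIDs
    user_friendly_names yes
}
blacklist {
    # example: keep multipath away from the local system disk
    devnode "^sda$"
}
```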
On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan wrote:
> On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
> wrote:
> > How many paths doe you currently have to your disk?
> > Does your LVM use the multipath name (mpath0)?
>
> Right now only one path with no multipath configured so LVM is not
> using mpath0. Ideas?
>
> Thanks!
> Paras.
>
>
> >
> > Sent from my iPhone
> >
> > On Sep 13, 2011, at 14:46, Paras pradhan wrote:
> >
> >> Thanks Bob.
> >>
> >> Another question. What about replacing single port HBA with a dual
> >> port. After configuring the multipathd, can I reconfigure physical
> >> volume without destroying the vg, lv and clvm ? I am kinddda lost
> >> here.
> >>
> >> Thanks
> >> Paras.
> >>
> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson
> wrote:
> >>> ----- Original Message -----
> >>> | Hi,
> >>> |
> >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
> >>> | cluster. Apart from changing wwn in the SAN, what else do I need to
> >>> | change in Linux (centos). will the change be reflected automatically?
> >>> |
> >>> |
> >>> | Thanks!
> >>> | Paras.
> >>>
> >>> Hi Paras,
> >>>
> >>> The GFS2 file system doesn't care what HBA you're using.
> >>> So as long as your kernel has a good device driver for that HBA
> >>> you shouldn't need to do anything else.
> >>>
> >>> Regards,
> >>>
> >>> Bob Peterson
> >>> Red Hat File Systems
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>>
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From pradhanparas at gmail.com Thu Sep 15 16:50:01 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 15 Sep 2011 11:50:01 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Thanks Keith. I will try it and let you all know how it goes.
Paras.
On Tue, Sep 13, 2011 at 5:50 PM, Keith Schincke
wrote:
> Hmmm. The UUID of the physical volume should be written to disk (sdj) or
> partition (sdj1) depending on your design.
> kpartx should not care about the data on the disk (ie your UUID) when it
> makes the mpathXpY entries.
>
> Hopefully what will happen will be
> - install your hba and zone the SAN as necessary
> - enable multipathd and restart. This should create the mpathX entries.
> multipath -ll will list the paths and disks
> - run kpartx -a to add needed mpathXpY entries. I do not know if this runs
> on startup.
> - reboot and see if you can mount the LVM.
>
> If all goes right, pvdisplay should display the multipath devices of your
> PVs.
>
>
> On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan
> wrote:
>>
>> On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
>> wrote:
>> > How many paths doe you currently have to your disk?
>> > Does your LVM use the multipath name (mpath0)?
>>
>> Right now only one path with no multipath configured so LVM is not
>> using mpath0. Ideas?
>>
>> Thanks!
>> Paras.
>>
>>
>> >
>> > Sent from my iPhone
>> >
>> > On Sep 13, 2011, at 14:46, Paras pradhan wrote:
>> >
>> >> Thanks Bob.
>> >>
>> >> Another question. What about replacing single port HBA with a dual
>> >> port. After configuring the multipathd, can I reconfigure physical
>> >> volume without destroying the vg, lv and clvm ? I am kinddda lost
>> >> here.
>> >>
>> >> Thanks
>> >> Paras.
>> >>
>> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson
>> >> wrote:
>> >>> ----- Original Message -----
>> >>> | Hi,
>> >>> |
>> >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>> >>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>> >>> | change in Linux (centos). will the change be reflected
>> >>> automatically?
>> >>> |
>> >>> |
>> >>> | Thanks!
>> >>> | Paras.
>> >>>
>> >>> Hi Paras,
>> >>>
>> >>> The GFS2 file system doesn't care what HBA you're using.
>> >>> So as long as your kernel has a good device driver for that HBA
>> >>> you shouldn't need to do anything else.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Bob Peterson
>> >>> Red Hat File Systems
>> >>>
>> >>> --
>> >>> Linux-cluster mailing list
>> >>> Linux-cluster at redhat.com
>> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
>> >>>
>> >>
>> >> --
>> >> Linux-cluster mailing list
>> >> Linux-cluster at redhat.com
>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From carlopmart at gmail.com Fri Sep 16 08:22:21 2011
From: carlopmart at gmail.com (carlopmart)
Date: Fri, 16 Sep 2011 10:22:21 +0200
Subject: [Linux-cluster] Corosync goes cpu to 95-99%
In-Reply-To: <4E2D940B.5020803@redhat.com>
References: <4DD29D03.9080901@gmail.com> <4DD2BAC3.50509@redhat.com> <4DD2BD7D.5070704@gmail.com> <4DD2CA90.6090802@redhat.com> <3B50BA7445114813AE429BEE51A2BA52@versa> <4DD78908.2030801@gmail.com> <0B1965C8-9807-42B6-9453-01BE0C0B1DCB@cybercat.ca><4DD80D5D.10004@gmail.com> <4DD873C7.8080402@cybercat.ca> <22E7D11CD5E64E338A66811F31F06238@versa> <4DE545D7.1080703@redhat.com> <4DE69786.5010204@gmail.com><4DE6CAF6.4000002@cybercat.ca> <4DE75602.1000408@gmail.com>
<51BB988BCCF547E69BF222BDAF34C4DE@versa>
<4E04B61B.9070208@cybercat.ca> <4E2D63DD.4050007@gmail.com>
<4E2D7329.6050607@redhat.com> <4E2D7425.4070801@gmail.com>
<4E2D8ECB.6020305@redhat.com> <4E2D8F87.30508@gmail.com>
<4E2D940B.5020803@redhat.com>
Message-ID: <4E73073D.8010209@gmail.com>
On 07/25/2011 06:04 PM, Steven Dake wrote:
> On 07/25/2011 08:45 AM, carlopmart wrote:
>> On 07/25/2011 05:42 PM, Steven Dake wrote:
>>>>>>> are caused by this issue.
>>>>>>>
>>>>>>> So, as a temporary work-around for this time, woule be (at your own
>>>>>>> risks) to downgrade to 2.6.32-71.29.1.el6 kernel :
>>>>>>>
>>>>>>> yum install kernel-2.6.32-71.29.1.el6.x86_64
>>>>>>>
>>>>>>> Regards,
>>>>>>
>>>>>> Hi Steven and Nicolas,
>>>>>>
>>>>>> Is this bug resolved in RHEL6.1 with all updates applied?? Do I
>>>>>> need to
>>>>>> use some specific kernel version 2.6.32-131.2.1 or 2.6.32-131.6.1?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>> the corosync portion is going through QE. The kernel portion remains
>>>>> open.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>
>>>> Thanks Steve, then, Can I use last corosync version provided with
>>>> RHEL6.1 and last RHEL6.0's kernel version without problems??
>>>>
>>>>
>>>>
>>>
>>> I recommend not mixing without a support signoff.
>>>
>>
>> Then, how can I install rhcs under rhel6.x and prevent this bug??
>>
>>
> get a support signoff. Also the corosync updates have not finished
> through our validation process. Only hot fixes (from support) are available
>
> Regards
> -steve
>
Sorry to re-open this thread, but is there any news about this problem?
--
CL Martinez
carlopmart {at} gmail {d0t} com
From ext.thales.jean-daniel.bonnetot at sncf.fr Fri Sep 16 12:54:02 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Fri, 16 Sep 2011 14:54:02 +0200
Subject: [Linux-cluster] Luci can't install packages
Message-ID:
Hello,
Usually I use manual installation, but I need to proceed through Luci. My
problem is present with RHEL 5.7 and RHEL 6.0 (luci and ricci); with
RHEL 5.6 it works correctly.
I used "Create new cluster", added my nodes (the options are not important,
the problem is always there) and submitted...
"Please wait..."
Creating node "node1" for cluster "clutest": installing packages
Creating node "node2" for cluster "clutest": installing packages
I waited ;) but nothing. My process list on nodes says :
4166 ? Ss 0:00 /usr/sbin/oddjobd -p /var/run/oddjobd.pid -t
300
22343 ? S 0:00 \_ ricci-modrpm
22355 ? S 0:01 \_ /usr/bin/python /usr/bin/yum -y list
all
4221 ? S
From chekov at ucla.edu Fri Sep 16 22:26:00 2011
From: chekov at ucla.edu (Alan Wood)
Date: Fri, 16 Sep 2011 15:26:00 -0700 (PDT)
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID:
Hi all,
I'm trying to decide whether I really need a cluster implementation to do
what I want to do and I figured I'd solicit opinions.
Essentially I want to have two machines running as virtualization hosts
with libvirt/kvm. I have shared iSCSI storage available to both hosts and
have to decide how to configure the storage for use with libvirt. Right
now I see three possibilities:
1. Setting an iSCSI storage pool in libvirt
Pros: Migration seems painless, including live migration
Cons: Need to pre-allocate LUNs on iSCSI box.
Does not seem to take advantage of iSCSI offloading or multipathing
2. Setting up a two-node cluster and running CLVM
Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
Automatic failover
Cons: Cluster infrastructure adds complexity, more potential for bugs
Possible split brain issues?
3. A single iSCSI block device with partitions for each VM mounted on both hosts
Pros: Easy migration, setup
Cons: Two hosts accessing the same block device outside of a
cluster seems like it might lead to disaster
Right now I actually like option 3 but I'm wondering if I really am asking
for trouble accessing a block device simultaneously on two hosts without a
clustering infrastructure. I did this a while back with a shared-SCSI box
and it seemed to work. I would never be accessing the same partition on
both hosts and I understand that all partitioning has to be done while the
other host is off, but is there something else I'm missing here?
Also, are people out there running option 2? Does it make sense to set up
a cluster as small as 2-nodes for HA virtualization or do I really need
more nodes for it to be worthwhile? I do have all the fencing
infrastructure I might need (PDUs and Dracs).
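For option 1, a libvirt iSCSI storage pool is defined with XML along these lines (a sketch; the portal host and target IQN below are placeholders, not values from this setup):

```xml
<pool type="iscsi">
  <name>guest_images</name>
  <source>
    <!-- placeholder portal and target IQN -->
    <host name="iscsi.example.com"/>
    <device path="iqn.2011-09.com.example:storage"/>
  </source>
  <target>
    <!-- each pre-allocated LUN shows up as a volume under this path -->
    <path>/dev/disk/by-path</path>
  </target>
</pool>
```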
any help would be appreciated. thanks
-alan
From ext.thales.jean-daniel.bonnetot at sncf.fr Mon Sep 19 08:02:41 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Mon, 19 Sep 2011 10:02:41 +0200
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID:
Hello,
I don't use KVM and libvirt, but my experience concerns clustered storage:
1. Don't know.
2. Snapshotting is supported in CLVM (since 5.7, I think).
Complexity... yes
Bugs... yes
Split brain... yes
Two nodes are sufficient for HA; just think about what happens if one node shuts down and your VMs are heavily loaded (do you need a 3rd node?).
3. No experience with this either, but it sounds like it's not the right usage.
Best regards
--
JD
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On behalf of Alan Wood
Sent: Saturday, 17 September 2011 00:26
To: linux-cluster at redhat.com
Subject: [Linux-cluster] shared disk with virsh migration
Hi all,
I'm trying to decide whether I really need a cluster implementation to do
what I want to do and I figured I'd solicit opinions.
Essentially I want to have two machines running as virtualization hosts
with libvirt/kvm. I have shared iSCSI storage available to both hosts and
have to decide how to configure the storage for use with libvirt. Right
now I see three possibilities:
1. Setting an iSCSI storage pool in libvirt
Pros: Migration seems painless, including live migration
Cons: Need to pre-allocate LUNs on iSCSI box.
Does not seem to take advantage of iSCSI offloading or multipathing
2. Setting up a two-node cluster and running CLVM
Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
Automatic failover
Cons: Cluster infrastructure adds complexity, more potential for bugs
Possible split brain issues?
3. A single iSCSI block device with partitions for each VM mounted on both hosts
Pros: Easy migration, setup
Cons: Two hosts accessing the same block device outside of a
cluster seems like it might lead to disaster
Right now I actually like option 3 but I'm wondering if I really am asking
for trouble accessing a block device simultaneously on two hosts without a
clustering infrastructure. I did this a while back with a shared-SCSI box
and it seemed to work. I would never be accessing the same partition on
both hosts and I understand that all partitioning has to be done while the
other host is off, but is there something else I'm missing here?
Also, are people out there running option 2? Does it make sesne to set up
a cluster as small as 2-nodes for HA virtualization or do I really need
more nodes for it to be worthwhile? I do have all the fencing
infrastructure I might need (PDUs and Dracs).
any help would be appreciated. thanks
-alan
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.
From carlopmart at gmail.com Mon Sep 19 09:09:40 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 19 Sep 2011 11:09:40 +0200
Subject: [Linux-cluster] Rotating apache logs when is configured as a
resource under RHCS
Message-ID: <4E7706D4.2070201@gmail.com>
Hi all,
I have configured an apache resource under cluster.conf like this:
(both nodes are RHEL6.1)
My question is: which is the best form to rotate apache logs using
logrotate configuration??
Is this a possible solution:
/var/log/httpd/*log {
missingok
notifempty
sharedscripts
delaycompress
postrotate
if [ -f /var/run/cluster/apache/apache:httpd-mirror.pid ]; then
clusvcadm -R httpd-mirror
fi
endscript
}
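The postrotate hook above reduces to a pid-file existence check before asking rgmanager to restart the service. A minimal sketch of that guard as a shell function (the pid path and service name come from the config above; the RESTART_CMD override is our addition for dry-running, not part of the original):

```shell
# Restart the clustered httpd service only if the pid file written by
# the apache resource agent exists, i.e. rgmanager is running the
# service on this node. RESTART_CMD defaults to "clusvcadm -R" and can
# be overridden for testing.
restart_if_active() {
    pidfile="$1"    # e.g. /var/run/cluster/apache/apache:httpd-mirror.pid
    service="$2"    # e.g. httpd-mirror
    if [ -f "$pidfile" ]; then
        ${RESTART_CMD:-clusvcadm -R} "$service"
    fi
}
```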
--
CL Martinez
carlopmart {at} gmail {d0t} com
From harry.sutton at hp.com Mon Sep 19 12:34:53 2011
From: harry.sutton at hp.com (Sutton, Harry (HAS GSE))
Date: Mon, 19 Sep 2011 08:34:53 -0400
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID: <4E7736ED.9000607@hp.com>
I'd have to do some research to verify, but I'm guessing that iSCSI (in
option 3) would use the traditional SCSI reservation mechanism to
prevent problems associated with multiple access.
/Harry
On 09/16/2011 06:26 PM, Alan Wood wrote:
> Hi all,
>
> I'm trying to decide whether I really need a cluster implementation to do
> what I want to do and I figured I'd solicit opinions.
> Essentially I want to have two machines running as virtualization hosts
> with libvirt/kvm. I have shared iSCSI storage available to both hosts and
> have to decide how to configure the storage for use with libvirt. Right
> now I see three possibilities:
> 1. Setting an iSCSI storage pool in libvirt
> Pros: Migration seems painless, including live migration
> Cons: Need to pre-allocate LUNs on iSCSI box.
> Does not seem to take advantage of iSCSI offloading or multipathing
> 2. Setting up a two-node cluster and running CLVM
> Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
> Automatic failover
> Cons: Cluster infrastructure adds complexity, more potential for bugs
> Possible split brain issues?
> 3. A single iSCSI block device with partitions for each VM mounted on both hosts
> Pros: Easy migration, setup
> Cons: Two hosts accessing the same block device outside of a
> cluster seems like it might lead to disaster
>
> Right now I actually like option 3 but I'm wondering if I really am asking
> for trouble accessing a block device simultaneously on two hosts without a
> clustering infrastructure. I did this a while back with a shared-SCSI box
> and it seemed to work. I would never be accessing the same partition on
> both hosts and I understand that all partitioning has to be done while the
> other host is off, but is there something else I'm missing here?
>
> Also, are people out there running option 2? Does it make sesne to set up
> a cluster as small as 2-nodes for HA virtualization or do I really need
> more nodes for it to be worthwhile? I do have all the fencing
> infrastructure I might need (PDUs and Dracs).
>
> any help would be appreciated. thanks
> -alan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5069 bytes
Desc: S/MIME Cryptographic Signature
URL:
From jmd_singhsaini at yahoo.com Tue Sep 20 05:40:55 2011
From: jmd_singhsaini at yahoo.com (Harvinder Singh Binder)
Date: Tue, 20 Sep 2011 11:10:55 +0530 (IST)
Subject: [Linux-cluster] Rotating apache logs when is configured as a
resource under RHCS
In-Reply-To: <4E7706D4.2070201@gmail.com>
Message-ID: <1316497255.25429.YahooMailClassic@web94809.mail.in2.yahoo.com>
How do I configure a media player in a Linux operating system?
Please tell me about the configuration procedure (commands).
Harvinder Singh S/O Baldev Raj, VPO Barwa Teh. Anandpur Sahib, Dist. Ropar, Punjab. E-Mail ID: jmd_singhsaini at yahoo.com
--- On Mon, 19/9/11, carlopmart wrote:
> From: carlopmart
> Subject: [Linux-cluster] Rotating apache logs when is configured as a resource under RHCS
> To: linux-cluster at redhat.com
> Date: Monday, 19 September, 2011, 2:09 AM
> Hi all,
>
> I have configured an apache resource under cluster.conf
> like this: (both nodes are RHEL6.1)
>
> config_file="/data/config/etc/httpd/conf/httpd-mirror.conf"
> name="httpd-mirror" server_root="/data/config/etc/httpd"
> shutdown_wait="3"/>
>
> My question is: which is the best form to rotate apache
> logs using logrotate configuration??
>
> Is this a possible solution:
>
> /var/log/httpd/*log {
> ? ? missingok
> ? ? notifempty
> ? ? sharedscripts
> ? ? delaycompress
> ? ? postrotate
> ? ? ? ? if [ -f
> /var/run/cluster/apache/apache:httpd-mirror.pid ]; then
> ? ? ? ? ? ? clusvcadm -R
> httpd-mirror
> ? ? ? ? fi
> ? ? endscript
> }
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From sdake at redhat.com Tue Sep 20 18:13:45 2011
From: sdake at redhat.com (Steven Dake)
Date: Tue, 20 Sep 2011 11:13:45 -0700
Subject: [Linux-cluster] New Corosync Mailing list - Please register for it!
Message-ID: <4E78D7D9.7060001@redhat.com>
Hi,
Over the past several years, we have been sharing a mailing list with
the openais project. I have created a new mailing list specifically for
corosync; this will be the permanent new list.
Please register at:
http://lists.corosync.org/mailman/listinfo
The list is called "discuss"
Q: Why are we making this change now?
A: Several weeks ago the Linux Foundation was hacked (see
http://www.linuxfoundation.org). They hosted our mailing list service,
and during this event the mailing list has been unusable. The Linux
Foundation staff are busy rebuilding their network, but in the interim
this seems like a good opportunity to move everything to our core
infrastructure at corosync.org.
Q: What about the archives?
A: I hope to restore the archives once I can get the records from the
Linux Foundation. There is no guarantee I can get a restored copy of the
archive, however. Fortunately, several services have archived our
mailing list over the years.
Q: What about my registration on the openais mailing list?
A: I don't have the records to transfer the registrations to the corosync
list, so you will have to sign up for the mailing list again.
Q: Is the password I used to register on the openais mailing list
compromised?
A: I do not know to what extent the systems were hacked, but I'd recommend
treating the password as compromised. If you shared this password with
other services, please change it. Mailman stores passwords in plaintext
so that it can mail them to you once a month. Always use unique
passwords on mailman mailing lists.
Regards
-steve
From laszlo at beres.me Thu Sep 22 14:57:27 2011
From: laszlo at beres.me (Laszlo Beres)
Date: Thu, 22 Sep 2011 16:57:27 +0200
Subject: [Linux-cluster] Lost connection to storage - what happens?
Message-ID:
Hi,
just a theoretical question: let's assume we have a cluster with GFS2
filesystem (not as a managed resource). What happens exactly if all
paths to backend device get lost? It's not a cluster event, so I
assume cluster operates normally, but what does GFS2/DLM do?
Regards,
--
L?szl? B?res? ? ? ? ? ? Unix system engineer
http://www.google.com/profiles/beres.laszlo
From carlopmart at gmail.com Mon Sep 26 09:18:11 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 11:18:11 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
Message-ID: <4E804353.1040605@gmail.com>
Hi all,
Due to continuous problems with corosync
(https://bugzilla.redhat.com/show_bug.cgi?id=709758,
https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
under RHEL 6.x (I have a trial subscription, which I will convert to a
permanent subscription when everything works), I would like to know when
corosync-1.4.1-3.el6 will be released for RHEL 6.1. Any ideas?
Thanks ...
--
CL Martinez
carlopmart {at} gmail {d0t} com
From ajb2 at mssl.ucl.ac.uk Mon Sep 26 10:01:09 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Mon, 26 Sep 2011 11:01:09 +0100
Subject: [Linux-cluster] Lost connection to storage - what happens?
In-Reply-To:
References:
Message-ID: <4E804D65.6090808@mssl.ucl.ac.uk>
Laszlo Beres wrote:
> Hi,
>
> just a theoretical question: let's assume we have a cluster with GFS2
> filesystem (not as a managed resource). What happens exactly if all
> paths to backend device get lost?
GFS2 withdraws that filesystem and you'll have to reboot all the
withdrawn machines to get it back, once the paths are restored.
GFS doesn't require a reboot.
Red Hat argues this is not a regression, as GFS2 is not GFS.
From jfriesse at redhat.com Mon Sep 26 10:31:41 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 12:31:41 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E804353.1040605@gmail.com>
References: <4E804353.1040605@gmail.com>
Message-ID: <4E80548D.1070904@redhat.com>
carlopmart napsal(a):
> Hi all,
>
> Due to continuous problems with corosync
> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
> under rhel6.x (I have a trial subscription, that I will convert to
> permanent subscription when all works ok), I would like to know when
> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
We are not doing rebases in Z streams, so Corosync 1.4.1 will never be
released for RHEL 6.1. It will be available in RHEL 6.2.
Regards,
Honza
>
> Thanks ...
>
From carlopmart at gmail.com Mon Sep 26 10:51:20 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 12:51:20 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E80548D.1070904@redhat.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com>
Message-ID: <4E805928.3020009@gmail.com>
On 09/26/2011 12:31 PM, Jan Friesse wrote:
> carlopmart napsal(a):
>> Hi all,
>>
>> Due to continuous problems with corosync
>> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
>> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
>> under rhel6.x (I have a trial subscription, that I will convert to
>> permanent subscription when all works ok), I would like to know when
>> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
>
> We are not doing rebases in Z streams, so Corosync 1.4.1 will be never
> released for RHEL 6.1. It will be available in RHEL 6.2.
>
> Regards,
> Honza
>
>>
But can a version that fixes these bugs be released for RHEL 6.1 before
RHEL 6.2?
--
CL Martinez
carlopmart {at} gmail {d0t} com
From jfriesse at redhat.com Mon Sep 26 11:34:22 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 13:34:22 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E805928.3020009@gmail.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com>
<4E805928.3020009@gmail.com>
Message-ID: <4E80633E.8020409@redhat.com>
carlopmart napsal(a):
> On 09/26/2011 12:31 PM, Jan Friesse wrote:
>> carlopmart napsal(a):
>>> Hi all,
>>>
>>> Due to continuous problems with corosync
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
>>> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
>>> under rhel6.x (I have a trial subscription, that I will convert to
>>> permanent subscription when all works ok), I would like to know when
>>> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
>>
>> We are not doing rebases in Z streams, so Corosync 1.4.1 will be never
>> released for RHEL 6.1. It will be available in RHEL 6.2.
>>
>> Regards,
>> Honza
>>
>>>
>
> But can be released a version that solves the bugs for rhel6.1 before
> rhel6.2?
>
Please take the time to read how the RHEL release process works; briefly:
yes, it's called EUS (Z-stream), and its primary purpose is for really
severe or security bugs. To be honest, 709758 may be an annoying bug, but
it doesn't fit the Z-stream very well, especially because it can be seen
only in very special conditions/broken environments.
Regards,
Honza
From carlopmart at gmail.com Mon Sep 26 11:55:47 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 13:55:47 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E80633E.8020409@redhat.com>
References: <4E804353.1040605@gmail.com>
<4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com>
<4E80633E.8020409@redhat.com>
Message-ID: <4E806843.6060202@gmail.com>
On 09/26/2011 01:34 PM, Jan Friesse wrote:
> Please take your time to read how RHEL release process works, but
> basically and shortly. Ya, it's called EUS (Z-stream), and primary
> purpose is for really hard/security bugs. To be honest, 709758 may be
> annoying bug, but it doesn't fit to Z-stream very well, especially
> because it can be seen only in very special conditions/broken environments.
But the problem described in 709758 appears in my environment: one
RHEL 6.1 KVM host with two (and only two) single-CPU RHEL 6.1 guests
running RHCS ...
See this:
a) running top on a rhel6.1 guest:
top - 13:50:02 up 4:25, 4 users, load average: 5.91, 5.99, 6.71
Tasks: 132 total, 5 running, 127 sleeping, 0 stopped, 0 zombie
Cpu(s): 96.7%us, 3.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 1289092k total, 259524k used, 1029568k free, 24692k buffers
Swap: 1309688k total, 0k used, 1309688k free, 110376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1260 root RT 0 88572 84m 57m R 94.3 6.7 132:46.40 corosync
10475 root 19 -1 18704 1468 732 R 2.3 0.1 2:01.54 clulog
10454 root 19 -1 18704 1512 764 R 2.0 0.1 2:01.93 clulog
10654 root 20 0 5352 1688 1244 S 0.3 0.1 0:06.76 rgmanager
11681 root 20 0 2672 1132 864 S 0.3 0.1 0:03.43 top
b) trying to stop rgmanager under rhel6.1 kvm guest, never stops:
[root at rhelclunode01 tmp]# time service rgmanager stop
Stopping Cluster Service Manager:
c) running top under rhel6.1 kvm host:
top - 13:52:00 up 4:32, 1 user, load average: 1.00, 1.00, 0.93
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.4%us, 1.5%sy, 0.0%ni, 72.2%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 5088504k total, 3656212k used, 1432292k free, 57832k buffers
Swap: 5242872k total, 0k used, 5242872k free, 1240980k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2659 qemu 20 0 1526m 1.2g 3880 S 100.1 25.3 182:17.81 qemu-kvm
2445 qemu 20 0 1350m 592m 3960 S 6.0 11.9 13:55.74 qemu-kvm
2203 root 20 0 683m 15m 4904 S 3.0 0.3 7:56.55 libvirtd
2524 root 20 0 0 0 0 S 1.0 0.0 1:01.55 kvm-pit-wq
2279 qemu 20 0 852m 534m 3900 S 0.7 10.8 1:31.42 qemu-kvm
d) ps ax |grep qemu-kvm, under rhel6.1 kvm host:
2659 ? Sl 183:01 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -cpu
qemu32 -enable-kvm -m 1280 -smp 1,sockets=1,cores=1,threads=1 -name
rhelclunode01 -uuid 5f0c1503-34a0-771b-1cde-bbe257447590 -nodefconfig
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhelclunode01.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -netdev
tap,fd=21,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:17:ad:8f,bus=pci.0,addr=0x3,bootindex=1
-netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:50:56:36:59:a7,bus=pci.0,addr=0x4
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2 -vga
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Then, what could be the solution if no fix will be released until
RHEL 6.2? Disable all RHCS services and avoid installing RHCS on either
virtual or physical environments?
Thanks.
--
CL Martinez
carlopmart {at} gmail {d0t} com
From jfriesse at redhat.com Mon Sep 26 13:17:15 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 15:17:15 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E806843.6060202@gmail.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> <4E80633E.8020409@redhat.com>
<4E806843.6060202@gmail.com>
Message-ID: <4E807B5B.5000606@redhat.com>
carlopmart napsal(a):
> On 09/26/2011 01:34 PM, Jan Friesse wrote:
>> Please take your time to read how RHEL release process works, but
>> basically and shortly. Ya, it's called EUS (Z-stream), and primary
>> purpose is for really hard/security bugs. To be honest, 709758 may be
>> annoying bug, but it doesn't fit to Z-stream very well, especially
>> because it can be seen only in very special conditions/broken
>> environments.
>
> But problem described in 709758 appears in my enviroment: One RHEL6.1
Please contact GSS (Global Support Service). They can help you to:
- Check if your configuration is valid
- Check if architecture is valid
- Give you "not yet" released package and/or hot fix
- Propose backport to Z-stream for given bug
-> Basically, everything you are (or will be) paying them for.
Thanks,
Honza
From matthew.painter at kusiri.com Mon Sep 26 15:55:11 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 16:55:11 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
Message-ID:
Hi all,
I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
switch, and therefore a fixed multicast address - 239.192.15.224 in this
case.
All the docs etc. say to add to the cluster.conf:
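[The XML snippet was scrubbed from the archive. Per the RHEL 6 cluster documentation, the documented way to pin the multicast address in cluster.conf is a multicast element inside the cman block, along these lines, using the address from this thread:]

```xml
<cman>
  <multicast addr="239.192.15.224"/>
</cman>
```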
This seems to work, and cman_tool status brings back the correct multicast
address, but it has a Quorum status of "Activity Blocked" because the cluster
nodes never join.
*However* if I manually run "cman_tool leave" and then "cman_tool join -m
239.192.15.224", the nodes can see each other.
Does anyone know if this is a known issue? I can't find any information
about it.
Thanks for all your help :)
Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rhayden.public at gmail.com Mon Sep 26 16:20:35 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Mon, 26 Sep 2011 11:20:35 -0500
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID:
You might try to add the multicast stanza inside the stanza as
well. You can also specify a specific interface.
For example,
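[Robert's example was scrubbed from the archive. On RHEL 5 the per-node form he appears to describe puts a multicast element inside each clusternode; the element placement and the interface attribute here are assumptions based on the Cluster 2 documentation, and the thread later notes this is a 5.x-only option:]

```xml
<clusternode name="node1.example.com" nodeid="1">
  <multicast addr="239.192.15.224" interface="eth0"/>
  <!-- fence configuration omitted -->
</clusternode>
```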
I have gotten this to work internally, but your environment may be
different.
Robert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From matthew.painter at kusiri.com Mon Sep 26 16:40:21 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 17:40:21 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID:
Hi Robert,
Thanks for your suggestion. I had tried this, and it gave an error when
starting cman due to incorrect configuration. It turns out it is a 5.x option,
not needed for 6.x, which works out the interface based on the cluster
IP address.
Thanks anyway :)
You might try to add the multicast stanza inside the stanza as
well. You can also specify a specific interface.
For example,
I have gotten this to work internally, but your environment may be
different.
Robert
On Mon, Sep 26, 2011 at 4:55 PM, Matthew Painter wrote:
> Hi all,
>
> I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> switch, and therefore a fixed multicast address - 239.192.15.224 in this
> case.
>
> All the docs etc. say to add to the cluster.conf:
>
>
>
>
>
> This seems to work, and cman_tool status brings back the correct multicast
> address, but it has a Quorum status of "Activity Blocked" because the cluster
> nodes never join.
>
> *However* if I manually run "cman_tool leave" and then "cman_tool join -m
> 239.192.15.224", the nodes can see each other.
>
> Does anyone know if this is a known issue? I can't find any
> information about it.
>
> Thanks for all your help :)
>
> Matt
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fdinitto at redhat.com Mon Sep 26 17:53:36 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 26 Sep 2011 19:53:36 +0200
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID: <4E80BC20.50507@redhat.com>
On 09/26/2011 06:20 PM, Robert Hayden wrote:
> You might try to add the multicast stanza inside the
> stanza as well. You can also specify a specific interface.
>
> For example,
>
>
>
>
>
>
>
>
>
> I have gotten this to work internally, but your environment may be
> different.
this definitely does not work in RHEL6.1.
multicast is never parsed in that config section.
Fabio
From fdinitto at redhat.com Mon Sep 26 17:55:00 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 26 Sep 2011 19:55:00 +0200
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID: <4E80BC74.9090601@redhat.com>
For all RHEL related problems you need to contact GSS.
You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345
to track your issue.
Please provide the requested info.
Fabio
On 09/26/2011 05:55 PM, Matthew Painter wrote:
> Hi all,
>
> I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> switch, and therefore a fixed multicast address - 239.192.15.224 in this
> case.
>
> All the docs etc. say to add to the cluster.conf:
>
>
>
>
>
> This seems to work, and cman_tool status brings back the correct
> multicast address, but it has a Quorum status of "Activity Blocked"
> because the cluster nodes never join.
>
> *However* if I manually run "cman_tool leave" and then "cman_tool join
> -m 239.192.15.224", the nodes can see each other.
>
> Does anyone know if this is a known issue? I can't find any
> information about it.
>
> Thanks for all your help :)
>
> Matt
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From matthew.painter at kusiri.com Mon Sep 26 17:59:46 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 18:59:46 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To: <4E80BC74.9090601@redhat.com>
References:
<4E80BC74.9090601@redhat.com>
Message-ID:
Indeed, I also opened a bug.
The issue is a dupe of a known issue - I have updated the bug accordingly.
Thank you, Fabio, for helping me find a workaround by setting the TTL
manually :)
Matt
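[The exact workaround is not shown in the archive. For reference, a multicast TTL can be set manually in the totem interface section of corosync.conf; the ttl option exists in corosync 1.3 and later, though whether and how cman exposes it from cluster.conf on RHEL 6.1 is not confirmed here. Addresses below are illustrative:]

```
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # example ring network, adjust to yours
        mcastaddr: 239.192.15.224
        mcastport: 5405
        ttl: 2                     # hop limit for multicast packets
    }
}
```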
On Mon, Sep 26, 2011 at 6:55 PM, Fabio M. Di Nitto wrote:
> For all RHEL related problems you need to contact GSS.
>
> You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345
>
> to track your issue.
>
> Please provide the requested info.
>
> Fabio
>
> On 09/26/2011 05:55 PM, Matthew Painter wrote:
> > Hi all,
> >
> > I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> > switch, and therefore a fixed multicast address - 239.192.15.224 in this
> > case.
> >
> > All the docs etc. say to add to the cluster.conf:
> >
> >
> >
> >
> >
> > This seems to work, and cman_tool status brings back the correct
> > multicast address, but it has a Quorum status of "Activity Blocked"
> > because the cluster nodes never join.
> >
> > *However* if I manually run "cman_tool leave" and then "cman_tool join
> > -m 239.192.15.224", the nodes can see each other.
> >
> > Does anyone know if this is a known issue? I can't find any
> > information about it.
> >
> > Thanks for all your help :)
> >
> > Matt
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Jeremy.Lyon at us.ibm.com Mon Sep 26 18:56:24 2011
From: Jeremy.Lyon at us.ibm.com (Jeremy Lyon)
Date: Mon, 26 Sep 2011 12:56:24 -0600
Subject: [Linux-cluster] display and release gfs locks
Message-ID:
Hi,
We have an 8 node cluster running SASgrid. We have the core components of
SAS under RHCS (rgmanager) control, but there are user/client jobs that are
initiated manually and by cron outside of RHCS. We have run into an issue a
few times where, when the gfs init script is called to unmount all the file
systems and kills off all the processes using the gfs file systems, gfs on
the other nodes locks up and hangs. The node leaving the cluster via a
reboot appears to have left cleanly (cman_tool services doesn't show any
*WAIT* states), but everything is hung and requires a complete reboot of
the cluster to get things going. We are wondering if the gfs init script,
which uses fuser to try to kill gracefully but then uses a -9, could be
issuing the -9 and thus leaving locks in DLM that cause this issue.
Is this possible? I would think that if a node has properly/cleanly left
the cluster, locks that were held by that node would be released. Is there
a way to display locks that may be still existing for that node that is
down? And lastly, is there a way to force the release of those locks with
out the reboot of the cluster? I've been searching the linux-cluster
archives with little success.
RHEL 5.6
cman-2.0.115-68.el5_6.3
gfs-utils-0.1.20-8.el5
kmod-gfs-0.1.34-12.el5
Thanks
Jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kkovachev at varna.net Tue Sep 27 07:41:17 2011
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Tue, 27 Sep 2011 10:41:17 +0300
Subject: [Linux-cluster] display and release gfs locks
In-Reply-To:
References:
Message-ID:
Hi,
> Is this possible? I would think that if a node has properly/cleanly left
> the cluster, locks that were held by that node would be released. Is
there
> a way to display locks that may be still existing for that node that is
> down? And lastly, is there a way to force the release of those locks
with
> out the reboot of the cluster? I've been searching the linux-cluster
> archives with little success.
The best thing is to fix the initial problem, but as a workaround you may
try to fence_node from one of the other machines in the cluster even if it
has left cleanly - this should clean up the locks held by that node.
To see the locks, you may use "gfs(2)_tool lockdump" or look at the DLM
via debugfs by mounting it somewhere.
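[A sketch of the two suggestions as commands; the node name, mount points, and lockspace name are placeholders, and gfs2_tool's lockdump availability varies by release:]

```
# From a surviving node, fence the departed node even though it left
# cleanly; this should make DLM discard locks still attributed to it.
fence_node node3.example.com

# Dump lock state for a mounted GFS/GFS2 file system:
gfs_tool lockdump /mnt/shared       # GFS
gfs2_tool lockdump /mnt/shared2     # GFS2

# Or inspect the DLM lockspaces via debugfs:
mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/dlm/
cat /sys/kernel/debug/dlm/<lockspace>_locks
```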
From fdinitto at redhat.com Tue Sep 27 09:39:03 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 27 Sep 2011 11:39:03 +0200
Subject: [Linux-cluster] cluster 3.1.7 release
Message-ID: <4E8199B7.20608@redhat.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Welcome to the cluster 3.1.7 release.
This release addresses several bugs and especially a serious problem
introduced in the 3.1.6 release. If you are currently running 3.1.6,
it is highly recommended to upgrade to 3.1.7 as soon as possible.
The new source tarball can be downloaded here:
https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.7.tar.xz
ChangeLog:
https://fedorahosted.org/releases/c/l/cluster/Changelog-3.1.7
To report bugs or issues:
https://bugzilla.redhat.com/
Would you like to meet the cluster team or members of its community?
Join us on IRC (irc.freenode.net #linux-cluster) and share your
experience with other sysadministrators or power users.
Thanks/congratulations to all people that contributed to achieve this
great milestone.
Happy clustering,
Fabio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBCAAGBQJOgZm2AAoJEAgUGcMLQ3qJSMgP+wZN2YUyTLSmD7AK/EgeJPxf
q00EHAa7r0gReiSqwEkuGTTNNxwEkmEUoVlGUR2+Hu9jx6aYjPs+Z+KoCCrzjUGh
y4iSxcje1F2tjLwtswlNbL6itjglwfEHpskcyBRW2DiVDNX3zyUa4E1BE2zfnkOW
1PmxNnMJPQ+N0JDS9+RGho5qNvM+dll/paupl5kH76HY11j3vSY+1ugX5xhnxA4V
FAHxHw3lx7y5/ihqVK1OMBg7lIRzduo82eGJGy62p0VWm2+8VKX8z8YkfgBYfLj4
lWfsk8VHGiajGhA/5bBNphKwQY34NdmsOWJ4X5ksUFiDGJLZ+H400janmiMaheR2
m5T5Hs6ouOGoBIQm5jQxiA9JbeEyzZkl4crpjwQiRJLXJt4t0FHpwrzRIrCUTuPy
7LmIi3WJv2Q4EwDoRRhdOC/9j8WqAMrBoSq72P1b/hHZnRBkDh9X0z/w9tjNvF8C
RnfB6QBxEKnT27qkRyspLwfRx8DQXEGnjJbK6uDYu+m5Et5YJllDmvNKDe/BOjzt
nVw8egqgXKT0fumEFGxfwjmYVeWSpIazEAu5JyoKVddWiWKO2jUj8efgCkrAbZBh
CBKBoCQAVJjTGNsKL6a6xXYFHVjMhE5hsYH1/pT3rx+OiNOT6zQMF+r6MjOa/vyV
MrAP3GokgFOehsCMJhx4
=eiKh
-----END PGP SIGNATURE-----
From rsajnove at cisco.com Tue Sep 27 21:29:54 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 17:29:54 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
Message-ID:
Hello,
I'm in the process of designing a replacement for a Veritas
implementation and have to find similar functionality; I'm not
sure if this is doable in Red Hat Cluster:
We have a distributed application that runs on several servers
simultaneously, and that application must run in a cluster environment.
The summary is as follows:
1. The application has two different roles for the servers: one we could
call "Central Server" and the others "Collectors".
2. The application has one Central Server and X Collector Servers.
3. Central Server + Collector Servers represent a set of servers
that must be running at all times, and we want to implement
two sets in order to implement failover between them.
4. First issue I have:
The application is installed on all servers at the same location,
let us say "/opt/app", and I want to monitor it on all of them (i.e.:
different, separate, independent instances on separate
servers).
In Veritas we had "fscentral" and "fscollector", both with the
same device name and mount point, and that worked fine
(of course, both resources were part of different service
groups and running on different servers).
I tried to do the same here and got an error:
>>> clurgmgrd[9374]: Unique attribute collision. type=fs attr=mountpoint
>>> value=/opt
>>> clurgmgrd[9374]: Error storing fs resource
>>>
>> Then, I assume there should be a different way to implement this resource? Notice
>> that the number of Collectors is variable, so I
>> can't say "collector 1 will be mounted as /opt1" or "collector 1 will have
>> volume name vol1".
>>
> 5. Second issue I have:
>
> How can I run the "service" "app collector" on more than one server
> simultaneously (in parallel)?
> Again, the option to have "X" services for "X" Collectors is not a
> real option here.
>
Any idea will be appreciated!!!
>
> Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From linux at alteeve.com Tue Sep 27 23:18:15 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 16:18:15 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E8259B7.6090204@alteeve.com>
On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote:
>
> Hello,
>
> I'm in the process of designing a replacement for a Veritas
> implementation and have to find similar functionality; not
> sure if this is doable in Red Hat Cluster:
>
> We have a distributed application that runs on several servers
> simultaneously and that application must run in a cluster environment.
> The summary is as follows:
>
> 1. The application has two different roles for the servers, one we
> could call "Central Server" and the others "Collectors".
> 2. The application has _one_ Central Server and _X_ Collector Servers.
> 3. Central Server + Collector Servers represent a set of
> servers that must be running at all times, and we want to implement
> two sets in order to implement failover between them.
> 4. _First issue I have_:
> The application is installed on _all servers_ at the same
> location, let us say "/opt/app", and I want to monitor it on all of them (i.e.:
> different, separate, independent instances on separate
> servers).
> In Veritas we had "fscentral" and "fscollector", both
> with the same device name and mount point, and that worked fine
> (of course, both resources were part of different
> service groups and running on different servers).
> I tried to do the same here and got an error:
>
>
> clurgmgrd[9374]: Unique attribute collision. type=fs
> attr=mountpoint value=/opt
> clurgmgrd[9374]: Error storing fs resource
>
> Then, I assume there should be a different way to implement this
> resource? Notice that the number of Collectors is variable, so I
> can't say "collector 1 will be mounted as /opt1" or "collector
> 1 will have volume name vol1".
>
> 5. Second issue I have:
>
> How can I run the "service" "app collector" on more than one
> server simultaneously (in parallel)?
> Again, the option to have "X" services for "X" Collectors is
> not a real option here.
>
> Any idea will be appreciated!!!
>
>
> Thanks
I've not read this carefully (at work, sorry), but if I grasped your
question:
For services you want to run on all servers:
- Define a unique failover domain containing each node that will run the
parallel services.
- Create the service multiple times, each using the failover domain
containing the single target node.
For services that run on one node but move on failure, create another
failover domain (ordered, if you want to set preferences) with the
candidate nodes as members. Then create a service and assign it to this
domain.
Please provide your cluster.conf (or as much as you've crafted so far),
and obfuscate only passwords, if possible.
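[A minimal cluster.conf sketch of that layout; the node, domain, and service names are hypothetical:]

```xml
<rm>
  <failoverdomains>
    <!-- one restricted single-node domain per node that must always run a copy -->
    <failoverdomain name="only_node1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
    </failoverdomain>
    <failoverdomain name="only_node2" restricted="1">
      <failoverdomainnode name="node2" priority="1"/>
    </failoverdomain>
    <!-- ordered domain for the service that should relocate on failure -->
    <failoverdomain name="prefer_node1" ordered="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="collector_node1" domain="only_node1" autostart="1"/>
  <service name="collector_node2" domain="only_node2" autostart="1"/>
  <service name="central" domain="prefer_node1" autostart="1" recovery="relocate"/>
</rm>
```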
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From linux at alteeve.com Tue Sep 27 23:25:14 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 16:25:14 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E8259B7.6090204@alteeve.com>
References: <4E8259B7.6090204@alteeve.com>
Message-ID: <4E825B5A.3030206@alteeve.com>
Forgot to include an example;
This link shows RGManager/cluster.conf configured with two single-node
failoverdomains (for managing the storage services that need to be running
on both nodes in a 2-node cluster) and two failoverdomains used for a
service that can migrate (a VM, specifically). It will hopefully be
useful as a template for what you are trying to do.
https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_Failover_Domains
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From rsajnove at cisco.com Wed Sep 28 00:04:53 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 20:04:53 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E825B5A.3030206@alteeve.com>
Message-ID:
Good example, thanks.
Not sure if it is doable, because we could have 10 servers and the idea of
having 10 service instances could be tricky to admin :(
What about the other question, related to using the same device names and
mount points?
--
Sent from my PDP-11
On 27-Sep-2011 7:25 PM, "Digimer" wrote:
> Forgot to include an example;
>
> This link shows RGManager/cluster.conf configured with two single-node
> failoverdomains (for managing the storage services needed to be running
> on both nodes in a 2-node cluster) and two failoverdomains used for a
> service that can migrate (a VM, specifially). It will hopefully be
> useful as a template for what you are trying to do.
>
> https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_Failover_Domains
On 27-Sep-2011 7:18 PM, "Digimer" wrote:
> On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote:
>>
>> Hello,
>>
>> I'm in the process of designing a replacement for a Veritas
>> implementation and have to find similar functionality; not
>> sure if this is doable in Red Hat Cluster:
>>
>> We have a distributed application that runs on several servers
>> simultaneously and that application must run in a cluster environment.
>> The summary is as follows:
>>
>> 1. The application has two different roles for the servers, one we
>> could call "Central Server" and the others "Collectors".
>> 2. The application has _one_ Central Server and _X_ Collector Servers.
>> 3. Central Server + Collector Servers represent a set of
>> servers that must be running at all times, and we want to implement
>> two sets in order to implement failover between them.
>> 4. _First issue I have_:
>> The application is installed on _all servers_ at the same
>> location, let us say "/opt/app", and I want to monitor it on all of them (i.e.:
>> different, separate, independent instances on separate
>> servers).
>> In Veritas we had "fscentral" and "fscollector", both
>> with the same device name and mount point, and that worked fine
>> (of course, both resources were part of different
>> service groups and running on different servers).
>> I tried to do the same here and got an error:
>>
>>
>> clurgmgrd[9374]: Unique attribute collision. type=fs
>> attr=mountpoint value=/opt
>> clurgmgrd[9374]: Error storing fs resource
>>
>> Then, I assume there should be a different way to implement this
>> resource? Notice that the number of Collectors is variable, so I
>> can't say "collector 1 will be mounted as /opt1" or "collector
>> 1 will have volume name vol1".
>>
>> 5. Second issue I have:
>>
>> How can I run the "service" "app collector" on more than one
>> server simultaneously (in parallel)?
>> Again, the option to have "X" services for "X" Collectors is
>> not a real option here.
>>
>> Any idea will be appreciated!!!
>>
>>
>> Thanks
>
> I've not read this carefully (at work, sorry), but if I grasped your
> question:
>
> For services you want to run on all servers:
> - Define a unique failover domain containing each node that will run the
> parallel services.
> - Create the service multiple times, each using the failover domain
> containing the single target node.
>
> For services that run on one node but move on failure, create another
> failover domain (ordered, if you want to set preferences) with the
> candidate nodes as members. Then create a service and assign it to this
> domain.
>
> Please provide your cluster.conf (or as much as you've crafted so far),
> and obfuscate only passwords, if possible.
>
> --
> Digimer
> E-Mail: digimer at alteeve.com
> Freenode handle: digimer
> Papers and Projects: http://alteeve.com
> Node Assassin: http://nodeassassin.org
> "At what point did we forget that the Space Shuttle was, essentially,
> a program that strapped human beings to an explosion and tried to stab
> through the sky with fire and math?"
From linux at alteeve.com Wed Sep 28 00:19:19 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 17:19:19 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E826807.5030408@alteeve.com>
On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote:
>
> Good example, thanks.
> Not sure if is doable because we could have 10 servers and the idea to have
> 10 service instances could be tricky to admin :(
Oh? How so? The file would be a bit long, but even with ten definitions
it should still be manageable. Particularly so if you use a tool like luci.
> What about the other q, related with the usage of same name of devices and
> mounting points?
I didn't follow that question. Rather, that sounds like a much bigger
question...
If '/opt/app' is local to each node, containing separate installs of the
application, it should be fine. However, I expect this is not the case,
or you'd not be asking.
If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount,
GFS2 partition, etc) then it should still be fine. Look again at that
link and search for '/xen_shared'. That is a common chunk of space
(using clvmd and gfs2) which is un/mounted by the cluster and it is
mounted in the same place on all nodes (and uses the same LV device name).
If I am not answering your question, please ask again. :)
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From rsajnove at cisco.com Wed Sep 28 00:33:23 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 20:33:23 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826807.5030408@alteeve.com>
Message-ID:
I might be doing something wrong, because you say "you are fine" but it
didn't work :(
All servers have "/opt/app" mounted on the same internal disk partition.
They are not shared; it is just that all have identical layout.
I tried to create:
Resource name: Central_FS
Device: /dev/mapper/VolGroup00-optvol
FS Type: ext3
Mount point: /opt
And
Resource name: Collector_FS
Device: /dev/mapper/VolGroup00-optvol
FS Type: ext3
Mount point: /opt
When I tried to save it I found in the /var/log/messages:
clurgmgrd[4174]: Reconfiguring
clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
value=/opt
clurgmgrd[4174]: Error storing fs resource
Thanks for your help and ideas!
On 27-Sep-2011 8:19 PM, "Digimer" wrote:
> On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote:
>>
>> Good example, thanks.
>> Not sure if is doable because we could have 10 servers and the idea to have
>> 10 service instances could be tricky to admin :(
>
> Oh? How so? The file would be a bit long, but even with ten definitions
> it should still be manageable. Particularly so if you use a tool like luci.
>
>> What about the other q, related with the usage of same name of devices and
>> mounting points?
>
> I didn't follow that question. Rather, that sounds like a much bigger
> question...
>
> If '/opt/app' is local to each node, containing separate installs of the
> application, it should be fine. However, I expect this is not the case,
> or you'd not be asking.
>
> If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount,
> GFS2 partition, etc) then it should still be fine. Look again at that
> link and search for '/xen_shared'. That is a common chunk of space
> (using clvmd and gfs2) which is un/mounted by the cluster and it is
> mounted in the same place on all nodes (and uses the same LV device name).
>
> If I am not answering your question, please ask again. :)
From linux at alteeve.com Wed Sep 28 00:45:34 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 17:45:34 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E826E2E.5000507@alteeve.com>
On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>
> I might be doing something wrong, because you say "you are fine" but didn't
> work :(
>
> All servers have "/opt/app" mounted in same internal disk partition.
> They are not shared, it is just that all have identical layout.
> I tried to create:
>
> Resource name: Central_FS
> Device: /dev/mapper/VolGroup00-optvol
> FS Type: ext3
> Mount point: /opt
>
> And
>
> Resource name: Collector_FS
> Device: /dev/mapper/VolGroup00-optvol
> FS Type: ext3
> Mount point: /opt
>
> When I tried to save it I found in the /var/log/messages:
>
> clurgmgrd[4174]: Reconfiguring
> clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
> value=/opt
> clurgmgrd[4174]: Error storing fs resource
>
> Thanks for your help and ideas!
Please post your cluster.conf file (and obfuscate only passwords,
please). Also post a sample /etc/fstab and the outputs of 'pvscan',
'vgscan' and 'lvscan'.
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From amit.jathar at alepo.com Wed Sep 28 11:47:59 2011
From: amit.jathar at alepo.com (Amit Jathar)
Date: Wed, 28 Sep 2011 11:47:59 +0000
Subject: [Linux-cluster] corosync crashes after firing crm configuration
command on any one node
Message-ID:
Hi,
I am facing a weird issue in the corosync behavior.
I have configured a two-node cluster.
The cluster is working fine, and the crm_mon command is showing proper output.
The command cibadmin -Q is also working properly on both nodes.
The issue starts when I run any crm configuration command.
When I run a crm configuration command, I see the following output:
[root at AAA02 corosync]# crm configure property no-quorum-policy=ignore
Could not connect to the CIB: Remote node did not respond
ERROR: creating tmp shadow __crmshell.12274 failed
[root at AAA02 corosync]#
At the same time, the logs in /var/log/messages say:
Sep 28 13:38:40 localhost cibadmin: [12295]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost cibadmin: [12296]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost crm_shadow: [12298]: info: Invoked: crm_shadow -c __crmshell.12274
I have attached a file which has cib.xml & corosync.conf file contents on both the nodes .
Please guide me to troubleshoot this error.
Thanks in advance.
Thanks,
Amit
________________________________
This email (message and any attachment) is confidential and may be privileged. If you are not certain that you are the intended recipient, please notify the sender immediately by replying to this message, and delete all copies of this message and attachments. Any other use of this email by you is prohibited.
________________________________
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cib_xml_corosync_conf.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: logs_on_node.txt
URL:
From raju.rajsand at gmail.com Wed Sep 28 12:49:50 2011
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Wed, 28 Sep 2011 18:19:50 +0530
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References: <4E826807.5030408@alteeve.com>
Message-ID:
Greetings,
On Wed, Sep 28, 2011 at 6:03 AM, Ruben Sajnovetzky wrote:
>
>     FS Type: ext3
Shouldn't it be GFS/GFS2?
--
Regards,
Rajagopal
From rhayden.public at gmail.com Wed Sep 28 12:52:52 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Wed, 28 Sep 2011 07:52:52 -0500
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826E2E.5000507@alteeve.com>
References: <4E826E2E.5000507@alteeve.com>
Message-ID:
> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
> >
> > I might be doing something wrong, because you say "you are fine" but
> didn't
> > work :(
> >
> > All servers have "/opt/app" mounted in same internal disk partition.
> > They are not shared, it is just that all have identical layout.
> > I tried to create:
> >
> > Resource name: Central_FS
> > Device: /dev/mapper/VolGroup00-optvol
> > FS Type: ext3
> > Mount point: /opt
> >
> > And
> >
> > Resource name: Collector_FS
> > Device: /dev/mapper/VolGroup00-optvol
> > FS Type: ext3
> > Mount point: /opt
> >
>
My suggestion here is theoretical and not tested... I think you want to
have a single "resource" with different service names. For example:
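[Robert's example was scrubbed from the archive; a sketch of the idea, defining the fs resource once and referencing it from two services. The names are hypothetical, and as Robert says, this remains untested:]

```xml
<rm>
  <resources>
    <fs name="App_FS" device="/dev/mapper/VolGroup00-optvol"
        fstype="ext3" mountpoint="/opt"/>
  </resources>
  <service name="central" autostart="1">
    <fs ref="App_FS"/>
  </service>
  <service name="collector" autostart="1">
    <fs ref="App_FS"/>
  </service>
</rm>
```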
> > When I tried to save it I found in the /var/log/messages:
> >
> > clurgmgrd[4174]: Reconfiguring
> > clurgmgrd[4174]: Unique attribute collision. type=fs
> attr=mountpoint
> > value=/opt
> > clurgmgrd[4174]: Error storing fs resource
> >
> > Thanks for your help and ideas!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rsajnove at cisco.com Wed Sep 28 13:09:13 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Wed, 28 Sep 2011 09:09:13 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
Message-ID:
This approach didn't work either :(
The first server started the service; the second couldn't start.
On 28-Sep-2011 8:52 AM, "Robert Hayden" wrote:
>
>> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>>> >
>>> > I might be doing something wrong, because you say "you are fine" but
>>> didn't
>>> > work :(
>>> >
>>> > All servers have "/opt/app" mounted in same internal disk partition.
>>> > They are not shared, it is just that all have identical layout.
>>> > I tried to create:
>>> >
>>> >     Resource name: Central_FS
>>> >     Device: /dev/mapper/VolGroup00-optvol
>>> >     FS Type: ext3
>>> >     Mount point: /opt
>>> >
>>> > And
>>> >
>>> >     Resource name: Collector_FS
>>> >     Device: /dev/mapper/VolGroup00-optvol
>>> >     FS Type: ext3
>>> >     Mount point: /opt
>>> >
>
> My suggestion here is theoretical and not tested... I think you want to have
> a single "resource" with different service names. For example,
>
> [example cluster.conf snippet scrubbed by the archive]
>
>>> > When I tried to save it I found in the /var/log/messages:
>>> >
>>> > clurgmgrd[4174]: Reconfiguring
>>> > clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
>>> > value=/opt
>>> > clurgmgrd[4174]: Error storing fs resource
>>> >
>>> > Thanks for your help and ideas!
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rsajnove at cisco.com Wed Sep 28 13:20:39 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Wed, 28 Sep 2011 09:20:39 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826E2E.5000507@alteeve.com>
Message-ID:
Here is the cluster.conf (didn't get access to run the other commands yet):
On 27-Sep-2011 8:45 PM, "Digimer" wrote:
> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>>
>> I might be doing something wrong, because you say "you are fine" but didn't
>> work :(
>>
>> All servers have "/opt/app" mounted in same internal disk partition.
>> They are not shared, it is just that all have identical layout.
>> I tried to create:
>>
>> Resource name: Central_FS
>> Device: /dev/mapper/VolGroup00-optvol
>> FS Type: ext3
>> Mount point: /opt
>>
>> And
>>
>> Resource name: Collector_FS
>> Device: /dev/mapper/VolGroup00-optvol
>> FS Type: ext3
>> Mount point: /opt
>>
>> When I tried to save it I found in the /var/log/messages:
>>
>> clurgmgrd[4174]: Reconfiguring
>> clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
>> value=/opt
>> clurgmgrd[4174]: Error storing fs resource
>>
>> Thanks for your help and ideas!
>
> Please post your cluster.conf file (and obfuscate only passwords,
> please). Also post a sample /etc/fstab and the outputs of 'pvscan',
> 'vgscan' and 'lvscan'.
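The requested outputs can be gathered in one pass; a minimal sketch (the report filename is arbitrary, and the LVM scans need root):

```shell
#!/bin/sh
# Collect the files and LVM scans requested above into one report.
# Run as root on each cluster node; a missing command is recorded
# in the report rather than aborting the script.
{
  echo "== /etc/cluster/cluster.conf =="; cat /etc/cluster/cluster.conf 2>&1
  echo "== /etc/fstab ==";               cat /etc/fstab 2>&1
  echo "== pvscan ==";                   pvscan 2>&1
  echo "== vgscan ==";                   vgscan 2>&1
  echo "== lvscan ==";                   lvscan 2>&1
} > cluster-diag.txt
echo "wrote cluster-diag.txt"
```

Remember to obfuscate only passwords in the collected cluster.conf before posting.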
From ext.thales.jean-daniel.bonnetot at sncf.fr Wed Sep 28 15:58:02 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Wed, 28 Sep 2011 17:58:02 +0200
Subject: [Linux-cluster] (no subject)
Message-ID:
Hi,
I have a problem with a two-node cluster. When I force a node to fail, the
second node fences the first one. When the first one rejoins my cluster, cman
shuts down on both nodes, saying:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node
s64lmwbig3b because it has rejoined the cluster with existing state
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1
because we rejoined the cluster without a full restart
Logs :
See attached
Conf :
Do you know what I missed?
Thanks
Regards,
Jean-Daniel BONNETOT
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 667 bytes
Desc: image001.jpg
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cluster_log.txt
URL:
-------------- next part --------------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.
From linux at alteeve.com Wed Sep 28 16:44:59 2011
From: linux at alteeve.com (Digimer)
Date: Wed, 28 Sep 2011 09:44:59 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E834F0B.5020702@alteeve.com>
On 09/28/2011 06:09 AM, Ruben Sajnovetzky wrote:
> This approach didn't work either :(
> First server started service the second couldn't start
You only shared a small snippet of your cluster.conf, and none of the
other requested info. I can't tell what is actually missing versus omitted.
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From linux at alteeve.com Wed Sep 28 16:50:57 2011
From: linux at alteeve.com (Digimer)
Date: Wed, 28 Sep 2011 09:50:57 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E835071.6080506@alteeve.com>
On 09/28/2011 06:20 AM, Ruben Sajnovetzky wrote:
> [cluster.conf scrubbed by the archive; only attribute fragments survive:
> post_join_delay="30", two failover domains with ordered="0" restricted="1",
> and member nodes with priority="1"]