From swap_project at yahoo.com Thu Sep 1 14:57:18 2011
From: swap_project at yahoo.com (Srija)
Date: Thu, 1 Sep 2011 07:57:18 -0700 (PDT)
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314702563.2694.13.camel@menhir>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
Message-ID: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Hi,

I have configured a guest in the cluster environment to restart on
another node when the node on which the guest resides goes down.

The guest has no issue when I live migrate it with the 'xm migrate'
command, but I am having issues when trying to migrate it with
'clusvcadm'. The cluster is not under KVM.

The error is:

            Trying to migrate service:guest1 to node1...Service does not exist.
Here is the configuration of the cluster:

[the cluster.conf snippet was stripped when the message was archived]
I also tried changing the cluster configuration, placing the guest name
as a service, but that did not work either.

Can anyone please confirm whether KVM is needed to use the 'clusvcadm'
command? If not, what kind of modifications are needed to migrate the
guest with clusvcadm?

Thanks
From mmorgan at dca.net Thu Sep 1 20:58:18 2011
From: mmorgan at dca.net (Michael Morgan)
Date: Thu, 1 Sep 2011 16:58:18 -0400
Subject: [Linux-cluster] "Invalid resource" starting KVM guest with
clusvcadm
In-Reply-To: <20110825214529.GF7305@staff.dca.net>
References: <20110825214529.GF7305@staff.dca.net>
Message-ID: <20110901205818.GD545@staff.dca.net>
On Thu, Aug 25, 2011 at 05:45:29PM -0400, Michael Morgan wrote:
> Hello,
>
> I have a 2 node KVM cluster under Scientific Linux 6.1. Starting guests
> works fine through virsh, virt-manager, and even rg_test. When I try to
> use clusvcadm however:
>
> [root@node1 ~]# clusvcadm -e vm:test
> Local machine trying to enable vm:test...Invalid operation for resource
>
After poring through vm.sh and adding some logging I see that clusvcadm
on the bad cluster is running "vm.sh status" and fails after "virsh
domstate test". Both rg_test on the bad cluster and clusvcadm on a
working cluster run "vm.sh start" which correctly follows up with "virsh
create /mnt/shared/xml/test.xml". I can't think of any reason why this
would be happening though.
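For anyone wanting to reproduce this kind of tracing, a minimal logging shim is one way to do it. Everything below (the log path, format, and function name) is my own choice, not part of the stock vm.sh agent:

```shell
# Hypothetical logging shim, dropped near the top of a resource agent such
# as /usr/share/cluster/vm.sh. Log path and format are arbitrary choices.
AGENT_LOG=${AGENT_LOG:-/tmp/vm-agent.log}

agent_log() {
    # record a timestamp, the agent name, and whatever was passed in
    echo "$(date +%s) vm.sh $*" >> "$AGENT_LOG"
}

agent_log "called with op=${1:-status} OCF_RESKEY_name=${OCF_RESKEY_name:-unset}"
```

Tailing the log while running clusvcadm then shows which operation (start, status, migrate) rgmanager actually hands the agent.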
--
Michael Morgan
mmorgan at dca.net
From rhayden.public at gmail.com Fri Sep 2 13:38:25 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 08:38:25 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
Message-ID:
Has anyone experienced the following error/hang/loop when attempting
to stop rgmanager or cman on the last node of a two node cluster?
groupd[4909]: cpg_leave error retrying
Basic scenario:
RHEL 5.7 with the latest errata for cman.
Create a two node cluster with qdisk and higher totem token=70000
start cman on both nodes, wait for qdisk to become online with master determined
stop cman on node1, wait for it to complete
stop cman on node2
error "cpg_leave" seen in logging output
Observations:
The "service cman stop" command hangs at "Stopping fencing" output
If I cycle openais service with "service openais restart", then the
"service cman stop" will complete (need to manually stop the openais
service afterwards).
When hung, the command "group_tool dump" hangs (any group_tool command hangs).
The hang is inconsistent which, in my mind, implies a timing issue.
Inconsistent meaning that every once in a while, the shutdown will
complete (maybe 20% of the time).
I have seen the issue with the stopping of rgmanager and cman. The
below example has been stripped down to show the hang with cman.
I have tested with varying the length of time to wait before stopping
the second node with no difference (hang still occurs periodically).
I have tested with commenting out the totem token and the
quorum_dev_poll and still experienced the hang. (We use the longer
timeouts to help survive network and SAN blips.)
I have dug through some of the source code. The message appears in
group's cpg.c as function do_cpg_leave( ). This calls the cpg_leave
function located in the openais package.
If I attach to the groupd process with gdb, I get the following stack.
Watching with strace, groupd is just in a looping state.
(gdb) where
#0 0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x000000341409a364 in sleep () from /lib64/libc.so.6
#2 0x000000000040a410 in time ()
#3 0x000000000040bd09 in time ()
#4 0x000000000040e2cb in time ()
#5 0x000000000040ebe0 in time ()
#6 0x000000000040f394 in time ()
#7 0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6
#8 0x00000000004018f9 in time ()
#9 0x00007fff04a671c8 in ?? ()
#10 0x0000000000000000 in ?? ()
If I attach to the aisexec process with gdb, I see the following:
(gdb) where
#0 0x00000034140cb696 in poll () from /lib64/libc.so.6
#1 0x0000000000405c50 in poll_run ()
#2 0x0000000000418aae in main ()
As you can see in the cluster.conf example below, I have attempted
many different ways to create more debug logging. I do see debug
messages from openais in the cpg.c component during startup, but
nothing is logged on the shutdown hang scenario.
I would appreciate any guidance on how to troubleshoot further,
especially with increasing the tracing of the openais calls in cpg.c.
Thanks
Robert
Example cluster.conf:
From rhayden.public at gmail.com Fri Sep 2 14:33:17 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 09:33:17 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
In-Reply-To:
References:
Message-ID:
I modified the /etc/init.d/cman script to use the -D flag on the
groupd start and re-direct the output to a file in /tmp. During the
hang, I see groupd looping through the cpg_leave function. When I
restart openais, it appears that groupd will get an error code "2" and
then break out of the loop. Looks like I need to dig into the openais
cpg_leave function....
Here is the groupd -D output, with the openais restart at
the very end.
1314973495 cman: our nodeid 2 name node2-priv quorum 1
1314973495 setup_cpg groupd_handle 6b8b456700000000
1314973495 groupd confchg total 2 left 0 joined 1
1314973495 send_version nodeid 2 cluster 2 mode 2 compat 1
1314973495 client connection 3
1314973495 got client 3 setup
1314973495 setup fence 0
1314973495 client connection 4
1314973495 got client 4 setup
1314973495 setup dlm 1
1314973495 client connection 5
1314973495 got client 5 setup
1314973495 setup gfs 2
1314973496 got client 3 join
1314973496 0:default got join
1314973496 0:default is cpg client 6 name 0_default handle 79e2a9e300000001
1314973496 0:default cpg_join ok
1314973496 0:default waiting for first cpg event
1314973496 client connection 7
1314973496 0:default waiting for first cpg event
1314973496 0:default confchg left 0 joined 1 total 2
1314973496 0:default process_node_join 2
1314973496 0:default cpg add node 1 total 1
1314973496 0:default cpg add node 2 total 2
1314973496 0:default make_event_id 200020001 nodeid 2 memb_count 2 type 1
1314973496 0:default queue join event for nodeid 2
1314973496 0:default process_current_event 200020001 2 JOIN_BEGIN
1314973496 0:default app node init: add 2 total 1
1314973496 0:default app node init: add 1 total 2
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 got client 7 get_group
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 0:default waiting for 1 more stopped messages before JOIN_ALL_STOPPED 2
1314973496 0:default mark node 1 stopped
1314973496 0:default set global_id 10001 from 1
1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STOPPED
1314973496 0:default action for app: setid default 65537
1314973496 0:default action for app: start default 1 2 2 1 2
1314973496 0:default mark node 1 started
1314973496 client connection 7
1314973496 got client 7 get_group
1314973496 client connection 7
1314973496 got client 7 get_group
1314973496 got client 3 start_done
1314973496 0:default send started
1314973496 0:default mark node 2 started
1314973496 0:default process_current_event 200020001 2 JOIN_ALL_STARTED
1314973496 0:default action for app: finish default 1
1314973497 client connection 7
1314973497 got client 7 get_group
1314973557 cman: node 0 added
1314973580 0:default confchg left 1 joined 0 total 1
1314973580 0:default confchg removed node 1 reason 2
1314973580 0:default process_node_leave 1
1314973580 0:default cpg del node 1 total 1
1314973580 0:default make_event_id 100010002 nodeid 1 memb_count 1 type 2
1314973580 0:default queue leave event for nodeid 1
1314973580 0:default process_current_event 100010002 1 LEAVE_BEGIN
1314973580 0:default action for app: stop default
1314973580 got client 3 stop_done
1314973580 0:default send stopped
1314973580 0:default waiting for 2 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default mark node 1 stopped
1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default waiting for 1 more stopped messages before LEAVE_ALL_STOPPED 1
1314973580 0:default mark node 2 stopped
1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STOPPED
1314973580 0:default app node leave: del 1 total 1
1314973580 0:default action for app: start default 2 3 1 2
1314973580 got client 3 start_done
1314973580 0:default send started
1314973580 0:default mark node 2 started
1314973580 0:default process_current_event 100010002 1 LEAVE_ALL_STARTED
1314973580 0:default action for app: finish default 2
1314973583 cman: node 1 removed
1314973583 add_recovery_set_cman nodeid 1
1314973591 got client 3 leave
1314973591 0:default got leave
1314973591 cpg_leave error retry
1314973592 cpg_leave error retry
1314973593 cpg_leave error retry
1314973594 cpg_leave error retry
1314973595 cpg_leave error retry
1314973596 cpg_leave error retry
1314973597 cpg_leave error retry
1314973598 cpg_leave error retry
1314973599 cpg_leave error retry
1314973600 cpg_leave error retry
1314973601 0:default cpg_leave error retrying
1314973601 cpg_leave error retry
1314973602 cpg_leave error retry
1314973603 cpg_leave error retry
1314973604 cpg_leave error retry
1314973605 cpg_leave error retry
1314973606 cpg_leave error retry
1314973607 cpg_leave error retry
1314973608 cpg_leave error retry
1314973609 cpg_leave error retry
1314973610 cpg_leave error retry
1314973611 0:default cpg_leave error retrying
1314973611 cpg_leave error retry
1314973612 cpg_leave error retry
1314973613 cpg_leave error retry
1314973614 cpg_leave error retry
1314973615 cpg_leave error retry
1314973616 cpg_leave error retry
1314973617 cpg_leave error retry
1314973618 cpg_leave error retry
1314973619 cpg_leave error retry
1314973620 cpg_leave error retry
1314973621 0:default cpg_leave error retrying
1314973621 cpg_leave error retry
1314973622 cpg_leave error retry
1314973623 cpg_leave error retry
1314973624 cpg_leave error retry
1314973625 cpg_leave error retry
1314973626 cpg_leave error retry
1314973627 cpg_leave error retry
1314973628 cpg_leave error retry
1314973629 cpg_leave error retry
1314973630 cpg_leave error retry
1314973631 0:default cpg_leave error retrying
1314973631 cpg_leave error retry
1314973632 cpg_leave error retry
1314973633 cpg_leave error retry
1314973634 cpg_leave error retry
1314973635 cpg_leave error retry
1314973636 cpg_leave error retry
1314973637 cpg_leave error retry
1314973640 0:default cpg_leave error 2
1314973640 client connection 7
1314973640 cluster is down, exiting
On Fri, Sep 2, 2011 at 8:38 AM, Robert Hayden wrote:
> Has anyone experienced the following error/hang/loop when attempting
> to stop rgmanager or cman on the last node of a two node cluster?
>
> groupd[4909]: cpg_leave error retrying
>
> Basic scenario:
> RHEL 5.7 with the latest errata for cman.
> Create a two node cluster with qdisk and higher totem token=70000
> start cman on both nodes, wait for qdisk to become online with master determined
> stop cman on node1, wait for it to complete
> stop cman on node2
> error "cpg_leave" seen in logging output
>
> Observations:
> The "service cman stop" command hangs at "Stopping fencing" output
> If I cycle openais service with "service openais restart", then the
> "service cman stop" will complete (need to manually stop the openais
> service afterwards).
> When hung, the command "group_tool dump" hangs (any group_tool command hangs).
> The hang is inconsistent which, in my mind, implies a timing issue.
> Inconsistent meaning that every once in a while, the shutdown will
> complete (maybe 20% of the time).
> I have seen the issue with the stopping of rgmanager and cman. The
> below example has been stripped down to show the hang with cman.
> I have tested with varying the length of time to wait before stopping
> the second node with no difference (hang still occurs periodically).
> I have tested with commenting out the totem token and the
> quorum_dev_poll and still experienced the hang. (We use the longer
> timeouts to help survive network and SAN blips.)
>
>
> I have dug through some of the source code. The message appears in
> group's cpg.c as function do_cpg_leave( ). This calls the cpg_leave
> function located in the openais package.
>
> If I attach to the groupd process with gdb, I get the following stack.
> Watching with strace, groupd is just in a looping state.
> (gdb) where
> #0  0x000000341409a510 in __nanosleep_nocancel () from /lib64/libc.so.6
> #1  0x000000341409a364 in sleep () from /lib64/libc.so.6
> #2  0x000000000040a410 in time ()
> #3  0x000000000040bd09 in time ()
> #4  0x000000000040e2cb in time ()
> #5  0x000000000040ebe0 in time ()
> #6  0x000000000040f394 in time ()
> #7  0x000000341401d994 in __libc_start_main () from /lib64/libc.so.6
> #8  0x00000000004018f9 in time ()
> #9  0x00007fff04a671c8 in ?? ()
> #10 0x0000000000000000 in ?? ()
>
> If I attach to the aisexec process with gdb, I see the following:
> (gdb) where
> #0  0x00000034140cb696 in poll () from /lib64/libc.so.6
> #1  0x0000000000405c50 in poll_run ()
> #2  0x0000000000418aae in main ()
>
>
> As you can see in the cluster.conf example below, I have attempted
> many different ways to create more debug logging. I do see debug
> messages from openais in the cpg.c component during startup, but
> nothing is logged on the shutdown hang scenario.
>
> I would appreciate any guidance on how to troubleshoot further,
> especially with increasing the tracing of the openais calls in cpg.c.
>
> Thanks
> Robert
>
>
> Example cluster.conf:
> [the XML tags were stripped when the message was archived; surviving
> attributes include timestamp="on" debug="on", post_fail_delay="10"
> post_join_delay="60", log_level="7" min_score="1" tko="60" votes="1",
> and two iLO fence devices (login="node1_fence"/"node2_fence",
> power_wait="10", lanplus="1")]
From rhayden.public at gmail.com Fri Sep 2 16:15:15 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Fri, 2 Sep 2011 11:15:15 -0500
Subject: [Linux-cluster] RHEL 5.7: cpg_leave error retrying
In-Reply-To:
References:
Message-ID:
I searched the openais forums and ran across two recent threads and a couple
of potential patches that sound interesting. Unfortunately, I do not have
enough experience to determine if they are related to my issue.
"[Openais] Problems forming cluster on corosync startup" at
http://marc.info/?l=openais&m=131234252917259&w=2
"[Openais] CPG client can lockup if the local node is in the downlist" at
http://marc.info/?l=openais&m=131354417212931&w=2
The above threads refer to a patch from Steven Dake at
http://marc.info/?l=openais&m=131274060602528&w=2
Thanks
Robert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From thomas at sjolshagen.net Fri Sep 2 22:34:29 2011
From: thomas at sjolshagen.net (Thomas Sjolshagen)
Date: Fri, 02 Sep 2011 18:34:29 -0400
Subject: [Linux-cluster] dlm: dev_write no op 48479213 18508
Message-ID:
I've been getting:
dlm: dev_write no op 48479213 18508
in
dmesg output after I've upgraded to the latest Fedora 15 cluster
packages.
After a while, my GFS2 file system(s) stop responding. I
can't prove a connection between the two, but was wondering if there is
any reason to believe there could be?
Packages:
cluster-glue-1.0.6-2.fc15.1.x86_64
gfs2-cluster-3.1.1-2.fc15.x86_64
cluster-glue-libs-1.0.6-2.fc15.1.x86_64
clusterlib-3.1.5-1.fc15.x86_64
cman-3.1.5-1.fc15.x86_64
kernel-2.6.40.3-0.fc15.x86_64
corosync-1.4.1-1.fc15.x86_64
corosynclib-1.4.1-1.fc15.x86_64
openaislib-1.1.4-2.fc15.x86_64
openais-1.1.4-2.fc15.x86_64
--
Read my blog(s) [1] - occasionally updated!:
Follow me on Twitter
[2]
Links:
------
[1] http://www.sjolshagen.net/
[2]
http://www.twitter.com/NotFitEnough
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fdinitto at redhat.com Sat Sep 3 05:09:11 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Sat, 03 Sep 2011 07:09:11 +0200
Subject: [Linux-cluster] dlm: dev_write no op 48479213 18508
In-Reply-To:
References:
Message-ID: <4E61B677.3080702@redhat.com>
On 09/03/2011 12:34 AM, Thomas Sjolshagen wrote:
> I've been getting:
>
> dlm: dev_write no op 48479213 18508
>
> in dmesg output after I've upgraded to the latest Fedora 15 cluster
> packages.
>
We already have a fix for this message. It is a miscommunication between
kernel and dlm_controld. My understanding is that it is harmless. (see
bz731775 for more details)
> After a while, my GFS2 file system(s) stop responding. I can't prove a
> connection between the two, but was wondering if there is any reason to
> believe there could be?
It is probably unrelated but I strongly recommend you file a bug against
gfs2-utils in fedora so that the gfs2 maintainers can look at it.
Fabio
>
> Packages:
>
> cluster-glue-1.0.6-2.fc15.1.x86_64
> gfs2-cluster-3.1.1-2.fc15.x86_64
> cluster-glue-libs-1.0.6-2.fc15.1.x86_64
> clusterlib-3.1.5-1.fc15.x86_64
> cman-3.1.5-1.fc15.x86_64
> kernel-2.6.40.3-0.fc15.x86_64
>
> corosync-1.4.1-1.fc15.x86_64
> corosynclib-1.4.1-1.fc15.x86_64
>
> openaislib-1.1.4-2.fc15.x86_64
> openais-1.1.4-2.fc15.x86_64
>
> --
>
> Read my blog(s) - occasionally updated!:
>
> Follow me on Twitter
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From rh-cluster at menole.net Tue Sep 6 09:51:11 2011
From: rh-cluster at menole.net (Michael Mende)
Date: Tue, 6 Sep 2011 11:51:11 +0200
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
<1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Message-ID: <20110906095111.GA16237@menole.dyndns.org>
Maybe Digimer's tutorial will help:
https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial
--
Mit freundlichen Grüßen,
Michael Mende
http://www.menole.net/
On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote:
> Hi,
>
> I have configured a guest in the cluster environment to restart on
> another node when the node on which the guest resides goes down.
>
> The guest has no issue when I live migrate it with the 'xm migrate'
> command, but I am having issues when trying to migrate it with
> 'clusvcadm'. The cluster is not under KVM.
>
> The error is:
>
>             Trying to migrate service:guest1 to node1...Service does not exist.
>
> Here is the configuration of the cluster:
>
> [the cluster.conf snippet was stripped when the message was archived]
>
> I also tried changing the cluster configuration, placing the guest name
> as a service, but that did not work either.
>
> Can anyone please confirm whether KVM is needed to use the 'clusvcadm'
> command? If not, what kind of modifications are needed to migrate the
> guest with clusvcadm?
>
> Thanks
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From hal at elizium.za.net Tue Sep 6 10:06:35 2011
From: hal at elizium.za.net (Hugo Lombard)
Date: Tue, 6 Sep 2011 12:06:35 +0200
Subject: [Linux-cluster] vm guest migrating through clusvcadm
In-Reply-To: <1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
<1314889038.96807.YahooMailNeo@web112810.mail.gq1.yahoo.com>
Message-ID: <20110906100635.GI3298@squishy.elizium.za.net>
On Thu, Sep 01, 2011 at 07:57:18AM -0700, Srija wrote:
>
> The error is :
>
> Trying to migrate service:guest1 to node1...Service does not exist.
>
What was the command you tried?
That 'service:guest1' looks suspect, I think it should rather be
'vm:guest1'. It should match the name of the service in the clustat
output.
As an example, we'd use:
clusvcadm -M vm:guest1 -m srv2
to migrate the virtual machine 'guest1' to the cluster node 'srv2'.
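A quick way to confirm the exact service name is to read it out of the clustat output. The sample below is made up, but the awk pattern shows the idea:

```shell
# Made-up, abridged clustat output; check your real output instead.
sample='Service Name   Owner (Last)   State
vm:guest1      node2          started'

# Extract the first vm: service name exactly as rgmanager knows it.
name=$(printf '%s\n' "$sample" | awk '$1 ~ /^vm:/ {print $1; exit}')
echo "$name"    # prints: vm:guest1
```

Against a live cluster you would pipe clustat itself into the same awk and hand the result to `clusvcadm -M "$name" -m srv2`.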
HTH
--
Hugo Lombard
From mark at thermeon.com Wed Sep 7 18:37:52 2011
From: mark at thermeon.com (Mark Olliver)
Date: Wed, 7 Sep 2011 19:37:52 +0100
Subject: [Linux-cluster] kvm shared disk space
Message-ID: <012b01cc6d8d$44c21c50$ce4654f0$@thermeon.com>
Hi,
I have two KVM guests, A and B, which live on two different hosts. Both
hosts have a separate partition which is DRBD-synced in Active/Active mode
between them; this is then mounted on each host using GFS.
I now need to allow access to the data on the shared gfs disk by the two kvm
guests but I am unsure what I need to do to do that. I have looked at the
libvirt options but do not see anything that would make sense for the config
file.
Ideally each of the guests should mount the data at /mnt/data, as it can
then be served out by both of them at the same time.
I should note that I do not need the data mounted on the hosts; I have only
done that for now to test getting GFS2 working over Active/Active DRBD. I
do, however, need locks to work correctly: the application that uses the
shared data relies on locking, so any mounting or exporting
option needs to respect that.
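As a sanity check for that locking requirement, one can probe advisory locking on the mount with the util-linux flock(1) tool. The temp file below is a stand-in for a path on the shared mount such as /mnt/data:

```shell
# Probe advisory locking: hold an exclusive lock on a file, then verify
# that a second non-blocking attempt from another process is refused. On a
# cluster file system, run the inner attempt from the other guest against
# the same shared path.
probe=$(mktemp)   # stand-in for a file on the shared mount

result=$(flock -n "$probe" -c \
    "flock -n '$probe' -c 'echo not-blocked' || echo blocked")
echo "$result"    # prints: blocked
```

If the second attempt is not refused across the two guests, the locking protocol (e.g. GFS2 via DLM vs. an NFS re-export) is not being honored end to end.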
Any help or ideas gratefully received.
Regards
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ntoughe at hotmail.com Thu Sep 8 09:06:40 2011
From: ntoughe at hotmail.com (Guy-Serge NTOUGHE)
Date: Thu, 8 Sep 2011 09:06:40 +0000 (UTC)
Subject: [Linux-cluster] Invitation to connect on LinkedIn
Message-ID: <589992105.4907891.1315472800009.JavaMail.app@ela4-app0128.prod>
I'd like to add you to my professional network on LinkedIn.
- Guy-Serge
Guy-Serge NTOUGHE
Linux Expert at Michelin TravelPartners
Paris Area, France
Confirm that you know Guy-Serge NTOUGHE:
https://www.linkedin.com/e/-odgn7o-gsbijgnb-3y/isd/4129701855/GsprLpRo/?hs=false&tok=3oiUnw96j2yAU1
--
You are receiving Invitation to Connect emails. Click to unsubscribe:
http://www.linkedin.com/e/-odgn7o-gsbijgnb-3y/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/goo/linux-cluster%40redhat%2Ecom/20061/I1425010746_1/?hs=false&tok=08LlmS2DH2yAU1
(c) 2011 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From pradhanparas at gmail.com Tue Sep 13 16:40:02 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 11:40:02 -0500
Subject: [Linux-cluster] replacing HBA
Message-ID:
Hi,
I am replacing a 2 Gbit Qlogic HBA with a 4 Gbit Qlogic HBA in my GFS2
cluster. Apart from changing the WWN in the SAN, what else do I need to
change in Linux (CentOS)? Will the change be reflected automatically?
Thanks!
Paras.
From rpeterso at redhat.com Tue Sep 13 18:04:08 2011
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 13 Sep 2011 14:04:08 -0400 (EDT)
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
Message-ID: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
----- Original Message -----
| Hi,
|
| I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
| cluster. Apart from changing wwn in the SAN, what else do I need to
| change in Linux (centos). will the change be reflected automatically?
|
|
| Thanks!
| Paras.
Hi Paras,
The GFS2 file system doesn't care what HBA you're using.
So as long as your kernel has a good device driver for that HBA
you shouldn't need to do anything else.
Regards,
Bob Peterson
Red Hat File Systems
From pradhanparas at gmail.com Tue Sep 13 19:46:26 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 14:46:26 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To: <405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Thanks Bob.
Another question: what about replacing a single-port HBA with a dual-port
one? After configuring multipathd, can I reconfigure the physical volume
without destroying the VG, LV, and CLVM? I am kind of lost
here.
Thanks
Paras.
On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
> ----- Original Message -----
> | Hi,
> |
> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
> | cluster. Apart from changing wwn in the SAN, what else do I need to
> | change in Linux (centos). will the change be reflected automatically?
> |
> |
> | Thanks!
> | Paras.
>
> Hi Paras,
>
> The GFS2 file system doesn't care what HBA you're using.
> So as long as your kernel has a good device driver for that HBA
> you shouldn't need to do anything else.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From keith.schincke at gmail.com Tue Sep 13 21:30:33 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Tue, 13 Sep 2011 16:30:33 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
How many paths do you currently have to your disk?
Does your LVM use the multipath name (mpath0)?
Sent from my iPhone
On Sep 13, 2011, at 14:46, Paras pradhan wrote:
> Thanks Bob.
>
> Another question. What about replacing single port HBA with a dual
> port. After configuring the multipathd, can I reconfigure physical
> volume without destroying the vg, lv and clvm ? I am kinddda lost
> here.
>
> Thanks
> Paras.
>
> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
>> ----- Original Message -----
>> | Hi,
>> |
>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>> | change in Linux (centos). will the change be reflected automatically?
>> |
>> |
>> | Thanks!
>> | Paras.
>>
>> Hi Paras,
>>
>> The GFS2 file system doesn't care what HBA you're using.
>> So as long as your kernel has a good device driver for that HBA
>> you shouldn't need to do anything else.
>>
>> Regards,
>>
>> Bob Peterson
>> Red Hat File Systems
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From pradhanparas at gmail.com Tue Sep 13 22:11:03 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Tue, 13 Sep 2011 17:11:03 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
wrote:
> How many paths doe you currently have to your disk?
> Does your LVM use the multipath name (mpath0)?
Right now only one path with no multipath configured so LVM is not
using mpath0. Ideas?
Thanks!
Paras.
>
> Sent from my iPhone
>
> On Sep 13, 2011, at 14:46, Paras pradhan wrote:
>
>> Thanks Bob.
>>
>> Another question. What about replacing single port HBA with a dual
>> port. After configuring the multipathd, can I reconfigure physical
>> volume without destroying the vg, lv and clvm ? I am kinddda lost
>> here.
>>
>> Thanks
>> Paras.
>>
>> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson wrote:
>>> ----- Original Message -----
>>> | Hi,
>>> |
>>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>>> | change in Linux (centos). will the change be reflected automatically?
>>> |
>>> |
>>> | Thanks!
>>> | Paras.
>>>
>>> Hi Paras,
>>>
>>> The GFS2 file system doesn't care what HBA you're using.
>>> So as long as your kernel has a good device driver for that HBA
>>> you shouldn't need to do anything else.
>>>
>>> Regards,
>>>
>>> Bob Peterson
>>> Red Hat File Systems
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From keith.schincke at gmail.com Tue Sep 13 22:50:56 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Tue, 13 Sep 2011 17:50:56 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Hmmm. The UUID of the physical volume should be written to disk (sdj) or
partition (sdj1) depending on your design.
kpartx should not care about the data on the disk (ie your UUID) when it
makes the mpathXpY entries.
Hopefully, the procedure will be:
- install your HBA and zone the SAN as necessary
- enable multipathd and restart; this should create the mpathX entries
(multipath -ll will list the paths and disks)
- run kpartx -a to add the needed mpathXpY entries (I do not know if this
runs on startup)
- reboot and see if you can mount the LVM.
If all goes right, pvdisplay should display the multipath devices of your
PVs.
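If multipathd has never been configured on these hosts, a minimal /etc/multipath.conf sketch to go with the steps above might look like this (the blacklist pattern and option values are illustrative assumptions, not taken from the thread; array-specific device sections may also be needed):

```conf
defaults {
    # produce mpathX names instead of raw WWIDs
    user_friendly_names yes
}
blacklist {
    # example: keep multipath away from the local system disk
    devnode "^sda$"
}
```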
On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan wrote:
> On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
> wrote:
> > How many paths doe you currently have to your disk?
> > Does your LVM use the multipath name (mpath0)?
>
> Right now only one path with no multipath configured so LVM is not
> using mpath0. Ideas?
>
> Thanks!
> Paras.
>
>
> >
> > Sent from my iPhone
> >
> > On Sep 13, 2011, at 14:46, Paras pradhan wrote:
> >
> >> Thanks Bob.
> >>
> >> Another question. What about replacing single port HBA with a dual
> >> port. After configuring the multipathd, can I reconfigure physical
> >> volume without destroying the vg, lv and clvm ? I am kinddda lost
> >> here.
> >>
> >> Thanks
> >> Paras.
> >>
> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson
> wrote:
> >>> ----- Original Message -----
> >>> | Hi,
> >>> |
> >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
> >>> | cluster. Apart from changing wwn in the SAN, what else do I need to
> >>> | change in Linux (centos). will the change be reflected automatically?
> >>> |
> >>> |
> >>> | Thanks!
> >>> | Paras.
> >>>
> >>> Hi Paras,
> >>>
> >>> The GFS2 file system doesn't care what HBA you're using.
> >>> So as long as your kernel has a good device driver for that HBA
> >>> you shouldn't need to do anything else.
> >>>
> >>> Regards,
> >>>
> >>> Bob Peterson
> >>> Red Hat File Systems
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>>
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From pradhanparas at gmail.com Thu Sep 15 16:50:01 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 15 Sep 2011 11:50:01 -0500
Subject: [Linux-cluster] replacing HBA
In-Reply-To:
References:
<405319760.118870.1315937048971.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID:
Thanks Keith. I will try it and let you all know how it goes.
Paras.
On Tue, Sep 13, 2011 at 5:50 PM, Keith Schincke
wrote:
> Hmmm. The UUID of the physical volume should be written to disk (sdj) or
> partition (sdj1) depending on your design.
> kpartx should not care about the data on the disk (ie your UUID) when it
> makes the mpathXpY entries.
>
> Hopefully what will happen will be
> - install your hba and zone the SAN as necessary
> - enable multipathd and restart. This should create the mpathX entries.
> multipath -ll will list the paths and disks
> - run kpartx -a to add needed mpathXpY entries. I do not know if this runs
> on startup.
> - reboot and see if you can mount the LVM.
>
> If all goes right, pvdisplay should display the multipath devices of your
> PVs.
>
>
> On Tue, Sep 13, 2011 at 5:11 PM, Paras pradhan
> wrote:
>>
>> On Tue, Sep 13, 2011 at 4:30 PM, Keith Schincke
>> wrote:
>> > How many paths doe you currently have to your disk?
>> > Does your LVM use the multipath name (mpath0)?
>>
>> Right now only one path with no multipath configured so LVM is not
>> using mpath0. Ideas?
>>
>> Thanks!
>> Paras.
>>
>>
>> >
>> > Sent from my iPhone
>> >
>> > On Sep 13, 2011, at 14:46, Paras pradhan wrote:
>> >
>> >> Thanks Bob.
>> >>
>> >> Another question. What about replacing single port HBA with a dual
>> >> port. After configuring the multipathd, can I reconfigure physical
>> >> volume without destroying the vg, lv and clvm ? I am kinddda lost
>> >> here.
>> >>
>> >> Thanks
>> >> Paras.
>> >>
>> >> On Tue, Sep 13, 2011 at 1:04 PM, Bob Peterson
>> >> wrote:
>> >>> ----- Original Message -----
>> >>> | Hi,
>> >>> |
>> >>> | I am replacing a 2 Gig Qlogic HBA with a 4 Gig Qlogic HBA in my GFS2
>> >>> | cluster. Apart from changing wwn in the SAN, what else do I need to
>> >>> | change in Linux (centos). will the change be reflected
>> >>> automatically?
>> >>> |
>> >>> |
>> >>> | Thanks!
>> >>> | Paras.
>> >>>
>> >>> Hi Paras,
>> >>>
>> >>> The GFS2 file system doesn't care what HBA you're using.
>> >>> So as long as your kernel has a good device driver for that HBA
>> >>> you shouldn't need to do anything else.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Bob Peterson
>> >>> Red Hat File Systems
>> >>>
>> >>> --
>> >>> Linux-cluster mailing list
>> >>> Linux-cluster at redhat.com
>> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
>> >>>
>> >>
>> >> --
>> >> Linux-cluster mailing list
>> >> Linux-cluster at redhat.com
>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From carlopmart at gmail.com Fri Sep 16 08:22:21 2011
From: carlopmart at gmail.com (carlopmart)
Date: Fri, 16 Sep 2011 10:22:21 +0200
Subject: [Linux-cluster] Corosync goes cpu to 95-99%
In-Reply-To: <4E2D940B.5020803@redhat.com>
References: <4DD29D03.9080901@gmail.com> <4DD2BAC3.50509@redhat.com> <4DD2BD7D.5070704@gmail.com> <4DD2CA90.6090802@redhat.com> <3B50BA7445114813AE429BEE51A2BA52@versa> <4DD78908.2030801@gmail.com> <0B1965C8-9807-42B6-9453-01BE0C0B1DCB@cybercat.ca><4DD80D5D.10004@gmail.com> <4DD873C7.8080402@cybercat.ca> <22E7D11CD5E64E338A66811F31F06238@versa> <4DE545D7.1080703@redhat.com> <4DE69786.5010204@gmail.com><4DE6CAF6.4000002@cybercat.ca> <4DE75602.1000408@gmail.com>
<51BB988BCCF547E69BF222BDAF34C4DE@versa>
<4E04B61B.9070208@cybercat.ca> <4E2D63DD.4050007@gmail.com>
<4E2D7329.6050607@redhat.com> <4E2D7425.4070801@gmail.com>
<4E2D8ECB.6020305@redhat.com> <4E2D8F87.30508@gmail.com>
<4E2D940B.5020803@redhat.com>
Message-ID: <4E73073D.8010209@gmail.com>
On 07/25/2011 06:04 PM, Steven Dake wrote:
> On 07/25/2011 08:45 AM, carlopmart wrote:
>> On 07/25/2011 05:42 PM, Steven Dake wrote:
>>>>>>> are caused by this issue.
>>>>>>>
>>>>>>> So, as a temporary work-around for this time, woule be (at your own
>>>>>>> risks) to downgrade to 2.6.32-71.29.1.el6 kernel :
>>>>>>>
>>>>>>> yum install kernel-2.6.32-71.29.1.el6.x86_64
>>>>>>>
>>>>>>> Regards,
>>>>>>
>>>>>> Hi Steven and Nicolas,
>>>>>>
>>>>>> Is this bug resolved in RHEL6.1 with all updates applied?? Do I
>>>>>> need to
>>>>>> use some specific kernel version 2.6.32-131.2.1 or 2.6.32-131.6.1?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>> the corosync portion is going through QE. The kernel portion remains
>>>>> open.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>
>>>> Thanks Steve, then, Can I use last corosync version provided with
>>>> RHEL6.1 and last RHEL6.0's kernel version without problems??
>>>>
>>>>
>>>>
>>>
>>> I recommend not mixing without a support signoff.
>>>
>>
>> Then, how can I install rhcs under rhel6.x and prevent this bug??
>>
>>
> get a support signoff. Also the corosync updates have not finished
> through our validation process. Only hot fixes (from support) are available
>
> Regards
> -steve
>
Sorry to re-open this thread, but is there any news about this problem?
--
CL Martinez
carlopmart {at} gmail {d0t} com
From ext.thales.jean-daniel.bonnetot at sncf.fr Fri Sep 16 12:54:02 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Fri, 16 Sep 2011 14:54:02 +0200
Subject: [Linux-cluster] Luci can't install packages
Message-ID:
Hello,
Usually I use manual installation, but I need to proceed through Luci. My
problem is present with RHEL 5.7 and RHEL 6.0 (luci and ricci); with
RHEL 5.6 it works correctly.
I used "Create new cluster", added my nodes (the options are not important,
the problem is always there) and submitted...
"Please wait..."
Creating node "node1" for cluster "clutest": installing packages
Creating node "node2" for cluster "clutest": installing packages
I waited ;) but nothing. My process list on nodes says :
4166 ? Ss 0:00 /usr/sbin/oddjobd -p /var/run/oddjobd.pid -t
300
22343 ? S 0:00 \_ ricci-modrpm
22355 ? S 0:01 \_ /usr/bin/python /usr/bin/yum -y list
all
4221 ? S
From chekov at ucla.edu Fri Sep 16 22:26:00 2011
From: chekov at ucla.edu (Alan Wood)
Date: Fri, 16 Sep 2011 15:26:00 -0700 (PDT)
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID:
Hi all,
I'm trying to decide whether I really need a cluster implementation to do
what I want to do and I figured I'd solicit opinions.
Essentially I want to have two machines running as virtualization hosts
with libvirt/kvm. I have shared iSCSI storage available to both hosts and
have to decide how to configure the storage for use with libvirt. Right
now I see three possibilities:
1. Setting an iSCSI storage pool in libvirt
Pros: Migration seems painless, including live migration
Cons: Need to pre-allocate LUNs on iSCSI box.
Does not seem to take advantage of iSCSI offloading or multipathing
2. Setting up a two-node cluster and running CLVM
Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
Automatic failover
Cons: Cluster infrastructure adds complexity, more potential for bugs
Possible split brain issues?
3. A single iSCSI block device with partitions for each VM mounted on both hosts
Pros: Easy migration, setup
Cons: Two hosts accessing the same block device outside of a
cluster seems like it might lead to disaster
Right now I actually like option 3 but I'm wondering if I really am asking
for trouble accessing a block device simultaneously on two hosts without a
clustering infrastructure. I did this a while back with a shared-SCSI box
and it seemed to work. I would never be accessing the same partition on
both hosts and I understand that all partitioning has to be done while the
other host is off, but is there something else I'm missing here?
Also, are people out there running option 2? Does it make sense to set up
a cluster as small as 2-nodes for HA virtualization or do I really need
more nodes for it to be worthwhile? I do have all the fencing
infrastructure I might need (PDUs and Dracs).
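For option 1, a libvirt iSCSI storage pool is defined with XML along these lines (a sketch; the portal host and target IQN below are placeholders, not values from this setup):

```xml
<pool type="iscsi">
  <name>guest_images</name>
  <source>
    <!-- placeholder portal and target IQN -->
    <host name="iscsi.example.com"/>
    <device path="iqn.2011-09.com.example:storage"/>
  </source>
  <target>
    <!-- each pre-allocated LUN shows up as a volume under this path -->
    <path>/dev/disk/by-path</path>
  </target>
</pool>
```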
any help would be appreciated. thanks
-alan
From ext.thales.jean-daniel.bonnetot at sncf.fr Mon Sep 19 08:02:41 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Mon, 19 Sep 2011 10:02:41 +0200
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID:
Hello,
I don't use KVM and libvirt, but my experience concerns clustered storage:
1. Don't know.
2. Snapshotting is supported in CLVM (since 5.7, I think).
Complexity... yes
Bugs... yes
Split brain... yes
Two nodes are sufficient for HA; just think about what happens if one node shuts down and your VMs are heavily loaded (do you need a 3rd node?).
3. No experience with this either, but it sounds like it's not the right usage.
Best regards
--
JD
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On behalf of Alan Wood
Sent: Saturday, 17 September 2011 00:26
To: linux-cluster at redhat.com
Subject: [Linux-cluster] shared disk with virsh migration
Hi all,
I'm trying to decide whether I really need a cluster implementation to do
what I want to do and I figured I'd solicit opinions.
Essentially I want to have two machines running as virtualization hosts
with libvirt/kvm. I have shared iSCSI storage available to both hosts and
have to decide how to configure the storage for use with libvirt. Right
now I see three possibilities:
1. Setting an iSCSI storage pool in libvirt
Pros: Migration seems painless, including live migration
Cons: Need to pre-allocate LUNs on iSCSI box.
Does not seem to take advantage of iSCSI offloading or multipathing
2. Setting up a two-node cluster and running CLVM
Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
Automatic failover
Cons: Cluster infrastructure adds complexity, more potential for bugs
Possible split brain issues?
3. A single iSCSI block device with partitions for each VM mounted on both hosts
Pros: Easy migration, setup
Cons: Two hosts accessing the same block device outside of a
cluster seems like it might lead to disaster
Right now I actually like option 3 but I'm wondering if I really am asking
for trouble accessing a block device simultaneously on two hosts without a
clustering infrastructure. I did this a while back with a shared-SCSI box
and it seemed to work. I would never be accessing the same partition on
both hosts and I understand that all partitioning has to be done while the
other host is off, but is there something else I'm missing here?
Also, are people out there running option 2? Does it make sesne to set up
a cluster as small as 2-nodes for HA virtualization or do I really need
more nodes for it to be worthwhile? I do have all the fencing
infrastructure I might need (PDUs and Dracs).
any help would be appreciated. thanks
-alan
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.
From carlopmart at gmail.com Mon Sep 19 09:09:40 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 19 Sep 2011 11:09:40 +0200
Subject: [Linux-cluster] Rotating apache logs when is configured as a
resource under RHCS
Message-ID: <4E7706D4.2070201@gmail.com>
Hi all,
I have configured an apache resource under cluster.conf like this:
(both nodes are RHEL6.1)
My question is: which is the best form to rotate apache logs using
logrotate configuration??
Is this a possible solution:
/var/log/httpd/*log {
missingok
notifempty
sharedscripts
delaycompress
postrotate
if [ -f /var/run/cluster/apache/apache:httpd-mirror.pid ]; then
clusvcadm -R httpd-mirror
fi
endscript
}
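The postrotate hook above reduces to a pid-file existence check before asking rgmanager to restart the service. A minimal sketch of that guard as a shell function (the pid path and service name come from the config above; the RESTART_CMD override is our addition for dry-running, not part of the original):

```shell
# Restart the clustered httpd service only if the pid file written by
# the apache resource agent exists, i.e. rgmanager is running the
# service on this node. RESTART_CMD defaults to "clusvcadm -R" and can
# be overridden for testing.
restart_if_active() {
    pidfile="$1"    # e.g. /var/run/cluster/apache/apache:httpd-mirror.pid
    service="$2"    # e.g. httpd-mirror
    if [ -f "$pidfile" ]; then
        ${RESTART_CMD:-clusvcadm -R} "$service"
    fi
}
```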
--
CL Martinez
carlopmart {at} gmail {d0t} com
From harry.sutton at hp.com Mon Sep 19 12:34:53 2011
From: harry.sutton at hp.com (Sutton, Harry (HAS GSE))
Date: Mon, 19 Sep 2011 08:34:53 -0400
Subject: [Linux-cluster] shared disk with virsh migration
In-Reply-To:
References:
Message-ID: <4E7736ED.9000607@hp.com>
I'd have to do some research to verify, but I'm guessing that iSCSI (in
option 3) would use the traditional SCSI reservation mechanism to
prevent problems associated with multiple access.
/Harry
On 09/16/2011 06:26 PM, Alan Wood wrote:
> Hi all,
>
> I'm trying to decide whether I really need a cluster implementation to do
> what I want to do and I figured I'd solicit opinions.
> Essentially I want to have two machines running as virtualization hosts
> with libvirt/kvm. I have shared iSCSI storage available to both hosts and
> have to decide how to configure the storage for use with libvirt. Right
> now I see three possibilities:
> 1. Setting an iSCSI storage pool in libvirt
> Pros: Migration seems painless, including live migration
> Cons: Need to pre-allocate LUNs on iSCSI box.
> Does not seem to take advantage of iSCSI offloading or multipathing
> 2. Setting up a two-node cluster and running CLVM
> Pros: Very flexible storage management (is snapshotting supported yet in clvm?)
> Automatic failover
> Cons: Cluster infrastructure adds complexity, more potential for bugs
> Possible split brain issues?
> 3. A single iSCSI block device with partitions for each VM mounted on both hosts
> Pros: Easy migration, setup
> Cons: Two hosts accessing the same block device outside of a
> cluster seems like it might lead to disaster
>
> Right now I actually like option 3 but I'm wondering if I really am asking
> for trouble accessing a block device simultaneously on two hosts without a
> clustering infrastructure. I did this a while back with a shared-SCSI box
> and it seemed to work. I would never be accessing the same partition on
> both hosts and I understand that all partitioning has to be done while the
> other host is off, but is there something else I'm missing here?
>
> Also, are people out there running option 2? Does it make sesne to set up
> a cluster as small as 2-nodes for HA virtualization or do I really need
> more nodes for it to be worthwhile? I do have all the fencing
> infrastructure I might need (PDUs and Dracs).
>
> any help would be appreciated. thanks
> -alan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5069 bytes
Desc: S/MIME Cryptographic Signature
URL:
From jmd_singhsaini at yahoo.com Tue Sep 20 05:40:55 2011
From: jmd_singhsaini at yahoo.com (Harvinder Singh Binder)
Date: Tue, 20 Sep 2011 11:10:55 +0530 (IST)
Subject: [Linux-cluster] Rotating apache logs when is configured as a
resource under RHCS
In-Reply-To: <4E7706D4.2070201@gmail.com>
Message-ID: <1316497255.25429.YahooMailClassic@web94809.mail.in2.yahoo.com>
How do I configure a media player in a Linux operating system?
Please tell me about the configuration procedure (commands).
Harvinder Singh S/O Baldev Raj, VPO Barwa Teh. Anandpur Sahib, Dist. Ropar, Punjab. E-Mail ID: jmd_singhsaini at yahoo.com
--- On Mon, 19/9/11, carlopmart wrote:
> From: carlopmart
> Subject: [Linux-cluster] Rotating apache logs when is configured as a resource under RHCS
> To: linux-cluster at redhat.com
> Date: Monday, 19 September, 2011, 2:09 AM
> Hi all,
>
> I have configured an apache resource under cluster.conf
> like this: (both nodes are RHEL6.1)
>
> config_file="/data/config/etc/httpd/conf/httpd-mirror.conf"
> name="httpd-mirror" server_root="/data/config/etc/httpd"
> shutdown_wait="3"/>
>
> My question is: which is the best form to rotate apache
> logs using logrotate configuration??
>
> Is this a possible solution:
>
> /var/log/httpd/*log {
> ? ? missingok
> ? ? notifempty
> ? ? sharedscripts
> ? ? delaycompress
> ? ? postrotate
> ? ? ? ? if [ -f
> /var/run/cluster/apache/apache:httpd-mirror.pid ]; then
> ? ? ? ? ? ? clusvcadm -R
> httpd-mirror
> ? ? ? ? fi
> ? ? endscript
> }
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From sdake at redhat.com Tue Sep 20 18:13:45 2011
From: sdake at redhat.com (Steven Dake)
Date: Tue, 20 Sep 2011 11:13:45 -0700
Subject: [Linux-cluster] New Corosync Mailing list - Please register for it!
Message-ID: <4E78D7D9.7060001@redhat.com>
Hi,
Over the past several years, we have been sharing a mailing list with
the openais project. I have created a new mailing list specifically for
corosync; this will be the permanent new list.
Please register at:
http://lists.corosync.org/mailman/listinfo
The list is called "discuss"
Q: Why are we making this change now?
A: Several weeks ago the Linux Foundation was hacked (see
http://www.linuxfoundation.org). They hosted our mailing list service,
and during this event the mailing list has been unusable. The Linux
Foundation staff are busy rebuilding their network, but in the interim
this seems like a good opportunity to move everything to our core
infrastructure at corosync.org.
Q: What about the archives?
A: I hope to restore the archives once I can get the records from the
Linux Foundation. There is no guarantee I can get a restored copy of the
archive, however. Fortunately, several services have archived our
mailing list over the years.
Q: What about my registration on the openais mailing list?
A: I don't have the records to transfer the registrations to the corosync
list, so you will have to sign up for the mailing list again.
Q: Is the password I used to register on the openais mailing list
compromised?
A: I do not know to what extent the systems were hacked, but I'd recommend
treating the password as compromised. If you shared this password with
other services, please change it. Mailman stores passwords in plaintext
so that it can mail them to you once a month. Always use unique
passwords on mailman mailing lists.
Regards
-steve
From laszlo at beres.me Thu Sep 22 14:57:27 2011
From: laszlo at beres.me (Laszlo Beres)
Date: Thu, 22 Sep 2011 16:57:27 +0200
Subject: [Linux-cluster] Lost connection to storage - what happens?
Message-ID:
Hi,
just a theoretical question: let's assume we have a cluster with GFS2
filesystem (not as a managed resource). What happens exactly if all
paths to backend device get lost? It's not a cluster event, so I
assume cluster operates normally, but what does GFS2/DLM do?
Regards,
--
L?szl? B?res? ? ? ? ? ? Unix system engineer
http://www.google.com/profiles/beres.laszlo
From carlopmart at gmail.com Mon Sep 26 09:18:11 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 11:18:11 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
Message-ID: <4E804353.1040605@gmail.com>
Hi all,
Due to continuous problems with corosync
(https://bugzilla.redhat.com/show_bug.cgi?id=709758,
https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
under RHEL 6.x (I have a trial subscription, which I will convert to a
permanent subscription when everything works), I would like to know when
corosync-1.4.1-3.el6 will be released for RHEL 6.1. Any ideas?
Thanks ...
--
CL Martinez
carlopmart {at} gmail {d0t} com
From ajb2 at mssl.ucl.ac.uk Mon Sep 26 10:01:09 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Mon, 26 Sep 2011 11:01:09 +0100
Subject: [Linux-cluster] Lost connection to storage - what happens?
In-Reply-To:
References:
Message-ID: <4E804D65.6090808@mssl.ucl.ac.uk>
Laszlo Beres wrote:
> Hi,
>
> just a theoretical question: let's assume we have a cluster with GFS2
> filesystem (not as a managed resource). What happens exactly if all
> paths to backend device get lost?
GFS2 withdraws that filesystem and you'll have to reboot all the
withdrawn machines to get it back, once the paths are restored.
GFS doesn't require a reboot.
Red Hat argues this is not a regression, as GFS2 is not GFS.
From jfriesse at redhat.com Mon Sep 26 10:31:41 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 12:31:41 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E804353.1040605@gmail.com>
References: <4E804353.1040605@gmail.com>
Message-ID: <4E80548D.1070904@redhat.com>
carlopmart napsal(a):
> Hi all,
>
> Due to continuous problems with corosync
> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
> under rhel6.x (I have a trial subscription, that I will convert to
> permanent subscription when all works ok), I would like to know when
> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
We are not doing rebases in Z streams, so Corosync 1.4.1 will never be
released for RHEL 6.1. It will be available in RHEL 6.2.
Regards,
Honza
>
> Thanks ...
>
From carlopmart at gmail.com Mon Sep 26 10:51:20 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 12:51:20 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E80548D.1070904@redhat.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com>
Message-ID: <4E805928.3020009@gmail.com>
On 09/26/2011 12:31 PM, Jan Friesse wrote:
> carlopmart napsal(a):
>> Hi all,
>>
>> Due to continuous problems with corosync
>> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
>> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
>> under rhel6.x (I have a trial subscription, that I will convert to
>> permanent subscription when all works ok), I would like to know when
>> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
>
> We are not doing rebases in Z streams, so Corosync 1.4.1 will be never
> released for RHEL 6.1. It will be available in RHEL 6.2.
>
> Regards,
> Honza
>
>>
But can a version that fixes these bugs be released for RHEL 6.1 before
RHEL 6.2?
--
CL Martinez
carlopmart {at} gmail {d0t} com
From jfriesse at redhat.com Mon Sep 26 11:34:22 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 13:34:22 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E805928.3020009@gmail.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com>
<4E805928.3020009@gmail.com>
Message-ID: <4E80633E.8020409@redhat.com>
carlopmart napsal(a):
> On 09/26/2011 12:31 PM, Jan Friesse wrote:
>> carlopmart napsal(a):
>>> Hi all,
>>>
>>> Due to continuous problems with corosync
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=709758,
>>> https://www.redhat.com/archives/linux-cluster/2011-July/msg00074.html)
>>> under rhel6.x (I have a trial subscription, that I will convert to
>>> permanent subscription when all works ok), I would like to know when
>>> corosync-1.4.1-3.el6, will be released for rhel6.1. Any??
>>
>> We are not doing rebases in Z streams, so Corosync 1.4.1 will be never
>> released for RHEL 6.1. It will be available in RHEL 6.2.
>>
>> Regards,
>> Honza
>>
>>>
>
> But can be released a version that solves the bugs for rhel6.1 before
> rhel6.2?
>
Please take the time to read how the RHEL release process works; briefly:
yes, it's called EUS (Z-stream), and its primary purpose is for really
severe or security bugs. To be honest, 709758 may be an annoying bug, but
it doesn't fit the Z-stream very well, especially because it can be seen
only in very special conditions/broken environments.
Regards,
Honza
From carlopmart at gmail.com Mon Sep 26 11:55:47 2011
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 26 Sep 2011 13:55:47 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E80633E.8020409@redhat.com>
References: <4E804353.1040605@gmail.com>
<4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com>
<4E80633E.8020409@redhat.com>
Message-ID: <4E806843.6060202@gmail.com>
On 09/26/2011 01:34 PM, Jan Friesse wrote:
> Please take your time to read how RHEL release process works, but
> basically and shortly. Ya, it's called EUS (Z-stream), and primary
> purpose is for really hard/security bugs. To be honest, 709758 may be
> annoying bug, but it doesn't fit to Z-stream very well, especially
> because it can be seen only in very special conditions/broken environments.
But the problem described in 709758 appears in my environment: one
RHEL 6.1 KVM host with two (and only two) single-CPU RHEL 6.1 guests
running RHCS ...
See this:
a) running top on a rhel6.1 guest:
top - 13:50:02 up 4:25, 4 users, load average: 5.91, 5.99, 6.71
Tasks: 132 total, 5 running, 127 sleeping, 0 stopped, 0 zombie
Cpu(s): 96.7%us, 3.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 1289092k total, 259524k used, 1029568k free, 24692k buffers
Swap: 1309688k total, 0k used, 1309688k free, 110376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1260 root RT 0 88572 84m 57m R 94.3 6.7 132:46.40 corosync
10475 root 19 -1 18704 1468 732 R 2.3 0.1 2:01.54 clulog
10454 root 19 -1 18704 1512 764 R 2.0 0.1 2:01.93 clulog
10654 root 20 0 5352 1688 1244 S 0.3 0.1 0:06.76 rgmanager
11681 root 20 0 2672 1132 864 S 0.3 0.1 0:03.43 top
b) trying to stop rgmanager under rhel6.1 kvm guest, never stops:
[root at rhelclunode01 tmp]# time service rgmanager stop
Stopping Cluster Service Manager:
c) running top under rhel6.1 kvm host:
top - 13:52:00 up 4:32, 1 user, load average: 1.00, 1.00, 0.93
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.4%us, 1.5%sy, 0.0%ni, 72.2%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 5088504k total, 3656212k used, 1432292k free, 57832k buffers
Swap: 5242872k total, 0k used, 5242872k free, 1240980k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2659 qemu 20 0 1526m 1.2g 3880 S 100.1 25.3 182:17.81 qemu-kvm
2445 qemu 20 0 1350m 592m 3960 S 6.0 11.9 13:55.74 qemu-kvm
2203 root 20 0 683m 15m 4904 S 3.0 0.3 7:56.55 libvirtd
2524 root 20 0 0 0 0 S 1.0 0.0 1:01.55 kvm-pit-wq
2279 qemu 20 0 852m 534m 3900 S 0.7 10.8 1:31.42 qemu-kvm
d) ps ax |grep qemu-kvm, under rhel6.1 kvm host:
2659 ? Sl 183:01 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -cpu
qemu32 -enable-kvm -m 1280 -smp 1,sockets=1,cores=1,threads=1 -name
rhelclunode01 -uuid 5f0c1503-34a0-771b-1cde-bbe257447590 -nodefconfig
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhelclunode01.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -netdev
tap,fd=21,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:17:ad:8f,bus=pci.0,addr=0x3,bootindex=1
-netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:50:56:36:59:a7,bus=pci.0,addr=0x4
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2 -vga
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Then, what could be the solution if no fix will be released until
RHEL 6.2? Disable all RHCS services and avoid installing RHCS on either
virtual or physical environments?
Thanks.
--
CL Martinez
carlopmart {at} gmail {d0t} com
From jfriesse at redhat.com Mon Sep 26 13:17:15 2011
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 26 Sep 2011 15:17:15 +0200
Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for
rhel6.x?
In-Reply-To: <4E806843.6060202@gmail.com>
References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> <4E80633E.8020409@redhat.com>
<4E806843.6060202@gmail.com>
Message-ID: <4E807B5B.5000606@redhat.com>
carlopmart napsal(a):
> On 09/26/2011 01:34 PM, Jan Friesse wrote:
>> Please take your time to read how RHEL release process works, but
>> basically and shortly. Ya, it's called EUS (Z-stream), and primary
>> purpose is for really hard/security bugs. To be honest, 709758 may be
>> annoying bug, but it doesn't fit to Z-stream very well, especially
>> because it can be seen only in very special conditions/broken
>> environments.
>
> But problem described in 709758 appears in my enviroment: One RHEL6.1
Please contact GSS (Global Support Service). They can help you to:
- Check if your configuration is valid
- Check if architecture is valid
- Give you "not yet" released package and/or hot fix
- Propose backport to Z-stream for given bug
-> Basically, everything you are (or will be) paying them for.
Thanks,
Honza
From matthew.painter at kusiri.com Mon Sep 26 15:55:11 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 16:55:11 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
Message-ID:
Hi all,
I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
switch, and therefore a fixed multicast address - 239.192.15.224 in this
case.
All the docs etc. say to add to the cluster.conf:
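[The XML snippet was scrubbed from the archive. Per the RHEL 6 cluster documentation, the documented way to pin the multicast address in cluster.conf is a multicast element inside the cman block, along these lines, using the address from this thread:]

```xml
<cman>
  <multicast addr="239.192.15.224"/>
</cman>
```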
This seems to work, and cman_tool status brings back the correct multicast
address, but it has a Quorum status of "Activity Blocked" because the cluster
nodes never join.
*However* if I manually run "cman_tool leave" and then "cman_tool join -m
239.192.15.224", the nodes can see each other.
Does anyone know if this is a known issue? I can't find any information
about it.
Thanks for all your help :)
Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rhayden.public at gmail.com Mon Sep 26 16:20:35 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Mon, 26 Sep 2011 11:20:35 -0500
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID:
You might try to add the multicast stanza inside the stanza as
well. You can also specify a specific interface.
For example,
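[Robert's example was scrubbed from the archive. On RHEL 5 the per-node form he appears to describe puts a multicast element inside each clusternode; the element placement and the interface attribute here are assumptions based on the Cluster 2 documentation, and the thread later notes this is a 5.x-only option:]

```xml
<clusternode name="node1.example.com" nodeid="1">
  <multicast addr="239.192.15.224" interface="eth0"/>
  <!-- fence configuration omitted -->
</clusternode>
```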
I have gotten this to work internally, but your environment may be
different.
Robert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From matthew.painter at kusiri.com Mon Sep 26 16:40:21 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 17:40:21 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID:
Hi Robert,
Thanks for your suggestion. I had tried this, and it gave an error when
starting cman due to incorrect configuration. It turns out it is a 5.x option,
not needed for 6.x, which works out the interface based on the cluster
IP address.
Thanks anyway :)
You might try to add the multicast stanza inside the stanza as
well. You can also specify a specific interface.
For example,
I have gotten this to work internally, but your environment may be
different.
Robert
On Mon, Sep 26, 2011 at 4:55 PM, Matthew Painter wrote:
> Hi all,
>
> I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> switch, and therefore a fixed multicast address - 239.192.15.224 in this
> case.
>
> All the docs etc. say to add to the cluster.conf:
>
>
>
>
>
> This seems to work, and cman_tool status brings back the correct multicast
> address, but it has a Quorum status of "Activity Blocked" because the cluster
> nodes never join.
>
> *However* if I manually run "cman_tool leave" and then "cman_tool join -m
> 239.192.15.224", the nodes can see each other.
>
> Does anyone know if this is a known issue? I can't find any
> information about it.
>
> Thanks for all your help :)
>
> Matt
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fdinitto at redhat.com Mon Sep 26 17:53:36 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 26 Sep 2011 19:53:36 +0200
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID: <4E80BC20.50507@redhat.com>
On 09/26/2011 06:20 PM, Robert Hayden wrote:
> You might try to add the multicast stanza inside the
> stanza as well. You can also specify a specific interface.
>
> For example,
>
>
>
>
>
>
>
>
>
> I have gotten this to work internally, but your environment may be
> different.
this definitely does not work in RHEL6.1.
multicast is never parsed in that config section.
Fabio
From fdinitto at redhat.com Mon Sep 26 17:55:00 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 26 Sep 2011 19:55:00 +0200
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To:
References:
Message-ID: <4E80BC74.9090601@redhat.com>
For all RHEL related problems you need to contact GSS.
You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345
to track your issue.
Please provide the requested info.
Fabio
On 09/26/2011 05:55 PM, Matthew Painter wrote:
> Hi all,
>
> I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> switch, and therefore a fixed multicast address - 239.192.15.224 in this
> case.
>
> All the docs etc. say to add to the cluster.conf:
>
>
>
>
>
> This seems to work, and cman_tool status brings back the correct
> multicast address, but it has a Quorum status of "Activity Blocked"
> because the cluster nodes never join.
>
> *However* if I manually run "cman_tool leave" and then "cman_tool join
> -m 239.192.15.224", the nodes can see each other.
>
> Does anyone know if this is a known issue? I can't find any
> information about it.
>
> Thanks for all your help :)
>
> Matt
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From matthew.painter at kusiri.com Mon Sep 26 17:59:46 2011
From: matthew.painter at kusiri.com (Matthew Painter)
Date: Mon, 26 Sep 2011 18:59:46 +0100
Subject: [Linux-cluster] Manual multicasting address for CMAN bug
In-Reply-To: <4E80BC74.9090601@redhat.com>
References:
<4E80BC74.9090601@redhat.com>
Message-ID:
Indeed, I also opened a bug.
The issue is a dupe of a known issue - I have updated the bug accordingly.
Thank you, Fabio, for helping me find a workaround by setting the TTL
manually :)
Matt
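[The exact workaround is not shown in the archive. For reference, a multicast TTL can be set manually in the totem interface section of corosync.conf; the ttl option exists in corosync 1.3 and later, though whether and how cman exposes it from cluster.conf on RHEL 6.1 is not confirmed here. Addresses below are illustrative:]

```
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # example ring network, adjust to yours
        mcastaddr: 239.192.15.224
        mcastport: 5405
        ttl: 2                     # hop limit for multicast packets
    }
}
```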
On Mon, Sep 26, 2011 at 6:55 PM, Fabio M. Di Nitto wrote:
> For all RHEL related problems you need to contact GSS.
>
> You also filed https://bugzilla.redhat.com/show_bug.cgi?id=741345
>
> to track your issue.
>
> Please provide the requested info.
>
> Fabio
>
> On 09/26/2011 05:55 PM, Matthew Painter wrote:
> > Hi all,
> >
> > I have been trying to set up a cluster of 3 on Red Hat 6.1 using a cisco
> > switch, and therefore a fixed multicast address - 239.192.15.224 in this
> > case.
> >
> > All the docs etc. say to add to the cluster.conf:
> >
> >
> >
> >
> >
> > This seems to work, and cman_tool status brings back the correct
> > multicast address, but it has a Quorum status of "Activity Blocked"
> > because the cluster nodes never join.
> >
> > *However* if I manually run "cman_tool leave" and then "cman_tool join
> > -m 239.192.15.224", the nodes can see each other.
> >
> > Does anyone know if this is a known issue? I can't find any
> > information about it.
> >
> > Thanks for all your help :)
> >
> > Matt
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Jeremy.Lyon at us.ibm.com Mon Sep 26 18:56:24 2011
From: Jeremy.Lyon at us.ibm.com (Jeremy Lyon)
Date: Mon, 26 Sep 2011 12:56:24 -0600
Subject: [Linux-cluster] display and release gfs locks
Message-ID:
Hi,
We have an 8 node cluster running SASgrid. We have the core components of
SAS under RHCS (rgmanager) control, but there are user/client jobs that are
initiated manually and by cron outside of RHCS. We have run into an issue a
few times where, when the gfs init script is called to unmount all the file
systems and kills off all the processes using the gfs file systems, gfs on
the other nodes locks up and hangs. The node leaving the cluster via a
reboot appears to have left cleanly (cman_tool services doesn't show any
*WAIT* states), but everything is hung and requires a complete reboot of
the cluster to get things going. We are wondering if the gfs init script,
which uses fuser to try to kill gracefully but then uses a -9, could be
issuing the -9 and thus leaving locks in DLM that cause this issue.
Is this possible? I would think that if a node has properly/cleanly left
the cluster, locks that were held by that node would be released. Is there
a way to display locks that may be still existing for that node that is
down? And lastly, is there a way to force the release of those locks with
out the reboot of the cluster? I've been searching the linux-cluster
archives with little success.
RHEL 5.6
cman-2.0.115-68.el5_6.3
gfs-utils-0.1.20-8.el5
kmod-gfs-0.1.34-12.el5
Thanks
Jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kkovachev at varna.net Tue Sep 27 07:41:17 2011
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Tue, 27 Sep 2011 10:41:17 +0300
Subject: [Linux-cluster] display and release gfs locks
In-Reply-To:
References:
Message-ID:
Hi,
> Is this possible? I would think that if a node has properly/cleanly left
> the cluster, locks that were held by that node would be released. Is
there
> a way to display locks that may be still existing for that node that is
> down? And lastly, is there a way to force the release of those locks
with
> out the reboot of the cluster? I've been searching the linux-cluster
> archives with little success.
The best thing is to fix the initial problem, but as a workaround you may
try to fence_node from one of the other machines in the cluster even if it
has left cleanly - this should clean up the locks held by that node.
To see the locks, you may use "gfs(2)_tool lockdump" or look at the DLM
via debugfs by mounting it somewhere.
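[A sketch of the two suggestions as commands; the node name, mount points, and lockspace name are placeholders, and gfs2_tool's lockdump availability varies by release:]

```
# From a surviving node, fence the departed node even though it left
# cleanly; this should make DLM discard locks still attributed to it.
fence_node node3.example.com

# Dump lock state for a mounted GFS/GFS2 file system:
gfs_tool lockdump /mnt/shared       # GFS
gfs2_tool lockdump /mnt/shared2     # GFS2

# Or inspect the DLM lockspaces via debugfs:
mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/dlm/
cat /sys/kernel/debug/dlm/<lockspace>_locks
```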
From fdinitto at redhat.com Tue Sep 27 09:39:03 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 27 Sep 2011 11:39:03 +0200
Subject: [Linux-cluster] cluster 3.1.7 release
Message-ID: <4E8199B7.20608@redhat.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Welcome to the cluster 3.1.7 release.
This release addresses several bugs and especially a serious problem
introduced in the 3.1.6 release. If you are currently running 3.1.6,
it is highly recommended to upgrade to 3.1.7 as soon as possible.
The new source tarball can be downloaded here:
https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.7.tar.xz
ChangeLog:
https://fedorahosted.org/releases/c/l/cluster/Changelog-3.1.7
To report bugs or issues:
https://bugzilla.redhat.com/
Would you like to meet the cluster team or members of its community?
Join us on IRC (irc.freenode.net #linux-cluster) and share your
experience with other sysadministrators or power users.
Thanks/congratulations to all people that contributed to achieve this
great milestone.
Happy clustering,
Fabio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBCAAGBQJOgZm2AAoJEAgUGcMLQ3qJSMgP+wZN2YUyTLSmD7AK/EgeJPxf
q00EHAa7r0gReiSqwEkuGTTNNxwEkmEUoVlGUR2+Hu9jx6aYjPs+Z+KoCCrzjUGh
y4iSxcje1F2tjLwtswlNbL6itjglwfEHpskcyBRW2DiVDNX3zyUa4E1BE2zfnkOW
1PmxNnMJPQ+N0JDS9+RGho5qNvM+dll/paupl5kH76HY11j3vSY+1ugX5xhnxA4V
FAHxHw3lx7y5/ihqVK1OMBg7lIRzduo82eGJGy62p0VWm2+8VKX8z8YkfgBYfLj4
lWfsk8VHGiajGhA/5bBNphKwQY34NdmsOWJ4X5ksUFiDGJLZ+H400janmiMaheR2
m5T5Hs6ouOGoBIQm5jQxiA9JbeEyzZkl4crpjwQiRJLXJt4t0FHpwrzRIrCUTuPy
7LmIi3WJv2Q4EwDoRRhdOC/9j8WqAMrBoSq72P1b/hHZnRBkDh9X0z/w9tjNvF8C
RnfB6QBxEKnT27qkRyspLwfRx8DQXEGnjJbK6uDYu+m5Et5YJllDmvNKDe/BOjzt
nVw8egqgXKT0fumEFGxfwjmYVeWSpIazEAu5JyoKVddWiWKO2jUj8efgCkrAbZBh
CBKBoCQAVJjTGNsKL6a6xXYFHVjMhE5hsYH1/pT3rx+OiNOT6zQMF+r6MjOa/vyV
MrAP3GokgFOehsCMJhx4
=eiKh
-----END PGP SIGNATURE-----
From rsajnove at cisco.com Tue Sep 27 21:29:54 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 17:29:54 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
Message-ID:
Hello,
I'm in the process of designing a replacement for a Veritas
implementation and have to find similar functionality; I'm not
sure if this is doable in Red Hat Cluster:
We have a distributed application that runs on several servers
simultaneously, and that application must run in a cluster environment.
The summary is as follows:
1. The application has two different roles for the servers: one we could
call "Central Server" and the others "Collectors".
2. The application has one Central Server and X Collector Servers.
3. Central Server + Collector Servers represent a set of servers
that must be running at all times, and we want to implement
two sets in order to implement failover between them.
4. First issue I have:
The application is installed on all servers at the same location,
let us say "/opt/app", and I want to monitor it on all of them (i.e.:
different, separate, independent instances on separate
servers).
In Veritas we had "fscentral" and "fscollector", both with the
same device name and mount point, and that worked fine
(of course, both resources were part of different service
groups and running on different servers).
I tried to do the same here and got an error:
>>> clurgmgrd[9374]: Unique attribute collision. type=fs attr=mountpoint
>>> value=/opt
>>> clurgmgrd[9374]: Error storing fs resource
>>>
>> Then, I assume there should be a different way to implement this resource? Notice
>> that the number of Collectors is variable, so I
>> can't say "collector 1 will be mounted as /opt1" or "collector 1 will have
>> volume name vol1".
>>
> 5. Second issue I have:
>
> How can I run the "service" "app collector" on more than one server
> simultaneously (in parallel)?
> Again, the option to have "X" services for "X" Collectors is not a
> real option here.
>
Any idea will be appreciated!!!
>
> Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From linux at alteeve.com Tue Sep 27 23:18:15 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 16:18:15 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E8259B7.6090204@alteeve.com>
On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote:
>
> Hello,
>
> I'm in the process of designing a replacement for a Veritas
> implementation and have to find similar functionality; not
> sure if this is doable in Red Hat Cluster:
>
> We have a distributed application that runs on several servers
> simultaneously and that application must run in a cluster environment.
> The summary is as follows:
>
> 1. The application has two different roles for the servers, one we
> could call "Central Server" and the others "Collectors".
> 2. The application has _one_ Central Server and _X_ Collector Servers.
> 3. Central Server + Collector Servers represent a set of
> servers that must be running at all times, and we want to implement
> two sets in order to implement failover between them.
> 4. _First issue I have_:
> The application is installed on _all servers_ at the same
> location, let us say "/opt/app", and I want to monitor it on all of them (i.e.:
> different, separate, independent instances on separate
> servers).
> In Veritas we had "fscentral" and "fscollector", both
> with the same device name and mount point, and that worked fine
> (of course, both resources were part of different
> service groups and running on different servers).
> I tried to do the same here and got an error:
>
>
> clurgmgrd[9374]: Unique attribute collision. type=fs
> attr=mountpoint value=/opt
> clurgmgrd[9374]: Error storing fs resource
>
> Then, I assume there should be a different way to implement this
> resource? Notice that the number of Collectors is variable, so I
> can't say "collector 1 will be mounted as /opt1" or "collector
> 1 will have volume name vol1".
>
> 5. Second issue I have:
>
> How can I run the "service" "app collector" on more than one
> server simultaneously (in parallel)?
> Again, the option to have "X" services for "X" Collectors is
> not a real option here.
>
> Any idea will be appreciated!!!
>
>
> Thanks
I've not read this carefully (at work, sorry), but if I grasped your
question:
For services you want to run on all servers:
- Define a unique failover domain containing each node that will run the
parallel services.
- Create the service multiple times, each using the failover domain
containing the single target node.
For services that run on one node but move on failure, create another
failover domain (ordered, if you want to set preferences) with the
candidate nodes as members. Then create a service and assign it to this
domain.
Please provide your cluster.conf (or as much as you've crafted so far),
and obfuscate only passwords, if possible.
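[A minimal cluster.conf sketch of that layout; the node, domain, and service names are hypothetical:]

```xml
<rm>
  <failoverdomains>
    <!-- one restricted single-node domain per node that must always run a copy -->
    <failoverdomain name="only_node1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
    </failoverdomain>
    <failoverdomain name="only_node2" restricted="1">
      <failoverdomainnode name="node2" priority="1"/>
    </failoverdomain>
    <!-- ordered domain for the service that should relocate on failure -->
    <failoverdomain name="prefer_node1" ordered="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="collector_node1" domain="only_node1" autostart="1"/>
  <service name="collector_node2" domain="only_node2" autostart="1"/>
  <service name="central" domain="prefer_node1" autostart="1" recovery="relocate"/>
</rm>
```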
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From linux at alteeve.com Tue Sep 27 23:25:14 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 16:25:14 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E8259B7.6090204@alteeve.com>
References: <4E8259B7.6090204@alteeve.com>
Message-ID: <4E825B5A.3030206@alteeve.com>
Forgot to include an example;
This link shows RGManager/cluster.conf configured with two single-node
failoverdomains (for managing the storage services that need to be running
on both nodes in a 2-node cluster) and two failoverdomains used for a
service that can migrate (a VM, specifically). It will hopefully be
useful as a template for what you are trying to do.
https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_Failover_Domains
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From rsajnove at cisco.com Wed Sep 28 00:04:53 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 20:04:53 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E825B5A.3030206@alteeve.com>
Message-ID:
Good example, thanks.
Not sure if it is doable, because we could have 10 servers and the idea of
having 10 service instances could be tricky to admin :(
What about the other question, related to using the same device names and
mount points?
--
Sent from my PDP-11
On 27-Sep-2011 7:25 PM, "Digimer" wrote:
> Forgot to include an example;
>
> This link shows RGManager/cluster.conf configured with two single-node
> failoverdomains (for managing the storage services needed to be running
> on both nodes in a 2-node cluster) and two failoverdomains used for a
> service that can migrate (a VM, specifially). It will hopefully be
> useful as a template for what you are trying to do.
>
> https://alteeve.com/w/Red_Hat_Cluster_Service_2_Tutorial#Creating_the_Ordered_Failover_Domains
On 27-Sep-2011 7:18 PM, "Digimer" wrote:
> On 09/27/2011 02:29 PM, Ruben Sajnovetzky wrote:
>>
>> Hello,
>>
>> I'm in the process of designing a replacement for a Veritas
>> implementation and have to find similar functionality; not
>> sure if this is doable in Red Hat Cluster:
>>
>> We have a distributed application that runs on several servers
>> simultaneously and that application must run in a cluster environment.
>> The summary is as follows:
>>
>> 1. The application has two different roles for the servers, one we
>> could call "Central Server" and the others "Collectors".
>> 2. The application has _one_ Central Server and _X_ Collector Servers.
>> 3. Central Server + Collector Servers represent a set of
>> servers that must be running at all times, and we want to implement
>> two sets in order to implement failover between them.
>> 4. _First issue I have_:
>> The application is installed on _all servers_ at the same
>> location, let us say "/opt/app", and I want to monitor it on all of them (i.e.:
>> different, separate, independent instances on separate
>> servers).
>> In Veritas we had "fscentral" and "fscollector", both
>> with the same device name and mount point, and that worked fine
>> (of course, both resources were part of different
>> service groups and running on different servers).
>> I tried to do the same here and got an error:
>>
>>
>> clurgmgrd[9374]: Unique attribute collision. type=fs
>> attr=mountpoint value=/opt
>> clurgmgrd[9374]: Error storing fs resource
>>
>> Then, I assume there should be a different way to implement this
>> resource? Notice that the number of Collectors is variable, so I
>> can't say "collector 1 will be mounted as /opt1" or "collector
>> 1 will have volume name vol1".
>>
>> 5. Second issue I have:
>>
>> How can I run the "service" "app collector" on more than one
>> server simultaneously (in parallel)?
>> Again, the option to have "X" services for "X" Collectors is
>> not a real option here.
>>
>> Any idea will be appreciated!!!
>>
>>
>> Thanks
>
> I've not read this carefully (at work, sorry), but if I grasped your
> question:
>
> For services you want to run on all servers:
> - Define a unique failover domain containing each node that will run the
> parallel services.
> - Create the service multiple times, each using the failover domain
> containing the single target node.
>
> For services that run on one node but move on failure, create another
> failover domain (ordered, if you want to set preferences) with the
> candidate nodes as members. Then create a service and assign it to this
> domain.
>
> Please provide your cluster.conf (or as much as you've crafted so far),
> and obfuscate only passwords, if possible.
>
> --
> Digimer
> E-Mail: digimer at alteeve.com
> Freenode handle: digimer
> Papers and Projects: http://alteeve.com
> Node Assassin: http://nodeassassin.org
> "At what point did we forget that the Space Shuttle was, essentially,
> a program that strapped human beings to an explosion and tried to stab
> through the sky with fire and math?"
From linux at alteeve.com Wed Sep 28 00:19:19 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 17:19:19 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E826807.5030408@alteeve.com>
On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote:
>
> Good example, thanks.
> Not sure if is doable because we could have 10 servers and the idea to have
> 10 service instances could be tricky to admin :(
Oh? How so? The file would be a bit long, but even with ten definitions
it should still be manageable. Particularly so if you use a tool like luci.
> What about the other q, related with the usage of same name of devices and
> mounting points?
I didn't follow that question. Rather, that sounds like a much bigger
question...
If '/opt/app' is local to each node, containing separate installs of the
application, it should be fine. However, I expect this is not the case,
or you'd not be asking.
If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount,
GFS2 partition, etc) then it should still be fine. Look again at that
link and search for '/xen_shared'. That is a common chunk of space
(using clvmd and gfs2) which is un/mounted by the cluster and it is
mounted in the same place on all nodes (and uses the same LV device name).
If I am not answering your question, please ask again. :)
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From rsajnove at cisco.com Wed Sep 28 00:33:23 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Tue, 27 Sep 2011 20:33:23 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826807.5030408@alteeve.com>
Message-ID:
I might be doing something wrong, because you say "you are fine" but it
didn't work :(
All servers have "/opt/app" mounted on the same internal disk partition.
They are not shared; it is just that all have identical layout.
I tried to create:
Resource name: Central_FS
Device: /dev/mapper/VolGroup00-optvol
FS Type: ext3
Mount point: /opt
And
Resource name: Collector_FS
Device: /dev/mapper/VolGroup00-optvol
FS Type: ext3
Mount point: /opt
When I tried to save it I found in the /var/log/messages:
clurgmgrd[4174]: Reconfiguring
clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
value=/opt
clurgmgrd[4174]: Error storing fs resource
Thanks for your help and ideas!
On 27-Sep-2011 8:19 PM, "Digimer" wrote:
> On 09/27/2011 05:04 PM, Ruben Sajnovetzky wrote:
>>
>> Good example, thanks.
>> Not sure if is doable because we could have 10 servers and the idea to have
>> 10 service instances could be tricky to admin :(
>
> Oh? How so? The file would be a bit long, but even with ten definitions
> it should still be manageable. Particularly so if you use a tool like luci.
>
>> What about the other q, related with the usage of same name of devices and
>> mounting points?
>
> I didn't follow that question. Rather, that sounds like a much bigger
> question...
>
> If '/opt/app' is local to each node, containing separate installs of the
> application, it should be fine. However, I expect this is not the case,
> or you'd not be asking.
>
> If, on the other hand, '/opt/app' is a shared storage (ie: an NFS mount,
> GFS2 partition, etc) then it should still be fine. Look again at that
> link and search for '/xen_shared'. That is a common chunk of space
> (using clvmd and gfs2) which is un/mounted by the cluster and it is
> mounted in the same place on all nodes (and uses the same LV device name).
>
> If I am not answering your question, please ask again. :)
From linux at alteeve.com Wed Sep 28 00:45:34 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 27 Sep 2011 17:45:34 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E826E2E.5000507@alteeve.com>
On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>
> I might be doing something wrong, because you say "you are fine" but didn't
> work :(
>
> All servers have "/opt/app" mounted in same internal disk partition.
> They are not shared, it is just that all have identical layout.
> I tried to create:
>
> Resource name: Central_FS
> Device: /dev/mapper/VolGroup00-optvol
> FS Type: ext3
> Mount point: /opt
>
> And
>
> Resource name: Collector_FS
> Device: /dev/mapper/VolGroup00-optvol
> FS Type: ext3
> Mount point: /opt
>
> When I tried to save it I found in the /var/log/messages:
>
> clurgmgrd[4174]: Reconfiguring
> clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
> value=/opt
> clurgmgrd[4174]: Error storing fs resource
>
> Thanks for your help and ideas!
Please post your cluster.conf file (and obfuscate only passwords,
please). Also post a sample /etc/fstab and the outputs of 'pvscan',
'vgscan' and 'lvscan'.
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From amit.jathar at alepo.com Wed Sep 28 11:47:59 2011
From: amit.jathar at alepo.com (Amit Jathar)
Date: Wed, 28 Sep 2011 11:47:59 +0000
Subject: [Linux-cluster] corosync crashes after firing crm configuration
command on any one node
Message-ID:
Hi,
I am facing a weird issue in the corosync behavior.
I have configured a two-node cluster.
The cluster is working fine, and the crm_mon command is showing proper output.
The command cibadmin -Q is also working properly on both nodes.
The issue starts when I run any crm configuration command.
When I run a crm configuration command, I see the following output:
[root at AAA02 corosync]# crm configure property no-quorum-policy=ignore
Could not connect to the CIB: Remote node did not respond
ERROR: creating tmp shadow __crmshell.12274 failed
[root at AAA02 corosync]#
At the same time, the logs in /var/log/messages say:
Sep 28 13:38:40 localhost cibadmin: [12295]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost cibadmin: [12296]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost crm_shadow: [12298]: info: Invoked: crm_shadow -c __crmshell.12274
I have attached a file which has cib.xml & corosync.conf file contents on both the nodes .
Please guide me to troubleshoot this error.
Thanks in advance.
Thanks,
Amit
________________________________
This email (message and any attachment) is confidential and may be privileged. If you are not certain that you are the intended recipient, please notify the sender immediately by replying to this message, and delete all copies of this message and attachments. Any other use of this email by you is prohibited.
________________________________
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cib_xml_corosync_conf.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: logs_on_node.txt
URL:
From raju.rajsand at gmail.com Wed Sep 28 12:49:50 2011
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Wed, 28 Sep 2011 18:19:50 +0530
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References: <4E826807.5030408@alteeve.com>
Message-ID:
Greetings,
On Wed, Sep 28, 2011 at 6:03 AM, Ruben Sajnovetzky wrote:
>
>     FS Type: ext3
Shouldn't it be GFS/GFS2?
--
Regards,
Rajagopal
From rhayden.public at gmail.com Wed Sep 28 12:52:52 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Wed, 28 Sep 2011 07:52:52 -0500
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826E2E.5000507@alteeve.com>
References: <4E826E2E.5000507@alteeve.com>
Message-ID:
> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
> >
> > I might be doing something wrong, because you say "you are fine" but
> didn't
> > work :(
> >
> > All servers have "/opt/app" mounted in same internal disk partition.
> > They are not shared, it is just that all have identical layout.
> > I tried to create:
> >
> > Resource name: Central_FS
> > Device: /dev/mapper/VolGroup00-optvol
> > FS Type: ext3
> > Mount point: /opt
> >
> > And
> >
> > Resource name: Collector_FS
> > Device: /dev/mapper/VolGroup00-optvol
> > FS Type: ext3
> > Mount point: /opt
> >
>
My suggestion here is theoretical and not tested... I think you want to
have a single "resource" with different service names. For example:
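[Robert's example was scrubbed from the archive; a sketch of the idea, defining the fs resource once and referencing it from two services. The names are hypothetical, and as Robert says, this remains untested:]

```xml
<rm>
  <resources>
    <fs name="App_FS" device="/dev/mapper/VolGroup00-optvol"
        fstype="ext3" mountpoint="/opt"/>
  </resources>
  <service name="central" autostart="1">
    <fs ref="App_FS"/>
  </service>
  <service name="collector" autostart="1">
    <fs ref="App_FS"/>
  </service>
</rm>
```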
> > When I tried to save it I found in the /var/log/messages:
> >
> > clurgmgrd[4174]: Reconfiguring
> > clurgmgrd[4174]: Unique attribute collision. type=fs
> attr=mountpoint
> > value=/opt
> > clurgmgrd[4174]: Error storing fs resource
> >
> > Thanks for your help and ideas!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rsajnove at cisco.com Wed Sep 28 13:09:13 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Wed, 28 Sep 2011 09:09:13 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
Message-ID:
This approach didn't work either :(
The first server started the service; the second couldn't start.
On 28-Sep-2011 8:52 AM, "Robert Hayden" wrote:
>
>> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>>> >
>>> > I might be doing something wrong, because you say "you are fine" but
>>> didn't
>>> > work :(
>>> >
>>> > All servers have "/opt/app" mounted in same internal disk partition.
>>> > They are not shared, it is just that all have identical layout.
>>> > I tried to create:
>>> >
>>> >     Resource name: Central_FS
>>> >     Device: /dev/mapper/VolGroup00-optvol
>>> >     FS Type: ext3
>>> >     Mount point: /opt
>>> >
>>> > And
>>> >
>>> >     Resource name: Collector_FS
>>> >     Device: /dev/mapper/VolGroup00-optvol
>>> >     FS Type: ext3
>>> >     Mount point: /opt
>>> >
>
> My suggestion here is theoretical and not tested... I think you want to have
> a single "resource" with different service names. For example,
>
> [example cluster.conf snippet scrubbed by the archive]
>
>>> > When I tried to save it I found in the /var/log/messages:
>>> >
>>> > clurgmgrd[4174]: Reconfiguring
>>> > clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
>>> > value=/opt
>>> > clurgmgrd[4174]: Error storing fs resource
>>> >
>>> > Thanks for your help and ideas!
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rsajnove at cisco.com Wed Sep 28 13:20:39 2011
From: rsajnove at cisco.com (Ruben Sajnovetzky)
Date: Wed, 28 Sep 2011 09:20:39 -0400
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To: <4E826E2E.5000507@alteeve.com>
Message-ID:
Here is the cluster.conf (didn't get access to run the other commands yet):
On 27-Sep-2011 8:45 PM, "Digimer" wrote:
> On 09/27/2011 05:33 PM, Ruben Sajnovetzky wrote:
>>
>> I might be doing something wrong, because you say "you are fine" but didn't
>> work :(
>>
>> All servers have "/opt/app" mounted in same internal disk partition.
>> They are not shared, it is just that all have identical layout.
>> I tried to create:
>>
>> Resource name: Central_FS
>> Device: /dev/mapper/VolGroup00-optvol
>> FS Type: ext3
>> Mount point: /opt
>>
>> And
>>
>> Resource name: Collector_FS
>> Device: /dev/mapper/VolGroup00-optvol
>> FS Type: ext3
>> Mount point: /opt
>>
>> When I tried to save it I found in the /var/log/messages:
>>
>> clurgmgrd[4174]: Reconfiguring
>> clurgmgrd[4174]: Unique attribute collision. type=fs attr=mountpoint
>> value=/opt
>> clurgmgrd[4174]: Error storing fs resource
>>
>> Thanks for your help and ideas!
>
> Please post your cluster.conf file (and obfuscate only passwords,
> please). Also post a sample /etc/fstab and the outputs of 'pvscan',
> 'vgscan' and 'lvscan'.
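The requested outputs can be gathered in one pass; a minimal sketch (the report filename is arbitrary, and the LVM scans need root):

```shell
#!/bin/sh
# Collect the files and LVM scans requested above into one report.
# Run as root on each cluster node; a missing command is recorded
# in the report rather than aborting the script.
{
  echo "== /etc/cluster/cluster.conf =="; cat /etc/cluster/cluster.conf 2>&1
  echo "== /etc/fstab ==";               cat /etc/fstab 2>&1
  echo "== pvscan ==";                   pvscan 2>&1
  echo "== vgscan ==";                   vgscan 2>&1
  echo "== lvscan ==";                   lvscan 2>&1
} > cluster-diag.txt
echo "wrote cluster-diag.txt"
```

Remember to obfuscate only passwords in the collected cluster.conf before posting.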
From ext.thales.jean-daniel.bonnetot at sncf.fr Wed Sep 28 15:58:02 2011
From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES))
Date: Wed, 28 Sep 2011 17:58:02 +0200
Subject: [Linux-cluster] (no subject)
Message-ID:
Hi,
I have a problem with a two-node cluster. When I force a node to fail, the
second node fences the first one. When the first one rejoins my cluster, cman
shuts down on both nodes, saying:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node
s64lmwbig3b because it has rejoined the cluster with existing state
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1
because we rejoined the cluster without a full restart
Logs :
See attached
Conf :
Do you know what I missed?
Thanks
Regards,
Jean-Daniel BONNETOT
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 667 bytes
Desc: image001.jpg
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cluster_log.txt
URL:
-------------- next part --------------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.
From linux at alteeve.com Wed Sep 28 16:44:59 2011
From: linux at alteeve.com (Digimer)
Date: Wed, 28 Sep 2011 09:44:59 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E834F0B.5020702@alteeve.com>
On 09/28/2011 06:09 AM, Ruben Sajnovetzky wrote:
> This approach didn't work either :(
> First server started service the second couldn't start
You only shared a small snippet of your cluster.conf, and none of the
other requested info. I can't tell what is actually missing versus omitted.
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
From linux at alteeve.com Wed Sep 28 16:50:57 2011
From: linux at alteeve.com (Digimer)
Date: Wed, 28 Sep 2011 09:50:57 -0700
Subject: [Linux-cluster] How to run same service in parallel in RedHat
Cluster 5.0
In-Reply-To:
References:
Message-ID: <4E835071.6080506@alteeve.com>
On 09/28/2011 06:20 AM, Ruben Sajnovetzky wrote:
> [cluster.conf scrubbed by the archive; only attribute fragments survive:
> post_join_delay="30", two failover domains with ordered="0" restricted="1",
> and member nodes with priority="1"]