From hitesh1907nayyar at gmail.com Mon Oct 1 03:16:31 2012 From: hitesh1907nayyar at gmail.com (hitesh nayyar) Date: Mon, 1 Oct 2012 08:46:31 +0530 Subject: [Linux-cluster] linux-cluster Message-ID: Hi, Hi, I am facing issuing in setting up Linux cluster. Here is the issue that i am facing. I have 2 Linux desktop and have following ip's and name: hitesh12-192.168.1.23 saanvi12-192.168.1.30 i enabled ricci service and have setup passwod as well.Enabled luci service as well. When cluster using GUI by activating luci GUI i see error logs in my /var/log/messages from hitesh12 -192.168.1.23 Sep 30 22:31:57 localhost dlm_controld[2945]: dlm_controld 3.0.12 started Sep 30 22:32:18 localhost gfs_controld[3010]: gfs_controld 3.0.12 started Sep 30 22:33:39 localhost kernel: dlm: Using TCP for communications Sep 30 22:33:41 localhost fenced[2930]: fencing node saanvi12 Sep 30 22:33:44 localhost fenced[2930]: fence saanvi12 dev 0.0 agent none result: error no method *Sep 30 22:33:44 localhost fenced[2930]: fence saanvi12 failed Sep 30 22:33:47 localhost fenced[2930]: fencing node saanvi12 Sep 30 22:33:49 localhost fenced[2930]: fence saanvi12 dev 0.0 agent none result: error no method Sep 30 22:33:49 localhost fenced[2930]: fence saanvi12 failed Sep 30 22:33:52 localhost fenced[2930]: fencing node saanvi12* With the above error the result in by issuing clustat -i 1 command is : *Cluster Status for dhoni @ Sun Sep 30 23:04:08 2012 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ hitesh12 1 Online, Local saanvi12 2 Offline* I have disabled by firewall on both my linux servers and is able to telnet each other. Can somebdy please help me out as how can i remove my fence error ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Oct 1 03:22:14 2012 From: lists at alteeve.ca (Digimer) Date: Sun, 30 Sep 2012 23:22:14 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: Message-ID: <50690C66.2070603@alteeve.ca> Did you setup fencing? Can you send your cluster.conf file please? digimer On 09/30/2012 11:16 PM, hitesh nayyar wrote: > Hi, > > Hi, > > I am facing issuing in setting up Linux cluster. Here is the issue that > i am facing. > > I have 2 Linux desktop and have following ip's and name: > > hitesh12-192.168.1.23 > saanvi12-192.168.1.30 > > i enabled ricci service and have setup passwod as well.Enabled luci > service as well. 
> > When cluster using GUI by activating luci GUI i see error logs in my > /var/log/messages from hitesh12 -192.168.1.23 > > > Sep 30 22:31:57 localhost dlm_controld[2945]: dlm_controld 3.0.12 started > Sep 30 22:32:18 localhost gfs_controld[3010]: gfs_controld 3.0.12 started > Sep 30 22:33:39 localhost kernel: dlm: Using TCP for communications > Sep 30 22:33:41 localhost fenced[2930]: fencing node saanvi12 > Sep 30 22:33:44 localhost fenced[2930]: fence saanvi12 dev 0.0 agent > none result: error no method > *Sep 30 22:33:44 localhost fenced[2930]: fence saanvi12 failed > Sep 30 22:33:47 localhost fenced[2930]: fencing node saanvi12 > Sep 30 22:33:49 localhost fenced[2930]: fence saanvi12 dev 0.0 agent > none result: error no method > Sep 30 22:33:49 localhost fenced[2930]: fence saanvi12 failed > Sep 30 22:33:52 localhost fenced[2930]: fencing node saanvi12* > > With the above error the result in by issuing clustat -i 1 command is : > > *Cluster Status for dhoni @ Sun Sep 30 23:04:08 2012 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > hitesh12 1 Online, Local > saanvi12 2 Offline* > > > I have disabled by firewall on both my linux servers and is able to > telnet each other. > > > Can somebdy please help me out as how can i remove my fence error ? > > -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From raju.rajsand at gmail.com Mon Oct 1 03:43:39 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Mon, 1 Oct 2012 09:13:39 +0530 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: Message-ID: Greetings, On Mon, Oct 1, 2012 at 8:46 AM, hitesh nayyar wrote: > > I have 2 Linux desktop and have following ip's and name: > >From what I can gather, This seems to be a desktop class machines. Hence they may not have IPMI/ILO etc. I am 99.99% certain that fencing has not been configured. I also doubt if it has external storage. (I can identify as I tried my experiments with clusters first time with such desktop class machines) The only solution for this is Power Fencing. -- Regards, Rajagopal From hitesh1907nayyar at gmail.com Mon Oct 1 04:53:48 2012 From: hitesh1907nayyar at gmail.com (hitesh nayyar) Date: Mon, 1 Oct 2012 10:23:48 +0530 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: Message-ID: Hello, I am not aware of fencing.Yes you are correct i have not configured anything related to fencing. Can you please let me know as how can i proceed...Do i have to purchase some sort of hardware? How can i implement Power Fencing? I am using virtual box machines on my desktop in which linux are installed and have been connected through swtich. [root at hitesh12 ~]# cat /etc/cluster/cluster.conf On Mon, Oct 1, 2012 at 9:13 AM, Rajagopal Swaminathan < raju.rajsand at gmail.com> wrote: > Greetings, > > On Mon, Oct 1, 2012 at 8:46 AM, hitesh nayyar > wrote: > > > > I have 2 Linux desktop and have following ip's and name: > > > > >From what I can gather, This seems to be a desktop class machines. > Hence they may not have IPMI/ILO etc. > > I am 99.99% certain that fencing has not been configured. > > I also doubt if it has external storage. > > (I can identify as I tried my experiments with clusters first time > with such desktop class machines) > > The only solution for this is Power Fencing. 
> > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Oct 1 05:14:02 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 01 Oct 2012 01:14:02 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: Message-ID: <5069269A.2050405@alteeve.ca> First up, give this section a read. It will explain what fencing does and why you're seeing what you are; https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing As for being virtualbox guests, you're in a bit of a spot as there doesn't seem to be a native agent. Google shows this though; https://forums.virtualbox.org/viewtopic.php?f=7&t=35372 If you can switch to KVM or Xen, then you can use the fence_virsh or fence_xvm agents. cheers On 10/01/2012 12:53 AM, hitesh nayyar wrote: > Hello, > > I am not aware of fencing.Yes you are correct i have not configured > anything related to fencing. > > Can you please let me know as how can i proceed...Do i have to purchase > some sort of hardware? > > How can i implement Power Fencing? > > I am using virtual box machines on my desktop in which linux are > installed and have been connected through swtich. > > > [root at hitesh12 ~]# cat /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > On Mon, Oct 1, 2012 at 9:13 AM, Rajagopal Swaminathan > > wrote: > > Greetings, > > On Mon, Oct 1, 2012 at 8:46 AM, hitesh nayyar > > wrote: > > > > I have 2 Linux desktop and have following ip's and name: > > > > >From what I can gather, This seems to be a desktop class machines. > Hence they may not have IPMI/ILO etc. > > I am 99.99% certain that fencing has not been configured. > > I also doubt if it has external storage. > > (I can identify as I tried my experiments with clusters first time > with such desktop class machines) > > The only solution for this is Power Fencing. > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From lists at alteeve.ca Mon Oct 1 05:49:36 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 01 Oct 2012 01:49:36 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: <5069269A.2050405@alteeve.ca> Message-ID: <50692EF0.2000505@alteeve.ca> Please keep replies on the mailing list. These question helps other in the future by being seachable in the archives. Xen and KVM are types of hypervisors, like virtualbox is, but they run on Linux hosts. To use them, you would need to install linux on the bare machine. This in turn requires a CPU that supports virtualization (which most CPUs made in the last few years do support). If you can install CentOS 6, you can use KVM which is what I recommend. digimer On 10/01/2012 01:26 AM, hitesh nayyar wrote: > One more thing...till now i have used this setup: > > have Windows vista OS ---> Virtual Box---->Red Hat installed. > > If i download Xen or KVM can i use the same setup instead of Virtual Box? 
> > Windows vista OS ---->Xen or KVM ---->Red Hat installed > > BR// > Hitesh > > On Mon, Oct 1, 2012 at 10:48 AM, hitesh nayyar > > wrote: > > Hello Again, > > Can you please let me know what refers to KVM or Xen? I have never > used this > > Thanks > > > On Mon, Oct 1, 2012 at 10:44 AM, Digimer > wrote: > > First up, give this section a read. It will explain what fencing > does and why you're seeing what you are; > > https://alteeve.ca/w/2-Node___Red_Hat_KVM_Cluster_Tutorial#__Concept.3B_Fencing > > > As for being virtualbox guests, you're in a bit of a spot as > there doesn't seem to be a native agent. Google shows this though; > > https://forums.virtualbox.org/__viewtopic.php?f=7&t=35372 > > > If you can switch to KVM or Xen, then you can use the > fence_virsh or fence_xvm agents. > > cheers > > > On 10/01/2012 12:53 AM, hitesh nayyar wrote: > > Hello, > > I am not aware of fencing.Yes you are correct i have not > configured > anything related to fencing. > > Can you please let me know as how can i proceed...Do i have > to purchase > some sort of hardware? > > How can i implement Power Fencing? > > I am using virtual box machines on my desktop in which linux are > installed and have been connected through swtich. > > > [root at hitesh12 ~]# cat /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > On Mon, Oct 1, 2012 at 9:13 AM, Rajagopal Swaminathan > > __>> wrote: > > Greetings, > > On Mon, Oct 1, 2012 at 8:46 AM, hitesh nayyar > > >> wrote: > > > > I have 2 Linux desktop and have following ip's and name: > > > > >From what I can gather, This seems to be a desktop > class machines. > Hence they may not have IPMI/ILO etc. > > I am 99.99% certain that fencing has not been configured. > > I also doubt if it has external storage. > > (I can identify as I tried my experiments with clusters > first time > with such desktop class machines) > > The only solution for this is Power Fencing. > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > > > https://www.redhat.com/__mailman/listinfo/linux-cluster > > > > > > -- > Digimer > Papers and Projects: https://alteeve.ca > "Hydrogen is just a colourless, odorless gas which, if left > alone in sufficient quantities for long periods of time, begins > to think about itself." > > > -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From lists at alteeve.ca Mon Oct 1 06:19:39 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 01 Oct 2012 02:19:39 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> Message-ID: <506935FB.8060606@alteeve.ca> You don't seem to be reading what I am typing. Please go back over the various replies and read again what I said. Follow the links and read what they say. And please don't reply only to me. Click "Reply All" and include the mailing list. digimer On 10/01/2012 02:11 AM, hitesh nayyar wrote: > Hi, > > I have a constraint of using Linux on bare machine for my 2 desktop. > > Is there not any way i can get use the agent or perform clustering on > Virtual box or VM ware software..... > > On Mon, Oct 1, 2012 at 11:19 AM, Digimer > wrote: > > Please keep replies on the mailing list. These question helps other > in the future by being seachable in the archives. 
> > Xen and KVM are types of hypervisors, like virtualbox is, but they > run on Linux hosts. To use them, you would need to install linux on > the bare machine. This in turn requires a CPU that supports > virtualization (which most CPUs made in the last few years do support). > > If you can install CentOS 6, you can use KVM which is what I recommend. > > digimer > > > On 10/01/2012 01:26 AM, hitesh nayyar wrote: > > One more thing...till now i have used this setup: > > have Windows vista OS ---> Virtual Box---->Red Hat installed. > > If i download Xen or KVM can i use the same setup instead of > Virtual Box? > > Windows vista OS ---->Xen or KVM ---->Red Hat installed > > BR// > Hitesh > > On Mon, Oct 1, 2012 at 10:48 AM, hitesh nayyar > > >> wrote: > > Hello Again, > > Can you please let me know what refers to KVM or Xen? I > have never > used this > > Thanks > > > On Mon, Oct 1, 2012 at 10:44 AM, Digimer > >> wrote: > > First up, give this section a read. It will explain > what fencing > does and why you're seeing what you are; > > https://alteeve.ca/w/2-Node_____Red_Hat_KVM_Cluster_Tutorial#____Concept.3B_Fencing > > > > > > > As for being virtualbox guests, you're in a bit of a > spot as > there doesn't seem to be a native agent. Google shows > this though; > > https://forums.virtualbox.org/____viewtopic.php?f=7&t=35372 > > > > > > > If you can switch to KVM or Xen, then you can use the > fence_virsh or fence_xvm agents. > > cheers > > > On 10/01/2012 12:53 AM, hitesh nayyar wrote: > > Hello, > > I am not aware of fencing.Yes you are correct i > have not > configured > anything related to fencing. > > Can you please let me know as how can i > proceed...Do i have > to purchase > some sort of hardware? > > How can i implement Power Fencing? > > I am using virtual box machines on my desktop in > which linux are > installed and have been connected through swtich. > > > [root at hitesh12 ~]# cat /etc/cluster/cluster.conf > > > > nodeid="1"/> > nodeid="2"/> > > > > > > > > > > > > On Mon, Oct 1, 2012 at 9:13 AM, Rajagopal Swaminathan > __> > > > __>__>> wrote: > > Greetings, > > On Mon, Oct 1, 2012 at 8:46 AM, hitesh nayyar > > > > __gma__il.com > > >>> wrote: > > > > I have 2 Linux desktop and have following > ip's and name: > > > > >From what I can gather, This seems to be a > desktop > class machines. > Hence they may not have IPMI/ILO etc. > > I am 99.99% certain that fencing has not been > configured. > > I also doubt if it has external storage. > > (I can identify as I tried my experiments with > clusters > first time > with such desktop class machines) > > The only solution for this is Power Fencing. > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > > > ____com > >> > https://www.redhat.com/____mailman/listinfo/linux-cluster > > > > __> > > > > > -- > Digimer > Papers and Projects: https://alteeve.ca > "Hydrogen is just a colourless, odorless gas which, if left > alone in sufficient quantities for long periods of > time, begins > to think about itself." > > > > > > -- > Digimer > Papers and Projects: https://alteeve.ca > "Hydrogen is just a colourless, odorless gas which, if left alone in > sufficient quantities for long periods of time, begins to think > about itself." > > -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." 
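[For reference: the "fence saanvi12 dev 0.0 agent none result: error no method" lines in the original post are fenced reporting that no <fence> method is defined for that node, so every fence attempt fails and recovery hangs. A minimal sketch of a two-node cluster.conf using the fence_virsh agent mentioned above, for guests running on a KVM host, might look like the following; the cluster and node names are taken from the thread, but the host IP, login and password are placeholders, and the "port" values must match the libvirt domain (guest) names on the host:]

<?xml version="1.0"?>
<!-- Illustrative sketch only: host address and credentials are made up. -->
<cluster name="dhoni" config_version="2">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="hitesh12" nodeid="1">
      <fence>
        <method name="kvm">
          <!-- "port" is the libvirt guest name on the KVM host. -->
          <device name="kvm_host" port="hitesh12"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="saanvi12" nodeid="2">
      <fence>
        <method name="kvm">
          <device name="kvm_host" port="saanvi12"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_virsh logs into the KVM host over SSH and runs virsh to
         power the guest off and back on. -->
    <fencedevice name="kvm_host" agent="fence_virsh"
                 ipaddr="192.168.1.1" login="root" passwd="secret"/>
  </fencedevices>
</cluster>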
From tserong at suse.com Mon Oct 1 12:22:57 2012 From: tserong at suse.com (Tim Serong) Date: Mon, 01 Oct 2012 22:22:57 +1000 Subject: [Linux-cluster] CFP: Cloud Infrastructure, Distributed Storage and High Availability at LCA 2013 Message-ID: <50698B21.4020802@suse.com> I'm pleased to announce that we will be holding a one day Cloud Infrastructure, Distributed Storage and High Availability mini conference[1] on Monday 28 January 2013 as part of linux.conf.au 2013 in Canberra, Australia[2]. This miniconf is about building reliable infrastructure, from two-node HA failover pairs to multi-thousand-core cloud systems. You might like to think of it as a sequel to the LCA 2012 High Availability and Distributed Storage miniconf[3]. Do any of the following describe you? * You're building cloud infrastructure for others to use (openstack, cloudstack, eucalyptus, ...) * Your data needs to be reliably available everywhere (ceph, glusterfs, drbd, ...) * Your system absolutely must be up all the time (pacemaker, corosync, ...) If so, this is the miniconf for you! Please consider submitting a presentation at: http://tinyurl.com/cidsha-lca2013 We're expecting most talk slots to be 25 minutes (including questions and changeover), but there will be openings for shorter lightning talks and maybe a couple of longer talks. CFP closes on Sunday November 4, 2012. Notifications of acceptance will be emailed out after this date. Note that there is also an OpenStack-specific miniconf[4] running on Tuesday 29 January. We're hoping this will give us a pretty awesome two-day LCA 2013 CloudFest. As a rough rule of thumb, more generic or infrastructure-related talks should go to Cloud, Distributed Storage & HA, while deeper OpenStack-specific talks should probably go to the OpenStack miniconf. If in doubt, or if you have any other questions, please contact me directly at tserong at suse.com. Thanks! Tim [1] http://lca2013.linux.org.au/schedule/30073/view_talk [2] http://lca2013.linux.org.au/ [3] http://lca2012.linux.org.au/wiki/index.php/Miniconfs/HighAvailabilityAndDistributedStorage (also videos at http://www.youtube.com/playlist?list=PLE70D0FFF98BC9579) [4] http://lca2013.linux.org.au/schedule/30100/view_talk?day=tuesday -- Tim Serong Senior Clustering Engineer SUSE tserong at suse.com From raju.rajsand at gmail.com Mon Oct 1 16:45:28 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Mon, 1 Oct 2012 22:15:28 +0530 Subject: [Linux-cluster] linux-cluster In-Reply-To: <506935FB.8060606@alteeve.ca> References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> <506935FB.8060606@alteeve.ca> Message-ID: Greetings, Hitesh, Please follow list guidlines. On Mon, Oct 1, 2012 at 11:49 AM, Digimer wrote: > You don't seem to be reading what I am typing. Please go back over the > various replies and read again what I said. Follow the links and read what > they say. > > And please don't reply only to me. Click "Reply All" and include the mailing > list. > >> >> I have a constraint of using Linux on bare machine for my 2 desktop. >> Can you please let me know as how can i >> proceed...Do i have >> to purchase >> some sort of hardware? Yes. You will need to buy power fencing device -- basically a power strip with a ethernet port I would strongly suggest you have two network port on each system. What you want to do with a cluster? >> >> One more thing...till now i have used this setup: >> >> have Windows vista OS ---> Virtual Box---->Red Hat installed. >> You have to be kidding. 
You are using Vista on bare metal for your HA? >> If i download Xen or KVM can i use the same setup instead of >> Virtual Box? >> >> Windows vista OS ---->Xen or KVM ---->Red Hat installed http://www.youtube.com/watch?v=oKI-tD0L18A >> [root at hitesh12 ~]# cat /etc/cluster/cluster.conf >> >> There needs to be a two_node directive somewhere in there. Read up. Better yet, get the help of a local technical person who knows what HA is. It is a lot more than a simple desktop install. Otherwise you need to invest quite a bit of time in learning, and money in getting some extra hardware (fence devices, switches, NICs, external storage -- if required). And don't commit to or do that in production without knowing what you are getting into. If you can post the objective of using a cluster more descriptively, perhaps you will get more specific information.
> > Digimer's _*excellent*_ tutorial covers more or less all that you need > to know about clusters. > > I wish I had that when I started playing around with that way back in 2007. > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Oct 2 13:38:28 2012 From: lists at alteeve.ca (Digimer) Date: Tue, 02 Oct 2012 09:38:28 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> <506935FB.8060606@alteeve.ca> Message-ID: <506AEE54.7070209@alteeve.ca> On 10/02/2012 04:00 AM, Parvez Shaikh wrote: > What kind of cluster is this - an academic project or production quality > solution? > > If its former - go for manual fencing. You wont need fence device but > failover wont be automatic *Please* don't do this. Manual fencing support was dropped for a reason. It's *far* too easy to mess things up when an admin uses it before identifying a problem. > If its later - yes you'll need fence device This is the only sane option; Academic or production. Fencing is an integral part of the cluster and you do yourself no favour by not learning it in an academic setup. -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odourless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From parvez.h.shaikh at gmail.com Tue Oct 2 13:43:40 2012 From: parvez.h.shaikh at gmail.com (Parvez Shaikh) Date: Tue, 2 Oct 2012 19:13:40 +0530 Subject: [Linux-cluster] linux-cluster In-Reply-To: <506AEE54.7070209@alteeve.ca> References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> <506935FB.8060606@alteeve.ca> <506AEE54.7070209@alteeve.ca> Message-ID: Hi Digimer, Could you please give me reference/case studies of problem about why manual fencing was dropped and how automated fencing is fixing those? Thanks, Parvez On Tue, Oct 2, 2012 at 7:08 PM, Digimer wrote: > On 10/02/2012 04:00 AM, Parvez Shaikh wrote: > >> What kind of cluster is this - an academic project or production quality >> solution? >> >> If its former - go for manual fencing. You wont need fence device but >> failover wont be automatic >> > > *Please* don't do this. Manual fencing support was dropped for a reason. > It's *far* too easy to mess things up when an admin uses it before > identifying a problem. > > > If its later - yes you'll need fence device >> > > This is the only sane option; Academic or production. Fencing is an > integral part of the cluster and you do yourself no favour by not learning > it in an academic setup. > > > -- > Digimer > Papers and Projects: https://alteeve.ca > "Hydrogen is just a colourless, odourless gas which, if left alone in > sufficient quantities for long periods of time, begins to think about > itself." > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at alteeve.ca Tue Oct 2 14:03:06 2012 From: lists at alteeve.ca (Digimer) Date: Tue, 02 Oct 2012 10:03:06 -0400 Subject: [Linux-cluster] linux-cluster In-Reply-To: References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> <506935FB.8060606@alteeve.ca> <506AEE54.7070209@alteeve.ca> Message-ID: <506AF41A.5000906@alteeve.ca> This talks about how manual fencing isn't actual fencing; https://fedorahosted.org/cluster/wiki/Fence There was a page where it was said that manual fencing was in no way supported, but I can't find it at the moment. The reason it is not safe is that an admin is likely to issue it in a panic while trying to get a hung cluster back online. If this happens without first ensuring the peer node(s) is fenced, you can walk into a split-brain. digimer On 10/02/2012 09:43 AM, Parvez Shaikh wrote: > Hi Digimer, > > Could you please give me reference/case studies of problem about why > manual fencing was dropped and how automated fencing is fixing those? > > Thanks, > Parvez > > On Tue, Oct 2, 2012 at 7:08 PM, Digimer > wrote: > > On 10/02/2012 04:00 AM, Parvez Shaikh wrote: > > What kind of cluster is this - an academic project or production > quality > solution? > > If its former - go for manual fencing. You wont need fence > device but > failover wont be automatic > > > *Please* don't do this. Manual fencing support was dropped for a > reason. It's *far* too easy to mess things up when an admin uses it > before identifying a problem. > > > If its later - yes you'll need fence device > > > This is the only sane option; Academic or production. Fencing is an > integral part of the cluster and you do yourself no favour by not > learning it in an academic setup. > > > -- > Digimer > Papers and Projects: https://alteeve.ca > "Hydrogen is just a colourless, odourless gas which, if left alone > in sufficient quantities for long periods of time, begins to think > about itself." > > -- Digimer Papers and Projects: https://alteeve.ca "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From raju.rajsand at gmail.com Tue Oct 2 16:08:09 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Tue, 2 Oct 2012 21:38:09 +0530 Subject: [Linux-cluster] linux-cluster In-Reply-To: <506AF41A.5000906@alteeve.ca> References: <5069269A.2050405@alteeve.ca> <50692EF0.2000505@alteeve.ca> <506935FB.8060606@alteeve.ca> <506AEE54.7070209@alteeve.ca> <506AF41A.5000906@alteeve.ca> Message-ID: Greetings, On Tue, Oct 2, 2012 at 7:33 PM, Digimer wrote: > > The reason it is not safe is that an admin is likely to issue it in a panic > while trying to get a hung cluster back online. If this happens without > +1 +1 +1 > first ensuring the peer node(s) is fenced, you can walk into a split-brain. My first attempts without fencing landed me in the nightmare of "been there done that" of cleaning up the mess of split brain or more aptly, "the fluid brain which hit the fan" To rephrase the old adage "To err is human, but to really, completely mess it up, it takes a cluster with split brain". It might be probably easier to use magnet to write bits on the disk than to clean up *that* mess. Not to talk about downtime and the fury of users. No baby, A HA cluster ain't worth its name without fencing. And mine was in an academic environment. 
-- Regards, Rajagopal From parvez.h.shaikh at gmail.com Wed Oct 3 05:23:17 2012 From: parvez.h.shaikh at gmail.com (Parvez Shaikh) Date: Wed, 3 Oct 2012 10:53:17 +0530 Subject: [Linux-cluster] Hi In-Reply-To: References: <1347383423.10128.555.camel@aeapen.blr.redhat.com> <1347445264.32050.398.camel@aeapen.blr.redhat.com> Message-ID: A curious observation, there is a sudden surge of sending emails on private addresses rather than sending over a mailing list. Please send your doubts / questions on mailing list " linux-cluster at redhat.com" instead of addressing personally. Regarding configuration for manual fencing - I don't have it with me, it was available with RHEL 5.5. Check it out in system-config-cluster tool if you can add manual fencing. Thanks, Parvez On Wed, Oct 3, 2012 at 10:46 AM, Renchu Mathew wrote: > Hi Purvez, > > I am trying to setup a test cluster environmet. But I haven't doen > fencing. Please find below error messages. Some time after the nodes > restarted, the other node is going down. can you please send me > theconfiguration for manual fencing? > > >> > Please find attached my cluster setup. It is not stable >> > and /var/log/messages shows the below errors. >> > >> > >> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2 >> > Sep 11 08:49:10 node1 corosync[1814]: [QUORUM] Members[2]: 1 2 >> > Sep 11 08:49:10 node1 corosync[1814]: [CPG ] chosen downlist: >> > sender r(0) ip(192.168.1.251) ; members(old:2 left:1) >> > Sep 11 08:49:10 node1 corosync[1814]: [MAIN ] Completed service >> > synchronization, ready to provide service. >> > Sep 11 08:49:11 node1 corosync[1814]: cman killed by node 2 because we >> > were killed by cman_tool or other application >> > Sep 11 08:49:11 node1 fenced[1875]: telling cman to remove nodeid 2 >> > from cluster >> > Sep 11 08:49:11 node1 fenced[1875]: cluster is down, exiting >> > Sep 11 08:49:11 node1 gfs_controld[1950]: cluster is down, exiting >> > Sep 11 08:49:11 node1 gfs_controld[1950]: daemon cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 gfs_controld[1950]: cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 dlm_controld[1889]: cluster is down, exiting >> > Sep 11 08:49:11 node1 dlm_controld[1889]: daemon cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 fenced[1875]: daemon cpg_dispatch error 2 >> > Sep 11 08:49:11 node1 rgmanager[2409]: #67: Shutting down uncleanly >> > Sep 11 08:49:11 node1 rgmanager[17059]: [clusterfs] unmounting /Data >> > Sep 11 08:49:11 node1 rgmanager[17068]: [clusterfs] Sending SIGTERM to >> > processes on /Data >> > Sep 11 08:49:16 node1 rgmanager[17104]: [clusterfs] unmounting /Data >> > Sep 11 08:49:16 node1 rgmanager[17113]: [clusterfs] Sending SIGKILL to >> > processes on /Data >> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 2 >> > Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 1 >> > Sep 11 08:49:19 node1 kernel: dlm: gfs2: no userland control daemon, >> > stopping lockspace >> > Sep 11 08:49:22 node1 rgmanager[17149]: [clusterfs] unmounting /Data >> > Sep 11 08:49:22 node1 rgmanager[17158]: [clusterfs] Sending SIGKILL to >> > processes on /Data >> > >> > >> > >> > Also when I try to restart the cman service, below error comes. >> > Starting cluster: >> > Checking if cluster has been disabled at boot... [ OK ] >> > Checking Network Manager... 
[ OK ] >> > Global setup... [ OK ] >> > Loading kernel modules... [ OK ] >> > Mounting configfs... [ OK ] >> > Starting cman... [ OK ] >> > Waiting for quorum... [ OK ] >> > Starting fenced... [ OK ] >> > Starting dlm_controld... [ OK ] >> > Starting gfs_controld... [ OK ] >> > Unfencing self... fence_node: cannot connect to cman >> > [FAILED] >> > Stopping cluster: >> > Leaving fence domain... [ OK ] >> > Stopping gfs_controld... [ OK ] >> > Stopping dlm_controld... [ OK ] >> > Stopping fenced... [ OK ] >> > Stopping cman... [ OK ] >> > Unloading kernel modules... [ OK ] >> > Unmounting configfs... [ OK ] >> > >> > Thanks again. >> > Renchu Mathew >> > On Tue, Sep 11, 2012 at 9:10 PM, Arun Eapen CISSP, RHCA >> > wrote: >> > >> > >> > >> > Put the fenced in debug mode and copy the error messages, for >> > me to >> > debug >> > >> > On Tue, 2012-09-11 at 11:52 +0400, Renchu Mathew wrote: >> > > Hi Arun, >> > > >> > > I have done the RH436 course in conducted by you at Redhat >> > b'lore. How >> > > r u? >> > > >> > > I have configured a 2 node failover cluster setup (almost >> > same like >> > > our RH436 lab setup in b'lore) It is almost ok except >> > fencing. If I >> > > pull the active node network cable it is not switching to >> > the other >> > > automatically. It is getting hung. Then I have to do this >> > manually. Is >> > > there any script for creating the dummy fencing in RHCS >> > which will >> > > restart or shutdown the other node. Please find attached my >> > > cluster.conf file. is there anyway we can power fence using >> > APC UPS. >> > > >> > > Could you please help me if you get some time. >> > > >> > > Thanks and regards >> > > Renchu Mathew >> > > >> > > >> > > >> > >> > >> > >> > -- >> > Arun Eapen >> > CISSP, RHC{A,DS,E,I,SS,VA,X} >> > Senior Technical Consultant & Certification Poobah >> > Red Hat India Pvt. Ltd., >> > No - 4/1, Bannergatta Road, >> > IBC Knowledge Park, >> > 11th floor, Tower D, >> > Bangalore - 560029, INDIA. >> > >> > >> > >> >> >> -- >> Arun Eapen >> CISSP, RHC{A,DS,E,I,SS,VA,X} >> Senior Technical Consultant & Certification Poobah >> Red Hat India Pvt. Ltd., >> No - 4/1, Bannergatta Road, >> IBC Knowledge Park, >> 11th floor, Tower D, >> Bangalore - 560029, INDIA. >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at verwilst.be Thu Oct 4 13:47:31 2012 From: lists at verwilst.be (Bart Verwilst) Date: Thu, 04 Oct 2012 15:47:31 +0200 Subject: [Linux-cluster] Failover network device with rgmanager Message-ID: <2c3f847bbba16467723fe057dbded285@verwilst.be> Hi, I would like to make rgmanager manage a network interface i configured under sysconfig ( ifcfg-ethX ). It should be brought up by the active node as a resource, and ifdown'ed by the standby node. ( It's actually a GRE tunnel interface ). Is there a straightforward way on how to do this with CentOS 6.2 cman/rgmanager? Thanks in advance! Kind regards, Bart Verwilst From lhh at redhat.com Thu Oct 4 15:56:52 2012 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 04 Oct 2012 11:56:52 -0400 Subject: [Linux-cluster] Failover network device with rgmanager In-Reply-To: <2c3f847bbba16467723fe057dbded285@verwilst.be> References: <2c3f847bbba16467723fe057dbded285@verwilst.be> Message-ID: <506DB1C4.2080609@redhat.com> On 10/04/2012 09:47 AM, Bart Verwilst wrote: > Hi, > > I would like to make rgmanager manage a network interface i configured > under sysconfig ( ifcfg-ethX ). 
It should be brought up by the active > node as a resource, and ifdown'ed by the standby node. ( It's actually a > GRE tunnel interface ). Is there a straightforward way on how to do this > with CentOS 6.2 cman/rgmanager? > 'script' resource, like: #!/bin/sh case $1 in start) ifup ethX exit $? ;; stop) ifdown ethX exit $? ;; status) ... ;; esac exit 1 -- Lon From heiko.nardmann at itechnical.de Thu Oct 4 16:22:49 2012 From: heiko.nardmann at itechnical.de (Heiko Nardmann) Date: Thu, 04 Oct 2012 18:22:49 +0200 Subject: [Linux-cluster] Failover network device with rgmanager In-Reply-To: <506DB1C4.2080609@redhat.com> References: <2c3f847bbba16467723fe057dbded285@verwilst.be> <506DB1C4.2080609@redhat.com> Message-ID: <506DB7D9.3080909@itechnical.de> Isn't that a standard ip resource inside cluster.conf? Kind regards, Heiko Am 04.10.2012 17:56, schrieb Lon Hohberger: > On 10/04/2012 09:47 AM, Bart Verwilst wrote: >> Hi, >> >> I would like to make rgmanager manage a network interface i configured >> under sysconfig ( ifcfg-ethX ). It should be brought up by the active >> node as a resource, and ifdown'ed by the standby node. ( It's actually a >> GRE tunnel interface ). Is there a straightforward way on how to do this >> with CentOS 6.2 cman/rgmanager? >> > 'script' resource, like: > > #!/bin/sh > > case $1 in > start) > ifup ethX > exit $? > ;; > stop) > ifdown ethX > exit $? > ;; > status) > ... > ;; > esac > > exit 1 > > -- Lon From mgrac at redhat.com Fri Oct 5 11:19:02 2012 From: mgrac at redhat.com (Marek Grac) Date: Fri, 05 Oct 2012 13:19:02 +0200 Subject: [Linux-cluster] fence-agents-3.1.10 stable release Message-ID: <506EC226.1010900@redhat.com> Welcome to the fence-agents 3.1.10 release. This release includes these updates: * Faster fencing in fence_vmware_soap * Action metadata is supported also on older fence agents * support for using sudo in fence_virsh The new source tarball can be downloaded here: https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-3.1.10.tar.xz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Thanks/congratulations to all people that contributed to achieve this great milestone. m, From mmorgan at dca.net Fri Oct 5 20:22:16 2012 From: mmorgan at dca.net (Michael Morgan) Date: Fri, 5 Oct 2012 16:22:16 -0400 Subject: [Linux-cluster] GFS2 showing the wrong directory contents on one node? Message-ID: <20121005202216.GK17352@staff.dca.net> Hello, I have a 6 node CentOS 5.8 cluster with 4 nodes mounting a GFS2 filesystem. Everything had been running nicely for about 2 years but over the past few months I've had a strange occurence happen twice. One of the two web server nodes will suddenly start listing the wrong directory contents, both nodes have been affected at different times. This only seems to affect one or two directories but it's hard to be certain since there are a large numer of them. There are no errors logged anywhere on the cluster. Unmounting GFS2 on this node usually causes a hang and eventual fence. The node will come back online without issue and begin functioning normally again. Just a few minutes ago it started happening again. I currently have services stopped but have not gone through the unmount/reboot process yet. Before I do that I figured I'd check the list to see if anyone has come across this before. 
Is there any GFS2/cluster information I should be dumping to track down the cause? Any insight would be appreciated. Thanks. -Mike From shanti.pahari at sierra.sg Mon Oct 8 01:57:01 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Mon, 8 Oct 2012 09:57:01 +0800 (SGT) Subject: [Linux-cluster] LVM cannot initilize Message-ID: Hi all, After I added the root volume to volume_list in lvm.conf, I cannot initialize other LVs. If I remove volume_list from lvm.conf, only then can I initialize other LVs. But a volume_list with only the root volume is required for the cluster. Volume_list = [ "myrootvolume", "@hostname" ] Can you help me solve this? Regards, Shanti -------------- next part -------------- An HTML attachment was scrubbed... URL: From bmr at redhat.com Mon Oct 8 10:42:56 2012 From: bmr at redhat.com (Bryn M. Reeves) Date: Mon, 08 Oct 2012 11:42:56 +0100 Subject: [Linux-cluster] LVM cannot initilize In-Reply-To: References: Message-ID: <5072AE30.3000904@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/08/2012 02:57 AM, Shanti Pahari wrote: > If I remove volume_list from lvm.conf, only then can I initialize > other LVs. But a volume_list with only the root volume is required for > the cluster. > > > > Volume_list = [ "myrootvolume", "@hostname" ] It's 'volume_list' (lowercase 'v') and if you're specifying a logical volume (rather than a volume group) it needs to be "vgname/lvname".
Also make sure it goes in the 'activation' section - there should be a commented-out example you can use as a template in the default lvm.conf: # If volume_list is defined, each LV is only activated if there is a # match against the list. # "vgname" and "vgname/lvname" are matched exactly. # "@tag" matches any tag set in the LV or VG. # "@*" matches if any tag defined on the host is set in the LV or VG # # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] Regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlByri8ACgkQ6YSQoMYUY94ZNgCeL/ectFyPgippkiQVEYTPWpn7 lP0AoIls3TalqQZgQ0M5fxJppFrUnjVK =AsGP -----END PGP SIGNATURE----- -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From bmr at redhat.com Mon Oct 8 12:46:57 2012 From: bmr at redhat.com (Bryn M. Reeves) Date: Mon, 08 Oct 2012 13:46:57 +0100 Subject: [Linux-cluster] LVM cannot initilize In-Reply-To: <25537b6b.00000e68.00001f81@sierra-A66> References: <5072AE30.3000904@redhat.com> <25537b6b.00000e68.00001f81@sierra-A66> Message-ID: <5072CB41.4090006@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/08/2012 01:13 PM, Shanti Pahari wrote: > volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-01" ] > > where my vg_pdcpicpl01 is Volume group rather than logical volume. > This is my root volume . Either a tag "@tagname", a volume group as "vgname", or a logical volume name as "vgname/lvname" (otherwise the tools cannot know which LV you mean if there are multiple LVs with the same name in different VGs). In your earlier example you seemed to have an LV name: >> Volume_list = [ "myrootvolume", "@hostname" ] Which won't work since the LV name is unqualified by a VG. Your later example: > volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-01" ] Looks correct assuming that vg_pdcpicpl01 is the name of a VG on your system. > In example I saw that I have to specify only VG of root . It's up to you whether you want to specify just an LV or a whole VG. > Can you help me where I am wrong ? It's hard to say. What error do you get initialising LVM? Try adding more -v if there's nothing useful printed (normally syntax errors in lvm.conf give a useful message). Failing that you could post your full lvm.conf to a pastebin somewhere and mail the link so that others can review your config. Regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlByy0EACgkQ6YSQoMYUY95VEgCdFK9QqeTOGAiiYnZiuAJ6iHY5 BrUAmwbCBULcq9gEVBtqOg8wAAGXph5M =UsOX -----END PGP SIGNATURE----- From cfeist at redhat.com Tue Oct 9 00:27:36 2012 From: cfeist at redhat.com (Chris Feist) Date: Mon, 08 Oct 2012 19:27:36 -0500 Subject: [Linux-cluster] Announce: pcs-0.9.26 Message-ID: <50736F78.3060906@redhat.com> We've been making improvements to the pcs (pacemaker/corosync configuration system) command line tool over the past few months. Currently you can setup a basic cluster (including configuring corosync 2.0 udpu). David Vossel has also created a version of the "Clusters from Scratch" document that illustrates setting up a cluster using pcs. This should be showing up shortly. You can view the source here: https://github.com/feist/pcs/ Or download the latest tarball: https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz There is also a Fedora 18 package that will be included with the next release. 
You should be able to find that package in the following locations... RPM: http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm SRPM: http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm In the near future we are planning on having builds for SUSE & Ubuntu/Debian. We're also actively working on a GUI/Daemon that will allow control of your entire cluster from one node and/or a web browser. Please feel free to email me (cfeist at redhat.com) or open issues on the pcs project at github (https://github.com/feist/pcs/issues) if you have any questions or problems. Thanks! Chris From ming-ming.chen at hp.com Tue Oct 9 01:47:14 2012 From: ming-ming.chen at hp.com (Chen, Ming Ming) Date: Tue, 9 Oct 2012 01:47:14 +0000 Subject: [Linux-cluster] Configure multiple heartbeat on a redhat cluster In-Reply-To: <5057ED14.4030601@alteeve.ca> References: <5057D17D.9060108@alteeve.ca> <5057E9C5.60506@alteeve.ca> <5057ED14.4030601@alteeve.ca> Message-ID: <1D241511770E2F4BA89AFD224EDD527141025BAE@G9W0733.americas.hpqcorp.net> Hi, Is there a way to configure multiple heartbeat network in the /etc/cluster.conf file. I'm using redhat cluster. Regards Ming From ming-ming.chen at hp.com Tue Oct 9 01:55:01 2012 From: ming-ming.chen at hp.com (Chen, Ming Ming) Date: Tue, 9 Oct 2012 01:55:01 +0000 Subject: [Linux-cluster] problem quorum cman In-Reply-To: References: Message-ID: <1D241511770E2F4BA89AFD224EDD527141025C87@G9W0733.americas.hpqcorp.net> Hi, Have you ever resolved this issue? If so, what is the problem? I sometime see the same issue on my cluster. Ming From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of hakim abdellaoui Sent: Friday, July 20, 2012 2:15 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] problem quorum cman Hi, I use rhel6.3 with packages : cman-3.0.12.1-32.el6.x86_64 rgmanager-3.0.12.1-12.el6.x86_64 openais-1.1.1-7.el6.x86_64 I have two virtual nodes (vmware) and a quorum share disk (it's a virtual disk i use scsi sharing multi-write) the cluster work sometime. if i reboot node2 the cman not start i have : Waiting for quorum... Timed-out waiting for cluster. On the log corosync i have : Jul 20 10:51:22 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Jul 20 10:51:22 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.10.154) ; members(old:1 left:0) Jul 20 10:51:22 corosync [MAIN ] Completed service synchronization, ready to provide service. Jul 20 10:51:23 corosync [CMAN ] quorum device unregistered On the node1 when i type clustat i have : Cluster Status for clusterweb @ Fri Jul 20 10:38:57 2012 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ server-1 1 Online, Local server-2 2 Offline /dev/block/8:16 0 Online, Quorum Disk If i restart cman on node1 and i restart cman on node2 the cman start properly a When i type clustat on both nodes i can see all online. I don't understand why i must restart on node1 the cman if i want to add the node2 on the cluster . You can see my cluster.conf Very thanks for your help Best regards. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at alteeve.ca Tue Oct 9 01:58:04 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 08 Oct 2012 21:58:04 -0400 Subject: [Linux-cluster] Announce: pcs-0.9.26 In-Reply-To: <50736F78.3060906@redhat.com> References: <50736F78.3060906@redhat.com> Message-ID: <507384AC.5010805@alteeve.ca> Well, I was looking for a reason to download and start testing Fedora 18. Suppose this is a good enough reason. :) digimer On 10/08/2012 08:27 PM, Chris Feist wrote: > We've been making improvements to the pcs (pacemaker/corosync > configuration system) command line tool over the past few months. > > Currently you can setup a basic cluster (including configuring corosync > 2.0 udpu). > > David Vossel has also created a version of the "Clusters from Scratch" > document that illustrates setting up a cluster using pcs. This should > be showing up shortly. > > You can view the source here: https://github.com/feist/pcs/ > > Or download the latest tarball: > https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz > > There is also a Fedora 18 package that will be included with the next > release. You should be able to find that package in the following > locations... > > RPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > > SRPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm > > In the near future we are planning on having builds for SUSE & > Ubuntu/Debian. > > We're also actively working on a GUI/Daemon that will allow control of > your entire cluster from one node and/or a web browser. > > Please feel free to email me (cfeist at redhat.com) or open issues on the > pcs project at github (https://github.com/feist/pcs/issues) if you have > any questions or problems. > > Thanks! > Chris > -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From fdinitto at redhat.com Tue Oct 9 06:53:23 2012 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 09 Oct 2012 08:53:23 +0200 Subject: [Linux-cluster] Announce: pcs-0.9.26 In-Reply-To: <50736F78.3060906@redhat.com> References: <50736F78.3060906@redhat.com> Message-ID: <5073C9E3.7000009@redhat.com> On 10/9/2012 2:27 AM, Chris Feist wrote: > We've been making improvements to the pcs (pacemaker/corosync > configuration system) command line tool over the past few months. > > Currently you can setup a basic cluster (including configuring corosync > 2.0 udpu). > > David Vossel has also created a version of the "Clusters from Scratch" > document that illustrates setting up a cluster using pcs. This should > be showing up shortly. > well done guys!!! Fabio From shanti.pahari at sierra.sg Tue Oct 9 09:07:10 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Tue, 9 Oct 2012 17:07:10 +0800 (SGT) Subject: [Linux-cluster] LVM cannot initilize In-Reply-To: <5072CB41.4090006@redhat.com> References: <5072AE30.3000904@redhat.com> <25537b6b.00000e68.00001f81@sierra-A66> <5072CB41.4090006@redhat.com> Message-ID: <97898eab.00001f6c.00000041@sierra-A66> Hi Bryn, >From example: On each cluster node, edit /etc/lvm/lvm.conf and change the volume_list field to match the boot volume (myvg) and name of the node cluster interconnect (ha-web1). 
This restricts the list of volumes available during system boot to only the root volume and prevents cluster nodes from updating and potentially corrupting the metadata on the HA-LVM volume: volume_list = [ "myvg", "@ha-web1" ] so I added volume_list = [ " vg_pdcpicpl01 " , "@PDC-PIC-PL-01" ] PDC-PIC-PL-01 : is my hostname # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) # shutdown -r now "Activating ramdisk LVM changes" Reboot I have /dev/HA-Web-VG/ha-web-lv also, but after reboot I cannot initialize my volume it throws nor create logical volume lvcreate gets error message "not activating volume group lv does not pass activation filter" -----Original Message----- From: Bryn M. Reeves [mailto:bmr at redhat.com] Sent: Monday, 8 October, 2012 8:47 PM To: linux clustering Cc: Shanti Pahari Subject: Re: [Linux-cluster] LVM cannot initilize -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/08/2012 01:13 PM, Shanti Pahari wrote: > volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-01" ] > > where my vg_pdcpicpl01 is Volume group rather than logical volume. > This is my root volume . Either a tag "@tagname", a volume group as "vgname", or a logical volume name as "vgname/lvname" (otherwise the tools cannot know which LV you mean if there are multiple LVs with the same name in different VGs). In your earlier example you seemed to have an LV name: >> Volume_list = [ "myrootvolume", "@hostname" ] Which won't work since the LV name is unqualified by a VG. Your later example: > volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-01" ] Looks correct assuming that vg_pdcpicpl01 is the name of a VG on your system. > In example I saw that I have to specify only VG of root . It's up to you whether you want to specify just an LV or a whole VG. > Can you help me where I am wrong ? It's hard to say. What error do you get initialising LVM? Try adding more -v if there's nothing useful printed (normally syntax errors in lvm.conf give a useful message). Failing that you could post your full lvm.conf to a pastebin somewhere and mail the link so that others can review your config. Regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlByy0EACgkQ6YSQoMYUY95VEgCdFK9QqeTOGAiiYnZiuAJ6iHY5 BrUAmwbCBULcq9gEVBtqOg8wAAGXph5M =UsOX -----END PGP SIGNATURE----- From mmorgan at dca.net Tue Oct 9 13:44:06 2012 From: mmorgan at dca.net (Michael Morgan) Date: Tue, 9 Oct 2012 09:44:06 -0400 Subject: [Linux-cluster] GFS2 showing the wrong directory contents on one node? In-Reply-To: <20121005202216.GK17352@staff.dca.net> References: <20121005202216.GK17352@staff.dca.net> Message-ID: <20121009134406.GB9351@staff.dca.net> Replying to myself, I was out of the office yesterday but I come back in this morning and everything looks correct again. Apache is still stopped and nobody has touched the server since I stopped services on Friday. Very strange -Mike On Fri, Oct 05, 2012 at 04:22:16PM -0400, Michael Morgan wrote: > Hello, > > I have a 6 node CentOS 5.8 cluster with 4 nodes mounting a GFS2 filesystem. > Everything had been running nicely for about 2 years but over the past few > months I've had a strange occurence happen twice. One of the two web server > nodes will suddenly start listing the wrong directory contents, both nodes have > been affected at different times. This only seems to affect one or two > directories but it's hard to be certain since there are a large numer of them. 
> There are no errors logged anywhere on the cluster. Unmounting GFS2 on this > node usually causes a hang and eventual fence. The node will come back online > without issue and begin functioning normally again. > > Just a few minutes ago it started happening again. I currently have services > stopped but have not gone through the unmount/reboot process yet. Before I do > that I figured I'd check the list to see if anyone has come across this before. > Is there any GFS2/cluster information I should be dumping to track down the > cause? Any insight would be appreciated. Thanks. > > -Mike > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From queszama at yahoo.in Tue Oct 9 16:25:00 2012 From: queszama at yahoo.in (Zama Ques) Date: Wed, 10 Oct 2012 00:25:00 +0800 (SGT) Subject: [Linux-cluster] Choosing a fencing device Message-ID: <1349799900.92783.YahooMailNeo@web193006.mail.sg3.yahoo.com> Hi All, Need help in selecting the right fencing device for our HA cluster of two nodes . The server hardware used is HP Proliant Servers and OS we are using is CentOS 5 There are two options for us in selecting the fencing device . One is selecting a SAN Brocade switch. In this case , we will use ILO as secondary fencing device . Other option for us is using HP ILO as primary fencing device and IPMI fencing for secondary fencing Of the two options which will be better to go for configuring fencing .Any known issues with ILO or SAN Brocade switch in configuring fencing ?? Any suggestions will be greatly helpful.? Thanks Zaman -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Oct 9 16:58:41 2012 From: lists at alteeve.ca (Digimer) Date: Tue, 09 Oct 2012 12:58:41 -0400 Subject: [Linux-cluster] Choosing a fencing device In-Reply-To: <1349799900.92783.YahooMailNeo@web193006.mail.sg3.yahoo.com> References: <1349799900.92783.YahooMailNeo@web193006.mail.sg3.yahoo.com> Message-ID: <507457C1.1020402@alteeve.ca> On 10/09/2012 12:25 PM, Zama Ques wrote: > Hi All, > > Need help in selecting the right fencing device for our HA cluster of > two nodes . The server hardware used is HP Proliant Servers and OS we > are using is CentOS 5 > > There are two options for us in selecting the fencing device . > > > One is selecting a SAN Brocade switch. In this case , we will use ILO as > secondary fencing device . > > Other option for us is using HP ILO as primary fencing device and IPMI > fencing for secondary fencing > > > Of the two options which will be better to go for configuring fencing > .Any known issues with ILO or SAN Brocade switch in configuring fencing > ? Any suggestions will be greatly helpful. > > Thanks > Zaman There is no benefit to use fence_ilo and fence_ipmilan as they work on the same device... If one fails, the other will, too. Personally, I'd use fence_ipmilan (more tested than fence_ilo) as primary and SAN fencing as a backup in case the out of band management fails (as could happen if the node lost it's power). The reason I recommend the oob interface as primary is that power fencing has a chance of recovering the node where fabric fencing merely cuts it off, which is fine for fencing, but stays offline until an admin solves the problem and unfences the nodes. -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." 
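A minimal cluster.conf sketch of the layered approach described above -- IPMI-based power fencing tried first, Brocade fabric fencing as the backup. The node names, addresses, logins and switch ports below are placeholders rather than anyone's actual setup, and fabric fencing normally also wants an <unfence> section so the switch port is re-enabled once the node is healthy again:

  <cluster name="example" config_version="2">
    <clusternodes>
      <clusternode name="node1" nodeid="1">
        <fence>
          <!-- tried first: power-cycle the node via its BMC -->
          <method name="ipmi">
            <device name="ipmi_n1"/>
          </method>
          <!-- fallback: cut the node off at the SAN switch -->
          <method name="san">
            <device name="brocade1" port="1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2" nodeid="2">
        <fence>
          <method name="ipmi">
            <device name="ipmi_n2"/>
          </method>
          <method name="san">
            <device name="brocade1" port="2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice name="ipmi_n1" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret" lanplus="1"/>
      <fencedevice name="ipmi_n2" agent="fence_ipmilan" ipaddr="10.0.0.2" login="admin" passwd="secret" lanplus="1"/>
      <fencedevice name="brocade1" agent="fence_brocade" ipaddr="10.0.0.10" login="admin" passwd="secret"/>
    </fencedevices>
  </cluster>

Each node can then be tested with "fence_node <nodename>" before relying on the cluster to fence automatically.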
From shanti.pahari at sierra.sg Wed Oct 10 07:06:30 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Wed, 10 Oct 2012 15:06:30 +0800 (SGT) Subject: [Linux-cluster] cannot run cluster service Message-ID: Dear all, I have cluster setup with 2 node and created web cluster service on it but it cannot run. I have not listed anything in lvm.conf volume_list because once I add anything in volume_list and reboot the system then I cannot mount and even cannot read the lv which I created for my web . It throws error as error message "not activating volume group lv does not pass activation filter" Therefore I didn't add anything in lvm.conf . Then I try to start my cluster servers for web server but the service failed. Please help me so that I can solve this out. I have attached my cluster.conf , lvdisplay , /var/log/messages and lvm.conf and my /etc/hosts. I will be greatful if anyone can help me! Thanks And my clustat: Cluster Status for PDC-PIC-PL-CL @ Wed Oct 10 14:58:05 2012 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ PDC-PIC-PL-CL1 1 Online, Local, rgmanager PDC-PIC-PL-CL2 2 Online, rgmanager Service Name Owner (Last) State ------- ---- ----- ------ ----- service:ha-web-service (PDC-PIC-PL-CL1) failed -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: application/octet-stream Size: 1438 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lvdisplay.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lvm.conf Type: application/octet-stream Size: 24568 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: hosts.txt URL: From andrew at beekhof.net Wed Oct 10 10:47:17 2012 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 10 Oct 2012 21:47:17 +1100 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <50736F78.3060906@redhat.com> References: <50736F78.3060906@redhat.com> Message-ID: On Tue, Oct 9, 2012 at 11:27 AM, Chris Feist wrote: > We've been making improvements to the pcs (pacemaker/corosync configuration > system) command line tool over the past few months. > > Currently you can setup a basic cluster (including configuring corosync 2.0 > udpu). > > David Vossel has also created a version of the "Clusters from Scratch" > document that illustrates setting up a cluster using pcs. This should be > showing up shortly. Its now available at the usual location: http://www.clusterlabs.org/doc > > You can view the source here: https://github.com/feist/pcs/ > > Or download the latest tarball: > https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz > > There is also a Fedora 18 package that will be included with the next > release. You should be able to find that package in the following > locations... > > RPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > > SRPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm > > In the near future we are planning on having builds for SUSE & > Ubuntu/Debian. > > We're also actively working on a GUI/Daemon that will allow control of your > entire cluster from one node and/or a web browser. 
> > Please feel free to email me (cfeist at redhat.com) or open issues on the pcs > project at github (https://github.com/feist/pcs/issues) if you have any > questions or problems. > > Thanks! > Chris > > _______________________________________________ > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org From andrewd at sterling.net Wed Oct 10 15:27:02 2012 From: andrewd at sterling.net (Andrew Denton) Date: Wed, 10 Oct 2012 08:27:02 -0700 Subject: [Linux-cluster] cannot run cluster service In-Reply-To: References: Message-ID: <507593C6.5040708@sterling.net> On 10/10/2012 12:06 AM, Shanti Pahari wrote: > I have cluster setup with 2 node and created web cluster service on it > but it cannot run. > > I have not listed anything in lvm.conf volume_list because once I add > anything in volume_list and reboot the system then I cannot mount and > even cannot read the lv which I created for my web . It throws error as > > error message "not activating volume group lv does not pass activation > filter" > > Therefore I didn't add anything in lvm.conf . Then I try to start my > cluster servers for web server but the service failed. > I've seen this failure too when building my cluster. You either need to add the system's volume groups to volume_list, or tag the system's vgs with the @hostname so it can still activate them. e.g. volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on one node and volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on the other. Next it will complain about initrd being older than lvm.conf, so I've been running # mkinitrd -f /boot/initrd-`uname -r`.img `uname -r` Not sure if that's the right command but it works for me =) One of these days I'm going to tag the system's vgs properly so I can use the same lvm.conf across the nodes. I think it's something like lvchange --addtag PDC-PIC-PL-CL1 vg_pdcpicpl01/lv_root etc... By the way, to display how things are tagged, you have to do lvs -o +tags I wish it displayed them in lvdisplay, but it doesn't. -- Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 897 bytes Desc: OpenPGP digital signature URL: From lists at alteeve.ca Wed Oct 10 15:50:30 2012 From: lists at alteeve.ca (Digimer) Date: Wed, 10 Oct 2012 11:50:30 -0400 Subject: [Linux-cluster] cannot run cluster service In-Reply-To: References: Message-ID: <50759946.7060303@alteeve.ca> Your fencing is not setup properly. You define the fence devices, but do not use them as 's under each node definition. You should be able to 'fence_node ' and watch it get rebooted. Until then, your cluster will hang (by design) the first time there is a problem. Assuming you want to use clustered LVM, you need to change locking_type to '3' and, I advise, change 'falllback_to_local_locking' to '0'. This also requires that the 'clvmd' daemon is running, which in turn needs cman to be running first. What is your backing device for the LVM PV? digimer On 10/10/2012 03:06 AM, Shanti Pahari wrote: > Dear all, > > I have cluster setup with 2 node and created web cluster service on it > but it cannot run. 
> > I have not listed anything in lvm.conf volume_list because once I add > anything in volume_list and reboot the system then I cannot mount and > even cannot read the lv which I created for my web . It throws error as > > error message "not activating volume group lv does not pass activation > filter" > > Therefore I didn?t add anything in lvm.conf . Then I try to start my > cluster servers for web server but the service failed. > > Please help me so that I can solve this out. > > I have attached my cluster.conf , lvdisplay , /var/log/messages and > lvm.conf and my /etc/hosts. > > I will be greatful if anyone can help me! > > Thanks > > And my clustat: > > Cluster Status for PDC-PIC-PL-CL @ Wed Oct 10 14:58:05 2012 > > Member Status: Quorate > > Member Name ID Status > > ------ ---- ---- ------ > > PDC-PIC-PL-CL1 1 > Online, Local, rgmanager > > PDC-PIC-PL-CL2 2 > Online, rgmanager > > Service Name Owner > (Last) State > > ------- ---- ----- > ------ ----- > > service:ha-web-service > (PDC-PIC-PL-CL1) failed > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From CBurke at innova-partners.com Wed Oct 10 18:45:13 2012 From: CBurke at innova-partners.com (Chip Burke) Date: Wed, 10 Oct 2012 18:45:13 +0000 Subject: [Linux-cluster] Configure multiple heartbeat on a redhat cluster In-Reply-To: <1D241511770E2F4BA89AFD224EDD527141025BAE@G9W0733.americas.hpqcorp.net> Message-ID: I have been looking for an answer to this myself. The only answer I have found is using bonded interfaces. https://access.redhat.com/knowledge/node/48157 However, seeing that it uses multicast, I am not sure it say you have NICs on a production LAN and then NICs on an iSCSI LAN, that they all send/receive heartbeat packets to the multicast address on all attached LANs. ________________________________________ Chip Burke On 10/8/12 9:47 PM, "Chen, Ming Ming" wrote: > > > Hi, > Is there a way to configure multiple heartbeat network in the >/etc/cluster.conf file. >I'm using redhat cluster. >Regards >Ming > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Oct 11 02:36:48 2012 From: lists at alteeve.ca (Digimer) Date: Wed, 10 Oct 2012 22:36:48 -0400 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <50736F78.3060906@redhat.com> References: <50736F78.3060906@redhat.com> Message-ID: <507630C0.2010607@alteeve.ca> On 10/08/2012 08:27 PM, Chris Feist wrote: > We've been making improvements to the pcs (pacemaker/corosync > configuration system) command line tool over the past few months. > > Currently you can setup a basic cluster (including configuring corosync > 2.0 udpu). > > David Vossel has also created a version of the "Clusters from Scratch" > document that illustrates setting up a cluster using pcs. This should > be showing up shortly. > > You can view the source here: https://github.com/feist/pcs/ > > Or download the latest tarball: > https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz > > There is also a Fedora 18 package that will be included with the next > release. You should be able to find that package in the following > locations... 
> > RPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > > SRPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm > > In the near future we are planning on having builds for SUSE & > Ubuntu/Debian. > > We're also actively working on a GUI/Daemon that will allow control of > your entire cluster from one node and/or a web browser. > > Please feel free to email me (cfeist at redhat.com) or open issues on the > pcs project at github (https://github.com/feist/pcs/issues) if you have > any questions or problems. > > Thanks! > Chris Hi Chris, I started following Andrew's new pcs-based tutorial today on a fresh, minimal F17 x86_64 install. Section 2.5 of CfS-pcs shows; === yum install -y pcs 2.5 Setup # systemctl start pcsd.service # systemctl enable pcsd.service === This fails, and Andrew suggested using the version of pcs you annouced here. Same problem though; === [root at an-c01n01 ~]# rpm -Uvh http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm Retrieving http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm Preparing... ########################################### [100%] 1:pcs ########################################### [100%] [root at an-c01n01 ~]# systemctl start pcsd.service Failed to issue method call: Unit pcsd.service failed to load: No such file or directory. See system logs and 'systemctl status pcsd.service' for details. [root at an-c01n01 ~]# rpm -q pacemaker corosync pcs pacemaker-1.1.7-2.fc17.x86_64 corosync-2.0.1-1.fc17.x86_64 pcs-0.9.26-1.fc18.noarch === Any thoughts? Cheers! -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From shanti.pahari at sierra.sg Thu Oct 11 02:54:42 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Thu, 11 Oct 2012 10:54:42 +0800 (SGT) Subject: [Linux-cluster] cannot run cluster service In-Reply-To: <507593C6.5040708@sterling.net> References: <507593C6.5040708@sterling.net> Message-ID: <27d23437.00001f6c.000000ff@sierra-A66> Hi Andrew, Now I added following in my lvm.conf volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] and # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) # shutdown -r now "Activating ramdisk LVM changes" After that when the system tries to boot up: Kernel panic - not syncing: Attempted to kill init! So didn't have luck L volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] The hostname should be cluster connect name or initial hostname ? My /etc/hosts: 192.168.24.32 PDC-PIC-PL-01 PDC-PIC-PL-01.chcs.sg 192.168.25.132 PDC-PIC-PL-01-PM 192.168.26.13 PDC-PIC-PL-CL1 192.168.24.33 PDC-PIC-PL-02 PDC-PIC-PL-02.chcs.sg 192.168.25.133 PDC-PIC-PL-02-PM 192.168.26.14 PDC-PIC-PL-CL2 From: Andrew Denton [mailto:andrewd at sterling.net] Sent: Wednesday, 10 October, 2012 11:27 PM To: linux clustering Cc: Shanti Pahari Subject: Re: [Linux-cluster] cannot run cluster service On 10/10/2012 12:06 AM, Shanti Pahari wrote: I have cluster setup with 2 node and created web cluster service on it but it cannot run. I have not listed anything in lvm.conf volume_list because once I add anything in volume_list and reboot the system then I cannot mount and even cannot read the lv which I created for my web . It throws error as error message "not activating volume group lv does not pass activation filter" Therefore I didn't add anything in lvm.conf . 
Then I try to start my cluster servers for web server but the service failed. I've seen this failure too when building my cluster. You either need to add the system's volume groups to volume_list, or tag the system's vgs with the @hostname so it can still activate them. e.g. volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on one node and volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on the other. Next it will complain about initrd being older than lvm.conf, so I've been running # mkinitrd -f /boot/initrd-`uname -r`.img `uname -r` Not sure if that's the right command but it works for me =) One of these days I'm going to tag the system's vgs properly so I can use the same lvm.conf across the nodes. I think it's something like lvchange --addtag PDC-PIC-PL-CL1 vg_pdcpicpl01/lv_root etc... By the way, to display how things are tagged, you have to do lvs -o +tags I wish it displayed them in lvdisplay, but it doesn't. -- Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From shanti.pahari at sierra.sg Thu Oct 11 03:02:11 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Thu, 11 Oct 2012 11:02:11 +0800 (SGT) Subject: [Linux-cluster] cannot run cluster service In-Reply-To: <50759946.7060303@alteeve.ca> References: <50759946.7060303@alteeve.ca> Message-ID: Thanks! I updated the fencing method in each of the nodes. But after I added volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] and # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) # shutdown -r now "Activating ramdisk LVM changes" After that when the system tries to boot up: Kernel panic ? not syncing: Attempted to kill init! So didn?t have luck ? Can you help me pls! -----Original Message----- From: Digimer [mailto:lists at alteeve.ca] Sent: Wednesday, 10 October, 2012 11:51 PM To: linux clustering Cc: Shanti Pahari Subject: Re: [Linux-cluster] cannot run cluster service Your fencing is not setup properly. You define the fence devices, but do not use them as 's under each node definition. You should be able to 'fence_node ' and watch it get rebooted. Until then, your cluster will hang (by design) the first time there is a problem. Assuming you want to use clustered LVM, you need to change locking_type to '3' and, I advise, change 'falllback_to_local_locking' to '0'. This also requires that the 'clvmd' daemon is running, which in turn needs cman to be running first. What is your backing device for the LVM PV? digimer On 10/10/2012 03:06 AM, Shanti Pahari wrote: > Dear all, > > I have cluster setup with 2 node and created web cluster service on it > but it cannot run. > > I have not listed anything in lvm.conf volume_list because once I add > anything in volume_list and reboot the system then I cannot mount and > even cannot read the lv which I created for my web . It throws error > as > > error message "not activating volume group lv does not pass activation > filter" > > Therefore I didn?t add anything in lvm.conf . Then I try to start my > cluster servers for web server but the service failed. > > Please help me so that I can solve this out. > > I have attached my cluster.conf , lvdisplay , /var/log/messages and > lvm.conf and my /etc/hosts. > > I will be greatful if anyone can help me! 
> > Thanks > > And my clustat: > > Cluster Status for PDC-PIC-PL-CL @ Wed Oct 10 14:58:05 2012 > > Member Status: Quorate > > Member Name ID > Status > > ------ ---- ---- ------ > > PDC-PIC-PL-CL1 1 > Online, Local, rgmanager > > PDC-PIC-PL-CL2 2 > Online, rgmanager > > Service Name Owner > (Last) State > > ------- ---- ----- > ------ ----- > > service:ha-web-service > (PDC-PIC-PL-CL1) failed > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From ali.bendriss at gmail.com Thu Oct 11 10:03:02 2012 From: ali.bendriss at gmail.com (Ali Bendriss) Date: Thu, 11 Oct 2012 12:03:02 +0200 Subject: [Linux-cluster] locking_type Message-ID: <1552642.6zb82LHdvl@zapp> Hello, I'm runnning a two nodes clusters on linux: - current setup: cluster-3.1.93 LVM2-2.02.96 kernel 3.2.29 - previous setup: cluster-3.1.92 LVM2.2.02.84 kernel 3.4.3 Since the updrade to current setup, I'm only able to run clvmd if it is compiled using "--with-cluster=shared" and setting the locking_type = 2 in clvm.conf before that using the previous setup I was able to compile clvmd using "--with-cluster=internal" and setting the locking_type = 3 Is there any problem running a gfs2 fs with clvmd using the external shared library locking_library ? thanks -- Ali From ali.bendriss at gmail.com Thu Oct 11 10:13:45 2012 From: ali.bendriss at gmail.com (Ali Bendriss) Date: Thu, 11 Oct 2012 12:13:45 +0200 Subject: [Linux-cluster] snapshot status Message-ID: <1780785.LbQT9riKYZ@zapp> Hello, I'm runnning a two nodes clusters on linux using gfs2 (cluster-3.1.93, LVM2-2.02.96, kernel 3.2.29). I would like to know what is the current status of the snapshot support. I've got a third node that I would like to use for the backup. Could someone give me some hint about backuping a gfs2 shared file system. thanks -- Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From agk at redhat.com Thu Oct 11 10:36:05 2012 From: agk at redhat.com (Alasdair G Kergon) Date: Thu, 11 Oct 2012 11:36:05 +0100 Subject: [Linux-cluster] locking_type In-Reply-To: <1552642.6zb82LHdvl@zapp> References: <1552642.6zb82LHdvl@zapp> Message-ID: <20121011103604.GD2133@agk-dp.fab.redhat.com> On Thu, Oct 11, 2012 at 12:03:02PM +0200, Ali Bendriss wrote: > Since the updrade to current setup, I'm only able to run clvmd if it is > compiled using "--with-cluster=shared" and setting the locking_type = 2 in > clvm.conf What is the error you get? I don't think we stopped this working intentionally, but admittedly it's not a configuration we test very often. Alasdair From ali.bendriss at gmail.com Thu Oct 11 11:19:09 2012 From: ali.bendriss at gmail.com (Ali Bendriss) Date: Thu, 11 Oct 2012 13:19:09 +0200 Subject: [Linux-cluster] locking_type In-Reply-To: <20121011103604.GD2133@agk-dp.fab.redhat.com> References: <1552642.6zb82LHdvl@zapp> <20121011103604.GD2133@agk-dp.fab.redhat.com> Message-ID: <2094091.Y1l6vVcmm9@zapp> On Thursday, October 11, 2012 11:36:05 AM Alasdair G Kergon wrote: > On Thu, Oct 11, 2012 at 12:03:02PM +0200, Ali Bendriss wrote: > > Since the updrade to current setup, I'm only able to run clvmd if it is > > compiled using "--with-cluster=shared" and setting the locking_type = 2 in > > clvm.conf > > What is the error you get? 
> Starting the node with locking type = 3 and clvmd compiled using with- cluster=internal , I've got no error: clvmd: Cluster LVM daemon started - connected to CMAN but running for example vgscan desn't work: /sbin/vgscan connect() failed on local socket: No such file or directory Internal cluster locking initialisation failed. WARNING: Falling back to local file-based locking. Volume Groups with the clustered attribute will be inaccessible. Reading all physical volumes. This may take a while... Skipping clustered volume group samba4 Skipping clustered volume group ctdb Skipping clustered volume group shared Found volume group "main" using metadata type lvm2 the same command is working using clvmd compiled using the shared locking. more log below > I don't think we stopped this working intentionally, but admittedly it's > not a configuration we test very often. > I was thinking that "--with-cluster=internal", was the recommended configuration. > Alasdair -------------------------------------------------------------------------------------------- (using locking type = 3) and calling vgscan #clvmd -d 1 CLVMD[9d22b740]: Oct 11 13:03:51 CLVMD started CLVMD[9d22b740]: Oct 11 13:03:51 Connected to CMAN CLVMD[9d22b740]: Oct 11 13:03:51 CMAN initialisation complete CLVMD[9d22b740]: Oct 11 13:03:51 Created DLM lockspace for CLVMD. CLVMD[9d22b740]: Oct 11 13:03:51 DLM initialisation complete CLVMD[9d22b740]: Oct 11 13:03:51 Cluster ready, doing some more initialisation CLVMD[9d22b740]: Oct 11 13:03:51 starting LVM thread CLVMD[9d22a700]: Oct 11 13:03:51 LVM thread function started WARNING: Locking disabled. Be careful! This could corrupt your metadata. Incorrect metadata area header checksum on /dev/sdd at offset 4096 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for ve26qDQ7hcpgDH2fw19GFZkbKgTadysCNUNjh9w8HFdbVvQLBjZidl8QseraUBc0 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 've26qDQ7hcpgDH2fw19GFZkbKgTadysCNUNjh9w8HFdbVvQLBjZidl8QseraUBc0' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 1 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for ve26qDQ7hcpgDH2fw19GFZkbKgTadysCAXDuTvYJ4ambKnLALpOffDSxPrjHliO0 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 've26qDQ7hcpgDH2fw19GFZkbKgTadysCAXDuTvYJ4ambKnLALpOffDSxPrjHliO0' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 2 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for ByYnMJeHNgSIBuJENA2WLMe148edovbofR4f9clPHk2BUveeSUstcSEJzcOHt2BE CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 'ByYnMJeHNgSIBuJENA2WLMe148edovbofR4f9clPHk2BUveeSUstcSEJzcOHt2BE' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 3 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx6QSKUI7riC86QhzIX98cmu8rL4lHXJlO CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx6QSKUI7riC86QhzIX98cmu8rL4lHXJlO' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 4 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjxS36cpLJRPxWYtjPQChMIQZW7Zxx97aGL CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjxS36cpLJRPxWYtjPQChMIQZW7Zxx97aGL' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 5 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx7Vov15UStgsS1tgOISASG7bPjYf7NpYO CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx7Vov15UStgsS1tgOISASG7bPjYf7NpYO' mode:1 flags=1 CLVMD[9d22a700]: 
Oct 11 13:03:51 sync_lock: returning lkid 6 CLVMD[9d22a700]: Oct 11 13:03:51 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx1OWkvGiyJGfz9u5Pcedbzj4hnT2Q6TY0 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx1OWkvGiyJGfz9u5Pcedbzj4hnT2Q6TY0' mode:1 flags=1 CLVMD[9d22a700]: Oct 11 13:03:51 sync_lock: returning lkid 7 CLVMD[9d22a700]: Oct 11 13:03:51 Sub thread ready for work. CLVMD[9d22b740]: Oct 11 13:03:51 clvmd ready for work CLVMD[9d22b740]: Oct 11 13:03:51 Using timeout of 60 seconds CLVMD[9d22a700]: Oct 11 13:03:51 LVM thread waiting for work ------------------------------------------------------------------------------------------------------------------------------------- using locking type = 2 and calling vgscan # clvmd -d 1 CLVMD[6d153740]: Oct 11 13:07:57 CLVMD started CLVMD[6d153740]: Oct 11 13:07:57 Connected to CMAN CLVMD[6d153740]: Oct 11 13:07:57 CMAN initialisation complete CLVMD[6d153740]: Oct 11 13:07:57 Created DLM lockspace for CLVMD. CLVMD[6d153740]: Oct 11 13:07:57 DLM initialisation complete CLVMD[6d153740]: Oct 11 13:07:57 Cluster ready, doing some more initialisation CLVMD[6d153740]: Oct 11 13:07:57 starting LVM thread CLVMD[6d152700]: Oct 11 13:07:57 LVM thread function started WARNING: Locking disabled. Be careful! This could corrupt your metadata. Incorrect metadata area header checksum on /dev/sdd at offset 4096 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for ve26qDQ7hcpgDH2fw19GFZkbKgTadysCNUNjh9w8HFdbVvQLBjZidl8QseraUBc0 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 've26qDQ7hcpgDH2fw19GFZkbKgTadysCNUNjh9w8HFdbVvQLBjZidl8QseraUBc0' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 1 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for ve26qDQ7hcpgDH2fw19GFZkbKgTadysCAXDuTvYJ4ambKnLALpOffDSxPrjHliO0 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 've26qDQ7hcpgDH2fw19GFZkbKgTadysCAXDuTvYJ4ambKnLALpOffDSxPrjHliO0' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 2 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for ByYnMJeHNgSIBuJENA2WLMe148edovbofR4f9clPHk2BUveeSUstcSEJzcOHt2BE CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 'ByYnMJeHNgSIBuJENA2WLMe148edovbofR4f9clPHk2BUveeSUstcSEJzcOHt2BE' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 3 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx6QSKUI7riC86QhzIX98cmu8rL4lHXJlO CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx6QSKUI7riC86QhzIX98cmu8rL4lHXJlO' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 4 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjxS36cpLJRPxWYtjPQChMIQZW7Zxx97aGL CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjxS36cpLJRPxWYtjPQChMIQZW7Zxx97aGL' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 5 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx7Vov15UStgsS1tgOISASG7bPjYf7NpYO CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx7Vov15UStgsS1tgOISASG7bPjYf7NpYO' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: returning lkid 6 CLVMD[6d152700]: Oct 11 13:07:57 getting initial lock for GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx1OWkvGiyJGfz9u5Pcedbzj4hnT2Q6TY0 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 'GkJTcHFLl6YNc8M0QN7yYA0nWwhzHKjx1OWkvGiyJGfz9u5Pcedbzj4hnT2Q6TY0' mode:1 flags=1 CLVMD[6d152700]: Oct 11 13:07:57 sync_lock: 
returning lkid 7 Incorrect LVM locking library specified in lvm.conf, cluster operations may not work. CLVMD[6d152700]: Oct 11 13:07:57 Sub thread ready for work. CLVMD[6d152700]: Oct 11 13:07:57 LVM thread waiting for work CLVMD[6d153740]: Oct 11 13:07:57 clvmd ready for work CLVMD[6d153740]: Oct 11 13:07:57 Using timeout of 60 seconds CLVMD[6d153740]: Oct 11 13:08:56 Got new connection on fd 11 CLVMD[6d153740]: Oct 11 13:08:56 Read on local socket 11, len = 29 CLVMD[6d153740]: Oct 11 13:08:56 check_all_clvmds_running CLVMD[6d153740]: Oct 11 13:08:56 creating pipe, [12, 13] CLVMD[6d153740]: Oct 11 13:08:56 Creating pre&post thread CLVMD[6d153740]: Oct 11 13:08:56 Created pre&post thread, state = 0 CLVMD[6d131700]: Oct 11 13:08:56 in sub thread: client = 0x12d06d0 CLVMD[6d131700]: Oct 11 13:08:56 doing PRE command LOCK_VG 'P_#global' at 4 (client=0x12d06d0) CLVMD[6d131700]: Oct 11 13:08:56 sync_lock: 'P_#global' mode:4 flags=0 CLVMD[6d131700]: Oct 11 13:08:56 sync_lock: returning lkid 8 CLVMD[6d131700]: Oct 11 13:08:56 Writing status 0 down pipe 13 CLVMD[6d131700]: Oct 11 13:08:56 Waiting to do post command - state = 0 CLVMD[6d153740]: Oct 11 13:08:56 read on PIPE 12: 4 bytes: status: 0 CLVMD[6d153740]: Oct 11 13:08:56 background routine status was 0, sock_client=0x12d06d0 CLVMD[6d153740]: Oct 11 13:08:56 distribute command: XID = 0, flags=0x0 () CLVMD[6d153740]: Oct 11 13:08:56 add_to_lvmqueue: cmd=0x12d0a10. client=0x12d06d0, msg=0x12d02f0, len=29, csid=(nil), xid=0 CLVMD[6d153740]: Oct 11 13:08:56 Sending message to all cluster nodes CLVMD[6d152700]: Oct 11 13:08:56 process_work_item: local CLVMD[6d152700]: Oct 11 13:08:56 process_local_command: LOCK_VG (0x33) msg=0x12d0a50, msglen =29, client=0x12d06d0 CLVMD[6d152700]: Oct 11 13:08:56 do_lock_vg: resource 'P_#global', cmd = 0x4 LCK_VG (WRITE|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0 CLVMD[6d152700]: Oct 11 13:08:56 Refreshing context Incorrect metadata area header checksum on /dev/sdd at offset 4096 CLVMD[6d152700]: Oct 11 13:08:56 Reply from node node-10: 0 bytes CLVMD[6d152700]: Oct 11 13:08:56 Got 1 replies, expecting: 2 CLVMD[6d152700]: Oct 11 13:08:56 LVM thread waiting for work CLVMD[6d153740]: Oct 11 13:08:56 Reply from node node-11: 0 bytes CLVMD[6d153740]: Oct 11 13:08:56 Got 2 replies, expecting: 2 CLVMD[6d131700]: Oct 11 13:08:56 Got post command condition... CLVMD[6d131700]: Oct 11 13:08:56 Waiting for next pre command CLVMD[6d153740]: Oct 11 13:08:56 read on PIPE 12: 4 bytes: status: 0 CLVMD[6d153740]: Oct 11 13:08:56 background routine status was 0, sock_client=0x12d06d0 CLVMD[6d153740]: Oct 11 13:08:56 Send local reply CLVMD[6d153740]: Oct 11 13:08:56 Read on local socket 11, len = 28 CLVMD[6d131700]: Oct 11 13:08:56 Got pre command condition... CLVMD[6d131700]: Oct 11 13:08:56 doing PRE command LOCK_VG 'V_samba4' at 1 (client=0x12d06d0) CLVMD[6d131700]: Oct 11 13:08:56 sync_lock: 'V_samba4' mode:3 flags=0 CLVMD[6d131700]: Oct 11 13:08:56 sync_lock: returning lkid 9 CLVMD[6d131700]: Oct 11 13:08:56 Writing status 0 down pipe 13 CLVMD[6d131700]: Oct 11 13:08:56 Waiting to do post command - state = 0 CLVMD[6d153740]: Oct 11 13:08:56 read on PIPE 12: 4 bytes: status: 0 CLVMD[6d153740]: Oct 11 13:08:56 background routine status was 0, sock_client=0x12d06d0 CLVMD[6d153740]: Oct 11 13:08:56 distribute command: XID = 1, flags=0x1 (LOCAL) CLVMD[6d153740]: Oct 11 13:08:56 add_to_lvmqueue: cmd=0x12d0a10. 
client=0x12d06d0, msg=0x12d02f0, len=28, csid=(nil), xid=1 CLVMD[6d152700]: Oct 11 13:08:56 process_work_item: local CLVMD[6d152700]: Oct 11 13:08:56 process_local_command: LOCK_VG (0x33) msg=0x12d0a50, msglen =28, client=0x12d06d0 CLVMD[6d152700]: Oct 11 13:08:56 do_lock_vg: resource 'V_samba4', cmd = 0x1 LCK_VG (READ|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0 CLVMD[6d152700]: Oct 11 13:08:56 Invalidating cached metadata for VG samba4 ... From jpokorny at redhat.com Thu Oct 11 11:25:54 2012 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Thu, 11 Oct 2012 13:25:54 +0200 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <507630C0.2010607@alteeve.ca> References: <50736F78.3060906@redhat.com> <507630C0.2010607@alteeve.ca> Message-ID: <20121011112554.GC29887@redhat.com> Hello Digimer, On 10/10/12 22:36 -0400, Digimer wrote: > I started following Andrew's new pcs-based tutorial today on a fresh, > minimal F17 x86_64 install. Section 2.5 of CfS-pcs shows; > > === > yum install -y pcs > > 2.5 Setup > > > > # systemctl start pcsd.service > # systemctl enable pcsd.service > === > > This fails, and Andrew suggested using the version of pcs you annouced here. > Same problem though; > > === > [root at an-c01n01 ~]# rpm -Uvh > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > Retrieving http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > Preparing... ########################################### > [100%] > 1:pcs ########################################### > [100%] > [root at an-c01n01 ~]# systemctl start pcsd.service > Failed to issue method call: Unit pcsd.service failed to load: No such file > or directory. See system logs and 'systemctl status pcsd.service' for > details. > > [...] > > Any thoughts? this is part of pcs-gui project [1] packaging of which is probably pending. [1] https://github.com/feist/pcs-gui -- Jan From shanti.pahari at sierra.sg Thu Oct 11 13:43:23 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Thu, 11 Oct 2012 21:43:23 +0800 (SGT) Subject: [Linux-cluster] lvm.conf for HA LVM Message-ID: <0a55cf02.00000e4c.00000039@sierra-A66> Hi all, Please help me to configure lvm.conf . I always get problem when I add volume_list in lvm.conf file. After I add volume_list = [ "vg_pdcpicpl101" , "@pdc-pic-pl-01" ] and reboot it later my other lvm goes down L cannot initialize it. Or do I need to add all my VG in volume list? Please help! Thanks, Shanti -------------- next part -------------- An HTML attachment was scrubbed... URL: From bergman at merctech.com Thu Oct 11 17:01:37 2012 From: bergman at merctech.com (bergman at merctech.com) Date: Thu, 11 Oct 2012 13:01:37 -0400 Subject: [Linux-cluster] cannot run cluster service In-Reply-To: Your message of "Thu, 11 Oct 2012 11:02:11 +0800." References: <50759946.7060303@alteeve.ca> Message-ID: <4136.1349974897@localhost> In the message dated: Thu, 11 Oct 2012 11:02:11 +0800, The pithy ruminations from "Shanti Pahari" on were: => Thanks! => => I updated the fencing method in each of the nodes. But after I added => volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] => and => # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) Note that the lvm.conf data that is stored within the initrd.img is not an exact copy of /etc/lvm/lvm.conf, but it is a filtered version, created by "lvm dumpconfig". Previous bugs have meant that the embedded version was not alway bootable...perhaps you're having a similar problem. 
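One quick way to check which lvm.conf a given ramdisk actually carries is to extract just that file and diff it against the on-disk copy. This is only a sketch, assuming a gzip-compressed dracut image; the image path is an example:

# mkdir /tmp/initrd-check && cd /tmp/initrd-check
# gunzip -c /boot/initramfs-$(uname -r).img | cpio -idmv '*etc/lvm/lvm.conf'
# diff -u etc/lvm/lvm.conf /etc/lvm/lvm.conf

If the two copies differ in volume_list (or the embedded copy is missing it entirely), the ramdisk needs to be rebuilt before the change takes effect at boot.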
=> # shutdown -r now "Activating ramdisk LVM changes" => => After that when the system tries to boot up: => Kernel panic ??? not syncing: Attempted to kill init! I don't recall information from you about what OS distribution you are using, but this problem sounds similar to: https://bugzilla.redhat.com/show_bug.cgi?id=517868 Before giving up on LVM and and moving all filesystem management out of RHCS control, I had several problems with the the embedded copy of /etc/lvm.conf that's stored within the initrd image. I ended up with a procedure where I would replace the embedded version (produced with "lvm dumpconfig" by the mkinitrd process) inside the initrd image with the full working version. This procedure was necessary after any kernel or lvm.conf change. An abbreviated set of steps (ommitting all error checking, etc) is: gunzip < $initrdfile > /tmp/initrd.unzipped.img cpio -i --make-directories < /tmp/initrd.unzipped.img cd /tmp/initrd.$$/etc/lvm cp /etc/lvm/lvm.conf . cd /tmp/initrd.$$ find ./ | cpio -H newc -o > /tmp/initrd.unzipped.img gzip < /tmp/initrd.unzipped.img > /tmp/initrd.zipped.img mv /tmp/initrd.zipped.img $initrdfile I do not recommend this as a standard procedure, but it might be worth doing once, in order to see whether the embedded version of the lvm.conf file used in the initrd.img is causing your problem. Mark => => So didn???t have luck ??? => => Can you help me pls! => => -----Original Message----- => From: Digimer [mailto:lists at alteeve.ca] => Sent: Wednesday, 10 October, 2012 11:51 PM => To: linux clustering => Cc: Shanti Pahari => Subject: Re: [Linux-cluster] cannot run cluster service => => Your fencing is not setup properly. You define the fence devices, but do not => use them as 's under each node definition. You should be able to => 'fence_node ' and watch it get rebooted. Until then, your cluster => will hang (by design) the first time there is a problem. => => Assuming you want to use clustered LVM, you need to change locking_type to => '3' and, I advise, change 'falllback_to_local_locking' to '0'. This also => requires that the 'clvmd' daemon is running, which in turn needs cman to be => running first. => => What is your backing device for the LVM PV? => => digimer => => On 10/10/2012 03:06 AM, Shanti Pahari wrote: => > Dear all, => > => > I have cluster setup with 2 node and created web cluster service on it => > but it cannot run. => > => > I have not listed anything in lvm.conf volume_list because once I add => > anything in volume_list and reboot the system then I cannot mount and => > even cannot read the lv which I created for my web . It throws error => > as => > => > error message "not activating volume group lv does not pass activation => > filter" => > => > Therefore I didn???t add anything in lvm.conf . Then I try to start my => > cluster servers for web server but the service failed. => > => > Please help me so that I can solve this out. => > => > I have attached my cluster.conf , lvdisplay , /var/log/messages and => > lvm.conf and my /etc/hosts. => > => > I will be greatful if anyone can help me! 
=> > => > Thanks => > => > And my clustat: => > => > Cluster Status for PDC-PIC-PL-CL @ Wed Oct 10 14:58:05 2012 => > => > Member Status: Quorate => > => > Member Name ID => > Status => > => > ------ ---- ---- ------ => > => > PDC-PIC-PL-CL1 1 => > Online, Local, rgmanager => > => > PDC-PIC-PL-CL2 2 => > Online, rgmanager => > => > Service Name Owner => > (Last) State => > => > ------- ---- ----- => > ------ ----- => > => > service:ha-web-service => > (PDC-PIC-PL-CL1) failed => > => > => > => => => -- => Digimer => Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, => odorless gas which, if left alone in sufficient quantities for long periods => of time, begins to think about itself." => => -- => Linux-cluster mailing list => Linux-cluster at redhat.com => https://www.redhat.com/mailman/listinfo/linux-cluster => From lists at alteeve.ca Thu Oct 11 17:14:10 2012 From: lists at alteeve.ca (Digimer) Date: Thu, 11 Oct 2012 13:14:10 -0400 Subject: [Linux-cluster] cannot run cluster service In-Reply-To: References: <50759946.7060303@alteeve.ca> Message-ID: <5076FE62.9000707@alteeve.ca> I don't think you addressed any of my comments. On 10/10/2012 11:02 PM, Shanti Pahari wrote: > Thanks! > > I updated the fencing method in each of the nodes. But after I added > volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] > and > # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) > # shutdown -r now "Activating ramdisk LVM changes" > > After that when the system tries to boot up: > Kernel panic ? not syncing: Attempted to kill init! > > So didn?t have luck ? > > Can you help me pls! > > -----Original Message----- > From: Digimer [mailto:lists at alteeve.ca] > Sent: Wednesday, 10 October, 2012 11:51 PM > To: linux clustering > Cc: Shanti Pahari > Subject: Re: [Linux-cluster] cannot run cluster service > > Your fencing is not setup properly. You define the fence devices, but do not > use them as 's under each node definition. You should be able to > 'fence_node ' and watch it get rebooted. Until then, your cluster > will hang (by design) the first time there is a problem. > > Assuming you want to use clustered LVM, you need to change locking_type to > '3' and, I advise, change 'falllback_to_local_locking' to '0'. This also > requires that the 'clvmd' daemon is running, which in turn needs cman to be > running first. > > What is your backing device for the LVM PV? > > digimer > > On 10/10/2012 03:06 AM, Shanti Pahari wrote: >> Dear all, >> >> I have cluster setup with 2 node and created web cluster service on it >> but it cannot run. >> >> I have not listed anything in lvm.conf volume_list because once I add >> anything in volume_list and reboot the system then I cannot mount and >> even cannot read the lv which I created for my web . It throws error >> as >> >> error message "not activating volume group lv does not pass activation >> filter" >> >> Therefore I didn?t add anything in lvm.conf . Then I try to start my >> cluster servers for web server but the service failed. >> >> Please help me so that I can solve this out. >> >> I have attached my cluster.conf , lvdisplay , /var/log/messages and >> lvm.conf and my /etc/hosts. >> >> I will be greatful if anyone can help me! 
>> >> Thanks >> >> And my clustat: >> >> Cluster Status for PDC-PIC-PL-CL @ Wed Oct 10 14:58:05 2012 >> >> Member Status: Quorate >> >> Member Name ID >> Status >> >> ------ ---- ---- ------ >> >> PDC-PIC-PL-CL1 1 >> Online, Local, rgmanager >> >> PDC-PIC-PL-CL2 2 >> Online, rgmanager >> >> Service Name Owner >> (Last) State >> >> ------- ---- ----- >> ------ ----- >> >> service:ha-web-service >> (PDC-PIC-PL-CL1) failed >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, > odorless gas which, if left alone in sufficient quantities for long periods > of time, begins to think about itself." > -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From mkathuria at tuxtechnologies.co.in Thu Oct 11 17:36:17 2012 From: mkathuria at tuxtechnologies.co.in (Manish Kathuria) Date: Thu, 11 Oct 2012 23:06:17 +0530 Subject: [Linux-cluster] lvm.conf for HA LVM In-Reply-To: <0a55cf02.00000e4c.00000039@sierra-A66> References: <0a55cf02.00000e4c.00000039@sierra-A66> Message-ID: On Thu, Oct 11, 2012 at 7:13 PM, Shanti Pahari wrote: > Hi all, > > > > Please help me to configure lvm.conf . > > I always get problem when I add volume_list in lvm.conf file. > > After I add volume_list = [ ?vg_pdcpicpl101? , ?@pdc-pic-pl-01? ] and > reboot it later my other lvm goes down L cannot initialize it. Or do I need > to add all my VG in volume list? The Volume Groups which are to be shared using HA LVM should not be added to this volume list. They need to be included as resources in the cluster configuration. > You can refer to the following document for the steps to configure HA LVM https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html Thanks, -- Manish Kathuria From gumbs.alfred at att.net Thu Oct 11 17:49:25 2012 From: gumbs.alfred at att.net (Alfred Gumbs) Date: Thu, 11 Oct 2012 12:49:25 -0500 Subject: [Linux-cluster] cannot run cluster service In-Reply-To: <27d23437.00001f6c.000000ff@sierra-A66> References: <507593C6.5040708@sterling.net> <27d23437.00001f6c.000000ff@sierra-A66> Message-ID: <318E18D064474C05AA6F41B04602BF0A@AlfredPC> I'm not certain of your complete configuration. However looking at the entry that you placed in the volume_list. It looks like you listed the VG that is part of your cluster's resource group. However, the volume_lists should actually list all the volume groups that are not part of the cluster. The VG's in the volume_list are actually the ones that the system needs to bring up. The volume group that are part of the cluster will be brought up by rgmanager, so they should not be in volumes_list. The reason for the system panic is because the required volume groups were not listed in volume_list. So the kernel could not varyon the system VG. If I have mistakenly interprettted what you've done I'm sorry. ----- Original Message ----- From: Shanti Pahari To: 'Andrew Denton' ; 'linux clustering' Sent: Wednesday, October 10, 2012 9:54 PM Subject: Re: [Linux-cluster] cannot run cluster service Hi Andrew, Now I added following in my lvm.conf volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] and # dracut --hostonly --force /boot/initramfs-$(uname -r).img $(uname -r) # shutdown -r now "Activating ramdisk LVM changes" After that when the system tries to boot up: Kernel panic - not syncing: Attempted to kill init! 
So didn't have luck L volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] The hostname should be cluster connect name or initial hostname ? My /etc/hosts: 192.168.24.32 PDC-PIC-PL-01 PDC-PIC-PL-01.chcs.sg 192.168.25.132 PDC-PIC-PL-01-PM 192.168.26.13 PDC-PIC-PL-CL1 192.168.24.33 PDC-PIC-PL-02 PDC-PIC-PL-02.chcs.sg 192.168.25.133 PDC-PIC-PL-02-PM 192.168.26.14 PDC-PIC-PL-CL2 From: Andrew Denton [mailto:andrewd at sterling.net] Sent: Wednesday, 10 October, 2012 11:27 PM To: linux clustering Cc: Shanti Pahari Subject: Re: [Linux-cluster] cannot run cluster service On 10/10/2012 12:06 AM, Shanti Pahari wrote: I have cluster setup with 2 node and created web cluster service on it but it cannot run. I have not listed anything in lvm.conf volume_list because once I add anything in volume_list and reboot the system then I cannot mount and even cannot read the lv which I created for my web . It throws error as error message "not activating volume group lv does not pass activation filter" Therefore I didn't add anything in lvm.conf . Then I try to start my cluster servers for web server but the service failed. I've seen this failure too when building my cluster. You either need to add the system's volume groups to volume_list, or tag the system's vgs with the @hostname so it can still activate them. e.g. volume_list = [ "vg_pdcpicpl01", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on one node and volume_list = [ "vg_pdcpicpl02", "@PDC-PIC-PL-CL1", "@PDC-PIC-PL-CL2" ] on the other. Next it will complain about initrd being older than lvm.conf, so I've been running # mkinitrd -f /boot/initrd-`uname -r`.img `uname -r` Not sure if that's the right command but it works for me =) One of these days I'm going to tag the system's vgs properly so I can use the same lvm.conf across the nodes. I think it's something like lvchange --addtag PDC-PIC-PL-CL1 vg_pdcpicpl01/lv_root etc... By the way, to display how things are tagged, you have to do lvs -o +tags I wish it displayed them in lvdisplay, but it doesn't. -- Andrew ------------------------------------------------------------------------------ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Oct 11 18:46:49 2012 From: lists at alteeve.ca (Digimer) Date: Thu, 11 Oct 2012 14:46:49 -0400 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <20121011112554.GC29887@redhat.com> References: <50736F78.3060906@redhat.com> <507630C0.2010607@alteeve.ca> <20121011112554.GC29887@redhat.com> Message-ID: <50771419.1000306@alteeve.ca> On 10/11/2012 07:25 AM, Jan Pokorn? wrote: > Hello Digimer, > > On 10/10/12 22:36 -0400, Digimer wrote: >> I started following Andrew's new pcs-based tutorial today on a fresh, >> minimal F17 x86_64 install. Section 2.5 of CfS-pcs shows; >> >> === >> yum install -y pcs >> >> 2.5 Setup >> >> >> >> # systemctl start pcsd.service >> # systemctl enable pcsd.service >> === >> >> This fails, and Andrew suggested using the version of pcs you annouced here. >> Same problem though; >> >> === >> [root at an-c01n01 ~]# rpm -Uvh >> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >> Retrieving http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >> Preparing... 
########################################### >> [100%] >> 1:pcs ########################################### >> [100%] >> [root at an-c01n01 ~]# systemctl start pcsd.service >> Failed to issue method call: Unit pcsd.service failed to load: No such file >> or directory. See system logs and 'systemctl status pcsd.service' for >> details. >> >> [...] >> >> Any thoughts? > > this is part of pcs-gui project [1] packaging of which is probably pending. > > [1] https://github.com/feist/pcs-gui Ah, so the daemon isn't needed if a user doesn't care to use the GUI? -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From andrew at beekhof.net Fri Oct 12 01:00:11 2012 From: andrew at beekhof.net (Andrew Beekhof) Date: Fri, 12 Oct 2012 12:00:11 +1100 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <50771419.1000306@alteeve.ca> References: <50736F78.3060906@redhat.com> <507630C0.2010607@alteeve.ca> <20121011112554.GC29887@redhat.com> <50771419.1000306@alteeve.ca> Message-ID: On Fri, Oct 12, 2012 at 5:46 AM, Digimer wrote: > On 10/11/2012 07:25 AM, Jan Pokorn? wrote: >> >> Hello Digimer, >> >> On 10/10/12 22:36 -0400, Digimer wrote: >>> >>> I started following Andrew's new pcs-based tutorial today on a fresh, >>> minimal F17 x86_64 install. Section 2.5 of CfS-pcs shows; >>> >>> === >>> yum install -y pcs >>> >>> 2.5 Setup >>> >>> >>> >>> # systemctl start pcsd.service >>> # systemctl enable pcsd.service >>> === >>> >>> This fails, and Andrew suggested using the version of pcs you annouced >>> here. >>> Same problem though; >>> >>> === >>> [root at an-c01n01 ~]# rpm -Uvh >>> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >>> Retrieving >>> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >>> Preparing... ########################################### >>> [100%] >>> 1:pcs ########################################### >>> [100%] >>> [root at an-c01n01 ~]# systemctl start pcsd.service >>> Failed to issue method call: Unit pcsd.service failed to load: No such >>> file >>> or directory. See system logs and 'systemctl status pcsd.service' for >>> details. >>> >>> [...] >>> >>> Any thoughts? >> >> >> this is part of pcs-gui project [1] packaging of which is probably >> pending. >> >> [1] https://github.com/feist/pcs-gui > > > Ah, so the daemon isn't needed if a user doesn't care to use the GUI? I believe it is needed if you want to do anything more than talk to the local node. Which includes initial cluster setup. I talked to Chris just now, he wanted to add PAM support (instead of using pcs_passwd) before releasing that part for upstream. New packages including the daemon pieces (with PAM support) should land in the next day or so. From lists at alteeve.ca Fri Oct 12 01:26:29 2012 From: lists at alteeve.ca (Digimer) Date: Thu, 11 Oct 2012 21:26:29 -0400 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: References: <50736F78.3060906@redhat.com> <507630C0.2010607@alteeve.ca> <20121011112554.GC29887@redhat.com> <50771419.1000306@alteeve.ca> Message-ID: <507771C5.2080001@alteeve.ca> On 10/11/2012 09:00 PM, Andrew Beekhof wrote: > On Fri, Oct 12, 2012 at 5:46 AM, Digimer wrote: >> On 10/11/2012 07:25 AM, Jan Pokorn? 
wrote: >>> >>> Hello Digimer, >>> >>> On 10/10/12 22:36 -0400, Digimer wrote: >>>> >>>> I started following Andrew's new pcs-based tutorial today on a fresh, >>>> minimal F17 x86_64 install. Section 2.5 of CfS-pcs shows; >>>> >>>> === >>>> yum install -y pcs >>>> >>>> 2.5 Setup >>>> >>>> >>>> >>>> # systemctl start pcsd.service >>>> # systemctl enable pcsd.service >>>> === >>>> >>>> This fails, and Andrew suggested using the version of pcs you annouced >>>> here. >>>> Same problem though; >>>> >>>> === >>>> [root at an-c01n01 ~]# rpm -Uvh >>>> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >>>> Retrieving >>>> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >>>> Preparing... ########################################### >>>> [100%] >>>> 1:pcs ########################################### >>>> [100%] >>>> [root at an-c01n01 ~]# systemctl start pcsd.service >>>> Failed to issue method call: Unit pcsd.service failed to load: No such >>>> file >>>> or directory. See system logs and 'systemctl status pcsd.service' for >>>> details. >>>> >>>> [...] >>>> >>>> Any thoughts? >>> >>> >>> this is part of pcs-gui project [1] packaging of which is probably >>> pending. >>> >>> [1] https://github.com/feist/pcs-gui >> >> >> Ah, so the daemon isn't needed if a user doesn't care to use the GUI? > > I believe it is needed if you want to do anything more than talk to > the local node. Which includes initial cluster setup. > I talked to Chris just now, he wanted to add PAM support (instead of > using pcs_passwd) before releasing that part for upstream. > > New packages including the daemon pieces (with PAM support) should > land in the next day or so. > Awesome, I'll try it out once it's available. -- Digimer Papers and Projects: https://alteeve.ca/w/ "Hydrogen is just a colourless, odorless gas which, if left alone in sufficient quantities for long periods of time, begins to think about itself." From a.holway at syseleven.de Fri Oct 12 15:22:39 2012 From: a.holway at syseleven.de (Andrew Holway) Date: Fri, 12 Oct 2012 17:22:39 +0200 Subject: [Linux-cluster] some clvmd / locking problem Message-ID: <1D60B84E-0ABD-4A5C-8E53-67D1EA796537@syseleven.de> Hello, I am trying to set up a 4 node cluster with a shared iSCSI storage device. I cannot start clvmd: service clvmd start just hangs. I cannot stop cman: node001: Working directory: /root node001: Stopping cluster: node001: Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd node001: fence_tool: cannot leave due to active systems node001: [FAILED] I find these errors in /var/log/cluster/dlm_controld.log Oct 12 17:12:13 dlm_controld daemon cpg_join error retrying Oct 12 17:12:23 dlm_controld daemon cpg_join error retrying Oct 12 17:12:33 dlm_controld daemon cpg_join error retrying [root at node001 clvmd]# clvmd status clvmd failed in initialisation Any ideas? Thanks, Andrew From lists at alteeve.ca Fri Oct 12 15:28:36 2012 From: lists at alteeve.ca (Digimer) Date: Fri, 12 Oct 2012 11:28:36 -0400 Subject: [Linux-cluster] some clvmd / locking problem In-Reply-To: <1D60B84E-0ABD-4A5C-8E53-67D1EA796537@syseleven.de> References: <1D60B84E-0ABD-4A5C-8E53-67D1EA796537@syseleven.de> Message-ID: <50783724.5040703@alteeve.ca> On 10/12/2012 11:22 AM, Andrew Holway wrote: > Hello, > > I am trying to set up a 4 node cluster with a shared iSCSI storage device. > > I cannot start clvmd: service clvmd start just hangs. 
> > I cannot stop cman: > > node001: Working directory: /root > node001: Stopping cluster: > node001: Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd > node001: fence_tool: cannot leave due to active systems > node001: [FAILED] > > I find these errors in /var/log/cluster/dlm_controld.log > > Oct 12 17:12:13 dlm_controld daemon cpg_join error retrying > Oct 12 17:12:23 dlm_controld daemon cpg_join error retrying > Oct 12 17:12:33 dlm_controld daemon cpg_join error retrying > > [root at node001 clvmd]# clvmd status > clvmd failed in initialisation > > Any ideas? > > Thanks, > > Andrew Can you paste your cluster.conf please? I suspect something went wrong, it tried to fence and then failed to do so, so it's blocked. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From a.holway at syseleven.de Fri Oct 12 18:52:35 2012 From: a.holway at syseleven.de (Andrew Holway) Date: Fri, 12 Oct 2012 20:52:35 +0200 Subject: [Linux-cluster] some clvmd / locking problem In-Reply-To: <50783724.5040703@alteeve.ca> References: <1D60B84E-0ABD-4A5C-8E53-67D1EA796537@syseleven.de> <50783724.5040703@alteeve.ca> Message-ID: <48CCA736-1A1B-43F5-B94E-8C9EE28E0384@syseleven.de> On Oct 12, 2012, at 5:28 PM, Digimer wrote: > On 10/12/2012 11:22 AM, Andrew Holway wrote: >> Hello, >> >> I am trying to set up a 4 node cluster with a shared iSCSI storage device. >> >> I cannot start clvmd: service clvmd start just hangs. >> >> I cannot stop cman: >> >> node001: Working directory: /root >> node001: Stopping cluster: >> node001: Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd >> node001: fence_tool: cannot leave due to active systems >> node001: [FAILED] >> >> I find these errors in /var/log/cluster/dlm_controld.log >> >> Oct 12 17:12:13 dlm_controld daemon cpg_join error retrying >> Oct 12 17:12:23 dlm_controld daemon cpg_join error retrying >> Oct 12 17:12:33 dlm_controld daemon cpg_join error retrying >> >> [root at node001 clvmd]# clvmd status >> clvmd failed in initialisation >> >> Any ideas? >> >> Thanks, >> >> Andrew > > Can you paste your cluster.conf please? I suspect something went wrong, it tried to fence and then failed to do so, so it's blocked. :) I had two node id's the same. Thanks Andrew > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access to education? From shanti.pahari at sierra.sg Fri Oct 12 19:47:19 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Sat, 13 Oct 2012 03:47:19 +0800 (SGT) Subject: [Linux-cluster] cannot detect SAN disk in RHEL6.1 Message-ID: <423ded17.00000e6c.00000023@sierra-A66> Hi , When I added FC external disk in RHEL 6.1 it didn't load in /dev/mapper . After reboot also it didn't detect. Any help ? Highly appreciated. thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at alteeve.ca Fri Oct 12 19:53:12 2012 From: lists at alteeve.ca (Digimer) Date: Fri, 12 Oct 2012 15:53:12 -0400 Subject: [Linux-cluster] some clvmd / locking problem In-Reply-To: <48CCA736-1A1B-43F5-B94E-8C9EE28E0384@syseleven.de> References: <1D60B84E-0ABD-4A5C-8E53-67D1EA796537@syseleven.de> <50783724.5040703@alteeve.ca> <48CCA736-1A1B-43F5-B94E-8C9EE28E0384@syseleven.de> Message-ID: <50787528.7000602@alteeve.ca> On 10/12/2012 02:52 PM, Andrew Holway wrote: > > On Oct 12, 2012, at 5:28 PM, Digimer wrote: > >> On 10/12/2012 11:22 AM, Andrew Holway wrote: >>> Hello, >>> >>> I am trying to set up a 4 node cluster with a shared iSCSI storage device. >>> >>> I cannot start clvmd: service clvmd start just hangs. >>> >>> I cannot stop cman: >>> >>> node001: Working directory: /root >>> node001: Stopping cluster: >>> node001: Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd >>> node001: fence_tool: cannot leave due to active systems >>> node001: [FAILED] >>> >>> I find these errors in /var/log/cluster/dlm_controld.log >>> >>> Oct 12 17:12:13 dlm_controld daemon cpg_join error retrying >>> Oct 12 17:12:23 dlm_controld daemon cpg_join error retrying >>> Oct 12 17:12:33 dlm_controld daemon cpg_join error retrying >>> >>> [root at node001 clvmd]# clvmd status >>> clvmd failed in initialisation >>> >>> Any ideas? >>> >>> Thanks, >>> >>> Andrew >> >> Can you paste your cluster.conf please? I suspect something went wrong, it tried to fence and then failed to do so, so it's blocked. > > :) I had two node id's the same. > > Thanks > > Andrew Heh, I've done that before, too. >_> -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From raju.rajsand at gmail.com Fri Oct 12 20:00:24 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Sat, 13 Oct 2012 01:30:24 +0530 Subject: [Linux-cluster] cannot detect SAN disk in RHEL6.1 In-Reply-To: <423ded17.00000e6c.00000023@sierra-A66> References: <423ded17.00000e6c.00000023@sierra-A66> Message-ID: Greetings, On Sat, Oct 13, 2012 at 1:17 AM, Shanti Pahari wrote: > When I added FC external disk in RHEL 6.1 it didn?t load in /dev/mapper . > > After reboot also it didn?t detect. > > > Highly appreciated. IMHO not appreciated. You have been very cryptic in your answers. Why don't you buy Redhat support? -- Regards, Rajagopal From shanti.pahari at sierra.sg Fri Oct 12 23:37:47 2012 From: shanti.pahari at sierra.sg (Shanti Pahari) Date: Sat, 13 Oct 2012 07:37:47 +0800 (SGT) Subject: [Linux-cluster] cannot detect SAN disk in RHEL6.1 In-Reply-To: References: <423ded17.00000e6c.00000023@sierra-A66> Message-ID: <6f3021d4.00000e6c.00000027@sierra-A66> Multipathd is running And when I try /sbin/multipath -v0 it didn't show anything but when lsmod |grep dm then I saw dm-multipath here. But still cannot detect. [root at PDC-PIC-PL-01 ~]# /sbin/multipath -v0 [root at PDC-PIC-PL-01 ~]# modprobe dm-multipath [root at PDC-PIC-PL-01 ~]# /sbin/multipath -v0 [root at PDC-PIC-PL-01 ~]# lsmod |grep dm dm_mirror 14067 0 dm_region_hash 12136 1 dm_mirror dm_log 10120 2 dm_mirror,dm_region_hash dm_round_robin 2651 6 dm_multipath 18266 4 dm_round_robin dm_mod 75539 16 dm_mirror,dm_log,dm_multipath [root at PDC-PIC-PL-01 ~]# service multipathd status multipathd (pid 1567) is running... 
[root at PDC-PIC-PL-01 ~]# From: Ben .T.George [mailto:bentech4you at gmail.com] Sent: Saturday, 13 October, 2012 4:57 AM To: shanti.pahari at sierra.sg Subject: Re: [Linux-cluster] cannot detect SAN disk in RHEL6.1 HI check multipath demon is running or not..also check dm-multipath kernel module is loaded or not regards, Ben On Fri, Oct 12, 2012 at 10:47 PM, Shanti Pahari wrote: Hi , When I added FC external disk in RHEL 6.1 it didn't load in /dev/mapper . After reboot also it didn't detect. Any help ? Highly appreciated. thanks -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.holway at syseleven.de Sat Oct 13 10:40:11 2012 From: a.holway at syseleven.de (Andrew Holway) Date: Sat, 13 Oct 2012 12:40:11 +0200 Subject: [Linux-cluster] Linux clustering for high availability databases and other services Message-ID: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> Hello, We have been experimenting with various storage technologies in order to create moderately highly available database services. I have the following equipment in my lab: 4x HP G8 servers with * Mellanox QDR InfiniBand * 10GE adapters * Lots of memory and the latest, most powerful CPUS. * Centos 6.0 Oracle ZFS appliance * Infiniband * NFS (over ethernet and infiniband) * iSCSI (over ethernet and infiniband) * Various RDMA protocols that are not supported by oracle on redhat. Nimble Storage device * iSCSI over 10G ethernet Brocade 10G switches. Can I have some guidance on possible setups for HA database services? I have tested and have a good understanding of all the technology components but I am a bit confused how I should be glueing them together. I need a focus :) Thanks, Andrew From raju.rajsand at gmail.com Sat Oct 13 16:01:59 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Sat, 13 Oct 2012 21:31:59 +0530 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> Message-ID: Greetings, On Sat, Oct 13, 2012 at 4:10 PM, Andrew Holway wrote: > Hello, > > We have been experimenting with various storage technologies in order to create moderately highly available database services. > > Can I have some guidance on possible setups for HA database services? I have tested and have a good understanding of all the technology components but I am a bit confused how I should be glueing them together. I need a focus :) > > Thanks, Do you mean active/active Oracle RAC type? -- Regards, Rajagopal From heiko.nardmann at itechnical.de Sun Oct 14 14:06:02 2012 From: heiko.nardmann at itechnical.de (Heiko Nardmann) Date: Sun, 14 Oct 2012 16:06:02 +0200 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> Message-ID: <507AC6CA.8080602@itechnical.de> Hi, RedHat provides support for setting up such scenarios. I recommend buying RHEL and contacting them. Kind regards, Heiko Am 13.10.2012 12:40, schrieb Andrew Holway: > Hello, > > We have been experimenting with various storage technologies in order to create moderately highly available database services. 
> > I have the following equipment in my lab: > > 4x HP G8 servers with > * Mellanox QDR InfiniBand > * 10GE adapters > * Lots of memory and the latest, most powerful CPUS. > * Centos 6.0 > > Oracle ZFS appliance > * Infiniband > * NFS (over ethernet and infiniband) > * iSCSI (over ethernet and infiniband) > * Various RDMA protocols that are not supported by oracle on redhat. > > Nimble Storage device > * iSCSI over 10G ethernet > > Brocade 10G switches. > > Can I have some guidance on possible setups for HA database services? I have tested and have a good understanding of all the technology components but I am a bit confused how I should be glueing them together. I need a focus :) > > Thanks, > > Andrew > > From raju.rajsand at gmail.com Sun Oct 14 14:38:18 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Sun, 14 Oct 2012 20:08:18 +0530 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: <507AC6CA.8080602@itechnical.de> References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> <507AC6CA.8080602@itechnical.de> Message-ID: Greetings, On Sun, Oct 14, 2012 at 7:36 PM, Heiko Nardmann wrote: > > RedHat provides support for setting up such scenarios. I recommend > buying RHEL and contacting them. ++1 Please note that active/active file sharing is *very* different from active/active DB server. AFAIK, only Oracle RAC and IBM DB2 have Active/Active HA DB options: no escape from spending money .... :) -- Regards, Rajagopal From a.holway at syseleven.de Sun Oct 14 17:38:57 2012 From: a.holway at syseleven.de (Andrew Holway) Date: Sun, 14 Oct 2012 19:38:57 +0200 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> Message-ID: <75F0F61B-117E-4EFE-B265-C61DB52C75D0@syseleven.de> > > Do you mean active/active Oracle RAC type? No, Perhaps active / passive mysql type. > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From a.holway at syseleven.de Sun Oct 14 17:57:24 2012 From: a.holway at syseleven.de (Andrew Holway) Date: Sun, 14 Oct 2012 19:57:24 +0200 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> Message-ID: <8114940B-2405-4A7B-A189-71B50666B67E@syseleven.de> > > Do you mean active/active Oracle RAC type? We are using a lot Mysql in house. Perhaps high availability is the wrong phrase. More Available perhaps? > > -- > Regards, > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From christian.masopust at siemens.com Sun Oct 14 19:36:48 2012 From: christian.masopust at siemens.com (Masopust, Christian) Date: Sun, 14 Oct 2012 21:36:48 +0200 Subject: [Linux-cluster] Linux clustering for high availability databases and other services In-Reply-To: <507AC6CA.8080602@itechnical.de> References: <344E72A6-BFE0-4413-88AD-8A066ED39D54@syseleven.de> <507AC6CA.8080602@itechnical.de> Message-ID: Hi Andrew, maybe I understand you completely wrong, but... Do you know "Galera Cluster" ? If you are on MySQL maybe you can give it a try? 
br, christian > Am 13.10.2012 12:40, schrieb Andrew Holway: > > Hello, > > > > We have been experimenting with various storage > technologies in order to create moderately highly available > database services. > > > > I have the following equipment in my lab: > > > > 4x HP G8 servers with > > * Mellanox QDR InfiniBand > > * 10GE adapters > > * Lots of memory and the latest, most powerful CPUS. > > * Centos 6.0 > > > > Oracle ZFS appliance > > * Infiniband > > * NFS (over ethernet and infiniband) > > * iSCSI (over ethernet and infiniband) > > * Various RDMA protocols that are not supported by oracle > on redhat. > > > > Nimble Storage device > > * iSCSI over 10G ethernet > > > > Brocade 10G switches. > > > > Can I have some guidance on possible setups for HA database > services? I have tested and have a good understanding of all > the technology components but I am a bit confused how I > should be glueing them together. I need a focus :) > > > > Thanks, > > > > Andrew > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From bmr at redhat.com Mon Oct 15 11:08:13 2012 From: bmr at redhat.com (Bryn M. Reeves) Date: Mon, 15 Oct 2012 12:08:13 +0100 Subject: [Linux-cluster] cannot detect SAN disk in RHEL6.1 In-Reply-To: <423ded17.00000e6c.00000023@sierra-A66> References: <423ded17.00000e6c.00000023@sierra-A66> Message-ID: <507BEE9D.10304@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/12/2012 08:47 PM, Shanti Pahari wrote: > When I added FC external disk in RHEL 6.1 it didn't load in > /dev/mapper . > > After reboot also it didn't detect. Check for SCSI devices being registered (/proc/scsi/scsi, lsscsi and dmesg). Also read: http://tinyurl.com/93rnlbn [access.redhat.com, RHEL Storage Administration Guide). Regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlB77p0ACgkQ6YSQoMYUY94sBwCdFpKSB5gdgdR27CcoE/RTTPkO xXsAoIPFK1KA2CpHgsYHaubW5s1ocIWg =XaKW -----END PGP SIGNATURE----- From epretorious at yahoo.com Mon Oct 15 20:41:00 2012 From: epretorious at yahoo.com (Eric) Date: Mon, 15 Oct 2012 13:41:00 -0700 (PDT) Subject: [Linux-cluster] Getting started with cLVM Message-ID: <1350333660.23559.YahooMailNeo@web121705.mail.ne1.yahoo.com> I've been reading about cLVM and I'm having a difficult time getting my mind wrapped around The Big Picture. Looking at Figure?1.2, ?CLVM Overview, of? the Red Hat LVM Administrator Guide I'm a bit puzzled. Specifically: Are storage devices exported directly to cLVM daemons? Or are LV's exported directly to cLVM daemonds for consumption? Or will another shared-disk protocol (e.g., GNBD or iSCSI) be required for "the last mile"? Eric Pretorious Truckee, CA From lists at alteeve.ca Mon Oct 15 21:37:10 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 15 Oct 2012 17:37:10 -0400 Subject: [Linux-cluster] Getting started with cLVM In-Reply-To: <1350333660.23559.YahooMailNeo@web121705.mail.ne1.yahoo.com> References: <1350333660.23559.YahooMailNeo@web121705.mail.ne1.yahoo.com> Message-ID: <507C8206.6090908@alteeve.ca> On 10/15/2012 04:41 PM, Eric wrote: > I've been reading about cLVM and I'm having a difficult time getting my mind wrapped around The Big Picture. > > Looking at Figure 1.2, ?CLVM Overview, of the Red Hat LVM Administrator Guide I'm a bit puzzled. Specifically: Are storage devices exported directly to cLVM daemons? 
Or are LV's exported directly to cLVM daemonds for consumption? Or will another shared-disk protocol (e.g., GNBD or iSCSI) be required for "the last mile"? > > Eric Pretorious > Truckee, CA Clustered LVM, fundamentally, swaps out the normal internal locking to DLM (distributed lock manager), which is provided by cman under EL6 and corosync (v2+). Beyond this, you can think of LVM as you always have. So, commonly, the PV(s) would be some form of shared storage; DRBD and SANs are the most common I think. When you pvcreate /dev/foo (foo being your shared storage), you can immediately 'pvscan' on the other nodes and see the new PV. Likewise, once you assign the PV to a VG on one node, 'vgscan' on all the other nodes will immediately see the new/expanded VG. Likewise with LV creation/resize/removes. Beyond this, there is nothing further special about clustered LVM over "normal" LVM. hth digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cfeist at redhat.com Wed Oct 17 00:02:41 2012 From: cfeist at redhat.com (Chris Feist) Date: Tue, 16 Oct 2012 19:02:41 -0500 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <50736F78.3060906@redhat.com> References: <50736F78.3060906@redhat.com> Message-ID: <507DF5A1.3060905@redhat.com> On 10/08/12 19:27, Chris Feist wrote: > We've been making improvements to the pcs (pacemaker/corosync configuration > system) command line tool over the past few months. > > Currently you can setup a basic cluster (including configuring corosync 2.0 udpu). > > David Vossel has also created a version of the "Clusters from Scratch" document > that illustrates setting up a cluster using pcs. This should be showing up > shortly. Just an update, I've updated the pcs (to 0.9.27) and included the pcsd daemon with the fedora packages. You can grab the updated packages here: http://people.redhat.com/cfeist/pcs/ And you should be able to used the new Clusters from Scratch optimized for the pcs CLI here: http://www.clusterlabs.org/doc/ Just a couple things to note (this should be shortly updated in the notes). To run pcs on Fedora 17/18 you'll need to turn off selinux & disable the firewall (or at least allow traffic on port 2224). To disable SELinux set 'SELINUX=permissive' in /etc/selinux/config and reboot To disable the firewall run 'systemctl stop iptables.service' (to permanently disable run 'systemctl disable iptables.service') The pcs_passwd command has been removed. In it's place you can do authentication with the hacluster user. Just set the hacluster user password (passwd hacluster) and then use that user and password to authenticate with pcs. If you have any questions or any issues don't hesitate to contact me, we're still working out the bugs in the new pcsd daemon and we appreciate all the feedback we can get. Thanks, Chris > > You can view the source here: https://github.com/feist/pcs/ > > Or download the latest tarball: > https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz > > There is also a Fedora 18 package that will be included with the next release. > You should be able to find that package in the following locations... > > RPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm > > SRPM: > http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm > > In the near future we are planning on having builds for SUSE & Ubuntu/Debian. 
> > We're also actively working on a GUI/Daemon that will allow control of your > entire cluster from one node and/or a web browser. > > Please feel free to email me (cfeist at redhat.com) or open issues on the pcs > project at github (https://github.com/feist/pcs/issues) if you have any > questions or problems. > > Thanks! > Chris > > _______________________________________________ > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org From andrew at beekhof.net Wed Oct 17 02:09:18 2012 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 17 Oct 2012 13:09:18 +1100 Subject: [Linux-cluster] [Pacemaker] Announce: pcs-0.9.26 In-Reply-To: <507DF5A1.3060905@redhat.com> References: <50736F78.3060906@redhat.com> <507DF5A1.3060905@redhat.com> Message-ID: On Wed, Oct 17, 2012 at 11:02 AM, Chris Feist wrote: > On 10/08/12 19:27, Chris Feist wrote: >> >> We've been making improvements to the pcs (pacemaker/corosync >> configuration >> system) command line tool over the past few months. >> >> Currently you can setup a basic cluster (including configuring corosync >> 2.0 udpu). >> >> David Vossel has also created a version of the "Clusters from Scratch" >> document >> that illustrates setting up a cluster using pcs. This should be showing >> up >> shortly. > > > Just an update, I've updated the pcs (to 0.9.27) and included the pcsd > daemon with the fedora packages. You can grab the updated packages here: > > http://people.redhat.com/cfeist/pcs/ > > And you should be able to used the new Clusters from Scratch optimized for > the pcs CLI here: http://www.clusterlabs.org/doc/ Those docs have now been updated to match the new release. > > Just a couple things to note (this should be shortly updated in the notes). > > To run pcs on Fedora 17/18 you'll need to turn off selinux & disable the > firewall (or at least allow traffic on port 2224). > > To disable SELinux set 'SELINUX=permissive' in /etc/selinux/config and > reboot > To disable the firewall run 'systemctl stop iptables.service' (to > permanently disable run 'systemctl disable iptables.service') > > The pcs_passwd command has been removed. In it's place you can do > authentication with the hacluster user. Just set the hacluster user > password (passwd hacluster) and then use that user and password to > authenticate with pcs. > > If you have any questions or any issues don't hesitate to contact me, we're > still working out the bugs in the new pcsd daemon and we appreciate all the > feedback we can get. > > Thanks, > Chris > > >> >> You can view the source here: https://github.com/feist/pcs/ >> >> Or download the latest tarball: >> https://github.com/downloads/feist/pcs/pcs-0.9.26.tar.gz >> >> There is also a Fedora 18 package that will be included with the next >> release. >> You should be able to find that package in the following locations... >> >> RPM: >> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.noarch.rpm >> >> SRPM: >> http://people.redhat.com/cfeist/pcs/pcs-0.9.26-1.fc18.src.rpm >> >> In the near future we are planning on having builds for SUSE & >> Ubuntu/Debian. >> >> We're also actively working on a GUI/Daemon that will allow control of >> your >> entire cluster from one node and/or a web browser. 
>> >> Please feel free to email me (cfeist at redhat.com) or open issues on the pcs >> project at github (https://github.com/feist/pcs/issues) if you have any >> questions or problems. >> >> Thanks! >> Chris >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > > _______________________________________________ > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org From terance at socialtwist.com Wed Oct 17 13:12:21 2012 From: terance at socialtwist.com (Terance Dias) Date: Wed, 17 Oct 2012 18:42:21 +0530 Subject: [Linux-cluster] CMAN nodes in different LANs In-Reply-To: References: Message-ID: Hi, We're trying to create a cluster in which the nodes lie in 2 different LANs. Since the nodes lie in different networks, they cannot resolve the other node by their internal IP. So in my cluster.conf file, I've provided their external IPs. But now when I start CMAN service, I get the following error. ----------------------------------- Starting cluster: Checking Network Manager... [ OK ] Global setup... [ OK ] Loading kernel modules... [ OK ] Mounting configfs... [ OK ] Starting cman... Cannot find node name in cluster.conf Unable to get the configuration Cannot find node name in cluster.conf cman_tool: corosync daemon didn't start [FAILED] ------------------------------------- My cluster.conf file is as below -------------------------------------