From queszama at yahoo.in Thu Jan 3 10:00:46 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 18:00:46 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster Message-ID: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> Hi All , Need few clarification regarding GFS. I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? Will be great if somebody can clarify by doubts. Thanks in Advance Zaman From swhiteho at redhat.com Thu Jan 3 10:16:38 2013 From: swhiteho at redhat.com (Steven Whitehouse) Date: Thu, 03 Jan 2013 10:16:38 +0000 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> Message-ID: <1357208199.2696.1.camel@menhir> Hi, On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > Hi All , > > > Need few clarification regarding GFS. > > > I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . > > Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? > > Will be great if somebody can clarify by doubts. > > > Thanks in Advance > Zaman > > If you want to use GFS2 without a cluster, then you'll only be able to use it from a single node (just like if you were using ext3 for example). If you want to use GFS2 as intended, with multiple nodes accessing the same filesystem, then you'll need to set up a cluster in order to do so, Steve. From rainer.hartwig.schubert at gmail.com Thu Jan 3 13:21:27 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Thu, 3 Jan 2013 14:21:27 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out Message-ID: Hi, I have created a small CMAN-Cluster with 3 Nodes and a CLVM configuration. Now, I want to add a new node (mynode4). CMAN works fine, cman_tool shows all members: # cman_tool nodes Node Sts Inc Joined Name 1 M 408 2013-01-03 14:00:57 mynode1 2 M 408 2013-01-03 14:00:57 mynode2 3 M 408 2013-01-03 14:00:57 mynode3 4 M 404 2013-01-03 14:00:56 mynode4 cman_tool services (on mynode4) fence domain member count 4 victim count 0 victim now 0 master nodeid 1 wait state none members 1 2 3 4 corosync: corosync-cfgtool -s Printing ring status. Local node ID 4 RING ID 0 id = 10.10.10.13 status = ring 0 active with no faults Everything looks fine, from my site. No I will start clvmd :~# /etc/init.d/clvm start Starting Cluster LVM Daemon: clvm clvmd startup timed out The CLVM runs into a time out. 
My System: cat /etc/debian_version 6.0.6 # lvm version LVM version: 2.02.66(2) (2010-05-20) Library version: 1.02.48 (2010-05-20) Driver version: 4.22.0 dpkg -l |grep clvm ii clvm 2.02.66-5 Cluster LVM Daemon for lvm2 dpkg -l |grep cman ii cman 3.0.12-2 Red Hat cluster suite - cluster manager ii libcman3 3.0.12-2 Red Hat cluster suite - cluster manager libraries Have anybody a idea, what running false? best regards From queszama at yahoo.in Thu Jan 3 13:37:35 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 21:37:35 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357208199.2696.1.camel@menhir> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> Message-ID: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> ----- Original Message ----- From: Steven Whitehouse To: Zama Ques ; linux clustering Cc: Sent: Thursday, 3 January 2013 3:46 PM Subject: Re: [Linux-cluster] GFS without creating a cluster Hi, On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > Hi All , > > > Need few clarification regarding GFS. > > > I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . > > Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? > > Will be great if somebody can clarify by doubts. > > > Thanks in Advance > Zaman > > > If you want to use GFS2 without a cluster, then you'll only be able to > use it from a single node (just like if you were using ext3 for > example). If you want to use GFS2 as intended, with multiple nodes > accessing the same filesystem, then you'll need to set up a cluster in > order to do so, Thanks Steve for the reply . As you said setting up a cluster is needed to use GFS2 with multiple nodes, does that mean that I need to create cluster.conf or running cluster services (cman etc) should be fine for setting up GFS2. Not sure whether cman will run without creating cluster.conf Assuming that I need to setup cluster.conf in order to use GFS2 , that means if there are two nodes in the cluster with GFS2 as file system resource , GFS2 will be mounted on only one host based on failover domain policy . But our requirement is like that GFS2 should be mounted on both servers at the same time? . Based on my little understanding of GFS , looks to me that I will not be able to achieve this using GFS2 or there are some way to achieve this ? Please clarify on this. Thanks in Advance Zaman
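
For the concurrent-mount question above: a GFS2 filesystem is formatted once with the DLM lock manager and one journal per node, and is then mounted on all cluster nodes at the same time. The commands below are only a sketch; the cluster name, volume group, logical volume and mount point are placeholders, not values taken from this thread.

# On one node only: format the shared LV with one journal per node.
# "mycluster" must match the cluster name set in cluster.conf.
mkfs.gfs2 -p lock_dlm -t mycluster:shared_fs -j 2 /dev/vg_shared/lv_data

# On every node: mount it by hand, or list it in /etc/fstab and let the
# gfs2 init script mount it once cman is up.
mount -t gfs2 /dev/vg_shared/lv_data /data

# Example /etc/fstab line for every node:
/dev/vg_shared/lv_data  /data  gfs2  defaults,noatime  0 0

Because the lock table name embeds the cluster name, cman (with working fencing, as the later replies stress) still has to be configured and running, even though rgmanager and failover domains are not involved.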
From torajveersingh at gmail.com Thu Jan 3 15:08:10 2013 From: torajveersingh at gmail.com (Rajveer Singh) Date: Thu, 3 Jan 2013 20:38:10 +0530 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques wrote: > > > > > > ----- Original Message ----- > From: Steven Whitehouse > To: Zama Ques ; linux clustering < > linux-cluster at redhat.com> > Cc: > Sent: Thursday, 3 January 2013 3:46 PM > Subject: Re: [Linux-cluster] GFS without creating a cluster > > Hi, > > On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > > Hi All , > > > > > > Need few clarification regarding GFS. > > > > > > I need to create a shared file system for our servers . The servers will > write to the shared file system at the same time and there is no > requirement for a cluster . > > > > Planning to use GFS but GFS requires cluster software to be running . My > confusion here is If I just run the cluster software ( cman etc ) without > creating a cluster , will I be able to configure and run GFS2. Also , is it > possible to write to a GFS file system from many servers at the same time ? > > > > Will be great if somebody can clarify by doubts. > > > > > > Thanks in Advance > > Zaman > > > > > > > If you want to use GFS2 without a cluster, then you'll only be able to > > use it from a single node (just like if you were using ext3 for > > example). If you want to use GFS2 as intended, with multiple nodes > > accessing the same filesystem, then you'll need to set up a cluster in > > order to do so, > > Thanks Steve for the reply .
As you said setting up a cluster is needed to > use GFS2 with multiple nodes, does that mean that I need to create > cluster.conf or running cluster services (cman etc) should be fine for > setting up GFS2. Not sure whether cman will run without creating > cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , that > means if there are two nodes in the cluster with GFS2 as file system > resource , GFS2 will be mounted on only one host based on failover domain > policy . But our requirement is like that GFS2 should be mounted on both > servers at the same time . Based on my little understanding of GFS , looks > to me that I will not be able to achieve this using GFS2 or there are some > way to achieve this ? > > Please clarify on this. > > Hi Zama, As steve said, you must have to configure proper cluster to use GFS2 filesystem and mounted on multiple nodes at the same time so that all can access it. You do not need to configure GFS2 filesystem to be managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab file as like normal ext3 filesystem. I hope, it answers your question. Regards, Rajveer Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From queszama at yahoo.in Thu Jan 3 15:22:35 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 23:22:35 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> ________________________________ From: Rajveer Singh To: Zama Ques ; linux clustering Cc: Steven Whitehouse Sent: Thursday, 3 January 2013 8:38 PM Subject: Re: [Linux-cluster] GFS without creating a cluster On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques wrote: > > > > >----- Original Message ----- >From: Steven Whitehouse >To: Zama Ques ; linux clustering >Cc: >Sent: Thursday, 3 January 2013 3:46 PM >Subject: Re: [Linux-cluster] GFS without creating a cluster > >Hi, > >On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: >> Hi All , >> >> >> Need few clarification regarding GFS. >> >> >> I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . >> >> Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? >> >> Will be great if somebody can clarify by doubts. >> >> >> Thanks in Advance >> Zaman >> >> > >> If you want to use GFS2 without a cluster, then you'll only be able to >> use it from a single node (just like if you were using ext3 for >> example). If you want to use GFS2 as intended, with multiple nodes >> accessing the same filesystem, then you'll need to set up a cluster in >> order to do so, > >Thanks Steve for the reply . As you said setting up a cluster is needed to use GFS2 with multiple nodes, does that mean that I need to create cluster.conf or running cluster services (cman etc) should be fine for setting up GFS2. 
Not sure whether cman will run without creating cluster.conf > >Assuming that I need to setup cluster.conf in order to use GFS2 , that means if there are two nodes in the cluster with GFS2 as file system resource , GFS2 will be mounted on only one host based on failover domain policy . But our requirement is like that GFS2 should be mounted on both servers at the same time? . Based on my little understanding of GFS , looks to me that I will not be able to achieve this using GFS2 or there are some way to achieve this ? > >Please clarify on this. > > ?> Hi Zama, > As steve said, you must have to configure proper cluster to use GFS2 filesystem and mounted on multiple nodes at the same time so that all can > access it. You do not need to configure GFS2 filesystem to be managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab file as like > normal ext3 filesystem. > I hope, it answers your question. Thanks Rajveer for clarifying . I think I am clear now . Will now try to configure GFS2. Thanks Zaman -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Jan 3 17:17:43 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 03 Jan 2013 12:17:43 -0500 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> Message-ID: <50E5BD37.9080804@alteeve.ca> On 01/03/2013 10:22 AM, Zama Ques wrote: > > > ------------------------------------------------------------------------ > *From:* Rajveer Singh > *To:* Zama Ques ; linux clustering > > *Cc:* Steven Whitehouse > *Sent:* Thursday, 3 January 2013 8:38 PM > *Subject:* Re: [Linux-cluster] GFS without creating a cluster > > > > On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques > wrote: > > > > > > > ----- Original Message ----- > From: Steven Whitehouse > > To: Zama Ques >; linux > clustering > > Cc: > Sent: Thursday, 3 January 2013 3:46 PM > Subject: Re: [Linux-cluster] GFS without creating a cluster > > Hi, > > On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > > Hi All , > > > > > > Need few clarification regarding GFS. > > > > > > I need to create a shared file system for our servers . The > servers will write to the shared file system at the same time and > there is no requirement for a cluster . > > > > Planning to use GFS but GFS requires cluster software to be > running . My confusion here is If I just run the cluster software ( > cman etc ) without creating a cluster , will I be able to configure > and run GFS2. Also , is it possible to write to a GFS file system > from many servers at the same time ? > > > > Will be great if somebody can clarify by doubts. > > > > > > Thanks in Advance > > Zaman > > > > > > > If you want to use GFS2 without a cluster, then you'll only be able to > > use it from a single node (just like if you were using ext3 for > > example). If you want to use GFS2 as intended, with multiple nodes > > accessing the same filesystem, then you'll need to set up a cluster in > > order to do so, > > Thanks Steve for the reply . As you said setting up a cluster is > needed to use GFS2 with multiple nodes, does that mean that I need > to create cluster.conf or running cluster services (cman etc) should > be fine for setting up GFS2. 
Not sure whether cman will run without > creating cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , > that means if there are two nodes in the cluster with GFS2 as file > system resource , GFS2 will be mounted on only one host based on > failover domain policy . But our requirement is like that GFS2 > should be mounted on both servers at the same time . Based on my > little understanding of GFS , looks to me that I will not be able to > achieve this using GFS2 or there are some way to achieve this ? > > Please clarify on this. > > > Hi Zama, >> As steve said, you must have to configure proper cluster to use GFS2 > filesystem and mounted on multiple nodes at the same time so that all > can > access it. You do not need to configure GFS2 filesystem to be > managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab > file as like > normal ext3 filesystem. >> I hope, it answers your question. > > Thanks Rajveer for clarifying . I think I am clear now . Will now try to > configure GFS2. > > > Thanks > Zaman Note that you will also need proper fencing setup (usually using the nodes' IPMI interface). Without properly configured, tested fencing, the first time a node fails the GFS2 partition will hang (by design). The reason the cluster is needed is that the access to the shared storage and file system has to be coordinated between the nodes so that one node doesn't step on the other. This is possible thanks to DLM; distributed lock manager. DLM uses the cluster communications, hence the need for the cluster. Note also that you need shared storage, obviously. iSCSI or DRBD if you only have two nodes. Please take a look at this link. It explains in details how this works; https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Jan 3 17:22:02 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 03 Jan 2013 12:22:02 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: Message-ID: <50E5BE3A.1090006@alteeve.ca> On 01/03/2013 08:21 AM, Rainer Schubert wrote: > Hi, > > I have created a small CMAN-Cluster with 3 Nodes and a CLVM > configuration. Now, I want to add a new node (mynode4). CMAN works > fine, cman_tool shows all members: > > # cman_tool nodes > Node Sts Inc Joined Name > 1 M 408 2013-01-03 14:00:57 mynode1 > 2 M 408 2013-01-03 14:00:57 mynode2 > 3 M 408 2013-01-03 14:00:57 mynode3 > 4 M 404 2013-01-03 14:00:56 mynode4 > > > cman_tool services (on mynode4) > > fence domain > member count 4 > victim count 0 > victim now 0 > master nodeid 1 > wait state none > members 1 2 3 4 > > > corosync: > > corosync-cfgtool -s > Printing ring status. > Local node ID 4 > RING ID 0 > id = 10.10.10.13 > status = ring 0 active with no faults > > Everything looks fine, from my site. No I will start clvmd > > :~# /etc/init.d/clvm start > Starting Cluster LVM Daemon: clvm clvmd startup timed out > > The CLVM runs into a time out. 
> > My System: > > cat /etc/debian_version > 6.0.6 > > # lvm version > LVM version: 2.02.66(2) (2010-05-20) > Library version: 1.02.48 (2010-05-20) > Driver version: 4.22.0 > > dpkg -l |grep clvm > ii clvm 2.02.66-5 > Cluster LVM Daemon for lvm2 > > dpkg -l |grep cman > ii cman 3.0.12-2 > Red Hat cluster suite - cluster manager > ii libcman3 3.0.12-2 > Red Hat cluster suite - cluster manager libraries > > Have anybody a idea, what running false? > > best regards > Can you post your cluster.conf please? Obfuscate as little as you can please. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From radu.rendec at mindbit.ro Thu Jan 3 20:47:11 2013 From: radu.rendec at mindbit.ro (Radu Rendec) Date: Thu, 03 Jan 2013 22:47:11 +0200 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: <1357246031.9208.127.camel@localhost> On Thu, 2013-01-03 at 21:37 +0800, Zama Ques wrote: > Thanks Steve for the reply . As you said setting up a cluster is > needed to use GFS2 with multiple nodes, does that mean that I need to > create cluster.conf or running cluster services (cman etc) should be > fine for setting up GFS2. Not sure whether cman will run without > creating cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , that > means if there are two nodes in the cluster with GFS2 as file system > resource , GFS2 will be mounted on only one host based on failover > domain policy . But our requirement is like that GFS2 should be > mounted on both servers at the same time . Based on my little > understanding of GFS , looks to me that I will not be able to achieve > this using GFS2 or there are some way to achieve this ? Hi, This may be a little bit off-topic for this list (as it focuses on the clustering suite) but if all you need is a shared filesystem (without the clustering) you may want to take a look at glusterfs (www.gluster.org). Cheers, Radu From rainer.hartwig.schubert at gmail.com Fri Jan 4 07:26:16 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Fri, 4 Jan 2013 08:26:16 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: <50E5BE3A.1090006@alteeve.ca> References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Hi, my cluster.conf: 2013/1/3 Digimer : > On 01/03/2013 08:21 AM, Rainer Schubert wrote: >> Hi, >> >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >> configuration. Now, I want to add a new node (mynode4). CMAN works >> fine, cman_tool shows all members: >> >> # cman_tool nodes >> Node Sts Inc Joined Name >> 1 M 408 2013-01-03 14:00:57 mynode1 >> 2 M 408 2013-01-03 14:00:57 mynode2 >> 3 M 408 2013-01-03 14:00:57 mynode3 >> 4 M 404 2013-01-03 14:00:56 mynode4 >> >> >> cman_tool services (on mynode4) >> >> fence domain >> member count 4 >> victim count 0 >> victim now 0 >> master nodeid 1 >> wait state none >> members 1 2 3 4 >> >> >> corosync: >> >> corosync-cfgtool -s >> Printing ring status. >> Local node ID 4 >> RING ID 0 >> id = 10.10.10.13 >> status = ring 0 active with no faults >> >> Everything looks fine, from my site. 
No I will start clvmd >> >> :~# /etc/init.d/clvm start >> Starting Cluster LVM Daemon: clvm clvmd startup timed out >> >> The CLVM runs into a time out. >> >> My System: >> >> cat /etc/debian_version >> 6.0.6 >> >> # lvm version >> LVM version: 2.02.66(2) (2010-05-20) >> Library version: 1.02.48 (2010-05-20) >> Driver version: 4.22.0 >> >> dpkg -l |grep clvm >> ii clvm 2.02.66-5 >> Cluster LVM Daemon for lvm2 >> >> dpkg -l |grep cman >> ii cman 3.0.12-2 >> Red Hat cluster suite - cluster manager >> ii libcman3 3.0.12-2 >> Red Hat cluster suite - cluster manager libraries >> >> Have anybody a idea, what running false? >> >> best regards >> > > Can you post your cluster.conf please? Obfuscate as little as you can > please. > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? From emi2fast at gmail.com Fri Jan 4 08:14:07 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 4 Jan 2013 09:14:07 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Hello I think you need the fecing 2013/1/4 Rainer Schubert > Hi, > > my cluster.conf: > > > > > > > > > > > > > > > > > 2013/1/3 Digimer : > > On 01/03/2013 08:21 AM, Rainer Schubert wrote: > >> Hi, > >> > >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM > >> configuration. Now, I want to add a new node (mynode4). CMAN works > >> fine, cman_tool shows all members: > >> > >> # cman_tool nodes > >> Node Sts Inc Joined Name > >> 1 M 408 2013-01-03 14:00:57 mynode1 > >> 2 M 408 2013-01-03 14:00:57 mynode2 > >> 3 M 408 2013-01-03 14:00:57 mynode3 > >> 4 M 404 2013-01-03 14:00:56 mynode4 > >> > >> > >> cman_tool services (on mynode4) > >> > >> fence domain > >> member count 4 > >> victim count 0 > >> victim now 0 > >> master nodeid 1 > >> wait state none > >> members 1 2 3 4 > >> > >> > >> corosync: > >> > >> corosync-cfgtool -s > >> Printing ring status. > >> Local node ID 4 > >> RING ID 0 > >> id = 10.10.10.13 > >> status = ring 0 active with no faults > >> > >> Everything looks fine, from my site. No I will start clvmd > >> > >> :~# /etc/init.d/clvm start > >> Starting Cluster LVM Daemon: clvm clvmd startup timed out > >> > >> The CLVM runs into a time out. > >> > >> My System: > >> > >> cat /etc/debian_version > >> 6.0.6 > >> > >> # lvm version > >> LVM version: 2.02.66(2) (2010-05-20) > >> Library version: 1.02.48 (2010-05-20) > >> Driver version: 4.22.0 > >> > >> dpkg -l |grep clvm > >> ii clvm 2.02.66-5 > >> Cluster LVM Daemon for lvm2 > >> > >> dpkg -l |grep cman > >> ii cman 3.0.12-2 > >> Red Hat cluster suite - cluster manager > >> ii libcman3 3.0.12-2 > >> Red Hat cluster suite - cluster manager libraries > >> > >> Have anybody a idea, what running false? > >> > >> best regards > >> > > > > Can you post your cluster.conf please? Obfuscate as little as you can > > please. > > > > -- > > Digimer > > Papers and Projects: https://alteeve.ca/w/ > > What if the cure for cancer is trapped in the mind of a person without > > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emi2fast at gmail.com Fri Jan 4 08:14:41 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 4 Jan 2013 09:14:41 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Sorry for my bad english I think you need the fencing 2013/1/4 emmanuel segura > Hello > > I think you need the fecing > > > 2013/1/4 Rainer Schubert > >> Hi, >> >> my cluster.conf: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> 2013/1/3 Digimer : >> > On 01/03/2013 08:21 AM, Rainer Schubert wrote: >> >> Hi, >> >> >> >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >> >> configuration. Now, I want to add a new node (mynode4). CMAN works >> >> fine, cman_tool shows all members: >> >> >> >> # cman_tool nodes >> >> Node Sts Inc Joined Name >> >> 1 M 408 2013-01-03 14:00:57 mynode1 >> >> 2 M 408 2013-01-03 14:00:57 mynode2 >> >> 3 M 408 2013-01-03 14:00:57 mynode3 >> >> 4 M 404 2013-01-03 14:00:56 mynode4 >> >> >> >> >> >> cman_tool services (on mynode4) >> >> >> >> fence domain >> >> member count 4 >> >> victim count 0 >> >> victim now 0 >> >> master nodeid 1 >> >> wait state none >> >> members 1 2 3 4 >> >> >> >> >> >> corosync: >> >> >> >> corosync-cfgtool -s >> >> Printing ring status. >> >> Local node ID 4 >> >> RING ID 0 >> >> id = 10.10.10.13 >> >> status = ring 0 active with no faults >> >> >> >> Everything looks fine, from my site. No I will start clvmd >> >> >> >> :~# /etc/init.d/clvm start >> >> Starting Cluster LVM Daemon: clvm clvmd startup timed out >> >> >> >> The CLVM runs into a time out. >> >> >> >> My System: >> >> >> >> cat /etc/debian_version >> >> 6.0.6 >> >> >> >> # lvm version >> >> LVM version: 2.02.66(2) (2010-05-20) >> >> Library version: 1.02.48 (2010-05-20) >> >> Driver version: 4.22.0 >> >> >> >> dpkg -l |grep clvm >> >> ii clvm 2.02.66-5 >> >> Cluster LVM Daemon for lvm2 >> >> >> >> dpkg -l |grep cman >> >> ii cman 3.0.12-2 >> >> Red Hat cluster suite - cluster manager >> >> ii libcman3 3.0.12-2 >> >> Red Hat cluster suite - cluster manager libraries >> >> >> >> Have anybody a idea, what running false? >> >> >> >> best regards >> >> >> > >> > Can you post your cluster.conf please? Obfuscate as little as you can >> > please. >> > >> > -- >> > Digimer >> > Papers and Projects: https://alteeve.ca/w/ >> > What if the cure for cancer is trapped in the mind of a person without >> > access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > > -- > esta es mi vida e me la vivo hasta que dios quiera -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Fri Jan 4 14:55:24 2013 From: lists at alteeve.ca (Digimer) Date: Fri, 04 Jan 2013 09:55:24 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: <50E6ED5C.1090003@alteeve.ca> As Emmanuel said, you need fencing. Please read this: https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing digimer On 01/04/2013 02:26 AM, Rainer Schubert wrote: > Hi, > > my cluster.conf: > > > > > > > > > > > > > > > > > 2013/1/3 Digimer : >> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>> Hi, >>> >>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>> configuration. Now, I want to add a new node (mynode4). 
CMAN works >>> fine, cman_tool shows all members: >>> >>> # cman_tool nodes >>> Node Sts Inc Joined Name >>> 1 M 408 2013-01-03 14:00:57 mynode1 >>> 2 M 408 2013-01-03 14:00:57 mynode2 >>> 3 M 408 2013-01-03 14:00:57 mynode3 >>> 4 M 404 2013-01-03 14:00:56 mynode4 >>> >>> >>> cman_tool services (on mynode4) >>> >>> fence domain >>> member count 4 >>> victim count 0 >>> victim now 0 >>> master nodeid 1 >>> wait state none >>> members 1 2 3 4 >>> >>> >>> corosync: >>> >>> corosync-cfgtool -s >>> Printing ring status. >>> Local node ID 4 >>> RING ID 0 >>> id = 10.10.10.13 >>> status = ring 0 active with no faults >>> >>> Everything looks fine, from my site. No I will start clvmd >>> >>> :~# /etc/init.d/clvm start >>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>> >>> The CLVM runs into a time out. >>> >>> My System: >>> >>> cat /etc/debian_version >>> 6.0.6 >>> >>> # lvm version >>> LVM version: 2.02.66(2) (2010-05-20) >>> Library version: 1.02.48 (2010-05-20) >>> Driver version: 4.22.0 >>> >>> dpkg -l |grep clvm >>> ii clvm 2.02.66-5 >>> Cluster LVM Daemon for lvm2 >>> >>> dpkg -l |grep cman >>> ii cman 3.0.12-2 >>> Red Hat cluster suite - cluster manager >>> ii libcman3 3.0.12-2 >>> Red Hat cluster suite - cluster manager libraries >>> >>> Have anybody a idea, what running false? >>> >>> best regards >>> >> >> Can you post your cluster.conf please? Obfuscate as little as you can >> please. >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From mkathuria at tuxtechnologies.co.in Sat Jan 5 07:23:36 2013 From: mkathuria at tuxtechnologies.co.in (Manish Kathuria) Date: Sat, 5 Jan 2013 12:53:36 +0530 Subject: [Linux-cluster] Packet loss after configuring Ethernet bonding In-Reply-To: <509DD694.1000900@alteeve.ca> References: <1352514375.40862.YahooMailNeo@web193003.mail.sg3.yahoo.com> <509DC1E9.9090704@alteeve.ca> <1352520739.40244.YahooMailNeo@web193002.mail.sg3.yahoo.com> <509DD694.1000900@alteeve.ca> Message-ID: On Sat, Nov 10, 2012 at 9:52 AM, Digimer wrote: > On 11/09/2012 11:12 PM, Zama Ques wrote: >>> Need help on resolving a issue related to implementing High Availability at network level . I understand that this is not the right forum to ask this question , but since it is related to HA and Linux , I am asking here and I feel somebody here will have answer to the issues I am facing . >>> >>> I am trying to implement Ethernet Bonding , Both the interface in my server are connected to two different network switches . 
>>> >>> My configuration is as follows: >>> >>> ======== >>> # cat /proc/net/bonding/bond0 >>> >>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009) >>> >>> Bonding Mode: adaptive load balancing Primary Slave: None Currently >>> Active Slave: eth0 MII Status: up MII Polling Interval (ms): 0 Up Delay >>> (ms): 0 Down Delay (ms): 0 >>> >>> Slave Interface: eth0 MII Status: up Speed: 1000 Mbps Duplex: full Link >>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:10 Slave queue ID: 0 >>> >>> Slave Interface: eth1 MII Status: up Speed: 1000 Mbps Duplex: full Link >>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:14 Slave queue ID: 0 >>> ------------ >>> # cat /sys/class/net/bond0/bonding/mode >>> >>> balance-alb 6 >>> >>> >>> # cat /sys/class/net/bond0/bonding/miimon >>> 0 >>> >>> ============ >>> >>> >>> The issue for me is that I am seeing packet loss after configuring bonding . Tried connecting both the interface to the same switch , but still seeing the packet loss . Also , tried changing miimon value to 100 , but still seeing the packet loss. >>> >>> What I am missing in the configuration ? Any help will be highly appreciated in resolving the problem . >>> >>> >>> >>> Thanks >>> Zaman >> >> > You didn't share any details on your configuration, but I will assume >>> you are using corosync. >> >>> The only supported bonding mode is Active/Passive (mode=1). I've >>> personally tried all modes, out of curiosity, and all had problems. The >>> short of it is that if you need more that 1 gbit of performance, buy >>> faster cards. >> >>> If you are interested in what I use, it's documented here: >> >>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network >> >>> I've used this setup in several production clusters and have tested >>> failure are recovery extensively. It's proven very stable. :) >> >> >> Thanks Digimer for the quick response and pointing me to the link . I am yet to reach cluster configuration , initially trying to understand ethernet bonding before going into cluster configuration. So , option for me is only to use Active/Passive bonding mode in case of clustered environment. >> Few more clarifications needed , Can we use other bonding modes in non clustered environment . I am seeing packet loss in other modes . Also , the support of using only mode=1 in cluster environment is it a restriction of RHEL Cluster suite or it is by design . >> >> Will be great if you clarify these queries . >> >> Thanks in Advance >> Zaman > > Corosync is the only actively developed/supported (HA) cluster > communications and membership tool. It's used on all modern distros for > clustering and the requirement for mode=1 is with it. As such, it > doesn't matter which OS you are on, it's the only mode that will work > (reliably). > > The problem is that corosync needs to detect state changes quickly. It > does this using the totem protocol (which serves other purposes), which > passes a token around the nodes in the cluster. If a node is sent a > token and the token is not returned within a time-out period, it is > declared lost and a new token is dispatched. Once too many failures > occur in a row, the node is declared lost and it is ejected from the > cluster. This process is detailed in the link above under the "Concept; > Fencing" section. > > With all modes other than mode=1, the failure recovery and/or the > restoration of a link in the bond causes a sufficient disruption to > cause a node to be declared lost. 
As I mentioned, this matches my > experience in testing the other modes. It isn't an arbitrary rule. > > As for non-clustered traffic; the usefulness of other bond modes depends > entirely on the traffic you are pushing over it. Personally, I am > focused on HA in clusters, so I only use mode=1, regardless of the > traffic designed for it. > > digimer I was dealing with an issue where network performance had to be improved in a high availability cluster and while going through the archives I saw this thread. Would this condition of bonding mode being 1 (or active backup) also apply when we have different interfaces for cluster communication and service networks ? In such a scenario, can't we have the bonding mode for the cluster communication network interfaces as 1 and the bonding mode for the interfaces on service network as 0 or 5 (or any other suitable mode) ? Thanks, -- Manish From lists at alteeve.ca Sat Jan 5 19:20:10 2013 From: lists at alteeve.ca (Digimer) Date: Sat, 05 Jan 2013 14:20:10 -0500 Subject: [Linux-cluster] Packet loss after configuring Ethernet bonding In-Reply-To: References: <1352514375.40862.YahooMailNeo@web193003.mail.sg3.yahoo.com> <509DC1E9.9090704@alteeve.ca> <1352520739.40244.YahooMailNeo@web193002.mail.sg3.yahoo.com> <509DD694.1000900@alteeve.ca> Message-ID: <50E87CEA.6090609@alteeve.ca> On 01/05/2013 02:23 AM, Manish Kathuria wrote: > On Sat, Nov 10, 2012 at 9:52 AM, Digimer wrote: >> On 11/09/2012 11:12 PM, Zama Ques wrote: > >>>> Need help on resolving a issue related to implementing High Availability at network level . I understand that this is not the right forum to ask this question , but since it is related to HA and Linux , I am asking here and I feel somebody here will have answer to the issues I am facing . >>>> >>>> I am trying to implement Ethernet Bonding , Both the interface in my server are connected to two different network switches . >>>> >>>> My configuration is as follows: >>>> >>>> ======== >>>> # cat /proc/net/bonding/bond0 >>>> >>>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009) >>>> >>>> Bonding Mode: adaptive load balancing Primary Slave: None Currently >>>> Active Slave: eth0 MII Status: up MII Polling Interval (ms): 0 Up Delay >>>> (ms): 0 Down Delay (ms): 0 >>>> >>>> Slave Interface: eth0 MII Status: up Speed: 1000 Mbps Duplex: full Link >>>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:10 Slave queue ID: 0 >>>> >>>> Slave Interface: eth1 MII Status: up Speed: 1000 Mbps Duplex: full Link >>>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:14 Slave queue ID: 0 >>>> ------------ >>>> # cat /sys/class/net/bond0/bonding/mode >>>> >>>> balance-alb 6 >>>> >>>> >>>> # cat /sys/class/net/bond0/bonding/miimon >>>> 0 >>>> >>>> ============ >>>> >>>> >>>> The issue for me is that I am seeing packet loss after configuring bonding . Tried connecting both the interface to the same switch , but still seeing the packet loss . Also , tried changing miimon value to 100 , but still seeing the packet loss. >>>> >>>> What I am missing in the configuration ? Any help will be highly appreciated in resolving the problem . >>>> >>>> >>>> >>>> Thanks >>>> Zaman >>> >>> > You didn't share any details on your configuration, but I will assume >>>> you are using corosync. >>> >>>> The only supported bonding mode is Active/Passive (mode=1). I've >>>> personally tried all modes, out of curiosity, and all had problems. The >>>> short of it is that if you need more that 1 gbit of performance, buy >>>> faster cards. 
>>> >>>> If you are interested in what I use, it's documented here: >>> >>>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network >>> >>>> I've used this setup in several production clusters and have tested >>>> failure are recovery extensively. It's proven very stable. :) >>> >>> >>> Thanks Digimer for the quick response and pointing me to the link . I am yet to reach cluster configuration , initially trying to understand ethernet bonding before going into cluster configuration. So , option for me is only to use Active/Passive bonding mode in case of clustered environment. >>> Few more clarifications needed , Can we use other bonding modes in non clustered environment . I am seeing packet loss in other modes . Also , the support of using only mode=1 in cluster environment is it a restriction of RHEL Cluster suite or it is by design . >>> >>> Will be great if you clarify these queries . >>> >>> Thanks in Advance >>> Zaman >> >> Corosync is the only actively developed/supported (HA) cluster >> communications and membership tool. It's used on all modern distros for >> clustering and the requirement for mode=1 is with it. As such, it >> doesn't matter which OS you are on, it's the only mode that will work >> (reliably). >> >> The problem is that corosync needs to detect state changes quickly. It >> does this using the totem protocol (which serves other purposes), which >> passes a token around the nodes in the cluster. If a node is sent a >> token and the token is not returned within a time-out period, it is >> declared lost and a new token is dispatched. Once too many failures >> occur in a row, the node is declared lost and it is ejected from the >> cluster. This process is detailed in the link above under the "Concept; >> Fencing" section. >> >> With all modes other than mode=1, the failure recovery and/or the >> restoration of a link in the bond causes a sufficient disruption to >> cause a node to be declared lost. As I mentioned, this matches my >> experience in testing the other modes. It isn't an arbitrary rule. >> >> As for non-clustered traffic; the usefulness of other bond modes depends >> entirely on the traffic you are pushing over it. Personally, I am >> focused on HA in clusters, so I only use mode=1, regardless of the >> traffic designed for it. >> >> digimer > > I was dealing with an issue where network performance had to be > improved in a high availability cluster and while going through the > archives I saw this thread. > > Would this condition of bonding mode being 1 (or active backup) also > apply when we have different interfaces for cluster communication and > service networks ? In such a scenario, can't we have the bonding mode > for the cluster communication network interfaces as 1 and the bonding > mode for the interfaces on service network as 0 or 5 (or any other > suitable mode) ? > > Thanks, > -- > Manish That should be fine. Note though that if you use your other network as a backup totem ring, and for some reason corosync fails over to that ring, it will fail back again if a member in the non-mode=1 bond hiccups or fails. I've not tested this though, of course, so there might be a gotcha I don't know about. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
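
To put the mode=1 advice above in concrete terms, an active-backup bond on a RHEL-style system is usually defined with ifcfg files along the following lines; the device names, IP address and option values here are illustrative, not taken from the thread.

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.20.0.1
NETMASK=255.255.255.0
# mode=1 is active-backup; miimon polls the link state every 100 ms
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 primary=eth0"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is the same apart from DEVICE)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

Once the bond is up, /proc/net/bonding/bond0 (the file quoted earlier in this thread) should report "Bonding Mode: fault-tolerance (active-backup)" with one active slave and the other interface standing by.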
From queszama at yahoo.in Sun Jan 6 02:35:30 2013 From: queszama at yahoo.in (Zama Ques) Date: Sun, 6 Jan 2013 10:35:30 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <50E5BD37.9080804@alteeve.ca> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> <50E5BD37.9080804@alteeve.ca> Message-ID: <1357439730.4360.YahooMailNeo@web193506.mail.sg3.yahoo.com> ________________________________ From: Digimer To: Zama Ques ; linux clustering Cc: Rajveer Singh Sent: Thursday, 3 January 2013 10:47 PM Subject: Re: [Linux-cluster] GFS without creating a cluster On 01/03/2013 10:22 AM, Zama Ques wrote: >? ? ----- Original Message ----- >? ? From: Steven Whitehouse ? ? > >? ? To: Zama Ques >; linux >? ? clustering > >? ? Cc: >? ? Sent: Thursday, 3 January 2013 3:46 PM >? ? Subject: Re: [Linux-cluster] GFS without creating a cluster > >? ? Hi, > >? ? On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: >? ? > Hi All , >? ? > >? ? > >? ? > Need few clarification regarding GFS. >? ? > >? ? > >? ? > I need to create a shared file system for our servers . The >? ? servers will write to the shared file system at the same time and >? ? there is no requirement for a cluster . >? ? > >? ? > Planning to use GFS but GFS requires cluster software to be >? ? running . My confusion here is If I just run the cluster software ( >? ? cman etc ) without creating a cluster , will I be able to configure >? ? and run GFS2. Also , is it possible to write to a GFS file system >? ? from many servers at the same time ? >? ? > >? ? > Will be great if somebody can clarify by doubts. >? ? > >? ? > >? ? > Thanks in Advance >? ? > Zaman >? ? > >? ? > > >? ? > If you want to use GFS2 without a cluster, then you'll only be able to >? ? > use it from a single node (just like if you were using ext3 for >? ? > example). If you want to use GFS2 as intended, with multiple nodes >? ? > accessing the same filesystem, then you'll need to set up a cluster in >? ? > order to do so, > >? ? Thanks Steve for the reply . As you said setting up a cluster is >? ? needed to use GFS2 with multiple nodes, does that mean that I need >? ? to create cluster.conf or running cluster services (cman etc) should >? ? be fine for setting up GFS2. Not sure whether cman will run without >? ? creating cluster.conf > >? ? Assuming that I need to setup cluster.conf in order to use GFS2 , >? ? that means if there are two nodes in the cluster with GFS2 as file >? ? system resource , GFS2 will be mounted on only one host based on >? ? failover domain policy . But our requirement is like that GFS2 >? ? should be mounted on both servers at the same time? . Based on my >? ? little understanding of GFS , looks to me that I will not be able to >? ? achieve this using GFS2 or there are some way to achieve this ? > >? ? Please clarify on this. > >? > Hi Zama, >> As steve said, you must have to configure proper cluster to use GFS2 > filesystem and mounted on multiple nodes at the same time so that all > can > access it. You do not need to configure GFS2 filesystem to be > managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab > file as like > normal ext3 filesystem. >> I hope, it answers your question. > > Thanks Rajveer for clarifying . I think I am clear now . Will now try to > configure GFS2. 
> > > Thanks > Zaman > Note that you will also need proper fencing setup (usually using the > nodes' IPMI interface). Without properly configured, tested fencing, the > first time a node fails the GFS2 partition will hang (by design). > The reason the cluster is needed is that the access to the shared > storage and file system has to be coordinated between the nodes so that > one node doesn't step on the other. This is possible thanks to DLM; > distributed lock manager. DLM uses the cluster communications, hence the > need for the cluster. > Note also that you need shared storage, obviously. iSCSI or DRBD if you > only have two nodes. ? > Please take a look at this link. It explains in details how this works; >? https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing Thanks Digimer for pointing the need of proper fencing setup . After configuring GFS , I did power down on one of the node and could see that the GFS mount point got hung on the other host as you have pointed out . Will now try to add fencing to the cluster. We are using HP Storage works for shared storage and accessing space from it using multipathing. Thanks Zaman Thanks Digimer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.hartwig.schubert at gmail.com Mon Jan 7 07:55:44 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Mon, 7 Jan 2013 08:55:44 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: <50E6ED5C.1090003@alteeve.ca> References: <50E5BE3A.1090006@alteeve.ca> <50E6ED5C.1090003@alteeve.ca> Message-ID: Hi, thank you for the fast answer. Now I have one question: - It is possible to integrate the fencing-service on a working cluster? I have working virtualisation enviroment with 50 VMs, so i can't take them down. best regards 2013/1/4 Digimer : > As Emmanuel said, you need fencing. > > Please read this: > > https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing > > digimer > > On 01/04/2013 02:26 AM, Rainer Schubert wrote: >> Hi, >> >> my cluster.conf: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> 2013/1/3 Digimer : >>> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>>> Hi, >>>> >>>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>>> configuration. Now, I want to add a new node (mynode4). CMAN works >>>> fine, cman_tool shows all members: >>>> >>>> # cman_tool nodes >>>> Node Sts Inc Joined Name >>>> 1 M 408 2013-01-03 14:00:57 mynode1 >>>> 2 M 408 2013-01-03 14:00:57 mynode2 >>>> 3 M 408 2013-01-03 14:00:57 mynode3 >>>> 4 M 404 2013-01-03 14:00:56 mynode4 >>>> >>>> >>>> cman_tool services (on mynode4) >>>> >>>> fence domain >>>> member count 4 >>>> victim count 0 >>>> victim now 0 >>>> master nodeid 1 >>>> wait state none >>>> members 1 2 3 4 >>>> >>>> >>>> corosync: >>>> >>>> corosync-cfgtool -s >>>> Printing ring status. >>>> Local node ID 4 >>>> RING ID 0 >>>> id = 10.10.10.13 >>>> status = ring 0 active with no faults >>>> >>>> Everything looks fine, from my site. No I will start clvmd >>>> >>>> :~# /etc/init.d/clvm start >>>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>>> >>>> The CLVM runs into a time out. 
>>>> >>>> My System: >>>> >>>> cat /etc/debian_version >>>> 6.0.6 >>>> >>>> # lvm version >>>> LVM version: 2.02.66(2) (2010-05-20) >>>> Library version: 1.02.48 (2010-05-20) >>>> Driver version: 4.22.0 >>>> >>>> dpkg -l |grep clvm >>>> ii clvm 2.02.66-5 >>>> Cluster LVM Daemon for lvm2 >>>> >>>> dpkg -l |grep cman >>>> ii cman 3.0.12-2 >>>> Red Hat cluster suite - cluster manager >>>> ii libcman3 3.0.12-2 >>>> Red Hat cluster suite - cluster manager libraries >>>> >>>> Have anybody a idea, what running false? >>>> >>>> best regards >>>> >>> >>> Can you post your cluster.conf please? Obfuscate as little as you can >>> please. >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? From misch at schwartzkopff.org Mon Jan 7 08:59:25 2013 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Mon, 07 Jan 2013 09:59:25 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E6ED5C.1090003@alteeve.ca> Message-ID: <2810431.8ivETSkJLk@nb003> Am Montag, 7. Januar 2013, 08:55:44 schrieb Rainer Schubert: > Hi, > > thank you for the fast answer. Now I have one question: > > - It is possible to integrate the fencing-service on a working cluster? > > I have working virtualisation enviroment with 50 VMs, so i can't take > them down. > > best regards Hi, yes you can update your setup while the cluster is running. See the doc of cman. Please be careful when setting up and testing the fencing while your cluster provides services. Greetings, -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 M?nchen Tel: (0163) 172 50 98 Fax: (089) 620 304 13 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Jan 7 15:46:55 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 07 Jan 2013 10:46:55 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> <50E6ED5C.1090003@alteeve.ca> Message-ID: <50EAEDEF.3050505@alteeve.ca> Technically, yes. Practically, no. You need to know if your configuration is working. The best way to do that is to simulate a failure and watch to make sure that the fence actions happen. I would strongly recommend scheduling down time to do this. digimer On 01/07/2013 02:55 AM, Rainer Schubert wrote: > Hi, > > thank you for the fast answer. Now I have one question: > > - It is possible to integrate the fencing-service on a working cluster? > > I have working virtualisation enviroment with 50 VMs, so i can't take > them down. > > best regards > > > > 2013/1/4 Digimer : >> As Emmanuel said, you need fencing. >> >> Please read this: >> >> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing >> >> digimer >> >> On 01/04/2013 02:26 AM, Rainer Schubert wrote: >>> Hi, >>> >>> my cluster.conf: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> 2013/1/3 Digimer : >>>> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>>>> Hi, >>>>> >>>>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>>>> configuration. Now, I want to add a new node (mynode4). 
CMAN works >>>>> fine, cman_tool shows all members: >>>>> >>>>> # cman_tool nodes >>>>> Node Sts Inc Joined Name >>>>> 1 M 408 2013-01-03 14:00:57 mynode1 >>>>> 2 M 408 2013-01-03 14:00:57 mynode2 >>>>> 3 M 408 2013-01-03 14:00:57 mynode3 >>>>> 4 M 404 2013-01-03 14:00:56 mynode4 >>>>> >>>>> >>>>> cman_tool services (on mynode4) >>>>> >>>>> fence domain >>>>> member count 4 >>>>> victim count 0 >>>>> victim now 0 >>>>> master nodeid 1 >>>>> wait state none >>>>> members 1 2 3 4 >>>>> >>>>> >>>>> corosync: >>>>> >>>>> corosync-cfgtool -s >>>>> Printing ring status. >>>>> Local node ID 4 >>>>> RING ID 0 >>>>> id = 10.10.10.13 >>>>> status = ring 0 active with no faults >>>>> >>>>> Everything looks fine, from my site. No I will start clvmd >>>>> >>>>> :~# /etc/init.d/clvm start >>>>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>>>> >>>>> The CLVM runs into a time out. >>>>> >>>>> My System: >>>>> >>>>> cat /etc/debian_version >>>>> 6.0.6 >>>>> >>>>> # lvm version >>>>> LVM version: 2.02.66(2) (2010-05-20) >>>>> Library version: 1.02.48 (2010-05-20) >>>>> Driver version: 4.22.0 >>>>> >>>>> dpkg -l |grep clvm >>>>> ii clvm 2.02.66-5 >>>>> Cluster LVM Daemon for lvm2 >>>>> >>>>> dpkg -l |grep cman >>>>> ii cman 3.0.12-2 >>>>> Red Hat cluster suite - cluster manager >>>>> ii libcman3 3.0.12-2 >>>>> Red Hat cluster suite - cluster manager libraries >>>>> >>>>> Have anybody a idea, what running false? >>>>> >>>>> best regards >>>>> >>>> >>>> Can you post your cluster.conf please? Obfuscate as little as you can >>>> please. >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ >>>> What if the cure for cancer is trapped in the mind of a person without >>>> access to education? >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From tc3driver at gmail.com Mon Jan 7 19:25:14 2013 From: tc3driver at gmail.com (Bill G.) Date: Mon, 7 Jan 2013 11:25:14 -0800 Subject: [Linux-cluster] Please settle a bet for me Message-ID: Hi list, We are having a discussion about clustering on RHEL 5.2 and 5.4. Knowing that there are no supported fence devices for VMWare 4.1 and the given versions of RHEL. As far as I can tell there must be a fence device for any type of automatic fail over, but coworkers are insisting that if a clustered process dies it can and will automatically start/fail over... it just won't if there is a hardware failure. They are also saying that you will be able to move processes from one node to another without fencing. I am insisting that clustering does not work without an fence device. Please settle this :) -- Thanks, Bill G. tc3driver at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From arpittolani at gmail.com Mon Jan 7 19:50:18 2013 From: arpittolani at gmail.com (Arpit Tolani) Date: Tue, 8 Jan 2013 01:20:18 +0530 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: Hello On Tue, Jan 8, 2013 at 12:55 AM, Bill G. wrote: > Hi list, > > We are having a discussion about clustering on RHEL 5.2 and 5.4. > > Knowing that there are no supported fence devices for VMWare 4.1 and the > given versions of RHEL. 
> > As far as I can tell there must be a fence device for any type of automatic > fail over, but coworkers are insisting that if a clustered process dies it > can and will automatically start/fail over... it just won't if there is a > hardware failure. They are also saying that you will be able to move > processes from one node to another without fencing. > > I am insisting that clustering does not work without an fence device. > > Please settle this :) > Yes, fencing mandatory with cluster, Without fencing your cluster will fail to work. Refer https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing If you have Red Hat Support, Check below Kbase. https://access.redhat.com/knowledge/solutions/15575 Even manual fencing is not supported i.e. fence_manual. Hope that helps. Regards Arpit Tolani From sam at dotsec.com Mon Jan 7 22:29:28 2013 From: sam at dotsec.com (Sam Wilson) Date: Tue, 08 Jan 2013 08:29:28 +1000 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: <50EB4C48.8020209@dotsec.com> Hi Bill, As far as I have experienced under pacemaker this is true in most cases. EG: Two nodes running a master/slave httpd will fail over without fencing. However, if for example your nodes are also using GFS2 and something goes wrong then you will find your filesystem locked by DLM which will obviously break fail over for services on that filesystem. In short, best to configure fencing unless this is a lab environment your willing to break! Cheers, Sam From lists at alteeve.ca Tue Jan 8 01:24:34 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 07 Jan 2013 20:24:34 -0500 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: <50EB4C48.8020209@dotsec.com> References: <50EB4C48.8020209@dotsec.com> Message-ID: <50EB7552.30903@alteeve.ca> On 01/07/2013 05:29 PM, Sam Wilson wrote: > Hi Bill, > > As far as I have experienced under pacemaker this is true in most cases. > EG: Two nodes running a master/slave httpd will fail over without fencing. > > However, if for example your nodes are also using GFS2 and something > goes wrong then you will find your filesystem locked by DLM which will > obviously break fail over for services on that filesystem. > > In short, best to configure fencing unless this is a lab environment > your willing to break! > > Cheers, > > Sam DLM absolutely requires fencing, but even without it, a production cluster without fencing is a bad day waiting to happen. Please always use fencing... It will save you far more headache in the long run. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From ccaulfie at redhat.com Tue Jan 8 09:22:30 2013 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 08 Jan 2013 09:22:30 +0000 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: <50EBE556.1030309@redhat.com> On 07/01/13 19:25, Bill G. wrote: > Hi list, > > We are having a discussion about clustering on RHEL 5.2 and 5.4. > > Knowing that there are no supported fence devices for VMWare 4.1 and the > given versions of RHEL. > > As far as I can tell there must be a fence device for any type of > automatic fail over, but coworkers are insisting that if a clustered > process dies it can and will automatically start/fail over... it just > won't if there is a hardware failure. They are also saying that you will > be able to move processes from one node to another without fencing. 
> > I am insisting that clustering does not work without an fence device. > > Please settle this :) Clustering will work without a fence device but no-one will support you doing it. In the vast majority of cases you are risking your data. In particular cases you can run without fencing if you really, really, really know what you are doing (ie are on the dev team) and have a particular workload. But if you misjudge the installation ... see above :) Chrissie From rossnick-lists at cybercat.ca Wed Jan 9 15:50:25 2013 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Wed, 09 Jan 2013 10:50:25 -0500 Subject: [Linux-cluster] Moving Physical extents from one PV to another in a clustered environement. In-Reply-To: <50C9FFEF.6030303@cybercat.ca> References: <50C8C89F.9080200@cybercat.ca> <20121212183937.GI14097@squishy.elizium.za.net> <50C8DB71.7010009@cybercat.ca> <20121212202905.GJ14097@squishy.elizium.za.net> <50C94177.8070706@cybercat.ca> <00386E1F-FEC6-4CD2-8BB8-8C61A48E17DE@gmail.com> <50C9E757.4080005@cybercat.ca> <50C9E91A.1080807@redhat.com> <50C9FFEF.6030303@cybercat.ca> Message-ID: <50ED91C1.7020207@cybercat.ca> >> You will need to install 'cmirror' package(s),and start cmirror service >> on all cluster nodes >> >> # service cmirror start >> >> After that pvmove should work > No it didn't. I posted in a previous email what it did, It complains > that it cannot lock the vg. Just a quick note on this issue. Yesterday, I installed the latest versions of the kernel and rebooted the whole cluster, and now I can move my lv to another PV ! So cmirrord was indeed the solutiuon. Regards, From gounini.geekarea at gmail.com Fri Jan 11 12:02:18 2013 From: gounini.geekarea at gmail.com (GouNiNi Geekarea) Date: Fri, 11 Jan 2013 13:02:18 +0100 (CET) Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> Message-ID: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Hello everyone, I didn't find any simple solution to send emails to alerte when rgmanager decides to relocate services. Do you know simple solution other than create a script ressource? Regards, From robejrm at gmail.com Fri Jan 11 14:15:05 2013 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Fri, 11 Jan 2013 15:15:05 +0100 Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: On Fri, Jan 11, 2013 at 1:02 PM, GouNiNi Geekarea < gounini.geekarea at gmail.com> wrote: > Hello everyone, > > I didn't find any simple solution to send emails to alerte when rgmanager > decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > You can parse cluster logs (if in debug mode) and send mail if something like: "clurgmgrd[30752]: Sent remote-start request to 2" or "attempting to relocate" happens on them. I.e: you can use rsyslog ommail http://www.rsyslog.com/doc/ommail.html Greetings, Juanra > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gounini.geekarea at gmail.com Fri Jan 11 14:05:09 2013 From: gounini.geekarea at gmail.com (GouNiNi Geekarea) Date: Fri, 11 Jan 2013 15:05:09 +0100 (CET) Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: <1681853954.12472.1357913109168.JavaMail.root@geekarea.fr> Good idea, does it mean there is nothing built in rgmanager ? ----- Mail original ----- > De: "Juan Ramon Martin Blanco" > ?: "linux clustering" > Envoy?: Vendredi 11 Janvier 2013 15:15:05 > Objet: Re: [Linux-cluster] [rgmanager] sending email on relocate > > > > On Fri, Jan 11, 2013 at 1:02 PM, GouNiNi Geekarea < > gounini.geekarea at gmail.com > wrote: > > > Hello everyone, > > I didn't find any simple solution to send emails to alerte when > rgmanager decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > You can parse cluster logs (if in debug mode) and send mail if > something like: > "clurgmgrd[30752]: Sent remote-start request to 2" > or > "attempting to relocate" > happens on them. > > I.e: you can use rsyslog ommail > http://www.rsyslog.com/doc/ommail.html > > Greetings, > Juanra > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From christiangrassi at gmail.com Sat Jan 12 00:24:20 2013 From: christiangrassi at gmail.com (Christian Grassi) Date: Sat, 12 Jan 2013 01:24:20 +0100 Subject: [Linux-cluster] Corosync softlookup In-Reply-To: References: Message-ID: Hi all, I have a three node cluster which run KVM guests a services. The system run fine for some months but the suddenly it started to have soft lockups as you can se below and the nodes get fenced. The guests use clvm with raw lv as back end, and the config files are on shared gfs2 file systems. Any idea which could be the cause ? A attache also my cluster.conf Any idea is welcome Regards Chris Pid: 136556, comm: corosync Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL980 G7 RIP: 0010:[] [] wait_for_rqlock+0x2e/0x40 RSP: 0018:ffff881c12231ee8 EFLAGS: 00000206 RAX: 00000000e52ae4c7 RBX: ffff881c12231ee8 RCX: ffff882070e16680 RDX: 00000000e52ae4c7 RSI: ffff882070e11960 RDI: 0000000000000000 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: dead000000200200 R10: ffff881c125830c0 R11: 00000000000000d2 R12: 0000000000000282 R13: ffffffff81aa5700 R14: ffff882070e11960 R15: ffff881c12583438 FS: 0000000000000000(0000) GS:ffff882070e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000035a489a490 CR3: 0000000001a85000 CR4: 00000000000026e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process corosync (pid: 136556, threadinfo ffff881c12230000, task ffff881c12582aa0) Stack: ffff881c12231f68 ffffffff8107091b ffff881c12231f78 ffff881c12231f28 ffff881faf1d5660 ffff881c12582f68 ffff881c12582f68 0000000000000000 ffff881c12231f28 ffff881c12231f28 ffff881c12231f78 00007f9ce339d440 Call Trace: [] ? do_exit+0x5ab/0x870 [] ? sys_exit+0x17/0x20 [] ? system_call_fastpath+0x16/0x1b Code: e5 0f 1f 44 00 00 48 c7 c0 80 66 01 00 65 48 8b 0c 25 b0 e0 00 00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 01 89 c2 fa 10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00 00 00 00 55 48 89 Call Trace: [] ? do_exit+0x5ab/0x870 [] ? sys_exit+0x17/0x20 [] ? 
system_call_fastpath+0x16/0x1b BUG: soft lockup - CPU#90 stuck for 67s! [multipathd:141345] Modules linked in: iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables gfs2 dlm configfs autofs4 sunrpc bridge bonding 8021q garp stp llc ipv6 ext2 vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw power_meter be2net bnx2 netxen_nic iTCO_wdt iTCO_vendor_support hpilo hpwdt sg i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: application/octet-stream Size: 7783 bytes Desc: not available URL: From jsosic at srce.hr Mon Jan 14 01:47:56 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Mon, 14 Jan 2013 02:47:56 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs Message-ID: <50F363CC.3020308@srce.hr> Hi. I'm using CentOS 6, and have a problem with ccs & ricci. At first use, ccs asks for password for each node. After that, ~/.ccs is generated with cert in it. 1. I've found how to generate private key in ~/.ccs from the code in ccs python executable (/usr/sbin/ccs). 2. I've also found how to generate CA in /var/lib/ricci/certs => code for that can be found in init script of ricci (/etc/init.d/ricci). But what I am missing is how to use the user key/certificate from step 1 and sign it into CA in step 2? I'm building puppet module which will autoconfigure whole cluster from bare metal to working state. So far my only problem is updating cluster.conf, for which I need fully working ricci CA and user certificates in /root/.ccs of every node... So, any ideas are welcome. -- Jakov Sosic www.srce.unizg.hr From Ralph.Grothe at itdz-berlin.de Wed Jan 16 09:19:19 2013 From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de) Date: Wed, 16 Jan 2013 10:19:19 +0100 Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: I have implemented this on our RHCS clusters where such a feature, as e.g. sending out a notification SMS text message when a service does relocate, was requested by the users/admins of this service, by simply adding a script function to the RHCS resource agent (RA) code (i.e. mostly a custom script of RHCS RA type "script", very similar to a SysV init script) and placing an invocation statement to this function in the RA script's start and stop blocks with applicable subject line and body text. I found the SWAKS client ( http://www.jetmore.org/john/code/swaks/ ) nifty to this end because it's easy to use and offers (almost?) complete control over SMTP communication. If I feel the urge to dig deeper I usually use the CPAN module Mail::Sender ( http://search.cpan.org/~jenda/Mail-Sender-0.8.22/Sender.pm ) with a few helper modules should they be required. 
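As an illustration of that approach, here is a minimal sketch of such a mail hook inside a "script"-type resource agent, using swaks as mentioned above. The service name, mail hub and addresses are made up for the example, and this is not the actual code referred to above:

#!/bin/bash
# Hypothetical notification helper for a "script"-type RA: mail a short
# note whenever rgmanager runs the start or stop action on this node.
notify_relocate() {
    local action="$1"                # "start" or "stop"
    local svc="myservice"            # example service name (assumption)
    swaks --server mailhub.example.com \
          --from "cluster-$(hostname -s)@example.com" \
          --to oncall@example.com \
          --header "Subject: [RHCS] ${svc} ${action} on $(hostname -s)" \
          --body "rgmanager ran '${action}' for ${svc} at $(date -R)" \
          >/dev/null 2>&1 || true    # never let a mail failure break the RA
}

case "$1" in
    start)
        notify_relocate start
        # ... real start logic of the script resource goes here ...
        ;;
    stop)
        notify_relocate stop
        # ... real stop logic goes here ...
        ;;
esac

The same hook could call a small Perl helper built on Mail::Sender instead of swaks; the only hard requirement is that it returns quickly and never changes the exit status of the start or stop action.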
On the other hand I have Nagios service checks for most of our cluster services (not only Linux clusters) that would trigger a notification or other event handler on critical state changes and let Nagios do a centralized notifying. Often that's the only way to get out messages anyway (e.g. where clusters operate in shielded LANs) > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > GouNiNi Geekarea > Sent: Friday, January 11, 2013 13:02 > To: linux clustering > Subject: [Linux-cluster] [rgmanager] sending email on relocate > > Hello everyone, > > I didn't find any simple solution to send emails to alerte > when rgmanager decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jpokorny at redhat.com Wed Jan 16 13:14:56 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Wed, 16 Jan 2013 14:14:56 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F363CC.3020308@srce.hr> References: <50F363CC.3020308@srce.hr> Message-ID: <20130116131456.GA6079@redhat.com> Hello Jakov, On 14/01/13 02:47 +0100, Jakov Sosic wrote: > Hi. > > I'm using CentOS 6, and have a problem with ccs & ricci. > > At first use, ccs asks for password for each node. After that, ~/.ccs is > generated with cert in it. > > 1. I've found how to generate private key in ~/.ccs from the code in ccs > python executable (/usr/sbin/ccs). > > 2. I've also found how to generate CA in /var/lib/ricci/certs => code for > that can be found in init script of ricci (/etc/init.d/ricci). > > But what I am missing is how to use the user key/certificate from step 1 and > sign it into CA in step 2? The point here is that once the public certificate of ccs is recognized by ricci as authorized by supplying the password within the initial session, any other other session will be passwordless, based only on the "proved" client's certificate. Your intention seems to be to skip the initial phase involving password, is it the case? This should be doable by forcing ccs to generate its certificate by doing some NO-OP, then copying (scp?) the public part to the predefined destination at the machine with ricci installed, e.g.: [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT Please note that 'sha1sum' command in the above example is only used to minimize possible collision at certificate filenames coming from other machines (under highly unprobable circumstances, collision can still happen) that will possibly run the same sequence, and otherwise does not guarantee any anonymity of the certificate within the ricci's certs/clients directory. Surely, the first step can be substituted by either using pregenerated certificate + key on the locations expected by ccs (~/.ccs) or generating them explicitly (e.g., by "openssl req") as part of the process. The point is that css-local and ricci-tracked certificate (one of presumably many) matches. 
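For completeness, a rough sketch of the "pregenerated certificate" variant with openssl req. The file names under ~/.ccs (cacert.pem for the certificate, privkey.pem for the key) are an assumption here, so please check them against /usr/sbin/ccs on your release:

# Create the key pair that ccs would otherwise generate on first use
# (file names under ~/.ccs are assumed, verify against the ccs source).
mkdir -p ~/.ccs && chmod 700 ~/.ccs
openssl req -new -x509 -nodes -days 1825 -newkey rsa:2048 \
    -keyout ~/.ccs/privkey.pem \
    -out ~/.ccs/cacert.pem \
    -subj "/CN=$(hostname)-ccs"
chmod 600 ~/.ccs/privkey.pem

# Then, exactly as above, install the public part on the ricci side so
# that no password is ever asked for:
scp ~/.ccs/cacert.pem \
    riccihost:/var/lib/ricci/certs/clients/client_cert_$(hostname -s)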
> I'm building puppet module which will autoconfigure whole cluster from bare > metal to working state. So far my only problem is updating cluster.conf, for > which I need fully working ricci CA and user certificates in /root/.ccs of > every node... By any chance, are you willing to share the module or its skeleton to the community? > So, any ideas are welcome. Hope the above helps. -- Jan From epretorious at yahoo.com Fri Jan 18 04:59:04 2013 From: epretorious at yahoo.com (Eric) Date: Thu, 17 Jan 2013 20:59:04 -0800 (PST) Subject: [Linux-cluster] HA iSCSI with DRBD Message-ID: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: > crm configure property stonith-enabled=false > crm configure property no-quorum-policy=ignore > > crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s > > crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s > crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true > > crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s > crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s > crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s > crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s > crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s > crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s > > crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 > crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start > crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master > crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... > resource r0 { > ??? volume 0 { > ??? ??? device /dev/drbd0 ; > ??? ??? disk /dev/sda7 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 1 { > ??? ??? device /dev/drbd1 ; > ??? ??? disk /dev/sda8 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 2 { > ??? ??? device /dev/drbd2 ; > ??? ??? disk /dev/sda9 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 3 { > ??? ??? device /dev/drbd3 ; > ??? ??? disk /dev/sda10 ; > ??? ??? meta-disk internal ; > ??? } > ??? on san1 { > ??? ??? address 192.168.1.1:7789 ; > ??? } > ??? on san2 { > ??? ??? address 192.168.1.2:7789 ; > ??? 
} > } But the shared IP address won't start nor will the LUN's: > san1:~ # crm_mon -1 > ============ > Last updated: Thu Jan 17 20:55:55 2013 > Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 > Stack: openais > Current DC: san1 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 9 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] > ???? Masters: [ san1 ] > ???? Slaves: [ san2 ] >? Resource Group: g_iSCSI-san1 > ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 > ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped > > Failed actions: > ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error What am I doing wrong? TIA, Eric Pretorious Truckee, CA -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsosic at srce.hr Fri Jan 18 18:44:16 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Fri, 18 Jan 2013 19:44:16 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <20130116131456.GA6079@redhat.com> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> Message-ID: <50F99800.6070102@srce.hr> On 01/16/2013 02:14 PM, Jan Pokorn? wrote: > The point here is that once the public certificate of ccs is recognized by > ricci as authorized by supplying the password within the initial session, > any other other session will be passwordless, based only on the "proved" > client's certificate. > > Your intention seems to be to skip the initial phase involving password, > is it the case? This should be doable by forcing ccs to generate its > certificate by doing some NO-OP, then copying (scp?) the public part > to the predefined destination at the machine with ricci installed, > e.g.: > > [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null > [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem > [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients > [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) > [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} > [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT Thank you for your explanation. I've figured that out later myself :) So, instead of using the sha1sum to avoide collisions, I use nodename. 
So my client_cert names look like this: client_cert_mynode1 client_cert_mynode2 ... Is this ok? Or should I obfuscate name for some reason... > Surely, the first step can be substituted by either using pregenerated > certificate + key on the locations expected by ccs (~/.ccs) or > generating them explicitly (e.g., by "openssl req") as part > of the process. The point is that css-local and ricci-tracked > certificate (one of presumably many) matches. I've done this by pre-generating the certificates on my puppet master. >> I'm building puppet module which will autoconfigure whole cluster from bare >> metal to working state. So far my only problem is updating cluster.conf, for >> which I need fully working ricci CA and user certificates in /root/.ccs of >> every node... > > By any chance, are you willing to share the module or its skeleton > to the community? Offcourse, as soon as I'm happy with the code and the level of functionality. We have around dozen clusters, I'm developing this module on a new one that's supposed to go into production soon. Other clusters use older module which really doesn't solve any of this, so as soon as my code is stable we'll push other clusters to new puppet module. After that, I will publish it. So expect it in another week or two. > Hope the above helps. Yeah, it really did help. But, for some strange reason it seems that ccs_sync doesn't use certificates, but instead it asks for password... My idea was to use ccs_sync to propagate new cluster.conf. So, puppet puts cluster.conf in /etc/cluster.conf, and after that runs ccs_sync -f /etc/cluster.conf But unfortunately, ccs_sync doesn't seem to recognize the certificates as ccs does :( Any idea on this one? pgsql01-xc # ccs -h pgsql01-xc --getversion 2 pgsql01-xc # ccs_sync -f /etc/cluster.conf You have not authenticated to the ricci daemon on pgsql01-xc Password: I'm digging into source code to try to get some sense of it :-/ From jsosic at srce.hr Fri Jan 18 19:44:19 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Fri, 18 Jan 2013 20:44:19 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F99800.6070102@srce.hr> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> <50F99800.6070102@srce.hr> Message-ID: <50F9A613.30902@srce.hr> On 01/18/2013 07:44 PM, Jakov Sosic wrote: > On 01/16/2013 02:14 PM, Jan Pokorn? wrote: > >> The point here is that once the public certificate of ccs is recognized by >> ricci as authorized by supplying the password within the initial session, >> any other other session will be passwordless, based only on the "proved" >> client's certificate. >> >> Your intention seems to be to skip the initial phase involving password, >> is it the case? This should be doable by forcing ccs to generate its >> certificate by doing some NO-OP, then copying (scp?) the public part >> to the predefined destination at the machine with ricci installed, >> e.g.: >> >> [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null >> [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem >> [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients >> [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) >> [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} >> [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT > > Thank you for your explanation. I've figured that out later myself :) > > So, instead of using the sha1sum to avoide collisions, I use nodename. 
> So my client_cert names look like this: > > client_cert_mynode1 > client_cert_mynode2 > ... > > Is this ok? Or should I obfuscate name for some reason... > > >> Surely, the first step can be substituted by either using pregenerated >> certificate + key on the locations expected by ccs (~/.ccs) or >> generating them explicitly (e.g., by "openssl req") as part >> of the process. The point is that css-local and ricci-tracked >> certificate (one of presumably many) matches. > > I've done this by pre-generating the certificates on my puppet master. > > >>> I'm building puppet module which will autoconfigure whole cluster from bare >>> metal to working state. So far my only problem is updating cluster.conf, for >>> which I need fully working ricci CA and user certificates in /root/.ccs of >>> every node... >> >> By any chance, are you willing to share the module or its skeleton >> to the community? > > Offcourse, as soon as I'm happy with the code and the level of > functionality. We have around dozen clusters, I'm developing this module > on a new one that's supposed to go into production soon. Other clusters > use older module which really doesn't solve any of this, so as soon as > my code is stable we'll push other clusters to new puppet module. After > that, I will publish it. So expect it in another week or two. > > >> Hope the above helps. > > Yeah, it really did help. But, for some strange reason it seems that > ccs_sync doesn't use certificates, but instead it asks for password... > > My idea was to use ccs_sync to propagate new cluster.conf. So, puppet > puts cluster.conf in /etc/cluster.conf, and after that runs ccs_sync -f > /etc/cluster.conf > > But unfortunately, ccs_sync doesn't seem to recognize the certificates > as ccs does :( Any idea on this one? > > pgsql01-xc # ccs -h pgsql01-xc --getversion > 2 > > pgsql01-xc # ccs_sync -f /etc/cluster.conf > You have not authenticated to the ricci daemon on pgsql01-xc > Password: > > > I'm digging into source code to try to get some sense of it :-/ It seems that ccs_sync run as root uses /var/lib/ricci/cacert.pem as it's own client certificate... Do you think if it's OK to use same client certificate for root user (/root/.ccs/cacert.pem) and for ricci user (/var/lib/ricci/cacert.pem) on the same machine? That way I wouldn't need to generate additional certificates for root user but just use existing ones. As it seems ccs_sync already uses them... From epretorious at yahoo.com Fri Jan 18 20:40:49 2013 From: epretorious at yahoo.com (Eric) Date: Fri, 18 Jan 2013 12:40:49 -0800 (PST) Subject: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD In-Reply-To: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> References: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> Message-ID: <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> After rebooting both nodes, I checked the cluster status again and found this: Code: > san1:~ # crm_mon -1 > ============ > Last updated: Fri Jan 18 11:51:28 2013 > Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2 > Stack: openais > Current DC: san2 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 9 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >????? Masters: [ san2 ] >????? Slaves: [ san1 ] >? Resource Group: g_iSCSI-san1 >????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >????? p_iSCSI-san1_0??? 
(ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped > > Failed actions: >???? p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error >???? p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error ...and that's when it occured to me: There are only four volumes defined in the DRBD cofiguration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., The p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up theresource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too! So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed: > san2:~ # ll /dev/drbd* > brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0 > brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1 > brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2 > brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3 > > ... > > san2:~ # crm_mon -1 > ============ > Last updated: Fri Jan 18 11:53:03 2013 > Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2 > Stack: openais > Current DC: san2 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 8 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >????? Masters: [ san2 ] >????? Slaves: [ san1 ] >? Resource Group: g_iSCSI-san1 >????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Started san2 From the iSCSI client (xen2): > xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254 > 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda > 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda > 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda Problem fixed! 
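For anyone hitting the same symptom: one way to take such a stale primitive out with the crm shell looks roughly like this (a sketch, not necessarily the exact steps taken here; the primitive has to be removed from the group definition before it can be deleted):

crm configure edit g_iSCSI-san1      # remove p_iSCSI-san1_4 from the group members
crm configure delete p_iSCSI-san1_4  # then drop the primitive definition itself
crm resource cleanup g_iSCSI-san1    # clear the recorded start failures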
Eric Pretorious Truckee, CA >________________________________ > From: Eric >To: linux clustering >Sent: Thursday, January 17, 2013 8:59 PM >Subject: [Linux-cluster] HA iSCSI with DRBD > > >I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: > > >> crm configure property stonith-enabled=false >> crm configure property no-quorum-policy=ignore >> >> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s >> >> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s >> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true >> >> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s >> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s >> >> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 >> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start >> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master >> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 > > >IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... > > >> resource r0 { >> ??? volume 0 { >> ??? ??? device /dev/drbd0 ; >> ??? ??? disk /dev/sda7 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 1 { >> ??? ??? device /dev/drbd1 ; >> ??? ??? disk /dev/sda8 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 2 { >> ??? ??? device /dev/drbd2 ; >> ??? ??? disk /dev/sda9 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 3 { >> ??? ??? device /dev/drbd3 ; >> ??? ??? disk /dev/sda10 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? on san1 { >> ??? ??? address 192.168.1.1:7789 ; >> ??? } >> ??? on san2 { >> ??? ??? address 192.168.1.2:7789 ; >> ??? } >> } > > > >But the shared IP address won't start nor will the LUN's: > > >> san1:~ # crm_mon -1 >> ============ >> Last updated: Thu Jan 17 20:55:55 2013 >> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 >> Stack: openais >> Current DC: san1 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 9 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >> ???? 
Masters: [ san1 ] >> ???? Slaves: [ san2 ] >>? Resource Group: g_iSCSI-san1 >> ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 >> ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >> >> Failed actions: >> ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error > > > >What am I doing wrong? > > > >TIA, >Eric Pretorious >Truckee, CA > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpokorny at redhat.com Fri Jan 18 22:04:44 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Fri, 18 Jan 2013 23:04:44 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F9A613.30902@srce.hr> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> <50F99800.6070102@srce.hr> <50F9A613.30902@srce.hr> Message-ID: <20130118220444.GA13473@redhat.com> On 18/01/13 20:44 +0100, Jakov Sosic wrote: > On 01/18/2013 07:44 PM, Jakov Sosic wrote: >> >> Thank you for your explanation. I've figured that out later myself :) >> >> So, instead of using the sha1sum to avoide collisions, I use nodename. >> So my client_cert names look like this: >> >> client_cert_mynode1 >> client_cert_mynode2 >> ... >> >> Is this ok? Or should I obfuscate name for some reason... Indeed, you can go with whatever naming convention you like, only location is important. Honestly, I wasn't sure so sticked with the internal naming convention of ricci, which does not hurt either (note that during the cluster lifetime, you can, e.g., re-add a node under different name so this descriptive naming can get out of sync, but not a big deal). >>> Surely, the first step can be substituted by either using pregenerated >>> certificate + key on the locations expected by ccs (~/.ccs) or >>> generating them explicitly (e.g., by "openssl req") as part >>> of the process. The point is that css-local and ricci-tracked >>> certificate (one of presumably many) matches. >> >> I've done this by pre-generating the certificates on my puppet master. >> >> >>>> I'm building puppet module which will autoconfigure whole cluster >>>> from bare metal to working state. 
So far my only problem is >>>> updating cluster.conf, for which I need fully working ricci CA >>>> and user certificates in /root/.ccs of every node... >>> >>> By any chance, are you willing to share the module or its skeleton >>> to the community? >> >> Offcourse, as soon as I'm happy with the code and the level of >> functionality. We have around dozen clusters, I'm developing this module >> on a new one that's supposed to go into production soon. Other clusters >> use older module which really doesn't solve any of this, so as soon as >> my code is stable we'll push other clusters to new puppet module. After >> that, I will publish it. So expect it in another week or two. Cool, thanks. >>> Hope the above helps. >> >> Yeah, it really did help. But, for some strange reason it seems that >> ccs_sync doesn't use certificates, but instead it asks for password... See below. >> My idea was to use ccs_sync to propagate new cluster.conf. So, >> puppet puts cluster.conf in /etc/cluster.conf, and after that runs >> ccs_sync -f /etc/cluster.conf >> >> But unfortunately, ccs_sync doesn't seem to recognize the certificates >> as ccs does :( Any idea on this one? >> >> pgsql01-xc # ccs -h pgsql01-xc --getversion >> 2 >> >> pgsql01-xc # ccs_sync -f /etc/cluster.conf >> You have not authenticated to the ricci daemon on pgsql01-xc >> Password: >> >> >> I'm digging into source code to try to get some sense of it :-/ > > It seems that ccs_sync run as root uses /var/lib/ricci/cacert.pem as > it's own client certificate... (/var/lib/ricci/certs/cacert.pem) Yes, but it is a little bit more complicated. When ricci is run for the first time (prerequisite [1] to run either "cman_tool version ..." or ccs_sync directly [*]), it generates its OpenSSL certificate (/var/lib/ricci/certs/cacert.pem) + key, which are then 1:1 cloned to PKCS#12 format and put into NSS certificate DB (in the same dir) and this is what ccs_sync uses to obtain its client certificate. > Do you think if it's OK to use same client certificate for root user > (/root/.ccs/cacert.pem) and for ricci user (/var/lib/ricci/cacert.pem) > on the same machine? As long as you do not need per-client granularity (e.g., to forcibly revoke/remove particular certificate from /var/lib/ricci/certs/clients; btw. custom named cert files here would actually prove useful as otherwise one would have to do a tedious certificate-content-matching task to identify a correct victim)... > That way I wouldn't need to generate additional certificates for root > user but just use existing ones. As it seems ccs_sync already uses them... See above, really depends on the level of permissions management you want to achieve (in extreme, there can be a single certificate for everything, but I wouldn't recommend this). Other possibility, although suffering from the similar global permission issue, is to use local certificate authority whose certificates (every and each) will be automatically considered as trusted. It looks like that to you utilize this, you would need to append the certificate of this CA to /var/lib/ricci/certs/auth_CAs.pem. 
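A very rough sketch of that CA variant, in case it helps. The file names local-ccs-CA.pem / local-ccs-CA.key are invented for the example, only the auth_CAs.pem path comes from ricci itself, and the ~/.ccs file names are again an assumption:

# One-off, on an admin host: create a small local CA.
openssl req -new -x509 -nodes -days 3650 -newkey rsa:2048 \
    -keyout local-ccs-CA.key -out local-ccs-CA.pem \
    -subj "/CN=local-ccs-CA"

# On each node running ricci: mark certificates issued by that CA as
# trusted (assumption: ricci needs a restart to re-read the file).
cat local-ccs-CA.pem >> /var/lib/ricci/certs/auth_CAs.pem
service ricci restart

# On the ccs side: have the client certificate signed by that CA instead
# of being self-signed (paths under ~/.ccs assumed as before).
openssl req -new -nodes -newkey rsa:2048 \
    -keyout ~/.ccs/privkey.pem -out ccs.csr -subj "/CN=$(hostname)-ccs"
openssl x509 -req -in ccs.csr -CA local-ccs-CA.pem -CAkey local-ccs-CA.key \
    -CAcreateserial -days 1825 -out ~/.ccs/cacert.pem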
As this path is not commonly used, this is for the braver ones :) [*] in fact, this prerequisite can be avoided (a/ by specifying "-c" option to ccs_sync AND b/ explicitly listing other nodes as arguments, but again, these nodes have to be running ricci), however this a degenerate case and best if forgotten [1] http://www.redhat.com/archives/linux-cluster/2010-November/msg00163.html -- Jan From DJCapstick1 at uclan.ac.uk Mon Jan 21 16:18:59 2013 From: DJCapstick1 at uclan.ac.uk (David John Capstick) Date: Mon, 21 Jan 2013 16:18:59 +0000 Subject: [Linux-cluster] Course of action if Cluster Manager cannot stop a Percona Mysql application/service Message-ID: Hi, I am investigating a problem that occurred some time ago with a two node cluster. It would appear that rgmanager was unable to stop the application (percona mysql) cleanly according to /var/log/messages. After a while it would appear that rgmanager did start the service again. Does this mean that despite the messages it was indeed able to shut the service down first ? If a service cannot be stopped cleanly I would have thought that rgmanager does not try and start it again - is this view wrong ? Also the logs show that rgmanager tried to stop the service at 05:06:04 but how do you discover why this action was taken ? I have included an excerpt of /var/log/messages. Many Thanks David Nov 17 22:43:03 db1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2202" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Nov 20 05:06:04 db1 rgmanager[11672]: Stopping service service:mysql-master Nov 20 05:06:04 db1 rgmanager[14368]: [mysqld] Stopping Service mysqld:mysql-master Nov 20 05:06:26 db1 rgmanager[14463]: [mysqld] Stopping Service mysqld:mysql-master > Failed - Application Is Still Running Nov 20 05:06:26 db1 rgmanager[14485]: [mysqld] Stopping Service mysqld:mysql-master > Failed Nov 20 05:06:26 db1 rgmanager[11672]: stop on mysqld "mysql-master" returned 1 (generic error) Nov 20 05:06:26 db1 rgmanager[14559]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:31 db1 rgmanager[14637]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:37 db1 rgmanager[14713]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:37 db1 rgmanager[14758]: [fs] 'umount /srv/mysql-master/mnt' failed, error=1 Nov 20 05:06:37 db1 rgmanager[11672]: stop on fs "mysql-master" returned 1 (generic error) Nov 20 05:06:37 db1 rgmanager[14811]: [ip] Removing IPv4 address 192.168.249.120/24 from eth0 Nov 20 05:06:38 db1 ntpd[8006]: Deleting interface #28 eth0, 192.168.249.120#123, interface stats: received=0, sent=0, dropped=0, active_time=5767950 secs Nov 20 05:06:47 db1 rgmanager[11672]: #12: RG service:mysql-master failed to stop; intervention required Nov 20 05:06:47 db1 rgmanager[11672]: Service service:mysql-master is failed Nov 20 05:07:32 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:07:32 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:09:46 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:09:46 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:10:37 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:10:37 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:11:06 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. 
Nov 20 05:11:06 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:16:50 db1 rgmanager[11672]: Starting stopped service service:mysql-master Nov 20 05:16:50 db1 rgmanager[15291]: [ip] Adding IPv4 address 192.168.249.120/24 to eth0 Nov 20 05:16:53 db1 ntpd[8006]: Listening on interface #29 eth0, 192.168.249.120#123 Enabled Nov 20 05:16:53 db1 rgmanager[15516]: [mysqld] Checking Existence Of File /var/run/cluster/mysqld/mysqld:mysql-master.pid [mysqld:mysql-master] > Failed Nov 20 05:16:54 db1 rgmanager[15538]: [mysqld] Monitoring Service mysqld:mysql-master > Service Is Not Running Nov 20 05:16:54 db1 rgmanager[15560]: [mysqld] Starting Service mysqld:mysql-master Nov 20 05:16:58 db1 rgmanager[11672]: Service service:mysql-master started Nov 20 10:42:01 db1 auditd[7280]: Audit daemon rotating log files -------------- next part -------------- An HTML attachment was scrubbed... URL: From pankajgundare at gmail.com Tue Jan 22 04:58:23 2013 From: pankajgundare at gmail.com (Pankaj) Date: Tue, 22 Jan 2013 10:28:23 +0530 Subject: [Linux-cluster] Cluster restarting..... Message-ID: Hi, I Have two node of production server , but now a days when I am starting cman service on both node simultaneously one node(secondary ) restarted automatically , please give me solution. Thanks Pankaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From washer at trlp.com Tue Jan 22 05:32:44 2013 From: washer at trlp.com (James Washer) Date: Mon, 21 Jan 2013 21:32:44 -0800 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: References: Message-ID: Did you look at the logs? On Mon, Jan 21, 2013 at 8:58 PM, Pankaj wrote: > Hi, > > I Have two node of production server , but now a days when I am starting > cman service on both node simultaneously one node(secondary ) restarted > automatically , please give me solution. > > Thanks > Pankaj > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sagar.Shimpi at tieto.com Tue Jan 22 05:59:14 2013 From: Sagar.Shimpi at tieto.com (Sagar.Shimpi at tieto.com) Date: Tue, 22 Jan 2013 07:59:14 +0200 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: References: Message-ID: Can you send me the logs of both the nodes ? Regards, Sagar Shimpi, Senior Technical Specialist, OSS Labs Tieto email sagar.shimpi at tieto.com, Wing 1, Cluster D, EON Free Zone, Plot No. 1, Survery # 77, MIDC Kharadi Knowledge Park, Pune 411014, India, www.tieto.com www.tieto.in TIETO. Knowledge. Passion. Results. From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Pankaj Sent: Tuesday, January 22, 2013 10:28 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster restarting..... Hi, I Have two node of production server , but now a days when I am starting cman service on both node simultaneously one node(secondary ) restarted automatically , please give me solution. Thanks Pankaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Jan 22 15:59:54 2013 From: lists at alteeve.ca (Digimer) Date: Tue, 22 Jan 2013 10:59:54 -0500 Subject: [Linux-cluster] Cluster restarting..... 
In-Reply-To: References: Message-ID: <50FEB77A.3050907@alteeve.ca> On 01/21/2013 11:58 PM, Pankaj wrote: > Hi, > > I Have two node of production server , but now a days when I am starting > cman service on both node simultaneously one node(secondary ) restarted > automatically , please give me solution. > > Thanks > Pankaj You need to share more information about what cluster you are running. I suspect what happened is that the "post_join_delay", which is a default of 6 seconds, expired so the first node fenced the second node. You can change this to, say, "30" seconds to give yourself more time to start cman on the other node. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rhayden.public at gmail.com Tue Jan 22 17:22:25 2013 From: rhayden.public at gmail.com (Robert Hayden) Date: Tue, 22 Jan 2013 11:22:25 -0600 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? Message-ID: I am testing RHCS 6.3 and found that the self_fence option for a file system resource will now longer function as expected. Before I log an SR with RH, I was wondering if the design changed between RHEL 5 and RHEL 6. In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a "reboot -fn" command on a self_fence logic. In RHEL 6, there is little to no logic around self_fence in the fs.sh file. Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: if [ -n "$umount_failed" ]; then ocf_log err "'umount $mp' failed, error=$ret_val" if [ "$self_fence" ]; then ocf_log alert "umount failed - REBOOTING" sync reboot -fn fi return $FAIL else return $SUCCESS fi To test in RHEL 6, I simply create a file system (e.g. /test/data) resource with self_fence="1" or self_fence="on" (as added by Conga). Then mount a small ISO image on top of the file system. This mount will cause the file system resource to be unable to unmount itself and should trigger a self_fence scenario. Testing RHEL 6, I see the following in /var/log/messages: Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to processes on /test/data Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to processes on /test/data Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" returned 1 (generic error) Jan 21 16:41:05 techval16 rgmanager[61929]: #12: RG service:fstest_node16 failed to stop; intervention required Jan 21 16:41:05 techval16 rgmanager[61929]: Service service:fstest_node16 is failed Thanks Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Tue Jan 22 18:38:26 2013 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 22 Jan 2013 19:38:26 +0100 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? In-Reply-To: References: Message-ID: <50FEDCA2.9080008@redhat.com> On 01/22/2013 06:22 PM, Robert Hayden wrote: > I am testing RHCS 6.3 and found that the self_fence option for a file > system resource will now longer function as expected. Before I log an > SR with RH, I was wondering if the design changed between RHEL 5 and RHEL 6. > > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a > "reboot -fn" command on a self_fence logic. In RHEL 6, there is little > to no logic around self_fence in the fs.sh file. 
The logic has just been moved to a common file shared by all *fs resources (fs-lib) > > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: > if [ -n "$umount_failed" ]; then > ocf_log err "'umount $mp' failed, error=$ret_val" > > if [ "$self_fence" ]; then > ocf_log alert "umount failed - REBOOTING" > sync > reboot -fn > fi > return $FAIL > else > return $SUCCESS > fi same code, just different file. > > > > To test in RHEL 6, I simply create a file system (e.g. /test/data) > resource with self_fence="1" or self_fence="on" (as added by Conga). > Then mount a small ISO image on top of the file system. This mount will > cause the file system resource to be unable to unmount itself and should > trigger a self_fence scenario. > > Testing RHEL 6, I see the following in /var/log/messages: > > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to > processes on /test/data > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to > processes on /test/data > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" > returned 1 (generic error) Looks like a bug in force_umount option. Please file a ticket with RH GSS. As workaround try to disable force_umount. As far as I can tell, but I haven't verify it: ocf_log warning "Sending SIGKILL to processes on $mp" fuser -kvm "$mp" case $? in 0) ;; 1) return $OCF_ERR_GENERIC ;; 2) break ;; esac the issue is the was fuser error is handled in force_umount path, that would match the log you are posting. I think the correct way would be to check if self_fence is enabled or not and then return/reboot later on the script. Fabio From epretorious at yahoo.com Wed Jan 23 01:05:57 2013 From: epretorious at yahoo.com (Eric) Date: Tue, 22 Jan 2013 17:05:57 -0800 (PST) Subject: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD In-Reply-To: <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> References: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> Message-ID: <1358903157.86396.YahooMailNeo@web126001.mail.ne1.yahoo.com> I realized, quite accidentally, that any downtime on either of the nodes (e.g., a reboot) causes corruption/inconsistencies in the DRBD resources because the DRBD node that was the DRBD primary (i.e., the preferred-primary) will forcefully become primary again when the node returns [thereby discarding modifications made on the older primary]. Therefore, in order to prevent this from happening, it's probably best to REMOVE the final primitive from each group: > crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 > crm configure location l_iSCSI-san1+DRBD-r1 p_IP-1_253 10240: san2 This will prevent Pacemaker from promoting the younger primary and overwriting the modifications made on the older primary [when the preferred-primary node returns]. The DRBD resources can be moved manually... > crm resource move p_IP-1_254 san1 > crm resource move p_IP-1_253 san2 ...in order to distribute the workload between san1 & san2. Thoughts? Suggestions? 
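One side note on the manual-move approach, offered as a suggestion rather than anything from the Linbit guide: crm resource move works by injecting a location constraint, so it should be cleared again once things have settled, and automatic fail-back can also be damped by making running resources sticky instead of deleting the preference constraints outright. Roughly:

# "move" pins the resource to the target node with a generated
# constraint; clear it afterwards or it stays in the CIB:
crm resource move p_IP-1_254 san1
crm resource unmove p_IP-1_254

# Alternative: keep the location preferences but give running resources
# enough stickiness to outweigh them (the value has to exceed the 10240
# score used in the location constraints above to have any effect):
crm configure rsc_defaults resource-stickiness=20480

Whichever way this is handled at the Pacemaker level, data that diverges on the old primary during an outage is normally protected against with fencing and DRBD's own fence-peer handlers rather than with scores alone.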
Eric Pretorious Truckee, CA >________________________________ > From: Eric >To: linux clustering >Sent: Friday, January 18, 2013 12:40 PM >Subject: Re: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD > > >After rebooting both nodes, I checked the cluster status again and found this: >Code: > >> san1:~ # crm_mon -1 >> ============ >> Last updated: Fri Jan 18 11:51:28 2013 >> Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2 >> Stack: openais >> Current DC: san2 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 9 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>????? Masters: [ san2 ] >>????? Slaves: [ san1 ] >>? Resource Group: g_iSCSI-san1 >>????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >>????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >> >> Failed actions: >>???? p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error >>???? p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error > >...and that's when it occured to me: There are only four volumes defined in the DRBD cofiguration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., The p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up theresource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too! > >So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed: > >> san2:~ # ll /dev/drbd* >> brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0 >> brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1 >> brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2 >> brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3 >> >> ... >> > >> san2:~ # crm_mon -1 >> ============ >> Last updated: Fri Jan 18 11:53:03 2013 >> Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2 >> Stack: openais >> Current DC: san2 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 8 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>????? Masters: [ san2 ] >>????? Slaves: [ san1 ] >>? Resource Group: g_iSCSI-san1 >>????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >>????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Started san2 > >From the iSCSI client (xen2): > >> xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254 >> 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda >> 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda >> 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda > > >Problem fixed! 
> > >Eric Pretorious >Truckee, CA > > > >>________________________________ >> From: Eric >>To: linux clustering >>Sent: Thursday, January 17, 2013 8:59 PM >>Subject: [Linux-cluster] HA iSCSI with DRBD >> >> >>I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: >> >> >>> crm configure property stonith-enabled=false >>> crm configure property no-quorum-policy=ignore >>> >>> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s >>> >>> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s >>> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true >>> >>> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s >>> >>> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 >>> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start >>> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master >>> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 >> >> >>IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... >> >> >>> resource r0 { >>> ??? volume 0 { >>> ??? ??? device /dev/drbd0 ; >>> ??? ??? disk /dev/sda7 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 1 { >>> ??? ??? device /dev/drbd1 ; >>> ??? ??? disk /dev/sda8 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 2 { >>> ??? ??? device /dev/drbd2 ; >>> ??? ??? disk /dev/sda9 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 3 { >>> ??? ??? device /dev/drbd3 ; >>> ??? ??? disk /dev/sda10 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? on san1 { >>> ??? ??? address 192.168.1.1:7789 ; >>> ??? } >>> ??? on san2 { >>> ??? ??? address 192.168.1.2:7789 ; >>> ??? } >>> } >> >> >> >>But the shared IP address won't start nor will the LUN's: >> >> >>> san1:~ # crm_mon -1 >>> ============ >>> Last updated: Thu Jan 17 20:55:55 2013 >>> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 >>> Stack: openais >>> Current DC: san1 - partition with quorum >>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >>> 2 Nodes configured, 2 expected votes >>> 9 Resources configured. 
>>> ============ >>> >>> Online: [ san1 san2 ] >>> >>>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>> ???? Masters: [ san1 ] >>> ???? Slaves: [ san2 ] >>>? Resource Group: g_iSCSI-san1 >>> ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 >>> ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >>> >>> Failed actions: >>> ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error >> >> >> >>What am I doing wrong? >> >> >> >>TIA, >>Eric Pretorious >>Truckee, CA >> >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From topumirza at gmail.com Thu Jan 24 16:03:47 2013 From: topumirza at gmail.com (topu mirza) Date: Thu, 24 Jan 2013 22:03:47 +0600 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: <50FEB77A.3050907@alteeve.ca> References: <50FEB77A.3050907@alteeve.ca> Message-ID: set multicast address 224.0.0.1 and fence_daemon post_fail_delay=45 post_join_delay=60 Thanks Topu Mirza On Tue, Jan 22, 2013 at 9:59 PM, Digimer wrote: > On 01/21/2013 11:58 PM, Pankaj wrote: > > Hi, > > > > I Have two node of production server , but now a days when I am starting > > cman service on both node simultaneously one node(secondary ) restarted > > automatically , please give me solution. > > > > Thanks > > Pankaj > > You need to share more information about what cluster you are running. I > suspect what happened is that the "post_join_delay", which is a default > of 6 seconds, expired so the first node fenced the second node. You can > change this to, say, "30" seconds to give yourself more time to start > cman on the other node. > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Mirza Jubayar Siddiq -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Tom_Dryden at BUDCO.com Thu Jan 24 17:16:10 2013 From: Tom_Dryden at BUDCO.com (Dryden, Tom) Date: Thu, 24 Jan 2013 12:16:10 -0500 Subject: [Linux-cluster] LDAP as a service Message-ID: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> Greeting All I am looking into implementing a 389 directory server in a clustered/GFS environment. Can anyone out there provide a pointer to information on implementing 389-directory server as a clustered service that can be relocated? Thanks in advance Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhayden.public at gmail.com Thu Jan 24 17:28:38 2013 From: rhayden.public at gmail.com (Robert Hayden) Date: Thu, 24 Jan 2013 11:28:38 -0600 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? In-Reply-To: <50FEDCA2.9080008@redhat.com> References: <50FEDCA2.9080008@redhat.com> Message-ID: On Tue, Jan 22, 2013 at 12:38 PM, Fabio M. Di Nitto wrote: > > On 01/22/2013 06:22 PM, Robert Hayden wrote: > > I am testing RHCS 6.3 and found that the self_fence option for a file > > system resource will now longer function as expected. Before I log an > > SR with RH, I was wondering if the design changed between RHEL 5 and > > RHEL 6. > > > > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a > > "reboot -fn" command on a self_fence logic. In RHEL 6, there is little > > to no logic around self_fence in the fs.sh file. > > The logic has just been moved to a common file shared by all *fs > resources (fs-lib) > > > > > > > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: > > if [ -n "$umount_failed" ]; then > > ocf_log err "'umount $mp' failed, error=$ret_val" > > > > if [ "$self_fence" ]; then > > ocf_log alert "umount failed - REBOOTING" > > sync > > reboot -fn > > fi > > return $FAIL > > else > > return $SUCCESS > > fi > > same code, just different file. > > > > > > > > > To test in RHEL 6, I simply create a file system (e.g. /test/data) > > resource with self_fence="1" or self_fence="on" (as added by Conga). > > Then mount a small ISO image on top of the file system. This mount will > > cause the file system resource to be unable to unmount itself and should > > trigger a self_fence scenario. > > > > Testing RHEL 6, I see the following in /var/log/messages: > > > > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data > > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to > > processes on /test/data > > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data > > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to > > processes on /test/data > > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" > > returned 1 (generic error) > > Looks like a bug in force_umount option. > > Please file a ticket with RH GSS. I will log a ticket in a few days when I can build a simple test case for support. > > As workaround try to disable force_umount. The workaround of have force_umount=0 and self_fence=1 worked with the ISO image mount test. > > As far as I can tell, but I haven't verify it: > ocf_log warning "Sending SIGKILL to processes on $mp" > fuser -kvm "$mp" > > case $? in > 0) > ;; > 1) > return $OCF_ERR_GENERIC > ;; > 2) > break > ;; > esac > > the issue is the was fuser error is handled in force_umount path, that > would match the log you are posting. > I have learned that "fuser" command will not find the sub-mounted iso image that causes the umount to fail. 
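(The nested mount only shows up in the mount table, not in the open-file listings, so to see what is actually blocking the umount something along these lines is needed rather than fuser/lsof:

[root at techval16]# grep /test/data /proc/mounts

which lists both /test/data and the iso sitting on /test/data/mnt, while fuser/lsof stay silent.)
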
So, my test case using the iso image to test self_fence may need to be updated. [root at techval16]# df -k | grep data /dev/mapper/share16vg-tv16_mq_data 806288 17200 748128 3% /test/data 352 352 0 100% /test/data/mnt [root at techval16]# fuser -kvm /test/data [root at techval16]# echo $? 1 [root at techval16]# umount /test/data umount: /test/data: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) [root at techval16]# Unsure if the logic in fs-lib needs to be updated to handle sub-mounted file systems. That is what the Support Ticket will determine, I suppose. > I think the correct way would be to check if self_fence is enabled or > not and then return/reboot later on the script. > > Fabio > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From arpittolani at gmail.com Thu Jan 24 18:38:49 2013 From: arpittolani at gmail.com (Arpit Tolani) Date: Fri, 25 Jan 2013 00:08:49 +0530 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> Message-ID: Hello On Thu, Jan 24, 2013 at 10:46 PM, Dryden, Tom wrote: > Greeting All > > > > I am looking into implementing a 389 directory server in a clustered/GFS > environment. > > Can anyone out there provide a pointer to information on implementing > 389-directory server as a clustered service that can be relocated? > > Why do you want to configure LDAP server on cluster ? Most of the ldap clients (nss_ldap, SSSD) can talk to multiple LDAP server & can failover when primary is down. > > Thanks in advance > > Tom > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards Arpit Tolani From stephen.krampach at lmco.com Thu Jan 24 18:43:16 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 18:43:16 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K -------------- next part -------------- An HTML attachment was scrubbed... URL: From epretorious at yahoo.com Thu Jan 24 20:50:35 2013 From: epretorious at yahoo.com (Eric) Date: Thu, 24 Jan 2013 12:50:35 -0800 (PST) Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: <1359060635.97933.YahooMailNeo@web126006.mail.ne1.yahoo.com> Would `crm node standby` [on each of the nodes] be too simple? Eric Pretorious Truckee, CA >________________________________ > From: "Krampach, Stephen" >To: "linux-cluster at redhat.com" >Sent: Thursday, January 24, 2013 10:43 AM >Subject: [Linux-cluster] Cluster Shut Down Procedures > > > >I hate to ask simple questions however, I?ve been perusing >books and blogs for two hours and have no definitive procedure; >? >We are having a power outage. What is the procedure to completely >shut down and power off a Red Hat 6.3 cluster? >? >Thanks in advance! 
Steve K >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swegner at celltrak.com Thu Jan 24 21:14:14 2013 From: swegner at celltrak.com (Steve Wegner) Date: Thu, 24 Jan 2013 15:14:14 -0600 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.scheblein at marquette.edu Thu Jan 24 21:17:37 2013 From: adam.scheblein at marquette.edu (Scheblein, Adam) Date: Thu, 24 Jan 2013 21:17:37 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner wrote: > Could it be as simple as ? service rgmanager stop ? on each node, then normal shutdown? > > > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen > Sent: Thursday, January 24, 2013 12:43 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Cluster Shut Down Procedures > > I hate to ask simple questions however, I?ve been perusing > books and blogs for two hours and have no definitive procedure; > > We are having a power outage. What is the procedure to completely > shut down and power off a Red Hat 6.3 cluster? > > Thanks in advance! Steve K > PRIVILEGED & CONFIDENTIAL > The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3203 bytes Desc: not available URL: From stephen.krampach at lmco.com Thu Jan 24 21:34:11 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 21:34:11 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> I'm really not sure. I've never heard of the css command and man css does not show results. What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner > wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adam.scheblein at marquette.edu Thu Jan 24 21:41:44 2013 From: adam.scheblein at marquette.edu (Scheblein, Adam) Date: Thu, 24 Jan 2013 21:41:44 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> Message-ID: <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> CCS became a good tool starting in rhel 6.x, prior to that I never used it Here is the man page: http://linux.die.net/man/8/ccs From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 3:34 PM To: linux clustering Subject: Re: [Linux-cluster] Cluster Shut Down Procedures I'm really not sure. I've never heard of the css command and man css does not show results. What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux- cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4266 bytes Desc: not available URL: From Tom_Dryden at BUDCO.com Thu Jan 24 21:57:28 2013 From: Tom_Dryden at BUDCO.com (Dryden, Tom) Date: Thu, 24 Jan 2013 16:57:28 -0500 Subject: [Linux-cluster] LDAP as a service Message-ID: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> Good Afternoon, There are a couple of reasons to implement LDAP on a cluster. 1. I have a cluster with GFS partitions available. 2. Want to avoid the cost putting up 2 more machines for master - master LDAP operation. 3. Want to avoid the timeout the client experiences when the primary is unavailable. My thought is to have the LADP data stored on a GFS partition while the LDAP server process and IP address are managed as a service. In this configuration the process can move between nodes with no impact to the clients. Thanks Tom Message: 3 Date: Fri, 25 Jan 2013 00:08:49 +0530 From: Arpit Tolani To: linux clustering Subject: Re: [Linux-cluster] LDAP as a service Message-ID: Content-Type: text/plain; charset=ISO-8859-1 Hello On Thu, Jan 24, 2013 at 10:46 PM, Dryden, Tom wrote: > Greeting All > > > > I am looking into implementing a 389 directory server in a > clustered/GFS environment. > > Can anyone out there provide a pointer to information on implementing > 389-directory server as a clustered service that can be relocated? > > Why do you want to configure LDAP server on cluster ? Most of the ldap clients (nss_ldap, SSSD) can talk to multiple LDAP server & can failover when primary is down. > > Thanks in advance > > Tom > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards Arpit Tolani From stephen.krampach at lmco.com Thu Jan 24 22:22:08 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 22:22:08 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F595088@HDXDSP51.us.lmco.com> OH - I saw on http://www.sourceware.org/cluster/conga/ that it is not in the standard Fedora distribution. Unfortunately, I will need to get authorization prior to installing that. :( Q. Why is Conga not in the Fedora 6 distribution? A. Development for Conga started after the freeze for inclusion in FC6. We have prepared RPMs to run with Fedora 6 on our Downloads page. - Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:42 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures CCS became a good tool starting in rhel 6.x, prior to that I never used it Here is the man page: http://linux.die.net/man/8/ccs From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 3:34 PM To: linux clustering Subject: Re: [Linux-cluster] Cluster Shut Down Procedures I'm really not sure. I've never heard of the css command and man css does not show results. 
What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner > wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From ricks at alldigital.com Thu Jan 24 22:49:45 2013 From: ricks at alldigital.com (Rick Stevens) Date: Thu, 24 Jan 2013 14:49:45 -0800 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> Message-ID: <5101BA89.90506@alldigital.com> On 01/24/2013 01:57 PM, Dryden, Tom issued this missive: > > Good Afternoon, > > There are a couple of reasons to implement LDAP on a cluster. > 1. I have a cluster with GFS partitions available. Good. > 2. Want to avoid the cost putting up 2 more machines for master - > master LDAP operation. Master-master LDAP replication is not hard to do and you're still going to have two machines running LDAP. Perhaps not simultaneously, but you will still have two machines. > 3. Want to avoid the timeout the client experiences when the primary is > unavailable. This is what the TIMEOUT and SIZELIMIT and NETWORK_TIMEOUT variables in the various incarnations of the ldap.conf file are for. The defaults do make things sluggish if a primary goes down, but you can tweak that. > My thought is to have the LADP data stored on a GFS partition while the > LDAP server process and IP address are managed as a service. 
In this > configuration the process can move between nodes with no impact to the > clients. Personally, I think you're over complicating things and unless you have a ridiculously big LDAP database that you don't want to replicate, I don't think you're really buying anything here. We run several master- master LDAP clusters here--even with one replicating across the country (California <--> Florida). Works fine. That being said, as with most FOSS stuff, there's more than one way to skin a mule. Do as you wish. ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - All generalizations are false. - ---------------------------------------------------------------------- From kkovachev at varna.net Fri Jan 25 09:47:49 2013 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 25 Jan 2013 11:47:49 +0200 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <5101BA89.90506@alldigital.com> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> <5101BA89.90506@alldigital.com> Message-ID: <562ee94a434cdad2daf8cb973e46e51f@mx.varna.net> Hi, there should be openldap resource in your cluster, but if not you can always use a script resource or write your own. On Thu, 24 Jan 2013 14:49:45 -0800, Rick Stevens wrote: > On 01/24/2013 01:57 PM, Dryden, Tom issued this missive: >> >> Good Afternoon, >> >> There are a couple of reasons to implement LDAP on a cluster. >> 1. I have a cluster with GFS partitions available. > > Good. > >> 2. Want to avoid the cost putting up 2 more machines for master - >> master LDAP operation. > > Master-master LDAP replication is not hard to do and you're still going > to have two machines running LDAP. Perhaps not simultaneously, but you > will still have two machines. > >> 3. Want to avoid the timeout the client experiences when the primary is >> unavailable. > > This is what the TIMEOUT and SIZELIMIT and NETWORK_TIMEOUT variables in > the various incarnations of the ldap.conf file are for. The defaults do > make things sluggish if a primary goes down, but you can tweak that. > >> My thought is to have the LADP data stored on a GFS partition while the >> LDAP server process and IP address are managed as a service. In this >> configuration the process can move between nodes with no impact to the >> clients. > > Personally, I think you're over complicating things and unless you have > a ridiculously big LDAP database that you don't want to replicate, I > don't think you're really buying anything here. We run several master- > master LDAP clusters here--even with one replicating across the country > (California <--> Florida). Works fine. > > That being said, as with most FOSS stuff, there's more than one way to > skin a mule. Do as you wish. > ---------------------------------------------------------------------- > - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - > - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - > - - > - All generalizations are false. 
- > ---------------------------------------------------------------------- From Ralf.Aumueller at informatik.uni-stuttgart.de Tue Jan 29 07:13:56 2013 From: Ralf.Aumueller at informatik.uni-stuttgart.de (Ralf Aumueller) Date: Tue, 29 Jan 2013 08:13:56 +0100 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster Message-ID: <510776B4.9040404@informatik.uni-stuttgart.de> Hello, we have a two node cluster (CentOS6) configured and running. The cluster-interconnect is over two network switches (unmanaged. Reserved for the cluster-interconnect). Now we want to install a second two node cluster. Is it possible to share the switches for the cluster-interconnect of the new cluster? Do I have to set something special in /etc/cluster/cluster.conf of the new cluster? Thanx and best regards, Ralf From lists at alteeve.ca Tue Jan 29 07:17:36 2013 From: lists at alteeve.ca (Digimer) Date: Tue, 29 Jan 2013 02:17:36 -0500 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster In-Reply-To: <510776B4.9040404@informatik.uni-stuttgart.de> References: <510776B4.9040404@informatik.uni-stuttgart.de> Message-ID: <51077790.4060908@alteeve.ca> On 01/29/2013 02:13 AM, Ralf Aumueller wrote: > Hello, > > we have a two node cluster (CentOS6) configured and running. The > cluster-interconnect is over two network switches (unmanaged. Reserved for the > cluster-interconnect). > Now we want to install a second two node cluster. Is it possible to share the > switches for the cluster-interconnect of the new cluster? Do I have to set > something special in /etc/cluster/cluster.conf of the new cluster? > > Thanx and best regards, > > Ralf > It's fine. Each cluster will use a different multicast group. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From misch at schwartzkopff.org Tue Jan 29 08:53:58 2013 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Tue, 29 Jan 2013 09:53:58 +0100 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster In-Reply-To: <510776B4.9040404@informatik.uni-stuttgart.de> References: <510776B4.9040404@informatik.uni-stuttgart.de> Message-ID: <1899303.vecLKUaSl7@nb003> Am Dienstag, 29. Januar 2013, 08:13:56 schrieb Ralf Aumueller: > Hello, > > we have a two node cluster (CentOS6) configured and running. The > cluster-interconnect is over two network switches (unmanaged. Reserved for > the cluster-interconnect). > Now we want to install a second two node cluster. Is it possible to share > the switches for the cluster-interconnect of the new cluster? Do I have to > set something special in /etc/cluster/cluster.conf of the new cluster? > > Thanx and best regards, > > Ralf To be sure configure a different multicast group on the new cluster. -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 M?nchen Tel: (0163) 172 50 98 Fax: (089) 620 304 13 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmatchett at cfl.rr.com Tue Jan 29 20:58:18 2013 From: jmatchett at cfl.rr.com (jmatchett at cfl.rr.com) Date: Tue, 29 Jan 2013 15:58:18 -0500 Subject: [Linux-cluster] Cluster and Fencing on different subnetworks? Message-ID: <20130129205818.X4LBR.64959.root@cdptpa-web33-z01.mail.rr.com> I have a RHEL6.3 cluster with RHCS and DRBD. 
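(For reference, rhcs_fence is called through DRBD's fence-peer handler, i.e. roughly the following in drbd.conf - the resource name and script path below are just placeholders, use whatever your install actually has:

resource r0 {   # resource name and handler path are placeholders
    disk     { fencing resource-and-stonith ; }
    handlers { fence-peer "/usr/lib/drbd/rhcs_fence" ; }
}

so DRBD runs the handler whenever it loses the connection to the peer and needs it fenced before continuing.)
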
When I kill the master node, DRBD on the slave calls rhcs_fence, but the script thinks it fails and returns a (1), since the fence device was not on the same subnet as defined by the clusternode name in the cluster.conf. The fencing actually does occur, but when the fenced node reboots and it tries to come back in, the new master DRBD always reports Primary/Unknown. This requires a reboot of both nodes. Is this by design or a problem? I switched back to Obliterate-peer.sh and the problem goes away. Here is an excerpt from my cluster.conf. I ##10.10.10.x Best regards John Matchett From ksorensen at nordija.com Wed Jan 30 11:31:22 2013 From: ksorensen at nordija.com (Kristian =?ISO-8859-1?Q?Gr=F8nfeldt_S=F8rensen?=) Date: Wed, 30 Jan 2013 12:31:22 +0100 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount Message-ID: <1359545482.15913.38.camel@kriller.nordija.dk> Hi, I'm setting up a two-node cluster sharing a single GFS2 filesystem backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM involved). I am experiencing more or less the same as the OP in this thread: http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html I have an activemq-5.6.0 instance on each server that tries to lock a file on the GFS2-filesystem (using ).
Exception in thread "main" java.io.IOException: Function not implemented at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at FileLockTest.main(FileLockTest.java:15) If I run this on the other server (where the GFS2 fs was not unmounted and mounted again), it works correctly. Any ideas to what happens, and why? BR Kristian S?rensen From rpeterso at redhat.com Wed Jan 30 13:17:25 2013 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 30 Jan 2013 08:17:25 -0500 (EST) Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359545482.15913.38.camel@kriller.nordija.dk> Message-ID: <106620543.15884204.1359551845224.JavaMail.root@redhat.com> ----- Original Message ----- | Hi, | | I'm setting up a two-node cluster sharing a single GFS2 filesystem | backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM | involved). | | I am experiencing more or less the same as the OP in this thread: | http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html | | I have an activemq-5.6.0 instance on each server that tries to lock a | file on the GFS2-filesystem (using ). | | When i start the cluster, everything works as expected. The first | activemq instance that starts up acquires the lock, the lock is | released | when the activemq exits, and the second instance takes the lock. | | The problem shows when I unmount and subsequently mount the GFS2 | filesystem again on one of the nodes, or reboot one of the nodes | (after | having started at least one activemq instance.) | The I start seeing statements like this in the activemq log files: | | Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... | waiting 10 seconds for the database to be unlocked. Reason: | java.io.IOException: Function not implemented | | org.apache.activemq.store.kahadb.MessageDatabase | | strace -f while that message is logged gives the following: | | [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", | {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 | [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", | {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 | [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", | O_RDWR|O_CREAT, 0666) = 133 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fcntl(133, F_GETFD) = 0 | [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, | start=0, len=1}) = -1 ENOSYS (Function not implemented) | [pid 3549] dup2(138, 133) = 133 | [pid 3549] close(133) | | As you can see, the "Function not implemented" originates from the | F_SETLK fnctl that the JVM does. | The only way to recover from this state seems to be by unmounting the | GFS2-filesystem on both nodes, then mounting it again again on both | nodes. | | I've tried to isolate this by using a simpler testcase than starting | two | activemq instances. I ended up using the java sample from | http://www.javabeat.net/2007/10/locking-files-using-java/ . 
| | I haven't managed to get the system in to a state where F_SETLK | returns | "Function no implemented" by only using the above FileLockTest class, | (I | need activemq in order to trigger the situation) but when the system | is | in that state, I can run FileLockTest, and it will print out the | following stacktrace. | | Exception in thread "main" java.io.IOException: Function not | implemented | at sun.nio.ch.FileChannelImpl.lock0(Native Method) | at | sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) | at | java.nio.channels.FileChannel.tryLock(FileChannel.java:962) | at FileLockTest.main(FileLockTest.java:15) | | | If I run this on the other server (where the GFS2 fs was not | unmounted | and mounted again), it works correctly. | | Any ideas to what happens, and why? | | BR | Kristian S?rensen Hi Kristian, After doing some simple checks (which shouldn't be your problem) GFS2 passes all posix lock requests down to the DLM for further processing. I'm not sure what DLM does with them from there, but I believe the requests are processed by user space, i.e. openais, etc., depending on what version you're running. I recommend checking "dmesg" to see if there are any pertinent errors logged there. You could also check /var/log/messages to see if user space logged any complaints. Also, you might want to do this command to check for pertinent errors: group_tool dump gfs (Now, if it was an flock rather than a posix lock, I could help you because flocks are handled by GFS2 and not just passed on to DLM). Regards, Bob Peterson Red Hat File Systems From swhiteho at redhat.com Wed Jan 30 13:34:41 2013 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 30 Jan 2013 13:34:41 +0000 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359545482.15913.38.camel@kriller.nordija.dk> References: <1359545482.15913.38.camel@kriller.nordija.dk> Message-ID: <1359552881.2719.12.camel@menhir> Hi, On Wed, 2013-01-30 at 12:31 +0100, Kristian Gr?nfeldt S?rensen wrote: > Hi, > > I'm setting up a two-node cluster sharing a single GFS2 filesystem > backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM > involved). > > I am experiencing more or less the same as the OP in this thread: > http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html > Well I'm not so sure about that. We never found out what the issue was in that case, but in your case it seems that you are doing something which should work. Also, in the msg00136 case it seems that the lock request didn't work at all, whereas in your case it appears that it does work until a umount/mount of one node - at least if I've understood it correctly. Which kernel and userspace are you using? It would be a good plan to report this as a bug (or via support if you are a supported customer and are using RHEL) as it should work correctly, Steve. > I have an activemq-5.6.0 instance on each server that tries to lock a > file on the GFS2-filesystem (using ). > > When i start the cluster, everything works as expected. The first > activemq instance that starts up acquires the lock, the lock is released > when the activemq exits, and the second instance takes the lock. > > The problem shows when I unmount and subsequently mount the GFS2 > filesystem again on one of the nodes, or reboot one of the nodes (after > having started at least one activemq instance.) > The I start seeing statements like this in the activemq log files: > > Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... 
waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase > > strace -f while that message is logged gives the following: > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fcntl(133, F_GETFD) = 0 > [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented) > [pid 3549] dup2(138, 133) = 133 > [pid 3549] close(133) > > As you can see, the "Function not implemented" originates from the > F_SETLK fnctl that the JVM does. > The only way to recover from this state seems to be by unmounting the > GFS2-filesystem on both nodes, then mounting it again again on both > nodes. > > I've tried to isolate this by using a simpler testcase than starting two > activemq instances. I ended up using the java sample from > http://www.javabeat.net/2007/10/locking-files-using-java/ . > > I haven't managed to get the system in to a state where F_SETLK returns > "Function no implemented" by only using the above FileLockTest class, (I > need activemq in order to trigger the situation) but when the system is > in that state, I can run FileLockTest, and it will print out the > following stacktrace. > > Exception in thread "main" java.io.IOException: Function not implemented > at sun.nio.ch.FileChannelImpl.lock0(Native Method) > at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) > at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) > at FileLockTest.main(FileLockTest.java:15) > > > If I run this on the other server (where the GFS2 fs was not unmounted > and mounted again), it works correctly. > > Any ideas to what happens, and why? > > BR > Kristian S?rensen > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rainer.hartwig.schubert at gmail.com Wed Jan 30 15:22:36 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Wed, 30 Jan 2013 16:22:36 +0100 Subject: [Linux-cluster] clvmd not running on node mynode3 Message-ID: Hi, I can't get a new LVM blockdevice or do a simple resize: lvresize /dev/vm-storage/windowsserver1 -L +20G Extending logical volume windowsserver1 to 90.00 GiB clvmd not running on node mynode3 Unable to drop cached metadata for VG vm-storage The node mynode is correctly running. What can I do in this situation? 
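My first thought is to check directly on mynode3 whether the daemon is actually up and still sees all cluster members, i.e. something like:

# pidof clvmd
# cman_tool nodes
# /etc/init.d/clvm restart

but I am not sure whether restarting clvmd there is safe while logical volumes are in use, hence the question.
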
Best regards From ksorensen at nordija.com Wed Jan 30 16:11:09 2013 From: ksorensen at nordija.com (Kristian =?ISO-8859-1?Q?Gr=F8nfeldt_S=F8rensen?=) Date: Wed, 30 Jan 2013 17:11:09 +0100 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359552881.2719.12.camel@menhir> References: <1359545482.15913.38.camel@kriller.nordija.dk> <1359552881.2719.12.camel@menhir> Message-ID: <1359562269.15913.57.camel@kriller.nordija.dk> On Wed, 2013-01-30 at 13:34 +0000, Steven Whitehouse wrote: > Hi, > > On Wed, 2013-01-30 at 12:31 +0100, Kristian Gr?nfeldt S?rensen wrote: > > Hi, > > > > I'm setting up a two-node cluster sharing a single GFS2 filesystem > > backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM > > involved). > > > > I am experiencing more or less the same as the OP in this thread: > > http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html > > > > Well I'm not so sure about that. We never found out what the issue was > in that case, but in your case it seems that you are doing something > which should work. Also, in the msg00136 case it seems that the lock > request didn't work at all, whereas in your case it appears that it does > work until a umount/mount of one node - at least if I've understood it > correctly. Correct. And I am able to bring the system into a working state by unmounting the file system from all nodes at the same time, and mounting it again. > Which kernel and userspace are you using? It's Debian testing - kernel is from experimental ( 3.7.1-1~experimental.2), since I had problems deleting files with the gfs2-module included in the default Debian testing kernel (3.2.x). cman + libdlm3 is v3.0.12 corosync is v1.4.2 Let me know if you need version numbers of other stuff. > It would be a good plan to report this as a bug (or via support if you > are a supported customer and are using RHEL) as it should work > correctly, OK will probably file a bug report then. It's at least encouraging to hear that it should work:-) /Kristian > Steve. > > > > I have an activemq-5.6.0 instance on each server that tries to lock a > > file on the GFS2-filesystem (using ). > > > > When i start the cluster, everything works as expected. The first > > activemq instance that starts up acquires the lock, the lock is released > > when the activemq exits, and the second instance takes the lock. > > > > The problem shows when I unmount and subsequently mount the GFS2 > > filesystem again on one of the nodes, or reboot one of the nodes (after > > having started at least one activemq instance.) > > The I start seeing statements like this in the activemq log files: > > > > Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... waiting 10 seconds for the database to be unlocked. 
Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase > > > > strace -f while that message is logged gives the following: > > > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > > [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fcntl(133, F_GETFD) = 0 > > [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented) > > [pid 3549] dup2(138, 133) = 133 > > [pid 3549] close(133) > > > > As you can see, the "Function not implemented" originates from the > > F_SETLK fnctl that the JVM does. > > The only way to recover from this state seems to be by unmounting the > > GFS2-filesystem on both nodes, then mounting it again again on both > > nodes. > > > > I've tried to isolate this by using a simpler testcase than starting two > > activemq instances. I ended up using the java sample from > > http://www.javabeat.net/2007/10/locking-files-using-java/ . > > > > I haven't managed to get the system in to a state where F_SETLK returns > > "Function no implemented" by only using the above FileLockTest class, (I > > need activemq in order to trigger the situation) but when the system is > > in that state, I can run FileLockTest, and it will print out the > > following stacktrace. > > > > Exception in thread "main" java.io.IOException: Function not implemented > > at sun.nio.ch.FileChannelImpl.lock0(Native Method) > > at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) > > at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) > > at FileLockTest.main(FileLockTest.java:15) > > > > > > If I run this on the other server (where the GFS2 fs was not unmounted > > and mounted again), it works correctly. > > > > Any ideas to what happens, and why? > > > > BR > > Kristian S?rensen > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > From queszama at yahoo.in Wed Jan 30 16:29:23 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 31 Jan 2013 00:29:23 +0800 (SGT) Subject: [Linux-cluster] GFS2 File System mount failing Message-ID: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com> I am facing few issues while creating a GFS2 file system . GFS2 file creation is successful , but? it is failing while trying to mount the file system . It is failing with the following error : === [root at eser~]# /etc/init.d/gfs2 start Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster error mounting lockproto lock_dlm ?????????????????????????????????????????????????????????? [FAILED] ---------- [root at eser ~]# tail -f /var/log/messages Jan 30 15:50:27 eser modcluster: Updating cluster version Jan 30 15:50:27 eser corosync[7121]:?? [QUORUM] Members[2]: 1 2 Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data Jan 30 15:50:29 eserrgmanager[7379]: Stopping changed resources. Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources. 
From queszama at yahoo.in  Wed Jan 30 16:29:23 2013
From: queszama at yahoo.in (Zama Ques)
Date: Thu, 31 Jan 2013 00:29:23 +0800 (SGT)
Subject: [Linux-cluster] GFS2 File System mount failing
Message-ID: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>

I am facing a few issues while creating a GFS2 file system. GFS2 file
system creation is successful, but it is failing while trying to mount
the file system.

It is failing with the following error:

===
[root at eser ~]# /etc/init.d/gfs2 start
Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
error mounting lockproto lock_dlm
                                                           [FAILED]
----------
[root at eser ~]# tail -f /var/log/messages
Jan 30 15:50:27 eser modcluster: Updating cluster version
Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
==========

"sharedweb" is the cluster which I created earlier, and I created the
GFS2 file system using that cluster name. I then deleted the "sharedweb"
cluster and created a new cluster called "mycluster", but while mounting
the GFS2 partition with the new cluster it is showing the error mentioned
above.

I created the new GFS2 file system using the command shown below:

mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1

My cluster config is as follows:

=====
# cat /etc/cluster/cluster.conf
(cluster.conf contents not preserved - the XML was lost when the list
archive scrubbed the HTML mail)
===

Please suggest how to resolve the issue.

Thanks
Zaman
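The mismatch in the gfs_controld lines above (fs requires cluster="mycluster", current="sharedweb") can be confirmed directly before changing anything: the lock table name ("clustername:fsname") given to mkfs.gfs2 -t is stored in the filesystem superblock, while the name the node is actually running under is reported by cman_tool. A minimal check, assuming gfs2-utils is installed and using the device from the mkfs command above:

# lock table recorded in the GFS2 superblock at mkfs time
gfs2_tool sb /dev/mapper/mpathcp1 table

# cluster name the running cman/corosync stack was started with
cman_tool status | grep "Cluster Name"

# the mount only succeeds when the part before the ":" matches the running cluster name
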
From swhiteho at redhat.com  Wed Jan 30 16:53:31 2013
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 30 Jan 2013 16:53:31 +0000
Subject: [Linux-cluster] GFS2 File System mount failing
In-Reply-To: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
References: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
Message-ID: <1359564811.2719.25.camel@menhir>

Hi,

On Thu, 2013-01-31 at 00:29 +0800, Zama Ques wrote:
> I am facing a few issues while creating a GFS2 file system. GFS2 file
> system creation is successful, but it is failing while trying to mount
> the file system.
>
> It is failing with the following error:
>
> ===
> [root at eser ~]# /etc/init.d/gfs2 start
> Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
> error mounting lockproto lock_dlm
>                                                            [FAILED]
> ----------
>
Did you restart the cluster daemons after you changed the config file? It
looks like it is still looking at the old data from the messages you've
posted,

Steve.

> [root at eser ~]# tail -f /var/log/messages
> Jan 30 15:50:27 eser modcluster: Updating cluster version
> Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
> Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
> Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
> Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
> Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
>
> ==========
>
> "sharedweb" is the cluster which I created earlier, and I created the
> GFS2 file system using that cluster name. I then deleted the "sharedweb"
> cluster and created a new cluster called "mycluster", but while mounting
> the GFS2 partition with the new cluster it is showing the error
> mentioned above.
>
> I created the new GFS2 file system using the command shown below:
>
> mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1
>
> My cluster config is as follows:
>
> =====
> # cat /etc/cluster/cluster.conf
> (cluster.conf contents not preserved in the archive)
> ===
>
> Please suggest how to resolve the issue.
>
> Thanks
> Zaman
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From queszama at yahoo.in  Thu Jan 31 01:56:21 2013
From: queszama at yahoo.in (Zama Ques)
Date: Thu, 31 Jan 2013 09:56:21 +0800 (SGT)
Subject: [Linux-cluster] GFS2 File System mount failing
In-Reply-To: <1359564811.2719.25.camel@menhir>
References: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
	<1359564811.2719.25.camel@menhir>
Message-ID: <1359597381.66958.YahooMailNeo@web193503.mail.sg3.yahoo.com>

Thanks Steve. Restarting cman resolved the issue (the restart-and-remount
sequence is sketched after the quoted thread below).

Thanks
Zaman

________________________________
From: Steven Whitehouse
To: Zama Ques; linux clustering
Sent: Wednesday, 30 January 2013 10:23 PM
Subject: Re: [Linux-cluster] GFS2 File System mount failing

Hi,

On Thu, 2013-01-31 at 00:29 +0800, Zama Ques wrote:
> I am facing a few issues while creating a GFS2 file system. GFS2 file
> system creation is successful, but it is failing while trying to mount
> the file system.
>
> It is failing with the following error:
>
> ===
> [root at eser ~]# /etc/init.d/gfs2 start
> Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
> error mounting lockproto lock_dlm
>                                                            [FAILED]
> ----------
>
> Did you restart the cluster daemons after you changed the config file? It
> looks like it is still looking at the old data from the messages you've
> posted,
>
> [root at eser ~]# tail -f /var/log/messages
> Jan 30 15:50:27 eser modcluster: Updating cluster version
> Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
> Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
> Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
> Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
> Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
>
> ==========
>
> "sharedweb" is the cluster which I created earlier, and I created the
> GFS2 file system using that cluster name. I then deleted the "sharedweb"
> cluster and created a new cluster called "mycluster", but while mounting
> the GFS2 partition with the new cluster it is showing the error
> mentioned above.
>
> I created the new GFS2 file system using the command shown below:
>
> mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1
>
> My cluster config is as follows:
>
> =====
> # cat /etc/cluster/cluster.conf
> (cluster.conf contents not preserved in the archive)
> ===
>
> Please suggest how to resolve the issue.
>
> Thanks
> Zaman
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
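For reference, the restart that resolved the "fs is for a different cluster" error comes down to something like the following on each node. This is only a sketch based on the thread, not an exact transcript: it assumes the RHEL 6-style init scripts seen above, rgmanager is stopped first simply because it appears in the logs, and the exact set of services layered on top of cman may differ on other setups.

# make cman re-read cluster.conf and come back up under the new cluster name
/etc/init.d/rgmanager stop
/etc/init.d/cman restart
/etc/init.d/rgmanager start

# the node should now report "mycluster" rather than "sharedweb"
cman_tool status | grep "Cluster Name"

# with the names matching, the GFS2 mount goes through
/etc/init.d/gfs2 start      # or: mount -t gfs2 /dev/mapper/mpathcp1 /sharedweb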