From linux at alteeve.com Tue Nov 1 04:54:57 2011 From: linux at alteeve.com (Digimer) Date: Tue, 01 Nov 2011 00:54:57 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager Message-ID: <4EAF7BA1.1000500@alteeve.com> Hi all, I've run into something of a corner case; EL6 / cman 3 rgmanager KVM VMs Win2008 R2 guest I want to allow my UPS to shut down my cluster when the batteries are about to fail. The problem with this is that when I try to stop rgmanager (or even simply disabling the VM resource), an application on the windows KVM guest pops up a "Are you sure you want to close X?". This blocks the VMs shutdown, which leaves rgmanager sitting there indefinitely waiting for the guest VM to stop and nothing actually shuts down until the batteries drain. The application in question does not have a "don't prompt me" option, so I need one of; * A way to either tell the windows guest to forcibly stop to process. * A way to have rgmanager pause and write out to disk the state of a VM. * A way to 'virsh destory' a guest as a special kind of 'clusvcadm -d ...' call. I'm using the virtio drivers, which I believe (perhaps wrongly) provides the ACPI hook to start the guest VM. Any suggestions/ideas? Anything has to be better than waiting and letting the whole cluster hard power off. -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron From Sagar.Shimpi at tieto.com Tue Nov 1 09:29:56 2011 From: Sagar.Shimpi at tieto.com (Sagar.Shimpi at tieto.com) Date: Tue, 1 Nov 2011 11:29:56 +0200 Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 Message-ID: Hi, Following is my setup - Redhat -6.0 ==> 64-bit Cluster configuration using LUCI. I had setup 2 node cluster Load Balancing Cluster having Mysql service active on both the nodes using different Failover Domain. Node1 [Mysql-1 running with IP - 192.168.56.2 ] Node2 [Mysql-2 running with IP - 192.168.56.3 ] For both the above Mysql services I had used common storage using GFS2 file system. But I am facing the problem in syncing the storage. On both the nodes data is not in sync. Is it possible to sync the data using GFS2 file system while configuring MYSQL load Balancing Cluster??? Regards, Sagar Shimpi, Senior Technical Specialist, OSS Labs Tieto email sagar.shimpi at tieto.com, Wing 1, Cluster D, EON Free Zone, Plot No. 1, Survery # 77, MIDC Kharadi Knowledge Park, Pune 411014, India, www.tieto.com www.tieto.in TIETO. Knowledge. Passion. Results. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue Nov 1 11:43:07 2011 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 01 Nov 2011 11:43:07 +0000 Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 In-Reply-To: References: Message-ID: <1320147787.2707.8.camel@menhir> Hi, On Tue, 2011-11-01 at 11:29 +0200, Sagar.Shimpi at tieto.com wrote: > Hi, > > > > Following is my setup ? > > > > Redhat -6.0 ? 64-bit > > Cluster configuration using LUCI. > > > > I had setup 2 node cluster Load Balancing Cluster having Mysql service > active on both the nodes using different Failover Domain. > > Node1 [Mysql-1 running with IP ? 192.168.56.2 ] > > Node2 [Mysql-2 running with IP ? 192.168.56.3 ] > > > > For both the above Mysql services I had used common storage using GFS2 > file system. 
But I am facing the problem in syncing the storage. On > both the nodes data is not in sync. > > > > Is it possible to sync the data using GFS2 file system while > configuring MYSQL load Balancing Cluster??? > That is really a MySQL question rather than a cluster question. In general it is not likely that running multiple copies of MySQL across a set of nodes will work. At least, not with the standard set up, anyway. You'd need a cluster aware database in order to do that, Steve. > > > > > Regards, > > > > Sagar Shimpi, Senior Technical Specialist, OSS Labs > > > > Tieto > > email sagar.shimpi at tieto.com, > > Wing 1, Cluster D, EON Free Zone, Plot No. 1, Survery # 77, > > MIDC Kharadi Knowledge Park, Pune 411014, India, www.tieto.com > www.tieto.in > > > TIETO. Knowledge. Passion. Results. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kkovachev at varna.net Tue Nov 1 11:53:00 2011 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 01 Nov 2011 13:53:00 +0200 Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 In-Reply-To: References: Message-ID: <85a8f0a04031dfef1d4e679a7526291c@mx.varna.net> On Tue, 1 Nov 2011 11:29:56 +0200, wrote: > Hi, > > Following is my setup - > > Redhat -6.0 ==> 64-bit > Cluster configuration using LUCI. > > I had setup 2 node cluster Load Balancing Cluster having Mysql service > active on both the nodes using different Failover Domain. > Node1 [Mysql-1 running with IP - 192.168.56.2 ] > Node2 [Mysql-2 running with IP - 192.168.56.3 ] > > For both the above Mysql services I had used common storage using GFS2 > file system. But I am facing the problem in syncing the storage. On both > the nodes data is not in sync. Which one is not true "I had used common storage" or "On both the nodes data is not in sync" - if it is a common storage the data is the same? if you are using GFS2 without a cluster and dlm locking i.e. local_locking then it is possible both to be true > > Is it possible to sync the data using GFS2 file system while configuring > MYSQL load Balancing Cluster??? > GFS2 has nothing to do with syncing the data between two storages - if that's what you are after, check DRBD if you are using improperly configured GFS2 on a shared storage i.e. without cluster and dlm it is no different than any other local filesystem and corruption is guaranteed on simultaneous use how did you create the GFS2 filesystem? Also please show your cluster.conf and relevant storage details From symack at gmail.com Tue Nov 1 12:27:04 2011 From: symack at gmail.com (Nick Khamis) Date: Tue, 1 Nov 2011 08:27:04 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: <4EAF7BA1.1000500@alteeve.com> References: <4EAF7BA1.1000500@alteeve.com> Message-ID: Hello Digmer, We are working on a similer project, only: EL6 / pcmk+cman (for dlm and fenced) No rgmanager since we will be using pacemaker KVM Could you kindly share some whitepapers that you have been using for your setup? Documentation for things like live migration, fenching the VMs etc..? PLEASE! Thanks in Advance, Nick. On Tue, Nov 1, 2011 at 12:54 AM, Digimer wrote: > Hi all, > > ?I've run into something of a corner case; > > EL6 / cman 3 > rgmanager > KVM VMs > Win2008 R2 guest > > I want to allow my UPS to shut down my cluster when the batteries are > about to fail. 
> > The problem with this is that when I try to stop rgmanager (or even > simply disabling the VM resource), an application on the windows KVM > guest pops up a "Are you sure you want to close X?". This blocks the VMs > shutdown, which leaves rgmanager sitting there indefinitely waiting for > the guest VM to stop and nothing actually shuts down until the batteries > drain. > > The application in question does not have a "don't prompt me" option, so > I need one of; > * A way to either tell the windows guest to forcibly stop to process. > * A way to have rgmanager pause and write out to disk the state of a VM. > * A way to 'virsh destory' a guest as a special kind of 'clusvcadm -d > ...' call. > > I'm using the virtio drivers, which I believe (perhaps wrongly) provides > the ACPI hook to start the guest VM. > > Any suggestions/ideas? Anything has to be better than waiting and > letting the whole cluster hard power off. > > -- > Digimer > E-Mail: ? ? ? ? ? ? ?digimer at alteeve.com > Freenode handle: ? ? digimer > Papers and Projects: http://alteeve.com > Node Assassin: ? ? ? http://nodeassassin.org > "omg my singularity battery is dead again. > stupid hawking radiation." - epitron > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From rpeterso at redhat.com Tue Nov 1 13:03:41 2011 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 01 Nov 2011 09:03:41 -0400 (EDT) Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 In-Reply-To: <85a8f0a04031dfef1d4e679a7526291c@mx.varna.net> Message-ID: <0335948a-fbf7-4108-8786-3f7de0805a5f@zmail06.collab.prod.int.phx2.redhat.com> ----- Original Message ----- | Which one is not true "I had used common storage" or "On both the | nodes | data is not in sync" - if it is a common storage the data is the | same? | | if you are using GFS2 without a cluster and dlm locking i.e. | local_locking | then it is possible both to be true (snip) | GFS2 has nothing to do with syncing the data between two storages - | if | that's what you are after, check DRBD | | if you are using improperly configured GFS2 on a shared storage i.e. | without cluster and dlm it is no different than any other local | filesystem | and corruption is guaranteed on simultaneous use Hi, IMHO, the most important things to bear in mind here are: (1) The job of GFS2 is to keep the file system _metadata_ consistent between nodes in the cluster. (2) It does _not_ keep DATA within the files consistent within the cluster: that's the job of the application. (3) If the application is not cluster-aware (i.e. one instance of mysql doesn't know about another instance in the cluster) they will trounce each other's updates, making the data inconsistent. (4) The general rule is: If two instances of an app can run on the same computer, in general it will work properly without data corruption. But if one computer is not allowed to run two instances of the same app, in general it will not work properly. (5) With clustering you can essentially think of it this way: it makes multiple computers run an app as if they were running multiple instances on the same computer. Almost like forcing the app to run two instances on the same computer (although that's not at all what really happens). Multiple instances on the same machine will use some kind of locking mechanism, like posix locks, to maintain data integrity. (6) Many apps are written with clustering in mind and there may be special "clustered" versions of apps, like mysql. 
It's best to check with the app experts or clustering experts or the cluster FAQ before implementing this kind of thing. So bottom line: You can't run two copies of regular mysql on the same box (unless it's a special cluster-aware mysql) without conflicts so you can't run two copies of regular mysql in a cluster without data corruption, because they are not cluster-aware. Regards, Bob Peterson Red Hat File Systems From symack at gmail.com Tue Nov 1 13:17:36 2011 From: symack at gmail.com (Nick Khamis) Date: Tue, 1 Nov 2011 09:17:36 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: References: <4EAF7BA1.1000500@alteeve.com> Message-ID: You make it look so easy: https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial ;) Nick. From kkovachev at varna.net Tue Nov 1 14:13:50 2011 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 01 Nov 2011 16:13:50 +0200 Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 In-Reply-To: <0335948a-fbf7-4108-8786-3f7de0805a5f@zmail06.collab.prod.int.phx2.redhat.com> References: <0335948a-fbf7-4108-8786-3f7de0805a5f@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: <331c3d89b5cfd9dc4893b28d3961f667@mx.varna.net> On Tue, 01 Nov 2011 09:03:41 -0400 (EDT), Bob Peterson wrote: > ----- Original Message ----- > | Which one is not true "I had used common storage" or "On both the > | nodes > | data is not in sync" - if it is a common storage the data is the > | same? > | > | if you are using GFS2 without a cluster and dlm locking i.e. > | local_locking > | then it is possible both to be true > (snip) > | GFS2 has nothing to do with syncing the data between two storages - > | if > | that's what you are after, check DRBD > | > | if you are using improperly configured GFS2 on a shared storage i.e. > | without cluster and dlm it is no different than any other local > | filesystem > | and corruption is guaranteed on simultaneous use > > Hi, > > IMHO, the most important things to bear in mind here are: > > (1) The job of GFS2 is to keep the file system _metadata_ consistent > between nodes in the cluster. > (2) It does _not_ keep DATA within the files consistent within the > cluster: that's the job of the application. > (3) If the application is not cluster-aware (i.e. one instance of > mysql doesn't know about another instance in the cluster) they > will trounce each other's updates, making the data inconsistent. > (4) The general rule is: If two instances of an app can run on the > same computer, in general it will work properly without data > corruption. But if one computer is not allowed to run two > instances of the same app, in general it will not work properly. > (5) With clustering you can essentially think of it this way: it > makes multiple computers run an app as if they were running > multiple instances on the same computer. Almost like forcing > the app to run two instances on the same computer (although > that's not at all what really happens). Multiple instances > on the same machine will use some kind of locking mechanism, > like posix locks, to maintain data integrity. > (6) Many apps are written with clustering in mind and there > may be special "clustered" versions of apps, like mysql. > It's best to check with the app experts or clustering experts > or the cluster FAQ before implementing this kind of thing. 
> > So bottom line: You can't run two copies of regular mysql on the > same box (unless it's a special cluster-aware mysql) without conflicts > so you can't run two copies of regular mysql in a cluster without > data corruption, because they are not cluster-aware. > I agree with all said, but it is possible to run more than one instance of regular mysql on the same box. I run 3 (slave of master 1, slave of master 2 and combined RO export) instances (on the same machine), using the same data without problems, but you need to define 'external-locking' which slows them down running two instances in a cluster from shared storage is possible, but much slower and not a solution. > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From linux at alteeve.com Tue Nov 1 15:42:36 2011 From: linux at alteeve.com (Digimer) Date: Tue, 01 Nov 2011 11:42:36 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: References: <4EAF7BA1.1000500@alteeve.com> Message-ID: <4EB0136C.1080804@alteeve.com> On 11/01/2011 09:17 AM, Nick Khamis wrote: > You make it look so easy: > https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial ;) > > > Nick. Just a fair warning; That tutorial is well on it's way, but it is not complete and I've not yet gone over it looking for mistakes. Please feel free to read it and follow it, but be caustious of mistakes or omissions at this time. I plan to post an announcement when it's finished. In the mean time, if you run into problems, feel free to ask me at this address or stop by #linux-cluster on IRC. :) -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron From mmorgan at dca.net Tue Nov 1 15:56:42 2011 From: mmorgan at dca.net (Michael Morgan) Date: Tue, 1 Nov 2011 11:56:42 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: <4EAF7BA1.1000500@alteeve.com> References: <4EAF7BA1.1000500@alteeve.com> Message-ID: <20111101155642.GA30573@staff.dca.net> > The problem with this is that when I try to stop rgmanager (or even > simply disabling the VM resource), an application on the windows KVM > guest pops up a "Are you sure you want to close X?". This blocks the VMs > shutdown, which leaves rgmanager sitting there indefinitely waiting for > the guest VM to stop and nothing actually shuts down until the batteries > drain. How much much lead time are you giving rgmanager? In my experience (and according to vm.sh) rgmanager will issue a virsh destroy roughly 2 minutes after a virsh shutdown. From vm.sh: 263 264 265 ... 467 for op in $*; do 468 echo virsh $op $OCF_RESKEY_name ... 469 virsh $op $OCF_RESKEY_name 470 471 timeout=$(get_timeout) 472 while [ $timeout -gt 0 ]; do 473 sleep 5 474 ((timeout -= 5)) 475 state=$(do_status) 476 [ $? -eq 0 ] || return 0 477 478 if [ "$state" = "paused" ]; then 479 virsh destroy $OCF_RESKEY_name 480 fi 481 done 482 done I don't know off hand if the action parameters can be adjusted in the rgmanager config, I've never had cause to change it personally. Your site is a very useful resource btw. Great job and many thanks! 
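For completeness, the same thing done by hand outside of rgmanager would just be a shutdown followed by a destroy after a grace period. This is only a rough sketch (untested here, with "win2008guest" standing in for the real domain name):

  virsh shutdown win2008guest
  timeout=120
  while [ $timeout -gt 0 ] && virsh domstate win2008guest | grep -q running; do
      sleep 5
      timeout=$((timeout - 5))
  done
  # if the guest is still running after the grace period, pull the plug
  virsh domstate win2008guest | grep -q running && virsh destroy win2008guest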
-Mike From symack at gmail.com Tue Nov 1 15:59:54 2011 From: symack at gmail.com (Nick Khamis) Date: Tue, 1 Nov 2011 11:59:54 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: <4EB0136C.1080804@alteeve.com> References: <4EAF7BA1.1000500@alteeve.com> <4EB0136C.1080804@alteeve.com> Message-ID: Hey Digmer, Most definitely. I've made some headway over the past month and a half, our project is using a pcmk+cman+corosync+ocfs2 stack, all nose bleed versions, and all built from source. We basically hit ALL the errors. Currently looking to integrate the fenced part of the equation using fence_xvm. Any input or heads up would be appreciated? Also, if I am not mistaken, I noticed that you lean towards the Cluster3 stack, any reason for staying away from pcmk+cman with corosync? Cheers, Nick from Toronto (And sometimes Montreal) From linux at alteeve.com Tue Nov 1 16:06:43 2011 From: linux at alteeve.com (Digimer) Date: Tue, 01 Nov 2011 12:06:43 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: References: <4EAF7BA1.1000500@alteeve.com> <4EB0136C.1080804@alteeve.com> Message-ID: <4EB01913.7010309@alteeve.com> On 11/01/2011 11:59 AM, Nick Khamis wrote: > Hey Digmer, > > Most definitely. I've made some headway over the past month and a half, our > project is using a pcmk+cman+corosync+ocfs2 stack, all nose bleed versions, > and all built from source. We basically hit ALL the errors. Currently looking > to integrate the fenced part of the equation using fence_xvm. Any input or > heads up would be appreciated? > Also, if I am not mistaken, I noticed that you lean towards the Cluster3 stack, > any reason for staying away from pcmk+cman with corosync? > > Cheers, > > Nick from Toronto (And sometimes Montreal) Hey, a fellow Torontonian. :) I stick with what is officially supported by Red Hat, quite simply. I am tracking Pacemaker's progress closely, and plan to start learning it more earnestly before too long. At the moment though, there are two major issues for me with Pacemaker; * It's in technology preview in EL6 at the moment, so no z-stream updates. * The implementation of fencing (aka stonith) are not the way I need them to be. Pacemaker is awesome, and it will be the future, but at the moment, cman+rgmanager is what is most stable, so that is where I work. :) Also, for anything aiming at production, I *strongly* recommend staying away from compiling your own apps. For learning/testing though, running from the latest versions and then filing bugs against the code is always appreciated. :) -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron From linux at alteeve.com Tue Nov 1 16:08:38 2011 From: linux at alteeve.com (Digimer) Date: Tue, 01 Nov 2011 12:08:38 -0400 Subject: [Linux-cluster] Forcing off KVM Windows guests from rgmanager In-Reply-To: <20111101155642.GA30573@staff.dca.net> References: <4EAF7BA1.1000500@alteeve.com> <20111101155642.GA30573@staff.dca.net> Message-ID: <4EB01986.1070108@alteeve.com> On 11/01/2011 11:56 AM, Michael Morgan wrote: >> The problem with this is that when I try to stop rgmanager (or even >> simply disabling the VM resource), an application on the windows KVM >> guest pops up a "Are you sure you want to close X?". 
This blocks the VMs >> shutdown, which leaves rgmanager sitting there indefinitely waiting for >> the guest VM to stop and nothing actually shuts down until the batteries >> drain. > > How much much lead time are you giving rgmanager? In my experience (and > according to vm.sh) rgmanager will issue a virsh destroy roughly 2 > minutes after a virsh shutdown. From vm.sh: > > 263 > 264 > 265 > ... > 467 for op in $*; do > 468 echo virsh $op $OCF_RESKEY_name ... > 469 virsh $op $OCF_RESKEY_name > 470 > 471 timeout=$(get_timeout) > 472 while [ $timeout -gt 0 ]; do > 473 sleep 5 > 474 ((timeout -= 5)) > 475 state=$(do_status) > 476 [ $? -eq 0 ] || return 0 > 477 > 478 if [ "$state" = "paused" ]; then > 479 virsh destroy $OCF_RESKEY_name > 480 fi > 481 done > 482 done > > I don't know off hand if the action parameters can be adjusted in the > rgmanager config, I've never had cause to change it personally. > > Your site is a very useful resource btw. Great job and many thanks! > > -Mike Not two minutes. ;) I will try letting is wait on my test cluster shortly and will report back. If the destroy is indeed called, that would be fantastic. Glad to hear the site help! :D -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron From wewrussell at gmail.com Tue Nov 1 20:48:55 2011 From: wewrussell at gmail.com (W. E. W. Russell) Date: Tue, 1 Nov 2011 16:48:55 -0400 Subject: [Linux-cluster] Issue with Conga on RHEL 5.7 Message-ID: My name is William Russell and I'm new on this list. I'm having an issue that I think is really simple, but I can't even figure out where the problem is located. I have installed 'luci' and 'ricci' on my main server and 'ricci' on my other two servers. All servers are running the latest RHEL 5.7 with all the yum updates (RHN registered with the Clustering entitlements). When I create a new cluster, I get to the progress screen and that's where everything falls apart. It just sits there for hours! It never creates the cluster. The dot for "Install" NEVER fills in. After much research, I understand what 'luci' is trying to do in terms of installing the packages necessary to configure, manage, and run the cluster, but I have done the 'yum groupinstall clustering' on another sever and it took 10 mins, if that. Communication between the servers has been verified - iptables is not running, selinux is disabled. If you need more information on the issue, feel free to ask. I looked at the syslog and see no failures or errors. If anyone can even point me in the direction of what might be causing the problem, it would be much appreciated. -- W. E. W. Russell Director, Systems Intergration at incNETWORKS, Inc. Work Phone # 732-508-2224 Active Alumni member of Sigma Lambda Beta International Fraternity, Inc. Cell Phone # 732-744-6483 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sagar.Shimpi at tieto.com Wed Nov 2 04:56:02 2011 From: Sagar.Shimpi at tieto.com (Sagar.Shimpi at tieto.com) Date: Wed, 2 Nov 2011 06:56:02 +0200 Subject: [Linux-cluster] Need help regarding Sared storage with GFS2 In-Reply-To: <0335948a-fbf7-4108-8786-3f7de0805a5f@zmail06.collab.prod.int.phx2.redhat.com> References: <85a8f0a04031dfef1d4e679a7526291c@mx.varna.net> <0335948a-fbf7-4108-8786-3f7de0805a5f@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: Thanks A lot for the detail explanation. 
Regards, Sagar Shimpi, Senior Technical Specialist, OSS Labs Tieto email sagar.shimpi at tieto.com, Wing 1, Cluster D, EON Free Zone, Plot No. 1, Survery # 77, MIDC Kharadi Knowledge Park, Pune 411014, India, www.tieto.com www.tieto.in TIETO. Knowledge. Passion. Results. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bob Peterson Sent: Tuesday, November 01, 2011 6:34 PM To: linux clustering Subject: Re: [Linux-cluster] Need help regarding Sared storage with GFS2 ----- Original Message ----- | Which one is not true "I had used common storage" or "On both the | nodes | data is not in sync" - if it is a common storage the data is the | same? | | if you are using GFS2 without a cluster and dlm locking i.e. | local_locking | then it is possible both to be true (snip) | GFS2 has nothing to do with syncing the data between two storages - | if | that's what you are after, check DRBD | | if you are using improperly configured GFS2 on a shared storage i.e. | without cluster and dlm it is no different than any other local | filesystem | and corruption is guaranteed on simultaneous use Hi, IMHO, the most important things to bear in mind here are: (1) The job of GFS2 is to keep the file system _metadata_ consistent between nodes in the cluster. (2) It does _not_ keep DATA within the files consistent within the cluster: that's the job of the application. (3) If the application is not cluster-aware (i.e. one instance of mysql doesn't know about another instance in the cluster) they will trounce each other's updates, making the data inconsistent. (4) The general rule is: If two instances of an app can run on the same computer, in general it will work properly without data corruption. But if one computer is not allowed to run two instances of the same app, in general it will not work properly. (5) With clustering you can essentially think of it this way: it makes multiple computers run an app as if they were running multiple instances on the same computer. Almost like forcing the app to run two instances on the same computer (although that's not at all what really happens). Multiple instances on the same machine will use some kind of locking mechanism, like posix locks, to maintain data integrity. (6) Many apps are written with clustering in mind and there may be special "clustered" versions of apps, like mysql. It's best to check with the app experts or clustering experts or the cluster FAQ before implementing this kind of thing. So bottom line: You can't run two copies of regular mysql on the same box (unless it's a special cluster-aware mysql) without conflicts so you can't run two copies of regular mysql in a cluster without data corruption, because they are not cluster-aware. Regards, Bob Peterson Red Hat File Systems -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From ext.thales.jean-daniel.bonnetot at sncf.fr Wed Nov 2 08:42:29 2011 From: ext.thales.jean-daniel.bonnetot at sncf.fr (BONNETOT Jean-Daniel (EXT THALES)) Date: Wed, 2 Nov 2011 09:42:29 +0100 Subject: [Linux-cluster] Issue with Conga on RHEL 5.7 In-Reply-To: References: Message-ID: Hello, We are many with same problem. Since RHEL 5.7, packages installation don't work from ricci. For now, our solution is to install packages manually without using "groupinstall" command. yum install cman rgmanager qdiskd ... 
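In other words, roughly this on each node before creating the cluster from luci (the package list is only an example -- adjust it to the resource agents your cluster really needs -- and the exact luci wording may differ between Conga versions):

  yum install -y cman rgmanager ricci
  chkconfig ricci on
  service ricci start
  # then create the cluster in luci and tell it to use the locally
  # installed packages rather than downloading them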
Regards, Jean-Daniel BONNETOT Ing?nierie Syst?me Aix & Linux De?: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] De la part de W. E. W. Russell Envoy??: mardi 1 novembre 2011 21:49 ??: linux-cluster at redhat.com Cc?: Gerardo Laracuente Objet?: [Linux-cluster] Issue with Conga on RHEL 5.7 My name is William Russell and I'm new on this list. I'm having an issue that I think is really simple, but I can't even figure out where the problem is located. I have installed 'luci' and 'ricci' on my main server and 'ricci' on my other two servers. All servers are running the latest RHEL 5.7 with all the yum updates (RHN registered with the Clustering entitlements). When I create a new cluster, I get to the progress screen and that's where everything falls apart. It just sits there for hours! It never creates the cluster. The dot for "Install" NEVER fills in. After much research, I understand what 'luci' is trying to do in terms of installing the packages necessary to configure, manage, and run the cluster, but I have done the 'yum groupinstall clustering' on another sever and it took 10 mins, if that. Communication between the servers has been verified - iptables is not running, selinux is disabled.? If you need more information on the issue, feel free to ask. I looked at the syslog and see no failures or errors. If anyone can even point me in the direction of what might be causing the problem, it would be much appreciated. -- W. E. W. Russell Director, Systems Intergration at incNETWORKS, Inc. Work Phone # 732-508-2224? Active Alumni member of Sigma Lambda Beta International Fraternity, Inc. Cell Phone # 732-744-6483 ------- Ce message et toutes les pi?ces jointes sont ?tablis ? l'intention exclusive de ses destinataires et sont confidentiels. L'int?grit? de ce message n'?tant pas assur?e sur Internet, la SNCF ne peut ?tre tenue responsable des alt?rations qui pourraient se produire sur son contenu. Toute publication, utilisation, reproduction, ou diffusion, m?me partielle, non autoris?e pr?alablement par la SNCF, est strictement interdite. Si vous n'?tes pas le destinataire de ce message, merci d'en avertir imm?diatement l'exp?diteur et de le d?truire. ------- This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it. From szekelyi at niif.hu Wed Nov 2 20:00:18 2011 From: szekelyi at niif.hu (=?ISO-8859-1?Q?Sz=E9kelyi?= Szabolcs) Date: Wed, 02 Nov 2011 21:00:18 +0100 Subject: [Linux-cluster] Running a cluster on routed networks Message-ID: <19707654.WTfKzLJxNB@mranderson> Hello, how can I run a cluster on a network where nodes are on different subnets? Currently the main problem is that heartbeats are sent with their IP level TTL set to 1, which keeps them from reaching the other nodes. How can I change this? I'm using multicasting. Thanks, -- Szabolcs From szekelyi at niif.hu Thu Nov 3 17:12:46 2011 From: szekelyi at niif.hu (=?ISO-8859-1?Q?Sz=E9kelyi?= Szabolcs) Date: Thu, 03 Nov 2011 18:12:46 +0100 Subject: [Linux-cluster] Running a cluster on routed networks In-Reply-To: <19707654.WTfKzLJxNB@mranderson> References: <19707654.WTfKzLJxNB@mranderson> Message-ID: <9710609.JpD2iA48WT@mranderson> On 2011. 
November 2. 21:00:18 Sz?kelyi Szabolcs wrote: > how can I run a cluster on a network where nodes are on different subnets? > Currently the main problem is that heartbeats are sent with their IP level > TTL set to 1, which keeps them from reaching the other nodes. How can I > change this? I'm using multicasting. OK, I've found this: https://bugzilla.redhat.com/show_bug.cgi?id=640311 , saying that it's now possible to set the TTL for multicast. But I haven't found any info on *how* to set it. I've seen the following possible solutions: But whatever I do, ccs_config_validate always says that my cluster.conf is invalid, and the TTL (as reported by tcpdump) is still zero. Is it possible that my cman is out of date? I'm using version 3.0.12. Can you tell me which is the eariest version that has this feature? Thanks, -- Szabolcs From linux at alteeve.com Thu Nov 3 17:29:25 2011 From: linux at alteeve.com (Digimer) Date: Thu, 03 Nov 2011 13:29:25 -0400 Subject: [Linux-cluster] Running a cluster on routed networks In-Reply-To: <9710609.JpD2iA48WT@mranderson> References: <19707654.WTfKzLJxNB@mranderson> <9710609.JpD2iA48WT@mranderson> Message-ID: <4EB2CF75.60904@alteeve.com> On 11/03/2011 01:12 PM, Sz?kelyi Szabolcs wrote: > On 2011. November 2. 21:00:18 Sz?kelyi Szabolcs wrote: >> how can I run a cluster on a network where nodes are on different subnets? >> Currently the main problem is that heartbeats are sent with their IP level >> TTL set to 1, which keeps them from reaching the other nodes. How can I >> change this? I'm using multicasting. > > OK, I've found this: https://bugzilla.redhat.com/show_bug.cgi?id=640311 , > saying that it's now possible to set the TTL for multicast. But I haven't > found any info on *how* to set it. I've seen the following possible solutions: > > > > > > > > > > > > But whatever I do, ccs_config_validate always says that my cluster.conf is > invalid, and the TTL (as reported by tcpdump) is still zero. Is it possible > that my cman is out of date? I'm using version 3.0.12. Can you tell me which > is the eariest version that has this feature? > > Thanks, Looking in the cluster.rng file (the one used to validate cluster.conf), '' should be valid. What version of cman are you using? -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron From szekelyi at niif.hu Thu Nov 3 19:17:46 2011 From: szekelyi at niif.hu (=?ISO-8859-1?Q?Sz=E9kelyi?= Szabolcs) Date: Thu, 03 Nov 2011 20:17:46 +0100 Subject: [Linux-cluster] Running a cluster on routed networks In-Reply-To: <4EB2CF75.60904@alteeve.com> References: <19707654.WTfKzLJxNB@mranderson> <9710609.JpD2iA48WT@mranderson> <4EB2CF75.60904@alteeve.com> Message-ID: <1864491.GxLbRM0kJ3@mranderson> On 2011. November 3. 13:29:25 Digimer wrote: > On 11/03/2011 01:12 PM, Sz?kelyi Szabolcs wrote: > > On 2011. November 2. 21:00:18 Sz?kelyi Szabolcs wrote: > >> how can I run a cluster on a network where nodes are on different > >> subnets? Currently the main problem is that heartbeats are sent with > >> their IP level TTL set to 1, which keeps them from reaching the other > >> nodes. How can I change this? I'm using multicasting. > > > > OK, I've found this: https://bugzilla.redhat.com/show_bug.cgi?id=640311 > > , > > saying that it's now possible to set the TTL for multicast. But I > > haven't > > found any info on *how* to set it. [...] 
> > But whatever I do, ccs_config_validate always says that my cluster.conf > > is invalid, and the TTL (as reported by tcpdump) is still zero. Is it > > possible that my cman is out of date? I'm using version 3.0.12. Can you > > tell me which is the eariest version that has this feature? > > > > Thanks, > > Looking in the cluster.rng file (the one used to validate cluster.conf), > '' should be valid. What version of cman are you using? If I add the ttl="8" attribute to in cluster.conf, it fails to validate according to ccs_config_validate. Without this attribute it validates. I've grepped cluster.rng for "ttl", but found nothing sensible. It looks like it's missing from my cluster.rng. My cman's version is 3.0.12: $ sudo cman_tool -V cman_tool 3.0.12 (built Jul 2 2010 09:55:13) The cluster starts with the attibute (it issues a warning), but the TTL is still 1. I've already upgraded corosync to support TTL adjustment, but it looks like I just have a problem to push it through cman. Thanks, -- cc From jpokorny at redhat.com Thu Nov 3 19:29:50 2011 From: jpokorny at redhat.com (Jan Pokorny) Date: Thu, 03 Nov 2011 20:29:50 +0100 Subject: [Linux-cluster] Issue with Conga on RHEL 5.7 In-Reply-To: References: Message-ID: <4EB2EBAE.9010801@redhat.com> Hello, On 11/02/2011 09:42 AM, BONNETOT Jean-Daniel (EXT THALES) wrote: > Hello, > > We are many with same problem. Since RHEL 5.7, packages installation don't work from ricci. > For now, our solution is to install packages manually without using "groupinstall" command. > > yum install cman rgmanager qdiskd ... yes, as I mentioned in the (late-coming, I admit) reply [1] to already posted observation [2], this is a known issue with the fixes (more releases affected) being almost out of the door -- updated packages for RHEL 5.7 should be already available [3]. > Regards, > > Jean-Daniel BONNETOT > Ing?nierie Syst?me Aix& Linux > > De : linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] De la part de W. E. W. Russell > Envoy? : mardi 1 novembre 2011 21:49 > ? : linux-cluster at redhat.com > Cc : Gerardo Laracuente > Objet : [Linux-cluster] Issue with Conga on RHEL 5.7 > > My name is William Russell and I'm new on this list. > > I'm having an issue that I think is really simple, but I can't even figure out where the problem is located. > > I have installed 'luci' and 'ricci' on my main server and 'ricci' on my other two servers. All servers are running the latest RHEL 5.7 with all the yum updates (RHN registered with the Clustering entitlements). > When I create a new cluster, I get to the progress screen and that's where everything falls apart. It just sits there for hours! It never creates the cluster. The dot for "Install" NEVER fills in. After much research, I understand what 'luci' is trying to do in terms of installing the packages necessary to configure, manage, and run the cluster, but I have done the 'yum groupinstall clustering' on another sever and it took 10 mins, if that. > > Communication between the servers has been verified - iptables is not running, selinux is disabled. > > If you need more information on the issue, feel free to ask. I looked at the syslog and see no failures or errors. If anyone can even point me in the direction of what might be causing the problem, it would be much appreciated. > Also, as I mentioned in [1], while RHEL 5.7 exhibits the issue reliably, we would appreciate details about reproducers with 5.6 and especially with 6.x. 
[1] https://www.redhat.com/archives/linux-cluster/2011-October/msg00058.html [2] https://www.redhat.com/archives/linux-cluster/2011-September/msg00020.html [3] http://rhn.redhat.com/errata/RHBA-2011-1421.html Thanks, Jan From jochen.schneider at gmail.com Fri Nov 4 09:03:52 2011 From: jochen.schneider at gmail.com (Jochen Schneider) Date: Fri, 4 Nov 2011 10:03:52 +0100 Subject: [Linux-cluster] Failover after partial failure because of SAN? Message-ID: Hi, We are setting up a cluster for a storage application with SAN disks managed through HA-LVM and connected through multipath. There are actually two applications which have to run on the same node, but only one of them needs the disk. Both of them have clients. The question I have is what should happen when the SAN fails: Should both applications failover to another machine (possibly after a retry) or should the application which doesn't need the disk keep running while the other is shut down? I'm not sure how much recovery can come out of a failover in case of a SAN failure, if it's not both network cards of the node which are defective or whatever. Thanks, Jochen -------------- next part -------------- An HTML attachment was scrubbed... URL: From list at fajar.net Fri Nov 4 10:04:57 2011 From: list at fajar.net (Fajar A. Nugraha) Date: Fri, 4 Nov 2011 17:04:57 +0700 Subject: [Linux-cluster] Failover after partial failure because of SAN? In-Reply-To: References: Message-ID: On Fri, Nov 4, 2011 at 4:03 PM, Jochen Schneider wrote: > Hi, > > We are setting up a cluster for a storage application with SAN disks managed > through HA-LVM and connected through multipath. There are actually two > applications which have to run on the same node, HAVE to run on the same node? Why? Can't they communicate via TCP/IP? > but only one of them needs > the disk. Both of them have clients. > > The question I have is what should happen when the SAN fails: Should both > applications failover to another machine (possibly after a retry) or should > the application which doesn't need the disk keep running while the other is > shut down? You're not giving yourself much option. Since you say both application HAVE to run on the same node, I assume both are related (e.g. one needs the other). In that case, the only viable option is to failover. Having said that, I'm curious what do you mean by "SAN fails". It's rare for a cluster node to be suddenly unable to access a node while the other can access it just fine. Usually it's either the SAN unaccessible completely (e.g. broken SAN or switches) or a server node fails. > I'm not sure how much recovery can come out of a failover in case > of a SAN failure, if it's not both network cards of the node which are > defective or whatever. Exactly :) If no node can access the SAN, then it can't failover anywhere. -- Fajar From jochen.schneider at gmail.com Fri Nov 4 11:11:31 2011 From: jochen.schneider at gmail.com (Jochen Schneider) Date: Fri, 4 Nov 2011 12:11:31 +0100 Subject: [Linux-cluster] Failover after partial failure because of SAN? In-Reply-To: References: Message-ID: On Fri, Nov 4, 2011 at 11:04 AM, Fajar A. Nugraha wrote: > > On Fri, Nov 4, 2011 at 4:03 PM, Jochen Schneider > wrote: > > Hi, > > > > We are setting up a cluster for a storage application with SAN disks managed > > through HA-LVM and connected through multipath. There are actually two > > applications which have to run on the same node, > > HAVE to run on the same node? Why? Can't they communicate via TCP/IP? 
They are already communicating via TCP/IP, so they could be running on different nodes, you are right. But they are working in pairs so they shouldn't be like randomly distributed over the nodes. Also, we would have to see what the performance impact would be to have them on different nodes. > > but only one of them needs > > the disk. Both of them have clients. > > > > The question I have is what should happen when the SAN fails: Should both > > applications failover to another machine (possibly after a retry) or should > > the application which doesn't need the disk keep running while the other is > > shut down? > > You're not giving yourself much option. Since you say both application > HAVE to run on the same node, I assume both are related (e.g. one > needs the other). In that case, the only viable option is to failover. The one application not needing disk access can run without the other so in case of SAN failure there could be a degraded mode where only the first is serving its clients and the other is down. > Having said that, I'm curious what do you mean by "SAN fails". It's > rare for a cluster node to be suddenly unable to access a node while > the other can access it just fine. Usually it's either the SAN > inaccessible completely (e.g. broken SAN or switches) or a server node > fails. I'm am not sure, actually. I don't have any practical data points of a "real" SAN failure, only one due to misconfiguration. That's why I find it hard to decide on our configuration, I'm not sure about possible failures, dependencies between them and (even rough) probability estimates. (Has anybody ever come across a document addressing that, maybe as failure assumptions behind a clustering package and its configuration?) > > I'm not sure how much recovery can come out of a failover in case > > of a SAN failure, if it's not both network cards of the node which are > > defective or whatever. > > Exactly :) > > If no node can access the SAN, then it can't failover anywhere. If it is more likely that SAN access fails on the SAN side than on the node side, I guess that would mean it would be better to keep the application not needing the SAN running, i.e., not failing over. Or maybe failover should be tried once and then my service should go in the degraded mode described above? I'm not sure whether that is possible. > -- > Fajar Thanks! Jochen From list at fajar.net Fri Nov 4 13:25:58 2011 From: list at fajar.net (Fajar A. Nugraha) Date: Fri, 4 Nov 2011 20:25:58 +0700 Subject: [Linux-cluster] Failover after partial failure because of SAN? In-Reply-To: References: Message-ID: On Fri, Nov 4, 2011 at 6:11 PM, Jochen Schneider wrote: >> > I'm not sure how much recovery can come out of a failover in case >> > of a SAN failure, if it's not both network cards of the node which are >> > defective or whatever. >> >> Exactly :) >> >> If no node can access the SAN, then it can't failover anywhere. > > If it is more likely that SAN access fails on the SAN side than on the > node side, I guess that would mean it would be better to keep the > application not needing the SAN running, i.e., not failing over. Or > maybe failover should be tried once and then my service should go in > the degraded mode described above? I'm not sure whether that is > possible. I recommend you just keep it simple: treat the two applications differently. Don't put any dependcy between them. Period. That way when a node dies, they will be migrated to other nodes. 
If the SAN dies, the one that doesn't need external disk will still work just fine, while the one that needs it will be marked as dead (I assume you have some kind of monitoring script for this already). Then the dead one will try to either restart or be moved to another node, and if the SAN is also not available there it will simply die. -- Fajar
From raju.rajsand at gmail.com Fri Nov 4 17:00:00 2011 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Fri, 4 Nov 2011 22:30:00 +0530 Subject: [Linux-cluster] Failover after partial failure because of SAN? In-Reply-To: References: Message-ID: Greetings, On Fri, Nov 4, 2011 at 2:33 PM, Jochen Schneider wrote: > Hi, > > The question I have is what should happen when the SAN fails: You should be looking at SAN replication solutions if it fits your budget. If you want alternatives, have a look at DRBD for local storage redundancy. I can't see any redundancy in your setup for the storage, which remains a single point of failure. Red Hat and VMware have been shouting from the rooftops about storage virtualization. Have a look at that. And oh, don't forget offsite DR and BCP (business process continuity) if your application is mission critical. -- Regards, Rajagopal
From rossnick-lists at cybercat.ca Fri Nov 4 18:05:34 2011 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Fri, 4 Nov 2011 14:05:34 -0400 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement Message-ID: <57026C2F7E1748F1A63F624ED8087C1B@versa> Hi ! We are currently using RHEL 6.1 with GFS2 file systems on top of fiber-channel storage for our cluster. All filesystems are in LVs, managed with clvmd. Our services are divided into directories. For example: /GFSVolume1/Service1, /GFSVolume1/Service2, and so forth. Almost everything the service needs to run is under those directories (apache, php executables, website data, java servers, etc). On some services, there are document directories that are huge, not that much in size (about 35 gigs), but in number of files, around one million. One service even has 3 data directories with that many files each. It works pretty well for now, but when it comes to data update (via rsync) and backup (also via rsync), the node doing the rsync crawls to a stop, all 16 logical cores are used at 100% system, and it sometimes freezes the file system for other services on other nodes. We've recently changed the way the rsync is done: we just do an rsync -nv to see what files would be transferred and then transfer those files manually. But it's still sometimes too much for the GFS. In our case, nfs is not an option; there are a lot of is_file calls that access this directory structure all the time, and the added latency of nfs is not viable. So, I'm thinking of putting each of those directories into its own ext4 filesystem of about 100 gigs to speed up all of those processes. Where those huge directories are used, they are used by one service and one service alone. So, I would define a file system in cluster.conf, something like the rough fs line sketched a bit further down in this message. So, first, is this doable ? Second, is this risky ? In the sense that, with force_unmount true, I assume that no other node would mount that filesystem before it is unmounted on the stopping service. I know that for some reason umount could hang, but it's not likely since this data is mostly read-only. In that case the service would be failed and need to be manually restarted. What would be the consequence if the filesystem happens to be mounted on 2 nodes ?
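Roughly, the fs line I have in mind is the following (the device path is only a placeholder, the quota=off option is simply carried over from our current GFS2 mounts and may not apply to ext4, and the attribute names should be double-checked against the fs.sh resource agent):

  <fs name="documentsA" device="/dev/vg_example/documentsA" fstype="ext4" mountpoint="/GFSVolume1/Service1/documentsA" force_unmount="1" options="noatime,quota=off"/>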
One could add self_fence="1" to the fs line, so that even if it fails, it will self-fence the node to force the umount. But I'm not there yet. Third, I've been told that it's not recommended to mount a file system like this "on top" of another clustered fs. Why is that ? I suppose I'll have to mount under /mnt/something and symlink to that. Thanks for any insights. From rossnick-lists at cybercat.ca Fri Nov 4 18:05:47 2011 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Fri, 4 Nov 2011 14:05:47 -0400 Subject: [Linux-cluster] Corosync goes cpu to 95-99% References: <4DD29D03.9080901@gmail.com> <4DD2BAC3.50509@redhat.com> <4DD2BD7D.5070704@gmail.com> <4DD2CA90.6090802@redhat.com> <3B50BA7445114813AE429BEE51A2BA52@versa> <4DD78908.2030801@gmail.com> <0B1965C8-9807-42B6-9453-01BE0C0B1DCB@cybercat.ca><4DD80D5D.10004@gmail.com> <4DD873C7.8080402@cybercat.ca> <22E7D11CD5E64E338A66811F31F06238@versa> <4DE545D7.1080703@redhat.com> <4DE69786.5010204@gmail.com><4DE6CAF6.4000002@cybercat.ca> <4DE75602.1000408@gmail.com><51BB988BCCF547E69BF222BDAF34C4DE@versa><4E04B61B.9070208@cybercat.ca> <4E2D63DD.4050007@gmail.com><4E2D7329.6050607@redhat.com> <4E2D7425.4070801@gmail.com><4E2D8ECB.6020305@redhat.com> <4E2D8F87.30508@gmail.com><4E2D940B.5020803@redhat.com> <4E73073D.8010209@gmail.com> Message-ID: <16366A53AA0D47A7A935FD7FE920D462@versa> >> get a support signoff. Also the corosync updates have not finished >> through our validation process. Only hot fixes (from support) are >> available >> >> Regards >> -steve >> > > Sorry to re-open this thread ... But exists any news about this problem?? In fact, there is ! It appears that this situation is within the microcode of some specific xeon "nahalem" (sorry for the spelling) processors... It has to do with switching cstate and the way rhel6.1 now switch state that was not done in 6.0. You can look at bugzilla # 710265 and kb docs # 61105. Our temporary fix for the moment was to disable cstate transition by adding : intel_idle.max_cstate=0 processor.max_cstate=1 to the kernel line in grub.conf, update and reboot. We hadn't had any cpu spikes on any of the 5 nodes we've updated yet. The 3 remaining still haven't been updated due to production downtime. Get a support signoff for this, I'm in no way endorsing this solution, as I can't know if you're in the same situation as mine. Have fun ! From rossnick-lists at cybercat.ca Fri Nov 4 18:11:45 2011 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Fri, 4 Nov 2011 14:11:45 -0400 Subject: [Linux-cluster] When corosync-1.4.1-3.el6 will be released for rhel6.x? References: <4E804353.1040605@gmail.com> <4E80548D.1070904@redhat.com> <4E805928.3020009@gmail.com> <4E80633E.8020409@redhat.com><4E806843.6060202@gmail.com> <4E807B5B.5000606@redhat.com> Message-ID: <62E11F24A0C846B9A39FC4AC1E08B209@versa> >> But problem described in 709758 appears in my enviroment: One RHEL6.1 > > Please contact GSS (Global Support Service). They can help you to: > - Check if your configuration is valid > - Check if architecture is valid > - Give you "not yet" released package and/or hot fix > - Propose backport to Z-stream for given bug > > -> Basically everything what you are/will pay them for. You might read : https://www.redhat.com/archives/linux-cluster/2011-November/msg00027.html for a temp fix. 
Regards, From carlopmart at gmail.com Fri Nov 4 18:20:43 2011 From: carlopmart at gmail.com (carlopmart) Date: Fri, 04 Nov 2011 19:20:43 +0100 Subject: [Linux-cluster] Corosync goes cpu to 95-99% In-Reply-To: <16366A53AA0D47A7A935FD7FE920D462@versa> References: <4DD29D03.9080901@gmail.com> <4DD2BAC3.50509@redhat.com> <4DD2BD7D.5070704@gmail.com> <4DD2CA90.6090802@redhat.com> <3B50BA7445114813AE429BEE51A2BA52@versa> <4DD78908.2030801@gmail.com> <0B1965C8-9807-42B6-9453-01BE0C0B1DCB@cybercat.ca><4DD80D5D.10004@gmail.com> <4DD873C7.8080402@cybercat.ca> <22E7D11CD5E64E338A66811F31F06238@versa> <4DE545D7.1080703@redhat.com> <4DE69786.5010204@gmail.com><4DE6CAF6.4000002@cybercat.ca> <4DE75602.1000408@gmail.com><51BB988BCCF547E69BF222BDAF34C4DE@versa><4E04B61B.9070208@cybercat.ca> <4E2D63DD.4050007@gmail.com><4E2D7329.6050607@redhat.com> <4E2D7425.4070801@gmail.com><4E2D8ECB.6020305@redhat.com> <4E2D8F87.30508@gmail.com><4E2D940B.5020803@redhat.com> <4E73073D.8010209@gmail.com> <16366A53AA0D47A7A935FD7FE920D462@versa> Message-ID: <4EB42CFB.7070908@gmail.com> On 11/04/2011 07:05 PM, Nicolas Ross wrote: >>> get a support signoff. Also the corosync updates have not finished >>> through our validation process. Only hot fixes (from support) are >>> available >>> >>> Regards >>> -steve >>> >> >> Sorry to re-open this thread ... But exists any news about this problem?? > > In fact, there is ! > > It appears that this situation is within the microcode of some specific > xeon "nahalem" (sorry for the spelling) processors... It has to do with > switching cstate and the way rhel6.1 now switch state that was not done > in 6.0. > > You can look at bugzilla # 710265 and kb docs # 61105. > > Our temporary fix for the moment was to disable cstate transition by > adding : > > intel_idle.max_cstate=0 processor.max_cstate=1 > > to the kernel line in grub.conf, update and reboot. We hadn't had any > cpu spikes on any of the 5 nodes we've updated yet. The 3 remaining > still haven't been updated due to production downtime. > > Get a support signoff for this, I'm in no way endorsing this solution, > as I can't know if you're in the same situation as mine. > good!! ... But this problem appears in AMD Opteron QuadCore, too ... At least in my installation .. -- CL Martinez carlopmart {at} gmail {d0t} com From Colin.Simpson at iongeo.com Fri Nov 4 18:50:17 2011 From: Colin.Simpson at iongeo.com (Colin Simpson) Date: Fri, 04 Nov 2011 18:50:17 +0000 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <57026C2F7E1748F1A63F624ED8087C1B@versa> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: <1320432617.20017.104.camel@bhac.iouk.ioroot.tld> On Fri, 2011-11-04 at 14:05 -0400, Nicolas Ross wrote: > Hi ! > > So, I'm thinking of putting each of thos directories into a single ext4 > filesystem of about 100 gigs to speed up all of those process. Where those > huge directories are used, they are used by one service and one service > alone. So, I would do a file system in cluster.conf, something like : > > mountpoint="/GFSVolume1/Service1/documentsA" name="documentsA" > options="noatime,quota=off"/> > > So, first, is this doable ? Yup, I have had some tasks that have needed to switch over to ext4 from GFS2 (I'm on RHEL6 in case any of this makes a difference). I'm using, I fully let cluster.conf manage ext4 and have no mention of it in fstab. > > Second, is this risky ? 
In the sens of that with force_unmont true, I assume > that no other node would mount that filesystem before it is unmounted on the > stopping service. I know that for some reason umount could hang, but it's > not likely since this data is mostly read-only. In that case the service > would be failed and need to be manually restarted. What would be the > consequence if the filesystem happens to be mounted on 2 nodes ? The failing to umount mostly happens to me because I NFS export this file system. Now in theory the cluster should take care of this by freeing the NFS lockd's, but doesn't always happen for me. But you are probably in a better position as it doesn't sound like you are doing NFS on this. I've never seen it fail when NFS isn't involved on a fs. If the filesystems fails to umount, the service gets marked as failed, so won't start on another node (and so won't mount on another node). However badness will happen if you manually disable the service and reenable it on another node. The other node will assume the filesystem isn't mounted anywhere else and mount it itself. The "solution", of course, is to check any failed service to see where it was last running on and make sure it's dependant fs's are umounted from that node, before disabling it and bringing it back up. There was a resource agent patch floating around somewhere (that didn't make it in so far) that would (as I remember) lock the clvmd to prevent double mounting of non-clustered fs's. But I guess most people are using GFS2 so not really a priority. As you say below a failing to umount can be tackled by a self_fence, but I haven't needed to go there yet. Also depending on how quickly you need the service back, quick_status and force_fsck will have to be set accordingly. I wanted the paranoia of checking for a good file system, others may want faster start times. > > One could add self_fence="1" to the fs line, so that even if it fails, it > will self-fence the node to force the umount. But I'm not there yet. > > Third, I've been told that it's not recommended to mount a file system like > this "on top" of another clustered fs. Why is that ? I suppose I'll have to > mount under /mnt/something and symlink to that. > Don't know on this. Maybe due to extra dependency issues that might effect operations (or maybe just not thoroughly tested in GFS2, as not a priority). > Thanks for any insights. Hopefully Thanks Colin > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. 
From rossnick-lists at cybercat.ca Fri Nov 4 19:40:11 2011 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Fri, 4 Nov 2011 15:40:11 -0400 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <1320432617.20017.104.camel@bhac.iouk.ioroot.tld> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> <1320432617.20017.104.camel@bhac.iouk.ioroot.tld> Message-ID: <0CF2B72C-5D44-4B80-B212-83F43C24A3BC@cybercat.ca> > Also depending on how quickly you need the service back, quick_status > and force_fsck will have to be set accordingly. I wanted the paranoia of > checking for a good file system, others may want faster start times. Thanks for the rsponse, I will go ahead and do some tests... What is the quick_status setting ? I haven't seen it in the doc ? From Colin.Simpson at iongeo.com Fri Nov 4 19:50:34 2011 From: Colin.Simpson at iongeo.com (Colin Simpson) Date: Fri, 04 Nov 2011 19:50:34 +0000 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <0CF2B72C-5D44-4B80-B212-83F43C24A3BC@cybercat.ca> References: <57026C2F7E1748F1A63F624ED8087C1B@versa><1320432617.20017.104.camel@bhac.iouk.ioroot.tld> <0CF2B72C-5D44-4B80-B212-83F43C24A3BC@cybercat.ca> Message-ID: <1320436234.20017.112.camel@bhac.iouk.ioroot.tld> On Fri, 2011-11-04 at 19:40 +0000, Nicolas Ross wrote: > > > Also depending on how quickly you need the service back, > quick_status > > and force_fsck will have to be set accordingly. I wanted the > paranoia of > > checking for a good file system, others may want faster start times. > > Thanks for the rsponse, I will go ahead and do some tests... > > What is the quick_status setting ? I haven't seen it in the doc ? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > According to the comment in the resource script: Use quick status checks. When set to 0 (the default), this agent behaves normally. When set to 1, this agent will not log errors incurred or perform the file system accessibility check (e.g. it will not try to read from/write to the file system). You should only set this to 1 if you have lots of file systems on your cluster or you are seeing very high load spikes as a direct result of this agent. I'm guessing that if the checking of the filesystem is causing high load you can disable this checking (presumably probes the filesystem periodically). My reading is you really want 0 unless you are seeing an issue. Thanks Colin This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. From bohdan at harazd.net Fri Nov 4 21:35:44 2011 From: bohdan at harazd.net (Bohdan Sydor) Date: Fri, 4 Nov 2011 22:35:44 +0100 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <57026C2F7E1748F1A63F624ED8087C1B@versa> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: Hi, On Fri, Nov 4, 2011 at 7:05 PM, Nicolas Ross wrote: > On some services, there are document directories that are huge, not that > much in size (about 35 gigs), but in number of files, around one million. 
> One service even has 3 data directories with that many files each. > > It works pretty well for now, but when it comes to data update (via rsync) > and backup (also via rsync), the node doing the rsync crawls to a stop, all > 16 logical cores are used at 100% system, and it sometimes freezes the file > system for other services on other nodes. I have a 600GB GFS2 FS, and I resolved the issue with rsync that I run ionice -c3 rsync -av ... That way rsync is given the CPU for IO, if all other processes don't require IO. Of course it takes a lot of time to compete the sync, but if the time is not an issue, it can be a solution. -- Regards, Bohdan From rossnick-lists at cybercat.ca Fri Nov 4 22:42:00 2011 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Fri, 4 Nov 2011 18:42:00 -0400 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: <3E26DA09-2E25-493E-B64B-956A9535C97F@cybercat.ca> > I have a 600GB GFS2 FS, and I resolved the issue with rsync that I run > ionice -c3 rsync -av ... > That way rsync is given the CPU for IO, if all other processes don't > require IO. Of course it takes a lot of time to compete the sync, but > if the time is not an issue, it can be a solution. Oh, greet, I will try that asap! How much more time? I don't mind taking 3 or 4 hours instead of 2.5, but if it goes up to 5 or 6, I'll consider an ext4 fs... From bohdan at harazd.net Fri Nov 4 22:58:05 2011 From: bohdan at harazd.net (Bohdan Sydor) Date: Fri, 4 Nov 2011 23:58:05 +0100 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <3E26DA09-2E25-493E-B64B-956A9535C97F@cybercat.ca> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> <3E26DA09-2E25-493E-B64B-956A9535C97F@cybercat.ca> Message-ID: On Fri, Nov 4, 2011 at 11:42 PM, Nicolas Ross wrote: >> ionice -c3 rsync -av ... > Oh, greet, I will try that asap! > > How much more time? I don't mind taking 3 or 4 hours instead of 2.5, but if it goes up to 5 or 6, I'll consider an ext4 fs... I can't answer your question because it all depends on other IO operations that are running on your system with higher priority. You can also consider setting the nice value eg 19 for rsync processes. -- regards, Bohdan From rossnick-lists at cybercat.ca Sat Nov 5 15:38:31 2011 From: rossnick-lists at cybercat.ca (rossnick-lists at cybercat.ca) Date: Sat, 05 Nov 2011 11:38:31 -0400 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: > I have a 600GB GFS2 FS, and I resolved the issue with > rsync that I run > ionice -c3 rsync -av ... > That way rsync is given the CPU for IO, if all other > processes don't > require IO. Of course it takes a lot of time to compete > the sync, but > if the time is not an issue, it can be a solution. I tried it on some directories, it seems that the peeks in cpu are still present, but it seems not to affect the other nodes as before. I'm not that sure, since it was not 100% of the time I saw impact on other nodes. I will keep that trick in mind... From bergman at merctech.com Sat Nov 5 18:17:09 2011 From: bergman at merctech.com (bergman at merctech.com) Date: Sat, 05 Nov 2011 14:17:09 -0400 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: Your message of "Fri, 04 Nov 2011 14:05:34 EDT." 
<57026C2F7E1748F1A63F624ED8087C1B@versa> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: <5424.1320517029@localhost> In the message dated: Fri, 04 Nov 2011 14:05:34 EDT, The pithy ruminations from "Nicolas Ross" on <[Linux-cluster] Ext3/ext4 in a clustered environement> were: => Hi ! => [SNIP!] => => On some services, there are document directories that are huge, not that => much in size (about 35 gigs), but in number of files, around one million. => One service even has 3 data directories with that many files each. Ouch. I've seen significant a performance drop with ext3 (and other) filesystems with 10s to 100s of thousands of files per directory. Make sure that the "directory hash" option is enabled for ext3. With ~1M files per directory, I'd do some performance tests comparing rsync under ext3, ext4, and gfs befor changing filesystems...while ext3/4 do perform better than gfs, the directory size may be such an overwhelming factor that the filesystem choice is irrelevent. => => It works pretty well for now, but when it comes to data update (via rsync) => and backup (also via rsync), the node doing the rsync crawls to a stop, all => 16 logical cores are used at 100% system, and it sometimes freezes the file => system for other services on other nodes. Ouch! => => We've changed recently the way the rsync is done, we just to a rsync -nv to => see what files would be transfered and transfer thos files manually. But => it's still too much sometimes for the gfs. Is this a GFS issue strictly, or an issue with rsync. Have you set up a similar environment under ext3/4 to test jus the rsync part? Rsync is known for being a memory & resource hog, particularly at the initial stage of building the filesystem tree. I would strongly recommend benchmarking rsync on ext3/4 before making the switch. One option would be to do several 'rsync' operations (serially, not in parallel!), each operating on a subset of the filesystem, while continuing to use gfs. [SNIP!] => => mountpoint="/GFSVolume1/Service1/documentsA" name="documentsA" => options="noatime,quota=off"/> => => So, first, is this doable ? Yes. We have been doing something very similar for the past ~2 years, except not mounting the ext3/4 partition under a GFS mountpoint. => => Second, is this risky ? In the sens of that with force_unmont true, I assume => that no other node would mount that filesystem before it is unmounted on the => stopping service. I know that for some reason umount could hang, but it's => not likely since this data is mostly read-only. In that case the service We've experienced numerous cases where the filesystem hangs after a service migration due a node (or service) failover. These hangs all seem to be related to quota or NFS issues, so this may not be an issue in your environment. => would be failed and need to be manually restarted. What would be the => consequence if the filesystem happens to be mounted on 2 nodes ? Most likely, filesystem corruption. => => One could add self_fence="1" to the fs line, so that even if it fails, it => will self-fence the node to force the umount. But I'm not there yet. We don't do that...and haven't felt the need to. => => Third, I've been told that it's not recommended to mount a file system like => this "on top" of another clustered fs. Why is that ? I suppose I'll have to First of all, that's introducing another dependency. 
If you mount the ext3/4 partition under a local directory (ie., /export), then you could have nodes that provide your rsync data service, without requiring GFS. => mount under /mnt/something and symlink to that. => => Thanks for any insights. => Mark From kkovachev at varna.net Mon Nov 7 10:13:52 2011 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Mon, 07 Nov 2011 12:13:52 +0200 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <57026C2F7E1748F1A63F624ED8087C1B@versa> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: <67914ccd78ddcf19d72a3d2302bf9298@mx.varna.net> Hi, On Fri, 4 Nov 2011 14:05:34 -0400, "Nicolas Ross" wrote: > Hi ! > > We are curently using RHEL 6.1 with GFS2 file systems on top of > fiber-channel stoarage for our cluster. All fs' are in lv's, with clvmd. > As they are LV's, you may try to make a snapshot and then mount it with lock_nolock - faster and won't interfere with other services. If you are not mounting it on dedicated backup node where the fs is not mounted, you may need to change the UUID and lock table to be able to mount the snapshot on the same machine. From swhiteho at redhat.com Mon Nov 7 10:54:30 2011 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 07 Nov 2011 10:54:30 +0000 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <5424.1320517029@localhost> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> <5424.1320517029@localhost> Message-ID: <1320663270.2762.16.camel@menhir> Hi, On Sat, 2011-11-05 at 14:17 -0400, bergman at merctech.com wrote: > In the message dated: Fri, 04 Nov 2011 14:05:34 EDT, > The pithy ruminations from "Nicolas Ross" on > <[Linux-cluster] Ext3/ext4 in a clustered environement> were: > => Hi ! > => > > [SNIP!] > > => > => On some services, there are document directories that are huge, not that > => much in size (about 35 gigs), but in number of files, around one million. > => One service even has 3 data directories with that many files each. > > Ouch. > > I've seen significant a performance drop with ext3 (and other) filesystems > with 10s to 100s of thousands of files per directory. Make sure that the > "directory hash" option is enabled for ext3. With ~1M files per directory, I'd > do some performance tests comparing rsync under ext3, ext4, and gfs befor > changing filesystems...while ext3/4 do perform better than gfs, the directory > size may be such an overwhelming factor that the filesystem choice is > irrelevent. > There are really two issues here, one is the performance of readdir and listing the directory and the other is the performance of look ups of individual inodes. Turning on the hashing option for ext3 will improve the look up performance, but make next to no different to the readdir performance. GFS2 has had hashed directories, inherited from GFS, so on the look up side of things, both should be fairly similar. One issue though is that GFS2 will return the directory entries from readdir in hash order. That is due to a restriction imposed by the unfortunate combination of the Linux VFS readdir code and the GFS2 algorithm for expanding the directory hash table when it fills up. Ideally, one would sort the returned entries into inode number order before beginning the look ups of the individual inodes. I don't know if rsync does this, or whether it is an option that can be turned on. It should make a difference though. Also, being able to look up multiple inodes in parallel should also dramatically improve the speed, if this is possible. 
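As a rough illustration of the ordering idea above (the path is a placeholder, and whether rsync itself can be driven this way is exactly the open question):

   # Emit the directory in its natural (hash) order together with the
   # inode numbers, then sort by inode number before any per-file work:
   ls -fi /GFSVolume1/Service1/documentsA | sort -n | awk '{print $2}' \
       > /tmp/documentsA.inode-order

   # Any per-file pass (stat, checksum, copy) driven from this list now
   # visits the inodes in inode-number order rather than hash order:
   while read -r name; do
       stat -- "/GFSVolume1/Service1/documentsA/$name" > /dev/null
   done < /tmp/documentsA.inode-order

Note that "ls -f" also lists "." and "..", and the awk field split breaks on names containing spaces, so this only shows the principle of decoupling readdir order from lookup order.
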
> => > => It works pretty well for now, but when it comes to data update (via rsync) > => and backup (also via rsync), the node doing the rsync crawls to a stop, all > => 16 logical cores are used at 100% system, and it sometimes freezes the file > => system for other services on other nodes. > > Ouch! > So the question is what is using all this cpu time? Is this being used by rsync, or by some of the gfs2/dlm system daemons or even by some other threads? > => > => We've changed recently the way the rsync is done, we just to a rsync -nv to > => see what files would be transfered and transfer thos files manually. But > => it's still too much sometimes for the gfs. > > Is this a GFS issue strictly, or an issue with rsync. Have you set up a > similar environment under ext3/4 to test jus the rsync part? Rsync is > known for being a memory & resource hog, particularly at the initial > stage of building the filesystem tree. > > I would strongly recommend benchmarking rsync on ext3/4 before making the > switch. > > One option would be to do several 'rsync' operations (serially, not in > parallel!), each operating on a subset of the filesystem, while continuing > to use gfs. > > I agree that we don't have enough information yet to make a judgement on where the problem lies. It may well be something that can be resolved by making some alterations in the way that rsync is done, Steve. From ajb2 at mssl.ucl.ac.uk Mon Nov 7 11:43:29 2011 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 07 Nov 2011 11:43:29 +0000 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <57026C2F7E1748F1A63F624ED8087C1B@versa> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> Message-ID: <4EB7C461.1010608@mssl.ucl.ac.uk> Nicolas Ross wrote: > On some services, there are document directories that are huge, not that > much in size (about 35 gigs), but in number of files, around one > million. One service even has 3 data directories with that many files each. You are utterly mad. Apart from the human readability aspects if someone attempts a directory listing, you're putting a substantial load on your system each time you attempt to go into those directories, even with dentry/inode caching tweaked out to maximums. Directory inode hashing helps, but not for filesystem abuse on this scale. Be glad you're using ext3/4 and not GFS, the problems are several orders of magnitude worse there (it can take 10 minutes to list a directory with 10,000 files in it, let alone 1,000,000) > It works pretty well for now, but when it comes to data update (via > rsync) and backup (also via rsync), the node doing the rsync crawls to a > stop, all 16 logical cores are used at 100% system, and it sometimes > freezes the file system for other services on other nodes. That's not particularly surprising - and a fairly solid hint you should be revisiting the way you lay out your files. If you go for a hierarchical layout you'll see several orders of magnitude speedup in access time without any real effort at all. If you absolutely must put that many files in a directory, then use a filesystem able to cope with such activities. Ext3/4 aren't it. 
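A hierarchical layout of the sort suggested here usually just means fanning the documents out over one or two levels of subdirectories keyed on a hash or prefix of the file name. A minimal sketch, with invented paths, for splitting an existing flat directory into 256 buckets:

   # Move each file into a subdirectory named after the first two hex
   # digits of an md5 of its name:
   cd /GFSVolume1/Service1/documentsA
   for f in *; do
       [ -f "$f" ] || continue
       d=$(printf '%s' "$f" | md5sum | cut -c1-2)
       mkdir -p "$d"
       mv -- "$f" "$d/"
   done

The application then derives the same two characters when it opens a document, so no single directory ever has to hold more than a few thousand entries.
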
From xavier.montagutelli at unilim.fr Mon Nov 7 12:57:35 2011 From: xavier.montagutelli at unilim.fr (Xavier Montagutelli) Date: Mon, 7 Nov 2011 13:57:35 +0100 Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <67914ccd78ddcf19d72a3d2302bf9298@mx.varna.net> References: <57026C2F7E1748F1A63F624ED8087C1B@versa> <67914ccd78ddcf19d72a3d2302bf9298@mx.varna.net> Message-ID: <201111071357.35844.xavier.montagutelli@unilim.fr> On Monday 07 November 2011 11:13:52 Kaloyan Kovachev wrote: > Hi, > > On Fri, 4 Nov 2011 14:05:34 -0400, "Nicolas Ross" > > wrote: > > Hi ! > > > > We are curently using RHEL 6.1 with GFS2 file systems on top of > > fiber-channel stoarage for our cluster. All fs' are in lv's, with clvmd. > > As they are LV's, you may try to make a snapshot Is it possible to make snapshots in a *cluster* LVM environment ? Last time I read the manual it was not possible. Oops, possible starting with RH 6.1, okay ... http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html- single/Logical_Volume_Manager_Administration/index.html#snapshot_command > and then mount it with > lock_nolock - faster and won't interfere with other services. If you can make snapshots, I agree, it is a good solution to mount a snapshot on a dedicated node. But perhaps this can also be done at the storage level, one layer deeper ? > If you are > not mounting it on dedicated backup node where the fs is not mounted, you > may need to change the UUID and lock table to be able to mount the snapshot > on the same machine. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Xavier Montagutelli http://twitter.com/#!/XMontagutelli Service Commun Informatique - Universite de Limoges 123, avenue Albert Thomas - 87060 Limoges cedex Tel : +33 (0)5 55 45 77 20 / Fax : +33 (0)5 55 45 75 95 From rpeterso at redhat.com Mon Nov 7 13:35:47 2011 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 07 Nov 2011 08:35:47 -0500 (EST) Subject: [Linux-cluster] Ext3/ext4 in a clustered environement In-Reply-To: <201111071357.35844.xavier.montagutelli@unilim.fr> Message-ID: ----- Original Message ----- | On Monday 07 November 2011 11:13:52 Kaloyan Kovachev wrote: | > Hi, | > | > On Fri, 4 Nov 2011 14:05:34 -0400, "Nicolas Ross" | > | > wrote: | > > Hi ! | > > | > > We are curently using RHEL 6.1 with GFS2 file systems on top of | > > fiber-channel stoarage for our cluster. All fs' are in lv's, with | > > clvmd. | > | > As they are LV's, you may try to make a snapshot | | Is it possible to make snapshots in a *cluster* LVM environment ? | Last time I | read the manual it was not possible. I highly suspect Nick was talking about _hardware_ snapshotting that is supported by some SANs, _not_ our clustered snapshot software. Regards, Bob Peterson Red Hat File Systems From Nicholas.Geovanis at uscellular.com Mon Nov 7 17:30:48 2011 From: Nicholas.Geovanis at uscellular.com (Geovanis, Nicholas) Date: Mon, 7 Nov 2011 11:30:48 -0600 Subject: [Linux-cluster] NTP sync cause CNAM shutdown Message-ID: I can't find from where I leaned this "trick", but if you look at the stock RH 5.6 startup script for ntpd you'll see it: If you put an IP address (not hostname but numeric address) in the file /etc/ntp/step-tickers, the startup script takes that to mean the following: "Run ntpdate against the server(s) in /etc/ntp/step-tickers before you establish yourself as ntp client". 
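With the stock init script described here, that just means something like the following (the address is a placeholder for your own NTP server):

   echo "192.0.2.10" > /etc/ntp/step-tickers

   # per the behaviour described above, "service ntpd start" will now run
   # ntpdate against that address to step the clock before ntpd comes up
   # as a normal client
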
I point it at my very same ntp server (just by address, name resolution isn't necessarily up yet). This way the local clock gets "normalised" before it really tries to properly sync via ntpd and that subsequent time sync isn't problematic. More importantly, in one datacenter I have clusters serving GFS2 which take so long to establish client-server with the NTP servers that they'll startup inquorate almost every time _without_ doing this. Nick Geovanis US Cellular/Kforce Inc v. 708-674-4924 e. Nicholas.Geovanis at uscellular.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ufimtseva at gmail.com Mon Nov 7 19:05:52 2011 From: ufimtseva at gmail.com (Elena Ufimtseva) Date: Mon, 7 Nov 2011 14:05:52 -0500 Subject: [Linux-cluster] fence_ilo question Message-ID: Hello All Anyone knows what is the latest version of fence_ilo or if fence_ilo (ILo3) should support timeout parameter? I try connecting to ILO (its hp ilo v3) manually and it works fine. But fencing does not work in cluster. Checking fence_ilo -l admin -p password -o status -a 172.28.84.33 Unable to connect/login to fencing device fence_ilo -V 2.0.115 (built Wed Aug 5 08:25:06 EDT 2009) Copyright (C) Red Hat, Inc. 2004 All rights reserved. in strace output it looks like a timeout: ioctl(3, TIOCGPTN, [6]) = 0 stat("/dev/pts/6", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0 statfs("/dev/pts/6", {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, ioctl(3, TIOCSPTLCK, [0]) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(3, TIOCGPTN, [6]) = 0 stat("/dev/pts/6", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0 open("/dev/pts/6", O_RDWR|O_NOCTTY) = 4 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2acc82a54020) = 3120 close(4) = 0 select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout) write(3, "\r\n", 23) = 23 wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {10, 0}) read(3, "\r\n\r\n", 2000) = 25 select(0, NULL, NULL, NULL, {0, 100}) = 0 (Timeout) wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 select(4, [3], [], [], {9, 997862}) = 1 (in [3], left {6, 413000}) read(3, "HTTP/1.1 405 Method Not Allowed\r"..., 2000) = 132 select(0, NULL, NULL, NULL, {0, 100}) = 0 (Timeout) wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 select(4, [3], [], [], {6, 410183}) = 1 (in [3], left {6, 365000}) --- SIGCHLD (Child exited) @ 0 (0) --- read(3, 0x1108faa4, 2000) = -1 EIO (Input/output error) write(2, "Unable to connect/login to fenci"..., 42Unable to connect/login to fencing device ) = 42 close(3) = 0 select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout) wait4(3120, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3120 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x39ec40e7c0}, {0x39fdebc330, [], SA_RESTORER, 0x39ec40e7c0}, 8) = 0 That makes me think, that the default time out should be modified, but this version of fence_ilo doesn't have timeout option. Does anyone knows if there is another version and if there is, where to get it. Thanks. -- Elena -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From linux at alteeve.com Mon Nov 7 19:12:03 2011 From: linux at alteeve.com (Digimer) Date: Mon, 07 Nov 2011 14:12:03 -0500 Subject: [Linux-cluster] fence_ilo question In-Reply-To: References: Message-ID: <4EB82D83.4020301@alteeve.com> On 11/07/2011 02:05 PM, Elena Ufimtseva wrote: > Hello All > > Anyone knows what is the latest version of fence_ilo or if fence_ilo > (ILo3) should support timeout parameter? I try connecting to > ILO (its hp ilo v3) manually and it works fine. But fencing does not > work in cluster. > > Checking > > fence_ilo -l admin -p password -o status -a 172.28.84.33 > Unable to connect/login to fencing device > > fence_ilo -V > 2.0.115 (built Wed Aug 5 08:25:06 EDT 2009) Copyright (C) Red Hat, Inc. > 2004 All rights reserved. > > in strace output it looks like a timeout: > > ioctl(3, TIOCGPTN, [6]) = 0 stat("/dev/pts/6", {st_mode=S_IFCHR|0620, > st_rdev=makedev(136, 6), ...}) = 0 statfs("/dev/pts/6", > {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, > f_bavail=0, f_files=0, f_ffree=0, ioctl(3, TIOCSPTLCK, [0]) = 0 ioctl(3, > SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 > ioctl(3, TIOCGPTN, [6]) = 0 stat("/dev/pts/6", {st_mode=S_IFCHR|0620, > st_rdev=makedev(136, 6), ...}) = 0 open("/dev/pts/6", O_RDWR|O_NOCTTY) = > 4 clone(child_stack=0, > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, > child_tidptr=0x2acc82a54020) = 3120 close(4) = 0 select(0, NULL, NULL, > NULL, {0, 50000}) = 0 (Timeout) write(3, "\r\n", > 23) = 23 wait4(3120, 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, > 0x7fffd7c58474, WNOHANG, NULL) = 0 select(4, [3], [], [], {10, 0}) = 1 > (in [3], left {10, 0}) read(3, "\r\n\r\n", 2000) > = 25 select(0, NULL, NULL, NULL, {0, 100}) = 0 (Timeout) wait4(3120, > 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, 0x7fffd7c58474, WNOHANG, > NULL) = 0 select(4, [3], [], [], {9, 997862}) = 1 (in [3], left {6, > 413000}) read(3, "HTTP/1.1 405 Method Not Allowed\r"..., 2000) = 132 > select(0, NULL, NULL, NULL, {0, 100}) = 0 (Timeout) wait4(3120, > 0x7fffd7c58474, WNOHANG, NULL) = 0 wait4(3120, 0x7fffd7c58474, WNOHANG, > NULL) = 0 select(4, [3], [], [], {6, 410183}) = 1 (in [3], left {6, > 365000}) --- SIGCHLD (Child exited) @ 0 (0) --- read(3, 0x1108faa4, > 2000) = -1 EIO (Input/output error) write(2, "Unable to connect/login to > fenci"..., 42Unable to connect/login to fencing device ) = 42 close(3) = > 0 select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout) wait4(3120, > [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3120 > rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x39ec40e7c0}, > {0x39fdebc330, [], SA_RESTORER, 0x39ec40e7c0}, 8) = 0 > > That makes me think, that the default time out should be modified, but > this version of fence_ilo > doesn't have timeout option. > > Does anyone knows if there is another version and if there is, where to > get it. > > > Thanks. > > -- > Elena > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Looking at the cluster.rng, I see the follow options as being valid; To use these, try, for example, If this doesn't help, can you paste your cluster.conf file and the shell call that works? -- Digimer E-Mail: digimer at alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." 
- epitron From linux at alteeve.com Mon Nov 7 19:36:24 2011 From: linux at alteeve.com (Digimer) Date: Mon, 07 Nov 2011 14:36:24 -0500 Subject: [Linux-cluster] fence_ilo question In-Reply-To: References: <4EB82D83.4020301@alteeve.com> Message-ID: <4EB83338.2050502@alteeve.com> On 11/07/2011 02:18 PM, Elena Ufimtseva wrote: > Thanks Digimer > > > But look what confuses me here. > Cluster will run fence_node, right? It will read cluster.conf, get all > these parameters, timeouts, etc., and will run an agent which is in > my case fence_ilo, correct? > > Ok, looking at fence_ilo: > > fence_ilo [options] > Options: > -o Action: status, reboot (default), off or on > -a IP address or hostname of fencing device > -l Login name > -p Login password or passphrase > -S