From s.wendy.cheng at gmail.com  Sun Jun  1 04:12:21 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Sat, 31 May 2008 23:12:21 -0500
Subject: [Linux-cluster] Re: [linux-lvm] Distributed LVM/filesystem/storage
In-Reply-To: <20080531070328.GD19431@lug-owl.de>
References: <20080529231213.GY19431@lug-owl.de>	<A78DB34D00374344A0AB65B6523C05DC0304E8E1@marsden.win.datacash.com>
	<20080531070328.GD19431@lug-owl.de>
Message-ID: <484221A5.8040605@gmail.com>

Jan-Benedict Glaw wrote:
> On Fri, 2008-05-30 09:03:35 +0100, Gerrard Geldenhuis <Gerrard.Geldenhuis at datacash.com> wrote:
>   
>> On Behalf Of Jan-Benedict Glaw
>>     
>>> I'm just thinking about using my friend's overly empty harddisks for a
>>> common large filesystem by merging them all together into a single,
>>> large storage pool accessible by everybody.
>>>       
> [...]
>   
>>> It would be nice to see if anybody of you did the same before (merging
>>> the free space from a lot computers into one commonly used large
>>> filesystem), if it was successful and what techniques
>>> (LVM/NBD/DM/MD/iSCSI/Tahoe/Freenet/Other P2P/...) you used to get there,
>>> and how well that worked out in the end.
>>>       
>> Maybe have a look at GFS.
>>     
>
> GFS (or GFS2 fwiw) imposes a single, shared storage as its backend. At
> least I get that from reading the documentation. This would result in
> merging all the single disks via NBD/LVM to one machine first and
> export that merged volume back via NBD/iSCSI to the nodes. In case the
> actual data is local to a client, it would still be first send to the
> central machine (running LVM) and loaded back from there. Not as
> distributed as I hoped, or are there other configuration possibilities
> to not go that route?
>   

GFS is certainly developed and well tuned in a SAN environment where the 
shared storage(s) and cluster nodes reside on the very same fibre 
channel switch network. However, with its symmetric architecture, 
nothing can prevent it running on top of a group of iscsi disks (with 
GFS node as initiator), as long as each node can see and access these 
disks. It doesn't care where the iscsi targets live, nor how many there 
are. Of course, whether it can perform well in this environment is 
another story. In short, the notion that GFS requires all disks to be 
merged into one machine first and then export the merged volume back to 
the GFS node is *not* correct.

I actually have a 4-nodes cluster in my house. Two nodes running Linux 
iscsi initiators that have a 2-node GFS cluster setup. Another two nodes 
running a special version of  free-BSD as iscsi targets, each directly 
exports their local disks to the GFS nodes. I have not put too much IO 
loads on the GFS nodes though (since the cluster is mostly used to study 
storage block allocation issues - not for real data and/or application).

cc linxu-cluster

-- Wendy


From rcronenwett at gmail.com  Sun Jun  1 12:37:46 2008
From: rcronenwett at gmail.com (Ron Cronenwett)
Date: Sun, 1 Jun 2008 08:37:46 -0400
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <483ECA36.7070007@xbe.ch>
References: <483ECA36.7070007@xbe.ch>
Message-ID: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>

Hi Lorenz

I had a similar problem while testing with Centos 5.1 on a VMWare
workstation setup. One more difference, I have been using
system-config-cluster
to configure the cluster. Luci seemed to be giving me problems with
setting up a mount of an NFS export. But I have not retried Luci since
changing
the selinux setting I mention below.

I found if I did not configure SELinux with setenforce permissive, the
/usr/share/cluster/apache.sh script did not execute. Once that runs,
it creates
/etc/cluster/apache/apache:"name". In that subdirectory, the script
creates an httpd.conf file from /etc/httpd/httpd.conf. I also found
the new httpd.conf
had the Listen statement commented out even though I had set it to my
clustered address in /etc/httpd/httpd. I needed to manually uncomment
the
Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf.

Hope this helps.

Ron C.


On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner <lp at xbe.ch> wrote:
>
> Hello everybody
>
> I have the following test setup:
>
> - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1
> - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test)
> - 4 IP resources defined
> - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk
>
> Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like
> this:
>
> May 28 16:18:15 testsrv clurgmgrd: [18475]: <err> Starting Service apache:test_httpd > Failed
> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> start on apache "test_httpd" returned 1 (generic error)
> May 28 16:18:15 testsrv clurgmgrd[18475]: <warning> #68: Failed to start service:test_proxy_http; return value: 1
> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> Stopping service service:test_proxy_http
> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist
> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Stopping Service apache:test_httpd > Failed
> May 28 16:18:16 testsrv clurgmgrd[18475]: <notice> stop on apache "test_httpd" returned 1 (generic error)
> May 28 16:18:16 testsrv clurgmgrd[18475]: <warning> #71: Relocating failed service service:test_proxy_http
>
> I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with
> a "Script Resource".
>
> Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation.
>
> Kind Regards
> Lorenz
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From s.wendy.cheng at gmail.com  Sun Jun  1 13:50:26 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Sun, 01 Jun 2008 08:50:26 -0500
Subject: [Linux-cluster] Re: [linux-lvm] Distributed LVM/filesystem/storage
In-Reply-To: <20080601070726.GK19431@lug-owl.de>
References: <20080529231213.GY19431@lug-owl.de>	<A78DB34D00374344A0AB65B6523C05DC0304E8E1@marsden.win.datacash.com>	<20080531070328.GD19431@lug-owl.de>
	<484221A5.8040605@gmail.com> <20080601070726.GK19431@lug-owl.de>
Message-ID: <4842A922.6040102@gmail.com>

Jan-Benedict Glaw wrote:
> On Sat, 2008-05-31 23:12:21 -0500, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
>   
>> Jan-Benedict Glaw wrote:
>>     
>>> On Fri, 2008-05-30 09:03:35 +0100, Gerrard Geldenhuis <Gerrard.Geldenhuis at datacash.com> wrote:
>>>       
>>>> On Behalf Of Jan-Benedict Glaw
>>>>         
>>>>> I'm just thinking about using my friend's overly empty harddisks for a
>>>>> common large filesystem by merging them all together into a single,
>>>>> large storage pool accessible by everybody.
>>>>>           
>>> [...]
>>>  
>>>       
>>>>> It would be nice to see if anybody of you did the same before (merging
>>>>> the free space from a lot computers into one commonly used large
>>>>> filesystem), if it was successful and what techniques
>>>>> (LVM/NBD/DM/MD/iSCSI/Tahoe/Freenet/Other P2P/...) you used to get there,
>>>>> and how well that worked out in the end.
>>>>>           
>>>> Maybe have a look at GFS.
>>>>         
>>> GFS (or GFS2 fwiw) imposes a single, shared storage as its backend. At
>>> least I get that from reading the documentation. This would result in
>>> merging all the single disks via NBD/LVM to one machine first and
>>> export that merged volume back via NBD/iSCSI to the nodes. In case the
>>> actual data is local to a client, it would still be first send to the
>>> central machine (running LVM) and loaded back from there. Not as
>>> distributed as I hoped, or are there other configuration possibilities
>>> to not go that route?
>>>       
>>                         However, with its symmetric architecture, 
>> nothing can prevent it running on top of a group of iscsi disks (with 
>> GFS node as initiator), as long as each node can see and access these 
>> disks. It doesn't care where the iscsi targets live, nor how many there 
>> are.
>>     
>
> So I'd configure each machine's empty disk/partition as an iSCSI
> target and let them show up an every "client" machine and run that
> setup. How good will GFS deal with temporary (or total) outage of
> single targets? Eg. 24h disconnects with ADSL connectivity etc.?
>
>   
High availability will not work well in this particular setup - it is 
more about data and storage sharing between GFS nodes.

Note that GFS normally runs on top of CLVM (clustered lvm, in case you 
don't know about it). You might want to check current (Linux) CLVM raid 
level support to see whether it fits your needs. 
 

-- Wendy


From dinesh at patel2202.fsnet.co.uk  Sun Jun  1 18:08:41 2008
From: dinesh at patel2202.fsnet.co.uk (Dinesh)
Date: Sun, 1 Jun 2008 19:08:41 +0100
Subject: [Linux-cluster] rhel5
Message-ID: <!&!AAAAAAAAAAAYAAAAAAAAAMaIXbJyscdBq0RaarBwTRnCgAAAEAAAAGHu2vgDcPJKk3EF+2RONb8BAAAAAA==@patel2202.fsnet.co.uk>

 
No virus found in this outgoing message.
Checked by AVG Free Edition. 
Version: 7.5.516 / Virus Database: 269.24.4/1475 - Release Date: 30/05/2008
14:53
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080601/c8590528/attachment.htm>

From doobs72 at hotmail.com  Sun Jun  1 18:09:51 2008
From: doobs72 at hotmail.com (Dinesh Patel)
Date: Sun, 1 Jun 2008 19:09:51 +0100
Subject: [Linux-cluster] rhel5
Message-ID: <BLU121-DAV7468418D04B543BDF20A8BFB80@phx.gbl>

 
No virus found in this outgoing message.
Checked by AVG Free Edition. 
Version: 7.5.516 / Virus Database: 269.24.4/1475 - Release Date: 30/05/2008
14:53
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080601/40371d9a/attachment.htm>

From doobs72 at hotmail.com  Sun Jun  1 18:12:38 2008
From: doobs72 at hotmail.com (doobs72 _)
Date: Sun, 1 Jun 2008 18:12:38 +0000
Subject: [Linux-cluster] rhel5
Message-ID: <BLU121-W25EE5F47C6D079F7AE2362BFB80@phx.gbl>


_________________________________________________________________
Great deals on almost anything at eBay.co.uk. Search, bid, find and win on eBay today!
http://clk.atdmt.com/UKM/go/msnnkmgl0010000004ukm/direct/01/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080601/3ceb0738/attachment.htm>

From doobs72 at hotmail.com  Sun Jun  1 19:33:14 2008
From: doobs72 at hotmail.com (doobs72 _)
Date: Sun, 1 Jun 2008 19:33:14 +0000
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Message-ID: <BLU121-W143A6D729D43A1AAD030E5BFB80@phx.gbl>


Hi
 
 I?m having fencing problems in my 3 node cluster running on RHEL5.0 which involves bonding.
 
I have 3 severs A, B & C in a cluster with bonding configured on eth2 & eth3 for my cluster traffic.  The config is as below:
 
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
MASTER=bond1
SLAVE=yes
USRCTL=no
 
DEVICE=eth3
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
MASTER=bond1
SLAVE=yes
USRCTL=no
 
 
DEVICE=bond1
IPADDR=192.168.x.x
NETMASK=255.255.255.0
NETWORK=192.168.x.0
BROADCAST=192.168.x.255
ONBOOT=YES
BOOTPROTO=none
 
The /etc/modprobe.conf file is configured as below:
 
alias eth0 bnx2
alias eth1 bnx2
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias scsi_hostadapter cciss
alias bond0 bonding
options bond0 miimon=100 mode=active-backup max_bonds=3
alias bond1 bonding
options bond1 miimon=100 mode=active-backup
alias bond2 bonding
options bond2 miimon=100 mode=active-backup
alias scsi_hostadapter1 qla2xxx
alias scsi_hostadapter2 usb-storage
 
 
The cluster starts up OK, however when I try to test the bonded interfaces my troubles begin.
On Node C if I "ifdown bond1", the node C, is fenced and everything works as expected.
 
However if on Node C, I take down the interfaces one at a time i.e. 
 "ifdown  eth2", - the cluster stays up as expected using eth3 for routing traffic  
  "ifdown eth3" 
then node C is fenced by Node A. However in the /var/log/messages file on Node C I see a message saying that Node B will be fenced. The outcome is Nodes C & B are fenced. 
 
My question is why does node B get fenced as well?
 
 
 D.

 
_________________________________________________________________

http://clk.atdmt.com/UKM/go/msnnkmgl0010000009ukm/direct/01/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080601/87285174/attachment.htm>

From zac at sprackett.com  Mon Jun  2 04:48:12 2008
From: zac at sprackett.com (S. Zachariah Sprackett)
Date: Mon, 2 Jun 2008 00:48:12 -0400
Subject: [Linux-cluster] Announcing Perl bindings for libcman
Message-ID: <ed9a61600806012148v519fa6e6u1ac113dd2c15cbec@mail.gmail.com>

Hello,

I'd like to announce the availability of my Perl bindings for libcman.

You can grab them from here:

http://zac.sprackett.com/cman/cluster-cman-0.01.tar.gz

A simple example script would be as follows:

use Cluster::CMAN;

my $cman = new Cluster::CMAN();
$cman->init();

foreach ($cman->get_nodes) {
    print "Found a node: " . $_->{name} ."\n";
}

print "Cluster is"
   . ($cman->is_quorate() ? "" : " NOT") . " quorate!\n";

$cman->finish();

These bindings also fully support both the notification and recv_data
callbacks allowing you to take advantage of them from within perl.

Please let me know if you have any trouble with them.

-z
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080602/c9881603/attachment.htm>

From fdinitto at redhat.com  Mon Jun  2 06:10:43 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 2 Jun 2008 08:10:43 +0200 (CEST)
Subject: [Linux-cluster] Cluster 2.99.03 (development snapshot) released
Message-ID: <Pine.LNX.4.64.0806020807310.5892@trider-g7>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The cluster team and its community are proud to announce the 4th release
from the master branch: 2.99.03.

The 2.99.XX releases are _NOT_ meant to be used for production
environments.. yet.

You have been warned: *this code will have no mercy* for your servers and
your data.

The master branch is the main development tree that receives all new
features, code, clean up and a whole brand new set of bugs,

At some point in time this code will become the 3.0 stable release.

Everybody with test equipment and time to spare, is highly encouraged to
download, install and test the 2.99 releases and more important report
problems.

In order to build the 2.99.03 release you will need:

- - openais 0.83 or higher
- - linux kernel (git snapshot or 2.6.26-rc3) from
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
(but can run on 2.6.25 in compatibility mode)

NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We
are still shipping shared libraries but remember that they can change
anytime without warning. A bunch of new shared libraries have been added.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.03.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.99.02):

Bob Peterson (1):
       bz 446085: Back-port faster bitfit algorithm from gfs2 for better

Christine Caulfield (1):
       [CMAN] Don't busy-loop if we can't get a node name

David Teigland (3):
       gfs_controld: rename files
       gfs_controld: move recover.c
       gfs_controld: restructuring

Fabio M. Di Nitto (19):
       [BUILD] Fix sparc #ifdef according to the new gcc tables
       [MISC] Update copyright
       [BUILD] Fix build order
       Merge branch 'master' of ssh://sources.redhat.com/git/cluster
       [BUILD] Fix dlm_controld linking
       [BUILD] Fix rg_test linking
       [BUILD] Fix install permissions
       [GFS2] Use proper include dir for libvolume_id
       [FENCE] Fix copyright header for fence_ifmib manpage
       [FENCE] Fix ifmib README to report the right fence agent
       [BUILD] Plugin the new shiny fence_ifmib agent
       [CCS] Use absolute path for queries
       [CONFIG] Fix lots of bugs in libccsconfdb
       [BUILD] Add fence_lpar fencing agent to the build system
       [GFS] remove symlink to umount.gfs2
       [GROUP] libgfscontrol: fix build with gcc-4.3
       [BUILD] Change build system to cope with new libgfscontrol
       [BUILD] gfs2 requires group to build
       [BUILD] Fix mount.gfs2 build

Lon Hohberger (3):
       [rgmanager] Apply patch from Marcelo Azevedo to make migration more robust
       [rgmanager] Fix live migration option (broken in last commit)
       [rgmanager] Use /cluster/rm instead of //rm

Marek 'marx' Grac (4):
       [FENCE] Fix #248609: SSH support in Bladecenter fencing (ssh)
       [FENCE] Fix #446995: Parse error: Unknown option 'switch=3'
       [FENCE] Fix #447378 - fence_apc unable to connect via ssh to APC 7900
       [FENCE]: Fix #237266: New fence agent for HMC/LPAR

Ross Vandegrift (1):
       [FENCE] Add fence_ifmib new agent

Ryan McCabe (3):
       fence: fixes and cleanups to fencing.py library
       libfence: handle EINTR correctly
       libfence: update copyright notice

  Makefile                                      |    4 +-
  ccs/ccs_tool/update.c                         |    6 +-
  ccs/daemon/misc.c                             |    8 +-
  cman/daemon/cmanconfig.c                      |    2 +-
  cman/qdisk/disk_util.c                        |    2 +-
  config/libs/libccsconfdb/libccs.c             |  166 +-
  configure                                     |   14 +
  fence/agents/apc/fence_apc.py                 |   90 +-
  fence/agents/bladecenter/fence_bladecenter.py |    2 +-
  fence/agents/ifmib/Makefile                   |   18 +
  fence/agents/ifmib/README                     |   45 +
  fence/agents/ifmib/fence_ifmib.py             |  221 ++
  fence/agents/lib/fencing.py.py                |   88 +-
  fence/agents/lpar/Makefile                    |   18 +
  fence/agents/lpar/fence_lpar.py               |   97 +
  fence/libfence/agent.c                        |   49 +-
  fence/libfence/libfence.h                     |    5 +-
  fence/man/Makefile                            |    1 +
  fence/man/fence_ifmib.8                       |   69 +
  gfs-kernel/src/gfs/bits.c                     |   85 +-
  gfs-kernel/src/gfs/bits.h                     |    3 +-
  gfs-kernel/src/gfs/rgrp.c                     |    3 +-
  gfs/Makefile                                  |    5 +-
  gfs2/mkfs/Makefile                            |    1 +
  gfs2/mount/Makefile                           |   27 +-
  gfs2/mount/mount.gfs2.c                       |   20 +-
  gfs2/mount/umount.gfs2.c                      |  168 --
  gfs2/mount/util.c                             |  475 +----
  gfs2/mount/util.h                             |    2 +-
  group/Makefile                                |    4 +-
  group/dlm_controld/Makefile                   |    2 +-
  group/gfs_control/Makefile                    |   41 +
  group/gfs_control/main.c                      |  212 ++
  group/gfs_controld/Makefile                   |   11 +-
  group/gfs_controld/config.c                   |  180 ++
  group/gfs_controld/config.h                   |   47 +
  group/gfs_controld/cpg-old.c                  | 2686 +++++++++++++++++++++++
  group/gfs_controld/cpg-old.h                  |   60 +
  group/gfs_controld/cpg.c                      |  289 ---
  group/gfs_controld/gfs_controld.h             |   49 +
  group/gfs_controld/gfs_daemon.h               |  268 +++
  group/gfs_controld/group.c                    |   64 +-
  group/gfs_controld/lock_dlm.h                 |  310 ---
  group/gfs_controld/main.c                     | 1219 ++++++-----
  group/gfs_controld/member_cman.c              |   29 +-
  group/gfs_controld/plock.c                    |  228 +--
  group/gfs_controld/recover.c                  | 2805 -------------------------
  group/gfs_controld/util.c                     |  197 ++
  group/libgfscontrol/Makefile                  |   53 +
  group/libgfscontrol/libgfscontrol.h           |  131 ++
  group/libgfscontrol/main.c                    |  438 ++++
  make/defines.mk.input                         |    2 +
  rgmanager/include/platform.h                  |    2 +-
  rgmanager/include/reslist.h                   |    2 +-
  rgmanager/src/clulib/vft.c                    |    2 +-
  rgmanager/src/daemons/Makefile                |    2 +-
  rgmanager/src/resources/Makefile              |   17 +-
  rgmanager/src/resources/vm.sh                 |   20 +-
  scripts/fenceparse                            |    2 +-
  59 files changed, 6197 insertions(+), 4869 deletions(-)

- --
I'm going to make him an offer he can't refuse.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iQIVAwUBSEOO8AgUGcMLQ3qJAQLeIQ//ZExWyhAdAdWrlFn5BLoThySCmt2LIvjf
TUQAbn8/kXExfdQjB94rfwlCwfml3G7VELZ9g4m9eVhKWATBnKGW+zFyFLPoQnKT
XTXre1WDqvQFeoWN/TlmeQ+AhxVCWHrDsvKnWah03ns4dspd85224dHa2MWe0vJe
grGhfy88tB+7nbVKC9vJgF5BDUVDJvtAm7BDs0tJYn87JE2riUIEZBJSIyXyrC1x
QyjQJrrZxHm2h9g/oDUXTg+BmvAP+RjXaRqQMYFKo/7NoIjR5ZIlecDYHLs5dnbM
/dCjgQuFhb3Y+gMmEmb9zA6F7FPbZegFfVMG+bdEt3vwnRIU3RpyKNsZIAp8Z3eK
jJQQ3JMmszePFBX3NZoB0BqGuEvUNmt4u82NqLGV3BjphxLzyQMjBSt0BzaLu4fj
fkL170J/wDJHfrW7sqkUflrPRRtDXzKXh+n0x9U+hkSA4Oh/haf22/7liRzez9wh
xKc4OGnEk+ZeMQ4lR/SXNEr9sOANaJgYrotoNS3NZ2wjEOdMjTYL+JV5k/S9OfHG
3g2XS8CfjuWlvfYxEv9bbWBH4mtBY8HWCEslnXjWUpNs8tpAgfvUwJS+u00JjwDR
/RfkaynapgSV3OqzRTOi1iXiEzpsV/n+Dp7zxBgdCc2kECq28tcIDPjzN+ShfaER
o7NWXbCZXCY=
=jHFQ
-----END PGP SIGNATURE-----


From maciej.bogucki at artegence.com  Mon Jun  2 06:24:12 2008
From: maciej.bogucki at artegence.com (Maciej Bogucki)
Date: Mon, 02 Jun 2008 08:24:12 +0200
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
In-Reply-To: <BLU121-W143A6D729D43A1AAD030E5BFB80@phx.gbl>
References: <BLU121-W143A6D729D43A1AAD030E5BFB80@phx.gbl>
Message-ID: <4843920C.60109@artegence.com>

doobs72 _ wrote:
>
> Hi
>
>  
>
>  I?m having fencing problems in my 3 node cluster running on 
> RHEL5.0 which involves bonding.
>
>  
>
> I have 3 severs A, B & C in a cluster with bonding configured on eth2 
> & eth3 for my cluster traffic.  The config is as below:
>
>  
>
> DEVICE=eth2
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
> DEVICE=eth3
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
>  
>
> DEVICE=bond1
>
> IPADDR=192.168.x.x
>
> NETMASK=255.255.255.0
>
> NETWORK=192.168.x.0
>
> BROADCAST=192.168.x.255
>
> ONBOOT=YES
>
> BOOTPROTO=none
>
>  
>
> The /etc/modprobe.conf file is configured as below:
>
>  
>
> alias eth0 bnx2
>
> alias eth1 bnx2
>
> alias eth2 e1000
>
> alias eth3 e1000
>
> alias eth4 e1000
>
> alias eth5 e1000
>
> alias scsi_hostadapter cciss
>
> alias bond0 bonding
>
> options bond0 miimon=100 mode=active-backup max_bonds=3
>
> alias bond1 bonding
>
> options bond1 miimon=100 mode=active-backup
>
> alias bond2 bonding
>
> options bond2 miimon=100 mode=active-backup
>
> alias scsi_hostadapter1 qla2xxx
>
> alias scsi_hostadapter2 usb-storage
>
>  
>
>  
>
> The cluster starts up OK, however when I try to test the bonded 
> interfaces my troubles begin.
>
> On Node C if I "ifdown bond1", the node C, is fenced and everything 
> works as expected.
>
>  
>
> However if on Node C, I take down the interfaces one at a time i.e. 
>
>  "ifdown  eth2", - the cluster stays up as expected using eth3 for 
> routing traffic  
>
>   "ifdown eth3" 
>
> then node C is fenced by Node A. However in the /var/log/messages file 
> on Node C I see a message saying that Node B will be fenced. The 
> outcome is Nodes C & B are fenced.
>
>  
>
> My question is why does node B get fenced as well?
>
>
Hello,

First of all, You have the problem with bonding. Switch off the cluster, 
and investigate why when You do "ifdown eth3" the cluster goes down. I 
suspect that the problem is with e1000 driver.
I suppose that C is the master of the cluster and it is faster than 
election of new master(of A,B).
You could identify the master by: i=`cman_tool services | grep -A 1 
default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk 
'{print $1,$5}' | grep "^$i"
To resolve this issue You need to use more than one communication medium 
fe. ethernet or disk quorum if You have one?

Best Regards
Maciej Bogucki


From Dinesh.Patel at AAH.co.uk  Mon Jun  2 07:35:59 2008
From: Dinesh.Patel at AAH.co.uk (Patel Dino)
Date: Mon, 2 Jun 2008 08:35:59 +0100
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35561@GBW607SC0054.GB-WS.net>

At the time Node A is the master.

I do have a quorum disk setup. When the two nodes (B & C) get fenced the
cluster stays up with Node A  & the quorum disk.


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki
Sent: Monday, June 02, 2008 7:24 AM
To: linux clustering
Subject: Re: [Linux-cluster] RHEL5.0 Cluster fencing problems involving
bonding


doobs72 _ wrote:
>
> Hi
>
>  
>
>  I'm having fencing problems in my 3 node cluster running on 
> RHEL5.0 which involves bonding.
>
>  
>
> I have 3 severs A, B & C in a cluster with bonding configured on eth2 
> & eth3 for my cluster traffic.  The config is as below:
>
>  
>
> DEVICE=eth2
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
> DEVICE=eth3
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
>  
>
> DEVICE=bond1
>
> IPADDR=192.168.x.x
>
> NETMASK=255.255.255.0
>
> NETWORK=192.168.x.0
>
> BROADCAST=192.168.x.255
>
> ONBOOT=YES
>
> BOOTPROTO=none
>
>  
>
> The /etc/modprobe.conf file is configured as below:
>
>  
>
> alias eth0 bnx2
>
> alias eth1 bnx2
>
> alias eth2 e1000
>
> alias eth3 e1000
>
> alias eth4 e1000
>
> alias eth5 e1000
>
> alias scsi_hostadapter cciss
>
> alias bond0 bonding
>
> options bond0 miimon=100 mode=active-backup max_bonds=3
>
> alias bond1 bonding
>
> options bond1 miimon=100 mode=active-backup
>
> alias bond2 bonding
>
> options bond2 miimon=100 mode=active-backup
>
> alias scsi_hostadapter1 qla2xxx
>
> alias scsi_hostadapter2 usb-storage
>
>  
>
>  
>
> The cluster starts up OK, however when I try to test the bonded 
> interfaces my troubles begin.
>
> On Node C if I "ifdown bond1", the node C, is fenced and everything 
> works as expected.
>
>  
>
> However if on Node C, I take down the interfaces one at a time i.e. 
>
>  "ifdown  eth2", - the cluster stays up as expected using eth3 for 
> routing traffic  
>
>   "ifdown eth3" 
>
> then node C is fenced by Node A. However in the /var/log/messages file

> on Node C I see a message saying that Node B will be fenced. The 
> outcome is Nodes C & B are fenced.
>
>  
>
> My question is why does node B get fenced as well?
>
>
Hello,

First of all, You have the problem with bonding. Switch off the cluster,

and investigate why when You do "ifdown eth3" the cluster goes down. I 
suspect that the problem is with e1000 driver.
I suppose that C is the master of the cluster and it is faster than 
election of new master(of A,B).
You could identify the master by: i=`cman_tool services | grep -A 1 
default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk 
'{print $1,$5}' | grep "^$i"
To resolve this issue You need to use more than one communication medium

fe. ethernet or disk quorum if You have one?

Best Regards
Maciej Bogucki


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
************************************************************************ 
DISCLAIMER 
The information contained in this e-mail is confidential and is intended 
for the recipient only. 
If you have received it in error, please notify us immediately by reply e-mail and then delete it from your system. Please do not copy it or use it for any other purposes, or disclose the content of the e-mail to any other person or store or copy the information in any medium. 
The views contained in this e-mail are those of the author and not necessarily those of AAH Pharmaceuticals Ltd. 
AAH Pharmaceuticals Ltd is a company incorporated in England and Wales under company number 123458 and whose registered office is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX 
************************************************************************


From denisb+gmane at gmail.com  Mon Jun  2 11:20:46 2008
From: denisb+gmane at gmail.com (denis)
Date: Mon, 02 Jun 2008 13:20:46 +0200
Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty
In-Reply-To: <483FACEF.2080509@redhat.com>
References: <g1lp8u$un2$1@ger.gmane.org> <g1m34h$15j$1@ger.gmane.org>
	<483FACEF.2080509@redhat.com>
Message-ID: <g20l2e$8sp$1@ger.gmane.org>

Christine Caulfield wrote:
>> denis wrote:
>>> What does "Flags: Dirty" mean? Is it anything to worry about?
>> http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html
>> NODE_FLAGS_DIRTY - This node has internal state and must not join
>>                    a cluster that also has state.
>> What does this actually imply? Anything to care about? How would this 
>> node "recover" from being dirty?
> It's a perfectly normal state. in fact it's expected if you are running 
> services. It simply means that the cluster has some services running 
> that have state of their own that cannot be recovered without a full 
> restart. I would be more worried if you did NOT see this in cman_tool 
> status. It's NOT a warning. don't worry about it :)

Thanks for clarification. I sort of figured this out, but confirmation 
is appreciated.

Regards
--
Denis Braekhus


From stephan.windmueller at cs.uni-dortmund.de  Mon Jun  2 12:47:30 2008
From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=)
Date: Mon, 2 Jun 2008 14:47:30 +0200
Subject: [Linux-cluster] qdiskd does not start
Message-ID: <20080602124730.GA16072@speutel.de>

Hello!

I created a quorum disk with mkqdisk which is shown when I run
"mkqdisk -L"

| # mkqdisk -L
| mkqdisk v2.0
| 
| /dev/sdc:
|         Magic:   eb7a62c2
|         Label:   quorum
|         Created: Mon Jun  2 11:21:29 2008
|         Host:    clnode01

My quorum-config in cluster.conf is:

| <quorumd device="/dev/disk/by-id/scsi-3600a..." votes="1" log_level="7" status_file="/tmp/quorum-state" min_score="1">
|    <heuristic program="ping xxx.xxx.xxx.xxx -c1 -t1" score="1" interval="2"/>
|    <heuristic program="ping yyy.yyy.yyy.yyy -c1 -t1" score="1" interval="2"/>
|    <heuristic program="ping zzz.zzz.zzz.zzz -c1 -t1" score="1" interval="2"/>
| </quorumd>

But when the cluster starts, I can not see that it makes use of the quorum disk:

| Nodes: 2
| Expected votes: 3
| Total votes: 2
| Quorum: 2

Neither I can see anything in the daemon-log nor is there a file
/tmp/quorum-state. Does anyone know why the qdisk daemon does not start
here?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080602/4833e098/attachment.sig>

From Alain.Moulle at bull.net  Mon Jun  2 12:54:54 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 02 Jun 2008 14:54:54 +0200
Subject: [Linux-cluster] CS5 / what does that means ?
Message-ID: <4843ED9E.5080109@bull.net>

Hi

What can be the causes of this message during a relocate of service ?

<err> #60: Mangled reply from member #1 during RG relocate

Consequence is that the service remains "starting" and never goes "started".

Thanks
Regards
Alain Moull?


From jakub.suchy at enlogit.cz  Mon Jun  2 13:33:01 2008
From: jakub.suchy at enlogit.cz (Jakub Suchy)
Date: Mon, 2 Jun 2008 15:33:01 +0200
Subject: [Linux-cluster] heartbeat over 2 NICs
Message-ID: <20080602133301.GD4368@localhost>

Hi,
I would like to know, if it's possible to run heartbeat (through cman)
over two dedicated network NICs. AFAIK, in old hearbeat code, it was
possible using serial + NIC. Unfortunately, I was unable to find this in
any documentation and this is the first time a customer is requesting
this. (I am not talking about network bonding).

Thanks you very much,
Jakub Suchy


From ccaulfie at redhat.com  Mon Jun  2 13:38:37 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 02 Jun 2008 14:38:37 +0100
Subject: [Linux-cluster] heartbeat over 2 NICs
In-Reply-To: <20080602133301.GD4368@localhost>
References: <20080602133301.GD4368@localhost>
Message-ID: <4843F7DD.8000808@redhat.com>

Jakub Suchy wrote:
> Hi,
> I would like to know, if it's possible to run heartbeat (through cman)
> over two dedicated network NICs. AFAIK, in old hearbeat code, it was
> possible using serial + NIC. Unfortunately, I was unable to find this in
> any documentation and this is the first time a customer is requesting
> this. (I am not talking about network bonding).
> 

Basically, no.

If you want to use 2 NICs then bonding is what you need. cman can use 
dual NICs after a fashion but it's not supported and even less well tested.

Sorry.

-- 

Chrissie


From Dinesh.Patel at AAH.co.uk  Mon Jun  2 13:48:52 2008
From: Dinesh.Patel at AAH.co.uk (Patel Dino)
Date: Mon, 2 Jun 2008 14:48:52 +0100
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35562@GBW607SC0054.GB-WS.net>

I think I know what's going on ...

When I take down the two slave interfaces (eth2 & eth3) on Node C, the
bond1 interface remains UP. 
This means  that the Node C still thinks its OK, however it can not see
Node A & B, and tries to fence Node B.
Node A which is the master fences Node C.

I'm not sure how to resolve this any help would be appreciated.

D.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patel Dino
Sent: Monday, June 02, 2008 8:36 AM
To: linux clustering
Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving
bonding


At the time Node A is the master.

I do have a quorum disk setup. When the two nodes (B & C) get fenced the
cluster stays up with Node A  & the quorum disk.


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki
Sent: Monday, June 02, 2008 7:24 AM
To: linux clustering
Subject: Re: [Linux-cluster] RHEL5.0 Cluster fencing problems involving
bonding


doobs72 _ wrote:
>
> Hi
>
>  
>
>  I'm having fencing problems in my 3 node cluster running on 
> RHEL5.0 which involves bonding.
>
>  
>
> I have 3 severs A, B & C in a cluster with bonding configured on eth2 
> & eth3 for my cluster traffic.  The config is as below:
>
>  
>
> DEVICE=eth2
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
> DEVICE=eth3
>
> BOOTPROTO=none
>
> ONBOOT=yes
>
> TYPE=Ethernet
>
> MASTER=bond1
>
> SLAVE=yes
>
> USRCTL=no
>
>  
>
>  
>
> DEVICE=bond1
>
> IPADDR=192.168.x.x
>
> NETMASK=255.255.255.0
>
> NETWORK=192.168.x.0
>
> BROADCAST=192.168.x.255
>
> ONBOOT=YES
>
> BOOTPROTO=none
>
>  
>
> The /etc/modprobe.conf file is configured as below:
>
>  
>
> alias eth0 bnx2
>
> alias eth1 bnx2
>
> alias eth2 e1000
>
> alias eth3 e1000
>
> alias eth4 e1000
>
> alias eth5 e1000
>
> alias scsi_hostadapter cciss
>
> alias bond0 bonding
>
> options bond0 miimon=100 mode=active-backup max_bonds=3
>
> alias bond1 bonding
>
> options bond1 miimon=100 mode=active-backup
>
> alias bond2 bonding
>
> options bond2 miimon=100 mode=active-backup
>
> alias scsi_hostadapter1 qla2xxx
>
> alias scsi_hostadapter2 usb-storage
>
>  
>
>  
>
> The cluster starts up OK, however when I try to test the bonded 
> interfaces my troubles begin.
>
> On Node C if I "ifdown bond1", the node C, is fenced and everything 
> works as expected.
>
>  
>
> However if on Node C, I take down the interfaces one at a time i.e. 
>
>  "ifdown  eth2", - the cluster stays up as expected using eth3 for 
> routing traffic  
>
>   "ifdown eth3" 
>
> then node C is fenced by Node A. However in the /var/log/messages file

> on Node C I see a message saying that Node B will be fenced. The 
> outcome is Nodes C & B are fenced.
>
>  
>
> My question is why does node B get fenced as well?
>
>
Hello,

First of all, You have the problem with bonding. Switch off the cluster,

and investigate why when You do "ifdown eth3" the cluster goes down. I 
suspect that the problem is with e1000 driver.
I suppose that C is the master of the cluster and it is faster than 
election of new master(of A,B).
You could identify the master by: i=`cman_tool services | grep -A 1 
default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk 
'{print $1,$5}' | grep "^$i"
To resolve this issue You need to use more than one communication medium

fe. ethernet or disk quorum if You have one?

Best Regards
Maciej Bogucki


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
************************************************************************

DISCLAIMER 
The information contained in this e-mail is confidential and is intended

for the recipient only. 
If you have received it in error, please notify us immediately by reply
e-mail and then delete it from your system. Please do not copy it or use
it for any other purposes, or disclose the content of the e-mail to any
other person or store or copy the information in any medium. 
The views contained in this e-mail are those of the author and not
necessarily those of AAH Pharmaceuticals Ltd. 
AAH Pharmaceuticals Ltd is a company incorporated in England and Wales
under company number 123458 and whose registered office is at Sapphire
Court, Walsgrave Triangle, Coventry, CV2 2TX 
************************************************************************

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From orkcu at yahoo.com  Mon Jun  2 14:02:40 2008
From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=)
Date: Mon, 2 Jun 2008 07:02:40 -0700 (PDT)
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
In-Reply-To: <8C22506D4103BE40B23DFE9E04B2D8FE05E35562@GBW607SC0054.GB-WS.net>
Message-ID: <320668.48996.qm@web50604.mail.re2.yahoo.com>


--- On Mon, 6/2/08, Patel Dino <Dinesh.Patel at AAH.co.uk> wrote:

> From: Patel Dino <Dinesh.Patel at AAH.co.uk>
> Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
> To: "linux clustering" <linux-cluster at redhat.com>
> Received: Monday, June 2, 2008, 9:48 AM
> I think I know what's going on ...
> 
> When I take down the two slave interfaces (eth2 & eth3)
> on Node C, the
> bond1 interface remains UP. 
> This means  that the Node C still thinks its OK, however it
> can not see
> Node A & B, and tries to fence Node B.
> Node A which is the master fences Node C.
> 
> I'm not sure how to resolve this any help would be
> appreciated.

as previously said, it is a bond problem

I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly
I don?t know if that is the case with rhel5.0, but maybe it is.

you can check the archives of this list if you want to find which version of e1000 driver fix my problem.

cu
roger


      __________________________________________________________________
Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/


From ricks at nerd.com  Mon Jun  2 16:31:59 2008
From: ricks at nerd.com (Rick Stevens)
Date: Mon, 02 Jun 2008 09:31:59 -0700
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
References: <483ECA36.7070007@xbe.ch>
	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
Message-ID: <4844207F.2090706@nerd.com>

Ron Cronenwett wrote:
> Hi Lorenz
> 
> I had a similar problem while testing with Centos 5.1 on a VMWare
> workstation setup. One more difference, I have been using
> system-config-cluster
> to configure the cluster. Luci seemed to be giving me problems with
> setting up a mount of an NFS export. But I have not retried Luci since
> changing
> the selinux setting I mention below.
> 
> I found if I did not configure SELinux with setenforce permissive, the
> /usr/share/cluster/apache.sh script did not execute. Once that runs,
> it creates
> /etc/cluster/apache/apache:"name". In that subdirectory, the script
> creates an httpd.conf file from /etc/httpd/httpd.conf. I also found
> the new httpd.conf
> had the Listen statement commented out even though I had set it to my
> clustered address in /etc/httpd/httpd. I needed to manually uncomment
> the
> Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf.

Have you checked the SELinux error messages in either /var/log/messages
or /var/log/audit/audit.log (or the output of audit2allow -a) to see
what SELinux policy is being violated?  I'd do that, then bugzilla the
apache.sh script and cite your findings.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer                       rps2 at nerd.com -
- Hosting Consulting, Inc.                                           -
-                                                                    -
-               The Theory of Rapitivity: E=MC Hammer                -
-                                  -- Glenn Marcus (via TopFive.com) -
----------------------------------------------------------------------


From Dinesh.Patel at AAH.co.uk  Mon Jun  2 18:02:27 2008
From: Dinesh.Patel at AAH.co.uk (Patel Dino)
Date: Mon, 2 Jun 2008 19:02:27 +0100
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net>

I've updated the e1000 drivers from  version7.2.7  to  version7.6.15.5 and still getting the same problems.

Any more suggestions would be appreciated.

D
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Roger Pe?a
Sent: Monday, June 02, 2008 3:03 PM
To: linux clustering
Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding


--- On Mon, 6/2/08, Patel Dino <Dinesh.Patel at AAH.co.uk> wrote:

> From: Patel Dino <Dinesh.Patel at AAH.co.uk>
> Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
> To: "linux clustering" <linux-cluster at redhat.com>
> Received: Monday, June 2, 2008, 9:48 AM
> I think I know what's going on ...
> 
> When I take down the two slave interfaces (eth2 & eth3)
> on Node C, the
> bond1 interface remains UP. 
> This means  that the Node C still thinks its OK, however it
> can not see
> Node A & B, and tries to fence Node B.
> Node A which is the master fences Node C.
> 
> I'm not sure how to resolve this any help would be
> appreciated.

as previously said, it is a bond problem

I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly
I don?t know if that is the case with rhel5.0, but maybe it is.

you can check the archives of this list if you want to find which version of e1000 driver fix my problem.

cu
roger


      __________________________________________________________________
Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
*****************************************************************************
DISCLAIMER 
The information contained in this e-mail is confidential and is intended 
for the recipient only. 
If you have received it in error, please notify us immediately by reply 
e-mail and then delete it from your system. Please do not copy it or 
use it for any other purposes, or disclose the content of the e-mail 
to any other person or store or copy the information in any medium. 
The views contained in this e-mail are those of the author and not 
necessarily those of AAH Pharmaceuticals Ltd. 
AAH Pharmaceuticals Ltd is a company incorporated in England and 
Wales under company number 123458 and whose registered office 
is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX 
*****************************************************************************


From cma at analog.org  Mon Jun  2 22:09:55 2008
From: cma at analog.org (Chris Adams)
Date: Mon, 2 Jun 2008 17:09:55 -0500
Subject: [Linux-cluster] /sbin/mount.gfs thinks fs is gfs2?
Message-ID: <20080602220955.GA83307@analog.org>

I am upgrading a system with a GFS 6.0 filesystem from RHEL 3 to CentOS 5, 
and subsequently GFS 6.0 to 6.1.  I've followed the instructions here:
http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/ap-license.html
and subsequently ran gfs_tool sb device proto lock_dlm on my gfs lv

The cluster is up and quorate, and clvmd sees the gfs lv, but when I try 
to mount it, I get:

#  mount -t gfs -o upgrade /dev/mapper/pool_gfs-pool_gfs /VAULT10/
/sbin/mount.gfs: there appears to be a GFS2, not GFS, filesystem on 
/dev/mapper/pool_gfs-pool_gfs

I'm not sure why this is failing.  For grins, I tried mounting it as a 
gfs2 filesystem and this is what I get:

#  mount -t gfs2 -o upgrade /dev/mapper/pool_gfs-pool_gfs /VAULT10/
/sbin/mount.gfs2: there appears to be a GFS, not GFS2, filesystem on 
/dev/mapper/pool_gfs-pool_gfs

I have successfully performed the upgrade if I use centos 4 as an 
intermediate step in the upgrade and perform the upgrade steps there and 
the conversion from lock_gulmd to dlm.  However, there are several 
clusters we need to do this with, so that's a painful option to avoid if 
possible.  

Here is the output from gfs_tool:
# gfs_tool sb /dev/mapper/pool_gfs-pool_gfs all
  mh_magic = 0x01161970
  mh_type = 1
  mh_generation = 0
  mh_format = 100
  mh_incarn = 0
  sb_fs_format = 1308
  sb_multihost_format = 1401
  sb_flags = 0
  sb_bsize = 4096
  sb_bsize_shift = 12
  sb_seg_size = 16
  no_formal_ino = 21
  no_addr = 21
  no_formal_ino = 22
  no_addr = 22
  no_formal_ino = 25
  no_addr = 25
  sb_lockproto = lock_dlm
  sb_locktable = cma:pool_gfs
  no_formal_ino = 23
  no_addr = 23
  no_formal_ino = 24
  no_addr = 24
  sb_reserved =
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


thanks,
-chris


From Santosh.Panigrahi at in.unisys.com  Tue Jun  3 03:57:24 2008
From: Santosh.Panigrahi at in.unisys.com (Panigrahi, Santosh Kumar)
Date: Tue, 3 Jun 2008 09:27:24 +0530
Subject: [Linux-cluster] qdiskd does not start
In-Reply-To: <20080602124730.GA16072@speutel.de>
References: <20080602124730.GA16072@speutel.de>
Message-ID: <D566E8CF3538B54D95B925CB69CB4D2A112AA733@inblr-exch1.eu.uis.unisys.com>

I got an impression from your mail that you have not started qdiskd service.
If above is the case then, you have to explicitly start the qdiskd service in all the cluster nodes after starting the cman/rgmanager service.
Don't expect cman/rgmanager to start the qdiskd service. Unless one will start the qdiskd service, the cluster won't consider the qdisk configuration.

Thanks,
Santosh 

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephan Windm?ller
Sent: Monday, June 02, 2008 6:18 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] qdiskd does not start

Hello!

I created a quorum disk with mkqdisk which is shown when I run
"mkqdisk -L"

| # mkqdisk -L
| mkqdisk v2.0
| 
| /dev/sdc:
|         Magic:   eb7a62c2
|         Label:   quorum
|         Created: Mon Jun  2 11:21:29 2008
|         Host:    clnode01

My quorum-config in cluster.conf is:

| <quorumd device="/dev/disk/by-id/scsi-3600a..." votes="1" log_level="7" status_file="/tmp/quorum-state" min_score="1">
|    <heuristic program="ping xxx.xxx.xxx.xxx -c1 -t1" score="1" interval="2"/>
|    <heuristic program="ping yyy.yyy.yyy.yyy -c1 -t1" score="1" interval="2"/>
|    <heuristic program="ping zzz.zzz.zzz.zzz -c1 -t1" score="1" interval="2"/>
| </quorumd>

But when the cluster starts, I can not see that it makes use of the quorum disk:

| Nodes: 2
| Expected votes: 3
| Total votes: 2
| Quorum: 2

Neither I can see anything in the daemon-log nor is there a file
/tmp/quorum-state. Does anyone know why the qdisk daemon does not start
here?


From stephan.windmueller at cs.uni-dortmund.de  Tue Jun  3 07:11:32 2008
From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=)
Date: Tue, 3 Jun 2008 09:11:32 +0200
Subject: [Linux-cluster] qdiskd does not start
In-Reply-To: <D566E8CF3538B54D95B925CB69CB4D2A112AA733@inblr-exch1.eu.uis.unisys.com>
References: <20080602124730.GA16072@speutel.de>
	<D566E8CF3538B54D95B925CB69CB4D2A112AA733@inblr-exch1.eu.uis.unisys.com>
Message-ID: <20080603071132.GA8765@speutel.de>

On Tue, 03. Jun 2008, Panigrahi, Santosh Kumar wrote:

> I got an impression from your mail that you have not started qdiskd
> service.

The service is started from the init script.

> If above is the case then, you have to explicitly start the qdiskd
> service in all the cluster nodes after starting the cman/rgmanager
> service.

I tried that, but after running "qdiskd" as root there is no running
daemon. syslog says:

| qdiskd: <debug> Heuristic: 'ping xxx.xxx.xxx.xxx -c1 -t1' score=1 interval=2 tko=1 
| qdiskd: <debug> Heuristic: 'ping yyy.yyy.yyy.yyy -c1 -t1' score=1 interval=2 tko=1 
| qdiskd: <debug> Heuristic: 'ping zzz.zzz.zzz.zzz -c1 -t1' score=1 interval=2 tko=1 
| qdiskd: <debug> 3 heuristics loaded 
| qdiskd: <debug> Quorum Daemon: 3 heuristics, 1 interval, 10 tko, 1 votes

With strace I see that qdiskd reads /var/run/qdiskd.pid and tries to
access this process (which is not running any more). Even when I delete
this pid-file nothing changes.

Regards
 Stephan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080603/0e8168d6/attachment.sig>

From fdinitto at redhat.com  Tue Jun  3 09:16:11 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 3 Jun 2008 11:16:11 +0200 (CEST)
Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!)
Message-ID: <Pine.LNX.4.64.0806031055250.5892@trider-g7>


Hi guys,

I just landed the last bits in libccs to support both xpath lite and full 
xpath queries. With this new code, a couple of things need to be checked 
across all applications using libccs.

Relevant changes:

ccs_connect() used to return only when cluster is quorated.
This is not the case anymore. ccs_connect will return as soon as it can 
connect to aisexec and init properly (or fail).
You can use cman_is_quorate from libcman for the same feature.

ccs_force_connect() used to take a cluster name in input. The API is still 
the same, but the cluster name is now ignored (it wasn't in used before 
either).

in order to use xpath lite or full xpath, set fullxpath (int from ccs.h) 
to either 0 (xpath lite and default) or 1 (full xpath) before invoking 
ccs_connect or ccs_force_connect.
In order to switch from one mode to another, you have to disconnect and 
connect again.

WARNING: use full xpath only if you cannot live without. It is slow and 
it's a memory eating piece of code.

WARNING2: the library is not thread safe (yet?). So far none of our 
callers really need this feature. Please let me know if i overlooked.

Please review your ccs init calls around and take appropriate actions.

ccs_test(8): not fully completed yet (another email will follow).

Feel free to contact me if you have any questions

Fabio

PS hint: ccs_force_connect() has a blocking option that will idle loop as 
long as required and will exit the loop when cman is available for 
queries. This could replace several hand made loops on ccs_connect i have 
seen around.

--
I'm going to make him an offer he can't refuse.


From stephan.windmueller at cs.uni-dortmund.de  Tue Jun  3 09:27:19 2008
From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=)
Date: Tue, 3 Jun 2008 11:27:19 +0200
Subject: [Linux-cluster] qdiskd does not start
In-Reply-To: <20080603071132.GA8765@speutel.de>
References: <20080602124730.GA16072@speutel.de>
	<D566E8CF3538B54D95B925CB69CB4D2A112AA733@inblr-exch1.eu.uis.unisys.com>
	<20080603071132.GA8765@speutel.de>
Message-ID: <20080603092719.GA15653@speutel.de>

On Tue, 03. Jun 2008, Stephan Windm?ller wrote:

> With strace I see that qdiskd reads /var/run/qdiskd.pid and tries to
> access this process (which is not running any more). Even when I delete
> this pid-file nothing changes.

After reading parts of the source code I think that I found the problem.
In qdisk/main.c the function daemon_init is called:

| if (daemon_init(argv[0]) < 0)
|     goto out;

But the type of daemon_init is "void" and it does not return a value:

| void
| daemon_init(char *prog)
| {
|
|       [...]
|
|       daemon(0, 0);
|
|       update_pidfile(prog);
| }

I do not understand why the linker does not produce an error here. Also it
seems unwanted that daemon_init dies with "exit(1)" when an error occurs
instead of returning -1.

However, qdiskd will always exit when daemonized with this code. I removed the
comparison < 0 and got this in syslog:

| qdiskd: <info> Initial score 3/3 
| qdiskd: <info> Initialization complete 
| qdiskd: <notice> Score sufficient for master operation (3/3; required=1); upgrading 
| qdiskd: <debug> Making bid for master 
| qdiskd: <info> Assuming master role 

But after that "cman_tool status" hangs and produces no output.

- Stephan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080603/2d18c746/attachment.sig>

From denisb+gmane at gmail.com  Tue Jun  3 12:26:59 2008
From: denisb+gmane at gmail.com (denis)
Date: Tue, 03 Jun 2008 14:26:59 +0200
Subject: [Linux-cluster] Re: CS5 / what does that means ?
In-Reply-To: <4843ED9E.5080109@bull.net>
References: <4843ED9E.5080109@bull.net>
Message-ID: <g23daj$5td$1@ger.gmane.org>

Alain Moulle wrote:
> Hi
> 
> What can be the causes of this message during a relocate of service ?
> 
> <err> #60: Mangled reply from member #1 during RG relocate
> 
> Consequence is that the service remains "starting" and never goes "started".

I had the same issue at one time, I debugged the initscripts and 
configuration of the service in question on both nodes and discovered 
one had a problem in starting the service.

As far as I recall fixing the issue with the broken startup also 
resolved this "Mangled reply" error.

I am not saying this is the case on your system, I just thought I would 
share my experience.

Regards
--
Denis


From miolinux at libero.it  Tue Jun  3 13:53:34 2008
From: miolinux at libero.it (Miolinux)
Date: Tue, 03 Jun 2008 15:53:34 +0200
Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck
Message-ID: <1212501214.10658.11.camel@GD-P2-093>

Hi,

I tried to expand my gfs filesystem from 250Gb to 350Gb.
I run gfs_grow without any error or warnings.
But something gone wrong.

Now, i cannot mount the gfs filesystem anymore (lock computer)

When i try to do a gfs_fsck i get:

[root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 
Initializing fsck
Initializing lists...
Initializing special inodes...
Validating Resource Group index.
Level 1 check.
371 resource groups found.
(passed)
Setting block ranges...
This file system is too big for this computer to handle.
Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes.
Unable to determine the boundaries of the file system.
Freeing buffers.
---

Like when trying to access a >16Tb on 32bit.
But the disk below is just 350Gb!!

[root at west ~]# lvdisplay /dev/mapper/VolGroup_FS100-LogVol_FS100 
  --- Logical volume ---
  LV Name                /dev/VolGroup_FS100/LogVol_FS100
  VG Name                VolGroup_FS100
  LV UUID                6kPwvg-AOuA-iUOY-KboE-PyRO-DPNt-5yeD3h
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                349.99 GB
  Current LE             89597
  Segments               3
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:17
-----

How can i resolve the issue? / How can i recover the data?

Infos:

CentOS 5.1
[root at west ~]#  rpm -qa|grep -i gfs
gfs-utils-0.1.12-1.el5
gfs2-utils-0.1.38-1.el5
kmod-gfs-PAE-0.1.19-7.el5_1.1
kmod-gfs-PAE-0.1.16-6.2.6.18_8.1.15.el5
kmod-gfs-PAE-0.1.19-7.el5
---------
[root at west ~]# uname -a
Linux west.polito.it 2.6.18-53.1.21.el5PAE #1 SMP Tue May 20 10:03:06
EDT 2008 i686 i686 i386 GNU/Linux
-------

P.s: tried also gfs-utils-0.1.17-1 gfs_fsck but with no luck :(


From cma at analog.org  Tue Jun  3 14:58:45 2008
From: cma at analog.org (Chris Adams)
Date: Tue, 3 Jun 2008 09:58:45 -0500
Subject: [Linux-cluster] gfs 6.1 superblock backups
Message-ID: <20080603145845.GA88611@analog.org>

Does GFS 6.1 have any superblock backups a la ext2/3?  If so, how can I 
find them?

thanks,
-chris


From s.wendy.cheng at gmail.com  Tue Jun  3 15:03:55 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 03 Jun 2008 11:03:55 -0400
Subject: [Linux-cluster] gfs 6.1 superblock backups
In-Reply-To: <20080603145845.GA88611@analog.org>
References: <20080603145845.GA88611@analog.org>
Message-ID: <48455D5B.8080909@gmail.com>

Chris Adams wrote:
> Does GFS 6.1 have any superblock backups a la ext2/3?  If so, how can I 
> find them?
>
>   

Unfortunately, no.


From cma at analog.org  Tue Jun  3 16:27:55 2008
From: cma at analog.org (Chris Adams)
Date: Tue, 3 Jun 2008 11:27:55 -0500
Subject: [Linux-cluster] gfs 6.1 superblock backups
Message-ID: <20080603162755.GA89011@analog.org>

On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote:
Chris Adams wrote:
> > Does GFS 6.1 have any superblock backups a la ext2/3?  If so, how
> > can I find them?
>
> Unfortunately, no.
>

If that is the case, then is it safe to assume that fs_sb_format will
always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that 
is the only location on the lv that it is stored?  I see 
#define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */
and that is the location that where I see 0x051d (1309) stored.

thanks,
-chris


From mghofran at caregroup.harvard.edu  Tue Jun  3 16:30:04 2008
From: mghofran at caregroup.harvard.edu (mghofran at caregroup.harvard.edu)
Date: Tue, 3 Jun 2008 12:30:04 -0400
Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
In-Reply-To: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net>
References: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net>
Message-ID: <1BA553C5537DA74194724A82D9595CCB841DF7@EVS8.its.caregroup.org>

One observation:

In your bond1 file, shouldn't you have a "type=bonding"?

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patel Dino
Sent: Monday, June 02, 2008 2:02 PM
To: linux clustering
Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding

I've updated the e1000 drivers from  version7.2.7  to  version7.6.15.5 and still getting the same problems.

Any more suggestions would be appreciated.

D
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Roger Pe?a
Sent: Monday, June 02, 2008 3:03 PM
To: linux clustering
Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding


--- On Mon, 6/2/08, Patel Dino <Dinesh.Patel at AAH.co.uk> wrote:

> From: Patel Dino <Dinesh.Patel at AAH.co.uk>
> Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
> To: "linux clustering" <linux-cluster at redhat.com>
> Received: Monday, June 2, 2008, 9:48 AM
> I think I know what's going on ...
> 
> When I take down the two slave interfaces (eth2 & eth3)
> on Node C, the
> bond1 interface remains UP. 
> This means  that the Node C still thinks its OK, however it
> can not see
> Node A & B, and tries to fence Node B.
> Node A which is the master fences Node C.
> 
> I'm not sure how to resolve this any help would be
> appreciated.

as previously said, it is a bond problem

I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly
I don?t know if that is the case with rhel5.0, but maybe it is.

you can check the archives of this list if you want to find which version of e1000 driver fix my problem.

cu
roger


      __________________________________________________________________
Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
*****************************************************************************
DISCLAIMER 
The information contained in this e-mail is confidential and is intended 
for the recipient only. 
If you have received it in error, please notify us immediately by reply 
e-mail and then delete it from your system. Please do not copy it or 
use it for any other purposes, or disclose the content of the e-mail 
to any other person or store or copy the information in any medium. 
The views contained in this e-mail are those of the author and not 
necessarily those of AAH Pharmaceuticals Ltd. 
AAH Pharmaceuticals Ltd is a company incorporated in England and 
Wales under company number 123458 and whose registered office 
is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX 
*****************************************************************************

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From rpeterso at redhat.com  Tue Jun  3 17:23:00 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 03 Jun 2008 12:23:00 -0500
Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck
In-Reply-To: <1212501214.10658.11.camel@GD-P2-093>
References: <1212501214.10658.11.camel@GD-P2-093>
Message-ID: <1212513780.3428.1.camel@technetium.msp.redhat.com>

Hi,

On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote:
> Hi,
> 
> I tried to expand my gfs filesystem from 250Gb to 350Gb.
> I run gfs_grow without any error or warnings.
> But something gone wrong.
> 
> Now, i cannot mount the gfs filesystem anymore (lock computer)
> 
> When i try to do a gfs_fsck i get:
> 
> [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 
> Initializing fsck
> Initializing lists...
> Initializing special inodes...
> Validating Resource Group index.
> Level 1 check.
> 371 resource groups found.
> (passed)
> Setting block ranges...
> This file system is too big for this computer to handle.
> Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes.
> Unable to determine the boundaries of the file system.

You've probably hit the gfs_grow bug described in bz #434962 (436383)
and the gfs_fsck bug described in 440897 (440896).  My apologies if
you can't read them; permissions to individual bugzilla records are
out of my control.

The fixes are available in the recently released RHEL5.2, although
I don't know when they'll hit Centos.  The fixes are also available
in the latest cluster git tree if you want to compile/install them
from source code yourself.  Documentation for doing this can
be found at: http://sources.redhat.com/cluster/wiki/ClusterGit

Regards,

Bob Peterson
Red Hat Clustering & GFS


From s.wendy.cheng at gmail.com  Tue Jun  3 17:43:46 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 03 Jun 2008 13:43:46 -0400
Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck
In-Reply-To: <1212513780.3428.1.camel@technetium.msp.redhat.com>
References: <1212501214.10658.11.camel@GD-P2-093>
	<1212513780.3428.1.camel@technetium.msp.redhat.com>
Message-ID: <484582D2.20401@gmail.com>

Bob Peterson wrote:
> Hi,
>
> On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote:
>   
>> Hi,
>>
>> I tried to expand my gfs filesystem from 250Gb to 350Gb.
>> I run gfs_grow without any error or warnings.
>> But something gone wrong.
>>
>> Now, i cannot mount the gfs filesystem anymore (lock computer)
>>
>> When i try to do a gfs_fsck i get:
>>
>> [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 
>> Initializing fsck
>> Initializing lists...
>> Initializing special inodes...
>> Validating Resource Group index.
>> Level 1 check.
>> 371 resource groups found.
>> (passed)
>> Setting block ranges...
>> This file system is too big for this computer to handle.
>> Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes.
>> Unable to determine the boundaries of the file system.
>>     
>
> You've probably hit the gfs_grow bug described in bz #434962 (436383)
> and the gfs_fsck bug described in 440897 (440896).  My apologies if
> you can't read them; permissions to individual bugzilla records are
> out of my control.
>
> The fixes are available in the recently released RHEL5.2, although
> I don't know when they'll hit Centos.  The fixes are also available
> in the latest cluster git tree if you want to compile/install them
> from source code yourself.  Documentation for doing this can
> be found at: http://sources.redhat.com/cluster/wiki/ClusterGit
>
>   
This is almost qualified as an FAQ entry :) ...

-- Wendy


From s.wendy.cheng at gmail.com  Tue Jun  3 17:56:57 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 03 Jun 2008 13:56:57 -0400
Subject: [Linux-cluster] gfs 6.1 superblock backups
In-Reply-To: <20080603162755.GA89011@analog.org>
References: <20080603162755.GA89011@analog.org>
Message-ID: <484585E9.3060505@gmail.com>

Chris Adams wrote:
> On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote:
> Chris Adams wrote:
>   
>>> Does GFS 6.1 have any superblock backups a la ext2/3?  If so, how
>>> can I find them?
>>>       
>> Unfortunately, no.
>>
>>     
>
> If that is the case, then is it safe to assume that fs_sb_format will
> always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that 
> is the only location on the lv that it is stored?  I see 
> #define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */
> and that is the location that where I see 0x051d (1309) stored.
>
>   
Yes .. in theory (since I don't have the source code in front of me at 
this moment).

Thinking to hand patch it, don't you ? ... There is a header file (I 
think it is gfs_ondisk.h) that describes the super block layout.

-- Wendy


From rpeterso at redhat.com  Tue Jun  3 17:55:50 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 03 Jun 2008 12:55:50 -0500
Subject: [Linux-cluster] gfs 6.1 superblock backups
In-Reply-To: <20080603162755.GA89011@analog.org>
References: <20080603162755.GA89011@analog.org>
Message-ID: <1212515750.3428.32.camel@technetium.msp.redhat.com>

On Tue, 2008-06-03 at 11:27 -0500, Chris Adams wrote:
> On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote:
> Chris Adams wrote:
> > > Does GFS 6.1 have any superblock backups a la ext2/3?  If so, how
> > > can I find them?
> >
> > Unfortunately, no.
> >
> 
> If that is the case, then is it safe to assume that fs_sb_format will
> always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that 
> is the only location on the lv that it is stored?  I see 
> #define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */
> and that is the location that where I see 0x051d (1309) stored.
> 
> thanks,
> -chris

Hi Chris,

As Wendy pointed out, there is only one copy of the GFS superblock.

You might be better off recreating the file system with gfs_mkfs
and restoring from backup.  If that option isn't available, read on:

The superblock itself is not too horrible to reconstruct, as long
as you know the block size (default is 4096).  The big question is:
did anything AFTER the superblock get destroyed?  A lot depends on
what was destroyed.  Immediately after the superblock is the first
resource group (RG) and its bitmaps, and if they got blasted, it
might be difficult to reconstruct your file system.  The newer
versions of gfs_fsck can repair a lot of these problems though, so
once you have a proper GFS superblock, you can give that a try.
If the RG was destroyed, gfs_fsck is likely to complain about a 
lot of things.

Right after the first set of bitmaps comes some important
system files: journal index, resource group index, etc.
If those got destroyed, it's even more difficult or even
impossible to get your file system back.  The quota file follows
and then the license file (now reused for fast statfs).  After
that is the root directory.  So you see, it all depends on what
all is destroyed and what is still intact.

If ONLY the gfs superblock got destroyed, you might be able to use the
gfs2_edit tool to patch in the correct values.  The superblock ought
to look something like this:

gfs2_edit - Global File System Editor (use with extreme caution)
Block #16    (0x10)           of 13092864 (0xC7C800)   (superblock)
(p.1 of 6)
00010000 01161970 00000001 00000000 00000000 [...p............]
00010010 00000064 00000000 0000051D 00000579 [...d...........y]
00010020 00000000 00001000 0000000C 00000010 [................]
00010030 00000000 00000016 00000000 00000016 [................]
00010040 00000000 00000017 00000000 00000017 [................]
00010050 00000000 0000001A 00000000 0000001A [................]
00010060 6C6F636B 5F646C6D 00000000 00000000 [lock_dlm........]
00010070 00000000 00000000 00000000 00000000 [................]
00010080 00000000 00000000 00000000 00000000 [................]
00010090 00000000 00000000 00000000 00000000 [................]
000100A0 626F6273 5F657878 6F6E3A65 78786F6E [bobs_exxon:exxon]
000100B0 5F6C7600 00000000 00000000 00000000 [_lv.............]
000100C0 00000000 00000000 00000000 00000000 [................]
000100D0 00000000 00000000 00000000 00000000 [................]
000100E0 00000000 00000018 00000000 00000018 [................]
000100F0 00000000 00000019 00000000 00000019 [................]
00010100 00000000 00000000 00000000 00000000 [................]
00010110 00000000 00000000 00000000 00000000 [................]
00010120 00000000 00000000 00000000 00000000 [................]
00010130 00000000 00000000 00000000 00000000 [................]
00010140 00000000 00000000 00000000 00000000 [................]
00010150 00000000 00000000 00000000 00000000 [................]

Everything after offset 0x150 should be zeroes on that block.
To get a breakdown of the superblock fields, press the "m" key.
For my example above, the field breakdown looks like this:

Superblock:
  mh_magic              0x01161970                (hex)
  mh_type               1                         0x1
  mh_format             100                       0x64
  sb_fs_format          1309                      0x51d
  sb_multihost_format   1401                      0x579
  sb_bsize              4096                      0x1000
  sb_bsize_shift        12                        0xc
  jindex ino            22                        0x16
                        22                        0x16
  rindex ino            23                        0x17
                        23                        0x17
  root dir              26                        0x1a
                        26                        0x1a
  sb_lockproto          lock_dlm
  sb_locktable          bobs_exxon:exxon_lv       
  quota ino             24                        0x18
                        24                        0x18
  license               25                        0x19
                        25                        0x19

The 'm' key is a three-way toggle, so you can get back to hex
mode by pressing it again once or twice.  The gfs2_tool is complex
and can be dangerous, so I don't recommend it for file systems
that are in production, unless your need is great.  Also,
never use it when the fs is mounted. The gfs2_edit man page tells
how to use it.  If this is a RHEL5 system or similar, you'll already
have the gfs2_edit tool available to you.  If this is RHEL4 you 
won't have gfs2_edit so your options are: (1) use gfs_edit which
is a primitive version  of the same tool, (2) I did a port of
gfs2_edit for RHEL4.  The source tree may be found at:

http://people.redhat.com/rpeterso/Experimental/RHEL4.x/

If you go this route, you would have to untar the file, then do:

.configure --kernel_src=/usr/src/kernels/(your kernel)
make
make install

This port assumes you have the kernel headers (i.e. kernel-devel)
rpms installed.  I hope this helps.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From bkyoung at gmail.com  Tue Jun  3 17:55:57 2008
From: bkyoung at gmail.com (Brandon Young)
Date: Tue, 3 Jun 2008 12:55:57 -0500
Subject: [Linux-cluster] Fencing Device Question
Message-ID: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com>

In my GFS cluster, I use DRAC cards as the fencing device for each node.
Yesterday, I had a situation where the DRAC card on a particular node had
failed, and would not allow remote logins, etc, but it still returned
pings.  I don't know how long the card had been dead, and I only noticed
because I wished to manually fence the node and fencing failed ... which
caused me all sorts of other fun to recover the cluster, afterwards.  So, I
have uncovered a pretty scary bad-case scenario for my cluster
configuration.

My question is what (if anything) can RHCS/GFS do to determine the
health/presence/operation of fencing devices?  If it can do something to
monitor the fencing devices, and discovers a bad fencing device, what will
it do?  For example, if I unplug the network cable for the heartbeat, the
node will get fenced immediately.  I never tested whether the same would
happen if I unplugged a fencing device.  I haven't delved into the
documentation in a while, but I don't remember anything about a way to have
redundant fencing devices, like a DRAC and a network power switch.  Is there
a way?

Thoughts, opinions, insight, documentation, etc would be greatly
appreciated.

--
Brandon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080603/536afcaf/attachment.htm>

From cma at analog.org  Tue Jun  3 18:27:13 2008
From: cma at analog.org (Chris Adams)
Date: Tue, 3 Jun 2008 13:27:13 -0500
Subject: [Linux-cluster] gfs 6.1 superblock backups
Message-ID: <20080603182713.GA89586@analog.org>

Bob and Wendy, 
Thank you for your input on this.  What I am trying to do 
is upgrade a GFS 6.0 filesystems which are attached to various 
RHEL3/CentOS3 systems.  After performing the steps which outline the 
process of going from 3 to 4, but on a CentOS 5 system, I get the problems 
mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2?  
Everyt time I reinstalled a system with CentOS 5 and tried to get gfs 
running again I got the same error.

Since I know that this is an unsupported operation, I haven't sought 
support for this.  However, I noticed that my upgraded filesystem had 
sb_fs_format = 1308.  The mount code checks for sb_fs_format == 
GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2.  Since it was 
neither of these, it kept dying saying that it was a gfs2 fs when mounting 
it as gfs, and vice versa.  Manually modifying sb_fs_format allowed it to 
mount immediately afterward.  A subsequent gfs_fsck completes all passes 
successfully.  

Is that sufficient for upgrading the filesystem if the other steps are 
performed?  All fs operations appear to be successful at this point.

thanks,
-chris


From rpeterso at redhat.com  Tue Jun  3 18:49:12 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 03 Jun 2008 13:49:12 -0500
Subject: [Linux-cluster] gfs 6.1 superblock backups
In-Reply-To: <20080603182713.GA89586@analog.org>
References: <20080603182713.GA89586@analog.org>
Message-ID: <1212518952.3428.46.camel@technetium.msp.redhat.com>

On Tue, 2008-06-03 at 13:27 -0500, Chris Adams wrote:
> Bob and Wendy, 
> Thank you for your input on this.  What I am trying to do 
> is upgrade a GFS 6.0 filesystems which are attached to various 
> RHEL3/CentOS3 systems.  After performing the steps which outline the 
> process of going from 3 to 4, but on a CentOS 5 system, I get the problems 
> mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2?  
> Everyt time I reinstalled a system with CentOS 5 and tried to get gfs 
> running again I got the same error.
> 
> Since I know that this is an unsupported operation, I haven't sought 
> support for this.  However, I noticed that my upgraded filesystem had 
> sb_fs_format = 1308.  The mount code checks for sb_fs_format == 
> GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2.  Since it was 
> neither of these, it kept dying saying that it was a gfs2 fs when mounting 
> it as gfs, and vice versa.  Manually modifying sb_fs_format allowed it to 
> mount immediately afterward.  A subsequent gfs_fsck completes all passes 
> successfully.  
> 
> Is that sufficient for upgrading the filesystem if the other steps are 
> performed?  All fs operations appear to be successful at this point.
> 
> thanks,
> -chris

Hey Chris,

I really don't know offhand what changed in the file system between the
RHEL3 proprietary version of GFS and the version we have today.
(There aren't any differences between RHEL4.x and RHEL5.x GFS format).

I can't think of a good reason why my predecessors would have changed
the file system format ID unless there was something in the file system
that changed and needed reorganizing or reformatting.  So like you, that
makes me concerned about some loose end.  However, I do know gfs_fsck
pretty well, and if it says the file system is sane, you should be able
to trust it.

This is just a guess, but perhaps it had something to do with the
difference between the old proprietary GFS (i.e. the old license file)
and the GFS Red Hat open-sourced (i.e. empty license file because no
license is needed to use it).  If I'm correct, it's not likely to
cause any problems.

There are a few developers from that era around; maybe they'll remember
what changed back then and post why it was done.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From fdinitto at redhat.com  Tue Jun  3 19:19:40 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 3 Jun 2008 21:19:40 +0200 (CEST)
Subject: [Linux-cluster] Announcing Perl bindings for libcman
In-Reply-To: <ed9a61600806012148v519fa6e6u1ac113dd2c15cbec@mail.gmail.com>
References: <ed9a61600806012148v519fa6e6u1ac113dd2c15cbec@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0806032112070.5892@trider-g7>


Hi,

On Mon, 2 Jun 2008, S. Zachariah Sprackett wrote:

> Hello,
>
> I'd like to announce the availability of my Perl bindings for libcman.
>
> You can grab them from here:
>
> http://zac.sprackett.com/cman/cluster-cman-0.01.tar.gz

this looks really good.

What I would really love to see is a set of perl and python bindings for 
our shared libraries and part of our official releases.

As we discussed on IRC, i'd like them for our master branch in git
for libccs, libcman, libdlm and libfence.

In master (pre3):
libccs from cluster/config/libs/libccsconfdb/
libcman from cluster/cman/lib
libdlm from cluster/dlm/libdlm
libfence from cluster/fence/libfence
  (careful there is also a libfenced that we don't need)

I believe that all the API's in these libraries are stable by now, but i 
can't guarantee that 100% yet.

Please submit what you like and in your preferred format (patches tho 
would be best). I noticed that you used GPL2 licence and that's perfect.
Make _absolutely_ sure that you take copyright and credits for your work 
:)

Thanks a lot for your contribution
Fabio

--
I'm going to make him an offer he can't refuse.


From s.wendy.cheng at gmail.com  Tue Jun  3 23:15:41 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 03 Jun 2008 19:15:41 -0400
Subject: [Linux-cluster] gfs 6.1 superblock backups
In-Reply-To: <1212518952.3428.46.camel@technetium.msp.redhat.com>
References: <20080603182713.GA89586@analog.org>
	<1212518952.3428.46.camel@technetium.msp.redhat.com>
Message-ID: <4845D09D.6050902@gmail.com>

Bob Peterson wrote:
> On Tue, 2008-06-03 at 13:27 -0500, Chris Adams wrote:
>   
>> Bob and Wendy, 
>> Thank you for your input on this.  What I am trying to do 
>> is upgrade a GFS 6.0 filesystems which are attached to various 
>> RHEL3/CentOS3 systems.  After performing the steps which outline the 
>> process of going from 3 to 4, but on a CentOS 5 system, I get the problems 
>> mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2?  
>> Everyt time I reinstalled a system with CentOS 5 and tried to get gfs 
>> running again I got the same error.
>>
>> Since I know that this is an unsupported operation, I haven't sought 
>> support for this.  However, I noticed that my upgraded filesystem had 
>> sb_fs_format = 1308.  The mount code checks for sb_fs_format == 
>> GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2.  Since it was 
>> neither of these, it kept dying saying that it was a gfs2 fs when mounting 
>> it as gfs, and vice versa.  Manually modifying sb_fs_format allowed it to 
>> mount immediately afterward.  A subsequent gfs_fsck completes all passes 
>> successfully.  
>>
>> Is that sufficient for upgrading the filesystem if the other steps are 
>> performed?  All fs operations appear to be successful at this point.
>>
>> thanks,
>> -chris
>>     
>
> I can't think of a good reason why my predecessors would have changed
> the file system format ID unless there was something in the file system
> that changed and needed reorganizing or reformatting. 
I'm not the person who added this ID but it is a *right* thing to do. As 
a rule of thumb, when moving between major releases, such as RHEL3 and 
RHEL4, a filesystem needs to have an identifier to facilitate the 
upgrade process. There should be documents, commands and/or tools to 
guide people how to do the upgrade - all require this type of "ID" 
implementation. And there should be associated testing efforts allocated 
to the upgrade command as a safe guard before you can call a filesystem 
"enterprise product". For GFS specifically, the locking protocols are 
different between GFS 6.0 and 6.1 (e.g. GULM is in RHEL3 but not in 
RHEL4) and locking protocol is part of the superblock structure, iirc.

 From practical point of view, it is probably ok to keep going (but do 
check RHEL manuals - there should be chapters talking about migration 
and upgrade between RHEL3 to 4 and RHEL4 to 5).

 From process point of view, this looks like a RHEL5 bug to me.

-- Wendy


From miolinux at libero.it  Wed Jun  4 08:23:55 2008
From: miolinux at libero.it (Miolinux)
Date: Wed, 04 Jun 2008 10:23:55 +0200
Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck
In-Reply-To: <484582D2.20401@gmail.com>
References: <1212501214.10658.11.camel@GD-P2-093>
	<1212513780.3428.1.camel@technetium.msp.redhat.com>
	<484582D2.20401@gmail.com>
Message-ID: <1212567835.7752.3.camel@GD-P2-093>

On Tue, 2008-06-03 at 13:43 -0400, Wendy Cheng wrote:
> Bob Peterson wrote:
> > Hi,
> >
> > On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote:
> >   
> >> Hi,
> >>
> >> I tried to expand my gfs filesystem from 250Gb to 350Gb.
> >> I run gfs_grow without any error or warnings.
> >> But something gone wrong.
> >>
> >> Now, i cannot mount the gfs filesystem anymore (lock computer)
> >>
> >> When i try to do a gfs_fsck i get:
> >>
> >> [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 
> >> Initializing fsck
> >> Initializing lists...
> >> Initializing special inodes...
> >> Validating Resource Group index.
> >> Level 1 check.
> >> 371 resource groups found.
> >> (passed)
> >> Setting block ranges...
> >> This file system is too big for this computer to handle.
> >> Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes.
> >> Unable to determine the boundaries of the file system.
> >>     
> >
> > You've probably hit the gfs_grow bug described in bz #434962 (436383)
> > and the gfs_fsck bug described in 440897 (440896).  My apologies if
> > you can't read them; permissions to individual bugzilla records are
> > out of my control.
> >
> > The fixes are available in the recently released RHEL5.2, although
> > I don't know when they'll hit Centos.  The fixes are also available
> > in the latest cluster git tree if you want to compile/install them
> > from source code yourself.  Documentation for doing this can
> > be found at: http://sources.redhat.com/cluster/wiki/ClusterGit
> >
> >   
> This is almost qualified as an FAQ entry :) ...
> 
> -- Wendy
> 
> --

Yes, indeed i followed instruction in ?Mikko Partio thread and now it
seems working, however i had to install a new computer with a 64bit OS,
and compiled a 64bit version of gfs_fsck to fsck the broken disk.

Thanks,
Miolinux


From Alain.Moulle at bull.net  Wed Jun  4 09:14:42 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 04 Jun 2008 11:14:42 +0200
Subject: [Linux-cluster] CS5 / tuning token and consequence on dlm
Message-ID: <48465D02.2020104@bull.net>

Hi

With CS5 :
Is there always a link to the value to set for :
DLM_LOCK_TIMEOUT
if the token default is modified in cluster.conf ????

(with CS4, the modification of deadnode_timer was
to be linked to a modification of the DLM_LOCK_TIMEOUT)

Thanks
Regards
Alain Moull?


From sunhux at gmail.com  Wed Jun  4 10:49:24 2008
From: sunhux at gmail.com (sunhux G)
Date: Wed, 4 Jun 2008 18:49:24 +0800
Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine
Message-ID: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com>

Hi Christine,


I could have searched Redhat knowledgebase but thought would
be easier if I clarify here.   We plan to cluster two RHES, server A
& server B (on Ver 5.1AP)

a)besides the regular network port for the usual network traffic,
   we only need one additional network port per server to set up
   the clustering, is this right?

b)what if we want to use 2 network ports, then we have to bond
   the two network ports on server A & the two network ports on
   server B - is this right?

c)anything we need to do on the Cisco switch's ports end? We
   are using Cisco 6513


Thanks
U

On 6/2/08, Christine Caulfield <ccaulfie at redhat.com> wrote:
>
> Jakub Suchy wrote:
>
>> Hi,
>> I would like to know, if it's possible to run heartbeat (through cman)
>> over two dedicated network NICs. AFAIK, in old hearbeat code, it was
>> possible using serial + NIC. Unfortunately, I was unable to find this in
>> any documentation and this is the first time a customer is requesting
>> this. (I am not talking about network bonding).
>>
>>
> Basically, no.
>
> If you want to use 2 NICs then bonding is what you need. cman can use dual
> NICs after a fashion but it's not supported and even less well tested.
>
> Sorry.
>
> --
>
> Chrissie
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080604/8e8e2a0f/attachment.htm>

From martin.fuerstenau at oce.com  Wed Jun  4 12:12:21 2008
From: martin.fuerstenau at oce.com (Martin Fuerstenau)
Date: Wed, 4 Jun 2008 14:12:21 +0200
Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine
In-Reply-To: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com>
References: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com>
Message-ID: <1212581541.19889.24.camel@lx002140.ops.de>

Hi,

in my config here I use 2 dual port network cards in each node. I run a
2 node cluster. The nodes are in two racks in the same room.

Port 1 of Card 1 and Port 1 of card 2 are bonded to bond0 and are (for
fail over and redundancy) connected to 2 Cisco switches. This
configuration is save even in the case if on network card will fail.

Port 2 of Card 1 and Port 2 of card 2 are bionded to interface bond1.
This interface has a private non routed address (192.168....) and is
connected to the second with 2 crossed network cables. Therefore I need
no switch for the cluster internal traffic. And that means more security
because a nonexixting switch can not fail.

This configuration works well now for the last two years.

Yours
Martin F?rstenau
Oce Printing Systems

On Wed, 2008-06-04 at 18:49 +0800, sunhux G wrote:
> Hi Christine,
>  
>  
> I could have searched Redhat knowledgebase but thought would
> be easier if I clarify here.   We plan to cluster two RHES, server A
> & server B (on Ver 5.1AP) 
>  
> a)besides the regular network port for the usual network traffic,
>    we only need one additional network port per server to set up
>    the clustering, is this right?
>  
> b)what if we want to use 2 network ports, then we have to bond
>    the two network ports on server A & the two network ports on
>    server B - is this right?
>  
> c)anything we need to do on the Cisco switch's ports end? We
>    are using Cisco 6513
> 
>  
> Thanks
> U
>  
> On 6/2/08, Christine Caulfield <ccaulfie at redhat.com> wrote: 
>         Jakub Suchy wrote:
>                 Hi,
>                 I would like to know, if it's possible to run
>                 heartbeat (through cman)
>                 over two dedicated network NICs. AFAIK, in old
>                 hearbeat code, it was
>                 possible using serial + NIC. Unfortunately, I was
>                 unable to find this in
>                 any documentation and this is the first time a
>                 customer is requesting
>                 this. (I am not talking about network bonding).
>                 
>         
>         Basically, no.
>         
>         If you want to use 2 NICs then bonding is what you need. cman
>         can use dual NICs after a fashion but it's not supported and
>         even less well tested.
>         
>         Sorry.
>         
>         -- 
>         
>         Chrissie 
>         
>         
>         --
>         Linux-cluster mailing list
>         Linux-cluster at redhat.com
>         https://www.redhat.com/mailman/listinfo/linux-cluster
>         
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Visit Oce at drupa! Register online now: <http://drupa.oce.com>

This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law.

If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.

If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message.

Thank you for your co-operation.


From johannes.russek at io-consulting.net  Wed Jun  4 12:15:42 2008
From: johannes.russek at io-consulting.net (Johannes Russek)
Date: Wed, 04 Jun 2008 14:15:42 +0200
Subject: [Linux-cluster] Fencing Device Question
In-Reply-To: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com>
References: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com>
Message-ID: <4846876E.7080406@io-consulting.net>


> My question is what (if anything) can RHCS/GFS do to determine the 
> health/presence/operation of fencing devices?  If it can do something 
> to monitor the fencing devices, and discovers a bad fencing device, 
> what will it do?  For example, if I unplug the network cable for the 
> heartbeat, the node will get fenced immediately.  I never tested 
> whether the same would happen if I unplugged a fencing device.  I 
> haven't delved into the documentation in a while, but I don't remember 
> anything about a way to have redundant fencing devices, like a DRAC 
> and a network power switch.  Is there a way?

You should be able to add as many fencing devices as you like, cman 
should go through them top to bottom, if it won't get a positive 
response from the fencing script.
in my case i have IPMI, then network power switch, then fabric fencing.
Regards,
Johannes

>
> Thoughts, opinions, insight, documentation, etc would be greatly 
> appreciated.
>
> --
> Brandon


> ------------------------------------------------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From Alain.Moulle at bull.net  Wed Jun  4 12:19:15 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 04 Jun 2008 14:19:15 +0200
Subject: [Linux-cluster] CS5 / is there a tunable timer between the three
 start/stop tries ?
Message-ID: <48468843.5040300@bull.net>

Hi

With CS5, when the status of a service returns failed, the CS5 tries
to start three times the service , so we can see three start/stop
sequences if it does not start correctly each time. The following
start is always launchec just after the stop,
is there a tunable timer between the three start/stop tries ?

Regards
Alain Moull?


From mgrac at redhat.com  Wed Jun  4 12:31:41 2008
From: mgrac at redhat.com (Marek 'marx' Grac)
Date: Wed, 04 Jun 2008 14:31:41 +0200
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
References: <483ECA36.7070007@xbe.ch>
	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
Message-ID: <48468B2D.7060509@redhat.com>

Hi,

Ron Cronenwett wrote:
> I found if I did not configure SELinux with setenforce permissive, the
> /usr/share/cluster/apache.sh script did not execute. Once that runs,
> it creates
> /etc/cluster/apache/apache:"name". In that subdirectory, the script
> creates an httpd.conf file from /etc/httpd/httpd.conf. I also found
> the new httpd.conf
> had the Listen statement commented out even though I had set it to my
> clustered address in /etc/httpd/httpd. I needed to manually uncomment
> the
> Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf.
>   

IP addresses for Apache (same for MySQL, PgSQL, tomcat, ...) are taken 
from the configuration. This is the reason why original values are 
commented and replaced with those from cluster.conf (ip address should 
be a child to service and sibling to apache - as you can use this IP 
address for different resource agents)

m,


-- 
Marek Grac
Red Hat Czech s.r.o.


From Alain.Moulle at bull.net  Wed Jun  4 12:47:21 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 04 Jun 2008 14:47:21 +0200
Subject: [Linux-cluster] CS5 / about loop "Node is undead"
Message-ID: <48468ED9.3050401@bull.net>

Hi

About my problem of node entering a loop :
Jun  3 15:54:49 s_sys at xn2 qdiskd[22256]: <notice> Writing eviction notice for node 1
Jun  3 15:54:50 s_sys at xn2 qdiskd[22256]: <notice> Node 1 evicted
Jun  3 15:54:51 s_sys at xn2 qdiskd[22256]: <crit> Node 1 is undead.

I notice that just before entering this loop, I have a message :
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1"
Jun  3 15:54:48 s_sys at xn2 qdiskd[22256]: <info> Assuming master role

but never the message :
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success

Nethertheless, the service of xn1 is well failovered by xn2, but
then after the reboot of xn1, we can't start again the CS5 due
to the problem of infernal loop "Node is undead" on xn2.

whereas when it works correctly, both messages :
fencing node "xn1"
fence "xn1" success
are successive (after about 30s)

So my question is : could this pb of infernal loop "Node is undead"
be systematically due to a failed fencing phase of xn2 towards xn1 ?

PS: note that I have applied patch :
http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9

Thanks
Regards
Alain Moull?


From lp at xbe.ch  Wed Jun  4 13:31:45 2008
From: lp at xbe.ch (Lorenz Pfiffner)
Date: Wed, 04 Jun 2008 15:31:45 +0200
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
References: <483ECA36.7070007@xbe.ch>
	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
Message-ID: <48469941.6030800@xbe.ch>

Hi Ron

Thanks for replying! Your answer gave me some tipps, but none of them worked for me. I don't have SELinux enabled or permissive, it's disabled anyway.

I couldn't make it working with the apache resource. For me it seems quite unstable and it's nowhere really mentioned in any documentation I found. So please, if any RedHat guy is reading this, can 
you please improve this feature and put it into the official documentation. For example, why does the apache.sh script change the "Listen" directive? How can I execute apache.sh manually to debug the 
resource?

My workaround: I altered the default httpd script and made a script resource. In that case it's working as expected. The only thing that bothers me quite a lot is the relocation time. It takes about 
50 to 60 seconds to relocate 5 IPs, a GFS mount and the apache script resource! Is this a reasonable time? On older clusters I remember times around 5 to 10 seconds.

Kind regards
Lorenz

Ron Cronenwett wrote:
> Hi Lorenz
> 
> I had a similar problem while testing with Centos 5.1 on a VMWare
> workstation setup. One more difference, I have been using
> system-config-cluster
> to configure the cluster. Luci seemed to be giving me problems with
> setting up a mount of an NFS export. But I have not retried Luci since
> changing
> the selinux setting I mention below.
> 
> I found if I did not configure SELinux with setenforce permissive, the
> /usr/share/cluster/apache.sh script did not execute. Once that runs,
> it creates
> /etc/cluster/apache/apache:"name". In that subdirectory, the script
> creates an httpd.conf file from /etc/httpd/httpd.conf. I also found
> the new httpd.conf
> had the Listen statement commented out even though I had set it to my
> clustered address in /etc/httpd/httpd. I needed to manually uncomment
> the
> Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf.
> 
> Hope this helps.
> 
> Ron C.
> 
> 
> 
> On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner <lp at xbe.ch> wrote:
>> Hello everybody
>>
>> I have the following test setup:
>>
>> - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1
>> - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test)
>> - 4 IP resources defined
>> - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk
>>
>> Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like
>> this:
>>
>> May 28 16:18:15 testsrv clurgmgrd: [18475]: <err> Starting Service apache:test_httpd > Failed
>> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> start on apache "test_httpd" returned 1 (generic error)
>> May 28 16:18:15 testsrv clurgmgrd[18475]: <warning> #68: Failed to start service:test_proxy_http; return value: 1
>> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> Stopping service service:test_proxy_http
>> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist
>> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Stopping Service apache:test_httpd > Failed
>> May 28 16:18:16 testsrv clurgmgrd[18475]: <notice> stop on apache "test_httpd" returned 1 (generic error)
>> May 28 16:18:16 testsrv clurgmgrd[18475]: <warning> #71: Relocating failed service service:test_proxy_http
>>
>> I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with
>> a "Script Resource".
>>
>> Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation.
>>
>> Kind Regards
>> Lorenz
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From ccaulfie at redhat.com  Wed Jun  4 13:38:05 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Wed, 04 Jun 2008 14:38:05 +0100
Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine
In-Reply-To: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com>
References: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com>
Message-ID: <48469ABD.3030409@redhat.com>

sunhux G wrote:
> Hi Christine,
>  
>  
> I could have searched Redhat knowledgebase but thought would
> be easier if I clarify here.   We plan to cluster two RHES, server A
> & server B (on Ver 5.1AP)
>  
> a)besides the regular network port for the usual network traffic,
>    we only need one additional network port per server to set up
>    the clustering, is this right?

That is highly recommended, yes. You can run with just the one interface 
(or two bonded) but we always recommend that the cluster traffic is 
isolated from a main serving network


> b)what if we want to use 2 network ports, then we have to bond
>    the two network ports on server A & the two network ports on
>    server B - is this right?

That right, yes.


> c)anything we need to do on the Cisco switch's ports end? We
>    are using Cisco 6513
> 

Almost certainly :)

I'm no expert on cisco switches but there is some information about 
running openais over them here:

http://openais.org/doku.php?id=faq:cisco_switches

-- 

Chrissie


From T.Kumar at alcoa.com  Wed Jun  4 13:53:05 2008
From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS))
Date: Wed, 4 Jun 2008 09:53:05 -0400
Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 -
	Impact analysis.
In-Reply-To: <20080531160007.B6F1061A461@hormel.redhat.com>
References: <20080531160007.B6F1061A461@hormel.redhat.com>
Message-ID: <0C3FC6B507AF684199E57BFCA3EAB5532582D7DB@NOANDC-MXU11.NOA.Alcoa.com>


 here is the RHEL version details.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
linux-cluster-request at redhat.com
Sent: Saturday, May 31, 2008 12:00 PM
To: linux-cluster at redhat.com
Subject: Linux-cluster Digest, Vol 49, Issue 39

Send Linux-cluster mailing list submissions to
	linux-cluster at redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
	linux-cluster-request at redhat.com

You can reach the person managing the list at
	linux-cluster-owner at redhat.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 -	Impact
      analysis. (Kumar, T Santhosh (TCS))
   2. Re: Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 -	Impact
      analysis. (Roger Pe?a)


----------------------------------------------------------------------

Message: 1
Date: Fri, 30 May 2008 13:25:07 -0400
From: "Kumar, T Santhosh \(TCS\)" <T.Kumar at alcoa.com>
Subject: [Linux-cluster] Upgrading to
	lvm2-cluster-2.02.32-4.el5.x86_64 -	Impact analysis.
To: <linux-cluster at redhat.com>
Message-ID:
	
<0C3FC6B507AF684199E57BFCA3EAB5532565630D at NOANDC-MXU11.NOA.Alcoa.com>
Content-Type: text/plain;	charset="us-ascii"


I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm  along with
the other three dependencies  listed below.

lvm2-cluster-2.02.32-4.el5.x86_64.rpm
device-mapper-event-1.02.24-1.el5.x86_64.rpm
device-mapper-1.02.24-1.el5.x86_64.rpm

I prefer to do this as I realise  the below.

lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which resolves the
"clvmd -R  did not work as expected". 

Do any one know of any problems which might come with upgrading the
lvm2, device mapper packages.


------------------------------

Message: 2
Date: Fri, 30 May 2008 11:14:41 -0700 (PDT)
From: Roger Pe?a <orkcu at yahoo.com>
Subject: Re: [Linux-cluster] Upgrading to
	lvm2-cluster-2.02.32-4.el5.x86_64 -	Impact analysis.
To: linux clustering <linux-cluster at redhat.com>
Message-ID: <767810.9219.qm at web50605.mail.re2.yahoo.com>
Content-Type: text/plain; charset=us-ascii


--- On Fri, 5/30/08, Kumar, T Santhosh (TCS) <T.Kumar at alcoa.com> wrote:

> From: Kumar, T Santhosh (TCS) <T.Kumar at alcoa.com>
> Subject: [Linux-cluster] Upgrading to
lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis.
> To: linux-cluster at redhat.com
> Received: Friday, May 30, 2008, 1:25 PM
> I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm 
> along with
> the other three dependencies  listed below.
> 
> lvm2-cluster-2.02.32-4.el5.x86_64.rpm
> device-mapper-event-1.02.24-1.el5.x86_64.rpm
> device-mapper-1.02.24-1.el5.x86_64.rpm
> 
> I prefer to do this as I realise  the below.
> 
> lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which
> resolves the
> "clvmd -R  did not work as expected". 
> 
> Do any one know of any problems which might come with
> upgrading the
> lvm2, device mapper packages.

I suggest you to take a look in bugzilla.
I dont have a linux server in my hand right now to check so I dont know
tom what RHEL release you are refering, but we got some clvm problems
when we update a RHEL4.5 to RHEL4.6 + update.
and also there is bug, fixed for 5.2 but dont know for 4.6, that I think
you should into, it was discussed in this list days ago (subject: LVM
manager or something)

cu
roger


      __________________________________________________________________
Be smarter than spam. See how smart SpamGuard is at giving junk email
the boot with the All-new Yahoo! Mail.  Click on Options in Mail and
switch to New Mail today or register for free at http://mail.yahoo.ca


------------------------------

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 49, Issue 39
*********************************************


From kri_thi at yahoo.com  Wed Jun  4 14:53:09 2008
From: kri_thi at yahoo.com (krishnamurthi G)
Date: Wed, 4 Jun 2008 07:53:09 -0700 (PDT)
Subject: [Linux-cluster] Any group for VCS cluster
Message-ID: <412748.77396.qm@web90407.mail.mud.yahoo.com>

Hi,
Is there any group to get more info on VCS cluster.
Thanks in advance
-Krishna


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080604/9378c660/attachment.htm>

From corey.kovacs at gmail.com  Wed Jun  4 18:18:47 2008
From: corey.kovacs at gmail.com (Corey Kovacs)
Date: Wed, 4 Jun 2008 19:18:47 +0100
Subject: [Linux-cluster] gfs_controld
Message-ID: <7d6e8da40806041118p73484d53r3c15510dfb536d9c@mail.gmail.com>

Previous to a recent upgrade to RHEL5.2 from RHEL5.1, I was using KDE
as my default desktop
with a home dir mounted from and nfs exported gfs2 filesystem. After
the upgrade, kde hangs due
to hundreds (even thousands) of the following errors....

gfs_controld[XXXX]: plock result write err 0 errno 2

the exports are nfs ver 3 (i have some older clients) ant proto=udp

is this a known issue? is there a fix available?


thanks


-corey


From kri_thi at yahoo.com  Thu Jun  5 09:40:51 2008
From: kri_thi at yahoo.com (krishnamurthi G)
Date: Thu, 5 Jun 2008 02:40:51 -0700 (PDT)
Subject: [Linux-cluster] Any group for VCS cluster
Message-ID: <576986.63897.qm@web90407.mail.mud.yahoo.com>

Hi ,

As part of port activity we are planning to port VCS cluster on Windows. We will make use of CLI on UNIX, whereas API are being used on Windows. I am newbie to Windows world. I would appreciate if somebody give me pointers/referrance or any active group ( specify group name).

Warm Regards
- Krishna


----- Original Message ----
From: krishnamurthi G <kri_thi at yahoo.com>
To: linux clustering <linux-cluster at redhat.com>
Sent: Wednesday, June 4, 2008 8:23:09 PM
Subject: [Linux-cluster] Any group for VCS cluster


Hi,
Is there any group to get more info on VCS cluster.
 
Thanks in advance
-Krishna


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080605/23a969b0/attachment.htm>

From mgrac at redhat.com  Thu Jun  5 15:24:24 2008
From: mgrac at redhat.com (Marek 'marx' Grac)
Date: Thu, 05 Jun 2008 17:24:24 +0200
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <48469941.6030800@xbe.ch>
References: <483ECA36.7070007@xbe.ch>	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
	<48469941.6030800@xbe.ch>
Message-ID: <48480528.6000708@redhat.com>

Hi,

Lorenz Pfiffner wrote:
> Hi Ron
>
> I couldn't make it working with the apache resource. For me it seems 
> quite unstable and it's nowhere really mentioned in any documentation 
> I found. So please, if any RedHat guy is reading this, can you please 
> improve this feature and put it into the official documentation. For 
> example, why does the apache.sh script change the "Listen" directive?
Look at my previous post to this thread. IMHO unstable is something that 
does not work.

> How can I execute apache.sh manually to debug the resource?
>
If you want to debug, the best way is to run resource group manager in 
debug mode. So stop it in all machines, and run clurgmgrd -fd  (stay 
forward and debug). Resource agents tries to log as much as is useful 
and you will see everything on output. If you want to run this script 
directly, you will have to setup all environment variables OCF_*.

> My workaround: I altered the default httpd script and made a script 
> resource. In that case it's working as expected. The only thing that 
> bothers me quite a lot is the relocation time. It takes about 50 to 60 
> seconds to relocate 5 IPs, a GFS mount and the apache script resource! 
> Is this a reasonable time? On older clusters I remember times around 5 
> to 10 seconds.

Default init script for httpd, mysqld, ... will work for you if you have 
only one httpd on your cluster. It is not suitable for running several 
instances on same machine. This is one of the reasons why we need 
resource agents.


-- 
Marek Grac
Red Hat Czech s.r.o.


From david.costakos at gmail.com  Thu Jun  5 20:44:39 2008
From: david.costakos at gmail.com (Dave Costakos)
Date: Thu, 5 Jun 2008 13:44:39 -0700
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <48469941.6030800@xbe.ch>
References: <483ECA36.7070007@xbe.ch>
	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
	<48469941.6030800@xbe.ch>
Message-ID: <6b6836c60806051344m345b05a5x96e4bdd43fdffe4b@mail.gmail.com>

For what it's worth,  Lorenz, sometimes it's the simplest things that cause
errors. I had this same error.  It turned out that the parent directory for
the pid file didn't exist.  It's complaining about
/var/run/cluster/apache/apache:test_httpd.pid.  In my case /var/run/cluster
existed but /var/run/cluster/apache did not.  Can you confirm that
/var/run/cluster/apache exists?

-Dave.


On Wed, Jun 4, 2008 at 6:31 AM, Lorenz Pfiffner <lp at xbe.ch> wrote:

> Hi Ron
>
> Thanks for replying! Your answer gave me some tipps, but none of them
> worked for me. I don't have SELinux enabled or permissive, it's disabled
> anyway.
>
> I couldn't make it working with the apache resource. For me it seems quite
> unstable and it's nowhere really mentioned in any documentation I found. So
> please, if any RedHat guy is reading this, can you please improve this
> feature and put it into the official documentation. For example, why does
> the apache.sh script change the "Listen" directive? How can I execute
> apache.sh manually to debug the resource?
>
> My workaround: I altered the default httpd script and made a script
> resource. In that case it's working as expected. The only thing that bothers
> me quite a lot is the relocation time. It takes about 50 to 60 seconds to
> relocate 5 IPs, a GFS mount and the apache script resource! Is this a
> reasonable time? On older clusters I remember times around 5 to 10 seconds.
>
> Kind regards
> Lorenz
>
>
> Ron Cronenwett wrote:
>
>> Hi Lorenz
>>
>> I had a similar problem while testing with Centos 5.1 on a VMWare
>> workstation setup. One more difference, I have been using
>> system-config-cluster
>> to configure the cluster. Luci seemed to be giving me problems with
>> setting up a mount of an NFS export. But I have not retried Luci since
>> changing
>> the selinux setting I mention below.
>>
>> I found if I did not configure SELinux with setenforce permissive, the
>> /usr/share/cluster/apache.sh script did not execute. Once that runs,
>> it creates
>> /etc/cluster/apache/apache:"name". In that subdirectory, the script
>> creates an httpd.conf file from /etc/httpd/httpd.conf. I also found
>> the new httpd.conf
>> had the Listen statement commented out even though I had set it to my
>> clustered address in /etc/httpd/httpd. I needed to manually uncomment
>> the
>> Listen statement on each node in
>> /etc/cluster/apache/apache:"name"/httpd.conf.
>>
>> Hope this helps.
>>
>> Ron C.
>>
>>
>>
>> On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner <lp at xbe.ch> wrote:
>>
>>> Hello everybody
>>>
>>> I have the following test setup:
>>>
>>> - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1
>>> - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a
>>> test)
>>> - 4 IP resources defined
>>> - GFS over DRBD, doesn't matter, because it doesn't even work on a local
>>> disk
>>>
>>> Now I would like to have an "Apache Resource" which i can select in the
>>> luci interface. I assume it's using the /usr/share/cluster/apache.sh script.
>>> If I try to start it, the error message looks like
>>> this:
>>>
>>> May 28 16:18:15 testsrv clurgmgrd: [18475]: <err> Starting Service
>>> apache:test_httpd > Failed
>>> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> start on apache
>>> "test_httpd" returned 1 (generic error)
>>> May 28 16:18:15 testsrv clurgmgrd[18475]: <warning> #68: Failed to start
>>> service:test_proxy_http; return value: 1
>>> May 28 16:18:15 testsrv clurgmgrd[18475]: <notice> Stopping service
>>> service:test_proxy_http
>>> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Checking Existence Of
>>> File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] >
>>> Failed - File Doesn't Exist
>>> May 28 16:18:16 testsrv clurgmgrd: [18475]: <err> Stopping Service
>>> apache:test_httpd > Failed
>>> May 28 16:18:16 testsrv clurgmgrd[18475]: <notice> stop on apache
>>> "test_httpd" returned 1 (generic error)
>>> May 28 16:18:16 testsrv clurgmgrd[18475]: <warning> #71: Relocating
>>> failed service service:test_proxy_http
>>>
>>> I've another cluster in which I had to alter the default init.d/httpd
>>> script to be able to run multiple apache instances (not vhosts) on one
>>> server. But there I have the Apache Service configured with
>>> a "Script Resource".
>>>
>>> Is this supposed to work of is it a feature in development? I don't see
>>> something like "Apache Resource" in the current documentation.
>>>
>>> Kind Regards
>>> Lorenz
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Dave Costakos
mailto:david.costakos at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080605/b027f452/attachment.htm>

From rfpike at fedex.com  Fri Jun  6 19:45:37 2008
From: rfpike at fedex.com (Robbie Pike)
Date: Fri, 6 Jun 2008 14:45:37 -0500
Subject: [Linux-cluster] cluster.conf settings
Message-ID: <A1E84F72C73C3E4CAA5FB3A8F3050C930BF254CD@MEMEXCH06V.corp.ds.fedex.com>

I'm working on procedures for installing Cluster Suite and setting up
cluster. I always try to do everything command-line first before using
anything like conga or modifying the configuration file directly. Is
there a way to add fence_daemon post_join_delay post_fail_delay settings
to the cluster.conf using ccs_tool? What things can only be added to the
cluster.conf by editing the file? Any help is appreciated.

 
R.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080606/640187b6/attachment.htm>

From fdinitto at redhat.com  Mon Jun  9 07:42:30 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 9 Jun 2008 09:42:30 +0200 (CEST)
Subject: [Linux-cluster] Cluster 2.99.04 (development snapshot) released
Message-ID: <Pine.LNX.4.64.0806090936530.5892@trider-g7>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


The cluster team and its community are proud to announce the 5th release
from the master branch: 2.99.04.

The 2.99.XX releases are _NOT_ meant to be used for production
environments.. yet.

You have been warned: *this code will have no mercy* for your servers and
your data.

The master branch is the main development tree that receives all new
features, code, clean up and a whole brand new set of bugs,

At some point in time this code will become the 3.0 stable release.

Everybody with test equipment and time to spare, is highly encouraged to
download, install and test the 2.99 releases and more important report
problems.

In order to build the 2.99.04 release you will need:

- - openais 0.83 or higher
- - linux kernel (git snapshot or 2.6.26-rc3) from
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
(but can run on 2.6.25 in compatibility mode)

NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We
are still shipping shared libraries but remember that they can change
anytime without warning. A bunch of new shared libraries have been added.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.04.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.99.03):

Bob Peterson (3):
       Fix gfs2_edit bugs with non-4K block sizes
       Make gfs2_edit more friendly to automated testing.
       Updates to gfs2_edit man page for new option.

Fabio M. Di Nitto (12):
       [MISC] Make several API's private again
       [CONFIG] Add full xpath support to libccs
       [CMAN] Bump library version
       [BUILD] Switch libdlmcontrol back to shared library
       [BUILD] Collapse common library makefile bits in libs.mk
       [MISC] Remove obsolete and empty files
       [MISC] Add top level licence files
       [MISC] Cleanup licence, copyright and header duplication
       [MISC] Tree cleanup
       [BUILD] Prepare infrastructure for perl/python bindings
       [GNBD/FENCE] Move fence_gnbd agent where it belongs
       [MISC] Update top level copyright file

Marek 'marx' Grac (3):
       [FENCE] Fix #446995: Unknown option
       [FENCE] Fix: 447378: fence_apc unable to connect via ssh to APC 7900
       Fixes #445662: names of resources with spaces are mishandled

Mark Hlawatschek (1):
       mount.gfs2: skip mtab updates

  COPYING.applications                             |  339 +++
  COPYING.libraries                                |  510 ++++
  COPYRIGHT                                        |  230 ++
  Makefile                                         |   15 +-
  README.licence                                   |   40 +
  bindings/Makefile                                |    4 +
  bindings/perl/Makefile                           |    4 +
  bindings/python/Makefile                         |    4 +
  ccs/Makefile                                     |   12 -
  ccs/ccs_tool/Makefile                            |   12 -
  ccs/ccs_tool/editconf.c                          |   12 -
  ccs/ccs_tool/editconf.h                          |   12 -
  ccs/ccs_tool/old_parser.c                        |   12 -
  ccs/ccs_tool/update.c                            |   11 -
  ccs/ccs_tool/update.h                            |   12 -
  ccs/ccs_tool/upgrade.c                           |   11 -
  ccs/ccs_tool/upgrade.h                           |   12 -
  ccs/ccsais/Makefile                              |   12 -
  ccs/ccsais/config.c                              |   11 -
  ccs/daemon/Makefile                              |   12 -
  ccs/daemon/ccsd.c                                |   11 -
  ccs/daemon/cluster_mgr.c                         |   11 -
  ccs/daemon/cluster_mgr.h                         |   11 -
  ccs/daemon/cnx_mgr.c                             |   11 -
  ccs/daemon/cnx_mgr.h                             |   12 -
  ccs/daemon/globals.c                             |   11 -
  ccs/daemon/globals.h                             |   11 -
  ccs/daemon/misc.c                                |   11 -
  ccs/daemon/misc.h                                |   11 -
  ccs/include/comm_headers.h                       |   12 -
  ccs/include/debug.h                              |   12 -
  ccs/libccscompat/Makefile                        |   28 +-
  ccs/libccscompat/libccscompat.c                  |   11 -
  ccs/libccscompat/libccscompat.h                  |   11 -
  ccs/man/Makefile                                 |   13 -
  ccs/man/ccs.7                                    |    6 -
  ccs/man/ccs_tool.8                               |    7 -
  ccs/man/ccsd.8                                   |    7 -
  ccs/man/cluster.conf.5                           |    4 -
  cman/Makefile                                    |   13 -
  cman/cman_tool/Makefile                          |   12 -
  cman/cman_tool/cman_tool.h                       |   13 -
  cman/cman_tool/join.c                            |   13 -
  cman/cman_tool/main.c                            |   13 -
  cman/daemon/Makefile                             |   12 -
  cman/daemon/ais.c                                |   12 -
  cman/daemon/ais.h                                |   11 -
  cman/daemon/barrier.c                            |   13 -
  cman/daemon/barrier.h                            |   12 -
  cman/daemon/cman-preconfig.c                     |   11 -
  cman/daemon/cman.h                               |   12 -
  cman/daemon/cmanconfig.c                         |   11 -
  cman/daemon/cmanconfig.h                         |   12 -
  cman/daemon/cnxman-private.h                     |   13 -
  cman/daemon/cnxman-socket.h                      |   13 -
  cman/daemon/commands.c                           |   13 -
  cman/daemon/commands.h                           |   12 -
  cman/daemon/daemon.c                             |   11 -
  cman/daemon/daemon.h                             |   12 -
  cman/daemon/list.h                               |   15 -
  cman/daemon/logging.c                            |   12 -
  cman/daemon/logging.h                            |   11 -
  cman/daemon/nodelist.h                           |   13 -
  cman/init.d/Makefile                             |   12 -
  cman/lib/Makefile                                |   43 +-
  cman/lib/libcman.c                               |   22 -
  cman/lib/libcman.h                               |   24 +-
  cman/man/Makefile                                |   13 -
  cman/man/cman.5                                  |    3 -
  cman/qdisk/Makefile                              |   12 -
  cman/qdisk/bitmap.c                              |   19 -
  cman/qdisk/crc32.c                               |   20 -
  cman/qdisk/daemon_init.c                         |   19 -
  cman/qdisk/disk.c                                |   19 -
  cman/qdisk/disk.h                                |   20 -
  cman/qdisk/disk_util.c                           |   20 -
  cman/qdisk/main.c                                |   20 -
  cman/qdisk/mkqdisk.c                             |   20 -
  cman/qdisk/platform.h                            |   19 -
  cman/qdisk/proc.c                                |   20 -
  cman/qdisk/scandisk.c                            |   19 -
  cman/qdisk/scandisk.h                            |   18 -
  cman/qdisk/score.c                               |   20 -
  cman/qdisk/score.h                               |   20 -
  cman/tests/Makefile                              |   12 -
  cman/tests/qwait.c                               |    9 -
  cman/tests/user_service.c                        |   13 -
  cmirror-kernel/src/dm-clog-tfr.c                 |   83 -
  cmirror-kernel/src/dm-clog-tfr.h                 |   40 -
  cmirror-kernel/src/dm-clog.c                     |  624 -----
  cmirror/Makefile                                 |   14 -
  config/Makefile                                  |   13 -
  config/libs/Makefile                             |   13 -
  config/libs/libccsconfdb/Makefile                |   44 +-
  config/libs/libccsconfdb/ccs.h                   |   13 +-
  config/libs/libccsconfdb/libccs.c                |  298 ++-
  config/tools/Makefile                            |   13 -
  config/tools/ccs_test/Makefile                   |   12 -
  config/tools/ccs_test/ccs_test.c                 |   11 -
  config/tools/man/Makefile                        |   13 -
  config/tools/man/ccs_test.8                      |    6 -
  configure                                        |   49 +-
  csnap-kernel/Makefile                            |   14 -
  csnap-kernel/patches/2.6.15/00001.patch          |   16 -
  csnap-kernel/patches/2.6.15/00002.patch          |   32 -
  csnap-kernel/patches/2.6.15/00003.patch          |   30 -
  csnap-kernel/patches/2.6.9/00001.patch           |   16 -
  csnap-kernel/patches/2.6.9/00002.patch           |   32 -
  csnap-kernel/patches/2.6.9/00003.patch           |   30 -
  csnap-kernel/src/Makefile                        |   69 -
  csnap-kernel/src/dm-csnap.c                      | 1147 ---------
  csnap-kernel/src/dm-csnap.h                      |   70 -
  csnap/COPYING                                    |  340 ---
  csnap/Makefile                                   |   15 -
  csnap/README                                     |   67 -
  csnap/doc/cluster.snapshot.design.html           | 1467 -----------
  csnap/doc/csnap.ps                               | 2994 ----------------------
  csnap/patches/csnap-2.6.7-2.4.26                 |  195 --
  csnap/patches/csnap-2.6.8.1                      | 1321 ----------
  csnap/src/Makefile                               |   44 -
  csnap/src/agent.c                                |  359 ---
  csnap/src/buffer.c                               |  268 --
  csnap/src/buffer.h                               |   60 -
  csnap/src/buffertest.c                           |   15 -
  csnap/src/create.c                               |   58 -
  csnap/src/csnap.c                                | 2623 -------------------
  csnap/src/csnap.h                                |   44 -
  csnap/src/list.h                                 |   64 -
  csnap/src/sock.h                                 |   55 -
  csnap/src/trace.h                                |    7 -
  csnap/tests/Makefile                             |   49 -
  csnap/tests/devpoke.c                            |   55 -
  csnap/tests/devspam.c                            |   83 -
  csnap/tests/testclient.c                         |  185 --
  dlm/Makefile                                     |   12 -
  dlm/libdlm/Makefile                              |   21 +-
  dlm/libdlm/libdlm.c                              |   24 -
  dlm/libdlm/libdlm.h                              |   23 -
  dlm/libdlmcontrol/Makefile                       |   42 +-
  dlm/libdlmcontrol/libdlmcontrol.h                |   22 -
  dlm/libdlmcontrol/main.c                         |   12 -
  dlm/man/Makefile                                 |   12 -
  dlm/man/dlm_tool.8                               |    6 -
  dlm/tests/Makefile                               |   12 -
  dlm/tests/usertest/Makefile                      |   12 -
  dlm/tests/usertest/alternate-lvb.c               |   12 -
  dlm/tests/usertest/dlmtest2.c                    |   12 -
  dlm/tests/usertest/threads.c                     |   12 -
  dlm/tool/Makefile                                |   12 -
  dlm/tool/main.c                                  |   12 -
  fence/Makefile                                   |   13 -
  fence/agents/Makefile                            |   13 -
  fence/agents/apc/Makefile                        |   13 -
  fence/agents/apc/fence_apc.py                    |    3 +-
  fence/agents/apc_snmp/Makefile                   |   13 -
  fence/agents/apc_snmp/README                     |    2 -
  fence/agents/apc_snmp/fence_apc_snmp.py          |   13 -
  fence/agents/baytech/Makefile                    |   13 -
  fence/agents/baytech/fence_baytech.pl            |   13 -
  fence/agents/brocade/Makefile                    |   13 -
  fence/agents/brocade/fence_brocade.pl            |   13 -
  fence/agents/bullpap/Makefile                    |   13 -
  fence/agents/bullpap/fence_bullpap.pl            |   12 -
  fence/agents/cpint/Makefile                      |   13 -
  fence/agents/cpint/fence_cpint.pl                |   13 -
  fence/agents/drac/Makefile                       |   13 -
  fence/agents/drac/fence_drac.pl                  |   12 -
  fence/agents/drac/fence_drac5.py                 |    3 +-
  fence/agents/egenera/Makefile                    |   13 -
  fence/agents/egenera/fence_egenera.pl            |   13 -
  fence/agents/gnbd/Makefile                       |   23 +
  fence/agents/gnbd/main.c                         |  327 +++
  fence/agents/ibmblade/Makefile                   |   13 -
  fence/agents/ibmblade/fence_ibmblade.pl          |   13 -
  fence/agents/ifmib/Makefile                      |   13 -
  fence/agents/ilo/Makefile                        |   13 -
  fence/agents/ilo/fence_ilo.py                    |    3 +-
  fence/agents/ipmilan/Makefile                    |   13 -
  fence/agents/ipmilan/expect.c                    |   19 -
  fence/agents/ipmilan/expect.h                    |   16 -
  fence/agents/ipmilan/ipmilan.c                   |   17 -
  fence/agents/lib/Makefile                        |   14 -
  fence/agents/lib/fencing.py.py                   |   14 +-
  fence/agents/lpar/Makefile                       |   13 -
  fence/agents/lpar/fence_lpar.py                  |    3 +-
  fence/agents/manual/Makefile                     |   13 -
  fence/agents/manual/fence_ack_manual.sh          |   12 -
  fence/agents/mcdata/Makefile                     |   13 -
  fence/agents/mcdata/fence_mcdata.pl              |   14 -
  fence/agents/rackswitch/Makefile                 |   13 -
  fence/agents/rackswitch/do_rack.c                |   12 -
  fence/agents/rps10/Makefile                      |   13 -
  fence/agents/rps10/rps10.c                       |   18 -
  fence/agents/rsa/Makefile                        |   13 -
  fence/agents/rsa/fence_rsa.py                    |   13 -
  fence/agents/rsb/Makefile                        |   13 -
  fence/agents/rsb/fence_rsb.py                    |   13 -
  fence/agents/sanbox2/Makefile                    |   13 -
  fence/agents/sanbox2/fence_sanbox2.pl            |   13 -
  fence/agents/scsi/Makefile                       |   12 -
  fence/agents/vixel/Makefile                      |   13 -
  fence/agents/vixel/fence_vixel.pl                |   13 -
  fence/agents/vmware/Makefile                     |   13 -
  fence/agents/vmware/fence_vmware.pl              |   15 -
  fence/agents/wti/Makefile                        |   13 -
  fence/agents/wti/fence_wti.py                    |    3 +-
  fence/agents/xcat/Makefile                       |   13 -
  fence/agents/xcat/fence_xcat.pl                  |    9 -
  fence/agents/xvm/Makefile                        |   12 -
  fence/agents/xvm/debug.c                         |   18 -
  fence/agents/xvm/debug.h                         |   18 -
  fence/agents/xvm/fence_xvm.c                     |   18 -
  fence/agents/xvm/fence_xvmd.c                    |   18 -
  fence/agents/xvm/ip_lookup.c                     |   18 -
  fence/agents/xvm/ip_lookup.h                     |   18 -
  fence/agents/xvm/mcast.c                         |   18 -
  fence/agents/xvm/mcast.h                         |   18 -
  fence/agents/xvm/options-ccs.c                   |   18 -
  fence/agents/xvm/options.c                       |   18 -
  fence/agents/xvm/options.h                       |   18 -
  fence/agents/xvm/simple_auth.c                   |   18 -
  fence/agents/xvm/simple_auth.h                   |   18 -
  fence/agents/xvm/tcp.c                           |   19 -
  fence/agents/xvm/tcp.h                           |   18 -
  fence/agents/xvm/virt.c                          |   18 -
  fence/agents/xvm/virt.h                          |   18 -
  fence/agents/xvm/vm_states.c                     |   18 -
  fence/agents/xvm/xvm.h                           |   18 -
  fence/agents/zvm/Makefile                        |   13 -
  fence/agents/zvm/fence_zvm.pl                    |   13 -
  fence/fence_node/Makefile                        |   20 +-
  fence/fence_node/fence_node.c                    |   13 -
  fence/fence_tool/Makefile                        |   20 +-
  fence/fence_tool/fence_tool.c                    |   13 -
  fence/fenced/Makefile                            |   18 +-
  fence/fenced/config.c                            |   12 -
  fence/fenced/cpg.c                               |   12 -
  fence/fenced/fd.h                                |   13 -
  fence/fenced/fenced.h                            |   12 -
  fence/fenced/group.c                             |   12 -
  fence/fenced/main.c                              |   13 -
  fence/fenced/member_cman.c                       |   12 -
  fence/fenced/recover.c                           |   13 -
  fence/include/linux_endian.h                     |   13 -
  fence/libfence/Makefile                          |   42 +-
  fence/libfence/agent.c                           |   13 -
  fence/libfence/libfence.h                        |   21 -
  fence/libfenced/Makefile                         |   42 +-
  fence/libfenced/libfenced.h                      |   22 -
  fence/libfenced/main.c                           |   12 -
  fence/man/Makefile                               |   14 +-
  fence/man/fence.8                                |    7 -
  fence/man/fence_ack_manual.8                     |    7 -
  fence/man/fence_apc.8                            |    7 -
  fence/man/fence_baytech.8                        |    7 -
  fence/man/fence_bladecenter.8                    |    7 -
  fence/man/fence_brocade.8                        |    7 -
  fence/man/fence_bullpap.8                        |    7 -
  fence/man/fence_cpint.8                          |    7 -
  fence/man/fence_drac.8                           |    6 -
  fence/man/fence_egenera.8                        |    7 -
  fence/man/fence_gnbd.8                           |   84 +
  fence/man/fence_ibmblade.8                       |    7 -
  fence/man/fence_ilo.8                            |    7 -
  fence/man/fence_ipmilan.8                        |    7 -
  fence/man/fence_manual.8                         |    7 -
  fence/man/fence_mcdata.8                         |    7 -
  fence/man/fence_node.8                           |    7 -
  fence/man/fence_rackswitch.8                     |    7 -
  fence/man/fence_rib.8                            |    7 -
  fence/man/fence_rsa.8                            |    6 -
  fence/man/fence_rsb.8                            |    6 -
  fence/man/fence_sanbox2.8                        |    7 -
  fence/man/fence_scsi.8                           |    6 -
  fence/man/fence_tool.8                           |    7 -
  fence/man/fence_vixel.8                          |    7 -
  fence/man/fence_wti.8                            |    7 -
  fence/man/fence_xcat.8                           |    3 -
  fence/man/fence_xvm.8                            |    7 -
  fence/man/fence_xvmd.8                           |    7 -
  fence/man/fence_zvm.8                            |    7 -
  fence/man/fenced.8                               |    7 -
  gfs-kernel/src/gfs/Makefile                      |   13 -
  gfs-kernel/src/gfs/acl.c                         |   13 -
  gfs-kernel/src/gfs/acl.h                         |   13 -
  gfs-kernel/src/gfs/bits.c                        |   13 -
  gfs-kernel/src/gfs/bits.h                        |   13 -
  gfs-kernel/src/gfs/bmap.c                        |   13 -
  gfs-kernel/src/gfs/bmap.h                        |   13 -
  gfs-kernel/src/gfs/daemon.c                      |   13 -
  gfs-kernel/src/gfs/daemon.h                      |   13 -
  gfs-kernel/src/gfs/dio.c                         |   13 -
  gfs-kernel/src/gfs/dio.h                         |   13 -
  gfs-kernel/src/gfs/dir.c                         |   13 -
  gfs-kernel/src/gfs/dir.h                         |   13 -
  gfs-kernel/src/gfs/eaops.c                       |   13 -
  gfs-kernel/src/gfs/eaops.h                       |   13 -
  gfs-kernel/src/gfs/eattr.c                       |   13 -
  gfs-kernel/src/gfs/eattr.h                       |   13 -
  gfs-kernel/src/gfs/file.c                        |   13 -
  gfs-kernel/src/gfs/file.h                        |   13 -
  gfs-kernel/src/gfs/fixed_div64.h                 |   34 -
  gfs-kernel/src/gfs/format.h                      |   13 -
  gfs-kernel/src/gfs/gfs.h                         |   13 -
  gfs-kernel/src/gfs/gfs_ioctl.h                   |   13 -
  gfs-kernel/src/gfs/gfs_ondisk.h                  |   13 -
  gfs-kernel/src/gfs/glock.c                       |   13 -
  gfs-kernel/src/gfs/glock.h                       |   13 -
  gfs-kernel/src/gfs/glops.c                       |   13 -
  gfs-kernel/src/gfs/glops.h                       |   13 -
  gfs-kernel/src/gfs/incore.h                      |   13 -
  gfs-kernel/src/gfs/inode.c                       |   13 -
  gfs-kernel/src/gfs/inode.h                       |   13 -
  gfs-kernel/src/gfs/ioctl.c                       |   13 -
  gfs-kernel/src/gfs/ioctl.h                       |   13 -
  gfs-kernel/src/gfs/lm.c                          |    9 -
  gfs-kernel/src/gfs/lm.h                          |   13 -
  gfs-kernel/src/gfs/log.c                         |   13 -
  gfs-kernel/src/gfs/log.h                         |   13 -
  gfs-kernel/src/gfs/lops.c                        |   13 -
  gfs-kernel/src/gfs/lops.h                        |   13 -
  gfs-kernel/src/gfs/lvb.c                         |   13 -
  gfs-kernel/src/gfs/lvb.h                         |   13 -
  gfs-kernel/src/gfs/main.c                        |   13 -
  gfs-kernel/src/gfs/mount.c                       |   13 -
  gfs-kernel/src/gfs/mount.h                       |   13 -
  gfs-kernel/src/gfs/ondisk.c                      |   13 -
  gfs-kernel/src/gfs/ops_address.c                 |   13 -
  gfs-kernel/src/gfs/ops_address.h                 |   13 -
  gfs-kernel/src/gfs/ops_dentry.c                  |   13 -
  gfs-kernel/src/gfs/ops_dentry.h                  |   13 -
  gfs-kernel/src/gfs/ops_export.c                  |   13 -
  gfs-kernel/src/gfs/ops_export.h                  |   13 -
  gfs-kernel/src/gfs/ops_file.c                    |   13 -
  gfs-kernel/src/gfs/ops_file.h                    |   13 -
  gfs-kernel/src/gfs/ops_fstype.c                  |   10 -
  gfs-kernel/src/gfs/ops_fstype.h                  |   13 -
  gfs-kernel/src/gfs/ops_inode.c                   |   13 -
  gfs-kernel/src/gfs/ops_inode.h                   |   13 -
  gfs-kernel/src/gfs/ops_super.c                   |   13 -
  gfs-kernel/src/gfs/ops_super.h                   |   13 -
  gfs-kernel/src/gfs/ops_vm.c                      |   13 -
  gfs-kernel/src/gfs/ops_vm.h                      |   13 -
  gfs-kernel/src/gfs/page.c                        |   13 -
  gfs-kernel/src/gfs/page.h                        |   13 -
  gfs-kernel/src/gfs/proc.c                        |   13 -
  gfs-kernel/src/gfs/proc.h                        |   13 -
  gfs-kernel/src/gfs/quota.c                       |   13 -
  gfs-kernel/src/gfs/quota.h                       |   13 -
  gfs-kernel/src/gfs/recovery.c                    |   13 -
  gfs-kernel/src/gfs/recovery.h                    |   13 -
  gfs-kernel/src/gfs/rgrp.c                        |   13 -
  gfs-kernel/src/gfs/rgrp.h                        |   13 -
  gfs-kernel/src/gfs/super.c                       |   13 -
  gfs-kernel/src/gfs/super.h                       |   13 -
  gfs-kernel/src/gfs/sys.c                         |   13 -
  gfs-kernel/src/gfs/sys.h                         |   13 -
  gfs-kernel/src/gfs/trans.c                       |   13 -
  gfs-kernel/src/gfs/trans.h                       |   13 -
  gfs-kernel/src/gfs/unlinked.c                    |   13 -
  gfs-kernel/src/gfs/unlinked.h                    |   13 -
  gfs-kernel/src/gfs/util.c                        |   13 -
  gfs-kernel/src/gfs/util.h                        |   13 -
  gfs/Makefile                                     |   13 -
  gfs/gfs_debug/Makefile                           |   13 -
  gfs/gfs_debug/basic.c                            |   13 -
  gfs/gfs_debug/basic.h                            |   13 -
  gfs/gfs_debug/block_device.c                     |   13 -
  gfs/gfs_debug/block_device.h                     |   13 -
  gfs/gfs_debug/gfs_debug.h                        |   13 -
  gfs/gfs_debug/main.c                             |   13 -
  gfs/gfs_debug/ondisk.c                           |   13 -
  gfs/gfs_debug/readfile.c                         |   13 -
  gfs/gfs_debug/readfile.h                         |   13 -
  gfs/gfs_debug/util.c                             |   13 -
  gfs/gfs_debug/util.h                             |   13 -
  gfs/gfs_edit/Makefile                            |   13 -
  gfs/gfs_edit/gfshex.c                            |   13 -
  gfs/gfs_edit/gfshex.h                            |   13 -
  gfs/gfs_edit/hexedit.c                           |   13 -
  gfs/gfs_edit/hexedit.h                           |   13 -
  gfs/gfs_fsck/Makefile                            |   12 -
  gfs/gfs_fsck/bio.c                               |   13 -
  gfs/gfs_fsck/bio.h                               |   13 -
  gfs/gfs_fsck/bitmap.c                            |   12 -
  gfs/gfs_fsck/bitmap.h                            |   13 -
  gfs/gfs_fsck/block_list.c                        |   12 -
  gfs/gfs_fsck/block_list.h                        |   12 -
  gfs/gfs_fsck/eattr.c                             |   12 -
  gfs/gfs_fsck/eattr.h                             |   12 -
  gfs/gfs_fsck/file.c                              |   13 -
  gfs/gfs_fsck/file.h                              |   13 -
  gfs/gfs_fsck/fs_bits.c                           |   13 -
  gfs/gfs_fsck/fs_bits.h                           |   13 -
  gfs/gfs_fsck/fs_bmap.c                           |   13 -
  gfs/gfs_fsck/fs_bmap.h                           |   13 -
  gfs/gfs_fsck/fs_dir.c                            |   13 -
  gfs/gfs_fsck/fs_dir.h                            |   13 -
  gfs/gfs_fsck/fs_inode.c                          |   13 -
  gfs/gfs_fsck/fs_inode.h                          |   13 -
  gfs/gfs_fsck/fs_recovery.c                       |   14 -
  gfs/gfs_fsck/fs_recovery.h                       |   13 -
  gfs/gfs_fsck/fsck.h                              |   12 -
  gfs/gfs_fsck/fsck_incore.h                       |   15 -
  gfs/gfs_fsck/hash.c                              |   13 -
  gfs/gfs_fsck/hash.h                              |   13 -
  gfs/gfs_fsck/initialize.c                        |   13 -
  gfs/gfs_fsck/inode.c                             |   12 -
  gfs/gfs_fsck/inode.h                             |   12 -
  gfs/gfs_fsck/inode_hash.c                        |   13 -
  gfs/gfs_fsck/inode_hash.h                        |   13 -
  gfs/gfs_fsck/link.c                              |   13 -
  gfs/gfs_fsck/link.h                              |   14 -
  gfs/gfs_fsck/log.c                               |   12 -
  gfs/gfs_fsck/log.h                               |   12 -
  gfs/gfs_fsck/lost_n_found.c                      |   13 -
  gfs/gfs_fsck/lost_n_found.h                      |   13 -
  gfs/gfs_fsck/main.c                              |   12 -
  gfs/gfs_fsck/metawalk.c                          |   12 -
  gfs/gfs_fsck/metawalk.h                          |   12 -
  gfs/gfs_fsck/ondisk.c                            |   13 -
  gfs/gfs_fsck/ondisk.h                            |   13 -
  gfs/gfs_fsck/pass1.c                             |   13 -
  gfs/gfs_fsck/pass1b.c                            |   13 -
  gfs/gfs_fsck/pass1c.c                            |   12 -
  gfs/gfs_fsck/pass2.c                             |   13 -
  gfs/gfs_fsck/pass3.c                             |   13 -
  gfs/gfs_fsck/pass4.c                             |   13 -
  gfs/gfs_fsck/pass5.c                             |   13 -
  gfs/gfs_fsck/rgrp.c                              |   14 -
  gfs/gfs_fsck/rgrp.h                              |   13 -
  gfs/gfs_fsck/super.c                             |   13 -
  gfs/gfs_fsck/super.h                             |   13 -
  gfs/gfs_fsck/test_bitmap.c                       |   12 -
  gfs/gfs_fsck/test_block_list.c                   |   12 -
  gfs/gfs_fsck/util.c                              |   13 -
  gfs/gfs_fsck/util.h                              |   13 -
  gfs/gfs_grow/Makefile                            |   13 -
  gfs/gfs_grow/main.c                              |   13 -
  gfs/gfs_grow/ondisk.c                            |   13 -
  gfs/gfs_jadd/Makefile                            |   13 -
  gfs/gfs_jadd/main.c                              |   13 -
  gfs/gfs_jadd/ondisk.c                            |   13 -
  gfs/gfs_mkfs/Makefile                            |   13 -
  gfs/gfs_mkfs/device_geometry.c                   |   13 -
  gfs/gfs_mkfs/fs_geometry.c                       |   13 -
  gfs/gfs_mkfs/locking.c                           |   13 -
  gfs/gfs_mkfs/main.c                              |   13 -
  gfs/gfs_mkfs/mkfs_gfs.h                          |   13 -
  gfs/gfs_mkfs/ondisk.c                            |   13 -
  gfs/gfs_mkfs/structures.c                        |   13 -
  gfs/gfs_quota/Makefile                           |   13 -
  gfs/gfs_quota/check.c                            |   13 -
  gfs/gfs_quota/gfs_quota.h                        |   13 -
  gfs/gfs_quota/layout.c                           |   13 -
  gfs/gfs_quota/main.c                             |   13 -
  gfs/gfs_quota/names.c                            |   13 -
  gfs/gfs_quota/ondisk.c                           |   13 -
  gfs/gfs_tool/Makefile                            |   13 -
  gfs/gfs_tool/counters.c                          |   13 -
  gfs/gfs_tool/decipher_lockstate_dump             |   14 -
  gfs/gfs_tool/df.c                                |   13 -
  gfs/gfs_tool/gfs_tool.h                          |   13 -
  gfs/gfs_tool/layout.c                            |   13 -
  gfs/gfs_tool/main.c                              |   13 -
  gfs/gfs_tool/misc.c                              |   13 -
  gfs/gfs_tool/ondisk.c                            |   13 -
  gfs/gfs_tool/parse_lockdump                      |   14 -
  gfs/gfs_tool/sb.c                                |   13 -
  gfs/gfs_tool/tune.c                              |   13 -
  gfs/gfs_tool/util.c                              |   13 -
  gfs/include/global.h                             |   13 -
  gfs/include/linux_endian.h                       |   13 -
  gfs/include/osi_list.h                           |   13 -
  gfs/include/osi_user.h                           |   13 -
  gfs/init.d/Makefile                              |   12 -
  gfs/libgfs/Makefile                              |   51 +-
  gfs/libgfs/bio.c                                 |   13 -
  gfs/libgfs/bitmap.c                              |   12 -
  gfs/libgfs/block_list.c                          |   12 -
  gfs/libgfs/file.c                                |   13 -
  gfs/libgfs/fs_bits.c                             |   13 -
  gfs/libgfs/fs_bmap.c                             |   13 -
  gfs/libgfs/fs_dir.c                              |   13 -
  gfs/libgfs/fs_inode.c                            |   13 -
  gfs/libgfs/incore.h                              |   13 -
  gfs/libgfs/inode.c                               |   12 -
  gfs/libgfs/log.c                                 |   12 -
  gfs/libgfs/ondisk.c                              |   13 -
  gfs/libgfs/rgrp.c                                |   14 -
  gfs/libgfs/size.c                                |   13 -
  gfs/libgfs/super.c                               |   13 -
  gfs/libgfs/util.c                                |   13 -
  gfs/man/Makefile                                 |   13 -
  gfs/man/gfs.8                                    |    3 -
  gfs/man/gfs_edit.8                               |    2 -
  gfs/man/gfs_fsck.8                               |    3 -
  gfs/man/gfs_grow.8                               |    3 -
  gfs/man/gfs_jadd.8                               |    3 -
  gfs/man/gfs_mkfs.8                               |    3 -
  gfs/man/gfs_mount.8                              |    8 -
  gfs/man/gfs_quota.8                              |    3 -
  gfs/man/gfs_tool.8                               |    3 -
  gfs/tests/Makefile                               |   12 -
  gfs/tests/filecon2/Makefile                      |   13 -
  gfs/tests/filecon2/filecon2.h                    |   13 -
  gfs/tests/filecon2/filecon2_client.c             |   13 -
  gfs/tests/filecon2/filecon2_server.c             |   13 -
  gfs/tests/mmdd/Makefile                          |   13 -
  gfs/tests/mmdd/mmdd.c                            |   13 -
  gfs2/Makefile                                    |   13 -
  gfs2/convert/Makefile                            |   12 -
  gfs2/convert/gfs2_convert.c                      |    6 -
  gfs2/debug/Makefile                              |   59 -
  gfs2/debug/basic.c                               |  471 ----
  gfs2/debug/basic.h                               |   39 -
  gfs2/debug/block_device.c                        |  130 -
  gfs2/debug/block_device.h                        |   27 -
  gfs2/debug/gfs2_debug.h                          |   96 -
  gfs2/debug/main.c                                |  192 --
  gfs2/debug/ondisk.c                              |   25 -
  gfs2/debug/readfile.c                            |  228 --
  gfs2/debug/readfile.h                            |   27 -
  gfs2/debug/util.c                                |  347 ---
  gfs2/debug/util.h                                |   42 -
  gfs2/edit/Makefile                               |   13 -
  gfs2/edit/gfs2hex.c                              |   26 +-
  gfs2/edit/gfs2hex.h                              |   13 -
  gfs2/edit/hexedit.c                              |  154 +-
  gfs2/edit/hexedit.h                              |   14 -
  gfs2/edit/savemeta.c                             |   70 +-
  gfs2/fsck/Makefile                               |   12 -
  gfs2/fsck/eattr.c                                |   12 -
  gfs2/fsck/eattr.h                                |   12 -
  gfs2/fsck/fs_bits.h                              |   13 -
  gfs2/fsck/fs_recovery.c                          |   13 -
  gfs2/fsck/fs_recovery.h                          |   13 -
  gfs2/fsck/fsck.h                                 |   12 -
  gfs2/fsck/hash.c                                 |   13 -
  gfs2/fsck/hash.h                                 |   13 -
  gfs2/fsck/initialize.c                           |   13 -
  gfs2/fsck/inode_hash.c                           |   13 -
  gfs2/fsck/inode_hash.h                           |   13 -
  gfs2/fsck/link.c                                 |   13 -
  gfs2/fsck/link.h                                 |   14 -
  gfs2/fsck/lost_n_found.c                         |   13 -
  gfs2/fsck/lost_n_found.h                         |   13 -
  gfs2/fsck/main.c                                 |   12 -
  gfs2/fsck/metawalk.c                             |   12 -
  gfs2/fsck/metawalk.h                             |   12 -
  gfs2/fsck/pass1.c                                |   13 -
  gfs2/fsck/pass1b.c                               |   13 -
  gfs2/fsck/pass1c.c                               |   12 -
  gfs2/fsck/pass2.c                                |   13 -
  gfs2/fsck/pass3.c                                |   13 -
  gfs2/fsck/pass4.c                                |   13 -
  gfs2/fsck/pass5.c                                |   13 -
  gfs2/fsck/rgrepair.c                             |   13 -
  gfs2/fsck/test.c                                 |    1 -
  gfs2/fsck/test_bitmap.c                          |   12 -
  gfs2/fsck/test_block_list.c                      |   12 -
  gfs2/fsck/util.c                                 |   13 -
  gfs2/fsck/util.h                                 |   13 -
  gfs2/include/gfs2_disk_hash.h                    |   13 -
  gfs2/include/global.h                            |   13 -
  gfs2/include/linux_endian.h                      |   13 -
  gfs2/include/osi_list.h                          |   13 -
  gfs2/include/osi_user.h                          |   13 -
  gfs2/init.d/Makefile                             |   12 -
  gfs2/libgfs2/Makefile                            |   48 +-
  gfs2/libgfs2/bitmap.c                            |   12 -
  gfs2/libgfs2/block_list.c                        |   12 -
  gfs2/libgfs2/buf.c                               |   13 -
  gfs2/libgfs2/device_geometry.c                   |   13 -
  gfs2/libgfs2/fs_bits.c                           |   13 -
  gfs2/libgfs2/fs_geometry.c                       |   13 -
  gfs2/libgfs2/fs_ops.c                            |   13 -
  gfs2/libgfs2/gfs2_log.c                          |   12 -
  gfs2/libgfs2/libgfs2.h                           |   13 -
  gfs2/libgfs2/locking.c                           |   13 -
  gfs2/libgfs2/misc.c                              |   13 -
  gfs2/libgfs2/ondisk.c                            |   13 -
  gfs2/libgfs2/ondisk.h                            |    9 -
  gfs2/libgfs2/recovery.c                          |    9 -
  gfs2/libgfs2/rgrp.c                              |   13 -
  gfs2/libgfs2/size.c                              |   13 -
  gfs2/libgfs2/structures.c                        |   13 -
  gfs2/libgfs2/super.c                             |   13 -
  gfs2/man/Makefile                                |   13 -
  gfs2/man/gfs2.8                                  |    3 -
  gfs2/man/gfs2_convert.8                          |    3 -
  gfs2/man/gfs2_edit.8                             |    6 +-
  gfs2/man/gfs2_fsck.8                             |    3 -
  gfs2/man/gfs2_grow.8                             |    3 -
  gfs2/man/gfs2_jadd.8                             |    3 -
  gfs2/man/gfs2_mount.8                            |    8 -
  gfs2/man/gfs2_quota.8                            |    3 -
  gfs2/man/gfs2_tool.8                             |    3 -
  gfs2/man/mkfs.gfs2.8                             |    3 -
  gfs2/mkfs/Makefile                               |    5 -
  gfs2/mkfs/gfs2_mkfs.h                            |   13 -
  gfs2/mkfs/main.c                                 |   13 -
  gfs2/mkfs/main_grow.c                            |   12 -
  gfs2/mkfs/main_jadd.c                            |   11 -
  gfs2/mkfs/main_mkfs.c                            |   13 -
  gfs2/mount/Makefile                              |   18 +-
  gfs2/mount/mount.gfs2.c                          |    8 -
  gfs2/mount/mtab.c                                |   14 +-
  gfs2/mount/ondisk1.c                             |   13 -
  gfs2/mount/ondisk2.c                             |   13 -
  gfs2/mount/util.c                                |    8 -
  gfs2/mount/util.h                                |    8 -
  gfs2/quota/Makefile                              |   13 -
  gfs2/quota/check.c                               |   13 -
  gfs2/quota/gfs2_quota.h                          |   13 -
  gfs2/quota/main.c                                |   12 -
  gfs2/quota/names.c                               |   13 -
  gfs2/tool/Makefile                               |   13 -
  gfs2/tool/decipher_lockstate_dump                |   14 -
  gfs2/tool/df.c                                   |   13 -
  gfs2/tool/gfs2_tool.h                            |   13 -
  gfs2/tool/iflags.h                               |   13 -
  gfs2/tool/layout.c                               |   13 -
  gfs2/tool/main.c                                 |   13 -
  gfs2/tool/misc.c                                 |   13 -
  gfs2/tool/ondisk.c                               |   13 -
  gfs2/tool/parse_lockdump                         |   14 -
  gfs2/tool/sb.c                                   |   13 -
  gfs2/tool/tune.c                                 |   13 -
  gnbd-kernel/src/Makefile                         |   13 -
  gnbd-kernel/src/gnbd.c                           |   13 -
  gnbd-kernel/src/gnbd.h                           |   13 -
  gnbd/COPYING                                     |  340 ---
  gnbd/Makefile                                    |   13 -
  gnbd/client/Makefile                             |   13 -
  gnbd/client/gnbd_monitor.c                       |   12 -
  gnbd/client/gnbd_monitor.h                       |   12 -
  gnbd/client/gnbd_recvd.c                         |   12 -
  gnbd/client/monitor_req.c                        |   12 -
  gnbd/include/global.h                            |   13 -
  gnbd/include/gnbd_endian.h                       |   13 -
  gnbd/man/Makefile                                |   16 +-
  gnbd/man/fence_gnbd.8                            |   87 -
  gnbd/man/gnbd.8                                  |    3 -
  gnbd/man/gnbd_export.8                           |    3 -
  gnbd/man/gnbd_import.8                           |    3 -
  gnbd/man/gnbd_serv.8                             |    2 -
  gnbd/server/Makefile                             |   13 -
  gnbd/server/device.c                             |   12 -
  gnbd/server/device.h                             |   12 -
  gnbd/server/extern_req.c                         |   11 -
  gnbd/server/extern_req.h                         |   12 -
  gnbd/server/fence.c                              |   12 -
  gnbd/server/fence.h                              |   12 -
  gnbd/server/gnbd_clusterd.c                      |   12 -
  gnbd/server/gnbd_serv.c                          |   12 -
  gnbd/server/gnbd_server.h                        |   12 -
  gnbd/server/gserv.c                              |   12 -
  gnbd/server/gserv.h                              |   12 -
  gnbd/server/list.h                               |   13 -
  gnbd/server/local_req.c                          |   12 -
  gnbd/server/local_req.h                          |   12 -
  gnbd/tools/Makefile                              |   15 +-
  gnbd/tools/fence_gnbd/Makefile                   |   37 -
  gnbd/tools/fence_gnbd/main.c                     |  340 ---
  gnbd/tools/gnbd_export/Makefile                  |   13 -
  gnbd/tools/gnbd_export/gnbd_export.c             |   14 -
  gnbd/tools/gnbd_import/Makefile                  |   13 -
  gnbd/tools/gnbd_import/fence_return.h            |   13 -
  gnbd/tools/gnbd_import/gnbd_import.c             |   12 -
  gnbd/utils/Makefile                              |   13 -
  gnbd/utils/gnbd_utils.c                          |   12 -
  gnbd/utils/gnbd_utils.h                          |   12 -
  gnbd/utils/member_cman.c                         |   12 -
  gnbd/utils/member_cman.h                         |   12 -
  gnbd/utils/trans.c                               |   12 -
  gnbd/utils/trans.h                               |   12 -
  group/Makefile                                   |   12 -
  group/daemon/Makefile                            |   12 -
  group/daemon/gd_internal.h                       |   13 -
  group/daemon/groupd.h                            |   13 -
  group/daemon/main.c                              |   12 -
  group/dlm_controld/Makefile                      |   20 +-
  group/dlm_controld/action.c                      |   12 -
  group/dlm_controld/config.c                      |   12 -
  group/dlm_controld/config.h                      |   12 -
  group/dlm_controld/cpg.c                         |   12 -
  group/dlm_controld/crc.c                         |   13 -
  group/dlm_controld/deadlock.c                    |   12 -
  group/dlm_controld/dlm_controld.h                |   12 -
  group/dlm_controld/dlm_daemon.h                  |   12 -
  group/dlm_controld/group.c                       |   12 -
  group/dlm_controld/main.c                        |   12 -
  group/dlm_controld/member_cman.c                 |   12 -
  group/dlm_controld/netlink.c                     |   12 -
  group/dlm_controld/plock.c                       |   12 -
  group/gfs_control/Makefile                       |   17 +-
  group/gfs_control/main.c                         |   12 -
  group/gfs_controld/Makefile                      |   20 +-
  group/gfs_controld/config.c                      |   12 -
  group/gfs_controld/config.h                      |   12 -
  group/gfs_controld/cpg-old.c                     |   12 -
  group/gfs_controld/cpg-old.h                     |   12 -
  group/gfs_controld/gfs_controld.h                |   12 -
  group/gfs_controld/gfs_daemon.h                  |   12 -
  group/gfs_controld/group.c                       |   12 -
  group/gfs_controld/main.c                        |   12 -
  group/gfs_controld/member_cman.c                 |   12 -
  group/gfs_controld/plock.c                       |   12 -
  group/gfs_controld/util.c                        |   12 -
  group/include/linux_endian.h                     |   13 -
  group/lib/Makefile                               |   25 +-
  group/lib/libgroup.c                             |   22 -
  group/lib/libgroup.h                             |   22 -
  group/libgfscontrol/Makefile                     |   43 +-
  group/libgfscontrol/libgfscontrol.h              |   22 -
  group/libgfscontrol/main.c                       |   12 -
  group/man/Makefile                               |   12 -
  group/man/dlm_controld.8                         |    6 -
  group/man/gfs_controld.8                         |    6 -
  group/man/group_tool.8                           |    6 -
  group/man/groupd.8                               |    6 -
  group/test/Makefile                              |   12 -
  group/test/clientd.c                             |   12 -
  group/tool/Makefile                              |   12 -
  group/tool/main.c                                |   12 -
  make/copyright.cf                                |   16 -
  make/defines.mk.input                            |   18 +-
  make/libs.mk                                     |   47 +
  rgmanager/AUTHORS                                |   13 -
  rgmanager/COPYING                                |  340 ---
  rgmanager/INSTALL                                |    7 -
  rgmanager/Makefile                               |   13 -
  rgmanager/NEWS                                   |    2 -
  rgmanager/include/clulog.h                       |   22 -
  rgmanager/include/event.h                        |   17 -
  rgmanager/include/findproc.h                     |   18 -
  rgmanager/include/platform.h                     |   19 -
  rgmanager/include/res-ocf.h                      |   18 -
  rgmanager/include/reslist.h                      |   18 -
  rgmanager/include/restart_counter.h              |   17 -
  rgmanager/include/rg_locks.h                     |   17 -
  rgmanager/include/rg_queue.h                     |   17 -
  rgmanager/include/rmtab.h                        |   18 -
  rgmanager/include/sets.h                         |   17 -
  rgmanager/include/vf.h                           |   18 -
  rgmanager/init.d/Makefile                        |   12 -
  rgmanager/init.d/rgmanager.in                    |    6 -
  rgmanager/man/Makefile                           |   13 -
  rgmanager/src/Makefile                           |   13 -
  rgmanager/src/clulib/Makefile                    |   12 -
  rgmanager/src/clulib/alloc.c                     |   22 -
  rgmanager/src/clulib/ckpt_state.c                |   18 -
  rgmanager/src/clulib/clulog.c                    |   19 -
  rgmanager/src/clulib/cman.c                      |   18 -
  rgmanager/src/clulib/daemon_init.c               |   19 -
  rgmanager/src/clulib/fdops.c                     |   18 -
  rgmanager/src/clulib/lock.c                      |   18 -
  rgmanager/src/clulib/locktest.c                  |   18 -
  rgmanager/src/clulib/members.c                   |   18 -
  rgmanager/src/clulib/message.c                   |   18 -
  rgmanager/src/clulib/msg_cluster.c               |   18 -
  rgmanager/src/clulib/msg_socket.c                |   18 -
  rgmanager/src/clulib/msgsimple.c                 |   19 -
  rgmanager/src/clulib/msgtest.c                   |   18 -
  rgmanager/src/clulib/rg_strings.c                |   18 -
  rgmanager/src/clulib/sets.c                      |   17 -
  rgmanager/src/clulib/signals.c                   |   18 -
  rgmanager/src/clulib/tmgr.c                      |   19 -
  rgmanager/src/clulib/vft.c                       |   18 -
  rgmanager/src/clulib/wrap_lock.c                 |   19 -
  rgmanager/src/daemons/Makefile                   |   12 -
  rgmanager/src/daemons/clurmtabd.c                |   18 -
  rgmanager/src/daemons/clurmtabd_lib.c            |   18 -
  rgmanager/src/daemons/depends.c                  |   19 -
  rgmanager/src/daemons/event_config.c             |   17 -
  rgmanager/src/daemons/fo_domain.c                |   18 -
  rgmanager/src/daemons/groups.c                   |   19 -
  rgmanager/src/daemons/main.c                     |   19 -
  rgmanager/src/daemons/reslist.c                  |   18 -
  rgmanager/src/daemons/resrules.c                 |   18 -
  rgmanager/src/daemons/restart_counter.c          |   17 -
  rgmanager/src/daemons/restree.c                  |   21 -
  rgmanager/src/daemons/rg_event.c                 |   17 -
  rgmanager/src/daemons/rg_forward.c               |   18 -
  rgmanager/src/daemons/rg_locks.c                 |   18 -
  rgmanager/src/daemons/rg_queue.c                 |   18 -
  rgmanager/src/daemons/rg_state.c                 |   18 -
  rgmanager/src/daemons/rg_thread.c                |   18 -
  rgmanager/src/daemons/service_op.c               |   17 -
  rgmanager/src/daemons/slang_event.c              |   17 -
  rgmanager/src/daemons/test.c                     |   18 -
  rgmanager/src/daemons/watchdog.c                 |   18 -
  rgmanager/src/resources/Makefile                 |   12 -
  rgmanager/src/resources/apache.sh                |   23 -
  rgmanager/src/resources/clusterfs.sh             |   20 -
  rgmanager/src/resources/fs.sh                    |   20 -
  rgmanager/src/resources/ip.sh                    |   20 -
  rgmanager/src/resources/lvm.sh                   |   19 -
  rgmanager/src/resources/lvm_by_lv.sh             |   19 -
  rgmanager/src/resources/lvm_by_vg.sh             |   19 -
  rgmanager/src/resources/mysql.sh                 |   23 -
  rgmanager/src/resources/named.sh                 |   27 +-
  rgmanager/src/resources/netfs.sh                 |   20 -
  rgmanager/src/resources/nfsclient.sh             |   19 -
  rgmanager/src/resources/nfsexport.sh             |   20 -
  rgmanager/src/resources/nfsserver.sh             |   19 -
  rgmanager/src/resources/ocf-shellfuncs           |   20 -
  rgmanager/src/resources/openldap.sh              |   23 -
  rgmanager/src/resources/postgres-8.sh            |   29 +-
  rgmanager/src/resources/samba.sh                 |   33 +-
  rgmanager/src/resources/script.sh                |   19 -
  rgmanager/src/resources/service.sh               |   19 -
  rgmanager/src/resources/smb.sh                   |   24 -
  rgmanager/src/resources/svclib_nfslock           |   18 -
  rgmanager/src/resources/tomcat-5.sh              |   25 +-
  rgmanager/src/resources/utils/config-utils.sh.in |   19 -
  rgmanager/src/resources/utils/member_util.sh     |   25 -
  rgmanager/src/resources/utils/messages.sh        |   25 -
  rgmanager/src/resources/utils/ra-skelet.sh       |   22 -
  rgmanager/src/resources/vm.sh                    |   18 -
  rgmanager/src/utils/Makefile                     |   12 -
  rgmanager/src/utils/cluarp.c                     |   19 -
  rgmanager/src/utils/clubufflush.c                |   19 -
  rgmanager/src/utils/clufindhostname.c            |   19 -
  rgmanager/src/utils/clulog.c                     |   19 -
  rgmanager/src/utils/clunfslock.sh                |    4 -
  rgmanager/src/utils/clunfsops.c                  |   18 -
  rgmanager/src/utils/clusvcadm.c                  |   18 -
  rgmanager/src/utils/syscall.h                    |   17 -
  scripts/fenceparse                               |   12 -
  scripts/uninstall.pl                             |   13 -
  832 files changed, 2124 insertions(+), 25812 deletions(-)

- --
I'm going to make him an offer he can't refuse.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iQIVAwUBSEze7wgUGcMLQ3qJAQJuag/9ElCvjLF8kvTAzXhIrJFz87bHYBHcoLdu
0sbkXyuqRRJn3lx4Cnvs0OcFKS7Z5QWz7163/n+jnotJkP+ZjEKq4BCz5RbP5jhJ
LoEYIfs9AEIdg/1UKxcgIrFZLm/ETexW3v8ou/pnEolo0+xgC6NEQKM2/IHYcQMY
EP5kuZIFI8j2NIQJCFDtGFiRWfGyk4mqMdRvm4a1D0D3uTIa1m5rPdm0cGl2mBY9
1YQUp331M79VhAKKAXq0an+0kETeZthHdo/6uxSAB8csOz/oSvH4uZohPTs34QGH
AHao2qQH9bXajY8c3UYry36lrVuNyGoJY1yuxJP0X48ua5f04IusuqJDBSRYoTyk
lzsXxzzWOPgXY6v2yPZoFHHRBA/p6ugxRWfR0938ZHlpfuI4XprbLtnFg66BCBQ1
KpSha84OWaTZGBBuYYsqVwJcVBYC/GG9USOq/1pq8l9ha3xnwQYhWSgwKHbDPBy4
s5JbPzRvts0K1n7nvgAPbE9IFKRZLaFQjYNpIUbZFNbThJw5o4qAfS+uDfmjnZJO
DoWSycVVxfg7Teh0RQYf5fJZ1ZW7nW6XBbp+8Oed2eLnn2xpodt+ghxlvfHUtAjh
JWZWJ4EUG+acqPrMkiHWEtGB794XrGy9kaQ7+RSQtJs0TQO7vIiDxXLk9RDw+dKe
Hc8zdwtRrho=
=cUUA
-----END PGP SIGNATURE-----


From Alain.Moulle at bull.net  Mon Jun  9 09:04:38 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 09 Jun 2008 11:04:38 +0200
Subject: [Linux-cluster] CS5 / about loop "Node is undead"
Message-ID: <484CF226.1040602@bull.net>

Hi

About my problem of node entering a loop :
Jun  3 15:54:49 s_sys at xn2 qdiskd[22256]: <notice> Writing eviction notice for node 1
Jun  3 15:54:50 s_sys at xn2 qdiskd[22256]: <notice> Node 1 evicted
Jun  3 15:54:51 s_sys at xn2 qdiskd[22256]: <crit> Node 1 is undead.

I notice that just before entering this loop, I have a message :
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1"
Jun  3 15:54:48 s_sys at xn2 qdiskd[22256]: <info> Assuming master role

but never the message :
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success

Nethertheless, the service of xn1 is well failovered by xn2, but
then after the reboot of xn1, we can't start again the CS5 due
to the problem of infernal loop "Node is undead" on xn2.

whereas when it works correctly, both messages :
fencing node "xn1"
fence "xn1" success
are successive (after about 30s)

So my question is : could this pb of infernal loop "Node is undead"
be systematically due to a failed fencing phase of xn2 towards xn1 ?

PS: note that I have applied patch :
http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9

Thanks
Regards
Alain Moull?


From ccaulfie at redhat.com  Mon Jun  9 09:21:19 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 09 Jun 2008 10:21:19 +0100
Subject: [Linux-cluster] DLM Book updated
Message-ID: <484CF60F.4050804@redhat.com>

I have updated the "Programming Locking Applications" document. Lots of 
typos and bizarre sentences have been fixed (thanks Bob!). I have also 
added a new section (chapter 5) which is an overview of the DLM 
internals for those that want to understand a little of how and where 
locks are mastered etc.

It's no substitute for reading the code but it might make it a little 
easier :)

http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf

Chrissie


From ccaulfie at redhat.com  Mon Jun  9 10:19:14 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 09 Jun 2008 11:19:14 +0100
Subject: [Linux-cluster] DLM Book updated
In-Reply-To: <484CF60F.4050804@redhat.com>
References: <484CF60F.4050804@redhat.com>
Message-ID: <484D03A2.1030908@redhat.com>

Christine Caulfield wrote:
> I have updated the "Programming Locking Applications" document. Lots of 
> typos and bizarre sentences have been fixed (thanks Bob!). I have also 
> added a new section (chapter 5) which is an overview of the DLM 
> internals for those that want to understand a little of how and where 
> locks are mastered etc.
> 
> It's no substitute for reading the code but it might make it a little 
> easier :)
> 
> http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf

I should also have mentioned that the file is also available in the 
cluster wiki at:

http://sources.redhat.com/cluster/wiki/

-- 

Chrissie


From yamato at redhat.com  Mon Jun  9 10:29:06 2008
From: yamato at redhat.com (Masatake YAMATO)
Date: Mon, 09 Jun 2008 19:29:06 +0900 (JST)
Subject: [Linux-cluster] DLM Book updated
In-Reply-To: <484D03A2.1030908@redhat.com>
References: <484CF60F.4050804@redhat.com>
	<484D03A2.1030908@redhat.com>
Message-ID: <20080609.192906.106743919.yamato@redhat.com>

I'm quite happy if you write that wireshark-1.0.0 has DLM3 protocol
dissector in the book. Actually wireshark is helpful for the readers
of the book to understand the behavior of DLM.


Regards,
Masatake YAMATO


From ccaulfie at redhat.com  Mon Jun  9 10:32:54 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 09 Jun 2008 11:32:54 +0100
Subject: [Linux-cluster] DLM Book updated
In-Reply-To: <20080609.192906.106743919.yamato@redhat.com>
References: <484CF60F.4050804@redhat.com>	<484D03A2.1030908@redhat.com>
	<20080609.192906.106743919.yamato@redhat.com>
Message-ID: <484D06D6.8010005@redhat.com>

Masatake YAMATO wrote:
> I'm quite happy if you write that wireshark-1.0.0 has DLM3 protocol
> dissector in the book. Actually wireshark is helpful for the readers
> of the book to understand the behavior of DLM.

I did mention it, in another document about Cluster Suite networking. 
But, yes it would be nice to have a reference in that book too, I'll add it.

Thanks

Chrissie


From alain.richard at equation.fr  Mon Jun  9 11:45:02 2008
From: alain.richard at equation.fr (Alain RICHARD)
Date: Mon, 9 Jun 2008 13:45:02 +0200
Subject: [Linux-cluster] DLM Book updated
In-Reply-To: <484CF60F.4050804@redhat.com>
References: <484CF60F.4050804@redhat.com>
Message-ID: <F860960D-BEC7-4E25-834B-EFB6445D3AB1@equation.fr>


Le 9 juin 08 ? 11:21, Christine Caulfield a ?crit :

> I have updated the "Programming Locking Applications" document. Lots  
> of typos and bizarre sentences have been fixed (thanks Bob!). I have  
> also added a new section (chapter 5) which is an overview of the DLM  
> internals for those that want to understand a little of how and  
> where locks are mastered etc.
>
> It's no substitute for reading the code but it might make it a  
> little easier :)
>
> http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
>
> Chrissie
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


You mention in this document that dlm is able to use SCTP, but I found  
no information on how to do it, is there any documents about it ?

Regards,

-- 
Alain RICHARD <mailto:alain.richard at equation.fr>
EQUATION SA <http://www.equation.fr/>
Tel : +33 477 79 48 00     Fax : +33 477 79 48 01
E-Liance, Op?rateur des entreprises et collectivit?s,
Liaisons Fibre optique, SDSL et ADSL <http://www.e-liance.fr>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080609/6febc318/attachment.htm>

From ccaulfie at redhat.com  Mon Jun  9 12:06:43 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 09 Jun 2008 13:06:43 +0100
Subject: [Linux-cluster] DLM Book updated
In-Reply-To: <F860960D-BEC7-4E25-834B-EFB6445D3AB1@equation.fr>
References: <484CF60F.4050804@redhat.com>
	<F860960D-BEC7-4E25-834B-EFB6445D3AB1@equation.fr>
Message-ID: <484D1CD3.5070907@redhat.com>

Alain RICHARD wrote:
> 
> Le 9 juin 08 ? 11:21, Christine Caulfield a ?crit :
> 
>> I have updated the "Programming Locking Applications" document. Lots 
>> of typos and bizarre sentences have been fixed (thanks Bob!). I have 
>> also added a new section (chapter 5) which is an overview of the DLM 
>> internals for those that want to understand a little of how and where 
>> locks are mastered etc.
>>
>> It's no substitute for reading the code but it might make it a little 
>> easier :)
>>
>> http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
>>
>> Chrissie
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> You mention in this document that dlm is able to use SCTP, but I found 
> no information on how to do it, is there any documents about it ?
> 

It's not (well) tested so it's regarded as unsupported at the moment. If 
you want to test it you'll need to add this to cluster.conf (inside the 
<cluster> tags:

<dlm protocol="sctp"/>

and the following sysctls to keep SCTP itself happy"

# echo 4194304 > /proc/sys/net/core/rmem_default
# echo 4194304 > /proc/sys/net/core/rmem_max


Chrissie


From Alain.Moulle at bull.net  Mon Jun  9 12:23:13 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 09 Jun 2008 14:23:13 +0200
Subject: [Linux-cluster] CS5 / quorum disk and heuristics 
Message-ID: <484D20B1.9040502@bull.net>

Hi

One thing bothers me again :

I have this record in cluster.conf :
<quorumd label="QDISK_0_0" interval="1" tko="10" votes="1" min_score="1">
  <heuristic interval="10" tko="3" program="ping -t1 -c1 172.20.0.110" score="1"/>
  <heuristic interval="10" program="ping -t3 -c1 172.20.0.110" score="1"/>
</quorumd>
where 172.20.0.110 is a third machine not in my cluster pair node1/node2

My last understanding was that quorum disk was NOT a redundancy of heart-beat,
meaning that if heart-beat interface fails, there is a failover but it is
always the node with the expected min_score in quorum disk which fence the
other.
So I thought that the quorum disk check was operationnal only if the node
detects a problem on heart-beat interface ... but when I set down the interface
on the third machine, and after a few seconds, both nodes node1/node2
are killed !!! Whereas heart-beat interface was working fine.
And after reboot, I can see "cluster not quorate" etc.

So in fact, even if the heart-beat interface works fine, but there is
not the expected min_score for heuristics of quorum disk, both nodes
are stopped.

Is it the normal behavior ?

Thanks
Regards
Alain Moull?


From rohara at redhat.com  Mon Jun  9 14:56:34 2008
From: rohara at redhat.com (Ryan O'Hara)
Date: Mon, 09 Jun 2008 09:56:34 -0500
Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!)
In-Reply-To: <Pine.LNX.4.64.0806031055250.5892@trider-g7>
References: <Pine.LNX.4.64.0806031055250.5892@trider-g7>
Message-ID: <484D44A2.4090706@redhat.com>

Fabio M. Di Nitto wrote:

> ccs_test(8): not fully completed yet (another email will follow).

ccs_test should go away. It was never intended to be used as a 
production tool, it was simply intended to be a tool to test ccs. 
Futhermore, the fact that you must create connections and then use those 
connection ID's in order to extract information from ccs is overkill. 
What we really want is a simple tool that handles xpath queries for 
config information. The idea of "connections" should be hidden from the 
user.

I believe there is some overlap between ccs_test and ccs_tool. If I 
recall, ccs_tool can handle some simple xpath queries. Even better is 
that users do not have to create connections, etc.


From fdinitto at redhat.com  Mon Jun  9 15:22:38 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 9 Jun 2008 17:22:38 +0200 (CEST)
Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!)
In-Reply-To: <484D44A2.4090706@redhat.com>
References: <Pine.LNX.4.64.0806031055250.5892@trider-g7>
	<484D44A2.4090706@redhat.com>
Message-ID: <Pine.LNX.4.64.0806091716470.5892@trider-g7>

On Mon, 9 Jun 2008, Ryan O'Hara wrote:

> Fabio M. Di Nitto wrote:
>
>> ccs_test(8): not fully completed yet (another email will follow).
>
> ccs_test should go away. It was never intended to be used as a production 
> tool, it was simply intended to be a tool to test ccs.

Indeed but it used as such and this is a fact :)

> Futhermore, the fact 
> that you must create connections and then use those connection ID's in order 
> to extract information from ccs is overkill. What we really want is a simple 
> tool that handles xpath queries for config information. The idea of 
> "connections" should be hidden from the user.

Not anymore. with the new libccs, there is no need to establish a 
connection. You just need to pass something > 0 instead of the fd.
I kept the fd option around to avoid breaking the compatibility.

What ccs_test is missing is only an option to select full xpath vs xpath 
lite at the moment.

> I believe there is some overlap between ccs_test and ccs_tool. If I recall, 
> ccs_tool can handle some simple xpath queries. Even better is that users do 
> not have to create connections, etc.

No, ccs_tool doesn't handle queries at all.

As above, no need to create connections anylonger :) I only need to finish 
that switch and write those changes.

Fabio

--
I'm going to make him an offer he can't refuse.


From fdinitto at redhat.com  Mon Jun  9 19:22:53 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 9 Jun 2008 21:22:53 +0200 (CEST)
Subject: [Linux-cluster] New fencing method
In-Reply-To: <20080519230347.GA30667@kallisti.us>
References: <20080519230347.GA30667@kallisti.us>
Message-ID: <Pine.LNX.4.64.0806092122000.5892@trider-g7>

On Mon, 19 May 2008, Ross Vandegrift wrote:

> Hello everyone,
>
> I wrote a new fencing method script that fences by remotely shutting
> down a switchport.  The idea is to fabric fence an iSCSI client by
> shutting down the port used for iSCSI connectivity.

Hi Ross,

for your information the agent will be part of our stable releases 
starting from the next one (2.03.04).

Thanks again for your help and contribution.

Fabio

--
I'm going to make him an offer he can't refuse.


From lhh at redhat.com  Mon Jun  9 20:24:00 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 09 Jun 2008 16:24:00 -0400
Subject: [Linux-cluster] CS5 / is there a tunable timer between the
	three start/stop tries ?
In-Reply-To: <48468843.5040300@bull.net>
References: <48468843.5040300@bull.net>
Message-ID: <1213043040.27637.4.camel@ayanami.boston.devel.redhat.com>


On Wed, 2008-06-04 at 14:19 +0200, Alain Moulle wrote:
> Hi
> 
> With CS5, when the status of a service returns failed, the CS5 tries
> to start three times the service , so we can see three start/stop
> sequences if it does not start correctly each time. The following
> start is always launchec just after the stop,
> is there a tunable timer between the three start/stop tries ?

Not currently.

-- Lon


From lhh at redhat.com  Mon Jun  9 20:25:40 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 09 Jun 2008 16:25:40 -0400
Subject: [Linux-cluster] CS5 / about loop "Node is undead"
In-Reply-To: <48468ED9.3050401@bull.net>
References: <48468ED9.3050401@bull.net>
Message-ID: <1213043140.27637.7.camel@ayanami.boston.devel.redhat.com>


On Wed, 2008-06-04 at 14:47 +0200, Alain Moulle wrote:
> Hi
> 
> About my problem of node entering a loop :
> Jun  3 15:54:49 s_sys at xn2 qdiskd[22256]: <notice> Writing eviction notice for node 1
> Jun  3 15:54:50 s_sys at xn2 qdiskd[22256]: <notice> Node 1 evicted
> Jun  3 15:54:51 s_sys at xn2 qdiskd[22256]: <crit> Node 1 is undead.
> 
> I notice that just before entering this loop, I have a message :
> Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1"
> Jun  3 15:54:48 s_sys at xn2 qdiskd[22256]: <info> Assuming master role
> 
> but never the message :
> Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success
> 
> Nethertheless, the service of xn1 is well failovered by xn2, but
> then after the reboot of xn1, we can't start again the CS5 due
> to the problem of infernal loop "Node is undead" on xn2.
> 
> whereas when it works correctly, both messages :
> fencing node "xn1"
> fence "xn1" success
> are successive (after about 30s)
> 
> So my question is : could this pb of infernal loop "Node is undead"
> be systematically due to a failed fencing phase of xn2 towards xn1 ?
> 
> PS: note that I have applied patch :
> http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9

Yes.  If qdiskd thinks the node is dead and the node started writing to
the disk again (which is what fencing should prevent), it will display
those messages.

-- Lon


From rpeterso at redhat.com  Mon Jun  9 21:16:56 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Mon, 09 Jun 2008 16:16:56 -0500
Subject: [Linux-cluster] GFS performance tuning
Message-ID: <1213046217.21321.53.camel@technetium.msp.redhat.com>

Hi Everyone,

I just wanted to let everyone here know that I just updated the
cluster wiki page regarding GFS performance tuning.  I added a bunch
of information about increasing GFS performance:

1. How to use "fast statfs".
2. Disabling updatedb for GFS.
3. More considerations about the Resource Group size and the
   new "bitfit" function.
4. Designing your environment with the DLM in mind.
5. How to use "glock trimming".

The updates are here:

http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning

Regards,

Bob Peterson
Red Hat GFS & Clustering


From james.hofmeister at hp.com  Mon Jun  9 21:19:33 2008
From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux))
Date: Mon, 9 Jun 2008 21:19:33 +0000
Subject: [Linux-cluster] Scipt to revert GFS2 to GFS1?
In-Reply-To: <1213043140.27637.7.camel@ayanami.boston.devel.redhat.com>
Message-ID: <EC61DD7B6048464AB0E1B713AF7521BC15A88781D7@GVW0676EXC.americas.hpqcorp.net>

Hello All,

I have a customer RHEL-5.1 who converted a GFS1 file system to GFS2 with gfs2_convert.  They are experiencing hangs on unmount of GFS2 file systems since this change. Is there a tool to convert GFS2 file systems back to GFS1?

Regards,
James Hofmeister
Hewlett Packard Linux Solutions Engineer


From rpeterso at redhat.com  Mon Jun  9 21:32:41 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Mon, 09 Jun 2008 16:32:41 -0500
Subject: [Linux-cluster] Scipt to revert GFS2 to GFS1?
In-Reply-To: <EC61DD7B6048464AB0E1B713AF7521BC15A88781D7@GVW0676EXC.americas.hpqcorp.net>
References: <EC61DD7B6048464AB0E1B713AF7521BC15A88781D7@GVW0676EXC.americas.hpqcorp.net>
Message-ID: <1213047161.21321.67.camel@technetium.msp.redhat.com>

On Mon, 2008-06-09 at 21:19 +0000, Hofmeister, James (WTEC Linux) wrote:
> Hello All,
> 
> I have a customer RHEL-5.1 who converted a GFS1 file system to GFS2 with gfs2_convert.  They are experiencing hangs on unmount of GFS2 file systems since this change. Is there a tool to convert GFS2 file systems back to GFS1?
> 
> Regards,
> James Hofmeister
> Hewlett Packard Linux Solutions Engineer

The short answer is No, the gfs2_convert tool is one-way.  It could be
done because the on-disk formats are not that different.  You would have
to write a tool that does gfs2_convert in reverse: changing all the
inode numbers back to match their disk block locations, converting all
your journals back into giant journal-sized holes in the file system,
and changing all the file flags from standard Linux format to GFS format.
This would not be impossible, but it would be a big project.
The biggest challenge would be in figuring out where the journals belong
and moving anything that got moved to those locations different RGs.
The would be a very good "start" of a "gfs_shrink" tool that doesn't
exist, by the way.

So I recommend they upgrade to the latest and greatest GFS2 code, which
would either be from the nwm git tree, or else the newest RHEL5.2 kmod
RPM.  Then, if they still have a problem unmounting, post the symptoms
and we'll try to address the issues here.

If there is a bug in the unmount code, we need to find and fix it.
I am only aware of one such bug at the moment, which is:

https://bugzilla.redhat.com/show_bug.cgi?id=207697

You may or may not be able to read this bug record; My apologies if you
can't read it; the bug record permissions are out of my control.
This bug is only for the unmount that happens when systems are rebooted.
There is a work-around for it, too, which is to enable the gfs2 init
script.  If you've encountered some other problem, and it can be
recreated on recent levels of GFS2 code, please open a bugzilla record
so we can help find and fix it.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From lhh at redhat.com  Mon Jun  9 21:35:54 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 09 Jun 2008 17:35:54 -0400
Subject: [Linux-cluster] CS5 / quorum disk and heuristics
In-Reply-To: <484D20B1.9040502@bull.net>
References: <484D20B1.9040502@bull.net>
Message-ID: <1213047355.27637.20.camel@ayanami.boston.devel.redhat.com>


On Mon, 2008-06-09 at 14:23 +0200, Alain Moulle wrote:

> My last understanding was that quorum disk was NOT a redundancy of heart-beat,
> meaning that if heart-beat interface fails, there is a failover but it is
> always the node with the expected min_score in quorum disk which fence the
> other.

Qdiskd can never tell CMAN or openais that a computer is a member of the
cluster, but it can remove nodes from the cluster.

> So I thought that the quorum disk check was operationnal only if the node
> detects a problem on heart-beat interface ... but when I set down the interface
> on the third machine, and after a few seconds, both nodes node1/node2
> are killed !!! 

Think of the heuristics as asking the question:

  "Am I fit to participate in the cluster?"

If the answer is "yes" and suddenly changes to "no", the node removes
itself.

> Whereas heart-beat interface was working fine.

You can disable these by setting allow_kill="0" and/or reboot="0" (see
qdisk(5)).


> And after reboot, I can see "cluster not quorate" etc.

This happens after both nodes boot, or just one?  If both nodes boot up
with the third node off, they should still be able to form a quorum by
themselves, even if qdiskd isn't running or its score isn't sufficient.

-- Lon


From lstrozzini at gmail.com  Tue Jun 10 08:18:40 2008
From: lstrozzini at gmail.com (Loris Strozzini)
Date: Tue, 10 Jun 2008 10:18:40 +0200
Subject: [Linux-cluster] Basic RHEL 5.1 cluster problem
Message-ID: <4b28518b0806100118r10c908b6m8c3e3321355ab180@mail.gmail.com>

Hi all,
I have a problem with my RHEL 5.2 a 2 node cluster running on IBM X3650.
My cluster is configured for fencing on IBM RSAII via system-config-cluster,
with only one network interface, no shared storage and I have followed the
Red Hat Cluster suite for installation.
At the first look, no syntax error in my cluster.conf but when I'm going to
start the cman and the rgmanager daemons on primary node the other node
reboot or poweroff immediately.

Can anyone help me?
Thanks in advance


Loris


My cluster.conf:

<?xml version="1.0" ?>
<cluster alias="newsocks" config_version="10" name="newsocks">
        <fence_daemon post_fail_delay="60" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="socks1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="RSA_1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="socks2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="RSA_2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_rsa" ipaddr="10.242.164.126"
login="xxxxx" name="RSA_1" passwd="xxxxx"/>
                <fencedevice agent="fence_rsa" ipaddr="10.242.164.128"
login="xxxxx" name="RSA_2" passwd="xxxxx"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="domso" ordered="1"
restricted="1">
                                <failoverdomainnode name="socks1"
priority="1"/>
                                <failoverdomainnode name="socks2"
priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.242.156.100" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="domso" name="servso"
recovery="relocate">
                        <ip ref="10.242.156.100">
                                <script file="/etc/init.d/ss5"
name="sockss5"/>
                        </ip>
                </service>
        </rm>
</cluster>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080610/7f941421/attachment.htm>

From farislinux at yahoo.com  Tue Jun 10 08:39:09 2008
From: farislinux at yahoo.com (faris)
Date: Tue, 10 Jun 2008 01:39:09 -0700 (PDT)
Subject: [Linux-cluster] Connecting to an External SUN SAN box !
Message-ID: <7351.48682.qm@web33202.mail.mud.yahoo.com>

Hi,
&nbsp;
I am really a newbie to linux clustering and GFS and i have used samba and cifs for remote file sharing etc .
&nbsp;
Currently there is a SUN SAN box running well and some of the servers from other vendors are accessing storage spaces via FC HBA&nbsp;&nbsp;(2GB). 
&nbsp;
My task is to access this SUN SAN Box same storage space via 2 RHEL4 servers having FC HBA cards each connected. could some one tell me:
&nbsp;
1.&nbsp;If i install GFS only is enough or do i need to install cluster suite as well?
2. Are there any special configs need to be done ?
3.&nbsp;Are there any special&nbsp;mount parameters need to pass ?
&nbsp;
Thanks a lot!
&nbsp;
Farislinux
&nbsp;&nbsp;


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080610/0ff6d3b8/attachment.htm>

From fdinitto at redhat.com  Tue Jun 10 08:44:41 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 10 Jun 2008 10:44:41 +0200 (CEST)
Subject: [Linux-cluster] Cluster 2.03.04 released
Message-ID: <Pine.LNX.4.64.0806101037360.5892@trider-g7>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


The cluster team and its vibrant community are proud to announce the 5th
release from the STABLE2 branch: 2.03.04.

The STABLE2 branch collects, on a daily base, all bug fixes and the bare
minimal changes required to run the cluster on top of the most recent Linux
kernel (2.6.25) and rock solid openais (0.80.3 or higher).

The 2.03.04 release features a major cleanup on the Copyright and Licence 
front. For distributions, like Debian, it marks the end of a nightmare. 
Make sure to read README.licence and COPYRIGHT top level files.

We also welcome a new fence agent (fence_ifmib) kindly contributed by
Ross Vandegrift and we say good bye to gfs_edit, now replaced by 
gfs2_edit.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.04.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.03.03):

Bob Peterson (6):
       Fix gfs2_edit bugs with non-4K block sizes
       Make gfs2_edit more friendly to automated testing.
       Updates to gfs2_edit man page for new option.
       Allow keywords in block number input
       Ability to specify starting block or structure with -s
       Fix compiler warning.

Fabio M. Di Nitto (14):
       [CMAN] Fix cman_tool node name override
       [GFS2] Use proper include dir for libvolume_id
       [FENCE] Fix copyright header for fence_ifmib manpage
       [FENCE] Fix ifmib README to report the right fence agent
       [BUILD] Plugin the new shiny fence_ifmib agent
       [CCS] Use absolute path for queries
       [GNBD/FENCE] Move fence_gnbd agent where it belongs
       [BUILD] Fix file permissions all around
       [BUILD] Clean up standard libraries Makefiles
       [MISC] Whitespace cleanup
       [GFS] Remove obsoleted gfs_edit in favour of gfs2_edit
       [MISC] Remove obsolete and empty files
       [MISC] Add top level licence files
       [MISC] Cleanup licence, copyright and header duplication

Lon Hohberger (1):
       [rgmanager] Use /cluster/rm instead of //rm

Marek 'marx' Grac (3):
       [FENCE] Fix #446995: Unknown option
       [FENCE] Fix: 447378: fence_apc unable to connect via ssh to APC 7900
       Fixes #445662: names of resources with spaces are mishandled

Mark Hlawatschek (1):
       mount.gfs2: skip mtab updates

Ross Vandegrift (1):
       [FENCE] Add fence_ifmib new agent

Ryan McCabe (1):
       fence: fixes and cleanups to fencing.py library

  COPYING.applications                               |  339 ++++++++
  COPYING.libraries                                  |  510 ++++++++++++
  COPYRIGHT                                          |  242 ++++++
  Makefile                                           |   19 +-
  README.licence                                     |   33 +
  ccs/Makefile                                       |   12 -
  ccs/ccs_test/Makefile                              |   12 -
  ccs/ccs_test/ccs_test.c                            |   11 -
  ccs/ccs_tool/Makefile                              |   12 -
  ccs/ccs_tool/editconf.c                            |   12 -
  ccs/ccs_tool/editconf.h                            |   12 -
  ccs/ccs_tool/old_parser.c                          |   12 -
  ccs/ccs_tool/update.c                              |   17 +-
  ccs/ccs_tool/update.h                              |   12 -
  ccs/ccs_tool/upgrade.c                             |   11 -
  ccs/ccs_tool/upgrade.h                             |   12 -
  ccs/common/log.c                                   |   11 -
  ccs/common/log.h                                   |   11 -
  ccs/daemon/Makefile                                |   12 -
  ccs/daemon/ccsd.c                                  |   11 -
  ccs/daemon/cluster_mgr.c                           |   11 -
  ccs/daemon/cluster_mgr.h                           |   11 -
  ccs/daemon/cnx_mgr.c                               |   11 -
  ccs/daemon/cnx_mgr.h                               |   12 -
  ccs/daemon/globals.c                               |   11 -
  ccs/daemon/globals.h                               |   11 -
  ccs/daemon/misc.c                                  |   18 +-
  ccs/daemon/misc.h                                  |   11 -
  ccs/include/comm_headers.h                         |   12 -
  ccs/include/debug.h                                |   12 -
  ccs/lib/Makefile                                   |   14 +-
  ccs/lib/ccs.h                                      |   11 -
  ccs/lib/libccs.c                                   |   11 -
  ccs/man/Makefile                                   |   13 -
  ccs/man/ccs.7                                      |    6 -
  ccs/man/ccs_test.8                                 |    6 -
  ccs/man/ccs_tool.8                                 |    7 -
  ccs/man/ccsd.8                                     |    7 -
  ccs/man/cluster.conf.5                             |    4 -
  cman/Makefile                                      |   13 -
  cman/cman_tool/Makefile                            |   12 -
  cman/cman_tool/cman_tool.h                         |   13 -
  cman/cman_tool/join.c                              |   13 -
  cman/cman_tool/main.c                              |   13 -
  cman/daemon/Makefile                               |   12 -
  cman/daemon/ais.c                                  |   12 -
  cman/daemon/ais.h                                  |   11 -
  cman/daemon/barrier.c                              |   13 -
  cman/daemon/barrier.h                              |   12 -
  cman/daemon/cmanccs.c                              |   18 +-
  cman/daemon/cmanccs.h                              |   12 -
  cman/daemon/cnxman-private.h                       |   13 -
  cman/daemon/cnxman-socket.h                        |   13 -
  cman/daemon/commands.c                             |   13 -
  cman/daemon/commands.h                             |   12 -
  cman/daemon/config.c                               |   12 -
  cman/daemon/daemon.c                               |   11 -
  cman/daemon/daemon.h                               |   12 -
  cman/daemon/list.h                                 |   15 -
  cman/daemon/logging.c                              |   12 -
  cman/daemon/logging.h                              |   11 -
  cman/init.d/Makefile                               |   12 -
  cman/lib/Makefile                                  |   15 +-
  cman/lib/libcman.c                                 |   22 -
  cman/lib/libcman.h                                 |   22 -
  cman/man/Makefile                                  |   13 -
  cman/man/cman.5                                    |    3 -
  cman/qdisk/Makefile                                |   12 -
  cman/qdisk/bitmap.c                                |   19 -
  cman/qdisk/clulog.c                                |   19 -
  cman/qdisk/clulog.h                                |   19 -
  cman/qdisk/crc32.c                                 |   20 -
  cman/qdisk/daemon_init.c                           |   19 -
  cman/qdisk/disk.c                                  |   19 -
  cman/qdisk/disk.h                                  |   20 -
  cman/qdisk/disk_util.c                             |   20 -
  cman/qdisk/main.c                                  |   20 -
  cman/qdisk/mkqdisk.c                               |   20 -
  cman/qdisk/platform.h                              |   19 -
  cman/qdisk/proc.c                                  |   20 -
  cman/qdisk/scandisk.c                              |   19 -
  cman/qdisk/scandisk.h                              |   18 -
  cman/qdisk/score.c                                 |   20 -
  cman/qdisk/score.h                                 |   20 -
  cman/tests/Makefile                                |   12 -
  cman/tests/qwait.c                                 |    2 -
  cman/tests/user_service.c                          |   13 -
  config/copyright.cf                                |   13 -
  configure                                          |   14 -
  dlm/Makefile                                       |   12 -
  dlm/doc/example.c                                  |    1 -
  dlm/lib/Makefile                                   |   21 +-
  dlm/lib/libdlm.c                                   |   24 -
  dlm/lib/libdlm.h                                   |   23 -
  dlm/man/Makefile                                   |   12 -
  dlm/man/dlm_tool.8                                 |    6 -
  dlm/tests/Makefile                                 |   12 -
  dlm/tests/usertest/Makefile                        |   12 -
  dlm/tests/usertest/alternate-lvb.c                 |   12 -
  dlm/tests/usertest/dlmtest2.c                      |   12 -
  dlm/tests/usertest/threads.c                       |   12 -
  dlm/tool/Makefile                                  |   12 -
  dlm/tool/main.c                                    |   12 -
  fence/Makefile                                     |   13 -
  fence/agents/Makefile                              |   13 -
  fence/agents/apc/Makefile                          |   13 -
  fence/agents/apc/fence_apc.py                      |    3 +-
  fence/agents/apc_snmp/Makefile                     |   13 -
  fence/agents/apc_snmp/fence_apc_snmp.py            |   13 -
  fence/agents/baytech/Makefile                      |   13 -
  fence/agents/baytech/fence_baytech.pl              |   13 -
  fence/agents/bladecenter/Makefile                  |   13 -
  fence/agents/bladecenter/fence_bladecenter.py      |    3 +-
  fence/agents/brocade/Makefile                      |   13 -
  fence/agents/brocade/fence_brocade.pl              |   13 -
  fence/agents/bullpap/Makefile                      |   13 -
  fence/agents/bullpap/fence_bullpap.pl              |   12 -
  fence/agents/cpint/Makefile                        |   13 -
  fence/agents/cpint/fence_cpint.pl                  |   13 -
  fence/agents/drac/Makefile                         |   13 -
  fence/agents/drac/fence_drac.pl                    |   12 -
  fence/agents/drac/fence_drac5.py                   |    3 +-
  fence/agents/egenera/Makefile                      |   13 -
  fence/agents/egenera/fence_egenera.pl              |   13 -
  fence/agents/gnbd/Makefile                         |   21 +
  fence/agents/gnbd/main.c                           |  327 ++++++++
  fence/agents/ibmblade/Makefile                     |   13 -
  fence/agents/ibmblade/fence_ibmblade.pl            |   13 -
  fence/agents/ifmib/Makefile                        |    5 +
  fence/agents/ifmib/README                          |   45 ++
  fence/agents/ifmib/fence_ifmib.py                  |  221 ++++++
  fence/agents/ilo/Makefile                          |   13 -
  fence/agents/ilo/fence_ilo.py                      |    3 +-
  fence/agents/ipmilan/Makefile                      |   13 -
  fence/agents/ipmilan/expect.c                      |   19 -
  fence/agents/ipmilan/expect.h                      |   16 -
  fence/agents/ipmilan/ipmilan.c                     |   17 -
  fence/agents/lib/Makefile                          |   14 -
  fence/agents/lib/fencing.py.py                     |   51 +-
  fence/agents/manual/Makefile                       |   13 -
  fence/agents/manual/fence_ack_manual.sh            |   13 +-
  fence/agents/mcdata/Makefile                       |   13 -
  fence/agents/mcdata/fence_mcdata.pl                |   14 -
  fence/agents/rackswitch/Makefile                   |   13 -
  fence/agents/rackswitch/do_rack.c                  |   12 -
  fence/agents/rps10/Makefile                        |   13 -
  fence/agents/rps10/rps10.c                         |   18 -
  fence/agents/rsa/Makefile                          |   13 -
  fence/agents/rsa/fence_rsa.py                      |   13 -
  fence/agents/rsb/Makefile                          |   13 -
  fence/agents/rsb/fence_rsb.py                      |   13 -
  fence/agents/sanbox2/Makefile                      |   13 -
  fence/agents/sanbox2/fence_sanbox2.pl              |   13 -
  fence/agents/scsi/Makefile                         |   12 -
  fence/agents/vixel/Makefile                        |   13 -
  fence/agents/vixel/fence_vixel.pl                  |   13 -
  fence/agents/vmware/Makefile                       |   13 -
  fence/agents/vmware/fence_vmware.pl                |   15 -
  fence/agents/wti/Makefile                          |   13 -
  fence/agents/wti/fence_wti.py                      |    3 +-
  fence/agents/xcat/Makefile                         |   13 -
  fence/agents/xcat/fence_xcat.pl                    |    9 -
  fence/agents/xvm/Makefile                          |   12 -
  fence/agents/xvm/debug.c                           |   18 -
  fence/agents/xvm/debug.h                           |   18 -
  fence/agents/xvm/fence_xvm.c                       |   18 -
  fence/agents/xvm/fence_xvmd.c                      |   21 -
  fence/agents/xvm/ip_lookup.c                       |   18 -
  fence/agents/xvm/ip_lookup.h                       |   18 -
  fence/agents/xvm/mcast.c                           |   21 -
  fence/agents/xvm/mcast.h                           |   18 -
  fence/agents/xvm/options-ccs.c                     |   18 -
  fence/agents/xvm/options.c                         |   18 -
  fence/agents/xvm/options.h                         |   18 -
  fence/agents/xvm/simple_auth.c                     |   18 -
  fence/agents/xvm/simple_auth.h                     |   18 -
  fence/agents/xvm/tcp.c                             |   24 -
  fence/agents/xvm/tcp.h                             |   18 -
  fence/agents/xvm/virt.c                            |   18 -
  fence/agents/xvm/virt.h                            |   18 -
  fence/agents/xvm/vm_states.c                       |   18 -
  fence/agents/xvm/xvm.h                             |   18 -
  fence/agents/zvm/Makefile                          |   13 -
  fence/agents/zvm/fence_zvm.pl                      |   13 -
  fence/fence_node/Makefile                          |   13 -
  fence/fence_node/fence_node.c                      |   13 -
  fence/fence_tool/Makefile                          |   13 -
  fence/fence_tool/fence_tool.c                      |   13 -
  fence/fenced/Makefile                              |   13 -
  fence/fenced/agent.c                               |   13 -
  fence/fenced/fd.h                                  |   13 -
  fence/fenced/group.c                               |   12 -
  fence/fenced/main.c                                |   13 -
  fence/fenced/member_cman.c                         |   12 -
  fence/fenced/recover.c                             |   13 -
  fence/man/Makefile                                 |   15 +-
  fence/man/fence.8                                  |    7 -
  fence/man/fence_ack_manual.8                       |    7 -
  fence/man/fence_apc.8                              |    7 -
  fence/man/fence_baytech.8                          |    7 -
  fence/man/fence_bladecenter.8                      |    7 -
  fence/man/fence_brocade.8                          |    7 -
  fence/man/fence_bullpap.8                          |    7 -
  fence/man/fence_cpint.8                            |    7 -
  fence/man/fence_drac.8                             |    6 -
  fence/man/fence_egenera.8                          |    7 -
  fence/man/fence_gnbd.8                             |   84 ++
  fence/man/fence_ibmblade.8                         |    7 -
  fence/man/fence_ifmib.8                            |   69 ++
  fence/man/fence_ilo.8                              |    7 -
  fence/man/fence_ipmilan.8                          |    7 -
  fence/man/fence_manual.8                           |    7 -
  fence/man/fence_mcdata.8                           |    7 -
  fence/man/fence_node.8                             |    7 -
  fence/man/fence_rackswitch.8                       |    7 -
  fence/man/fence_rib.8                              |    7 -
  fence/man/fence_rsa.8                              |    6 -
  fence/man/fence_rsb.8                              |    6 -
  fence/man/fence_sanbox2.8                          |    7 -
  fence/man/fence_scsi.8                             |    6 -
  fence/man/fence_tool.8                             |    7 -
  fence/man/fence_vixel.8                            |    7 -
  fence/man/fence_wti.8                              |    7 -
  fence/man/fence_xcat.8                             |    3 -
  fence/man/fence_xvm.8                              |    7 -
  fence/man/fence_xvmd.8                             |    7 -
  fence/man/fence_zvm.8                              |    7 -
  fence/man/fenced.8                                 |    7 -
  gfs-kernel/src/gfs/Makefile                        |   13 -
  gfs-kernel/src/gfs/acl.c                           |   13 -
  gfs-kernel/src/gfs/acl.h                           |   13 -
  gfs-kernel/src/gfs/bits.c                          |   13 -
  gfs-kernel/src/gfs/bits.h                          |   13 -
  gfs-kernel/src/gfs/bmap.c                          |   13 -
  gfs-kernel/src/gfs/bmap.h                          |   13 -
  gfs-kernel/src/gfs/daemon.c                        |   13 -
  gfs-kernel/src/gfs/daemon.h                        |   13 -
  gfs-kernel/src/gfs/dio.c                           |   13 -
  gfs-kernel/src/gfs/dio.h                           |   13 -
  gfs-kernel/src/gfs/dir.c                           |   13 -
  gfs-kernel/src/gfs/dir.h                           |   13 -
  gfs-kernel/src/gfs/eaops.c                         |   13 -
  gfs-kernel/src/gfs/eaops.h                         |   13 -
  gfs-kernel/src/gfs/eattr.c                         |   13 -
  gfs-kernel/src/gfs/eattr.h                         |   13 -
  gfs-kernel/src/gfs/file.c                          |   13 -
  gfs-kernel/src/gfs/file.h                          |   13 -
  gfs-kernel/src/gfs/fixed_div64.h                   |   34 -
  gfs-kernel/src/gfs/format.h                        |   13 -
  gfs-kernel/src/gfs/gfs.h                           |   13 -
  gfs-kernel/src/gfs/gfs_ioctl.h                     |   13 -
  gfs-kernel/src/gfs/gfs_ondisk.h                    |   13 -
  gfs-kernel/src/gfs/glock.c                         |   13 -
  gfs-kernel/src/gfs/glock.h                         |   13 -
  gfs-kernel/src/gfs/glops.c                         |   13 -
  gfs-kernel/src/gfs/glops.h                         |   13 -
  gfs-kernel/src/gfs/incore.h                        |   13 -
  gfs-kernel/src/gfs/inode.c                         |   13 -
  gfs-kernel/src/gfs/inode.h                         |   13 -
  gfs-kernel/src/gfs/ioctl.c                         |   13 -
  gfs-kernel/src/gfs/ioctl.h                         |   13 -
  gfs-kernel/src/gfs/lm.c                            |    9 -
  gfs-kernel/src/gfs/lm.h                            |   13 -
  gfs-kernel/src/gfs/log.c                           |   13 -
  gfs-kernel/src/gfs/log.h                           |   13 -
  gfs-kernel/src/gfs/lops.c                          |   13 -
  gfs-kernel/src/gfs/lops.h                          |   13 -
  gfs-kernel/src/gfs/lvb.c                           |   13 -
  gfs-kernel/src/gfs/lvb.h                           |   13 -
  gfs-kernel/src/gfs/main.c                          |   13 -
  gfs-kernel/src/gfs/mount.c                         |   13 -
  gfs-kernel/src/gfs/mount.h                         |   13 -
  gfs-kernel/src/gfs/ondisk.c                        |   13 -
  gfs-kernel/src/gfs/ops_address.c                   |   13 -
  gfs-kernel/src/gfs/ops_address.h                   |   13 -
  gfs-kernel/src/gfs/ops_dentry.c                    |   13 -
  gfs-kernel/src/gfs/ops_dentry.h                    |   13 -
  gfs-kernel/src/gfs/ops_export.c                    |   13 -
  gfs-kernel/src/gfs/ops_export.h                    |   13 -
  gfs-kernel/src/gfs/ops_file.c                      |   13 -
  gfs-kernel/src/gfs/ops_file.h                      |   13 -
  gfs-kernel/src/gfs/ops_fstype.c                    |   10 -
  gfs-kernel/src/gfs/ops_fstype.h                    |   13 -
  gfs-kernel/src/gfs/ops_inode.c                     |   13 -
  gfs-kernel/src/gfs/ops_inode.h                     |   13 -
  gfs-kernel/src/gfs/ops_super.c                     |   13 -
  gfs-kernel/src/gfs/ops_super.h                     |   13 -
  gfs-kernel/src/gfs/ops_vm.c                        |   13 -
  gfs-kernel/src/gfs/ops_vm.h                        |   13 -
  gfs-kernel/src/gfs/page.c                          |   13 -
  gfs-kernel/src/gfs/page.h                          |   13 -
  gfs-kernel/src/gfs/proc.c                          |   13 -
  gfs-kernel/src/gfs/proc.h                          |   13 -
  gfs-kernel/src/gfs/quota.c                         |   13 -
  gfs-kernel/src/gfs/quota.h                         |   13 -
  gfs-kernel/src/gfs/recovery.c                      |   13 -
  gfs-kernel/src/gfs/recovery.h                      |   13 -
  gfs-kernel/src/gfs/rgrp.c                          |   13 -
  gfs-kernel/src/gfs/rgrp.h                          |   13 -
  gfs-kernel/src/gfs/super.c                         |   13 -
  gfs-kernel/src/gfs/super.h                         |   13 -
  gfs-kernel/src/gfs/sys.c                           |   13 -
  gfs-kernel/src/gfs/sys.h                           |   13 -
  gfs-kernel/src/gfs/trans.c                         |   13 -
  gfs-kernel/src/gfs/trans.h                         |   13 -
  gfs-kernel/src/gfs/unlinked.c                      |   13 -
  gfs-kernel/src/gfs/unlinked.h                      |   13 -
  gfs-kernel/src/gfs/util.c                          |   13 -
  gfs-kernel/src/gfs/util.h                          |   13 -
  gfs/Makefile                                       |   19 +-
  gfs/gfs_debug/Makefile                             |   13 -
  gfs/gfs_debug/basic.c                              |   13 -
  gfs/gfs_debug/basic.h                              |   13 -
  gfs/gfs_debug/block_device.c                       |   13 -
  gfs/gfs_debug/block_device.h                       |   13 -
  gfs/gfs_debug/gfs_debug.h                          |   13 -
  gfs/gfs_debug/main.c                               |   13 -
  gfs/gfs_debug/ondisk.c                             |   13 -
  gfs/gfs_debug/readfile.c                           |   13 -
  gfs/gfs_debug/readfile.h                           |   13 -
  gfs/gfs_debug/util.c                               |   13 -
  gfs/gfs_debug/util.h                               |   13 -
  gfs/gfs_edit/Makefile                              |   40 -
  gfs/gfs_edit/gfshex.c                              |  357 ---------
  gfs/gfs_edit/gfshex.h                              |   23 -
  gfs/gfs_edit/hexedit.c                             |  833 --------------------
  gfs/gfs_edit/hexedit.h                             |  193 -----
  gfs/gfs_fsck/Makefile                              |   12 -
  gfs/gfs_fsck/bio.c                                 |   13 -
  gfs/gfs_fsck/bio.h                                 |   13 -
  gfs/gfs_fsck/bitmap.c                              |   12 -
  gfs/gfs_fsck/bitmap.h                              |   13 -
  gfs/gfs_fsck/block_list.c                          |   12 -
  gfs/gfs_fsck/block_list.h                          |   12 -
  gfs/gfs_fsck/eattr.c                               |   12 -
  gfs/gfs_fsck/eattr.h                               |   12 -
  gfs/gfs_fsck/file.c                                |   13 -
  gfs/gfs_fsck/file.h                                |   13 -
  gfs/gfs_fsck/fs_bits.c                             |   13 -
  gfs/gfs_fsck/fs_bits.h                             |   13 -
  gfs/gfs_fsck/fs_bmap.c                             |   13 -
  gfs/gfs_fsck/fs_bmap.h                             |   13 -
  gfs/gfs_fsck/fs_dir.c                              |   13 -
  gfs/gfs_fsck/fs_dir.h                              |   13 -
  gfs/gfs_fsck/fs_inode.c                            |   13 -
  gfs/gfs_fsck/fs_inode.h                            |   13 -
  gfs/gfs_fsck/fs_recovery.c                         |   14 -
  gfs/gfs_fsck/fs_recovery.h                         |   13 -
  gfs/gfs_fsck/fsck.h                                |   12 -
  gfs/gfs_fsck/fsck_incore.h                         |   15 -
  gfs/gfs_fsck/hash.c                                |   13 -
  gfs/gfs_fsck/hash.h                                |   13 -
  gfs/gfs_fsck/initialize.c                          |   13 -
  gfs/gfs_fsck/inode.c                               |   12 -
  gfs/gfs_fsck/inode.h                               |   12 -
  gfs/gfs_fsck/inode_hash.c                          |   13 -
  gfs/gfs_fsck/inode_hash.h                          |   13 -
  gfs/gfs_fsck/link.c                                |   13 -
  gfs/gfs_fsck/link.h                                |   14 -
  gfs/gfs_fsck/log.c                                 |   12 -
  gfs/gfs_fsck/log.h                                 |   12 -
  gfs/gfs_fsck/lost_n_found.c                        |   13 -
  gfs/gfs_fsck/lost_n_found.h                        |   13 -
  gfs/gfs_fsck/main.c                                |   12 -
  gfs/gfs_fsck/metawalk.c                            |   12 -
  gfs/gfs_fsck/metawalk.h                            |   12 -
  gfs/gfs_fsck/ondisk.c                              |   13 -
  gfs/gfs_fsck/ondisk.h                              |   13 -
  gfs/gfs_fsck/pass1.c                               |   13 -
  gfs/gfs_fsck/pass1b.c                              |   13 -
  gfs/gfs_fsck/pass1c.c                              |   12 -
  gfs/gfs_fsck/pass2.c                               |   13 -
  gfs/gfs_fsck/pass3.c                               |   13 -
  gfs/gfs_fsck/pass4.c                               |   13 -
  gfs/gfs_fsck/pass5.c                               |   13 -
  gfs/gfs_fsck/rgrp.c                                |   14 -
  gfs/gfs_fsck/rgrp.h                                |   13 -
  gfs/gfs_fsck/super.c                               |   13 -
  gfs/gfs_fsck/super.h                               |   13 -
  gfs/gfs_fsck/test_bitmap.c                         |   12 -
  gfs/gfs_fsck/test_block_list.c                     |   12 -
  gfs/gfs_fsck/util.c                                |   13 -
  gfs/gfs_fsck/util.h                                |   13 -
  gfs/gfs_grow/Makefile                              |   13 -
  gfs/gfs_grow/main.c                                |   13 -
  gfs/gfs_grow/ondisk.c                              |   13 -
  gfs/gfs_jadd/Makefile                              |   13 -
  gfs/gfs_jadd/main.c                                |   13 -
  gfs/gfs_jadd/ondisk.c                              |   13 -
  gfs/gfs_mkfs/Makefile                              |   13 -
  gfs/gfs_mkfs/device_geometry.c                     |   13 -
  gfs/gfs_mkfs/fs_geometry.c                         |   13 -
  gfs/gfs_mkfs/locking.c                             |   13 -
  gfs/gfs_mkfs/main.c                                |   13 -
  gfs/gfs_mkfs/mkfs_gfs.h                            |   13 -
  gfs/gfs_mkfs/ondisk.c                              |   13 -
  gfs/gfs_mkfs/structures.c                          |   13 -
  gfs/gfs_quota/Makefile                             |   13 -
  gfs/gfs_quota/check.c                              |   13 -
  gfs/gfs_quota/gfs_quota.h                          |   13 -
  gfs/gfs_quota/layout.c                             |   13 -
  gfs/gfs_quota/main.c                               |   13 -
  gfs/gfs_quota/names.c                              |   13 -
  gfs/gfs_quota/ondisk.c                             |   13 -
  gfs/gfs_tool/Makefile                              |   13 -
  gfs/gfs_tool/counters.c                            |   13 -
  gfs/gfs_tool/decipher_lockstate_dump               |   14 -
  gfs/gfs_tool/df.c                                  |   13 -
  gfs/gfs_tool/gfs_tool.h                            |   13 -
  gfs/gfs_tool/layout.c                              |   13 -
  gfs/gfs_tool/main.c                                |   13 -
  gfs/gfs_tool/misc.c                                |   13 -
  gfs/gfs_tool/ondisk.c                              |   13 -
  gfs/gfs_tool/parse_lockdump                        |   14 -
  gfs/gfs_tool/sb.c                                  |   13 -
  gfs/gfs_tool/tune.c                                |   13 -
  gfs/gfs_tool/util.c                                |   13 -
  gfs/include/global.h                               |   13 -
  gfs/include/linux_endian.h                         |   13 -
  gfs/include/osi_list.h                             |   13 -
  gfs/include/osi_user.h                             |   13 -
  gfs/init.d/Makefile                                |   12 -
  gfs/libgfs/Makefile                                |   13 -
  gfs/libgfs/bio.c                                   |   13 -
  gfs/libgfs/bitmap.c                                |   12 -
  gfs/libgfs/block_list.c                            |   12 -
  gfs/libgfs/file.c                                  |   13 -
  gfs/libgfs/fs_bits.c                               |   13 -
  gfs/libgfs/fs_bmap.c                               |   13 -
  gfs/libgfs/fs_dir.c                                |   13 -
  gfs/libgfs/fs_inode.c                              |   13 -
  gfs/libgfs/incore.h                                |   13 -
  gfs/libgfs/inode.c                                 |   12 -
  gfs/libgfs/log.c                                   |   12 -
  gfs/libgfs/ondisk.c                                |   13 -
  gfs/libgfs/rgrp.c                                  |   14 -
  gfs/libgfs/size.c                                  |   13 -
  gfs/libgfs/super.c                                 |   13 -
  gfs/libgfs/util.c                                  |   13 -
  gfs/man/Makefile                                   |   13 -
  gfs/man/gfs.8                                      |    3 -
  gfs/man/gfs_edit.8                                 |  131 +---
  gfs/man/gfs_fsck.8                                 |    3 -
  gfs/man/gfs_grow.8                                 |    3 -
  gfs/man/gfs_jadd.8                                 |    3 -
  gfs/man/gfs_mkfs.8                                 |    3 -
  gfs/man/gfs_mount.8                                |    8 -
  gfs/man/gfs_quota.8                                |    3 -
  gfs/man/gfs_tool.8                                 |    3 -
  gfs/tests/Makefile                                 |   12 -
  gfs/tests/filecon2/Makefile                        |   13 -
  gfs/tests/filecon2/filecon2.h                      |   13 -
  gfs/tests/filecon2/filecon2_client.c               |   13 -
  gfs/tests/filecon2/filecon2_server.c               |   13 -
  gfs/tests/mmdd/Makefile                            |   13 -
  gfs/tests/mmdd/mmdd.c                              |   13 -
  gfs2/Makefile                                      |   13 -
  gfs2/convert/Makefile                              |   12 -
  gfs2/convert/gfs2_convert.c                        |   14 -
  gfs2/debug/Makefile                                |   13 -
  gfs2/debug/basic.c                                 |   13 -
  gfs2/debug/basic.h                                 |   13 -
  gfs2/debug/block_device.c                          |   13 -
  gfs2/debug/block_device.h                          |   13 -
  gfs2/debug/gfs2_debug.h                            |   13 -
  gfs2/debug/main.c                                  |   13 -
  gfs2/debug/ondisk.c                                |   13 -
  gfs2/debug/readfile.c                              |   13 -
  gfs2/debug/readfile.h                              |   13 -
  gfs2/debug/util.c                                  |   13 -
  gfs2/debug/util.h                                  |   13 -
  gfs2/edit/Makefile                                 |   13 -
  gfs2/edit/gfs2hex.c                                |   26 +-
  gfs2/edit/gfs2hex.h                                |   13 -
  gfs2/edit/hexedit.c                                |  326 ++++----
  gfs2/edit/hexedit.h                                |   15 +-
  gfs2/edit/savemeta.c                               |   70 +-
  gfs2/fsck/Makefile                                 |   12 -
  gfs2/fsck/eattr.c                                  |   12 -
  gfs2/fsck/eattr.h                                  |   12 -
  gfs2/fsck/fs_bits.h                                |   13 -
  gfs2/fsck/fs_recovery.c                            |   13 -
  gfs2/fsck/fs_recovery.h                            |   13 -
  gfs2/fsck/fsck.h                                   |   12 -
  gfs2/fsck/hash.c                                   |   13 -
  gfs2/fsck/hash.h                                   |   13 -
  gfs2/fsck/initialize.c                             |   13 -
  gfs2/fsck/inode_hash.c                             |   13 -
  gfs2/fsck/inode_hash.h                             |   13 -
  gfs2/fsck/link.c                                   |   13 -
  gfs2/fsck/link.h                                   |   14 -
  gfs2/fsck/lost_n_found.c                           |   13 -
  gfs2/fsck/lost_n_found.h                           |   13 -
  gfs2/fsck/main.c                                   |   12 -
  gfs2/fsck/metawalk.c                               |   12 -
  gfs2/fsck/metawalk.h                               |   12 -
  gfs2/fsck/pass1.c                                  |   13 -
  gfs2/fsck/pass1b.c                                 |   13 -
  gfs2/fsck/pass1c.c                                 |   12 -
  gfs2/fsck/pass2.c                                  |   13 -
  gfs2/fsck/pass3.c                                  |   13 -
  gfs2/fsck/pass4.c                                  |   13 -
  gfs2/fsck/pass5.c                                  |   13 -
  gfs2/fsck/rgrepair.c                               |   13 -
  gfs2/fsck/test_bitmap.c                            |   12 -
  gfs2/fsck/test_block_list.c                        |   12 -
  gfs2/fsck/util.c                                   |   13 -
  gfs2/fsck/util.h                                   |   13 -
  gfs2/include/gfs2_disk_hash.h                      |   13 -
  gfs2/include/global.h                              |   13 -
  gfs2/include/linux_endian.h                        |   13 -
  gfs2/include/osi_list.h                            |   13 -
  gfs2/include/osi_user.h                            |   13 -
  gfs2/init.d/Makefile                               |   12 -
  gfs2/libgfs2/Makefile                              |   13 -
  gfs2/libgfs2/bitmap.c                              |   12 -
  gfs2/libgfs2/block_list.c                          |   12 -
  gfs2/libgfs2/buf.c                                 |   13 -
  gfs2/libgfs2/device_geometry.c                     |   13 -
  gfs2/libgfs2/fs_bits.c                             |   13 -
  gfs2/libgfs2/fs_geometry.c                         |   13 -
  gfs2/libgfs2/fs_ops.c                              |   13 -
  gfs2/libgfs2/gfs2_log.c                            |   12 -
  gfs2/libgfs2/libgfs2.h                             |   13 -
  gfs2/libgfs2/locking.c                             |   13 -
  gfs2/libgfs2/misc.c                                |   13 -
  gfs2/libgfs2/ondisk.c                              |   13 -
  gfs2/libgfs2/ondisk.h                              |    9 -
  gfs2/libgfs2/recovery.c                            |    9 -
  gfs2/libgfs2/rgrp.c                                |   13 -
  gfs2/libgfs2/size.c                                |   13 -
  gfs2/libgfs2/structures.c                          |   13 -
  gfs2/libgfs2/super.c                               |   13 -
  gfs2/man/Makefile                                  |   13 -
  gfs2/man/gfs2.8                                    |    3 -
  gfs2/man/gfs2_convert.8                            |    3 -
  gfs2/man/gfs2_edit.8                               |   15 +-
  gfs2/man/gfs2_fsck.8                               |    3 -
  gfs2/man/gfs2_grow.8                               |    3 -
  gfs2/man/gfs2_jadd.8                               |    3 -
  gfs2/man/gfs2_mount.8                              |    8 -
  gfs2/man/gfs2_quota.8                              |    3 -
  gfs2/man/gfs2_tool.8                               |    3 -
  gfs2/man/mkfs.gfs2.8                               |    3 -
  gfs2/mkfs/Makefile                                 |    6 +-
  gfs2/mkfs/gfs2_mkfs.h                              |   13 -
  gfs2/mkfs/main.c                                   |   13 -
  gfs2/mkfs/main_grow.c                              |   12 -
  gfs2/mkfs/main_jadd.c                              |   11 -
  gfs2/mkfs/main_mkfs.c                              |   13 -
  gfs2/mount/Makefile                                |   12 -
  gfs2/mount/mount.gfs2.c                            |    8 -
  gfs2/mount/mtab.c                                  |   14 +-
  gfs2/mount/ondisk1.c                               |   13 -
  gfs2/mount/ondisk2.c                               |   13 -
  gfs2/mount/umount.gfs2.c                           |    8 -
  gfs2/mount/util.c                                  |    8 -
  gfs2/mount/util.h                                  |    8 -
  gfs2/quota/Makefile                                |   13 -
  gfs2/quota/check.c                                 |   13 -
  gfs2/quota/gfs2_quota.h                            |   13 -
  gfs2/quota/main.c                                  |   12 -
  gfs2/quota/names.c                                 |   13 -
  gfs2/tool/Makefile                                 |   13 -
  gfs2/tool/decipher_lockstate_dump                  |   14 -
  gfs2/tool/df.c                                     |   13 -
  gfs2/tool/gfs2_tool.h                              |   13 -
  gfs2/tool/iflags.h                                 |   13 -
  gfs2/tool/layout.c                                 |   13 -
  gfs2/tool/main.c                                   |   13 -
  gfs2/tool/misc.c                                   |   13 -
  gfs2/tool/ondisk.c                                 |   13 -
  gfs2/tool/parse_lockdump                           |   14 -
  gfs2/tool/sb.c                                     |   13 -
  gfs2/tool/tune.c                                   |   13 -
  gnbd-kernel/src/Makefile                           |   13 -
  gnbd-kernel/src/gnbd.c                             |   13 -
  gnbd-kernel/src/gnbd.h                             |   13 -
  gnbd/COPYING                                       |  340 --------
  gnbd/Makefile                                      |   13 -
  gnbd/client/Makefile                               |   13 -
  gnbd/client/gnbd_monitor.c                         |   12 -
  gnbd/client/gnbd_monitor.h                         |   12 -
  gnbd/client/gnbd_recvd.c                           |   12 -
  gnbd/client/monitor_req.c                          |   12 -
  gnbd/include/global.h                              |   13 -
  gnbd/include/gnbd_endian.h                         |   13 -
  gnbd/man/Makefile                                  |   16 +-
  gnbd/man/fence_gnbd.8                              |   87 --
  gnbd/man/gnbd.8                                    |    3 -
  gnbd/man/gnbd_export.8                             |    3 -
  gnbd/man/gnbd_import.8                             |    3 -
  gnbd/man/gnbd_serv.8                               |    2 -
  gnbd/server/Makefile                               |   13 -
  gnbd/server/device.c                               |   12 -
  gnbd/server/device.h                               |   12 -
  gnbd/server/extern_req.c                           |   11 -
  gnbd/server/extern_req.h                           |   12 -
  gnbd/server/fence.c                                |   12 -
  gnbd/server/fence.h                                |   12 -
  gnbd/server/gnbd_clusterd.c                        |   12 -
  gnbd/server/gnbd_serv.c                            |   12 -
  gnbd/server/gnbd_server.h                          |   12 -
  gnbd/server/gserv.c                                |   12 -
  gnbd/server/gserv.h                                |   12 -
  gnbd/server/list.h                                 |   13 -
  gnbd/server/local_req.c                            |   12 -
  gnbd/server/local_req.h                            |   12 -
  gnbd/tools/Makefile                                |   15 +-
  gnbd/tools/fence_gnbd/Makefile                     |   35 -
  gnbd/tools/fence_gnbd/main.c                       |  340 --------
  gnbd/tools/gnbd_export/Makefile                    |   13 -
  gnbd/tools/gnbd_export/gnbd_export.c               |   14 -
  gnbd/tools/gnbd_import/Makefile                    |   13 -
  gnbd/tools/gnbd_import/fence_return.h              |   13 -
  gnbd/tools/gnbd_import/gnbd_import.c               |   12 -
  gnbd/utils/Makefile                                |   13 -
  gnbd/utils/gnbd_utils.c                            |   12 -
  gnbd/utils/gnbd_utils.h                            |   12 -
  gnbd/utils/member_cman.c                           |   12 -
  gnbd/utils/member_cman.h                           |   12 -
  gnbd/utils/trans.c                                 |   12 -
  gnbd/utils/trans.h                                 |   12 -
  group/Makefile                                     |   12 -
  group/daemon/Makefile                              |   12 -
  group/daemon/gd_internal.h                         |   13 -
  group/daemon/groupd.h                              |   13 -
  group/daemon/main.c                                |   12 -
  group/dlm_controld/Makefile                        |   12 -
  group/dlm_controld/action.c                        |   12 -
  group/dlm_controld/deadlock.c                      |   12 -
  group/dlm_controld/dlm_controld.h                  |   12 -
  group/dlm_controld/dlm_daemon.h                    |   12 -
  group/dlm_controld/group.c                         |   12 -
  group/dlm_controld/main.c                          |   12 -
  group/dlm_controld/member_cman.c                   |   12 -
  group/gfs_controld/Makefile                        |   12 -
  group/gfs_controld/cpg.c                           |   12 -
  group/gfs_controld/group.c                         |   12 -
  group/gfs_controld/lock_dlm.h                      |   12 -
  group/gfs_controld/main.c                          |   12 -
  group/gfs_controld/member_cman.c                   |   12 -
  group/gfs_controld/plock.c                         |   12 -
  group/gfs_controld/recover.c                       |   12 -
  group/include/linux_endian.h                       |   13 -
  group/lib/Makefile                                 |   14 +-
  group/lib/libgroup.c                               |   22 -
  group/lib/libgroup.h                               |   22 -
  group/man/Makefile                                 |   12 -
  group/man/dlm_controld.8                           |    6 -
  group/man/gfs_controld.8                           |    6 -
  group/man/group_tool.8                             |    6 -
  group/man/groupd.8                                 |    6 -
  group/test/Makefile                                |   12 -
  group/test/clientd.c                               |   12 -
  group/tool/Makefile                                |   12 -
  group/tool/main.c                                  |   12 -
  make/defines.mk.input                              |   17 +-
  make/fencebuild.mk                                 |    6 +-
  make/install.mk                                    |   14 +-
  make/man.mk                                        |   10 +-
  rgmanager/AUTHORS                                  |   13 -
  rgmanager/COPYING                                  |  340 --------
  rgmanager/INSTALL                                  |    7 -
  rgmanager/Makefile                                 |   13 -
  rgmanager/NEWS                                     |    2 -
  rgmanager/include/clulog.h                         |   19 -
  rgmanager/include/event.h                          |   17 -
  rgmanager/include/findproc.h                       |   18 -
  rgmanager/include/platform.h                       |   19 -
  rgmanager/include/res-ocf.h                        |   18 -
  rgmanager/include/reslist.h                        |   20 +-
  rgmanager/include/restart_counter.h                |   17 -
  rgmanager/include/rg_locks.h                       |   17 -
  rgmanager/include/rg_queue.h                       |   17 -
  rgmanager/include/rmtab.h                          |   18 -
  rgmanager/include/sets.h                           |   17 -
  rgmanager/include/vf.h                             |   18 -
  rgmanager/init.d/Makefile                          |   12 -
  rgmanager/init.d/rgmanager.in                      |    6 -
  rgmanager/man/Makefile                             |   13 -
  rgmanager/src/Makefile                             |   13 -
  rgmanager/src/clulib/Makefile                      |   12 -
  rgmanager/src/clulib/alloc.c                       |   22 -
  rgmanager/src/clulib/ckpt_state.c                  |   18 -
  rgmanager/src/clulib/clulog.c                      |   19 -
  rgmanager/src/clulib/cman.c                        |   18 -
  rgmanager/src/clulib/daemon_init.c                 |   19 -
  rgmanager/src/clulib/fdops.c                       |   18 -
  rgmanager/src/clulib/lock.c                        |   18 -
  rgmanager/src/clulib/lockspace.c                   |   18 -
  rgmanager/src/clulib/locktest.c                    |   18 -
  rgmanager/src/clulib/members.c                     |   18 -
  rgmanager/src/clulib/message.c                     |   18 -
  rgmanager/src/clulib/msg_cluster.c                 |   18 -
  rgmanager/src/clulib/msg_socket.c                  |   18 -
  rgmanager/src/clulib/msgsimple.c                   |   19 -
  rgmanager/src/clulib/msgtest.c                     |   18 -
  rgmanager/src/clulib/rg_strings.c                  |   18 -
  rgmanager/src/clulib/sets.c                        |   17 -
  rgmanager/src/clulib/signals.c                     |   18 -
  rgmanager/src/clulib/tmgr.c                        |   19 -
  rgmanager/src/clulib/vft.c                         |   18 -
  rgmanager/src/clulib/wrap_lock.c                   |   19 -
  rgmanager/src/daemons/Makefile                     |   12 -
  rgmanager/src/daemons/clurmtabd.c                  |   18 -
  rgmanager/src/daemons/clurmtabd_lib.c              |   18 -
  rgmanager/src/daemons/depends.c                    |   19 -
  rgmanager/src/daemons/event_config.c               |   17 -
  rgmanager/src/daemons/fo_domain.c                  |   18 -
  rgmanager/src/daemons/groups.c                     |   19 -
  rgmanager/src/daemons/main.c                       |   19 -
  rgmanager/src/daemons/reslist.c                    |   18 -
  rgmanager/src/daemons/resrules.c                   |   18 -
  rgmanager/src/daemons/restart_counter.c            |   17 -
  rgmanager/src/daemons/restree.c                    |   21 -
  rgmanager/src/daemons/rg_event.c                   |   17 -
  rgmanager/src/daemons/rg_forward.c                 |   18 -
  rgmanager/src/daemons/rg_locks.c                   |   18 -
  rgmanager/src/daemons/rg_queue.c                   |   18 -
  rgmanager/src/daemons/rg_state.c                   |   18 -
  rgmanager/src/daemons/rg_thread.c                  |   18 -
  rgmanager/src/daemons/service_op.c                 |   17 -
  rgmanager/src/daemons/slang_event.c                |   17 -
  rgmanager/src/daemons/test.c                       |   18 -
  rgmanager/src/daemons/watchdog.c                   |   18 -
  rgmanager/src/resources/Makefile                   |   12 -
  rgmanager/src/resources/apache.sh                  |   23 -
  rgmanager/src/resources/clusterfs.sh               |   20 -
  rgmanager/src/resources/fs.sh                      |   20 -
  rgmanager/src/resources/ip.sh                      |   20 -
  rgmanager/src/resources/lvm.sh                     |   19 -
  rgmanager/src/resources/lvm_by_lv.sh               |   19 -
  rgmanager/src/resources/lvm_by_vg.sh               |   19 -
  rgmanager/src/resources/mysql.sh                   |   23 -
  rgmanager/src/resources/named.sh                   |   27 +-
  rgmanager/src/resources/netfs.sh                   |   20 -
  rgmanager/src/resources/nfsclient.sh               |   19 -
  rgmanager/src/resources/nfsexport.sh               |   20 -
  rgmanager/src/resources/nfsserver.sh               |   19 -
  rgmanager/src/resources/ocf-shellfuncs             |   21 -
  rgmanager/src/resources/openldap.sh                |   23 -
  rgmanager/src/resources/oracledb.sh                |    8 -
  rgmanager/src/resources/postgres-8.sh              |   29 +-
  rgmanager/src/resources/samba.sh                   |   33 +-
  rgmanager/src/resources/script.sh                  |   19 -
  rgmanager/src/resources/service.sh                 |   19 -
  rgmanager/src/resources/smb.sh                     |   24 -
  rgmanager/src/resources/svclib_nfslock             |   18 -
  rgmanager/src/resources/tomcat-5.sh                |   25 +-
  rgmanager/src/resources/utils/config-utils.sh.in   |   19 -
  rgmanager/src/resources/utils/member_util.sh       |   25 -
  rgmanager/src/resources/utils/messages.sh          |   25 -
  rgmanager/src/resources/utils/ra-skelet.sh         |   22 -
  rgmanager/src/resources/vm.sh                      |   18 -
  rgmanager/src/utils/Makefile                       |   12 -
  rgmanager/src/utils/cluarp.c                       |   19 -
  rgmanager/src/utils/clubufflush.c                  |   19 -
  rgmanager/src/utils/clufindhostname.c              |   19 -
  rgmanager/src/utils/clulog.c                       |   19 -
  rgmanager/src/utils/clunfslock.sh                  |    6 +-
  rgmanager/src/utils/clunfsops.c                    |   18 -
  rgmanager/src/utils/clusvcadm.c                    |   18 -
  rgmanager/src/utils/syscall.h                      |   17 -
  scripts/fenceparse                                 |   12 -
  scripts/uninstall.pl                               |   13 -
  766 files changed, 2213 insertions(+), 12625 deletions(-)

- --
I'm going to make him an offer he can't refuse.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iQIUAwUBSE4+/wgUGcMLQ3qJAQL1LA/2JAxGoJvqw4j4aZel1zzlAm2Vb4NSdiFz
Uo9wDM8iB/lf3ysAjnGo1C7uW4R1g17+VwImW6nRutWWghbMu3oEH7Nn4JTmhFIl
NF9ZUUOu1fTIepTfRuHnLumkESTdDuV5uE8MDdKAJXmNXid56bAycTNYKBI2v/kl
eb7PLyOQ27Ox0COdZLZxGSWcUXmFepXNK+rl82W3U7ni9adkrDsRxMeZp2yeZIu7
bmt8lA0vERYCo9bF/MSoFjY1Fr+j2xtWXFpLOjR5TmpD8qq0VBqHwDHMvv5L2twg
Bxtpe7Gnjm5B93LElKQsRQxbpaOBcRAScg26/nufBoVZs8WwmPJn6cxrOv9laTNk
qfUIrQKivaHBbN5RvkQFoMKntXhz8WZ31mYZGHjEx9/5K2iSP676BsGZ9cIjkVhg
C73f56uLasPOz7agaXuO5SuOk1ausznkW4MiHJuK968BXvT5K3FtOrqsyLwz7HGk
BVgBPIb7SjNiUXlPgozuPjelImmNc4ndnr3fZ65HdepwJT+g+CF0ZW8sQyMKKdCp
C/VcEdW446VpCQNLZ5AmXPCtdjGmdBYPCERCwiLp2JcfLD+NSm9LNTCwV0FtNgKz
DzJQjMJJiRrLLcEthKAcg56uemwvR0s1PtHjnwBCF7zPMtpvQDFZiZxjhpDcT5bf
IuZilV3gxA==
=VSJo
-----END PGP SIGNATURE-----


From Alain.Moulle at bull.net  Tue Jun 10 09:17:41 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 10 Jun 2008 11:17:41 +0200
Subject: [Linux-cluster] CS5 / quorum disk and heuristics
Message-ID: <484E46B5.4080107@bull.net>

Hi Lon,

and thanks again for all your answers.
>>Whereas heart-beat interface was working fine.
>>You can disable these by setting allow_kill="0" and/or reboot="0"
>>(see qdisk(5)).

=> ok but in the case of a heart-beat failure, it will no more
avoid the dual-fencing if allow_kill="0" and/or reboot="0" , right ?

>> And after reboot, I can see "cluster not quorate" etc.
>> This happens after both nodes boot, or just one?
After both nodes reboot but this message appears only on one.

>>If both nodes boot up with the third node off, they should still be able to
>>form a quorum by themselves, even if qdiskd isn't running or its score isn't
>>sufficient.
It's a pair, no third node.

Alain Moull?


From peter.haufschild at dilax.com  Tue Jun 10 10:23:04 2008
From: peter.haufschild at dilax.com (Peter Haufschild)
Date: Tue, 10 Jun 2008 12:23:04 +0200
Subject: [Linux-cluster] clvmd  Falling back to local file-based locking
Message-ID: <H0000084001f9ea4.1213093384.postamt.berlin.dilax.com@MHS>

Hallo,
after a reboot I start cman and clvmd and the start success.

3 Blades show's me

[root at blade1 ~]# pvdisplay
    Logging initialised at Tue Jun 10 12:18:56 2008
    Set umask to 0077
    Scanning for physical volume names
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               SATAStorage
  PV Size               4096,00 EB / not usable 4096,00 EB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              476906
  Free PE               4842
  Allocated PE          472064
  PV UUID               I2xV9i-8SN7-ifza-CmJA-W6ez-b2cT-OL4gxw
   
  --- Physical volume ---
  PV Name               /dev/sda1
  VG Name               SCSIStorage
  PV Size               410,19 GB / not usable 3,40 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              105008
  Free PE               19248
  Allocated PE          85760
  PV UUID               695f63-a0Tv-lEJ9-Xubw-f4We-ymak-TZ0ah4
   
    Wiping internal VG cache
This are OK.

But 2 Blades show me 


[root at blade9 ~]# lvdisplay
lvm2    Logging initialised at Tue Jun 10 12:14:41 2008
lvm2    Set umask to 0077
lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
lvm2  WARNING: Falling back to local file-based locking.
lvm2  Volume Groups with the clustered attribute will be inaccessible.
lvm2    Finding all logical volumes
lvm2  Skipping clustered volume group SATAStorage
lvm2  Skipping clustered volume group SCSIStorage
lvm2    Wiping internal VG cache
[root at blade9 ~]# vgdisplay
lvm2    Logging initialised at Tue Jun 10 12:20:13 2008
lvm2    Set umask to 0077
lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
lvm2  WARNING: Falling back to local file-based locking.
lvm2  Volume Groups with the clustered attribute will be inaccessible.
lvm2    Finding all volume groups
lvm2    Finding volume group "SATAStorage"
lvm2  Skipping clustered volume group SATAStorage
lvm2    Finding volume group "SCSIStorage"
lvm2  Skipping clustered volume group SCSIStorage
lvm2    Wiping internal VG cache

cluster.conf and lvm.conf identical.


This difference I could see already at boot time, when activating lvm's. 

Any ideas?
Thanks for assistance.
Peter


______________________________________________________
Peter Haufschild
Systemadministrator
DILAX Intelcom GmbH
Alt-Moabit 96 b
D-10559 Berlin
Tel.: +49-30-773092-338
Mobil: +49-172-9909197
Fax: +49-30-773092-50
E-Mail: peter.haufschild at dilax.com
www.dilax.com
______________________________________________________
Dilax Intelcom GmbH | Amtsgericht Charlottenburg |
96 HRB 28251 Tax No.: 27/23/6233 | ID Nr. DE 136699718 CEO / 
Gesch?ftsf?hrer: Uwe Hinrichsen, Jan Karsch


From ccaulfie at redhat.com  Tue Jun 10 10:39:11 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Tue, 10 Jun 2008 11:39:11 +0100
Subject: [Linux-cluster] clvmd  Falling back to local file-based locking
In-Reply-To: <H0000084001f9ea4.1213093384.postamt.berlin.dilax.com@MHS>
References: <H0000084001f9ea4.1213093384.postamt.berlin.dilax.com@MHS>
Message-ID: <484E59CF.7030600@redhat.com>

Peter Haufschild wrote:
> Hallo,
> after a reboot I start cman and clvmd and the start success.
> 
> 3 Blades show's me
> 
> [root at blade1 ~]# pvdisplay
>     Logging initialised at Tue Jun 10 12:18:56 2008
>     Set umask to 0077
>     Scanning for physical volume names
>   --- Physical volume ---
>   PV Name               /dev/sdb1
>   VG Name               SATAStorage
>   PV Size               4096,00 EB / not usable 4096,00 EB
>   Allocatable           yes 
>   PE Size (KByte)       4096
>   Total PE              476906
>   Free PE               4842
>   Allocated PE          472064
>   PV UUID               I2xV9i-8SN7-ifza-CmJA-W6ez-b2cT-OL4gxw
>    
>   --- Physical volume ---
>   PV Name               /dev/sda1
>   VG Name               SCSIStorage
>   PV Size               410,19 GB / not usable 3,40 MB
>   Allocatable           yes 
>   PE Size (KByte)       4096
>   Total PE              105008
>   Free PE               19248
>   Allocated PE          85760
>   PV UUID               695f63-a0Tv-lEJ9-Xubw-f4We-ymak-TZ0ah4
>    
>     Wiping internal VG cache
> This are OK.
> 
> But 2 Blades show me 
> 
> 
> [root at blade9 ~]# lvdisplay
> lvm2    Logging initialised at Tue Jun 10 12:14:41 2008
> lvm2    Set umask to 0077
> lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
> lvm2  WARNING: Falling back to local file-based locking.
> lvm2  Volume Groups with the clustered attribute will be inaccessible.
> lvm2    Finding all logical volumes
> lvm2  Skipping clustered volume group SATAStorage
> lvm2  Skipping clustered volume group SCSIStorage
> lvm2    Wiping internal VG cache
> [root at blade9 ~]# vgdisplay
> lvm2    Logging initialised at Tue Jun 10 12:20:13 2008
> lvm2    Set umask to 0077
> lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
> lvm2  WARNING: Falling back to local file-based locking.
> lvm2  Volume Groups with the clustered attribute will be inaccessible.
> lvm2    Finding all volume groups
> lvm2    Finding volume group "SATAStorage"
> lvm2  Skipping clustered volume group SATAStorage
> lvm2    Finding volume group "SCSIStorage"
> lvm2  Skipping clustered volume group SCSIStorage
> lvm2    Wiping internal VG cache
> 
> cluster.conf and lvm.conf identical.
> 
> 
> This difference I could see already at boot time, when activating lvm's. 

Assuming that Google's translation to "Connection Rejected" is correct 
then it sounds like either clvmd isn't running or possibly SELinux 
getting in the way of connecting to clvmd.

You say clvmd has started, but have you check it is actually running ?

Chrissie


From peter.haufschild at dilax.com  Tue Jun 10 10:50:25 2008
From: peter.haufschild at dilax.com (Peter Haufschild)
Date: Tue, 10 Jun 2008 12:50:25 +0200
Subject: AW: Re: [Linux-cluster] clvmd Falling back to local file-based locking
In-Reply-To: <484E59CF.7030600@redhat.com>
Message-ID: <H0000084001f9eaf.1213095024.postamt.berlin.dilax.com@MHS>

> -----Urspr?ngliche Nachricht-----
> Von: Christine Caulfield [mailto:ccaulfie at redhat.com]
> Gesendet: Dienstag, 10. Juni 2008 12:39
> An: linux clustering
> Betreff: Re: [Linux-cluster] clvmd Falling back to local file-based
> locking
> 
> Peter Haufschild wrote:
> > Hallo,
> > after a reboot I start cman and clvmd and the start success.
> >
> > 3 Blades show's me
> >
> > [root at blade1 ~]# pvdisplay
> >     Logging initialised at Tue Jun 10 12:18:56 2008
> >     Set umask to 0077
> >     Scanning for physical volume names
> >   --- Physical volume ---
> >   PV Name               /dev/sdb1
> >   VG Name               SATAStorage
> >   PV Size               4096,00 EB / not usable 4096,00 EB
> >   Allocatable           yes
> >   PE Size (KByte)       4096
> >   Total PE              476906
> >   Free PE               4842
> >   Allocated PE          472064
> >   PV UUID               I2xV9i-8SN7-ifza-CmJA-W6ez-b2cT-OL4gxw
> >
> >   --- Physical volume ---
> >   PV Name               /dev/sda1
> >   VG Name               SCSIStorage
> >   PV Size               410,19 GB / not usable 3,40 MB
> >   Allocatable           yes
> >   PE Size (KByte)       4096
> >   Total PE              105008
> >   Free PE               19248
> >   Allocated PE          85760
> >   PV UUID               695f63-a0Tv-lEJ9-Xubw-f4We-ymak-TZ0ah4
> >
> >     Wiping internal VG cache
> > This are OK.
> >
> > But 2 Blades show me
> >
> >
> > [root at blade9 ~]# lvdisplay
> > lvm2    Logging initialised at Tue Jun 10 12:14:41 2008
> > lvm2    Set umask to 0077
> > lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
> > lvm2  WARNING: Falling back to local file-based locking.
> > lvm2  Volume Groups with the clustered attribute will be inaccessible.
> > lvm2    Finding all logical volumes
> > lvm2  Skipping clustered volume group SATAStorage
> > lvm2  Skipping clustered volume group SCSIStorage
> > lvm2    Wiping internal VG cache
> > [root at blade9 ~]# vgdisplay
> > lvm2    Logging initialised at Tue Jun 10 12:20:13 2008
> > lvm2    Set umask to 0077
> > lvm2  connect() failed on local socket: Verbindungsaufbau abgelehnt
> > lvm2  WARNING: Falling back to local file-based locking.
> > lvm2  Volume Groups with the clustered attribute will be inaccessible.
> > lvm2    Finding all volume groups
> > lvm2    Finding volume group "SATAStorage"
> > lvm2  Skipping clustered volume group SATAStorage
> > lvm2    Finding volume group "SCSIStorage"
> > lvm2  Skipping clustered volume group SCSIStorage
> > lvm2    Wiping internal VG cache
> >
> > cluster.conf and lvm.conf identical.
> >
> >
> > This difference I could see already at boot time, when activating lvm's.
> 
> Assuming that Google's translation to "Connection Rejected" is correct

[Peter Haufschild] 
Connection Rejected that's the problem 

> then it sounds like either clvmd isn't running or possibly SELinux
[Peter Haufschild] 
Selinux is disabled

/etc/init.d/clvmd status
clvmd (PID 7866) wird ausgef?hrt...
active volumes: lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2 lvm2

maybe he is lying :-(

but on a working maschine it looks like:

/etc/init.d/clvmd status
clvmd (PID 5448) wird ausgef?hrt...
active volumes: alfrescovar bose ftpvar gfs intranet projektronvar scalixvar schulung schulung1 schulung2 sql2005dmzD alfrescoroot aurillac cotral dmztest ftproot hkl management projektronroot rhel5test samba scalixroot sql2005dmzC sql2005dmzc stabus swt w2k3calypso1 w2k3calypso2 w2k3caruso1 w2k3caruso2


/etc/init.d/clvmd restart
Deactivating VG SATAStorage: lvm2    Logging initialised at Tue Jun 10 12:44:47 2008
lvm2    Set umask to 0077
lvm2    Using volume group(s) on command line
lvm2    Finding volume group "SATAStorage"
lvm2    Deactivated logical volumes in volume group "SATAStorage"
lvm2  0 logical volume(s) in volume group "SATAStorage" now active
lvm2    Wiping internal VG cache
                                                           [  OK  ]
Deactivating VG SCSIStorage: lvm2    Logging initialised at Tue Jun 10 12:44:47 2008
lvm2    Set umask to 0077
lvm2    Using volume group(s) on command line
lvm2    Finding volume group "SCSIStorage"
lvm2    Deactivated logical volumes in volume group "SCSIStorage"
lvm2  0 logical volume(s) in volume group "SCSIStorage" now active
lvm2    Wiping internal VG cache
                                                           [  OK  ]
Stopping clvm:                                             [  OK  ]
Starting clvmd:                                            [  OK  ]
Activating VGs: lvm2    Logging initialised at Tue Jun 10 12:44:49 2008
lvm2    Set umask to 0077
lvm2    Finding all volume groups
lvm2    Finding volume group "SATAStorage"
lvm2    Activated logical volumes in volume group "SATAStorage"
lvm2  11 logical volume(s) in volume group "SATAStorage" now active
lvm2    Finding volume group "SCSIStorage"
lvm2    Activated logical volumes in volume group "SCSIStorage"
lvm2  19 logical volume(s) in volume group "SCSIStorage" now active
lvm2    Wiping internal VG cache
                                                           [  OK  ]

When I do this restart on a working machine I do not get this lvm2 on the beginning of each line as output.

Peter

> getting in the way of connecting to clvmd.
> 
> You say clvmd has started, but have you check it is actually running ?
> 
> Chrissie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From maurizio.rottin at gmail.com  Tue Jun 10 15:24:22 2008
From: maurizio.rottin at gmail.com (Maurizio Rottin)
Date: Tue, 10 Jun 2008 17:24:22 +0200
Subject: [Linux-cluster] getting rhcs critical events by email/snmp
Message-ID: <e83473390806100824r57747a02q3f2c3b8d1153dab5@mail.gmail.com>

hi,
i had a problem with a cluster node (i call it nodeA for now on).
It suddently stopped it's service without notifing the other nodes (or
maybe it did notify, but the other nodes did not get the packets). As
a result, no one was able to get a lock on a gfs filesystem shared
across 5 nodes. (and no one was trying to fence it).

Perhaps it was a network problem, because the nodeA was still working,
or a ccsd problem.

Anyway, since if it happen once it can happen again, is there any way
to get the cluster send emails or snmp messages on nodes critical
events?

-- 
mr


From ross at kallisti.us  Tue Jun 10 15:55:40 2008
From: ross at kallisti.us (Ross Vandegrift)
Date: Tue, 10 Jun 2008 11:55:40 -0400
Subject: [Linux-cluster] GFS performance tuning
In-Reply-To: <1213046217.21321.53.camel@technetium.msp.redhat.com>
References: <1213046217.21321.53.camel@technetium.msp.redhat.com>
Message-ID: <20080610155540.GA32692@kallisti.us>

On Mon, Jun 09, 2008 at 04:16:56PM -0500, Bob Peterson wrote:
> Hi Everyone,
> 
> I just wanted to let everyone here know that I just updated the
> cluster wiki page regarding GFS performance tuning.  I added a bunch
> of information about increasing GFS performance:

Hi Bob, thanks for the additional information.  Quick questions:


> 1. How to use "fast statfs".

On a GFS2 filesystem, I see the following:
[root at sensor02 ~]# gfs2_tool gettune /rrds
...
statfs_slow = 0
...

Does that indicate that my filesystem is already using this feature?

> 2. Disabling updatedb for GFS.
> 3. More considerations about the Resource Group size and the
>    new "bitfit" function.
> 4. Designing your environment with the DLM in mind.

Do you have any specific reading material you'd suggest on this topic?
I suspect the interesting bits are related ot how GFS actually uses
the DLM to lock filesystem metadata.  I've read some of Christine's
DLM book, but there's not really anything related to GFS therein.

> 5. How to use "glock trimming".

Is the glock trimming patch present in the cluster suite from RH5?

The link provided in the Wiki went to a bugzilla bug with no info, and
URL to people.redhat.com didn't work over https.  I update that, but
wanted to make sure the Bugzilla link wasn't relevant.


-- 
Ross Vandegrift
ross at kallisti.us

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
	--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37


From david.costakos at gmail.com  Tue Jun 10 16:53:55 2008
From: david.costakos at gmail.com (Dave Costakos)
Date: Tue, 10 Jun 2008 09:53:55 -0700
Subject: [Linux-cluster] Basic RHEL 5.1 cluster problem
In-Reply-To: <4b28518b0806100118r10c908b6m8c3e3321355ab180@mail.gmail.com>
References: <4b28518b0806100118r10c908b6m8c3e3321355ab180@mail.gmail.com>
Message-ID: <6b6836c60806100953p727a8653r14e4bdcced946f38@mail.gmail.com>

It is expected behavior for a cluster node to fence the 2nd node in a 2-node
cluster _IF_ it comes up and it can't connect to the 2nd cluster node ccs
daemon.

I've seen this happen in the past when multicast doesn't work as you might
expect on your network.  You may want to try specifying a multicast IP
address in the 224.0.0.111 - 224.0.0.250 range in your cluster.conf file.

-Dave.

2008/6/10 Loris Strozzini <lstrozzini at gmail.com>:

> Hi all,
> I have a problem with my RHEL 5.2 a 2 node cluster running on IBM X3650.
> My cluster is configured for fencing on IBM RSAII via
> system-config-cluster, with only one network interface, no shared storage
> and I have followed the Red Hat Cluster suite for installation.
> At the first look, no syntax error in my cluster.conf but when I'm going to
> start the cman and the rgmanager daemons on primary node the other node
> reboot or poweroff immediately.
>
> Can anyone help me?
> Thanks in advance
>
>
> Loris
>
>
> My cluster.conf:
>
> <?xml version="1.0" ?>
> <cluster alias="newsocks" config_version="10" name="newsocks">
>         <fence_daemon post_fail_delay="60" post_join_delay="30"/>
>         <clusternodes>
>                 <clusternode name="socks1" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="RSA_1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="socks2" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="RSA_2"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_rsa" ipaddr="10.242.164.126"
> login="xxxxx" name="RSA_1" passwd="xxxxx"/>
>                 <fencedevice agent="fence_rsa" ipaddr="10.242.164.128"
> login="xxxxx" name="RSA_2" passwd="xxxxx"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="domso" ordered="1"
> restricted="1">
>                                 <failoverdomainnode name="socks1"
> priority="1"/>
>                                 <failoverdomainnode name="socks2"
> priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="10.242.156.100" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="domso" name="servso"
> recovery="relocate">
>                         <ip ref="10.242.156.100">
>                                 <script file="/etc/init.d/ss5"
> name="sockss5"/>
>                         </ip>
>                 </service>
>         </rm>
> </cluster>
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Dave Costakos
mailto:david.costakos at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080610/5485b72f/attachment.htm>

From rpeterso at redhat.com  Tue Jun 10 16:59:29 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 10 Jun 2008 11:59:29 -0500
Subject: [Linux-cluster] GFS performance tuning
In-Reply-To: <20080610155540.GA32692@kallisti.us>
References: <1213046217.21321.53.camel@technetium.msp.redhat.com>
	<20080610155540.GA32692@kallisti.us>
Message-ID: <1213117169.21321.97.camel@technetium.msp.redhat.com>

Hi Ross,

On Tue, 2008-06-10 at 11:55 -0400, Ross Vandegrift wrote:
> On a GFS2 filesystem, I see the following:
> [root at sensor02 ~]# gfs2_tool gettune /rrds
> ...
> statfs_slow = 0
> ...
> 
> Does that indicate that my filesystem is already using this feature?

Yes, GFS2 always uses fast statfs by default.  The code was retrofitted
to GFS (1) by Wendy Cheng (Thanks, Wendy!)

> Do you have any specific reading material you'd suggest on this topic?
> I suspect the interesting bits are related ot how GFS actually uses
> the DLM to lock filesystem metadata.  I've read some of Christine's
> DLM book, but there's not really anything related to GFS therein.

The inner workings of glocks and how GFS and GFS2 use the DLM are
not well documented.  In fact, I don't know of any documents that
exist to explain it.  Steve Whitehouse recently created a "first bash"
glock document and posted it to the cluster-devel mailing list.
You can read it at:

https://www.redhat.com/archives/cluster-devel/2008-June/msg00021.html

The rules for how GFS and GFS2 use the DLM are diverging, so this
document is really a fast-moving target.  What now applies to the
upstream GFS2 does not necessarily apply to the current GFS(1).
Until recently, GFS2 and GFS used the same lock harness, lock_dlm.ko.
However, as time goes on, it is becoming increasingly necessary to
split the lock harness into separate ones for each file system.
This is an ongoing thing right now, upstream at least.  I suspect
that when GFS2 hits RHEL (hopefully for 5.3) there will be two
separate lock harnesses for GFS and GFS2.

> Is the glock trimming patch present in the cluster suite from RH5?

Yes, I glock trimming is available in RHEL5.  I think it was back-
ported to RHEL4 at some level (4.6 maybe?), again by Wendy Cheng.

> The link provided in the Wiki went to a bugzilla bug with no info, and
> URL to people.redhat.com didn't work over https.  I update that, but
> wanted to make sure the Bugzilla link wasn't relevant.

Hm.  Unfortunately, the bugzilla permissions are kind of out of my
control.  I'll see if I can find someone who can change them.

I don't know of a way to make https work on my people page, but http
should work.

I should add that most of the information on my people page is actually
stuff that was created by Wendy Cheng, and she should get the credit.
I had it moved to my account when she left Red Hat so the information
didn't get lost.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From s.wendy.cheng at gmail.com  Tue Jun 10 17:59:03 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 10 Jun 2008 12:59:03 -0500
Subject: [Linux-cluster] GFS performance tuning
In-Reply-To: <20080610155540.GA32692@kallisti.us>
References: <1213046217.21321.53.camel@technetium.msp.redhat.com>
	<20080610155540.GA32692@kallisti.us>
Message-ID: <484EC0E7.7070804@gmail.com>

Ross Vandegrift wrote:

>> 1. How to use "fast statfs".
>>     
>
> On a GFS2 filesystem, I see the following:
> [root at sensor02 ~]# gfs2_tool gettune /rrds
> ...
> statfs_slow = 0
> ...
>
> Does that indicate that my filesystem is already using this feature?
>   

The fast statfs patch was a *back* port from GFS2 to GFS1. In GFS2, fast 
statfs is the *default*.

>> 2. Disabling updatedb for GFS.
>> 3. More considerations about the Resource Group size and the
>>    new "bitfit" function.
>> 4. Designing your environment with the DLM in mind.
>>     
>
> Do you have any specific reading material you'd suggest on this topic?
> I suspect the interesting bits are related ot how GFS actually uses
> the DLM to lock filesystem metadata.  I've read some of Christine's
> DLM book, but there's not really anything related to GFS therein.
>   

I happen to have a new write-up about GFS locking. It, unfortunately, 
also interleaves with my current employer's disk block allocation policy 
(proprietary info). Note that GFS disk block handling has been 
piggy-backed on its glock logic. Would need sometime to clean it up for 
public reading. Stay tuned.

>   
>> 5. How to use "glock trimming".
>>     
>
> Is the glock trimming patch present in the cluster suite from RH5?
>   
For GFS1, yes.

For GFS2 ... it is a long story... I'll let other people comment it.

-- Wendy


From lhh at redhat.com  Tue Jun 10 18:37:19 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 10 Jun 2008 14:37:19 -0400
Subject: [Linux-cluster] CS5 / quorum disk and heuristics
In-Reply-To: <484E46B5.4080107@bull.net>
References: <484E46B5.4080107@bull.net>
Message-ID: <1213123039.20204.105.camel@ayanami.boston.devel.redhat.com>


On Tue, 2008-06-10 at 11:17 +0200, Alain Moulle wrote:
> Hi Lon,
> 
> and thanks again for all your answers.
> >>Whereas heart-beat interface was working fine.
> >>You can disable these by setting allow_kill="0" and/or reboot="0"
> >>(see qdisk(5)).
> 
> => ok but in the case of a heart-beat failure, it will no more
> avoid the dual-fencing if allow_kill="0" and/or reboot="0" , right ?

I'd have to think about it.


> >> And after reboot, I can see "cluster not quorate" etc.
> >> This happens after both nodes boot, or just one?
> After both nodes reboot but this message appears only on one.

Oh... so it's coming from ccsd?  Don't worry about it.  Does it recover
and return to normal operation?

-- Lon


From lhh at redhat.com  Tue Jun 10 18:39:36 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 10 Jun 2008 14:39:36 -0400
Subject: [Linux-cluster] getting rhcs critical events by email/snmp
In-Reply-To: <e83473390806100824r57747a02q3f2c3b8d1153dab5@mail.gmail.com>
References: <e83473390806100824r57747a02q3f2c3b8d1153dab5@mail.gmail.com>
Message-ID: <1213123176.20204.108.camel@ayanami.boston.devel.redhat.com>


On Tue, 2008-06-10 at 17:24 +0200, Maurizio Rottin wrote:
> hi,
> i had a problem with a cluster node (i call it nodeA for now on).

> It suddently stopped it's service without notifing the other nodes (or
> maybe it did notify, but the other nodes did not get the packets). As
> a result, no one was able to get a lock on a gfs filesystem shared
> across 5 nodes. (and no one was trying to fence it).

Could you get cman_tool services output ?


> Perhaps it was a network problem, because the nodeA was still working,
> or a ccsd problem.
> 
> Anyway, since if it happen once it can happen again, is there any way
> to get the cluster send emails or snmp messages on nodes critical
> events?

You can set syslog to do email, I think.

-- Lon


From zac at sprackett.com  Tue Jun 10 18:44:46 2008
From: zac at sprackett.com (S. Zachariah Sprackett)
Date: Tue, 10 Jun 2008 14:44:46 -0400
Subject: [Linux-cluster] getting rhcs critical events by email/snmp
In-Reply-To: <1213123176.20204.108.camel@ayanami.boston.devel.redhat.com>
References: <e83473390806100824r57747a02q3f2c3b8d1153dab5@mail.gmail.com>
	<1213123176.20204.108.camel@ayanami.boston.devel.redhat.com>
Message-ID: <ed9a61600806101144x2e07859cyb73abc4c6de95677@mail.gmail.com>

On Tue, Jun 10, 2008 at 2:39 PM, Lon Hohberger <lhh at redhat.com> wrote:

> > Anyway, since if it happen once it can happen again, is there any way
> > to get the cluster send emails or snmp messages on nodes critical
> > events?
>
> You can set syslog to do email, I think.


See here for details on how:

http://www.johnandcailin.com/blog/john/how-setup-real-time-email-notification-critical-syslog-events
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080610/e600b37d/attachment.htm>

From jralph at intertechmedia.com  Tue Jun 10 21:29:50 2008
From: jralph at intertechmedia.com (Jason R. Ralph)
Date: Tue, 10 Jun 2008 17:29:50 -0400
Subject: [Linux-cluster] Connecting to an External SUN SAN box !
Message-ID: <b53b87dfbf3148ab8a5cd32a6d60901d.jralph@intertechmedia.com>

Hello Farislinux,

Due to each install being very different I recommend
the following documents.
You can download each pdf from the red hat site.

1.  Cluster_Administration.pdf

2. Global_File_System.pdf

3. Rhel-4_Quick_Deployment_Guide.pdf


Jason R. Ralph

InterTech Media, LLC.
20 Summer Street - 5th Floor
Stamford CT 06901
http://www.intertechmedia.com
jralph at intertechmedia.com


------- Original Message -------
>From    : faris[mailto:farislinux at yahoo.com]
Sent    : 6/10/2008 4:39:09 AM
To      : linux-cluster at redhat.com
Cc      : 
Subject : RE: [Linux-cluster] Connecting to an
External SUN SAN box !

 Hi,
 
I am really a newbie to linux clustering and GFS and
i have used samba and cifs for remote file sharing etc .
 
Currently there is a SUN SAN box running well and
some of the servers from other vendors are accessing
storage spaces via FC HBA  (2GB). 
 
My task is to access this SUN SAN Box same storage
space via 2 RHEL4 servers having FC HBA cards each
connected. could some one tell me:
 
1. If i install GFS only is enough or do i need to
install cluster suite as well?
2. Are there any special configs need to be done ?
3. Are there any special mount parameters need to pass ?
 
Thanks a lot!
 
Farislinux
  

InterTech Media LLC


From rcronenwett at gmail.com  Wed Jun 11 00:12:55 2008
From: rcronenwett at gmail.com (Ron Cronenwett)
Date: Tue, 10 Jun 2008 20:12:55 -0400
Subject: [Linux-cluster] Create Cluster using existing LVM on shared storage
Message-ID: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>

Hi all,

I have been presented a pair of servers (RHEL 5.2) connected to a dual
port scsi array. The storage has been configured as one large disk
partitioned into /dev/sdb1 /dev/sdb2,and /dev/sdb3 with dev/sdb3 the
sole device in a physical volume. Several logical volumes have been
created on the physical volume. Currently the logical volumes are
mounted on server A. Server B was added to the mix as a new server.
Server B sees the external storage with fdisk -l and the volumes with
pvdisplay.

My questions are, can an existing logical volume be used as a cluster
resource that is mounted on one node or the other (but not both)?
Would this work with just the existing LVM or would CLVM be needed? If
I need CLVM, can the existing logical volumes be used without
destroying the existing data?  In either case, would all the logical
volumes need to be mounted on the same server or can just one logical
volume be mounted on one server while the remaining logical volumes
are mounted on the other?

The idea is to create an active-passive HA cluster of an existing web server.

Any thoughts, ideas, comments, questions would be appreciated.

Thanks

Ron Cronenwett


From orkcu at yahoo.com  Wed Jun 11 02:14:23 2008
From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=)
Date: Tue, 10 Jun 2008 19:14:23 -0700 (PDT)
Subject: [Linux-cluster] clvmd  Falling back to local file-based locking
In-Reply-To: <484E59CF.7030600@redhat.com>
Message-ID: <406282.81660.qm@web50603.mail.re2.yahoo.com>


--- On Tue, 6/10/08, Christine Caulfield <ccaulfie at redhat.com> wrote:

> From: Christine Caulfield <ccaulfie at redhat.com>
> Subject: Re: [Linux-cluster] clvmd  Falling back to local file-based locking
> To: "linux clustering" <linux-cluster at redhat.com>
> Received: Tuesday, June 10, 2008, 6:39 AM
> Peter Haufschild wrote:
> > Hallo,
> > after a reboot I start cman and clvmd and the start
> success.
> > 
> > 3 Blades show's me
> > 
> > [root at blade1 ~]# pvdisplay
> >     Logging initialised at Tue Jun 10 12:18:56 2008
> >     Set umask to 0077
> >     Scanning for physical volume names
> >   --- Physical volume ---
> >   PV Name               /dev/sdb1
> >   VG Name               SATAStorage
> >   PV Size               4096,00 EB / not usable
> 4096,00 EB
> >   Allocatable           yes 
> >   PE Size (KByte)       4096
> >   Total PE              476906
> >   Free PE               4842
> >   Allocated PE          472064
> >   PV UUID              
> I2xV9i-8SN7-ifza-CmJA-W6ez-b2cT-OL4gxw
> >    
> >   --- Physical volume ---
> >   PV Name               /dev/sda1
> >   VG Name               SCSIStorage
> >   PV Size               410,19 GB / not usable 3,40 MB
> >   Allocatable           yes 
> >   PE Size (KByte)       4096
> >   Total PE              105008
> >   Free PE               19248
> >   Allocated PE          85760
> >   PV UUID              
> 695f63-a0Tv-lEJ9-Xubw-f4We-ymak-TZ0ah4
> >    
> >     Wiping internal VG cache
> > This are OK.
> > 
> > But 2 Blades show me 
> > 
> > 
> > [root at blade9 ~]# lvdisplay
> > lvm2    Logging initialised at Tue Jun 10 12:14:41
> 2008
> > lvm2    Set umask to 0077
> > lvm2  connect() failed on local socket:
> Verbindungsaufbau abgelehnt
> > lvm2  WARNING: Falling back to local file-based
> locking.
> > lvm2  Volume Groups with the clustered attribute will
> be inaccessible.
> > lvm2    Finding all logical volumes
> > lvm2  Skipping clustered volume group SATAStorage
> > lvm2  Skipping clustered volume group SCSIStorage
> > lvm2    Wiping internal VG cache
> > [root at blade9 ~]# vgdisplay
> > lvm2    Logging initialised at Tue Jun 10 12:20:13
> 2008
> > lvm2    Set umask to 0077
> > lvm2  connect() failed on local socket:
> Verbindungsaufbau abgelehnt
> > lvm2  WARNING: Falling back to local file-based
> locking.
> > lvm2  Volume Groups with the clustered attribute will
> be inaccessible.
> > lvm2    Finding all volume groups
> > lvm2    Finding volume group "SATAStorage"
> > lvm2  Skipping clustered volume group SATAStorage
> > lvm2    Finding volume group "SCSIStorage"
> > lvm2  Skipping clustered volume group SCSIStorage
> > lvm2    Wiping internal VG cache
> > 
> > cluster.conf and lvm.conf identical.
> > 
> > 
> > This difference I could see already at boot time, when
> activating lvm's. 
> 
> Assuming that Google's translation to "Connection
> Rejected" is correct 
> then it sounds like either clvmd isn't running or
> possibly SELinux 
> getting in the way of connecting to clvmd.

you mean SELinux or iptables?

because localfirewall can prevent the connection also

cu
roger


      __________________________________________________________________
Be smarter than spam. See how smart SpamGuard is at giving junk email the boot with the All-new Yahoo! Mail.  Click on Options in Mail and switch to New Mail today or register for free at http://mail.yahoo.ca


From oioi at cableplus.com.cn  Wed Jun 11 07:52:18 2008
From: oioi at cableplus.com.cn (Lu Wen-yan)
Date: Wed, 11 Jun 2008 15:52:18 +0800
Subject: [Linux-cluster] Booting node 1 causes it to fence node 2
In-Reply-To: <g16mki$enl$1@ger.gmane.org>
References: <g16mki$enl$1@ger.gmane.org>
Message-ID: <1487391179.20080611155218@cableplus.com.cn>

Hello NM,

You can add clean_start="1" in your config

sample:

<cluster alias="cms" config_version="4" name="cms">
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>


Friday, May 23, 2008, 11:08:02 PM, you wrote:

N> I have two nodes, each fenceable through a Dell RAC card. When I power 
N> cycle one of them, it reboots ... and proceeds to fence the other one!

N> I must be missing something ...

N> (btw should cman be started in init.d automatically? or should it be 
N> launched by an operator after having made sure the node was sane?)

N> --
N> Linux-cluster mailing list
N> Linux-cluster at redhat.com
N> https://www.redhat.com/mailman/listinfo/linux-cluster
 

-- 
Best regards,
 Lu                            mailto:oioi at cableplus.com.cn


From jbrassow at redhat.com  Wed Jun 11 12:16:33 2008
From: jbrassow at redhat.com (Jonathan Brassow)
Date: Wed, 11 Jun 2008 07:16:33 -0500
Subject: [Linux-cluster] Create Cluster using existing LVM on shared
	storage
In-Reply-To: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>
References: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>
Message-ID: <B81BE59F-E503-4725-B74E-C7B49EEE54C4@redhat.com>

Here is a good source of information for those questions:
http://sources.redhat.com/cluster/wiki/LVMFailover

  brassow

On Jun 10, 2008, at 7:12 PM, Ron Cronenwett wrote:

> Hi all,
>
> I have been presented a pair of servers (RHEL 5.2) connected to a dual
> port scsi array. The storage has been configured as one large disk
> partitioned into /dev/sdb1 /dev/sdb2,and /dev/sdb3 with dev/sdb3 the
> sole device in a physical volume. Several logical volumes have been
> created on the physical volume. Currently the logical volumes are
> mounted on server A. Server B was added to the mix as a new server.
> Server B sees the external storage with fdisk -l and the volumes with
> pvdisplay.
>
> My questions are, can an existing logical volume be used as a cluster
> resource that is mounted on one node or the other (but not both)?
> Would this work with just the existing LVM or would CLVM be needed? If
> I need CLVM, can the existing logical volumes be used without
> destroying the existing data?  In either case, would all the logical
> volumes need to be mounted on the same server or can just one logical
> volume be mounted on one server while the remaining logical volumes
> are mounted on the other?
>
> The idea is to create an active-passive HA cluster of an existing  
> web server.
>
> Any thoughts, ideas, comments, questions would be appreciated.
>
> Thanks
>
> Ron Cronenwett
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From rcronenwett at gmail.com  Wed Jun 11 13:23:54 2008
From: rcronenwett at gmail.com (Ron Cronenwett)
Date: Wed, 11 Jun 2008 09:23:54 -0400
Subject: [Linux-cluster] Create Cluster using existing LVM on shared
	storage
In-Reply-To: <B81BE59F-E503-4725-B74E-C7B49EEE54C4@redhat.com>
References: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>
	<B81BE59F-E503-4725-B74E-C7B49EEE54C4@redhat.com>
Message-ID: <9c649280806110623l15f67558r35d0ba05710a01df@mail.gmail.com>

Excellent, this answers most of what I was looking for.
Just one question, if I were to install CLVM, can the existing volume
group and logical volume be converted or would I need to recreate
everything and restore the data from a back up copy?

Thanks

Ron Cronenwett


On Wed, Jun 11, 2008 at 8:16 AM, Jonathan Brassow <jbrassow at redhat.com> wrote:
> Here is a good source of information for those questions:
> http://sources.redhat.com/cluster/wiki/LVMFailover
>
>  brassow
>
> On Jun 10, 2008, at 7:12 PM, Ron Cronenwett wrote:
>
>> Hi all,
>>
>> I have been presented a pair of servers (RHEL 5.2) connected to a dual
>> port scsi array. The storage has been configured as one large disk
>> partitioned into /dev/sdb1 /dev/sdb2,and /dev/sdb3 with dev/sdb3 the
>> sole device in a physical volume. Several logical volumes have been
>> created on the physical volume. Currently the logical volumes are
>> mounted on server A. Server B was added to the mix as a new server.
>> Server B sees the external storage with fdisk -l and the volumes with
>> pvdisplay.
>>
>> My questions are, can an existing logical volume be used as a cluster
>> resource that is mounted on one node or the other (but not both)?
>> Would this work with just the existing LVM or would CLVM be needed? If
>> I need CLVM, can the existing logical volumes be used without
>> destroying the existing data?  In either case, would all the logical
>> volumes need to be mounted on the same server or can just one logical
>> volume be mounted on one server while the remaining logical volumes
>> are mounted on the other?
>>
>> The idea is to create an active-passive HA cluster of an existing web
>> server.
>>
>> Any thoughts, ideas, comments, questions would be appreciated.
>>
>> Thanks
>>
>> Ron Cronenwett
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


From tiagocruz at forumgdh.net  Wed Jun 11 23:10:43 2008
From: tiagocruz at forumgdh.net (Tiago Cruz)
Date: Wed, 11 Jun 2008 20:10:43 -0300
Subject: [Linux-cluster] DRBD8 and GFS issues
Message-ID: <1213225843.6895.59.camel@tuxkiller.ig.com.br>

Hello guys,

I'm trying to use one cluster with 2 nodes, using DRDB 8.x and GFS 1.x
on RHEL 5.2 x84_64.

The problem is: Then one machine was gone (node2) the node1 stop to work
(one simple 'ls -l' on shared mounted point) until the second machine
return.

I'm using GFS on this way:

# gfs_mkfs -t hotsite:gfs-00 -p lock_dlm -j 2 /dev/drbd0
# mount -v /dev/drbd0 /test

'Causing a FAIL on second node on this way:
# echo 1 > /proc/sys/kernel/sysrq
# echo b > /proc/sysrq-trigger

=======================================?=======================================
$ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="hotsite" config_version="4">

<cman two_node="1" expected_votes="1"/>

<fence_daemon post_join_delay="60">
</fence_daemon>

<clusternodes>
<clusternode name="drdb_hotsite-1" nodeid="1">
        <fence>
                <method name="single">
                        <device name="gnbd" ipaddr="192.168.0.3"/>
                </method>
        </fence>
</clusternode>
<clusternode name="drdb_hotsite-2" nodeid="2">
        <fence>
                <method name="single">
                        <device name="gnbd" ipaddr="192.168.0.3"/>
                </method>
        </fence>
</clusternode>
</clusternodes>

<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>
</cluster>
?=======================================?=======================================

Follow the logs:

Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: PingAck did not arrive in time.
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 ) 
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: asender terminated
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Terminating asender thread
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: short read expecting header on sock: r=-512
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Creating new current UUID
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Writing meta data super block now.
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Connection closed
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: outdate-peer helper broken, returned 0
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0:  old = { cs:NetworkFailure st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0:  new = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: conn( NetworkFailure -> Unconnected ) 
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: receiver terminated
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: receiver (re)started
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0:  old = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0:  new = { cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 19:59:08 hotsite-bsb-la-1 kernel: drbd0: conn( Unconnected -> WFConnection ) 
Jun 11 19:59:08 hotsite-bsb-la-1 openais[2939]: [TOTEM] The token was lost in the OPERATIONAL state. 
Jun 11 19:59:08 hotsite-bsb-la-1 openais[2939]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). 
Jun 11 19:59:08 hotsite-bsb-la-1 openais[2939]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). 
Jun 11 19:59:08 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering GATHER state from 2. 
Jun 11 19:59:12 hotsite-bsb-la-1 fenced[2956]: drdb_hotsite-2 not a cluster member after 0 sec post_fail_delay
Jun 11 19:59:12 hotsite-bsb-la-1 fenced[2956]: fencing node "drdb_hotsite-2"
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering GATHER state from 0. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] Creating commit token because I am the rep. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] Saving state aru 31 high seq received 31 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] Storing new sequence id for ring 168 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering COMMIT state. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering RECOVERY state. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] position [0] member 192.168.0.3: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] previous ring seq 356 rep 192.168.0.3 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] aru 31 high delivered 31 received flag 1 
Jun 11 19:59:12 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] Did not need to originate any messages in recovery. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] Sending initial ORF token 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] CLM CONFIGURATION CHANGE 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] New Configuration: 
Jun 11 19:59:12 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.3)  
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Left: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.4)  
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Joined: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] CLM CONFIGURATION CHANGE 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] New Configuration: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.3)  
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Left: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Joined: 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [SYNC ] This node is within the primary component and will provide service. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering OPERATIONAL state. 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CLM  ] got nodejoin message 192.168.0.3 
Jun 11 19:59:12 hotsite-bsb-la-1 openais[2939]: [CPG  ] got joinlist message from node 1 
Jun 11 19:59:17 hotsite-bsb-la-1 fenced[2956]: fencing node "drdb_hotsite-2"
Jun 11 19:59:17 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
Jun 11 19:59:22 hotsite-bsb-la-1 fenced[2956]: fencing node "drdb_hotsite-2"
Jun 11 19:59:22 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
Jun 11 19:59:27 hotsite-bsb-la-1 fenced[2956]: fencing node "drdb_hotsite-2"
Jun 11 19:59:27 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
.....
Jun 11 20:01:32 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
Jun 11 20:01:37 hotsite-bsb-la-1 fenced[2956]: fencing node "drdb_hotsite-2"
Jun 11 20:01:37 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2" failed
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering GATHER state from 11. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] Creating commit token because I am the rep. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] Saving state aru 14 high seq received 14 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] Storing new sequence id for ring 16c 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering COMMIT state. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering RECOVERY state. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] position [0] member 192.168.0.3: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] previous ring seq 360 rep 192.168.0.3 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] aru 14 high delivered 14 received flag 1 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] position [1] member 192.168.0.4: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] previous ring seq 360 rep 192.168.0.4 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] aru 9 high delivered 9 received flag 1 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] Did not need to originate any messages in recovery. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] Sending initial ORF token 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] CLM CONFIGURATION CHANGE 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] New Configuration: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.3)  
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Left: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Joined: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] CLM CONFIGURATION CHANGE 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] New Configuration: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.3)  
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.4)  
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Left: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] Members Joined: 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] 	r(0) ip(192.168.0.4)  
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [SYNC ] This node is within the primary component and will provide service. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [TOTEM] entering OPERATIONAL state. 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] got nodejoin message 192.168.0.4 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CLM  ] got nodejoin message 192.168.0.3 
Jun 11 20:01:40 hotsite-bsb-la-1 openais[2939]: [CPG  ] got joinlist message from node 1 
Jun 11 20:01:42 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Trying to acquire journal lock...
Jun 11 20:01:42 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Looking at journal...
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Handshake successful: Agreed network protocol version 88
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0:  old = { cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0:  new = { cs:WFReportParams st:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: conn( WFConnection -> WFReportParams ) 
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Starting asender thread (from drbd0_receiver [526])
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: data-integrity-alg: <not-used>
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Outdated ) 
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Writing meta data super block now.
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: tl_clear()
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: susp( 1 -> 0 ) 
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: peer( Secondary -> Primary ) 
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent ) 
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Began resync as SyncSource (will sync 548864 KB [137216 bits set]).
Jun 11 20:05:04 hotsite-bsb-la-1 kernel: drbd0: Writing meta data super block now.
Jun 11 20:05:05 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Acquiring the transaction lock...
Jun 11 20:05:07 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Replaying journal...
Jun 11 20:05:07 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Replayed 0 of 1 blocks
Jun 11 20:05:07 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: replays = 0, skips = 0, sames = 1
Jun 11 20:05:10 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Journal replayed in 5s
Jun 11 20:05:10 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Done
Jun 11 20:05:20 hotsite-bsb-la-1 kernel: drbd0: Resync done (total 15 sec; paused 0 sec; 36588 K/sec)
Jun 11 20:05:20 hotsite-bsb-la-1 kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
Jun 11 20:05:20 hotsite-bsb-la-1 kernel: drbd0: Writing meta data super block now.
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: Trying to join cluster "lock_dlm", "hotsite:gfs-00"
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: dlm: Using TCP for communications
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: Joined cluster. Now mounting FS...
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=0: Trying to acquire journal lock...
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=0: Looking at journal...
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=0: Done
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Trying to acquire journal lock...
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Looking at journal...
Jun 11 20:07:03 hotsite-bsb-la-1 kernel: GFS: fsid=hotsite:gfs-00.0: jid=1: Done
Jun 11 20:07:25 hotsite-bsb-la-1 kernel: dlm: connecting to 2

Thanks!

-- 
Tiago Cruz
http://everlinux.com
Linux User #282636


From Rakesh.Kumar at evalueserve.com  Thu Jun 12 01:41:57 2008
From: Rakesh.Kumar at evalueserve.com (Rakesh Kumar Ranjan)
Date: Thu, 12 Jun 2008 07:11:57 +0530
Subject: [Linux-cluster] cluster
Message-ID: <810FE46AADEF62478AAB09DB687D8243080DCCFB@EVSMAIL.Evalueserve.com>

Hi,

 
Please send me the documentation about the installation & configuration
of linux cluster.

 
Best Regards,

  
Rakesh Kumar Ranjan

________________________________

System Engineer, Information Technology 
Evalueserve 
Tel: +91 124 4120000 (ext: 1580)
www.evalueserve.com <http://www.evalueserve.com/>  

 
The information in this e-mail is the property of Evalueserve and is confidential and privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken in reliance on it is prohibited and will be unlawful. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080612/cb7b629e/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 20943 bytes
Desc: image001.png
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080612/cb7b629e/attachment.png>

From mij at irwan.name  Thu Jun 12 01:45:31 2008
From: mij at irwan.name (Mohd Irwan Jamaluddin)
Date: Thu, 12 Jun 2008 09:45:31 +0800
Subject: [Linux-cluster] cluster
In-Reply-To: <810FE46AADEF62478AAB09DB687D8243080DCCFB@EVSMAIL.Evalueserve.com>
References: <810FE46AADEF62478AAB09DB687D8243080DCCFB@EVSMAIL.Evalueserve.com>
Message-ID: <b3d998b00806111845y7bcb656cmdde628785e1231d3@mail.gmail.com>

2008/6/12 Rakesh Kumar Ranjan <Rakesh.Kumar at evalueserve.com>:
> Hi,
>
>
>
> Please send me the documentation about the installation & configuration of
> linux cluster.
>

What kind of cluster are you looking for?
Red Hat Docs is a good place to start with,
http://www.redhat.com/docs/manuals/csgfs/

-- 
Regards,
Mohd Irwan Jamaluddin
Web: http://www.irwan.name/
Blog: http://blog.irwan.name/


From adel at opennet.ae  Thu Jun 12 07:14:45 2008
From: adel at opennet.ae (Adel Ben Zarrouk)
Date: Thu, 12 Jun 2008 11:14:45 +0400
Subject: [Linux-cluster] KTCPVS
Message-ID: <200806121114.45727.adel@opennet.ae>

Hello,

Is RHEL support KTCPVS as load balancing layer-7 switching?

Regards

 --Adel


From lhh at redhat.com  Thu Jun 12 15:03:43 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 12 Jun 2008 11:03:43 -0400
Subject: [Linux-cluster] DRBD8 and GFS issues
In-Reply-To: <1213225843.6895.59.camel@tuxkiller.ig.com.br>
References: <1213225843.6895.59.camel@tuxkiller.ig.com.br>
Message-ID: <1213283023.11285.13.camel@ayanami.boston.devel.redhat.com>

On Wed, 2008-06-11 at 20:10 -0300, Tiago Cruz wrote:
> Jun 11 20:01:37 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2"
> failed

GFS requires fencing (even if DRBD doesn't).

I wrote a howto here to make GFS/DRBD work:

  http://sources.redhat.com/cluster/wiki/DRBD_Cookbook

The examples have power-based fencing where we shut the node off and
report to DRBD that "external STONITH succeeded".

You could roll your own peer-outdater script if you wanted.

-- Lon


From jbrassow at redhat.com  Thu Jun 12 16:20:56 2008
From: jbrassow at redhat.com (Jonathan Brassow)
Date: Thu, 12 Jun 2008 11:20:56 -0500
Subject: [Linux-cluster] Create Cluster using existing LVM on shared
	storage
In-Reply-To: <9c649280806110623l15f67558r35d0ba05710a01df@mail.gmail.com>
References: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>
	<B81BE59F-E503-4725-B74E-C7B49EEE54C4@redhat.com>
	<9c649280806110623l15f67558r35d0ba05710a01df@mail.gmail.com>
Message-ID: <D0F0F47C-9F39-4C90-9B64-D66846C51B29@redhat.com>

Snapshot volumes are the only thing that would not work (because there  
is no cluster-aware snapshot implementation yet).

Other than that, all your volumes will convert just fine without  
having to recreate, etc.

Once you've got the cluster infrastructure setup, all you need to do is:
*> setup cluster infrastructure
*> install lvm2-cluster
1> vgchange -an <volume group> # deactivate volumes
2> vgchange -cy <volume group>  # set cluster attribute
3> vgchange -ay <volume group> # re-activate the volumes

  brassow

On Jun 11, 2008, at 8:23 AM, Ron Cronenwett wrote:

> Excellent, this answers most of what I was looking for.
> Just one question, if I were to install CLVM, can the existing volume
> group and logical volume be converted or would I need to recreate
> everything and restore the data from a back up copy?
>
> Thanks
>
> Ron Cronenwett
>
>
> On Wed, Jun 11, 2008 at 8:16 AM, Jonathan Brassow  
> <jbrassow at redhat.com> wrote:
>> Here is a good source of information for those questions:
>> http://sources.redhat.com/cluster/wiki/LVMFailover
>>
>> brassow
>>
>> On Jun 10, 2008, at 7:12 PM, Ron Cronenwett wrote:
>>
>>> Hi all,
>>>
>>> I have been presented a pair of servers (RHEL 5.2) connected to a  
>>> dual
>>> port scsi array. The storage has been configured as one large disk
>>> partitioned into /dev/sdb1 /dev/sdb2,and /dev/sdb3 with dev/sdb3 the
>>> sole device in a physical volume. Several logical volumes have been
>>> created on the physical volume. Currently the logical volumes are
>>> mounted on server A. Server B was added to the mix as a new server.
>>> Server B sees the external storage with fdisk -l and the volumes  
>>> with
>>> pvdisplay.
>>>
>>> My questions are, can an existing logical volume be used as a  
>>> cluster
>>> resource that is mounted on one node or the other (but not both)?
>>> Would this work with just the existing LVM or would CLVM be  
>>> needed? If
>>> I need CLVM, can the existing logical volumes be used without
>>> destroying the existing data?  In either case, would all the logical
>>> volumes need to be mounted on the same server or can just one  
>>> logical
>>> volume be mounted on one server while the remaining logical volumes
>>> are mounted on the other?
>>>
>>> The idea is to create an active-passive HA cluster of an existing  
>>> web
>>> server.
>>>
>>> Any thoughts, ideas, comments, questions would be appreciated.
>>>
>>> Thanks
>>>
>>> Ron Cronenwett
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From balajisundar at midascomm.com  Fri Jun 13 07:08:02 2008
From: balajisundar at midascomm.com (Balaji)
Date: Fri, 13 Jun 2008 12:38:02 +0530
Subject: [Linux-cluster] cman stop failed
Message-ID: <48521CD2.8040602@midascomm.com>

Dear All,

  I am using RHEL4 Update 3 Cluster Suite for i386 Architecture and
  my linux kernel version is 2.6.9-34.EL
  I have configured two node cluster
  My nodes are primary and secondary and I am using cluster to 
monitoring scripts
  During reboot or poweroff i am getting the problem is cman stop failed
  and I have verified the "/var/log/messages" log file and i getting the 
following
  messages
  Jun 12 18:34:13 primary cman: Stopping cman:
  Jun 12 18:34:16 primary cman: failed to stop cman failed
  Jun 12 18:34:16 primary cman:
 Jun 12 18:34:16 primary cman:
 Jun 12 18:34:16 primary rc: Stopping cman:  failed

We are not sure why this is happening. Can some one throw light on this.

Regards
-S.Balaji


From sghosh at redhat.com  Fri Jun 13 14:14:39 2008
From: sghosh at redhat.com (Subhendu Ghosh)
Date: Fri, 13 Jun 2008 10:14:39 -0400
Subject: [Linux-cluster] KTCPVS
In-Reply-To: <200806121114.45727.adel@opennet.ae>
References: <200806121114.45727.adel@opennet.ae>
Message-ID: <485280CF.7070001@redhat.com>

Adel Ben Zarrouk wrote:
> Hello,
> 
> Is RHEL support KTCPVS as load balancing layer-7 switching?
> 

not in the shipping products.  KTCPVS hasn't realy seen and major development 
and as kernel component hasn't really been pushed upstream.

-regards
Subhendu


From ssingh at amnh.org  Fri Jun 13 14:16:45 2008
From: ssingh at amnh.org (Sajesh Singh)
Date: Fri, 13 Jun 2008 10:16:45 -0400
Subject: [Linux-cluster] Logging configuration
Message-ID: <4852814D.7050006@amnh.org>

I am running RHCS on a EL 4.6 platform. How can I change the logging 
configuration so that cluster messages that are normally sent to 
/var/log/messages are logged to a separate file?

Regards,

Sajesh Singh


From lhh at redhat.com  Fri Jun 13 15:17:20 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 13 Jun 2008 11:17:20 -0400
Subject: [Linux-cluster] Logging configuration
In-Reply-To: <4852814D.7050006@amnh.org>
References: <4852814D.7050006@amnh.org>
Message-ID: <1213370240.11285.99.camel@ayanami.boston.devel.redhat.com>


On Fri, 2008-06-13 at 10:16 -0400, Sajesh Singh wrote:
> I am running RHCS on a EL 4.6 platform. How can I change the logging 
> configuration so that cluster messages that are normally sent to 
> /var/log/messages are logged to a separate file?

Adding this to syslog.conf should redirect most of the cluster stack in
one shot on RHEL 4:

daemon.* /var/log/daemon-log

-- Lon


From rcronenwett at gmail.com  Fri Jun 13 17:46:17 2008
From: rcronenwett at gmail.com (Ron Cronenwett)
Date: Fri, 13 Jun 2008 13:46:17 -0400
Subject: [Linux-cluster] Create Cluster using existing LVM on shared
	storage
In-Reply-To: <D0F0F47C-9F39-4C90-9B64-D66846C51B29@redhat.com>
References: <9c649280806101712g1b5c68a4g1a4d53f534596d95@mail.gmail.com>
	<B81BE59F-E503-4725-B74E-C7B49EEE54C4@redhat.com>
	<9c649280806110623l15f67558r35d0ba05710a01df@mail.gmail.com>
	<D0F0F47C-9F39-4C90-9B64-D66846C51B29@redhat.com>
Message-ID: <9c649280806131046j54760669l481100c588a74905@mail.gmail.com>

Thanks again. I think I have everything I need now.

Ron Cronenwett


On Thu, Jun 12, 2008 at 12:20 PM, Jonathan Brassow <jbrassow at redhat.com> wrote:
> Snapshot volumes are the only thing that would not work (because there is no
> cluster-aware snapshot implementation yet).
>
> Other than that, all your volumes will convert just fine without having to
> recreate, etc.
>
> Once you've got the cluster infrastructure setup, all you need to do is:
> *> setup cluster infrastructure
> *> install lvm2-cluster
> 1> vgchange -an <volume group> # deactivate volumes
> 2> vgchange -cy <volume group>  # set cluster attribute
> 3> vgchange -ay <volume group> # re-activate the volumes
>
>  brassow
>
> On Jun 11, 2008, at 8:23 AM, Ron Cronenwett wrote:
>
>> Excellent, this answers most of what I was looking for.
>> Just one question, if I were to install CLVM, can the existing volume
>> group and logical volume be converted or would I need to recreate
>> everything and restore the data from a back up copy?
>>
>> Thanks
>>
>> Ron Cronenwett
>>
>>
>> On Wed, Jun 11, 2008 at 8:16 AM, Jonathan Brassow <jbrassow at redhat.com>
>> wrote:
>>>
>>> Here is a good source of information for those questions:
>>> http://sources.redhat.com/cluster/wiki/LVMFailover
>>>
>>> brassow
>>>
>>> On Jun 10, 2008, at 7:12 PM, Ron Cronenwett wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have been presented a pair of servers (RHEL 5.2) connected to a dual
>>>> port scsi array. The storage has been configured as one large disk
>>>> partitioned into /dev/sdb1 /dev/sdb2,and /dev/sdb3 with dev/sdb3 the
>>>> sole device in a physical volume. Several logical volumes have been
>>>> created on the physical volume. Currently the logical volumes are
>>>> mounted on server A. Server B was added to the mix as a new server.
>>>> Server B sees the external storage with fdisk -l and the volumes with
>>>> pvdisplay.
>>>>
>>>> My questions are, can an existing logical volume be used as a cluster
>>>> resource that is mounted on one node or the other (but not both)?
>>>> Would this work with just the existing LVM or would CLVM be needed? If
>>>> I need CLVM, can the existing logical volumes be used without
>>>> destroying the existing data?  In either case, would all the logical
>>>> volumes need to be mounted on the same server or can just one logical
>>>> volume be mounted on one server while the remaining logical volumes
>>>> are mounted on the other?
>>>>
>>>> The idea is to create an active-passive HA cluster of an existing web
>>>> server.
>>>>
>>>> Any thoughts, ideas, comments, questions would be appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Ron Cronenwett
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


From tiagocruz at forumgdh.net  Fri Jun 13 17:48:01 2008
From: tiagocruz at forumgdh.net (Tiago Cruz)
Date: Fri, 13 Jun 2008 14:48:01 -0300
Subject: [Linux-cluster] DRBD8 and GFS issues
In-Reply-To: <1213283023.11285.13.camel@ayanami.boston.devel.redhat.com>
References: <1213225843.6895.59.camel@tuxkiller.ig.com.br>
	<1213283023.11285.13.camel@ayanami.boston.devel.redhat.com>
Message-ID: <1213379281.10530.2.camel@tuxkiller.ig.com.br>

On Thu, 2008-06-12 at 11:03 -0400, Lon Hohberger wrote:
> On Wed, 2008-06-11 at 20:10 -0300, Tiago Cruz wrote:
> > Jun 11 20:01:37 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2"
> > failed
> 
> GFS requires fencing (even if DRBD doesn't).

> You could roll your own peer-outdater script if you wanted.


Lon,

Many thanks for your tip!
How, the environment has working very well! I'm just modified your
script to use the manual fencing:

	# fence_node $REMOTE
	fence_ack_manual -O -e -n $REMOTE

	if [ $? -eq 0 ]; then
		exit 7
	fi

Now, I'll work to add one third node on this setup, and I'll need to
change your script again. I'll be back with results ASAP ;)

Thanks!

-- 
Tiago Cruz
http://everlinux.com
Linux User #282636


From lhh at redhat.com  Fri Jun 13 18:30:35 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 13 Jun 2008 14:30:35 -0400
Subject: [Linux-cluster] DRBD8 and GFS issues
In-Reply-To: <1213379281.10530.2.camel@tuxkiller.ig.com.br>
References: <1213225843.6895.59.camel@tuxkiller.ig.com.br>
	<1213283023.11285.13.camel@ayanami.boston.devel.redhat.com>
	<1213379281.10530.2.camel@tuxkiller.ig.com.br>
Message-ID: <1213381835.11285.130.camel@ayanami.boston.devel.redhat.com>


On Fri, 2008-06-13 at 14:48 -0300, Tiago Cruz wrote:
> On Thu, 2008-06-12 at 11:03 -0400, Lon Hohberger wrote:
> > On Wed, 2008-06-11 at 20:10 -0300, Tiago Cruz wrote:
> > > Jun 11 20:01:37 hotsite-bsb-la-1 fenced[2956]: fence "drdb_hotsite-2"
> > > failed
> > 
> > GFS requires fencing (even if DRBD doesn't).
> 
> > You could roll your own peer-outdater script if you wanted.
> 
> 
> Lon,
> 
> Many thanks for your tip!
> How, the environment has working very well! I'm just modified your
> script to use the manual fencing:

Yipes... That's not what I meant.  My apologies for being unclear.  I
meant roll a script using the other "resource-only" method.

disk {
  fencing resource-only;
}

The script would tells the other node to run "drbdadm outdate
<resource-name>" and return "4" if it was successful[1].

>         # fence_node $REMOTE
> 	fence_ack_manual -O -e -n $REMOTE
> 
> 	if [ $? -eq 0 ]; then
> 		exit 7
> 	fi

Running fence_ack_manual in an automated way defeats the purpose and is
really "no fencing at all". 

If you are really okay with this, you could simplify your configuration
somewhat.  For example, you could just change the fencedevice "agent"
attribute to "/bin/true" in cluster.conf and simplify your peer-outdater
script:

#!/bin/sh
exit 7

Or -- you can even further simplify things by changing drbd.conf to not
care about fencing (then you don't need a script at all):

disk {
  fencing dont-care;
}


> Now, I'll work to add one third node on this setup, and I'll need to
> change your script again. I'll be back with results ASAP ;)

You can't do 3 nodes with the open source version of DRBD...?

-- Lon

[1] http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html


From td3201 at gmail.com  Sun Jun 15 20:17:52 2008
From: td3201 at gmail.com (Terry)
Date: Sun, 15 Jun 2008 15:17:52 -0500
Subject: [Linux-cluster] disk performance issues
Message-ID: <8ee061010806151317s25f2fe7bx73e065f6ca999b69@mail.gmail.com>

Hello,

I am starting here because the filesystems in question are GFS.  Here
are a few facts:

iscsi
bonded ethernet with 9000 MTU setting
LVM
GFS

Here is an iostat of the volumes in question:
Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sdc              11.36   564.51 373.65 99.31  3082.85  5397.92
17.93     7.64   16.13   1.69  79.82
sdd              11.92   567.96 266.21 48.28  2230.24  5044.40
23.13     0.91   51.38   2.04  64.30
sdf              13.80   536.48 263.91 77.40  2225.89  5014.98
21.21     0.70   46.72   2.13  72.54
sde              39.97   802.23 345.06 47.22  3089.78  6969.91
25.64     2.42   45.02   1.89  74.32

The load average of the box is 40.   I am not using near the capacity
of the network.

Based on all of this, anyone have any thoughts?  My await seems a tad high.  :)

Thanks!


From ben.yarwood at juno.co.uk  Mon Jun 16 09:49:56 2008
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Mon, 16 Jun 2008 10:49:56 +0100
Subject: [Linux-cluster] Help with cluster.conf
Message-ID: <021001c8cf96$55d510d0$017f3270$@yarwood@juno.co.uk>

I'm not sure how to craft a service in my RHEL4 cluster.conf for the following:

I have 4 gfs file systems that I "combine" for use as one larger file system, joining them together using symlinks in another
directory.  The reason for this is size limitations with the 32bit OS we use and also they all contain a very large number of files.

I want to export this resulting combined file system using nfs.  This is a snippet of what I have so far in my cluster.conf, the
four file systems are all mounted in slightly different directories but the directory I want to export is a fifth one
"/mnt/encoded/audio/wav".  I need to somehow reflect this directory in the configuration so the nfs export knows what to export but
make sure the resource manager will not try to mount and unmount the directory as it's not an fs.

Any ideas would be appreciated?

Regards
Ben

<resources>

   <clusterfs fstype="gfs" name="wavfs-0" mountpoint="/mnt/encoded/audio/wav-0" device="/dev/mapper/shared_disk-wav--0" options=""/>
   <clusterfs fstype="gfs" name="wavfs-4" mountpoint="/mnt/encoded/audio/wav-4" device="/dev/mapper/shared_disk-wav--4" options=""/>

   <clusterfs fstype="gfs" name="wavfs-8" mountpoint="/mnt/encoded/audio/wav-8" device="/dev/mapper/shared_disk-wav--8" options=""/>

   <clusterfs fstype="gfs" name="wavfs-C" mountpoint="/mnt/encoded/audio/wav-C" device="/dev/mapper/shared_disk-wav--C" options=""/>

   <nfsexport name="NFSexports"/>
   <nfsclient name="dmz" target="10.0.40.0/24" options="ro,sync"/>
   <nfsclient name="data" target="10.0.20.0/24" options="rw,sync,no_root_squash"/>
   <nfsclient name="web" target="10.0.1.0/24" options="ro,sync"/>     

</resources>

   <service name="wav" domain="a-priority">
      <ip address="10.0.20.50" monitor_link="no"/>
      <ip address="10.0.1.50" monitor_link="no"/>
    
      <clusterfs ref="wavfs-0"/>
      <clusterfs ref="wavfs-4"/>                        
	<clusterfs ref="wavfs-8"/>                        
	<clusterfs ref="wavfs-C"/>

      .... I need to include something here to tell nfs what to export! ....
      .... Along the lines of export directory /mnt/encoded/audio/wav ....
         <nfsexport ref="NFSexports">
            <nfsclient ref="dmz"/>
            <nfsclient ref="data"/>                                        
            <nfsclient ref="web"/>
         </nfsexport>
	..... And close it here .... 

   </service>


From gsrlinux at gmail.com  Mon Jun 16 10:27:07 2008
From: gsrlinux at gmail.com (GS R)
Date: Mon, 16 Jun 2008 15:57:07 +0530
Subject: [Linux-cluster] Help with cluster.conf
In-Reply-To: <7055134104139601366@unknownmsgid>
References: <7055134104139601366@unknownmsgid>
Message-ID: <d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>

Hi Ben

<resources>
>
>   <clusterfs fstype="gfs" name="wavfs-0"
> mountpoint="/mnt/encoded/audio/wav-0"
> device="/dev/mapper/shared_disk-wav--0" options=""/>
>   <clusterfs fstype="gfs" name="wavfs-4"
> mountpoint="/mnt/encoded/audio/wav-4"
> device="/dev/mapper/shared_disk-wav--4" options=""/>
>
>   <clusterfs fstype="gfs" name="wavfs-8"
> mountpoint="/mnt/encoded/audio/wav-8"
> device="/dev/mapper/shared_disk-wav--8" options=""/>
>
>   <clusterfs fstype="gfs" name="wavfs-C"
> mountpoint="/mnt/encoded/audio/wav-C"
> device="/dev/mapper/shared_disk-wav--C" options=""/>
>
>   <nfsexport name="NFSexports"/>
>   <nfsclient name="dmz" target="10.0.40.0/24" options="ro,sync"/>
>   <nfsclient name="data" target="10.0.20.0/24"
> options="rw,sync,no_root_squash"/>
>   <nfsclient name="web" target="10.0.1.0/24" options="ro,sync"/>
>
> </resources>
>
>   <service name="wav" domain="a-priority">
>      <ip address="10.0.20.50" monitor_link="no"/>
>      <ip address="10.0.1.50" monitor_link="no"/>
>
>      <clusterfs ref="wavfs-0"/>
>      <clusterfs ref="wavfs-4"/>
>        <clusterfs ref="wavfs-8"/>
>        <clusterfs ref="wavfs-C"/>
>
>      .... I need to include something here to tell nfs what to export! ....
>      .... Along the lines of export directory /mnt/encoded/audio/wav ....


<netfs options="rw" mountpoint="/mnt/encoded/audio/wav" force_unmount="1"
export="/export-path" fstype="nfs" host="hostname"
name="name-of-the-resource"/>


        <nfsexport ref="NFSexports">
>            <nfsclient ref="dmz"/>
>            <nfsclient ref="data"/>
>            <nfsclient ref="web"/>
>         </nfsexport>
>        ..... And close it here ....
>
>   </service>


Let me know if this works and correct me if I am wrong :)

Thanks n Regards
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080616/222d29f1/attachment.htm>

From gsrlinux at gmail.com  Mon Jun 16 10:29:34 2008
From: gsrlinux at gmail.com (GS R)
Date: Mon, 16 Jun 2008 15:59:34 +0530
Subject: [Linux-cluster] Help with cluster.conf
In-Reply-To: <d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>
References: <7055134104139601366@unknownmsgid>
	<d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>
Message-ID: <d765e01f0806160329j47cefcc7he3142a2f48da6d48@mail.gmail.com>

On 6/16/08, GS R <gsrlinux at gmail.com> wrote:
>
>
> Hi Ben
>
> <resources>
>>
>>   <clusterfs fstype="gfs" name="wavfs-0"
>> mountpoint="/mnt/encoded/audio/wav-0"
>> device="/dev/mapper/shared_disk-wav--0" options=""/>
>>   <clusterfs fstype="gfs" name="wavfs-4"
>> mountpoint="/mnt/encoded/audio/wav-4"
>> device="/dev/mapper/shared_disk-wav--4" options=""/>
>>
>>   <clusterfs fstype="gfs" name="wavfs-8"
>> mountpoint="/mnt/encoded/audio/wav-8"
>> device="/dev/mapper/shared_disk-wav--8" options=""/>
>>
>>   <clusterfs fstype="gfs" name="wavfs-C"
>> mountpoint="/mnt/encoded/audio/wav-C"
>> device="/dev/mapper/shared_disk-wav--C" options=""/>
>>
>>   <nfsexport name="NFSexports"/>
>>   <nfsclient name="dmz" target="10.0.40.0/24" options="ro,sync"/>
>>   <nfsclient name="data" target="10.0.20.0/24"
>> options="rw,sync,no_root_squash"/>
>>   <nfsclient name="web" target="10.0.1.0/24" options="ro,sync"/>
>
>
<netfs options="rw" mountpoint="/mnt/encoded/audio/wav" force_unmount="1"
export="/export-path" fstype="nfs" host="hostname"
name="name-of-the-resource"/>

 </resources>
>>
>>   <service name="wav" domain="a-priority">
>>      <ip address="10.0.20.50" monitor_link="no"/>
>>      <ip address="10.0.1.50" monitor_link="no"/>
>>
>>      <clusterfs ref="wavfs-0"/>
>>      <clusterfs ref="wavfs-4"/>
>>        <clusterfs ref="wavfs-8"/>
>>        <clusterfs ref="wavfs-C"/>
>>
>>      .... I need to include something here to tell nfs what to export!
>> ....
>>      .... Along the lines of export directory /mnt/encoded/audio/wav ....
>
>         <nfsexport ref="NFSexports">
>            <nfsclient ref="dmz"/>
>            <nfsclient ref="data"/>
>            <nfsclient ref="web"/>
>         </nfsexport>
>        ..... And close it here ....
>
>   </service>
>
> Let me know if this works and correct me if I am wrong :)
>
> Thanks n Regards
> Gowrishankar Rajaiyan
>
>
>

Sorry!!! Those entries should to be in the resources section.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080616/6a156b35/attachment.htm>

From ben.yarwood at juno.co.uk  Mon Jun 16 11:29:14 2008
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Mon, 16 Jun 2008 12:29:14 +0100
Subject: [Linux-cluster] Help with cluster.conf
In-Reply-To: <d765e01f0806160329j47cefcc7he3142a2f48da6d48@mail.gmail.com>
References: <7055134104139601366@unknownmsgid>	<d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>
	<d765e01f0806160329j47cefcc7he3142a2f48da6d48@mail.gmail.com>
Message-ID: <022201c8cfa4$350511d0$9f0f3570$@yarwood@juno.co.uk>

I think having looked at:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/netfs.sh?rev=1.11&content-type=text/x-cvsweb-markup&cvs
root=cluster

this:

        <netfs options="rw" mountpoint="/mnt/encoded/audio/wav" force_unmount="1" export="/export-path" fstype="nfs" host="hostname"
name="name-of-the-resource"/>

Will try to mount an nfs fs from the remote host and export <host="hostname" export="/export-path"> rather than just specify a
directory to be exported from a local fs.

I've now looked at the nfsclient.sh script and I think I can just specify the export path in that part of the file e.g. as follows:

<resources>
...

<nfsclient name="dmz" path="/mnt/encoded/audio/wav" target="10.0.40.0/24" options="ro,sync"/>

...

</resources>


Can anyone confirm if this is correct?   I'm not sure what will happen as there is no fsid to inherit?


Regards
Ben


> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of GS R
> Sent: 16 June 2008 11:30
> To: linux clustering
> Subject: Re: [Linux-cluster] Help with cluster.conf
> 
> 
> 
> On 6/16/08, GS R <gsrlinux at gmail.com> wrote:
> 
> 
> 	Hi Ben
> 
> 
> 
> 		<resources>
> 
> 		  <clusterfs fstype="gfs" name="wavfs-0" mountpoint="/mnt/encoded/audio/wav-0"
> device="/dev/mapper/shared_disk-wav--0" options=""/>
> 		  <clusterfs fstype="gfs" name="wavfs-4" mountpoint="/mnt/encoded/audio/wav-4"
> device="/dev/mapper/shared_disk-wav--4" options=""/>
> 
> 		  <clusterfs fstype="gfs" name="wavfs-8" mountpoint="/mnt/encoded/audio/wav-8"
> device="/dev/mapper/shared_disk-wav--8" options=""/>
> 
> 		  <clusterfs fstype="gfs" name="wavfs-C" mountpoint="/mnt/encoded/audio/wav-C"
> device="/dev/mapper/shared_disk-wav--C" options=""/>
> 
> 		  <nfsexport name="NFSexports"/>
> 		  <nfsclient name="dmz" target="10.0.40.0/24" options="ro,sync"/>
> 		  <nfsclient name="data" target="10.0.20.0/24" options="rw,sync,no_root_squash"/>
> 		  <nfsclient name="web" target="10.0.1.0/24" options="ro,sync"/>
> 
> 
> <netfs options="rw" mountpoint="/mnt/encoded/audio/wav" force_unmount="1" export="/export-path"
> fstype="nfs" host="hostname" name="name-of-the-resource"/>
> 
> 
> 
> 
> 		</resources>
> 
> 		  <service name="wav" domain="a-priority">
> 		     <ip address="10.0.20.50 <http://10.0.20.50/> " monitor_link="no"/>
> 		     <ip address="10.0.1.50 <http://10.0.1.50/> " monitor_link="no"/>
> 
> 		     <clusterfs ref="wavfs-0"/>
> 		     <clusterfs ref="wavfs-4"/>
> 		       <clusterfs ref="wavfs-8"/>
> 		       <clusterfs ref="wavfs-C"/>
> 
> 		     .... I need to include something here to tell nfs what to export! ....
> 		     .... Along the lines of export directory /mnt/encoded/audio/wav ....
> 
> 
> 	        <nfsexport ref="NFSexports">
> 	           <nfsclient ref="dmz"/>
> 	           <nfsclient ref="data"/>
> 	           <nfsclient ref="web"/>
> 	        </nfsexport>
> 	       ..... And close it here ....
> 
> 	  </service>
> 
> 
> 	Let me know if this works and correct me if I am wrong :)
> 
> 	Thanks n Regards
> 	Gowrishankar Rajaiyan
> 
> 
> 
> 
> Sorry!!! Those entries should to be in the resources section.


From alfredo.moralejo at roche.com  Mon Jun 16 15:35:44 2008
From: alfredo.moralejo at roche.com (Moralejo, Alfredo)
Date: Mon, 16 Jun 2008 17:35:44 +0200
Subject: [Linux-cluster] chkconfig on lvm2-cluster
Message-ID: <EF348547F4F8A747B3D4BAD9C0C9F4B8276940@rkamsem4.emea.roche.com>

Hi,

 
I've been using GFS in the past using RHEL 4U4. Now I'm trying to do it based on RHEL 4.6. Everything seems to run fine more or less, however I've observed a changes i lvm2-cluster package (I'm using now lvm2-cluster-2.02.27-2.el4_6.2):

 
-chkconfig -add does not enable the scripts, only add it to chkconfig but I have to enable them manually (in the chkconfig line of the init scritp there is a "-" instead of the levels)

                        
Am I right?, If so I think it's not very clearly documented in the changelog.

 
Best regards,

 
Alfredo Moralejo 
UNIX Senior Specialist 

Roche Farma, S.A. 
PGIE UNIX Platform Engineering
Josefa Valc?rcel 40, 2nd. Floor 
28027, Madrid, Spain

Phone: +34 91 305 97 87 

alfredo.moralejo at roche.com <mailto:alfredo.moralejo at roche.com>  

Confidentiality Note: This message is intended only for the use of the named recipient(s) and may contain confidential and/or proprietary information. If you are not the intended recipient, please contact the sender and delete this message. Any unauthorized use of the information contained in this message is prohibited. 

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080616/7cbce9cd/attachment.htm>

From shawnlhood at gmail.com  Mon Jun 16 15:54:36 2008
From: shawnlhood at gmail.com (Shawn Hood)
Date: Mon, 16 Jun 2008 11:54:36 -0400
Subject: [Linux-cluster] cluster instability
Message-ID: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>

All,

This message was sent out to my office, so the voice may seem a bit
odd.  We have a 4 node cluster running RHEL4U6 on Dell Poweredge
1950s.  Fencing is done via DRAC.

Using packages (from RHN):

cman-kernel-smp-2.6.9-53.13
cman-1.0.17-0.el4_6.5
ccs-1.0.11-1.el4_6.1
fence-1.32.50-2.el4_6.1
lvm2-cluster-2.02.27-2.el4_6.2
dlm-kernel-smp-2.6.9-52.9
dlm-kernheaders-2.6.9-52.9

Our cluster became unstable on Saturday morning.  Apparently
hugin stopped sending out heartbeats, causing it to become fenced.  hugin
was under heavy load (~10) at the time:

03:30:02 AM         6       453      9.35     10.29     10.51
03:40:01 AM        12       465     11.02     11.00     10.75
03:50:02 AM         3       446      9.75     10.80     10.86
04:00:01 AM         5       430      9.23      9.47     10.07
Average:            7       455     10.19     10.32     10.28

04:09:35 AM       LINUX RESTART

As you can see, hugin was fenced at 4:09.  The other nodes then began
logging the following:

Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent
cluster view

After so many 'initiating transition' messages, the cluster died.  Our
network utilization was very low at the time.

Any ideas?

Shawn


From shawnlhood at gmail.com  Mon Jun 16 15:57:40 2008
From: shawnlhood at gmail.com (Shawn Hood)
Date: Mon, 16 Jun 2008 11:57:40 -0400
Subject: [Linux-cluster] Re: cluster instability
In-Reply-To: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>
References: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>
Message-ID: <cfe2fc960806160857v72e13598qf4d25f5db8adc1e5@mail.gmail.com>

And here's my cluster.conf:

<?xml version="1.0"?>
<cluster alias="tungsten" config_version="31" name="qualia">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="odin" votes="1">
                        <fence>
                                <method name="1">
					<device modulename="" name="odin-drac"/>
				</method>
                        </fence>
                </clusternode>
                <clusternode name="hugin" votes="1">
                        <fence>
                                <method name="1">
					<device modulename="" name="hugin-drac"/>
				</method>
                        </fence>
                </clusternode>
                <clusternode name="munin" votes="1">
                        <fence>
                                <method name="1">
					<device modulename="" name="munin-drac"/>
				</method>
                        </fence>
                </clusternode>
                <clusternode name="zeus" votes="1">
                        <fence>
                                <method name="1">
					<device modulename="" name="zeus-drac"/>
				</method>
                        </fence>
                </clusternode>
	</clusternodes>
        <cman expected_votes="1" two_node="0"/>
        <fencedevices>
                <resources/>
                <fencedevice name="odin-drac" agent="fence_drac"
ipaddr="172.26.26.126" login="root" passwd="xxx"/>
                <fencedevice name="hugin-drac" agent="fence_drac"
ipaddr="172.26.26.123" login="root" passwd="xxx"/>
                <fencedevice name="munin-drac" agent="fence_drac"
ipaddr="172.26.26.120" login="root" passwd="xxx"/>
                <fencedevice name="zeus-drac" agent="fence_drac"
ipaddr="172.26.26.122" login="root" passwd="xxx"/>
        </fencedevices>
        <rm>
		<failoverdomains/>
		<resources/>
	</rm>
</cluster>


From td3201 at gmail.com  Mon Jun 16 16:45:51 2008
From: td3201 at gmail.com (Terry)
Date: Mon, 16 Jun 2008 11:45:51 -0500
Subject: [Linux-cluster] gfs tuning
Message-ID: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>

Hello,

I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
averages on the host that is serving these volumes out via NFS.  I
notice that gfs_scand, dlm_recv, and dlm_scand are running with high
CPU%.  I truly believe the box is I/O bound due to high awaits but
trying to dig into root cause.  99% of the activity on these volumes
is write.  The number of files is around 15 million per TB.   Given
the high number of writes, increasing scand_secs will not help.  Any
other optimizations I can do?

Thanks!


From td3201 at gmail.com  Mon Jun 16 16:53:59 2008
From: td3201 at gmail.com (Terry)
Date: Mon, 16 Jun 2008 11:53:59 -0500
Subject: [Linux-cluster] Re: gfs tuning
In-Reply-To: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
Message-ID: <8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>

Doh!   Check this out:

[root at omadvnfs01b ~]# gfs_tool df /data01d
/data01d:
  SB lock proto = "lock_dlm"
  SB lock table = "omadvnfs01:gfs_data01d"
  SB ondisk format = 1309
  SB multihost format = 1401
  Block size = 4096
  Journals = 2
  Resource Groups = 16384
  Mounted lock proto = "lock_dlm"
  Mounted lock table = "omadvnfs01:gfs_data01d"
  Mounted host data = "jid=1:id=786434:first=0"
  Journal number = 1
  Lock module flags = 0
  Local flocks = FALSE
  Local caching = FALSE
  Oopses OK = FALSE

  Type           Total          Used           Free           use%
  ------------------------------------------------------------------------
  inodes         18417216       18417216       0              100%
  metadata       21078536       20002007       1076529        95%
  data           1034059688     744936460      289123228      72%


The number of inodes is interesting......


On Mon, Jun 16, 2008 at 11:45 AM, Terry <td3201 at gmail.com> wrote:
> Hello,
>
> I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
> averages on the host that is serving these volumes out via NFS.  I
> notice that gfs_scand, dlm_recv, and dlm_scand are running with high
> CPU%.  I truly believe the box is I/O bound due to high awaits but
> trying to dig into root cause.  99% of the activity on these volumes
> is write.  The number of files is around 15 million per TB.   Given
> the high number of writes, increasing scand_secs will not help.  Any
> other optimizations I can do?
>
> Thanks!
>


From tiagocruz at forumgdh.net  Mon Jun 16 17:00:43 2008
From: tiagocruz at forumgdh.net (Tiago Cruz)
Date: Mon, 16 Jun 2008 14:00:43 -0300
Subject: [Linux-cluster] Re: gfs tuning
In-Reply-To: <8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
Message-ID: <1213635644.8298.51.camel@tuxkiller.ig.com.br>

Doh!! :)

Is it normal?

$ gfs_tool df /mnt
/mnt:
  SB lock proto = "lock_dlm"
  SB lock table = "hotsite:gfs-00"
  SB ondisk format = 1309
  SB multihost format = 1401
  Block size = 4096
  Journals = 2
  Resource Groups = 424
  Mounted lock proto = "lock_dlm"
  Mounted lock table = "hotsite:gfs-00"
  Mounted host data = "jid=1:id=196609:first=0"
  Journal number = 1
  Lock module flags = 0
  Local flocks = FALSE
  Local caching = FALSE
  Oopses OK = FALSE

  Type           Total          Used           Free           use%           
  ------------------------------------------------------------------------
  inodes         854            854            0              100%
  metadata       48761          2259           46502          5%
  data           27652913       1061834        26591079       4%


I thinking my load average very high under apacheab...

On Mon, 2008-06-16 at 11:53 -0500, Terry wrote:
> Doh!   Check this out:
> 
> [root at omadvnfs01b ~]# gfs_tool df /data01d
> /data01d:
>   SB lock proto = "lock_dlm"
>   SB lock table = "omadvnfs01:gfs_data01d"
>   SB ondisk format = 1309
>   SB multihost format = 1401
>   Block size = 4096
>   Journals = 2
>   Resource Groups = 16384
>   Mounted lock proto = "lock_dlm"
>   Mounted lock table = "omadvnfs01:gfs_data01d"
>   Mounted host data = "jid=1:id=786434:first=0"
>   Journal number = 1
>   Lock module flags = 0
>   Local flocks = FALSE
>   Local caching = FALSE
>   Oopses OK = FALSE
> 
>   Type           Total          Used           Free           use%
>   ------------------------------------------------------------------------
>   inodes         18417216       18417216       0              100%
>   metadata       21078536       20002007       1076529        95%
>   data           1034059688     744936460      289123228      72%
> 
> 
> The number of inodes is interesting......
> 
> 
> On Mon, Jun 16, 2008 at 11:45 AM, Terry <td3201 at gmail.com> wrote:
> > Hello,
> >
> > I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
> > averages on the host that is serving these volumes out via NFS.  I
> > notice that gfs_scand, dlm_recv, and dlm_scand are running with high
> > CPU%.  I truly believe the box is I/O bound due to high awaits but
> > trying to dig into root cause.  99% of the activity on these volumes
> > is write.  The number of files is around 15 million per TB.   Given
> > the high number of writes, increasing scand_secs will not help.  Any
> > other optimizations I can do?
> >
> > Thanks!
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
-- 
Tiago Cruz
http://everlinux.com
Linux User #282636


From rpeterso at redhat.com  Mon Jun 16 17:06:45 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Mon, 16 Jun 2008 12:06:45 -0500
Subject: [Linux-cluster] Re: gfs tuning
In-Reply-To: <8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
Message-ID: <1213636005.3724.38.camel@technetium.msp.redhat.com>

Hi Terry,

On Mon, 2008-06-16 at 11:53 -0500, Terry wrote:
> Doh!   Check this out:
> The number of inodes is interesting......

I don't see anything unusual about the inodes there offhand.
Since GFS allocates inodes and metadata from the free space,
it's perfectly normal to see "100% used" and "0% free".  It
will allocate new inodes as it needs to.  That just means that
not a lot has been deleted and able to be re-used.

As for performance tuning, did you read the performance tuning
section in the FAQ?  I've added a lot of info to it lately.
The URL is:
http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning

Regards,

Bob Peterson
Red Hat Clustering & GFS


From Derek.Anderson at compellent.com  Mon Jun 16 17:08:58 2008
From: Derek.Anderson at compellent.com (Derek Anderson)
Date: Mon, 16 Jun 2008 12:08:58 -0500
Subject: [Linux-cluster] Re: gfs tuning
In-Reply-To: <8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<8ee061010806160953l6b1a23fq242a29ae751a6653@mail.gmail.com>
Message-ID: <A7519D7FDEF2A144BA6D40BDF13100FE5AFF75@ex2.Beer.Town>


> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-
> bounces at redhat.com] On Behalf Of Terry
> Sent: Monday, June 16, 2008 11:54 AM
> To: linux clustering
> Subject: [Linux-cluster] Re: gfs tuning
> 
> Doh!   Check this out:
> 
> [root at omadvnfs01b ~]# gfs_tool df /data01d
> /data01d:
>   SB lock proto = "lock_dlm"
>   SB lock table = "omadvnfs01:gfs_data01d"
>   SB ondisk format = 1309
>   SB multihost format = 1401
>   Block size = 4096
>   Journals = 2
>   Resource Groups = 16384
>   Mounted lock proto = "lock_dlm"
>   Mounted lock table = "omadvnfs01:gfs_data01d"
>   Mounted host data = "jid=1:id=786434:first=0"
>   Journal number = 1
>   Lock module flags = 0
>   Local flocks = FALSE
>   Local caching = FALSE
>   Oopses OK = FALSE
> 
>   Type           Total          Used           Free           use%
>
---------------------------------------------------------------------
> ---
>   inodes         18417216       18417216       0              100%
>   metadata       21078536       20002007       1076529        95%
>   data           1034059688     744936460      289123228      72%
> 
> 
> The number of inodes is interesting......

If you mean the percentage used, I believe GFS allocates inodes as
needed, so this will always indicate that they are 100% in use.

> 
> 
> On Mon, Jun 16, 2008 at 11:45 AM, Terry <td3201 at gmail.com> wrote:
> > Hello,
> >
> > I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
> > averages on the host that is serving these volumes out via NFS.  I
> > notice that gfs_scand, dlm_recv, and dlm_scand are running with high
> > CPU%.  I truly believe the box is I/O bound due to high awaits but
> > trying to dig into root cause.  99% of the activity on these volumes
> > is write.  The number of files is around 15 million per TB.   Given
> > the high number of writes, increasing scand_secs will not help.  Any
> > other optimizations I can do?
> >
> > Thanks!
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From ross at kallisti.us  Mon Jun 16 19:16:41 2008
From: ross at kallisti.us (Ross Vandegrift)
Date: Mon, 16 Jun 2008 15:16:41 -0400
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
Message-ID: <20080616191641.GA17965@kallisti.us>

On Mon, Jun 16, 2008 at 11:45:51AM -0500, Terry wrote:
> I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
> averages on the host that is serving these volumes out via NFS.  I
> notice that gfs_scand, dlm_recv, and dlm_scand are running with high
> CPU%.  I truly believe the box is I/O bound due to high awaits but
> trying to dig into root cause.  99% of the activity on these volumes
> is write.  The number of files is around 15 million per TB.   Given
> the high number of writes, increasing scand_secs will not help.  Any
> other optimizations I can do?

Are you running multi-threaded/multi-process writes to the same files
on various nodes?

During benchmarking and testing a cluster I recently built, I noticed
a very large performance hit when performing multi-threaded I/O to
overlapping areas of the filesystem.

If you can randomize the order that different nodes are accessing
the filesystem, you'll go a long way to reducing contention.  That
will improve your performance.

However, I suspect with NFS you won't have too much choice, since
file access will be governed by client read/write patterns...


-- 
Ross Vandegrift
ross at kallisti.us

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
	--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37


From td3201 at gmail.com  Mon Jun 16 19:38:55 2008
From: td3201 at gmail.com (Terry)
Date: Mon, 16 Jun 2008 14:38:55 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <20080616191641.GA17965@kallisti.us>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us>
Message-ID: <8ee061010806161238x56b905aeqd3752c32a304f098@mail.gmail.com>

On Mon, Jun 16, 2008 at 2:16 PM, Ross Vandegrift <ross at kallisti.us> wrote:
> On Mon, Jun 16, 2008 at 11:45:51AM -0500, Terry wrote:
>> I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
>> averages on the host that is serving these volumes out via NFS.  I
>> notice that gfs_scand, dlm_recv, and dlm_scand are running with high
>> CPU%.  I truly believe the box is I/O bound due to high awaits but
>> trying to dig into root cause.  99% of the activity on these volumes
>> is write.  The number of files is around 15 million per TB.   Given
>> the high number of writes, increasing scand_secs will not help.  Any
>> other optimizations I can do?
>
> Are you running multi-threaded/multi-process writes to the same files
> on various nodes?
>
> During benchmarking and testing a cluster I recently built, I noticed
> a very large performance hit when performing multi-threaded I/O to
> overlapping areas of the filesystem.
>
> If you can randomize the order that different nodes are accessing
> the filesystem, you'll go a long way to reducing contention.  That
> will improve your performance.
>
> However, I suspect with NFS you won't have too much choice, since
> file access will be governed by client read/write patterns...
>
>
> --
> Ross Vandegrift
> ross at kallisti.us

I won't have a choice, unfortunately.  Here is what I set so far:

gfs_tool settune $i statfs_slots 128
gfs_tool settune $i scand_secs 30
gfs_tool settune $i glock_purge 50


From s.wendy.cheng at gmail.com  Mon Jun 16 19:48:21 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Mon, 16 Jun 2008 15:48:21 -0400
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <20080616191641.GA17965@kallisti.us>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us>
Message-ID: <4856C385.8000800@gmail.com>

Ross Vandegrift wrote:
> On Mon, Jun 16, 2008 at 11:45:51AM -0500, Terry wrote:
>   
>> I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
>> averages on the host that is serving these volumes out via NFS.  I
>> notice that gfs_scand, dlm_recv, and dlm_scand are running with high
>> CPU%.  I truly believe the box is I/O bound due to high awaits but
>> trying to dig into root cause.  99% of the activity on these volumes
>> is write.  The number of files is around 15 million per TB.   Given
>> the high number of writes, increasing scand_secs will not help.  Any
>> other optimizations I can do?
>>     
>
>   

A similar case two years ago was solved by the following two tunables:

shell> gfs_tool settune <mount_point> demote_secs <seconds>
(e.g. "gfs_tool settune /mnt/gfs1 demote_secs 200").
shell> gfs_tool settune <mount_point> glock_purge <percentage>
(e.g. "gfs_tool settune /mnt/gfs1 glock_purge 50")

The example above will trim 50% of inode away for every 200 seconds 
interval (default is 300 seconds). Do this on all the GFS-NFS servers 
that show this issues. It can be dynamically turned on (non-zero 
percentage) and off (0 percentage).

As I recalled, the customer used a very aggressive percentage (I think 
it was 100%) but please start from middle ground (50%) to see how it goes.

-- Wendy


From gsrlinux at gmail.com  Tue Jun 17 03:12:30 2008
From: gsrlinux at gmail.com (GS R)
Date: Tue, 17 Jun 2008 08:42:30 +0530
Subject: [Linux-cluster] Help with cluster.conf
In-Reply-To: <-3115627869481524872@unknownmsgid>
References: <7055134104139601366@unknownmsgid>
	<d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>
	<d765e01f0806160329j47cefcc7he3142a2f48da6d48@mail.gmail.com>
	<-3115627869481524872@unknownmsgid>
Message-ID: <d765e01f0806162012k6e4b5cc9q61a60749956700dd@mail.gmail.com>

On 6/16/08, Ben Yarwood <ben.yarwood at juno.co.uk> wrote:
>
> I think having looked at:
>
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/netfs.sh?rev=1.11&content-type=text/x-cvsweb-markup&cvs
> root=cluster
>
> this:
>
>        <netfs options="rw" mountpoint="/mnt/encoded/audio/wav"
> force_unmount="1" export="/export-path" fstype="nfs" host="hostname"
> name="name-of-the-resource"/>
>
> Will try to mount an nfs fs from the remote host and export
> <host="hostname" export="/export-path"> rather than just specify a
> directory to be exported from a local fs.
>
> I've now looked at the nfsclient.sh script and I think I can just specify
> the export path in that part of the file e.g. as follows:
>
> <resources>
> ...
>
> <nfsclient name="dmz" path="/mnt/encoded/audio/wav" target="10.0.40.0/24"
> options="ro,sync"/>
>
> ...


I had misunderstood your scenario earlier. However I can confirm that the
syntax for nfsclient resource is correct as you have specified. No fsid is
required for nfsclient resource.


</resources>
>
>
> Can anyone confirm if this is correct?   I'm not sure what will happen as
> there is no fsid to inherit?
>
>
> Regards
> Ben


Thanks
Gowrishankar Rajaiyan | Senior Quality Analyst.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080617/2789612b/attachment.htm>

From gsrlinux at gmail.com  Tue Jun 17 03:31:14 2008
From: gsrlinux at gmail.com (GS R)
Date: Tue, 17 Jun 2008 09:01:14 +0530
Subject: [Linux-cluster] cluster instability
In-Reply-To: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>
References: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>
Message-ID: <d765e01f0806162031g6c3d06do8ce49320bc2387ff@mail.gmail.com>

On 6/16/08, Shawn Hood <shawnlhood at gmail.com> wrote:
>
> All,
>
> This message was sent out to my office, so the voice may seem a bit
> odd.  We have a 4 node cluster running RHEL4U6 on Dell Poweredge
> 1950s.  Fencing is done via DRAC.
>
> Using packages (from RHN):
>
> cman-kernel-smp-2.6.9-53.13
> cman-1.0.17-0.el4_6.5
> ccs-1.0.11-1.el4_6.1
> fence-1.32.50-2.el4_6.1
> lvm2-cluster-2.02.27-2.el4_6.2
> dlm-kernel-smp-2.6.9-52.9
> dlm-kernheaders-2.6.9-52.9
>
> Our cluster became unstable on Saturday morning.  Apparently
> hugin stopped sending out heartbeats, causing it to become fenced.  hugin
> was under heavy load (~10) at the time:
>
> 03:30:02 AM         6       453      9.35     10.29     10.51
> 03:40:01 AM        12       465     11.02     11.00     10.75
> 03:50:02 AM         3       446      9.75     10.80     10.86
> 04:00:01 AM         5       430      9.23      9.47     10.07
> Average:            7       455     10.19     10.32     10.28
>
> 04:09:35 AM       LINUX RESTART
>
> As you can see, hugin was fenced at 4:09.  The other nodes then began
> logging the following:
>
> Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
> Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
> Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
> Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
> Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
> Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster.
> Inconsistent
> cluster view


I guess this has to do with network issue though its utilization was low
when this logged.
The node is not able to receive messages.

After so many 'initiating transition' messages, the cluster died.  Our
> network utilization was very low at the time.
>
> Any ideas?
>
> Shawn
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

Thanks
Gowrishankar Rajaiyan | Senior Quality Analyst
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080617/99d0aec8/attachment.htm>

From stephenamadei at hotmail.com  Tue Jun 17 03:47:44 2008
From: stephenamadei at hotmail.com (Stephen Amadei)
Date: Mon, 16 Jun 2008 23:47:44 -0400
Subject: [Linux-cluster] GFS2 nulling larger files.
Message-ID: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>


Hello.

I'm running cluster-2.03.02, Openais-0.80.3, lvm2-2.02.36 on top of a DRBD 8.0.12 device on Slackware 12.0 almost 12.1 box.  I stuck with these versions, as newer versions didn't provide the features I needed or didn't compile for various reasons.

So, everything was running well, but now after testing somewhat, when I copy files to the GFS2 partition, the files larger than 2.9K, but smaller than 4.3K get nulled.

So, I'm up and running.  Both A and B have the GFS2 partition /var/shared running.  On A, I copy some files to /var/shared.  ls -l /var/shared shows the files have the proper size... and CATing the files shows content.  On B, the files larger than 2.9K are null... zero size, and you can't cat them.  If I unmount /var/shared on A and remount it, it also only sees proper file sizes up to 2.9K, everything else is null.

fsck doesn't seem to see anything wrong with the partition.

What on earth could be the problem here?

Thanks in advance.

Stephen
_________________________________________________________________
It?s easy to add contacts from Facebook and other social sites through Windows Live? Messenger. Learn how.
https://www.invite2messenger.net/im/?source=TXT_EML_WLH_LearnHow


From fdinitto at redhat.com  Tue Jun 17 05:05:28 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 17 Jun 2008 07:05:28 +0200 (CEST)
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
Message-ID: <Pine.LNX.4.64.0806170704240.5892@trider-g7>

On Mon, 16 Jun 2008, Stephen Amadei wrote:

>
> Hello.
>
> I'm running cluster-2.03.02, Openais-0.80.3,

[SNIP]

> I stuck with these versions, as newer versions didn't provide the 
> features I needed or didn't compile for various reasons.

Could you please collect the build errors from the above sources (if any) 
and send me the logs?

Thanks
Fabio

--
I'm going to make him an offer he can't refuse.


From maurizio.rottin at gmail.com  Tue Jun 17 05:46:13 2008
From: maurizio.rottin at gmail.com (Maurizio Rottin)
Date: Tue, 17 Jun 2008 07:46:13 +0200
Subject: [Linux-cluster] getting rhcs critical events by email/snmp
In-Reply-To: <ed9a61600806101144x2e07859cyb73abc4c6de95677@mail.gmail.com>
References: <e83473390806100824r57747a02q3f2c3b8d1153dab5@mail.gmail.com>
	<1213123176.20204.108.camel@ayanami.boston.devel.redhat.com>
	<ed9a61600806101144x2e07859cyb73abc4c6de95677@mail.gmail.com>
Message-ID: <e83473390806162246s79facee0sdebdc05f13ace770@mail.gmail.com>

2008/6/10 S. Zachariah Sprackett <zac at sprackett.com>:
> On Tue, Jun 10, 2008 at 2:39 PM, Lon Hohberger <lhh at redhat.com> wrote:
>>
>> > Anyway, since if it happen once it can happen again, is there any way
>> > to get the cluster send emails or snmp messages on nodes critical
>> > events?
>>
>> You can set syslog to do email, I think.
>
> See here for details on how:
>
> http://www.johnandcailin.com/blog/john/how-setup-real-time-email-notification-critical-syslog-events

may i say this is an horrible solution!?(even if it is a solution)

this means i must setup some cron job consuming cpu cycles for nothing
but sending a message to syslog (which sends en email). To make it
short i should setup a a cron job that consumes cpu cycles for nothing
and send email realtime(without syslog interaction)

I believe it's difficult to catch syslog mesages from rhcs, since it
send to something *.info;mail.none;authpriv.none;cron.none; then it is
not something easy to catch like local5.crit.

yum searchng "snmp" i found this cluster-snmp rpm.

now i'm gonna check it out, i will write about it as soon as it is configured!

You can use syslog if you want, but i want to point out that that is a
really bad solution!

bye,

-- 
mr


From lp at xbe.ch  Tue Jun 17 07:19:33 2008
From: lp at xbe.ch (Lorenz Pfiffner)
Date: Tue, 17 Jun 2008 09:19:33 +0200
Subject: [Linux-cluster] apache resource problem in RHCS 5.1
In-Reply-To: <48480528.6000708@redhat.com>
References: <483ECA36.7070007@xbe.ch>	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>	<48469941.6030800@xbe.ch>
	<48480528.6000708@redhat.com>
Message-ID: <48576585.3050409@xbe.ch>

I want to add something I recently found out about the relocation time of IP resources, which I complained about.

The reason why it takes 10 seconds per IP can be found here: http://sources.redhat.com/cluster/faq.html#rgm_failovertime

Greetz
Lorenz

Marek 'marx' Grac wrote:
> Hi,
> 
> Lorenz Pfiffner wrote:
>> Hi Ron
>>
>> I couldn't make it working with the apache resource. For me it seems 
>> quite unstable and it's nowhere really mentioned in any documentation 
>> I found. So please, if any RedHat guy is reading this, can you please 
>> improve this feature and put it into the official documentation. For 
>> example, why does the apache.sh script change the "Listen" directive?
> Look at my previous post to this thread. IMHO unstable is something that 
> does not work.
> 
>> How can I execute apache.sh manually to debug the resource?
>>
> If you want to debug, the best way is to run resource group manager in 
> debug mode. So stop it in all machines, and run clurgmgrd -fd  (stay 
> forward and debug). Resource agents tries to log as much as is useful 
> and you will see everything on output. If you want to run this script 
> directly, you will have to setup all environment variables OCF_*.
> 
>> My workaround: I altered the default httpd script and made a script 
>> resource. In that case it's working as expected. The only thing that 
>> bothers me quite a lot is the relocation time. It takes about 50 to 60 
>> seconds to relocate 5 IPs, a GFS mount and the apache script resource! 
>> Is this a reasonable time? On older clusters I remember times around 5 
>> to 10 seconds.
> 
> Default init script for httpd, mysqld, ... will work for you if you have 
> only one httpd on your cluster. It is not suitable for running several 
> instances on same machine. This is one of the reasons why we need 
> resource agents.
> 
> 


From ccaulfie at redhat.com  Tue Jun 17 07:29:00 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Tue, 17 Jun 2008 08:29:00 +0100
Subject: [Linux-cluster] cluster instability
In-Reply-To: <d765e01f0806162031g6c3d06do8ce49320bc2387ff@mail.gmail.com>
References: <cfe2fc960806160854q7ade9413g5012d44730fbdda8@mail.gmail.com>
	<d765e01f0806162031g6c3d06do8ce49320bc2387ff@mail.gmail.com>
Message-ID: <485767BC.4050705@redhat.com>

GS R wrote:
> 
> 
> On 6/16/08, *Shawn Hood* <shawnlhood at gmail.com 
> <mailto:shawnlhood at gmail.com>> wrote:
> 
>     All,
> 
>     This message was sent out to my office, so the voice may seem a bit
>     odd.  We have a 4 node cluster running RHEL4U6 on Dell Poweredge
>     1950s.  Fencing is done via DRAC.
> 
>     Using packages (from RHN):
> 
>     cman-kernel-smp-2.6.9-53.13
>     cman-1.0.17-0.el4_6.5
>     ccs-1.0.11-1.el4_6.1
>     fence-1.32.50-2.el4_6.1
>     lvm2-cluster-2.02.27-2.el4_6.2
>     dlm-kernel-smp-2.6.9-52.9
>     dlm-kernheaders-2.6.9-52.9
> 
>     Our cluster became unstable on Saturday morning.  Apparently
>     hugin stopped sending out heartbeats, causing it to become
>     fenced.  hugin
>     was under heavy load (~10) at the time:
> 
>     03:30:02 AM         6       453      9.35     10.29     10.51
>     03:40:01 AM        12       465     11.02     11.00     10.75
>     03:50:02 AM         3       446      9.75     10.80     10.86
>     04:00:01 AM         5       430      9.23      9.47     10.07
>     Average:            7       455     10.19     10.32     10.28
> 
>     04:09:35 AM       LINUX RESTART
> 
>     As you can see, hugin was fenced at 4:09.  The other nodes then began
>     logging the following:
> 
>     Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
>     Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
>     Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
>     Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
>     Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts -
>     will die
>     Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster.
>     Inconsistent
>     cluster view
> 
>  
> I guess this has to do with network issue though its utilization was low 
> when this logged.
> The node is not able to receive messages.
> 

I suspect you've hit this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=444751


There's a patch in the bugzilla, and a workaround program you can run 
which should help if you can't upgrade the kernel module (See comment #10)

-- 

Chrissie


From denisb+gmane at gmail.com  Tue Jun 17 07:52:53 2008
From: denisb+gmane at gmail.com (denis)
Date: Tue, 17 Jun 2008 09:52:53 +0200
Subject: [Linux-cluster] Re: apache resource problem in RHCS 5.1
In-Reply-To: <48576585.3050409@xbe.ch>
References: <483ECA36.7070007@xbe.ch>	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>	<48469941.6030800@xbe.ch>	<48480528.6000708@redhat.com>
	<48576585.3050409@xbe.ch>
Message-ID: <g37qgl$jll$1@ger.gmane.org>

Lorenz Pfiffner wrote:
> I want to add something I recently found out about the relocation time 
> of IP resources, which I complained about.
> 
> The reason why it takes 10 seconds per IP can be found here: 
> http://sources.redhat.com/cluster/faq.html#rgm_failovertime

Can anyone add to this, seems very interesting to cut the failover time 
for my scenario too.

Is lowering the sleep time or removing it safe for clusters without NFS?

Regards
--
Denis


From ben.yarwood at juno.co.uk  Tue Jun 17 10:03:10 2008
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Tue, 17 Jun 2008 11:03:10 +0100
Subject: [Linux-cluster] Help with cluster.conf
In-Reply-To: <d765e01f0806162012k6e4b5cc9q61a60749956700dd@mail.gmail.com>
References: <7055134104139601366@unknownmsgid>	<d765e01f0806160327m59410d06y32d0164b81486196@mail.gmail.com>	<d765e01f0806160329j47cefcc7he3142a2f48da6d48@mail.gmail.com>	<-3115627869481524872@unknownmsgid>
	<d765e01f0806162012k6e4b5cc9q61a60749956700dd@mail.gmail.com>
Message-ID: <02e001c8d061$593093a0$0b91bae0$@yarwood@juno.co.uk>

Thanks for your help.

Ben


> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of GS R
> Sent: 17 June 2008 04:12
> To: linux clustering
> Subject: Re: [Linux-cluster] Help with cluster.conf
> 
> 
> 
> On 6/16/08, Ben Yarwood <ben.yarwood at juno.co.uk> wrote:
> 
> 	I think having looked at:
> 
> 	http://sources.redhat.com/cgi-
> bin/cvsweb.cgi/cluster/rgmanager/src/resources/netfs.sh?rev=1.11&content-type=text/x-cvsweb-markup&cvs
> 	root=cluster
> 
> 	this:
> 
> 	       <netfs options="rw" mountpoint="/mnt/encoded/audio/wav" force_unmount="1"
> export="/export-path" fstype="nfs" host="hostname"
> 	name="name-of-the-resource"/>
> 
> 	Will try to mount an nfs fs from the remote host and export <host="hostname" export="/export-
> path"> rather than just specify a
> 	directory to be exported from a local fs.
> 
> 	I've now looked at the nfsclient.sh script and I think I can just specify the export path in
> that part of the file e.g. as follows:
> 
> 	<resources>
> 	...
> 
> 	<nfsclient name="dmz" path="/mnt/encoded/audio/wav" target="10.0.40.0/24" options="ro,sync"/>
> 
> 	...
> 
> 
> I had misunderstood your scenario earlier. However I can confirm that the syntax for nfsclient
> resource is correct as you have specified. No fsid is required for nfsclient resource.
> 
> 
> 
> 	</resources>
> 
> 
> 	Can anyone confirm if this is correct?   I'm not sure what will happen as there is no fsid to
> inherit?
> 
> 
> 	Regards
> 	Ben
> 
> 
> Thanks
> Gowrishankar Rajaiyan | Senior Quality Analyst.
> 


From denisb+gmane at gmail.com  Tue Jun 17 10:08:44 2008
From: denisb+gmane at gmail.com (denis)
Date: Tue, 17 Jun 2008 12:08:44 +0200
Subject: [Linux-cluster] Re: apache resource problem in RHCS 5.1
In-Reply-To: <g37qgl$jll$1@ger.gmane.org>
References: <483ECA36.7070007@xbe.ch>	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>	<48469941.6030800@xbe.ch>	<48480528.6000708@redhat.com>	<48576585.3050409@xbe.ch>
	<g37qgl$jll$1@ger.gmane.org>
Message-ID: <g382fc$e2j$1@ger.gmane.org>

denis wrote:
> Lorenz Pfiffner wrote:
>> I want to add something I recently found out about the relocation time 
>> of IP resources, which I complained about.
>>
>> The reason why it takes 10 seconds per IP can be found here: 
>> http://sources.redhat.com/cluster/faq.html#rgm_failovertime
> 
> Can anyone add to this, seems very interesting to cut the failover time 
> for my scenario too.
> 
> Is lowering the sleep time or removing it safe for clusters without NFS?

Well, duh, actually reading the page revealed this FAQ has migrated to a 
new place, and the new FAQ maintains this advice so I'll just assume 
this is valid.

http://sources.redhat.com/cluster/wiki/FAQ/RGManager#rgm_failovertime

Sorry for noise.

--
Denis


From rpeterso at redhat.com  Tue Jun 17 13:47:13 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Tue, 17 Jun 2008 08:47:13 -0500
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
Message-ID: <1213710433.3724.44.camel@technetium.msp.redhat.com>

Hi Stephen,

On Mon, 2008-06-16 at 23:47 -0400, Stephen Amadei wrote:
> So, everything was running well, but now after testing somewhat, when I copy files to the GFS2 partition, the files larger than 2.9K, but smaller than 4.3K get nulled.

We're already aware of the problem and are working on it.
We know kind of what's going on and which set of patches
broke it.  Basically, we rewrote some of the block allocation
code for performance reasons and that's the bit having the problem.

It seems to be a problem only in the upstream code; the GFS2 code
in RHEL5.2 has the old allocation code, so it doesn't have the
problem.  So a fix shouldn't be too far off.  Part of the
problem, though, is that Steve Whitehouse is on holiday, so
we're short-handed and any fixes we develop would have to be
pushed to the git tree by him.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From td3201 at gmail.com  Tue Jun 17 16:54:26 2008
From: td3201 at gmail.com (Terry)
Date: Tue, 17 Jun 2008 11:54:26 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <4856C385.8000800@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us> <4856C385.8000800@gmail.com>
Message-ID: <8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>

On Mon, Jun 16, 2008 at 2:48 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
> Ross Vandegrift wrote:
>>
>> On Mon, Jun 16, 2008 at 11:45:51AM -0500, Terry wrote:
>>
>>>
>>> I have 4 GFS volumes, each 4 TB.  I am seeing pretty high load
>>> averages on the host that is serving these volumes out via NFS.  I
>>> notice that gfs_scand, dlm_recv, and dlm_scand are running with high
>>> CPU%.  I truly believe the box is I/O bound due to high awaits but
>>> trying to dig into root cause.  99% of the activity on these volumes
>>> is write.  The number of files is around 15 million per TB.   Given
>>> the high number of writes, increasing scand_secs will not help.  Any
>>> other optimizations I can do?
>>>
>>
>>
>
> A similar case two years ago was solved by the following two tunables:
>
> shell> gfs_tool settune <mount_point> demote_secs <seconds>
> (e.g. "gfs_tool settune /mnt/gfs1 demote_secs 200").
> shell> gfs_tool settune <mount_point> glock_purge <percentage>
> (e.g. "gfs_tool settune /mnt/gfs1 glock_purge 50")
>
> The example above will trim 50% of inode away for every 200 seconds interval
> (default is 300 seconds). Do this on all the GFS-NFS servers that show this
> issues. It can be dynamically turned on (non-zero percentage) and off (0
> percentage).
>
> As I recalled, the customer used a very aggressive percentage (I think it
> was 100%) but please start from middle ground (50%) to see how it goes.
>
> -- Wendy

I am still seeing some high load averages.  Here is an example of a
gfs configuration.  I left statfs_fast off as it would not apply to
one of my volumes for an unknown reason.  Not sure that would have
helped anyways.  I do, however, feel that reducing scand_secs helped a
little:

[root at omadvnfs01a ~]# gfs_tool gettune /data01a
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 200
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 30
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 50
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 0
quota_account = 0
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 128
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 0

Given the high number of files (10-15 million per TB) would it be
smarter to use ext3?  My NFS cluster is set up as an active/passive
anyways so only 1 node will have access to the data at any one time.
Thoughts?  Opinions?

Anyone have an NFS cluster that is active/active?  Thoughts?  I am not
certain that nfsd and locking is cluster friendly.  That said, I don't
feel my application (nfs clients) will be requesting a file in write
(or even read) mode at the same time so my locking concerns aren't
high.


From s.wendy.cheng at gmail.com  Tue Jun 17 20:09:00 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 17 Jun 2008 16:09:00 -0400
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>	<20080616191641.GA17965@kallisti.us>
	<4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
Message-ID: <485819DC.90503@gmail.com>

Hi, Terry,
>
> I am still seeing some high load averages.  Here is an example of a
> gfs configuration.  I left statfs_fast off as it would not apply to
> one of my volumes for an unknown reason.  Not sure that would have
> helped anyways.  I do, however, feel that reducing scand_secs helped a
> little:
>   
Sorry I missed scand_secs (was mindless as the brain was mostly occupied 
by day time work).

To simplify the view, glock states include exclusive (write), share 
(read), and not-locked (in reality, there are more). Exclusive lock has 
to be demoted (demote_secs) to share, then to not-locked (another 
demote_secs) before it is scanned (every scand_secs) to get added into 
reclaim list where it can be purged. Between exclusive and share state 
transition, the file contents need to get flushed to disk (to keep file 
content cluster coherent).  All of above assume the file (protected by 
this glock) is not accessed (idle).

You hit an area that GFS normally doesn't perform well. With GFS1 in 
maintenance mode while GFS2 seems to be so far away, ext3 could be a 
better answer. However, before switching, do make sure to test it 
thoroughly (since Ext3 could have the very same issue as well - check 
out: http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).

Did you look (and test) GFS "nolock" protocol (for single node GFS)? It 
bypasses some locking overhead and can be switched to  DLM in the future 
(just make sure you reserve enough journal space - the rule of thumb is 
one journal per node and know how many nodes you plan to have in the 
future).

-- Wendy


From RichardW at iodynamix.com  Tue Jun 17 21:14:58 2008
From: RichardW at iodynamix.com (Richard Williams - IoDynamix)
Date: Tue, 17 Jun 2008 14:14:58 -0700
Subject: [Linux-cluster] Can clustered RHEL 5 use a SAN with different
 access rights for different nodes in the cluster?
Message-ID: <C47D7762.128A%RichardW@iodynamix.com>

Please advise and/or redirect this posting if this is not the correct forum
for my question - thanks.

A company wants to use clustered rhel5 systems as inside/outside ftp
servers. Users on the inside (LAN) cluster nodes can read and write to the
SAN, while users on the outside (DMZ) cluster can only read.

Is this application possible without GFS?

If one node in the cluster fails, can the other node be provisioned to
provide all services until recovery?

Can a SAN be used as the "single" ftp location for both services (inside FTP
& outside FTP?)

Does the customer need more than four systems (i.e. 2 inside - 2 outside) -
is a separate "command" system required?


Have Dell's m1000e & 600 series blades been certified for this operating
system?

Is there any documentation available regarding separate access rights for
multiple nodes in a cluster available?

Thanks for your constructive reply.


From pronix.service at gmail.com  Tue Jun 17 21:23:25 2008
From: pronix.service at gmail.com (pronix pronix)
Date: Wed, 18 Jun 2008 01:23:25 +0400
Subject: [Linux-cluster] Can clustered RHEL 5 use a SAN with different
	access rights for different nodes in the cluster?
In-Reply-To: <C47D7762.128A%RichardW@iodynamix.com>
References: <C47D7762.128A%RichardW@iodynamix.com>
Message-ID: <639ce0480806171423u4503665ewd7426080145309ea@mail.gmail.com>

yes , you can deploy than without gfs,but with gfs2 better
readonly access implement by anonymous (read only) users.
failover possible create - enough 2 nodes and drbd


2008/6/18 Richard Williams - IoDynamix <RichardW at iodynamix.com>:

> Please advise and/or redirect this posting if this is not the correct forum
> for my question - thanks.
>
> A company wants to use clustered rhel5 systems as inside/outside ftp
> servers. Users on the inside (LAN) cluster nodes can read and write to the
> SAN, while users on the outside (DMZ) cluster can only read.
>
> Is this application possible without GFS?
>
> If one node in the cluster fails, can the other node be provisioned to
> provide all services until recovery?
>
> Can a SAN be used as the "single" ftp location for both services (inside
> FTP
> & outside FTP?)
>
> Does the customer need more than four systems (i.e. 2 inside - 2 outside) -
> is a separate "command" system required?
>
>
> Have Dell's m1000e & 600 series blades been certified for this operating
> system?
>
> Is there any documentation available regarding separate access rights for
> multiple nodes in a cluster available?
>
> Thanks for your constructive reply.
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080618/92d21ea9/attachment.htm>

From stephenamadei at hotmail.com  Tue Jun 17 21:44:13 2008
From: stephenamadei at hotmail.com (Stephen Amadei)
Date: Tue, 17 Jun 2008 17:44:13 -0400
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <1213710433.3724.44.camel@technetium.msp.redhat.com>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
	<1213710433.3724.44.camel@technetium.msp.redhat.com>
Message-ID: <BAY110-W84E1A5C4B583AECABE37DBBA80@phx.gbl>


>On Mon, 2008-06-16 at 23:47 -0400, Stephen Amadei wrote:
>> So, everything was running well, but now after testing somewhat, when I copy files to the GFS2 partition,>>the files larger than 2.9K, but smaller than 4.3K get nulled.

>We're already aware of the problem and are working on it.
>We know kind of what's going on and which set of patches
>broke it. Basically, we rewrote some of the block allocation
>code for performance reasons and that's the bit having the problem.

>It seems to be a problem only in the upstream code; the GFS2 code
>in RHEL5.2 has the old allocation code, so it doesn't have the
>problem. So a fix shouldn't be too far off. Part of the
>problem, though, is that Steve Whitehouse is on holiday, so
>we're short-handed and any fixes we develop would have to be
>pushed to the git tree by him.

I'm kind of new to the whole Cluster/OpenAIS/GFS2-tools thing, so bear with me...

Specifically, which package is causing the problem?  Cluster?  I assume when you mean upstream code, I assume you mean the code that is part of the kernel, as, yes, I am running a pretty new kernel... 2.6.24.7-grsec.  

While I would like to wait until the problem is fixed, is there a known good version of one package (Kernel, Cluster, OpenAIS, GFS-tools, etc.) or combo that would get me up and running?  I need to have GFS2 running a week ago, and due to various problems, I am already horribly late in getting this deployed.

Thanks.

Stephen


_________________________________________________________________
The other season of giving begins 6/24/08. Check out the i?m Talkathon.
http://www.imtalkathon.com?source=TXT_EML_WLH_SeasonOfGiving


From td3201 at gmail.com  Tue Jun 17 22:22:12 2008
From: td3201 at gmail.com (Terry)
Date: Tue, 17 Jun 2008 17:22:12 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <485819DC.90503@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us> <4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
Message-ID: <8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>

On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
> Hi, Terry,
>>
>> I am still seeing some high load averages.  Here is an example of a
>> gfs configuration.  I left statfs_fast off as it would not apply to
>> one of my volumes for an unknown reason.  Not sure that would have
>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>> little:
>>
>
> Sorry I missed scand_secs (was mindless as the brain was mostly occupied by
> day time work).
>
> To simplify the view, glock states include exclusive (write), share (read),
> and not-locked (in reality, there are more). Exclusive lock has to be
> demoted (demote_secs) to share, then to not-locked (another demote_secs)
> before it is scanned (every scand_secs) to get added into reclaim list where
> it can be purged. Between exclusive and share state transition, the file
> contents need to get flushed to disk (to keep file content cluster
> coherent).  All of above assume the file (protected by this glock) is not
> accessed (idle).
>
> You hit an area that GFS normally doesn't perform well. With GFS1 in
> maintenance mode while GFS2 seems to be so far away, ext3 could be a better
> answer. However, before switching, do make sure to test it thoroughly (since
> Ext3 could have the very same issue as well - check out:
> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>
> Did you look (and test) GFS "nolock" protocol (for single node GFS)? It
> bypasses some locking overhead and can be switched to  DLM in the future
> (just make sure you reserve enough journal space - the rule of thumb is one
> journal per node and know how many nodes you plan to have in the future).
>
> -- Wendy

Good points.  I could try the nolock feature I suppose.  Not quite
clear on how to reserve journal space.  I forgot to post the cpu time,
check out this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
 4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
 4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
 3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
 4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0

gfs_glockd is further below so not so concerned with that right now.
It appears turning on nolock would do the trick.  The times aren't
extremely accurate because I have failed this cluster between nodes
while testing.


From fdinitto at redhat.com  Wed Jun 18 07:04:26 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 18 Jun 2008 09:04:26 +0200
Subject: [Linux-cluster] HA Cluster Developer Summit 2008 (phase 1):
	Collecting ideas
Message-ID: <1213772667.3498.34.camel@diapolon.int.fabbione.net>

The need for a high bandwith, extremely focused, cluster developer
summit has been growing for the past few months and it is time to try
to organize one.

We will do this in the most open and transparent way as possible, for
everybody to participate in whole process.

This email targets packagers and developers that are actively involved,
interested or plan to base their software on top of openais, pacemaker,
cman, dlm, i/o fencing and resource manager.

The very first step is to collect ideas that are worth discussing face
to face between developers (remember you can always submit ideas even
if you cannot participate at the summit directly).

When submitting ideas keep in mind those golden rules:
- it needs to be technical. This is a developer summit.
- it needs to be realistic and achievable by humans
  (developers still fall in this category).
- each item needs to be worth discussing face to face.

A preliminary list, that also underlines the basic directions that will
lead the summit, has already been collected here:

http://sources.redhat.com/cluster/wiki/ClusterSummit2008

By the end of June, the list of ideas will be "frozen" and will make
the final call to decide if there are enough topics to hold the summit
or not.

The next steps include:

- decide absolute minimal list of participants (based on ideas)
- location and dates

Please feel free to distribute this email as you believe it fits best.

Fabio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080618/61080128/attachment.sig>

From federico.simoncelli at gmail.com  Wed Jun 18 07:39:19 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Wed, 18 Jun 2008 09:39:19 +0200
Subject: [Linux-cluster] Disabling cman at boot
Message-ID: <a01fe36d0806180039h3203c6a8na414bca14f306287@mail.gmail.com>

Hi all, is there any way to prevent cman to start at boot?
Let's assume that I need to boot a fenced node and I want to skip the
cluster services to avoid possible problems (such as long waits,
fences in a two-nodes configuration, etc...). Is there any way to
accomplish this?
Do you think adding a boot parameter (eg: nocluster) could be a good
solution? We should modify the init file for cman to check the
presence of that parameter and skip the start process if present.
Any other idea? I think that booting in single mode and disabling the
service is more work than adding a temporary parameter in the grub
interface at boot.

Thanks.
-- 
Federico.


From kadlec at sunserv.kfki.hu  Wed Jun 18 08:23:55 2008
From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef)
Date: Wed, 18 Jun 2008 10:23:55 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
Message-ID: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>

On Wed, 18 Jun 2008, Federico Simoncelli wrote:

> Let's assume that I need to boot a fenced node and I want to skip the
> cluster services to avoid possible problems (such as long waits,
> fences in a two-nodes configuration, etc...). Is there any way to
> accomplish this?
> Do you think adding a boot parameter (eg: nocluster) could be a good
> solution? We should modify the init file for cman to check the
> presence of that parameter and skip the start process if present.

We use exactly the same method to specify how to boot a machine:

a. start GFS services and mount volumes (default)
b. start GFS services but do not mount GFS volumes and do not start any
   services relying on them (failsafe mode for gfs_fsck)
c. do not start GFS services at all

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary


From deka.lipika at gmail.com  Wed Jun 18 11:49:06 2008
From: deka.lipika at gmail.com (Lipika Deka)
Date: Wed, 18 Jun 2008 12:49:06 +0100
Subject: [Linux-cluster] Work load traces
Message-ID: <43097e740806180449q2c2f830cia2a893babbd169c1@mail.gmail.com>

Hi All,
     I am a PhD student working on Cluster file system and on GFS
particularly.Could anyone help me by giving me pointers as to where I can
get real work load traces of file systems and if anyone knows of any
simulator available for cluster file system analysis.
thanks...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080618/5f487d62/attachment.htm>

From rpeterso at redhat.com  Wed Jun 18 12:48:47 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Wed, 18 Jun 2008 07:48:47 -0500
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <1213710433.3724.44.camel@technetium.msp.redhat.com>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
	<1213710433.3724.44.camel@technetium.msp.redhat.com>
Message-ID: <1213793327.3724.49.camel@technetium.msp.redhat.com>

Hi,

On Tue, 2008-06-17 at 08:47 -0500, Bob Peterson wrote:
> On Mon, 2008-06-16 at 23:47 -0400, Stephen Amadei wrote:
> > So, everything was running well, but now after testing somewhat, when I copy files to the GFS2 partition, the files larger than 2.9K, but smaller than 4.3K get nulled.
> 
> We're already aware of the problem and are working on it.
> We know kind of what's going on and which set of patches
> broke it.  Basically, we rewrote some of the block allocation
> code for performance reasons and that's the bit having the problem.
> 
> It seems to be a problem only in the upstream code; the GFS2 code
> in RHEL5.2 has the old allocation code, so it doesn't have the
> problem.  So a fix shouldn't be too far off.  Part of the
> problem, though, is that Steve Whitehouse is on holiday, so
> we're short-handed and any fixes we develop would have to be
> pushed to the git tree by him.

Yesterday, Ben Marzinski posted a patch for this problem to the public
cluster-devel mailing list.  It can be found here:
https://www.redhat.com/archives/cluster-devel/2008-June/msg00114.html

It is still not available in the upstream GIT tree due to Steve
still being on vacation/holiday.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From federico.simoncelli at gmail.com  Wed Jun 18 13:55:36 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Wed, 18 Jun 2008 15:55:36 +0200
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
Message-ID: <a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>

On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
<kadlec at sunserv.kfki.hu> wrote:
> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>> Do you think adding a boot parameter (eg: nocluster) could be a good
>> solution? We should modify the init file for cman to check the
>> presence of that parameter and skip the start process if present.
>
> We use exactly the same method to specify how to boot a machine:

Do you think we can find a common solution and propose a patch upstream?
Having a boot parameter (eg: nocluster) to skip cman at boot would be
useful in your case or you really need something more specific to GFS?
Skipping cman would prevent any other cluster service to start (GFS too).

-- 
Federico.


From sunhux at gmail.com  Wed Jun 18 16:20:03 2008
From: sunhux at gmail.com (sunhux G)
Date: Thu, 19 Jun 2008 00:20:03 +0800
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
Message-ID: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>

Hi

We have SAN disk partitions presented to 2 Redhat Linux servers,  running
Oracle RAC (cluster) ASM.

As suggested by Linux "man multipath", I've made the
/var/lib/multipath/bindings the same
on both servers "to ensure That the multipath devices have the same names on
all nodes
                            accessing  them" :
# Format:
# alias wwid
mpath0 360a98000567244396334493370345055
mpath1 360a9800056724439633449336c786d69
mpath2 360a980005672443963344933706f536c
mpath3 360a9800056724439633449336c514c75
mpath4 360a980005672443963344933706b4770
mpath5 360a9800056724439633449336c4d6b36

However /proc/partition   on both servers  showed differences on the
 dm-x  devices  (but not on the /dev/sdxx devices) :
  RAC1 server :
 253     0   41947136 dm-0
 253     1   41947136 dm-1
 253     2   41947136 dm-2
 253     3    5242880 dm-3
 253     4    5242880 dm-4
 253     5   41947136 dm-5

  RAC2 server :
 253     0    5242880 dm-0
 253     1   41947136 dm-1
 253     2   41947136 dm-2
 253     3    5242880 dm-3
 253     4   41947136 dm-4
 253     5   41947136 dm-5


Appendix A below is the output of "multipath -ll" outputs from both servers
:


Q1 :
Do mpath0, mpath1, ...   shown in "multipath -ll" listing refer to
  /dev/mapper/mpathx  or  to  /dev/mpath/mpathx ?

Q2 :
How do we make it such that the dm-x  devices are accessing the same
SAN LUNs across the servers?  I believe they are not the same based
on the observations of the disk spaces associated with each of the dm-x
shown in /proc/partition


What's described above so far is to help relate to the main issue below
(Oracle ASM on both servers seem to be Not accessing the same underlying
  physical storage when trying to access the same ASM diskgroup:

On Server 1 :
==========
[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm created
isk land4 /dev/dm-0
Marking disk "/dev/dm-0" as an ASM disk:   OK  ]

[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm listdisks
LAND1
LAND2
LAND4
[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm querydisk land4
Disk "LAND4" is a valid ASM disk on device [253, 0]On Server 1 :

[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm created
ddisk land4 /dev/dm-0
Marking disk "/dev/dm-0" as an ASM disk:   OK  ]

[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm listdisks
LAND1
LAND2
LAND4
[root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm querydisk land4
Disk "LAND4" is a valid ASM disk on device [253, 0]   <<== note this number

Server 2:
=======
[root at landnet-rac2-temp dev]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks:   OK  ]
[root at landnet-rac2-temp dev]# /etc/init.d/oracleasm listdisks
LAND1
LAND2
LAND4
[root at landnet-rac2-temp dev]# /etc/init.d/oracleasm querydisk land4
Disk "LAND4" is a valid ASM disk on device [8, 32] <<== note the number


We logged a call to Oracle who responded :
If we are using device mapper and ASMLib then, we need to use disks from
/dev/dm-* disks
instead of disks from  /dev/mapper/mpath*

but Redhat Support told us we should use /dev/mapper/mpath*


Appreciate if you can give me step by step instruction on how to untangle
this
whole issue.  Basically we want the Oracle RAC ASM diskgroup LAN4 (& other
diskgroups which we'll be creating) to be accessing the same underlying
storage.

I'm inclined to think this has to do with multipathing setup than Oracle ASM
setup
or SAN setup.

One last piece of information (which may be relevant) :
On server 2, the dm-x mappings (shown below) differs from that on server 1:
[root at landnet-rac2-temp dev]# ls -ld dm-*
brw-r-----  1 root root 253, 0 Jun 18 17:49 dm-0
brw-r-----  1 root root 253, 1 Jun 18 17:49 dm-1
brw-r-----  1 root root 253, 2 Jun 18 17:49 dm-2
brw-r-----  1 root root 253, 3 Jun 18 17:49 dm-3
brw-r-----  1 root root 253, 4 Jun 18 17:49 dm-4
brw-r-----  1 root root 253, 5 Jun 18 17:49 dm-5


Appendix A ("multipath -ll" output from the servers) :
========================

On RAC1 server :

[root at landnet-rac1-temp mpath]# multipath -ll
mpath2 (360a980005672443963344933706f536c)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:3 sdq 65:0   [active][ready]
 \_ 8:0:4:3 sdw 65:96  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:3 sde 8:64   [active][ready]
 \_ 8:0:1:3 sdk 8:160  [active][ready]

mpath1 (360a9800056724439633449336c786d69)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:4 sdr 65:16  [active][ready]
 \_ 8:0:4:4 sdx 65:112 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:4 sdf 8:80   [active][ready]
 \_ 8:0:1:4 sdl 8:176  [active][ready]

mpath0 (360a98000567244396334493370345055)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:5 sds 65:32  [active][ready]
 \_ 8:0:4:5 sdy 65:128 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:5 sdg 8:96   [active][ready]
 \_ 8:0:1:5 sdm 8:192  [active][ready]

mpath5 (360a9800056724439633449336c4d6b36)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:1 sdo 8:224  [active][ready]
 \_ 8:0:4:1 sdu 65:64  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:1 sdc 8:32   [active][ready]
 \_ 8:0:1:1 sdi 8:128  [active][ready]

mpath4 (360a980005672443963344933706b4770)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:2 sdp 8:240  [active][ready]
 \_ 8:0:4:2 sdv 65:80  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:2 sdd 8:48   [active][ready]
 \_ 8:0:1:2 sdj 8:144  [active][ready]

mpath3 (360a9800056724439633449336c514c75)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:0 sdn 8:208  [active][ready]
 \_ 8:0:4:0 sdt 65:48  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:1:0 sdh 8:112  [active][ready]
 \_ 8:0:0:0 sdb 8:16   [active][ready]


=================

On RAC 2 server :

[root at landnet-rac2-temp ~]# multipath -ll
mpath2 (360a980005672443963344933706f536c)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:3 sdq 65:0   [active][ready]
 \_ 8:0:3:3 sdw 65:96  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:3 sde 8:64   [active][ready]
 \_ 8:0:1:3 sdk 8:160  [active][ready]

mpath1 (360a9800056724439633449336c786d69)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:4 sdr 65:16  [active][ready]
 \_ 8:0:3:4 sdx 65:112 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:4 sdf 8:80   [active][ready]
 \_ 8:0:1:4 sdl 8:176  [active][ready]

mpath0 (360a98000567244396334493370345055)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:5 sds 65:32  [active][ready]
 \_ 8:0:3:5 sdy 65:128 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:5 sdg 8:96   [active][ready]
 \_ 8:0:1:5 sdm 8:192  [active][ready]

mpath5 (360a9800056724439633449336c4d6b36)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:3:1 sdu 65:64  [active][ready]
 \_ 8:0:2:1 sdo 8:224  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:1 sdc 8:32   [active][ready]
 \_ 8:0:1:1 sdi 8:128  [active][ready]

mpath4 (360a980005672443963344933706b4770)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:2 sdp 8:240  [active][ready]
 \_ 8:0:3:2 sdv 65:80  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:1:2 sdj 8:144  [active][ready]
 \_ 8:0:0:2 sdd 8:48   [active][ready]

mpath3 (360a9800056724439633449336c514c75)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:0 sdn 8:208  [active][ready]
 \_ 8:0:3:0 sdt 65:48  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:0 sdb 8:16   [active][ready]
 \_ 8:0:1:0 sdh 8:112  [active][ready]


Appendix B (rawdevices file)

# This file and interface are deprecated.
# Applications needing raw device access should open regular
# block devices with O_DIRECT.
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/mpath/mpath0
/dev/mpath/mpath1
/dev/mpath/mpath2
/dev/mpath/mpath3
/dev/mpath/mpath4
/dev/mpath/mpath5


Appendix C (/etc/multipath.conf)

========================

Server 1:

# This is an example configuration file for device mapper multipath.
# For a complete list of the default configuration values, see
# /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.defaults
# For a list of configuration options with descriptions, see
# /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.annotated


# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
#devnode_blacklist {
#        devnode "*"
#}

## Use user friendly names, instead of using WWIDs as names.
defaults {
      user_friendly_names yes
}


## By default, devices with vendor = "IBM" and product = "S/390.*" are
## blacklisted. To enable mulitpathing on these devies, uncomment the
## following lines.
#devices {
#     device {
#           vendor                  "IBM"
#           product                 "S/390 DASD ECKD"
#           path_grouping_policy    multibus
#           getuid_callout          "/sbin/dasdview -j /dev/%n"
#           path_checker            directio
#     }
#}


##
## This is a template multipath-tools configuration file
## Uncomment the lines relevent to your environment
##
#defaults {
#      udev_dir          /dev
#      polling_interval       10
#      selector          "round-robin 0"
#      path_grouping_policy      multibus
#      getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#      prio_callout            /bin/true
#      path_checker            readsector0
#      rr_min_io         100
#      rr_weight         priorities
#      failback          immediate
#      no_path_retry           fail
#      user_friendly_name      yes
#}
##
## The wwid line in the following blacklist section is shown as an example
## of how to blacklist devices by wwid.  The 3 devnode lines are the
## compiled in default blacklist. If you want to blacklist entire types
## of devices, such as all scsi devices, you should use a devnode line.
## However, if you want to blacklist specific devices, you should use
## a wwid line.  Since there is no guarantee that a specific device will
## not change names on reboot (from /dev/sda to /dev/sdb for example)
## devnode lines are not recommended for blacklisting specific devices.
##
## insert the sda, sdb or whatever local disks below so tt they're not
multipathed
devnode_blacklist {
#       wwid 26353900f02796769
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda|sdz)[0-9]*"
      devnode "^hd[a-z]"
      devnode "^cciss!c[0-9]d[0-9]*"
}
#multipaths {
#      multipath {
#           wwid              3600508b4000156d700012000000b0000
#           alias             yellow
#           path_grouping_policy      multibus
#           path_checker            readsector0
#           path_selector           "round-robin 0"
#           failback          manual
#           rr_weight         priorities
#           no_path_retry           5
#     }
#      multipath {
#           wwid              1DEC_____321816758474
#           alias             red
#     }
#}
devices {
#     device {
#           vendor                  "COMPAQ  "
#           product                 "HSV110 (C)COMPAQ"
#           path_grouping_policy      multibus
#           getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#           path_checker            readsector0
#           path_selector           "round-robin 0"
#           hardware_handler      "0"
#           failback          15
#           rr_weight         priorities
#           no_path_retry           queue
#     }
#     device {
#           vendor                  "COMPAQ  "
#           product                 "MSA1000         "
#           path_grouping_policy      multibus
#     }
#
       device {
               vendor                  "NETAPP"
               product                 "LUN"
               path_grouping_policy    group_by_prio
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               prio_callout            "/opt/netapp/santools/mpath_prio_ontap
/dev/%n"
               features                "1 queue_if_no_path"
               path_checker            readsector0
            failback          immediate
       }
}


Appendix D ("fdisk -l" output)

Server 1:

Disk /dev/sda: 146.6 GB, 146685296640 bytes
255 heads, 63 sectors/track, 17833 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       13067   104856255   83  Linux
/dev/sda3           13068       15162    16828087+  82  Linux swap
/dev/sda4           15163       17833    21454807+   5  Extended
/dev/sda5           15163       16990    14683378+  83  Linux

Disk /dev/sdb: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Disk /dev/sdc: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Disk /dev/sdd: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sde: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdf: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sdg: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1        1018     5238597    5  Extended

Disk /dev/sdh: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdi: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdj: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdk: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdl: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sdm: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1        1018     5238597    5  Extended

Disk /dev/sdn: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdo: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdp: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdq: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdr: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sds: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sds1               1        1018     5238597    5  Extended

Disk /dev/sdt: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdu: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdv: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdw: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdx: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sdy: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdy1               1        1018     5238597    5  Extended

Disk /dev/dm-0: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-1: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-2: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-3: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/dm-4: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

     Device Boot      Start         End      Blocks   Id  System
/dev/dm-4p1               1        1018     5238597    5  Extended

Disk /dev/dm-5: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Server 2's "fdisk -l"

Disk /dev/sda: 146.6 GB, 146685296640 bytes
255 heads, 63 sectors/track, 17833 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       13067   104856255   83  Linux
/dev/sda3           13068       15162    16828087+  82  Linux swap
/dev/sda4           15163       17833    21454807+   5  Extended
/dev/sda5           15163       16990    14683378+  83  Linux

Disk /dev/sdb: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdc: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Disk /dev/sdd: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Disk /dev/sde: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Disk /dev/sdf: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes


Disk /dev/sdg: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1        1018     5238597    5  Extended

Disk /dev/sdh: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdi: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdj: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdk: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdl: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sdm: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1        1018     5238597    5  Extended

Disk /dev/sdn: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdo: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdp: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdq: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdr: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sds: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sds1               1        1018     5238597    5  Extended

Disk /dev/sdt: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdu: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdv: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdw: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdx: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/sdy: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdy1               1        1018     5238597    5  Extended

Disk /dev/dm-0: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

Disk /dev/dm-1: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-2: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-3: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes

     Device Boot      Start         End      Blocks   Id  System
/dev/dm-3p1               1        1018     5238597    5  Extended

Disk /dev/dm-4: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/dm-5: 42.9 GB, 42953867264 bytes
64 heads, 32 sectors/track, 40964 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes


Server 2 (/etc/multipath.conf)

========================

## Use user friendly names, instead of using WWIDs as names.
defaults {
      user_friendly_names yes
}


## By default, devices with vendor = "IBM" and product = "S/390.*" are
## blacklisted. To enable mulitpathing on these devies, uncomment the
## following lines.
#devices {
#     device {
#           vendor                  "IBM"
#           product                 "S/390 DASD ECKD"
#           path_grouping_policy    multibus
#           getuid_callout          "/sbin/dasdview -j /dev/%n"
#           path_checker            directio
#     }
#}


##
## This is a template multipath-tools configuration file
## Uncomment the lines relevent to your environment
##
#defaults {
#      udev_dir          /dev
#      polling_interval       10
#      selector          "round-robin 0"
#      path_grouping_policy      multibus
#      getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#      prio_callout            /bin/true
#      path_checker            readsector0
#      rr_min_io         100
#      rr_weight         priorities
#      failback          immediate
#      no_path_retry           fail
#      user_friendly_name      yes
#}
##
## The wwid line in the following blacklist section is shown as an example
## of how to blacklist devices by wwid.  The 3 devnode lines are the
## compiled in default blacklist. If you want to blacklist entire types
## of devices, such as all scsi devices, you should use a devnode line.
## However, if you want to blacklist specific devices, you should use
## a wwid line.  Since there is no guarantee that a specific device will
## not change names on reboot (from /dev/sda to /dev/sdb for example)
## devnode lines are not recommended for blacklisting specific devices.
##
## insert the sda, sdb or whatever local disks below so tt they're not
multipathed
devnode_blacklist {
#       wwid 26353900f02796769
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda|sdz)[0-9]*"
      devnode "^hd[a-z]"
      devnode "^cciss!c[0-9]d[0-9]*"
}
#multipaths {
#      multipath {
#           wwid              3600508b4000156d700012000000b0000
#           alias             yellow
#           path_grouping_policy      multibus
#           path_checker            readsector0
#           path_selector           "round-robin 0"
#           failback          manual
#           rr_weight         priorities
#           no_path_retry           5
#     }
#      multipath {
#           wwid              1DEC_____321816758474
#           alias             red
#     }
#}
devices {
#     device {
#           vendor                  "COMPAQ  "
#           product                 "HSV110 (C)COMPAQ"
#           path_grouping_policy      multibus
#           getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#           path_checker            readsector0
#           path_selector           "round-robin 0"
#           hardware_handler      "0"
#           failback          15
#           rr_weight         priorities
#           no_path_retry           queue
#     }
#     device {
#           vendor                  "COMPAQ  "
#           product                 "MSA1000         "
#           path_grouping_policy      multibus
#     }
#
       device {
               vendor                  "NETAPP"
               product                 "LUN"
               path_grouping_policy    group_by_prio
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               prio_callout
"/opt/netapp/santools/mpath_prio_ontap /dev/%n"
               features                "1 queue_if_no_path"
               path_checker            readsector0
            failback          immediate
       }
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080619/69107d58/attachment.htm>

From td3201 at gmail.com  Wed Jun 18 17:48:07 2008
From: td3201 at gmail.com (Terry)
Date: Wed, 18 Jun 2008 12:48:07 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us> <4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
Message-ID: <8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>

On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
>> Hi, Terry,
>>>
>>> I am still seeing some high load averages.  Here is an example of a
>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>> one of my volumes for an unknown reason.  Not sure that would have
>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>> little:
>>>
>>
>> Sorry I missed scand_secs (was mindless as the brain was mostly occupied by
>> day time work).
>>
>> To simplify the view, glock states include exclusive (write), share (read),
>> and not-locked (in reality, there are more). Exclusive lock has to be
>> demoted (demote_secs) to share, then to not-locked (another demote_secs)
>> before it is scanned (every scand_secs) to get added into reclaim list where
>> it can be purged. Between exclusive and share state transition, the file
>> contents need to get flushed to disk (to keep file content cluster
>> coherent).  All of above assume the file (protected by this glock) is not
>> accessed (idle).
>>
>> You hit an area that GFS normally doesn't perform well. With GFS1 in
>> maintenance mode while GFS2 seems to be so far away, ext3 could be a better
>> answer. However, before switching, do make sure to test it thoroughly (since
>> Ext3 could have the very same issue as well - check out:
>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>
>> Did you look (and test) GFS "nolock" protocol (for single node GFS)? It
>> bypasses some locking overhead and can be switched to  DLM in the future
>> (just make sure you reserve enough journal space - the rule of thumb is one
>> journal per node and know how many nodes you plan to have in the future).
>>
>> -- Wendy
>
> Good points.  I could try the nolock feature I suppose.  Not quite
> clear on how to reserve journal space.  I forgot to post the cpu time,
> check out this:
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>
> gfs_glockd is further below so not so concerned with that right now.
> It appears turning on nolock would do the trick.  The times aren't
> extremely accurate because I have failed this cluster between nodes
> while testing.
>

Here is some more testing information....

I created a new volume on my iscsi san of 1 TB and formatted it for
ext3. I then used dd to create a 100G file.  This yielded roughly 900
Mb/sec.  I then stopped my application and did the same thing with an
existing GFS volume.  This gave me about 850 Kb/sec.  This isn't an
iscsi issue.  This appears to be a load issue and the number of I/O
occurring on these volumes.  That said, I would expect that performing
the changes I did would result in a major performance improvement.
Since it didn't, what are my other points I could consider?   If its a
GFS issue, ext3 is the way to go.  Maybe even switch to using
active-active on my NFS cluster.   If its a backend disk issue, I
would expect to see the throughput on my iscsi link (bond1) be fully
utilized.  Its not.  Could I be thrashing the disks?  This is an iscsi
san with 30 sata disks.  Just bouncing some thoughts around to see if
anyone has any more thoughts.

Thanks!


From Joel.Becker at oracle.com  Wed Jun 18 19:28:32 2008
From: Joel.Becker at oracle.com (Joel Becker)
Date: Wed, 18 Jun 2008 12:28:32 -0700
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
Message-ID: <20080618192832.GB16780@ca-server1.us.oracle.com>

On Thu, Jun 19, 2008 at 12:20:03AM +0800, sunhux G wrote:
> We have SAN disk partitions presented to 2 Redhat Linux servers,  running
> Oracle RAC (cluster) ASM.

Hey there,
	First off, you're using ASMLib, so you don't need to worry about
device naming - ASMLib makes sure you are accessing the same device.
But read on for furthur discussion.

> As suggested by Linux "man multipath", I've made the
> /var/lib/multipath/bindings the same
> on both servers "to ensure That the multipath devices have the same names on
> all nodes
>                             accessing  them" :
> # Format:
> # alias wwid
> mpath0 360a98000567244396334493370345055

	This works as you expect.  The names in
/dev/(mapper|mpath)/mpathX should be identical on each server.
 
> However /proc/partition   on both servers  showed differences on the
>  dm-x  devices  (but not on the /dev/sdxx devices) :

	The dm-X devices are not expected to be the same.  You see, the
system creates the dm-X devices in the order they are seen - this can be
different.  After they are created, the udev subsystem uses the
information in /var/lib/multipath/bindings to map the mpathX name to the
appropriate dm-X device.  This ensures that the mpathX names are
identical on each system.

> Q1 :
> Do mpath0, mpath1, ...   shown in "multipath -ll" listing refer to
>   /dev/mapper/mpathx  or  to  /dev/mpath/mpathx ?

	If there are names /dev/mapper/mpathx and /dev/mpath/mpathx,
they should be refer to the same thing.

> Q2 :
> How do we make it such that the dm-x  devices are accessing the same
> SAN LUNs across the servers?  I believe they are not the same based
> on the observations of the disk spaces associated with each of the dm-x
> shown in /proc/partition

	You don't.  You don't care.  If you want a /dev name, you use
the mpathX name that you know maps correctly.  Since you are using
ASMLib, you don't worry about that either.  The LANDx names are read by
ASMLib to ensure ASM sees the correct devices no matter what /dev name
they have.

> [root at landnet-rac1-temp mpath]# /etc/init.d/oracleasm querydisk land4
> Disk "LAND4" is a valid ASM disk on device [253, 0]On Server 1 :

	That maps to dm-0.

> [root at landnet-rac2-temp dev]# /etc/init.d/oracleasm scandisks
> Scanning system for ASM disks:   OK  ]
> [root at landnet-rac2-temp dev]# /etc/init.d/oracleasm querydisk land4
> Disk "LAND4" is a valid ASM disk on device [8, 32] <<== note the number

	And that is /dev/sdc.  Your problem is SCANORDER.  ASMLib's disk
scan is seeing the scsi device (one of the paths) before the multipath
device.  This is actually safe, it just doesn't take advantage of the
multipath.  You want to use the multipath of course, so you want to
configure it correctly.
	See the instructions at
http://www.oracle.com/technology/tech/linux/asmlib/multipath.html to
configure ASMLib to see dm devices before scsi devices.  Then your
scandisks will always see the multipath devices.

> We logged a call to Oracle who responded :
> If we are using device mapper and ASMLib then, we need to use disks from
> /dev/dm-* disks
> instead of disks from  /dev/mapper/mpath*

	That's not the issue, and I'm sorry Oracle support gave you the
wrong answer.  You can createdisk against any name that's correct
(/dev/dm-X, /dev/mapper/mpathX, /dev/mpath/mpathX).  scandisks doesn't
even need a name - it just needs the correct order in your case.

Joel

-- 

"War doesn't determine who's right; war determines who's left."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


From Joel.Becker at oracle.com  Wed Jun 18 19:29:29 2008
From: Joel.Becker at oracle.com (Joel Becker)
Date: Wed, 18 Jun 2008 12:29:29 -0700
Subject: [Linux-cluster] linux-cluster@redhat.com has a misconfigured
	Reply-To:
Message-ID: <20080618192929.GC16780@ca-server1.us.oracle.com>

Please remove the Reply-To: munging.  I tried to group reply, and of
course it failed.

http://www.unicom.com/pw/reply-to-harmful.html

Joel

-- 

"In the room the women come and go
 Talking of Michaelangelo."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


From breeves at redhat.com  Wed Jun 18 23:43:55 2008
From: breeves at redhat.com (Bryn M. Reeves)
Date: Thu, 19 Jun 2008 00:43:55 +0100
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <20080618192832.GB16780@ca-server1.us.oracle.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
	<20080618192832.GB16780@ca-server1.us.oracle.com>
Message-ID: <48599DBB.5040303@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joel Becker wrote:
> 	This works as you expect.  The names in
> /dev/(mapper|mpath)/mpathX should be identical on each server.

It's often best to avoid the symlinks in /dev/mpath. Depending on your
version of udev they may get seriously out of sync with device creation,
meaning that the symlinks might not exist at the point you try to use
them. This is especially a problem with older udevs and for e.g.
mounting file systems using /dev/mpath/mpathN entries in fstab.

> different.  After they are created, the udev subsystem uses the
> information in /var/lib/multipath/bindings to map the mpathX name to the
> appropriate dm-X device.  This ensures that the mpathX names are
> identical on each system.

multipath/multipathd manage the mpathX names directly (creating them via
libdevmapper). It's udev that adds the symlinks (possible at some later
time).

>> Q2 :
>> How do we make it such that the dm-x  devices are accessing the same
>> SAN LUNs across the servers?  I believe they are not the same based
>> on the observations of the disk spaces associated with each of the dm-x
>> shown in /proc/partition
> 
> 	You don't.  You don't care.  If you want a /dev name, you use
> the mpathX name that you know maps correctly.  Since you are using
> ASMLib, you don't worry about that either.  The LANDx names are read by
> ASMLib to ensure ASM sees the correct devices no matter what /dev name
> they have.

It is important here to make sure that the aliases are synchronised. You
can do this by syncing the bindings file between the hosts or by using
explicit alias { /*...*/ } entries in multipath.conf.

Watch out if you have /var as a separate file system though - you can
get situations where mpathN names change unpredictably as the system
boots up (initial multipath run happens before /var is mounted, causing
multipath to think there is no bindings file. Later when /var is mounted
multipath or multipathd will try to change the mappings to honour the
bindings file causing devices to flip around). This can lead to data
loss/corruption in the the administrator cannot be completely sure the
device he/she thinks mpathN corresponds to really maps to the intended
device.

Recent versions of multipath-tools have a "bindings_file" option that
allows you to specify an alternate location to avoid this problem (e.g.
/etc/multipath-bindings).

>> We logged a call to Oracle who responded :
>> If we are using device mapper and ASMLib then, we need to use disks from
>> /dev/dm-* disks
>> instead of disks from  /dev/mapper/mpath*
> 
> 	That's not the issue, and I'm sorry Oracle support gave you the
> wrong answer.  You can createdisk against any name that's correct
> (/dev/dm-X, /dev/mapper/mpathX, /dev/mpath/mpathX).  scandisks doesn't
> even need a name - it just needs the correct order in your case.

Yeah, that's unfortunate. The dm-N nodes are really internal
device-mapper names and the general advice is to always prefer the
/dev/mapper names. Recent distributions tend to have udev rules that
ignore these devices & avoid creating the /dev/dm-N nodes completely.

Regards,
Bryn.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkhZnbsACgkQ6YSQoMYUY94fhwCgkrZPrT63kU4CDeipGdsscNbP
casAn2ZYi08ZCycIkQAe+5DVvDlcxrFv
=yIMT
-----END PGP SIGNATURE-----


From stephenamadei at hotmail.com  Thu Jun 19 00:59:06 2008
From: stephenamadei at hotmail.com (Stephen Amadei)
Date: Wed, 18 Jun 2008 20:59:06 -0400
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <1213793327.3724.49.camel@technetium.msp.redhat.com>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
	<1213710433.3724.44.camel@technetium.msp.redhat.com> 
	<1213793327.3724.49.camel@technetium.msp.redhat.com>
Message-ID: <BAY110-W51C3CC518069679F7FEF93BBAA0@phx.gbl>


>Yesterday, Ben Marzinski posted a patch for this problem to the public
>cluster-devel mailing list. It can be found here:
>https://www.redhat.com/archives/cluster-devel/2008-June/msg00114.html

>It is still not available in the upstream GIT tree due to Steve
>still being on vacation/holiday.

Thank you for the pointer.

Unfortunately, I can't seem to find this code in any of my source trees.  I looking my 2.6.24.7 and 2.6.25.7 kernels and all of the cluster code, but I have no 'zero_metapath_length' function to patch.

I am assuming from the directory tree (fs/gfs2/bmap.c) that this is a kernel patch.  Is there a specific kernel?

Thanks in advance.

Stephen
_________________________________________________________________
Introducing Live Search cashback .  It's search that pays you back!
http://search.live.com/cashback/?&pkw=form=MIJAAF/publ=HMTGL/crea=introsrchcashback


From Joel.Becker at oracle.com  Thu Jun 19 01:48:02 2008
From: Joel.Becker at oracle.com (Joel Becker)
Date: Wed, 18 Jun 2008 18:48:02 -0700
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <48599DBB.5040303@redhat.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
	<20080618192832.GB16780@ca-server1.us.oracle.com>
	<48599DBB.5040303@redhat.com>
Message-ID: <20080619014802.GA27364@ca-server1.us.oracle.com>

On Thu, Jun 19, 2008 at 12:43:55AM +0100, Bryn M. Reeves wrote:
> Joel Becker wrote:
> > 	This works as you expect.  The names in
> > /dev/(mapper|mpath)/mpathX should be identical on each server.
> 
> It's often best to avoid the symlinks in /dev/mpath. Depending on your
> version of udev they may get seriously out of sync with device creation,

	Oh, that stinks.  Not having dealt with them much, I was unaware
of that.
 
> multipath/multipathd manage the mpathX names directly (creating them via
> libdevmapper). It's udev that adds the symlinks (possible at some later
> time).

	Thanks for the correction.

> > 	You don't.  You don't care.  If you want a /dev name, you use
> > the mpathX name that you know maps correctly.  Since you are using
> > ASMLib, you don't worry about that either.  The LANDx names are read by
> > ASMLib to ensure ASM sees the correct devices no matter what /dev name
> > they have.
> 
> It is important here to make sure that the aliases are synchronised. You
> can do this by syncing the bindings file between the hosts or by using
> explicit alias { /*...*/ } entries in multipath.conf.

	I assume you're talking about synchronizing the mpathX aliases.
the LANDx names were selected by the user via ASMLib, and not part of
the usual multipathd/udev/dm stuff.

> > 	That's not the issue, and I'm sorry Oracle support gave you the
> > wrong answer.  You can createdisk against any name that's correct
> > (/dev/dm-X, /dev/mapper/mpathX, /dev/mpath/mpathX).  scandisks doesn't
> > even need a name - it just needs the correct order in your case.
> 
> Yeah, that's unfortunate. The dm-N nodes are really internal
> device-mapper names and the general advice is to always prefer the
> /dev/mapper names. Recent distributions tend to have udev rules that
> ignore these devices & avoid creating the /dev/dm-N nodes completely.

	I wish they would leave the /dev/dm-N nodes alone.
/dev/mapper/* doesn't show up in /proc/partitions, and thus is
impossible to scan simply for.  A future ASMLib toolset will have new
scan code that walks /sys/block and can handle devices that don't show
up in /proc/partitions, but current releases need /dev/dm-X nodes.  When
people run into that, I have them uncomment that rule in udev.

Joel

-- 

"Anything that is too stupid to be spoken is sung."  
        - Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


From fdinitto at redhat.com  Thu Jun 19 07:35:40 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 19 Jun 2008 09:35:40 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0806190933400.5892@trider-g7>

On Wed, 18 Jun 2008, Federico Simoncelli wrote:

> On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
> <kadlec at sunserv.kfki.hu> wrote:
>> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>>> Do you think adding a boot parameter (eg: nocluster) could be a good
>>> solution? We should modify the init file for cman to check the
>>> presence of that parameter and skip the start process if present.
>>
>> We use exactly the same method to specify how to boot a machine:
>
> Do you think we can find a common solution and propose a patch upstream?

I am open to add patches.

> Having a boot parameter (eg: nocluster) to skip cman at boot would be
> useful in your case or you really need something more specific to GFS?
> Skipping cman would prevent any other cluster service to start (GFS too).

GFS can also run in lock_nolock and that does not require cman.

I think you want to make sure of what you really want before 
enabling/disabling cman.

Fabio

--
I'm going to make him an offer he can't refuse.


From breeves at redhat.com  Thu Jun 19 09:00:08 2008
From: breeves at redhat.com (Bryn M. Reeves)
Date: Thu, 19 Jun 2008 10:00:08 +0100
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <20080619014802.GA27364@ca-server1.us.oracle.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>	<20080618192832.GB16780@ca-server1.us.oracle.com>	<48599DBB.5040303@redhat.com>
	<20080619014802.GA27364@ca-server1.us.oracle.com>
Message-ID: <485A2018.3090803@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joel Becker wrote:
> On Thu, Jun 19, 2008 at 12:43:55AM +0100, Bryn M. Reeves wrote:
>> It is important here to make sure that the aliases are synchronised. You
>> can do this by syncing the bindings file between the hosts or by using
>> explicit alias { /*...*/ } entries in multipath.conf.
> 
> 	I assume you're talking about synchronizing the mpathX aliases.
> the LANDx names were selected by the user via ASMLib, and not part of
> the usual multipathd/udev/dm stuff.

Yes - just the multipath aliases, either in the bindings file or
multipath.conf. This applies any time you want to have mpath0 or
whatever be the same device across a set of machines that all see the
same storage. If ASM uses labels then it might not actually matter that
the aliases differ across nodes but administrators find it less confusing.

> 	I wish they would leave the /dev/dm-N nodes alone.
> /dev/mapper/* doesn't show up in /proc/partitions, and thus is
> impossible to scan simply for.  A future ASMLib toolset will have new
> scan code that walks /sys/block and can handle devices that don't show
> up in /proc/partitions, but current releases need /dev/dm-X nodes.  When
> people run into that, I have them uncomment that rule in udev.

Hmm, interesting - hadn't considered this case. Most of the tools that I
work with do use /sys/block for scanning purposes but I do know of code
out there that does rely on the content of partitions and you're right
of course that this won't work if the /dev/dm-N entries aren't there.

Regards,
Bryn.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkhaIBgACgkQ6YSQoMYUY97OzgCeMMaU01g83V23UpHsBd8yD1k8
sY0AnRcTD+fLZyueLXmCCGq0H4m1yqVC
=Ow6f
-----END PGP SIGNATURE-----


From kadlec at sunserv.kfki.hu  Thu Jun 19 09:34:57 2008
From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef)
Date: Thu, 19 Jun 2008 11:34:57 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806190933400.5892@trider-g7>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
Message-ID: <Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>

On Thu, 19 Jun 2008, Fabio M. Di Nitto wrote:

> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
> 
> > On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
> > <kadlec at sunserv.kfki.hu> wrote:
> > > On Wed, 18 Jun 2008, Federico Simoncelli wrote:
> > > > Do you think adding a boot parameter (eg: nocluster) could be a good
> > > > solution? We should modify the init file for cman to check the
> > > > presence of that parameter and skip the start process if present.
> > >
> > > We use exactly the same method to specify how to boot a machine:
> >
> > Do you think we can find a common solution and propose a patch upstream?
> 
> I am open to add patches.
>
> > Having a boot parameter (eg: nocluster) to skip cman at boot would be
> > useful in your case or you really need something more specific to GFS?
> > Skipping cman would prevent any other cluster service to start (GFS too).
> 
> GFS can also run in lock_nolock and that does not require cman.

But that is not an option in a in-production GFS cluster, I believe.
 
> I think you want to make sure of what you really want before
> enabling/disabling cman.

What we implemented serves multiple purposes:

a. disable the whole GFS cluster suite (and every service relying on GFS)
   for the next reboot of a host as a planned maintenance mode
b. disable the mounting of GFS volumes (and every services relying on GFS
   volumes) for the next reboot of a host as a planned maintenance mode
c. boot a host directly in a) mode in case of emergency
d. boot a host directly in b) mode in case of emergency

To underline the difference and importancy of a-c and b-d modes, just a 
few examples: 

- due to the hardware upgrade of the shared block devices (in our case 
  Coraid AoE), we applied a) to shut down the whole cluster, upgrade the 
  hardware, test AoE access and update manually /etc/cluster/cluster.conf 
  with the new fencing parameters
- once we were hit by a firmware bug in the shared devices and by 
  switching to b-d) we could run gfs_fsck on all volumes easily
- all modes were used at testing services, fixing boot/shutdown/reboot 
  init scripts on test nodes (we run GFS on top of Ubuntu).

As we rewrote the cman and gfs init scripts due to the differences between 
RedHat and Debian/Ubuntu, I can't really send patches. But the changes in 
question are actually minimal: both the cman and gfs init script starts 
with the added lines

#
# Skip GFS if asked at boot time
#
[ "$1" = "start" -a -e /etc/cluster/skip_gfs ] && exit 0

and there is just one more line added to the gfs init script, before 
mounting the volumes

                [ -e /etc/cluster/skip_gfs_mount ] && exit 0
                echo -n "Mounting GFS filesystems: "
                mount -a -t gfs

The files /etc/cluster/skip_gfs and /etc/cluster/skip_gfs_mount are 
created by the attached init script called 'cluster_maintenance' if 
it's called manually by the proper arguments or at boot time when it 
detects the corresponding boot arguments (started before cman by init). 
The script can run additional scripts to disable/enable services which 
rely on GFS.

I hope it helps.

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary
-------------- next part --------------
#!/bin/sh

case "$1" in
    start)
        if [ -n "`grep SKIP_GFS /proc/cmdline`" ]; then
        	touch /etc/cluster/skip_gfs
        fi
        if [ -n "`grep SKIP_GFS_MOUNT /proc/cmdline`" ]; then
        	touch /etc/cluster/skip_gfs_mount
        fi
    	if [ -e /etc/cluster/skip_gfs -o -e /etc/cluster/skip_gfs_mount ]; then
    		shopt -s nullglob
		for x in /etc/cluster/disable_services/*; do
	    		$x disable
		done
		# ssh
		cat > /etc/nologin <<TXT
System is under maintenance, sorry.
TXT
	fi
	;;
    skip_gfs)
    	touch /etc/cluster/skip_gfs
    	;;
    skip_gfs_mount)
    	touch /etc/cluster/skip_gfs_mount
    	;;
    disable)
        rm -f /etc/cluster/skip_gfs /etc/cluster/skip_gfs_mount
        if [ -z "`pgrep fenced`" ]; then
        	/etc/init.d/cman start
        fi
        if [ -z "`pgrep clvmd`" ]; then
        	/etc/init.d/gfs start
        fi
    	shopt -s nullglob
	for x in /etc/cluster/disable_services/*; do
	    $x enable
	    y=${x##*/}
	    /etc/init.d/$y start
	done
	# ssh
	rm -f /etc/nologin
	;;
    stop_services)
	shopt -s nullglob
	for x in /etc/cluster/disable_services/*; do
		y=${x##*/}
		/etc/init.d/$y stop
	done
	;;
    stop)
	;;
   *)
   	echo "Usage: $0 {start|stop_services|disable|gfs|gfs_mount}"
   	;;
esac

From fdinitto at redhat.com  Thu Jun 19 12:24:27 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 19 Jun 2008 14:24:27 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
Message-ID: <Pine.LNX.4.64.0806191418010.5892@trider-g7>

On Thu, 19 Jun 2008, Kadlecsik Jozsef wrote:

> On Thu, 19 Jun 2008, Fabio M. Di Nitto wrote:
>
>> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>>
>>> On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
>>> <kadlec at sunserv.kfki.hu> wrote:
>>>> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>>>>> Do you think adding a boot parameter (eg: nocluster) could be a good
>>>>> solution? We should modify the init file for cman to check the
>>>>> presence of that parameter and skip the start process if present.
>>>>
>>>> We use exactly the same method to specify how to boot a machine:
>>>
>>> Do you think we can find a common solution and propose a patch upstream?
>>
>> I am open to add patches.
>>
>>> Having a boot parameter (eg: nocluster) to skip cman at boot would be
>>> useful in your case or you really need something more specific to GFS?
>>> Skipping cman would prevent any other cluster service to start (GFS too).
>>
>> GFS can also run in lock_nolock and that does not require cman.
>
> But that is not an option in a in-production GFS cluster, I believe.

There is really nothing that stops you to have / on GFS lock_nolock.

>
>> I think you want to make sure of what you really want before
>> enabling/disabling cman.
>
> What we implemented serves multiple purposes:
>
> a. disable the whole GFS cluster suite (and every service relying on GFS)
>   for the next reboot of a host as a planned maintenance mode
> b. disable the mounting of GFS volumes (and every services relying on GFS
>   volumes) for the next reboot of a host as a planned maintenance mode
> c. boot a host directly in a) mode in case of emergency
> d. boot a host directly in b) mode in case of emergency
>
> To underline the difference and importancy of a-c and b-d modes, just a
> few examples:
>
> - due to the hardware upgrade of the shared block devices (in our case
>  Coraid AoE), we applied a) to shut down the whole cluster, upgrade the
>  hardware, test AoE access and update manually /etc/cluster/cluster.conf
>  with the new fencing parameters
> - once we were hit by a firmware bug in the shared devices and by
>  switching to b-d) we could run gfs_fsck on all volumes easily
> - all modes were used at testing services, fixing boot/shutdown/reboot
>  init scripts on test nodes (we run GFS on top of Ubuntu).

I understand the use cases. That's not the problem. I'd like to see a 
"standard" set of keywords to use that we all agree upon.

> As we rewrote the cman and gfs init scripts due to the differences between
> RedHat and Debian/Ubuntu, I can't really send patches.

I maintain the Ubuntu init script and work very closely with the Debian 
maintainers. At the same time i can apply changes to whatver is shipped 
from upstream.
So this is not a blocker whatsoever.

Whatever changes you have, best to have them sent to this mailing list for 
evaluation and mostlikely inclusion if they are valid.

> But the changes in
> question are actually minimal: both the cman and gfs init script starts
> with the added lines
>
> #
> # Skip GFS if asked at boot time
> #
> [ "$1" = "start" -a -e /etc/cluster/skip_gfs ] && exit 0
>
> and there is just one more line added to the gfs init script, before
> mounting the volumes
>
>                [ -e /etc/cluster/skip_gfs_mount ] && exit 0
>                echo -n "Mounting GFS filesystems: "
>                mount -a -t gfs
>
> The files /etc/cluster/skip_gfs and /etc/cluster/skip_gfs_mount are
> created by the attached init script called 'cluster_maintenance' if
> it's called manually by the proper arguments or at boot time when it
> detects the corresponding boot arguments (started before cman by init).
> The script can run additional scripts to disable/enable services which
> rely on GFS.

I see. I think i would prefer to see the parsing of cmdline done directly 
by cman/gfs/rgmanager init scripts rather than an extra one.

Few reasons:

- more init scripts increase complexity of boot sequence
- you touch files in /etc and that's not good practise. / might be read 
only if we are in emergency maintainance and even if it is read-write, 
something that is not really a config option, should go in either 
/var/lib/something (if needs to be persistent at reboot) or /tmp.

> I hope it helps.

absolutely.

Thanks
Fabio

--
I'm going to make him an offer he can't refuse.


From rpeterso at redhat.com  Thu Jun 19 14:08:44 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Thu, 19 Jun 2008 09:08:44 -0500
Subject: [Linux-cluster] GFS2 nulling larger files.
In-Reply-To: <BAY110-W51C3CC518069679F7FEF93BBAA0@phx.gbl>
References: <BAY110-W16D6B765DD8F81329C5F3CBBA80@phx.gbl>
	<1213710433.3724.44.camel@technetium.msp.redhat.com>
	<1213793327.3724.49.camel@technetium.msp.redhat.com>
	<BAY110-W51C3CC518069679F7FEF93BBAA0@phx.gbl>
Message-ID: <1213884524.3724.60.camel@technetium.msp.redhat.com>

Hi Stephen,

On Wed, 2008-06-18 at 20:59 -0400, Stephen Amadei wrote:
> Thank you for the pointer.
> 
> Unfortunately, I can't seem to find this code in any of my source trees.  I looking my 2.6.24.7 and 2.6.25.7 kernels and all of the cluster code, but I have no 'zero_metapath_length' function to patch.
> 
> I am assuming from the directory tree (fs/gfs2/bmap.c) that this is a kernel patch.  Is there a specific kernel?
> 
> Thanks in advance.
> 
> Stephen

Ben's patch is against the latest and greatest upstream GFS2 code,
which is in Steve Whitehouse's git repository.  To get that source
tree, which includes a complete upstream kernel tree, you can do
this command:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git gfs2-2.6.git

You may have to install the 'git' tool first.

Regards,

Bob Peterson
Red Hat Clustering & GFS


From s.wendy.cheng at gmail.com  Thu Jun 19 14:49:17 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Thu, 19 Jun 2008 10:49:17 -0400
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>	<20080616191641.GA17965@kallisti.us>
	<4856C385.8000800@gmail.com>	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>	<485819DC.90503@gmail.com>	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
Message-ID: <485A71ED.8030305@gmail.com>

Terry wrote:
> On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
>   
>> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
>>     
>>> Hi, Terry,
>>>       
>>>> I am still seeing some high load averages.  Here is an example of a
>>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>>> one of my volumes for an unknown reason.  Not sure that would have
>>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>>> little:
>>>>
>>>>         
>>> Sorry I missed scand_secs (was mindless as the brain was mostly occupied by
>>> day time work).
>>>
>>> To simplify the view, glock states include exclusive (write), share (read),
>>> and not-locked (in reality, there are more). Exclusive lock has to be
>>> demoted (demote_secs) to share, then to not-locked (another demote_secs)
>>> before it is scanned (every scand_secs) to get added into reclaim list where
>>> it can be purged. Between exclusive and share state transition, the file
>>> contents need to get flushed to disk (to keep file content cluster
>>> coherent).  All of above assume the file (protected by this glock) is not
>>> accessed (idle).
>>>
>>> You hit an area that GFS normally doesn't perform well. With GFS1 in
>>> maintenance mode while GFS2 seems to be so far away, ext3 could be a better
>>> answer. However, before switching, do make sure to test it thoroughly (since
>>> Ext3 could have the very same issue as well - check out:
>>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>>
>>> Did you look (and test) GFS "nolock" protocol (for single node GFS)? It
>>> bypasses some locking overhead and can be switched to  DLM in the future
>>> (just make sure you reserve enough journal space - the rule of thumb is one
>>> journal per node and know how many nodes you plan to have in the future).
>>>
>>> -- Wendy
>>>       
>> Good points.  I could try the nolock feature I suppose.  Not quite
>> clear on how to reserve journal space.  I forgot to post the cpu time,
>> check out this:
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>>  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>>
>> gfs_glockd is further below so not so concerned with that right now.
>> It appears turning on nolock would do the trick.  The times aren't
>> extremely accurate because I have failed this cluster between nodes
>> while testing.
>>
>>     
>
> Here is some more testing information....
>
> I created a new volume on my iscsi san of 1 TB and formatted it for
> ext3. I then used dd to create a 100G file.  This yielded roughly 900
> Mb/sec.  I then stopped my application and did the same thing with an
> existing GFS volume.  This gave me about 850 Kb/sec.  This isn't an
> iscsi issue.  This appears to be a load issue and the number of I/O
> occurring on these volumes.  That said, I would expect that performing
> the changes I did would result in a major performance improvement.
> Since it didn't, what are my other points I could consider?   If its a
> GFS issue, ext3 is the way to go.  Maybe even switch to using
> active-active on my NFS cluster.   If its a backend disk issue, I
> would expect to see the throughput on my iscsi link (bond1) be fully
> utilized.  Its not.  Could I be thrashing the disks?  This is an iscsi
> san with 30 sata disks.  Just bouncing some thoughts around to see if
> anyone has any more thoughts.
>
>   
Really need to focus on my day time job - its worload has been climbing 
... but can't help to place a quick comment here ..

The 900 MB/s vs. 850 KB/s difference looks like a caching  issue - that 
is, for 900 MB/s, it looks like the data was still lingering in the 
system cache while in 850 KB/s case, the data might already hit disk. 
Cluster filesystem normally syncs more by its nature. In general, ext3 
does perform better in single node environment but the difference should 
not be as big as above. 

There are certainly more tuning knobs available (such as journal size 
and/or network buffer size) to make GFS-iscsi "dd" run better but it is 
pointless. To deploy a cluster filesystem for production usage, the 
tuning should not be driven by such a simple-mind command. You also have 
to consider the support issues when deploying a filesystem. GFS1 is a 
little bit out of date and any new development and/or significant 
performance improvements would likely be in GFS2, not in GFS1. Research 
GFS2 (googling to see how other people said about it) to understand 
whether its direction fits your need (so you can migrate from GFS1 to 
GFS2 if you bump into any show stopper in the future). If not, ext3 
(with ext4 actively developed) is a fine choice if I read your 
configuration right from previous posts.

-- Wendy


From td3201 at gmail.com  Thu Jun 19 15:30:57 2008
From: td3201 at gmail.com (Terry)
Date: Thu, 19 Jun 2008 10:30:57 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <485A71ED.8030305@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us> <4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
	<485A71ED.8030305@gmail.com>
Message-ID: <8ee061010806190830q67977113l1b140a0ad91d0ec0@mail.gmail.com>

On Thu, Jun 19, 2008 at 9:49 AM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
> Terry wrote:
>>
>> On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
>>
>>>
>>> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi, Terry,
>>>>
>>>>>
>>>>> I am still seeing some high load averages.  Here is an example of a
>>>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>>>> one of my volumes for an unknown reason.  Not sure that would have
>>>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>>>> little:
>>>>>
>>>>>
>>>>
>>>> Sorry I missed scand_secs (was mindless as the brain was mostly occupied
>>>> by
>>>> day time work).
>>>>
>>>> To simplify the view, glock states include exclusive (write), share
>>>> (read),
>>>> and not-locked (in reality, there are more). Exclusive lock has to be
>>>> demoted (demote_secs) to share, then to not-locked (another demote_secs)
>>>> before it is scanned (every scand_secs) to get added into reclaim list
>>>> where
>>>> it can be purged. Between exclusive and share state transition, the file
>>>> contents need to get flushed to disk (to keep file content cluster
>>>> coherent).  All of above assume the file (protected by this glock) is
>>>> not
>>>> accessed (idle).
>>>>
>>>> You hit an area that GFS normally doesn't perform well. With GFS1 in
>>>> maintenance mode while GFS2 seems to be so far away, ext3 could be a
>>>> better
>>>> answer. However, before switching, do make sure to test it thoroughly
>>>> (since
>>>> Ext3 could have the very same issue as well - check out:
>>>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>>>
>>>> Did you look (and test) GFS "nolock" protocol (for single node GFS)? It
>>>> bypasses some locking overhead and can be switched to  DLM in the future
>>>> (just make sure you reserve enough journal space - the rule of thumb is
>>>> one
>>>> journal per node and know how many nodes you plan to have in the
>>>> future).
>>>>
>>>> -- Wendy
>>>>
>>>
>>> Good points.  I could try the nolock feature I suppose.  Not quite
>>> clear on how to reserve journal space.  I forgot to post the cpu time,
>>> check out this:
>>>
>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>>>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>>>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>>>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>>>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>>>  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>>>
>>> gfs_glockd is further below so not so concerned with that right now.
>>> It appears turning on nolock would do the trick.  The times aren't
>>> extremely accurate because I have failed this cluster between nodes
>>> while testing.
>>>
>>>
>>
>> Here is some more testing information....
>>
>> I created a new volume on my iscsi san of 1 TB and formatted it for
>> ext3. I then used dd to create a 100G file.  This yielded roughly 900
>> Mb/sec.  I then stopped my application and did the same thing with an
>> existing GFS volume.  This gave me about 850 Kb/sec.  This isn't an
>> iscsi issue.  This appears to be a load issue and the number of I/O
>> occurring on these volumes.  That said, I would expect that performing
>> the changes I did would result in a major performance improvement.
>> Since it didn't, what are my other points I could consider?   If its a
>> GFS issue, ext3 is the way to go.  Maybe even switch to using
>> active-active on my NFS cluster.   If its a backend disk issue, I
>> would expect to see the throughput on my iscsi link (bond1) be fully
>> utilized.  Its not.  Could I be thrashing the disks?  This is an iscsi
>> san with 30 sata disks.  Just bouncing some thoughts around to see if
>> anyone has any more thoughts.
>>
>>
>
> Really need to focus on my day time job - its worload has been climbing ...
> but can't help to place a quick comment here ..
>
> The 900 MB/s vs. 850 KB/s difference looks like a caching  issue - that is,
> for 900 MB/s, it looks like the data was still lingering in the system cache
> while in 850 KB/s case, the data might already hit disk. Cluster filesystem
> normally syncs more by its nature. In general, ext3 does perform better in
> single node environment but the difference should not be as big as above.
> There are certainly more tuning knobs available (such as journal size and/or
> network buffer size) to make GFS-iscsi "dd" run better but it is pointless.
> To deploy a cluster filesystem for production usage, the tuning should not
> be driven by such a simple-mind command. You also have to consider the
> support issues when deploying a filesystem. GFS1 is a little bit out of date
> and any new development and/or significant performance improvements would
> likely be in GFS2, not in GFS1. Research GFS2 (googling to see how other
> people said about it) to understand whether its direction fits your need (so
> you can migrate from GFS1 to GFS2 if you bump into any show stopper in the
> future). If not, ext3 (with ext4 actively developed) is a fine choice if I
> read your configuration right from previous posts.
>
> -- Wendy

I wrote off the difference between ext3 and gfs performance with my
simple dd command as nothing.  I wanted to ensure I wasn't seeing some
other issue.  I am happy with 800-900 regardless of the filesystem.
I'm going to see if I can get some performance metrics off the SAN and
go from there.


From s.wendy.cheng at gmail.com  Thu Jun 19 15:42:17 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Thu, 19 Jun 2008 11:42:17 -0400
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <485A71ED.8030305@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>	<20080616191641.GA17965@kallisti.us>
	<4856C385.8000800@gmail.com>	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>	<485819DC.90503@gmail.com>	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
	<485A71ED.8030305@gmail.com>
Message-ID: <485A7E59.8020107@gmail.com>

Wendy Cheng wrote:
> Terry wrote:
>> On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
>>  
>>> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng 
>>> <s.wendy.cheng at gmail.com> wrote:
>>>    
>>>> Hi, Terry,
>>>>      
>>>>> I am still seeing some high load averages.  Here is an example of a
>>>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>>>> one of my volumes for an unknown reason.  Not sure that would have
>>>>> helped anyways.  I do, however, feel that reducing scand_secs 
>>>>> helped a
>>>>> little:
>>>>>
>>>>>         
>>>> Sorry I missed scand_secs (was mindless as the brain was mostly 
>>>> occupied by
>>>> day time work).
>>>>
>>>> To simplify the view, glock states include exclusive (write), share 
>>>> (read),
>>>> and not-locked (in reality, there are more). Exclusive lock has to be
>>>> demoted (demote_secs) to share, then to not-locked (another 
>>>> demote_secs)
>>>> before it is scanned (every scand_secs) to get added into reclaim 
>>>> list where
>>>> it can be purged. Between exclusive and share state transition, the 
>>>> file
>>>> contents need to get flushed to disk (to keep file content cluster
>>>> coherent).  All of above assume the file (protected by this glock) 
>>>> is not
>>>> accessed (idle).
>>>>
>>>> You hit an area that GFS normally doesn't perform well. With GFS1 in
>>>> maintenance mode while GFS2 seems to be so far away, ext3 could be 
>>>> a better
>>>> answer. However, before switching, do make sure to test it 
>>>> thoroughly (since
>>>> Ext3 could have the very same issue as well - check out:
>>>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>>>
>>>> Did you look (and test) GFS "nolock" protocol (for single node 
>>>> GFS)? It
>>>> bypasses some locking overhead and can be switched to  DLM in the 
>>>> future
>>>> (just make sure you reserve enough journal space - the rule of 
>>>> thumb is one
>>>> journal per node and know how many nodes you plan to have in the 
>>>> future).
>>>>
>>>> -- Wendy
>>>>       
>>> Good points.  I could try the nolock feature I suppose.  Not quite
>>> clear on how to reserve journal space.  I forgot to post the cpu time,
>>> check out this:
>>>
>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>>>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>>>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>>>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>>>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>>>  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>>>
>>> gfs_glockd is further below so not so concerned with that right now.
>>> It appears turning on nolock would do the trick.  The times aren't
>>> extremely accurate because I have failed this cluster between nodes
>>> while testing.
>>>
>>>     
>>
>> Here is some more testing information....
>>
>> I created a new volume on my iscsi san of 1 TB and formatted it for
>> ext3. I then used dd to create a 100G file.  This yielded roughly 900
>> Mb/sec.  I then stopped my application and did the same thing with an
>> existing GFS volume.  This gave me about 850 Kb/sec.  This isn't an
>> iscsi issue.  This appears to be a load issue and the number of I/O
>> occurring on these volumes.  That said, I would expect that performing
>> the changes I did would result in a major performance improvement.
>> Since it didn't, what are my other points I could consider?   If its a
>> GFS issue, ext3 is the way to go.  Maybe even switch to using
>> active-active on my NFS cluster.   If its a backend disk issue, I
>> would expect to see the throughput on my iscsi link (bond1) be fully
>> utilized.  Its not.  Could I be thrashing the disks?  This is an iscsi
>> san with 30 sata disks.  Just bouncing some thoughts around to see if
>> anyone has any more thoughts.
>>
>>   
> Really need to focus on my day time job - its worload has been 
> climbing ... but can't help to place a quick comment here ..
>
> The 900 MB/s vs. 850 KB/s difference looks like a caching  issue - 
> that is, for 900 MB/s, it looks like the data was still lingering in 
> the system cache while in 850 KB/s case, the data might already hit 
> disk. Cluster filesystem normally syncs more by its nature. In 
> general, ext3 does perform better in single node environment but the 
> difference should not be as big as above.
> There are certainly more tuning knobs available (such as journal size 
> and/or network buffer size) to make GFS-iscsi "dd" run better but it 
> is pointless. To deploy a cluster filesystem for production usage, the 
> tuning should not be driven by such a simple-mind command. You also 
> have to consider the support issues when deploying a filesystem. GFS1 
> is a little bit out of date and any new development and/or significant 
> performance improvements would likely be in GFS2, not in GFS1. 
> Research GFS2 (googling to see how other people said about it) to 
> understand whether its direction fits your need (so you can migrate 
> from GFS1 to GFS2 if you bump into any show stopper in the future). If 
> not, ext3 (with ext4 actively developed) is a fine choice if I read 
> your configuration right from previous posts.
>
Or .. there is a known GFS1 writepage issue if most of your files are 
all very big .. The problem is fixed in RHEL kernels though. What is 
your kernel version ?

-- Wendy


From td3201 at gmail.com  Thu Jun 19 16:03:42 2008
From: td3201 at gmail.com (Terry)
Date: Thu, 19 Jun 2008 11:03:42 -0500
Subject: [Linux-cluster] gfs tuning
In-Reply-To: <485A7E59.8020107@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us> <4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
	<485A71ED.8030305@gmail.com> <485A7E59.8020107@gmail.com>
Message-ID: <8ee061010806190903h65cafa6bqfac07aa88c42f012@mail.gmail.com>

On Thu, Jun 19, 2008 at 10:42 AM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:
> Wendy Cheng wrote:
>>
>> Terry wrote:
>>>
>>> On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201 at gmail.com> wrote:
>>>
>>>>
>>>> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi, Terry,
>>>>>
>>>>>>
>>>>>> I am still seeing some high load averages.  Here is an example of a
>>>>>> gfs configuration.  I left statfs_fast off as it would not apply to
>>>>>> one of my volumes for an unknown reason.  Not sure that would have
>>>>>> helped anyways.  I do, however, feel that reducing scand_secs helped a
>>>>>> little:
>>>>>>
>>>>>>
>>>>>
>>>>> Sorry I missed scand_secs (was mindless as the brain was mostly
>>>>> occupied by
>>>>> day time work).
>>>>>
>>>>> To simplify the view, glock states include exclusive (write), share
>>>>> (read),
>>>>> and not-locked (in reality, there are more). Exclusive lock has to be
>>>>> demoted (demote_secs) to share, then to not-locked (another
>>>>> demote_secs)
>>>>> before it is scanned (every scand_secs) to get added into reclaim list
>>>>> where
>>>>> it can be purged. Between exclusive and share state transition, the
>>>>> file
>>>>> contents need to get flushed to disk (to keep file content cluster
>>>>> coherent).  All of above assume the file (protected by this glock) is
>>>>> not
>>>>> accessed (idle).
>>>>>
>>>>> You hit an area that GFS normally doesn't perform well. With GFS1 in
>>>>> maintenance mode while GFS2 seems to be so far away, ext3 could be a
>>>>> better
>>>>> answer. However, before switching, do make sure to test it thoroughly
>>>>> (since
>>>>> Ext3 could have the very same issue as well - check out:
>>>>> http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>>>>
>>>>> Did you look (and test) GFS "nolock" protocol (for single node GFS)? It
>>>>> bypasses some locking overhead and can be switched to  DLM in the
>>>>> future
>>>>> (just make sure you reserve enough journal space - the rule of thumb is
>>>>> one
>>>>> journal per node and know how many nodes you plan to have in the
>>>>> future).
>>>>>
>>>>> -- Wendy
>>>>>
>>>>
>>>> Good points.  I could try the nolock feature I suppose.  Not quite
>>>> clear on how to reserve journal space.  I forgot to post the cpu time,
>>>> check out this:
>>>>
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>  4822 root      10  -5     0    0    0 S    1  0.0   2159:15 dlm_recv
>>>>  4820 root      10  -5     0    0    0 S    1  0.0 368:09.34 dlm_astd
>>>>  4821 root      10  -5     0    0    0 S    0  0.0 153:06.80 dlm_scand
>>>>  3659 root      10  -5     0    0    0 S    0  0.0 134:40.14 scsi_wq_4
>>>>  4823 root      11  -5     0    0    0 S    1  0.0 109:33.33 dlm_send
>>>>  367 root      10  -5     0    0    0 S    0  0.0 103:33.74 kswapd0
>>>>
>>>> gfs_glockd is further below so not so concerned with that right now.
>>>> It appears turning on nolock would do the trick.  The times aren't
>>>> extremely accurate because I have failed this cluster between nodes
>>>> while testing.
>>>>
>>>>
>>>
>>> Here is some more testing information....
>>>
>>> I created a new volume on my iscsi san of 1 TB and formatted it for
>>> ext3. I then used dd to create a 100G file.  This yielded roughly 900
>>> Mb/sec.  I then stopped my application and did the same thing with an
>>> existing GFS volume.  This gave me about 850 Kb/sec.  This isn't an
>>> iscsi issue.  This appears to be a load issue and the number of I/O
>>> occurring on these volumes.  That said, I would expect that performing
>>> the changes I did would result in a major performance improvement.
>>> Since it didn't, what are my other points I could consider?   If its a
>>> GFS issue, ext3 is the way to go.  Maybe even switch to using
>>> active-active on my NFS cluster.   If its a backend disk issue, I
>>> would expect to see the throughput on my iscsi link (bond1) be fully
>>> utilized.  Its not.  Could I be thrashing the disks?  This is an iscsi
>>> san with 30 sata disks.  Just bouncing some thoughts around to see if
>>> anyone has any more thoughts.
>>>
>>>
>>
>> Really need to focus on my day time job - its worload has been climbing
>> ... but can't help to place a quick comment here ..
>>
>> The 900 MB/s vs. 850 KB/s difference looks like a caching  issue - that
>> is, for 900 MB/s, it looks like the data was still lingering in the system
>> cache while in 850 KB/s case, the data might already hit disk. Cluster
>> filesystem normally syncs more by its nature. In general, ext3 does perform
>> better in single node environment but the difference should not be as big as
>> above.
>> There are certainly more tuning knobs available (such as journal size
>> and/or network buffer size) to make GFS-iscsi "dd" run better but it is
>> pointless. To deploy a cluster filesystem for production usage, the tuning
>> should not be driven by such a simple-mind command. You also have to
>> consider the support issues when deploying a filesystem. GFS1 is a little
>> bit out of date and any new development and/or significant performance
>> improvements would likely be in GFS2, not in GFS1. Research GFS2 (googling
>> to see how other people said about it) to understand whether its direction
>> fits your need (so you can migrate from GFS1 to GFS2 if you bump into any
>> show stopper in the future). If not, ext3 (with ext4 actively developed) is
>> a fine choice if I read your configuration right from previous posts.
>>
> Or .. there is a known GFS1 writepage issue if most of your files are all
> very big .. The problem is fixed in RHEL kernels though. What is your kernel
> version ?
>
> -- Wendy

2.6.18-92.el5

The files are not all very big though.  Varies.


From David.C.Harding at hp.com  Thu Jun 19 20:36:23 2008
From: David.C.Harding at hp.com (Harding, David)
Date: Thu, 19 Jun 2008 20:36:23 +0000
Subject: [Linux-cluster] Linux cluster moved to new subdomain
In-Reply-To: <485A7E59.8020107@gmail.com>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us>	<4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
	<485A71ED.8030305@gmail.com> <485A7E59.8020107@gmail.com>
Message-ID: <0AB7D520EBDCE743BFAE18CF2B5B04C901EAFECAFC@G3W0070.americas.hpqcorp.net>


We moved our Linux cluster to a new tcpip subnet.
It is running Redhat V4 update 6.  After the move
I fixed the cluster.conf to reflect the subnet name.

I then went into luci.  When I select the cluster tab I get
an error message stating "an error occurred when trying to contact any of the nodes in the ermmro cluster"
The systems show up ok in the homepase tab and everything looks correct under the storage tab.

dave


From federico.simoncelli at gmail.com  Thu Jun 19 21:10:36 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Thu, 19 Jun 2008 23:10:36 +0200
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806191418010.5892@trider-g7>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
Message-ID: <a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>

On Thu, Jun 19, 2008 at 2:24 PM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
> I see. I think i would prefer to see the parsing of cmdline done directly by
> cman/gfs/rgmanager init scripts rather than an extra one.

I agree with you. Adding an extra service is not necessary and we
still need to patch cman/gfs/rgmanager(/clvmd?) just to check a
different condition (is /etc/cluster/skip_* present?).
Anyway my vote is for "nocluster" at boot. If you think this might be
okay I could work on the patch tomorrow and post it for feedback.
Do you think we should file a bug and continue the discussion there?

Thanks.
-- 
Federico.


From d.degroot at griffith.edu.au  Fri Jun 20 00:45:17 2008
From: d.degroot at griffith.edu.au (Darrin De Groot)
Date: Fri, 20 Jun 2008 11:45:17 +1100
Subject: [Linux-cluster] cluster.conf fails to validate
Message-ID: <OFE0564245.EC527D0C-ON4A25746D.00827E6B-CA25746E.0003F68D@domino.griffith.edu.au>

Hi,

I have noticed recently that my RHEL5 cluster can no longer be managed 
through the luci interface - I see the following error:

An error occurred while attempting to get status information for this 
cluster. The information shown may be stale or inaccurate.

The odd thing is that I *can* actually update the cluster.conf on all the 
nodes through this interface...

I tried to validate the cluster.conf file, thinking that might have 
something to do with it:

xmllint   --relaxng    /usr/share/system-config-cluster/misc/cluster.ng 
/etc/cluster/cluster.conf

The trailing portion of which showed this:

        <quorumd device="/dev/dm-1" interval="2" min_score="1" tko="20" 
votes="3"/>
</cluster>
Relax-NG validity error : Extra element quorumd in interleave
/etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error : 
Element cluster failed to validate content
/etc/cluster/cluster.conf fails to validate

Is this relevant?

Does anyone have any ideas? Command line tools (clustat, cman_tool 
services|status) all work just fine. The only reason I noticed was because 
one of the nodes didn't reboot correctly (dlm took too long to start on 
boot, causing the node to not join the fence domain, and therefore all 
other cluster services to fail - the nodes just sit there "Starting 
fencing"). If anyone has any ideas about this I'd apreciate too.

One other small question - is there any way to query which node is the 
current qdisk master?

Cheers,

Darrin de Groot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080620/03d90064/attachment.htm>

From david.monro at adelaide.edu.au  Fri Jun 20 05:26:31 2008
From: david.monro at adelaide.edu.au (David Monro)
Date: Fri, 20 Jun 2008 14:56:31 +0930
Subject: [Linux-cluster] question about what happens when fencing fails
Message-ID: <485B3F87.90108@adelaide.edu.au>

Hi,

just trying to get my head around the way CS5 copes with various failure
modes.

I have 2 sites, with one node and san at one site, and a second node in
the other site. (There is also a second san in the other site with a san
copy from the primary for disaster recovery). I am using HP iLO fencing.

The most likely failure scenario for us at the moment is complete loss
of the ethernet network between the 2 sites, with the SAN remaining up.
Obviously in this case both nodes will be unable to see the other, and
in addition will be unable to fence each other.

In the case where I do not use a quorum disk, what will happen? I would
have to guess that the answer will be a dead cluster, since neither node
can succeed at fencing the other.

In the case where I do use a quorum disk, what will happen? Both will
still have access to the quorum disk, but neither can fence the other.
Assuming both still achieve their minimum score, they will presumably
have some sort of fight over the quorum disk - how does that get
resolved, and how can the winning node be sure it has won? (I think I
managed to provoke this scenario by accident when messing around the
other day, and one of the nodes started spitting lots of messages out
about the other node being undead - I'm not sure if the cluster was
quorate at the time or not).

I did look at other fencing options as well, but I can't use fence_scsi
(because we use dm_multipath - a pity because its about the one thing
which actually should work for this scenario I think!), or fence_brocade
(because the node can't get to the ethernet port on the switch in the
other site).

Obviously careful selection of a heuristic may be possible to allow one
node to remove itself from the fight over the qdisk, in which case will
the cluster be OK even though the remaining node can't prove that the
one with the less-than-minimum score is actually dead?

Any guidance would be much appreciated.

Cheers,

	David


From fdinitto at redhat.com  Fri Jun 20 08:42:43 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Fri, 20 Jun 2008 10:42:43 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0806201041050.5892@trider-g7>

On Thu, 19 Jun 2008, Federico Simoncelli wrote:

> On Thu, Jun 19, 2008 at 2:24 PM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
>> I see. I think i would prefer to see the parsing of cmdline done directly by
>> cman/gfs/rgmanager init scripts rather than an extra one.
>
> I agree with you. Adding an extra service is not necessary and we
> still need to patch cman/gfs/rgmanager(/clvmd?) just to check a
> different condition (is /etc/cluster/skip_* present?).

that's also another benefit.

> Anyway my vote is for "nocluster" at boot.

we will need 2 options. One to disable only gfs and one to disable 
everything.

> If you think this might be
> okay I could work on the patch tomorrow and post it for feedback.

sure, that would be nice. Please start from the master branch in git (or 
grab the latest 2.99.xx release). Backporting is going to be easier once 
it's there.

> Do you think we should file a bug and continue the discussion there?

Sure. that's also fine by me.

Fabio

--
I'm going to make him an offer he can't refuse.


From kadlec at sunserv.kfki.hu  Fri Jun 20 08:49:26 2008
From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef)
Date: Fri, 20 Jun 2008 10:49:26 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806201041050.5892@trider-g7>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
	<Pine.LNX.4.64.0806201041050.5892@trider-g7>
Message-ID: <Pine.LNX.4.64.0806201047440.31981@lxserv1.kfki.hu>

On Fri, 20 Jun 2008, Fabio M. Di Nitto wrote:

> > Anyway my vote is for "nocluster" at boot.
> 
> we will need 2 options. One to disable only gfs and one to disable everything.

That can be "one" boot option, e.g. 'nocluster' to disable everything and 
say 'nocluster=volumes' to disable only gfs.
 
Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary


From sunhux at gmail.com  Fri Jun 20 10:19:45 2008
From: sunhux at gmail.com (sunhux G)
Date: Fri, 20 Jun 2008 18:19:45 +0800
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <485A2018.3090803@redhat.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
	<20080618192832.GB16780@ca-server1.us.oracle.com>
	<48599DBB.5040303@redhat.com>
	<20080619014802.GA27364@ca-server1.us.oracle.com>
	<485A2018.3090803@redhat.com>
Message-ID: <60f08e700806200319xad3705agc7b23a873ba01557@mail.gmail.com>

Wow, can't catch up with the discussion.  The outsourced DBA/
application developer now wants us to clear away /dev/mapper/*
& /dev/mpath/*  which I've done.

They would like to use /dev/raw/asm*  which was created on the
old servers by a previous team (that are now gone).

Anyone know how those ASM partitions end up in /dev/raw?
Is it using one of those ASM tools from Oracle?


Thanks
U
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080620/c02439a0/attachment.htm>

From theophanis_kontogiannis at yahoo.gr  Fri Jun 20 11:48:33 2008
From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis)
Date: Fri, 20 Jun 2008 14:48:33 +0300
Subject: [Linux-cluster] Linux cluster moved to new subdomain
In-Reply-To: <0AB7D520EBDCE743BFAE18CF2B5B04C901EAFECAFC@G3W0070.americas.hpqcorp.net>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>	<20080616191641.GA17965@kallisti.us>	<4856C385.8000800@gmail.com>	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>	<485819DC.90503@gmail.com>	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>	<485A71ED.8030305@gmail.com>
	<485A7E59.8020107@gmail.com>
	<0AB7D520EBDCE743BFAE18CF2B5B04C901EAFECAFC@G3W0070.americas.hpqcorp.net>
Message-ID: <001301c8d2cb$9317be60$b9473b20$@gr>

Hi Dave,

Did you make the appropriate changes on the iptables, to reflect the new IPs
given to the servers?

Sincerely,

Theophanis Kontogiannis


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Harding, David
Sent: Thursday, June 19, 2008 11:36 PM
To: 'linux clustering'
Subject: [Linux-cluster] Linux cluster moved to new subdomain


We moved our Linux cluster to a new tcpip subnet.
It is running Redhat V4 update 6.  After the move
I fixed the cluster.conf to reflect the subnet name.

I then went into luci.  When I select the cluster tab I get
an error message stating "an error occurred when trying to contact any of
the nodes in the ermmro cluster"
The systems show up ok in the homepase tab and everything looks correct
under the storage tab.

dave

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From federico.simoncelli at gmail.com  Fri Jun 20 13:34:57 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Fri, 20 Jun 2008 15:34:57 +0200
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806201041050.5892@trider-g7>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
	<Pine.LNX.4.64.0806201041050.5892@trider-g7>
Message-ID: <a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>

On Fri, Jun 20, 2008 at 10:42 AM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
>> Do you think we should file a bug and continue the discussion there?
>
> Sure. that's also fine by me.

https://bugzilla.redhat.com/show_bug.cgi?id=452234

As I wrote in the bug description... the solution I proposed (adding a
boot parameter) will also make impossible to start the services
manually later on (until next reboot). This might be why Jozsef was
using the check on a file (which can be easily removed). Jozsef can
you confirm this?
Do you think we can find a way to remove this side effect?
Is there a way to detect if the init script was run by the init binary
or manually from the shell?
Should we add an other init command such as "force-start" to override
the boot parameter check?
Any better idea is more than welcome.

-- 
Federico.


From David.C.Harding at hp.com  Fri Jun 20 13:50:11 2008
From: David.C.Harding at hp.com (Harding, David)
Date: Fri, 20 Jun 2008 13:50:11 +0000
Subject: [Linux-cluster] Linux cluster moved to new subdomain
In-Reply-To: <001301c8d2cb$9317be60$b9473b20$@gr>
References: <8ee061010806160945pc2418f8w95749c8bf566d02d@mail.gmail.com>
	<20080616191641.GA17965@kallisti.us>	<4856C385.8000800@gmail.com>
	<8ee061010806170954y1d6d7555qf7cd5a137b63e018@mail.gmail.com>
	<485819DC.90503@gmail.com>
	<8ee061010806171522l1d18480em861bffb87f8b8be2@mail.gmail.com>
	<8ee061010806181048j2d9e4635n6e0133c855e4ea06@mail.gmail.com>
	<485A71ED.8030305@gmail.com>	<485A7E59.8020107@gmail.com>
	<0AB7D520EBDCE743BFAE18CF2B5B04C901EAFECAFC@G3W0070.americas.hpqcorp.net>
	<001301c8d2cb$9317be60$b9473b20$@gr>
Message-ID: <0AB7D520EBDCE743BFAE18CF2B5B04C901EAFECB03@G3W0070.americas.hpqcorp.net>

 Yes,  The issue is that when I first built the cluster I used the full qualified host name. I changed the cluster.conf file,
But some where it is still getting the old fully qualified host name.  I can see it in the log files.  Where else does it
Store the cluster host names other then the cluster.conf file ?

dave

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Theophanis Kontogiannis
Sent: Friday, June 20, 2008 7:49 AM
To: 'linux clustering'
Subject: RE: [Linux-cluster] Linux cluster moved to new subdomain

Hi Dave,

Did you make the appropriate changes on the iptables, to reflect the new IPs given to the servers?

Sincerely,

Theophanis Kontogiannis


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Harding, David
Sent: Thursday, June 19, 2008 11:36 PM
To: 'linux clustering'
Subject: [Linux-cluster] Linux cluster moved to new subdomain


We moved our Linux cluster to a new tcpip subnet.
It is running Redhat V4 update 6.  After the move I fixed the cluster.conf to reflect the subnet name.

I then went into luci.  When I select the cluster tab I get an error message stating "an error occurred when trying to contact any of the nodes in the ermmro cluster"
The systems show up ok in the homepase tab and everything looks correct under the storage tab.

dave

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From Joel.Becker at oracle.com  Fri Jun 20 16:31:41 2008
From: Joel.Becker at oracle.com (Joel Becker)
Date: Fri, 20 Jun 2008 09:31:41 -0700
Subject: [Linux-cluster] Multipath or Oracle RAC ASM issue?
In-Reply-To: <60f08e700806200319xad3705agc7b23a873ba01557@mail.gmail.com>
References: <60f08e700806180920s580a3fbdsfb520fd1d54551c2@mail.gmail.com>
	<20080618192832.GB16780@ca-server1.us.oracle.com>
	<48599DBB.5040303@redhat.com>
	<20080619014802.GA27364@ca-server1.us.oracle.com>
	<485A2018.3090803@redhat.com>
	<60f08e700806200319xad3705agc7b23a873ba01557@mail.gmail.com>
Message-ID: <20080620163141.GB14238@ca-server1.us.oracle.com>

On Fri, Jun 20, 2008 at 06:19:45PM +0800, sunhux G wrote:
> Wow, can't catch up with the discussion.  The outsourced DBA/
> application developer now wants us to clear away /dev/mapper/*
> & /dev/mpath/*  which I've done.
> 
> They would like to use /dev/raw/asm*  which was created on the
> old servers by a previous team (that are now gone).

	These names are pretty irrelevant.  You should use ASMLib, which
gives you names you choose (like LANDx in your example).  Just make sure
to configure your SCANORDER as described in
http://www.oracle.com/technology/tech/linux/asmlib/multipath.html and
you'll be fine.

Joel
-- 

	f/8 and be there.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


From lhh at redhat.com  Fri Jun 20 20:40:15 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 20 Jun 2008 16:40:15 -0400
Subject: [Linux-cluster] cluster.conf fails to validate
In-Reply-To: <OFE0564245.EC527D0C-ON4A25746D.00827E6B-CA25746E.0003F68D@domino.griffith.edu.au>
References: <OFE0564245.EC527D0C-ON4A25746D.00827E6B-CA25746E.0003F68D@domino.griffith.edu.au>
Message-ID: <1213994415.16628.19.camel@ayanami.boston.devel.redhat.com>

On Fri, 2008-06-20 at 11:45 +1100, Darrin De Groot wrote:

> One other small question - is there any way to query which node is the current qdisk master? 

Not currently; you can add 'status_file="/path-to-file"' to the
<quorumd> tag.  You can cat it out whenever you want.  However, it is
for debugging only and I don't recommend it for extended use.

Being a 'qdisk master' isn't a huge job; it doesn't do anything
particularly special that the other nodes do not do except flip a couple
of bits in a bitmap occasionally based on input from CMAN and ask CMAN
to kill nodes if needed.

-- Lon


From lhh at redhat.com  Fri Jun 20 20:44:13 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 20 Jun 2008 16:44:13 -0400
Subject: [Linux-cluster] question about what happens when fencing fails
In-Reply-To: <485B3F87.90108@adelaide.edu.au>
References: <485B3F87.90108@adelaide.edu.au>
Message-ID: <1213994653.16628.24.camel@ayanami.boston.devel.redhat.com>


On Fri, 2008-06-20 at 14:56 +0930, David Monro wrote:

> The most likely failure scenario for us at the moment is complete loss
> of the ethernet network between the 2 sites, with the SAN remaining up.
> Obviously in this case both nodes will be unable to see the other, and
> in addition will be unable to fence each other.

That's unfortunate. ;)

> In the case where I do not use a quorum disk, what will happen? I would
> have to guess that the answer will be a dead cluster, since neither node
> can succeed at fencing the other.

Correct.

> In the case where I do use a quorum disk, what will happen?

Well, since fencing is required when using a quorum disk, the effect
will be the same.

The quorum disk can be used to help decide *who* is allowed to fence.

> I did look at other fencing options as well, but I can't use fence_scsi
> (because we use dm_multipath - a pity because its about the one thing
> which actually should work for this scenario I think!), or fence_brocade
> (because the node can't get to the ethernet port on the switch in the
> other site).

That should be fixed in 5.2 (I think?)

-- Lon


From lhh at redhat.com  Fri Jun 20 20:50:46 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 20 Jun 2008 16:50:46 -0400
Subject: [Linux-cluster] Re: apache resource problem in RHCS 5.1
In-Reply-To: <g37qgl$jll$1@ger.gmane.org>
References: <483ECA36.7070007@xbe.ch>
	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>
	<48469941.6030800@xbe.ch>	<48480528.6000708@redhat.com>
	<48576585.3050409@xbe.ch>  <g37qgl$jll$1@ger.gmane.org>
Message-ID: <1213995046.16628.28.camel@ayanami.boston.devel.redhat.com>


On Tue, 2008-06-17 at 09:52 +0200, denis wrote:
> Lorenz Pfiffner wrote:
> > I want to add something I recently found out about the relocation time 
> > of IP resources, which I complained about.
> > 
> > The reason why it takes 10 seconds per IP can be found here: 
> > http://sources.redhat.com/cluster/faq.html#rgm_failovertime
> 
> Can anyone add to this, seems very interesting to cut the failover time 
> for my scenario too.
> 
> Is lowering the sleep time or removing it safe for clusters without NFS?

Generally, whenever you leave a process running to service requests, you
want to leave the 'sleep' in.

EX: nfsd generally stays running because it can be used by multiple
services running on the same node (and there is in no way to make
multiple separate instances of nfsd bind to separate IP addresses
currently).  So, sleep for a bit.

Leaving the app running has some limitations, particularly if the
service comes back to the original host (this was recently discussed on
linux-nfs).


Whenever you stop the application, you can remove the sleep.

EX: Apache is generally shut down for a service relocation.  You may
remove the 'sleep' because the sockets will be closed and the
application will be terminated.

-- Lon


From sunhux at gmail.com  Mon Jun 23 07:38:54 2008
From: sunhux at gmail.com (sunhux G)
Date: Mon, 23 Jun 2008 15:38:54 +0800
Subject: [Linux-cluster] live & standby (primary & secondary partitions) in
	"multipath -ll"
Message-ID: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>

Hi,


Below is a listing of "multipath -ll" which are the same across
both our RAC servers (as the bindings file are identical).

Currently our NetApp Filer1's load is very heavy while Filer2
is underutilized

Question 1:
a) how do we find out which of the device files /dev/sd*
    go to NetApp SAN Filer1 & which to Filer2 (we have 2
    NetApp files)?


b)on the lines below with "round robin", what does "enabled"
    & "active" mean or what's the difference?  Does "active"
    refers to the primary path it goes to & when the primary
    filer fails, then only secondary filer took over?

=========================================

Multipath -ll    output :

mpath2 (360a980005672443963344933706f536c)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:3 sdq 65:0   [active][ready]      **
 \_ 8:0:3:3 sdw 65:96  [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:3 sde 8:64   [active][ready]
 \_ 8:0:1:3 sdk 8:160  [active][ready]

mpath1 (360a9800056724439633449336c786d69)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:4 sdr 65:16  [active][ready]      **
 \_ 8:0:3:4 sdx 65:112 [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:4 sdf 8:80   [active][ready]
 \_ 8:0:1:4 sdl 8:176  [active][ready]

mpath0 (360a98000567244396334493370345055)
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:5 sds 65:32  [active][ready]      **
 \_ 8:0:3:5 sdy 65:128 [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:5 sdg 8:96   [active][ready]
 \_ 8:0:1:5 sdm 8:192  [active][ready]

mpath5 (360a9800056724439633449336c4d6b36)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:1 sdo 8:224  [active][ready]      **
 \_ 8:0:3:1 sdu 65:64  [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:1 sdc 8:32   [active][ready]
 \_ 8:0:1:1 sdi 8:128  [active][ready]

mpath4 (360a980005672443963344933706b4770)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:2 sdp 8:240  [active][ready]      **
 \_ 8:0:3:2 sdv 65:80  [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:2 sdd 8:48   [active][ready]
 \_ 8:0:1:2 sdj 8:144  [active][ready]

mpath3 (360a9800056724439633449336c514c75)
[size=40 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
 \_ 8:0:2:0 sdn 8:208  [active][ready]      **
 \_ 8:0:3:0 sdt 65:48  [active][ready]      **
\_ round-robin 0 [prio=2][enabled]
 \_ 8:0:0:0 sdb 8:16   [active][ready]
 \_ 8:0:1:0 sdh 8:112  [active][ready]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080623/76bfc911/attachment.htm>

From theophanis_kontogiannis at yahoo.gr  Mon Jun 23 11:00:05 2008
From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis)
Date: Mon, 23 Jun 2008 14:00:05 +0300
Subject: [Linux-cluster] Issue with GFS2 medata block on two nodes cluster.
In-Reply-To: <1213995046.16628.28.camel@ayanami.boston.devel.redhat.com>
References: <483ECA36.7070007@xbe.ch>	<9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com>	<48469941.6030800@xbe.ch>	<48480528.6000708@redhat.com>	<48576585.3050409@xbe.ch>
	<g37qgl$jll$1@ger.gmane.org>
	<1213995046.16628.28.camel@ayanami.boston.devel.redhat.com>
Message-ID: <00a101c8d520$4ce4ec40$e6aec4c0$@gr>

Hello all,

I have a two node cluster with kernel-2.6.18-53.1.21.el5 and drbd8.2.

The LV is formatted with GFS2.

However suddenly one of the nodes keeps on reporting:


Jun 23 12:56:05 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: fatal: invalid
metadata block
Jun 23 12:56:05 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0:   bh = 21879736
(magic number)
Jun 23 12:56:05 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0:   function =
gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 438
Jun 23 12:56:05 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: about to
withdraw this file system
Jun 23 12:56:05 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: telling LM to
withdraw


Jun 23 12:56:06 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: withdrawn
Jun 23 12:56:06 tweety1 kernel:
Jun 23 12:56:06 tweety1 kernel: Call Trace:
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff8865715e>]
:gfs2:gfs2_lm_withdraw+0xc1/0xd0
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff80014cca>] sync_buffer+0x0/0x3f
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff80062a3f>]
out_of_line_wait_on_bit+0x6c/0x78
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff8009ba34>]
wake_bit_function+0x0/0x23
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff8866895b>]
:gfs2:gfs2_meta_check_ii+0x2c/0x38
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff8865adbb>]
:gfs2:gfs2_meta_indirect_buffer+0x1e3/0x284
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff88655a88>]
:gfs2:gfs2_inode_refresh+0x22/0x2b9
Jun 23 12:56:06 tweety1 kernel:  [<ffffffff88654eff>]
:gfs2:inode_go_lock+0x29/0x57


After this, the node halts.
The other node works perfectly.

If I boot single user the problematic node, what steps should I take to
recover??

Thank you all for your time.

Theophanis Kontogiannis


From teigland at redhat.com  Mon Jun 23 16:16:02 2008
From: teigland at redhat.com (David Teigland)
Date: Mon, 23 Jun 2008 11:16:02 -0500
Subject: [Linux-cluster] RFC: updating cluster.conf
Message-ID: <20080623161602.GB19528@redhat.com>

Hi,

We're looking into how cluster.conf updates should be done in future
versions and we'd like some feedback about how you currently do this, and
what you'd like to see.

1. How often do you update cluster.conf?  ("Never" would be valuable
   feedback.)

2. What changes do you make?  e.g. add nodes, change fencing settings,
   add or change rgmanager settings.

3. How do you currently update cluster.conf?  Cluster online or offline?
   Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
   like about the method you use now?

4. How would you like to do updates to cluster.conf in the future?
   Conga (graphical management interface)?  Command line program that
   updates /etc/cluster/cluster.conf on all cluster nodes?
   Manually scp to all nodes?  Other?

5. Would you like to use an LDAP server?  All cluster nodes would read
   cluster.conf info from the server; updates would just be made on
   the server.

Thanks,
Dave


From s.wendy.cheng at gmail.com  Mon Jun 23 16:53:21 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Mon, 23 Jun 2008 12:53:21 -0400
Subject: [Linux-cluster] live & standby (primary & secondary partitions)
	in "multipath -ll"
In-Reply-To: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>
References: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>
Message-ID: <485FD501.3030008@gmail.com>

sunhux G wrote:
> Question 1:
> a) how do we find out which of the device files /dev/sd*  
>     go to NetApp SAN Filer1 & which to Filer2 (we have 2
>     NetApp files)? 

Contact your Netapp support or directly go to the "NOW" web site to 
download its linux host utility packages (e.g. 
netapp_linux_host_utils_3_1.tar.gz) . Its installation should be 
reasonably trivial. After install, there will be a command called 
"sanlun" that can tell you the mapping between /dev/sd* and filers luns:

For example:
[root at acamar wendy]# sanlun lun show
   filer:      lun-pathname   device filename  adapter  
protocol          lun size         lun state
wcheng-1  /vol/cluster1/lun0  /dev/sda         host5    iSCSI           
300GB        GOOD  
wcheng-2  /vol/cluster2/lun0  /dev/sdb         host6    iSCSI           
300GB        GOOD   

-- Wendy


From fog at t.is  Mon Jun 23 16:39:52 2008
From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=)
Date: Mon, 23 Jun 2008 16:39:52 -0000
Subject: [Linux-cluster] Service dependancies
Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F7790221624E@SKYHQAMX08.klasi.is>

Hi,

 
I was wondering if you could have 2 services:

 
Service A (ie, Oracle)

Service B (ie, Tomcat)

 
If Service A fails, it is moved to another node, but Service B will not recover until it is restarted since it needs to "reconnect" to Service A. Does the cluster software (RHEL 5.2) support adding Service A as a  dependency for Service B?

 
I was not able to find anything in the documentation regarding this, but if there is, please throw it at me :)

 
K?r kve?ja / Best Regards,

Finnur ?rn Gu?mundsson
Network Engineer - Network Operations
fog at t.is <mailto:fog at t.is> 

TM Software
Ur?arhvarf 6, IS-203 K?pavogur, Iceland
Tel: +354 545 3000 - fax +354 545 3610
www.tm-software.is <http://www.tm-software.is/> 

This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer <http://www.tm-software.is/disclaimer>  

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080623/4d5a2bb9/attachment.htm>

From david.costakos at gmail.com  Mon Jun 23 18:55:56 2008
From: david.costakos at gmail.com (Dave Costakos)
Date: Mon, 23 Jun 2008 11:55:56 -0700
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <6b6836c60806231155x5dab7a57g7672d1f4ceafe66@mail.gmail.com>

Thanks for asking :-)

1. Daily -- or multiple times daily in our Xen VM clusters.  I have the
config_version property though, I'd like to see that gotten rid of so I
don't have to worry about going over the limit.

2. Generally it's adding/changing services.  Adding of nodes/fencing
settings happens, but much, much less frequently

3/4.  I wrote a python script to update the XML automagically, then use
ccs_tool to propagate the update.  It would be nice to have an API interface
for changing the cluster config from custom tools (like shell scripts or web
interfaces) if that's possible.  Though I should say that the python script
I wrote was pretty stupid as easy, I'm just concerned about cluster.conf
format changes and how that will affect my tools.

5. I would not like to use LDAP.  My experience in general with OpenLDAP is
poor and it seems poorly documented.  Moreover, it implements a single-point
of failure for the cluster configuration.  I like having hte cluster
configuration stored on each node.

On Mon, Jun 23, 2008 at 9:16 AM, David Teigland <teigland at redhat.com> wrote:

> Hi,
>
> We're looking into how cluster.conf updates should be done in future
> versions and we'd like some feedback about how you currently do this, and
> what you'd like to see.
>
> 1. How often do you update cluster.conf?  ("Never" would be valuable
>   feedback.)
>
> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>   add or change rgmanager settings.
>
> 3. How do you currently update cluster.conf?  Cluster online or offline?
>   Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>   like about the method you use now?
>
> 4. How would you like to do updates to cluster.conf in the future?
>   Conga (graphical management interface)?  Command line program that
>   updates /etc/cluster/cluster.conf on all cluster nodes?
>   Manually scp to all nodes?  Other?
>
> 5. Would you like to use an LDAP server?  All cluster nodes would read
>   cluster.conf info from the server; updates would just be made on
>   the server.
>
> Thanks,
> Dave
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Dave Costakos
mailto:david.costakos at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080623/db992b5f/attachment.htm>

From zac at sprackett.com  Mon Jun 23 19:50:52 2008
From: zac at sprackett.com (S. Zachariah Sprackett)
Date: Mon, 23 Jun 2008 15:50:52 -0400
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <ed9a61600806231250o4186f089s1b9e2d9a8135d849@mail.gmail.com>

On Mon, Jun 23, 2008 at 12:16 PM, David Teigland <teigland at redhat.com>
wrote:

> Hi,
>
> We're looking into how cluster.conf updates should be done in future
> versions and we'd like some feedback about how you currently do this, and
> what you'd like to see.
>
> 1. How often do you update cluster.conf?  ("Never" would be valuable

  feedback.)


Infrequently, we pretty much only update it as we grow or shrink the
cluster.


>
> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>   add or change rgmanager settings.
>

We add and remove nodes.

3. How do you currently update cluster.conf?  Cluster online or offline?
>   Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>   like about the method you use now?
>

We have a product that ships to customers and uses RHCS for clustering
support.  Our config files are generated on each node and we've had problems
in the past with the current system of using the cluster.conf not only as a
configuration file but also as a config cache.  We'd like a mechanism to
allow us to generate the configuration files and then manually tell each
node to move to them.

4. How would you like to do updates to cluster.conf in the future?
>   Conga (graphical management interface)?  Command line program that
>   updates /etc/cluster/cluster.conf on all cluster nodes?
>   Manually scp to all nodes?  Other?
>

We'll continue to do it as we do above, so from your perspective, you can
count us in the manual camp.

5. Would you like to use an LDAP server?  All cluster nodes would read
>   cluster.conf info from the server; updates would just be made on
>   the server.


This adds a dependency on an external server or will cause undue pain in
trying to keep the LDAP databases in sync if running on the nodes.  From our
perspective, this sounds like a regression.

-z
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080623/1bebff63/attachment.htm>

From stpierre at NebrWesleyan.edu  Mon Jun 23 20:02:02 2008
From: stpierre at NebrWesleyan.edu (Chris St. Pierre)
Date: Mon, 23 Jun 2008 15:02:02 -0500 (CDT)
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <alpine.LFD.1.10.0806231455200.24482@grunthos.NebrWesleyan.edu>

On Mon, 23 Jun 2008, David Teigland wrote:

> 1. How often do you update cluster.conf?  ("Never" would be valuable
>   feedback.)

Very rarely.  Close to never.

> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>   add or change rgmanager settings.

Since we've just started using RHCS, we're still doing some config
tweaks from time to time.  In the long run, I suspect it will mostly
be to add and remove nodes, and, perhaps more rarely, rgmanager
services.

> 3. How do you currently update cluster.conf?  Cluster online or offline?
>   Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>   like about the method you use now?

We put the new cluster.conf in Cfengine and let it copy it out to the
cluster.  When Cfengine detects that it has copied out a new
cluster.conf, it automatically runs the ccs_tool/cman_tool magic to
get the cluster to read the new config.

> 4. How would you like to do updates to cluster.conf in the future?
>   Conga (graphical management interface)?  Command line program that
>   updates /etc/cluster/cluster.conf on all cluster nodes?
>   Manually scp to all nodes?  Other?

I rather like the way we do it now.

> 5. Would you like to use an LDAP server?  All cluster nodes would read
>   cluster.conf info from the server; updates would just be made on
>   the server.

On the one hand, that would be unspeakably awesome.  On the other
hand, that could limit our ability to run an LDAP cluster under RHCS,
although I suppose we could just use the old-skool cluster.conf for
that cluster.

For us, if LDAP is down, we're already hosed beyond hosed, so the idea
of an additional external dependency isn't really that distasteful.
That said, it might be nice to have RHCS cache the config locally to
avoid a temporary LDAP outage from causing a cluster outage, too.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University


From jstoner at opsource.net  Mon Jun 23 20:06:18 2008
From: jstoner at opsource.net (Jeff Stoner)
Date: Mon, 23 Jun 2008 21:06:18 +0100
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <38A48FA2F0103444906AD22E14F1B5A307881516@mailxchg01.corp.opsource.net>

1. and 2. Our cluster changes across all clusters happen on an
infrequent basis (6 or fewer a month) and typically involve simple
changes such as changing a SAN mount point (changing the device name or
where to mount it,) adding a SAN mount or changing the user password for
our fencing devices.


3. How we propogate the changes depends on who's making the change. I
typically roll several changes (app and cluster) into the same
maintenance window so I can bring down the cluster, at which point I
simply vi the cluster.conf files (after testing the changes in staging,
making backups, etc. etc. etc.) Other sys admins use ccs_tool to roll
out simple changes. We don't have Conga deployed because, quite frankly,
it wasn't enterprise ready when we last tested it (RHEL4 flavor, we have
not deployed any RHEL5 clusters.)


4. While I prefer delimited flat files, there are occassions where a
more structured config file is warranted (Apache jumps to mind.) I
detest the use of XML as a straight-up configuration file. I am not an
XML parser. That being said, what I would like to see for a command-line
interface to configuration is something similar to what Augeas
(augeas.net) aims to provide. A program that parses the existing config
into a tree in memory, then acts as a shell to allow me to browse the
tree, make changes as appropriate then validate the changes when it
writes out the config or aborting with appropriate error messages on
invalid settings.

Using a "Augeas-type" tool might look something like the following:
$ ccs_tool
> get /files/etc/cluster/cluster.conf/1/name
wilma
> set /files/etc/cluster/cluster.conf/1/name fred
> set /files/etc/cluster/cluster.conf/1/fence_domain/post_join_delay 10
> save
> deploy
Successfully deployed configuration version 97 to cluster "fred"
> exit
$

The "save" command would validate the configuration and write out the
XML file. The "deploy" command would push the config to the other
clusters the way css_tool does today.

My understanding from the Augeas talk at the Redhat Summit is no lenses
have been created that deal with XML-formatted files and David believes
it's possible to do, he just simply hasn't tried it yet.

I absolutely do not want to install X just to use system-config-cluster.
We have clusters around the world and the latency imposed by the network
on top of the latency of forwarding X over ssh is too painful. Our half
dozen cluster admins all share this view.


5. Does it have to be LDAP? Can you make it a pluggable architecture w/
API so we can use whatever backend we want (MySQL/Postgres/SQLite, LDAP,
Active Directory, DT+BG server*, etc.)? This feature should be optional,
though. Centralized management of cluster configurations would be nice
but, as others have said, introduces possible dependencies and points of
failure.


*Duct Tape and Bubble Gum

--Jeff
Sr. Systems Engineer/Performance Engineer

OpSource, Inc.
http://www.opsource.net
"Your Success is Our Success"
  

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of David Teigland
> Sent: Monday, June 23, 2008 12:16 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] RFC: updating cluster.conf
> 
> Hi,
> 
> We're looking into how cluster.conf updates should be done in future
> versions and we'd like some feedback about how you currently 
> do this, and
> what you'd like to see.
> 
> 1. How often do you update cluster.conf?  ("Never" would be valuable
>    feedback.)
> 
> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>    add or change rgmanager settings.
> 
> 3. How do you currently update cluster.conf?  Cluster online 
> or offline?
>    Manually scp to all nodes?  ccs_tool?  conga?  What do you 
> like and not
>    like about the method you use now?
> 
> 4. How would you like to do updates to cluster.conf in the future?
>    Conga (graphical management interface)?  Command line program that
>    updates /etc/cluster/cluster.conf on all cluster nodes?
>    Manually scp to all nodes?  Other?
> 
> 5. Would you like to use an LDAP server?  All cluster nodes would read
>    cluster.conf info from the server; updates would just be made on
>    the server.
> 
> Thanks,
> Dave
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 


From lhh at redhat.com  Mon Jun 23 20:39:46 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 23 Jun 2008 16:39:46 -0400
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <1214253586.3956.15.camel@localhost.localdomain>

On Mon, 2008-06-23 at 11:16 -0500, David Teigland wrote:

> 1. How often do you update cluster.conf?  ("Never" would be valuable
>    feedback.)

Varies.  During development / testing, several (10-20) times a day,
otherwise, rarely.  One cluster's config version is in the 300s, singly
incremented; another is in the 400s, but we skipped 100 on that one ;)


> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>    add or change rgmanager settings.

Mostly rgmanager and qdiskd settings.

> 3. How do you currently update cluster.conf?  Cluster online or offline?
>    Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>    like about the method you use now?

Typically - 
  edit on one node with vim.
  ccs_tool update /etc/cluster/cluster.conf
  cman_tool version -r <new_config_version>

> 4. How would you like to do updates to cluster.conf in the future?
>    Conga (graphical management interface)?  Command line program that
>    updates /etc/cluster/cluster.conf on all cluster nodes?
>    Manually scp to all nodes?  Other?

Something similar, except sans the "cman_tool version -r" part.

I suppose one could envision something using inotify so that upgrading
was fully automatic based on timestamps (eliminating the need for the
config serial #), but there are things to think about there (e.g. if
dates aren't in sync across the cluster...).  There's a reason BIND uses
serial numbers contained within its individual zone/config files...


> 5. Would you like to use an LDAP server?  All cluster nodes would read
>    cluster.conf info from the server; updates would just be made on
>    the server.

Truthfully, I will probably not use it much. :)

?One thing that I would really disagree with: given a running a cluster,
forcing an administrator to manually distribute the configuration.  That
said, we might be able to push this back on ricci instead of using the
distribution code as it exists in ccsd today.

-- Lon


From lhh at redhat.com  Mon Jun 23 21:22:02 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 23 Jun 2008 17:22:02 -0400
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A307881516@mailxchg01.corp.opsource.net>
References: <20080623161602.GB19528@redhat.com>
	<38A48FA2F0103444906AD22E14F1B5A307881516@mailxchg01.corp.opsource.net>
Message-ID: <1214256122.3956.57.camel@localhost.localdomain>

On Mon, 2008-06-23 at 21:06 +0100, Jeff Stoner wrote:
> Using a "Augeas-type" tool might look something like the following:
> $ ccs_tool
> > get /files/etc/cluster/cluster.conf/1/name
> wilma
> > set /files/etc/cluster/cluster.conf/1/name fred
> > set /files/etc/cluster/cluster.conf/1/fence_domain/post_join_delay 10
> > save
> > deploy
> Successfully deployed configuration version 97 to cluster "fred"
> > exit
> $

You could use xmlstarlet to accomplish the above, I suppose, but it's
not a shell (rather, a command line utility).


> I absolutely do not want to install X just to use system-config-cluster.

Perfectly reasonable.


> 5. Does it have to be LDAP?

No.

> Can you make it a pluggable architecture w/
> API so we can use whatever backend we want (MySQL/Postgres/SQLite, LDAP,
> Active Directory, DT+BG server*, etc.)?

Yes, that's the plan.  The point was how much effort we should put in to
LDAP vs other things, I suspect; I could be wrong.

-- Lon


From lhh at redhat.com  Mon Jun 23 21:24:05 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 23 Jun 2008 17:24:05 -0400
Subject: [Linux-cluster] Service dependancies
In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F7790221624E@SKYHQAMX08.klasi.is>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221624E@SKYHQAMX08.klasi.is>
Message-ID: <1214256245.3956.60.camel@localhost.localdomain>

On Mon, 2008-06-23 at 16:39 +0000, Finnur ?rn Gu?mundsson - TM Software
wrote:


> I was not able to find anything in the documentation regarding this,
> but if there is, please throw it at me :)

?<service name="A" depend="service:B" />
<service name="B" />

(Off the top of my head...)

-- Lon


From lhh at redhat.com  Mon Jun 23 21:26:47 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 23 Jun 2008 17:26:47 -0400
Subject: [Linux-cluster] Service dependancies
In-Reply-To: <1214256245.3956.60.camel@localhost.localdomain>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221624E@SKYHQAMX08.klasi.is>
	<1214256245.3956.60.camel@localhost.localdomain>
Message-ID: <1214256407.3956.64.camel@localhost.localdomain>

On Mon, 2008-06-23 at 17:24 -0400, Lon Hohberger wrote:
> On Mon, 2008-06-23 at 16:39 +0000, Finnur ?rn Gu?mundsson - TM Software
> wrote:
> 
> 
> > I was not able to find anything in the documentation regarding this,
> > but if there is, please throw it at me :)
> 
> ?<service name="A" depend="service:B" />
> <service name="B" />
> 
> (Off the top of my head...)

I had that backwards...

You get the idea.  A -> B.  B fails, A stops.  B recovers, A starts.

If that doesn't do what you want, let me know and we'll work out
something using the new stuff in 5.2.

-- Lon


From sean at bruenor.org  Tue Jun 24 02:18:02 2008
From: sean at bruenor.org (Sean E. Millichamp)
Date: Mon, 23 Jun 2008 22:18:02 -0400
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <1214273882.7496.30.camel@sewt>

On Mon, 2008-06-23 at 11:16 -0500, David Teigland wrote:

> 1. How often do you update cluster.conf?  ("Never" would be valuable
>    feedback.)
?
> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>    add or change rgmanager settings.

Currently we rarely update it, only for node adds/removes.  We only use
Cluster Suite for fencing and GFS services though.  If/when we begin to
use rgmanager I suspect it will get updates frequently (possibly
multiple times a week).

> 3. How do you currently update cluster.conf?  Cluster online or offline?
>    Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>    like about the method you use now?

We update cluster.conf in a version controlled repository which we use
cfengine to deploy on the hosts.  When cfengine detects a deploy it uses
ccs_tool to update it across the cluster.

I'm not certain if we would use the same mechanism if we were using
rgmanager, but I think we would.

> 4. How would you like to do updates to cluster.conf in the future?
>    Conga (graphical management interface)?  Command line program that
>    updates /etc/cluster/cluster.conf on all cluster nodes?
>    Manually scp to all nodes?  Other?

I liked Conga, but I got impatient waiting for it.  Yanking and pasting
lines in vi ended up being much faster.  Also, it didn't fit into our
"every config in version control" concept.

We also place high value on having a single repository where we can diff
all changes made to the environment across all services (if necessary)
over a period of time with the history of who made them.  While Conga
could (and may - I don't recall) provide all of this, it is more
convenient to have one central tool for auditing changes (the VC system
logs and diffs) then have to use various auditing tools in various
different GUI interfaces (if all of the services we used even had them).

I think we're probably happy to continue doing updates as we have done
them.  Having a validation tool which we could run independently from
any cluster infrastructure for sanity/syntax validation would be nice -
such as having a DTD for cluster.conf.  There may be one, but I don't
think I've ever found it.

> 5. Would you like to use an LDAP server?  All cluster nodes would read
>    cluster.conf info from the server; updates would just be made on
>    the server.

Probably not.  I can't see what that would really get us given that we
already have a version controlled cfengine setup to help us manage the
history and distribution.

Hope that helps!

Sean


From fdinitto at redhat.com  Tue Jun 24 06:39:39 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 24 Jun 2008 08:39:39 +0200 (CEST)
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A307881516@mailxchg01.corp.opsource.net>
References: <20080623161602.GB19528@redhat.com>
	<38A48FA2F0103444906AD22E14F1B5A307881516@mailxchg01.corp.opsource.net>
Message-ID: <Pine.LNX.4.64.0806240837340.5892@trider-g7>

On Mon, 23 Jun 2008, Jeff Stoner wrote:

> 5. Does it have to be LDAP? Can you make it a pluggable architecture w/
> API so we can use whatever backend we want (MySQL/Postgres/SQLite, LDAP,
> Active Directory, DT+BG server*, etc.)? This feature should be optional,
> though. Centralized management of cluster configurations would be nice
> but, as others have said, introduces possible dependencies and points of
> failure.

cluster3 has exactly this feature implemented.

The issue is only for people to write plugins to load the configuration 
from different sources.

Fabio

--
I'm going to make him an offer he can't refuse.


From fdinitto at redhat.com  Tue Jun 24 07:12:29 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 24 Jun 2008 09:12:29 +0200 (CEST)
Subject: [Linux-cluster] Cluster 2.99.05  (development snapshot) released
Message-ID: <Pine.LNX.4.64.0806240910090.5892@trider-g7>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


The cluster team and its community are proud to announce the 6th release
from the master branch: 2.99.05.

The 2.99.XX releases are _NOT_ meant to be used for production
environments.. yet.

You have been warned: *this code will have no mercy* for your servers and
your data.

The master branch is the main development tree that receives all new
features, code, clean up and a whole brand new set of bugs,

At some point in time this code will become the 3.0 stable release.

Everybody with test equipment and time to spare, is highly encouraged to
download, install and test the 2.99 releases and more important report
problems.

In order to build the 2.99.05 release you will need:

- - openais 0.84 or higher

- - linux kernel (git snapshot or 2.6.26) from 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can 
run on 2.6.25 in compatibility mode)

NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We
are still shipping shared libraries but remember that they can change
anytime without warning. A bunch of new shared libraries have been added.

The new source tarball can be downloaded here:

   ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.05.tar.gz

In order to use GFS1, the Linux kernel requires a minimal patch:

   ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Happy clustering,
Fabio

Under the hood (from 2.99.04):

Benjamin Marzinski (2):
       gnbd-kernel: Fix receiver race
       [gnbd-kernel] bz 449812: disallow sending requests after a send has failed.

Bob Peterson (14):
       Allow keywords in block number input
       Ability to specify starting block or structure with -s
       Fix compiler warning.
       Added an optional block-size to mkfs.gfs2
       Fix build warnings in gfs2-utils.
       Fix another compiler warning for 32-bit arch.
       Fix build warnings from libgfs
       Fix gfs_debug build warning
       Ignoring gets return value in gfs_mkfs
       Fix gfs_tool build warnings
       Fix gfs_fsck build warnings
       Fix 32-bit warning in super.c.
       452004: gfs: BUG: unable to handle kernel paging request.
       savemeta was not saving gfs1 journals properly.

Christine Caulfield (2):
       [CMAN] Fix some compiler warnings on 64 bit systems
       [CMAN] use list_iterate_safe when removing nodes

David Teigland (2):
       gfs_controld: new version
       dlm_controld/gfs_controld: minor fixes

Fabio M. Di Nitto (35):
       [BUILD] Fix file permissions all around
       [MISC] Whitespace cleanup
       [MISC] Relicence rgmanager/src/resources/oracledb.sh under GPLv2+
       [GFS] Remove obsoleted gfs_edit in favour of gfs2_edit
       [MISC] Remove osl-2.1 exception from README.licence
       [MISC] Add original author for cman/qdisk/disk.c
       [MISC] Remove old copyright
       [MISC] Add another exception to COPYRIGHT
       [GFS2] Add missing include and fix build warning
       [QDISK] Add better support for Xen virtual block devices
       [CCS] Fix build warnings on sparc
       [CCS] Add missing CCSEXIT call
       [CCS] Fix priority setting
       [CCS] Fix a few logsys configuration bits
       [CCS] Remove duplicate code and make it common
       [CCS] Remove LOG_MODE_DISPLAY_DEBUG from logsys settings
       [CCS] Init logsys as early as possible
       [CCS] Shrink more common code for internal xml queries
       [CCS] Add cosmetic CCSENTER/EXIT for simple xml queries
       [CCS] Improve logsys init order
       [CCS] Fix improper log level on debugging information
       [CCS] Convert ccs logsys config to the ais format
       [QDISK] Fix build with new openais logsys
       [QDISK] Fix debug type
       [QDISK] Make get_config_data static
       [QDISK] get_config_data cleanup
       [QDISK] Remove duplicate debugging configuration
       [QDISK] Clean handling of debug envvar
       [QDISK] Init logsys later in the process
       [QDISK] Major clean up
       [BUILD] Fix new gfs_controld Makefile
       [CCS] Always check for debug setting as first thing
       [CCS] Fix debug override from command line vs config
       [QDISK] Port qdisk to the new logsys config interface
       [MISC] Logging: optimizing query sequence

James Parsons (1):
       Fix for 251358

Lon Hohberger (4):
       Fix #362351 - make fence_xvmd work in no-cluster mode
       Ancillary NOCLUSTER mode fixes for fence_xvmd
       Ancillary NOCLUSTER mode fixes for fence_xvmd
       [rgmanager] Make rgmanager check pbond links correctly

  COPYRIGHT                                          |    8 +-
  Makefile                                           |    8 +-
  README.licence                                     |    7 -
  ccs/daemon/ccsd.c                                  |   21 +-
  ccs/daemon/cnx_mgr.c                               |    8 +
  ccs/daemon/misc.c                                  |  272 ++-
  cman/daemon/ais.c                                  |    4 +-
  cman/daemon/commands.c                             |    7 +-
  cman/daemon/daemon.c                               |    4 +-
  cman/qdisk/Makefile                                |    1 -
  cman/qdisk/crc32.c                                 |    8 -
  cman/qdisk/daemon_init.c                           |    3 +-
  cman/qdisk/disk.h                                  |    1 -
  cman/qdisk/disk_util.c                             |   69 +-
  cman/qdisk/gettid.c                                |   24 -
  cman/qdisk/gettid.h                                |    7 -
  cman/qdisk/main.c                                  |  293 ++-
  cman/qdisk/mkqdisk.c                               |    2 +-
  cman/qdisk/proc.c                                  |    8 +-
  cman/qdisk/scandisk.c                              |   12 +-
  cman/qdisk/score.c                                 |   56 +-
  cman/qdisk/score.h                                 |    5 -
  dlm/doc/example.c                                  |    1 -
  dlm/libdlmcontrol/libdlmcontrol.h                  |    3 +
  dlm/libdlmcontrol/main.c                           |   12 +-
  fence/agents/egenera/fence_egenera.pl              |   22 +-
  fence/agents/xvm/fence_xvmd.c                      |   38 +-
  fence/agents/xvm/options.c                         |    1 -
  fence/agents/xvm/xml.c                             |    2 +
  fence/man/fence_xvmd.8                             |    7 +
  gfs-kernel/src/gfs/bits.c                          |    2 +-
  gfs/Makefile                                       |    6 +-
  gfs/gfs_debug/readfile.c                           |    4 +-
  gfs/gfs_edit/Makefile                              |   29 -
  gfs/gfs_edit/gfshex.c                              |  344 ---
  gfs/gfs_edit/gfshex.h                              |   10 -
  gfs/gfs_edit/hexedit.c                             |  820 -------
  gfs/gfs_edit/hexedit.h                             |  180 --
  gfs/gfs_fsck/fs_bits.c                             |   13 +-
  gfs/gfs_fsck/fs_dir.c                              |    4 +-
  gfs/gfs_fsck/fs_inode.c                            |    2 +-
  gfs/gfs_fsck/log.c                                 |    8 +-
  gfs/gfs_fsck/main.c                                |   18 +-
  gfs/gfs_fsck/pass2.c                               |    4 +-
  gfs/gfs_fsck/pass5.c                               |    4 +-
  gfs/gfs_fsck/rgrp.c                                |    4 +-
  gfs/gfs_fsck/super.c                               |   19 +-
  gfs/gfs_fsck/util.c                                |    6 +-
  gfs/gfs_mkfs/main.c                                |    4 +-
  gfs/gfs_tool/counters.c                            |    2 +-
  gfs/gfs_tool/main.c                                |    2 +-
  gfs/gfs_tool/misc.c                                |    6 +-
  gfs/gfs_tool/sb.c                                  |   11 +-
  gfs/libgfs/file.c                                  |    2 +-
  gfs/libgfs/fs_bits.c                               |    6 +-
  gfs/libgfs/fs_dir.c                                |    6 +-
  gfs/libgfs/fs_inode.c                              |    2 +-
  gfs/libgfs/log.c                                   |    8 +-
  gfs/libgfs/rgrp.c                                  |    8 +-
  gfs/libgfs/util.c                                  |    6 +-
  gfs/man/gfs_edit.8                                 |  129 +-
  gfs2/edit/hexedit.c                                |  230 +-
  gfs2/edit/hexedit.h                                |    1 +
  gfs2/edit/savemeta.c                               |   13 +
  gfs2/fsck/lost_n_found.c                           |   26 +-
  gfs2/man/gfs2_edit.8                               |    9 +-
  gfs2/man/mkfs.gfs2.8                               |   11 +-
  gfs2/mkfs/main_mkfs.c                              |   30 +-
  gfs2/quota/main.c                                  |   19 +-
  gfs2/tool/df.c                                     |    9 +-
  gnbd-kernel/src/gnbd.c                             |   60 +-
  gnbd-kernel/src/gnbd.h                             |    3 +
  group/dlm_controld/main.c                          |    2 +-
  group/gfs_controld/Makefile                        |   11 +-
  group/gfs_controld/cpg-new.c                       | 2565 ++++++++++++++++++++
  group/gfs_controld/cpg-old.c                       |  261 +--
  group/gfs_controld/cpg-old.h                       |    2 +-
  group/gfs_controld/crc.c                           |   84 +
  group/gfs_controld/gfs_daemon.h                    |   74 +-
  group/gfs_controld/main.c                          |  262 ++-
  group/gfs_controld/plock.c                         |    6 +-
  group/gfs_controld/util.c                          |   29 +-
  make/defines.mk.input                              |    4 +-
  make/fencebuild.mk                                 |    6 +-
  make/install.mk                                    |   14 +-
  make/libs.mk                                       |   32 +-
  make/man.mk                                        |   10 +-
  rgmanager/src/resources/ip.sh                      |   11 +-
  rgmanager/src/resources/oracledb.sh                |    9 -
  89 files changed, 3884 insertions(+), 2492 deletions(-)

- --
I'm going to make him an offer he can't refuse.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iQIVAwUBSGCeZAgUGcMLQ3qJAQJtlBAAsZM784Qk7L2GZY8+vQw/tvy2LJZPCrQT
RDDSpAbgZ4EhLvV3z2Ks5xx2BJLXa7nF8xcv6ZNwfruRXr1vP2kF5nd9O8WOgfHz
4O+BikR9wvyduiKuW9lzj26xbCnDZ3XuRsu1fG521mLF1jSSv59OXYa4bKeO76Cg
yiAmCkg7HvjWVksRwexgJCUVMRw5SzkluRgRRZsqxWMHE18s+Wps5aofvOTxCtlm
KrL6lcERN3pG4DMh870YUtti4/e1htrLO7ctL2LkyqueKN66zQpOZDsEU1dXJipu
x548RB9PiWAqNgKAgHCkhbRTFqtRDDwO0vjfyVI4rgfQ8RxofwidBI/X44RwxXbF
rtGYT2zyCNdDGO2qWOWW8IWZ7exu7p7ilk1GYlxbcuJxp9yYVGUQ6p+Mn/4ZKksS
DDFbjZo/HSRTE889M5ansdT42yn8EQW4Gcpf5w2PHXRdo0+xFuPpc2PRJTzzgTkg
hvViCaB2+V3Sqtva6++2r7zsEL3Fb+JDtw37CLxQffuBs3mEAQMEDr5P8QXbtLky
tTRRUu3iS+TTOfk0hm1XO0bb2kxDPCt34MomN0H/SM3BICo5BJ10ohBx74xdrvRJ
+uRBUqlyh3/gGsRkM1BnkxQvDBuFeBYuUasVfjbQYWixKgOqt2I3172J8SaxKiyp
QS3rAcuV96g=
=U6m9
-----END PGP SIGNATURE-----


From grimme at atix.de  Tue Jun 24 07:53:56 2008
From: grimme at atix.de (Marc Grimme)
Date: Tue, 24 Jun 2008 09:53:56 +0200
Subject: [Linux-cluster] Spectator mount option
Message-ID: <200806240953.56658.grimme@atix.de>

Hello,
we are currently testing the specator mount option for giving nodes readonly 
access to a gfs filesystem.

One thing we found out is that any node having mounted the filesystem with 
spectator mount option cannot do recovery when a node in the cluster fails. 
That means we need at least 2 rw-nodes. It's clear when I keep in mind that 
the node has no rw-access to the journal and therefor cannot do the journal 
replay. But it is not mentioned anywhere.

Could you please explain the ideas and other "unnormal" behaviors coming along 
with the spectator mount-options.

And are there any advantages from it except the "having no journal"?

-- 
Gruss / Regards,

Marc Grimme
http://www.atix.de/               http://www.open-sharedroot.org/


From fdinitto at redhat.com  Tue Jun 24 08:15:38 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 24 Jun 2008 10:15:38 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
	<Pine.LNX.4.64.0806201041050.5892@trider-g7>
	<a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0806241013390.5892@trider-g7>

On Fri, 20 Jun 2008, Federico Simoncelli wrote:

> On Fri, Jun 20, 2008 at 10:42 AM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
>>> Do you think we should file a bug and continue the discussion there?
>>
>> Sure. that's also fine by me.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=452234

thanks.

>
> As I wrote in the bug description... the solution I proposed (adding a
> boot parameter) will also make impossible to start the services
> manually later on (until next reboot). This might be why Jozsef was
> using the check on a file (which can be easily removed). Jozsef can
> you confirm this?
> Do you think we can find a way to remove this side effect?
> Is there a way to detect if the init script was run by the init binary
> or manually from the shell?

I am pretty sure there is a way to know if the script is invoked by init 
or not and believe it might be different from distro to distro, so this 
needs to be investigated properly.

> Should we add an other init command such as "force-start" to override
> the boot parameter check?

Maybe...

> Any better idea is more than welcome.

I'd really like to see you guys finding the right way to fullfil your 
needs here. I am absolutely open to solutions that will make your life 
easier.

Fabio

--
I'm going to make him an offer he can't refuse.


From kadlec at sunserv.kfki.hu  Tue Jun 24 08:28:51 2008
From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef)
Date: Tue, 24 Jun 2008 10:28:51 +0200 (CEST)
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
	<Pine.LNX.4.64.0806201041050.5892@trider-g7>
	<a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0806241014230.10802@lxserv1.kfki.hu>

On Fri, 20 Jun 2008, Federico Simoncelli wrote:

> On Fri, Jun 20, 2008 at 10:42 AM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
> >> Do you think we should file a bug and continue the discussion there?
> >
> > Sure. that's also fine by me.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=452234
> 
> As I wrote in the bug description... the solution I proposed (adding a
> boot parameter) will also make impossible to start the services
> manually later on (until next reboot). This might be why Jozsef was
> using the check on a file (which can be easily removed). Jozsef can
> you confirm this?

Yes, that was the easiest way to make possible to (re)start services 
without the need to reboot.

> Do you think we can find a way to remove this side effect?

It could be worked around by adding new command options to the 
cman/gfs/... init scripts:

case "$1" in
   start)
	<check boot option>
	# fall through
   force-start)
	# real startup

> Is there a way to detect if the init script was run by the init binary
> or manually from the shell?

I don't know any reliable way to detect it - of course it might be 
possible.

> Should we add an other init command such as "force-start" to override
> the boot parameter check?

I think the cleanest solution would be to add new command options to the 
init scripts. 

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary


From sunhux at gmail.com  Tue Jun 24 08:38:35 2008
From: sunhux at gmail.com (sunhux G)
Date: Tue, 24 Jun 2008 16:38:35 +0800
Subject: [Linux-cluster] live & standby (primary & secondary partitions)
	in "multipath -ll"
In-Reply-To: <485FD501.3030008@gmail.com>
References: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>
	<485FD501.3030008@gmail.com>
Message-ID: <60f08e700806240138l5f491b29vb94963153746731e@mail.gmail.com>

Thanks Wendy, that answered my original question.

I should have rephrased my question :

I received an alert email from Filer1 :
"autosupport.doit             FCP PARTNER PATH MISCONFIGURED"

when our outsourced DBA built the Oracle ASM & ocfs2 partitions on
/dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1 & /dev/sdg1 & I suspected
this is related to building the partitions on the "enabled" (ie standby)
partitions as shown by "multipath -ll".  After we got the DBA to wipe
out & rebuild the ASM/ocfs2 on those partitions that are listed under
"prio 8" by "multipath -ll" (see lines indicated by ** in my 1st email),
the error of "FCP partner path misconfigured" cleared.

Below is the "sanlun lun show ...  (detail)" command that I've captured
about 1 month ago & all the /dev/sd* practically comes under Filer2.
So, now I'm not too sure what caused the "misconfiguration" error.

All the /dev/sd* that the DBA used are listed by the sanlun command,
so it's really puzzling.

Error is now cleared but we want to know what triggered the error so
as to prevent future similar problems.


Thanks
Sun


***************** "sanlun lun show ..."  output :  ***********************


  filer:                  lun-pathname                 device filename
adapter
 protocol          lun size         lun state
 FILER2:  /vol/filer2_vol1/landnetrac/racasm1.lun   /dev/sdl         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3lxmi
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun9.lun   /dev/sdu         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lMk6
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm2.lun   /dev/sdy         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3p4PU
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun11.lun  /dev/sdv         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3pkGp
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm1.lun   /dev/sdx         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3lxmi
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun9.lun   /dev/sdo         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lMk6
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun10.lun  /dev/sdb         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lQLu
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm1.lun   /dev/sdr         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3lxmi
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun9.lun   /dev/sdi         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lMk6
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm2.lun   /dev/sds         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3p4PU
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm2.lun   /dev/sdm         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3p4PU
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun10.lun  /dev/sdt         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lQLu
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun12.lun  /dev/sde         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3poSl
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm2.lun   /dev/sdg         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3p4PU
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun10.lun  /dev/sdn         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lQLu
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun11.lun  /dev/sdj         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3pkGp
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun12.lun  /dev/sdq         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3poSl
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun11.lun  /dev/sdd         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3pkGp
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/filer2_vol1/landnetrac/racasm1.lun   /dev/sdf         host8
  FCP            5g (5368709120)     GOOD
        Serial number: VrD9c4I3lxmi
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:file2_vol1   FSID:0x2e012126
        Filer qtree name:/vol/filer2_vol1/landnetrac   ID:0x9000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun9.lun   /dev/sdc         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lMk6
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098183b121b4

        Filer adapter name: v.9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun12.lun  /dev/sdw         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3poSl
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098293b121b4

        Filer adapter name: 9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun12.lun  /dev/sdk         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3poSl
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN1/landnetrac/raclun10.lun  /dev/sdh         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3lQLu
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098283b121b4

        Filer adapter name: v.9b
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN1   FSID:0x11c04b3
        Filer qtree name:/vol/File2_ag4_LN1/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
 FILER2:  /vol/File2_ag4_LN2/landnetrac/raclun11.lun  /dev/sdp         host8
  FCP         40.0g (42953867264)    GOOD
        Serial number: VrD9c4I3pkGp
        Filer FCP nodename:500a098083b121b4  Filer FCP
portname:500a098193b121b4

        Filer adapter name: 9a
        Filer IP address:       10.51.1.3
        Filer volume name:File2_ag4_LN2   FSID:0x21c1704
        Filer qtree name:/vol/File2_ag4_LN2/landnetrac   ID:0x1000000
        Filer snapshot name:?   ID:0x0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080624/a0897303/attachment.htm>

From denisb+gmane at gmail.com  Tue Jun 24 08:41:14 2008
From: denisb+gmane at gmail.com (denis)
Date: Tue, 24 Jun 2008 10:41:14 +0200
Subject: [Linux-cluster] Re: RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <g3qbva$n7n$1@ger.gmane.org>

David Teigland wrote:
> Hi,
> 
> We're looking into how cluster.conf updates should be done in future
> versions and we'd like some feedback about how you currently do this, and
> what you'd like to see.
> 
> 1. How often do you update cluster.conf?  ("Never" would be valuable
>    feedback.)

After stabilizing the config, ~never, but probably will be something 
like 1 per 6months. Prior to setting the system in production, very 
frequent updates (several per day).

> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>    add or change rgmanager settings.

During test, change fencing settings, modify services / hosts.

> 3. How do you currently update cluster.conf?  Cluster online or offline?
>    Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>    like about the method you use now?

Cluster online:
  1. Edit cluster.conf.new in Puppet working copy
  2. Commit to Puppet SVN
  3. "puppetd -t" on all nodes or verify that cluster.conf.new is 
updated on all nodes if Puppet is running automatically.
  4. "ccs_tool update /etc/cluster/cluster.conf.new"

Like about it :
  * Centralized storage of all configuration in Puppet
  * Full control of update process
  * Revision control of configuration makes it easier to debug errors 
due to changes

Areas that could be improved :
  * Very manual process

> 4. How would you like to do updates to cluster.conf in the future?
>    Conga (graphical management interface)?  Command line program that
>    updates /etc/cluster/cluster.conf on all cluster nodes?
>    Manually scp to all nodes?  Other?

Would be great if I could have some sort of automated loading of 
configuration changes when all nodes have the correct version of the 
config.

> 5. Would you like to use an LDAP server?  All cluster nodes would read
>    cluster.conf info from the server; updates would just be made on
>    the server.

Maybe. Not sure if the added complexity would be worthwhile in this 
small setup. For larger clusters I understand the benefits.

Should at least be an optional way of doing things imho.

Regards
--
DenisB


From gpbuono at gmail.com  Tue Jun 24 11:45:39 2008
From: gpbuono at gmail.com (Gian Paolo Buono)
Date: Tue, 24 Jun 2008 13:45:39 +0200
Subject: [Linux-cluster] can't communicate with fenced -1
Message-ID: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>

Hi,

We have two RHEL5.1 boxes installed sharing a
single iscsi emc2 SAN, whitout fence devices. System is configured
as a high-availability system of xen guest.

One of the most repeating problems are fence_tool related.

  # service cman start
  Starting cluster:
     Loading modules... done
     Mounting configfs... done
     Starting ccsd... done
     Starting cman... done
     Starting daemons... done
     Starting fencing... fence_tool: can't communicate with fenced -1

  # fenced -D
  1204556546 cman_init error 0 111

  # clustat
  CMAN is not running.

  # cman_tool join

  # clustat
  msg_open: Connection refused
  Member Status: Quorate
    Member Name                        ID   Status
    ------ ----                        ---- ------
    yoda1                             1 Online, Local
    yoda2                             2 Offline

Sometimes this problem gets solved if the two machines are rebooted at
the same time. But in the current HA configuration, I cannot guarantee
two systems will be rebooted at the same time for every problem we
face. This is my config file:

###################################cluster.conf####################################

<?xml version="1.0"?>
<cluster alias="yoda-cl" config_version="2" name="yoda-cl">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="yoda2" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="yoda1" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <fencedevices/>
</cluster>
###################################cluster.conf####################################
Regards.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080624/1fa16f38/attachment.htm>

From fdinitto at redhat.com  Tue Jun 24 12:15:17 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 24 Jun 2008 14:15:17 +0200 (CEST)
Subject: [Linux-cluster] RFC: updating cluster.conf
In-Reply-To: <20080623161602.GB19528@redhat.com>
References: <20080623161602.GB19528@redhat.com>
Message-ID: <Pine.LNX.4.64.0806241409440.5892@trider-g7>

On Mon, 23 Jun 2008, David Teigland wrote:

> Hi,
>
> We're looking into how cluster.conf updates should be done in future
> versions and we'd like some feedback about how you currently do this, and
> what you'd like to see.
>
> 1. How often do you update cluster.conf?  ("Never" would be valuable
>   feedback.)

Production: rarely if nodes are behaving properly and no switch to other 
nodes is required

as developer, you know better than me.. virtually every 5 minutes :)

>
> 2. What changes do you make?  e.g. add nodes, change fencing settings,
>   add or change rgmanager settings.

Production: switch nodes, add/modify services.

> 3. How do you currently update cluster.conf?  Cluster online or offline?
>   Manually scp to all nodes?  ccs_tool?  conga?  What do you like and not
>   like about the method you use now?

Production: manually edit cluster.conf, ccs_tool to propagate across the 
nodes. I like it because i can do the changes in one place and they are 
propagated all over the cluster in one command.

as developer, i don't have a standard way of doing it. I often use scp 
wrapper scripts since the configuration changes are done offline and 
involves several nodes to be powered off or not available.

> 4. How would you like to do updates to cluster.conf in the future?
>   Conga (graphical management interface)?  Command line program that
>   updates /etc/cluster/cluster.conf on all cluster nodes?
>   Manually scp to all nodes?  Other?

For production I generally like the idea of changes in one place and then 
automatic propagation and this generally fits also #5

>
> 5. Would you like to use an LDAP server?  All cluster nodes would read
>   cluster.conf info from the server; updates would just be made on
>   the server.

Yes. the main idea is to fire up a node without having to worry about a 
local copy of a file and get all the info down the pipe.

_theoretically_ even a wget http://foo.bar.com/cluster.conf would do.

Fabio

--
I'm going to make him an offer he can't refuse.


From s.wendy.cheng at gmail.com  Tue Jun 24 14:46:48 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 24 Jun 2008 10:46:48 -0400
Subject: [Linux-cluster] live & standby (primary & secondary partitions)
	in "multipath -ll"
In-Reply-To: <60f08e700806240138l5f491b29vb94963153746731e@mail.gmail.com>
References: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>	<485FD501.3030008@gmail.com>
	<60f08e700806240138l5f491b29vb94963153746731e@mail.gmail.com>
Message-ID: <486108D8.9030503@gmail.com>

sunhux G wrote:
>  
> Thanks Wendy, that answered my original question.
>  
> I should have rephrased my question :
>  
> I received an alert email from Filer1 :
> "autosupport.doit             FCP PARTNER PATH MISCONFIGURED"
>  
> when our outsourced DBA built the Oracle ASM & ocfs2 partitions on
> /dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1 & /dev/sdg1 & I suspected
> this is related to building the partitions on the "enabled" (ie standby)
> partitions as shown by "multipath -ll".  After we got the DBA to wipe
> out & rebuild the ASM/ocfs2 on those partitions that are listed under
> "prio 8" by "multipath -ll" (see lines indicated by ** in my 1st email),
> the error of "FCP partner path misconfigured" cleared.
>  
Funny that you got Netapp cluster questions answered on Linux-cluster 
mailing list :) ...

Anyway, assume your filers are on Data Ontap 10.x releases and they are 
clustered ? The action you'd taken *is* correct. For details, check out: 
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb30541 where 
the issue is documented.  Let me know if you can't access to the KB 
article.

"Hosts (linux) should, under normal circumstances, only access LUNs 
through ports on the cluster node which hosts the LUN. I/O paths that 
utilize the ports of the cluster node that host the LUN are referred to 
as primary paths or optimized paths. I/O paths that utilize the partner 
cluster node are known as secondary paths, partner paths or 
non-optimized paths. A LUN should only be accessed through the partner 
cluster node when the primary ports are unavailable. I/O access to LUNs 
using a secondary path indicates one or both of the following 
conditions: the primary path(s) between host and storage controller have 
failed, or host MPIO software is not configured correctly. These 
conditions indicate that the redundancy and performance of the SAN has 
been compromised. Corrective action should be taken immediately to 
restore primary paths to the storage controllers. "

-- Wendy


From kanderso at redhat.com  Tue Jun 24 14:48:00 2008
From: kanderso at redhat.com (Kevin Anderson)
Date: Tue, 24 Jun 2008 09:48:00 -0500
Subject: [Linux-cluster] Spectator mount option
In-Reply-To: <200806240953.56658.grimme@atix.de>
References: <200806240953.56658.grimme@atix.de>
Message-ID: <1214318880.2862.4.camel@localhost.localdomain>

On Tue, 2008-06-24 at 09:53 +0200, Marc Grimme wrote:
> Hello,
> we are currently testing the specator mount option for giving nodes readonly 
> access to a gfs filesystem.
> 
> One thing we found out is that any node having mounted the filesystem with 
> spectator mount option cannot do recovery when a node in the cluster fails. 
> That means we need at least 2 rw-nodes. It's clear when I keep in mind that 
> the node has no rw-access to the journal and therefor cannot do the journal 
> replay. But it is not mentioned anywhere.
> 
> Could you please explain the ideas and other "unnormal" behaviors coming along 
> with the spectator mount-options.
> 
> And are there any advantages from it except the "having no journal"?
> 
Marc,

Funny you should bring this up.  We were just discussing a good use case
for the spectator mode and the advantages of it.  Dave and Chrissie are
looking at putting together a document on it, maybe you can help?

Some of items we are discussing:
- Central central cluster nodes that provide quorum votes
- Spectator nodes don't have quorum counts
- DLM balancing such that all lock hosting is done on the central
cluster nodes
- No fencing required for spectator nodes

Thanks
Kevin


From fog at t.is  Tue Jun 24 15:40:16 2008
From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=)
Date: Tue, 24 Jun 2008 15:40:16 -0000
Subject: [Linux-cluster] Oracle 10G resource agent - status polling
Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is>

Hi,

 
I'm in the middle of configuring a HA cluster for Oracle, and everything is working as planned....failover etc. However there is one thing does bug me a bit, and that is:

 
If i start the database with the cluster software (clusvcadm -e oracle10), let it start and log into the database and run shutdown abort (or just kill it ...whatever) the cluster software does not seem to notice this until after around 5 minutes.

 
Is there any way to make the status checks more frequent ?

 
K?r kve?ja / Best Regards,

Finnur ?rn Gu?mundsson
Network Engineer - Network Operations
fog at t.is <mailto:fog at t.is> 

TM Software
Ur?arhvarf 6, IS-203 K?pavogur, Iceland
Tel: +354 545 3000 - fax +354 545 3610
www.tm-software.is <http://www.tm-software.is/> 

This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer <http://www.tm-software.is/disclaimer>  

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080624/96947e19/attachment.htm>

From federico.simoncelli at gmail.com  Tue Jun 24 15:59:07 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Tue, 24 Jun 2008 17:59:07 +0200
Subject: [Linux-cluster] Disabling cman at boot
In-Reply-To: <Pine.LNX.4.64.0806241014230.10802@lxserv1.kfki.hu>
References: <Pine.LNX.4.64.0806181023120.23043@lxserv0.kfki.hu>
	<a01fe36d0806180655u6f9621bcg5a0f56b3d975e627@mail.gmail.com>
	<Pine.LNX.4.64.0806190933400.5892@trider-g7>
	<Pine.LNX.4.64.0806191006080.16390@lxserv0.kfki.hu>
	<Pine.LNX.4.64.0806191418010.5892@trider-g7>
	<a01fe36d0806191410x188f8b71nbd7c9858993756c7@mail.gmail.com>
	<Pine.LNX.4.64.0806201041050.5892@trider-g7>
	<a01fe36d0806200634l3f50edc6n1f351befd663f95f@mail.gmail.com>
	<Pine.LNX.4.64.0806241014230.10802@lxserv1.kfki.hu>
Message-ID: <a01fe36d0806240859q3361980cg4232365d83e0b869@mail.gmail.com>

On Tue, Jun 24, 2008 at 10:28 AM, Kadlecsik Jozsef
<kadlec at sunserv.kfki.hu> wrote:
> It could be worked around by adding new command options to the
> cman/gfs/... init scripts:
>
> case "$1" in
>   start)
>        <check boot option>
>        # fall through
>   force-start)
>        # real startup
>

Yes that's what I'm using even though I think you can't fall through in bash.
It looks like the best solution for now; the problem is that you will
always have to use "force-start" until you reboot.
What about something like the rc.sysinit which asks to press "I" for
interactive startup?
Can we add something like that in the cman service?
We could ask "Press S to skip" and wait a very short time for an interaction.
I'm not sure if I like this either. Comments?

-- 
Federico.


From teigland at redhat.com  Tue Jun 24 16:38:41 2008
From: teigland at redhat.com (David Teigland)
Date: Tue, 24 Jun 2008 11:38:41 -0500
Subject: [Linux-cluster] Spectator mount option
In-Reply-To: <200806240953.56658.grimme@atix.de>
References: <200806240953.56658.grimme@atix.de>
Message-ID: <20080624163841.GB19067@redhat.com>

On Tue, Jun 24, 2008 at 09:53:56AM +0200, Marc Grimme wrote:
> Hello,
> we are currently testing the specator mount option for giving nodes readonly 
> access to a gfs filesystem.
> 
> One thing we found out is that any node having mounted the filesystem with 
> spectator mount option cannot do recovery when a node in the cluster fails. 
> That means we need at least 2 rw-nodes. It's clear when I keep in mind that 
> the node has no rw-access to the journal and therefor cannot do the journal 
> replay. But it is not mentioned anywhere.
> 
> Could you please explain the ideas and other "unnormal" behaviors coming
> along with the spectator mount-options.
> 
> And are there any advantages from it except the "having no journal"?

It's not mentioned much because it's never crossed the grey line of being
a promoted or "supportable" feature.  Among the reasons are:

- The use case(s) or benefits have never been clearly understood or stated.
  What exactly are the spectator features?  (see below)
  When should you use the spectator features, and why?
  Are the benefits great enough to justify all the work/testing?

- None of the spectator features have been tested.  QE would need to
  develop tests for them, run them, and we'd need to fix the problems
  that fall out.

"Spectator features" refers to more than the spectator mount option in
gfs.  There are three non-standard configuration modes that could be used
together (although they can be used independently, too):

1. The spectator mount option in gfs.  With this option, gfs will never
   write to the fs.  It won't do journal recovery, and won't allow
   remount rw.  The main benefit of this is that the node does not need
   to be fenced if it fails, so the node can mount without joining the
   fence domain.

   You point out some of the thorny problems with this option (along with
   the ro mount option).  What happens when the last rw node fails,
   leaving only spectators who can't recover the journal, and other
   similar scenarios?  gfs_controld has code to handle these cases,
   but it would require serious testing/validation.

2. Quorum votes in cman.  It may make sense in some environments for a node
   to not contribute to quorum, either positively or negatively, of course.
   <clusternode name="foo" nodeid="1" votes="0"/>

3. Resource mastering in dlm.  Nodes can be configured to never master
   any dlm resources, which means there's less disruption in the dlm when
   they join/leave the lockspace.  See this bug for more details:
   https://bugzilla.redhat.com/show_bug.cgi?id=206336

We'd like to understand specific use cases people have where these things
would provide real advantages.  We need to be able to advise people when,
why and how to use these settings, and we need to be able to test them as
they'd be used.

Thanks,
Dave


From lhh at redhat.com  Tue Jun 24 17:03:53 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 24 Jun 2008 13:03:53 -0400
Subject: [Linux-cluster] Oracle 10G resource agent - status polling
In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is>
Message-ID: <1214327033.3143.9.camel@localhost.localdomain>

On Tue, 2008-06-24 at 15:40 +0000, Finnur ?rn Gu?mundsson - TM Software
wrote:
> Hi,

> I?m in the middle of configuring a HA cluster for Oracle, and
> everything is working as planned....failover etc. However there is one
> thing does bug me a bit, and that is:
> 
>  
> 
> If i start the database with the cluster software (clusvcadm ?e
> oracle10), let it start and log into the database and run shutdown
> abort (or just kill it ...whatever) the cluster software does not seem
> to notice this until after around 5 minutes.

(1) Is this with it fully started?  The 'status' check will wait until
the 'start' is complete - this can take several minutes.

(2) [likely the problem] The default check interval in the oracledb.sh
resource agent is 5 minutes.  That's probably a bit long, even for a
heavily loaded Oracle instance.


You have two options -

 - edit /usr/share/cluster/oracledb.sh and change the 'status' and
'monitor' action intervals (well, status only; we don't use monitor).


 - add a special tag below the resource agent in cluster.conf:
   <action name="status" depth="*" interval="1m"/>

   (This overrides the policies in /usr/share/cluster/* on a
    per-instance basis)
 
?Note that if you set it too fast such that the previous status check
hasn't completed by the time the new status check is supposed to occur,
the new status check will get thrown away.

-- Lon


From s.wendy.cheng at gmail.com  Tue Jun 24 17:30:26 2008
From: s.wendy.cheng at gmail.com (Wendy Cheng)
Date: Tue, 24 Jun 2008 13:30:26 -0400
Subject: [Linux-cluster] live & standby (primary & secondary partitions)
	in "multipath -ll"
In-Reply-To: <486108D8.9030503@gmail.com>
References: <60f08e700806230038j1bdabe9eue6ffe5c9db2089f7@mail.gmail.com>	<485FD501.3030008@gmail.com>
	<60f08e700806240138l5f491b29vb94963153746731e@mail.gmail.com>
	<486108D8.9030503@gmail.com>
Message-ID: <48612F32.8050003@gmail.com>


>
> Anyway, assume your filers are on Data Ontap 10.x releases and they 
> are clustered ? 

Sorry, didn't read the rest of the post until now and forgot that 10.x 
releases out in the field do not support FCP protocol. So apparently you 
are on "7.x" releases. The KnowledgeBase article that I referred to is 
*still* valid though.

If the "sanlun lun show" output are all from Filer2, than most likely 
your filers admin had assigned all the disks to Filer2, even Filer1 can 
see them. Check with your filer admin if you want to load balancing the 
filers.

-- Wendy


From fog at t.is  Tue Jun 24 17:35:22 2008
From: fog at t.is (=?UTF-8?B?RmlubnVyIMOWcm4gR3XDsG11bmRzc29uIC0gVE0gU29mdA==?= =?UTF-8?B?d2FyZQ==?=)
Date: Tue, 24 Jun 2008 17:35:22 -0000
Subject: [Linux-cluster] Oracle 10G resource agent - status polling
In-Reply-To: <1214327033.3143.9.camel@localhost.localdomain>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is>
	<1214327033.3143.9.camel@localhost.localdomain>
Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F7790221644D@SKYHQAMX08.klasi.is>

Hi,

(Sorry for the top posting, i blame my email client!)

The service is started yes.

I was going for the cluster.conf way, IE, setting the status check to 1 minute.

I can see it should not be a problem with a script resource but since i use the Oracle Resource Agent it kinda runs around in my head.

Here is what i tried:

                <service autostart="1" exclusive="0" name="oracle1" recovery="relocate">
                        <ip __independent_subtree="1" ref="10.x.x.x">
                                <fs ref="ora1-data"/>
                                <fs ref="ora1-archlogs"/>
                                <oracledb home="/u01/app/oracle" name="oracle1" type="10g" user="oracle"/>
                                <action name="status" depth="*" interval="1m"/>
                        </ip>
                </service>

However this does not seem to work, but i am pretty sure it is just because the oracle agent is not configured as a "resource" ....Am i right?

Thanks,
Finnur

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: 24. j?n? 2008 17:04
To: linux clustering
Subject: Re: [Linux-cluster] Oracle 10G resource agent - status polling

On Tue, 2008-06-24 at 15:40 +0000, Finnur ?rn Gu?mundsson - TM Software
wrote:
> Hi,

> I?m in the middle of configuring a HA cluster for Oracle, and
> everything is working as planned....failover etc. However there is one
> thing does bug me a bit, and that is:
> 
>  
> 
> If i start the database with the cluster software (clusvcadm ?e
> oracle10), let it start and log into the database and run shutdown
> abort (or just kill it ...whatever) the cluster software does not seem
> to notice this until after around 5 minutes.

(1) Is this with it fully started?  The 'status' check will wait until
the 'start' is complete - this can take several minutes.

(2) [likely the problem] The default check interval in the oracledb.sh
resource agent is 5 minutes.  That's probably a bit long, even for a
heavily loaded Oracle instance.


You have two options -

 - edit /usr/share/cluster/oracledb.sh and change the 'status' and
'monitor' action intervals (well, status only; we don't use monitor).


 - add a special tag below the resource agent in cluster.conf:
   <action name="status" depth="*" interval="1m"/>

   (This overrides the policies in /usr/share/cluster/* on a
    per-instance basis)
 
?Note that if you set it too fast such that the previous status check
hasn't completed by the time the new status check is supposed to occur,
the new status check will get thrown away.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From grimme at atix.de  Tue Jun 24 19:54:48 2008
From: grimme at atix.de (Marc Grimme)
Date: Tue, 24 Jun 2008 21:54:48 +0200
Subject: [Linux-cluster] Spectator mount option
In-Reply-To: <20080624163841.GB19067@redhat.com>
References: <200806240953.56658.grimme@atix.de>
	<20080624163841.GB19067@redhat.com>
Message-ID: <200806242154.49147.grimme@atix.de>

Ok, got it?!
So my understanding is as follows.

Except from the spectator mount option not being adviceable for production use 
the *features* are:

* the spectator node does not get a journal
* the spectator node cannot replay a journal and therefore cannot be involved 
in fencing. Or the other way round it is not involved in fencing
* the spectator node must not have votes. I can but it mustn't.
* the spectator node is less involved in locking. It does not hold locks?! 
Does it? Or does it only have read-locks? That is not clear to me. See below.

On spectator and locking:
Everything clear except from the locking topic. Why do you think the spectator 
node is less involved in locking. Doesn't it have to request a readlock for 
any file it wants to read. And as it cannot (if so?!) hold the lock itself it 
has to ask for it. Isn't this from the network point of view more network 
traffic then if it would "master/cache" the lock?
The only advantage would be that no or a less loaded gfs_scand would be 
running?! I have to admit that this wouldn't be to bad.

If this is so (correct me if I'm wrong) for me only one usecase would be the 
use of old less powerfull nodes that only need ro access to the fs.
As a drawback the symmetry of the cluster would go away and one would end with 
type a and type b nodes. I not yet sure of if I like it or not or if it is a 
very common usecase.

But isn't another usecase more likely. The readonly attempt. That means 
readonly access to the filesystem and everything else (fencing, journal 
replay and locking) would be running as before? But one could be sure that 
the ro-nodes would only request ro-locks but would be holding themselves and 
therefore could respond more quickly and perhaps some other tunings for 
readonly access to the gfs.

Because normally the usecase for readonly access would be (wouldn't it) that 
there are some nodes changing data (more or less frequently but only in few 
cases continuously) and others reading/serving the data. Those readonly nodes 
should be able to access the data very quickly and should respond to request 
instantaneous and therefore should be more powerfull then the rw-nodes. 

Could this usecase benefit from the spectator mount options? Or should this 
usecase not be build with spectator mountoptions. Or wouldn't it be better to 
reduce the demote_secs use glock_purging and if need be increase the size of 
the hashtables and use rw as mountoption?

Regards Marc.
On Tuesday 24 June 2008 18:38:41 David Teigland wrote:
> On Tue, Jun 24, 2008 at 09:53:56AM +0200, Marc Grimme wrote:
> > Hello,
> > we are currently testing the specator mount option for giving nodes
> > readonly access to a gfs filesystem.
> >
> > One thing we found out is that any node having mounted the filesystem
> > with spectator mount option cannot do recovery when a node in the cluster
> > fails. That means we need at least 2 rw-nodes. It's clear when I keep in
> > mind that the node has no rw-access to the journal and therefor cannot do
> > the journal replay. But it is not mentioned anywhere.
> >
> > Could you please explain the ideas and other "unnormal" behaviors coming
> > along with the spectator mount-options.
> >
> > And are there any advantages from it except the "having no journal"?
>
> It's not mentioned much because it's never crossed the grey line of being
> a promoted or "supportable" feature.  Among the reasons are:
>
> - The use case(s) or benefits have never been clearly understood or stated.
>   What exactly are the spectator features?  (see below)
>   When should you use the spectator features, and why?
>   Are the benefits great enough to justify all the work/testing?
>
> - None of the spectator features have been tested.  QE would need to
>   develop tests for them, run them, and we'd need to fix the problems
>   that fall out.
>
> "Spectator features" refers to more than the spectator mount option in
> gfs.  There are three non-standard configuration modes that could be used
> together (although they can be used independently, too):
>
> 1. The spectator mount option in gfs.  With this option, gfs will never
>    write to the fs.  It won't do journal recovery, and won't allow
>    remount rw.  The main benefit of this is that the node does not need
>    to be fenced if it fails, so the node can mount without joining the
>    fence domain.
>
>    You point out some of the thorny problems with this option (along with
>    the ro mount option).  What happens when the last rw node fails,
>    leaving only spectators who can't recover the journal, and other
>    similar scenarios?  gfs_controld has code to handle these cases,
>    but it would require serious testing/validation.
>
> 2. Quorum votes in cman.  It may make sense in some environments for a node
>    to not contribute to quorum, either positively or negatively, of course.
>    <clusternode name="foo" nodeid="1" votes="0"/>
>
> 3. Resource mastering in dlm.  Nodes can be configured to never master
>    any dlm resources, which means there's less disruption in the dlm when
>    they join/leave the lockspace.  See this bug for more details:
>    https://bugzilla.redhat.com/show_bug.cgi?id=206336
>
> We'd like to understand specific use cases people have where these things
> would provide real advantages.  We need to be able to advise people when,
> why and how to use these settings, and we need to be able to test them as
> they'd be used.
>
> Thanks,
> Dave


-- 
Gruss / Regards,

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/               http://www.open-sharedroot.org/

**
ATIX Informationstechnologie und Consulting AG
Einsteinstr. 10 
85716 Unterschleissheim
Deutschland/Germany

Phone: +49-89 452 3538-0
Fax:   +49-89 990 1766-0

Registergericht: Amtsgericht Muenchen
Registernummer: HRB 168930
USt.-Id.: DE209485962

Vorstand: 
Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.)

Vorsitzender des Aufsichtsrats:
Dr. Martin Buss


From lhh at redhat.com  Tue Jun 24 20:04:46 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 24 Jun 2008 16:04:46 -0400
Subject: [Linux-cluster] Oracle 10G resource agent - status polling
In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F7790221644D@SKYHQAMX08.klasi.is>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is>
	<1214327033.3143.9.camel@localhost.localdomain>
	<3DDA6E3E456E144DA3BB0A62A7F7F7790221644D@SKYHQAMX08.klasi.is>
Message-ID: <1214337886.3143.19.camel@localhost.localdomain>

On Tue, 2008-06-24 at 17:35 +0000, Finnur ?rn Gu?mundsson - TM Software
wrote:
> Hi,
> 
> (Sorry for the top posting, i blame my email client!)
> 
> The service is started yes.
> 
> I was going for the cluster.conf way, IE, setting the status check to 1 minute.
> 
> I can see it should not be a problem with a script resource but since i use the Oracle Resource Agent it kinda runs around in my head.
> 
> Here is what i tried:
> 
>                 <service autostart="1" exclusive="0" name="oracle1" recovery="relocate">
>                         <ip __independent_subtree="1" ref="10.x.x.x">
>                                 <fs ref="ora1-data"/>
>                                 <fs ref="ora1-archlogs"/>
>                                 <oracledb home="/u01/app/oracle" name="oracle1" type="10g" user="oracle"/>
>                                 <action name="status" depth="*" interval="1m"/>
>                         </ip>
>                 </service>

I would do it this way (the __independent_subtree option in this context
doesn't do anything useful):
?
<service autostart="1" exclusive="0" name="oracle1" recovery="relocate">
  <ip ref="10.x.x.x" />
  <fs ref="ora1-data" />
  <fs ref="ora1-archlogs" />
  <oracledb home="/u01/app/oracle" name="oracle1" type="10g" user="oracle">
    <action name="status" depth="*" interval="1m"/>
  </oracledb>
</service>

(action looks like a resource here, but is handled "special-case" like
by rgmanager)

-- Lon


From fog at t.is  Tue Jun 24 20:21:43 2008
From: fog at t.is (=?UTF-8?B?RmlubnVyIMOWcm4gR3XDsG11bmRzc29uIC0gVE0gU29mdA==?= =?UTF-8?B?d2FyZQ==?=)
Date: Tue, 24 Jun 2008 20:21:43 -0000
Subject: [Linux-cluster] Oracle 10G resource agent - status polling
In-Reply-To: <1214337886.3143.19.camel@localhost.localdomain>
References: <3DDA6E3E456E144DA3BB0A62A7F7F7790221640D@SKYHQAMX08.klasi.is><1214327033.3143.9.camel@localhost.localdomain><3DDA6E3E456E144DA3BB0A62A7F7F7790221644D@SKYHQAMX08.klasi.is>
	<1214337886.3143.19.camel@localhost.localdomain>
Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F7790221645F@SKYHQAMX08.klasi.is>

Hi,

Thanks a lot, this of course worked like a charm :)

Bgrds,
Finnur

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: 24. j?n? 2008 20:05
To: linux clustering
Subject: RE: [Linux-cluster] Oracle 10G resource agent - status polling

On Tue, 2008-06-24 at 17:35 +0000, Finnur ?rn Gu?mundsson - TM Software
wrote:
> Hi,
> 
> (Sorry for the top posting, i blame my email client!)
> 
> The service is started yes.
> 
> I was going for the cluster.conf way, IE, setting the status check to 1 minute.
> 
> I can see it should not be a problem with a script resource but since i use the Oracle Resource Agent it kinda runs around in my head.
> 
> Here is what i tried:
> 
>                 <service autostart="1" exclusive="0" name="oracle1" recovery="relocate">
>                         <ip __independent_subtree="1" ref="10.x.x.x">
>                                 <fs ref="ora1-data"/>
>                                 <fs ref="ora1-archlogs"/>
>                                 <oracledb home="/u01/app/oracle" name="oracle1" type="10g" user="oracle"/>
>                                 <action name="status" depth="*" interval="1m"/>
>                         </ip>
>                 </service>

I would do it this way (the __independent_subtree option in this context
doesn't do anything useful):
?
<service autostart="1" exclusive="0" name="oracle1" recovery="relocate">
  <ip ref="10.x.x.x" />
  <fs ref="ora1-data" />
  <fs ref="ora1-archlogs" />
  <oracledb home="/u01/app/oracle" name="oracle1" type="10g" user="oracle">
    <action name="status" depth="*" interval="1m"/>
  </oracledb>
</service>

(action looks like a resource here, but is handled "special-case" like
by rgmanager)

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From teigland at redhat.com  Tue Jun 24 21:42:05 2008
From: teigland at redhat.com (David Teigland)
Date: Tue, 24 Jun 2008 16:42:05 -0500
Subject: [Linux-cluster] Spectator mount option
In-Reply-To: <200806242154.49147.grimme@atix.de>
References: <200806240953.56658.grimme@atix.de>
	<20080624163841.GB19067@redhat.com>
	<200806242154.49147.grimme@atix.de>
Message-ID: <20080624214205.GC19067@redhat.com>


Sorry for the confusion, I'll try to sort it out...

On Tue, Jun 24, 2008 at 09:54:48PM +0200, Marc Grimme wrote:
> * the spectator node does not get a journal

correct

> * the spectator node cannot replay a journal 

correct

> and therefore cannot be involved in fencing. Or the other way round it
> is not involved in fencing

A node doesn't *need* to be fenced if it can't write to the fs.  If you
*want* spectator nodes to be fenced when the fail, that's fine, just have
them join the fence domain as usual.

> * the spectator node must not have votes. I can but it mustn't.
> * the spectator node is less involved in locking. It does not hold locks?! 
> Does it? Or does it only have read-locks? That is not clear to me. See below.
> 
> On spectator and locking:
> Everything clear except from the locking topic. Why do you think the spectator
> node is less involved in locking. Doesn't it have to request a readlock for 
> any file it wants to read. And as it cannot (if so?!) hold the lock itself it 
> has to ask for it. Isn't this from the network point of view more network 
> traffic then if it would "master/cache" the lock?
> The only advantage would be that no or a less loaded gfs_scand would be 
> running?! I have to admit that this wouldn't be to bad.

I think you're confused by the relationship between the three "features" I
mentioned (1 spectator mount option, 2 zero votes, 3 dlm resource masters).
The key thing to understand is that those three features are *completely
unrelated* to each other; they are completely indepedent, they may or may
not be used together.

Refering to all three collectively as "spectator features" is very
misleading at a technical level.  This misleading language is an
unfortunate accident of marketing.  Technically, the word "spectator" is
only relevant to the gfs mount option (feature 1).  The word "spectator"
has no concrete, technical meaning in cman or dlm.

In your original question about spectators, I believe you were only asking
about the gfs issues (feature 1).  I think we've answered that question
now:  spectator mount option in gfs means the node is assigned no journal,
it cannot do journal recovery, and because it will never write to the fs,
it's ok for it to remain out of the fence domain if desired.  That's all,
it involves nothing different in regard to votes or locking.

So, why bring votes and locking into the picture?  Because if there's an
environment where feature 1 (gfs spectator option) is helpful, features 2
and 3 (voting/locking configurations) might also be helpful.  Again,
there's no technical relationship between any of these features, there's
just a suspicion that people may want to use them together.

I don't think there's any question about the effects of zero votes.

There's a lot of confusion about the dlm configuration.  The dlm settings
have no effect on gfs, all the same gfs locking is done.  The dlm settings
only change the internal dlm algorithm for selecting the master of a
resource.  One node is always the master of a dlm resource; normally the
first node to lock a resource becomes the master.  The new settings change
this so that a fixed node (or nodes) become the master.

Advantages:  When a node is removed, dlm recovery has substantially *less*
work to do when the node was not the master of any resources.  So, this
node will cause less disruption to others in the cluster.  Whether the
overall effect of this is big or small we don't yet know.

Disadvantages:  If a resource is mastered locally, no network messages are
needed to lock it.  So, a side effect of a node *not* mastering any
resources, is that all its locking will involve network messages and
locking may become slower (how much depends entirely on how isolated the
node's fs activity is.)

Dave


From gsrlinux at gmail.com  Wed Jun 25 03:22:59 2008
From: gsrlinux at gmail.com (GS R)
Date: Wed, 25 Jun 2008 08:52:59 +0530
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
Message-ID: <d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>

On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:

> Hi,
>
> We have two RHEL5.1 boxes installed sharing a
> single iscsi emc2 SAN, whitout fence devices. System is configured
> as a high-availability system of xen guest.
>
> One of the most repeating problems are fence_tool related.
>
>   # service cman start
>   Starting cluster:
>      Loading modules... done
>      Mounting configfs... done
>      Starting ccsd... done
>      Starting cman... done
>      Starting daemons... done
>  Starting fencing... fence_tool: can't communicate with fenced -1
>
>  # fenced -D
>   1204556546 cman_init error 0 111
>
>   # clustat
>   CMAN is not running.
>
>   # cman_tool join
>
>   # clustat
>   msg_open: Connection refused
>   Member Status: Quorate
>     Member Name                        ID   Status
>
>     ------ ----                        ---- ------
>     yoda1                             1 Online, Local
>     yoda2                             2 Offline
>
> Sometimes this problem gets solved if the two machines are rebooted at
>
> the same time. But in the current HA configuration, I cannot guarantee
> two systems will be rebooted at the same time for every problem we
> face. This is my config file:
>
> ###################################cluster.conf####################################
>
> <?xml version="1.0"?>
> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>
>         <clusternodes>
>                 <clusternode name="yoda2" nodeid="1" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="yoda1" nodeid="2" votes="1">
>
>                         <fence/>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <rm>
>                 <failoverdomains/>
>
>                 <resources/>
>         </rm>
>         <fencedevices/>
> </cluster>
> ###################################cluster.conf####################################
> Regards.
>
> Hi

I configured a two node cluster with no fence device on RHEL5.1.
The cluster started and stopped with no issues. The only difference that I
see is that I have used FQDN in my cluster.conf

i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">

Check your /etc/hosts if it has the FQDN in it.

Thanks
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/7ea60216/attachment.htm>

From gpbuono at gmail.com  Wed Jun 25 07:46:43 2008
From: gpbuono at gmail.com (Gian Paolo Buono)
Date: Wed, 25 Jun 2008 09:46:43 +0200
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
	<d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>
Message-ID: <c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>

Hi,
the problem of my cluster is that it start-up weel but after two days the
problem that I have described is running, and this problem gets solved if
the two machines are rebooted at the same time.

Thanks
Gian Paolo


2008/6/25 GS R <gsrlinux at gmail.com>:

>
>
> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
>> Hi,
>>
>> We have two RHEL5.1 boxes installed sharing a
>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>
>> as a high-availability system of xen guest.
>>
>> One of the most repeating problems are fence_tool related.
>>
>>   # service cman start
>>   Starting cluster:
>>      Loading modules... done
>>      Mounting configfs... done
>>      Starting ccsd... done
>>      Starting cman... done
>>      Starting daemons... done
>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>
>>
>>  # fenced -D
>>   1204556546 cman_init error 0 111
>>
>>   # clustat
>>   CMAN is not running.
>>
>>   # cman_tool join
>>
>>   # clustat
>>   msg_open: Connection refused
>>
>>   Member Status: Quorate
>>     Member Name                        ID   Status
>>
>>     ------ ----                        ---- ------
>>     yoda1                             1 Online, Local
>>     yoda2                             2 Offline
>>
>> Sometimes this problem gets solved if the two machines are rebooted at
>>
>>
>> the same time. But in the current HA configuration, I cannot guarantee
>> two systems will be rebooted at the same time for every problem we
>> face. This is my config file:
>>
>> ###################################cluster.conf####################################
>>
>>
>> <?xml version="1.0"?>
>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>
>>
>>         <clusternodes>
>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>                         <fence/>
>>                 </clusternode>
>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>
>>
>>                         <fence/>
>>                 </clusternode>
>>         </clusternodes>
>>         <cman expected_votes="1" two_node="1"/>
>>         <rm>
>>                 <failoverdomains/>
>>
>>
>>                 <resources/>
>>         </rm>
>>         <fencedevices/>
>> </cluster>
>> ###################################cluster.conf####################################
>> Regards.
>>
>> Hi
>
> I configured a two node cluster with no fence device on RHEL5.1.
> The cluster started and stopped with no issues. The only difference that I
> see is that I have used FQDN in my cluster.conf
>
> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>
> Check your /etc/hosts if it has the FQDN in it.
>
> Thanks
> Gowrishankar Rajaiyan
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/d1faed01/attachment.htm>

From sunhux at gmail.com  Wed Jun 25 07:53:46 2008
From: sunhux at gmail.com (sunhux G)
Date: Wed, 25 Jun 2008 15:53:46 +0800
Subject: [Linux-cluster] Info & documentation on configuring Power Fencing
	using IBM RSA II (x3850/3950 M2 servers )
Message-ID: <60f08e700806250053g3ede518fib3fd982d2b42ff1a@mail.gmail.com>

Hi,

We've been googling to look for step by step guide on how to configure
IBM RSA II for power fencing in an RHES 5.1  environment.

Is it just as simple as this one page instruction below :
http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-config-fence-devices.html


Some questions :

a)how do we get the "Add a New Fence Device" screen?  Is it somewhere on the
   Redhat Gnome desktop that I can click to bring it up?

b)the factory default IP addr of the RSA II  LAN port is 192.168.70.125/24.
   What's the IP addr I can input in the above "Add New Fence Device"
   screen - must it be 192.168.70.x  (within same subnet as 192.168.70.125)?

c)do we repeat the same step("Add New Fence Device") for every RHES
   server in the cluster & is the same IP address/login id being input for
   each of the servers in the cluster?

The link below only gives a little concept, not actual configuration guide :
http://www.centos.org/docs/4/4.5/SAC_Cluster_Suite_Overview/s2-fencing-overview-CSO.html<http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-config-fence-devices.html>


Any other links/information is much appreciated.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/1346ef88/attachment.htm>

From gsrlinux at gmail.com  Wed Jun 25 08:33:51 2008
From: gsrlinux at gmail.com (GS R)
Date: Wed, 25 Jun 2008 14:03:51 +0530
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
	<d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>
	<c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>
Message-ID: <d765e01f0806250133t521f1a99y2852225d4e46a891@mail.gmail.com>

>
>
>
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>
>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have two RHEL5.1 boxes installed sharing a
>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>
>>>
>>> as a high-availability system of xen guest.
>>>
>>> One of the most repeating problems are fence_tool related.
>>>
>>>   # service cman start
>>>   Starting cluster:
>>>      Loading modules... done
>>>      Mounting configfs... done
>>>      Starting ccsd... done
>>>      Starting cman... done
>>>      Starting daemons... done
>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>
>>>
>>>
>>>  # fenced -D
>>>   1204556546 cman_init error 0 111
>>>
>>>   # clustat
>>>   CMAN is not running.
>>>
>>>   # cman_tool join
>>>
>>>   # clustat
>>>   msg_open: Connection refused
>>>
>>>   Member Status: Quorate
>>>     Member Name                        ID   Status
>>>
>>>     ------ ----                        ---- ------
>>>     yoda1                             1 Online, Local
>>>     yoda2                             2 Offline
>>>
>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>
>>>
>>>
>>> the same time. But in the current HA configuration, I cannot guarantee
>>> two systems will be rebooted at the same time for every problem we
>>> face. This is my config file:
>>>
>>> ###################################cluster.conf####################################
>>>
>>>
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>
>>>
>>>
>>>         <clusternodes>
>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>
>>>
>>>
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman expected_votes="1" two_node="1"/>
>>>         <rm>
>>>                 <failoverdomains/>
>>>
>>>
>>>
>>>                 <resources/>
>>>         </rm>
>>>         <fencedevices/>
>>> </cluster>
>>> ###################################cluster.conf####################################
>>> Regards.
>>>
>>> Hi
>>
>> I configured a two node cluster with no fence device on RHEL5.1.
>> The cluster started and stopped with no issues. The only difference that I
>> see is that I have used FQDN in my cluster.conf
>>
>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>
>> Check your /etc/hosts if it has the FQDN in it.
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>>
>>
>

On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
> Hi,
> the problem of my cluster is that it start-up weel but after two days the
> problem that I have described is running, and this problem gets solved if
> the two machines are rebooted at the same time.
>
> Thanks
> Gian Paolo


Hi Gian

Could you please attach the logs.

Thanks
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/805cdbbe/attachment.htm>

From gsrlinux at gmail.com  Wed Jun 25 08:47:26 2008
From: gsrlinux at gmail.com (GS R)
Date: Wed, 25 Jun 2008 14:17:26 +0530
Subject: [Linux-cluster] Info & documentation on configuring Power Fencing
	using IBM RSA II (x3850/3950 M2 servers )
In-Reply-To: <60f08e700806250053g3ede518fib3fd982d2b42ff1a@mail.gmail.com>
References: <60f08e700806250053g3ede518fib3fd982d2b42ff1a@mail.gmail.com>
Message-ID: <4862061E.9040306@gmail.com>

sunhux G wrote:
> Hi,
>  
> We've been googling to look for step by step guide on how to configure
> IBM RSA II for power fencing in an RHES 5.1  environment.
>  
> Is it just as simple as this one page instruction below :
> http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-config-fence-devices.html
>  
>  
> Some questions :
>  
> a)how do we get the "Add a New Fence Device" screen?  Is it 
> somewhere on the
>    Redhat Gnome desktop that I can click to bring it up?
No. It's not anywhere in RedHat Gnome. You will have to use Conga OR 
system-config-cluster OR enter it manually.
>  
> b)the factory default IP addr of the RSA II  LAN port is 
> 192.168.70.125/24 <http://192.168.70.125/24>.
>    What's the IP addr I can input in the above "Add New Fence Device"
>    screen - must it be 192.168.70.x  (within same subnet as 
> 192.168.70.125 <http://192.168.70.125>)?
It's the IP address assigned to the IPMI port and not the network.
>  
> c)do we repeat the same step("Add New Fence Device") for every RHES
>    server in the cluster & is the same IP address/login id being input for
>    each of the servers in the cluster?
New fence device is added only once. However, you need to assign this 
fence device to all the nodes in your cluster.
>  
> The link below only gives a little concept, not actual configuration 
> guide :
> http://www.centos.org/docs/4/4.5/SAC_Cluster_Suite_Overview/s2-fencing-overview-CSO.html
>
>  
> Any other links/information is much appreciated.
>  
> Thanks
>  
>  
Thanks
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/44ab2a21/attachment.htm>

From gpbuono at gmail.com  Wed Jun 25 08:55:49 2008
From: gpbuono at gmail.com (Gian Paolo Buono)
Date: Wed, 25 Jun 2008 10:55:49 +0200
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <d765e01f0806250133t521f1a99y2852225d4e46a891@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
	<d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>
	<c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>
	<d765e01f0806250133t521f1a99y2852225d4e46a891@mail.gmail.com>
Message-ID: <c60a46e50806250155l6c4eacc2s190d36b0eb34ffb8@mail.gmail.com>

Hi,
if I try to restart on yoda2 cman
[root at yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed

                                                           [FAILED]
[root at yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from node
2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because
we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090


on this server there are 3 xen domu and i can't to reboot yoda2 :( ..

best regards..  and sorry for my english :)

2008/6/25 GS R <gsrlinux at gmail.com>:

>
>>
>>
>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>
>>>
>>>
>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have two RHEL5.1 boxes installed sharing a
>>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>>
>>>>
>>>>
>>>> as a high-availability system of xen guest.
>>>>
>>>> One of the most repeating problems are fence_tool related.
>>>>
>>>>   # service cman start
>>>>   Starting cluster:
>>>>      Loading modules... done
>>>>      Mounting configfs... done
>>>>      Starting ccsd... done
>>>>      Starting cman... done
>>>>      Starting daemons... done
>>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>>
>>>>
>>>>
>>>>
>>>>  # fenced -D
>>>>   1204556546 cman_init error 0 111
>>>>
>>>>   # clustat
>>>>   CMAN is not running.
>>>>
>>>>   # cman_tool join
>>>>
>>>>   # clustat
>>>>   msg_open: Connection refused
>>>>
>>>>   Member Status: Quorate
>>>>     Member Name                        ID   Status
>>>>
>>>>     ------ ----                        ---- ------
>>>>     yoda1                             1 Online, Local
>>>>     yoda2                             2 Offline
>>>>
>>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>>
>>>>
>>>>
>>>>
>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>> two systems will be rebooted at the same time for every problem we
>>>> face. This is my config file:
>>>>
>>>> ###################################cluster.conf####################################
>>>>
>>>>
>>>>
>>>>
>>>> <?xml version="1.0"?>
>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>
>>>>
>>>>
>>>>
>>>>         <clusternodes>
>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>
>>>>
>>>>
>>>>
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>         </clusternodes>
>>>>         <cman expected_votes="1" two_node="1"/>
>>>>         <rm>
>>>>                 <failoverdomains/>
>>>>
>>>>
>>>>
>>>>
>>>>                 <resources/>
>>>>         </rm>
>>>>         <fencedevices/>
>>>> </cluster>
>>>> ###################################cluster.conf####################################
>>>> Regards.
>>>>
>>>> Hi
>>>
>>> I configured a two node cluster with no fence device on RHEL5.1.
>>> The cluster started and stopped with no issues. The only difference that
>>> I see is that I have used FQDN in my cluster.conf
>>>
>>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>>
>>> Check your /etc/hosts if it has the FQDN in it.
>>>
>>> Thanks
>>> Gowrishankar Rajaiyan
>>>
>>>
>>>
>>
>
> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
>> Hi,
>> the problem of my cluster is that it start-up weel but after two days the
>> problem that I have described is running, and this problem gets solved if
>> the two machines are rebooted at the same time.
>>
>> Thanks
>> Gian Paolo
>>
>
>
> Hi Gian
>
> Could you please attach the logs.
>
> Thanks
> Gowrishankar Rajaiyan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/03101f0d/attachment.htm>

From gpbuono at gmail.com  Wed Jun 25 09:24:04 2008
From: gpbuono at gmail.com (Gian Paolo Buono)
Date: Wed, 25 Jun 2008 11:24:04 +0200
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <c60a46e50806250155l6c4eacc2s190d36b0eb34ffb8@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>
	<d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>
	<c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>
	<d765e01f0806250133t521f1a99y2852225d4e46a891@mail.gmail.com>
	<c60a46e50806250155l6c4eacc2s190d36b0eb34ffb8@mail.gmail.com>
Message-ID: <c60a46e50806250224x5a2ed28bn32a57e0f3baab578@mail.gmail.com>

Hi,
an other problem the process clurgmgrd don't dead:

[root at yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:

but nothing to do...

[root at yoda2 ~]# ps -ef | grep clurgmgrd
root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
[root at yoda2 ~]# kill -9 6620
[root at yoda2 ~]# ps -ef | grep clurgmgrd

and the process clvmd

[root at yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2

help me ... i don't want reboot the yoda2 ...

bye


On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com>
wrote:

> Hi,
> if I try to restart on yoda2 cman
> [root at yoda2 ~]# /etc/init.d/cman restart
> Stopping cluster:
>    Stopping fencing... done
>    Stopping cman... done
>    Stopping ccsd... done
>    Unmounting configfs... done
>                                                            [  OK  ]
> Starting cluster:
>    Enabling workaround for Xend bridged networking... done
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing... failed
>
>                                                            [FAILED]
> [root at yoda2 ~]# tail -f /var/log/messages
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
> Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
> primary component and will provide service.
> Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.174
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.175
> Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from
> node 2
> Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because
> we were killed by cman_tool or other application
> Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
> Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
> Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32
> ]:55090
>
>
> on this server there are 3 xen domu and i can't to reboot yoda2 :( ..
>
> best regards..  and sorry for my english :)
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>>
>>>
>>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>>
>>>>
>>>>
>>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have two RHEL5.1 boxes installed sharing a
>>>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> as a high-availability system of xen guest.
>>>>>
>>>>> One of the most repeating problems are fence_tool related.
>>>>>
>>>>>   # service cman start
>>>>>   Starting cluster:
>>>>>      Loading modules... done
>>>>>      Mounting configfs... done
>>>>>      Starting ccsd... done
>>>>>      Starting cman... done
>>>>>      Starting daemons... done
>>>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  # fenced -D
>>>>>   1204556546 cman_init error 0 111
>>>>>
>>>>>   # clustat
>>>>>   CMAN is not running.
>>>>>
>>>>>   # cman_tool join
>>>>>
>>>>>   # clustat
>>>>>   msg_open: Connection refused
>>>>>
>>>>>   Member Status: Quorate
>>>>>     Member Name                        ID   Status
>>>>>
>>>>>     ------ ----                        ---- ------
>>>>>     yoda1                             1 Online, Local
>>>>>     yoda2                             2 Offline
>>>>>
>>>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>>> two systems will be rebooted at the same time for every problem we
>>>>> face. This is my config file:
>>>>>
>>>>> ###################################cluster.conf####################################
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         <clusternodes>
>>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>         </clusternodes>
>>>>>         <cman expected_votes="1" two_node="1"/>
>>>>>         <rm>
>>>>>                 <failoverdomains/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 <resources/>
>>>>>         </rm>
>>>>>         <fencedevices/>
>>>>> </cluster>
>>>>> ###################################cluster.conf####################################
>>>>> Regards.
>>>>>
>>>>> Hi
>>>>
>>>> I configured a two node cluster with no fence device on RHEL5.1.
>>>> The cluster started and stopped with no issues. The only difference that
>>>> I see is that I have used FQDN in my cluster.conf
>>>>
>>>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>>>
>>>> Check your /etc/hosts if it has the FQDN in it.
>>>>
>>>> Thanks
>>>> Gowrishankar Rajaiyan
>>>>
>>>>
>>>>
>>>
>>
>> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>> the problem of my cluster is that it start-up weel but after two days the
>>> problem that I have described is running, and this problem gets solved if
>>> the two machines are rebooted at the same time.
>>>
>>> Thanks
>>> Gian Paolo
>>>
>>
>>
>> Hi Gian
>>
>> Could you please attach the logs.
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/b39ebe5e/attachment.htm>

From gsrlinux at gmail.com  Wed Jun 25 09:56:45 2008
From: gsrlinux at gmail.com (GS R)
Date: Wed, 25 Jun 2008 15:26:45 +0530
Subject: [Linux-cluster] can't communicate with fenced -1
In-Reply-To: <c60a46e50806250224x5a2ed28bn32a57e0f3baab578@mail.gmail.com>
References: <c60a46e50806240445w4fb1cf2an79147a801bee7874@mail.gmail.com>	<d765e01f0806242022r44a0906fx55f96bf6e2df5b29@mail.gmail.com>	<c60a46e50806250046q65ce3614q23d391a11d0481bc@mail.gmail.com>	<d765e01f0806250133t521f1a99y2852225d4e46a891@mail.gmail.com>	<c60a46e50806250155l6c4eacc2s190d36b0eb34ffb8@mail.gmail.com>
	<c60a46e50806250224x5a2ed28bn32a57e0f3baab578@mail.gmail.com>
Message-ID: <4862165D.3030105@gmail.com>

Hi Gian,

I too faced the same issue. Rebooting the system is indeed the easy 
solution here.
I could do it because it was my test setup.

   1. Have you added any resources to this cluster?
   2. Have you configured any services to this cluster?
   3. Have you tried using a fence device. i.e., fence_manual?
   4. Is there at times a heavy load on your network?
   5. Have you opened all the ports on your firewall?


Thanks
Gowrishankar Rajaiyan


Gian Paolo Buono wrote:
> Hi,
> an other problem the process clurgmgrd don't dead:
>
> [root at yoda2 ~]# /etc/init.d/rgmanager stop
> Shutting down Cluster Service Manager...
> Waiting for services to stop:   
>
> but nothing to do...
>
> [root at yoda2 ~]# ps -ef | grep clurgmgrd
> root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
> [root at yoda2 ~]# kill -9 6620   
> [root at yoda2 ~]# ps -ef | grep clurgmgrd
>
> and the process clvmd
>
> [root at yoda2 ~]# /etc/init.d/clvmd status
> clvmd dead but subsys locked
> active volumes: LV06 LV_nex2
>
> help me ... i don't want reboot the yoda2 ...
>
> bye
>
>
> On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com 
> <mailto:gpbuono at gmail.com>> wrote:
>
>     Hi,
>     if I try to restart on yoda2 cman
>     [root at yoda2 ~]# /etc/init.d/cman restart
>     Stopping cluster:
>        Stopping fencing... done
>        Stopping cman... done
>        Stopping ccsd... done
>        Unmounting configfs... done
>                                                                [  OK  ]
>     Starting cluster:
>        Enabling workaround for Xend bridged networking... done
>
>        Loading modules... done
>        Mounting configfs... done
>        Starting ccsd... done
>        Starting cman... done
>        Starting daemons... done
>        Starting fencing... failed
>
>                                                                [FAILED]
>     [root at yoda2 ~]# tail -f /var/log/messages
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0)
>     ip(172.20.0.174 <http://172.20.0.174>)
>     Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within
>     the primary component and will provide service.
>     Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL
>     state.
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
>     172.20.0.174 <http://172.20.0.174>
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
>     172.20.0.175 <http://172.20.0.175>
>     Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message
>     from node 2
>     Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node
>     1 because we were killed by cman_tool or other application
>     Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
>     Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
>     Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
>     infrastructure after 30 seconds.
>     Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP:
>     [172.20.0.32 <http://172.20.0.32>]:55090
>
>
>     on this server there are 3 xen domu and i can't to reboot yoda2 :( ..
>
>     best regards..  and sorry for my english :)
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/0f4ff35e/attachment.htm>

From T.Kumar at alcoa.com  Wed Jun 25 13:51:07 2008
From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS))
Date: Wed, 25 Jun 2008 09:51:07 -0400
Subject: [Linux-cluster] RE: Linux-cluster Digest, Vol 50, Issue 32
In-Reply-To: <20080625095746.322FF619BE8@hormel.redhat.com>
References: <20080625095746.322FF619BE8@hormel.redhat.com>
Message-ID: <0C3FC6B507AF684199E57BFCA3EAB553263B3402@NOANDC-MXU11.NOA.Alcoa.com>


If this is thr error, then start the lvm by "/usr/sbin/clvmd"  which
will start clvmd while cluster VG/resources are online.


[root at yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
linux-cluster-request at redhat.com
Sent: Wednesday, June 25, 2008 5:58 AM
To: linux-cluster at redhat.com
Subject: Linux-cluster Digest, Vol 50, Issue 32

Send Linux-cluster mailing list submissions to
	linux-cluster at redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
	linux-cluster-request at redhat.com

You can reach the person managing the list at
	linux-cluster-owner at redhat.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. Re: can't communicate with fenced -1 (GS R)
   2. Re: Info & documentation on configuring Power Fencing	using
      IBM RSA II (x3850/3950 M2 servers ) (GS R)
   3. Re: can't communicate with fenced -1 (Gian Paolo Buono)
   4. Re: can't communicate with fenced -1 (Gian Paolo Buono)
   5. Re: can't communicate with fenced -1 (GS R)


----------------------------------------------------------------------

Message: 1
Date: Wed, 25 Jun 2008 14:03:51 +0530
From: "GS R" <gsrlinux at gmail.com>
Subject: Re: [Linux-cluster] can't communicate with fenced -1
To: "linux clustering" <linux-cluster at redhat.com>
Message-ID:
	<d765e01f0806250133t521f1a99y2852225d4e46a891 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

>
>
>
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>
>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have two RHEL5.1 boxes installed sharing a
>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>
>>>
>>> as a high-availability system of xen guest.
>>>
>>> One of the most repeating problems are fence_tool related.
>>>
>>>   # service cman start
>>>   Starting cluster:
>>>      Loading modules... done
>>>      Mounting configfs... done
>>>      Starting ccsd... done
>>>      Starting cman... done
>>>      Starting daemons... done
>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>
>>>
>>>
>>>  # fenced -D
>>>   1204556546 cman_init error 0 111
>>>
>>>   # clustat
>>>   CMAN is not running.
>>>
>>>   # cman_tool join
>>>
>>>   # clustat
>>>   msg_open: Connection refused
>>>
>>>   Member Status: Quorate
>>>     Member Name                        ID   Status
>>>
>>>     ------ ----                        ---- ------
>>>     yoda1                             1 Online, Local
>>>     yoda2                             2 Offline
>>>
>>> Sometimes this problem gets solved if the two machines are rebooted
at
>>>
>>>
>>>
>>> the same time. But in the current HA configuration, I cannot
guarantee
>>> two systems will be rebooted at the same time for every problem we
>>> face. This is my config file:
>>>
>>>
###################################cluster.conf#########################
###########
>>>
>>>
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>         <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
>>>
>>>
>>>
>>>         <clusternodes>
>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>
>>>
>>>
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman expected_votes="1" two_node="1"/>
>>>         <rm>
>>>                 <failoverdomains/>
>>>
>>>
>>>
>>>                 <resources/>
>>>         </rm>
>>>         <fencedevices/>
>>> </cluster>
>>>
###################################cluster.conf#########################
###########
>>> Regards.
>>>
>>> Hi
>>
>> I configured a two node cluster with no fence device on RHEL5.1.
>> The cluster started and stopped with no issues. The only difference
that I
>> see is that I have used FQDN in my cluster.conf
>>
>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>
>> Check your /etc/hosts if it has the FQDN in it.
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>>
>>
>

On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
> Hi,
> the problem of my cluster is that it start-up weel but after two days
the
> problem that I have described is running, and this problem gets solved
if
> the two machines are rebooted at the same time.
>
> Thanks
> Gian Paolo


Hi Gian

Could you please attach the logs.

Thanks
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
https://www.redhat.com/archives/linux-cluster/attachments/20080625/805cd
bbe/attachment.html

------------------------------

Message: 2
Date: Wed, 25 Jun 2008 14:17:26 +0530
From: GS R <gsrlinux at gmail.com>
Subject: Re: [Linux-cluster] Info & documentation on configuring Power
	Fencing	using IBM RSA II (x3850/3950 M2 servers )
To: linux clustering <linux-cluster at redhat.com>
Message-ID: <4862061E.9040306 at gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

sunhux G wrote:
> Hi,
>  
> We've been googling to look for step by step guide on how to configure
> IBM RSA II for power fencing in an RHES 5.1  environment.
>  
> Is it just as simple as this one page instruction below :
>
http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-config-fence
-devices.html
>  
>  
> Some questions :
>  
> a)how do we get the "Add a New Fence Device" screen?  Is it 
> somewhere on the
>    Redhat Gnome desktop that I can click to bring it up?
No. It's not anywhere in RedHat Gnome. You will have to use Conga OR 
system-config-cluster OR enter it manually.
>  
> b)the factory default IP addr of the RSA II  LAN port is 
> 192.168.70.125/24 <http://192.168.70.125/24>.
>    What's the IP addr I can input in the above "Add New Fence Device"
>    screen - must it be 192.168.70.x  (within same subnet as 
> 192.168.70.125 <http://192.168.70.125>)?
It's the IP address assigned to the IPMI port and not the network.
>  
> c)do we repeat the same step("Add New Fence Device") for every RHES
>    server in the cluster & is the same IP address/login id being input
for
>    each of the servers in the cluster?
New fence device is added only once. However, you need to assign this 
fence device to all the nodes in your cluster.
>  
> The link below only gives a little concept, not actual configuration 
> guide :
>
http://www.centos.org/docs/4/4.5/SAC_Cluster_Suite_Overview/s2-fencing-o
verview-CSO.html
>
>  
> Any other links/information is much appreciated.
>  
> Thanks
>  
>  
Thanks
Gowrishankar Rajaiyan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
https://www.redhat.com/archives/linux-cluster/attachments/20080625/44ab2
a21/attachment.html

------------------------------

Message: 3
Date: Wed, 25 Jun 2008 10:55:49 +0200
From: "Gian Paolo Buono" <gpbuono at gmail.com>
Subject: Re: [Linux-cluster] can't communicate with fenced -1
To: "linux clustering" <linux-cluster at redhat.com>
Message-ID:
	<c60a46e50806250155l6c4eacc2s190d36b0eb34ffb8 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,
if I try to restart on yoda2 cman
[root at yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed

                                                           [FAILED]
[root at yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL
state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from
node
2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1
because
we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP:
[172.20.0.32]:55090


on this server there are 3 xen domu and i can't to reboot yoda2 :( ..

best regards..  and sorry for my english :)

2008/6/25 GS R <gsrlinux at gmail.com>:

>
>>
>>
>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>
>>>
>>>
>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have two RHEL5.1 boxes installed sharing a
>>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>>
>>>>
>>>>
>>>> as a high-availability system of xen guest.
>>>>
>>>> One of the most repeating problems are fence_tool related.
>>>>
>>>>   # service cman start
>>>>   Starting cluster:
>>>>      Loading modules... done
>>>>      Mounting configfs... done
>>>>      Starting ccsd... done
>>>>      Starting cman... done
>>>>      Starting daemons... done
>>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>>
>>>>
>>>>
>>>>
>>>>  # fenced -D
>>>>   1204556546 cman_init error 0 111
>>>>
>>>>   # clustat
>>>>   CMAN is not running.
>>>>
>>>>   # cman_tool join
>>>>
>>>>   # clustat
>>>>   msg_open: Connection refused
>>>>
>>>>   Member Status: Quorate
>>>>     Member Name                        ID   Status
>>>>
>>>>     ------ ----                        ---- ------
>>>>     yoda1                             1 Online, Local
>>>>     yoda2                             2 Offline
>>>>
>>>> Sometimes this problem gets solved if the two machines are rebooted
at
>>>>
>>>>
>>>>
>>>>
>>>> the same time. But in the current HA configuration, I cannot
guarantee
>>>> two systems will be rebooted at the same time for every problem we
>>>> face. This is my config file:
>>>>
>>>>
###################################cluster.conf#########################
###########
>>>>
>>>>
>>>>
>>>>
>>>> <?xml version="1.0"?>
>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>         <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
>>>>
>>>>
>>>>
>>>>
>>>>         <clusternodes>
>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>
>>>>
>>>>
>>>>
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>         </clusternodes>
>>>>         <cman expected_votes="1" two_node="1"/>
>>>>         <rm>
>>>>                 <failoverdomains/>
>>>>
>>>>
>>>>
>>>>
>>>>                 <resources/>
>>>>         </rm>
>>>>         <fencedevices/>
>>>> </cluster>
>>>>
###################################cluster.conf#########################
###########
>>>> Regards.
>>>>
>>>> Hi
>>>
>>> I configured a two node cluster with no fence device on RHEL5.1.
>>> The cluster started and stopped with no issues. The only difference
that
>>> I see is that I have used FQDN in my cluster.conf
>>>
>>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>>
>>> Check your /etc/hosts if it has the FQDN in it.
>>>
>>> Thanks
>>> Gowrishankar Rajaiyan
>>>
>>>
>>>
>>
>
> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
>> Hi,
>> the problem of my cluster is that it start-up weel but after two days
the
>> problem that I have described is running, and this problem gets
solved if
>> the two machines are rebooted at the same time.
>>
>> Thanks
>> Gian Paolo
>>
>
>
> Hi Gian
>
> Could you please attach the logs.
>
> Thanks
> Gowrishankar Rajaiyan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
https://www.redhat.com/archives/linux-cluster/attachments/20080625/03101
f0d/attachment.html

------------------------------

Message: 4
Date: Wed, 25 Jun 2008 11:24:04 +0200
From: "Gian Paolo Buono" <gpbuono at gmail.com>
Subject: Re: [Linux-cluster] can't communicate with fenced -1
To: "linux clustering" <linux-cluster at redhat.com>
Message-ID:
	<c60a46e50806250224x5a2ed28bn32a57e0f3baab578 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,
an other problem the process clurgmgrd don't dead:

[root at yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:

but nothing to do...

[root at yoda2 ~]# ps -ef | grep clurgmgrd
root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
[root at yoda2 ~]# kill -9 6620
[root at yoda2 ~]# ps -ef | grep clurgmgrd

and the process clvmd

[root at yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2

help me ... i don't want reboot the yoda2 ...

bye


On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com>
wrote:

> Hi,
> if I try to restart on yoda2 cman
> [root at yoda2 ~]# /etc/init.d/cman restart
> Stopping cluster:
>    Stopping fencing... done
>    Stopping cman... done
>    Stopping ccsd... done
>    Unmounting configfs... done
>                                                            [  OK  ]
> Starting cluster:
>    Enabling workaround for Xend bridged networking... done
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing... failed
>
>                                                            [FAILED]
> [root at yoda2 ~]# tail -f /var/log/messages
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
> Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
> primary component and will provide service.
> Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL
state.
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.174
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.175
> Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message
from
> node 2
> Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1
because
> we were killed by cman_tool or other application
> Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
> Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
> Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32
> ]:55090
>
>
> on this server there are 3 xen domu and i can't to reboot yoda2 :( ..
>
> best regards..  and sorry for my english :)
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>>
>>>
>>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>>
>>>>
>>>>
>>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have two RHEL5.1 boxes installed sharing a
>>>>> single iscsi emc2 SAN, whitout fence devices. System is configured
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> as a high-availability system of xen guest.
>>>>>
>>>>> One of the most repeating problems are fence_tool related.
>>>>>
>>>>>   # service cman start
>>>>>   Starting cluster:
>>>>>      Loading modules... done
>>>>>      Mounting configfs... done
>>>>>      Starting ccsd... done
>>>>>      Starting cman... done
>>>>>      Starting daemons... done
>>>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  # fenced -D
>>>>>   1204556546 cman_init error 0 111
>>>>>
>>>>>   # clustat
>>>>>   CMAN is not running.
>>>>>
>>>>>   # cman_tool join
>>>>>
>>>>>   # clustat
>>>>>   msg_open: Connection refused
>>>>>
>>>>>   Member Status: Quorate
>>>>>     Member Name                        ID   Status
>>>>>
>>>>>     ------ ----                        ---- ------
>>>>>     yoda1                             1 Online, Local
>>>>>     yoda2                             2 Offline
>>>>>
>>>>> Sometimes this problem gets solved if the two machines are
rebooted at
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> the same time. But in the current HA configuration, I cannot
guarantee
>>>>> two systems will be rebooted at the same time for every problem we
>>>>> face. This is my config file:
>>>>>
>>>>>
###################################cluster.conf#########################
###########
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>>         <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         <clusternodes>
>>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>         </clusternodes>
>>>>>         <cman expected_votes="1" two_node="1"/>
>>>>>         <rm>
>>>>>                 <failoverdomains/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 <resources/>
>>>>>         </rm>
>>>>>         <fencedevices/>
>>>>> </cluster>
>>>>>
###################################cluster.conf#########################
###########
>>>>> Regards.
>>>>>
>>>>> Hi
>>>>
>>>> I configured a two node cluster with no fence device on RHEL5.1.
>>>> The cluster started and stopped with no issues. The only difference
that
>>>> I see is that I have used FQDN in my cluster.conf
>>>>
>>>> i.e., <clusternode name="yoda2*.gsr.com*" nodeid="1" votes="1">
>>>>
>>>> Check your /etc/hosts if it has the FQDN in it.
>>>>
>>>> Thanks
>>>> Gowrishankar Rajaiyan
>>>>
>>>>
>>>>
>>>
>>
>> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>> the problem of my cluster is that it start-up weel but after two
days the
>>> problem that I have described is running, and this problem gets
solved if
>>> the two machines are rebooted at the same time.
>>>
>>> Thanks
>>> Gian Paolo
>>>
>>
>>
>> Hi Gian
>>
>> Could you please attach the logs.
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
https://www.redhat.com/archives/linux-cluster/attachments/20080625/b39eb
e5e/attachment.html

------------------------------

Message: 5
Date: Wed, 25 Jun 2008 15:26:45 +0530
From: GS R <gsrlinux at gmail.com>
Subject: Re: [Linux-cluster] can't communicate with fenced -1
To: linux clustering <linux-cluster at redhat.com>
Message-ID: <4862165D.3030105 at gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gian,

I too faced the same issue. Rebooting the system is indeed the easy 
solution here.
I could do it because it was my test setup.

   1. Have you added any resources to this cluster?
   2. Have you configured any services to this cluster?
   3. Have you tried using a fence device. i.e., fence_manual?
   4. Is there at times a heavy load on your network?
   5. Have you opened all the ports on your firewall?


Thanks
Gowrishankar Rajaiyan


Gian Paolo Buono wrote:
> Hi,
> an other problem the process clurgmgrd don't dead:
>
> [root at yoda2 ~]# /etc/init.d/rgmanager stop
> Shutting down Cluster Service Manager...
> Waiting for services to stop:   
>
> but nothing to do...
>
> [root at yoda2 ~]# ps -ef | grep clurgmgrd
> root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
> [root at yoda2 ~]# kill -9 6620   
> [root at yoda2 ~]# ps -ef | grep clurgmgrd
>
> and the process clvmd
>
> [root at yoda2 ~]# /etc/init.d/clvmd status
> clvmd dead but subsys locked
> active volumes: LV06 LV_nex2
>
> help me ... i don't want reboot the yoda2 ...
>
> bye
>
>
> On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com 
> <mailto:gpbuono at gmail.com>> wrote:
>
>     Hi,
>     if I try to restart on yoda2 cman
>     [root at yoda2 ~]# /etc/init.d/cman restart
>     Stopping cluster:
>        Stopping fencing... done
>        Stopping cman... done
>        Stopping ccsd... done
>        Unmounting configfs... done
>                                                                [  OK
]
>     Starting cluster:
>        Enabling workaround for Xend bridged networking... done
>
>        Loading modules... done
>        Mounting configfs... done
>        Starting ccsd... done
>        Starting cman... done
>        Starting daemons... done
>        Starting fencing... failed
>
>
[FAILED]
>     [root at yoda2 ~]# tail -f /var/log/messages
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0)
>     ip(172.20.0.174 <http://172.20.0.174>)
>     Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within
>     the primary component and will provide service.
>     Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL
>     state.
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
>     172.20.0.174 <http://172.20.0.174>
>     Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
>     172.20.0.175 <http://172.20.0.175>
>     Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message
>     from node 2
>     Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node
>     1 because we were killed by cman_tool or other application
>     Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
>     Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
>     Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
>     infrastructure after 30 seconds.
>     Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP:
>     [172.20.0.32 <http://172.20.0.32>]:55090
>
>
>     on this server there are 3 xen domu and i can't to reboot yoda2 :(
..
>
>     best regards..  and sorry for my english :)
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
https://www.redhat.com/archives/linux-cluster/attachments/20080625/0f4ff
35e/attachment.html

------------------------------

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 50, Issue 32
*********************************************


From punit_j at rediffmail.com  Wed Jun 25 18:53:43 2008
From: punit_j at rediffmail.com (punit_j)
Date: 25 Jun 2008 18:53:43 -0000
Subject: [Linux-cluster] Cluster issues
Message-ID: <20080625185343.18967.qmail@f5mail-237-214.rediffmail.com>

 ?
 Hi All,

I have installed zimbra mailbox server in 2+1 cluster. I am facing following issues while configuring Redhat cluster suite: -

2 services -
mb1-cluster.ku.edu.kw
mb2-cluster.ku.edu.kw

 I have only IP and mount points which are used using LABEL assigned to them. When i reboot all the 3 nodes mount points dont come up and i am getting following errors in log file. Attached is the cluster.conf for reference and also below are logs mentioned : -

These logs are on 1st node : -

Jun 25 11:13:15 mb1 clurgmgrd[14825]: Resource Group Manager Starting
Jun 25 11:13:15 mb1 clurgmgrd[14825]: Loading Service Data
Jun 25 11:13:17 mb1 clurgmgrd[14825]: Initializing Services
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: /dev/sdh1 is not mounted
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-BACKUP with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-BACKUP returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-STORE with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-STORE returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-DBDATA with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-DBDATA returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-CONF with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CONF returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-REDOLOG with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-REDOLOG returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-INDEX with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-INDEX returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-LOG with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-LOG returned 2 (invalid argument(s))
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-TEST-CLUST with a real device
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CLUSTER returned 2 (invalid argument(s))
Jun 25 11:13:22 mb1 clurgmgrd: [14825]: /dev/sdg1 is not mounted
Jun 25 11:13:27 mb1 clurgmgrd: [14825]: /dev/sdf1 is not mounted
Jun 25 11:13:33 mb1 clurgmgrd: [14825]: /dev/sde1 is not mounted
Jun 25 11:13:38 mb1 clurgmgrd: [14825]: /dev/sdd1 is not mounted
Jun 25 11:13:43 mb1 clurgmgrd: [14825]: /dev/sdc1 is not mounted
Jun 25 11:13:45 mb1 rgmanager: clurgmgrd startup failed
Jun 25 11:13:48 mb1 clurgmgrd: [14825]: /dev/sdb1 is not mounted
Jun 25 11:13:53 mb1 clurgmgrd: [14825]: /dev/sda1 is not mounted
Jun 25 11:13:58 mb1 clurgmgrd[14825]: Services Initialized
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Logged in SG "usrm::manager"
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Magma Event: Membership Change
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: Local UP
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: mbstandby.ku.edu.kw UP
Jun 25 11:14:03 mb1 clurgmgrd[14825]: Magma Event: Membership Change
Jun 25 11:14:03 mb1 clurgmgrd[14825]: State change: mb2.ku.edu.kw UP

On STANDBY NODE

forgot to add standby server logs: -

Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Resource Group Manager Starting
Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Loading Service Data
Jun 25 11:13:27 mbstandby clurgmgrd[15850]: Initializing Services
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdl1 is not mounted
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdp1 is not mounted
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdk1 is not mounted
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdn1 is not mounted
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdj1 is not mounted
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdo1 is not mounted
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdi1 is not mounted
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdm1 is not mounted
Jun 25 11:13:47 mbstandby sshd(pam_unix)[17583]: session opened for user root by (uid=0)
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdd1 is not mounted
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdh1 is not mounted
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdg1 is not mounted
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdc1 is not mounted
Jun 25 11:13:56 mbstandby rgmanager: clurgmgrd startup failed
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdf1 is not mounted
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdb1 is not mounted
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sde1 is not mounted
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sda1 is not mounted
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Services Initialized
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Logged in SG "usrm::manager"
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: State change: Local UP
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: State change: mb1.ku.edu.kw UP
Jun 25 11:14:13 mbstandby clurgmgrd[15850]: Resource groups locked; not evaluating
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: State change: mb2.ku.edu.kw UP
Jun 25 11:49:22 mbstandby sshd(pam_unix)[9438]: session opened for user root by (uid=0)

Can anyone help me in this issue. Its really urgent. So request you for help. 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/bc21f700/attachment.htm>
-------------- next part --------------
<?xml version="1.0"?>
<cluster alias="KU-CLUSTER" config_version="26" name="KU-CLUSTER">
	<fence_daemon post_fail_delay="0" post_join_delay="8"/>
	<clusternodes>
		<clusternode name="mb1.ku.edu.kw" votes="1">
			<fence>
				<method name="1">
					<device name="MB1-FENCE"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="mb2.ku.edu.kw" votes="1">
			<fence>
				<method name="1">
					<device name="MB1-FENCE"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="mbstandby.ku.edu.kw" votes="1">
			<fence>
				<method name="1">
					<device name="MB1-FENCE"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ilo" hostname="10.10.1.102" login="Administrator" name="MB1-FENCE" passwd="YTRFARYM744"/>
	</fencedevices>
	<rm log_facility="local4" log_level="7">
		<failoverdomains>
			<failoverdomain name="mb1-cluster.ku.edu.kw" ordered="1" restricted="1">
				<failoverdomainnode name="mb1.ku.edu.kw" priority="1"/>
				<failoverdomainnode name="mbstandby.ku.edu.kw" priority="2"/>
			</failoverdomain>
			<failoverdomain name="mb2-cluster.ku.edu.kw" ordered="1" restricted="1">
				<failoverdomainnode name="mbstandby.ku.edu.kw" priority="2"/>
				<failoverdomainnode name="mb2.ku.edu.kw" priority="1"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<ip address="10.10.1.98" monitor_link="1"/>
			<ip address="10.10.1.80" monitor_link="1"/>
			<fs device="LABEL=MB1-test-CLUST" force_fsck="0" force_unmount="1" fsid="55679" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw" name="MB1-CLUSTER" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-STORE" force_fsck="0" force_unmount="1" fsid="57530" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/store" name="MB1-STORE" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-REDOLOG" force_fsck="0" force_unmount="1" fsid="46961" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/redolog" name="MB1-REDOLOG" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-BACKUP" force_fsck="0" force_unmount="1" fsid="29649" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/backup" name="MB1-BACKUP" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-CONF" force_fsck="0" force_unmount="1" fsid="59576" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/conf" name="MB1-CONF" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-LOG" force_fsck="0" force_unmount="1" fsid="29992" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/log" name="MB1-LOG" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-INDEX" force_fsck="0" force_unmount="1" fsid="57292" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/index" name="MB1-INDEX" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB1-DBDATA" force_fsck="0" force_unmount="1" fsid="18112" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb1-cluster.ku.edu.kw/db/data" name="MB1-DBDATA" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-test-CLUST" force_fsck="0" force_unmount="1" fsid="58810" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw" name="MB2-CLUSTER" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-LOG" force_fsck="0" force_unmount="1" fsid="4088" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/log" name="MB2-LOG" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-INDEX" force_fsck="0" force_unmount="1" fsid="50513" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/index" name="MB2-INDEX" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-REDOLOG" force_fsck="0" force_unmount="1" fsid="35764" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/redolog" name="MB2-REDOLOG" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-CONF" force_fsck="0" force_unmount="1" fsid="39446" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/conf" name="MB2-CONF" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-STORE" force_fsck="0" force_unmount="1" fsid="64039" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/store" name="MB2-STORE" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-DBDATA" force_fsck="0" force_unmount="1" fsid="16675" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/db/data" name="MB2-DBDATA" options="noatime" self_fence="1"/>
			<fs device="LABEL=MB2-BACKUP" force_fsck="0" force_unmount="1" fsid="29746" fstype="ext3" mountpoint="/opt/test-cluster/mountpoints/mb2-cluster.ku.edu.kw/backup" name="MB2-BACKUP" options="noatime" self_fence="1"/>
		</resources>
		<service autostart="0" domain="mb1-cluster.ku.edu.kw" exclusive="1" name="mb1-cluster.ku.edu.kw" recovery="restart">
			<ip ref="10.10.1.98">
				<fs ref="MB1-CLUSTER"/>
				<fs ref="MB1-STORE"/>
				<fs ref="MB1-REDOLOG"/>
				<fs ref="MB1-BACKUP"/>
				<fs ref="MB1-CONF"/>
				<fs ref="MB1-LOG"/>
				<fs ref="MB1-INDEX"/>
				<fs ref="MB1-DBDATA"/>
			</ip>
		</service>
		<service autostart="0" domain="mb2-cluster.ku.edu.kw" exclusive="1" name="mb2-cluster.ku.edu.kw" recovery="restart">
			<ip ref="10.10.1.80">
				<fs ref="MB2-CLUSTER"/>
				<fs ref="MB2-LOG"/>
				<fs ref="MB2-INDEX"/>
				<fs ref="MB2-REDOLOG"/>
				<fs ref="MB2-CONF"/>
				<fs ref="MB2-DBDATA"/>
				<fs ref="MB2-STORE"/>
				<fs ref="MB2-BACKUP"/>
			</ip>
		</service>
	</rm>
</cluster>

From alacey at brynmawr.edu  Wed Jun 25 19:29:08 2008
From: alacey at brynmawr.edu (Andrew Lacey)
Date: Wed, 25 Jun 2008 15:29:08 -0400 (EDT)
Subject: [Linux-cluster] Cluster issues
In-Reply-To: <1475730099.2116501214422045790.JavaMail.root@ganesh.brynmawr.edu>
Message-ID: <1931225635.2117111214422148904.JavaMail.root@ganesh.brynmawr.edu>

Hi- 

Are you using the "multipath" Red Hat package (like for an iSCSI SAN)? I had similar problems because my multipath stuff was not set up properly. Do some research on /etc/multipath.conf if you suspect this is the problem. Once multipath is set up, your mountpoints will be in /dev/mapper/ and not right in /dev/. 

-Andrew L 

----- Original Message ----- 
From: "punit_j" <punit_j at rediffmail.com> 
To: linux-cluster at redhat.com 
Sent: Wednesday, June 25, 2008 2:53:43 PM GMT -05:00 US/Canada Eastern 
Subject: [Linux-cluster] Cluster issues 


Hi All, 

I have installed zimbra mailbox server in 2+1 cluster. I am facing following issues while configuring Redhat cluster suite: - 

2 services - 
mb1-cluster.ku.edu.kw 
mb2-cluster.ku.edu.kw 

I have only IP and mount points which are used using LABEL assigned to them. When i reboot all the 3 nodes mount points dont come up and i am getting following errors in log file. Attached is the cluster.conf for reference and also below are logs mentioned : - 

These logs are on 1st node : - 

Jun 25 11:13:15 mb1 clurgmgrd[14825]: Resource Group Manager Starting 
Jun 25 11:13:15 mb1 clurgmgrd[14825]: Loading Service Data 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: Initializing Services 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: /dev/sdh1 is not mounted 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-BACKUP with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-BACKUP returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-STORE with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-STORE returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-DBDATA with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-DBDATA returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-CONF with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CONF returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-REDOLOG with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-REDOLOG returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-INDEX with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-INDEX returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-LOG with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-LOG returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-TEST-CLUST with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CLUSTER returned 2 (invalid argument(s)) 
Jun 25 11:13:22 mb1 clurgmgrd: [14825]: /dev/sdg1 is not mounted 
Jun 25 11:13:27 mb1 clurgmgrd: [14825]: /dev/sdf1 is not mounted 
Jun 25 11:13:33 mb1 clurgmgrd: [14825]: /dev/sde1 is not mounted 
Jun 25 11:13:38 mb1 clurgmgrd: [14825]: /dev/sdd1 is not mounted 
Jun 25 11:13:43 mb1 clurgmgrd: [14825]: /dev/sdc1 is not mounted 
Jun 25 11:13:45 mb1 rgmanager: clurgmgrd startup failed 
Jun 25 11:13:48 mb1 clurgmgrd: [14825]: /dev/sdb1 is not mounted 
Jun 25 11:13:53 mb1 clurgmgrd: [14825]: /dev/sda1 is not mounted 
Jun 25 11:13:58 mb1 clurgmgrd[14825]: Services Initialized 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Logged in SG "usrm::manager" 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: Local UP 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: mbstandby.ku.edu.kw UP 
Jun 25 11:14:03 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
Jun 25 11:14:03 mb1 clurgmgrd[14825]: State change: mb2.ku.edu.kw UP 

On STANDBY NODE 

forgot to add standby server logs: - 

Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Resource Group Manager Starting 
Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Loading Service Data 
Jun 25 11:13:27 mbstandby clurgmgrd[15850]: Initializing Services 
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdl1 is not mounted 
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdp1 is not mounted 
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdk1 is not mounted 
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdn1 is not mounted 
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdj1 is not mounted 
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdo1 is not mounted 
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdi1 is not mounted 
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdm1 is not mounted 
Jun 25 11:13:47 mbstandby sshd(pam_unix)[17583]: session opened for user root by (uid=0) 
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdd1 is not mounted 
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdh1 is not mounted 
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdg1 is not mounted 
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdc1 is not mounted 
Jun 25 11:13:56 mbstandby rgmanager: clurgmgrd startup failed 
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdf1 is not mounted 
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdb1 is not mounted 
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sde1 is not mounted 
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sda1 is not mounted 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Services Initialized 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Logged in SG "usrm::manager" 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: State change: Local UP 
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: State change: mb1.ku.edu.kw UP 
Jun 25 11:14:13 mbstandby clurgmgrd[15850]: Resource groups locked; not evaluating 
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: State change: mb2.ku.edu.kw UP 
Jun 25 11:49:22 mbstandby sshd(pam_unix)[9438]: session opened for user root by (uid=0) 

Can anyone help me in this issue. Its really urgent. So request you for help. 


Sig_3million_578x38.gif
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/2b963688/attachment.htm>

From alacey at brynmawr.edu  Wed Jun 25 19:31:04 2008
From: alacey at brynmawr.edu (Andrew Lacey)
Date: Wed, 25 Jun 2008 15:31:04 -0400 (EDT)
Subject: [Linux-cluster] Cluster issues
In-Reply-To: <20080625185343.18967.qmail@f5mail-237-214.rediffmail.com>
Message-ID: <589227855.2117821214422264992.JavaMail.root@ganesh.brynmawr.edu>

Correction: I meant that your device nodes for the SAN volumes will be in /dev/mapper, not your mountpoints, which are wherever you want them to be :-) 

-Andrew L 

----- Original Message ----- 
From: "punit_j" <punit_j at rediffmail.com> 
To: linux-cluster at redhat.com 
Sent: Wednesday, June 25, 2008 2:53:43 PM GMT -05:00 US/Canada Eastern 
Subject: [Linux-cluster] Cluster issues 


Hi All, 

I have installed zimbra mailbox server in 2+1 cluster. I am facing following issues while configuring Redhat cluster suite: - 

2 services - 
mb1-cluster.ku.edu.kw 
mb2-cluster.ku.edu.kw 

I have only IP and mount points which are used using LABEL assigned to them. When i reboot all the 3 nodes mount points dont come up and i am getting following errors in log file. Attached is the cluster.conf for reference and also below are logs mentioned : - 

These logs are on 1st node : - 

Jun 25 11:13:15 mb1 clurgmgrd[14825]: Resource Group Manager Starting 
Jun 25 11:13:15 mb1 clurgmgrd[14825]: Loading Service Data 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: Initializing Services 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: /dev/sdh1 is not mounted 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-BACKUP with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-BACKUP returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-STORE with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-STORE returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-DBDATA with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-DBDATA returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-CONF with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CONF returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-REDOLOG with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-REDOLOG returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-INDEX with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-INDEX returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-LOG with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-LOG returned 2 (invalid argument(s)) 
Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-TEST-CLUST with a real device 
Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CLUSTER returned 2 (invalid argument(s)) 
Jun 25 11:13:22 mb1 clurgmgrd: [14825]: /dev/sdg1 is not mounted 
Jun 25 11:13:27 mb1 clurgmgrd: [14825]: /dev/sdf1 is not mounted 
Jun 25 11:13:33 mb1 clurgmgrd: [14825]: /dev/sde1 is not mounted 
Jun 25 11:13:38 mb1 clurgmgrd: [14825]: /dev/sdd1 is not mounted 
Jun 25 11:13:43 mb1 clurgmgrd: [14825]: /dev/sdc1 is not mounted 
Jun 25 11:13:45 mb1 rgmanager: clurgmgrd startup failed 
Jun 25 11:13:48 mb1 clurgmgrd: [14825]: /dev/sdb1 is not mounted 
Jun 25 11:13:53 mb1 clurgmgrd: [14825]: /dev/sda1 is not mounted 
Jun 25 11:13:58 mb1 clurgmgrd[14825]: Services Initialized 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Logged in SG "usrm::manager" 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: Local UP 
Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: mbstandby.ku.edu.kw UP 
Jun 25 11:14:03 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
Jun 25 11:14:03 mb1 clurgmgrd[14825]: State change: mb2.ku.edu.kw UP 

On STANDBY NODE 

forgot to add standby server logs: - 

Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Resource Group Manager Starting 
Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Loading Service Data 
Jun 25 11:13:27 mbstandby clurgmgrd[15850]: Initializing Services 
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdl1 is not mounted 
Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdp1 is not mounted 
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdk1 is not mounted 
Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdn1 is not mounted 
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdj1 is not mounted 
Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdo1 is not mounted 
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdi1 is not mounted 
Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdm1 is not mounted 
Jun 25 11:13:47 mbstandby sshd(pam_unix)[17583]: session opened for user root by (uid=0) 
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdd1 is not mounted 
Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdh1 is not mounted 
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdg1 is not mounted 
Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdc1 is not mounted 
Jun 25 11:13:56 mbstandby rgmanager: clurgmgrd startup failed 
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdf1 is not mounted 
Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdb1 is not mounted 
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sde1 is not mounted 
Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sda1 is not mounted 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Services Initialized 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Logged in SG "usrm::manager" 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:09 mbstandby clurgmgrd[15850]: State change: Local UP 
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:12 mbstandby clurgmgrd[15850]: State change: mb1.ku.edu.kw UP 
Jun 25 11:14:13 mbstandby clurgmgrd[15850]: Resource groups locked; not evaluating 
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
Jun 25 11:14:14 mbstandby clurgmgrd[15850]: State change: mb2.ku.edu.kw UP 
Jun 25 11:49:22 mbstandby sshd(pam_unix)[9438]: session opened for user root by (uid=0) 

Can anyone help me in this issue. Its really urgent. So request you for help. 


Sig_3million_578x38.gif
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/fc8dd820/attachment.htm>

From bfields at fieldses.org  Wed Jun 25 22:45:44 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Wed, 25 Jun 2008 18:45:44 -0400
Subject: [Linux-cluster] gfs2, kvm setup
Message-ID: <20080625224544.GJ12629@fieldses.org>

I'm trying to get a gfs2 file system running on some kvm hosts, using an
ordinary qemu disk for the shared storage (is there any reason this
can't work?).

I installed openais80.3 from source (after modifying Makefile so "make
install" would install to /), and installed gfs2 from the STABLE2 branch
of git://sources.redhat.com/git/cluster.git, plus this patch:

	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html

(with conflict in write_result() resolved in the obvious way).  The
kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
minimal cluster.conf and did a mkfs -tgfs2 following doc/usage.txt, then
did the startup steps from usage.txt by hand.  Everything works up to
the mount, at which point the first host gets the following lock bug in
the logs.  Other mounts fail or hang.

Any hints?

--b.

Jun 25 18:30:11 piglet1 ccsd[3022]: Starting ccsd 1214172260: 
Jun 25 18:30:11 piglet1 ccsd[3022]:  Built: Jun 22 2008 18:04:35 
Jun 25 18:30:11 piglet1 ccsd[3022]:  Copyright (C) Red Hat, Inc.  2004-2008  All rights reserved. 
Jun 25 18:30:11 piglet1 ccsd[3022]: /etc/cluster/cluster.conf (cluster name = piglet, version = 1) found. 
Jun 25 18:30:15 piglet1 ccsd[3022]: Initial status:: Quorate 
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "piglet:test"
Jun 25 18:31:01 piglet1 kernel: dlm: Using TCP for communications
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: Joined cluster. Now mounting FS...
Jun 25 18:31:01 piglet1 kernel: 
Jun 25 18:31:01 piglet1 kernel: =====================================
Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
Jun 25 18:31:01 piglet1 kernel: -------------------------------------
Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
Jun 25 18:31:01 piglet1 kernel: 
Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
Jun 25 18:31:01 piglet1 kernel: 
Jun 25 18:31:01 piglet1 kernel: stack backtrace:
Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0
Jun 25 18:31:01 piglet1 kernel:  [<c0137449>] ? save_trace+0x39/0xa0
Jun 25 18:31:01 piglet1 kernel:  [<c01374ea>] ? add_lock_to_list+0x3a/0xa0
Jun 25 18:31:01 piglet1 kernel:  [<c0139cac>] ? __lock_acquire+0xb9c/0xfc0
Jun 25 18:31:01 piglet1 kernel:  [<c0139f14>] ? __lock_acquire+0xe04/0xfc0
Jun 25 18:31:01 piglet1 kernel:  [<c013a23f>] lock_release_non_nested+0xff/0x170
Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
Jun 25 18:31:01 piglet1 kernel:  [<c013a33d>] lock_release+0x8d/0x150
Jun 25 18:31:01 piglet1 kernel:  [<c0131c96>] up_write+0x16/0x30
Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] dlm_recoverd+0x440/0x510
Jun 25 18:31:01 piglet1 kernel:  [<c01c34f0>] ? dlm_recoverd+0x0/0x510
Jun 25 18:31:01 piglet1 kernel:  [<c012e546>] kthread+0x36/0x60
Jun 25 18:31:01 piglet1 kernel:  [<c012e510>] ? kthread+0x0/0x60
Jun 25 18:31:01 piglet1 kernel:  [<c0103587>] kernel_thread_helper+0x7/0x10
Jun 25 18:31:01 piglet1 kernel:  =======================
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done
Jun 25 18:33:46 piglet1 ntpd[2951]: synchronized to 76.189.12.0, stratum 1


From whiteley at pdx.edu  Wed Jun 25 22:51:53 2008
From: whiteley at pdx.edu (matt whiteley)
Date: Wed, 25 Jun 2008 15:51:53 -0700
Subject: [Linux-cluster] virtual machine failover with gfs
Message-ID: <D074D3F4-84B1-4596-92AF-12FD99AF726B@pdx.edu>

I have spent lots of hours trying different setups and reading the  
documentation already so I hope this isn't a faq as I am new to the  
list.

I read the Red Hat Magazine article on this topic[1], but have come to  
realize that it might not be exactly what I am going for. I want to  
have a group of nodes that run a group of virtual machines with  
automated failover. I set things up how the article described but  
realized I didn't want the gfs mount in the fstab file. I would like  
the gfs mount described in the cluster.conf file so that as nodes are  
added or removed the mount will follow the changes (I know about the 1  
journal per node so have created a few extra already). When I add a  
service to mount the gfs resource, it only gets mounted on one node as  
is to be expected thinking in terms of other resources.

I started thinking about this and it almost seems like gfs is  
unnecessary. Should I have a file system per virtual machine that  
wouldn't need to be gfs since only one node will ever run a virtual  
machine at a time? Then mount/umount the file system as the virtual  
machine was migrated in the cluster?

It seems like I am missing something about how this should be setup  
and I would really appreciate any tips or ideas. I will include my  
cluster.conf in case it provides any more info.

As a side note, what is with all the errors from system-config- 
kickstart telling me my config file is invalid if it was generated by  
conga. Both versions are updated to the newest available.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: application/octet-stream
Size: 3040 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/0e5eeb2d/attachment.obj>
-------------- next part --------------


[1] http://www.redhatmagazine.com/2007/08/23/automated-failover-and-recovery-of-virtualized-guests-in-advanced-platform/

thanks,
-- 
matt whiteley <whiteley at pdx.edu>


From joe at 2resonate.net  Wed Jun 25 23:41:31 2008
From: joe at 2resonate.net (Joe Royall)
Date: Wed, 25 Jun 2008 16:41:31 -0700
Subject: [Linux-cluster] virtual machine failover with gfs
In-Reply-To: <D074D3F4-84B1-4596-92AF-12FD99AF726B@pdx.edu>
References: <D074D3F4-84B1-4596-92AF-12FD99AF726B@pdx.edu>
Message-ID: <bb0fa5d0806251641s7596ebc6q2d905187e24e7a5b@mail.gmail.com>

2008/6/25 matt whiteley <whiteley at pdx.edu>:

> I have spent lots of hours trying different setups and reading the
> documentation already so I hope this isn't a faq as I am new to the list.
>
> I read the Red Hat Magazine article on this topic[1], but have come to
> realize that it might not be exactly what I am going for. I want to have a
> group of nodes that run a group of virtual machines with automated failover.
> I set things up how the article described but realized I didn't want the gfs
> mount in the fstab file. I would like the gfs mount described in the
> cluster.conf file so that as nodes are added or removed the mount will
> follow the changes (I know about the 1 journal per node so have created a
> few extra already). When I add a service to mount the gfs resource, it only
> gets mounted on one node as is to be expected thinking in terms of other
> resources.
>
> I started thinking about this and it almost seems like gfs is unnecessary.
> Should I have a file system per virtual machine that wouldn't need to be gfs
> since only one node will ever run a virtual machine at a time? Then
> mount/umount the file system as the virtual machine was migrated in the
> cluster?
>
> It seems like I am missing something about how this should be setup and I
> would really appreciate any tips or ideas. I will include my cluster.conf in
> case it provides any more info.
>
> As a side note, what is with all the errors from system-config-kickstart
> telling me my config file is invalid if it was generated by conga. Both
> versions are updated to the newest available.
>

Why not use lvm backed vms, 1 per vm, share the entire partition with all
the lvms via ISCSI to each dom0 and run clvm on the dom0s.  The lvms do not
need to be mounted in dom0.  You can then use RHCS to failover vms between
dom0s.  Consider putting all the vms on a single node into a single resource
group and only allow 1 group to operate on a single node.  You can then
configure N+1 redundancy.

>
>
>
>
>
> [1]
> http://www.redhatmagazine.com/2007/08/23/automated-failover-and-recovery-of-virtualized-guests-in-advanced-platform/
>
> thanks,
> --
> matt whiteley <whiteley at pdx.edu>
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Joe Royall
Red Hat Certified Architect
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080625/0fa34b20/attachment.htm>

From Alain.Moulle at bull.net  Thu Jun 26 08:34:51 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 26 Jun 2008 10:34:51 +0200
Subject: [Linux-cluster] CS5 / IP failover with bond interface ?
Message-ID: <486354AB.4050307@bull.net>

Hi

Is it supported to use IP bonded adress as IP to
be failovered via the CS5 ?

Thanks
Regards
Alain Moull?


From Alain.Moulle at bull.net  Thu Jun 26 08:38:54 2008
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 26 Jun 2008 10:38:54 +0200
Subject: [Linux-cluster] Re: CS5 / quorum disk and heuristics / about
	allow_kill and/or reboot
Message-ID: <4863559E.9030200@bull.net>

Hi Lon

and so ... ? ;-)

Regards
Alain Moull?


Date: Tue, 10 Jun 2008 14:37:19 -0400
From: Lon Hohberger <lhh at redhat.com>
>>Hi Lon,
>>> Whereas heart-beat interface was working fine.
>>> You can disable these by setting allow_kill="0" and/or reboot="0"
>>> (see qdisk(5)).
>>
>>
>> => ok but in the case of a heart-beat failure, it will no more
>> avoid the dual-fencing in a two-nodes cluster if allow_kill="0" and/or
reboot="0" , right ?

>I'd have to think about it.
>Lon


From punit_j at rediffmail.com  Thu Jun 26 10:05:10 2008
From: punit_j at rediffmail.com (punit_j)
Date: 26 Jun 2008 10:05:10 -0000
Subject: [Linux-cluster] Cluster issues
Message-ID: <20080626100510.7840.qmail@f5mail-236-237.rediffmail.com>

 ?
Hi Andrew, 

Thanks for the response. I have single path with fibre attached to only one controller. I has issues in multipath so i moved to single path. In this case then it should not be a problem right ? but  still i see these errors while mounting SAN device.I am using e2label to assign names to drive and those names are being used in mounting.


Any clue ? Thanks in advance. 
Regards,
- Punit

On Thu, 26 Jun 2008 Andrew Lacey wrote :
>Correction: I meant that your device nodes for the SAN volumes will be in /dev/mapper, not your mountpoints, which are wherever you want them to be :-)
>
>-Andrew L
>
>----- Original Message -----
> From: "punit_j" <punit_j at rediffmail.com>
>To: linux-cluster at redhat.com
>Sent: Wednesday, June 25, 2008 2:53:43 PM GMT -05:00 US/Canada Eastern
>Subject: [Linux-cluster] Cluster issues
>
>
>
>
>Hi All,
>
>I have installed zimbra mailbox server in 2+1 cluster. I am facing following issues while configuring Redhat cluster suite: -
>
>2 services -
>mb1-cluster.ku.edu.kw
>mb2-cluster.ku.edu.kw
>
>I have only IP and mount points which are used using LABEL assigned to them. When i reboot all the 3 nodes mount points dont come up and i am getting following errors in log file. Attached is the cluster.conf for reference and also below are logs mentioned : -
>
>These logs are on 1st node : -
>
>Jun 25 11:13:15 mb1 clurgmgrd[14825]: Resource Group Manager Starting
>Jun 25 11:13:15 mb1 clurgmgrd[14825]: Loading Service Data
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: Initializing Services
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: /dev/sdh1 is not mounted
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-BACKUP with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-BACKUP returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-STORE with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-STORE returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-DBDATA with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-DBDATA returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-CONF with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CONF returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-REDOLOG with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-REDOLOG returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-INDEX with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-INDEX returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-LOG with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-LOG returned 2 (invalid argument(s))
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-TEST-CLUST with a real device
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CLUSTER returned 2 (invalid argument(s))
>Jun 25 11:13:22 mb1 clurgmgrd: [14825]: /dev/sdg1 is not mounted
>Jun 25 11:13:27 mb1 clurgmgrd: [14825]: /dev/sdf1 is not mounted
>Jun 25 11:13:33 mb1 clurgmgrd: [14825]: /dev/sde1 is not mounted
>Jun 25 11:13:38 mb1 clurgmgrd: [14825]: /dev/sdd1 is not mounted
>Jun 25 11:13:43 mb1 clurgmgrd: [14825]: /dev/sdc1 is not mounted
>Jun 25 11:13:45 mb1 rgmanager: clurgmgrd startup failed
>Jun 25 11:13:48 mb1 clurgmgrd: [14825]: /dev/sdb1 is not mounted
>Jun 25 11:13:53 mb1 clurgmgrd: [14825]: /dev/sda1 is not mounted
>Jun 25 11:13:58 mb1 clurgmgrd[14825]: Services Initialized
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: Logged in SG "usrm::manager"
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: Magma Event: Membership Change
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: Local UP
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: mbstandby.ku.edu.kw UP
>Jun 25 11:14:03 mb1 clurgmgrd[14825]: Magma Event: Membership Change
>Jun 25 11:14:03 mb1 clurgmgrd[14825]: State change: mb2.ku.edu.kw UP
>
>On STANDBY NODE
>
>forgot to add standby server logs: -
>
>Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Resource Group Manager Starting
>Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Loading Service Data
>Jun 25 11:13:27 mbstandby clurgmgrd[15850]: Initializing Services
>Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdl1 is not mounted
>Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdp1 is not mounted
>Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdk1 is not mounted
>Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdn1 is not mounted
>Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdj1 is not mounted
>Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdo1 is not mounted
>Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdi1 is not mounted
>Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdm1 is not mounted
>Jun 25 11:13:47 mbstandby sshd(pam_unix)[17583]: session opened for user root by (uid=0)
>Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdd1 is not mounted
>Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdh1 is not mounted
>Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdg1 is not mounted
>Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdc1 is not mounted
>Jun 25 11:13:56 mbstandby rgmanager: clurgmgrd startup failed
>Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdf1 is not mounted
>Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdb1 is not mounted
>Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sde1 is not mounted
>Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sda1 is not mounted
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Services Initialized
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Logged in SG "usrm::manager"
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: State change: Local UP
>Jun 25 11:14:12 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
>Jun 25 11:14:12 mbstandby clurgmgrd[15850]: State change: mb1.ku.edu.kw UP
>Jun 25 11:14:13 mbstandby clurgmgrd[15850]: Resource groups locked; not evaluating
>Jun 25 11:14:14 mbstandby clurgmgrd[15850]: Magma Event: Membership Change
>Jun 25 11:14:14 mbstandby clurgmgrd[15850]: State change: mb2.ku.edu.kw UP
>Jun 25 11:49:22 mbstandby sshd(pam_unix)[9438]: session opened for user root by (uid=0)
>
>Can anyone help me in this issue. Its really urgent. So request you for help.
>
>
>
>
>
>Sig_3million_578x38.gif
>-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080626/42b4d636/attachment.htm>

From connerf at ncifcrf.gov  Thu Jun 26 12:49:45 2008
From: connerf at ncifcrf.gov (fred conner)
Date: Thu, 26 Jun 2008 08:49:45 -0400
Subject: [Linux-cluster] gfs2 kernel bug?
Message-ID: <1214484585.8227.36.camel@norbert>

yesterday a co-worker was compiling code on a gfs2 filesystem and it
hung the server.  Is there a fix for this?  thank you.

Jun 25 14:24:54 abcc2 kernel: dlm: connecting to 1
Jun 25 14:24:54 abcc2 kernel: dlm: got connection from 1
original: gfs2_rename+0x1a9/0x610 [gfs2]
new: gfs2_inplace_reserve_i+0x205/0x5d0 [gfs2]
Kernel BUG at fs/gfs2/glock.c:1131
invalid opcode: 0000 [1] SMP 
last sysfs file: /kernel/dlm/rgmanager/control
CPU 3 
Modules linked in: autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp
l2cap bluetooth nfs lockd fscache nfs_acl lock_dlm gfs2 dlm configfs
sunrpc tun
  ip_gre ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT ipt_LOG
xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT
xt_tcpudp i
p6table_filter ip6_tables x_tables ipv6 video sbs backlight i2c_ec
i2c_corea bcc2 kernel: orbutton battery asus_acpi acpi_memhotplug ac
parport_pc l
p parport joydev ata_piix libata sr_mod shpchp ide_cd cdrom e1000 sg
pcspkr serio_raw bnx2 dm_snapshot dm_zero dm_mirror dm_mod usb_storage
megaraid
_sas megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
Pid: 13627, comm: mv Not tainted 2.6.18-53.1.14.el5 #1
iginal: gfs2_renRIP: 0010:[<ffffffff8849b216>] ame+0x1a9/0x610
[<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
RSP: 0018:ffff8103e7681a98  EFLAGS: 00010286
[gfs2]
Jun 25 1RAX: 0000000000000020 RBX: ffff8103e7681cb0 RCX:
ffffffff80443520
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802e675c
RBP: ffff8103f55c3eb0 R08: 00000000ffffffff R09: 0000000000000020
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8103eadf3378
R13: ffff8103eadf3378 R14: 0000000000000000 R15: ffff81041d563000
FS:  00002aaaaaac9f20(0000) GS:ffff81010eb9c640(0000)
knlGS:0000000000000000
new: gfs2_CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003acba5cf10 CR3: 00000003ee29e000 CR4: 00000000000006e0
Process mv (pid: 13627, threadinfo ffff8103e7680000, task
ffff810409dd20c0)
inplace_reserve_Stack:  0000000000000000 00000000000200cf
ffff81041d039dc0 ffff8103f55c3eb0
  ffff81041d563000 ffff8103f55c3a60 ffff8103f55c3d60 ffffffff884ac499
  0000000000000000 000002d000000000 ffffffff884b3568 ffff81041d563000
i+0x205/0x5d0 [gCall Trace:
fs2]
  [<ffffffff884ac499>] :gfs2:gfs2_inplace_reserve_i+0x20d/0x5d0
  [<ffffffff80143f87>] sort+0xfa/0x189
  [<ffffffff884a9ab1>] :gfs2:sort_qd+0x0/0x36
  [<ffffffff884a9cf5>] :gfs2:gfs2_quota_check+0x9a/0x182
  [<ffffffff884a69ce>] :gfs2:gfs2_rename+0x3b5/0x610
  [<ffffffff884a6715>] :gfs2:gfs2_rename+0xfc/0x610
  [<ffffffff884a6757>] :gfs2:gfs2_rename+0x13e/0x610
  [<ffffffff884a6781>] :gfs2:gfs2_rename+0x168/0x610
  [<ffffffff884a67c2>] :gfs2:gfs2_rename+0x1a9/0x610
  [<ffffffff8849b417>] :gfs2:gfs2_holder_uninit+0xd/0x1f
  [<ffffffff884a765e>] :gfs2:gfs2_permission+0xae/0xd4
  [<ffffffff8000d42e>] permission+0x81/0xc8
  [<ffffffff8002a515>] vfs_rename+0x2db/0x458
  [<ffffffff800362fa>] sys_renameat+0x180/0x1eb
  [<ffffffff80065a9d>] do_page_fault+0x4eb/0x81d
  [<ffffffff8002c886>] mntput_no_expire+0x19/0x89
  [<ffffffff80031f4f>] sys_faccessat+0x148/0x18d
  [<ffffffff800b3a88>] audit_syscall_entry+0x14d/0x180
  [<ffffffff8005c28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 2d 1c 4b 88 c2 6b 04 8b 75 18 49 8b 44 24 78 49 8d 
RIP  [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
  RSP <ffff8103e7681a98>
  <0>Kernel panic - not syncing: Fatal exception


-- 
Fred Conner [Contractor]


From swhiteho at redhat.com  Thu Jun 26 12:49:54 2008
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Thu, 26 Jun 2008 13:49:54 +0100
Subject: [Linux-cluster] gfs2 kernel bug?
In-Reply-To: <1214484585.8227.36.camel@norbert>
References: <1214484585.8227.36.camel@norbert>
Message-ID: <1214484594.4011.40.camel@quoit>

Hi,

It would help to know what version of GFS2 you are using, but from the
trace it looks likely that its ancient and if you choose a more recent
version of the code, you should find that this problem has gone away,

Steve.

On Thu, 2008-06-26 at 08:49 -0400, fred conner wrote:
> yesterday a co-worker was compiling code on a gfs2 filesystem and it
> hung the server.  Is there a fix for this?  thank you.
> 
> Jun 25 14:24:54 abcc2 kernel: dlm: connecting to 1
> Jun 25 14:24:54 abcc2 kernel: dlm: got connection from 1
> original: gfs2_rename+0x1a9/0x610 [gfs2]
> new: gfs2_inplace_reserve_i+0x205/0x5d0 [gfs2]
> Kernel BUG at fs/gfs2/glock.c:1131
> invalid opcode: 0000 [1] SMP 
> last sysfs file: /kernel/dlm/rgmanager/control
> CPU 3 
> Modules linked in: autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp
> l2cap bluetooth nfs lockd fscache nfs_acl lock_dlm gfs2 dlm configfs
> sunrpc tun
>   ip_gre ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT ipt_LOG
> xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT
> xt_tcpudp i
> p6table_filter ip6_tables x_tables ipv6 video sbs backlight i2c_ec
> i2c_corea bcc2 kernel: orbutton battery asus_acpi acpi_memhotplug ac
> parport_pc l
> p parport joydev ata_piix libata sr_mod shpchp ide_cd cdrom e1000 sg
> pcspkr serio_raw bnx2 dm_snapshot dm_zero dm_mirror dm_mod usb_storage
> megaraid
> _sas megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd
> ohci_hcd uhci_hcd
> Pid: 13627, comm: mv Not tainted 2.6.18-53.1.14.el5 #1
> iginal: gfs2_renRIP: 0010:[<ffffffff8849b216>] ame+0x1a9/0x610
> [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> RSP: 0018:ffff8103e7681a98  EFLAGS: 00010286
> [gfs2]
> Jun 25 1RAX: 0000000000000020 RBX: ffff8103e7681cb0 RCX:
> ffffffff80443520
> RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802e675c
> RBP: ffff8103f55c3eb0 R08: 00000000ffffffff R09: 0000000000000020
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8103eadf3378
> R13: ffff8103eadf3378 R14: 0000000000000000 R15: ffff81041d563000
> FS:  00002aaaaaac9f20(0000) GS:ffff81010eb9c640(0000)
> knlGS:0000000000000000
> new: gfs2_CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000003acba5cf10 CR3: 00000003ee29e000 CR4: 00000000000006e0
> Process mv (pid: 13627, threadinfo ffff8103e7680000, task
> ffff810409dd20c0)
> inplace_reserve_Stack:  0000000000000000 00000000000200cf
> ffff81041d039dc0 ffff8103f55c3eb0
>   ffff81041d563000 ffff8103f55c3a60 ffff8103f55c3d60 ffffffff884ac499
>   0000000000000000 000002d000000000 ffffffff884b3568 ffff81041d563000
> i+0x205/0x5d0 [gCall Trace:
> fs2]
>   [<ffffffff884ac499>] :gfs2:gfs2_inplace_reserve_i+0x20d/0x5d0
>   [<ffffffff80143f87>] sort+0xfa/0x189
>   [<ffffffff884a9ab1>] :gfs2:sort_qd+0x0/0x36
>   [<ffffffff884a9cf5>] :gfs2:gfs2_quota_check+0x9a/0x182
>   [<ffffffff884a69ce>] :gfs2:gfs2_rename+0x3b5/0x610
>   [<ffffffff884a6715>] :gfs2:gfs2_rename+0xfc/0x610
>   [<ffffffff884a6757>] :gfs2:gfs2_rename+0x13e/0x610
>   [<ffffffff884a6781>] :gfs2:gfs2_rename+0x168/0x610
>   [<ffffffff884a67c2>] :gfs2:gfs2_rename+0x1a9/0x610
>   [<ffffffff8849b417>] :gfs2:gfs2_holder_uninit+0xd/0x1f
>   [<ffffffff884a765e>] :gfs2:gfs2_permission+0xae/0xd4
>   [<ffffffff8000d42e>] permission+0x81/0xc8
>   [<ffffffff8002a515>] vfs_rename+0x2db/0x458
>   [<ffffffff800362fa>] sys_renameat+0x180/0x1eb
>   [<ffffffff80065a9d>] do_page_fault+0x4eb/0x81d
>   [<ffffffff8002c886>] mntput_no_expire+0x19/0x89
>   [<ffffffff80031f4f>] sys_faccessat+0x148/0x18d
>   [<ffffffff800b3a88>] audit_syscall_entry+0x14d/0x180
>   [<ffffffff8005c28d>] tracesys+0xd5/0xe0
> 
> 
> Code: 0f 0b 68 2d 1c 4b 88 c2 6b 04 8b 75 18 49 8b 44 24 78 49 8d 
> RIP  [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
>   RSP <ffff8103e7681a98>
>   <0>Kernel panic - not syncing: Fatal exception
> 
> 


From connerf at ncifcrf.gov  Thu Jun 26 13:11:35 2008
From: connerf at ncifcrf.gov (fred conner)
Date: Thu, 26 Jun 2008 09:11:35 -0400
Subject: [Linux-cluster] gfs2 kernel bug?
In-Reply-To: <1214484594.4011.40.camel@quoit>
References: <1214484585.8227.36.camel@norbert> <1214484594.4011.40.camel@quoit>
Message-ID: <1214485895.8227.38.camel@norbert>

I am running Redhat Enterprise 5.1.

On Thu, 2008-06-26 at 13:49 +0100, Steven Whitehouse wrote:
> Hi,
> 
> It would help to know what version of GFS2 you are using, but from the
> trace it looks likely that its ancient and if you choose a more recent
> version of the code, you should find that this problem has gone away,
> 
> Steve.
> 
> On Thu, 2008-06-26 at 08:49 -0400, fred conner wrote:
> > yesterday a co-worker was compiling code on a gfs2 filesystem and it
> > hung the server.  Is there a fix for this?  thank you.
> > 
> > Jun 25 14:24:54 abcc2 kernel: dlm: connecting to 1
> > Jun 25 14:24:54 abcc2 kernel: dlm: got connection from 1
> > original: gfs2_rename+0x1a9/0x610 [gfs2]
> > new: gfs2_inplace_reserve_i+0x205/0x5d0 [gfs2]
> > Kernel BUG at fs/gfs2/glock.c:1131
> > invalid opcode: 0000 [1] SMP 
> > last sysfs file: /kernel/dlm/rgmanager/control
> > CPU 3 
> > Modules linked in: autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp
> > l2cap bluetooth nfs lockd fscache nfs_acl lock_dlm gfs2 dlm configfs
> > sunrpc tun
> >   ip_gre ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT ipt_LOG
> > xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT
> > xt_tcpudp i
> > p6table_filter ip6_tables x_tables ipv6 video sbs backlight i2c_ec
> > i2c_corea bcc2 kernel: orbutton battery asus_acpi acpi_memhotplug ac
> > parport_pc l
> > p parport joydev ata_piix libata sr_mod shpchp ide_cd cdrom e1000 sg
> > pcspkr serio_raw bnx2 dm_snapshot dm_zero dm_mirror dm_mod usb_storage
> > megaraid
> > _sas megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd
> > ohci_hcd uhci_hcd
> > Pid: 13627, comm: mv Not tainted 2.6.18-53.1.14.el5 #1
> > iginal: gfs2_renRIP: 0010:[<ffffffff8849b216>] ame+0x1a9/0x610
> > [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> > RSP: 0018:ffff8103e7681a98  EFLAGS: 00010286
> > [gfs2]
> > Jun 25 1RAX: 0000000000000020 RBX: ffff8103e7681cb0 RCX:
> > ffffffff80443520
> > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802e675c
> > RBP: ffff8103f55c3eb0 R08: 00000000ffffffff R09: 0000000000000020
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8103eadf3378
> > R13: ffff8103eadf3378 R14: 0000000000000000 R15: ffff81041d563000
> > FS:  00002aaaaaac9f20(0000) GS:ffff81010eb9c640(0000)
> > knlGS:0000000000000000
> > new: gfs2_CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000003acba5cf10 CR3: 00000003ee29e000 CR4: 00000000000006e0
> > Process mv (pid: 13627, threadinfo ffff8103e7680000, task
> > ffff810409dd20c0)
> > inplace_reserve_Stack:  0000000000000000 00000000000200cf
> > ffff81041d039dc0 ffff8103f55c3eb0
> >   ffff81041d563000 ffff8103f55c3a60 ffff8103f55c3d60 ffffffff884ac499
> >   0000000000000000 000002d000000000 ffffffff884b3568 ffff81041d563000
> > i+0x205/0x5d0 [gCall Trace:
> > fs2]
> >   [<ffffffff884ac499>] :gfs2:gfs2_inplace_reserve_i+0x20d/0x5d0
> >   [<ffffffff80143f87>] sort+0xfa/0x189
> >   [<ffffffff884a9ab1>] :gfs2:sort_qd+0x0/0x36
> >   [<ffffffff884a9cf5>] :gfs2:gfs2_quota_check+0x9a/0x182
> >   [<ffffffff884a69ce>] :gfs2:gfs2_rename+0x3b5/0x610
> >   [<ffffffff884a6715>] :gfs2:gfs2_rename+0xfc/0x610
> >   [<ffffffff884a6757>] :gfs2:gfs2_rename+0x13e/0x610
> >   [<ffffffff884a6781>] :gfs2:gfs2_rename+0x168/0x610
> >   [<ffffffff884a67c2>] :gfs2:gfs2_rename+0x1a9/0x610
> >   [<ffffffff8849b417>] :gfs2:gfs2_holder_uninit+0xd/0x1f
> >   [<ffffffff884a765e>] :gfs2:gfs2_permission+0xae/0xd4
> >   [<ffffffff8000d42e>] permission+0x81/0xc8
> >   [<ffffffff8002a515>] vfs_rename+0x2db/0x458
> >   [<ffffffff800362fa>] sys_renameat+0x180/0x1eb
> >   [<ffffffff80065a9d>] do_page_fault+0x4eb/0x81d
> >   [<ffffffff8002c886>] mntput_no_expire+0x19/0x89
> >   [<ffffffff80031f4f>] sys_faccessat+0x148/0x18d
> >   [<ffffffff800b3a88>] audit_syscall_entry+0x14d/0x180
> >   [<ffffffff8005c28d>] tracesys+0xd5/0xe0
> > 
> > 
> > Code: 0f 0b 68 2d 1c 4b 88 c2 6b 04 8b 75 18 49 8b 44 24 78 49 8d 
> > RIP  [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> >   RSP <ffff8103e7681a98>
> >   <0>Kernel panic - not syncing: Fatal exception
> > 
> > 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
-- 
Fred Conner [Contractor]


From swhiteho at redhat.com  Thu Jun 26 13:13:15 2008
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Thu, 26 Jun 2008 14:13:15 +0100
Subject: [Linux-cluster] gfs2 kernel bug?
In-Reply-To: <1214485895.8227.38.camel@norbert>
References: <1214484585.8227.36.camel@norbert>
	<1214484594.4011.40.camel@quoit>  <1214485895.8227.38.camel@norbert>
Message-ID: <1214485995.4011.43.camel@quoit>

Hi,

On Thu, 2008-06-26 at 09:11 -0400, fred conner wrote:
> I am running Redhat Enterprise 5.1.
> 
Ok. That is ancient then. Try the latest Fedora, or if you need support
then the best thing is to talk directly to the Red Hat sales/support who
can tell you all the current details on that.

I know that its marked as a preview feature in RHEL, but really Fedora
is better for that purpose since its much more uptodate,

Steve.

> On Thu, 2008-06-26 at 13:49 +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > It would help to know what version of GFS2 you are using, but from the
> > trace it looks likely that its ancient and if you choose a more recent
> > version of the code, you should find that this problem has gone away,
> > 
> > Steve.
> > 
> > On Thu, 2008-06-26 at 08:49 -0400, fred conner wrote:
> > > yesterday a co-worker was compiling code on a gfs2 filesystem and it
> > > hung the server.  Is there a fix for this?  thank you.
> > > 
> > > Jun 25 14:24:54 abcc2 kernel: dlm: connecting to 1
> > > Jun 25 14:24:54 abcc2 kernel: dlm: got connection from 1
> > > original: gfs2_rename+0x1a9/0x610 [gfs2]
> > > new: gfs2_inplace_reserve_i+0x205/0x5d0 [gfs2]
> > > Kernel BUG at fs/gfs2/glock.c:1131
> > > invalid opcode: 0000 [1] SMP 
> > > last sysfs file: /kernel/dlm/rgmanager/control
> > > CPU 3 
> > > Modules linked in: autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp
> > > l2cap bluetooth nfs lockd fscache nfs_acl lock_dlm gfs2 dlm configfs
> > > sunrpc tun
> > >   ip_gre ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT ipt_LOG
> > > xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT
> > > xt_tcpudp i
> > > p6table_filter ip6_tables x_tables ipv6 video sbs backlight i2c_ec
> > > i2c_corea bcc2 kernel: orbutton battery asus_acpi acpi_memhotplug ac
> > > parport_pc l
> > > p parport joydev ata_piix libata sr_mod shpchp ide_cd cdrom e1000 sg
> > > pcspkr serio_raw bnx2 dm_snapshot dm_zero dm_mirror dm_mod usb_storage
> > > megaraid
> > > _sas megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd
> > > ohci_hcd uhci_hcd
> > > Pid: 13627, comm: mv Not tainted 2.6.18-53.1.14.el5 #1
> > > iginal: gfs2_renRIP: 0010:[<ffffffff8849b216>] ame+0x1a9/0x610
> > > [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> > > RSP: 0018:ffff8103e7681a98  EFLAGS: 00010286
> > > [gfs2]
> > > Jun 25 1RAX: 0000000000000020 RBX: ffff8103e7681cb0 RCX:
> > > ffffffff80443520
> > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802e675c
> > > RBP: ffff8103f55c3eb0 R08: 00000000ffffffff R09: 0000000000000020
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8103eadf3378
> > > R13: ffff8103eadf3378 R14: 0000000000000000 R15: ffff81041d563000
> > > FS:  00002aaaaaac9f20(0000) GS:ffff81010eb9c640(0000)
> > > knlGS:0000000000000000
> > > new: gfs2_CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000003acba5cf10 CR3: 00000003ee29e000 CR4: 00000000000006e0
> > > Process mv (pid: 13627, threadinfo ffff8103e7680000, task
> > > ffff810409dd20c0)
> > > inplace_reserve_Stack:  0000000000000000 00000000000200cf
> > > ffff81041d039dc0 ffff8103f55c3eb0
> > >   ffff81041d563000 ffff8103f55c3a60 ffff8103f55c3d60 ffffffff884ac499
> > >   0000000000000000 000002d000000000 ffffffff884b3568 ffff81041d563000
> > > i+0x205/0x5d0 [gCall Trace:
> > > fs2]
> > >   [<ffffffff884ac499>] :gfs2:gfs2_inplace_reserve_i+0x20d/0x5d0
> > >   [<ffffffff80143f87>] sort+0xfa/0x189
> > >   [<ffffffff884a9ab1>] :gfs2:sort_qd+0x0/0x36
> > >   [<ffffffff884a9cf5>] :gfs2:gfs2_quota_check+0x9a/0x182
> > >   [<ffffffff884a69ce>] :gfs2:gfs2_rename+0x3b5/0x610
> > >   [<ffffffff884a6715>] :gfs2:gfs2_rename+0xfc/0x610
> > >   [<ffffffff884a6757>] :gfs2:gfs2_rename+0x13e/0x610
> > >   [<ffffffff884a6781>] :gfs2:gfs2_rename+0x168/0x610
> > >   [<ffffffff884a67c2>] :gfs2:gfs2_rename+0x1a9/0x610
> > >   [<ffffffff8849b417>] :gfs2:gfs2_holder_uninit+0xd/0x1f
> > >   [<ffffffff884a765e>] :gfs2:gfs2_permission+0xae/0xd4
> > >   [<ffffffff8000d42e>] permission+0x81/0xc8
> > >   [<ffffffff8002a515>] vfs_rename+0x2db/0x458
> > >   [<ffffffff800362fa>] sys_renameat+0x180/0x1eb
> > >   [<ffffffff80065a9d>] do_page_fault+0x4eb/0x81d
> > >   [<ffffffff8002c886>] mntput_no_expire+0x19/0x89
> > >   [<ffffffff80031f4f>] sys_faccessat+0x148/0x18d
> > >   [<ffffffff800b3a88>] audit_syscall_entry+0x14d/0x180
> > >   [<ffffffff8005c28d>] tracesys+0xd5/0xe0
> > > 
> > > 
> > > Code: 0f 0b 68 2d 1c 4b 88 c2 6b 04 8b 75 18 49 8b 44 24 78 49 8d 
> > > RIP  [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> > >   RSP <ffff8103e7681a98>
> > >   <0>Kernel panic - not syncing: Fatal exception
> > > 
> > > 
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 


From alacey at brynmawr.edu  Thu Jun 26 13:28:41 2008
From: alacey at brynmawr.edu (Andrew Lacey)
Date: Thu, 26 Jun 2008 09:28:41 -0400 (EDT)
Subject: [Linux-cluster] Cluster issues
In-Reply-To: <20080626100510.7840.qmail@f5mail-236-237.rediffmail.com>
Message-ID: <122034900.2195401214486921403.JavaMail.root@ganesh.brynmawr.edu>

Hi- 

Are you able to use the mount command to manually mount your volumes when the cluster services are stopped? If you get errors just doing that, then the cluster has nothing to do with it. Maybe if you had set up multipath before and tried to undo it, you didn't undo it all the way. 

-Andrew L 

----- Original Message ----- 
From: "punit_j" <punit_j at rediffmail.com> 
To: "Andrew Lacey" <alacey at brynmawr.edu> 
Cc: "linux clustering" <linux-cluster at redhat.com> 
Sent: Thursday, June 26, 2008 6:05:10 AM GMT -05:00 US/Canada Eastern 
Subject: Re: Re: [Linux-cluster] Cluster issues 


Hi Andrew, 

Thanks for the response. I have single path with fibre attached to only one controller. I has issues in multipath so i moved to single path. In this case then it should not be a problem right ? but still i see these errors while mounting SAN device.I am using e2label to assign names to drive and those names are being used in mounting. 


Any clue ? Thanks in advance. 
Regards, 
- Punit 

On Thu, 26 Jun 2008 Andrew Lacey wrote : 
>Correction: I meant that your device nodes for the SAN volumes will be in /dev/mapper, not your mountpoints, which are wherever you want them to be :-) 
> 
>-Andrew L 
> 
>----- Original Message ----- 
> From: "punit_j" <punit_j at rediffmail.com> 
>To: linux-cluster at redhat.com 
>Sent: Wednesday, June 25, 2008 2:53:43 PM GMT -05:00 US/Canada Eastern 
>Subject: [Linux-cluster] Cluster issues 
> 
> 
> 
> 
>Hi All, 
> 
>I have installed zimbra mailbox server in 2+1 cluster. I am facing following issues while configuring Redhat cluster suite: - 
> 
>2 services - 
>mb1-cluster.ku.edu.kw 
>mb2-cluster.ku.edu.kw 
> 
>I have only IP and mount points which are used using LABEL assigned to them. When i reboot all the 3 nodes mount points dont come up and i am getting following errors in log file. Attached is the cluster.conf for reference and also below are logs mentioned : - 
> 
>These logs are on 1st node : - 
> 
>Jun 25 11:13:15 mb1 clurgmgrd[14825]: Resource Group Manager Starting 
>Jun 25 11:13:15 mb1 clurgmgrd[14825]: Loading Service Data 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: Initializing Services 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: /dev/sdh1 is not mounted 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-BACKUP with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-BACKUP returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-STORE with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-STORE returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-DBDATA with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-DBDATA returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-CONF with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CONF returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-REDOLOG with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-REDOLOG returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-INDEX with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-INDEX returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-LOG with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-LOG returned 2 (invalid argument(s)) 
>Jun 25 11:13:17 mb1 clurgmgrd: [14825]: stop: Could not match LABEL=MB2-TEST-CLUST with a real device 
>Jun 25 11:13:17 mb1 clurgmgrd[14825]: stop on fs:MB2-CLUSTER returned 2 (invalid argument(s)) 
>Jun 25 11:13:22 mb1 clurgmgrd: [14825]: /dev/sdg1 is not mounted 
>Jun 25 11:13:27 mb1 clurgmgrd: [14825]: /dev/sdf1 is not mounted 
>Jun 25 11:13:33 mb1 clurgmgrd: [14825]: /dev/sde1 is not mounted 
>Jun 25 11:13:38 mb1 clurgmgrd: [14825]: /dev/sdd1 is not mounted 
>Jun 25 11:13:43 mb1 clurgmgrd: [14825]: /dev/sdc1 is not mounted 
>Jun 25 11:13:45 mb1 rgmanager: clurgmgrd startup failed 
>Jun 25 11:13:48 mb1 clurgmgrd: [14825]: /dev/sdb1 is not mounted 
>Jun 25 11:13:53 mb1 clurgmgrd: [14825]: /dev/sda1 is not mounted 
>Jun 25 11:13:58 mb1 clurgmgrd[14825]: Services Initialized 
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: Logged in SG "usrm::manager" 
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: Local UP 
>Jun 25 11:14:01 mb1 clurgmgrd[14825]: State change: mbstandby.ku.edu.kw UP 
>Jun 25 11:14:03 mb1 clurgmgrd[14825]: Magma Event: Membership Change 
>Jun 25 11:14:03 mb1 clurgmgrd[14825]: State change: mb2.ku.edu.kw UP 
> 
>On STANDBY NODE 
> 
>forgot to add standby server logs: - 
> 
>Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Resource Group Manager Starting 
>Jun 25 11:13:26 mbstandby clurgmgrd[15850]: Loading Service Data 
>Jun 25 11:13:27 mbstandby clurgmgrd[15850]: Initializing Services 
>Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdl1 is not mounted 
>Jun 25 11:13:27 mbstandby clurgmgrd: [15850]: /dev/sdp1 is not mounted 
>Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdk1 is not mounted 
>Jun 25 11:13:32 mbstandby clurgmgrd: [15850]: /dev/sdn1 is not mounted 
>Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdj1 is not mounted 
>Jun 25 11:13:38 mbstandby clurgmgrd: [15850]: /dev/sdo1 is not mounted 
>Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdi1 is not mounted 
>Jun 25 11:13:43 mbstandby clurgmgrd: [15850]: /dev/sdm1 is not mounted 
>Jun 25 11:13:47 mbstandby sshd(pam_unix)[17583]: session opened for user root by (uid=0) 
>Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdd1 is not mounted 
>Jun 25 11:13:48 mbstandby clurgmgrd: [15850]: /dev/sdh1 is not mounted 
>Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdg1 is not mounted 
>Jun 25 11:13:53 mbstandby clurgmgrd: [15850]: /dev/sdc1 is not mounted 
>Jun 25 11:13:56 mbstandby rgmanager: clurgmgrd startup failed 
>Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdf1 is not mounted 
>Jun 25 11:13:58 mbstandby clurgmgrd: [15850]: /dev/sdb1 is not mounted 
>Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sde1 is not mounted 
>Jun 25 11:14:04 mbstandby clurgmgrd: [15850]: /dev/sda1 is not mounted 
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Services Initialized 
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Logged in SG "usrm::manager" 
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
>Jun 25 11:14:09 mbstandby clurgmgrd[15850]: State change: Local UP 
>Jun 25 11:14:12 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
>Jun 25 11:14:12 mbstandby clurgmgrd[15850]: State change: mb1.ku.edu.kw UP 
>Jun 25 11:14:13 mbstandby clurgmgrd[15850]: Resource groups locked; not evaluating 
>Jun 25 11:14:14 mbstandby clurgmgrd[15850]: Magma Event: Membership Change 
>Jun 25 11:14:14 mbstandby clurgmgrd[15850]: State change: mb2.ku.edu.kw UP 
>Jun 25 11:49:22 mbstandby sshd(pam_unix)[9438]: session opened for user root by (uid=0) 
> 
>Can anyone help me in this issue. Its really urgent. So request you for help. 
> 
> 
> 
> 
> 
>Sig_3million_578x38.gif 
>-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080626/c6c37f5d/attachment.htm>

From connerf at ncifcrf.gov  Thu Jun 26 13:56:10 2008
From: connerf at ncifcrf.gov (fred conner)
Date: Thu, 26 Jun 2008 09:56:10 -0400
Subject: [Linux-cluster] gfs2 kernel bug?
In-Reply-To: <1214485995.4011.43.camel@quoit>
References: <1214484585.8227.36.camel@norbert>
	<1214484594.4011.40.camel@quoit>  <1214485895.8227.38.camel@norbert>
	<1214485995.4011.43.camel@quoit>
Message-ID: <1214488571.8227.50.camel@norbert>

thanks, I am going to upgrade to Enterprise 5.2.  There is a kernel
module update and according to the errata is fixes some gfs2 bugs.  It
does not say which bugs it fixes, but hopefully it fixes this one.

On Thu, 2008-06-26 at 14:13 +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2008-06-26 at 09:11 -0400, fred conner wrote:
> > I am running Redhat Enterprise 5.1.
> > 
> Ok. That is ancient then. Try the latest Fedora, or if you need support
> then the best thing is to talk directly to the Red Hat sales/support who
> can tell you all the current details on that.
> 
> I know that its marked as a preview feature in RHEL, but really Fedora
> is better for that purpose since its much more uptodate,
> 
> Steve.
> 
> > On Thu, 2008-06-26 at 13:49 +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > It would help to know what version of GFS2 you are using, but from the
> > > trace it looks likely that its ancient and if you choose a more recent
> > > version of the code, you should find that this problem has gone away,
> > > 
> > > Steve.
> > > 
> > > On Thu, 2008-06-26 at 08:49 -0400, fred conner wrote:
> > > > yesterday a co-worker was compiling code on a gfs2 filesystem and it
> > > > hung the server.  Is there a fix for this?  thank you.
> > > > 
> > > > Jun 25 14:24:54 abcc2 kernel: dlm: connecting to 1
> > > > Jun 25 14:24:54 abcc2 kernel: dlm: got connection from 1
> > > > original: gfs2_rename+0x1a9/0x610 [gfs2]
> > > > new: gfs2_inplace_reserve_i+0x205/0x5d0 [gfs2]
> > > > Kernel BUG at fs/gfs2/glock.c:1131
> > > > invalid opcode: 0000 [1] SMP 
> > > > last sysfs file: /kernel/dlm/rgmanager/control
> > > > CPU 3 
> > > > Modules linked in: autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp
> > > > l2cap bluetooth nfs lockd fscache nfs_acl lock_dlm gfs2 dlm configfs
> > > > sunrpc tun
> > > >   ip_gre ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT ipt_LOG
> > > > xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT
> > > > xt_tcpudp i
> > > > p6table_filter ip6_tables x_tables ipv6 video sbs backlight i2c_ec
> > > > i2c_corea bcc2 kernel: orbutton battery asus_acpi acpi_memhotplug ac
> > > > parport_pc l
> > > > p parport joydev ata_piix libata sr_mod shpchp ide_cd cdrom e1000 sg
> > > > pcspkr serio_raw bnx2 dm_snapshot dm_zero dm_mirror dm_mod usb_storage
> > > > megaraid
> > > > _sas megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd
> > > > ohci_hcd uhci_hcd
> > > > Pid: 13627, comm: mv Not tainted 2.6.18-53.1.14.el5 #1
> > > > iginal: gfs2_renRIP: 0010:[<ffffffff8849b216>] ame+0x1a9/0x610
> > > > [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> > > > RSP: 0018:ffff8103e7681a98  EFLAGS: 00010286
> > > > [gfs2]
> > > > Jun 25 1RAX: 0000000000000020 RBX: ffff8103e7681cb0 RCX:
> > > > ffffffff80443520
> > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff802e675c
> > > > RBP: ffff8103f55c3eb0 R08: 00000000ffffffff R09: 0000000000000020
> > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8103eadf3378
> > > > R13: ffff8103eadf3378 R14: 0000000000000000 R15: ffff81041d563000
> > > > FS:  00002aaaaaac9f20(0000) GS:ffff81010eb9c640(0000)
> > > > knlGS:0000000000000000
> > > > new: gfs2_CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000003acba5cf10 CR3: 00000003ee29e000 CR4: 00000000000006e0
> > > > Process mv (pid: 13627, threadinfo ffff8103e7680000, task
> > > > ffff810409dd20c0)
> > > > inplace_reserve_Stack:  0000000000000000 00000000000200cf
> > > > ffff81041d039dc0 ffff8103f55c3eb0
> > > >   ffff81041d563000 ffff8103f55c3a60 ffff8103f55c3d60 ffffffff884ac499
> > > >   0000000000000000 000002d000000000 ffffffff884b3568 ffff81041d563000
> > > > i+0x205/0x5d0 [gCall Trace:
> > > > fs2]
> > > >   [<ffffffff884ac499>] :gfs2:gfs2_inplace_reserve_i+0x20d/0x5d0
> > > >   [<ffffffff80143f87>] sort+0xfa/0x189
> > > >   [<ffffffff884a9ab1>] :gfs2:sort_qd+0x0/0x36
> > > >   [<ffffffff884a9cf5>] :gfs2:gfs2_quota_check+0x9a/0x182
> > > >   [<ffffffff884a69ce>] :gfs2:gfs2_rename+0x3b5/0x610
> > > >   [<ffffffff884a6715>] :gfs2:gfs2_rename+0xfc/0x610
> > > >   [<ffffffff884a6757>] :gfs2:gfs2_rename+0x13e/0x610
> > > >   [<ffffffff884a6781>] :gfs2:gfs2_rename+0x168/0x610
> > > >   [<ffffffff884a67c2>] :gfs2:gfs2_rename+0x1a9/0x610
> > > >   [<ffffffff8849b417>] :gfs2:gfs2_holder_uninit+0xd/0x1f
> > > >   [<ffffffff884a765e>] :gfs2:gfs2_permission+0xae/0xd4
> > > >   [<ffffffff8000d42e>] permission+0x81/0xc8
> > > >   [<ffffffff8002a515>] vfs_rename+0x2db/0x458
> > > >   [<ffffffff800362fa>] sys_renameat+0x180/0x1eb
> > > >   [<ffffffff80065a9d>] do_page_fault+0x4eb/0x81d
> > > >   [<ffffffff8002c886>] mntput_no_expire+0x19/0x89
> > > >   [<ffffffff80031f4f>] sys_faccessat+0x148/0x18d
> > > >   [<ffffffff800b3a88>] audit_syscall_entry+0x14d/0x180
> > > >   [<ffffffff8005c28d>] tracesys+0xd5/0xe0
> > > > 
> > > > 
> > > > Code: 0f 0b 68 2d 1c 4b 88 c2 6b 04 8b 75 18 49 8b 44 24 78 49 8d 
> > > > RIP  [<ffffffff8849b216>] :gfs2:gfs2_glock_nq+0x111/0x1d4
> > > >   RSP <ffff8103e7681a98>
> > > >   <0>Kernel panic - not syncing: Fatal exception
> > > > 
> > > > 
> > > 
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > > 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
-- 
Fred Conner [Contractor]


From swhiteho at redhat.com  Thu Jun 26 13:56:10 2008
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Thu, 26 Jun 2008 14:56:10 +0100
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080625224544.GJ12629@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
Message-ID: <1214488571.4011.46.camel@quoit>

Hi,

On Wed, 2008-06-25 at 18:45 -0400, J. Bruce Fields wrote:
> I'm trying to get a gfs2 file system running on some kvm hosts, using an
> ordinary qemu disk for the shared storage (is there any reason this
> can't work?).
> 
> I installed openais80.3 from source (after modifying Makefile so "make
> install" would install to /), and installed gfs2 from the STABLE2 branch
> of git://sources.redhat.com/git/cluster.git, plus this patch:
> 
> 	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html
> 
> (with conflict in write_result() resolved in the obvious way).  The
> kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
> minimal cluster.conf and did a mkfs -tgfs2 following doc/usage.txt, then
> did the startup steps from usage.txt by hand.  Everything works up to
> the mount, at which point the first host gets the following lock bug in
> the logs.  Other mounts fail or hang.
> 
> Any hints?
> 
So the first mount is ok, but further mounts fail? or is it all mounts
that fail/hang?

Steve.

> --b.
> 
> Jun 25 18:30:11 piglet1 ccsd[3022]: Starting ccsd 1214172260: 
> Jun 25 18:30:11 piglet1 ccsd[3022]:  Built: Jun 22 2008 18:04:35 
> Jun 25 18:30:11 piglet1 ccsd[3022]:  Copyright (C) Red Hat, Inc.  2004-2008  All rights reserved. 
> Jun 25 18:30:11 piglet1 ccsd[3022]: /etc/cluster/cluster.conf (cluster name = piglet, version = 1) found. 
> Jun 25 18:30:15 piglet1 ccsd[3022]: Initial status:: Quorate 
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "piglet:test"
> Jun 25 18:31:01 piglet1 kernel: dlm: Using TCP for communications
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: Joined cluster. Now mounting FS...
> Jun 25 18:31:01 piglet1 kernel: 
> Jun 25 18:31:01 piglet1 kernel: =====================================
> Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
> Jun 25 18:31:01 piglet1 kernel: -------------------------------------
> Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
> Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
> Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
> Jun 25 18:31:01 piglet1 kernel: 
> Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
> Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
> Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
> Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
> Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
> Jun 25 18:31:01 piglet1 kernel: 
> Jun 25 18:31:01 piglet1 kernel: stack backtrace:
> Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
> Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0
> Jun 25 18:31:01 piglet1 kernel:  [<c0137449>] ? save_trace+0x39/0xa0
> Jun 25 18:31:01 piglet1 kernel:  [<c01374ea>] ? add_lock_to_list+0x3a/0xa0
> Jun 25 18:31:01 piglet1 kernel:  [<c0139cac>] ? __lock_acquire+0xb9c/0xfc0
> Jun 25 18:31:01 piglet1 kernel:  [<c0139f14>] ? __lock_acquire+0xe04/0xfc0
> Jun 25 18:31:01 piglet1 kernel:  [<c013a23f>] lock_release_non_nested+0xff/0x170
> Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
> Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
> Jun 25 18:31:01 piglet1 kernel:  [<c013a33d>] lock_release+0x8d/0x150
> Jun 25 18:31:01 piglet1 kernel:  [<c0131c96>] up_write+0x16/0x30
> Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] dlm_recoverd+0x440/0x510
> Jun 25 18:31:01 piglet1 kernel:  [<c01c34f0>] ? dlm_recoverd+0x0/0x510
> Jun 25 18:31:01 piglet1 kernel:  [<c012e546>] kthread+0x36/0x60
> Jun 25 18:31:01 piglet1 kernel:  [<c012e510>] ? kthread+0x0/0x60
> Jun 25 18:31:01 piglet1 kernel:  [<c0103587>] kernel_thread_helper+0x7/0x10
> Jun 25 18:31:01 piglet1 kernel:  =======================
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done
> Jun 25 18:33:46 piglet1 ntpd[2951]: synchronized to 76.189.12.0, stratum 1
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From teigland at redhat.com  Thu Jun 26 15:27:33 2008
From: teigland at redhat.com (David Teigland)
Date: Thu, 26 Jun 2008 10:27:33 -0500
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080625224544.GJ12629@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
Message-ID: <20080626152733.GC21081@redhat.com>

On Wed, Jun 25, 2008 at 06:45:44PM -0400, J. Bruce Fields wrote:
> I'm trying to get a gfs2 file system running on some kvm hosts, using an
> ordinary qemu disk for the shared storage (is there any reason this
> can't work?).
> 
> I installed openais80.3 from source (after modifying Makefile so "make
> install" would install to /), and installed gfs2 from the STABLE2 branch
> of git://sources.redhat.com/git/cluster.git, plus this patch:
> 
> 	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html
> 
> (with conflict in write_result() resolved in the obvious way).  The
> kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
> minimal cluster.conf and did a mkfs -tgfs2 following doc/usage.txt, then
> did the startup steps from usage.txt by hand.  Everything works up to
> the mount, at which point the first host gets the following lock bug in
> the logs.  Other mounts fail or hang.

I don't know why other mounts fail or hang, but it's not related to this:

> Jun 25 18:31:01 piglet1 kernel: =====================================
> Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
> Jun 25 18:31:01 piglet1 kernel: -------------------------------------
> Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
> Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
> Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
> Jun 25 18:31:01 piglet1 kernel: 
> Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
> Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
> Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
> Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
> Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
> Jun 25 18:31:01 piglet1 kernel: 
> Jun 25 18:31:01 piglet1 kernel: stack backtrace:
> Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
> Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0

This is actually a false warning that's triggered by different threads
doing the up and down.  To remove this we'd need down_write_non_owner() /
up_write_non_owner() to parallel the "read" variants in rwsem.h.

> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
> Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done

This mount appears to have been successful.  Usual things to collect for
debugging the other problems:
- any errors in /var/log/messages from all nodes
- cman_tool nodes; cman_tool status from all nodes
- group_tool -v from all nodes

Dave


From bfields at fieldses.org  Thu Jun 26 18:32:17 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Thu, 26 Jun 2008 14:32:17 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <1214488571.4011.46.camel@quoit>
References: <20080625224544.GJ12629@fieldses.org>
	<1214488571.4011.46.camel@quoit>
Message-ID: <20080626183217.GC10593@fieldses.org>

On Thu, Jun 26, 2008 at 02:56:10PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Wed, 2008-06-25 at 18:45 -0400, J. Bruce Fields wrote:
> > I'm trying to get a gfs2 file system running on some kvm hosts, using an
> > ordinary qemu disk for the shared storage (is there any reason this
> > can't work?).
> > 
> > I installed openais80.3 from source (after modifying Makefile so "make
> > install" would install to /), and installed gfs2 from the STABLE2 branch
> > of git://sources.redhat.com/git/cluster.git, plus this patch:
> > 
> > 	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html
> > 
> > (with conflict in write_result() resolved in the obvious way).  The
> > kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
> > minimal cluster.conf and did a mkfs -tgfs2 following doc/usage.txt, then
> > did the startup steps from usage.txt by hand.  Everything works up to
> > the mount, at which point the first host gets the following lock bug in
> > the logs.  Other mounts fail or hang.
> > 
> > Any hints?
> > 
> So the first mount is ok, but further mounts fail? or is it all mounts
> that fail/hang?

The first mount appears to succeed, though any subsequent access to the
mounted filesystem hangs (I assume that's by design).  Mounts from the
other nodes hang or fail.

--b.

(PS: Could you leave me cc'd?)

> 
> Steve.
> 
> > --b.
> > 
> > Jun 25 18:30:11 piglet1 ccsd[3022]: Starting ccsd 1214172260: 
> > Jun 25 18:30:11 piglet1 ccsd[3022]:  Built: Jun 22 2008 18:04:35 
> > Jun 25 18:30:11 piglet1 ccsd[3022]:  Copyright (C) Red Hat, Inc.  2004-2008  All rights reserved. 
> > Jun 25 18:30:11 piglet1 ccsd[3022]: /etc/cluster/cluster.conf (cluster name = piglet, version = 1) found. 
> > Jun 25 18:30:15 piglet1 ccsd[3022]: Initial status:: Quorate 
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "piglet:test"
> > Jun 25 18:31:01 piglet1 kernel: dlm: Using TCP for communications
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: Joined cluster. Now mounting FS...
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: =====================================
> > Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
> > Jun 25 18:31:01 piglet1 kernel: -------------------------------------
> > Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
> > Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
> > Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
> > Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: stack backtrace:
> > Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
> > Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0
> > Jun 25 18:31:01 piglet1 kernel:  [<c0137449>] ? save_trace+0x39/0xa0
> > Jun 25 18:31:01 piglet1 kernel:  [<c01374ea>] ? add_lock_to_list+0x3a/0xa0
> > Jun 25 18:31:01 piglet1 kernel:  [<c0139cac>] ? __lock_acquire+0xb9c/0xfc0
> > Jun 25 18:31:01 piglet1 kernel:  [<c0139f14>] ? __lock_acquire+0xe04/0xfc0
> > Jun 25 18:31:01 piglet1 kernel:  [<c013a23f>] lock_release_non_nested+0xff/0x170
> > Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] ? dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel:  [<c013a33d>] lock_release+0x8d/0x150
> > Jun 25 18:31:01 piglet1 kernel:  [<c0131c96>] up_write+0x16/0x30
> > Jun 25 18:31:01 piglet1 kernel:  [<c01c3930>] dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel:  [<c01c34f0>] ? dlm_recoverd+0x0/0x510
> > Jun 25 18:31:01 piglet1 kernel:  [<c012e546>] kthread+0x36/0x60
> > Jun 25 18:31:01 piglet1 kernel:  [<c012e510>] ? kthread+0x0/0x60
> > Jun 25 18:31:01 piglet1 kernel:  [<c0103587>] kernel_thread_helper+0x7/0x10
> > Jun 25 18:31:01 piglet1 kernel:  =======================
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done
> > Jun 25 18:33:46 piglet1 ntpd[2951]: synchronized to 76.189.12.0, stratum 1
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From bfields at fieldses.org  Thu Jun 26 18:35:29 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Thu, 26 Jun 2008 14:35:29 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626152733.GC21081@redhat.com>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
Message-ID: <20080626183529.GD10593@fieldses.org>

On Thu, Jun 26, 2008 at 10:27:33AM -0500, David Teigland wrote:
> On Wed, Jun 25, 2008 at 06:45:44PM -0400, J. Bruce Fields wrote:
> > I'm trying to get a gfs2 file system running on some kvm hosts, using an
> > ordinary qemu disk for the shared storage (is there any reason this
> > can't work?).
> > 
> > I installed openais80.3 from source (after modifying Makefile so "make
> > install" would install to /), and installed gfs2 from the STABLE2 branch
> > of git://sources.redhat.com/git/cluster.git, plus this patch:
> > 
> > 	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html
> > 
> > (with conflict in write_result() resolved in the obvious way).  The
> > kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
> > minimal cluster.conf and did a mkfs -tgfs2 following doc/usage.txt, then
> > did the startup steps from usage.txt by hand.  Everything works up to
> > the mount, at which point the first host gets the following lock bug in
> > the logs.  Other mounts fail or hang.
> 
> I don't know why other mounts fail or hang, but it's not related to this:

OK, darn I was hoping it was that simple.

> > Jun 25 18:31:01 piglet1 kernel: =====================================
> > Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
> > Jun 25 18:31:01 piglet1 kernel: -------------------------------------
> > Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
> > Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
> > Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
> > Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: stack backtrace:
> > Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
> > Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0
> 
> This is actually a false warning that's triggered by different threads
> doing the up and down.  To remove this we'd need down_write_non_owner() /
> up_write_non_owner() to parallel the "read" variants in rwsem.h.

Thanks for the explanation.

> 
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done
> 
> This mount appears to have been successful.  Usual things to collect for
> debugging the other problems:
> - any errors in /var/log/messages from all nodes
> - cman_tool nodes; cman_tool status from all nodes
> - group_tool -v from all nodes

Thanks, I'll see what more information I can collect.

--b.

(PS: Can I get cc'd?  I filter mailing list traffic to a different
folder that I don't look at as often....)


From bfields at fieldses.org  Thu Jun 26 19:11:06 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Thu, 26 Jun 2008 15:11:06 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626183529.GD10593@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
Message-ID: <20080626191106.GA11945@fieldses.org>

On Thu, Jun 26, 2008 at 02:35:29PM -0400, bfields wrote:
> On Thu, Jun 26, 2008 at 10:27:33AM -0500, David Teigland wrote:
> > This mount appears to have been successful.  Usual things to collect for
> > debugging the other problems:
> > - any errors in /var/log/messages from all nodes
> > - cman_tool nodes; cman_tool status from all nodes
> > - group_tool -v from all nodes
> 
> Thanks, I'll see what more information I can collect.

So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
returns immediately without mounting, and leaves this in the logs:

	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]
	GFS2: fsid=: Trying to join cluster "lock_dlm", "piglet:test"
	lock_dlm: no mount options, (u)mount helpers not installed
	GFS2: fsid=: can't mount proto=lock_dlm, table=piglet:test, hostdata=

At this point, cman_tool nodes, status, and group_tool -v output from
piglet 1 are:

Node  Sts   Inc   Joined               Name
   1   M    128   2008-06-26 14:49:51  piglet1
   2   M    132   2008-06-26 14:49:51  piglet2
   3   M    136   2008-06-26 14:49:52  piglet3
   4   M    132   2008-06-26 14:49:51  piglet4
Version: 6.1.0
Config Version: 1
Cluster Name: piglet
Cluster Id: 6838
Cluster Member: Yes
Cluster Generation: 136
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 6
Flags: Dirty 
Ports Bound: 0  
Node name: piglet1
Node ID: 1
Multicast addresses: 239.192.26.208 
Node addresses: 192.168.122.129 
type             level name     id       state node id local_done
fence            0     default  00010004 none        
[1 2 3 4]
dlm              1     test     00020001 none        
[1]
gfs              2     test     00010001 none        
[1 2]

>From piglet2:

Node  Sts   Inc   Joined               Name
   1   M    132   2008-06-26 14:49:52  piglet1
   2   M    124   2008-06-26 14:49:51  piglet2
   3   M    136   2008-06-26 14:49:52  piglet3
   4   M    128   2008-06-26 14:49:51  piglet4
Version: 6.1.0
Config Version: 1
Cluster Name: piglet
Cluster Id: 6838
Cluster Member: Yes
Cluster Generation: 136
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 6
Flags: Dirty 
Ports Bound: 0  
Node name: piglet2
Node ID: 2
Multicast addresses: 239.192.26.208 
Node addresses: 192.168.122.130 
type             level name     id       state node id local_done
fence            0     default  00010004 none        
[1 2 3 4]
gfs              2     test     00010001 none        
[1 2]

>From piglet3:
Node  Sts   Inc   Joined               Name
   1   M    136   2008-06-26 14:49:52  piglet1
   2   M    136   2008-06-26 14:49:52  piglet2
   3   M    124   2008-06-26 14:49:52  piglet3
   4   M    136   2008-06-26 14:49:52  piglet4
Version: 6.1.0
Config Version: 1
Cluster Name: piglet
Cluster Id: 6838
Cluster Member: Yes
Cluster Generation: 136
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: piglet3
Node ID: 3
Multicast addresses: 239.192.26.208 
Node addresses: 192.168.122.131 
type             level name     id       state node id local_done
fence            0     default  00010004 none        
[1 2 3 4]

>From piglet4:
Node  Sts   Inc   Joined               Name
   1   M    132   2008-06-26 14:49:51  piglet1
   2   M    128   2008-06-26 14:49:51  piglet2
   3   M    136   2008-06-26 14:49:52  piglet3
   4   M    124   2008-06-26 14:49:50  piglet4
Version: 6.1.0
Config Version: 1
Cluster Name: piglet
Cluster Id: 6838
Cluster Member: Yes
Cluster Generation: 136
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: piglet4
Node ID: 4
Multicast addresses: 239.192.26.208 
Node addresses: 192.168.122.132 
type             level name     id       state node id local_done
fence            0     default  00010004 none        
[1 2 3 4]

--b.


From bfields at fieldses.org  Thu Jun 26 20:33:15 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Thu, 26 Jun 2008 16:33:15 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626191106.GA11945@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
Message-ID: <20080626203315.GB13293@fieldses.org>

On Thu, Jun 26, 2008 at 03:11:06PM -0400, bfields wrote:
> On Thu, Jun 26, 2008 at 02:35:29PM -0400, bfields wrote:
> > On Thu, Jun 26, 2008 at 10:27:33AM -0500, David Teigland wrote:
> > > This mount appears to have been successful.  Usual things to collect for
> > > debugging the other problems:
> > > - any errors in /var/log/messages from all nodes
> > > - cman_tool nodes; cman_tool status from all nodes
> > > - group_tool -v from all nodes
> > 
> > Thanks, I'll see what more information I can collect.
> 
> So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
> returns immediately without mounting, and leaves this in the logs:
> 
> 	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]

Looking at the object file, that appears to be in purge_plocks().

After rerunning configure with --debug and rebuilding, the second mount
hangs instead of returning immediately without mounting.

--b.

> 	GFS2: fsid=: Trying to join cluster "lock_dlm", "piglet:test"
> 	lock_dlm: no mount options, (u)mount helpers not installed
> 	GFS2: fsid=: can't mount proto=lock_dlm, table=piglet:test, hostdata=
> 
> At this point, cman_tool nodes, status, and group_tool -v output from
> piglet 1 are:
> 
> Node  Sts   Inc   Joined               Name
>    1   M    128   2008-06-26 14:49:51  piglet1
>    2   M    132   2008-06-26 14:49:51  piglet2
>    3   M    136   2008-06-26 14:49:52  piglet3
>    4   M    132   2008-06-26 14:49:51  piglet4
> Version: 6.1.0
> Config Version: 1
> Cluster Name: piglet
> Cluster Id: 6838
> Cluster Member: Yes
> Cluster Generation: 136
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3  
> Active subsystems: 6
> Flags: Dirty 
> Ports Bound: 0  
> Node name: piglet1
> Node ID: 1
> Multicast addresses: 239.192.26.208 
> Node addresses: 192.168.122.129 
> type             level name     id       state node id local_done
> fence            0     default  00010004 none        
> [1 2 3 4]
> dlm              1     test     00020001 none        
> [1]
> gfs              2     test     00010001 none        
> [1 2]
> 
> From piglet2:
> 
> Node  Sts   Inc   Joined               Name
>    1   M    132   2008-06-26 14:49:52  piglet1
>    2   M    124   2008-06-26 14:49:51  piglet2
>    3   M    136   2008-06-26 14:49:52  piglet3
>    4   M    128   2008-06-26 14:49:51  piglet4
> Version: 6.1.0
> Config Version: 1
> Cluster Name: piglet
> Cluster Id: 6838
> Cluster Member: Yes
> Cluster Generation: 136
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3  
> Active subsystems: 6
> Flags: Dirty 
> Ports Bound: 0  
> Node name: piglet2
> Node ID: 2
> Multicast addresses: 239.192.26.208 
> Node addresses: 192.168.122.130 
> type             level name     id       state node id local_done
> fence            0     default  00010004 none        
> [1 2 3 4]
> gfs              2     test     00010001 none        
> [1 2]
> 
> From piglet3:
> Node  Sts   Inc   Joined               Name
>    1   M    136   2008-06-26 14:49:52  piglet1
>    2   M    136   2008-06-26 14:49:52  piglet2
>    3   M    124   2008-06-26 14:49:52  piglet3
>    4   M    136   2008-06-26 14:49:52  piglet4
> Version: 6.1.0
> Config Version: 1
> Cluster Name: piglet
> Cluster Id: 6838
> Cluster Member: Yes
> Cluster Generation: 136
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3  
> Active subsystems: 7
> Flags: Dirty 
> Ports Bound: 0  
> Node name: piglet3
> Node ID: 3
> Multicast addresses: 239.192.26.208 
> Node addresses: 192.168.122.131 
> type             level name     id       state node id local_done
> fence            0     default  00010004 none        
> [1 2 3 4]
> 
> From piglet4:
> Node  Sts   Inc   Joined               Name
>    1   M    132   2008-06-26 14:49:51  piglet1
>    2   M    128   2008-06-26 14:49:51  piglet2
>    3   M    136   2008-06-26 14:49:52  piglet3
>    4   M    124   2008-06-26 14:49:50  piglet4
> Version: 6.1.0
> Config Version: 1
> Cluster Name: piglet
> Cluster Id: 6838
> Cluster Member: Yes
> Cluster Generation: 136
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3  
> Active subsystems: 7
> Flags: Dirty 
> Ports Bound: 0  
> Node name: piglet4
> Node ID: 4
> Multicast addresses: 239.192.26.208 
> Node addresses: 192.168.122.132 
> type             level name     id       state node id local_done
> fence            0     default  00010004 none        
> [1 2 3 4]
> 
> --b.


From bfields at fieldses.org  Thu Jun 26 21:10:52 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Thu, 26 Jun 2008 17:10:52 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626203315.GB13293@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
	<20080626203315.GB13293@fieldses.org>
Message-ID: <20080626211052.GC13293@fieldses.org>

On Thu, Jun 26, 2008 at 04:33:15PM -0400, bfields wrote:
> On Thu, Jun 26, 2008 at 03:11:06PM -0400, bfields wrote:
> > On Thu, Jun 26, 2008 at 02:35:29PM -0400, bfields wrote:
> > > On Thu, Jun 26, 2008 at 10:27:33AM -0500, David Teigland wrote:
> > > > This mount appears to have been successful.  Usual things to collect for
> > > > debugging the other problems:
> > > > - any errors in /var/log/messages from all nodes
> > > > - cman_tool nodes; cman_tool status from all nodes
> > > > - group_tool -v from all nodes
> > > 
> > > Thanks, I'll see what more information I can collect.
> > 
> > So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
> > returns immediately without mounting, and leaves this in the logs:
> > 
> > 	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]
> 
> Looking at the object file, that appears to be in purge_plocks().
> 
> After rerunning configure with --debug and rebuilding, the second mount
> hangs instead of returning immediately without mounting.

By the way, as I've said, I'm using the STABLE2 branch with the one
patch from you to bring it uptodate with the latest kernel interface.  I
might actually prefer to try the master branch instead (with the stuff
labeled "cluster3" in the wiki), but didn't know whether it was actually
working at this point.  Is it?

My goal is just to be able to experiment with your latest posix locking
code and with nfsd exports.

--b.


From whiteley at pdx.edu  Thu Jun 26 21:51:09 2008
From: whiteley at pdx.edu (matt whiteley)
Date: Thu, 26 Jun 2008 14:51:09 -0700
Subject: [Linux-cluster] virtual machine failover with gfs
In-Reply-To: <bb0fa5d0806251641s7596ebc6q2d905187e24e7a5b@mail.gmail.com>
References: <D074D3F4-84B1-4596-92AF-12FD99AF726B@pdx.edu>
	<bb0fa5d0806251641s7596ebc6q2d905187e24e7a5b@mail.gmail.com>
Message-ID: <1C3145B9-8F7A-462C-8E14-FD8DAC53F55F@pdx.edu>

On Jun 25, 2008, at 4:41 PM, Joe Royall wrote:
> Why not use lvm backed vms, 1 per vm, share the entire partition  
> with all the lvms via ISCSI to each dom0 and run clvm on the dom0s.   
> The lvms do not need to be mounted in dom0.  You can then use RHCS  
> to failover vms between dom0s.  Consider putting all the vms on a  
> single node into a single resource group and only allow 1 group to  
> operate on a single node.  You can then configure N+1 redundancy.
> -- 
> Joe Royall
> Red Hat Certified Architect


We already have all of the nodes attached to a san vi fibre channel,  
so I would rather not just provide storage from the san through  
another box as an iscsi target to these 4. It seems like it would add  
a layer of complexity and performance bottlenecking.

It seems like I could do the first half of what you talk about,  
instead of making a gfs filesystem in the clvm lv and using files  
there for the vm backends, I could make multiple lvs in the clvm vg  
and use one for each vm. This gets around using gfs at all, and I  
could just have a resource per vm for it's lv. I am not sure how I  
would specify this in cluster.conf so that the lv would get mounted on  
the proper node that was going to run a vm. From what I have read, a  
<vm> element can't be a child of a <service> element and there doesn't  
seem to be any other way to define a relationship between the two.

Are all resources in a cluster only ever on one node at a time, or is  
there a way to specify that a resource should be on more than one or  
all nodes as they join a cluster?

I looked that the nfscookbook available from Red Hat and it seemed to  
describe a similar problem, I could create 4 services for the same gfs  
filesystem and have each in a failover domain for each node. At this  
point this seems like not even a good problem to use gfs for, I got  
the idea from that Red Hat Magazine article that seemed to be  
describing the very problem I am trying to solve.

-- 
matt whiteley <whiteley at pdx.edu>


From yamato at redhat.com  Fri Jun 27 05:46:06 2008
From: yamato at redhat.com (Masatake YAMATO)
Date: Fri, 27 Jun 2008 14:46:06 +0900 (JST)
Subject: [Linux-cluster] [PATCH] release sock in tcp_connect_to_sock of dlm
 if dlm_nodeid_to_addr returns error
In-Reply-To: <20080623.112259.203763464.yamato@redhat.com>
References: <20080528.144510.108902725.yamato@redhat.com>
	<20080620.122909.158128900.yamato@redhat.com>
	<20080623.112259.203763464.yamato@redhat.com>
Message-ID: <20080627.144606.07609650.yamato@redhat.com>

Hi,

(I submitted to this patch to cluster-devel. However, none has interest
 to my post. So I'll submit it here again.)

It seems that `sock' allocated by sock_create_kern
in tcp_connect_to_sock() of dlm/fs/lowcomms.c is not released if
dlm_nodeid_to_addr an error.


    static void tcp_connect_to_sock(struct connection *con)
    {
    ...

	result = sock_create_kern(dlm_local_addr[0]->ss_family, SOCK_STREAM,
				  IPPROTO_TCP, &sock);
	if (result < 0)
		goto out_err;

	memset(&saddr, 0, sizeof(saddr));
	if (dlm_nodeid_to_addr(con->nodeid, &saddr)) {
		sock_release(sock);
		goto out_err;
	}

	...

    out_err:
	    if (con->sock) {
		    sock_release(con->sock);
		    con->sock = NULL;
	    }


Signed-off-by: Masatake YAMATO <yamato at redhat.com>

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 637018c..3962262 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -891,8 +891,10 @@ static void tcp_connect_to_sock(struct connection *con)
 		goto out_err;
 
 	memset(&saddr, 0, sizeof(saddr));
-	if (dlm_nodeid_to_addr(con->nodeid, &saddr))
+	if (dlm_nodeid_to_addr(con->nodeid, &saddr)) {
+		sock_release(sock);
 		goto out_err;
+	}
 
 	sock->sk->sk_user_data = con;
 	con->rx_action = receive_from_sock;


From Bevan.Broun at ardec.com.au  Fri Jun 27 06:03:35 2008
From: Bevan.Broun at ardec.com.au (Bevan Broun)
Date: Fri, 27 Jun 2008 16:03:35 +1000
Subject: [Linux-cluster] Lost token - every 5 minutes: [TOTEM] The token was
 lost. Samba process possible cause?
Message-ID: <6008E5CED89FD44A86D3C376519E1DB2010347BB42@megatron.ms.a2end.com>

Hi All

I have a 2 node RHEL-5.1 cluster. A quorum disk is configured.
The hosts have 4 NICs. These are bonded:
(eth0+eth2) -> bond0
(eth1+eth3) -> bond1
Unfortunately I was not able to use a dedicated interface for cluster communications - bond1 is being used. This is where I think Im in trouble.

The cluster has been configured using IP addressess. I did have to use http://archives.free.net.ph/message/20080130.074958.5c7a211c.en.html
as the hostname is related to the bond0 IP.

I have not defined the interface to be used by the cluster, just relying on the IP address configured.
The cluster's purpose is 2 GFS file systems.

The cluster was configured and working for 4 days before there was problems.

I now have almost constant lost of token message in /var/log/message. They are almost exactly 5 minutes apart. A typical bit of messages file is show below my sig.

Just before the problem started a samba message shows nmdb becomming local master browser for a work group on the interface used for cluster communications.

Jun 20 13:39:27 HOST1 nmbd[24506]: [2008/06/20 13:39:27, 0] nmbd/nmbd_become_lmb.c:become_loca
l_master_stage2(396)
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   Samba name server NBM1 is now a local master browser for
workgroup SMS_DOMAIN on subnet 162.16.96.229
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

"cman_tool status" shows both nodes and looks normal. Looks like clmvd is not happy, df commands are hanging.

Could nmdb be causing this token loss? Any ideas on how to proceed?

(names and IPs have been changed).

Thanks

Bevan Broun
Solutions Architect
Ardec International
http://www.ardec.com.au
http://www.lisasoft.com
http://www.terrapages.com
Sydney
-----------------------
Suite 112,The Lower Deck
19-21 Jones Bay Wharf
Pirrama Road, Pyrmont 2009
Ph:  +61 2 8570 5000
Fax: +61 2 8570 5099


Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Receive multicast socket recv buffer size (28800
 0 bytes).
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Transmit multicast socket send buffer size (2621
 42 bytes).
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering GATHER state from 2.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Saving state aru 16 high seq received 16
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce34
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce38
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce3c
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce40
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering RECOVERY state.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [0] member 162.16.96.229:
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [1] member 162.16.96.230:
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Did not need to originate any messages in recove
 ry.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Sending initial ORF token
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
 Jun 20 13:49:06 HOST1 openais[15265]: [SYNC ] This node is within the primary component and wi
 ll provide service.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering OPERATIONAL state.
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.230
 Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 2
 Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 1
 Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.


From pmshehzad at yahoo.com  Fri Jun 27 10:01:53 2008
From: pmshehzad at yahoo.com (Mshehzad Pankhawala)
Date: Fri, 27 Jun 2008 03:01:53 -0700 (PDT)
Subject: [Linux-cluster] Availability and Working of Service
Message-ID: <55459.36160.qm@web45809.mail.sp1.yahoo.com>

Hello Every one,

I
am planning to configure Asterisk Cluster using some of the clustering
technologies (LVS or OpenSER or Heartbeat or any other thing).

My
problem is that Heartbeat and other component just check the
availability of the Server which is to be clustered. But I also want the
Service Asterisk should also be checked like Server is Answering the
call properly, All the functionality of Asterisk Server is working properly or other services such as voice mail server (which is
used by Asterisk Server) is running properly.

Any body can guide
me how to do that, is there any components, tools available, or Any
Asterisk Specific tool to check Asterisk services etc. then please
reply. 

Thanking you,
Shehzad Pankhawala.


      Unlimited freedom, unlimited storage. Get it now, on http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080627/a62ab33c/attachment.htm>

From federico.simoncelli at gmail.com  Fri Jun 27 11:19:07 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Fri, 27 Jun 2008 13:19:07 +0200
Subject: [Linux-cluster] Files modification time is not updated using GFS2
Message-ID: <a01fe36d0806270419s28099ff3t38ebe76e292c84e4@mail.gmail.com>

Hi all, I'm developing a cluster-aware application which uses files
modification time and locks.
Right now I'm experiencing a weird behaviour using GFS2: files
modification time are not updated from node to node.

# mount | grep gfs2
/dev/mapper/VolGroup01-shared on /cluster type gfs2
(rw,hostdata=jid=0:id=196609:first=0)

[root at node1 ~]# echo "hi" > /cluster/test
[root at node1 ~]# stat -c "%n %Y %y" /cluster/test
/cluster/test 1214563066 2008-06-27 12:37:46.198405310 +0200
[root at node2 ~]# stat -c "%n %Y %y" /cluster/test
/cluster/test 1214563066 2008-06-27 12:37:46.198405310 +0200

[root at node1 ~]# echo "hi2" >> /cluster/test
[root at node1 ~]# stat -c "%n %Y %y" /cluster/test
/cluster/test 1214563136 2008-06-27 12:38:56.358790060 +0200
[root at node2 ~]# stat -c "%n %Y %y" /cluster/test
/cluster/test 1214563066 2008-06-27 12:37:46.198405310 +0200

As you can see the mtime is not updated on the node2 .
I read the most part of the GFS documentation on the redhat website
but I couldn't find any option related to my issue.
The kernel version is 2.6.18-92.1.1.el5 and the system is a CentOS 5.2.

PS. I didn't check this on GFS yet.
-- 
Federico.


From ccaulfie at redhat.com  Fri Jun 27 13:22:36 2008
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Fri, 27 Jun 2008 14:22:36 +0100
Subject: [Linux-cluster] [PATCH] release sock in tcp_connect_to_sock of
	dlm if dlm_nodeid_to_addr returns error
In-Reply-To: <20080627.144606.07609650.yamato@redhat.com>
References: <20080528.144510.108902725.yamato@redhat.com>	<20080620.122909.158128900.yamato@redhat.com>	<20080623.112259.203763464.yamato@redhat.com>
	<20080627.144606.07609650.yamato@redhat.com>
Message-ID: <4864E99C.70301@redhat.com>

Masatake YAMATO wrote:
> Hi,
> 
> (I submitted to this patch to cluster-devel. However, none has interest
>  to my post. So I'll submit it here again.)
> 
> It seems that `sock' allocated by sock_create_kern
> in tcp_connect_to_sock() of dlm/fs/lowcomms.c is not released if
> dlm_nodeid_to_addr an error.
> 
> 
> 
>     static void tcp_connect_to_sock(struct connection *con)
>     {
>     ...
> 
> 	result = sock_create_kern(dlm_local_addr[0]->ss_family, SOCK_STREAM,
> 				  IPPROTO_TCP, &sock);
> 	if (result < 0)
> 		goto out_err;
> 
> 	memset(&saddr, 0, sizeof(saddr));
> 	if (dlm_nodeid_to_addr(con->nodeid, &saddr)) {
> 		sock_release(sock);
> 		goto out_err;
> 	}
> 
> 	...
> 
>     out_err:
> 	    if (con->sock) {
> 		    sock_release(con->sock);
> 		    con->sock = NULL;
> 	    }
> 
> 
> Signed-off-by: Masatake YAMATO <yamato at redhat.com>
> 
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index 637018c..3962262 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -891,8 +891,10 @@ static void tcp_connect_to_sock(struct connection *con)
>  		goto out_err;
>  
>  	memset(&saddr, 0, sizeof(saddr));
> -	if (dlm_nodeid_to_addr(con->nodeid, &saddr))
> +	if (dlm_nodeid_to_addr(con->nodeid, &saddr)) {
> +		sock_release(sock);
>  		goto out_err;
> +	}
>  
>  	sock->sk->sk_user_data = con;
>  	con->rx_action = receive_from_sock;


That looks good to me. Thanks.

Can you commit this please Dave?

-- 

Chrissie


From jruemker at redhat.com  Fri Jun 27 13:30:12 2008
From: jruemker at redhat.com (John Ruemker)
Date: Fri, 27 Jun 2008 09:30:12 -0400
Subject: [Linux-cluster] virtual machine failover with gfs
In-Reply-To: <1C3145B9-8F7A-462C-8E14-FD8DAC53F55F@pdx.edu>
References: <D074D3F4-84B1-4596-92AF-12FD99AF726B@pdx.edu>	<bb0fa5d0806251641s7596ebc6q2d905187e24e7a5b@mail.gmail.com>
	<1C3145B9-8F7A-462C-8E14-FD8DAC53F55F@pdx.edu>
Message-ID: <4864EB64.4080006@redhat.com>

matt whiteley wrote:
> On Jun 25, 2008, at 4:41 PM, Joe Royall wrote:
>> Why not use lvm backed vms, 1 per vm, share the entire partition with 
>> all the lvms via ISCSI to each dom0 and run clvm on the dom0s. The 
>> lvms do not need to be mounted in dom0. You can then use RHCS to 
>> failover vms between dom0s. Consider putting all the vms on a single 
>> node into a single resource group and only allow 1 group to operate 
>> on a single node. You can then configure N+1 redundancy.
>> -- 
>> Joe Royall
>> Red Hat Certified Architect
>
>
> We already have all of the nodes attached to a san vi fibre channel, 
> so I would rather not just provide storage from the san through 
> another box as an iscsi target to these 4. It seems like it would add 
> a layer of complexity and performance bottlenecking.
>
> It seems like I could do the first half of what you talk about, 
> instead of making a gfs filesystem in the clvm lv and using files 
> there for the vm backends, I could make multiple lvs in the clvm vg 
> and use one for each vm. This gets around using gfs at all, and I 
> could just have a resource per vm for it's lv. I am not sure how I 
> would specify this in cluster.conf so that the lv would get mounted on 
> the proper node that was going to run a vm. From what I have read, a 
> <vm> element can't be a child of a <service> element and there doesn't 
> seem to be any other way to define a relationship between the two.
If you are using clvmd there is no need to create a resource in 
cluster.conf for the LV. Clustered vg's can be active and used on every 
node in the cluster, so there is no need to have it failed over from one 
to the other. Just create your vm with the LV as its backend storage and 
then create a vm resource for it

<vm autostart="1" name="myguest" path="/etc/xen"/>


John


From teigland at redhat.com  Fri Jun 27 14:20:41 2008
From: teigland at redhat.com (David Teigland)
Date: Fri, 27 Jun 2008 09:20:41 -0500
Subject: [Linux-cluster] [PATCH] release sock in tcp_connect_to_sock of
	dlm if dlm_nodeid_to_addr returns error
In-Reply-To: <20080627.144606.07609650.yamato@redhat.com>
References: <20080528.144510.108902725.yamato@redhat.com>
	<20080620.122909.158128900.yamato@redhat.com>
	<20080623.112259.203763464.yamato@redhat.com>
	<20080627.144606.07609650.yamato@redhat.com>
Message-ID: <20080627142041.GA13664@redhat.com>

On Fri, Jun 27, 2008 at 02:46:06PM +0900, Masatake YAMATO wrote:
> Hi,
> 
> (I submitted to this patch to cluster-devel. However, none has interest
>  to my post. So I'll submit it here again.)
> 
> It seems that `sock' allocated by sock_create_kern
> in tcp_connect_to_sock() of dlm/fs/lowcomms.c is not released if
> dlm_nodeid_to_addr an error.

This is now in dlm.git.
Thanks, Dave


From teigland at redhat.com  Thu Jun 26 19:19:30 2008
From: teigland at redhat.com (David Teigland)
Date: Thu, 26 Jun 2008 14:19:30 -0500
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626191106.GA11945@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
Message-ID: <20080626191930.GA3815@redhat.com>

On Thu, Jun 26, 2008 at 03:11:06PM -0400, J. Bruce Fields wrote:
> So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
> returns immediately without mounting, and leaves this in the logs:
> 
> 	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]

Thanks, that's the problem, I'll try to get equivalent versions of things
running to find it.

Dave


From teigland at redhat.com  Fri Jun 27 17:18:45 2008
From: teigland at redhat.com (David Teigland)
Date: Fri, 27 Jun 2008 12:18:45 -0500
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080626211052.GC13293@fieldses.org>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
	<20080626203315.GB13293@fieldses.org>
	<20080626211052.GC13293@fieldses.org>
Message-ID: <20080627171845.GD19105@redhat.com>

On Thu, Jun 26, 2008 at 05:10:52PM -0400, J. Bruce Fields wrote:
> > > So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
> > > returns immediately without mounting, and leaves this in the logs:
> > > 
> > > 	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]
> > 
> > Looking at the object file, that appears to be in purge_plocks().
> > 
> > After rerunning configure with --debug and rebuilding, the second mount
> > hangs instead of returning immediately without mounting.
> 
> By the way, as I've said, I'm using the STABLE2 branch with the one
> patch from you to bring it uptodate with the latest kernel interface.  I
> might actually prefer to try the master branch instead (with the stuff
> labeled "cluster3" in the wiki), but didn't know whether it was actually
> working at this point.  Is it?
> 
> My goal is just to be able to experiment with your latest posix locking
> code and with nfsd exports.

The master branch is probably too unstable to attempt right now.
I'd really like to try this out, but none of the 2.6.26-rc kernels
will work on my test machines!  (mpt fusion driver doesn't work)
Not yet sure what to try next...


From teigland at redhat.com  Fri Jun 27 18:41:17 2008
From: teigland at redhat.com (David Teigland)
Date: Fri, 27 Jun 2008 13:41:17 -0500
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <Pine.BSO.4.64.0806271325000.12106@citi.umich.edu>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
	<20080626203315.GB13293@fieldses.org>
	<20080626211052.GC13293@fieldses.org>
	<20080627171845.GD19105@redhat.com>
	<Pine.BSO.4.64.0806271325000.12106@citi.umich.edu>
Message-ID: <20080627184117.GE19105@redhat.com>

On Fri, Jun 27, 2008 at 01:28:56PM -0400, david m. richter wrote:
> 	i also have another setup in vmware; while i doubt it's 
> substantively different than bruce's, i'm a ready and willing tester.  is 
> there a different branch (or repo, or just a stack of patches somewhere) 
> that i should/could be using?

If on 2.6.25, then use

  ftp://ftp%40openais%2Eorg:downloads at openais.org/downloads/openais-0.80.3/openais-0.80.3.tar.gz
  ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.04.tar.gz

If on 2.6.26-rc, then you'll need to add the attached patch to cluster.

Dave

-------------- next part --------------
commit 0e25f89dc09cab05c8ce519f4a84bdbf0bff25aa
Author: David Teigland <teigland at redhat.com>
Date:   Mon Apr 7 16:15:01 2008 -0500

    gfs_controld: read plocks from dlm or lock_dlm
    
    In kernels before 2.6.26, cluster posix lock ops are passed to user
    space through the gfs-specific lock_dlm module.  In 2.6.26, the same
    ops are passed to user space through the dlm module.  Update gfs_controld
    to read the plock ops from either module, depending on the kernel.
    
    Signed-off-by: David Teigland <teigland at redhat.com>

diff --git a/group/gfs_controld/lock_dlm.h b/group/gfs_controld/lock_dlm.h
index b57c1f3..4b132b0 100644
--- a/group/gfs_controld/lock_dlm.h
+++ b/group/gfs_controld/lock_dlm.h
@@ -110,6 +110,7 @@ struct mountpoint {
 struct mountgroup {
 	struct list_head	list;
 	uint32_t		id;
+	uint32_t		associated_ls_id;
 	struct list_head	members;
 	struct list_head	members_gone;
 	int			memb_count;
diff --git a/group/gfs_controld/plock.c b/group/gfs_controld/plock.c
index c564300..5e4f56b 100644
--- a/group/gfs_controld/plock.c
+++ b/group/gfs_controld/plock.c
@@ -20,9 +20,10 @@
 #include <netdb.h>
 #include <limits.h>
 #include <unistd.h>
+#include <dirent.h>
 #include <openais/saAis.h>
 #include <openais/saCkpt.h>
-#include <linux/lock_dlm_plock.h>
+#include <linux/dlm_plock.h>
 
 #include "lock_dlm.h"
 
@@ -30,8 +31,9 @@
 #define PROC_DEVICES            "/proc/devices"
 #define MISC_NAME               "misc"
 #define CONTROL_DIR             "/dev/misc"
-#define CONTROL_NAME            "lock_dlm_plock"
+#define CONTROL_NAME            "dlm_plock"
 
+extern struct list_head mounts;
 extern int our_nodeid;
 extern int message_flow_control_on;
 
@@ -57,6 +59,7 @@ static SaCkptCallbacksT callbacks = { 0, 0 };
 static SaVersionT version = { 'B', 1, 1 };
 static char section_buf[1024 * 1024];
 static uint32_t section_len;
+static int need_fsid_translation = 0;
 
 struct pack_plock {
 	uint64_t start;
@@ -100,13 +103,13 @@ struct posix_lock {
 struct lock_waiter {
 	struct list_head	list;
 	uint32_t		flags;
-	struct gdlm_plock_info	info;
+	struct dlm_plock_info	info;
 };
 
 
 static void send_own(struct mountgroup *mg, struct resource *r, int owner);
 static void save_pending_plock(struct mountgroup *mg, struct resource *r,
-			       struct gdlm_plock_info *in);
+			       struct dlm_plock_info *in);
 
 
 static int got_unown(struct resource *r)
@@ -114,7 +117,7 @@ static int got_unown(struct resource *r)
 	return !!(r->flags & R_GOT_UNOWN);
 }
 
-static void info_bswap_out(struct gdlm_plock_info *i)
+static void info_bswap_out(struct dlm_plock_info *i)
 {
 	i->version[0]	= cpu_to_le32(i->version[0]);
 	i->version[1]	= cpu_to_le32(i->version[1]);
@@ -129,7 +132,7 @@ static void info_bswap_out(struct gdlm_plock_info *i)
 	i->owner	= cpu_to_le64(i->owner);
 }
 
-static void info_bswap_in(struct gdlm_plock_info *i)
+static void info_bswap_in(struct dlm_plock_info *i)
 {
 	i->version[0]	= le32_to_cpu(i->version[0]);
 	i->version[1]	= le32_to_cpu(i->version[1]);
@@ -147,11 +150,11 @@ static void info_bswap_in(struct gdlm_plock_info *i)
 static char *op_str(int optype)
 {
 	switch (optype) {
-	case GDLM_PLOCK_OP_LOCK:
+	case DLM_PLOCK_OP_LOCK:
 		return "LK";
-	case GDLM_PLOCK_OP_UNLOCK:
+	case DLM_PLOCK_OP_UNLOCK:
 		return "UN";
-	case GDLM_PLOCK_OP_GET:
+	case DLM_PLOCK_OP_GET:
 		return "GET";
 	default:
 		return "??";
@@ -160,7 +163,7 @@ static char *op_str(int optype)
 
 static char *ex_str(int optype, int ex)
 {
-	if (optype == GDLM_PLOCK_OP_UNLOCK || optype == GDLM_PLOCK_OP_GET)
+	if (optype == DLM_PLOCK_OP_UNLOCK || optype == DLM_PLOCK_OP_GET)
 		return "-";
 	if (ex)
 		return "WR";
@@ -195,10 +198,11 @@ static int get_proc_number(const char *file, const char *name, uint32_t *number)
 	return 0;
 }
 
-static int control_device_number(uint32_t *major, uint32_t *minor)
+static int control_device_number(const char *plock_misc_name,
+				 uint32_t *major, uint32_t *minor)
 {
 	if (!get_proc_number(PROC_DEVICES, MISC_NAME, major) ||
-	    !get_proc_number(PROC_MISC, GDLM_PLOCK_MISC_NAME, minor)) {
+	    !get_proc_number(PROC_MISC, plock_misc_name, minor)) {
 		*major = 0;
 		return 0;
 	}
@@ -265,7 +269,7 @@ static int create_control(const char *control, uint32_t major, uint32_t minor)
 	return 1;
 }
 
-static int open_control(void)
+static int open_control(const char *control_name, const char *plock_misc_name)
 {
 	char control[PATH_MAX];
 	uint32_t major = 0, minor = 0;
@@ -273,22 +277,20 @@ static int open_control(void)
 	if (control_fd != -1)
 		return 0;
 
-	snprintf(control, sizeof(control), "%s/%s", CONTROL_DIR, CONTROL_NAME);
+	snprintf(control, sizeof(control), "%s/%s", CONTROL_DIR, control_name);
 
-	if (!control_device_number(&major, &minor)) {
-		log_error("Is dlm missing from kernel?");
+	if (!control_device_number(plock_misc_name, &major, &minor))
 		return -1;
-	}
 
 	if (!control_exists(control, major, minor) &&
 	    !create_control(control, major, minor)) {
-		log_error("Failure to communicate with kernel lock_dlm");
+		log_error("Failure to create device file %s", control);
 		return -1;
 	}
 
 	control_fd = open(control, O_RDWR);
 	if (control_fd < 0) {
-		log_error("Failure to communicate with kernel lock_dlm: %s",
+		log_error("Failure to open device %s: %s", control,
 			  strerror(errno));
 		return -1;
 	}
@@ -296,6 +298,16 @@ static int open_control(void)
 	return 0;
 }
 
+/*
+ * In kernels before 2.6.26, plocks came from gfs2's lock_dlm module.
+ * Reading plocks from there as well should allow us to use cluster3
+ * on old (RHEL5) kernels.  In this case, the fsid we read in plock_info
+ * structs is the mountgroup id, which we need to translate to the ls id.
+ */
+
+#define OLD_CONTROL_NAME "lock_dlm_plock"
+#define OLD_PLOCK_MISC_NAME "lock_dlm_plock"
+
 int setup_plocks(void)
 {
 	SaAisErrorT err;
@@ -318,14 +330,29 @@ int setup_plocks(void)
 		log_error("ckpt init error %d - plocks unavailable", err);
 
  control:
-	rv = open_control();
-	if (rv)
-		return rv;
+	need_fsid_translation = 1;
+
+	rv = open_control(CONTROL_NAME, DLM_PLOCK_MISC_NAME);
+	if (rv) {
+		log_debug("setup_plocks trying old lock_dlm interface");
+		rv = open_control(OLD_CONTROL_NAME, OLD_PLOCK_MISC_NAME);
+		if (rv) {
+			log_error("Is dlm missing from kernel?  No control device.");
+			return rv;
+		}
+
+		/* the fsid from the kernel is the mountgroup id in old
+		   kernels, which we can use to look up the mg directly
+		   without translation */
+
+		need_fsid_translation = 0;
+	}
 
 	log_debug("plocks %d", control_fd);
+	log_debug("plock need_fsid_translation %d", need_fsid_translation);
 	log_debug("plock cpg message size: %u bytes",
 		  (unsigned int) (sizeof(struct gdlm_header) +
-		                  sizeof(struct gdlm_plock_info)));
+		                  sizeof(struct dlm_plock_info)));
 
 	return control_fd;
 }
@@ -517,7 +544,7 @@ static int shrink_range(struct posix_lock *po, uint64_t start, uint64_t end)
 	return shrink_range2(&po->start, &po->end, start, end);
 }
 
-static int is_conflict(struct resource *r, struct gdlm_plock_info *in, int get)
+static int is_conflict(struct resource *r, struct dlm_plock_info *in, int get)
 {
 	struct posix_lock *po;
 
@@ -566,7 +593,7 @@ static int add_lock(struct resource *r, uint32_t nodeid, uint64_t owner,
    2. convert RE to RN range and mode */
 
 static int lock_case1(struct posix_lock *po, struct resource *r,
-		      struct gdlm_plock_info *in)
+		      struct dlm_plock_info *in)
 {
 	uint64_t start2, end2;
 	int rv;
@@ -593,7 +620,7 @@ static int lock_case1(struct posix_lock *po, struct resource *r,
    3. convert RE to RN range and mode */
 			 
 static int lock_case2(struct posix_lock *po, struct resource *r,
-		      struct gdlm_plock_info *in)
+		      struct dlm_plock_info *in)
 
 {
 	int rv;
@@ -616,7 +643,7 @@ static int lock_case2(struct posix_lock *po, struct resource *r,
 }
 
 static int lock_internal(struct mountgroup *mg, struct resource *r,
-			 struct gdlm_plock_info *in)
+			 struct dlm_plock_info *in)
 {
 	struct posix_lock *po, *safe;
 	int rv = 0;
@@ -679,7 +706,7 @@ static int lock_internal(struct mountgroup *mg, struct resource *r,
 }
 
 static int unlock_internal(struct mountgroup *mg, struct resource *r,
-			   struct gdlm_plock_info *in)
+			   struct dlm_plock_info *in)
 {
 	struct posix_lock *po, *safe;
 	int rv = 0;
@@ -743,7 +770,7 @@ static int unlock_internal(struct mountgroup *mg, struct resource *r,
 }
 
 static int add_waiter(struct mountgroup *mg, struct resource *r,
-		      struct gdlm_plock_info *in)
+		      struct dlm_plock_info *in)
 
 {
 	struct lock_waiter *w;
@@ -751,14 +778,17 @@ static int add_waiter(struct mountgroup *mg, struct resource *r,
 	w = malloc(sizeof(struct lock_waiter));
 	if (!w)
 		return -ENOMEM;
-	memcpy(&w->info, in, sizeof(struct gdlm_plock_info));
+	memcpy(&w->info, in, sizeof(struct dlm_plock_info));
 	list_add_tail(&w->list, &r->waiters);
 	return 0;
 }
 
-static void write_result(struct mountgroup *mg, struct gdlm_plock_info *in,
+static void write_result(struct mountgroup *mg, struct dlm_plock_info *in,
 			 int rv)
 {
+	if (need_fsid_translation)
+		in->fsid = mg->associated_ls_id;
+
 	in->rv = rv;
 	write(control_fd, in, sizeof(struct gdlm_plock_info));
 }
@@ -766,7 +796,7 @@ static void write_result(struct mountgroup *mg, struct gdlm_plock_info *in,
 static void do_waiters(struct mountgroup *mg, struct resource *r)
 {
 	struct lock_waiter *w, *safe;
-	struct gdlm_plock_info *in;
+	struct dlm_plock_info *in;
 	int rv;
 
 	list_for_each_entry_safe(w, safe, &r->waiters, list) {
@@ -792,7 +822,7 @@ static void do_waiters(struct mountgroup *mg, struct resource *r)
 	}
 }
 
-static void do_lock(struct mountgroup *mg, struct gdlm_plock_info *in,
+static void do_lock(struct mountgroup *mg, struct dlm_plock_info *in,
 		    struct resource *r)
 {
 	int rv;
@@ -817,7 +847,7 @@ static void do_lock(struct mountgroup *mg, struct gdlm_plock_info *in,
 	put_resource(r);
 }
 
-static void do_unlock(struct mountgroup *mg, struct gdlm_plock_info *in,
+static void do_unlock(struct mountgroup *mg, struct dlm_plock_info *in,
 		      struct resource *r)
 {
 	int rv;
@@ -833,7 +863,7 @@ static void do_unlock(struct mountgroup *mg, struct gdlm_plock_info *in,
 
 /* we don't even get to this function if the getlk isn't from us */
 
-static void do_get(struct mountgroup *mg, struct gdlm_plock_info *in,
+static void do_get(struct mountgroup *mg, struct dlm_plock_info *in,
 		   struct resource *r)
 {
 	int rv;
@@ -846,19 +876,19 @@ static void do_get(struct mountgroup *mg, struct gdlm_plock_info *in,
 	write_result(mg, in, rv);
 }
 
-static void __receive_plock(struct mountgroup *mg, struct gdlm_plock_info *in,
+static void __receive_plock(struct mountgroup *mg, struct dlm_plock_info *in,
 			    int from, struct resource *r)
 {
 	switch (in->optype) {
-	case GDLM_PLOCK_OP_LOCK:
+	case DLM_PLOCK_OP_LOCK:
 		mg->last_plock_time = time(NULL);
 		do_lock(mg, in, r);
 		break;
-	case GDLM_PLOCK_OP_UNLOCK:
+	case DLM_PLOCK_OP_UNLOCK:
 		mg->last_plock_time = time(NULL);
 		do_unlock(mg, in, r);
 		break;
-	case GDLM_PLOCK_OP_GET:
+	case DLM_PLOCK_OP_GET:
 		do_get(mg, in, r);
 		break;
 	default:
@@ -880,7 +910,7 @@ static void __receive_plock(struct mountgroup *mg, struct gdlm_plock_info *in,
 
 static void _receive_plock(struct mountgroup *mg, char *buf, int len, int from)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct gdlm_header *hd = (struct gdlm_header *) buf;
 	struct resource *r = NULL;
 	struct timeval now;
@@ -907,7 +937,7 @@ static void _receive_plock(struct mountgroup *mg, char *buf, int len, int from)
 		plock_recv_time = now;
 	}
 
-	if (info.optype == GDLM_PLOCK_OP_GET && from != our_nodeid)
+	if (info.optype == DLM_PLOCK_OP_GET && from != our_nodeid)
 		return;
 
 	if (from != hd->nodeid || from != info.nodeid) {
@@ -1013,14 +1043,14 @@ void receive_plock(struct mountgroup *mg, char *buf, int len, int from)
 	_receive_plock(mg, buf, len, from);
 }
 
-static int send_struct_info(struct mountgroup *mg, struct gdlm_plock_info *in,
+static int send_struct_info(struct mountgroup *mg, struct dlm_plock_info *in,
 			    int msg_type)
 {
 	char *buf;
 	int rv, len;
 	struct gdlm_header *hd;
 
-	len = sizeof(struct gdlm_header) + sizeof(struct gdlm_plock_info);
+	len = sizeof(struct gdlm_header) + sizeof(struct dlm_plock_info);
 	buf = malloc(len);
 	if (!buf) {
 		rv = -ENOMEM;
@@ -1047,14 +1077,14 @@ static int send_struct_info(struct mountgroup *mg, struct gdlm_plock_info *in,
 }
 
 static void send_plock(struct mountgroup *mg, struct resource *r,
-		       struct gdlm_plock_info *in)
+		       struct dlm_plock_info *in)
 {
 	send_struct_info(mg, in, MSG_PLOCK);
 }
 
 static void send_own(struct mountgroup *mg, struct resource *r, int owner)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 
 	/* if we've already sent an own message for this resource,
 	   (pending list is not empty), then we shouldn't send another */
@@ -1074,7 +1104,7 @@ static void send_own(struct mountgroup *mg, struct resource *r, int owner)
 
 static void send_syncs(struct mountgroup *mg, struct resource *r)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct posix_lock *po;
 	struct lock_waiter *w;
 	int rv;
@@ -1111,7 +1141,7 @@ static void send_syncs(struct mountgroup *mg, struct resource *r)
 
 static void send_drop(struct mountgroup *mg, struct resource *r)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 
 	memset(&info, 0, sizeof(info));
 	info.number = r->number;
@@ -1123,7 +1153,7 @@ static void send_drop(struct mountgroup *mg, struct resource *r)
    so the op is saved on the pending list until the r owner is established */
 
 static void save_pending_plock(struct mountgroup *mg, struct resource *r,
-			       struct gdlm_plock_info *in)
+			       struct dlm_plock_info *in)
 {
 	struct lock_waiter *w;
 
@@ -1132,7 +1162,7 @@ static void save_pending_plock(struct mountgroup *mg, struct resource *r,
 		log_error("save_pending_plock no mem");
 		return;
 	}
-	memcpy(&w->info, in, sizeof(struct gdlm_plock_info));
+	memcpy(&w->info, in, sizeof(struct dlm_plock_info));
 	list_add_tail(&w->list, &r->pending);
 }
 
@@ -1167,7 +1197,7 @@ static void send_pending_plocks(struct mountgroup *mg, struct resource *r)
 static void _receive_own(struct mountgroup *mg, char *buf, int len, int from)
 {
 	struct gdlm_header *hd = (struct gdlm_header *) buf;
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct resource *r;
 	int should_not_happen = 0;
 	int rv;
@@ -1294,7 +1324,7 @@ void receive_own(struct mountgroup *mg, char *buf, int len, int from)
 	_receive_own(mg, buf, len, from);
 }
 
-static void clear_syncing_flag(struct resource *r, struct gdlm_plock_info *in)
+static void clear_syncing_flag(struct resource *r, struct dlm_plock_info *in)
 {
 	struct posix_lock *po;
 	struct lock_waiter *w;
@@ -1333,7 +1363,7 @@ static void clear_syncing_flag(struct resource *r, struct gdlm_plock_info *in)
 
 static void _receive_sync(struct mountgroup *mg, char *buf, int len, int from)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct gdlm_header *hd = (struct gdlm_header *) buf;
 	struct resource *r;
 	int rv;
@@ -1379,7 +1409,7 @@ void receive_sync(struct mountgroup *mg, char *buf, int len, int from)
 
 static void _receive_drop(struct mountgroup *mg, char *buf, int len, int from)
 {
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct resource *r;
 	int rv;
 
@@ -1478,11 +1508,122 @@ static int drop_resources(struct mountgroup *mg)
 	return 0;
 }
 
+/* iterate through directory names looking for matching id:
+   /sys/kernel/dlm/<name>/id */
+
+#define DLM_SYSFS_DIR "/sys/kernel/dlm"
+
+static char ls_name[256];
+
+static int get_lockspace_name(uint32_t ls_id)
+{
+	char path[PATH_MAX];
+	DIR *d;
+	FILE *file;
+	struct dirent *de;
+	uint32_t id;
+	int rv, error;
+
+        d = opendir(DLM_SYSFS_DIR);
+        if (!d) {
+                log_debug("%s: opendir failed: %d", path, errno);
+		return -1;
+        }
+
+	rv = -1;
+
+	while ((de = readdir(d))) {
+		if (de->d_name[0] == '.')
+			continue;
+
+		id = 0;
+		memset(path, 0, PATH_MAX);
+		snprintf(path, PATH_MAX, "%s/%s/id", DLM_SYSFS_DIR, de->d_name);
+
+		file = fopen(path, "r");
+		if (!file) {
+			log_error("can't open %s %d", path, errno);
+			continue;
+		}
+
+		error = fscanf(file, "%u", &id);
+		fclose(file);
+
+		if (error != 1) {
+			log_error("bad read %s %d", path, errno);
+			continue;
+		}
+		if (id != ls_id) {
+			log_debug("get_lockspace_name skip %x %s",
+				  id, de->d_name);
+			continue;
+		}
+
+		log_debug("get_lockspace_name found %x %s", id, de->d_name);
+		strncpy(ls_name, de->d_name, 256);
+		rv = 0;
+		break;
+	}
+
+	closedir(d);
+	return rv;
+}
+
+/* find the locskapce with "ls_id" in sysfs, get it's name, then look for
+   the mg with with the same name in mounts list, return it's id */
+
+static void set_associated_id(uint32_t ls_id)
+{
+	struct mountgroup *mg;
+	int rv;
+
+	log_debug("set_associated_id ls_id %x %d", ls_id, ls_id);
+
+	memset(&ls_name, 0, sizeof(ls_name));
+
+	rv = get_lockspace_name(ls_id);
+	if (rv) {
+		log_error("no lockspace found with id %x", ls_id);
+		return;
+	}
+
+	mg = find_mg(ls_name);
+	if (!mg) {
+		log_error("no mountgroup found with name %s for ls_id %x",
+			  ls_name, ls_id);
+		return;
+	}
+
+	log_debug("set_associated_id ls %x is mg %x", ls_id, mg->id);
+
+	mg->associated_ls_id = ls_id;
+}
+
+static uint32_t ls_to_mg_id(uint32_t fsid)
+{
+	struct mountgroup *mg;
+	int do_set = 1;
+
+ retry:
+	list_for_each_entry(mg, &mounts, list) {
+		if (mg->associated_ls_id == fsid)
+			return mg->id;
+	}
+
+	if (do_set) {
+		do_set = 0;
+		set_associated_id(fsid);
+		goto retry;
+	}
+
+	return fsid;
+}
+
 int process_plocks(void)
 {
 	struct mountgroup *mg;
 	struct resource *r;
-	struct gdlm_plock_info info;
+	struct dlm_plock_info info;
 	struct timeval now;
 	uint64_t usec;
 	int rv;
@@ -1526,6 +1667,9 @@ int process_plocks(void)
 		goto fail;
 	}
 
+	if (need_fsid_translation)
+		info.fsid = ls_to_mg_id(info.fsid);
+
 	mg = find_mg_id(info.fsid);
 	if (!mg) {
 		log_debug("process_plocks: no mg id %x", info.fsid);

From bfields at fieldses.org  Fri Jun 27 22:17:15 2008
From: bfields at fieldses.org (J. Bruce Fields)
Date: Fri, 27 Jun 2008 18:17:15 -0400
Subject: [Linux-cluster] gfs2, kvm setup
In-Reply-To: <20080627171845.GD19105@redhat.com>
References: <20080625224544.GJ12629@fieldses.org>
	<20080626152733.GC21081@redhat.com>
	<20080626183529.GD10593@fieldses.org>
	<20080626191106.GA11945@fieldses.org>
	<20080626203315.GB13293@fieldses.org>
	<20080626211052.GC13293@fieldses.org>
	<20080627171845.GD19105@redhat.com>
Message-ID: <20080627221715.GA25549@fieldses.org>

On Fri, Jun 27, 2008 at 12:18:45PM -0500, David Teigland wrote:
> On Thu, Jun 26, 2008 at 05:10:52PM -0400, J. Bruce Fields wrote:
> > > > So, the first mount (on "piglet1") succeeds.  The second (on "piglet2")
> > > > returns immediately without mounting, and leaves this in the logs:
> > > > 
> > > > 	gfs_controld[3035]: segfault at 0 ip 08051361 sp bfd88ae0 error 4 in gfs_controld[8048000+1d000]
> > > 
> > > Looking at the object file, that appears to be in purge_plocks().
> > > 
> > > After rerunning configure with --debug and rebuilding, the second mount
> > > hangs instead of returning immediately without mounting.
> > 
> > By the way, as I've said, I'm using the STABLE2 branch with the one
> > patch from you to bring it uptodate with the latest kernel interface.  I
> > might actually prefer to try the master branch instead (with the stuff
> > labeled "cluster3" in the wiki), but didn't know whether it was actually
> > working at this point.  Is it?
> > 
> > My goal is just to be able to experiment with your latest posix locking
> > code and with nfsd exports.
> 
> The master branch is probably too unstable to attempt right now.
> I'd really like to try this out, but none of the 2.6.26-rc kernels
> will work on my test machines!  (mpt fusion driver doesn't work)

Oog.  Well, I see that's been reported a few times.  Maybe someone is
taking a look....

> Not yet sure what to try next...

I'll see if I can get any more information out of my setup.

--b.


From fdinitto at redhat.com  Mon Jun 30 06:50:54 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 30 Jun 2008 08:50:54 +0200 (CEST)
Subject: [Linux-cluster] HA Cluster Developer Summit 2008 (phase 1):
	Collecting ideas
In-Reply-To: <1213772667.3498.34.camel@diapolon.int.fabbione.net>
References: <1213772667.3498.34.camel@diapolon.int.fabbione.net>
Message-ID: <Pine.LNX.4.64.0806300849440.27368@trider-g7>

On Wed, 18 Jun 2008, Fabio M. Di Nitto wrote:

> A preliminary list, that also underlines the basic directions that will
> lead the summit, has already been collected here:
>
> http://sources.redhat.com/cluster/wiki/ClusterSummit2008
>
> By the end of June, the list of ideas will be "frozen" and will make
> the final call to decide if there are enough topics to hold the summit
> or not.

Reminder: today is the last day for submitting ideas and add a small draft 
to explain the concepts behind.

Fabio

--
I'm going to make him an offer he can't refuse.


From federico.simoncelli at gmail.com  Mon Jun 30 10:05:26 2008
From: federico.simoncelli at gmail.com (Federico Simoncelli)
Date: Mon, 30 Jun 2008 12:05:26 +0200
Subject: [Linux-cluster] Taking VM snapshots when services are stopped
	(patch)
Message-ID: <a01fe36d0806300305q3f3f3b7ey3059a946554b563@mail.gmail.com>

Hi all, I added the support for taking snapshots of virtual machines
when the services are stopped.
This avoids the immediate shutdown of the vm and the consequent
problems during the next boot.

Usage example:

  <vm name="myvm01" path="/etc/xen" snapshot="/var/lib/xen/save"/>

I'm sure there's a lot we can take from the xendomains init script to
improve this feature.
My current patch in attachment. Suggestions are welcome.

-- 
Federico.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgmanager-2.0.31-vmsnapshots.patch
Type: application/octet-stream
Size: 1664 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080630/1754f272/attachment.obj>

From theophanis_kontogiannis at yahoo.gr  Mon Jun 30 14:52:09 2008
From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis)
Date: Mon, 30 Jun 2008 17:52:09 +0300
Subject: [Linux-cluster] Problem with GFS2 - Kernel Panic
Message-ID: <008201c8dac0$e0aae010$a200a030$@gr>

Hello all,

 
I have a two node cluster with DRBD running in Primary/Primary.

Both nodes are running:

 
?         Kernel 2.6.18-92.1.6.el5.centos.plus

?         GFS2 fsck 0.1.44

?         cman_tool 2.0.84

?         Cluster LVM daemon version: 2.02.32-RHEL5 (2008-03-04)

Protocol version:           0.2.1

?         DRBD Version: 8.2.6 (api:88)

 
After a corruption (which was the result of combining updating and rebooting
with the FS mounted, network interruption during the reboot and like issues,
I keep on getting the following on one node:

 
Jun 30 00:13:40 tweety1 clurgmgrd[5283]: <notice> stop on script "BOINC"
returned 1 (generic error)

Jun 30 00:13:40 tweety1 clurgmgrd[5283]: <info> Services Initialized

Jun 30 00:13:40 tweety1 clurgmgrd[5283]: <info> State change: Local UP

Jun 30 00:13:45 tweety1 clurgmgrd[5283]: <notice> Starting stopped service
service:BOINC-t1

Jun 30 00:13:45 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: fatal: invalid
metadata block

Jun 30 00:13:45 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0:   bh = 21879736
(magic number)

Jun 30 00:13:45 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0:   function =
gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 332

Jun 30 00:13:45 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: about to
withdraw this file system

Jun 30 00:13:45 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: telling LM to
withdraw

Jun 30 00:13:46 tweety1 clurgmgrd[5283]: <notice> Service service:BOINC-t1
started

Jun 30 00:13:46 tweety1 kernel: GFS2: fsid=tweety:gfs2-00.0: withdrawn

Jun 30 00:13:46 tweety1 kernel:

Jun 30 00:13:46 tweety1 kernel: Call Trace:

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff88629146>]
:gfs2:gfs2_lm_withdraw+0xc1/0xd0

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff800639de>]
__wait_on_bit+0x60/0x6e

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff80014eec>] sync_buffer+0x0/0x3f

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff80063a58>]
out_of_line_wait_on_bit+0x6c/0x78

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff8009d1bb>]
wake_bit_function+0x0/0x23

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff8863af7f>]
:gfs2:gfs2_meta_check_ii+0x2c/0x38

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff8862ca06>]
:gfs2:gfs2_meta_indirect_buffer+0x104/0x15e

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff8862795a>]
:gfs2:gfs2_inode_refresh+0x22/0x2ca

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff8009d1bb>]
wake_bit_function+0x0/0x23

Jun 30 00:13:46 tweety1 kernel:  [<ffffffff88626d9c>]
:gfs2:inode_go_lock+0x29/0x57

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff88625f04>]
:gfs2:glock_wait_internal+0x1d4/0x23f

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff8862611d>]
:gfs2:gfs2_glock_nq+0x1ae/0x1d4

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff88632053>]
:gfs2:gfs2_lookup+0x58/0xa7

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff8863204b>]
:gfs2:gfs2_lookup+0x50/0xa7

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff80022663>] d_alloc+0x174/0x1a9

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff8000cbb4>] do_lookup+0xd3/0x1d4

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff80009f73>]
__link_path_walk+0xa01/0xf42

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff8861fd37>]
:gfs2:compare_dents+0x0/0x57

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff8000e782>]
link_path_walk+0x5c/0xe5

Jun 30 00:13:47 tweety1 kernel:  [<ffffffff88624d6f>]
:gfs2:gfs2_glock_put+0x26/0x133

 
After that, the machine freezes completely. The only way to recover is to
power-cycle / reset.

 
"gfs2-fsck -vy /dev/mapper/vg0-data0" ends (not terminates, it just look
like it finishes) with: 

 
Pass5 complete

Writing changes to disk

gfs2_fsck: buffer still held for block: 21875415 (0x14dcad7)

 
After remounting the file system and having a service start (that has its
files on this gfs2 filesystem), the kernel again crasses with the same
message and the node freezes up.

 
Unfortunately due to bad handling, I failed to DRBD invalidate the
problematic node, and instead of making it sync target (which theoretically
would solve the problem, since the good node, would sync the bad node).

Instead I made the bad node, sync source and now both nodes have the same
issue L

 
Any ideas of how can I resolve this issue?

 
Sincerely,

 
Theophanis Kontogiannis

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080630/a18cf3fb/attachment.htm>

From sanelson at gmail.com  Mon Jun 30 21:56:27 2008
From: sanelson at gmail.com (Stephen Nelson-Smith)
Date: Mon, 30 Jun 2008 22:56:27 +0100
Subject: [Linux-cluster] Homebrew NAS Cluster
Message-ID: <b6131fdc0806301456s11bacae8wae629723d21cf569@mail.gmail.com>

Hi all,

I'm in the process of setting up a virtualisation farm which will have
50-60 virtual machines, running a wide range of web, application and
database applications, all on top of vmware vi3.

My budget won't stretch to a commercial NAS solution, so it's either a
SAN, which could get complicated and hard to manage with so many
nodes, or a home-brew NAS solution.

Has anyone done this, on the list?  I'm wondering what the catch is?
I'm thinking all I need to do is run NFS on top of a clustered
filesystem, and export to ESX.

I could use some pointers, gotchas, ideas and experiences.

Thanks!

S.


From tina.soles at strsoftware.com  Mon Jun 30 22:32:25 2008
From: tina.soles at strsoftware.com (Tina Soles)
Date: Mon, 30 Jun 2008 18:32:25 -0400
Subject: [Linux-cluster] Help with Oracle ASMLib 2.0 and Fedora 9
Message-ID: <05DA6438AEDF5E4B8583C12EBD6C32C0011C2341@mail.strsoftware.com>

Hello,

 
I am attempting to setup an Oracle RAC using these instructions: 
http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi_2.ht
ml#17

 
I am running Fedora 9 with kernel = 2.6.25-14.fc9.i686

 
I realize this is probably an "unsupported" version, but it's the only
version that I could get to work with my firewire setup, so I cannot
change the kernel. 

ocfs2 is up and running, and now I need to install ASMLib 2.0, but it
appears that there is no rpm distribution for this kernel.  Therefore, I
am attempting to build my own, from the source files,
oracleasm-2.0.4.tar.gz.  After unzipping and untarring, I run
./configure and it seems to run fine (see below), but when I try to run
make install it bombs with an error no rule to make target
`oracleasm.ko', needed by `install-oracleasm'.  Stop.

 
I don't have any experience building rpms from source, so any explicit
instructions you can give me would be much appreciated.  Also, does this
source file contain everything I need in order to build the kernel
driver, userspace library, and driver support files, or do I need
separate source files for those?  Please forgive my ignorance, as I am
new to this.

 
Thanks in advance for any help you can give me.

 
Regards,

Tina

 
# ./configure

checking build system type... i686-pc-linux-gnu

checking host system type... i686-pc-linux-gnu

checking for gcc... gcc

checking for C compiler default output file name... a.out

checking whether the C compiler works... yes

checking whether we are cross compiling... no

checking for suffix of executables...

checking for suffix of object files... o

checking whether we are using the GNU C compiler... yes

checking whether gcc accepts -g... yes

checking for gcc option to accept ANSI C... none needed

checking how to run the C preprocessor... gcc -E

checking for a BSD-compatible install... /usr/bin/install -c

checking whether ln -s works... yes

checking for ranlib... ranlib

checking for ar... /usr/bin/ar

checking for egrep... grep -E

checking for ANSI C header files... yes

checking for an ANSI C-conforming const... yes

checking for sys/types.h... yes

checking for sys/stat.h... yes

checking for stdlib.h... yes

checking for string.h... yes

checking for memory.h... yes

checking for strings.h... yes

checking for inttypes.h... yes

checking for stdint.h... yes

checking for unistd.h... yes

checking for unsigned long... yes

checking size of unsigned long... 4

checking for vendor... not found

checking for vendor kernel... not supported

checking for directory with kernel build tree...
/lib/modules/2.6.25-14.fc9.i686/build

checking for kernel version... 2.6.25-14.fc9.i686

checking for capabilities mask in backing_dev_info... yes

checking for vfsmount in ->get_sb() helpers... yes

checking for for mutex API... yes

checking for for i_private... yes

checking for for i_blksize... no

configure: creating ./config.status

config.status: creating Config.make

config.status: creating include/linux/oracleasm/module_version.h

config.status: creating vendor/sles9/oracleasm.spec-generic

config.status: creating vendor/rhel4/oracleasm.spec-generic

config.status: creating vendor/fc6/oracleasm.spec-generic

config.status: creating vendor/sles10/oracleasm.spec-generic

config.status: creating vendor/rhel5/oracleasm.spec-generic

config.status: creating vendor/common/oracleasm-headers.spec-generic

 
# make install

make -C include install

make[1]: Entering directory `/root/rpms/source/oracleasm-2.0.4/include'

make -C linux install

make[2]: Entering directory
`/root/rpms/source/oracleasm-2.0.4/include/linux'

make -C oracleasm install

make[3]: Entering directory
`/root/rpms/source/oracleasm-2.0.4/include/linux/oracleasm'

/bin/sh ../../../mkinstalldirs /usr/local/include/linux/oracleasm

for hdr in abi.h abi_compat.h disk.h error.h manager.h manager_compat.h
kernel.h compat32.h module_version.h; do \

          /usr/bin/install -c -m 644 $hdr
/usr/local/include/linux/oracleasm/$hdr; \

        done

make[3]: Leaving directory
`/root/rpms/source/oracleasm-2.0.4/include/linux/oracleasm'

make[2]: Leaving directory
`/root/rpms/source/oracleasm-2.0.4/include/linux'

make[1]: Leaving directory `/root/rpms/source/oracleasm-2.0.4/include'

make -C kernel install

make[1]: Entering directory `/root/rpms/source/oracleasm-2.0.4/kernel'

make[1]: *** No rule to make target `oracleasm.ko', needed by
`install-oracleasm'.  Stop.

make[1]: Leaving directory `/root/rpms/source/oracleasm-2.0.4/kernel'

make: *** [kernel-install] Error 2

 
Tina Soles

Senior Analyst

 
  <http://www.strsoftware.com/> 

 
11505 Allecingie Parkway
Richmond, VA 23235
email. tina.soles at strsoftware.com <mailto:troy.duval at strsoftware.com> 

phone. 804.897.1600 
fax. 804.897.1638 

web. www.strsoftware.com <http://www.strsoftware.com/> 

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080630/1de4653c/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 3308 bytes
Desc: image001.gif
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080630/1de4653c/attachment.gif>