From riaan at obsidian.co.za  Thu Jun  1 08:27:19 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Thu, 1 Jun 2006 10:27:19 +0200 (SAST)
Subject: [Linux-cluster] Choose between broadcast and multicast for cman
Message-ID: <Pine.LNX.4.33.0606011016480.25176-100000@isis.obsidian.co.za>

Is there any reason or advantage to use multicast over broadcast, which
offsets the complexity of multicast (relative to broadcast), e.g more
control, less traffic, others?

The documentation is not very helpful. The RHCS 4 manual just says "choose
one". The RHCS 3 manual section 3.6.1 says "Multicast heartbeating over a
channel-bonded Ethernet interface provides good fault tolerance and is
recommended for availability." This looks more like a recommendation of
channel-bonding versus standalone interfaces than recommending multicast
over broadcast, since fault tolerance and availability are offered by
channel-bonding, not multicast.


-- 
Riaan van Niekerk
Systems Architect
Obsidian Red Hat Consulting

Obsidian Systems
www.obsidian.co.za
Cel: +27 82 921 8768
Tel: +27 11 792 6500
Fax: +27 11 792 6522


From pcaulfie at redhat.com  Thu Jun  1 08:43:45 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 01 Jun 2006 09:43:45 +0100
Subject: [Linux-cluster] Choose between broadcast and multicast for cman
In-Reply-To: <Pine.LNX.4.33.0606011016480.25176-100000@isis.obsidian.co.za>
References: <Pine.LNX.4.33.0606011016480.25176-100000@isis.obsidian.co.za>
Message-ID: <447EA8C1.6010203@redhat.com>

Riaan van Niekerk wrote:
> Is there any reason or advantage to use multicast over broadcast, which
> offsets the complexity of multicast (relative to broadcast), e.g more
> control, less traffic, others?

Generally multicast behaves the same as broadcast. The only time you would
need multicast is if your nodes are on different subnets - in which case you
would also have to make sure the router had sufficiently low latencies to
support clustering over it.


-- 

patrick


From c_triantafillou at hotmail.com  Thu Jun  1 10:57:32 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Thu, 01 Jun 2006 12:57:32 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <447DCBC7.9020803@redhat.com>
Message-ID: <BAY112-F26E0FD40C60D1BB38527F793900@phx.gbl>

>You mean that the users are using the default lockspace even
>though the lockspace that was created by root was a different one? Strange.

yes, that is what is happenning:
I have got these devices:
crwxrwxrwx   1 root root 10, 62 May 30 21:32 dlm-control
crwxrwxrwx   1 root root 10, 61 May 30 21:32 dlm_default
crwxrwxrwx   1 root root 10, 61 May 31 17:03 dlm_kobe

and I can now run all the user tests as a non-root user:
# lstest -o -r -l default
Opening lockspace default
locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8
unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8

but
# lstest -o -l default
Opening lockspace default
locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf
unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf
Releasing ls default
release ls: Operation not permitted

Regards,
Christos


From pcaulfie at redhat.com  Thu Jun  1 12:02:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 01 Jun 2006 13:02:23 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F26E0FD40C60D1BB38527F793900@phx.gbl>
References: <BAY112-F26E0FD40C60D1BB38527F793900@phx.gbl>
Message-ID: <447ED74F.1040601@redhat.com>

Christos Triantafillou wrote:
>> You mean that the users are using the default lockspace even
>> though the lockspace that was created by root was a different one?
>> Strange.
> 
> yes, that is what is happenning:
> I have got these devices:
> crwxrwxrwx   1 root root 10, 62 May 30 21:32 dlm-control
> crwxrwxrwx   1 root root 10, 61 May 30 21:32 dlm_default
> crwxrwxrwx   1 root root 10, 61 May 31 17:03 dlm_kobe
> 
> and I can now run all the user tests as a non-root user:
> # lstest -o -r -l default
> Opening lockspace default
> locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8
> unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8
> 
> but
> # lstest -o -l default
> Opening lockspace default
> locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf
> unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf
> Releasing ls default
> release ls: Operation not permitted
> 

It looks like the default lockspace didn't get released when the all
references to it disappeared (unless you have something holding it open!).

I'm not sure how that might happen

-- 

patrick


From riaan at obsidian.co.za  Thu Jun  1 12:47:27 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Thu, 1 Jun 2006 14:47:27 +0200 (SAST)
Subject: [Linux-cluster] Choose between broadcast and multicast for cman
In-Reply-To: <Pine.LNX.4.33.0606011322280.25176-100000@isis.obsidian.co.za>
Message-ID: <Pine.LNX.4.33.0606011418010.25176-100000@isis.obsidian.co.za>

On Thu, 1 Jun 2006, Riaan van Niekerk wrote:

> Riaan van Niekerk wrote:
> > Is there any reason or advantage to use multicast over broadcast, which
> > offsets the complexity of multicast (relative to broadcast), e.g more
> > control, less traffic, others?
> 
> Generally multicast behaves the same as broadcast. The only time you would
> need multicast is if your nodes are on different subnets - in which case 
> you
> would also have to make sure the router had sufficiently low latencies to
> support clustering over it.
> 
> 
tnx Patrick


The RHCS 4 documentation does not give a recommendation for a multicast
address. cman man page and http://sources.redhat.com/cluster/doc/usage.txt
mention 224.0.0.1, which is a non-routable multicast address range, 
224.0.0.0/24 .

So if I understand this correctly:

Using this address or anything in the 224.0.0.0/24 range would give the
exact same effect as long as nodes are on the same subnet. If nodes are on
different subnets, a multicast address in another network should be used
(e.g. RHCS 3 defaults to 225.0.0.11).

Riaan


From pcaulfie at redhat.com  Thu Jun  1 13:08:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 01 Jun 2006 14:08:23 +0100
Subject: [Linux-cluster] Choose between broadcast and multicast for cman
In-Reply-To: <Pine.LNX.4.33.0606011418010.25176-100000@isis.obsidian.co.za>
References: <Pine.LNX.4.33.0606011418010.25176-100000@isis.obsidian.co.za>
Message-ID: <447EE6C7.3040101@redhat.com>

Riaan van Niekerk wrote:
> On Thu, 1 Jun 2006, Riaan van Niekerk wrote:
> 
>> Riaan van Niekerk wrote:
>>> Is there any reason or advantage to use multicast over broadcast, which
>>> offsets the complexity of multicast (relative to broadcast), e.g more
>>> control, less traffic, others?
>> Generally multicast behaves the same as broadcast. The only time you would
>> need multicast is if your nodes are on different subnets - in which case 
>> you
>> would also have to make sure the router had sufficiently low latencies to
>> support clustering over it.
>>
>>
> tnx Patrick
> 
> 
> The RHCS 4 documentation does not give a recommendation for a multicast
> address. cman man page and http://sources.redhat.com/cluster/doc/usage.txt
> mention 224.0.0.1, which is a non-routable multicast address range, 
> 224.0.0.0/24 .
> 
> So if I understand this correctly:
> 
> Using this address or anything in the 224.0.0.0/24 range would give the
> exact same effect as long as nodes are on the same subnet. If nodes are on
> different subnets, a multicast address in another network should be used
> (e.g. RHCS 3 defaults to 225.0.0.11).
> 

Exactly. I don't know just how that mcast address got into the documentation
but I suspect it's my fault. I do seem to remember giving someone a config
file with that address in it at some time :-)

-- 

patrick


From riaan at obsidian.co.za  Thu Jun  1 20:38:32 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Thu, 1 Jun 2006 22:38:32 +0200 (SAST)
Subject: [Linux-cluster] Oracle RAC and Cluster Suite, can they coexist
	on the same machine?
In-Reply-To: <Pine.LNX.4.33.0606012204400.19395-100000@isis.obsidian.co.za>
Message-ID: <Pine.LNX.4.33.0606012206120.19395-100000@isis.obsidian.co.za>

On Thu, 1 Jun 2006, Riaan van Niekerk wrote:

> Hello there,
> 
> I would like to know if there is any problem when using the Oracle Real 
> Application Clusters to mange the Oracle database and Red Hat Cluster 
> Suite to manage an application on the same machine?
> 
> Remember that the application will not be entirely managed by the RHCS, it 
> will be ative on all nodes and the RHCS will only be used to manage 
> virtual IP reallocation and mounting points.
> 
> Is there any problems reported so far with this configuration? 
> 
> Does Oracle RAC and RHCS can coexist without problems?
> 
> Any comments on this type of configuration?
> 
> 
I have not run a setup like this, but I do not see this as being a
problem. You do not give version numbers for any of the RHCS or RAC, nor
if GFS is involved, but if you are running GFS 6.1/RHEL 4, you are already
running RHCS. Either way, as long as you do not do anything to make
scripts, mountpoints or virtual IPs/ports clash between RHCS and RAC, you
should be fine, IMHO.

Riaan


From Klaus.Steinberger at physik.uni-muenchen.de  Fri Jun  2 05:54:28 2006
From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger)
Date: Fri, 2 Jun 2006 07:54:28 +0200
Subject: [Linux-cluster] GFS - lost filespace during gfs_grow
Message-ID: <200606020754.28501.Klaus.Steinberger@physik.uni-muenchen.de>

Hello,

I have the following problem:

I tried to expand a GFS filesystem from 2 TByte to 3 TByte.

FIrst I expanded successfully the Logical Volume (sits on a FC storage)

Then I tried "gfs_grow -v /export/data/etp".

The last thing it wrote out:

Preparing to write new FS information

After that the load at least on one of the other nodes running the NFS Service 
has gone up (80 - 130), I did not see any big activity on the storage, but 
DLM lock events on the node running gfs_grow.

After some long time (around 20 Minutes) the node running gfs_grow crashed 
with an OOPS ( please see the screenshot at 
http://www.physik.uni-muenchen.de/~klaus.steinberger/crash-dlm.png ).

With df it looks like that only part of the new space was added:

[root at etpopt03 ~]# df /export/data/etp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/etpdata-etp
                     2427166768 2105093784 322072984  87% /export/data/etp
[root at etpopt03 ~]#  

Further gfs_grow commands tell:

[root at etpopt03 ~]# gfs_grow -Tv /export/data/etp
Device has grown by less than 100 blocks.... skipping
[root at etpopt03 ~]#  

There are 8 journals with standard size (so at most 128 Mbyte should be used 
for the journals), so it looks like around 500 - 600 MByte are missing.
I run Scientific Linux 4.2 (which is similar to RHEL 4.2)

How could I recover the lost space?

Sincerly,
Klaus


-- 
Klaus Steinberger         Maier-Leibnitz Labor
Phone: (+49 89)289 14287  Am Coulombwall 6, D-85748 Garching, Germany
FAX:   (+49 89)289 14280  EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE
URL: http://www.physik.uni-muenchen.de/~k2/

In a world without Walls and Fences, who needs Windows and Gates


From Olivier.Thibault at lmpt.univ-tours.fr  Fri Jun  2 08:26:17 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Fri, 02 Jun 2006 10:26:17 +0200
Subject: [Linux-cluster] gfs export over nfs is very slow
In-Reply-To: <447C633C.1090508@lmpt.univ-tours.fr>
References: <20060530145918.9049.qmail@webmail46.rediffmail.com>
	<447C633C.1090508@lmpt.univ-tours.fr>
Message-ID: <447FF629.6020307@lmpt.univ-tours.fr>

Hi,

I have upgraded FC5, and it's now much better.
For information, here is a bonnie++ test result, on gfs exported via 
nfs, gigabit ethernet lan.
Version 1.01d       ------Sequential Output------ --Sequential Input- 
--Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
poisson          4G 21383  19 21582   6  4026  75 24101  21 22974   3 
259.8   1
                     ------Sequential Create------ --------Random 
Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16   158   1   517  91   248   2   157   1  3243  22 
  238   2


Locally, the same test is more than twice faster.
Does someone knows if there are optimizations for gfs and nfs, other 
than ones found in NFS Howto ?

Best regards,

Olivier

Olivier Thibault a ?crit :
> Hi,
> 
> Raj Kumar a ?crit :
>  > Hi,
>  >
>  > We are using GFS6.0 (no cluster suite) and NFS exports of the file 
> system. I am getting a transfer rate of about 35MB/sec. We have a high 
> speed SAN. Actually the transfer rate can be little higher but we 
> attribute the slow rate to NFS itself since we see the same numbers for 
> EXT3 also.
>  >
>  > Regards,
>  > Raj
>  >
>  >
> 
> Thank you for your answer.
> I am upgrading to last GFS/DLM/CMAN kernel stuff and will retry.
> I've ran bonnie++ with ext3 exported over nfs and it is really speeder 
> even if it's not what i expected. I got about 22 MB/s (r/w).
> But i saw that nfsd was consuming a lot of CPU. The system load was 15 !!
> I've also ran test with Suse SLES9 xfs exported over nfs. I got 40MB/s, 
> which is what aim to get with GFS ...
> I don't understand ...
> 
> Is there anybody who export gfs over nfs with FC5 ?
> 
> Thanks by advance
> 
> Olivier
> 
>> On Tue, 30 May 2006 Olivier Thibault wrote :
>>> Hi,
>>>
>>>
>>> I am testing RHCS on Fedora Core 5.
>>> I have a shared gfs volume mounted on two nodes (using clvmd and 
>>> lock_dlm).
>>> Locally, everything is ok.
>>> If I export the gfs volume via nfs, i obtain *very poor* performance.
>>> For exemple, from a nfs client with dd, it take 90 seconds to create 
>>> a 16 MB file !!!
>>> From the cluster's nodes, the performances a good, and i made some 
>>> tests exporting xfs over nfs, and it was good too.
>>> So what's wrong with nfs+gfs ?
>>> I would be very interested to know how guys who use this have 
>>> configured it, and what performances they have.
>>>
>>> Thanks for any advices.
>>>
>>> Best regards
>>>
>>> -- Olivier THIBAULT
>>> Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
>>> Universit? Fran?ois Rabelais
>>> Parc de Grandmont - 37200 TOURS
>>> T?l: +33 2 47 36 69 12
>>> Fax: +33 2 47 36 69 56
>>>
>>> -- 
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>> ------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From c_triantafillou at hotmail.com  Fri Jun  2 13:28:51 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Fri, 02 Jun 2006 15:28:51 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <447ED74F.1040601@redhat.com>
Message-ID: <BAY112-F303073AD30AD86F0922C1A93910@phx.gbl>

Patrick,

Is the query functionality implemented yet?
I defined QUERY in dlmtest.c and I am getting this:
# dlmtest -Q
locking LOCK-NAME EX ...done (lkid = 102da)
Query failed: Invalid argument
unlocking LOCK-NAME...done

Regards,
Christos


>From: Patrick Caulfield <pcaulfie at redhat.com>
>Reply-To: linux clustering <linux-cluster at redhat.com>
>To: linux clustering <linux-cluster at redhat.com>
>Subject: Re: [Linux-cluster] DLM & RedHat Enterprise Linux
>Date: Thu, 01 Jun 2006 13:02:23 +0100
>MIME-Version: 1.0
>Received: from hormel.redhat.com ([209.132.177.30]) by 
>bay0-mc4-f3.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 1 
>Jun 2006 05:02:31 -0700
>Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com 
>[10.8.4.110])by hormel.redhat.com (Postfix) with ESMTPid E91BC72E86; Thu,  
>1 Jun 2006 08:02:28 -0400 (EDT)
>Received: from int-mx1.corp.redhat.com 
>(int-mx1.corp.redhat.com[172.16.52.254])by listman.util.phx.redhat.com 
>(8.13.1/8.13.1) with ESMTP idk51C2RXT025641 for 
><linux-cluster at listman.util.phx.redhat.com>;Thu, 1 Jun 2006 08:02:27 -0400
>Received: from pobox.surrey.redhat.com (pobox.surrey.redhat.com 
>[172.16.10.17])by int-mx1.corp.redhat.com (8.12.11.20060308/8.12.11) with 
>ESMTP idk51C2QqF024437 for <linux-cluster at int-mx1.corp.redhat.com>;Thu, 1 
>Jun 2006 08:02:26 -0400
>Received: from [192.168.1.2] (vpn-68-1.surrey.redhat.com [10.32.68.1])by 
>pobox.surrey.redhat.com (8.12.11.20060308/8.12.11) with ESMTP 
>idk51C2ODh016194for <linux-cluster at redhat.com>; Thu, 1 Jun 2006 13:02:25 
>+0100
>X-Message-Info: LsUYwwHHNt1hwMoPuwvRWIu68qUsjYIZZ2SgBHK6+k0=
>Organization: Red Hat
>User-Agent: Thunderbird 1.5 (X11/20051201)
>References: <BAY112-F26E0FD40C60D1BB38527F793900 at phx.gbl>
>X-Enigmail-Version: 0.94.0.0
>X-loop: linux-cluster at redhat.com
>X-BeenThere: linux-cluster at redhat.com
>X-Mailman-Version: 2.1.5
>Precedence: junk
>List-Id: linux clustering <linux-cluster.redhat.com>
>List-Unsubscribe: 
><https://www.redhat.com/mailman/listinfo/linux-cluster>,<mailto:linux-cluster-request at redhat.com?subject=unsubscribe>
>List-Archive: <https://www.redhat.com/archives/linux-cluster>
>List-Post: <mailto:linux-cluster at redhat.com>
>List-Help: <mailto:linux-cluster-request at redhat.com?subject=help>
>List-Subscribe: 
><https://www.redhat.com/mailman/listinfo/linux-cluster>,<mailto:linux-cluster-request at redhat.com?subject=subscribe>
>Errors-To: linux-cluster-bounces at redhat.com
>Return-Path: linux-cluster-bounces at redhat.com
>X-OriginalArrivalTime: 01 Jun 2006 12:02:32.0305 (UTC) 
>FILETIME=[43464610:01C68573]
>
>Christos Triantafillou wrote:
> >> You mean that the users are using the default lockspace even
> >> though the lockspace that was created by root was a different one?
> >> Strange.
> >
> > yes, that is what is happenning:
> > I have got these devices:
> > crwxrwxrwx   1 root root 10, 62 May 30 21:32 dlm-control
> > crwxrwxrwx   1 root root 10, 61 May 30 21:32 dlm_default
> > crwxrwxrwx   1 root root 10, 61 May 31 17:03 dlm_kobe
> >
> > and I can now run all the user tests as a non-root user:
> > # lstest -o -r -l default
> > Opening lockspace default
> > locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8
> > unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8
> >
> > but
> > # lstest -o -l default
> > Opening lockspace default
> > locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf
> > unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf
> > Releasing ls default
> > release ls: Operation not permitted
> >
>
>It looks like the default lockspace didn't get released when the all
>references to it disappeared (unless you have something holding it open!).
>
>I'm not sure how that might happen
>
>--
>
>patrick
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster


From pcaulfie at redhat.com  Fri Jun  2 13:40:36 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 02 Jun 2006 14:40:36 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F303073AD30AD86F0922C1A93910@phx.gbl>
References: <BAY112-F303073AD30AD86F0922C1A93910@phx.gbl>
Message-ID: <44803FD4.7060209@redhat.com>

Christos Triantafillou wrote:
> Patrick,
> 
> Is the query functionality implemented yet?
> I defined QUERY in dlmtest.c and I am getting this:
> # dlmtest -Q
> locking LOCK-NAME EX ...done (lkid = 102da)
> Query failed: Invalid argument
> unlocking LOCK-NAME...done
> 

It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream) code on
CVS head.

If you had to define QUERY, then it sounds like you're using CVS head.

-- 

patrick


From c_triantafillou at hotmail.com  Fri Jun  2 15:36:54 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Fri, 02 Jun 2006 17:36:54 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <44803FD4.7060209@redhat.com>
Message-ID: <BAY112-F26B93E506C8DDF4BA6BFDD93910@phx.gbl>

>It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream) code 
>on
>CVS head.
>
>If you had to define QUERY, then it sounds like you're using CVS head.

I had to define QUERY because it is tested in libdlm.h from the cluster 
source.

How can I get the stable version/headers?


From pcaulfie at redhat.com  Fri Jun  2 15:46:50 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 02 Jun 2006 16:46:50 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F26B93E506C8DDF4BA6BFDD93910@phx.gbl>
References: <BAY112-F26B93E506C8DDF4BA6BFDD93910@phx.gbl>
Message-ID: <44805D6A.4090305@redhat.com>

Christos Triantafillou wrote:
>> It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream)
>> code on
>> CVS head.
>>
>> If you had to define QUERY, then it sounds like you're using CVS head.
> 
> I had to define QUERY because it is tested in libdlm.h from the cluster
> source.
> 
> How can I get the stable version/headers?
> 

Checkout from CVS tag STABLE or download the tarball from sources.redhat.com.

But if you're also using cman & dlm from CVS head, the libdlm from STABLE
won't work with it. They must match as the code in HEAD is largely a rewrite.

-- 

patrick


From sdake at redhat.com  Thu Jun  1 13:51:24 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 01 Jun 2006 06:51:24 -0700
Subject: [Linux-cluster] Choose between broadcast and multicast for cman
In-Reply-To: <447EA8C1.6010203@redhat.com>
References: <Pine.LNX.4.33.0606011016480.25176-100000@isis.obsidian.co.za>
	<447EA8C1.6010203@redhat.com>
Message-ID: <1149169884.21510.2.camel@shih.broked.org>

On Thu, 2006-06-01 at 09:43 +0100, Patrick Caulfield wrote:
> Riaan van Niekerk wrote:
> > Is there any reason or advantage to use multicast over broadcast, which
> > offsets the complexity of multicast (relative to broadcast), e.g more
> > control, less traffic, others?
> 
> Generally multicast behaves the same as broadcast. The only time you would
> need multicast is if your nodes are on different subnets - in which case you
> would also have to make sure the router had sufficiently low latencies to
> support clustering over it.
> 
> 
> 

To add to Patrick's comments, some multicast switches (managed) support
IGMP which allows multicasted packets to only be sent to ports which are
subscribed to a specific multicast address.  This can increase
throughput on those nodes that are not part of the cluster and hence
shouldn't be receiving those broadcasts.

Keep in mind that some switches IGMP implementation is broken.

Regards
-steve


From rajkum2002 at rediffmail.com  Fri Jun  2 21:39:31 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 2 Jun 2006 21:39:31 -0000
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <20060602213931.19959.qmail@webmail49.rediffmail.com>

Hi,

I found that we are also experiencing performance issues using GFS. Earlier we thought EXT3 and GFS were performing equally so NFS is the issue. But I found that some tests were done incorrectly and EXT3 over NFS is twice faster than GFS over NFS. We formatted a SAN volume as EXT3 and benchmarked it on NFS client. We formatted the same SAN volume as GFS and benchmarked again. GFS + NFS is very slow.

I have also read NFS tuning guides and tried several options. But no change whatsoever.

Are there any other ways to debug this issue. It's a top most priority for us now.

Thanks,
Raj


On Fri, 02 Jun 2006 Olivier Thibault wrote :
>Hi,
>
>I have upgraded FC5, and it's now much better.
>For information, here is a bonnie++ test result, on gfs exported via nfs, gigabit ethernet lan.
>Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>poisson          4G 21383  19 21582   6  4026  75 24101  21 22974   3 259.8   1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16   158   1   517  91   248   2   157   1  3243  22  238   2
>
>
>Locally, the same test is more than twice faster.
>Does someone knows if there are optimizations for gfs and nfs, other than ones found in NFS Howto ?
>
>Best regards,
>
>Olivier
>
>Olivier Thibault a ?crit :
>>Hi,
>>
>>Raj Kumar a ?crit :
>>  > Hi,
>>  >
>>  > We are using GFS6.0 (no cluster suite) and NFS exports of the file system. I am getting a transfer rate of about 35MB/sec. We have a high speed SAN. Actually the transfer rate can be little higher but we attribute the slow rate to NFS itself since we see the same numbers for EXT3 also.
>>  >
>>  > Regards,
>>  > Raj
>>  >
>>  >
>>
>>Thank you for your answer.
>>I am upgrading to last GFS/DLM/CMAN kernel stuff and will retry.
>>I've ran bonnie++ with ext3 exported over nfs and it is really speeder even if it's not what i expected. I got about 22 MB/s (r/w).
>>But i saw that nfsd was consuming a lot of CPU. The system load was 15 !!
>>I've also ran test with Suse SLES9 xfs exported over nfs. I got 40MB/s, which is what aim to get with GFS ...
>>I don't understand ...
>>
>>Is there anybody who export gfs over nfs with FC5 ?
>>
>>Thanks by advance
>>
>>Olivier
>>
>>>On Tue, 30 May 2006 Olivier Thibault wrote :
>>>>Hi,
>>>>
>>>>
>>>>I am testing RHCS on Fedora Core 5.
>>>>I have a shared gfs volume mounted on two nodes (using clvmd and lock_dlm).
>>>>Locally, everything is ok.
>>>>If I export the gfs volume via nfs, i obtain *very poor* performance.
>>>>For exemple, from a nfs client with dd, it take 90 seconds to create a 16 MB file !!!
>>>> From the cluster's nodes, the performances a good, and i made some tests exporting xfs over nfs, and it was good too.
>>>>So what's wrong with nfs+gfs ?
>>>>I would be very interested to know how guys who use this have configured it, and what performances they have.
>>>>
>>>>Thanks for any advices.
>>>>
>>>>Best regards
>>>>
>>>>-- Olivier THIBAULT
>>>>Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
>>>>Universit? Fran?ois Rabelais
>>>>Parc de Grandmont - 37200 TOURS
>>>>T?l: +33 2 47 36 69 12
>>>>Fax: +33 2 47 36 69 56
>>>>
>>>>-- Linux-cluster mailing list
>>>>Linux-cluster at redhat.com
>>>>https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>
>>>------------------------------------------------------------------------
>>>
>>>-- Linux-cluster mailing list
>>>Linux-cluster at redhat.com
>>>https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060602/e62eccf1/attachment.htm>

From wcheng at redhat.com  Sat Jun  3 00:22:54 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Fri, 02 Jun 2006 20:22:54 -0400
Subject: [Linux-cluster] gfs export over nfs is very slow
In-Reply-To: <20060602213931.19959.qmail@webmail49.rediffmail.com>
References: <20060602213931.19959.qmail@webmail49.rediffmail.com>
Message-ID: <1149294175.5424.42.camel@localhost.localdomain>

Try these:

1) Note that 2.6 kernel defaults NFS export option to "sync". So unless
you have a strong need, explicitly set NFS export to "async" in your
export file (/etc/exports in Red Hat systems) and do *not* mount NFS
shares with "sync" option.

2) Upon large NFS append (i.e. "write" that will increase file size),
using bigger block size (e.g. when using dd command) and/or bigger
application buffer AND increasing NFS wsize and rsize (mount option) to
its maximum, *if* you have to use "sync" in as either export or mount
option.

The problem here could be GFS sync due to its cluster filesystem nature
and its file sync design. So try to avoid it if possible.

Let us know how it goes. 

-- Wendy


From carl at e2-media.co.nz  Sun Jun  4 01:38:33 2006
From: carl at e2-media.co.nz (Carl Bowden)
Date: Sun, 4 Jun 2006 13:38:33 +1200
Subject: [Linux-cluster] cluster.conf Documentation/DTD
Message-ID: <D34A524C-44DF-452A-A7A4-6CC00187EE46@e2-media.co.nz>

Hi,

Is there any Documentation, other than the cluster.conf(8) man page  
on the cluster.conf XML

I'm specifily looking to find out how to declare a 'Fence Domain'  
name in the cluster
and to check what is 'valid' for the cluster.conf file

At this stage this is the only options for the <fence_daemon> element  
I have seen:

<fence_daemon clean_start="<<int>>" post_fail_delay="<<int>>"  
post_join_delay="<<int>>"/>

or is this in-fact a silly thing to try and do?

any pointer to some more info would be very helpful

Cheers,

Carl.


"To understand recursion,
  you must first understand recursion".
----------------------------------
Carl Bowden
carl at e2-media.co.nz
e2media Ltd
2nd Floor, 160 Cashel St
PO BOX 22 128
Christchurch
New Zealand

Ph +64 3 377 0007
Fx +64 3 377 6582
M  +021  338 410


From Tomasz.Koczorowski at centertel.pl  Mon Jun  5 07:56:38 2006
From: Tomasz.Koczorowski at centertel.pl (Tomasz Koczorowski)
Date: Mon, 5 Jun 2006 09:56:38 +0200
Subject: [Linux-cluster] Failed service
Message-ID: <7E36A75EF0ABE243954B6E5AFA35D86645B717@EXCH-BK.centertel.main>

Hi,

I have encountered following problem: developers tried to
upgrade clustered application on RHCS4 without stopping
related service. Unfortunately while the application was stopped,
cluster tried to execute start script with status parameter - it failed.
After that cluster tried to relocate the service but stopping failed 
(beacuse service was already stopped) and the service state 
was changed to failed (shared filesystem and ip address were not
removed from node). Later developers started the service (without 
notifying CS). So now application is running but service state is
failed.

I can enable the service but I will have to execute following commands
on failed node thus stopping important application:

clusvcadm -d my_service
clusvcadm -e my_service

Is there a possibility to enable failed service without stopping and 
starting it?

Regards,

Tomasz Koczorowski


From bill.scherer at verizonwireless.com  Mon Jun  5 18:31:22 2006
From: bill.scherer at verizonwireless.com (Bill Scherer)
Date: Mon, 05 Jun 2006 14:31:22 -0400
Subject: [Linux-cluster] Shared Filesystem
In-Reply-To: <NYOBGCEXSMTP212hM2L001b45dd@nyobgcexsmtp21.uswin.ad.vzwcorp.com>
References: <NYOBGCEXSMTP212hM2L001b45dd@nyobgcexsmtp21.uswin.ad.vzwcorp.com>
Message-ID: <4484787A.2050800@verizonwireless.com>

Joe Warren-Meeks wrote:

>
> Hey there,
>
> I've got an iscsi array, on which I have a load of content to be 
> served via http. I basically want to have a bunch of linux boxes 
> mount the same partition read-only, via iscsi, so that they can do this.
>
> Now, I know I need GFS
>
Why? You said it's read only. NFS can handle this, and it's a lot easier 
to set up and maintain.

> and one of the locking daemons (possibly 
> gulm?), but do I need anything else from the cluster suite?
>
> Anyone got any pointers on where to look for this info? I don't want 
> to set up a full-blown cluster, just share one partition between 
> multiple machines.
>
> Cheers!
>
>   -- joe.
>
> Joe Warren-Meeks                       T: +44 (0) 208 962 0007
> Aggregator Limited                     M: +44 (0) 7789 176078
> Unit 62/63 Pall Mall Deposit           E: joe.warren-meeks at aggregator.tv
> 124-128 Barlby Road, London W10 6BL
> PGP Fingerprint: 361F 78D0 56F5 8D7F 2639  947D 71E2 8811 F825 64CC
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


From joe.warren-meeks at aggregator.tv  Mon Jun  5 21:21:52 2006
From: joe.warren-meeks at aggregator.tv (Joe Warren-Meeks)
Date: Mon, 5 Jun 2006 22:21:52 +0100
Subject: [Linux-cluster] Shared Filesystem
In-Reply-To: <4484787A.2050800@verizonwireless.com>
References: <NYOBGCEXSMTP212hM2L001b45dd@nyobgcexsmtp21.uswin.ad.vzwcorp.com>
	<4484787A.2050800@verizonwireless.com>
Message-ID: <DC24D9A3-6206-41CC-940A-A34AEE4A52E0@aggregator.tv>


On 5 Jun 2006, at 19:31, Bill Scherer wrote:

Hey there,

> Why? You said it's read only. NFS can handle this, and it's a lot  
> easier to set up and maintain.

Mainly scalability and performance. I've done NFS based systems  
before and they've had problems once you pass a certain number of  
clients unless you spend a fortune on NetApps or the equivalent.

Looks like I'm going to go with Storagetek or netapps though. I'd  
rather have used something like the Equallogic iscsi box, though.

  -- joe.

Joe Warren-Meeks                       T: +44 (0) 208 962 0007
Aggregator Limited                     M: +44 (0) 7789 176078
Unit 62/63 Pall Mall Deposit           E: joe.warren-meeks at aggregator.tv
124-128 Barlby Road, London W10 6BL
PGP Fingerprint: 361F 78D0 56F5 8D7F 2639  947D 71E2 8811 F825 64CC


From rick at espresolutions.com  Mon Jun  5 22:59:23 2006
From: rick at espresolutions.com (Rick Bansal)
Date: Mon, 5 Jun 2006 17:59:23 -0500
Subject: [Linux-cluster] Configuring cluster for direct routing
Message-ID: <200606060310.k563A9Xe032719@mx1.redhat.com>

Hello,

Has anyone setup a RH cluster using direct routing.  I read on the RH site
that DR is not offically supported but can be done.  If anyone has any
insight into this, your help would be greatly appreciated.

Regards,
Rick Bansal


From riaan at obsidian.co.za  Tue Jun  6 08:22:51 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Tue, 6 Jun 2006 10:22:51 +0200 (SAST)
Subject: [Linux-cluster] Configuring cluster for direct routing
In-Reply-To: <200606060310.k563A9Xe032719@mx1.redhat.com>
Message-ID: <Pine.LNX.4.33.0606060948180.3416-100000@isis.obsidian.co.za>

On Mon, 5 Jun 2006, Rick Bansal wrote:

> Hello,
> 
> Has anyone setup a RH cluster using direct routing.  I read on the RH site
> that DR is not offically supported but can be done.  If anyone has any
> insight into this, your help would be greatly appreciated.
> 
> Regards,
> Rick Bansal
> 
> 
Where do you read that it is not supported? I remember reading something like that, but cannot find it in the RHCS manual. I have not set up Direct Routing in IPVS mysel, but this Red Hat Magazine Tips & Tricks
http://www.redhat.com/magazine/014dec05/departments/tips_tricks/ , second item, contains a nice writeup by Lon on how to do it. It does not say anything about being supported or not. 

I would contact Red Hat Global Support Services and ask if it is supported. The SLA for Cluster Suite does not go into that level of detail (even though it explicitly excludes manual fencing for production workloads):
http://www.redhat.com/support/service/sla/defs_cluster/ha.html

Riaan


From ben.yarwood at juno.co.uk  Tue Jun  6 10:16:46 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Tue, 6 Jun 2006 11:16:46 +0100
Subject: [Linux-cluster] Backup File System
Message-ID: <0de101c68952$52375780$3964a8c0@WS076>

Presently we have a 3 node production GFS system for which I am creating a
backup system.  The production system has a hardware raid device attached as
the shared storage and I have a second virtually identical storage device
and host that I want to use for the backup.

The backup device and host will eventually be at a separate physical
location to the production system (once the two systems are syncronised) so
I don't want it to be part of the existing cluster.  In the event of failure
of the production storage, the simplest solution would be to physically
transfer the backup storage to the production environment and swap the
devices.  

Can anyone suggest the best way to set up the backup file system?

If I used a one node cluster as the backup, would it be a) possible to
convert the locking from nolock to dlm?  and b) Convert the ClusterName and
FSName

Is it possible to convert other file systems to GFS ones so that no cluster
infrastructure is needed for the backup?


Regards
Ben


From riaan at obsidian.co.za  Tue Jun  6 13:23:01 2006
From: riaan at obsidian.co.za (riaan at obsidian.co.za)
Date: Tue, 06 Jun 2006 15:23:01 +0200
Subject: [Linux-cluster] post_fail_delay
Message-ID: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net>

Having researched post_fail_delay in the archives extensively, I have the
following scenario and question:

I would like for an errant GFS node to be able to create network/disk dumps
before being power fenced. Am I missing something, or is this leaving the
errand node unfenced for any significant amount of time (enough to complete the
dump, assuming it is upwards of a few seconds) just a bad idea?
AFAIUnderstand, the whole idea of fencing is to prevent the node from damaging
the file system in the first place, making the collection of dumps and power
fencing fundamentally at odds with each other.

The only way I can see fencing/dumping being used togeather is with I/O fencing
(and I/O fencing alone, e.g. no power fencing as a second level).  The cluster
I/O fences the node immediately, but it remains up to be able to complete the
dump. Recovery entails rebooting & re-enabling the port (all manual). However,
post_fail_delay is still set to 0.

To summarize, as I see it (please feel free to correct)

To ensure data integrity:
- Always use a post_fail_delay of 0, whether you are using power or I/O
  fencing.
- When using power fencing (alone or with I/O fencing), you cannot use
netdump/diskdump - otherwise the server will be fenced (rebooted) before being
able to complete the dump.
- When you must have the ability to netdump/diskdump, use I/O fencing (and only
I/O fencing), and time the manual restore/unfence so that the dump has time to
complete

tnx
Riaan

----------------------------------------------------------------
This message was sent using Obsidian Online web-mail.
Obsidian Online - a division of Obsidian Systems (Pty) Ltd.
http://www.obsidianonline.net/


From rick at espresolutions.com  Tue Jun  6 15:20:28 2006
From: rick at espresolutions.com (Rick Bansal)
Date: Tue, 6 Jun 2006 10:20:28 -0500
Subject: [Linux-cluster] Configuring cluster for direct routing
In-Reply-To: <Pine.LNX.4.33.0606060948180.3416-100000@isis.obsidian.co.za>
Message-ID: <200606061520.k56FKdhs020198@mx1.redhat.com>

Thanks for the response.  Can't remember where I read it either.  I'm trying
to find it again.  I'll provide the link once I find it.

Regards,
Rick

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Riaan van Niekerk
Sent: Tuesday, June 06, 2006 3:23 AM
To: linux clustering
Subject: Re: [Linux-cluster] Configuring cluster for direct routing

On Mon, 5 Jun 2006, Rick Bansal wrote:

> Hello,
> 
> Has anyone setup a RH cluster using direct routing.  I read on the RH site
> that DR is not offically supported but can be done.  If anyone has any
> insight into this, your help would be greatly appreciated.
> 
> Regards,
> Rick Bansal
> 
> 
Where do you read that it is not supported? I remember reading something
like that, but cannot find it in the RHCS manual. I have not set up Direct
Routing in IPVS mysel, but this Red Hat Magazine Tips & Tricks
http://www.redhat.com/magazine/014dec05/departments/tips_tricks/ , second
item, contains a nice writeup by Lon on how to do it. It does not say
anything about being supported or not. 

I would contact Red Hat Global Support Services and ask if it is supported.
The SLA for Cluster Suite does not go into that level of detail (even though
it explicitly excludes manual fencing for production workloads):
http://www.redhat.com/support/service/sla/defs_cluster/ha.html

Riaan


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From mathieu.avila at seanodes.com  Tue Jun  6 16:38:31 2006
From: mathieu.avila at seanodes.com (Mathieu Avila)
Date: Tue, 06 Jun 2006 18:38:31 +0200
Subject: [Linux-cluster] Is fenced service started ?
Message-ID: <4485AF87.7090609@seanodes.com>

Hi all,

I am trying to automate the starting and stopping of a GFS filesystem 
(GFS 6.1). I am doing these things :
- On start :
/etc/init.d/ccsd start
/etc/init.d/cman start
/etc/init.d/fenced start
/etc/init.d/gfs start
And then mount -t gfs device mountpoint
- On stop :
umount device,
/etc/init.d/gfs stop
/etc/init.d/fenced stop
/etc/init.d/cman stop
/etc/init.d/ccsd stop

This goes fine most of the time, but not always. Sometimes I get things 
like this:
"lock_dlm: fence domain not found; check fenced" in syslog at mount 
time, although /etc/init.d/fenced was properly started. In fact, the 
fence daemon did not have enough time to initialize itself completely 
(/etc/cluster/services).

The same can happen if i start immediately after a stop, as the fencing 
daemon does not have time to completely exit when i try to run it again.

Is there a clean way to test if fenced is completely started or failed ? 
Looping over /etc/cluster/services does not sound appropriate and quite 
clean. Doing a "sleep 10" is not a good option neither.

Any idea is welcome.

--
Mathieu Avila


From Matthew.Patton.ctr at osd.mil  Tue Jun  6 18:28:48 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Tue, 6 Jun 2006 14:28:48 -0400 
Subject: [Linux-cluster] Is fenced service started ?
Message-ID: <D8063DF686D10247B0A49D01271285690CE91E29@osdn06.osd.mil>

Classification: UNCLASSIFIED

> /etc/init.d/fenced start / stop

neither should return until they are actually done. For example in the DNS
start/stop script there is a sleep which is really sad way to go about it
but works for now. I find the quality of the start/stop scripts in general
to be summer-intern grade. Can't comment on how good or bad other distro's
are but maybe they are all similarly flakey.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 2212 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060606/8ec02575/attachment.bin>

From rpeterso at redhat.com  Tue Jun  6 18:51:19 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Tue, 06 Jun 2006 13:51:19 -0500
Subject: [Linux-cluster] [GFS2] gfs2 utils now available (experimental)
Message-ID: <1149619879.6183.50.camel@technetium.msp.redhat.com>

Hi Folks,

For anyone who wants to start doing preliminary playing with GFS2:

This morning, I finished my first version of the user-land tools for the
new GFS2 filesystem and made them available in Red Hat's public CVS
repository.  Feel free to review them and/or try them out.
(See warning at the bottom).

The tools are as follows (with some comments):

1. libgfs2

This is a new library that the other tools rely upon and link against.
In GFS1, each tool had its own way of doing things, and that was prone
to mistakes.  Now the tools all use a standard library of gfs2
functions, and more problems can be fixed in one place rather than many.

2. gfs2_convert

This tool allows you to convert a gfs1 filesystem to gfs2 format.
There are some minor differences between the gfs1 and gfs2 on-disk
format that allows gfs2 to have better performance.  So we wrote a
tool to convert from one to the other.  This tool also requires
new library libgfs.a, which is in the gfs branch.

3. gfs2_fsck

GFS2 filesystem checker.  Enough said.  Still needs some work.

4. mkfs.gfs2

GFS2 mkfs program.  This will be incorporating udev's "libvolume_id.a"
library for determining if a filesystem exists on the device, and what
type.  In GFS1, we used to do this in a home-grown fashion.  Now we're
going to start using a standard library.  Unfortunately, libvolume_id.a
doesn't exist on many systems yet, but that is planned, and we're
all set to use it when it's there.  In the meantime, we've got it
stubbed in with some #ifdefs around.

5. gfs2_edit

This is an internal filesystem debugging and editing tool we use here.
It can be used to hex-edit the filesystem or print gfs2 data structures.
It's a very dangerous tool in the wrong hands, but it has its uses.
We've thrown out "gfs_debug" and incorporated the functionality into
gfs2_edit.  I'm planning to expand its capabilities in the future,
to aid in data recovery for badly damaged filesystems that can't be
mounted.  For example, I'm planning to add the capability to copy files
out of an unmounted fs using the tool.

We're still working on gfs2_jadd and gfs2_grow, and of course, the GFS2
kernel modules are being incorporated into the upstream kernels.

To get the whole cluster suite source code, with the gfs2 directory, 
use CVS.  Do something like this:

cvs -d :ext:sources.redhat.com:/cvs/cluster co -r HEAD cluster

On the web at:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs2/?cvsroot=cluster 

NOTE: This is for the user tools only.  The GFS2 kernel source is lying
in a public git tree on kernel.org, and should also be considered
experimental:

git clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git gfs2-2.6
(Be forewarned: This is very big and takes a long time).

WARNING:

These tools are still experimental and I'm sure there are still
problems, which we're still working on.  So don't trust valuable data
to it yet.  After all, we are still in development mode.  Some might
think I'm premature to release these tools before they're rock-solid,
but it's also valid that the open source community should get a look 
at them as soon as it became feasible in the spirit of release 
early/often.  Maybe you can ferret out mistakes, problems or issues 
I've overlooked.  Questions and comments are welcome.

Once again, Red Hat puts its money where its mouth is regarding open
source and the open source community.  Enjoy.

Regards,

Bob Peterson
Red Hat Cluster Suite


From rajkum2002 at rediffmail.com  Tue Jun  6 19:41:35 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 6 Jun 2006 19:41:35 -0000
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <20060606194135.30308.qmail@webmail55.rediffmail.com>

We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips?

Thanks,
Raj
 ?


On Sat, 03 Jun 2006 Wendy Cheng wrote :
>Try these:
>
>1) Note that 2.6 kernel defaults NFS export option to "sync". So unless
>you have a strong need, explicitly set NFS export to "async" in your
>export file (/etc/exports in Red Hat systems) and do *not* mount NFS
>shares with "sync" option.
>
>2) Upon large NFS append (i.e. "write" that will increase file size),
>using bigger block size (e.g. when using dd command) and/or bigger
>application buffer AND increasing NFS wsize and rsize (mount option) to
>its maximum, *if* you have to use "sync" in as either export or mount
>option.
>
>The problem here could be GFS sync due to its cluster filesystem nature
>and its file sync design. So try to avoid it if possible.
>
>Let us know how it goes.
>
>-- Wendy
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060606/030f9c71/attachment.htm>

From wcheng at redhat.com  Tue Jun  6 19:45:19 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 06 Jun 2006 15:45:19 -0400
Subject: [Linux-cluster] gfs export over nfs is very slow
In-Reply-To: <20060606194135.30308.qmail@webmail55.rediffmail.com>
References: <20060606194135.30308.qmail@webmail55.rediffmail.com>
Message-ID: <4485DB4F.9070502@redhat.com>

Raj Kumar wrote:

> We are using GFS6.0 and RHEL3 servers. I have been using "async" for 
> all my NFS exports. The speed that I reported is also using "async" 
> option. Any other tips?
>
Was the "speed" measured by bonnie++ and only by bonnie++ ?

-- Wendy


From rajkum2002 at rediffmail.com  Tue Jun  6 20:54:35 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 6 Jun 2006 20:54:35 -0000
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <20060606205435.15785.qmail@webmail46.rediffmail.com>

No. I am measuring read speed. I used command "time cat test* > /dev/null" ?to time reading 400 files. All our files are 33MB in size. Our processing applications reads many such files in real time and processes them. The app is falling behind because the read speed is less than 20MB/sec but it got to be able to read at least 2 files per second on average... so we need about 50-60Mb/sec transfer rate on our NFS clients. We could get about 45MB/sec with EXT3 but only about 20MB/sec with GFS.

Thanks,
Raj 


On Wed, 07 Jun 2006 Wendy Cheng wrote :
>Raj Kumar wrote:
>
>>We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips?
>>
>Was the "speed" measured by bonnie++ and only by bonnie++ ?
>
>-- Wendy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060606/57d5e898/attachment.htm>

From mwill at penguincomputing.com  Wed Jun  7 00:16:52 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Tue, 6 Jun 2006 17:16:52 -0700
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com>

Try xfs which requires centosplus or suse sles9

 -----Original Message-----
From: 	Raj  Kumar [mailto:rajkum2002 at rediffmail.com]
Sent:	Tue Jun 06 13:56:26 2006
To:	Wendy Cheng
Cc:	linux clustering
Subject:	Re: Re: [Linux-cluster] gfs export over nfs is very slow

No. I am measuring read speed. I used command "time cat test* > /dev/null"  to time reading 400 files. All our files are 33MB in size. Our processing applications reads many such files in real time and processes them. The app is falling behind because the read speed is less than 20MB/sec but it got to be able to read at least 2 files per second on average... so we need about 50-60Mb/sec transfer rate on our NFS clients. We could get about 45MB/sec with EXT3 but only about 20MB/sec with GFS.

Thanks,
Raj 


On Wed, 07 Jun 2006 Wendy Cheng wrote :
>Raj Kumar wrote:
>
>>We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips?
>>
>Was the "speed" measured by bonnie++ and only by bonnie++ ?
>
>-- Wendy


 <http://adworks.rediff.com/cgi-bin/AdWorks/sigclick.cgi/www.rediff.com/signature-home.htm/1507191490 at Middle5?PARTNER=3>  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060606/c7f57c0d/attachment.htm>

From wcheng at redhat.com  Wed Jun  7 03:48:19 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 06 Jun 2006 23:48:19 -0400
Subject: [Linux-cluster] gfs export over nfs is very slow
In-Reply-To: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com>
References: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com>
Message-ID: <1149652100.30034.23.camel@localhost.localdomain>

On Tue, 2006-06-06 at 17:16 -0700, Michael Will wrote:
> Try xfs which requires centosplus or suse sles9

XFS is not a cluster filesystem - unless you're going to pay for CXFS
which is not open source. And be aware that each filesystem has its own
strength and weakness. Adding different run time system configurations,
when unexpected problems occur, collaborate efforts to trouble-shoot
and/or improve the issues are the vital steps to make open source
projects work, IMHO.

As usual, above are my personal opinions - it is not necessarily my
management team's position.

-- Wendy


From john at turbocorp.com  Wed Jun  7 13:53:56 2006
From: john at turbocorp.com (John R. Allgood)
Date: Wed, 07 Jun 2006 09:53:56 -0400
Subject: [Linux-cluster] Redhat Cluster Suite and PostgreSQL
Message-ID: <4486DA74.5060605@turbocorp.com>

Hello

    I am new to this list and wanting to know if anyone is running 
Redhat Cluster Suite 3 and PostreSQL. We are converting from a Progress 
Database to a PostgreSQL Database. I have mulitple postmasters running 
various databases and each database is defined as a service under the 
cluster suite. I am running Redhat ES 3.0 Update 7 using Dual Opterons 
with 8GB RAM. I am using a failover solution so I have two servers 
primary/secondary and a shared data silo connected via fibre.
    I have various questions regarding clustering using the above 
mentioned cluster suite. One of my first questions is does the each 
service defined under the cluster suite need a seperate service ip. Also 
I have remote power switches installed on this server does this replace 
using software watchdog timers or is this used in conjunction with.

Thanks

John Allgood

-- 
I see the eigenvalue in thine eye,
I hear the tender tensor in thy sigh.
Bernoulli would have been content to die
Had he but known such _a-squared cos 2(phi)!
		-- Stanislaw Lem, "Cyberiad"


From rajkum2002 at rediffmail.com  Wed Jun  7 14:56:46 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 7 Jun 2006 14:56:46 -0000
Subject: [Linux-cluster] Error: Is lock_gulm running
Message-ID: <20060607145646.31606.qmail@webmail17.rediffmail.com>

Hello,

Sometimes a GFS node doesn't shutdown cleanly. A lot of error messages like "Is lock_gulm running. error_code=111" show up on the console. The node doesn't shutdown so I have to power reset it and after starting the node fsck does the filesystem check. It just takes a lot of time. Is it possible to have the node shutdown cleanly in such cases. 

We use manual method for fencing one of the nodes. There was a network outage between the master lock server and this node. The node status was set to expire and fence_manual didn't succeed so it couldn't join the cluster after restarting it. fence_ack_manual -s nodeip complained there is no /tmp/fifo.tmp file. I had to restart the cluster to get this node join the cluster. Is it possible to join the node without restarting the cluster when it happens again?

Thanks,
Raj
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060607/22fe5b24/attachment.htm>

From mwill at penguincomputing.com  Wed Jun  7 15:29:59 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Wed, 7 Jun 2006 08:29:59 -0700
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <433093DF7AD7444DA65EFAFE3987879C125DC1@jellyfish.highlyscyld.com>

The comment was geared towards the ext3+nfs performance question because it is likely that xfs+nfs can meet the required bandwidth.

Gfs is a good tool for increasing resiliency but not for increasing throughput (yet).

Michael

 -----Original Message-----
From: 	Wendy Cheng [mailto:wcheng at redhat.com]
Sent:	Tue Jun 06 20:37:56 2006
To:	linux-cluster at redhat.com
Subject:	Re: Re: [Linux-cluster] gfs export over nfs is very slow

On Tue, 2006-06-06 at 17:16 -0700, Michael Will wrote:
> Try xfs which requires centosplus or suse sles9

XFS is not a cluster filesystem - unless you're going to pay for CXFS
which is not open source. And be aware that each filesystem has its own
strength and weakness. Adding different run time system configurations,
when unexpected problems occur, collaborate efforts to trouble-shoot
and/or improve the issues are the vital steps to make open source
projects work, IMHO.

As usual, above are my personal opinions - it is not necessarily my
management team's position.

-- Wendy

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060607/ac35af13/attachment.htm>

From teigland at redhat.com  Wed Jun  7 17:27:40 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 7 Jun 2006 12:27:40 -0500
Subject: [Linux-cluster] [gfs_controld] send messages through separate cpg
Message-ID: <20060607172740.GA18684@redhat.com>

[new process requires all work to be sent to ml prior to cvs check-in]

Set up a separate cpg for sending messages (e.g. for processing
mount/unmount) instead of sending them through the cpg used to represent
the mount group.  Since we apply cpg changes to the mount group async,
that cpg won't always contain all the nodes we need to process the
mount/unmount.  A mount from one node in parallel with unmount from
another often won't work without this.


diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/Makefile cluster/gfs/lock_dlm/daemon/Makefile
--- cluster-HEAD/gfs/lock_dlm/daemon/Makefile	2006-03-27 01:31:46.000000000 -0600
+++ cluster/gfs/lock_dlm/daemon/Makefile	2006-06-06 17:19:40.740421037 -0500
@@ -21,6 +21,7 @@
 	-I../../include/ \
 	-I../../../group/lib/ \
 	-I../../../cman/lib/ \
+	-I../../../cman/daemon/openais/trunk/include/ \
 	-I../../../dlm/lib/ \
 	-I../../../gfs-kernel/src/dlm/
 
@@ -33,12 +34,14 @@
 
 gfs_controld: 	main.o \
 		member_cman.o \
+		cpg.o \
 		group.o \
 		plock.o \
 		recover.o \
 		withdraw.o \
 		../../../dlm/lib/libdlm_lt.a \
 		../../../cman/lib/libcman.a \
+		../../../cman/daemon/openais/trunk/lib/libcpg.a \
 		../../../group/lib/libgroup.a
 	$(CC) $(LDFLAGS) -o $@ $^
 
@@ -49,6 +52,9 @@
 member_cman.o: member_cman.c
 	$(CC) $(CFLAGS) -c -o $@ $<
 
+cpg.o: cpg.c
+	$(CC) $(CFLAGS) -c -o $@ $<
+
 recover.o: recover.c
 	$(CC) $(CFLAGS) -c -o $@ $<
 
diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/cpg.c cluster/gfs/lock_dlm/daemon/cpg.c
--- cluster-HEAD/gfs/lock_dlm/daemon/cpg.c	1969-12-31 18:00:00.000000000 -0600
+++ cluster/gfs/lock_dlm/daemon/cpg.c	2006-06-07 11:54:28.478585576 -0500
@@ -0,0 +1,212 @@
+/******************************************************************************
+*******************************************************************************
+**
+**  Copyright (C) 2006 Red Hat, Inc.  All rights reserved.
+**
+**  This copyrighted material is made available to anyone wishing to use,
+**  modify, copy, or redistribute it subject to the terms and conditions
+**  of the GNU General Public License v.2.
+**
+*******************************************************************************
+******************************************************************************/
+
+#include "lock_dlm.h"
+#include "cpg.h"
+
+static cpg_handle_t	daemon_handle;
+static struct cpg_name	daemon_name;
+static int		got_msg;
+static int		saved_nodeid;
+static int		saved_len;
+static char		saved_data[MAX_MSGLEN];
+
+void receive_journals(struct mountgroup *mg, char *buf, int len, int from);
+void receive_options(struct mountgroup *mg, char *buf, int len, int from);
+void receive_remount(struct mountgroup *mg, char *buf, int len, int from);
+void receive_plock(struct mountgroup *mg, char *buf, int len, int from);
+void receive_recovery_status(struct mountgroup *mg, char *buf, int len,
+			     int from);
+void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from);
+
+
+static void do_deliver(int nodeid, char *data, int len)
+{
+	struct mountgroup *mg;
+	struct gdlm_header *hd;
+
+	hd = (struct gdlm_header *) data;
+
+	mg = find_mg(hd->name);
+	if (!mg)
+		return;
+
+	hd->version[0]	= le16_to_cpu(hd->version[0]);
+	hd->version[1]	= le16_to_cpu(hd->version[1]);
+	hd->version[2]	= le16_to_cpu(hd->version[2]);
+	hd->type	= le16_to_cpu(hd->type);
+	hd->nodeid	= le32_to_cpu(hd->nodeid);
+	hd->to_nodeid	= le32_to_cpu(hd->to_nodeid);
+
+	if (hd->version[0] != GDLM_VER_MAJOR) {
+		log_error("reject message version %u.%u.%u",
+			  hd->version[0], hd->version[1], hd->version[2]);
+		return;
+	}
+
+	/* If there are some group messages between a new node being added to
+	   the cpg group and being added to the app group, the new node should
+	   discard them since they're only relevant to the app group. */
+
+	if (!mg->last_callback) {
+		log_group(mg, "discard message type %d len %d from %d",
+			  hd->type, len, nodeid);
+		return;
+	}
+
+	switch (hd->type) {
+	case MSG_JOURNAL: 
+		receive_journals(mg, data, len, nodeid);
+		break;
+
+	case MSG_OPTIONS:
+		receive_options(mg, data, len, nodeid);
+		break;
+
+	case MSG_REMOUNT:
+		receive_remount(mg, data, len, nodeid);
+		break;
+
+	case MSG_PLOCK:
+		receive_plock(mg, data, len, nodeid);
+		break;
+
+	case MSG_RECOVERY_STATUS:
+		receive_recovery_status(mg, data, len, nodeid);
+		break;
+
+	case MSG_RECOVERY_DONE:
+		receive_recovery_done(mg, data, len, nodeid);
+		break;
+
+	default:
+		log_error("unknown message type %d from %d",
+			  hd->type, hd->nodeid);
+	}
+}
+
+void deliver_cb(cpg_handle_t handle, struct cpg_name *group_name,
+		uint32_t nodeid, uint32_t pid, void *data, int data_len)
+{
+	saved_nodeid = nodeid;
+	saved_len = data_len;
+	memcpy(saved_data, data, data_len);
+	got_msg = 1;
+}
+
+void confchg_cb(cpg_handle_t handle, struct cpg_name *group_name,
+		struct cpg_address *member_list, int member_list_entries,
+		struct cpg_address *left_list, int left_list_entries,
+		struct cpg_address *joined_list, int joined_list_entries)
+{
+}
+
+static cpg_callbacks_t callbacks = {
+	.cpg_deliver_fn = deliver_cb,
+	.cpg_confchg_fn = confchg_cb,
+};
+
+int process_cpg(void)
+{
+	cpg_error_t error;
+	
+	got_msg = 0;
+	saved_len = 0;
+	saved_nodeid = 0;
+	memset(saved_data, 0, sizeof(saved_data));
+
+	error = cpg_dispatch(daemon_handle, CPG_DISPATCH_ONE);
+	if (error != CPG_OK) {
+		log_error("cpg_dispatch error %d", error);
+		return -1;
+	}
+
+	if (got_msg)
+		do_deliver(saved_nodeid, saved_data, saved_len);
+	return 0;
+}
+
+int setup_cpg(void)
+{
+	cpg_error_t error;
+	int fd = 0;
+
+	error = cpg_initialize(&daemon_handle, &callbacks);
+	if (error != CPG_OK) {
+		log_error("cpg_initialize error %d", error);
+		return -1;
+	}
+
+	cpg_fd_get(daemon_handle, &fd);
+	if (fd < 0)
+		return -1;
+
+	memset(&daemon_name, 0, sizeof(daemon_name));
+	strcpy(daemon_name.value, "gfs_controld");
+	daemon_name.length = 12;
+
+ retry:
+	error = cpg_join(daemon_handle, &daemon_name);
+	if (error == CPG_ERR_TRY_AGAIN) {
+		log_debug("setup_cpg cpg_join retry");
+		sleep(1);
+		goto retry;
+	}
+	if (error != CPG_OK) {
+		log_error("cpg_join error %d", error);
+		cpg_finalize(daemon_handle);
+		return -1;
+	}
+
+	log_debug("cpg %d", fd);
+	return fd;
+}
+
+static int _send_message(cpg_handle_t h, void *buf, int len)
+{
+	struct iovec iov;
+	cpg_error_t error;
+	int retries = 0;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+
+ retry:
+	error = cpg_mcast_joined(h, CPG_TYPE_AGREED, &iov, 1);
+	if (error != CPG_OK)
+		log_error("cpg_mcast_joined error %d handle %llx", error, h);
+	if (error == CPG_ERR_TRY_AGAIN) {
+		/* FIXME: backoff say .25 sec, .5 sec, .75 sec, 1 sec */
+		retries++;
+		if (retries > 3)
+			sleep(1);
+		goto retry;
+	}
+
+	return 0;
+}
+
+int send_group_message(struct mountgroup *mg, int len, char *buf)
+{
+	struct gdlm_header *hd = (struct gdlm_header *) buf;
+
+	hd->version[0]	= cpu_to_le16(GDLM_VER_MAJOR);
+	hd->version[1]	= cpu_to_le16(GDLM_VER_MINOR);
+	hd->version[2]	= cpu_to_le16(GDLM_VER_PATCH);
+	hd->type	= cpu_to_le16(hd->type);
+	hd->nodeid	= cpu_to_le32(hd->nodeid);
+	hd->to_nodeid	= cpu_to_le32(hd->to_nodeid);
+	memcpy(hd->name, mg->name, strlen(mg->name));
+	
+	return _send_message(daemon_handle, buf, len);
+}
+
diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/group.c cluster/gfs/lock_dlm/daemon/group.c
--- cluster-HEAD/gfs/lock_dlm/daemon/group.c	2006-06-07 12:10:32.102338261 -0500
+++ cluster/gfs/lock_dlm/daemon/group.c	2006-06-06 17:23:06.523976113 -0500
@@ -21,25 +21,14 @@
 static int cb_event_nr;
 static unsigned int cb_id;
 static int cb_type;
-static int cb_nodeid;
-static int cb_len;
 static int cb_member_count;
 static int cb_members[MAX_GROUP_MEMBERS];
-static char cb_message[MAX_MSGLEN+1];
 
 int do_stop(struct mountgroup *mg);
 int do_finish(struct mountgroup *mg);
 int do_terminate(struct mountgroup *mg);
 int do_start(struct mountgroup *mg, int type, int count, int *nodeids);
 
-void receive_journals(struct mountgroup *mg, char *buf, int len, int from);
-void receive_options(struct mountgroup *mg, char *buf, int len, int from);
-void receive_remount(struct mountgroup *mg, char *buf, int len, int from);
-void receive_plock(struct mountgroup *mg, char *buf, int len, int from);
-void receive_recovery_status(struct mountgroup *mg, char *buf, int len,
-			     int from);
-void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from);
-
 
 static void stop_cbfn(group_handle_t h, void *private, char *name)
 {
@@ -87,17 +76,9 @@
 static void deliver_cbfn(group_handle_t h, void *private, char *name,
 			 int nodeid, int len, char *buf)
 {
-	int n;
-	cb_action = DO_DELIVER;
-	strncpy(cb_name, name, MAX_GROUP_NAME_LEN);
-	cb_nodeid = nodeid;
-	cb_len = n = len;
-	if (len > MAX_MSGLEN)
-		n = MAX_MSGLEN;
-	memcpy(&cb_message, buf, n);
 }
 
-group_callbacks_t callbacks = {
+static group_callbacks_t callbacks = {
 	stop_cbfn,
 	start_cbfn,
 	finish_cbfn,
@@ -106,53 +87,6 @@
 	deliver_cbfn
 };
 
-static void do_deliver(struct mountgroup *mg)
-{
-	struct gdlm_header *hd;
-
-	hd = (struct gdlm_header *) cb_message;
-
-	/* If there are some group messages between a new node being added to
-	   the cpg group and being added to the app group, the new node should
-	   discard them since they're only relevant to the app group. */
-
-	if (!mg->last_callback) {
-		log_group(mg, "discard message type %d len %d from %d",
-			  hd->type, cb_len, cb_nodeid);
-		return;
-	}
-
-	switch (hd->type) {
-	case MSG_JOURNAL:
-		receive_journals(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	case MSG_OPTIONS:
-		receive_options(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	case MSG_REMOUNT:
-		receive_remount(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	case MSG_PLOCK:
-		receive_plock(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	case MSG_RECOVERY_STATUS:
-		receive_recovery_status(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	case MSG_RECOVERY_DONE:
-		receive_recovery_done(mg, cb_message, cb_len, cb_nodeid);
-		break;
-
-	default:
-		log_error("unknown message type %d from %d",
-			  hd->type, hd->nodeid);
-	}
-}
-
 char *str_members(void)
 {
 	static char buf[MAXLINE];
@@ -222,12 +156,6 @@
 		mg->id = cb_id;
 		break;
 
-	case DO_DELIVER:
-		log_debug("groupd callback: deliver %s len %d nodeid %d",
-			  cb_name, cb_len, cb_nodeid);
-		do_deliver(mg);
-		break;
-
 	default:
 		error = -EINVAL;
 	}
@@ -257,15 +185,3 @@
 	return rv;
 }
 
-int send_group_message(struct mountgroup *mg, int len, char *buf)
-{
-	int error;
-
-	error = group_send(gh, mg->name, len, buf);
-	if (error < 0)
-		log_error("group_send error %d errno %d", error, errno);
-	else
-		error = 0;
-	return error;
-}
-
diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h cluster/gfs/lock_dlm/daemon/lock_dlm.h
--- cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h	2006-05-25 14:30:40.000000000 -0500
+++ cluster/gfs/lock_dlm/daemon/lock_dlm.h	2006-06-06 17:18:25.510916543 -0500
@@ -201,11 +201,16 @@
 	MSG_RECOVERY_DONE,
 };
 
+#define GDLM_VER_MAJOR 1
+#define GDLM_VER_MINOR 0
+#define GDLM_VER_PATCH 0
+
 struct gdlm_header {
 	uint16_t		version[3];
 	uint16_t		type;			/* MSG_ */
 	uint32_t		nodeid;			/* sender */
 	uint32_t		to_nodeid;		/* 0 if to all */
+	char			name[MAXNAME];
 };
 
 
@@ -214,6 +219,8 @@
 
 int setup_cman(void);
 int process_cman(void);
+int setup_cpg(void);
+int process_cpg(void);
 int setup_groupd(void);
 int process_groupd(void);
 int setup_libdlm(void);
diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/main.c cluster/gfs/lock_dlm/daemon/main.c
--- cluster-HEAD/gfs/lock_dlm/daemon/main.c	2006-04-21 14:54:10.000000000 -0500
+++ cluster/gfs/lock_dlm/daemon/main.c	2006-06-07 11:59:12.248223925 -0500
@@ -25,6 +25,7 @@
 static struct pollfd pollfd[MAX_CLIENTS];
 
 static int cman_fd;
+static int cpg_fd;
 static int listen_fd;
 static int groupd_fd;
 static int uevent_fd;
@@ -249,6 +250,11 @@
 		goto out;
 	client_add(cman_fd, &maxi);
 
+	rv = cpg_fd = setup_cpg();
+	if (rv < 0)
+		goto out;
+	client_add(cpg_fd, &maxi);
+
 	rv = groupd_fd = setup_groupd();
 	if (rv < 0)
 		goto out;
@@ -272,6 +278,8 @@
 		goto out;
 	client_add(plocks_fd, &maxi);
 
+	log_debug("setup done");
+
 	for (;;) {
 		rv = poll(pollfd, maxi + 1, -1);
 		if (rv < 0)
@@ -296,6 +304,8 @@
 					process_groupd();
 				else if (pollfd[i].fd == cman_fd)
 					process_cman();
+				else if (pollfd[i].fd == cpg_fd)
+					process_cpg();
 				else if (pollfd[i].fd == uevent_fd)
 					process_uevent();
 				else if (!no_withdraw &&
@@ -310,7 +320,6 @@
 			if (pollfd[i].revents & POLLHUP) {
 				if (pollfd[i].fd == cman_fd)
 					exit_cman();
-				log_debug("closing fd %d", pollfd[i].fd);
 				close(pollfd[i].fd);
 			}
 		}


From teigland at redhat.com  Thu Jun  8 18:49:42 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 8 Jun 2006 13:49:42 -0500
Subject: [Linux-cluster] post_fail_delay
In-Reply-To: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net>
References: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net>
Message-ID: <20060608184942.GA6203@redhat.com>

On Tue, Jun 06, 2006 at 03:23:01PM +0200, riaan at obsidian.co.za wrote:

> I would like for an errant GFS node to be able to create network/disk
> dumps before being power fenced. Am I missing something, or is this
> leaving the errand node unfenced for any significant amount of time
> (enough to complete the dump, assuming it is upwards of a few seconds)
> just a bad idea?

No, adding a delay before fencing is just fine, it just prolongs the time
until other stuff can be recovered and used normally again.

> AFAIUnderstand, the whole idea of fencing is to prevent the node from
> damaging the file system in the first place, making the collection of
> dumps and power fencing fundamentally at odds with each other.

The only way the failed node is going to damage anything is if it happens
to write to the fs after its journal has been recovered.  That's why the
only requirement for fencing is that it happens prior to gfs journal
recovery.  If a failed node writes to the fs before journal recovery it's
no problem.

If you want a failed node to disk/net-dump, then set post_fail_delay to
some number of seconds just greater than the typical time a dump takes.

Dave


From tmelhiser at hotmail.com  Thu Jun  8 19:31:30 2006
From: tmelhiser at hotmail.com (Travis Melhiser)
Date: Thu, 08 Jun 2006 15:31:30 -0400
Subject: [Linux-cluster] Oracle 10GR2 on GFS
Message-ID: <BAY118-F11AA660B505BBB6DC66893BA8B0@phx.gbl>

Is there any way to get 10GR2 to go past the ocrconfig script error: OCRFile 
is on FS type 18225520. Not supported.

-Travis

_________________________________________________________________
Don?t just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


From vcmarti at sph.emory.edu  Thu Jun  8 20:17:09 2006
From: vcmarti at sph.emory.edu (Vernard C. Martin)
Date: Thu, 08 Jun 2006 16:17:09 -0400
Subject: [Linux-cluster] Error starting up CLVMD
Message-ID: <448885C5.4050505@sph.emory.edu>

I recently upgraded from an old version of GFS from May of last year to 
the latest stable version in the CVS tree. I did this because it wcould 
compile against the latest kernel in RHEL4U3.  Its a two node cluster 
and had been exhibiing some crashes under heavy I/O load so I thought 
that the upgrade might help stabilize it.

The first node came up fine but the 2nd node is giving me a strange 
error when trying to start up "clvmd". The error is:[root at node001 ~]# clvmd
clvmd could not connect to cluster manager
Consult syslog for more information
[root at node001 ~]#

the syslog has:
Jun  8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No 
such file or directory

so exactly which file is it talking about so that I can make sure that 
its there.

Any help would be appreciated.


From rpeterso at redhat.com  Thu Jun  8 21:16:59 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Thu, 08 Jun 2006 16:16:59 -0500
Subject: [Linux-cluster] Updates to libgfs2
Message-ID: <1149801419.12291.9.camel@technetium.msp.redhat.com>

Hi,

I just wanted to let you know: I made some bug fixes to libgfs2 for
problems with fsck.  The following is a patch with the code changes:
Also, there were some parts that got missed from the original commit
that are there now.

Regards,

Bob Peterson
Red Hat Cluster Suite

Index: buf.c
===================================================================
RCS file: /cvs/cluster/cluster/gfs2/libgfs2/buf.c,v
retrieving revision 1.2
diff -w -u -p -u -p -r1.2 buf.c
--- buf.c	6 Jun 2006 14:20:41 -0000	1.2
+++ buf.c	8 Jun 2006 20:58:48 -0000
@@ -188,15 +188,17 @@ void bsync(struct gfs2_sbd *sdp)
 /* commit buffers to disk but do not discard */
 void bcommit(struct gfs2_sbd *sdp)
 {
-	osi_list_t *tmp;
+	osi_list_t *tmp, *x;
 	struct gfs2_buffer_head *bh;
 
-	osi_list_foreach(tmp, &sdp->buf_list) {
+	osi_list_foreach_safe(tmp, &sdp->buf_list, x) {
 		bh = osi_list_entry(tmp, struct gfs2_buffer_head, b_list);
-		if (bh->b_changed) {
+		if (!bh->b_count)             /* if not reserved for later */
+			write_buffer(sdp, bh);    /* write the data, free the memory */
+		else if (bh->b_changed) {     /* if buffer has changed */
 			do_lseek(sdp, bh->b_blocknr * sdp->bsize);
-			do_write(sdp, bh->b_data, sdp->bsize);
-			bh->b_changed = FALSE;
+			do_write(sdp, bh->b_data, sdp->bsize); /* write it out */
+			bh->b_changed = FALSE;    /* no longer changed */
 		}
 	}
 }
Index: fs_ops.c
===================================================================
RCS file: /cvs/cluster/cluster/gfs2/libgfs2/fs_ops.c,v
retrieving revision 1.2
diff -w -u -p -u -p -r1.2 fs_ops.c
--- fs_ops.c	6 Jun 2006 14:20:41 -0000	1.2
+++ fs_ops.c	8 Jun 2006 20:58:49 -0000
@@ -502,14 +502,12 @@ int gfs2_readi(struct gfs2_inode *ip, vo
 	return copied;
 }
 
-static void
-copy_from_mem(struct gfs2_buffer_head *bh, void **buf,
+static void copy_from_mem(struct gfs2_buffer_head *bh, void **buf,
 	      unsigned int offset, unsigned int size)
 {
 	char **p = (char **)buf;
 
 	memcpy(bh->b_data + offset, *p, size);
-
 	*p += size;
 }
 
@@ -526,7 +524,6 @@ int gfs2_writei(struct gfs2_inode *ip, v
 	int isdir = !!(S_ISDIR(ip->i_di.di_flags));
 	const uint64_t start = offset;
 	int copied = 0;
-	enum update_flags f;
 
 	if (!size)
 		return 0;
@@ -558,7 +555,6 @@ int gfs2_writei(struct gfs2_inode *ip, v
 			block_map(ip, lblock, &new, &dblock, &extlen);
 		}
 
-		f = not_updated;
 		if (new) {
 			bh = bget(sdp, dblock);
 			if (isdir) {
@@ -567,12 +563,11 @@ int gfs2_writei(struct gfs2_inode *ip, v
 				mh.mh_type = GFS2_METATYPE_JD;
 				mh.mh_format = GFS2_FORMAT_JD;
 				gfs2_meta_header_out(&mh, bh->b_data);
-				f = updated;
 			}
 		} else
 			bh = bread(sdp, dblock);
 		copy_from_mem(bh, &buf, o, amount);
-		brelse(bh, f);
+		brelse(bh, updated);
 
 		copied += amount;
 		lblock++;
@@ -1084,8 +1079,7 @@ dir_make_exhash(struct gfs2_inode *dip)
 	dip->i_di.di_depth = y;
 }
 
-static void
-dir_l_add(struct gfs2_inode *dip, char *filename, int len,
+static void dir_l_add(struct gfs2_inode *dip, char *filename, int len,
 		  struct gfs2_inum *inum, unsigned int type)
 {
 	struct gfs2_dirent *dent;
@@ -1564,11 +1558,10 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
 	struct gfs2_inode *ip;
 	struct gfs2_buffer_head *bh;
 	int x;
-	uint64_t p, freed_blocks;
+	uint64_t p;
 	unsigned char *buf;
 	struct rgrp_list *rgd;
 	
-	freed_blocks = 0;
 	bh = bread(sdp, block);
 	ip = inode_get(sdp, bh);
 	if (ip->i_di.di_height > 0) {
@@ -1578,14 +1571,19 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
 			 x += sizeof(uint64_t)) {
 			p = be64_to_cpu(*(uint64_t *)(buf + x));
 			if (p) {
-				freed_blocks++;
 				gfs2_set_bitmap(sdp, p, GFS2_BLKST_FREE);
+				/* We need to adjust the free space count for the freed */
+                /* indirect block. */
+				rgd = gfs2_blk2rgrpd(sdp, p); /* find the rg for indir block */
+				bh = bget(sdp, rgd->ri.ri_addr); /* get the buffer its rg */
+				rgd->rg.rg_free++; /* adjust the free count */
+				gfs2_rgrp_out(&rgd->rg, bh->b_data); /* back to the buffer */
+				brelse(bh, updated); /* release the buffer */
 			}
 		}
 	}
 	/* Set the bitmap type for inode to free space: */
 	gfs2_set_bitmap(sdp, ip->i_di.di_num.no_addr, GFS2_BLKST_FREE);
-	freed_blocks++; /* one for the inode itself */
 	inode_put(ip, updated);
 	/* Now we have to adjust the rg freespace count and inode count: */
 	rgd = gfs2_blk2rgrpd(sdp, block);
@@ -1593,7 +1591,7 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
 	/* buffer in memory for the rg on disk because we used it to fix the */
 	/* bitmaps, some of which are on the same block on disk.             */
 	bh = bread(sdp, rgd->ri.ri_addr); /* get the buffer */
-	rgd->rg.rg_free += freed_blocks;
+	rgd->rg.rg_free++;
 	rgd->rg.rg_dinodes--; /* one less inode in use */
 	gfs2_rgrp_out(&rgd->rg, bh->b_data);
 	brelse(bh, updated); /* release the buffer */


From rpeterso at redhat.com  Thu Jun  8 21:26:41 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Thu, 08 Jun 2006 16:26:41 -0500
Subject: [Linux-cluster] Patch to gfs2_convert
Message-ID: <1149802001.12291.15.camel@technetium.msp.redhat.com>

Hi,

This patch to gfs2_convert makes it much more forgiving when fs conversions
are interrupted in the middle due to power loss, interrupts, or other
reasons.  Now, if a filesystem conversion is interrupted mid-way through,
the tool should be able to pick up where it left off without damage.

As always, send questions, comments and concerns to me.  If I don't hear
from anybody, I'll commit it to cvs in a few days.

Regards,

Bob Peterson
Red Hat Cluster Suite

Index: gfs2_convert.c
===================================================================
RCS file: /cvs/cluster/cluster/gfs2/convert/gfs2_convert.c,v
retrieving revision 1.2
diff -w -u -p -u -p -r1.2 gfs2_convert.c
--- gfs2_convert.c	6 Jun 2006 14:37:47 -0000	1.2
+++ gfs2_convert.c	8 Jun 2006 21:13:37 -0000
@@ -77,12 +77,14 @@ void convert_bitmaps(struct gfs2_sbd *sd
 	int x, y;
 	struct gfs2_rindex *ri;
 	unsigned char state;
+	struct gfs2_buffer_head *bh;
 
 	ri = &rgd2->ri;
 	gfs2_compute_bitstructs(sdp, rgd2); /* mallocs bh as array */
 	for (blk = 0; blk < ri->ri_length; blk++) {
-		rgd2->bh[blk] = bget_generic(sdp, ri->ri_addr + blk, read_disk,
-									 read_disk);
+		bh = bget_generic(sdp, ri->ri_addr + blk, read_disk, read_disk);
+		if (!rgd2->bh[blk])
+			rgd2->bh[blk] = bh;
 		x = (blk) ? sizeof(struct gfs2_meta_header) : sizeof(struct gfs2_rgrp);
 
 		for (; x < sdp->bsize; x++)
@@ -92,7 +94,6 @@ void convert_bitmaps(struct gfs2_sbd *sd
 				if (state == 0x02) /* unallocated metadata state invalid */
 					rgd2->bh[blk]->b_data[x] &= ~(0x02 << (GFS2_BIT_SIZE * y));
 			}
-		brelse(rgd2->bh[blk], updated);
 	}
 }/* convert_bitmaps */
 
@@ -134,10 +135,8 @@ static int superblock_cvt(int disk_fd, c
 	/* convert the ondisk sb structure   */
 	/* --------------------------------- */
 	sb2->sd_sb.sb_header.mh_magic = GFS2_MAGIC;
-	sb2->sd_sb.sb_fs_format = GFS2_FORMAT_FS;
 	sb2->sd_sb.sb_header.mh_type = GFS2_METATYPE_SB;
 	sb2->sd_sb.sb_header.mh_format = GFS2_FORMAT_SB;
-	sb2->sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI;
 	sb2->sd_sb.sb_bsize = sb1->sd_sb.sb_bsize;
 	sb2->sd_sb.sb_bsize_shift = sb1->sd_sb.sb_bsize_shift;
 	strcpy(sb2->sd_sb.sb_lockproto, sb1->sd_sb.sb_lockproto);
@@ -174,14 +173,14 @@ static int superblock_cvt(int disk_fd, c
 		rgd2->ri.ri_data0 = rgd->rd_ri.ri_data1;
 		rgd2->ri.ri_data = rgd->rd_ri.ri_data;
 		rgd2->ri.ri_bitbytes = rgd->rd_ri.ri_bitbytes;
-		/* commit the changes to a gfs2 buffer */
-		bh = bread(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */
-		gfs2_rgrp_out(&rgd2->rg, bh->b_data);
-		brelse(bh, updated); /* release the buffer */
 		/* Add the new gfs2 rg to our list: We'll output the index later. */
 		osi_list_add_prev((osi_list_t *)&rgd2->list,
 						  (osi_list_t *)&sb2->rglist);
 		convert_bitmaps(sb2, rgd2, TRUE);
+		/* Write the updated rgrp to the gfs2 buffer */
+		bh = bget(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */
+		gfs2_rgrp_out(&rgd2->rg, rgd2->bh[0]->b_data);
+		brelse(bh, updated); /* release the buffer */
 	}
 	return 0;
 }/* superblock_cvt */
@@ -195,8 +194,12 @@ int adjust_inode(struct gfs2_sbd *sbp, s
 {
 	struct gfs2_inode *inode;
 	struct inode_block *fixdir;
+	int inode_was_gfs1;
 
 	inode = inode_get(sbp, bh);
+
+	inode_was_gfs1 = (inode->i_di.di_num.no_formal_ino ==
+					  inode->i_di.di_num.no_addr);
 	/* Fix the inode number: */
 	inode->i_di.di_num.no_formal_ino = sbp->md.next_inum;           ;
 	
@@ -240,11 +243,23 @@ int adjust_inode(struct gfs2_sbd *sbp, s
 	/* di_goal_meta has shifted locations and di_goal_data has     */
 	/* changed from 32-bits to 64-bits.  The following code        */
 	/* adjusts for the shift.                                      */
+	/*                                                             */
+	/* Note: It may sound absurd, but we need to check if this     */
+	/*       inode has already been converted to gfs2 or if it's   */
+	/*       still a gfs1 inode.  That's just in case there was a  */
+	/*       prior attempt to run gfs2_convert that never finished */
+	/*       (due to power out, ctrl-c, kill, segfault, whatever.) */
+	/*       If it is unconverted gfs1 we want to do a full        */
+	/*       conversion.  If it's a gfs2 inode from a prior run,   */
+	/*       we still need to renumber the inode, but here we      */
+	/*       don't want to shift the data around.                  */
 	/* ----------------------------------------------------------- */
+	if (inode_was_gfs1) {
 	inode->i_di.di_goal_meta = inode->i_di.di_goal_data;
 	inode->i_di.di_goal_data = 0; /* make sure the upper 32b are 0 */
 	inode->i_di.di_goal_data = inode->i_di.__pad[0];
 	inode->i_di.__pad[1] = 0;
+	}
 	
 	gfs2_dinode_out(&inode->i_di, bh->b_data);
 	sbp->md.next_inum++; /* update inode count */
@@ -344,7 +359,7 @@ int inode_renumber(struct gfs2_sbd *sbp,
 /* ------------------------------------------------------------------------- */
 /* fetch_inum - fetch an inum entry from disk, given its block               */
 /* ------------------------------------------------------------------------- */
-int fetch_and_fix_inum(struct gfs2_sbd *sbp, uint64_t iblock,
+int fetch_inum(struct gfs2_sbd *sbp, uint64_t iblock,
 					   struct gfs2_inum *inum)
 {
 	struct gfs2_buffer_head *bh_fix;
@@ -356,7 +371,7 @@ int fetch_and_fix_inum(struct gfs2_sbd *
 	inum->no_addr = fix_inode->i_di.di_num.no_addr;
 	brelse(bh_fix, updated);
 	return 0;
-}/* fetch_and_fix_inum */
+}/* fetch_inum */
 
 /* ------------------------------------------------------------------------- */
 /* process_dirent_info - fix one dirent (directory entry) buffer             */
@@ -382,6 +397,7 @@ int process_dirent_info(struct gfs2_inod
 	/* Turns out you can't trust dir_entries is correct.     */
 	for (de = 0; ; de++) {
 		struct gfs2_inum inum;
+		int dent_was_gfs1;
 		
 		gettimeofday(&tv, NULL);
 		/* Do more warm fuzzy stuff for the customer. */
@@ -394,18 +410,24 @@ int process_dirent_info(struct gfs2_inod
 		}
 		/* fix the dirent's inode number based on the inode */
 		gfs2_inum_in(&inum, (char *)&dent->de_inum);
+		dent_was_gfs1 = (dent->de_inum.no_addr == dent->de_inum.no_formal_ino);
 		if (inum.no_formal_ino) { /* if not a sentinel (placeholder) */
-			error = fetch_and_fix_inum(sbp, inum.no_addr, &inum);
+			error = fetch_inum(sbp, inum.no_addr, &inum);
 			if (error) {
 				printf("Error retrieving inode %" PRIx64 "\n", inum.no_addr);
 				break;
 			}
+			/* fix the dirent's inode number from the fetched inum. */
+			dent->de_inum.no_formal_ino = cpu_to_be64(inum.no_formal_ino);
 		}
 		/* Fix the dirent's filename hash: They are the same as gfs1 */
 		/* dent->de_hash = cpu_to_be32(gfs2_disk_hash((char *)(dent + 1), */
 		/*                             be16_to_cpu(dent->de_name_len))); */
 		/* Fix the dirent's file type.  Gfs1 used home-grown values.  */
 		/* Gfs2 uses standard values from include/linux/fs.h          */
+		/* Only do this if the dent was a true gfs1 dent, and not a   */
+		/* gfs2 dent converted from a previously aborted run.         */
+		if (dent_was_gfs1) {
 		switch be16_to_cpu(dent->de_type) {
 		case GFS_FILE_NON:
 			dent->de_type = cpu_to_be16(DT_UNKNOWN);
@@ -432,7 +454,7 @@ int process_dirent_info(struct gfs2_inod
 			dent->de_type = cpu_to_be16(DT_SOCK);
 			break;
 		}
-
+		}
 		error = gfs2_dirent_next(dip, bh, &dent);
 		if (error)
 			break;
@@ -948,26 +970,33 @@ int main(int argc, char **argv)
 		inode_put(sb2.md.inum, updated);
 		inode_put(sb2.md.statfs, updated);
 
-		bh = bread(&sb2, sb2.sb_addr);
-		gfs2_sb_out(&sb2.sd_sb, bh->b_data);
-		brelse(bh, updated);
 		bcommit(&sb2); /* write the buffers to disk */
 
 		/* Now delete the now-obsolete gfs1 files: */
 		printf("Removing obsolete gfs1 structures.\n");
 		fflush(stdout);
-		/* Delete the Journal index: */
+		/* Delete the old gfs1 Journal index: */
 		gfs2_freedi(&sb2, sb.sd_sb.sb_jindex_di.no_addr);
-		/* Delete the rgindex: */
+		/* Delete the old gfs1 rgindex: */
 		gfs2_freedi(&sb2, sb.sd_sb.sb_rindex_di.no_addr);
-		/* Delete the Quota file: */
+		/* Delete the old gfs1 Quota file: */
 		gfs2_freedi(&sb2, sb.sd_sb.sb_quota_di.no_addr);
-		/* Delete the License file: */
+		/* Delete the old gfs1 License file: */
 		gfs2_freedi(&sb2, sb.sd_sb.sb_license_di.no_addr);
-		/* Now free all the rgrps */
+		/* Now free all the in memory */
 		gfs2_rgrp_free(&sb2, updated);
 		printf("Committing changes to disk.\n");
 		fflush(stdout);
+		/* Set filesystem type in superblock to gfs2.  We do this at the */
+		/* end because if the tool is interrupted in the middle, we want */
+		/* it to not reject the partially converted fs as already done   */
+		/* when it's run a second time.                                  */
+		bh = bread(&sb2, sb2.sb_addr);
+		sb2.sd_sb.sb_fs_format = GFS2_FORMAT_FS;
+		sb2.sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI;
+		gfs2_sb_out(&sb2.sd_sb, bh->b_data);
+		brelse(bh, updated);
+
 		bsync(&sb2); /* write the buffers to disk */
 		error = fsync(disk_fd);
 		if (error)


From rpeterso at redhat.com  Thu Jun  8 22:16:33 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Thu, 08 Jun 2006 17:16:33 -0500
Subject: [Linux-cluster] Error starting up CLVMD
In-Reply-To: <448885C5.4050505@sph.emory.edu>
References: <448885C5.4050505@sph.emory.edu>
Message-ID: <1149804993.12291.27.camel@technetium.msp.redhat.com>

On Thu, 2006-06-08 at 16:17 -0400, Vernard C. Martin wrote:
> The first node came up fine but the 2nd node is giving me a strange 
> error when trying to start up "clvmd". The error is:[root at node001 ~]# clvmd
> clvmd could not connect to cluster manager
> Consult syslog for more information
> [root at node001 ~]#
> 
> the syslog has:
> Jun  8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No 
> such file or directory

Hi Vernard,

I'm not very knowledgeable in the ways of lvm, however, you may want to
check to make sure that lock_dlm.ko is loaded (by using lsmod).
I don't know the code, but I'm guessing it's trying to create a lock
space by opening one of the dlm kernel devices (/dev/dlm*) which should
be controlled by the lock_dlm device driver.  If that's not loaded, it
will fail.

Also, make sure the second box can physically see the SAN in
/proc/partitions.  I've seen some weird things like this happen when a
cluster comes up but some of the nodes can't physically access the SAN.

I hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From Olivier.Thibault at lmpt.univ-tours.fr  Fri Jun  9 12:26:19 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Fri, 09 Jun 2006 14:26:19 +0200
Subject: [Linux-cluster] gfs_tool gettune
Message-ID: <448968EB.6020302@lmpt.univ-tours.fr>

Hello,

I am testing GFS 6.1 and have a question about the gettune command of 
gfs_tool.
If I do
gfs_tool setflag inherit_directio my_directory
then
gfs_ttol gettune my_directory
It displays :
new_files_directio = 0

It is the same thing with the inherit_jdata flag and new_files_jdata

So my question is : is there any relation between these flags and what 
gettune displays. Should'nt it display "new_files_directio = 1" ?

However, it seems that it impacts on the filesystem as my tests behave 
differently depending on these flags.
So, what are the tuneable options new_files_directio and new_files_jdata ?
Is there somewhere any doc about all the tuneable parameters ?

Best regards,

Olivier


-- 
Olivier THIBAULT
Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
Universit? Fran?ois Rabelais
Parc de Grandmont - 37200 TOURS
T?l: +33 2 47 36 69 12
Fax: +33 2 47 36 69 56


From pcaulfie at redhat.com  Fri Jun  9 13:17:11 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 09 Jun 2006 14:17:11 +0100
Subject: [Linux-cluster] Error starting up CLVMD
In-Reply-To: <1149804993.12291.27.camel@technetium.msp.redhat.com>
References: <448885C5.4050505@sph.emory.edu>
	<1149804993.12291.27.camel@technetium.msp.redhat.com>
Message-ID: <448974D7.7050801@redhat.com>

Robert S Peterson wrote:
> On Thu, 2006-06-08 at 16:17 -0400, Vernard C. Martin wrote:
>> The first node came up fine but the 2nd node is giving me a strange 
>> error when trying to start up "clvmd". The error is:[root at node001 ~]# clvmd
>> clvmd could not connect to cluster manager
>> Consult syslog for more information
>> [root at node001 ~]#
>>
>> the syslog has:
>> Jun  8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No 
>> such file or directory
> 
> Hi Vernard,
> 
> I'm not very knowledgeable in the ways of lvm, however, you may want to
> check to make sure that lock_dlm.ko is loaded (by using lsmod).
> I don't know the code, but I'm guessing it's trying to create a lock
> space by opening one of the dlm kernel devices (/dev/dlm*) which should
> be controlled by the lock_dlm device driver.  If that's not loaded, it
> will fail.

Bob's right, it sounds like the DLM isn't loaded. The module name is just
"dlm" BTW and the device should show up in /proc/misc and (if udev is running)
/dev/misc/dlm-control. lock_dlm is the GFS interface to the DLM...yes, I know
it's confusing.


-- 

patrick


From rpeterso at redhat.com  Fri Jun  9 13:48:18 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Fri, 09 Jun 2006 08:48:18 -0500
Subject: [Linux-cluster] gfs_tool gettune
In-Reply-To: <448968EB.6020302@lmpt.univ-tours.fr>
References: <448968EB.6020302@lmpt.univ-tours.fr>
Message-ID: <1149860898.3363.2.camel@technetium.msp.redhat.com>

On Fri, 2006-06-09 at 14:26 +0200, Olivier Thibault wrote: 
> Hello,
> 
> I am testing GFS 6.1 and have a question about the gettune command of 
> gfs_tool.
> If I do
> gfs_tool setflag inherit_directio my_directory
> then
> gfs_ttol gettune my_directory
> It displays :
> new_files_directio = 0
> 
> It is the same thing with the inherit_jdata flag and new_files_jdata
> 
> So my question is : is there any relation between these flags and what 
> gettune displays. Should'nt it display "new_files_directio = 1" ?
> 
> However, it seems that it impacts on the filesystem as my tests behave 
> differently depending on these flags.
> So, what are the tuneable options new_files_directio and new_files_jdata ?
> Is there somewhere any doc about all the tuneable parameters ?
> 
> Best regards,
> 
> Olivier

Hi Olivier,

Here's what's going on:
inherit_directio and new_files_directio are two separate things.
If you look at the man page, inherit_directio operates on a single
directory whereas new_files_directio is a filesystem-wide "settune"
value.  

If you do:

gfs_tool setflag inherit_directio my_directory

You're telling the fs that ONLY your directory and all new files
within that directory should have this attribute, which is why your
tests are acting as expected, as long as you're within that directory.
It basically sets an attribute on an in-memory inode for the directory.

If instead you were to do:

gfs_tool settune <mount point> new_files_directio 1

The value new_files_directio value would change for the whole
mount point, not just that directory.  Of course, you're seeing what
gfs_tool gettune my_directory is reporting for the global flag.

Regards,

Bob Peterson
Red Hat Cluster Suite


From sdake at redhat.com  Wed Jun  7 19:24:25 2006
From: sdake at redhat.com (Steven Dake)
Date: Wed, 07 Jun 2006 12:24:25 -0700
Subject: [Linux-cluster] [gfs_controld] send messages through separate cpg
In-Reply-To: <20060607172740.GA18684@redhat.com>
References: <20060607172740.GA18684@redhat.com>
Message-ID: <1149708265.18988.3.camel@shih.broked.org>

Dave,

I'd say the cpg bits look really good except for the mcast operation
(where you have a FIXME).

I'd recommend not backing off here, but instead spinning on the transmit
if ERR_TRY_AGAIN is returned.  Even on a heavily loaded system the delay
should not be very significant on a spin operation, unless this code has
certain timeouts (not sure about that) that would expire.  It would
appear not since the code suggests backing off using a timer.

Regards
-steve

On Wed, 2006-06-07 at 12:27 -0500, David Teigland wrote:
> [new process requires all work to be sent to ml prior to cvs check-in]
> 
> Set up a separate cpg for sending messages (e.g. for processing
> mount/unmount) instead of sending them through the cpg used to represent
> the mount group.  Since we apply cpg changes to the mount group async,
> that cpg won't always contain all the nodes we need to process the
> mount/unmount.  A mount from one node in parallel with unmount from
> another often won't work without this.
> 
> 
> diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/Makefile cluster/gfs/lock_dlm/daemon/Makefile
> --- cluster-HEAD/gfs/lock_dlm/daemon/Makefile	2006-03-27 01:31:46.000000000 -0600
> +++ cluster/gfs/lock_dlm/daemon/Makefile	2006-06-06 17:19:40.740421037 -0500
> @@ -21,6 +21,7 @@
>  	-I../../include/ \
>  	-I../../../group/lib/ \
>  	-I../../../cman/lib/ \
> +	-I../../../cman/daemon/openais/trunk/include/ \
>  	-I../../../dlm/lib/ \
>  	-I../../../gfs-kernel/src/dlm/
>  
> @@ -33,12 +34,14 @@
>  
>  gfs_controld: 	main.o \
>  		member_cman.o \
> +		cpg.o \
>  		group.o \
>  		plock.o \
>  		recover.o \
>  		withdraw.o \
>  		../../../dlm/lib/libdlm_lt.a \
>  		../../../cman/lib/libcman.a \
> +		../../../cman/daemon/openais/trunk/lib/libcpg.a \
>  		../../../group/lib/libgroup.a
>  	$(CC) $(LDFLAGS) -o $@ $^
>  
> @@ -49,6 +52,9 @@
>  member_cman.o: member_cman.c
>  	$(CC) $(CFLAGS) -c -o $@ $<
>  
> +cpg.o: cpg.c
> +	$(CC) $(CFLAGS) -c -o $@ $<
> +
>  recover.o: recover.c
>  	$(CC) $(CFLAGS) -c -o $@ $<
>  
> diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/cpg.c cluster/gfs/lock_dlm/daemon/cpg.c
> --- cluster-HEAD/gfs/lock_dlm/daemon/cpg.c	1969-12-31 18:00:00.000000000 -0600
> +++ cluster/gfs/lock_dlm/daemon/cpg.c	2006-06-07 11:54:28.478585576 -0500
> @@ -0,0 +1,212 @@
> +/******************************************************************************
> +*******************************************************************************
> +**
> +**  Copyright (C) 2006 Red Hat, Inc.  All rights reserved.
> +**
> +**  This copyrighted material is made available to anyone wishing to use,
> +**  modify, copy, or redistribute it subject to the terms and conditions
> +**  of the GNU General Public License v.2.
> +**
> +*******************************************************************************
> +******************************************************************************/
> +
> +#include "lock_dlm.h"
> +#include "cpg.h"
> +
> +static cpg_handle_t	daemon_handle;
> +static struct cpg_name	daemon_name;
> +static int		got_msg;
> +static int		saved_nodeid;
> +static int		saved_len;
> +static char		saved_data[MAX_MSGLEN];
> +
> +void receive_journals(struct mountgroup *mg, char *buf, int len, int from);
> +void receive_options(struct mountgroup *mg, char *buf, int len, int from);
> +void receive_remount(struct mountgroup *mg, char *buf, int len, int from);
> +void receive_plock(struct mountgroup *mg, char *buf, int len, int from);
> +void receive_recovery_status(struct mountgroup *mg, char *buf, int len,
> +			     int from);
> +void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from);
> +
> +
> +static void do_deliver(int nodeid, char *data, int len)
> +{
> +	struct mountgroup *mg;
> +	struct gdlm_header *hd;
> +
> +	hd = (struct gdlm_header *) data;
> +
> +	mg = find_mg(hd->name);
> +	if (!mg)
> +		return;
> +
> +	hd->version[0]	= le16_to_cpu(hd->version[0]);
> +	hd->version[1]	= le16_to_cpu(hd->version[1]);
> +	hd->version[2]	= le16_to_cpu(hd->version[2]);
> +	hd->type	= le16_to_cpu(hd->type);
> +	hd->nodeid	= le32_to_cpu(hd->nodeid);
> +	hd->to_nodeid	= le32_to_cpu(hd->to_nodeid);
> +
> +	if (hd->version[0] != GDLM_VER_MAJOR) {
> +		log_error("reject message version %u.%u.%u",
> +			  hd->version[0], hd->version[1], hd->version[2]);
> +		return;
> +	}
> +
> +	/* If there are some group messages between a new node being added to
> +	   the cpg group and being added to the app group, the new node should
> +	   discard them since they're only relevant to the app group. */
> +
> +	if (!mg->last_callback) {
> +		log_group(mg, "discard message type %d len %d from %d",
> +			  hd->type, len, nodeid);
> +		return;
> +	}
> +
> +	switch (hd->type) {
> +	case MSG_JOURNAL: 
> +		receive_journals(mg, data, len, nodeid);
> +		break;
> +
> +	case MSG_OPTIONS:
> +		receive_options(mg, data, len, nodeid);
> +		break;
> +
> +	case MSG_REMOUNT:
> +		receive_remount(mg, data, len, nodeid);
> +		break;
> +
> +	case MSG_PLOCK:
> +		receive_plock(mg, data, len, nodeid);
> +		break;
> +
> +	case MSG_RECOVERY_STATUS:
> +		receive_recovery_status(mg, data, len, nodeid);
> +		break;
> +
> +	case MSG_RECOVERY_DONE:
> +		receive_recovery_done(mg, data, len, nodeid);
> +		break;
> +
> +	default:
> +		log_error("unknown message type %d from %d",
> +			  hd->type, hd->nodeid);
> +	}
> +}
> +
> +void deliver_cb(cpg_handle_t handle, struct cpg_name *group_name,
> +		uint32_t nodeid, uint32_t pid, void *data, int data_len)
> +{
> +	saved_nodeid = nodeid;
> +	saved_len = data_len;
> +	memcpy(saved_data, data, data_len);
> +	got_msg = 1;
> +}
> +
> +void confchg_cb(cpg_handle_t handle, struct cpg_name *group_name,
> +		struct cpg_address *member_list, int member_list_entries,
> +		struct cpg_address *left_list, int left_list_entries,
> +		struct cpg_address *joined_list, int joined_list_entries)
> +{
> +}
> +
> +static cpg_callbacks_t callbacks = {
> +	.cpg_deliver_fn = deliver_cb,
> +	.cpg_confchg_fn = confchg_cb,
> +};
> +
> +int process_cpg(void)
> +{
> +	cpg_error_t error;
> +	
> +	got_msg = 0;
> +	saved_len = 0;
> +	saved_nodeid = 0;
> +	memset(saved_data, 0, sizeof(saved_data));
> +
> +	error = cpg_dispatch(daemon_handle, CPG_DISPATCH_ONE);
> +	if (error != CPG_OK) {
> +		log_error("cpg_dispatch error %d", error);
> +		return -1;
> +	}
> +
> +	if (got_msg)
> +		do_deliver(saved_nodeid, saved_data, saved_len);
> +	return 0;
> +}
> +
> +int setup_cpg(void)
> +{
> +	cpg_error_t error;
> +	int fd = 0;
> +
> +	error = cpg_initialize(&daemon_handle, &callbacks);
> +	if (error != CPG_OK) {
> +		log_error("cpg_initialize error %d", error);
> +		return -1;
> +	}
> +
> +	cpg_fd_get(daemon_handle, &fd);
> +	if (fd < 0)
> +		return -1;
> +
> +	memset(&daemon_name, 0, sizeof(daemon_name));
> +	strcpy(daemon_name.value, "gfs_controld");
> +	daemon_name.length = 12;
> +
> + retry:
> +	error = cpg_join(daemon_handle, &daemon_name);
> +	if (error == CPG_ERR_TRY_AGAIN) {
> +		log_debug("setup_cpg cpg_join retry");
> +		sleep(1);
> +		goto retry;
> +	}
> +	if (error != CPG_OK) {
> +		log_error("cpg_join error %d", error);
> +		cpg_finalize(daemon_handle);
> +		return -1;
> +	}
> +
> +	log_debug("cpg %d", fd);
> +	return fd;
> +}
> +
> +static int _send_message(cpg_handle_t h, void *buf, int len)
> +{
> +	struct iovec iov;
> +	cpg_error_t error;
> +	int retries = 0;
> +
> +	iov.iov_base = buf;
> +	iov.iov_len = len;
> +
> + retry:
> +	error = cpg_mcast_joined(h, CPG_TYPE_AGREED, &iov, 1);
> +	if (error != CPG_OK)
> +		log_error("cpg_mcast_joined error %d handle %llx", error, h);
> +	if (error == CPG_ERR_TRY_AGAIN) {
> +		/* FIXME: backoff say .25 sec, .5 sec, .75 sec, 1 sec */
> +		retries++;
> +		if (retries > 3)
> +			sleep(1);
> +		goto retry;
> +	}
> +
> +	return 0;
> +}
> +
> +int send_group_message(struct mountgroup *mg, int len, char *buf)
> +{
> +	struct gdlm_header *hd = (struct gdlm_header *) buf;
> +
> +	hd->version[0]	= cpu_to_le16(GDLM_VER_MAJOR);
> +	hd->version[1]	= cpu_to_le16(GDLM_VER_MINOR);
> +	hd->version[2]	= cpu_to_le16(GDLM_VER_PATCH);
> +	hd->type	= cpu_to_le16(hd->type);
> +	hd->nodeid	= cpu_to_le32(hd->nodeid);
> +	hd->to_nodeid	= cpu_to_le32(hd->to_nodeid);
> +	memcpy(hd->name, mg->name, strlen(mg->name));
> +	
> +	return _send_message(daemon_handle, buf, len);
> +}
> +
> diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/group.c cluster/gfs/lock_dlm/daemon/group.c
> --- cluster-HEAD/gfs/lock_dlm/daemon/group.c	2006-06-07 12:10:32.102338261 -0500
> +++ cluster/gfs/lock_dlm/daemon/group.c	2006-06-06 17:23:06.523976113 -0500
> @@ -21,25 +21,14 @@
>  static int cb_event_nr;
>  static unsigned int cb_id;
>  static int cb_type;
> -static int cb_nodeid;
> -static int cb_len;
>  static int cb_member_count;
>  static int cb_members[MAX_GROUP_MEMBERS];
> -static char cb_message[MAX_MSGLEN+1];
>  
>  int do_stop(struct mountgroup *mg);
>  int do_finish(struct mountgroup *mg);
>  int do_terminate(struct mountgroup *mg);
>  int do_start(struct mountgroup *mg, int type, int count, int *nodeids);
>  
> -void receive_journals(struct mountgroup *mg, char *buf, int len, int from);
> -void receive_options(struct mountgroup *mg, char *buf, int len, int from);
> -void receive_remount(struct mountgroup *mg, char *buf, int len, int from);
> -void receive_plock(struct mountgroup *mg, char *buf, int len, int from);
> -void receive_recovery_status(struct mountgroup *mg, char *buf, int len,
> -			     int from);
> -void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from);
> -
>  
>  static void stop_cbfn(group_handle_t h, void *private, char *name)
>  {
> @@ -87,17 +76,9 @@
>  static void deliver_cbfn(group_handle_t h, void *private, char *name,
>  			 int nodeid, int len, char *buf)
>  {
> -	int n;
> -	cb_action = DO_DELIVER;
> -	strncpy(cb_name, name, MAX_GROUP_NAME_LEN);
> -	cb_nodeid = nodeid;
> -	cb_len = n = len;
> -	if (len > MAX_MSGLEN)
> -		n = MAX_MSGLEN;
> -	memcpy(&cb_message, buf, n);
>  }
>  
> -group_callbacks_t callbacks = {
> +static group_callbacks_t callbacks = {
>  	stop_cbfn,
>  	start_cbfn,
>  	finish_cbfn,
> @@ -106,53 +87,6 @@
>  	deliver_cbfn
>  };
>  
> -static void do_deliver(struct mountgroup *mg)
> -{
> -	struct gdlm_header *hd;
> -
> -	hd = (struct gdlm_header *) cb_message;
> -
> -	/* If there are some group messages between a new node being added to
> -	   the cpg group and being added to the app group, the new node should
> -	   discard them since they're only relevant to the app group. */
> -
> -	if (!mg->last_callback) {
> -		log_group(mg, "discard message type %d len %d from %d",
> -			  hd->type, cb_len, cb_nodeid);
> -		return;
> -	}
> -
> -	switch (hd->type) {
> -	case MSG_JOURNAL:
> -		receive_journals(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	case MSG_OPTIONS:
> -		receive_options(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	case MSG_REMOUNT:
> -		receive_remount(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	case MSG_PLOCK:
> -		receive_plock(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	case MSG_RECOVERY_STATUS:
> -		receive_recovery_status(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	case MSG_RECOVERY_DONE:
> -		receive_recovery_done(mg, cb_message, cb_len, cb_nodeid);
> -		break;
> -
> -	default:
> -		log_error("unknown message type %d from %d",
> -			  hd->type, hd->nodeid);
> -	}
> -}
> -
>  char *str_members(void)
>  {
>  	static char buf[MAXLINE];
> @@ -222,12 +156,6 @@
>  		mg->id = cb_id;
>  		break;
>  
> -	case DO_DELIVER:
> -		log_debug("groupd callback: deliver %s len %d nodeid %d",
> -			  cb_name, cb_len, cb_nodeid);
> -		do_deliver(mg);
> -		break;
> -
>  	default:
>  		error = -EINVAL;
>  	}
> @@ -257,15 +185,3 @@
>  	return rv;
>  }
>  
> -int send_group_message(struct mountgroup *mg, int len, char *buf)
> -{
> -	int error;
> -
> -	error = group_send(gh, mg->name, len, buf);
> -	if (error < 0)
> -		log_error("group_send error %d errno %d", error, errno);
> -	else
> -		error = 0;
> -	return error;
> -}
> -
> diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h cluster/gfs/lock_dlm/daemon/lock_dlm.h
> --- cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h	2006-05-25 14:30:40.000000000 -0500
> +++ cluster/gfs/lock_dlm/daemon/lock_dlm.h	2006-06-06 17:18:25.510916543 -0500
> @@ -201,11 +201,16 @@
>  	MSG_RECOVERY_DONE,
>  };
>  
> +#define GDLM_VER_MAJOR 1
> +#define GDLM_VER_MINOR 0
> +#define GDLM_VER_PATCH 0
> +
>  struct gdlm_header {
>  	uint16_t		version[3];
>  	uint16_t		type;			/* MSG_ */
>  	uint32_t		nodeid;			/* sender */
>  	uint32_t		to_nodeid;		/* 0 if to all */
> +	char			name[MAXNAME];
>  };
>  
> 
> @@ -214,6 +219,8 @@
>  
>  int setup_cman(void);
>  int process_cman(void);
> +int setup_cpg(void);
> +int process_cpg(void);
>  int setup_groupd(void);
>  int process_groupd(void);
>  int setup_libdlm(void);
> diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/main.c cluster/gfs/lock_dlm/daemon/main.c
> --- cluster-HEAD/gfs/lock_dlm/daemon/main.c	2006-04-21 14:54:10.000000000 -0500
> +++ cluster/gfs/lock_dlm/daemon/main.c	2006-06-07 11:59:12.248223925 -0500
> @@ -25,6 +25,7 @@
>  static struct pollfd pollfd[MAX_CLIENTS];
>  
>  static int cman_fd;
> +static int cpg_fd;
>  static int listen_fd;
>  static int groupd_fd;
>  static int uevent_fd;
> @@ -249,6 +250,11 @@
>  		goto out;
>  	client_add(cman_fd, &maxi);
>  
> +	rv = cpg_fd = setup_cpg();
> +	if (rv < 0)
> +		goto out;
> +	client_add(cpg_fd, &maxi);
> +
>  	rv = groupd_fd = setup_groupd();
>  	if (rv < 0)
>  		goto out;
> @@ -272,6 +278,8 @@
>  		goto out;
>  	client_add(plocks_fd, &maxi);
>  
> +	log_debug("setup done");
> +
>  	for (;;) {
>  		rv = poll(pollfd, maxi + 1, -1);
>  		if (rv < 0)
> @@ -296,6 +304,8 @@
>  					process_groupd();
>  				else if (pollfd[i].fd == cman_fd)
>  					process_cman();
> +				else if (pollfd[i].fd == cpg_fd)
> +					process_cpg();
>  				else if (pollfd[i].fd == uevent_fd)
>  					process_uevent();
>  				else if (!no_withdraw &&
> @@ -310,7 +320,6 @@
>  			if (pollfd[i].revents & POLLHUP) {
>  				if (pollfd[i].fd == cman_fd)
>  					exit_cman();
> -				log_debug("closing fd %d", pollfd[i].fd);
>  				close(pollfd[i].fd);
>  			}
>  		}
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From sdake at redhat.com  Thu Jun  8 21:10:21 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 08 Jun 2006 14:10:21 -0700
Subject: [Linux-cluster] Updates to libgfs2
In-Reply-To: <1149801419.12291.9.camel@technetium.msp.redhat.com>
References: <1149801419.12291.9.camel@technetium.msp.redhat.com>
Message-ID: <1149801021.20886.4.camel@shih.broked.org>

Bob,

The copy_form_mem function looks as though it may break strict aliasing
rules set by the ISO C 99 standard.  Have you tried compiling with -
Wstrict-aliasing=2 as a CFLAGS option?  If you receive no warnings here,
you should be ok.

Regards
-steve

On Thu, 2006-06-08 at 16:16 -0500, Robert S Peterson wrote:
> Hi,
> 
> I just wanted to let you know: I made some bug fixes to libgfs2 for
> problems with fsck.  The following is a patch with the code changes:
> Also, there were some parts that got missed from the original commit
> that are there now.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> Index: buf.c
> ===================================================================
> RCS file: /cvs/cluster/cluster/gfs2/libgfs2/buf.c,v
> retrieving revision 1.2
> diff -w -u -p -u -p -r1.2 buf.c
> --- buf.c	6 Jun 2006 14:20:41 -0000	1.2
> +++ buf.c	8 Jun 2006 20:58:48 -0000
> @@ -188,15 +188,17 @@ void bsync(struct gfs2_sbd *sdp)
>  /* commit buffers to disk but do not discard */
>  void bcommit(struct gfs2_sbd *sdp)
>  {
> -	osi_list_t *tmp;
> +	osi_list_t *tmp, *x;
>  	struct gfs2_buffer_head *bh;
>  
> -	osi_list_foreach(tmp, &sdp->buf_list) {
> +	osi_list_foreach_safe(tmp, &sdp->buf_list, x) {
>  		bh = osi_list_entry(tmp, struct gfs2_buffer_head, b_list);
> -		if (bh->b_changed) {
> +		if (!bh->b_count)             /* if not reserved for later */
> +			write_buffer(sdp, bh);    /* write the data, free the memory */
> +		else if (bh->b_changed) {     /* if buffer has changed */
>  			do_lseek(sdp, bh->b_blocknr * sdp->bsize);
> -			do_write(sdp, bh->b_data, sdp->bsize);
> -			bh->b_changed = FALSE;
> +			do_write(sdp, bh->b_data, sdp->bsize); /* write it out */
> +			bh->b_changed = FALSE;    /* no longer changed */
>  		}
>  	}
>  }
> Index: fs_ops.c
> ===================================================================
> RCS file: /cvs/cluster/cluster/gfs2/libgfs2/fs_ops.c,v
> retrieving revision 1.2
> diff -w -u -p -u -p -r1.2 fs_ops.c
> --- fs_ops.c	6 Jun 2006 14:20:41 -0000	1.2
> +++ fs_ops.c	8 Jun 2006 20:58:49 -0000
> @@ -502,14 +502,12 @@ int gfs2_readi(struct gfs2_inode *ip, vo
>  	return copied;
>  }
>  
> -static void
> -copy_from_mem(struct gfs2_buffer_head *bh, void **buf,
> +static void copy_from_mem(struct gfs2_buffer_head *bh, void **buf,
>  	      unsigned int offset, unsigned int size)
>  {
>  	char **p = (char **)buf;
>  
>  	memcpy(bh->b_data + offset, *p, size);
> -
>  	*p += size;
>  }
>  
> @@ -526,7 +524,6 @@ int gfs2_writei(struct gfs2_inode *ip, v
>  	int isdir = !!(S_ISDIR(ip->i_di.di_flags));
>  	const uint64_t start = offset;
>  	int copied = 0;
> -	enum update_flags f;
>  
>  	if (!size)
>  		return 0;
> @@ -558,7 +555,6 @@ int gfs2_writei(struct gfs2_inode *ip, v
>  			block_map(ip, lblock, &new, &dblock, &extlen);
>  		}
>  
> -		f = not_updated;
>  		if (new) {
>  			bh = bget(sdp, dblock);
>  			if (isdir) {
> @@ -567,12 +563,11 @@ int gfs2_writei(struct gfs2_inode *ip, v
>  				mh.mh_type = GFS2_METATYPE_JD;
>  				mh.mh_format = GFS2_FORMAT_JD;
>  				gfs2_meta_header_out(&mh, bh->b_data);
> -				f = updated;
>  			}
>  		} else
>  			bh = bread(sdp, dblock);
>  		copy_from_mem(bh, &buf, o, amount);
> -		brelse(bh, f);
> +		brelse(bh, updated);
>  
>  		copied += amount;
>  		lblock++;
> @@ -1084,8 +1079,7 @@ dir_make_exhash(struct gfs2_inode *dip)
>  	dip->i_di.di_depth = y;
>  }
>  
> -static void
> -dir_l_add(struct gfs2_inode *dip, char *filename, int len,
> +static void dir_l_add(struct gfs2_inode *dip, char *filename, int len,
>  		  struct gfs2_inum *inum, unsigned int type)
>  {
>  	struct gfs2_dirent *dent;
> @@ -1564,11 +1558,10 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
>  	struct gfs2_inode *ip;
>  	struct gfs2_buffer_head *bh;
>  	int x;
> -	uint64_t p, freed_blocks;
> +	uint64_t p;
>  	unsigned char *buf;
>  	struct rgrp_list *rgd;
>  	
> -	freed_blocks = 0;
>  	bh = bread(sdp, block);
>  	ip = inode_get(sdp, bh);
>  	if (ip->i_di.di_height > 0) {
> @@ -1578,14 +1571,19 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
>  			 x += sizeof(uint64_t)) {
>  			p = be64_to_cpu(*(uint64_t *)(buf + x));
>  			if (p) {
> -				freed_blocks++;
>  				gfs2_set_bitmap(sdp, p, GFS2_BLKST_FREE);
> +				/* We need to adjust the free space count for the freed */
> +                /* indirect block. */
> +				rgd = gfs2_blk2rgrpd(sdp, p); /* find the rg for indir block */
> +				bh = bget(sdp, rgd->ri.ri_addr); /* get the buffer its rg */
> +				rgd->rg.rg_free++; /* adjust the free count */
> +				gfs2_rgrp_out(&rgd->rg, bh->b_data); /* back to the buffer */
> +				brelse(bh, updated); /* release the buffer */
>  			}
>  		}
>  	}
>  	/* Set the bitmap type for inode to free space: */
>  	gfs2_set_bitmap(sdp, ip->i_di.di_num.no_addr, GFS2_BLKST_FREE);
> -	freed_blocks++; /* one for the inode itself */
>  	inode_put(ip, updated);
>  	/* Now we have to adjust the rg freespace count and inode count: */
>  	rgd = gfs2_blk2rgrpd(sdp, block);
> @@ -1593,7 +1591,7 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui
>  	/* buffer in memory for the rg on disk because we used it to fix the */
>  	/* bitmaps, some of which are on the same block on disk.             */
>  	bh = bread(sdp, rgd->ri.ri_addr); /* get the buffer */
> -	rgd->rg.rg_free += freed_blocks;
> +	rgd->rg.rg_free++;
>  	rgd->rg.rg_dinodes--; /* one less inode in use */
>  	gfs2_rgrp_out(&rgd->rg, bh->b_data);
>  	brelse(bh, updated); /* release the buffer */
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From sdake at redhat.com  Thu Jun  8 21:16:03 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 08 Jun 2006 14:16:03 -0700
Subject: [Linux-cluster] Patch to gfs2_convert
In-Reply-To: <1149802001.12291.15.camel@technetium.msp.redhat.com>
References: <1149802001.12291.15.camel@technetium.msp.redhat.com>
Message-ID: <1149801363.20886.9.camel@shih.broked.org>

Bob,

Your patch looks good to me.  One issue that could occur which I'm not
sure is handled is the failure of the conversion right during the gfs1
to gfs2 inode conversion.  Is it possible for half the data structure to
be converted then failure to occur causing the inode to be in a half-
converted state?  I don't know enough about the code to be sure of this
case.

Regards
-steve

On Thu, 2006-06-08 at 16:26 -0500, Robert S Peterson wrote:
> Hi,
> 
> This patch to gfs2_convert makes it much more forgiving when fs conversions
> are interrupted in the middle due to power loss, interrupts, or other
> reasons.  Now, if a filesystem conversion is interrupted mid-way through,
> the tool should be able to pick up where it left off without damage.
> 
> As always, send questions, comments and concerns to me.  If I don't hear
> from anybody, I'll commit it to cvs in a few days.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> Index: gfs2_convert.c
> ===================================================================
> RCS file: /cvs/cluster/cluster/gfs2/convert/gfs2_convert.c,v
> retrieving revision 1.2
> diff -w -u -p -u -p -r1.2 gfs2_convert.c
> --- gfs2_convert.c	6 Jun 2006 14:37:47 -0000	1.2
> +++ gfs2_convert.c	8 Jun 2006 21:13:37 -0000
> @@ -77,12 +77,14 @@ void convert_bitmaps(struct gfs2_sbd *sd
>  	int x, y;
>  	struct gfs2_rindex *ri;
>  	unsigned char state;
> +	struct gfs2_buffer_head *bh;
>  
>  	ri = &rgd2->ri;
>  	gfs2_compute_bitstructs(sdp, rgd2); /* mallocs bh as array */
>  	for (blk = 0; blk < ri->ri_length; blk++) {
> -		rgd2->bh[blk] = bget_generic(sdp, ri->ri_addr + blk, read_disk,
> -									 read_disk);
> +		bh = bget_generic(sdp, ri->ri_addr + blk, read_disk, read_disk);
> +		if (!rgd2->bh[blk])
> +			rgd2->bh[blk] = bh;
>  		x = (blk) ? sizeof(struct gfs2_meta_header) : sizeof(struct gfs2_rgrp);
>  
>  		for (; x < sdp->bsize; x++)
> @@ -92,7 +94,6 @@ void convert_bitmaps(struct gfs2_sbd *sd
>  				if (state == 0x02) /* unallocated metadata state invalid */
>  					rgd2->bh[blk]->b_data[x] &= ~(0x02 << (GFS2_BIT_SIZE * y));
>  			}
> -		brelse(rgd2->bh[blk], updated);
>  	}
>  }/* convert_bitmaps */
>  
> @@ -134,10 +135,8 @@ static int superblock_cvt(int disk_fd, c
>  	/* convert the ondisk sb structure   */
>  	/* --------------------------------- */
>  	sb2->sd_sb.sb_header.mh_magic = GFS2_MAGIC;
> -	sb2->sd_sb.sb_fs_format = GFS2_FORMAT_FS;
>  	sb2->sd_sb.sb_header.mh_type = GFS2_METATYPE_SB;
>  	sb2->sd_sb.sb_header.mh_format = GFS2_FORMAT_SB;
> -	sb2->sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI;
>  	sb2->sd_sb.sb_bsize = sb1->sd_sb.sb_bsize;
>  	sb2->sd_sb.sb_bsize_shift = sb1->sd_sb.sb_bsize_shift;
>  	strcpy(sb2->sd_sb.sb_lockproto, sb1->sd_sb.sb_lockproto);
> @@ -174,14 +173,14 @@ static int superblock_cvt(int disk_fd, c
>  		rgd2->ri.ri_data0 = rgd->rd_ri.ri_data1;
>  		rgd2->ri.ri_data = rgd->rd_ri.ri_data;
>  		rgd2->ri.ri_bitbytes = rgd->rd_ri.ri_bitbytes;
> -		/* commit the changes to a gfs2 buffer */
> -		bh = bread(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */
> -		gfs2_rgrp_out(&rgd2->rg, bh->b_data);
> -		brelse(bh, updated); /* release the buffer */
>  		/* Add the new gfs2 rg to our list: We'll output the index later. */
>  		osi_list_add_prev((osi_list_t *)&rgd2->list,
>  						  (osi_list_t *)&sb2->rglist);
>  		convert_bitmaps(sb2, rgd2, TRUE);
> +		/* Write the updated rgrp to the gfs2 buffer */
> +		bh = bget(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */
> +		gfs2_rgrp_out(&rgd2->rg, rgd2->bh[0]->b_data);
> +		brelse(bh, updated); /* release the buffer */
>  	}
>  	return 0;
>  }/* superblock_cvt */
> @@ -195,8 +194,12 @@ int adjust_inode(struct gfs2_sbd *sbp, s
>  {
>  	struct gfs2_inode *inode;
>  	struct inode_block *fixdir;
> +	int inode_was_gfs1;
>  
>  	inode = inode_get(sbp, bh);
> +
> +	inode_was_gfs1 = (inode->i_di.di_num.no_formal_ino ==
> +					  inode->i_di.di_num.no_addr);
>  	/* Fix the inode number: */
>  	inode->i_di.di_num.no_formal_ino = sbp->md.next_inum;           ;
>  	
> @@ -240,11 +243,23 @@ int adjust_inode(struct gfs2_sbd *sbp, s
>  	/* di_goal_meta has shifted locations and di_goal_data has     */
>  	/* changed from 32-bits to 64-bits.  The following code        */
>  	/* adjusts for the shift.                                      */
> +	/*                                                             */
> +	/* Note: It may sound absurd, but we need to check if this     */
> +	/*       inode has already been converted to gfs2 or if it's   */
> +	/*       still a gfs1 inode.  That's just in case there was a  */
> +	/*       prior attempt to run gfs2_convert that never finished */
> +	/*       (due to power out, ctrl-c, kill, segfault, whatever.) */
> +	/*       If it is unconverted gfs1 we want to do a full        */
> +	/*       conversion.  If it's a gfs2 inode from a prior run,   */
> +	/*       we still need to renumber the inode, but here we      */
> +	/*       don't want to shift the data around.                  */
>  	/* ----------------------------------------------------------- */
> +	if (inode_was_gfs1) {
>  	inode->i_di.di_goal_meta = inode->i_di.di_goal_data;
>  	inode->i_di.di_goal_data = 0; /* make sure the upper 32b are 0 */
>  	inode->i_di.di_goal_data = inode->i_di.__pad[0];
>  	inode->i_di.__pad[1] = 0;
> +	}
>  	
>  	gfs2_dinode_out(&inode->i_di, bh->b_data);
>  	sbp->md.next_inum++; /* update inode count */
> @@ -344,7 +359,7 @@ int inode_renumber(struct gfs2_sbd *sbp,
>  /* ------------------------------------------------------------------------- */
>  /* fetch_inum - fetch an inum entry from disk, given its block               */
>  /* ------------------------------------------------------------------------- */
> -int fetch_and_fix_inum(struct gfs2_sbd *sbp, uint64_t iblock,
> +int fetch_inum(struct gfs2_sbd *sbp, uint64_t iblock,
>  					   struct gfs2_inum *inum)
>  {
>  	struct gfs2_buffer_head *bh_fix;
> @@ -356,7 +371,7 @@ int fetch_and_fix_inum(struct gfs2_sbd *
>  	inum->no_addr = fix_inode->i_di.di_num.no_addr;
>  	brelse(bh_fix, updated);
>  	return 0;
> -}/* fetch_and_fix_inum */
> +}/* fetch_inum */
>  
>  /* ------------------------------------------------------------------------- */
>  /* process_dirent_info - fix one dirent (directory entry) buffer             */
> @@ -382,6 +397,7 @@ int process_dirent_info(struct gfs2_inod
>  	/* Turns out you can't trust dir_entries is correct.     */
>  	for (de = 0; ; de++) {
>  		struct gfs2_inum inum;
> +		int dent_was_gfs1;
>  		
>  		gettimeofday(&tv, NULL);
>  		/* Do more warm fuzzy stuff for the customer. */
> @@ -394,18 +410,24 @@ int process_dirent_info(struct gfs2_inod
>  		}
>  		/* fix the dirent's inode number based on the inode */
>  		gfs2_inum_in(&inum, (char *)&dent->de_inum);
> +		dent_was_gfs1 = (dent->de_inum.no_addr == dent->de_inum.no_formal_ino);
>  		if (inum.no_formal_ino) { /* if not a sentinel (placeholder) */
> -			error = fetch_and_fix_inum(sbp, inum.no_addr, &inum);
> +			error = fetch_inum(sbp, inum.no_addr, &inum);
>  			if (error) {
>  				printf("Error retrieving inode %" PRIx64 "\n", inum.no_addr);
>  				break;
>  			}
> +			/* fix the dirent's inode number from the fetched inum. */
> +			dent->de_inum.no_formal_ino = cpu_to_be64(inum.no_formal_ino);
>  		}
>  		/* Fix the dirent's filename hash: They are the same as gfs1 */
>  		/* dent->de_hash = cpu_to_be32(gfs2_disk_hash((char *)(dent + 1), */
>  		/*                             be16_to_cpu(dent->de_name_len))); */
>  		/* Fix the dirent's file type.  Gfs1 used home-grown values.  */
>  		/* Gfs2 uses standard values from include/linux/fs.h          */
> +		/* Only do this if the dent was a true gfs1 dent, and not a   */
> +		/* gfs2 dent converted from a previously aborted run.         */
> +		if (dent_was_gfs1) {
>  		switch be16_to_cpu(dent->de_type) {
>  		case GFS_FILE_NON:
>  			dent->de_type = cpu_to_be16(DT_UNKNOWN);
> @@ -432,7 +454,7 @@ int process_dirent_info(struct gfs2_inod
>  			dent->de_type = cpu_to_be16(DT_SOCK);
>  			break;
>  		}
> -
> +		}
>  		error = gfs2_dirent_next(dip, bh, &dent);
>  		if (error)
>  			break;
> @@ -948,26 +970,33 @@ int main(int argc, char **argv)
>  		inode_put(sb2.md.inum, updated);
>  		inode_put(sb2.md.statfs, updated);
>  
> -		bh = bread(&sb2, sb2.sb_addr);
> -		gfs2_sb_out(&sb2.sd_sb, bh->b_data);
> -		brelse(bh, updated);
>  		bcommit(&sb2); /* write the buffers to disk */
>  
>  		/* Now delete the now-obsolete gfs1 files: */
>  		printf("Removing obsolete gfs1 structures.\n");
>  		fflush(stdout);
> -		/* Delete the Journal index: */
> +		/* Delete the old gfs1 Journal index: */
>  		gfs2_freedi(&sb2, sb.sd_sb.sb_jindex_di.no_addr);
> -		/* Delete the rgindex: */
> +		/* Delete the old gfs1 rgindex: */
>  		gfs2_freedi(&sb2, sb.sd_sb.sb_rindex_di.no_addr);
> -		/* Delete the Quota file: */
> +		/* Delete the old gfs1 Quota file: */
>  		gfs2_freedi(&sb2, sb.sd_sb.sb_quota_di.no_addr);
> -		/* Delete the License file: */
> +		/* Delete the old gfs1 License file: */
>  		gfs2_freedi(&sb2, sb.sd_sb.sb_license_di.no_addr);
> -		/* Now free all the rgrps */
> +		/* Now free all the in memory */
>  		gfs2_rgrp_free(&sb2, updated);
>  		printf("Committing changes to disk.\n");
>  		fflush(stdout);
> +		/* Set filesystem type in superblock to gfs2.  We do this at the */
> +		/* end because if the tool is interrupted in the middle, we want */
> +		/* it to not reject the partially converted fs as already done   */
> +		/* when it's run a second time.                                  */
> +		bh = bread(&sb2, sb2.sb_addr);
> +		sb2.sd_sb.sb_fs_format = GFS2_FORMAT_FS;
> +		sb2.sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI;
> +		gfs2_sb_out(&sb2.sd_sb, bh->b_data);
> +		brelse(bh, updated);
> +
>  		bsync(&sb2); /* write the buffers to disk */
>  		error = fsync(disk_fd);
>  		if (error)
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From ocrete at max-t.com  Fri Jun  9 21:40:23 2006
From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=)
Date: Fri, 09 Jun 2006 17:40:23 -0400
Subject: [Linux-cluster] Kernel panic - not syncing: membership stopped
	responding
Message-ID: <1149889224.7865.99.camel@cocagne.max-t.internal>

Hi,

We rebooted one machine is our cluster (which uses cman) and we got the
following message on the changelog of all of the other machines of the
cluster (and they panicked!)... 

We are using a snapshot of the STABLE branch from May 9, 2006.

It seems strange to panic the kernel for a ENOMEM...

from syslog:
Error queueing request to port 1: -12
kernel: Kernel panic - not syncing: membership stopped responding

-- 
Olivier Cr?te
ocrete at max-t.com
Maximum Throughput Inc.


From wcheng at redhat.com  Mon Jun 12 05:25:43 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 01:25:43 -0400
Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
Message-ID: <1150089943.26019.18.camel@localhost.localdomain>

NFS v2/v3 active-active NLM lock failover has been an issue with our
cluster suite. With current implementation, it (cluster suite) is trying
to carry the workaround as much as it can with user mode scripts where,
upon failover, on taken-over server, it:

1. Tear down virtual IP.
2. Unexport the subject NFS export.
3. Signal lockd to drop the locks.
4. Un-mount filesystem if needed.

There are many other issues (such as /var/lib/nfs/statd/sm file, etc)
but this particular post is to further refine step 3 to avoid the 50
second global (default) grace period for all NFS exports; i.e., we would
like to be able to selectively drop locks (only) associated with the
requested exports without disrupting other NFS services. 

We've done some prototype (coding) works but would like to search for
community consensus on the admin interface if possible. We've tried out
the following:

1. /proc interface, say writing the fsid into a /proc directory entry
would end up dropping all NLM locks associated with the NFS export that
has fsid in its /etc/exports file.

2. Adding a new flag into "exportfs" command, say "h", such that

   "exportfs -uh *:/export_path"

would un-export the entry and drop the NLM locks associated with the
entry.

3. Add a new nfsctl by re-using a 2.4 kernel flag (NFSCTL_FOLOCKS) where
it takes:

   struct nfsctl_folocks {
        int           type;
        unsigned int  fsid;
        unsigned int  devno;
   }

as input argument. Depending on "type", the kernel call would drop the
locks associated with either the fsid, or devno. 

The core of the implementation is a new cloned version of
nlm_traverse_files() where it searches the "nlm_files" list one by one
to compare the fsid (or devno) based on nlm_file.f_handle field. A
helper function is also implemented to extract the fsid (or devno) from
f_handle.

The new function is planned to allow failover to abort if the file can't
be closed. We may also put the file locks back if abort occurs.

Would appreciate comments on the above admin interface. As soon as the
external interface can be finalized, the code will be submitted for
review.

-- Wendy


From wcheng at redhat.com  Mon Jun 12 06:11:04 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 02:11:04 -0400
Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
In-Reply-To: <1150089943.26019.18.camel@localhost.localdomain>
References: <1150089943.26019.18.camel@localhost.localdomain>
Message-ID: <1150092664.26180.19.camel@localhost.localdomain>

On Mon, 2006-06-12 at 01:25 -0400, Wendy Cheng wrote:
> NFS v2/v3 active-active NLM lock failover has been an issue with our
> cluster suite. With current implementation, it (cluster suite) is trying
> to carry the workaround as much as it can with user mode scripts where,
> upon failover, on taken-over server, it:
> 
> 1. Tear down virtual IP.
> 2. Unexport the subject NFS export.
> 3. Signal lockd to drop the locks.
> 4. Un-mount filesystem if needed.
> 
> There are many other issues (such as /var/lib/nfs/statd/sm file, etc)
> but this particular post is to further refine step 3 to avoid the 50
> second global (default) grace period for all NFS exports; i.e., we would
> like to be able to selectively drop locks (only) associated with the
> requested exports without disrupting other NFS services. 
> 
> We've done some prototype (coding) works but would like to search for
> community consensus on the admin interface if possible. 

While ping-pong the emails with our base kernel folks to choose
between /proc, or exportfs, or nfsctl (internally within the company -
mostly with steved and staubach), Peter suggested to try out multiple
lockd(s) to handle different NFS exports. In that case, we may require
to change a big portion of lockd kernel code. I prefer not going that
far since lockd failover is our cluster suite's immediate issue.
However, if this approach can get everyone's vote, we'll comply.

-- Wendy


From rpeterso at redhat.com  Mon Jun 12 14:51:04 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Mon, 12 Jun 2006 09:51:04 -0500
Subject: [Linux-cluster] Updates to libgfs2
In-Reply-To: <1149801021.20886.4.camel@shih.broked.org>
References: <1149801419.12291.9.camel@technetium.msp.redhat.com>
	<1149801021.20886.4.camel@shih.broked.org>
Message-ID: <448D7F58.2010603@redhat.com>

Steven Dake wrote:
> Bob,
>
> The copy_form_mem function looks as though it may break strict aliasing
> rules set by the ISO C 99 standard.  Have you tried compiling with -
> Wstrict-aliasing=2 as a CFLAGS option?  If you receive no warnings here,
> you should be ok.
>
> Regards
> -steve
>   
Hi Steve,

Thanks for the input.  It compiles without warning with:

-Wstrict-aliasing=2

Regards,

Bob Peterson
Red Hat Cluster Suite


From Jon.Stanley at savvis.net  Mon Jun 12 14:45:03 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Mon, 12 Jun 2006 09:45:03 -0500
Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net>

 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng
> Sent: Monday, June 12, 2006 12:26 AM
> To: nfs at lists.sourceforge.net
> Cc: linux-cluster at redhat.com
> Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
> 
NOTE - I don't use NFS functionality in Cluster Suite, so my coments may
be entirely meaningless.

> 
> 1. /proc interface, say writing the fsid into a /proc directory entry
> would end up dropping all NLM locks associated with the NFS 
> export that
> has fsid in its /etc/exports file.

This would defintely have it's advantages for people who know what
they're doing - they could drop all locks without unexporting the
filesystem.  However, it also gives people the opportunity to shoot
themselves in the foot - by eliminating locks that are needed.  After
weighing the pros and cons, I really don't think that any method
accessible via /proc is a good idea.

> 
> 2. Adding a new flag into "exportfs" command, say "h", such that
> 
>    "exportfs -uh *:/export_path"
> 
> would un-export the entry and drop the NLM locks associated with the
> entry.
> 

This is the best of the three, IMHO.  Gives you the safety of *knowing*
that the filesystem was unexported before dropping the locks, and
preventing folks from shooting themselves in the foot.

The other option that was mentioned, a separate lockd for each fs, is
also a good idea - but would require a lot of coding no doubt, and
introduce more instability into what I already preceive as an unstable
NFS subsystem in Linux (I *refuse* to use Linux as an NFS server and
instead go with Solaris - I've had *really* bad experiences with Linux
NFS under load - but that's getting OT).


From rpeterso at redhat.com  Mon Jun 12 14:56:27 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Mon, 12 Jun 2006 09:56:27 -0500
Subject: [Linux-cluster] Patch to gfs2_convert
In-Reply-To: <1149801363.20886.9.camel@shih.broked.org>
References: <1149802001.12291.15.camel@technetium.msp.redhat.com>
	<1149801363.20886.9.camel@shih.broked.org>
Message-ID: <448D809B.7000906@redhat.com>

Steven Dake wrote:
> Bob,
>
> Your patch looks good to me.  One issue that could occur which I'm not
> sure is handled is the failure of the conversion right during the gfs1
> to gfs2 inode conversion.  Is it possible for half the data structure to
> be converted then failure to occur causing the inode to be in a half-
> converted state?  I don't know enough about the code to be sure of this
> case.
>
> Regards
> -steve
Hi Steve,

Again, thanks for your input.
With the convert tool, the inodes are converted into buffers and the 
buffers are
eventually written out to disk, so either an inode is fully converted or 
not at all.
The latest version of the gfs2_convert tool determines on a per-inode 
basis whether
it has been converted or not, and converts it as appropriate.  That way, 
conversions
that are interrupted may be safely resumed later.

Regards,

Bob Peterson
Red Hat Cluster Suite


From aneesh.kumar at gmail.com  Mon Jun 12 14:51:57 2006
From: aneesh.kumar at gmail.com (Aneesh Kumar)
Date: Mon, 12 Jun 2006 20:21:57 +0530
Subject: [Linux-cluster] [RFC] Transport independent Cluster service
Message-ID: <cc723f590606120751t3cf57560pcfd6dd3e966cbff9@mail.gmail.com>

Hi,

I am right now working on ci-linux.sf.net project. The goal is to get
the code ready so that
projects like GFS and OCFS2 can start using the framework. CI/ICS
allows to build cluster service without worrying about the transport
mechanism used. With the OpenSSI project we had both IP and infiniband
transport and i belive it should be easy to implement one using sctp
or TIPC.  You can see the result of my work here
http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary


Different components of CI is explained here
http://ci-linux.sourceforge.net/components.shtml

I have dropped CLMS and CLMS key service. I am looking at using
CMAN/configfs for doing the cluster membership part.

Registering new cluster service is explained here.
http://ci-linux.sourceforge.net/enhancing.shtml

here also you can drop the CLMS part .

This link explains how to write new cluster service.
http://ci-linux.sourceforge.net/ics.shtml

A simple example is below

http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=blob;h=8282ad15da09901f4cd4bdd62490766458d1cebf;hb=f7e456933b9868486c014a83e473f647149a71f6;f=include/cluster/gen/icssig.svc
http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=blob;h=1004328006021f5287bb543abc7d15e51964eae6;hb=688587ead9ce2d26070ca051323769cd76c91185;f=include/cluster/gen/icstest.svc

Please follow up at sic-linux-devel at lists.sourceforge.net
-aneesh


From bfields at fieldses.org  Mon Jun 12 15:00:53 2006
From: bfields at fieldses.org (J. Bruce Fields)
Date: Mon, 12 Jun 2006 11:00:53 -0400
Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
In-Reply-To: <1150089943.26019.18.camel@localhost.localdomain>
References: <1150089943.26019.18.camel@localhost.localdomain>
Message-ID: <20060612150053.GC31596@fieldses.org>

On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote:
> 2. Adding a new flag into "exportfs" command, say "h", such that
> 
>    "exportfs -uh *:/export_path"
> 
> would un-export the entry and drop the NLM locks associated with the
> entry.

What does the kernel interface end up looking like in that case?

--b.


From wcheng at redhat.com  Mon Jun 12 15:44:55 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 11:44:55 -0400
Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface
In-Reply-To: <20060612150053.GC31596@fieldses.org>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<20060612150053.GC31596@fieldses.org>
Message-ID: <448D8BF7.7010105@redhat.com>

J. Bruce Fields wrote:

>On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote:
>  
>
>>2. Adding a new flag into "exportfs" command, say "h", such that
>>
>>   "exportfs -uh *:/export_path"
>>
>>would un-export the entry and drop the NLM locks associated with the
>>entry.
>>    
>>
>
>What does the kernel interface end up looking like in that case?
>
>  
>
Happy to see this new exportfs command gets positive response - it was 
our original pick too.

Uploaded is part of a draft version of 2.4 base kernel patch - we're 
cleaning up 2.6 patches at this moment. It basically adds a new export 
flag (NFSEXP_FOLOCK - note that ex_flags is an int but is currently only 
defined up to 16 bits) so nfs-util and kernel can communicate.

The nice thing about this approach is the recovery part - the take-over 
server can use the counter part command to export and set grace period 
for one particular interface within the same system call.

-- Wendy
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gfs_nlm.patch
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060612/00b260d8/attachment.ksh>

From m_pupil at yahoo.com.cn  Mon Jun 12 15:58:50 2006
From: m_pupil at yahoo.com.cn (FKtPp)
Date: Mon, 12 Jun 2006 23:58:50 +0800 (CST)
Subject: [Linux-cluster] GFS 6.0 lock problem?
Message-ID: <20060612155850.87286.qmail@web15009.mail.cnb.yahoo.com>

hi list,

I recentlt setup a two server GFS cluster. Both of them export a 8G
partition useing gnbd_serv and import eaother with gnbd_import.

I assambled these two gnbd block to a ~17G pool. After that, I run
ccs_servd at one server ( hostname: gfs1 ) to serve the ccs archive,
and
started the ccsd, lock_gulmd.

Then I run gfs_mkfs at gfs1, and mounted the newly created fs on both
servers.

Everything seems ok, before I write something to the fs mounted at
the
other server ( hostname: gfs2 ). When I mkdir at machine gfs2,
machine
gfs1 can't see that directory; but when I mkdir at machine gfs1, gfs2
can see that new directory .

I tried dd a 1GB file to the filesystem at machine gfs2, but at
machine
gfs1, I still see nothing.

How can I figure out what was wrong? The /var/log/message didn't
provide
enough information...


__________________________________________________
???????????????
http://cn.mail.yahoo.com


From aneesh.kumar at gmail.com  Mon Jun 12 17:21:37 2006
From: aneesh.kumar at gmail.com (Aneesh Kumar)
Date: Mon, 12 Jun 2006 22:51:37 +0530
Subject: [Linux-cluster] Re: [RFC] Transport independent Cluster service
In-Reply-To: <cc723f590606120751t3cf57560pcfd6dd3e966cbff9@mail.gmail.com>
References: <cc723f590606120751t3cf57560pcfd6dd3e966cbff9@mail.gmail.com>
Message-ID: <cc723f590606121021t4bd4718aqcbe22a8d39c02c59@mail.gmail.com>

To further explain the simplicity of writing a cluster service using
this framework  you can look at the below  attached icstest service.

with this calling ics_test_print(char *string) will cause the string
to be printed on node 3.

-aneesh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: icstest.svc
Type: application/octet-stream
Size: 1175 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060612/bbf45cf3/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: icstest_cli.c
Type: text/x-csrc
Size: 652 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060612/bbf45cf3/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: icstest_ics.c
Type: text/x-csrc
Size: 1651 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060612/bbf45cf3/attachment-0001.bin>

From wcheng at redhat.com  Mon Jun 12 18:09:30 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 14:09:30 -0400
Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin	interface
In-Reply-To: <448DE1C1.C935.0084.1@novell.com>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<20060612150053.GC31596@fieldses.org>
	<448D8BF7.7010105@redhat.com> <448DE1C1.C935.0084.1@novell.com>
Message-ID: <448DADDA.2070209@redhat.com>

Madhan P wrote:

>For what it's worth, would second this approach of using a flag to
>unexport and associating the cleanup with that. 
>

Happy to have another vote :)  !  It is appreicated.

> Another quick hack we
>used was to store the NSM entries on a standard location on the
>respective exported filesystem, so that notification is sent once the
>filesystem comes back online on the destination server and is exported
>again.  BTW, this was not on Linux. It was a  simple solution providing
>the necessary active/active and active/passive cluster support.
>  
>

Lon Hohberge (from our cluster suite team) has been working on similar 
setup too (to structure the MSM file directory). We'll submit the 
associated kernel patch when it is ready ("rpc.statd -H" needs some 
bandaids). Future reviews and comments are also appreciated.

-- Wendy


From m_pupil at 163.com  Mon Jun 12 15:57:44 2006
From: m_pupil at 163.com (Kai)
Date: Mon, 12 Jun 2006 23:57:44 +0800
Subject: [Linux-cluster] GFS 6.0 lock problem?
Message-ID: <448D8EF8.8030602@163.com>


hi list,

I recentlt setup a two server GFS cluster. Both of them export a 8G
partition useing gnbd_serv and import eaother with gnbd_import.

I assambled these two gnbd block to a ~17G pool. After that, I run
ccs_servd at one server ( hostname: gfs1 ) to serve the ccs archive, and
started the ccsd, lock_gulmd.

Then I run gfs_mkfs at gfs1, and mounted the newly created fs on both
servers.

Everything seems ok, before I write something to the fs mounted at the
other server ( hostname: gfs2 ). When I mkdir at machine gfs2, machine
gfs1 can't see that directory; but when I mkdir at machine gfs1, gfs2
can see that new directory .

I tried dd a 1GB file to the filesystem at machine gfs2, but at machine
gfs1, I still see nothing.

How can I figure out what was wrong? The /var/log/message didn't provide
enough information...


From SteveD at redhat.com  Mon Jun 12 17:23:02 2006
From: SteveD at redhat.com (Steve Dickson)
Date: Mon, 12 Jun 2006 13:23:02 -0400
Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface
In-Reply-To: <448D8BF7.7010105@redhat.com>
References: <1150089943.26019.18.camel@localhost.localdomain>	<20060612150053.GC31596@fieldses.org>
	<448D8BF7.7010105@redhat.com>
Message-ID: <448DA2F6.8080605@RedHat.com>

Wendy Cheng wrote:
> The nice thing about this approach is the recovery part - the take-over 
> server can use the counter part command to export and set grace period 
> for one particular interface within the same system call.
Actually this is a pretty clean and simple interface... imho..
The only issue I had was adding a flag to an older version and then
having to carry that flag forward... So if this interface is
accepted and added to the mainline nfs-utils (which it should be.. imho)
that fact it is so clean and simple would make the back porting fairly
trivial...

steved.


From jmy at sgi.com  Mon Jun 12 17:27:12 2006
From: jmy at sgi.com (James Yarbrough)
Date: Mon, 12 Jun 2006 10:27:12 -0700
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
References: <1150089943.26019.18.camel@localhost.localdomain>
Message-ID: <448DA3F0.AF1C8540@sgi.com>

> 2. Adding a new flag into "exportfs" command, say "h", such that
> 
>    "exportfs -uh *:/export_path"
> 
> would un-export the entry and drop the NLM locks associated with the
> entry.

This is fine for releasing the locks, but how do you plan to re-enter
the grace period for reclaiming the locks when you relocate the export?
And how do you intend to segregate the export for which reclaims are
valid from the ones which are not?  How do you plan to support the
sending of SM_NOTIFY?  This might be where a lockd per export has an
advantage.

-- 
jmy at sgi.com
650 933 3124

Why is there a snake in my Coke?


From PMadhan at novell.com  Mon Jun 12 16:20:57 2006
From: PMadhan at novell.com (Madhan P)
Date: Mon, 12 Jun 2006 10:20:57 -0600
Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin
	interface
In-Reply-To: <448D8BF7.7010105@redhat.com>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<20060612150053.GC31596@fieldses.org> <448D8BF7.7010105@redhat.com>
Message-ID: <448DE1C1.C935.0084.1@novell.com>

For what it's worth, would second this approach of using a flag to
unexport and associating the cleanup with that.  Another quick hack we
used was to store the NSM entries on a standard location on the
respective exported filesystem, so that notification is sent once the
filesystem comes back online on the destination server and is exported
again.  BTW, this was not on Linux. It was a  simple solution providing
the necessary active/active and active/passive cluster support.

- Madhan

>>> On 6/12/2006 at 9:14:55 pm, in message
<448D8BF7.7010105 at redhat.com>, Wendy
Cheng <wcheng at redhat.com> wrote:
> J. Bruce Fields wrote:
> 
>>On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote:
>>  
>>
>>>2. Adding a new flag into "exportfs" command, say "h", such that
>>>
>>>   "exportfs -uh *:/export_path"
>>>
>>>would un-export the entry and drop the NLM locks associated with
the
>>>entry.
>>>    
>>>
>>
>>What does the kernel interface end up looking like in that case?
>>
>>  
>>
> Happy to see this new exportfs command gets positive response - it
was 
> our original pick too.
> 
> Uploaded is part of a draft version of 2.4 base kernel patch - we're

> cleaning up 2.6 patches at this moment. It basically adds a new
export 
> flag (NFSEXP_FOLOCK - note that ex_flags is an int but is currently
only 
> defined up to 16 bits) so nfs-util and kernel can communicate.
> 
> The nice thing about this approach is the recovery part - the
take-over 
> server can use the counter part command to export and set grace
period 
> for one particular interface within the same system call.
> 
> -- Wendy


From rpeterso at redhat.com  Mon Jun 12 16:27:03 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Mon, 12 Jun 2006 11:27:03 -0500
Subject: [Linux-cluster] Updates to gfs2 user tools
Message-ID: <448D95D7.5040403@redhat.com>

Hi Folks,

Attached are my latest patches to the GFS2 user tools.  Summary of the 
changes:

libgfs2 - Porting of more functions from libgfs1 and fsck, mostly for 
gfs2_convert's sake.
edit - Fixed a bug regarding the printing of stuffed directories (e.g. 
-p masterdir)
convert - Got rid of libgfs dependency.  Now does everything the libgfs2 
way.
      Also some cleanup and tool standardization.
fsck - Made all block values print out in decimal and hex.
     Moved functions to libgfs2 so gfs2_convert may use them.
     Change confusing l+f directory to "lost+found" to be compatible 
with e2fsprogs.
man - Added man page for gfs2_convert.  Renamed gfs2_mkfs man page to 
mkfs.gfs2
     Deleted mkfs.gfs2 man page references to gulm.

Feel free to send questions, comments or concerns.
As usual, if I don't hear anything to the contrary, I will commit these 
changes to CVS
in a day or two.  (Maybe tomorrow due to build considerations).

Regards,

Bob Peterson
Red Hat Cluster Suite

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: convert.update2.txt
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060612/0702af3f/attachment.txt>

From wcheng at redhat.com  Mon Jun 12 19:07:23 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 15:07:23 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <448DA3F0.AF1C8540@sgi.com>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<448DA3F0.AF1C8540@sgi.com>
Message-ID: <448DBB6B.7000408@redhat.com>

James Yarbrough wrote:

>>2. Adding a new flag into "exportfs" command, say "h", such that
>>
>>   "exportfs -uh *:/export_path"
>>
>>would un-export the entry and drop the NLM locks associated with the
>>entry.
>>    
>>
>
>This is fine for releasing the locks, but how do you plan to re-enter
>the grace period for reclaiming the locks when you relocate the export?
>And how do you intend to segregate the export for which reclaims are
>valid from the ones which are not?  How do you plan to support the
>sending of SM_NOTIFY?  This might be where a lockd per export has an
>advantage.
>
>  
>
Yeah, that's why Peter's idea (different lockd(s)) is also attractive. 
However, on the practical side, we don't plan to introduce kernel 
patches agressively. The approach is to be away from mainline NLM code 
base until we have enough QA cycles to make sure things work. The 
unexport part would allow other nfs services on the taken-over server 
un-interrupted. On the take-over server side, we currently do a global 
grace period. The plan has been to put a little delay before fixing 
take-over server's logic due to other NLM/posix lock issues - for 
example, the current (linux) NLM doesn't bother to call filesystem's 
lock method (which virtually disables any cluster filesystem's NFS 
locking across different NFS servers). However, if we have enough 
resources and/or volunteers, we may do these things in parallel. The 
following are planned:

Take-over server logic:
1. setup the statd sm file (currently /var/lib/nfs/statd/sm or the
    equivalent configured directory) properly.
2. rpc.statd is dispatched with "--ha-callout" option.
3. implement the ha-callout user mode program to create a seperate
    statd sm files for each exported ip.
4. export the target filesystem and set up grace period based on
    fsid (or devno). It will be used in NLM procedure calls by
    extracting the fsid (or devno) from nfs file handle to decide
    accepting or reject the not-reclaiming requests.
5. bring up the failover IP address.
6. send SM_NOTIFY to client machines using the configured sm
    directory created by the ha-callout program (rpc.statd -N -P).

Step 4 will be the counter-part of our unexport flag.

-- Wendy

 
From wcheng at redhat.com  Tue Jun 13 03:39:31 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 12 Jun 2006 23:39:31 -0400
Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <1150169971.27203.1.camel@localhost.localdomain>

On Mon, 2006-06-12 at 09:45 -0500, Stanley, Jon wrote:
>  
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng
> > Sent: Monday, June 12, 2006 12:26 AM
> > To: nfs at lists.sourceforge.net
> > Cc: linux-cluster at redhat.com
> > Subject: [Linux-cluster] [RFC] NLM lock failover admin interface

Jon, Thank you for review this - it helps !

-- Wendy

> > 
> > 1. /proc interface, say writing the fsid into a /proc directory entry
> > would end up dropping all NLM locks associated with the NFS 
> > export that
> > has fsid in its /etc/exports file.
> 
> This would defintely have it's advantages for people who know what
> they're doing - they could drop all locks without unexporting the
> filesystem.  However, it also gives people the opportunity to shoot
> themselves in the foot - by eliminating locks that are needed.  After
> weighing the pros and cons, I really don't think that any method
> accessible via /proc is a good idea.
> 
> > 
> > 2. Adding a new flag into "exportfs" command, say "h", such that
> > 
> >    "exportfs -uh *:/export_path"
> > 
> > would un-export the entry and drop the NLM locks associated with the
> > entry.
> > 
> 
> This is the best of the three, IMHO.  Gives you the safety of *knowing*
> that the filesystem was unexported before dropping the locks, and
> preventing folks from shooting themselves in the foot.
> 
> The other option that was mentioned, a separate lockd for each fs, is
> also a good idea - but would require a lot of coding no doubt, and
> introduce more instability into what I already preceive as an unstable
> NFS subsystem in Linux (I *refuse* to use Linux as an NFS server and
> instead go with Solaris - I've had *really* bad experiences with Linux
> NFS under load - but that's getting OT).
> 
> 
> _______________________________________________
> NFS maillist  -  NFS at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs


From fabbione at fabbione.net  Tue Jun 13 06:22:51 2006
From: fabbione at fabbione.net (Fabio Massimo Di Nitto)
Date: Tue, 13 Jun 2006 08:22:51 +0200
Subject: [Linux-cluster] [PATCH] Fix build failure for gettid.c
Message-ID: <448E59BB.3080503@fabbione.net>


diff -urNad --exclude=CVS --exclude=.svn ./rgmanager/src/clulib/gettid.c
/usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/c
lulib/gettid.c
--- ./rgmanager/src/clulib/gettid.c     2005-06-21 20:07:33.000000000 +0200
+++
/usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/clulib/gettid.c
 2005-07-06 06:40:22.000000000 +0200
@@ -1,7 +1,9 @@
 #include <sys/types.h>
+#include <sys/syscall.h>
 #include <linux/unistd.h>
 #include <gettid.h>
 #include <errno.h>
+#include <unistd.h>

 /* Patch from Adam Conrad / Ubuntu: Don't use _syscall macro */


This applies to the above in the STABLE branch and HEAD. The same fix needs to
be propagated to cman/qdisk/gettid.c for HEAD (yay for duplicate code ;)).

Fabio

-- 
I'm going to make him an offer he can't refuse.


From wcheng at redhat.com  Tue Jun 13 07:00:11 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 13 Jun 2006 03:00:11 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <17550.11870.186706.36949@cse.unsw.edu.au>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
Message-ID: <1150182012.27203.42.camel@localhost.localdomain>

On Tue, 2006-06-13 at 13:17 +1000, Neil Brown wrote:

> So:
>  I think if we really want to "remove all NFS locks on a filesystem",
>  we could probably tie it into umount - maybe have lockd register some
>  callback which gets called just before s_op->umount_begin.

The "umount_begin" idea was one time on my list but got discarded. The
thought was that nfsd was not a filesystem, neither was lockd. How to
register something with VFS umount for non-filesystem kernel modules ?
Invent another autofs-like pseudo filesystem ? Mostly, not every
filesystem would like to get un-mounted upon failover (GFS, for example,
does not get un-mounted by our cluster suite upon failover).

>  If we want to remove all locks that arrived on a particular
>  interface, then we should arrange to do exactly that.  There are a
>  number of different options here. 
>   One is the multiple-lockd-threads idea.

Certainly a good option. To make it happen, we still need admin
interface. How to pass IP address from user mode into kernel - care to
give this some suggestions if you have them handy ? Should socket ports
get dynamics assigned ? Will we have scalibility issues ? 
 
>   One is to register a callback when an interface is shut down.
>   Another (possibly the best) is to arrange a new signal for lockd
>   which say "Drop any locks which were sent to IP addresses that are
>   no longer valid local addresses".

These, again, give individual filesystem no freedom to adjust what they
need upon failover. But I'll check them out this week - maybe there are
good socket layer hooks that I overlook. 

> 
> So those are my thoughts.  Do any of them seem reasonable to you?
> 

The comments are greatly appreciated. And hopefully we can reach
agreement soon.   

-- Wendy


From mdl at veles.ru  Tue Jun 13 08:23:00 2006
From: mdl at veles.ru (Denis Medvedev)
Date: Tue, 13 Jun 2006 12:23:00 +0400
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <Pine.LNX.4.63.0605221041220.6963@mail.kivi.com.tr>
References: <4462F013.40201@gmail.com>	<Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>	<446D734C.60308@gmail.com>	<Pine.LNX.4.63.0605191129300.7386@mail.kivi.com.tr>	<44716765.1000000@gmail.com>
	<Pine.LNX.4.63.0605221041220.6963@mail.kivi.com.tr>
Message-ID: <448E75E4.1020001@veles.ru>

Devrim GUNDUZ wrote:

>
> Hi,
>
> On Mon, 22 May 2006, carlopmart wrote:
>
>> No Devrim, I mean which can be the best form to setup replication 
>> between master and slave ... and when master goes down, if I put data 
>> on slave how can I update master node.
>
>
> You don't need a replication system there. Use an SAN :)
>
> Here is the schema:
>
> +-------+ +-------+
> |Master | |Slave |
> | | | |
> |Node | |Node |
> +-------+ +-------+
> | |
> --------------------------
> |
> |
> +--------------
> | |
> | SAN or NAS |
> | |
> +-------------+
>
>
> You will install just operating system,PostgreSQL binaries and Cluster 
> tools to both Master and Slave nodes. All $PGDATA will reside in the 
> storage.
>
> If master node goes down, Slave node will mount $PGDATA and continue 
> working. When master node is up, slave will umount $PGDATA, stop its 
> postmaster and will trigger master node and start its postmaster so 
> that data will not be corrupted.

And now we have a SPOF - SAN or NAS... how to get beyond that?

>
> Regards,
> -- 
> Devrim GUNDUZ
> Kivi Bili?im Teknolojileri - http://www.kivi.com.tr
> devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
> http://www.gunduz.org
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>


From carlopmart at gmail.com  Tue Jun 13 08:23:53 2006
From: carlopmart at gmail.com (C. L. Martinez)
Date: Tue, 13 Jun 2006 10:23:53 +0200
Subject: [Linux-cluster] Problems with ccsd
Message-ID: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>

Hi all,

 I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries to
start returns me this error:

 [root at srvimss1 init.d]# ccsd
Failed to connect to cluster manager.
Hint: Magma plugins are not in the right spot.

 How can I fix this?? Where is the problem??

 My cluster.conf:

 <?xml version="1.0" ?>
<cluster config_version="2" name="IMSS_Cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="srvimss1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence_gnbd"
nodename="srvimss1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="srvimss2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence_gnbd"
nodename="srvimss2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_gnbd" name="fence_gnbd"
servers="srvmgmt"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="PriCluster" ordered="1"
restricted="1">
                                <failoverdomainnode name="srvimss1"
priority="1"/>
                                <failoverdomainnode name="srvimss2"
priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="SecCluster" ordered="1"
restricted="1">
                                <failoverdomainnode name="srvimss1"
priority="2"/>
                                <failoverdomainnode name="srvimss2"
priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
        </rm>
</cluster>

-- 
C.L. Martinez
clopmart at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060613/0a72f516/attachment.htm>

From lvermeiren at core-it.be  Tue Jun 13 08:28:02 2006
From: lvermeiren at core-it.be (lvermeiren at core-it.be)
Date: Tue, 13 Jun 2006 10:28:02 +0200
Subject: [Linux-cluster] Postgresql under RHCS4
Message-ID: <200606130828.k5D8S2Ut023931@outmx003.isp.belgacom.be>

> > On Mon, 22 May 2006, carlopmart wrote:
> >> No Devrim, I mean which can be the best form to setup replication 
> >> between master and slave ... and when master goes down, if I put data 
> >> on slave how can I update master node.
> >
> > You don't need a replication system there. Use an SAN :)
> And now we have a SPOF - SAN or NAS... how to get beyond that?

Redundant power, cabling, switches, controllers + raid >= 1, preferably with some hotspares configured.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060613/a47d9ca4/attachment.htm>

From riaan at obsidian.co.za  Tue Jun 13 09:44:25 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Tue, 13 Jun 2006 11:44:25 +0200 (SAST)
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <448E75E4.1020001@veles.ru>
Message-ID: <Pine.LNX.4.33.0606131034030.32632-100000@isis.obsidian.co.za>

> >
> > You don't need a replication system there. Use an SAN :)
> >
> 
> And now we have a SPOF - SAN or NAS... how to get beyond that?
> 
> >
If you do not have or want a SAN, you can investigate one of the replicated file 
systems or block devices. Here are some:

- Radiant Data PeerFS (we had a customer running this but had too many 
problems and had to disable it.)

- NetVault Replicator (http://www.bakbone.com/products/replication/) - 
looks pretty good, but havent tried it myself. They even have a whitepaper 
on using Replicator with GFS on the above page.

- DRBD (Distributed Replicated Block Device) - www.drdb.org
and with commercial support at http://www.linbit.com/linhac_drbd.html?L=1
They are my preferred choice (based on a hands-off evaluation) since they are 
OSS, but the documentation is not quite there yet, relative to the other 
two.

If anyone has had good/bad experiences with these or any other replicated
block devices / file systems, myself (and I am sure the original author)  
would like to hear them. I have a significant number of smaller customers
who want HA and RHCS, but a SAN is just not a cost-effective option in 
those environments.

Riaan


From gforte at leopard.us.udel.edu  Tue Jun 13 12:39:18 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 13 Jun 2006 08:39:18 -0400
Subject: [Linux-cluster] clurgmgrd logging wierdness
Message-ID: <448EB1F6.4000107@leopard.us.udel.edu>

Yesterday I added new <failoverdomain>, <resource>, and <service> 
sections to my cluster.conf (for a failover-able samba service, though I 
don't think that's relevant).  I incremented the version, and ran 
ccs_tool update and cman_tool version -r.

Today I noticed that the only status checks being logged in 
/var/log/messages were the ones for the smb service on the node running 
it.  Prior to my changes, all status checks were being logged on both 
nodes.  All cluster services were still running properly, but it looks 
like the status checks on everything but the smb service stopped.

After forcing a restart of rgmanager on both nodes, status checks (or at 
least logging of them) is back to normal.

Is this a bug, or am I missing something?

-g


From cjk at techma.com  Tue Jun 13 13:00:46 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Tue, 13 Jun 2006 09:00:46 -0400
Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem....
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EBC@tmaemail.techma.com>

Ok, this is a doozie, someone used a standard fsck against my GFS filesystem
from a single node. 
I can, for some reason still access the filesystem from other nodes and
things look ok. I've never had 
this happen to my systems before and quite frankly am at a loss as to what my
options are. The node
from which the fsck was run, hangs when tryng to mount the filesystem so I
believe it's a problem with
the journals. Is there a way to recover from this other then completely
rebuilding the filesystem?


Corey
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060613/7ab43355/attachment.htm>

From rpeterso at redhat.com  Tue Jun 13 13:47:53 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 13 Jun 2006 08:47:53 -0500
Subject: [Linux-cluster] Problems with ccsd
In-Reply-To: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>
References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>
Message-ID: <448EC209.1070905@redhat.com>

C. L. Martinez wrote:
> Hi all,
>
>  I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries 
> to start returns me this error:
>
>  [root at srvimss1 init.d]# ccsd
> Failed to connect to cluster manager.
> Hint: Magma plugins are not in the right spot.
>
>  How can I fix this?? Where is the problem??
>
> -- 
> C.L. Martinez
> clopmart at gmail.com <mailto:clopmart at gmail.com> 
Hi C.L.

I don't know about your particular scenario, but, every time I've gotten 
this message in
the past, it's meant that I build the Red Hat Cluster Suite by hand 
(i.e. compiling it, not
adding it with RPMs or up2date, etc.) and somehow did something wrong.

The solution I've used to fix it is:

cd cluster; make uninstall; make distclean; ./configure; make install

(Assuming the cluster suite source resides in directory "cluster").
If you installed from RPMs or through the GUI, this isn't the solution,
in which case you should let me know and I'll see what I can do.  I hope 
this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From lhh at redhat.com  Tue Jun 13 13:39:49 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 13 Jun 2006 09:39:49 -0400
Subject: [Linux-cluster] [PATCH] Fix build failure for gettid.c
In-Reply-To: <448E59BB.3080503@fabbione.net>
References: <448E59BB.3080503@fabbione.net>
Message-ID: <1150205989.20766.227.camel@ayanami.boston.redhat.com>

On Tue, 2006-06-13 at 08:22 +0200, Fabio Massimo Di Nitto wrote:
> diff -urNad --exclude=CVS --exclude=.svn ./rgmanager/src/clulib/gettid.c
> /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/c
> lulib/gettid.c
> --- ./rgmanager/src/clulib/gettid.c     2005-06-21 20:07:33.000000000 +0200
> +++
> /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/clulib/gettid.c
>  2005-07-06 06:40:22.000000000 +0200
> @@ -1,7 +1,9 @@
>  #include <sys/types.h>
> +#include <sys/syscall.h>
>  #include <linux/unistd.h>
>  #include <gettid.h>
>  #include <errno.h>
> +#include <unistd.h>
> 
>  /* Patch from Adam Conrad / Ubuntu: Don't use _syscall macro */

Looks good, I'll put it in today.

-- Lon


From neilb at suse.de  Tue Jun 13 03:17:50 2006
From: neilb at suse.de (Neil Brown)
Date: Tue, 13 Jun 2006 13:17:50 +1000
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: message from Wendy Cheng on Monday June 12
References: <1150089943.26019.18.camel@localhost.localdomain>
Message-ID: <17550.11870.186706.36949@cse.unsw.edu.au>

On Monday June 12, wcheng at redhat.com wrote:
> NFS v2/v3 active-active NLM lock failover has been an issue with our
> cluster suite. With current implementation, it (cluster suite) is trying
> to carry the workaround as much as it can with user mode scripts where,
> upon failover, on taken-over server, it:
> 
> 1. Tear down virtual IP.
> 2. Unexport the subject NFS export.
> 3. Signal lockd to drop the locks.
> 4. Un-mount filesystem if needed.
> 
...
>                                                                 we would
> like to be able to selectively drop locks (only) associated with the
> requested exports without disrupting other NFS services. 

There seems to be an unstated assumption here that there is one
virtual IP per exported filesystem.  Is that true?

Assuming it is and that I understand properly what you want to do....

I think that maybe the right thing to do is *not* drop the locks on a
particular filesystem, but to drop the locks made to a particular
virtual IP.

Then it would make a lot of sense to have one lockd thread per IP, and
signal the lockd in order to drop the locks.
True: that might be more code.  But if it is the right thing to do,
then it should be done that way.

On the other hand, I can see a value in removing all the locks for a
particular filesytem quite independent of failover requirements.
If I want to force-unmount a filesystem, I need to unexport it, and I
need to kill all the locks.  Currently you can only remove locks from
all filesystems, which might not be ideal.

I'm not at all keen on the NFSEXP_FOLOCK flag to exp_unexport, as that
is an interface that I would like to discard eventually.  The
preferred mechanism for exporting filesystems is to flush the
appropriate 'cache', and allow it to be repopulated with whatever is
still valid via upcalls to mountd.

So:
 I think if we really want to "remove all NFS locks on a filesystem",
 we could probably tie it into umount - maybe have lockd register some
 callback which gets called just before s_op->umount_begin.

 If we want to remove all locks that arrived on a particular
 interface, then we should arrange to do exactly that.  There are a
 number of different options here. 
  One is the multiple-lockd-threads idea.
  One is to register a callback when an interface is shut down.
  Another (possibly the best) is to arrange a new signal for lockd
  which say "Drop any locks which were sent to IP addresses that are
  no longer valid local addresses".

So those are my thoughts.  Do any of them seem reasonable to you?

NeilBrown


From neilb at suse.de  Tue Jun 13 07:08:13 2006
From: neilb at suse.de (Neil Brown)
Date: Tue, 13 Jun 2006 17:08:13 +1000
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: message from Wendy Cheng on Tuesday June 13
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150182012.27203.42.camel@localhost.localdomain>
Message-ID: <17550.25693.507553.731606@cse.unsw.edu.au>

On Tuesday June 13, wcheng at redhat.com wrote:
> >   One is to register a callback when an interface is shut down.
> >   Another (possibly the best) is to arrange a new signal for lockd
> >   which say "Drop any locks which were sent to IP addresses that are
> >   no longer valid local addresses".
> 
> These, again, give individual filesystem no freedom to adjust what they
> need upon failover. But I'll check them out this week - maybe there are
> good socket layer hooks that I overlook. 
> 

Can you say more about what sort of adjustments an individual filesystem
might want the freedom to make?  It might help me understand the
issues better.

Thanks,
NeilBrown


From brentonr at dorm.org  Tue Jun 13 14:10:27 2006
From: brentonr at dorm.org (Brenton Rothchild)
Date: Tue, 13 Jun 2006 09:10:27 -0500
Subject: [Linux-cluster] Problems with ccsd
In-Reply-To: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>
References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>
Message-ID: <448EC753.1010505@dorm.org>

IIRC, I've seen this when the magma-plugins RPM wasn't installed,
if you're using RPMS that is :)

-Brenton Rothchild


C. L. Martinez wrote:
> Hi all,
> 
>  I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries to 
> start returns me this error:
> 
>  [root at srvimss1 init.d]# ccsd
> Failed to connect to cluster manager.
> Hint: Magma plugins are not in the right spot.
> 
>  How can I fix this?? Where is the problem??
> 
>  My cluster.conf:
> 
>  <?xml version="1.0" ?>
> <cluster config_version="2" name="IMSS_Cluster">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="srvimss1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="fence_gnbd" 
> nodename="srvimss1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="srvimss2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="fence_gnbd" 
> nodename="srvimss2"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_gnbd" name="fence_gnbd" 
> servers="srvmgmt"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="PriCluster" ordered="1" 
> restricted="1">
>                                 <failoverdomainnode name="srvimss1" 
> priority="1"/>
>                                 <failoverdomainnode name="srvimss2" 
> priority="2"/>
>                         </failoverdomain>
>                         <failoverdomain name="SecCluster" ordered="1" 
> restricted="1">
>                                 <failoverdomainnode name="srvimss1" 
> priority="2"/>
>                                 <failoverdomainnode name="srvimss2" 
> priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources/>
>         </rm>
> </cluster>
> 
> -- 
> C.L. Martinez
> clopmart at gmail.com <mailto:clopmart at gmail.com>
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From teigland at redhat.com  Tue Jun 13 14:17:46 2006
From: teigland at redhat.com (David Teigland)
Date: Tue, 13 Jun 2006 09:17:46 -0500
Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem....
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EBC@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EBC@tmaemail.techma.com>
Message-ID: <20060613141746.GA20730@redhat.com>

On Tue, Jun 13, 2006 at 09:00:46AM -0400, Kovacs, Corey J. wrote:
> Ok, this is a doozie, someone used a standard fsck against my GFS filesystem
> from a single node. 
> I can, for some reason still access the filesystem from other nodes and
> things look ok. I've never had 
> this happen to my systems before and quite frankly am at a loss as to what my
> options are. The node
> from which the fsck was run, hangs when tryng to mount the filesystem so I
> believe it's a problem with
> the journals. Is there a way to recover from this other then completely
> rebuilding the filesystem?

I would unmount all the nodes and have one run gfs_fsck.

Dave


From lhh at redhat.com  Tue Jun 13 14:21:42 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 13 Jun 2006 10:21:42 -0400
Subject: [Linux-cluster] clurgmgrd logging wierdness
In-Reply-To: <448EB1F6.4000107@leopard.us.udel.edu>
References: <448EB1F6.4000107@leopard.us.udel.edu>
Message-ID: <1150208502.20766.230.camel@ayanami.boston.redhat.com>

On Tue, 2006-06-13 at 08:39 -0400, Greg Forte wrote:
> Yesterday I added new <failoverdomain>, <resource>, and <service> 
> sections to my cluster.conf (for a failover-able samba service, though I 
> don't think that's relevant).  I incremented the version, and ran 
> ccs_tool update and cman_tool version -r.
> 
> Today I noticed that the only status checks being logged in 
> /var/log/messages were the ones for the smb service on the node running 
> it.  Prior to my changes, all status checks were being logged on both 
> nodes.  All cluster services were still running properly, but it looks 
> like the status checks on everything but the smb service stopped.
> 
> After forcing a restart of rgmanager on both nodes, status checks (or at 
> least logging of them) is back to normal.
> 
> Is this a bug, or am I missing something?

It might be a bug, but there's not enough information to tell right now.
Did the services remain in the 'started' state, or did one or more get
stopped for some reason after the transition?

-- Lon


From cjk at techma.com  Tue Jun 13 14:29:19 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Tue, 13 Jun 2006 10:29:19 -0400
Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem....
In-Reply-To: <20060613141746.GA20730@redhat.com>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EBD@tmaemail.techma.com>

Ok, next time I'll take a better look at my logs....


I rebooted the node in question while watching logs of another.
problem node booted and logs reflected that gfs wasn't letting it
join cuz it was expired (seems this cluster has other problems).
I rebooted all nodes and all nodes joined the cluster.

So, after examining my fencing config, it appears that my problem
node does not have it's fence device configured properly. I can see 
the filesystem from all nodes and things look OK.

I will definitely be doing a "gfs_fsck" against the fs though just 
in case ,filesystems unmounted of course :) 

Once more lesson in getting clear details when trying to fix problems
induced by others


Cheers


Corey

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com] 
Sent: Tuesday, June 13, 2006 10:18 AM
To: Kovacs, Corey J.
Cc: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] Someone FSCK'd my GFS filesystem....

On Tue, Jun 13, 2006 at 09:00:46AM -0400, Kovacs, Corey J. wrote:
> Ok, this is a doozie, someone used a standard fsck against my GFS 
> filesystem from a single node.
> I can, for some reason still access the filesystem from other nodes 
> and things look ok. I've never had this happen to my systems before 
> and quite frankly am at a loss as to what my options are. The node 
> from which the fsck was run, hangs when tryng to mount the filesystem 
> so I believe it's a problem with the journals. Is there a way to 
> recover from this other then completely rebuilding the filesystem?

I would unmount all the nodes and have one run gfs_fsck.

Dave


From gforte at leopard.us.udel.edu  Tue Jun 13 14:27:33 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 13 Jun 2006 10:27:33 -0400
Subject: [Linux-cluster] clurgmgrd logging wierdness
In-Reply-To: <1150208502.20766.230.camel@ayanami.boston.redhat.com>
References: <448EB1F6.4000107@leopard.us.udel.edu>
	<1150208502.20766.230.camel@ayanami.boston.redhat.com>
Message-ID: <448ECB55.7030701@leopard.us.udel.edu>

Lon Hohberger wrote:
> On Tue, 2006-06-13 at 08:39 -0400, Greg Forte wrote:
>> Yesterday I added new <failoverdomain>, <resource>, and <service> 
>> sections to my cluster.conf (for a failover-able samba service, though I 
>> don't think that's relevant).  I incremented the version, and ran 
>> ccs_tool update and cman_tool version -r.
>>
>> Today I noticed that the only status checks being logged in 
>> /var/log/messages were the ones for the smb service on the node running 
>> it.  Prior to my changes, all status checks were being logged on both 
>> nodes.  All cluster services were still running properly, but it looks 
>> like the status checks on everything but the smb service stopped.
>>
>> After forcing a restart of rgmanager on both nodes, status checks (or at 
>> least logging of them) is back to normal.
>>
>> Is this a bug, or am I missing something?
> 
> It might be a bug, but there's not enough information to tell right now.
> Did the services remain in the 'started' state, or did one or more get
> stopped for some reason after the transition?

AFAIK, none of the services ever left the 'started' state, except for 
the samba service which got started right after the update.  In fact, 
they couldn't have stopped, because if any of them had my monitoring 
agent on another box would've squawked.

-g


From malexand at wu-wien.ac.at  Tue Jun 13 17:32:46 2006
From: malexand at wu-wien.ac.at (Michael Alexander)
Date: Tue, 13 Jun 2006 19:32:46 +0200
Subject: [Linux-cluster] CfP Workshop on XEN in HPC Cluster and Grid
	Computing Environments (XHPC)
Message-ID: <CCF57296-74CE-42C7-9B39-AED658082746@wu-wien.ac.at>


===============================================================
CALL FOR PAPERS (XHPC'06)

Workshop on XEN in High-Performance Cluster and Grid Computing
Environments as part of:

The Fourth International Symposium on Parallel and Distributed
Processing and Applications (ISPA'2006). Sorrento, Italy
===============================================================

Date: 1-4 December 2006

ISPA'2006:  http://www.ispa-conference.org/2006/
Workshop URL: http://xhpc.ai.wu-wien.ac.at/ws/

(due date: August 4, Abstracts Jul 17)


Scope:
The Xen virtual machine monitor is reaching wide-spread adoption
in a variety of operating systems as well as scientific educational
and operational usage areas. With its low overhead, Xen allows for
concurrently running large numbers of virtual machines, providing
each encapsulation, isolation and network-wide CPU migratability.
Xen offers a network-wide abstraction layer of individual machine
resources to OS environments, thereby opening whole new cluster-and
grid high-performance computing (HPC) architectures and HPC services
options. With Xen finding applications in HPC environments, this
workshop aims to bring together researchers and practitioners active
on Xen in high-performance cluster and grid computing environments.

The workshop will be one day in length, composed of 20 min paper
presentations, each followed by 10 min discussion sections.
Presentations may be accompanied with interactive demonstrations.
The workshop will end with a 30 min panel discussion by presenters.


TOPICS

Topics include, but are not limited to, the following subject matters:

   - Xen in cluster and grid environments
   - Workload characterizations for Xen-based clusters
   - Xen cluster and grid architectures
   - Cluster reliability, fault-tolerance, and security
   - Compute job entry and scheduling
   - Compute workload load levelling
   - Cluster and grid filesystems for Xen
   - Research and education use cases
   - VM cluster distribution algorithms
   - MPI, PVM  on virtual machines
   - System sizing
   - High-speed interconnects in Xen
   - Xen extensions and utilities for cluster and grid computing
   - Network architectures for Xen clusters
   - Xen on large SMP machines
   - Measuring performance
   - Performance tuning of Xen domains
   - Xen performance tuning on various load types
   - Xen cluster/grid tools
   - Management of Xen clusters


PAPER SUBMISSION

Papers submitted to each workshop will be reviewed by at least three
members of the program committee and external reviewers. Submissions
should include abstract, key words, the e-mail address of the
corresponding author, and must not exceed 15 pages, including tables
and figures, and preferably be in LaTeX or FrameMaker, although
submissions in the LNCS Word format will be accepted as well.
Electronic submission through the submission website is strongly
encouraged. Hardcopies will be accepted only if electronic submission
is not possible. Submission of a paper should be regarded as a
commitment that, should the paper be accepted, at least one of the
authors will register and attend the conference to present the work.
An award for best student paper will be given.

http://isda2006.ujn.edu.cn/isda/author/submit.php

Format should be according to the Springer LNCS Style
http://www.springer.de/comp/lncs/authors.html

It is expected that the proceedings of the workshop programs will
be published by Springer's LNCS series or IEEE CS.


IMPORTANT DATES

July 17, 2006 - Abstract submissions due
Paper submission due: August 4, 2006
Acceptance notification: September 1, 2006
Camera-ready due: September 20, 2006
Conference: December 1-4, 2006


CHAIR

Michael Alexander (chair), WU Vienna, Austria
Geyong Min (co-chair), University of Bradford, UK
Gudula Ruenger (co-chair), Chemnitz University of Technology, Germany


PROGRAM COMMITTEE

Franck Cappello, CNRS-Universit? Paris-Sud, France
Claudia Eckert, Fraunhofer-Institute, Germany
Rob Gardner, HP Labs, USA
Marcus Hardt, Forschungszentrum Karlsruhe, Germany
Sverre Jarp, CERN, Switzerland
Thomas Lange, University of Cologne, Germany
Ronald Luijten, IBM Research Laboratory, Zurich, Switzerland
Klaus Ita, WU Vienna, Austria
Franco Travostino, Nortel CTO Office, USA
Andreas Unterkircher, CERN, Switzerland


GENERAL INFORMATION

This workshop will be held as part of ISPA 2006 in Sorrento, Italy -
http://www.sorrentoinfo.com/sorrento/sorrento_italy.asp
A pre-conference trip to the ESA ESRIN facility in Frascati on
November 30 will be organized.


From rpeterso at redhat.com  Tue Jun 13 18:09:30 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 13 Jun 2006 13:09:30 -0500
Subject: [Linux-cluster] New Mailing List: cluster-devel
Message-ID: <448EFF5A.9060604@redhat.com>

Hi Cluster People,

Lately, a few of us have been sending cluster development patches and such
to linux-cluster for things like gfs2.  We recently decided this was too 
much
"noise" for this mailing list.  Still, we wanted to keep everyone here 
informed
and in the development loop so everyone has a chance to contribute and 
participate.
So we created a new public mailing list called "cluster-devel". 
You can subscribe to cluster-devel from this web page:

https://www.redhat.com/mailman/listinfo/cluster-devel

All CVS commit messages are automatically sent there and they contain a 
diff of
the changes, and that makes it easier to see the changes and comment on 
them.

I encourage everyone to subscribe to the new cluster-devel mailing list, 
so you
can submit patches, make suggestions, read about the latest development 
efforts,
tell us where we fall short, or stay informed regarding cluster 
development issues.

I'll still use linux-cluster as an open forum for general discussion and 
solving clustering
issues and problems, but I'll try to move my development issues over to 
cluster-devel.

Regards,

Bob Peterson
Red Hat Cluster Suite


From wcheng at redhat.com  Wed Jun 14 06:54:51 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Wed, 14 Jun 2006 02:54:51 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <17550.11870.186706.36949@cse.unsw.edu.au>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
Message-ID: <1150268091.28264.75.camel@localhost.localdomain>

Hi,

KABI (kernel application binary interface) commitment is a big thing
from our end - so I would like to focus more on the interface agreement
before jumping into coding and implementation details. 

>   One is the multiple-lockd-threads idea.

Assume we still have this on the table.... Could I expect the admin
interface goes thru rpc.lockd command (man page and nfs-util code
changes) ? The modified command will take similar options as rpc.statd;
more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the
individual IP (socket address) to kernel, we'll need nfsctl with struct
nfsctl_svc modified.

For the kernel piece, since we're there anyway, could we have the
individual lockd IP interface passed to SM (statd) (in SM_MON call) ?
This would allow statd to structure its SM files based on each lockd IP
address, an important part of lock recovery.

>   One is to register a callback when an interface is shut down.

Haven't checked out (linux) socket interface yet. I'm very fuzzy how
this can be done. Anyone has good ideas ? 

>   Another (possibly the best) is to arrange a new signal for lockd
>   which say "Drop any locks which were sent to IP addresses that are
>   no longer valid local addresses".

Very appealing - but the devil's always in the details. How to decide
which IP address is no longer valid ? Or how does lockd know about these
IP addresses ? And how to associate one particular IP address with the
"struct nlm_file" entries within nlm_files list ? Need few more days to
sort this out (or any one already has ideas in mind ?).

-- Wendy


From riaan at obsidian.co.za  Wed Jun 14 11:27:25 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Wed, 14 Jun 2006 13:27:25 +0200 (SAST)
Subject: [Linux-cluster] Red Hat Summit presentations
Message-ID: <Pine.LNX.4.33.0606141326300.21187-100000@isis.obsidian.co.za>

For anyone interested in the Red Hat Summit presentations, they are 
available on-line now. 

The presentations on Clustering and Storage are 
available here: http://www.redhat.com/promo/summit/presentations/cns.htm

Riaan


From hch at infradead.org  Wed Jun 14 11:36:05 2006
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 14 Jun 2006 12:36:05 +0100
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <1150268091.28264.75.camel@localhost.localdomain>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
Message-ID: <20060614113605.GA28158@infradead.org>

On Wed, Jun 14, 2006 at 02:54:51AM -0400, Wendy Cheng wrote:
> Hi,
> 
> KABI (kernel application binary interface) commitment is a big thing
> from our end - so I would like to focus more on the interface agreement
> before jumping into coding and implementation details. 

Please stop this crap now.  If zou don't get that there is no kernel internal
ABI and there never will be get a different job ASAP.


From wcheng at redhat.com  Wed Jun 14 13:39:04 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Wed, 14 Jun 2006 09:39:04 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <20060614113605.GA28158@infradead.org>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<20060614113605.GA28158@infradead.org>
Message-ID: <1150292344.28264.87.camel@localhost.localdomain>

On Wed, 2006-06-14 at 12:36 +0100, Christoph Hellwig wrote:
> On Wed, Jun 14, 2006 at 02:54:51AM -0400, Wendy Cheng wrote:
> > Hi,
> > 
> > KABI (kernel application binary interface) commitment is a big thing
> > from our end - so I would like to focus more on the interface agreement
> > before jumping into coding and implementation details. 
> 
> Please stop this crap now.  If zou don't get that there is no kernel internal
> ABI and there never will be get a different job ASAP.

Actually I don't quite understand this statement (sorry! English is not
my native language) but it is ok. People are entitled for different
opinions and I respect yours.  

On the technical side, just a pre-cautious, in case we need to touch
some kernel export symbols so it would be nice to have external (and
admin) interfaces decided before we start to code. 

So I'll not talk about this and I assume we can keep focusing on NLM
issues. No more noises from each other. Fair ?

-- Wendy


From wcheng at redhat.com  Wed Jun 14 14:00:54 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Wed, 14 Jun 2006 10:00:54 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <1150268091.28264.75.camel@localhost.localdomain>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
Message-ID: <1150293654.28264.91.camel@localhost.localdomain>

On Wed, 2006-06-14 at 02:54 -0400, Wendy Cheng wrote:

> 
> Assume we still have this on the table.... Could I expect the admin
> interface goes thru rpc.lockd command (man page and nfs-util code
> changes) ? The modified command will take similar options as rpc.statd;
> more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the
> individual IP (socket address) to kernel, we'll need nfsctl with struct
> nfsctl_svc modified.

I want to make sure people catch this. Here we're talking about NFS
system call interface changes. We need either a new NFS syscall or
altering the existing nfsctl_svc structure.

-- Wendy

> 
> For the kernel piece, since we're there anyway, could we have the
> individual lockd IP interface passed to SM (statd) (in SM_MON call) ?
> This would allow statd to structure its SM files based on each lockd IP
> address, an important part of lock recovery.
> 
> >   One is to register a callback when an interface is shut down.
> 
> Haven't checked out (linux) socket interface yet. I'm very fuzzy how
> this can be done. Anyone has good ideas ? 
> 
> >   Another (possibly the best) is to arrange a new signal for lockd
> >   which say "Drop any locks which were sent to IP addresses that are
> >   no longer valid local addresses".
> 
> Very appealing - but the devil's always in the details. How to decide
> which IP address is no longer valid ? Or how does lockd know about these
> IP addresses ? And how to associate one particular IP address with the
> "struct nlm_file" entries within nlm_files list ? Need few more days to
> sort this out (or any one already has ideas in mind ?).
> 
> -- Wendy
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From rainer at ultra-secure.de  Wed Jun 14 15:05:19 2006
From: rainer at ultra-secure.de (Rainer Duffner)
Date: Wed, 14 Jun 2006 17:05:19 +0200
Subject: [Linux-cluster] Red Hat Summit presentations
In-Reply-To: <Pine.LNX.4.33.0606141326300.21187-100000@isis.obsidian.co.za>
References: <Pine.LNX.4.33.0606141326300.21187-100000@isis.obsidian.co.za>
Message-ID: <449025AF.8070602@ultra-secure.de>

Riaan van Niekerk wrote:
> For anyone interested in the Red Hat Summit presentations, they are 
> available on-line now. 
>
> The presentations on Clustering and Storage are 
> available here: http://www.redhat.com/promo/summit/presentations/cns.htm
>
> Riaan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>   

Hello,

The "NFS with Linux"-link doesn't work.


Rainer


From bobby.m.dalton at nasa.gov  Wed Jun 14 17:09:54 2006
From: bobby.m.dalton at nasa.gov (Dalton, Maurice)
Date: Wed, 14 Jun 2006 12:09:54 -0500
Subject: [Linux-cluster] Clumanager RHEL3
Message-ID: <EB190CD1E73E1146ACB7694746E205A8016562A0@hx1.ums.msfc.nasa.gov>

I am running a 2 system cluster with NFS services.

kernel-smp-2.4.21-40.EL
clumanager-1.2.31-1
nfs-utils-1.0.6-43EL

In my /var/log/message I have several of messages saying:

clusvcmgrd[7960]: <warning> Starvation on Lock #4!

Anyone know what this means?


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060614/99b2c90a/attachment.htm>

From smartjoe at gmail.com  Wed Jun 14 18:49:19 2006
From: smartjoe at gmail.com (jOe)
Date: Thu, 15 Jun 2006 02:49:19 +0800
Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with
	new fencing mechanisms?
Message-ID: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>

Hello all,

Sorry if this is a stupid question.

I deploy both HP MC/SG linux edition and RHCS for our customers. I just
wondered why the latest RHCS remove quorum partition/lock lun with the new
fencing mechanisms(powerswitch,iLO/DRAC, SAN switch....)?
Lots of our customers choosed HP's sophisticated MC/SG linux edition for
their mission critical system in Two Node Cluster Configuration. From our
monthly health check service and customers' feedback, i do think  HP SGLX is
reliable and stable,
even under heavy I/O traffic, the lock lun(quorum disk) works pretty good.
And the whole cluster architecture is simple and clean, at same time means
less issue and problem .
I do think Redhat's product team is strong and obviously have their solid
reasons to choose new mechanisms in RHCS v4. I've investigated and i can
understand that  quorum disk/lock lun in two node cluster configuration
"Might Bring" more latency and impact the cluster but according to my
previous words, i'm sure that it is pretty stable to use lock lun/quorum
partition of HP SG/LX even under heavy I/O loads.

I have no intention to start a comparison between HP SGLX and RedHat RHCS,
All i want to get clear is  quorum disk/lock lun Vs RHCS's new fencing
mechanisms.

Regards,

Jun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060615/1180a0ff/attachment.htm>

From riaan at obsidian.co.za  Wed Jun 14 20:05:40 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Wed, 14 Jun 2006 22:05:40 +0200 (SAST)
Subject: [Linux-cluster] Red Hat Summit presentations
In-Reply-To: <449025AF.8070602@ultra-secure.de>
Message-ID: <Pine.LNX.4.33.0606142148540.21187-100000@isis.obsidian.co.za>

> > For anyone interested in the Red Hat Summit presentations, they are
> > available on-line now.  > > The presentations on Clustering and
> Storage are > available here:
> http://www.redhat.com/promo/summit/presentations/cns.htm >
> 
> Hello,
> 
> The "NFS with Linux"-link doesn't work.
> 

The same paper is available under the Security track, and that link is
working. I will pass on the broken link.


From lhh at redhat.com  Wed Jun 14 21:26:29 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 14 Jun 2006 17:26:29 -0400
Subject: [Linux-cluster] Clumanager RHEL3
In-Reply-To: <EB190CD1E73E1146ACB7694746E205A8016562A0@hx1.ums.msfc.nasa.gov>
References: <EB190CD1E73E1146ACB7694746E205A8016562A0@hx1.ums.msfc.nasa.gov>
Message-ID: <1150320389.20766.301.camel@ayanami.boston.redhat.com>

On Wed, 2006-06-14 at 12:09 -0500, Dalton, Maurice wrote:
> I am running a 2 system cluster with NFS services.
> 
> kernel-smp-2.4.21-40.EL 
> clumanager-1.2.31-1 
> nfs-utils-1.0.6-43EL
> 
> In my /var/log/message I have several of messages saying:
> 
> clusvcmgrd[7960]: <warning> Starvation on Lock #4!

High system load or slow shared storage causing a node to not be able to
obtain a lock in a timely manner.  It retries, though, so if no other
problems occur, it can usually be ignored.

-- Lon


From akornev at gmail.com  Wed Jun 14 22:43:25 2006
From: akornev at gmail.com (Anton Kornev)
Date: Thu, 15 Jun 2006 01:43:25 +0300
Subject: [Linux-cluster] GFS locking issues
Message-ID: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>

Hi,

I have some locking issues (deadlocks?) with GFS.

My configuration include 4 hosts - one of them is used as GNBD-device
exporter and 3 other import this GNBD partition and mount it to the /gfs
mountpoint.

LVM is also used on the imported GNBD partition, so clmvd is running.
The locking method is DLM, GFS version is 6.1.5, manual fencing used.


The problem is quite usual - deadlock on httpd (httpd processess in 'D'
state)
I saw such problems, though not solutions on the list.

In my case apache is placed to the GFS filesystem and I run it inside th
chroot by the command like this:

chroot /gfs/chroot /usr/local/apache/bin/httpd

The problem appears sometimes after "killall httpd" - all the httpd
processes
get the 'D' state in "ps ax" terms and become locked in this state forever.


Moreover the whole GFS filesystem become unavailable after it happens.
Even from another host every command that tries to access /gfs partition
hangs in the 'D' state. Though last time it was unavailable only partially
- the /gfs/chroot/usr hierarchy was "locked" but other parts of gfs worked
okay.

The only cure I know is to reboot the node and fence it out from the
cluster.


Is there any ideas of how to fix this? I mean either the reason ('D' state
of
killed httpd-s) or consequences (the GFS filesystem fully or partially
become unavailable after this).

I also appreciate any help with debugging the problem.

I tried gfs_tool lockdump with decipher_lockstate_dump tool.

bash-3.00# ps ax |grep http
14981 ?        Ds     0:00 /usr/system/apache/bin/httpd
15242 ?        D      0:00 /usr/system/apache/bin/httpd
24708 ?        D      0:00 /usr/system/apache/bin/httpd
24709 ?        D      0:00 /usr/system/apache/bin/httpd
24710 ?        D      0:00 /usr/system/apache/bin/httpd

I found only 2 locks regarding these processes:

bash-3.00# ls -i /gfs/chroot/lib64/libnss_files-2.3.4.so
27190 /gfs/chroot/lib64/libnss_files-2.3.4.so

Glock (inode[2], 27190)
  gl_flags = lock[1]
  gl_count = 7
  gl_state = shared[3]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 1
  ail_bufs = no
  Request
    owner = 24710
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Holder
    owner = 24710
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Waiter3
    owner = 24708
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 24709
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 15242
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1]
  Inode: busy

and

bash-3.00# ls -i /gfs/chroot/usr/system/apache/bin/httpd
2175961 /gfs/chroot/usr/system/apache/bin/httpd

Glock (inode[2], 2175961)
  gl_flags =
  gl_count = 4
  gl_state = shared[3]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 1
  ail_bufs = no
  Holder
    owner = 14981
    gh_state = shared[3]
    gh_flags =
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Inode: busy

There are also such locks for this inodes:


Glock (iopen[5], 27190)
  gl_flags =
  gl_count = 2
  gl_state = shared[3]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no
  Holder
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = local_excl[5] exact[7]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]


Glock (iopen[5], 2175961)
  gl_flags =
  gl_count = 2
  gl_state = shared[3]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no
  Holder
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = local_excl[5] exact[7]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]


During the last hanging the "/gfs/chroot/usr" was unavailable and there are
two entries regarding this directory in the lockdump:

bash-3.00# ls -di /gfs/chroot/usr/
15077981 /gfs/chroot/usr/

Glock (inode[2], 15077981)
  gl_flags =
  gl_count = 4
  gl_state = shared[3]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = yes
  aspace = 1
  ail_bufs = no
  Inode:
    num = 15077981/15077981
    type = directory[2]
    i_count = 1
    i_flags =
    vnode = yes

Glock (iopen[5], 15077981)
  gl_flags =
  gl_count = 2
  gl_state = shared[3]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no
  Holder
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = local_excl[5] exact[7]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]


Your comments will be highly appreciated.

-- 
Best Regards,
Anton Kornev.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060615/c70beeef/attachment.htm>

From kanderso at redhat.com  Thu Jun 15 02:47:06 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Wed, 14 Jun 2006 21:47:06 -0500
Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun
	with new fencing mechanisms?
In-Reply-To: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>
References: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>
Message-ID: <1150339626.2982.51.camel@localhost.localdomain>

On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
> Hello all,
> 
> Sorry if this is a stupid question.
> 
> I deploy both HP MC/SG linux edition and RHCS for our customers. I
> just wondered why the latest RHCS remove quorum partition/lock lun
> with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN
> switch....)? 

First off, I don't think it is completely fair to compare quorum
partitions to fencing.  They really serve different purposes.  Quorum
partition gives you the ability to maintain the cluster through flakey
network spikes.  It will keep you from prematurely removing nodes from
the cluster.  Fencing is really used to provide data integrity of your
shared storage devices.  You really want to make sure that a node is
gone before recovering their data.  Just because a node isn't updating
the quorum partition, doesn't mean it isn't still scrogging your file
systems.  However, a combination of the two provides a pretty solid
cluster in small configurations.  And a quorum disk has another nice
feature that is useful.

That said, a little history before I get to the punch line.  Two
clustering technologies were merged together for RHCS 4.x releases and
the resulting software used the core cluster infrastructure that was
part of the GFS product for both RHCS and RHGFS.  GFS didn't have a
quorum partition as an option primarily due to scalability reasons.  The
quorum disk works fine for a limited number of nodes, but the core
cluster infrastructure needed to be able to scale to large numbers.  The
fencing mechanisms provide the ability to ensure data integrity in that
type of configuration.  So, the quorum disk wasn't carried into the new
cluster infrastructure at that time.

Good news is we realized the deficiency and have added quorum disk
support and it will be part of the RHCS4.4 update release which should
be hitting the RHN beta sites within a few days.  This doesn't replace
the need to have a solid fencing infrastructure in place.  When a node
fails, you still need to ensure that it is gone and won't corrupt the
filesystem.  Quorum disk will still have scalability issues and is
really targeted at small clusters, ie <16 nodes.  This is primarily due
to having multiple machines pounding on the same storage device.  It
also provides an additional feature, the ability to represent a
configurable number of votes.  If you set the quorum device to have the
same number of votes as nodes in the cluster.  You can maintain cluster
sanity down to a single active compute node in the cluster.  We can get
rid of our funky special two node configuration option.  You will then
be able to grow a two node cluster without having to reset.

Sorry I rambled a bit..

Thanks
Kevin


From neilb at suse.de  Thu Jun 15 04:27:01 2006
From: neilb at suse.de (Neil Brown)
Date: Thu, 15 Jun 2006 14:27:01 +1000
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: message from Wendy Cheng on Wednesday June 14
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
Message-ID: <17552.57749.121240.42384@cse.unsw.edu.au>

On Wednesday June 14, wcheng at redhat.com wrote:
> Hi,
> 
> KABI (kernel application binary interface) commitment is a big thing
> from our end - so I would like to focus more on the interface agreement
> before jumping into coding and implementation details. 
> 

Before we can agree on an interface, we need to be clear what
functionality is required.

You started out suggesting that the required functionality was to
"remove all locks that lockd holds on a particular filesystem".

I responded that I suspect a better functionality was "remove all
locks that locked holds on behalf of a particular IP address".

You replied that this such an approach

>  give[s] individual filesystem no freedom to adjust what they
> need upon failover. 

I asked:
> Can you say more about what sort of adjustments an individual filesystem
> might want the freedom to make?  It might help me understand the
> issues better.

and am still waiting for an answer.  Without an answer, I still lean
towards and IP-address based approach, and the reply from James
Yarbrough seems to support that (though I don't want to read too much
into his comments).

Lockd is not currently structured to associate locks with
server-ip-addresses.  There is an assumption that one client may talk
to any of the IP addresses that the server supports.  This is clearly
not the case for the failover scenario that you are considering, so a
little restructuring might be in order.

Some locks will be held on behalf of a client, no matter what
interface the requests arrive on.  Other locks will be held on behalf
of a client and tied to a particular server IP address.  Probably the
easiest way to make this distinction in as a new nfsd export flag.

So, maybe something like this:

  Add a 'struct sockaddr_in' to 'struct nlm_file'.
  If nlm_fopen return (say) 3, then treat is as success, and 
    also copy rqstp->rq_addr into that 'sockaddr_in'.
  define a new file in the 'nfsd' filesystem into which can
    be written an IP address and which calls some new lockd
    function which releases all locks held for that IP address.
  Probably get nlm_lookup_file to insist that if the sockaddr_in
    is defined in a lock, it must match the one in rqstp

Does that sound OK ?


> >   One is the multiple-lockd-threads idea.
> 
> Assume we still have this on the table.... Could I expect the admin
> interface goes thru rpc.lockd command (man page and nfs-util code
> changes) ? The modified command will take similar options as rpc.statd;
> more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the
> individual IP (socket address) to kernel, we'll need nfsctl with struct
> nfsctl_svc modified.

I'm losing interest in the multiple-lockd-threads approach myself (for
the moment anyway :-)
However I would be against trying to re-use rpc.lockd - that was a
mistake that is best forgotten.
If the above approach were taken, then I don't think you need anything
more than
   echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock
(or whatever), though it you really want to wrap that in a shell
script that might be ok.

> 
> For the kernel piece, since we're there anyway, could we have the
> individual lockd IP interface passed to SM (statd) (in SM_MON call) ?
> This would allow statd to structure its SM files based on each lockd IP
> address, an important part of lock recovery.
> 

Maybe....  but I don't get the scenario.
Surely the SM files are only needed when the server  restarts, and in
that case it needs to notify all clients... Or is it that you want to
make sure the notification comes from the right IP address.... I guess
that would make sense.  I that what you are after?


> >   One is to register a callback when an interface is shut down.
> 
> Haven't checked out (linux) socket interface yet. I'm very fuzzy how
> this can be done. Anyone has good ideas ? 

No good idea, but I have a feeling there is a callback we could use.
However I think I am going off this idea.

> 
> >   Another (possibly the best) is to arrange a new signal for lockd
> >   which say "Drop any locks which were sent to IP addresses that are
> >   no longer valid local addresses".
> 
> Very appealing - but the devil's always in the details. How to decide
> which IP address is no longer valid ? Or how does lockd know about these
> IP addresses ? And how to associate one particular IP address with the
> "struct nlm_file" entries within nlm_files list ? Need few more days to
> sort this out (or any one already has ideas in mind ?).

See above.

NeilBrown


From wcheng at redhat.com  Thu Jun 15 06:39:24 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 15 Jun 2006 02:39:24 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <17552.57749.121240.42384@cse.unsw.edu.au>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<17552.57749.121240.42384@cse.unsw.edu.au>
Message-ID: <1150353564.4566.89.camel@localhost.localdomain>

On Thu, 2006-06-15 at 14:27 +1000, Neil Brown wrote:

> You started out suggesting that the required functionality was to
> "remove all locks that lockd holds on a particular filesystem".

I didn't make this clear. No, we don't want to "remove all locks
associated with a particular filesystem". We want to "remove all locks
associated with an NFS service" - one NFS service is normally associated
with one NFS export. For example, say in /etc/exports:

/mnt/export_fs/dir_1     *(fsid=1,async,rw)
/mnt/export_fs/dir_2     *(fsid=2,async,rw)

One same filesystem (export_fs) is exported via two entries, each with
its own fsid. The "fsid" is eventually encoded as part of the filehanlde
stored into "struct nlm_file" and linked into nlm_file global list. 

This is to allow, not only active-active failover (for local filesystem
such as ext3), but also load balancing for cluster file systems (such as
GFS). 

In reality, each NFS service is associated with one virtual IP. The
failover and load-balancing tasks are carried out by moving the virtual
IP around - so I'm ok with the idea of "remove all locks that lockd
holds on behalf of a particular IP address".
  
> 
> Lockd is not currently structured to associate locks with
> server-ip-addresses.  There is an assumption that one client may talk
> to any of the IP addresses that the server supports.  This is clearly
> not the case for the failover scenario that you are considering, so a
> little restructuring might be in order.
> 
> Some locks will be held on behalf of a client, no matter what
> interface the requests arrive on.  Other locks will be held on behalf
> of a client and tied to a particular server IP address.  Probably the
> easiest way to make this distinction in as a new nfsd export flag.

We're very close now - note that I originally proposed adding a new nfsd
export flag (NFSEXP_FOLOCKS) so we can OR it into export's ex_flag upon
un-export. If the new action flag is set, a new sub-call added into
unexport kernel routine will walk thru nlm_file to find the export entry
(matched by either fsid or devno, taken from filehandle, within nlm_file
struct); then subsequently release the lock.   

The ex_flag is an "int" but currently only used up to 16 bit. So my new
export flag is defined as: NFSEXP_FOLOCKS 0x00010000. 

> 
> So, maybe something like this:
> 
>   Add a 'struct sockaddr_in' to 'struct nlm_file'.
>   If nlm_fopen return (say) 3, then treat is as success, and 
>     also copy rqstp->rq_addr into that 'sockaddr_in'.
>   define a new file in the 'nfsd' filesystem into which can
>     be written an IP address and which calls some new lockd
>     function which releases all locks held for that IP address.
>   Probably get nlm_lookup_file to insist that if the sockaddr_in
>     is defined in a lock, it must match the one in rqstp

Yes, we definitely can do this but there is a "BUT" from our end. What I
did in my prototyping code is taking filehandle from nlm_file structure
and yank the fsid (or devno) out of it (so we didn't need to know the
socket address). With (your) above approach, adding a new field into
"struct nlm_file" to hold the sock addr, sadly say, violates our KABI
policy. 

I learnt my lesson. Forget KABI for now. Let me see what you have in the
next paragraph (so I can know how to response ...)

> 
> 
> > >   One is the multiple-lockd-threads idea.
> 
> I'm losing interest in the multiple-lockd-threads approach myself (for
> the moment anyway :-)
> However I would be against trying to re-use rpc.lockd - that was a
> mistake that is best forgotten.
> If the above approach were taken, then I don't think you need anything
> more than
>    echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock
> (or whatever), though it you really want to wrap that in a shell
> script that might be ok.

This is funny - so we go back to /proc. OK with me :) but you may want
to re-think my exportfs command approach. Want me to go over the
unexport flow again ? The idea is to add a new user mode flag, say "-h".
If you unexport the interface as:

shell> exportfs -u *:/export_path   // nothing happens, old behavior

but if you do:

shell> exportfs -hu *:/export_patch  // the kernel code would walk thru 
                                     // nlm_file list to release the
                                     // the locks.

The "-h" "OR" 0x0001000 into ex_flags field of struct nfsctl_export so
kernel can know what to do. With fsid (or devno) in filehandle within
nlm_file, we don't need socket address at all. 

But again, I'm OK with /proc approach. However, with /proc approach, we
may need socket address (since not every export uses fsid and devno is
not easy to get).

Do we agree now ? In simple sentence, I prefer my original "exportfs -
hu" approach. But I'm ok with /proc if you insist.


> 
> > 
> > For the kernel piece, since we're there anyway, could we have the
> > individual lockd IP interface passed to SM (statd) (in SM_MON call) ?
> > This would allow statd to structure its SM files based on each lockd IP
> > address, an important part of lock recovery.
> > 
> 
> Maybe....  but I don't get the scenario.
> Surely the SM files are only needed when the server  restarts, and in
> that case it needs to notify all clients... Or is it that you want to
> make sure the notification comes from the right IP address.... I guess
> that would make sense.  I that what you are after?

Yes ! Right now, lockd doesn't pass the specific server address (that
client connects to) to statd. I don't know how the "-H" can ever work.
Consider this a bug. If you forget what "rpc.statd -H" is, check out the
man page (man rpc.statd).

Thank you for the patience - I'm grateful.

-- Wendy


From neilb at suse.de  Thu Jun 15 08:02:48 2006
From: neilb at suse.de (Neil Brown)
Date: Thu, 15 Jun 2006 18:02:48 +1000
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: message from Wendy Cheng on Thursday June 15
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<17552.57749.121240.42384@cse.unsw.edu.au>
	<1150353564.4566.89.camel@localhost.localdomain>
Message-ID: <17553.5160.366425.740082@cse.unsw.edu.au>

On Thursday June 15, wcheng at redhat.com wrote:
> On Thu, 2006-06-15 at 14:27 +1000, Neil Brown wrote:
> 
> > You started out suggesting that the required functionality was to
> > "remove all locks that lockd holds on a particular filesystem".
> 
> I didn't make this clear. No, we don't want to "remove all locks
> associated with a particular filesystem". We want to "remove all locks
> associated with an NFS service" - one NFS service is normally associated
> with one NFS export. For example, say in /etc/exports:
> 
> /mnt/export_fs/dir_1     *(fsid=1,async,rw)
> /mnt/export_fs/dir_2     *(fsid=2,async,rw)

That makes sense.

> 
> One same filesystem (export_fs) is exported via two entries, each with
> its own fsid. The "fsid" is eventually encoded as part of the filehanlde
> stored into "struct nlm_file" and linked into nlm_file global list. 
> 
> This is to allow, not only active-active failover (for local filesystem
> such as ext3), but also load balancing for cluster file systems (such as
> GFS). 

Could you please explain to me what "active-active failover for local
filesystem such as ext3" means (I'm not very familiar with cluster
terminology).
It sounds like the filesystem is active on two nodes at once, which of
course cannot work for ext3, so I am confused.
And if you are doing "failover", what has failed?

The load-balancing scenario makes sense (at least so far...).

> 
> In reality, each NFS service is associated with one virtual IP. The
> failover and load-balancing tasks are carried out by moving the virtual
> IP around - so I'm ok with the idea of "remove all locks that lockd
> holds on behalf of a particular IP address".
>   

Good. :-)

> > 
> > Lockd is not currently structured to associate locks with
> > server-ip-addresses.  There is an assumption that one client may talk
> > to any of the IP addresses that the server supports.  This is clearly
> > not the case for the failover scenario that you are considering, so a
> > little restructuring might be in order.
> > 
> > Some locks will be held on behalf of a client, no matter what
> > interface the requests arrive on.  Other locks will be held on behalf
> > of a client and tied to a particular server IP address.  Probably the
> > easiest way to make this distinction in as a new nfsd export flag.
> 
> We're very close now - note that I originally proposed adding a new nfsd
> export flag (NFSEXP_FOLOCKS) so we can OR it into export's ex_flag upon
> un-export. If the new action flag is set, a new sub-call added into
> unexport kernel routine will walk thru nlm_file to find the export entry
> (matched by either fsid or devno, taken from filehandle, within nlm_file
> struct); then subsequently release the lock.   
> 
> The ex_flag is an "int" but currently only used up to 16 bit. So my new
> export flag is defined as: NFSEXP_FOLOCKS 0x00010000. 
> 

Our two export flags mean VERY different things.
Mine says 'locks against this export are per-server-ip-address'.
Yours says (I think) 'remove all lockd locks from this export' and is
really an unexport flag, not an export flag.

And this makes it not really workable.  We no-longer require the user
of the nfssvc syscall to unexport filesystems.  Infact nfs-utils doesn't
use it at all if /proc/fs/nfsd is mounted.  filesystems are unexported
by their entry in the export cache expiring, or the cache being
flushed.

There is simply no room in the current knfsd design for an unexport
flag - sorry ;-(


> > 
> > So, maybe something like this:
> > 
> >   Add a 'struct sockaddr_in' to 'struct nlm_file'.
> >   If nlm_fopen return (say) 3, then treat is as success, and 
> >     also copy rqstp->rq_addr into that 'sockaddr_in'.
> >   define a new file in the 'nfsd' filesystem into which can
> >     be written an IP address and which calls some new lockd
> >     function which releases all locks held for that IP address.
> >   Probably get nlm_lookup_file to insist that if the sockaddr_in
> >     is defined in a lock, it must match the one in rqstp
> 
> Yes, we definitely can do this but there is a "BUT" from our end. What I
> did in my prototyping code is taking filehandle from nlm_file structure
> and yank the fsid (or devno) out of it (so we didn't need to know the
> socket address). With (your) above approach, adding a new field into
> "struct nlm_file" to hold the sock addr, sadly say, violates our KABI
> policy. 

Does it?
'struct nlm_file' is a structure that is entirely local to lockd.
It does not feature in any of the interface between lockd and any
other part of the kernel.  It is not part of any credible KABI.
The other changes I suggest involve adding an exported symbol to
lockd, which does change the KABI but in a completely back-compatible
way, and re-interpreting the return value of a callout.  
That could not break any external module - it could only break
someone's setup if they had an alternate lockd module, but I don't
your KABI policy allows people to replace modules and stay supported,

However, as you say....

> 
> I learnt my lesson. Forget KABI for now. Let me see what you have in the
> next paragraph (so I can know how to response ...)
> 

....we aren't going to let KABI issues get in our way.

> > 
> > 
> > > >   One is the multiple-lockd-threads idea.
> > 
> > I'm losing interest in the multiple-lockd-threads approach myself (for
> > the moment anyway :-)
> > However I would be against trying to re-use rpc.lockd - that was a
> > mistake that is best forgotten.
> > If the above approach were taken, then I don't think you need anything
> > more than
> >    echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock
> > (or whatever), though it you really want to wrap that in a shell
> > script that might be ok.
> 
> This is funny - so we go back to /proc. OK with me :)

Only sort-of back to /proc.  /proc/fs/nfsd is a separate filesystem
which happens to be mounted there normally.
The unexport system call goes through this exact same filesystem
(though it is somewhat under-the-hood) so at that level, we are
really propose the same style of interface implementation.

>                                                       but you may want
> to re-think my exportfs command approach. Want me to go over the
> unexport flow again ? The idea is to add a new user mode flag, say "-h".
> If you unexport the interface as:
> 
> shell> exportfs -u *:/export_path   // nothing happens, old behavior
> 
> but if you do:
> 
> shell> exportfs -hu *:/export_patch  // the kernel code would walk thru 
>                                      // nlm_file list to release the
>                                      // the locks.
> 
> The "-h" "OR" 0x0001000 into ex_flags field of struct nfsctl_export so
> kernel can know what to do. With fsid (or devno) in filehandle within
> nlm_file, we don't need socket address at all. 

But apart from nfsctl_export being a dead end, this is still
exportpoint specific rather than IP address specific.

> 
> But again, I'm OK with /proc approach. However, with /proc approach, we
> may need socket address (since not every export uses fsid and devno is
> not easy to get).

Absolutely. We need a socket address.  
As part of this process you are shutting down an interface.  We know
(or can easily discover) the address of that interface.  That is
exactly the address that we feed to nfsd.

> 
> Do we agree now ? In simple sentence, I prefer my original "exportfs -
> hu" approach. But I'm ok with /proc if you insist.
> 

I'm not at an 'insist'ing stage at the moment - I like to at least
pretend to be open minded :-)

The main thing I don't like about your "exportfs -hu" approach is that
I don't think it will work (actually, looking at nfs-utils, I'm not so
sure that "exportfs -u" will work at all if you don't have 
/proc/fs/nfsd mounted....)

The other thing I don't like is that it doesn't address your primary
need - decommissioning an IP address.
Rather it addresses a secondary need - removing some locks from some
filesystems. 

But I'm still open to debate...

> 
> > 
> > > 
> > > For the kernel piece, since we're there anyway, could we have the
> > > individual lockd IP interface passed to SM (statd) (in SM_MON call) ?
> > > This would allow statd to structure its SM files based on each lockd IP
> > > address, an important part of lock recovery.
> > > 
> > 
> > Maybe....  but I don't get the scenario.
> > Surely the SM files are only needed when the server  restarts, and in
> > that case it needs to notify all clients... Or is it that you want to
> > make sure the notification comes from the right IP address.... I guess
> > that would make sense.  I that what you are after?
> 
> Yes ! Right now, lockd doesn't pass the specific server address (that
> client connects to) to statd. I don't know how the "-H" can ever work.
> Consider this a bug. If you forget what "rpc.statd -H" is, check out the
> man page (man rpc.statd).

I have to admit I have never given that code a lot of attention.  I
reviewed when sent it - it seemed to make sense and had no obvious
problems - so I accepted it.  I wouldn't be enormously surprised if it
didn't work in some situations.

> 
> Thank you for the patience - I'm grateful.

Ditto.
Conversations work much better when people are patient and polite.

Thanks,
NeilBrown


From andros at citi.umich.edu  Thu Jun 15 14:07:43 2006
From: andros at citi.umich.edu (William A.(Andy) Adamson)
Date: Thu, 15 Jun 2006 10:07:43 -0400
Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface
In-Reply-To: <1150293654.28264.91.camel@localhost.localdomain> 
References: <1150089943.26019.18.camel@localhost.localdomain> 
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<1150293654.28264.91.camel@localhost.localdomain>
Message-ID: <20060615140743.36CDC1BBAD@citi.umich.edu>

this discusion has centered around removing the locks of an export.
we also want the interface to ge able to remove the locks owned by a single 
client. this is needed to enable client migration between replica's or between 
nodes in a cluster file system. it is not acceptable to place an entire export 
in grace just to move a small number of clients.

-->Andy

wcheng at redhat.com said:
> On Wed, 2006-06-14 at 02:54 -0400, Wendy Cheng wrote:
> 
> Assume we still have this on the table.... Could I expect the admin
> interface goes thru rpc.lockd command (man page and nfs-util code
> changes) ? The modified command will take similar options as rpc.statd;
> more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the
> individual IP (socket address) to kernel, we'll need nfsctl with struct
> nfsctl_svc modified.
>
> I want to make sure people catch this. Here we're talking about NFS system
> call interface changes. We need either a new NFS syscall or altering the
> existing nfsctl_svc structure.

> -- Wendy


From wcheng at redhat.com  Thu Jun 15 15:09:41 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 15 Jun 2006 11:09:41 -0400
Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface
In-Reply-To: <20060615140743.36CDC1BBAD@citi.umich.edu>
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<1150293654.28264.91.camel@localhost.localdomain>
	<20060615140743.36CDC1BBAD@citi.umich.edu>
Message-ID: <44917835.40805@redhat.com>

William A.(Andy) Adamson wrote:

>this discusion has centered around removing the locks of an export.
>we also want the interface to ge able to remove the locks owned by a single 
>client. this is needed to enable client migration between replica's or between 
>nodes in a cluster file system. it is not acceptable to place an entire export 
>in grace just to move a small number of clients.
>
>  
>
Andy,

Gotcha ... forgot about NFS V4. BTW, the discussion has moved back to 
/proc interface. I agree we need to add one more layer of granularity 
into it. Glad you caught this flaw.

-- Wendy


From smartjoe at gmail.com  Thu Jun 15 16:30:22 2006
From: smartjoe at gmail.com (jOe)
Date: Fri, 16 Jun 2006 00:30:22 +0800
Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with
	new fencing mechanisms?
In-Reply-To: <1150339626.2982.51.camel@localhost.localdomain>
References: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>
	<1150339626.2982.51.camel@localhost.localdomain>
Message-ID: <a0e72880606150930w7e94de78s119033b0944f80f6@mail.gmail.com>

On 6/15/06, Kevin Anderson <kanderso at redhat.com> wrote:
>
> On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
> > Hello all,
> >
> > Sorry if this is a stupid question.
> >
> > I deploy both HP MC/SG linux edition and RHCS for our customers. I
> > just wondered why the latest RHCS remove quorum partition/lock lun
> > with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN
> > switch....)?
>
> First off, I don't think it is completely fair to compare quorum
> partitions to fencing.  They really serve different purposes.  Quorum
> partition gives you the ability to maintain the cluster through flakey
> network spikes.  It will keep you from prematurely removing nodes from
> the cluster.  Fencing is really used to provide data integrity of your
> shared storage devices.  You really want to make sure that a node is
> gone before recovering their data.  Just because a node isn't updating
> the quorum partition, doesn't mean it isn't still scrogging your file
> systems.  However, a combination of the two provides a pretty solid
> cluster in small configurations.  And a quorum disk has another nice
> feature that is useful.
>
> That said, a little history before I get to the punch line.  Two
> clustering technologies were merged together for RHCS 4.x releases and
> the resulting software used the core cluster infrastructure that was
> part of the GFS product for both RHCS and RHGFS.  GFS didn't have a
> quorum partition as an option primarily due to scalability reasons.  The
> quorum disk works fine for a limited number of nodes, but the core
> cluster infrastructure needed to be able to scale to large numbers.  The
> fencing mechanisms provide the ability to ensure data integrity in that
> type of configuration.  So, the quorum disk wasn't carried into the new
> cluster infrastructure at that time.
>
> Good news is we realized the deficiency and have added quorum disk
> support and it will be part of the RHCS4.4 update release which should
> be hitting the RHN beta sites within a few days.  This doesn't replace
> the need to have a solid fencing infrastructure in place.  When a node
> fails, you still need to ensure that it is gone and won't corrupt the
> filesystem.  Quorum disk will still have scalability issues and is
> really targeted at small clusters, ie <16 nodes.  This is primarily due
> to having multiple machines pounding on the same storage device.  It
> also provides an additional feature, the ability to represent a
> configurable number of votes.  If you set the quorum device to have the
> same number of votes as nodes in the cluster.  You can maintain cluster
> sanity down to a single active compute node in the cluster.  We can get
> rid of our funky special two node configuration option.  You will then
> be able to grow a two node cluster without having to reset.
>
> Sorry I rambled a bit..
>
> Thanks
> Kevin
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


Thank you very much Kevin, your information is very useful to us and i've
shared it to our engineer team.
Here are two questions still left:
Q1: In a two node cluster config, how does RHCS(v4) handle the heartbeat
failed ? (suppose the bonded heartbeat path still failed by some bad
situations).
When using quorum disk/lock lun, the quorum will act as a tier breaker and
solve the brain-split if heartbeat failed. Currently the GFS will do this ?
or other part of RHCS?

Q2: As you mentioned the quorum disk support is added into  RHCS v4.4 update
release, so in a two-nodes-cluster config "quorum disk+bonding
heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the recommended
config from RedHat? Almost 80% cluster requests from our customers are
around two-nodes-cluster(10% is RAC and the left is hpc cluster), We really
want to provide our customers a simple and solid cluster config in their
production environment, Most customer configure their HA cluster as
Active/passive so GFS is not necessary to them and they even don't want GFS
exists in their two-nodes-cluster system.

I do think more and more customers will choose RHCS as their cluster
solution and we'll push this after completely understand RHCS's technical
benefits and advanced mechanisms.

Thanks a lot,

Jun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060616/616b2b3e/attachment.htm>

From admin.cluster at gmail.com  Thu Jun 15 17:05:39 2006
From: admin.cluster at gmail.com (Anthony)
Date: Thu, 15 Jun 2006 19:05:39 +0200
Subject: [Linux-cluster] GFS failure
Message-ID: <44919363.80806@gmail.com>

Hello,

yesterday,
we had a full GFS system Fail,
all partitions were unaccessible from all the 32 nodes.
and now all the cluster is inaccessible.
did any one had already seen this problem?


GFS: Trying to join cluster "lock_gulm", "gen:ir"
GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS...
GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock...
GFS: fsid=gen:ir.32: jid=32: Looking at journal...
GFS: fsid=gen:ir.32: jid=32: Done

NETDEV WATCHDOG: jnet0: transmit timed out
ipmi_kcs_sm: kcs hosed: Not in read state for error2
NETDEV WATCHDOG: jnet0: transmit timed out
ipmi_kcs_sm: kcs hosed: Not in read state for error2

GFS: fsid=gen:ir.32: fatal: filesystem consistency error
GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
GFS: fsid=gen:ir.32:   file = 
/usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, 
line = 542
GFS: fsid=gen:ir.32:   time = 1150223491
GFS: fsid=gen:ir.32: about to withdraw from the cluster
GFS: fsid=gen:ir.32: waiting for outstanding I/O
GFS: fsid=gen:ir.32: telling LM to withdraw


From rpeterso at redhat.com  Thu Jun 15 18:18:47 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 15 Jun 2006 13:18:47 -0500
Subject: [Linux-cluster] GFS failure
In-Reply-To: <44919363.80806@gmail.com>
References: <44919363.80806@gmail.com>
Message-ID: <4491A487.6090501@redhat.com>

Anthony wrote:
> Hello,
>
> yesterday,
> we had a full GFS system Fail,
> all partitions were unaccessible from all the 32 nodes.
> and now all the cluster is inaccessible.
> did any one had already seen this problem?
>
>
> GFS: Trying to join cluster "lock_gulm", "gen:ir"
> GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS...
> GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock...
> GFS: fsid=gen:ir.32: jid=32: Looking at journal...
> GFS: fsid=gen:ir.32: jid=32: Done
>
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
>
> GFS: fsid=gen:ir.32: fatal: filesystem consistency error
> GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
> GFS: fsid=gen:ir.32:   file = 
> /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, 
> line = 542
> GFS: fsid=gen:ir.32:   time = 1150223491
> GFS: fsid=gen:ir.32: about to withdraw from the cluster
> GFS: fsid=gen:ir.32: waiting for outstanding I/O
> GFS: fsid=gen:ir.32: telling LM to withdraw
Hi Anthony,

This problem could be caused by a couple of things.  Basically, it 
indicates a filesystem
consistency error occurred.  In this particular case, it means that a 
write was done to the
file system, and a transaction lock was taken out, but after the write 
transaction, the journal
for the written data was found to be still in use.  That means one of 
two things:

Either (1) some process was writing to the GFS journal when they 
shouldn't be (i.e. without
the necessary lock) or else (2) the journal data written was somehow 
corrupted on disk.

In the past, we've often tracked down such problems to hardware 
failures; in other words,
even without the GFS file system in the loop, if you use a command like 
'dd' to send data to
the raw hard disk device, then use dd to retrieve it, the data comes 
back from the hardware
different than what was written out.  That particular scenario is 
documented as bugzilla bug
175589.

I'm not saying that is your problem, but I'm saying that's what we've 
seen in the past.

My recommendation is to read the bugzilla, back up your entire file 
system or copy it to
a different set of drives, then perhaps you can do some hardware tests 
as described in the
bugzilla to see whether your hardware can consistently write data, read 
it back, and get
a match between what was written and what was read back.  Do this test 
without GFS in
there at all, and hopefully with only one node accessing that storage at 
a time.

You will probably also want to run gfs_fsck before mounting again to 
check the consistency
of the file system, just in case some rogue process on one of the nodes 
was doing something
destructive.

WARNING: overwriting your GFS file system will of course damage what was 
there,
so you better be careful not to destroy your data and make a copy before 
doing this.

If the hardware checks out 100% and you can recreate the failure, open a 
bugzilla against GFS
and we'll go from there.  In other words, we don't know of any problems 
with GFS that
can cause this, beyond hardware problems.

I hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From teigland at redhat.com  Thu Jun 15 18:26:25 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 15 Jun 2006 13:26:25 -0500
Subject: [Linux-cluster] GFS failure
In-Reply-To: <44919363.80806@gmail.com>
References: <44919363.80806@gmail.com>
Message-ID: <20060615182624.GA1913@redhat.com>

On Thu, Jun 15, 2006 at 07:05:39PM +0200, Anthony wrote:
> Hello,
> 
> yesterday,
> we had a full GFS system Fail,
> all partitions were unaccessible from all the 32 nodes.
> and now all the cluster is inaccessible.
> did any one had already seen this problem?
> 
> 
> GFS: Trying to join cluster "lock_gulm", "gen:ir"
> GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS...
> GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock...
> GFS: fsid=gen:ir.32: jid=32: Looking at journal...
> GFS: fsid=gen:ir.32: jid=32: Done
> 
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> 
> GFS: fsid=gen:ir.32: fatal: filesystem consistency error
> GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
> GFS: fsid=gen:ir.32:   file = 
> /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, 
> line = 542
> GFS: fsid=gen:ir.32:   time = 1150223491
> GFS: fsid=gen:ir.32: about to withdraw from the cluster
> GFS: fsid=gen:ir.32: waiting for outstanding I/O
> GFS: fsid=gen:ir.32: telling LM to withdraw

This looks like
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164331

which was fixed back in March and should be in the latest rpm's or source
tarball.

Dave


From wcheng at redhat.com  Thu Jun 15 18:43:50 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 15 Jun 2006 14:43:50 -0400
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
In-Reply-To: <17553.5160.366425.740082@cse.unsw.edu.au>
References: <1150089943.26019.18.camel@localhost.localdomain>	<17550.11870.186706.36949@cse.unsw.edu.au>	<1150268091.28264.75.camel@localhost.localdomain>	<17552.57749.121240.42384@cse.unsw.edu.au>	<1150353564.4566.89.camel@localhost.localdomain>
	<17553.5160.366425.740082@cse.unsw.edu.au>
Message-ID: <4491AA66.2050900@redhat.com>

Neil Brown wrote:

>Could you please explain to me what "active-active failover for local
>filesystem such as ext3" means 
>
Clustering is a profilic subject so the term may mean different things 
to different people. The setup we discuss here is to move an NFS service 
from one server to the other while both servers are up and running 
(active-active). The goal is not to disturb other NFS services that are 
not involved with the transition.

>It sounds like the filesystem is active on two nodes at once, which of
>course cannot work for ext3, so I am confused.
>And if you are doing "failover", what has failed?
>
>The load-balancing scenario makes sense (at least so far...).
>  
>
Local filesystem such as ext3 will never be mounted on more than two 
nodes but cluster filesystems (e.g. our GFS) will. Moving ext3 normally 
implies error conditions (a true failover) though in rare cases, it may 
be kicked off for load balancing purpose. Current GFS locking has the 
"node-id" concept - the easiest way (at this moment) for virtual IP to 
float around is to drop the locks and let NLM reclaim the locks from the 
new server.

>
>Our two export flags mean VERY different things.
>Mine says 'locks against this export are per-server-ip-address'.
>Yours says (I think) 'remove all lockd locks from this export' and is
>really an unexport flag, not an export flag.
>
>And this makes it not really workable.  We no-longer require the user
>of the nfssvc syscall to unexport filesystems.  Infact nfs-utils doesn't
>use it at all if /proc/fs/nfsd is mounted.  filesystems are unexported
>by their entry in the export cache expiring, or the cache being
>flushed.
>  
>
The important thing (for me) is the vfsmount reference count which can 
only be properly decreased when unexport is triggered. Without 
decreasing the vfsmount, ext3 can not be un-mounted (and we need to 
umount ext3 upon failover). I havn't looked into community versions of 
kernel source for a while (but I'll check). So what can I do to ensure 
this will happen ? - i.e., after the filesystem has been accessed by 
nfsd, how can I safely un-mount it without shuting down nfsd (and/or 
lockd) ?   

>'struct nlm_file' is a structure that is entirely local to lockd.
>It does not feature in any of the interface between lockd and any
>other part of the kernel.  It is not part of any credible KABI.
>The other changes I suggest involve adding an exported symbol to
>lockd, which does change the KABI but in a completely back-compatible
>way, and re-interpreting the return value of a callout.  
>That could not break any external module - it could only break
>someone's setup if they had an alternate lockd module, but I don't
>your KABI policy allows people to replace modules and stay supported,
>  
>
Yes, you're right ! I looked into the wrong code (well, it was late in 
the night so I was not very functional at that moment). Had some 
prototype code where I transported the nlm_file from one server to 
another server , experimenting auto-reclaiming locks without stated. I 
exported the nlm_file list there. So let's forget about this 

>>>>>  One is the multiple-lockd-threads idea.
>>>>>          
>>>>>
>>>I'm losing interest in the multiple-lockd-threads approach myself (for
>>>the moment anyway :-)
>>>      
>>>
Good! because I'm not sure whether we'll hit scalibility issue or not 
(100 nfs services implies 100 lockd threads !).

>>>However I would be against trying to re-use rpc.lockd - that was a
>>>mistake that is best forgotten.
>>>      
>>>
Highlight this :) ... Give me some comfort feelings that I'm not the 
only person who would make mistakes.

>>>If the above approach were taken, then I don't think you need anything
>>>more than
>>>   echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock
>>>(or whatever), though it you really want to wrap that in a shell
>>>script that might be ok.
>>>      
>>>
>>This is funny - so we go back to /proc. OK with me :)
>>    
>>
>
>Only sort-of back to /proc.  /proc/fs/nfsd is a separate filesystem
>which happens to be mounted there normally.
>The unexport system call goes through this exact same filesystem
>(though it is somewhat under-the-hood) so at that level, we are
>really propose the same style of interface implementation.
>  
>
>>But again, I'm OK with /proc approach. However, with /proc approach, we
>>may need socket address (since not every export uses fsid and devno is
>>not easy to get).
>>    
>>
>
>Absolutely. We need a socket address.  
>As part of this process you are shutting down an interface.  We know
>(or can easily discover) the address of that interface.  That is
>exactly the address that we feed to nfsd.
>  
>
Now, it looks good ! Will do the following:
 
1. Futher understand the steps to make sure we can un-mount ext3 due to 
"unexport" method changes.
2. Start to code to the /proc interface and make sure "rpc.stated -H"can 
work (lock reclaiming needs it). Will keep NFS v4 in mind as well.

By the way, there is a socket state-change-handler (TCP only) and/or 
network interface notification routine that seem to be workable (your 
previous thoughts). However, I don't plan to keep exploring that 
possibility since we now have a simple and workable method in place.

-- Wendy


From teigland at redhat.com  Thu Jun 15 19:09:59 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 15 Jun 2006 14:09:59 -0500
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
Message-ID: <20060615190959.GB1913@redhat.com>

On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote:

> Is there any ideas of how to fix this? I mean either the reason ('D'
> state of killed httpd-s) or consequences (the GFS filesystem fully or
> partially become unavailable after this).
> 
> I also appreciate any help with debugging the problem.
> 
> I tried gfs_tool lockdump with decipher_lockstate_dump tool.

I don't see anything wrong in the lockdumps you gave, although I'm not an
expert at interpreting gfs lockdumps.  Could you do a ps showing the wchan
for those processes?  Using sysrq to get a stack dump would also be useful.
You might also do a dlm lock dump and pick out those locks:
  echo "lockspace name" >> /proc/cluster/dlm_locks
  cat /proc/cluster/dlm_locks

I/O stuck in gnbd could also be a problem, I'm not sure what the signs of
that might be apart from possibly the wchan.

Dave


From kanderso at redhat.com  Thu Jun 15 19:38:34 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Thu, 15 Jun 2006 14:38:34 -0500
Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun
	with new fencing mechanisms?
In-Reply-To: <a0e72880606150930w7e94de78s119033b0944f80f6@mail.gmail.com>
References: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>
	<1150339626.2982.51.camel@localhost.localdomain>
	<a0e72880606150930w7e94de78s119033b0944f80f6@mail.gmail.com>
Message-ID: <1150400314.2810.34.camel@localhost.localdomain>

On Fri, 2006-06-16 at 00:30 +0800, jOe wrote:


> 
> Thank you very much Kevin, your information is very useful to us and
> i've shared it to our engineer team.
> Here are two questions still left: 
> Q1: In a two node cluster config, how does RHCS(v4) handle the
> heartbeat failed ? (suppose the bonded heartbeat path still failed by
> some bad situations). 

Current configuration requires using power fencing when running the
special case two node cluster.  If you lose heartbeat between the two
machines, both nodes will attempt to fence the other node.  The node
that wins the fencing race gets to stay up, the other node is reset and
won't be able to re-establish quorum until connectivity is restored.

> When using quorum disk/lock lun, the quorum will act as a tier breaker
> and solve the brain-split if heartbeat failed. Currently the GFS will
> do this ? or other part of RHCS?

Quorum support is integrated in the core cluster infrastructure so is
usable with just RHCS.  You do not need GFS to use a quorum disk.

> 
> Q2: As you mentioned the quorum disk support is added into  RHCS v4.4
> update release, so in a two-nodes-cluster config "quorum disk+bonding
> heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the
> recommended config from RedHat? Almost 80% cluster requests from our
> customers are around two-nodes-cluster(10% is RAC and the left is hpc
> cluster), We really want to provide our customers a simple and solid
> cluster config in their production environment, Most customer
> configure their HA cluster as Active/passive so GFS is not necessary
> to them and they even don't want GFS exists in their two-nodes-cluster
> system. 

If you have access to shared storage, then a two node cluster with
quorum disk/fencing would be a better configuration and could be the
recommended configuration.  However, there are still cases where you
could have a two node cluster with no shared storage.  Depends on how
the application is sharing state or accessing data.  But for an
active/passive two node failover cluster, I can see where the quorum
disk will be very popular.

Kevin


From smartjoe at gmail.com  Thu Jun 15 19:51:01 2006
From: smartjoe at gmail.com (jOe)
Date: Fri, 16 Jun 2006 03:51:01 +0800
Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with
	new fencing mechanisms?
In-Reply-To: <1150400314.2810.34.camel@localhost.localdomain>
References: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>
	<1150339626.2982.51.camel@localhost.localdomain>
	<a0e72880606150930w7e94de78s119033b0944f80f6@mail.gmail.com>
	<1150400314.2810.34.camel@localhost.localdomain>
Message-ID: <a0e72880606151251y3b0a6d57race79fcd65d16acf@mail.gmail.com>

On 6/16/06, Kevin Anderson <kanderso at redhat.com> wrote:
>
> On Fri, 2006-06-16 at 00:30 +0800, jOe wrote:
>
>
> >
> > Thank you very much Kevin, your information is very useful to us and
> > i've shared it to our engineer team.
> > Here are two questions still left:
> > Q1: In a two node cluster config, how does RHCS(v4) handle the
> > heartbeat failed ? (suppose the bonded heartbeat path still failed by
> > some bad situations).
>
> Current configuration requires using power fencing when running the
> special case two node cluster.  If you lose heartbeat between the two
> machines, both nodes will attempt to fence the other node.  The node
> that wins the fencing race gets to stay up, the other node is reset and
> won't be able to re-establish quorum until connectivity is restored.
>
> > When using quorum disk/lock lun, the quorum will act as a tier breaker
> > and solve the brain-split if heartbeat failed. Currently the GFS will
> > do this ? or other part of RHCS?
>
> Quorum support is integrated in the core cluster infrastructure so is
> usable with just RHCS.  You do not need GFS to use a quorum disk.
>
> >
> > Q2: As you mentioned the quorum disk support is added into  RHCS v4.4
> > update release, so in a two-nodes-cluster config "quorum disk+bonding
> > heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the
> > recommended config from RedHat? Almost 80% cluster requests from our
> > customers are around two-nodes-cluster(10% is RAC and the left is hpc
> > cluster), We really want to provide our customers a simple and solid
> > cluster config in their production environment, Most customer
> > configure their HA cluster as Active/passive so GFS is not necessary
> > to them and they even don't want GFS exists in their two-nodes-cluster
> > system.
>
> If you have access to shared storage, then a two node cluster with
> quorum disk/fencing would be a better configuration and could be the
> recommended configuration.  However, there are still cases where you
> could have a two node cluster with no shared storage.  Depends on how
> the application is sharing state or accessing data.  But for an
> active/passive two node failover cluster, I can see where the quorum
> disk will be very popular.
>
> Kevin
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


Thank you very much.

Jun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060616/dc46932d/attachment.htm>

From vlaurenz at advance.net  Tue Jun 13 14:15:29 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Tue, 13 Jun 2006 10:15:29 -0400
Subject: [Linux-cluster] Send notification when a node is fenced?
Message-ID: <448EC881.5030509@advance.net>

Would someone kindly tell me how to configure cluster suite to notify 
(via email) when a node has been fenced or when a node leaves or joins a 
cluster?  I can't seem to find any documentation on this.  Thanks!


From jmy at lolita.engr.sgi.com  Tue Jun 13 15:23:44 2006
From: jmy at lolita.engr.sgi.com (James Yarbrough)
Date: Tue, 13 Jun 2006 08:23:44 -0700 (PDT)
Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface
References: <1150089943.26019.18.camel@localhost.localdomain>
Message-ID: <200606131523.k5DFNila1061570@lolita.engr.sgi.com>

> There seems to be an unstated assumption here that there is one
> virtual IP per exported filesystem.?? Is that true?

This is the normal case for such HA services.  There may actually be
a single IP address covering multiple filesystems and/or NFS exports.

> I think that maybe the right thing to do is *not* drop the locks on a
> particular filesystem, but to drop the locks made to a particular
> virtual IP.

For filesystems such as ext2 or xfs, you unmount the filesystem on the
current server and mount it on the new server when doing a failover.
In this case, you have to be able to get rid of all the locks first and
you do that for the entire filesystem.  For a cluster filesystem such as
cxfs, you don't actually unmount the filesystem, so you really need the
per-IP address approach.

> If I want to force-unmount a filesystem, I need to unexport it, and I
> need to kill all the locks.?? Currently you can only remove locks from
> all filesystems, which might not be ideal.

This is definitely less than ideal.  This will force notification and
reclaim for all exported filesystems.  This can be a significant problem.

jmy at sgi.com
650 933 3124

Why is there a snake in my Coke?


From gradimir_starovic at symantec.com  Wed Jun 14 12:40:37 2006
From: gradimir_starovic at symantec.com (Gradimir Starovic)
Date: Wed, 14 Jun 2006 13:40:37 +0100
Subject: [Linux-cluster] Red Hat Summit presentations
Message-ID: <E1FF5517ED446F46BABF05546CE6213C03C54D42@gpkxchcln5.enterprise.veritas.com>

"NFS for Linux" link gives Page not found. Is it just me or
it needs fixing?

regards
Gradimir 

>  -----Original Message-----
>  From: linux-cluster-bounces at redhat.com 
>  [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Riaan 
>  van Niekerk
>  Sent: 14 June 2006 12:27
>  To: linux-cluster at redhat.com
>  Subject: [Linux-cluster] Red Hat Summit presentations
>  
>  For anyone interested in the Red Hat Summit presentations, 
>  they are available on-line now. 
>  
>  The presentations on Clustering and Storage are available 
>  here: http://www.redhat.com/promo/summit/presentations/cns.htm
>  
>  Riaan
>  
>  --
>  Linux-cluster mailing list
>  Linux-cluster at redhat.com
>  https://www.redhat.com/mailman/listinfo/linux-cluster
>  


From bmarzins at redhat.com  Thu Jun 15 20:52:00 2006
From: bmarzins at redhat.com (Benjamin Marzinski)
Date: Thu, 15 Jun 2006 15:52:00 -0500
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <20060615190959.GB1913@redhat.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
	<20060615190959.GB1913@redhat.com>
Message-ID: <20060615205200.GC12574@ether.msp.redhat.com>

On Thu, Jun 15, 2006 at 02:09:59PM -0500, David Teigland wrote:
> On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote:
> 
> > Is there any ideas of how to fix this? I mean either the reason ('D'
> > state of killed httpd-s) or consequences (the GFS filesystem fully or
> > partially become unavailable after this).
> > 
> > I also appreciate any help with debugging the problem.
> > 
> > I tried gfs_tool lockdump with decipher_lockstate_dump tool.
> 
> I don't see anything wrong in the lockdumps you gave, although I'm not an
> expert at interpreting gfs lockdumps.  Could you do a ps showing the wchan
> for those processes?  Using sysrq to get a stack dump would also be useful.
> You might also do a dlm lock dump and pick out those locks:
>   echo "lockspace name" >> /proc/cluster/dlm_locks
>   cat /proc/cluster/dlm_locks
> 
> I/O stuck in gnbd could also be a problem, I'm not sure what the signs of
> that might be apart from possibly the wchan.

To check for GNBD lockups, there are a couple of useful places to look.
Are there any messages in the logs of any of the nodes (particularly the
hanging gnbd client and the gnbd server node) that provide any clues?

Do a
# gnbd_import -l 
on all the gnbd client machines. The 'State:' line is the important one.
For all the devices you are using, the first to values should be "Open"
and "Connected".  If it doesn't say "Connected" you've lost connection to
the server for some reason. The log messages should provide a clue.
If the last value says "Clear", then there is no outstanding IO to the server.
If it says "Pending", do a
# cat /sys/class/gnbd/gnbd<minor_nr>/waittime
Run the command a couple of times. This is the time since the server has last
fulfilled an outstanding request. If there are no oustanding requests, it will
be -1. If the value keeps getting larger, then there is pending IO to the
server.

Run
# gnbd_export -L
on the server machine. You should see a process for each exported device for
each client. If there is pending IO to the server, a stack trace of the
server process will show where it's stuck.

The other place GNBD could be stuck is waiting on some internal lock. A stack
should point that out. 

-Ben
> 
> Dave
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From neilb at suse.de  Fri Jun 16 06:09:49 2006
From: neilb at suse.de (Neil Brown)
Date: Fri, 16 Jun 2006 16:09:49 +1000
Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface
In-Reply-To: message from William A.(Andy) Adamson on Thursday June 15
References: <1150089943.26019.18.camel@localhost.localdomain>
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<1150293654.28264.91.camel@localhost.localdomain>
	<20060615140743.36CDC1BBAD@citi.umich.edu>
Message-ID: <17554.19245.914383.436585@cse.unsw.edu.au>

On Thursday June 15, andros at citi.umich.edu wrote:
> this discusion has centered around removing the locks of an export.
> we also want the interface to ge able to remove the locks owned by a single 
> client. this is needed to enable client migration between replica's or between 
> nodes in a cluster file system. it is not acceptable to place an entire export 
> in grace just to move a small number of clients.

Hmmmm....
You want to remove all the locks owned by a particular client
with the intension of reclaiming those locks against a different NFS
server (on a cluster filesystem)
and you don't want to put the whole filesystem into grace mode while
doing it.

Is that correct?

Sounds extremely racy to me.  Suppose some other client takes a
conflicting lock between dropping them on one server and claiming them
on the other?  That would be bad.  The purpose of the grace mode is
precisely to avoid this sort of race.

It would seem that what you "really" want to do is to tell the cluster
filesystem to migrate the locks to a different node and some how tell
lockd about out.

Is there a comprehensive design document about how this is going to
work, because I'm feeling doubtful.

For the 'between replicas' case - I'm not sure locking makes sense.
Locking on a read-only filesystem is pretty pointless, and presumably
replicas are read-only???

Basically, dropping locks that are expected to be picked up again,
without putting the whole filesystem into a grace period simply
doesn't sound workable to me.

Am I missing something?

NeilBrown


From akornev at gmail.com  Fri Jun 16 15:37:14 2006
From: akornev at gmail.com (Anton Kornev)
Date: Fri, 16 Jun 2006 18:37:14 +0300
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <20060615190959.GB1913@redhat.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
	<20060615190959.GB1913@redhat.com>
Message-ID: <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com>

David, Benjamin,
thanks for you assistance!

I reproduced the problem and I have done the tests you mentioned.

Regarding gndb:

gnbd_import -l tool reports "Open, Connected" state and gndb_export -L
on the gnbd server also shows all the hosts importing this partition.
The " cat /sys/class/gnbd/gnbd0/waittime" also shows no data pending
(returns -1).

Though in the message log there were some strange lines about gnbd failures
appeared after the "killall httpd" command was issued:

gnbd (pid 5836: alogc.pl) got signal 9
gnbd0: Send control failed (result -4)
gnbd (pid 5836: alogc.pl) got signal 15
gnbd0: Send control failed (result -4)
gnbd (pid 5911: httpd) got signal 15
gnbd0: Send control failed (result -4)
gnbd (pid 5897: httpd) got signal 15
gnbd0: Send control failed (result -4)
gnbd (pid 5915: httpd) got signal 15
gnbd0: Send control failed (result -4)
gnbd (pid 5911: httpd) got signal 15
gnbd0: Send control failed (result -4)


Regarding ps info on wchan - it looks like this:

ps axl info on IO-waiting processes:

F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
1     0    51     6  15   0     0    0 wait_o D    ?          0:00 [pdflush]
1     0  5771     6   5 -10     0    0 lock_p D<   ?          0:00
[lock_dlm1]
1     0  5776     1  15   0     0    0 -      D    ?          0:00
[gfs_logd]
1     0  5777     1  15   0     0    0 -      D    ?          0:00
[gfs_quotad]
1     0  5778     1  15   0     0    0 -      D    ?          0:00
[gfs_inoded]
5     0  5892     1  16   0 23440  912 -      Ds   ?          0:00
/usr/system/apache/bin/httpd
5    48  5895  5892  17   0 23472  984 glock_ D    ?          0:00
/usr/system/apache/bin/httpd
5    48  5896  5892  17   0 23440  980 glock_ D    ?          0:00
/usr/system/apache/bin/httpd
5    48  5897  5892  17   0 23440  920 glock_ D    ?          0:00
/usr/system/apache/bin/httpd
5    48  5911  5892  17   0 23440  920 glock_ D    ?          0:00
/usr/system/apache/bin/httpd
5    48  5915  5892  17   0 23440  920 wait_o D    ?          0:00
/usr/system/apache/bin/httpd
4     0  5930  2547  34  19 52780  992 wait_o DN   ?          0:00 /bin/sh
-c run-parts /etc/cron.da
ily

Not truncated version of the "wchan" field for all the IO-waiting processes
is below:

bash-3.00# ps ax -o pid,state,wchan:32,ucomm |grep D
  PID S WCHAN                            COMMAND
   51 D wait_on_buffer                   pdflush
 5771 D lock_page                        lock_dlm1
 5776 D -                                gfs_logd
 5777 D -                                gfs_quotad
 5778 D -                                gfs_inoded
 5892 D -                                httpd
 5895 D glock_wait_internal              httpd
 5896 D glock_wait_internal              httpd
 5897 D glock_wait_internal              httpd
 5911 D glock_wait_internal              httpd
 5915 D wait_on_buffer                   httpd
 5930 D wait_on_buffer                   sh

Finally I have taken the "sysrq" info on these processes.

pdflush       D ffffffff8014aabc     0    51      6            53    50
(L-TLB)
00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00
       0000000000000216 ffffffffa0042916 000001011aca60c0 0000000000000008
       000001011fdef7f0 0000000000000dfa
Call Trace:<ffffffffa0042916>{:dm_mod:dm_request+396}
<ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff803053ef>{io_schedule+38}
<ffffffff80178c4c>{__wait_on_buffer+125}
       <ffffffff80178ad2>{bh_wake_function+0}
<ffffffff80178ad2>{bh_wake_function+0}
       <ffffffffa0235c5d>{:gfs:gfs_logbh_wait+49}
<ffffffffa024a6a6>{:gfs:disk_commit+794}
       <ffffffffa024a877>{:gfs:log_refund+111}
<ffffffffa024ad8e>{:gfs:log_flush_internal+510}
       <ffffffff8017d682>{sync_supers+167} <ffffffff8015e310>{wb_kupdate+36}

       <ffffffff8015edb4>{pdflush+323} <ffffffff8015e2ec>{wb_kupdate+0}
       <ffffffff8015ec71>{pdflush+0} <ffffffff8014aa93>{kthread+200}
       <ffffffff80110e17>{child_rip+8}
<ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014a9cb>{kthread+0} <ffffffff80110e0f>{child_rip+0}
lock_dlm1     D 000001000c0096e0     0  5771      6          5772  5766
(L-TLB)
0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069
       000001011420b030 0000000000000069 000001000c00a940 000000010000eb10
       000001011a887030 0000000000001cae
Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
<ffffffff803053ef>{io_schedule+38}
       <ffffffff80159215>{__lock_page+191}
<ffffffff80158cfa>{page_wake_function+0}
       <ffffffff80158cfa>{page_wake_function+0}
<ffffffff80163125>{truncate_inode_pages+519}
       <ffffffffa0258f35>{:gfs:gfs_inval_page+63}
<ffffffffa02401b5>{:gfs:drop_bh+233}
       <ffffffffa0242138>{:gfs:gfs_glock_cb+194}
<ffffffffa02869dd>{:lock_dlm:dlm_async+1989}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffffa0286218>{:lock_dlm:dlm_async+0}
<ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014aa93>{kthread+200} <ffffffff80110e17>{child_rip+8}
       <ffffffff8014aabc>{keventd_create_kthread+0}
<ffffffff8014a9cb>{kthread+0}
       <ffffffff80110e0f>{child_rip+0}
gfs_logd      D 0000000000000000     0  5776      1          5777  5775
(L-TLB)
000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85
       000001011387fe58 ffffffff80304add ffffffff803cca80 0000000000000246
       00000101143fe030 00000000000000b5
Call Trace:<ffffffff80304a85>{thread_return+0}
<ffffffff80304add>{thread_return+88}
       <ffffffffa023e8d3>{:gfs:lock_on_glock+112}
<ffffffff8030565b>{__down_write+134}
       <ffffffffa0249cdb>{:gfs:gfs_ail_empty+56}
<ffffffffa0233930>{:gfs:gfs_logd+77}
       <ffffffff80110e17>{child_rip+8}
<ffffffff801cccff>{dummy_d_instantiate+0}
       <ffffffffa02338e3>{:gfs:gfs_logd+0} <ffffffff80110e0f>{child_rip+0}

gfs_quotad    D 0000000000000000     0  5777      1          5778  5776
(L-TLB)
0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85
       0000010113881eb8 ffffffff80304add 000001011ff87030 0000000100000074
       000001011430f7f0 0000000000000128
Call Trace:<ffffffff80304a85>{thread_return+0}
<ffffffff80304add>{thread_return+88}
       <ffffffff8030565b>{__down_write+134}
<ffffffffa025b55a>{:gfs:gfs_quota_sync+226}
       <ffffffffa0233ab1>{:gfs:gfs_quotad+127}
<ffffffff80110e17>{child_rip+8}
       <ffffffff801cccff>{dummy_d_instantiate+0}
<ffffffff801cccff>{dummy_d_instantiate+0}
       <ffffffff801cccff>{dummy_d_instantiate+0}
<ffffffffa0233a32>{:gfs:gfs_quotad+0}
       <ffffffff80110e0f>{child_rip+0}
gfs_inoded    D 0000000000000000     0  5778      1          5807  5777
(L-TLB)
0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0
       0000000000000000 ffffffff80304a85 0000010113883ec8 0000000180304add
       000001011e2937f0 00000000000000c2
Call Trace:<ffffffff80304a85>{thread_return+0}
<ffffffff8030565b>{__down_write+134}
       <ffffffffa026160d>{:gfs:unlinked_find+115}
<ffffffffa0261c6c>{:gfs:gfs_unlinked_dealloc+25}
       <ffffffffa0233bd5>{:gfs:gfs_inoded+66}
<ffffffff80110e17>{child_rip+8}
       <ffffffffa0233b93>{:gfs:gfs_inoded+0} <ffffffff80110e0f>{child_rip+0}


httpd         D ffffffff80304190     0  5892      1  5893          5826
(NOTLB)
0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001
       0000000000000000 0000000000000000 0000010114667980 0000000111b75bc0
       00000101143fe7f0 00000000000009ad
Call Trace:<ffffffff80303d6f>{__down+147}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffff8015b3a2>{generic_file_write_nolock+158}
<ffffffff80305780>{__down_failed+53}
       <ffffffffa0236986>{:gfs:.text.lock.dio+95}
<ffffffffa0260e4c>{:gfs:gfs_trans_add_bh+205}
       <ffffffffa0253efc>{:gfs:do_write_buf+1138}
<ffffffffa0252db3>{:gfs:walk_vm+278}
       <ffffffffa0253a8a>{:gfs:do_write_buf+0}
<ffffffffa0253a8a>{:gfs:do_write_buf+0}
       <ffffffffa025415b>{:gfs:__gfs_write+201}
<ffffffff80177c60>{vfs_write+207}
       <ffffffff80177d48>{sys_write+69} <ffffffff801101c6>{system_call+126}

httpd         D 0000010110ad7d48     0  5895   5892          5896  5893
(NOTLB)
0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075
       0000010117002030 0000000000000075 000001000c002940 0000000000000001
       00000101170027f0 000000000001300e
Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
<ffffffff80304cbd>{wait_for_completion+167}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
<ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
<ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
       <ffffffff80168211>{do_no_page+1003}
<ffffffff80167b13>{do_wp_page+948}
       <ffffffff8016858f>{handle_mm_fault+343}
<ffffffff80142a06>{get_signal_to_deliver+1118}
       <ffffffff801236d2>{do_page_fault+518}
<ffffffff80304a85>{thread_return+0}
       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}


httpd         D 0000010110b5bd48     0  5896   5892          5897  5895
(NOTLB)
0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075
       00000101114787f0 0000000000000075 000001000c002940 0000000000000001
       0000010117002030 000000000000fb3e
Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
<ffffffff80304cbd>{wait_for_completion+167}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
<ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
<ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
       <ffffffff80168211>{do_no_page+1003}
<ffffffff80167b13>{do_wp_page+948}
       <ffffffff8016858f>{handle_mm_fault+343}
<ffffffff80142a06>{get_signal_to_deliver+1118}
       <ffffffff801236d2>{do_page_fault+518}
<ffffffff802a3445>{sys_accept+327}
       <ffffffff80182e88>{pipe_read+26} <ffffffff80110c61>{error_exit+0}

httpd         D 0000000000000000     0  5897   5892          5911  5896
(NOTLB)
0000010110119bd8 0000000000000006 0000010117002030 0000000000000075
       0000010117002030 0000000000000075 000001000c00a940 000000001b16e030
       00000101114787f0 000000000000fbe0
Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
<ffffffff80304cbd>{wait_for_completion+167}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
<ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
<ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
       <ffffffff80168211>{do_no_page+1003}
<ffffffff80167b13>{do_wp_page+948}
       <ffffffff8016858f>{handle_mm_fault+343}
<ffffffff80142a06>{get_signal_to_deliver+1118}
       <ffffffff801236d2>{do_page_fault+518}
<ffffffff80304a85>{thread_return+0}
       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}


httpd         D 00000101100c3d48     0  5911   5892          5915  5897
(NOTLB)
00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075
       00000101170027f0 0000000000000075 000001000c002940 0000000000000000
       000001011b16e030 000000000000187e
Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
<ffffffff80304cbd>{wait_for_completion+167}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
<ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
<ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
       <ffffffff80168211>{do_no_page+1003}
<ffffffff80167b13>{do_wp_page+948}
       <ffffffff8016858f>{handle_mm_fault+343}
<ffffffff80142a06>{get_signal_to_deliver+1118}
       <ffffffff801236d2>{do_page_fault+518}
<ffffffff80304a85>{thread_return+0}
       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}


httpd         D 0000000000006a36     0  5915   5892                5911
(NOTLB)
00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791
       0000000000000000 0000000000000000 0000030348ac8c1c 0000000114a217f0
       0000010114c997f0 000000000000076a
Call Trace:<ffffffffa020c791>{:dlm:lkb_swqueue+43}
<ffffffff803053ef>{io_schedule+38}
       <ffffffff80178c4c>{__wait_on_buffer+125}
<ffffffff80178ad2>{bh_wake_function+0}
       <ffffffff80178ad2>{bh_wake_function+0}
<ffffffffa02352c6>{:gfs:gfs_dreread+154}
       <ffffffffa0235332>{:gfs:gfs_dread+40}
<ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
       <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
<ffffffffa0242461>{:gfs:inode_go_lock+38}
       <ffffffffa023f586>{:gfs:glock_wait_internal+563}
<ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
<ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
       <ffffffff80168211>{do_no_page+1003}
<ffffffff80167b13>{do_wp_page+948}
       <ffffffff8016858f>{handle_mm_fault+343}
<ffffffff80142a06>{get_signal_to_deliver+1118}
       <ffffffff801236d2>{do_page_fault+518}
<ffffffff80304a85>{thread_return+0}
       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}


sh            D 000000000000001a     0  5930   2547
(NOTLB)
000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00
       0000010111293d88 0000000000000000 00000100dfc02400 0000000000010000
       00000101148557f0 0000000000002010
Call Trace:<ffffffff803053ef>{io_schedule+38}
<ffffffff80178c4c>{__wait_on_buffer+125}
       <ffffffff80178ad2>{bh_wake_function+0}
<ffffffff80178ad2>{bh_wake_function+0}
       <ffffffffa02352c6>{:gfs:gfs_dreread+154}
<ffffffffa0235332>{:gfs:gfs_dread+40}
       <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
<ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
       <ffffffffa0242461>{:gfs:inode_go_lock+38}
<ffffffffa023f586>{:gfs:glock_wait_internal+563}
       <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
<ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
       <ffffffff801ccb78>{dummy_inode_permission+0}
<ffffffffa0257aca>{:gfs:gfs_permission+64}
       <ffffffff8018d475>{dput+56} <ffffffff80183d32>{permission+51}
       <ffffffff801844aa>{__link_path_walk+372}
<ffffffff801851c2>{link_path_walk+82}
       <ffffffff8012370b>{do_page_fault+575}
<ffffffff801849b0>{__link_path_walk+1658}
       <ffffffff801851c2>{link_path_walk+82}
<ffffffff8012370b>{do_page_fault+575}
       <ffffffff8018540f>{path_lookup+451}
<ffffffff801856bb>{__user_walk+47}
       <ffffffff8017ff1a>{vfs_stat+24} <ffffffff8012370b>{do_page_fault+575}

       <ffffffff80180264>{sys_newstat+17} <ffffffff80110c61>{error_exit+0}
       <ffffffff801101c6>{system_call+126}

Please, let me know if it gives you any clues.


On 6/15/06, David Teigland <teigland at redhat.com> wrote:
>
> On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote:
>
> > Is there any ideas of how to fix this? I mean either the reason ('D'
> > state of killed httpd-s) or consequences (the GFS filesystem fully or
> > partially become unavailable after this).
> >
> > I also appreciate any help with debugging the problem.
> >
> > I tried gfs_tool lockdump with decipher_lockstate_dump tool.
>
> I don't see anything wrong in the lockdumps you gave, although I'm not an
> expert at interpreting gfs lockdumps.  Could you do a ps showing the wchan
> for those processes?  Using sysrq to get a stack dump would also be
> useful.
> You might also do a dlm lock dump and pick out those locks:
>   echo "lockspace name" >> /proc/cluster/dlm_locks
>   cat /proc/cluster/dlm_locks
>
> I/O stuck in gnbd could also be a problem, I'm not sure what the signs of
> that might be apart from possibly the wchan.
>
> Dave
>
>


-- 
Best Regards,
Anton Kornev.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060616/441b2453/attachment.htm>

From andros at citi.umich.edu  Fri Jun 16 15:39:04 2006
From: andros at citi.umich.edu (William A.(Andy) Adamson)
Date: Fri, 16 Jun 2006 11:39:04 -0400
Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface
In-Reply-To: <17554.19245.914383.436585@cse.unsw.edu.au> 
References: <1150089943.26019.18.camel@localhost.localdomain> 
	<17550.11870.186706.36949@cse.unsw.edu.au>
	<1150268091.28264.75.camel@localhost.localdomain>
	<1150293654.28264.91.camel@localhost.localdomain>
	<20060615140743.36CDC1BBAD@citi.umich.edu>
	<17554.19245.914383.436585@cse.unsw.edu.au>
Message-ID: <20060616153904.6A9A21BCBD@citi.umich.edu>

> On Thursday June 15, andros at citi.umich.edu wrote:
> > this discusion has centered around removing the locks of an export.
> > we also want the interface to ge able to remove the locks owned by a single 
> > client. this is needed to enable client migration between replica's or between 
> > nodes in a cluster file system. it is not acceptable to place an entire export 
> > in grace just to move a small number of clients.
> 
> Hmmmm....
> You want to remove all the locks owned by a particular client
> with the intension of reclaiming those locks against a different NFS
> server (on a cluster filesystem)
> and you don't want to put the whole filesystem into grace mode while
> doing it.
> 
> Is that correct?

yes.

> 
> Sounds extremely racy to me.  Suppose some other client takes a
> conflicting lock between dropping them on one server and claiming them
> on the other?  That would be bad.  The purpose of the grace mode is
> precisely to avoid this sort of race.

the idea is that the underlying file system can place only the files with 
locks held by the migrating client(s) into grace, leaving all other files for 
normal operation. the migrating (nfsv4) client then reclaims opens, locks and 
delegations on the new server. its just reducing the scope of the grace period.

> 
> It would seem that what you "really" want to do is to tell the cluster
> filesystem to migrate the locks to a different node and some how tell
> lockd about out.

what we really want is for the cluster file system to share the locks between 
the original node and the new node. then the client can simply be redirected 
and no grace period or reclaim is needed. this is much harder to code than a 
reduced grace period as describe above. from what we hear, lustre has this 
functionality.

either way, the files with locks held by the migrating client need to be 
identified by both the lock manager (lockd/nfsv4 server) and the underlying fs.

> 
> Is there a comprehensive design document about how this is going to
> work, because I'm feeling doubtful.

we have a work in progress - it's not done but may help describe our thinking.

http://wiki.linux-nfs.org/index.php/Recovery_and_migration

> 
> For the 'between replicas' case - I'm not sure locking makes sense.
> Locking on a read-only filesystem is pretty pointless, and presumably
> replicas are read-only???

nope. we have a promising prototye read/write replica scheme that we are 
testing.

http://www.citi.umich.edu/techreports/reports/citi-tr-06-3.pdf

i agree this is an outlying case....

but another immediate consumer of such an iterface would be an administator 
who needs to remove the locks for a client.

-->Andy

> 
> Basically, dropping locks that are expected to be picked up again,
> without putting the whole filesystem into a grace period simply
> doesn't sound workable to me.
> 
> Am I missing something?
> 
> NeilBrown
> 
> 
> _______________________________________________
> NFS maillist  -  NFS at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs


From sunjw at onewaveinc.com  Fri Jun 16 14:38:58 2006
From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=)
Date: Fri, 16 Jun 2006 22:38:58 +0800
Subject: [Linux-cluster] gfs withdrawed in function xmote_bh with ret =
	0x00000002
Message-ID: <SERVERmFutEeMQkKbF800005bf0@mail.onewaveinc.com>

Hi,all

I run the latest STABLE cluster code with 3 nodes,
I get the message on one node after about 38 hours as:
<--
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   function = xmote_bh
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock.
c, line = 1093
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   time = 1150408904
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O
Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw
Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory
Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn
Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002
-->
My test program has 'df', 'write', 'ls' and 'read'. 
and each node connect to RAID controller's host port directly with FC.

What would be the problem?

Thanks for any reply,
Luckey


From teigland at redhat.com  Fri Jun 16 16:37:53 2006
From: teigland at redhat.com (David Teigland)
Date: Fri, 16 Jun 2006 11:37:53 -0500
Subject: [Linux-cluster] Re: gfs withdrawed in function xmote_bh with ret =
	0x00000002
In-Reply-To: <SERVERmFutEeMQkKbF800005bf0@mail.onewaveinc.com>
References: <SERVERmFutEeMQkKbF800005bf0@mail.onewaveinc.com>
Message-ID: <20060616163753.GB18872@redhat.com>

On Fri, Jun 16, 2006 at 10:38:58PM +0800, ?????? wrote:
> Hi,all
> 
> I run the latest STABLE cluster code with 3 nodes,
> I get the message on one node after about 38 hours as:
> <--
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   function = xmote_bh
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock.
> c, line = 1093
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   time = 1150408904
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw
> Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002
> -->
> My test program has 'df', 'write', 'ls' and 'read'. 
> and each node connect to RAID controller's host port directly with FC.

Hi, I've attached a small patch to print more information and call BUG
instead of withdrawing.  It may also be helpful to see a dlm lock dump and
a gfs_tool lockdump on the machine after you hit the BUG.

Thanks,
Dave

-------------- next part --------------
--- ./glock.c.orig	2006-06-16 11:17:48.313980418 -0500
+++ ./glock.c	2006-06-16 11:31:20.617855661 -0500
@@ -30,6 +30,9 @@
 #include "quota.h"
 #include "recovery.h"
 
+int dump_glock(struct gfs_glock *gl, char *buf, unsigned int size,
+	       unsigned int *count)
+
 /*  Must be kept in sync with the beginning of struct gfs_glock  */
 struct glock_plug {
 	struct list_head gl_list;
@@ -1090,9 +1093,15 @@
 		spin_unlock(&gl->gl_spin);
 
 	} else {
-		if (gfs_assert_withdraw(sdp, FALSE) == -1)
-			printk("GFS: fsid=%s: ret = 0x%.8X\n",
-			       sdp->sd_fsname, ret);
+		char *buf;
+		int junk;
+		printk("GFS: fsid=%s: ret = 0x%.8X prev_state = %d\n",
+		       sdp->sd_fsname, ret, prev_state);
+		buf = kmalloc(4096);
+		memset(buf, 0, sizeof(buf));
+		dump_glock(gl, buf, 4096, &junk);
+		printk("%s\n", buf);
+		BUG();
 	}
 
 	if (glops->go_xmote_bh)

From aberoham at gmail.com  Fri Jun 16 19:36:40 2006
From: aberoham at gmail.com (aberoham at gmail.com)
Date: Fri, 16 Jun 2006 12:36:40 -0700
Subject: [Linux-cluster] recovering from "resource groups locked" error?
Message-ID: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com>

If clustat reports rgmanager as online, why would any clusvcadm operation
fail with "Try again (resource groups locked)" ?

Is there any way to recover from that rgmanger failure/error besides
resetting the entire cluster?

Details --

Yesterday evening a technician connected a Netgear GS748T switch to my
network. The new switch somehow caused a storm of traffic that in turn
caused a disruption of network connectivity across the entire LAN, including
to all of my CS/GFS cluster nodes, for a few minutes until the new switch
was removed from the network.

This morning when I finally had a chance to investigate I found that all of
the cluster members that are supposed to be online were online and that the
cluster was quorate. But rgmanager would not work and services running under
rgmanager were hung. (The cluster must have become inquorate and blocked
access to the shared GFS volume while the outage was in progress. But some
of the services and rgmanager never recovered?)

I first tried resetting the "lead" member. (This is a pool of mirrored
storage servers where the lead member creates a rsync batch off of a main
fileserver and all of the other members then replay the rsync batch that is
on a shared filesystem against their local filesystem mirror of the main
fileserver)

No matter what I did rgmanager would not start. cman_tool services would
report code "S-1,80,4" --

root at gfs05:~
(0)>cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1 4 3]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1 4 3]

User:            "usrm::manager"                     0   4 join
S-1,80,4
[]

Other cluster members would report rgmanager as online, yet when I tried to
operate on member services, the operation would fail with "Try again
(resource groups locked)".

root at gfs06:~
(1)>clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  gfs04                                    Online, rgmanager
  gfs05                                    Online
  gfs06                                    Online, Local, rgmanager
  gfs07                                    Online, rgmanager
  gfs08                                    Offline

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  mapsmirror1          gfs05                          started
  mapsmirror2          gfs06                          started
  mapsmirror3          gfs07                          started
  mapsmirror4          gfs04                          started
  mapsmirror5          (none)                         stopped
root at gfs06:~
(0)>clusvcadm -d mapsmirror1
Member gfs06 disabling mapsmirror1...failed: Try again (resource groups
locked)

Eventually I just gave up and power cycled all cluster members at ounce.
Everything, including rgmanger, then came back online OK.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060616/103249ed/attachment.htm>

From magobin at gmail.com  Fri Jun 16 19:38:39 2006
From: magobin at gmail.com (Alex)
Date: Fri, 16 Jun 2006 21:38:39 +0200
Subject: [Linux-cluster] Postfix & Courier in cluster...little question!!
Message-ID: <4493088f.3404778b.431a.fffff008@mx.gmail.com>

Hi at all....I configured postfix and courier in a cluster, now I have the
problem with Maildir directory...in fact if Postfix is on node1 and courier
in node2....Courier can't access to maildir (obviously)....is it there
anyone that can suggest me a good solution??
One solution may be drbd....but cluster suite doesn't implement anything for
this issue???

Thanks in advance

Alex


From aberoham at gmail.com  Fri Jun 16 19:39:59 2006
From: aberoham at gmail.com (aberoham at gmail.com)
Date: Fri, 16 Jun 2006 12:39:59 -0700
Subject: [Linux-cluster] Re: recovering from "resource groups locked" error?
In-Reply-To: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com>
References: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com>
Message-ID: <3bdb07840606161239i58fa11cfye5eb83b33da51236@mail.gmail.com>

Btw, all members run on 2.6.9-34.ELsmp, cman-1.0.4-0 and
cman-kernel-smp-2.6.9-43.8 with rgmanager-1.9.46-0.

On 6/16/06, aberoham at gmail.com <aberoham at gmail.com> wrote:
>
>
> If clustat reports rgmanager as online, why would any clusvcadm operation
> fail with "Try again (resource groups locked)" ?
>
> Is there any way to recover from that rgmanger failure/error besides
> resetting the entire cluster?
>
> Details --
>
> Yesterday evening a technician connected a Netgear GS748T switch to my
> network. The new switch somehow caused a storm of traffic that in turn
> caused a disruption of network connectivity across the entire LAN, including
> to all of my CS/GFS cluster nodes, for a few minutes until the new switch
> was removed from the network.
>
> This morning when I finally had a chance to investigate I found that all
> of the cluster members that are supposed to be online were online and that
> the cluster was quorate. But rgmanager would not work and services running
> under rgmanager were hung. (The cluster must have become inquorate and
> blocked access to the shared GFS volume while the outage was in progress.
> But some of the services and rgmanager never recovered?)
>
> I first tried resetting the "lead" member. (This is a pool of mirrored
> storage servers where the lead member creates a rsync batch off of a main
> fileserver and all of the other members then replay the rsync batch that is
> on a shared filesystem against their local filesystem mirror of the main
> fileserver)
>
> No matter what I did rgmanager would not start. cman_tool services would
> report code "S-1,80,4" --
>
> root at gfs05:~
> (0)>cman_tool services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       -
> [2 1 4 3]
>
> DLM Lock Space:  "clvmd"                             2   3 run       -
> [2 1 4 3]
>
> User:            "usrm::manager"                     0   4 join
> S-1,80,4
> []
>
> Other cluster members would report rgmanager as online, yet when I tried
> to operate on member services, the operation would fail with "Try again
> (resource groups locked)".
>
> root at gfs06:~
> (1)>clustat
> Member Status: Quorate
>
>   Member Name                              Status
>   ------ ----                              ------
>   gfs04                                    Online, rgmanager
>   gfs05                                    Online
>   gfs06                                    Online, Local, rgmanager
>   gfs07                                    Online, rgmanager
>   gfs08                                    Offline
>
>   Service Name         Owner (Last)                   State
>   ------- ----         ----- ------                   -----
>   mapsmirror1          gfs05                          started
>   mapsmirror2          gfs06                          started
>   mapsmirror3          gfs07                          started
>   mapsmirror4          gfs04                          started
>   mapsmirror5          (none)                         stopped
> root at gfs06:~
> (0)>clusvcadm -d mapsmirror1
> Member gfs06 disabling mapsmirror1...failed: Try again (resource groups
> locked)
>
> Eventually I just gave up and power cycled all cluster members at ounce.
> Everything, including rgmanger, then came back online OK.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060616/391a4eab/attachment.htm>

From lhh at redhat.com  Fri Jun 16 20:14:49 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 16 Jun 2006 16:14:49 -0400
Subject: [Linux-cluster] recovering from "resource groups locked" error?
In-Reply-To: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com>
References: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com>
Message-ID: <1150488889.20766.323.camel@ayanami.boston.redhat.com>

On Fri, 2006-06-16 at 12:36 -0700, aberoham at gmail.com wrote:
> 
> If clustat reports rgmanager as online, why would any clusvcadm
> operation fail with "Try again (resource groups locked)" ?
> 
> Is there any way to recover from that rgmanger failure/error besides
> resetting the entire cluster? 

Yeah, it's fixed in STABLE and RHEL4 branches at the moment.  It was
getting locked at the wrong time, and there used to be no way to unlock
it.

-- Lon


From mathieu.avila at seanodes.com  Mon Jun 19 11:48:56 2006
From: mathieu.avila at seanodes.com (Mathieu Avila)
Date: Mon, 19 Jun 2006 13:48:56 +0200
Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel panics
	on stress.
Message-ID: <44968F28.8070505@seanodes.com>

Hello all, 

(I've already posted this to cluster-devel at redhat.com,and it seems it 
wasn't the appropriate place as i didn't get any answer. Sorry for the 
cross-posting.)

I have 2 problems:

1) I'm trying to use GFS with Fedora Core 4. It was upgraded to a kernel 
2.6.16-1.2111_FC4smp. RPM versions are:
GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.25
GFS-6.1.0-3
GFS-kernheaders-2.6.11.8-20050601.152643.FC4.25
dlm-kernheaders-2.6.11.5-20050601.152643.FC4.22
dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.22
dlm-1.0.0-3
gnbd-kernheaders-2.6.11.2-20050420.133124.FC4.58
gnbd-1.0.0-1


There was a problem to install the following packages,and the following 
patches were necessary:

-GFS-kernel


--- gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_file.c.orig    
2006-06-01 13:57:58.000000000 +0200
+++ gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_file.c    
2006-06-01 13:57:24.000000000 +0200
@@ -931,12 +931,12 @@
    if (!access_ok(VERIFY_READ, buf, size))
        return -EFAULT;

-    down(&inode->i_sem);
+    mutex_lock(&inode->i_mutex);
    if (file->f_flags & O_DIRECT)
        count = walk_vm(file, (char *)buf, size, offset, do_write_direct);
    else
        count = walk_vm(file, (char *)buf, size, offset, do_write_buf);
-    up(&inode->i_sem);
+    mutex_unlock(&inode->i_mutex);

    return count;
}
--- gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_fstype.c.orig    
2006-06-01 14:04:16.000000000 +0200
+++ gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_fstype.c    
2006-06-01 14:05:29.000000000 +0200
@@ -712,12 +712,12 @@
        goto out;
    } else {
        char buf[BDEVNAME_SIZE];
-
+        unsigned long bsize;
        sb->s_flags = flags;
        strlcpy(sb->s_id, bdevname(real, buf), sizeof(sb->s_id));
-        sb->s_old_blocksize = block_size(real);
-        sb_set_blocksize(sb, sb->s_old_blocksize);
-        set_blocksize(real, sb->s_old_blocksize);
+        bsize = block_size(real);
+        sb_set_blocksize(sb, bsize);
+        set_blocksize(real, bsize);
        error = fill_super(sb, data, (flags & MS_VERBOSE) ? 1 : 0);
        if (error) {
            up_write(&sb->s_umount);
@@ -748,7 +748,7 @@
{
    struct block_device *diaper = sb->s_bdev;
    struct block_device *real = gfs_diaper_2real(diaper);
-    unsigned long bsize = sb->s_old_blocksize;
+    unsigned long bsize = block_size(real);

    generic_shutdown_super(sb);
    set_blocksize(diaper, bsize);


I am quite confident about "file_ops.c" as it looks like the latest 
version for 2.6.15:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_file.c?rev=1.16.6.2.2.4&content-type=text/x-cvsweb-markup&cvsroot=cluster&only_with_tag=gfs-kernel_2_6_15_2 


For "ops_fstype.c", it should be ok, unless you see obvious errors.


- gnbd-kernel:

--- gnbd-kernel-2.6.11.2-20050420.133124/src/gnbd.c.orig    2006-06-01 
13:46:35.000000000 +0200
+++ gnbd-kernel-2.6.11.2-20050420.133124/src/gnbd.c    2006-06-01 
13:47:03.000000000 +0200
@@ -180,9 +180,9 @@
    set_capacity(dev->disk, size);
    bdev = bdget_disk(dev->disk, 0);
    if (bdev) {
-        down(&bdev->bd_inode->i_sem);
+        mutex_lock(&bdev->bd_inode->i_mutex);
        i_size_write(bdev->bd_inode, (loff_t)size << 9);
-        up(&bdev->bd_inode->i_sem);
+        mutex_unlock(&bdev->bd_inode->i_mutex);
        bdput(bdev);
    }
    up(&dev->do_it_lock);
@@ -281,7 +281,7 @@
    
    spin_lock_irqsave(q->queue_lock, flags);
    if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
-        end_that_request_last(req);
+        end_that_request_last(req, 0);
    }
    spin_unlock_irqrestore(q->queue_lock, flags);
}


This one is quite straightforward.


2) Once compiled and run, i get 1 node running GNBD and exporting one of 
its disks.
3 other nodes are running as client for GNBD, and i mount a GFS on them, 
although all 4 nodes participate to a GFS cluster. (standard config : 
dlm, cman)

I have tried to loop 100 times over parallel "bonnie++" on the 3 nodes, 
with:
bonnie++ -u 0:0 -d /mnt/gfs -x 100

One of the nodes crashed before the end before the 10th loop, with the 
following panic:

Unable to handle kernel paging request at 0000000000200220 RIP:
^M<ffffffff88351d6a>{:gfs:gfs_depend_add+430}
^MPGD 306d7067 PUD 37532067 PMD 0
^MOops: 0000 [1] SMP
^Mlast sysfs file: /class/gnbd/gnbd0/waittime
^MCPU 1
^MModules linked in: gnbd(U) lock_dlm(U) dlm(U) gfs(U) lock_harness(U) 
cman(U)
ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc pcmcia 
yent
a_socket rsrc_nonstatic pcmcia_core dm_mod video button battery ac uhci_hcd
ehci_hcd i2c_i801 i2c_core tg3 e1000 ext3 jbd ata_piix libata sd_mod 
scsi_mod
^MPid: 5679, comm: bonnie++ Tainted: GF     2.6.16-1.2111_FC4smp #1
^MRIP: 0010:[<ffffffff88351d6a>] 
<ffffffff88351d6a>{:gfs:gfs_depend_add+430}
^MRSP: 0018:ffff81002bfddb38  EFLAGS: 00010206
^MRAX: ffff810037571200 RBX: 0000000000003a98 RCX: 0000000000000002
^MRDX: ffff810037571338 RSI: ffff81002bfddb08 RDI: ffff810001dd5c40
^MRBP: ffffc2001017a000 R08: ffffc2001017c650 R09: 0000000000000040
^MR10: 0000000000000040 R11: 0000000000040000 R12: 0000000000003a98
^MR13: 00000001002ac770 R14: 00000000002001f0 R15: ffffc2001017a258
^MFS:  00002aaaaaab8380(0000) GS:ffff8100021d9f40(0000) 
knlGS:0000000000000000
^MCS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
^MCR2: 0000000000200220 CR3: 0000000035b0b000 CR4: 00000000000006e0
^MProcess bonnie++ (pid: 5679, threadinfo ffff81002bfdc000, task 
ffff81003ecd5860)
^MStack: ffff810037571200 000000018832af2b 0000000000d633e7 
ffff810006d384a8
^M       ffff810022a0d978 0000000000d633e8 ffffc2001017a000 
0000000000000001
^M       ffff810009bd4490 ffffffff8832b99b
^MCall Trace: <ffffffff8832b99b>{:gfs:gfs_wipe_buffers+842}
^M       <ffffffff8833a292>{:gfs:gfs_inode_dealloc+1023}
<ffffffff88356102>{:gfs:gfs_unlinked_limit+230}
^M       <ffffffff8834aaac>{:gfs:gfs_unlink+60}
<ffffffff8834b183>{:gfs:gfs_permission+483}
^M       <ffffffff8019080f>{permission+114} 
<ffffffff80190a39>{vfs_unlink+203}
^M       <ffffffff8019312d>{do_unlinkat+184}
<ffffffff8010d431>{syscall_trace_enter+181}
^M       <ffffffff8010ab11>{tracesys+113} <ffffffff8010ab71>{tracesys+209}

^MCode: 4d 8b 66 30 4c 89 ff e8 34 04 00 f8 8b 9d 94 02 00 00 4c 89
^MRIP <ffffffff88351d6a>{:gfs:gfs_depend_add+430} RSP <ffff81002bfddb38>
^MCR2: 0000000000200220
^M <0>Kernel panic - not syncing: Oops

^MCall Trace: <ffffffff80134f76>{panic+133}
<ffffffff803521fb>{_spin_unlock_irqrestore+11}
^M       <ffffffff8035293c>{oops_end+71} 
<ffffffff803543ba>{do_page_fault+1770}
^M       <ffffffff8017dfc1>{kmem_freepages+191} 
<ffffffff8017e2e7>{slab_destroy+151}
^M       <ffffffff8010b93d>{error_exit+0}
<ffffffff88351d6a>{:gfs:gfs_depend_add+430}
^M       <ffffffff88351da4>{:gfs:gfs_depend_add+488}
<ffffffff8832b99b>{:gfs:gfs_wipe_buffers+842}
^M       <ffffffff8833a292>{:gfs:gfs_inode_dealloc+1023}
<ffffffff88356102>{:gfs:gfs_unlinked_limit+230}
^M       <ffffffff8834aaac>{:gfs:gfs_unlink+60}
<ffffffff8834b183>{:gfs:gfs_permission+483}
^M       <ffffffff8019080f>{permission+114} 
<ffffffff80190a39>{vfs_unlink+203}
^M       <ffffffff8019312d>{do_unlinkat+184}
<ffffffff8010d431>{syscall_trace_enter+181}
^M       <ffffffff8010ab11>{tracesys+113} <ffffffff8010ab71>{tracesys+209}


This is 100% reproducible.

Any thoughts on this ? Maybe it has already been corrected in a more 
recent version ?

-- 
Mathieu Avila


From wcheng at redhat.com  Mon Jun 19 13:59:56 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 19 Jun 2006 09:59:56 -0400
Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel
	panics	on stress.
In-Reply-To: <44968F28.8070505@seanodes.com>
References: <44968F28.8070505@seanodes.com>
Message-ID: <4496ADDC.8010206@redhat.com>

Mathieu Avila wrote:

> Hello all,
> (I've already posted this to cluster-devel at redhat.com,and it seems it 
> wasn't the appropriate place as i didn't get any answer. Sorry for the 
> cross-posting.)

You posted to the right list (cluster-devel) and we've checked into the 
issues over the weekend. CVS head should have the correct chagnes now:

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	wcheng sourceware org	2006-06-17 06:38:23

Modified files:
	gfs-kernel/src/gfs: ops_file.c ops_fstype.c 

Log message:
	Sync with base kernel data structure changes:
	1. i_sem (in struct inode) is replaced by i_mutex.
	2. s_old_blocksize (in struct super_block) no longer exists.
	
	Thank to Mathieu Avila <mathieu avila seanodes com> pointed this out.


-- Wendy


From aberoham at gmail.com  Mon Jun 19 18:30:38 2006
From: aberoham at gmail.com (aberoham at gmail.com)
Date: Mon, 19 Jun 2006 11:30:38 -0700
Subject: [Linux-cluster] clucron.sh (Re: Centralized Cron)
Message-ID: <3bdb07840606191130v200daacbob41087471c9e2ac4@mail.gmail.com>

As a simple work-around solution to the desire posted in an earlier thread
regarding how to best handle cluster-dependent cron jobs, I came up with the
following script.

The theory of operation is this: install the same cluster-depedent cronjobs
on all members but prefice the cron command with clucron.sh [cluster
service] [real cron cmd]. clucron.sh verifies the status of the cluster and
punts if the service that the cron job is supposed to run against is not
currently assigned and running on the particular cluster member. If the
particular cluster member IS running the specified service, the cron job
command is ran as usual.

Note: "clustat -s [service]" functionality required for the attached script
is missing in rgmanager-1.9.46 RPM. See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185952 and download Mr.
Hohberger's fixed RPMs before trying clucron.sh.

Abe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060619/24cff3c5/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clucron.sh
Type: application/x-sh
Size: 1195 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060619/24cff3c5/attachment.sh>

From jason at monsterjam.org  Mon Jun 19 20:59:22 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 19 Jun 2006 16:59:22 -0400
Subject: [Linux-cluster] servers crashing while not doing much.
Message-ID: <20060619205922.GA10200@monsterjam.org>

hey folks,

I have 2 nodes running GFS 6.1.5
[root at tf1 ~]# rpm -qa | grep -i gfs
GFS-6.1.5-0
GFS-kernheaders-2.6.9-49.1
GFS-kernel-smp-2.6.9-49.1
[root at tf1 ~]# rpm -qa | grep -i ccs
ccs-devel-1.0.3-0
ccs-1.0.3-0
[root at tf1 ~]# 
[root at tf1 ~]# uname -a
Linux tf1.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux
[root at tf1 ~]# 


and last week, we had them both go down on us unexpectedly.
one had paniced and the other was powered off..

these systems are NOT in production yet, so there was some data on the GFS partition, but im pretty 
sure that there was not much activity when the boxes went down. Any help on what to do about this 
would be appreciated..

Here is the log from the one that panicd.


Jun 10 03:59:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45030 seconds. 
Jun 10 03:59:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45060 seconds. 
Jun 10 04:00:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45090 seconds. 
Jun 10 04:00:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45120 seconds. 
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session opened for user root by (uid=0)
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session closed for user root
Jun 10 04:01:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45150 seconds. 
Jun 10 04:01:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45180 seconds. 
Jun 10 04:02:01 tf1 crond(pam_unix)[15620]: session opened for user root by (uid=0)
Jun 10 04:02:03 tf1 kernel: des 1
Jun 10 04:02:03 tf1 kernel: clvmd total nodes 1
Jun 10 04:02:03 tf1 kernel: lv1 rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 resources
Jun 10 04:02:03 tf1 kernel: clvmd purge requests
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd mark waiting requests
Jun 10 04:02:03 tf1 kernel: clvmd marked 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd update remastered resources
Jun 10 04:02:03 tf1 kernel: clvmd updated 1 resources
Jun 10 04:02:03 tf1 kernel: clvmd rebuild locks
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 done
Jun 10 04:02:03 tf1 kernel: clvmd move flags 0,0,1 ids 4,7,7
Jun 10 04:02:03 tf1 kernel: clvmd process held requests
Jun 10 04:02:03 tf1 kernel: clvmd processed 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd resend marked requests
Jun 10 04:02:03 tf1 kernel: clvmd resent 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 finished
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 518 resources
Jun 10 04:02:03 tf1 kernel: lv1 purge requests
Jun 10 04:02:03 tf1 kernel: lv1 purged 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 mark waiting requests
Jun 10 04:02:03 tf1 kernel: lv1 marked 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: lv1 purged 530 locks
Jun 10 04:02:03 tf1 kernel: lv1 update remastered resources
Jun 10 04:02:03 tf1 kernel: lv1 updated 20609 resources
Jun 10 04:02:03 tf1 kernel: lv1 rebuild locks
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 done
Jun 10 04:02:03 tf1 kernel: lv1 move flags 0,0,1 ids 5,7,7
Jun 10 04:02:03 tf1 kernel: lv1 process held requests
Jun 10 04:02:03 tf1 kernel: lv1 processed 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 resend marked requests
Jun 10 04:02:03 tf1 kernel: lv1 resent 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 finished
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 0 last_start 6 last_finish 0
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 2 type 2 event 6 flags 250
Jun 10 04:02:03 tf1 kernel: 6851 claim_jid 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 6 done 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_finish flags 5a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done jid 1 msg 309 a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done nodeid 1 flg 18
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 6 last_start 7 last_finish 6
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 1 type 1 event 7 flags 21a
Jun 10 04:02:03 tf1 kernel: 6851 pr_start cb jid 0 id 2
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 7 done 0
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done jid 0 msg 309 11a
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done nodeid 2 flg 1b
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done start_done 7
Jun 10 04:02:03 tf1 kernel: 6850 pr_finish flags 1a
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: lock_dlm:  Assertion failed on line 428 of file 
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
Jun 10 04:02:03 tf1 kernel: lock_dlm:  assertion:  "!error"
Jun 10 04:02:03 tf1 kernel: lock_dlm:  time = 1252230568
Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: ------------[ cut here ]------------
Jun 10 04:02:03 tf1 kernel: kernel BUG at 
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c:428!
Jun 10 04:02:03 tf1 kernel: invalid operand: 0000 [#1]
Jun 10 04:02:03 tf1 kernel: SMP 
Jun 10 04:02:03 tf1 kernel: Modules linked in: nls_utf8 vfat fat usb_storage lock_dlm(U) dcdipm(U) 
dcdbas(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) 
dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd ehci_hcd hw_random shpchp eepro100 e100 
mii e1000 floppy sg ext3 jbd dm_mod aic7xxx megaraid_mbox megaraid_mm sd_mod scsi
_mod
Jun 10 04:02:03 tf1 kernel: CPU:    3
Jun 10 04:02:03 tf1 kernel: EIP:    0060:[<f8bc7779>]    Tainted: P      VLI
Jun 10 04:02:03 tf1 kernel: EFLAGS: 00010246   (2.6.9-34.ELsmp) 
Jun 10 04:02:03 tf1 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm]
Jun 10 04:02:03 tf1 kernel: eax: 00000001   ebx: ffffffea   ecx: c585ace8   edx: f8bcc15f
Jun 10 04:02:03 tf1 kernel: esi: f8bc7798   edi: f77c8400   ebp: c2361600   esp: c585ace4
Jun 10 04:02:03 tf1 kernel: ds: 007b   es: 007b   ss: 0068
Jun 10 04:02:03 tf1 kernel: Process df (pid: 15930, threadinfo=c585a000 task=d94fa6b0)
Jun 10 04:02:03 tf1 kernel: Stack: f8bcc15f 20202020 33202020 20202020 20202020 20202020 31312020 
00000018 
Jun 10 04:02:03 tf1 kernel:        d2956694 c2361600 00000003 00000000 c2361600 f8bc7828 00000003 
f8bcf860 
Jun 10 04:02:03 tf1 kernel:        f8ba0000 f8bf45b2 00000000 00000001 f4fd2064 f4fd2048 f8ba0000 
f8bea5cd 
Jun 10 04:02:03 tf1 kernel: Call Trace:
Jun 10 04:02:03 tf1 kernel:  [<f8bc7828>] lm_dlm_lock+0x49/0x52 [lock_dlm]
Jun 10 04:02:03 tf1 kernel:  [<f8bf45b2>] gfs_lm_lock+0x35/0x4d [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8bea5cd>] gfs_glock_xmote_th+0x130/0x172 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8be9c91>] rq_promote+0xc8/0x147 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8be9e7d>] run_queue+0x91/0xc1 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8beae88>] gfs_glock_nq+0xcf/0x116 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8beb40f>] gfs_glock_nq_init+0x13/0x26 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8c0b6d6>] stat_gfs_async+0x119/0x187 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8c0b80b>] gfs_stat_gfs+0x27/0x4e [gfs]
Jun 10 04:02:03 tf1 kernel:  [<c01aa436>] superblock_has_perm+0x1f/0x23
Jun 10 04:02:03 tf1 kernel:  [<f8c0387e>] gfs_statfs+0x26/0xc7 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<c0158675>] vfs_statfs+0x41/0x59
Jun 10 04:02:03 tf1 kernel:  [<c015876b>] vfs_statfs64+0xe/0x28
Jun 10 04:02:03 tf1 kernel:  [<c0166d75>] __user_walk+0x4a/0x51
Jun 10 04:02:03 tf1 kernel:  [<c0158876>] sys_statfs64+0x52/0xb2
Jun 10 04:02:03 tf1 kernel:  [<c014f598>] do_mmap_pgoff+0x568/0x666
Jun 10 04:02:03 tf1 kernel:  [<c010b693>] sys_mmap2+0x7e/0xaf
Jun 10 04:02:03 tf1 kernel:  [<c011ad21>] do_page_fault+0x0/0x5c6
Jun 10 04:02:03 tf1 kernel:  [<c02d2657>] syscall_call+0x7/0xb
Jun 10 04:02:03 tf1 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8a 
c2 bc f8 e8 ce ae 55 c7 83 c4 38 68 5f c1 bc f8 e8 c1 ae 55 c7 <0f> 0b ac 01 a7
 c0 bc f8 68 61 c1 bc f8 e8 7c a6 55 c7 83 c4 20 
Jun 10 04:02:03 tf1 kernel:  <0>Fatal exception: panic in 5 seconds
Jun 10 04:02:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45210 seconds. 
Jun 16 10:48:47 tf1 syslogd 1.4.1: restart.


From carlopmart at gmail.com  Tue Jun 20 08:24:34 2006
From: carlopmart at gmail.com (carlopmart)
Date: Tue, 20 Jun 2006 10:24:34 +0200
Subject: [Linux-cluster] Problems with ccsd (SOLVED)
In-Reply-To: <448EC753.1010505@dorm.org>
References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com>
	<448EC753.1010505@dorm.org>
Message-ID: <4497B0C2.9020309@gmail.com>

Sorry for my later response ... That is the solution ... Many thanks 
Brenton.


Brenton Rothchild wrote:
> IIRC, I've seen this when the magma-plugins RPM wasn't installed,
> if you're using RPMS that is :)
> 
> -Brenton Rothchild
> 
> 
> C. L. Martinez wrote:
>> Hi all,
>>
>>  I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries 
>> to start returns me this error:
>>
>>  [root at srvimss1 init.d]# ccsd
>> Failed to connect to cluster manager.
>> Hint: Magma plugins are not in the right spot.
>>
>>  How can I fix this?? Where is the problem??
>>
>>  My cluster.conf:
>>
>>  <?xml version="1.0" ?>
>> <cluster config_version="2" name="IMSS_Cluster">
>>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>         <clusternodes>
>>                 <clusternode name="srvimss1" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="fence_gnbd" 
>> nodename="srvimss1"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="srvimss2" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="fence_gnbd" 
>> nodename="srvimss2"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>         </clusternodes>
>>         <cman expected_votes="1" two_node="1"/>
>>         <fencedevices>
>>                 <fencedevice agent="fence_gnbd" name="fence_gnbd" 
>> servers="srvmgmt"/>
>>         </fencedevices>
>>         <rm>
>>                 <failoverdomains>
>>                         <failoverdomain name="PriCluster" ordered="1" 
>> restricted="1">
>>                                 <failoverdomainnode name="srvimss1" 
>> priority="1"/>
>>                                 <failoverdomainnode name="srvimss2" 
>> priority="2"/>
>>                         </failoverdomain>
>>                         <failoverdomain name="SecCluster" ordered="1" 
>> restricted="1">
>>                                 <failoverdomainnode name="srvimss1" 
>> priority="2"/>
>>                                 <failoverdomainnode name="srvimss2" 
>> priority="1"/>
>>                         </failoverdomain>
>>                 </failoverdomains>
>>                 <resources/>
>>         </rm>
>> </cluster>
>>
>> -- 
>> C.L. Martinez
>> clopmart at gmail.com <mailto:clopmart at gmail.com>
>>
>>
>> ------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 

-- 
CL Martinez
carlopmart {at} gmail {d0t} com


From mathieu.avila at seanodes.com  Tue Jun 20 09:53:53 2006
From: mathieu.avila at seanodes.com (Mathieu Avila)
Date: Tue, 20 Jun 2006 11:53:53 +0200
Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel
	panics	on stress.
In-Reply-To: <4496ADDC.8010206@redhat.com>
References: <44968F28.8070505@seanodes.com> <4496ADDC.8010206@redhat.com>
Message-ID: <4497C5B1.3090501@seanodes.com>

Wendy Cheng wrote:

> Mathieu Avila wrote:
>
>> Hello all,
>> (I've already posted this to cluster-devel at redhat.com,and it seems it 
>> wasn't the appropriate place as i didn't get any answer. Sorry for 
>> the cross-posting.)
>
>
> You posted to the right list (cluster-devel) and we've checked into 
> the issues over the weekend. CVS head should have the correct chagnes 
> now:
>
>
> -- Wendy

Thank you Wendy,

Do you have any idea on the other problem (crash of GNBD+GFS under heavy 
stress) ? Are there any known problems with the versions I use ? Do you 
need additional information to deal with this issue ?

--
Mathieu


From djkast at gmail.com  Tue Jun 20 18:45:58 2006
From: djkast at gmail.com (DJ-Kast .)
Date: Tue, 20 Jun 2006 14:45:58 -0400
Subject: [Linux-cluster] GFS or ???
Message-ID: <bed25d6e0606201145s3ca42e0bp6859f624cc4fbd4f@mail.gmail.com>

Hi,

  I am looking for advice on a configuration for a portal I am setting up.

I will have 3 Load Balanced BSD web servers that will be using a SAN for
storage

I will have 2 clustered Redhat boxes, 1 active and 1 passive connected to
the SAN.
The SAN will be connected via iSCSI to the 2 Redhat boxes

Do I need to use GFS to mount the drives?

I am skeptical about doing the GFS->NFS export, as I've seen lots of posts
of people
having problems.  Can this extra step be eliminated by something more
efficient for
the setup I require?

Thanks in advance

-Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060620/4c778aa9/attachment.htm>

From jason at monsterjam.org  Tue Jun 20 22:11:05 2006
From: jason at monsterjam.org (Jason)
Date: Tue, 20 Jun 2006 18:11:05 -0400
Subject: [Linux-cluster] [2nd try: servers crashing while not doing much.]
Message-ID: <20060620221104.GA20673@monsterjam.org>

hey folks,

I have 2 nodes running GFS 6.1.5
[root at tf1 ~]# rpm -qa | grep -i gfs
GFS-6.1.5-0
GFS-kernheaders-2.6.9-49.1
GFS-kernel-smp-2.6.9-49.1
[root at tf1 ~]# rpm -qa | grep -i ccs
ccs-devel-1.0.3-0
ccs-1.0.3-0
[root at tf1 ~]# 
[root at tf1 ~]# uname -a
Linux tf1.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux
[root at tf1 ~]# 


and last week, we had them both go down on us unexpectedly.
one had paniced and the other was powered off..

these systems are NOT in production yet, so there was some data on the GFS partition, but im pretty 
sure that there was not much activity when the boxes went down. Any help on what to do about this 
would be appreciated..

Here is the log from the one that panicd.


Jun 10 03:59:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45030 seconds. 
Jun 10 03:59:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45060 seconds. 
Jun 10 04:00:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45090 seconds. 
Jun 10 04:00:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45120 seconds. 
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session opened for user root by (uid=0)
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session closed for user root
Jun 10 04:01:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45150 seconds. 
Jun 10 04:01:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45180 seconds. 
Jun 10 04:02:01 tf1 crond(pam_unix)[15620]: session opened for user root by (uid=0)
Jun 10 04:02:03 tf1 kernel: des 1
Jun 10 04:02:03 tf1 kernel: clvmd total nodes 1
Jun 10 04:02:03 tf1 kernel: lv1 rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 resources
Jun 10 04:02:03 tf1 kernel: clvmd purge requests
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd mark waiting requests
Jun 10 04:02:03 tf1 kernel: clvmd marked 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd update remastered resources
Jun 10 04:02:03 tf1 kernel: clvmd updated 1 resources
Jun 10 04:02:03 tf1 kernel: clvmd rebuild locks
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 done
Jun 10 04:02:03 tf1 kernel: clvmd move flags 0,0,1 ids 4,7,7
Jun 10 04:02:03 tf1 kernel: clvmd process held requests
Jun 10 04:02:03 tf1 kernel: clvmd processed 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd resend marked requests
Jun 10 04:02:03 tf1 kernel: clvmd resent 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 finished
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 518 resources
Jun 10 04:02:03 tf1 kernel: lv1 purge requests
Jun 10 04:02:03 tf1 kernel: lv1 purged 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 mark waiting requests
Jun 10 04:02:03 tf1 kernel: lv1 marked 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: lv1 purged 530 locks
Jun 10 04:02:03 tf1 kernel: lv1 update remastered resources
Jun 10 04:02:03 tf1 kernel: lv1 updated 20609 resources
Jun 10 04:02:03 tf1 kernel: lv1 rebuild locks
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 done
Jun 10 04:02:03 tf1 kernel: lv1 move flags 0,0,1 ids 5,7,7
Jun 10 04:02:03 tf1 kernel: lv1 process held requests
Jun 10 04:02:03 tf1 kernel: lv1 processed 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 resend marked requests
Jun 10 04:02:03 tf1 kernel: lv1 resent 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 finished
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 0 last_start 6 last_finish 0
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 2 type 2 event 6 flags 250
Jun 10 04:02:03 tf1 kernel: 6851 claim_jid 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 6 done 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_finish flags 5a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done jid 1 msg 309 a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done nodeid 1 flg 18
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 6 last_start 7 last_finish 6
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 1 type 1 event 7 flags 21a
Jun 10 04:02:03 tf1 kernel: 6851 pr_start cb jid 0 id 2
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 7 done 0
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done jid 0 msg 309 11a
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done nodeid 2 flg 1b
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done start_done 7
Jun 10 04:02:03 tf1 kernel: 6850 pr_finish flags 1a
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: lock_dlm:  Assertion failed on line 428 of file 
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
Jun 10 04:02:03 tf1 kernel: lock_dlm:  assertion:  "!error"
Jun 10 04:02:03 tf1 kernel: lock_dlm:  time = 1252230568
Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8
Jun 10 04:02:03 tf1 kernel: 
Jun 10 04:02:03 tf1 kernel: ------------[ cut here ]------------
Jun 10 04:02:03 tf1 kernel: kernel BUG at 
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c:428!
Jun 10 04:02:03 tf1 kernel: invalid operand: 0000 [#1]
Jun 10 04:02:03 tf1 kernel: SMP 
Jun 10 04:02:03 tf1 kernel: Modules linked in: nls_utf8 vfat fat usb_storage lock_dlm(U) dcdipm(U) 
dcdbas(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) 
dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd ehci_hcd hw_random shpchp eepro100 e100 
mii e1000 floppy sg ext3 jbd dm_mod aic7xxx megaraid_mbox megaraid_mm sd_mod scsi
_mod
Jun 10 04:02:03 tf1 kernel: CPU:    3
Jun 10 04:02:03 tf1 kernel: EIP:    0060:[<f8bc7779>]    Tainted: P      VLI
Jun 10 04:02:03 tf1 kernel: EFLAGS: 00010246   (2.6.9-34.ELsmp) 
Jun 10 04:02:03 tf1 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm]
Jun 10 04:02:03 tf1 kernel: eax: 00000001   ebx: ffffffea   ecx: c585ace8   edx: f8bcc15f
Jun 10 04:02:03 tf1 kernel: esi: f8bc7798   edi: f77c8400   ebp: c2361600   esp: c585ace4
Jun 10 04:02:03 tf1 kernel: ds: 007b   es: 007b   ss: 0068
Jun 10 04:02:03 tf1 kernel: Process df (pid: 15930, threadinfo=c585a000 task=d94fa6b0)
Jun 10 04:02:03 tf1 kernel: Stack: f8bcc15f 20202020 33202020 20202020 20202020 20202020 31312020 
00000018 
Jun 10 04:02:03 tf1 kernel:        d2956694 c2361600 00000003 00000000 c2361600 f8bc7828 00000003 
f8bcf860 
Jun 10 04:02:03 tf1 kernel:        f8ba0000 f8bf45b2 00000000 00000001 f4fd2064 f4fd2048 f8ba0000 
f8bea5cd 
Jun 10 04:02:03 tf1 kernel: Call Trace:
Jun 10 04:02:03 tf1 kernel:  [<f8bc7828>] lm_dlm_lock+0x49/0x52 [lock_dlm]
Jun 10 04:02:03 tf1 kernel:  [<f8bf45b2>] gfs_lm_lock+0x35/0x4d [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8bea5cd>] gfs_glock_xmote_th+0x130/0x172 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8be9c91>] rq_promote+0xc8/0x147 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8be9e7d>] run_queue+0x91/0xc1 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8beae88>] gfs_glock_nq+0xcf/0x116 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8beb40f>] gfs_glock_nq_init+0x13/0x26 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8c0b6d6>] stat_gfs_async+0x119/0x187 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<f8c0b80b>] gfs_stat_gfs+0x27/0x4e [gfs]
Jun 10 04:02:03 tf1 kernel:  [<c01aa436>] superblock_has_perm+0x1f/0x23
Jun 10 04:02:03 tf1 kernel:  [<f8c0387e>] gfs_statfs+0x26/0xc7 [gfs]
Jun 10 04:02:03 tf1 kernel:  [<c0158675>] vfs_statfs+0x41/0x59
Jun 10 04:02:03 tf1 kernel:  [<c015876b>] vfs_statfs64+0xe/0x28
Jun 10 04:02:03 tf1 kernel:  [<c0166d75>] __user_walk+0x4a/0x51
Jun 10 04:02:03 tf1 kernel:  [<c0158876>] sys_statfs64+0x52/0xb2
Jun 10 04:02:03 tf1 kernel:  [<c014f598>] do_mmap_pgoff+0x568/0x666
Jun 10 04:02:03 tf1 kernel:  [<c010b693>] sys_mmap2+0x7e/0xaf
Jun 10 04:02:03 tf1 kernel:  [<c011ad21>] do_page_fault+0x0/0x5c6
Jun 10 04:02:03 tf1 kernel:  [<c02d2657>] syscall_call+0x7/0xb
Jun 10 04:02:03 tf1 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8a 
c2 bc f8 e8 ce ae 55 c7 83 c4 38 68 5f c1 bc f8 e8 c1 ae 55 c7 <0f> 0b ac 01 a7
 c0 bc f8 68 61 c1 bc f8 e8 7c a6 55 c7 83 c4 20 
Jun 10 04:02:03 tf1 kernel:  <0>Fatal exception: panic in 5 seconds
Jun 10 04:02:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45210 seconds. 
Jun 16 10:48:47 tf1 syslogd 1.4.1: restart.


----- End forwarded message -----

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================


From jbrassow at redhat.com  Wed Jun 21 02:19:21 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 20 Jun 2006 21:19:21 -0500
Subject: [Linux-cluster] GFS or ???
In-Reply-To: <bed25d6e0606201145s3ca42e0bp6859f624cc4fbd4f@mail.gmail.com>
References: <bed25d6e0606201145s3ca42e0bp6859f624cc4fbd4f@mail.gmail.com>
Message-ID: <37358bd202580513a95f2da80e092dfa@redhat.com>

GFS is primarily used in active/active setups.  You may be able to get 
by with rgmanager if you are using active/passive, but I'll let someone 
who knows more talk about that.

  brassow

On Jun 20, 2006, at 1:45 PM, DJ-Kast . wrote:

> Hi,
>
> ? I am looking for advice on a configuration for a portal I am setting 
> up.
>
> I will have 3 Load Balanced BSD web servers that will be using a SAN 
> for storage
>
> I will have 2 clustered Redhat boxes, 1 active and 1 passive connected 
> to the SAN.
> The SAN will be connected via iSCSI to the 2 Redhat boxes
>
> Do I need to use GFS to mount the drives?
>
> I am skeptical about doing the GFS->NFS export, as I've seen lots of 
> posts of people
> having problems.? Can this extra step be eliminated by something more 
> efficient for
> the setup I require?
>
> Thanks in advance
>
> -Paul
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From Sherman.Chan at world.net  Wed Jun 21 08:55:07 2006
From: Sherman.Chan at world.net (Sherman Chan)
Date: Wed, 21 Jun 2006 16:55:07 +0800
Subject: [Linux-cluster] RE : GFS  Sharing Hard Disk
Message-ID: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net>

Hi,
I would like to know is that possible to has a hard disk physically shared
by multi servers.  I have a SAN which has a logical disk setup that can be
mount/accessed by multi servers at the same time directly, however lacking
off global locking/synchronization system the data can not be shared
properly, data I update from server 1, could not been seen on server 2,
unless I dismount and remount the disk on server 2.  I know thing can not be
that easy.
 
I do not want to lose performance by using NFS or iSCSI.  I have look at
GFS, it seems that is a right tools to me but I do not want to setup a full
cluster environment.  Does it has any way to use GFS without a complete
cluster setup?
 
 
Thanks
Sherman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060621/f9a98faa/attachment.htm>

From adingman at cookgroup.com  Wed Jun 21 12:24:42 2006
From: adingman at cookgroup.com (Andrew C. Dingman)
Date: Wed, 21 Jun 2006 08:24:42 -0400
Subject: [Linux-cluster] RE : GFS  Sharing Hard Disk
In-Reply-To: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net>
References: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net>
Message-ID: <1150892682.13144.6.camel@ampelos.cin.cook>

The short answer is that you need a "complete cluster setup" to make any
simultaneous-access shared-storage solution work. You need cluster
membership, locking, and fencing for GFS. Depending on your needs, you
might be able to skip much of the rest of cluster suite's feature set,
but you really do need that membership and locking infrastructure. It
makes safe concurrent access to the same disks possible.


On Wed, 2006-06-21 at 16:55 +0800, Sherman Chan wrote:
> Hi,
> I would like to know is that possible to has a hard disk physically
> shared by multi servers.  I have a SAN which has a logical disk setup
> that can be mount/accessed by multi servers at the same time directly,
> however lacking off global locking/synchronization system the data can
> not be shared properly, data I update from server 1, could not been
> seen on server 2, unless I dismount and remount the disk on server
> 2.  I know thing can not be that easy.
>  
> I do not want to lose performance by using NFS or iSCSI.  I have look
> at GFS, it seems that is a right tools to me but I do not want to
> setup a full cluster environment.  Does it has any way to use GFS
> without a complete cluster setup?
>  
>  
>  
> Thanks
> Sherman
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-- 
Andrew C. Dingman
Unix Administrator
Cook
(812)339-2235 x2131
adingman at cookgroup.com


From ranjtech at gmail.com  Wed Jun 21 17:35:05 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 22 Jun 2006 03:35:05 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
Message-ID: <001a01c69559$0a7cd920$1f768b60$@com>

Hello all,

 
Is there a particular manner I should partition the local filesystems of
each of the cluster nodes to support the Cluster Suite w/GFS or it doesn't
matter? My specific requirement is that I may or may not be able to change
the location where this specific application writes data. And I need that
directory/filesystem that this data is written to, e.g. /var/spool to be
accessible on my iSCSI SAN by all the cluster nodes. The answer could be
very simple but want to double check.

 
Rgds,

RR

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060622/78a97ea6/attachment.htm>

From teigland at redhat.com  Wed Jun 21 17:47:59 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 12:47:59 -0500
Subject: [Linux-cluster] [2nd try: servers crashing while not doing much.]
In-Reply-To: <20060620221104.GA20673@monsterjam.org>
References: <20060620221104.GA20673@monsterjam.org>
Message-ID: <20060621174759.GA4706@redhat.com>

On Tue, Jun 20, 2006 at 06:11:05PM -0400, Jason wrote:
> Jun 10 04:02:03 tf1 kernel: lock_dlm:  Assertion failed on line 428 of file 
> /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
> Jun 10 04:02:03 tf1 kernel: lock_dlm:  assertion:  "!error"
> Jun 10 04:02:03 tf1 kernel: lock_dlm:  time = 1252230568
> Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8

Unfortunately this assertion doesn't tell us much since any number of
problems can lead to this.  We usually depend on previous debug messages
to figure out what happened, but there wasn't anything unusual in the logs
you posted.  I'm going to be adding some extra debugging to help narrow
this down, but that's not slated until RHEL4U5.

Dave


From teigland at redhat.com  Wed Jun 21 17:54:30 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 12:54:30 -0500
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
	<20060615190959.GB1913@redhat.com>
	<433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com>
Message-ID: <20060621175430.GB4706@redhat.com>

On Fri, Jun 16, 2006 at 06:37:14PM +0300, Anton Kornev wrote:
> gnbd (pid 5836: alogc.pl) got signal 9
> gnbd0: Send control failed (result -4)
> gnbd (pid 5836: alogc.pl) got signal 15
> gnbd0: Send control failed (result -4)

This and the fact that a number of processes appear to be blocked in the
i/o path seem to point at gnbd as the hold-up.

Dave


>   51 D wait_on_buffer                   pdflush
> 5771 D lock_page                        lock_dlm1
> 5776 D -                                gfs_logd
> 5777 D -                                gfs_quotad
> 5778 D -                                gfs_inoded
> 5892 D -                                httpd
> 5895 D glock_wait_internal              httpd
> 5896 D glock_wait_internal              httpd
> 5897 D glock_wait_internal              httpd
> 5911 D glock_wait_internal              httpd
> 5915 D wait_on_buffer                   httpd
> 5930 D wait_on_buffer                   sh

> pdflush       D ffffffff8014aabc     0    51      6            53    50
> (L-TLB)
> 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00
>       0000000000000216 ffffffffa0042916 000001011aca60c0 0000000000000008
>       000001011fdef7f0 0000000000000dfa
> Call Trace:<ffffffffa0042916>{:dm_mod:dm_request+396}
> <ffffffff8014aabc>{keventd_create_kthread+0}
>       <ffffffff803053ef>{io_schedule+38}
> <ffffffff80178c4c>{__wait_on_buffer+125}
>       <ffffffff80178ad2>{bh_wake_function+0}
> <ffffffff80178ad2>{bh_wake_function+0}
>       <ffffffffa0235c5d>{:gfs:gfs_logbh_wait+49}
> <ffffffffa024a6a6>{:gfs:disk_commit+794}
>       <ffffffffa024a877>{:gfs:log_refund+111}
> <ffffffffa024ad8e>{:gfs:log_flush_internal+510}
>       <ffffffff8017d682>{sync_supers+167} <ffffffff8015e310>{wb_kupdate+36}
> 
>       <ffffffff8015edb4>{pdflush+323} <ffffffff8015e2ec>{wb_kupdate+0}
>       <ffffffff8015ec71>{pdflush+0} <ffffffff8014aa93>{kthread+200}
>       <ffffffff80110e17>{child_rip+8}
> <ffffffff8014aabc>{keventd_create_kthread+0}
>       <ffffffff8014a9cb>{kthread+0} <ffffffff80110e0f>{child_rip+0}
> lock_dlm1     D 000001000c0096e0     0  5771      6          5772  5766
> (L-TLB)
> 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069
>       000001011420b030 0000000000000069 000001000c00a940 000000010000eb10
>       000001011a887030 0000000000001cae
> Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
> <ffffffff803053ef>{io_schedule+38}
>       <ffffffff80159215>{__lock_page+191}
> <ffffffff80158cfa>{page_wake_function+0}
>       <ffffffff80158cfa>{page_wake_function+0}
> <ffffffff80163125>{truncate_inode_pages+519}
>       <ffffffffa0258f35>{:gfs:gfs_inval_page+63}
> <ffffffffa02401b5>{:gfs:drop_bh+233}
>       <ffffffffa0242138>{:gfs:gfs_glock_cb+194}
> <ffffffffa02869dd>{:lock_dlm:dlm_async+1989}
>       <ffffffff801333c8>{default_wake_function+0}
> <ffffffff8014aabc>{keventd_create_kthread+0}
>       <ffffffffa0286218>{:lock_dlm:dlm_async+0}
> <ffffffff8014aabc>{keventd_create_kthread+0}
>       <ffffffff8014aa93>{kthread+200} <ffffffff80110e17>{child_rip+8}
>       <ffffffff8014aabc>{keventd_create_kthread+0}
> <ffffffff8014a9cb>{kthread+0}
>       <ffffffff80110e0f>{child_rip+0}
> gfs_logd      D 0000000000000000     0  5776      1          5777  5775
> (L-TLB)
> 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85
>       000001011387fe58 ffffffff80304add ffffffff803cca80 0000000000000246
>       00000101143fe030 00000000000000b5
> Call Trace:<ffffffff80304a85>{thread_return+0}
> <ffffffff80304add>{thread_return+88}
>       <ffffffffa023e8d3>{:gfs:lock_on_glock+112}
> <ffffffff8030565b>{__down_write+134}
>       <ffffffffa0249cdb>{:gfs:gfs_ail_empty+56}
> <ffffffffa0233930>{:gfs:gfs_logd+77}
>       <ffffffff80110e17>{child_rip+8}
> <ffffffff801cccff>{dummy_d_instantiate+0}
>       <ffffffffa02338e3>{:gfs:gfs_logd+0} <ffffffff80110e0f>{child_rip+0}
> 
> gfs_quotad    D 0000000000000000     0  5777      1          5778  5776
> (L-TLB)
> 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85
>       0000010113881eb8 ffffffff80304add 000001011ff87030 0000000100000074
>       000001011430f7f0 0000000000000128
> Call Trace:<ffffffff80304a85>{thread_return+0}
> <ffffffff80304add>{thread_return+88}
>       <ffffffff8030565b>{__down_write+134}
> <ffffffffa025b55a>{:gfs:gfs_quota_sync+226}
>       <ffffffffa0233ab1>{:gfs:gfs_quotad+127}
> <ffffffff80110e17>{child_rip+8}
>       <ffffffff801cccff>{dummy_d_instantiate+0}
> <ffffffff801cccff>{dummy_d_instantiate+0}
>       <ffffffff801cccff>{dummy_d_instantiate+0}
> <ffffffffa0233a32>{:gfs:gfs_quotad+0}
>       <ffffffff80110e0f>{child_rip+0}
> gfs_inoded    D 0000000000000000     0  5778      1          5807  5777
> (L-TLB)
> 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0
>       0000000000000000 ffffffff80304a85 0000010113883ec8 0000000180304add
>       000001011e2937f0 00000000000000c2
> Call Trace:<ffffffff80304a85>{thread_return+0}
> <ffffffff8030565b>{__down_write+134}
>       <ffffffffa026160d>{:gfs:unlinked_find+115}
> <ffffffffa0261c6c>{:gfs:gfs_unlinked_dealloc+25}
>       <ffffffffa0233bd5>{:gfs:gfs_inoded+66}
> <ffffffff80110e17>{child_rip+8}
>       <ffffffffa0233b93>{:gfs:gfs_inoded+0} <ffffffff80110e0f>{child_rip+0}
> 
> 
> httpd         D ffffffff80304190     0  5892      1  5893          5826
> (NOTLB)
> 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001
>       0000000000000000 0000000000000000 0000010114667980 0000000111b75bc0
>       00000101143fe7f0 00000000000009ad
> Call Trace:<ffffffff80303d6f>{__down+147}
> <ffffffff801333c8>{default_wake_function+0}
>       <ffffffff8015b3a2>{generic_file_write_nolock+158}
> <ffffffff80305780>{__down_failed+53}
>       <ffffffffa0236986>{:gfs:.text.lock.dio+95}
> <ffffffffa0260e4c>{:gfs:gfs_trans_add_bh+205}
>       <ffffffffa0253efc>{:gfs:do_write_buf+1138}
> <ffffffffa0252db3>{:gfs:walk_vm+278}
>       <ffffffffa0253a8a>{:gfs:do_write_buf+0}
> <ffffffffa0253a8a>{:gfs:do_write_buf+0}
>       <ffffffffa025415b>{:gfs:__gfs_write+201}
> <ffffffff80177c60>{vfs_write+207}
>       <ffffffff80177d48>{sys_write+69} <ffffffff801101c6>{system_call+126}
> 
> httpd         D 0000010110ad7d48     0  5895   5892          5896  5893
> (NOTLB)
> 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075
>       0000010117002030 0000000000000075 000001000c002940 0000000000000001
>       00000101170027f0 000000000001300e
> Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> <ffffffff80304cbd>{wait_for_completion+167}
>       <ffffffff801333c8>{default_wake_function+0}
> <ffffffff801333c8>{default_wake_function+0}
>       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>       <ffffffff80168211>{do_no_page+1003}
> <ffffffff80167b13>{do_wp_page+948}
>       <ffffffff8016858f>{handle_mm_fault+343}
> <ffffffff80142a06>{get_signal_to_deliver+1118}
>       <ffffffff801236d2>{do_page_fault+518}
> <ffffffff80304a85>{thread_return+0}
>       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}
> 
> 
> httpd         D 0000010110b5bd48     0  5896   5892          5897  5895
> (NOTLB)
> 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075
>       00000101114787f0 0000000000000075 000001000c002940 0000000000000001
>       0000010117002030 000000000000fb3e
> Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> <ffffffff80304cbd>{wait_for_completion+167}
>       <ffffffff801333c8>{default_wake_function+0}
> <ffffffff801333c8>{default_wake_function+0}
>       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>       <ffffffff80168211>{do_no_page+1003}
> <ffffffff80167b13>{do_wp_page+948}
>       <ffffffff8016858f>{handle_mm_fault+343}
> <ffffffff80142a06>{get_signal_to_deliver+1118}
>       <ffffffff801236d2>{do_page_fault+518}
> <ffffffff802a3445>{sys_accept+327}
>       <ffffffff80182e88>{pipe_read+26} <ffffffff80110c61>{error_exit+0}
> 
> httpd         D 0000000000000000     0  5897   5892          5911  5896
> (NOTLB)
> 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075
>       0000010117002030 0000000000000075 000001000c00a940 000000001b16e030
>       00000101114787f0 000000000000fbe0
> Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
> <ffffffff80304cbd>{wait_for_completion+167}
>       <ffffffff801333c8>{default_wake_function+0}
> <ffffffff801333c8>{default_wake_function+0}
>       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>       <ffffffff80168211>{do_no_page+1003}
> <ffffffff80167b13>{do_wp_page+948}
>       <ffffffff8016858f>{handle_mm_fault+343}
> <ffffffff80142a06>{get_signal_to_deliver+1118}
>       <ffffffff801236d2>{do_page_fault+518}
> <ffffffff80304a85>{thread_return+0}
>       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}
> 
> 
> httpd         D 00000101100c3d48     0  5911   5892          5915  5897
> (NOTLB)
> 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075
>       00000101170027f0 0000000000000075 000001000c002940 0000000000000000
>       000001011b16e030 000000000000187e
> Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> <ffffffff80304cbd>{wait_for_completion+167}
>       <ffffffff801333c8>{default_wake_function+0}
> <ffffffff801333c8>{default_wake_function+0}
>       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>       <ffffffff80168211>{do_no_page+1003}
> <ffffffff80167b13>{do_wp_page+948}
>       <ffffffff8016858f>{handle_mm_fault+343}
> <ffffffff80142a06>{get_signal_to_deliver+1118}
>       <ffffffff801236d2>{do_page_fault+518}
> <ffffffff80304a85>{thread_return+0}
>       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}
> 
> 
> httpd         D 0000000000006a36     0  5915   5892                5911
> (NOTLB)
> 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791
>       0000000000000000 0000000000000000 0000030348ac8c1c 0000000114a217f0
>       0000010114c997f0 000000000000076a
> Call Trace:<ffffffffa020c791>{:dlm:lkb_swqueue+43}
> <ffffffff803053ef>{io_schedule+38}
>       <ffffffff80178c4c>{__wait_on_buffer+125}
> <ffffffff80178ad2>{bh_wake_function+0}
>       <ffffffff80178ad2>{bh_wake_function+0}
> <ffffffffa02352c6>{:gfs:gfs_dreread+154}
>       <ffffffffa0235332>{:gfs:gfs_dread+40}
> <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
>       <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
> <ffffffffa0242461>{:gfs:inode_go_lock+38}
>       <ffffffffa023f586>{:gfs:glock_wait_internal+563}
> <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>       <ffffffff80168211>{do_no_page+1003}
> <ffffffff80167b13>{do_wp_page+948}
>       <ffffffff8016858f>{handle_mm_fault+343}
> <ffffffff80142a06>{get_signal_to_deliver+1118}
>       <ffffffff801236d2>{do_page_fault+518}
> <ffffffff80304a85>{thread_return+0}
>       <ffffffff80304add>{thread_return+88} <ffffffff80110c61>{error_exit+0}
> 
> 
> sh            D 000000000000001a     0  5930   2547
> (NOTLB)
> 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00
>       0000010111293d88 0000000000000000 00000100dfc02400 0000000000010000
>       00000101148557f0 0000000000002010
> Call Trace:<ffffffff803053ef>{io_schedule+38}
> <ffffffff80178c4c>{__wait_on_buffer+125}
>       <ffffffff80178ad2>{bh_wake_function+0}
> <ffffffff80178ad2>{bh_wake_function+0}
>       <ffffffffa02352c6>{:gfs:gfs_dreread+154}
> <ffffffffa0235332>{:gfs:gfs_dread+40}
>       <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
> <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
>       <ffffffffa0242461>{:gfs:inode_go_lock+38}
> <ffffffffa023f586>{:gfs:glock_wait_internal+563}
>       <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>       <ffffffff801ccb78>{dummy_inode_permission+0}
> <ffffffffa0257aca>{:gfs:gfs_permission+64}
>       <ffffffff8018d475>{dput+56} <ffffffff80183d32>{permission+51}
>       <ffffffff801844aa>{__link_path_walk+372}
> <ffffffff801851c2>{link_path_walk+82}
>       <ffffffff8012370b>{do_page_fault+575}
> <ffffffff801849b0>{__link_path_walk+1658}
>       <ffffffff801851c2>{link_path_walk+82}
> <ffffffff8012370b>{do_page_fault+575}
>       <ffffffff8018540f>{path_lookup+451}
> <ffffffff801856bb>{__user_walk+47}
>       <ffffffff8017ff1a>{vfs_stat+24} <ffffffff8012370b>{do_page_fault+575}
> 
>       <ffffffff80180264>{sys_newstat+17} <ffffffff80110c61>{error_exit+0}
>       <ffffffff801101c6>{system_call+126}


From vcmarti at sph.emory.edu  Wed Jun 21 18:13:40 2006
From: vcmarti at sph.emory.edu (Vernard C. Martin)
Date: Wed, 21 Jun 2006 14:13:40 -0400
Subject: [Linux-cluster] Error starting up CLVMD
In-Reply-To: <448974D7.7050801@redhat.com>
References: <448885C5.4050505@sph.emory.edu><1149804993.12291.27.camel@techn
	etium.msp.redhat.com> <448974D7.7050801@redhat.com>
Message-ID: <44998C54.7090403@sph.emory.edu>

Patrick Caulfield wrote:
> Bob's right, it sounds like the DLM isn't loaded. The module name is just
> "dlm" BTW and the device should show up in /proc/misc and (if udev is running)
> /dev/misc/dlm-control. lock_dlm is the GFS interface to the DLM...yes, I know
> it's confusing.
>   
Just a followup, that was indeed the problem and everything is running 
nicely at this time. Well, the cluster suite is running. I've still got 
other issues :-)

Vernard


From gstaltari at arnet.net.ar  Wed Jun 21 18:10:30 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 15:10:30 -0300
Subject: [Linux-cluster] kernel panic - help!
Message-ID: <44998B96.8010001@arnet.net.ar>

Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last stable 
cluster tarball. The cluster was OK until we had a little SAN failure, 
since then, the cluster (entirely) is getting kernel panic. This is the 
dump:

qmail-be-04 kernel: ------------[ cut here ]------------
qmail-be-04 kernel: kernel BUG at 
/soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357!
qmail-be-04 kernel: invalid opcode: 0000 [#1]
qmail-be-04 kernel: SMP
qmail-be-04 kernel: CPU:    0
qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm]
qmail-be-04 kernel: eax: 00000004   ebx: 00000084   ecx: ffffeb92   edx: 
00000000
qmail-be-04 kernel: esi: 00010001   edi: ffffffea   ebp: dc9495c0   esp: 
e382fef4
qmail-be-04 kernel: ds: 007b   es: 007b   ss: 0068
qmail-be-04 kernel: Process gfs_glockd (pid: 29218, threadinfo=e382f000 
task=f3524550)
qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 
007798a8 00000000 00010001 00000084
qmail-be-04 kernel:        00000002 f9618000 00000003 dc9495c0 eaa6ae84 
f8e8f52e f8eb46b5 eaa6aeb4

All nodes dies at the same time with this kernel panic.
Thanks
German


From gstaltari at arnet.net.ar  Wed Jun 21 18:30:46 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 15:30:46 -0300
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <44998B96.8010001@arnet.net.ar>
References: <44998B96.8010001@arnet.net.ar>
Message-ID: <44999056.3060704@arnet.net.ar>

German Staltari wrote:
> Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last 
> stable cluster tarball. The cluster was OK until we had a little SAN 
> failure, since then, the cluster (entirely) is getting kernel panic. 
> This is the dump:
>
> qmail-be-04 kernel: ------------[ cut here ]------------
> qmail-be-04 kernel: kernel BUG at 
> /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357!
> qmail-be-04 kernel: invalid opcode: 0000 [#1]
> qmail-be-04 kernel: SMP
> qmail-be-04 kernel: CPU:    0
> qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm]
> qmail-be-04 kernel: eax: 00000004   ebx: 00000084   ecx: ffffeb92   
> edx: 00000000
> qmail-be-04 kernel: esi: 00010001   edi: ffffffea   ebp: dc9495c0   
> esp: e382fef4
> qmail-be-04 kernel: ds: 007b   es: 007b   ss: 0068
> qmail-be-04 kernel: Process gfs_glockd (pid: 29218, 
> threadinfo=e382f000 task=f3524550)
> qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 
> 007798a8 00000000 00010001 00000084
> qmail-be-04 kernel:        00000002 f9618000 00000003 dc9495c0 
> eaa6ae84 f8e8f52e f8eb46b5 eaa6aeb4
>
> All nodes dies at the same time with this kernel panic.
> Thanks
> German
>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
I think this would help too:

Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  Assertion failed on line 
357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c
Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  assertion:  "!error"
Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  time = 2512697
Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 
num=2,7798a8 lkf=10001 flags=84

Thanks again
German


From teigland at redhat.com  Wed Jun 21 18:34:29 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 13:34:29 -0500
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <44998B96.8010001@arnet.net.ar>
References: <44998B96.8010001@arnet.net.ar>
Message-ID: <20060621183429.GC4706@redhat.com>

On Wed, Jun 21, 2006 at 03:10:30PM -0300, German Staltari wrote:
> Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last stable 
> cluster tarball. The cluster was OK until we had a little SAN failure, 
> since then, the cluster (entirely) is getting kernel panic. This is the 
> dump:

Any messages before this?  The best you could hope for with a SAN failure
is that all the cluster nodes withdraw gfs, allowing you to reboot them
without the panic.  So, the end result wouldn't be all that different than
the panics.

Dave

> qmail-be-04 kernel: ------------[ cut here ]------------
> qmail-be-04 kernel: kernel BUG at 
> /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357!
> qmail-be-04 kernel: invalid opcode: 0000 [#1]
> qmail-be-04 kernel: SMP
> qmail-be-04 kernel: CPU:    0
> qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm]
> qmail-be-04 kernel: eax: 00000004   ebx: 00000084   ecx: ffffeb92   edx: 
> 00000000
> qmail-be-04 kernel: esi: 00010001   edi: ffffffea   ebp: dc9495c0   esp: 
> e382fef4
> qmail-be-04 kernel: ds: 007b   es: 007b   ss: 0068
> qmail-be-04 kernel: Process gfs_glockd (pid: 29218, threadinfo=e382f000 
> task=f3524550)
> qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 
> 007798a8 00000000 00010001 00000084
> qmail-be-04 kernel:        00000002 f9618000 00000003 dc9495c0 eaa6ae84 
> f8e8f52e f8eb46b5 eaa6aeb4


From rpeterso at redhat.com  Wed Jun 21 18:46:04 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Wed, 21 Jun 2006 13:46:04 -0500
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <001a01c69559$0a7cd920$1f768b60$@com>
References: <001a01c69559$0a7cd920$1f768b60$@com>
Message-ID: <449993EC.4030803@redhat.com>

RR wrote:
>
> Hello all,
>
>  
>
> Is there a particular manner I should partition the local filesystems 
> of each of the cluster nodes to support the Cluster Suite w/GFS or it 
> doesn't matter? My specific requirement is that I may or may not be 
> able to change the location where this specific application writes 
> data. And I need that directory/filesystem that this data is written 
> to, e.g. /var/spool to be accessible on my iSCSI SAN by all the 
> cluster nodes. The answer could be very simple but want to double check.
>
>  
>
> Rgds,
>
> RR
>
Hi RR,

For your the local root partitions on the individual nodes, it's 
probably best to use ext3.
On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can 
create a symlink from
your local node's mount point to the SAN, e.g. from 
/mnt/gfs_san/var/spool to its local /var/spool.

Regards,

Bob Peterson
Red Hat Cluster Suite


From gstaltari at arnet.net.ar  Wed Jun 21 18:41:58 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 15:41:58 -0300
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <20060621183429.GC4706@redhat.com>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
Message-ID: <449992F6.9060608@arnet.net.ar>

David Teigland wrote:
> Any messages before this?  The best you could hope for with a SAN failure
> is that all the cluster nodes withdraw gfs, allowing you to reboot them
> without the panic.  So, the end result wouldn't be all that different than
> the panics.
>   
>
This is the log just before the panic:

Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from 
the cluster : Missed too many heartbeats
Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from 
the cluster : No response to messages
Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from 
the cluster : No response to messages
Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from 
the cluster : No response to messages
Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from 
the cluster : No response to messages
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been 
removed from the cluster : No response to messages
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No 
response to messages
Jun 21 14:59:52 qmail-be-04 kernel: WARNING: dlm_emergency_shutdown
Jun 21 14:59:52 qmail-be-04 fenced[17897]: process_events: service get 
event failed
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 1000041 from 3 req 1
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 3
Jun 21 14:59:53 qmail-be-04 last message repeated 7 times
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 3
Jun 21 14:59:53 qmail-be-04 last message repeated 6 times
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 1000041 from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 3
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 1
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 1000041 from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 1000041 from 3 req 1
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 1
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 1
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003b from 3 req 5
Jun 21 14:59:53 qmail-be-04 last message repeated 5 times
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 5
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003f from 3 req 5
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 5
Jun 21 14:59:53 qmail-be-04 last message repeated 3 times
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 9
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
lockspace 100003d from 3 req 5
Jun 21 14:59:54 qmail-be-04 last message repeated 20 times
Jun 21 14:59:54 qmail-be-04 kernel: dlm: dlm_unlock: lkid 3b013d 
lockspace not found
Jun 21 14:59:54 qmail-be-04 kernel: store004-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore001-001 add_to_requestq cmd 3 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore003-004 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore003-002 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
Jun 21 14:59:54 qmail-be-04 kernel:  type 2 event 282 flags 21a
Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_start 282 done 1
Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_finish flags 1a
Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start last_stop 273 
last_start 283 last_finish 273
Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start count 5 type 2 event 
283 flags 21a
Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start 283 done 1
Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a
Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start last_stop 283 
last_start 285 last_finish 283
Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start count 6 type 2 event 
285 flags 21a
Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start 285 done 1
Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a
Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start last_stop 116 
last_start 287 last_finish 116
Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start count 4 type 2 event 
287 flags 21a
Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start 287 done 1
Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_finish flags 1a
Jun 21 14:59:55 qmail-be-04 kernel: 28975 rereq 2,1cec36 id e029f 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start last_stop 282 
last_start 289 last_finish 282
Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start count 5 type 2 event 
289 flags 21a
Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start 289 done 1
Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_finish flags 1a
Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start last_stop 287 
last_start 291 last_finish 287
Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start count 5 type 2 event 
291 flags 21a
Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start 291 done 1
Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_finish flags 1a
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fe4b id a001e 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,3fd13 id 7007a 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faaf id 90009 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fdd6 id c0135 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,4fc8f id c023b 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,34a id 8011d 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fe4b id b03c3 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faaf id b001e 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,34a id 11000e 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faa8 id f0016 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faa8 id 1001a8 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fbd9 id f00e9 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fb4d id 802ac 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,5fbd9 id f0026 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,3fd13 id c009b 3,0
Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fdd6 id 8001d 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,4fc8f id c0367 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,2fe40 id a01fd 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start last_stop 289 
last_start 293 last_finish 289
Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start count 6 type 2 event 
293 flags 21a
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,7fa97 id 1502b3 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,3fcea id 702f3 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,5fc1e id 6015f 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fc1e id c01c1 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,2fdfa id c0362 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,2fdfa id c02ad 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fb4d id f01ce 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,7fa97 id d0293 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,3fcea id d02ad 3,0
Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start 293 done 1
Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_finish flags 1a
Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 118 
last_start 295 last_finish 118
Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start count 4 type 2 event 
295 flags 21a
Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start 295 done 1
Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_finish flags 1a
Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start last_stop 291 
last_start 297 last_finish 291
Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start count 6 type 2 event 
297 flags 21a
Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start 297 done 1
Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_finish flags 1a
Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 295 
last_start 299 last_finish 295
Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start count 5 type 2 event 
299 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start 299 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start last_stop 299 
last_start 301 last_finish 299
Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start count 6 type 2 event 
301 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start 301 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 120 
last_start 303 last_finish 120
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 4 type 2 event 
303 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 303 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 122 
last_start 305 last_finish 122
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 4 type 2 event 
305 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 305 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start last_stop 124 
last_start 308 last_finish 124
Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start count 4 type 2 event 
308 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start 308 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29457 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start last_stop 303 
last_start 309 last_finish 303
Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start count 5 type 2 event 
309 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start 309 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 305 
last_start 311 last_finish 305
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 5 type 2 event 
311 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 311 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 309 
last_start 313 last_finish 309
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 6 type 2 event 
313 flags 21a
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 313 done 1
Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 308 
last_start 315 last_finish 308
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 5 type 2 event 
315 flags 21a
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 315 done 1
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_finish flags 1a
Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start last_stop 311 
last_start 317 last_finish 311
Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start count 6 type 2 event 
317 flags 21a
Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start 317 done 1
Jun 21 14:59:58 qmail-be-04 kernel: 29408 pr_finish flags 1a
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 315 
last_start 319 last_finish 315
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 6 type 2 event 
319 flags 21a
Jun 21 14:59:58 qmail-be-04 kernel: 29457 rereq 2,2bd7b5 id 801e5 3,0
Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 319 done 1
Jun 21 14:59:58 qmail-be-04 kernel: 29457 pr_finish flags 1a
Jun 21 14:59:58 qmail-be-04 kernel:
Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  Assertion failed on line 
357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c
Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  assertion:  "!error"
Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  time = 2512697
Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 
num=2,7798a8 lkf=10001 flags=84

Is the second panic today :(
Thanks again
German


From teigland at redhat.com  Wed Jun 21 19:42:49 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 14:42:49 -0500
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <449992F6.9060608@arnet.net.ar>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar>
Message-ID: <20060621194249.GB6765@redhat.com>

On Wed, Jun 21, 2006 at 03:41:58PM -0300, German Staltari wrote:
> Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from 
> the cluster : Missed too many heartbeats
> Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from 
> the cluster : No response to messages
> Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from 
> the cluster : No response to messages
> Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from 
> the cluster : No response to messages
> Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from 
> the cluster : No response to messages
> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity
> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been 
> removed from the cluster : No response to messages
> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message
> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No 
> response to messages

This is what led to the gfs panic, the cluster shut down when it lost
contact with all the other nodes.

Dave


> Jun 21 14:59:52 qmail-be-04 kernel: WARNING: dlm_emergency_shutdown
> Jun 21 14:59:52 qmail-be-04 fenced[17897]: process_events: service get 
> event failed
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 1000041 from 3 req 1
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 3
> Jun 21 14:59:53 qmail-be-04 last message repeated 7 times
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 3
> Jun 21 14:59:53 qmail-be-04 last message repeated 6 times
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 1000041 from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 3
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 1
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 1000041 from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 1000041 from 3 req 1
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 1
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 1
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003b from 3 req 5
> Jun 21 14:59:53 qmail-be-04 last message repeated 5 times
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 5
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003f from 3 req 5
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 5
> Jun 21 14:59:53 qmail-be-04 last message repeated 3 times
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 9
> Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid 
> lockspace 100003d from 3 req 5
> Jun 21 14:59:54 qmail-be-04 last message repeated 20 times
> Jun 21 14:59:54 qmail-be-04 kernel: dlm: dlm_unlock: lkid 3b013d 
> lockspace not found
> Jun 21 14:59:54 qmail-be-04 kernel: store004-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore001-001 add_to_requestq cmd 3 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore003-004 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore003-002 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3
> Jun 21 14:59:54 qmail-be-04 kernel:  type 2 event 282 flags 21a
> Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_start 282 done 1
> Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_finish flags 1a
> Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start last_stop 273 
> last_start 283 last_finish 273
> Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start count 5 type 2 event 
> 283 flags 21a
> Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start 283 done 1
> Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a
> Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start last_stop 283 
> last_start 285 last_finish 283
> Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start count 6 type 2 event 
> 285 flags 21a
> Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start 285 done 1
> Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a
> Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start last_stop 116 
> last_start 287 last_finish 116
> Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start count 4 type 2 event 
> 287 flags 21a
> Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start 287 done 1
> Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_finish flags 1a
> Jun 21 14:59:55 qmail-be-04 kernel: 28975 rereq 2,1cec36 id e029f 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start last_stop 282 
> last_start 289 last_finish 282
> Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start count 5 type 2 event 
> 289 flags 21a
> Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start 289 done 1
> Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_finish flags 1a
> Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start last_stop 287 
> last_start 291 last_finish 287
> Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start count 5 type 2 event 
> 291 flags 21a
> Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start 291 done 1
> Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_finish flags 1a
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fe4b id a001e 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,3fd13 id 7007a 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faaf id 90009 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fdd6 id c0135 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,4fc8f id c023b 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,34a id 8011d 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fe4b id b03c3 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faaf id b001e 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,34a id 11000e 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faa8 id f0016 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faa8 id 1001a8 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fbd9 id f00e9 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fb4d id 802ac 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,5fbd9 id f0026 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,3fd13 id c009b 3,0
> Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fdd6 id 8001d 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,4fc8f id c0367 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,2fe40 id a01fd 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start last_stop 289 
> last_start 293 last_finish 289
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start count 6 type 2 event 
> 293 flags 21a
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,7fa97 id 1502b3 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,3fcea id 702f3 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,5fc1e id 6015f 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fc1e id c01c1 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,2fdfa id c0362 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,2fdfa id c02ad 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fb4d id f01ce 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,7fa97 id d0293 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,3fcea id d02ad 3,0
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start 293 done 1
> Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_finish flags 1a
> Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 118 
> last_start 295 last_finish 118
> Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start count 4 type 2 event 
> 295 flags 21a
> Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start 295 done 1
> Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_finish flags 1a
> Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start last_stop 291 
> last_start 297 last_finish 291
> Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start count 6 type 2 event 
> 297 flags 21a
> Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start 297 done 1
> Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_finish flags 1a
> Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 295 
> last_start 299 last_finish 295
> Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start count 5 type 2 event 
> 299 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start 299 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start last_stop 299 
> last_start 301 last_finish 299
> Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start count 6 type 2 event 
> 301 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start 301 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 120 
> last_start 303 last_finish 120
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 4 type 2 event 
> 303 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 303 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 122 
> last_start 305 last_finish 122
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 4 type 2 event 
> 305 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 305 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start last_stop 124 
> last_start 308 last_finish 124
> Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start count 4 type 2 event 
> 308 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start 308 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29457 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start last_stop 303 
> last_start 309 last_finish 303
> Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start count 5 type 2 event 
> 309 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start 309 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 305 
> last_start 311 last_finish 305
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 5 type 2 event 
> 311 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 311 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 309 
> last_start 313 last_finish 309
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 6 type 2 event 
> 313 flags 21a
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 313 done 1
> Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 308 
> last_start 315 last_finish 308
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 5 type 2 event 
> 315 flags 21a
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 315 done 1
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_finish flags 1a
> Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start last_stop 311 
> last_start 317 last_finish 311
> Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start count 6 type 2 event 
> 317 flags 21a
> Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start 317 done 1
> Jun 21 14:59:58 qmail-be-04 kernel: 29408 pr_finish flags 1a
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 315 
> last_start 319 last_finish 315
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 6 type 2 event 
> 319 flags 21a
> Jun 21 14:59:58 qmail-be-04 kernel: 29457 rereq 2,2bd7b5 id 801e5 3,0
> Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 319 done 1
> Jun 21 14:59:58 qmail-be-04 kernel: 29457 pr_finish flags 1a
> Jun 21 14:59:58 qmail-be-04 kernel:
> Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  Assertion failed on line 
> 357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c
> Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  assertion:  "!error"
> Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm:  time = 2512697
> Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 
> num=2,7798a8 lkf=10001 flags=84
> 
> Is the second panic today :(
> Thanks again
> German


From gstaltari at arnet.net.ar  Wed Jun 21 19:50:07 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 16:50:07 -0300
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <20060621194249.GB6765@redhat.com>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com>
Message-ID: <4499A2EF.7020005@arnet.net.ar>

David Teigland wrote:
> On Wed, Jun 21, 2006 at 03:41:58PM -0300, German Staltari wrote:
>   
>> Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from 
>> the cluster : Missed too many heartbeats
>> Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from 
>> the cluster : No response to messages
>> Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from 
>> the cluster : No response to messages
>> Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from 
>> the cluster : No response to messages
>> Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from 
>> the cluster : No response to messages
>> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity
>> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been 
>> removed from the cluster : No response to messages
>> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message
>> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No 
>> response to messages
>>     
>
> This is what led to the gfs panic, the cluster shut down when it lost
> contact with all the other nodes.
>
> Dave
>
>   
Ok, but this node lost contact with the cluster because all the other 
nodes get the same panic at the same time.
We had another panic a few minutes ago... 3rd panic today... the same 
logs output...

Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm:  Assertion failed on line 
357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c
Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm:  assertion:  "!error"
Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm:  time = 951351
Jun 21 16:13:55 qmail-be-01 kernel: mstore008-004: error=-22 
num=2,75c6db lkf=10000 flags=84
Jun 21 16:13:55 qmail-be-01 kernel:
Jun 21 16:13:55 qmail-be-01 kernel: ------------[ cut here ]------------
Jun 21 16:13:55 qmail-be-01 kernel: kernel BUG at 
/soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357!
Jun 21 16:13:55 qmail-be-01 kernel: invalid opcode: 0000 [#1]
Jun 21 16:13:55 qmail-be-01 kernel: SMP
Jun 21 16:13:55 qmail-be-01 kernel: Modules linked in: nfsd exportfs 
lockd nfs_acl sunrpc gfs lock_dlm lock_harness dlm cman dm_round_robin 
dm_multipath ipv6 ohci_hcd i2c_piix4 i2c_core e1000 sg ext3 jbd dm_mod 
qla2300 qla2xxx scsi_transport_fc mptspi mptscsih mptbase sd_mod scsi_mod
Jun 21 16:13:55 qmail-be-01 kernel: CPU:    6
Jun 21 16:13:55 qmail-be-01 kernel: EIP:    0060:[<f90254d8>]    
Tainted: GF     VLI
Jun 21 16:13:55 qmail-be-01 kernel: EFLAGS: 00010296   (2.6.16.11-gds #1)
Jun 21 16:13:55 qmail-be-01 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 
[lock_dlm]
Jun 21 16:13:55 qmail-be-01 kernel: eax: 00000004   ebx: 00000084   ecx: 
ffffebd8   edx: 00000000
Jun 21 16:13:55 qmail-be-01 kernel: esi: 00010000   edi: ffffffea   ebp: 
ca4265c0   esp: d741eef4
Jun 21 16:13:56 qmail-be-01 kernel: ds: 007b   es: 007b   ss: 0068
Jun 21 16:13:56 qmail-be-01 kernel: Process gfs_glockd (pid: 1061, 
threadinfo=d741e000 task=d6b40550)
Jun 21 16:13:56 qmail-be-01 kernel: Stack: <0>f902b673 f53267e0 ffffffea 
00000002 0075c6db 00000000 00010000 00000084
Jun 21 16:13:56 qmail-be-01 kernel:        00000002 f9732000 00000003 
ca4265c0 cec8a4ac f902552e f905e6b5 cec8a4dc
Jun 21 16:13:56 qmail-be-01 kernel:        cec8a4c8 cec8a4dc f9055f02 
00000296 000000d0 f9732000 f9089ee0 c539f9c0
Jun 21 16:13:56 qmail-be-01 kernel: Call Trace:
Jun 21 16:13:56 qmail-be-01 kernel:  [<f902552e>] 
lm_dlm_unlock+0x14/0x1c [lock_dlm]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f905e6b5>] 
gfs_lm_unlock+0x2c/0x47 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f9055f02>] 
gfs_glock_drop_th+0x84/0x182 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f9054817>] run_queue+0x348/0x374 
[gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f90541a4>] 
handle_callback+0xe6/0x120 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f905485e>] 
unlock_on_glock+0x1b/0x24 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<f905441b>] 
gfs_reclaim_glock+0xbc/0x170 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel:  [<c031db3e>] _spin_lock_irqsave+0x9/0xd
Jun 21 16:13:56 qmail-be-01 kernel:  [<f9047bca>] gfs_glockd+0xda/0xff [gfs]


From teigland at redhat.com  Wed Jun 21 20:06:16 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 15:06:16 -0500
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <4499A2EF.7020005@arnet.net.ar>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com>
	<4499A2EF.7020005@arnet.net.ar>
Message-ID: <20060621200616.GC6765@redhat.com>

On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote:
> Ok, but this node lost contact with the cluster because all the other 
> nodes get the same panic at the same time.

Any messages preceding the panicks on the other nodes?
Dave


From gstaltari at arnet.net.ar  Wed Jun 21 21:04:36 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 18:04:36 -0300
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <20060621200616.GC6765@redhat.com>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com>
	<4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com>
Message-ID: <4499B464.9090902@arnet.net.ar>

David Teigland wrote:
> On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote:
>   
>> Ok, but this node lost contact with the cluster because all the other 
>> nodes get the same panic at the same time.
>>     
>
> Any messages preceding the panicks on the other nodes?
> Dave
>
>
>   
I've attached a file with the logs of all nodes at the time of the last 
cluster panic.
Thanks for your help!
German

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kernel-oops.txt.gz
Type: application/x-gzip
Size: 9281 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060621/14c98d06/attachment.bin>

From teigland at redhat.com  Wed Jun 21 21:21:33 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 21 Jun 2006 16:21:33 -0500
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <4499B464.9090902@arnet.net.ar>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com>
	<4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com>
	<4499B464.9090902@arnet.net.ar>
Message-ID: <20060621212132.GD6765@redhat.com>

On Wed, Jun 21, 2006 at 06:04:36PM -0300, German Staltari wrote:
> David Teigland wrote:
> >On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote:
> >  
> >>Ok, but this node lost contact with the cluster because all the other 
> >>nodes get the same panic at the same time.
> >
> >Any messages preceding the panicks on the other nodes?
> >  
> I've attached a file with the logs of all nodes at the time of the last 
> cluster panic.

It looks like cman is shutting the cluster down everywhere prior to any
gfs problems anywhere.  I wonder if you might have this bug:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777

which is fixed in CVS:

-rSTABLE
Checking in cnxman.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/cnxman.c,v  <--  cnxman.c
new revision: 1.42.2.12.4.1.2.12; previous revision: 1.42.2.12.4.1.2.11
done
Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v  <--
membership.c
new revision: 1.44.2.18.6.5; previous revision: 1.44.2.18.6.4
done

Dave


From gstaltari at arnet.net.ar  Wed Jun 21 21:26:57 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 21 Jun 2006 18:26:57 -0300
Subject: [Linux-cluster] kernel panic - help!
In-Reply-To: <20060621212132.GD6765@redhat.com>
References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com>
	<449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com>
	<4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com>
	<4499B464.9090902@arnet.net.ar> <20060621212132.GD6765@redhat.com>
Message-ID: <4499B9A1.2070606@arnet.net.ar>

David Teigland wrote:
> On Wed, Jun 21, 2006 at 06:04:36PM -0300, German Staltari wrote:
>   
>> David Teigland wrote:
>>     
>>> On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote:
>>>  
>>>       
>>>> Ok, but this node lost contact with the cluster because all the other 
>>>> nodes get the same panic at the same time.
>>>>         
>>> Any messages preceding the panicks on the other nodes?
>>>  
>>>       
>> I've attached a file with the logs of all nodes at the time of the last 
>> cluster panic.
>>     
>
> It looks like cman is shutting the cluster down everywhere prior to any
> gfs problems anywhere.  I wonder if you might have this bug:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777
>
> which is fixed in CVS:
>
> -rSTABLE
> Checking in cnxman.c;
> /cvs/cluster/cluster/cman-kernel/src/Attic/cnxman.c,v  <--  cnxman.c
> new revision: 1.42.2.12.4.1.2.12; previous revision: 1.42.2.12.4.1.2.11
> done
> Checking in membership.c;
> /cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v  <--
> membership.c
> new revision: 1.44.2.18.6.5; previous revision: 1.44.2.18.6.4
> done
>
> Dave
>
>
>   
It looks like our problem, we'll be updating to the STABLE CVS version.
Thanks Dave :)

German


From vlaurenz at advance.net  Thu Jun 22 04:23:58 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Thu, 22 Jun 2006 00:23:58 -0400
Subject: [Linux-cluster] Cluster Failover Scripts...
Message-ID: <15509160.1150950238347.JavaMail.root@brimley.host.advance.net>

Hello all,
 
I've written a script to notify via email on Cluster Suite events (failovers, etc) and have added it to my service in cluster.conf, but I've noticed that manual failovers are not processing properly.  My failover script runs, serviceB is relocated, but serviceA only stops on the source node and does not start on the destination node.
 
Is it ok to list more than one script per service in cluster.conf?  Am I going about this the wrong way?
 
I'd appreciate any help on this matter.
 
Below is snippet from my cluster.conf:
 
 <rm>
   <failoverdomains>
        <failoverdomain name="scalix_cluster_001" ordered="1" restricted="1">
                <failoverdomainnode name="serverA" priority="1"/>
                <failoverdomainnode name="serverB" priority="2"/>
        </failoverdomain>
   </failoverdomains>
   <resources>
        <ip address="<myip>" monitor_link="1"/>
   </resources>
   <service autostart="1" name="myservice">
        <ip ref="<myip>"/>
        <fs device="/dev/device" fstype="ext3" mountpoint="/somewhere" force_unmount="1" name="mysharedstorage"/>
        <script file="/usr/local/bin/failover-notify" name="notify_script"/>
        <scritp file="/etc/init.d/serviceA" name="serviceA_script"/>
        <script file="/etc/init.d/serviceB" name="serviceB_script"/>
   </service>
 </rm>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060622/cd7d2aaf/attachment.htm>

From ranjtech at gmail.com  Thu Jun 22 05:35:27 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 22 Jun 2006 15:35:27 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <449993EC.4030803@redhat.com>
References: <001a01c69559$0a7cd920$1f768b60$@com> <449993EC.4030803@redhat.com>
Message-ID: <006501c695bd$ac9d10d0$05d73270$@com>

Thanks Bob, that was my thought as well but just wasn't sure if that was the
best way to implement partitioning. I suppose I'm still a bit not comfy with
GFS considering I've never seen it in action but should do in a few days
time, as I'm starting to set up the cluster suite and GFS now. 

I'm assuming I still cannot use Logical Volumes on the cluster nodes while
the nodes are being used within a cluster in conjunction with GFS? How do
people provide for RAID1 type redundancy against system boot-disk failure
for each of their cluster nodes if I can't setup software raid for these
nodes? Do I run a cron job to sync up the two drives within the system so in
case of a drive failure, I can at least boot the system using the secondary
drive?

Thanks,
RR

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
Sent: Thursday, June 22, 2006 4:46 AM
To: linux clustering
Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes

Hi RR,

For your the local root partitions on the individual nodes, it's probably
best to use ext3.
On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can create
a symlink from your local node's mount point to the SAN, e.g. from
/mnt/gfs_san/var/spool to its local /var/spool.

Regards,

Bob Peterson
Red Hat Cluster Suite


From gforte at leopard.us.udel.edu  Thu Jun 22 05:47:10 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Thu, 22 Jun 2006 01:47:10 -0400
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <006501c695bd$ac9d10d0$05d73270$@com>
References: <001a01c69559$0a7cd920$1f768b60$@com> <449993EC.4030803@redhat.com>
	<006501c695bd$ac9d10d0$05d73270$@com>
Message-ID: <449A2EDE.7060209@leopard.us.udel.edu>

Where have you gotten these ideas?  LVM and software raid are not 
mutually exclusive with GFS.  Your local disks can easily have LVM 
volumes and use software (or hardware) RAID, no problem.  Unless you're 
trying to export local disks as GFS volumes via GNDB (which I can't 
speak to, have never done that), there shouldn't be any restrictions on 
how you can configure the local disks.  If you want certain portions of 
the local filesystem(s) to be located on the SAN instead, it's as simple 
as a symlink (as Bob indicated).

-g

RR wrote:
> Thanks Bob, that was my thought as well but just wasn't sure if that was the
> best way to implement partitioning. I suppose I'm still a bit not comfy with
> GFS considering I've never seen it in action but should do in a few days
> time, as I'm starting to set up the cluster suite and GFS now. 
> 
> I'm assuming I still cannot use Logical Volumes on the cluster nodes while
> the nodes are being used within a cluster in conjunction with GFS? How do
> people provide for RAID1 type redundancy against system boot-disk failure
> for each of their cluster nodes if I can't setup software raid for these
> nodes? Do I run a cron job to sync up the two drives within the system so in
> case of a drive failure, I can at least boot the system using the secondary
> drive?
> 
> Thanks,
> RR
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
> Sent: Thursday, June 22, 2006 4:46 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes
> 
> Hi RR,
> 
> For your the local root partitions on the individual nodes, it's probably
> best to use ext3.
> On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can create
> a symlink from your local node's mount point to the SAN, e.g. from
> /mnt/gfs_san/var/spool to its local /var/spool.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From ranjtech at gmail.com  Thu Jun 22 06:08:53 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 22 Jun 2006 16:08:53 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <449A2EDE.7060209@leopard.us.udel.edu>
References: <001a01c69559$0a7cd920$1f768b60$@com>
	<449993EC.4030803@redhat.com>	<006501c695bd$ac9d10d0$05d73270$@com>
	<449A2EDE.7060209@leopard.us.udel.edu>
Message-ID: <006901c695c2$58618d20$0924a760$@com>

Hi Greg,

Thanks for the input. What you say makes sense, but I was getting these
ideas from places like these: http://kbase.redhat.com/faq/FAQ_78_3105.shtm

Have I totally misunderstood this? Maybe this is saying the same thing as
you are but I understand it wrong? I'm definitely not exporting any of my
local volumes or filesystems to anywhere, all I want to do is have this
particular directory tree which is based in /var/spool to actually be on the
SAN and have all nodes read/write to it at their discretion. Nothing too
complicated. But this FAQ seems to say that if I'm running GFS on my nodes,
then those nodes cannot be running software RAID. Somewhere else I'd read I
can't also have LVM volumes on these disks. What I originally had planned
was to have software raid using md and LVM volumes on each machine to allow
for RAID and resizing on each node and then have GFS. 

I'm all confused now! Please clarify someone?

Thanks, and sorry if this is stupid! :(
RR


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Forte
Sent: Thursday, June 22, 2006 3:47 PM
To: linux clustering
Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes

Where have you gotten these ideas?  LVM and software raid are not mutually
exclusive with GFS.  Your local disks can easily have LVM volumes and use
software (or hardware) RAID, no problem.  Unless you're trying to export
local disks as GFS volumes via GNDB (which I can't speak to, have never done
that), there shouldn't be any restrictions on how you can configure the
local disks.  If you want certain portions of the local filesystem(s) to be
located on the SAN instead, it's as simple as a symlink (as Bob indicated).

-g

RR wrote:
> Thanks Bob, that was my thought as well but just wasn't sure if that 
> was the best way to implement partitioning. I suppose I'm still a bit 
> not comfy with GFS considering I've never seen it in action but should 
> do in a few days time, as I'm starting to set up the cluster suite and GFS
now.
> 
> I'm assuming I still cannot use Logical Volumes on the cluster nodes 
> while the nodes are being used within a cluster in conjunction with 
> GFS? How do people provide for RAID1 type redundancy against system 
> boot-disk failure for each of their cluster nodes if I can't setup 
> software raid for these nodes? Do I run a cron job to sync up the two 
> drives within the system so in case of a drive failure, I can at least 
> boot the system using the secondary drive?
> 
> Thanks,
> RR
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
> Sent: Thursday, June 22, 2006 4:46 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster 
> nodes
> 
> Hi RR,
> 
> For your the local root partitions on the individual nodes, it's 
> probably best to use ext3.
> On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can 
> create a symlink from your local node's mount point to the SAN, e.g. 
> from /mnt/gfs_san/var/spool to its local /var/spool.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From gforte at leopard.us.udel.edu  Thu Jun 22 06:20:15 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Thu, 22 Jun 2006 02:20:15 -0400
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <006901c695c2$58618d20$0924a760$@com>
References: <001a01c69559$0a7cd920$1f768b60$@com>
	<449993EC.4030803@redhat.com>	<006501c695bd$ac9d10d0$05d73270$@com>
	<449A2EDE.7060209@leopard.us.udel.edu>
	<006901c695c2$58618d20$0924a760$@com>
Message-ID: <449A369F.1030708@leopard.us.udel.edu>

No, I'm pretty sure what that's saying is that you can't build a GFS 
filesystem _on top of_ a software RAID volume.  (Although you probably 
could, using GNDB ...) but unless I misunderstood you, you're talking 
about non-GFS filesystems on the local hard drives, with GFS on a SAN. 
This is a "standard" configuration, and certainly there's no reason you 
can't use software raid on the disks that will host non-GFS volumes.

-g

RR wrote:
> Hi Greg,
> 
> Thanks for the input. What you say makes sense, but I was getting these
> ideas from places like these: http://kbase.redhat.com/faq/FAQ_78_3105.shtm
> 
> Have I totally misunderstood this? Maybe this is saying the same thing as
> you are but I understand it wrong? I'm definitely not exporting any of my
> local volumes or filesystems to anywhere, all I want to do is have this
> particular directory tree which is based in /var/spool to actually be on the
> SAN and have all nodes read/write to it at their discretion. Nothing too
> complicated. But this FAQ seems to say that if I'm running GFS on my nodes,
> then those nodes cannot be running software RAID. Somewhere else I'd read I
> can't also have LVM volumes on these disks. What I originally had planned
> was to have software raid using md and LVM volumes on each machine to allow
> for RAID and resizing on each node and then have GFS. 
> 
> I'm all confused now! Please clarify someone?
> 
> Thanks, and sorry if this is stupid! :(
> RR
> 
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Forte
> Sent: Thursday, June 22, 2006 3:47 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes
> 
> Where have you gotten these ideas?  LVM and software raid are not mutually
> exclusive with GFS.  Your local disks can easily have LVM volumes and use
> software (or hardware) RAID, no problem.  Unless you're trying to export
> local disks as GFS volumes via GNDB (which I can't speak to, have never done
> that), there shouldn't be any restrictions on how you can configure the
> local disks.  If you want certain portions of the local filesystem(s) to be
> located on the SAN instead, it's as simple as a symlink (as Bob indicated).
> 
> -g
> 
> RR wrote:
>> Thanks Bob, that was my thought as well but just wasn't sure if that 
>> was the best way to implement partitioning. I suppose I'm still a bit 
>> not comfy with GFS considering I've never seen it in action but should 
>> do in a few days time, as I'm starting to set up the cluster suite and GFS
> now.
>> I'm assuming I still cannot use Logical Volumes on the cluster nodes 
>> while the nodes are being used within a cluster in conjunction with 
>> GFS? How do people provide for RAID1 type redundancy against system 
>> boot-disk failure for each of their cluster nodes if I can't setup 
>> software raid for these nodes? Do I run a cron job to sync up the two 
>> drives within the system so in case of a drive failure, I can at least 
>> boot the system using the secondary drive?
>>
>> Thanks,
>> RR
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com 
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
>> Sent: Thursday, June 22, 2006 4:46 AM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster 
>> nodes
>>
>> Hi RR,
>>
>> For your the local root partitions on the individual nodes, it's 
>> probably best to use ext3.
>> On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can 
>> create a symlink from your local node's mount point to the SAN, e.g. 
>> from /mnt/gfs_san/var/spool to its local /var/spool.
>>
>> Regards,
>>
>> Bob Peterson
>> Red Hat Cluster Suite
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 


From ranjtech at gmail.com  Thu Jun 22 06:37:18 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 22 Jun 2006 16:37:18 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <449A369F.1030708@leopard.us.udel.edu>
References: <001a01c69559$0a7cd920$1f768b60$@com>	<449993EC.4030803@redhat.com>	<006501c695bd$ac9d10d0$05d73270$@com>	<449A2EDE.7060209@leopard.us.udel.edu>	<006901c695c2$58618d20$0924a760$@com>
	<449A369F.1030708@leopard.us.udel.edu>
Message-ID: <006a01c695c6$50c133a0$f2439ae0$@com>

Right, it is indeed what I want to do. But now let me understand the basics
of GFS. GFS actually runs on the SAN but the GFS drivers/software that I
install on each of my cluster node just allows each of these nodes to see
these volumes? Something analogous to say iscsi-initiators on a node to via
the LUNs on an iSCSI SAN? If that's true, then is it possible for me to say
have my /opt/local installed on the GFS managed filesystem on the SAN such
that whatever application is installed once in this directory can be
accessed by all  nodes mounting that filesystem? So kind of get install
once, use everywhere kind of a deal?

Thanks so much
RR

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Forte
Sent: Thursday, June 22, 2006 4:20 PM
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes

No, I'm pretty sure what that's saying is that you can't build a GFS
filesystem _on top of_ a software RAID volume.  (Although you probably
could, using GNDB ...) but unless I misunderstood you, you're talking about
non-GFS filesystems on the local hard drives, with GFS on a SAN. 
This is a "standard" configuration, and certainly there's no reason you
can't use software raid on the disks that will host non-GFS volumes.

-g

RR wrote:
> Hi Greg,
> 
> Thanks for the input. What you say makes sense, but I was getting 
> these ideas from places like these: 
> http://kbase.redhat.com/faq/FAQ_78_3105.shtm
> 
> Have I totally misunderstood this? Maybe this is saying the same thing 
> as you are but I understand it wrong? I'm definitely not exporting any 
> of my local volumes or filesystems to anywhere, all I want to do is 
> have this particular directory tree which is based in /var/spool to 
> actually be on the SAN and have all nodes read/write to it at their 
> discretion. Nothing too complicated. But this FAQ seems to say that if 
> I'm running GFS on my nodes, then those nodes cannot be running 
> software RAID. Somewhere else I'd read I can't also have LVM volumes 
> on these disks. What I originally had planned was to have software 
> raid using md and LVM volumes on each machine to allow for RAID and
resizing on each node and then have GFS.
> 
> I'm all confused now! Please clarify someone?
> 
> Thanks, and sorry if this is stupid! :( RR
> 
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Forte
> Sent: Thursday, June 22, 2006 3:47 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster 
> nodes
> 
> Where have you gotten these ideas?  LVM and software raid are not 
> mutually exclusive with GFS.  Your local disks can easily have LVM 
> volumes and use software (or hardware) RAID, no problem.  Unless 
> you're trying to export local disks as GFS volumes via GNDB (which I 
> can't speak to, have never done that), there shouldn't be any 
> restrictions on how you can configure the local disks.  If you want 
> certain portions of the local filesystem(s) to be located on the SAN
instead, it's as simple as a symlink (as Bob indicated).
> 
> -g
> 
> RR wrote:
>> Thanks Bob, that was my thought as well but just wasn't sure if that 
>> was the best way to implement partitioning. I suppose I'm still a bit 
>> not comfy with GFS considering I've never seen it in action but 
>> should do in a few days time, as I'm starting to set up the cluster 
>> suite and GFS
> now.
>> I'm assuming I still cannot use Logical Volumes on the cluster nodes 
>> while the nodes are being used within a cluster in conjunction with 
>> GFS? How do people provide for RAID1 type redundancy against system 
>> boot-disk failure for each of their cluster nodes if I can't setup 
>> software raid for these nodes? Do I run a cron job to sync up the two 
>> drives within the system so in case of a drive failure, I can at 
>> least boot the system using the secondary drive?
>>
>> Thanks,
>> RR
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com 
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert 
>> Peterson
>> Sent: Thursday, June 22, 2006 4:46 AM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] partitioning of filesystems in cluster 
>> nodes
>>
>> Hi RR,
>>
>> For your the local root partitions on the individual nodes, it's 
>> probably best to use ext3.
>> On the SAN, use GFS and Red Hat Cluster Suite.  Then perhaps you can 
>> create a symlink from your local node's mount point to the SAN, e.g.
>> from /mnt/gfs_san/var/spool to its local /var/spool.
>>
>> Regards,
>>
>> Bob Peterson
>> Red Hat Cluster Suite
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From ml at dasir.net  Thu Jun 22 09:30:04 2006
From: ml at dasir.net (David Siroky)
Date: Thu, 22 Jun 2006 11:30:04 +0200
Subject: [Linux-cluster] error: 'struct inode' has no member named 'i_mutex'
Message-ID: <1150968604.8887.3.camel@localhost>

Hi!

I tried to compile the cluster 1.02 against debian kernel 2.6.15

./configure --kernel_src=/usr/src/linux-source-2.6.15
make install

but the compilation fails:

....
/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c: In function
'gfs_write':
/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:955: warning:
implicit declaration of function 'mutex_lock'
/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:955: error:
'struct inode' has no member named 'i_mutex'
/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:960: warning:
implicit declaration of function 'mutex_unlock'
/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:960: error:
'struct inode' has no member named 'i_mutex'
make[5]: *** [/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.o]
Error 1
make[4]: *** [_module_/usr/src/cluster-1.02.00/gfs-kernel/src/gfs] Error
2
make[4]: Leaving directory `/usr/src/linux-source-2.6.15'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/usr/src/cluster-1.02.00/gfs-kernel/src/gfs'
...

What em I doing wrong?

Another thing are messages like:
*** Warning:
"kcl_addref_cluster" [/usr/src/cluster-1.02.00/gfs-kernel/src/dlm/lock_dlm.ko] undefined!
*** Warning:
"kcl_get_services" [/usr/src/cluster-1.02.00/gfs-kernel/src/dlm/lock_dlm.ko] undefined!
*** Warning:
"kcl_cluster_name" [/usr/src/cluster-1.02.00/gfs-kernel/src/dlm/lock_dlm.ko] undefined!

Is that ok?

Thanks for advices.

David


From eloy.acosta at fon.com  Thu Jun 22 09:41:43 2006
From: eloy.acosta at fon.com (Eloy Acosta Toscano)
Date: Thu, 22 Jun 2006 11:41:43 +0200
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
Message-ID: <012e01c695e0$3c505b70$790da8c0@eacosta>

Hello, I?d like to make you some questions about or new enterprise database production systems, which should be in production state in about 2 or 3 days. Yes, I?know this is not too much time to deploy a good and definitive HA solution, but I?d like to try making the best possible.

We?ve tried a MySQL with RedHat Cluster Suite H/A solution, but we have very 
bad results in the H/A tests.
We use a iSCSI LUN per node with ext3 FS and a HP iLO as fence device, and I 
think that these poor results are related with these two components: ext3 
formated HD device and a disfunction in the fenced daemon with the HP iLO 
device.

So, I?ve readed in some whitepapers about solutions based on MySQL + RedHat 
Cluster Suite + GFS.
And the question is: Do you think we sould use GFS instead Ext3 FS?
Do you have any information about this kind of solution? Is stable enougth 
solution  for production enviroments? and ?what about the performance 
results using GFS? will we loose performance?

We have only 2 days to deploy a final H/A solution and your opinion is very 
important to us.

Than you very very much.


__________________________

Eloy Acosta  Toscano
FON Technology
http://www.fon.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060622/0ce947e7/attachment.htm>

From herta.vandeneynde at cc.kuleuven.be  Thu Jun 22 09:49:25 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Thu, 22 Jun 2006 11:49:25 +0200
Subject: [Linux-cluster] network tiebreaker required for a two node cluster?
In-Reply-To: <1150400314.2810.34.camel@localhost.localdomain>
References: <a0e72880606141149g258b7bd6j905f7b0ea73392f5@mail.gmail.com>	<1150339626.2982.51.camel@localhost.localdomain>	<a0e72880606150930w7e94de78s119033b0944f80f6@mail.gmail.com>
	<1150400314.2810.34.camel@localhost.localdomain>
Message-ID: <449A67A5.4060302@cc.kuleuven.be>

Re: [Linux-cluster] Why Redhat replace quorum partition/lock lun	with 
new fencing mechanisms?

Kevin Anderson wrote:
> If you have access to shared storage, then a two node cluster with
> quorum disk/fencing would be a better configuration and could be the
> recommended configuration.  However, there are still cases where you
> could have a two node cluster with no shared storage.  Depends on how
> the application is sharing state or accessing data.  But for an
> active/passive two node failover cluster, I can see where the quorum
> disk will be very popular.
> 
> Kevin

When configuring the cluquorumd for a two node cluster (active-active 
nfs server), the GUI recommends using a network tiebreaker ip address. 
Why is that?

Under heavier network load, we occasionally see one of the members 
(usually the highest priority member) reporting that the connection to 
the tiebreaker is offline.  It subsequently gets fenced by the other 
node, and simply reboots.
(FWIIW, we checked the network cards, cables, and switch and 
swithc-ports between the two nodes.  The system that holds the TB 
address is currently waiting to be re-installed, so it's pretty much idle.)

I thought the network tiebreaker was meant to avoid a split-brain 
cluster, but if it isn't, needless to say, we'd be happy to get rid of it.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


From bunk at stusta.de  Thu Jun 22 10:03:24 2006
From: bunk at stusta.de (Adrian Bunk)
Date: Thu, 22 Jun 2006 12:03:24 +0200
Subject: [Linux-cluster] [-mm patch] fs/gfs2/: make code static
In-Reply-To: <20060621034857.35cfe36f.akpm@osdl.org>
References: <20060621034857.35cfe36f.akpm@osdl.org>
Message-ID: <20060622100324.GZ9111@stusta.de>

On Wed, Jun 21, 2006 at 03:48:57AM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.17-rc6-mm2:
>...
>  git-gfs2.patch
>...
>  git trees
>...

This patch makes the following needlessly global code static:
- eaops.c: struct gfs2_security_eaops
- rgrp.c: gfs2_free_uninit_di()

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 fs/gfs2/eaops.c |    2 +-
 fs/gfs2/eaops.h |    2 --
 fs/gfs2/rgrp.c  |    2 +-
 fs/gfs2/rgrp.h  |    1 -
 4 files changed, 2 insertions(+), 5 deletions(-)

--- linux-2.6.17-mm1-full/fs/gfs2/eaops.h.old	2006-06-22 01:39:33.000000000 +0200
+++ linux-2.6.17-mm1-full/fs/gfs2/eaops.h	2006-06-22 01:39:45.000000000 +0200
@@ -23,8 +23,6 @@
 
 extern struct gfs2_eattr_operations gfs2_system_eaops;
 
-extern struct gfs2_eattr_operations gfs2_security_eaops;
-
 extern struct gfs2_eattr_operations *gfs2_ea_ops[];
 
 #endif /* __EAOPS_DOT_H__ */
--- linux-2.6.17-mm1-full/fs/gfs2/eaops.c.old	2006-06-22 01:39:05.000000000 +0200
+++ linux-2.6.17-mm1-full/fs/gfs2/eaops.c	2006-06-22 01:39:40.000000000 +0200
@@ -214,7 +214,7 @@
 	.eo_name = "system",
 };
 
-struct gfs2_eattr_operations gfs2_security_eaops = {
+static struct gfs2_eattr_operations gfs2_security_eaops = {
 	.eo_get = security_eo_get,
 	.eo_set = security_eo_set,
 	.eo_remove = security_eo_remove,
--- linux-2.6.17-mm1-full/fs/gfs2/rgrp.h.old	2006-06-22 01:39:52.000000000 +0200
+++ linux-2.6.17-mm1-full/fs/gfs2/rgrp.h	2006-06-22 01:40:07.000000000 +0200
@@ -43,7 +43,6 @@
 
 void gfs2_free_data(struct gfs2_inode *ip, uint64_t bstart, uint32_t blen);
 void gfs2_free_meta(struct gfs2_inode *ip, uint64_t bstart, uint32_t blen);
-void gfs2_free_uninit_di(struct gfs2_rgrpd *rgd, uint64_t blkno);
 void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip);
 void gfs2_unlink_di(struct inode *inode);
 
--- linux-2.6.17-mm1-full/fs/gfs2/rgrp.c.old	2006-06-22 01:40:15.000000000 +0200
+++ linux-2.6.17-mm1-full/fs/gfs2/rgrp.c	2006-06-22 01:40:20.000000000 +0200
@@ -1401,7 +1401,7 @@
 	gfs2_trans_add_rg(rgd);
 }
 
-void gfs2_free_uninit_di(struct gfs2_rgrpd *rgd, uint64_t blkno)
+static void gfs2_free_uninit_di(struct gfs2_rgrpd *rgd, uint64_t blkno)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
 	struct gfs2_rgrpd *tmp_rgd;


From carlopmart at gmail.com  Thu Jun 22 11:03:01 2006
From: carlopmart at gmail.com (carlopmart)
Date: Thu, 22 Jun 2006 13:03:01 +0200
Subject: [Linux-cluster] Using RHCS with drbd
Message-ID: <449A78E5.9000004@gmail.com>

Hi all,

  I need to deploy different drbd devices using RHCS (without GFS and a 
shared storage). Is it possible to mount drbd devices before rgmanager 
starts up configrued services? Do I use /etc/ha.d/resources.d/drbddisk 
provided by drbd package (heatbeat use this script to accomplish this)?

Many thans.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com


From adel at opennet.ae  Thu Jun 22 11:12:07 2006
From: adel at opennet.ae (Adel Ben Zarrouk)
Date: Thu, 22 Jun 2006 15:12:07 +0400
Subject: [Linux-cluster] Oracle 9i RAC and GFS 6.1
Message-ID: <200606221512.07554.adel@opennet.ae>

Dear All,

please Can someone clarify the below questions:

-What kind of lock method supported by Oracle 9i RAC (DLM or GULM) or doesn't 
need any lock daemon?
-Installation of Cluster Suite should be installed before the GFS or After?
-What is the purpose of the Cluster Manager, it is just for fencing or for 
other things  ?
-If the customer have any kind of support regarding Oracle or GFS, he must get 
back to Oracle Inc. or Redhat Inc.
-There is any TPC-Benchmark for Oracle-Redhat GFS
-How to perform RHEL and Oracle DB for better performance 

Thanks a lot for your help.

Regards

 --Adel
-- 
Adel Ben Zarrouk
Opennet MEA FZ LLC
Tel: +971 4 390 1943
Fax: +971 4 390 4767
http://www.opennet.ae/


From Bowie_Bailey at BUC.com  Thu Jun 22 13:32:58 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Thu, 22 Jun 2006 09:32:58 -0400
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
Message-ID: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>

RR wrote:
> Right, it is indeed what I want to do. But now let me understand the
> basics of GFS. GFS actually runs on the SAN but the GFS
> drivers/software that I install on each of my cluster node just
> allows each of these nodes to see these volumes? Something analogous
> to say iscsi-initiators on a node to via the LUNs on an iSCSI SAN? If
> that's true, then is it possible for me to say have my /opt/local
> installed on the GFS managed filesystem on the SAN such that whatever
> application is installed once in this directory can be accessed by
> all  nodes mounting that filesystem? So kind of get install once, use
> everywhere kind of a deal? 

GFS is a filesystem similar to ext3 or xfs.  The only difference is
that it is cluster-aware and can be accessed by multiple nodes at the
same time without data corruption.

The software/hardware that you install on each node to allow it to see
the volumes could be an iscsi initiator or something else depending on
your physical storage.

You can put /opt/local on the GFS and use it like you suggest as long
as all of the programs installed there are self-contained and do not
rely on libraries installed elsewhere than may not be available on all
of the nodes.

Also, let me clarify the LVM/raid thing.  The only limitation is that
you cannot export two partitions from your shared storage and then put
them into a software raid partition.  This is because the software
raid subsystems are not cluster-aware and each node would be trying to
do it's own raid setup which would cause data corruption.  If the
partitions are not part of the shared storage, you can do whatever you
want with them.

-- 
Bowie


From ranjtech at gmail.com  Thu Jun 22 14:13:59 2006
From: ranjtech at gmail.com (RR)
Date: Fri, 23 Jun 2006 00:13:59 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>
Message-ID: <000901c69606$1cd83760$5688a620$@com>

Ok thanks. Got it! So I'm assuming then the order of configuration would be
the following?

- Configure the iSCSI SAN with whatever volumes I need to make available to
my cluster nodes
- Connect up my GigE NICs to the SAN via isolated network switches
- Install RHEL on all my cluster nodes
- Install the iSCSI initiator on each of each of my nodes and configure
iscsi.conf, start the iscsi service such that I can see these volumes on the
SAN.
- Install/Configure CSGFS on each of my cluster nodes with whatever fencing
and lock management scheme I want to go with (I have the WTI power switches
+ each server has an RSA II card in it, so I'll probably go with the RSAs)
- Start the cluster daemons and GFS service.
- use gfs_mkfs to create my filesystems on the shared volumes on the iSCSI
SAN
- Mount these filesystems on each of the node

 And I'm done?

Please could someone verify this order of things. I'm trying to read these
cluster suite manuals and it's humongous and it talks about so much stuff
that it all sounds WAY more complicated than I would've thought about from
just common sense. So I'm trying to get it into the basic core steps.
Anything I'm missing? Or any gotchas anyone can think of?

Thank you all esp. Bowie, Greg and Bob for responding so far.

Kind Regards,
RR

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bowie Bailey
Sent: Thursday, June 22, 2006 11:33 PM
To: linux clustering
Subject: RE: [Linux-cluster] partitioning of filesystems in cluster nodes

RR wrote:
> Right, it is indeed what I want to do. But now let me understand the
> basics of GFS. GFS actually runs on the SAN but the GFS
> drivers/software that I install on each of my cluster node just
> allows each of these nodes to see these volumes? Something analogous
> to say iscsi-initiators on a node to via the LUNs on an iSCSI SAN? If
> that's true, then is it possible for me to say have my /opt/local
> installed on the GFS managed filesystem on the SAN such that whatever
> application is installed once in this directory can be accessed by
> all  nodes mounting that filesystem? So kind of get install once, use
> everywhere kind of a deal? 

GFS is a filesystem similar to ext3 or xfs.  The only difference is
that it is cluster-aware and can be accessed by multiple nodes at the
same time without data corruption.

The software/hardware that you install on each node to allow it to see
the volumes could be an iscsi initiator or something else depending on
your physical storage.

You can put /opt/local on the GFS and use it like you suggest as long
as all of the programs installed there are self-contained and do not
rely on libraries installed elsewhere than may not be available on all
of the nodes.

Also, let me clarify the LVM/raid thing.  The only limitation is that
you cannot export two partitions from your shared storage and then put
them into a software raid partition.  This is because the software
raid subsystems are not cluster-aware and each node would be trying to
do it's own raid setup which would cause data corruption.  If the
partitions are not part of the shared storage, you can do whatever you
want with them.

-- 
Bowie

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From rpeterso at redhat.com  Thu Jun 22 14:40:24 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 09:40:24 -0500
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <006a01c695c6$50c133a0$f2439ae0$@com>
References: <001a01c69559$0a7cd920$1f768b60$@com>	<449993EC.4030803@redhat.com>	<006501c695bd$ac9d10d0$05d73270$@com>	<449A2EDE.7060209@leopard.us.udel.edu>	<006901c695c2$58618d20$0924a760$@com>	<449A369F.1030708@leopard.us.udel.edu>
	<006a01c695c6$50c133a0$f2439ae0$@com>
Message-ID: <449AABD8.3000305@redhat.com>

RR wrote:
> Right, it is indeed what I want to do. But now let me understand the basics
> of GFS. GFS actually runs on the SAN but the GFS drivers/software that I
> install on each of my cluster node just allows each of these nodes to see
> these volumes? Something analogous to say iscsi-initiators on a node to via
> the LUNs on an iSCSI SAN? If that's true, then is it possible for me to say
> have my /opt/local installed on the GFS managed filesystem on the SAN such
> that whatever application is installed once in this directory can be
> accessed by all  nodes mounting that filesystem? So kind of get install
> once, use everywhere kind of a deal?
>
> Thanks so much
> RR
>   
Hi RR,

GFS is the file system that runs on each of the nodes in the cluster.  
It's basically a
kernel device driver that controls how and where the data is stored on a 
logical
volume.  In order to make a bunch of computers ("nodes") cooperatively 
share the
data on a SAN, you need GFS's ability to coordinate with a cluster 
locking protocol.
One such cluster locking protocol is dlm, the distributed lock manager, 
which is also
a kernel device driver.  It's job is to ensure that nodes in the cluster 
who share the
data on the SAN don't corrupt each other's data.

Since GFS manages the contents of a logical volume, there is still the 
underlying
logical volume manager, LVM, that takes care of things like spanning 
physical
volumes, striping, hardware and software RAID, mirroring and such.
For GFS, there is a special version of LVM called LVM2 that is needed, but
not much changes other than the locking protocol specified in 
/etc/lvm/lvm.conf.

This only applies to RHEL4, by the way.  The web page you referenced was for
RHEL3, and in RHEL3 the only mirroring available was md - which GFS does not
work with.  GFS will only work with device-mapper (LVM2) mirroring
(specifically cluster mirroring).

Now there are lots of other little pieces to Cluster Suite besides GFS 
that are
necessary to make a cluster work:  (1) Fencing protects your data from 
split-brain
corruption when a node has hardware failure and stops communicating.
(2) Cluster manager (CMAN) handles communications between nodes.
(3) Resource Group Manager (rgmanager) handles the starting, stopping and
moving of cluster services such as NFS when nodes fail, etc. (if you 
have any).
I hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From rpeterso at redhat.com  Thu Jun 22 15:07:40 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 10:07:40 -0500
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <000901c69606$1cd83760$5688a620$@com>
References: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>
	<000901c69606$1cd83760$5688a620$@com>
Message-ID: <449AB23C.1070203@redhat.com>

RR wrote:
> Ok thanks. Got it! So I'm assuming then the order of configuration would be
> the following?
>
> - Configure the iSCSI SAN with whatever volumes I need to make available to
> my cluster nodes
> - Connect up my GigE NICs to the SAN via isolated network switches
> - Install RHEL on all my cluster nodes
> - Install the iSCSI initiator on each of each of my nodes and configure
> iscsi.conf, start the iscsi service such that I can see these volumes on the
> SAN.
> - Install/Configure CSGFS on each of my cluster nodes with whatever fencing
> and lock management scheme I want to go with (I have the WTI power switches
> + each server has an RSA II card in it, so I'll probably go with the RSAs)
> - Start the cluster daemons and GFS service.
> - use gfs_mkfs to create my filesystems on the shared volumes on the iSCSI
> SAN
> - Mount these filesystems on each of the node
>
>  And I'm done?
>
> Please could someone verify this order of things. I'm trying to read these
> cluster suite manuals and it's humongous and it talks about so much stuff
> that it all sounds WAY more complicated than I would've thought about from
> just common sense. So I'm trying to get it into the basic core steps.
> Anything I'm missing? Or any gotchas anyone can think of?
>
> Thank you all esp. Bowie, Greg and Bob for responding so far.
>
> Kind Regards,
> RR
>   
Hi RR,

I agree there's a lot of information to pour through, and it can seem 
overwhelming.
Earlier this year, I wrote an "NFS/GFS Cookbook" that I'm still working 
on, and
because it gives lots of example commands and walks through setting up a 
cluster
from start to finish, it might be easier to go from that rather than the 
manuals.  It
doesn't deal specifically with the RAID or iSCSI issues, but it does 
contain the basics
about setting up a cluster and it's shorter than the manuals.  It can be 
found here:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf

It's not an officially sanctioned Red Hat document.  Yet.  As I said, 
it's got some
problems, and I'm working to correct those, but it's better than nothing. 
I hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From wcheng at redhat.com  Thu Jun 22 15:07:55 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 22 Jun 2006 11:07:55 -0400
Subject: [Linux-cluster] error: 'struct inode' has no member named
	'i_mutex'
In-Reply-To: <1150968604.8887.3.camel@localhost>
References: <1150968604.8887.3.camel@localhost>
Message-ID: <449AB24B.3050207@redhat.com>

David Siroky wrote:

>Hi!
>
>I tried to compile the cluster 1.02 against debian kernel 2.6.15
>
>./configure --kernel_src=/usr/src/linux-source-2.6.15
>make install
>
>but the compilation fails:
>
>....
>/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c: In function
>'gfs_write':
>/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:955: warning:
>implicit declaration of function 'mutex_lock'
>/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:955: error:
>'struct inode' has no member named 'i_mutex'
>/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:960: warning:
>implicit declaration of function 'mutex_unlock'
>/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.c:960: error:
>'struct inode' has no member named 'i_mutex'
>make[5]: *** [/usr/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_file.o]
>Error 1
>make[4]: *** [_module_/usr/src/cluster-1.02.00/gfs-kernel/src/gfs] Error
>2
>make[4]: Leaving directory `/usr/src/linux-source-2.6.15'
>make[3]: *** [all] Error 2
>make[3]: Leaving directory `/usr/src/cluster-1.02.00/gfs-kernel/src/gfs'
>...
>
>What em I doing wrong?
>  
>

This is why we've been very careful about RHEL updates (in order to 
provide stable external interfaces and environment for users and 
customers).

Sometime after 2.6.15 kernel, the i_sem  semaphore within VFS layer's 
struct inode was changed to i_mutext in community versions of kernels. 
GFS had to comply to this change for non-RHEL based CVS branches.  In 
short, you have to pick up newer versions of kernel from your 
distributor that contains this VFS layer changes.

-- Wendy


From jparsons at redhat.com  Thu Jun 22 15:17:21 2006
From: jparsons at redhat.com (James Parsons)
Date: Thu, 22 Jun 2006 11:17:21 -0400
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
In-Reply-To: <012e01c695e0$3c505b70$790da8c0@eacosta>
References: <012e01c695e0$3c505b70$790da8c0@eacosta>
Message-ID: <449AB481.2030802@redhat.com>

Eloy Acosta Toscano wrote:

> Hello, I?d like to make you some questions about or new enterprise 
> database production systems, which should be in production state in 
> about 2 or 3 days. Yes, I?know this is not too much time to deploy a 
> good and definitive HA solution, but I?d like to try making the best 
> possible.
>  
> We?ve tried a MySQL with RedHat Cluster Suite H/A solution, but we 
> have very
> bad results in the H/A tests.
> We use a iSCSI LUN per node with ext3 FS and a HP iLO as fence device, 
> and I
> think that these poor results are related with these two components: ext3
> formated HD device and a disfunction in the fenced daemon with the HP iLO
> device.

Tell me about your iLO woes.
-J


From teigland at redhat.com  Thu Jun 22 15:20:22 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 22 Jun 2006 10:20:22 -0500
Subject: [Linux-cluster] error: 'struct inode' has no member named
	'i_mutex'
In-Reply-To: <1150968604.8887.3.camel@localhost>
References: <1150968604.8887.3.camel@localhost>
Message-ID: <20060622152022.GC27434@redhat.com>

On Thu, Jun 22, 2006 at 11:30:04AM +0200, David Siroky wrote:
> Hi!
> 
> I tried to compile the cluster 1.02 against debian kernel 2.6.15
> 
> ./configure --kernel_src=/usr/src/linux-source-2.6.15
> make install

cluster-1.02 was targetted at the 2.6.16 kernel:
https://www.redhat.com/archives/linux-cluster/2006-April/msg00086.html

Dave


From ranjtech at gmail.com  Thu Jun 22 15:45:56 2006
From: ranjtech at gmail.com (RR)
Date: Fri, 23 Jun 2006 01:45:56 +1000
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <449AB23C.1070203@redhat.com>
References: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>	<000901c69606$1cd83760$5688a620$@com>
	<449AB23C.1070203@redhat.com>
Message-ID: <000001c69612$f5a189a0$e0e49ce0$@com>

G'day Bob,

Thanks for both your emails. I have downloaded the cookbook you're working
on. You didn't comment on the little summary of steps that I'd talked about
in my email. Would really appreciate any feedback on those steps. In the
meantime I'll skim over your cookbook :D

Thanks all
\R   

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
Sent: Friday, June 23, 2006 1:08 AM
To: linux clustering
Subject: Re: [Linux-cluster] partitioning of filesystems in cluster nodes

Hi RR,

I agree there's a lot of information to pour through, and it can seem 
overwhelming.
Earlier this year, I wrote an "NFS/GFS Cookbook" that I'm still working 
on, and
because it gives lots of example commands and walks through setting up a 
cluster
from start to finish, it might be easier to go from that rather than the 
manuals.  It
doesn't deal specifically with the RAID or iSCSI issues, but it does 
contain the basics
about setting up a cluster and it's shorter than the manuals.  It can be 
found here:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf

It's not an officially sanctioned Red Hat document.  Yet.  As I said, 
it's got some
problems, and I'm working to correct those, but it's better than nothing. 
I hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


From eloy.acosta at fon.com  Thu Jun 22 15:46:38 2006
From: eloy.acosta at fon.com (Eloy Acosta Toscano)
Date: Thu, 22 Jun 2006 17:46:38 +0200
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
References: <012e01c695e0$3c505b70$790da8c0@eacosta>
	<449AB481.2030802@redhat.com>
Message-ID: <00bc01c69613$0e0bfed0$6b0da8c0@eacosta>

The problem is that when I power off one node ( I mean disable its power 
supply totally ) to test it, the fencing does not work, on the log I get:

fenced: fencing node "nodename"
fenced: fence "nodename" failed

This will show forerer and the cluster will hang, until the other node join 
the cluster again.
The problem is that, the fenced daemon waits (forever) until the iLO of the 
dead node back to life.

So my cluster is node reboot/reset fault tolerant, but not totally power off 
tolerant.
Do you know what I mean?

Can anybody helps me to workaround the problem?

thak U very much


----- Original Message ----- 
From: "James Parsons" <jparsons at redhat.com>
To: "linux clustering" <linux-cluster at redhat.com>
Sent: Thursday, June 22, 2006 5:17 PM
Subject: Re: [Linux-cluster] MySQL + RH Cluster Suite + GFS


> Eloy Acosta Toscano wrote:
>
>> Hello, I?d like to make you some questions about or new enterprise 
>> database production systems, which should be in production state in about 
>> 2 or 3 days. Yes, I?know this is not too much time to deploy a good and 
>> definitive HA solution, but I?d like to try making the best possible.
>>  We?ve tried a MySQL with RedHat Cluster Suite H/A solution, but we have 
>> very
>> bad results in the H/A tests.
>> We use a iSCSI LUN per node with ext3 FS and a HP iLO as fence device, 
>> and I
>> think that these poor results are related with these two components: ext3
>> formated HD device and a disfunction in the fenced daemon with the HP iLO
>> device.
>
> Tell me about your iLO woes.
> -J
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From rpeterso at redhat.com  Thu Jun 22 15:59:13 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 10:59:13 -0500
Subject: [Linux-cluster] Oracle 9i RAC and GFS 6.1
In-Reply-To: <200606221512.07554.adel@opennet.ae>
References: <200606221512.07554.adel@opennet.ae>
Message-ID: <449ABE51.5050002@redhat.com>

Adel Ben Zarrouk wrote:
> Dear All,
>
> please Can someone clarify the below questions:
>
> -What kind of lock method supported by Oracle 9i RAC (DLM or GULM) or doesn't 
> need any lock daemon?
>   
If you're going to use GFS for your data, you need a lock daemon.  I 
recommend DLM
because GULM is old-school and may not be supported in future versions 
of Cluster Suite.
> -Installation of Cluster Suite should be installed before the GFS or After?
>   
I consider GFS to be part of Cluster Suite, so installed at the same time.
> -What is the purpose of the Cluster Manager, it is just for fencing or for 
> other things  ?
>   
Cluster Manager is the part of Cluster Suite that handles cluster 
communications.
It maintains a "heartbeat" to inform all nodes in the cluster that the 
node is still running,
and it also takes care of internal messages regarding the joining and 
leaving of nodes
in the cluster.
> -If the customer have any kind of support regarding Oracle or GFS, he must get 
> back to Oracle Inc. or Redhat Inc.
>   
It depends on the error.  People can usually look at what's in the syslog
(/var/log/messages) to determine if error messages are coming from Oracle or
some part of the Cluster Suite.
> -There is any TPC-Benchmark for Oracle-Redhat GFS
>   
I don't have experience with this, so I'll let someone else answer this.
> -How to perform RHEL and Oracle DB for better performance 
>   
There are multiple books and courses dedicated to RHEL and Oracle
performance tuning.  Here's one link that speaks to GFS performance
 tuning, and I'm sure there are more:

http://www.redhat.com/archives/linux-cluster/2005-February/msg00040.html

> Thanks a lot for your help.
>
> Regards
>
>  --Adel
>   
Bob Peterson
Red Hat Cluster Suite


From rpeterso at redhat.com  Thu Jun 22 16:03:27 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 11:03:27 -0500
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
In-Reply-To: <00bc01c69613$0e0bfed0$6b0da8c0@eacosta>
References: <012e01c695e0$3c505b70$790da8c0@eacosta>	<449AB481.2030802@redhat.com>
	<00bc01c69613$0e0bfed0$6b0da8c0@eacosta>
Message-ID: <449ABF4F.4020208@redhat.com>

Eloy Acosta Toscano wrote:
> The problem is that when I power off one node ( I mean disable its 
> power supply totally ) to test it, the fencing does not work, on the 
> log I get:
>
> fenced: fencing node "nodename"
> fenced: fence "nodename" failed
>
> This will show forerer and the cluster will hang, until the other node 
> join the cluster again.
> The problem is that, the fenced daemon waits (forever) until the iLO 
> of the dead node back to life.
>
> So my cluster is node reboot/reset fault tolerant, but not totally 
> power off tolerant.
> Do you know what I mean?
>
> Can anybody helps me to workaround the problem?
>
> thak U very much
Hi Eloy,

Sounds like maybe there's something wrong in your cluster.conf file.
Perhaps you can post it plus the names of the systems in your cluster
and what kind of fencing you have, and I'll see what I can do.

Regards,

Bob Peterson
Red Hat Cluster Suite


From rpeterso at redhat.com  Thu Jun 22 16:30:09 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 11:30:09 -0500
Subject: [Linux-cluster] partitioning of filesystems in cluster nodes
In-Reply-To: <000001c69612$f5a189a0$e0e49ce0$@com>
References: <4766EEE585A6D311ADF500E018C154E302684B65@bnifex.cis.buc.com>	<000901c69606$1cd83760$5688a620$@com>	<449AB23C.1070203@redhat.com>
	<000001c69612$f5a189a0$e0e49ce0$@com>
Message-ID: <449AC591.50204@redhat.com>

RR wrote:
> G'day Bob,
>
> Thanks for both your emails. I have downloaded the cookbook you're working
> on. You didn't comment on the little summary of steps that I'd talked about
> in my email. Would really appreciate any feedback on those steps. In the
> meantime I'll skim over your cookbook :D
>
> Thanks all
> \R   
>   
Hi RR,

Yes, the steps you documented look good to me.  The ordering looks good too.
The cs-deploy-tool gui might make it easier to deploy your cluster too, 
once the
nodes can see the shared storage.

Regards,

Bob Peterson
Red Hat Cluster Suite


From doc at zwecker.de  Thu Jun 22 16:56:04 2006
From: doc at zwecker.de (Christophe Zwecker)
Date: Thu, 22 Jun 2006 18:56:04 +0200
Subject: [Linux-cluster] PVFS, GFS - What do I need for...
Message-ID: <449ACBA4.7030502@zwecker.de>

Hi,

im a newbie to clustering and have been browsing docs, irc and subcribed 
to this list 2 days ago. I think I know now that GFS is not what we 
need, id like to ask which system would be right for us:

We'll run 2 Linux Servers, one is backup in case the first one fails. im 
looking into using the heartbeat project for that.
Theyll run

mysql
jboss
apache
openldap

jboss will store lots of pictures on a partition. We want to replicate 
this partition constantly to the second server, so the jboss of server2 
has access to those if it has to take over.
We wont be using a shared storage (right now for cost issues)

so is PVFS doing what we want?

I want the moment I copy a file to that special partition it to be 
copies to the other server. so both server keep a local partition in sync...

thx for clearing things up for me.

Christophe
-- 
Christophe Zwecker                     mail: doc at zwecker.de
Hamburg, Germany                        fon: +49 179 3994867
http://www.zwecker.de

"Reality is that which, when you stop believing in it, doesn't go away"


From eloy.acosta at fon.com  Thu Jun 22 16:59:58 2006
From: eloy.acosta at fon.com (Eloy Acosta Toscano)
Date: Thu, 22 Jun 2006 18:59:58 +0200
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
References: <012e01c695e0$3c505b70$790da8c0@eacosta>	<449AB481.2030802@redhat.com><00bc01c69613$0e0bfed0$6b0da8c0@eacosta>
	<449ABF4F.4020208@redhat.com>
Message-ID: <014901c6961d$4e5d6780$6b0da8c0@eacosta>

This is my cluster.conf

-------------

<?xml version="1.0"?>
<cluster config_version="24" name="FON_DB_Cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="fonessql01" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO-SQL01"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="fonessql02" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO-SQL02"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="fonessql03" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO-SQL03"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="fonessql04" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO-SQL04"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="fonessql01-ilo" 
login="Administrator" name="iLO-SQL01" passwd="sadsa"/>
                <fencedevice agent="fence_ilo" hostname="fonessql02-ilo" 
login="Administrator" name="iLO-SQL02" passwd="asdad"/>
                <fencedevice agent="fence_ilo" hostname="fonessql03-ilo" 
login="Administrator" name="iLO-SQL03" passwd="adsas"/>
                <fencedevice agent="fence_ilo" hostname="fonessql04-ilo" 
login="Administrator" name="iLO-SQL04" passwd="adasd"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="Radius-Failover-domain" 
ordered="1" restricted="1">
                                <failoverdomainnode name="fonessql01" 
priority="1"/>
                                <failoverdomainnode name="fonessql02" 
priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="eShop-Failover-Domain" 
ordered="1" restricted="1">
                                <failoverdomainnode name="fonessql02" 
priority="1"/>
                                <failoverdomainnode name="fonessql01" 
priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="WebDB-Failover-Domain" 
ordered="1" restricted="1">
                                <failoverdomainnode name="fonessql03" 
priority="1"/>
                                <failoverdomainnode name="fonessql04" 
priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="Billing-Failover-Domain" 
ordered="1" restricted="1">
                                <failoverdomainnode name="fonessql04" 
priority="1"/>
                                <failoverdomainnode name="fonessql03" 
priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <fs device="/dev/sql01" force_fsck="0" 
force_unmount="0" fsid="01" fstype="ext3" mountpoint="/mnt/sql01" 
name="disco-sql01" options="
sync" self_fence="1"/>
                        <fs device="/dev/sql02" force_fsck="0" 
force_unmount="0" fsid="02" fstype="ext3" mountpoint="/mnt/sql02" 
name="disco-sql02" options="
sync" self_fence="1"/>
                        <fs device="/dev/sql03" force_fsck="0" 
force_unmount="0" fsid="03" fstype="ext3" mountpoint="/mnt/sql03" 
name="disco-sql03" options="
sync" self_fence="1"/>
                        <fs device="/dev/sql04" force_fsck="0" 
force_unmount="0" fsid="04" fstype="ext3" mountpoint="/mnt/sql04" 
name="disco-sql04" options="
sync" self_fence="1"/>
                        <ip address="172.17.0.168" monitor_link="1"/>
                        <ip address="172.17.0.169" monitor_link="1"/>
                        <ip address="172.17.0.170" monitor_link="1"/>
                        <ip address="172.17.0.171" monitor_link="1"/>
                        <script file="/etc/init.d/mysql01" name="mysql01"/>
                        <script file="/etc/init.d/mysql02" name="mysql02"/>
                        <script file="/etc/init.d/mysql03" name="mysql03"/>
                        <script file="/etc/init.d/mysql04" name="mysql04"/>
                </resources>
                <service autostart="1" domain="Radius-Failover-domain" 
name="mysql01-Radius" recovery="restart">
                        <fs ref="disco-sql01"/>
                        <ip ref="172.17.0.168"/>
                        <script ref="mysql01"/>
                </service>
                <service autostart="1" domain="eShop-Failover-Domain" 
name="mysql02-eShop-varios" recovery="restart">
                        <fs ref="disco-sql02"/>
                        <ip ref="172.17.0.169"/>
                        <script ref="mysql02"/>
                </service>
                <service autostart="1" domain="WebDB-Failover-Domain" 
name="mysql03-WebDB" recovery="restart">
                        <fs ref="disco-sql03"/>
                        <ip ref="172.17.0.170"/>
                        <script ref="mysql03"/>
                </service>
                <service autostart="1" domain="Billing-Failover-Domain" 
name="mysql04-Billing" recovery="restart">
                        <fs ref="disco-sql04"/>
                        <ip ref="172.17.0.171"/>
                        <script ref="mysql04"/>
                </service>
        </rm>
</cluster>
----------------

and my /etc/hosts

.........................


127.0.0.1 localhost

# Red de Backend
172.17.0.232 fonessql01.es.fon.srv      fonessql01
172.17.0.233 fonessql02.es.fon.srv      fonessql02
172.17.0.234 fonessql03.es.fon.srv      fonessql03
172.17.0.235 fonessql04.es.fon.srv      fonessql04
172.17.0.168 vipsql01.es.fon.srv        vipsql01
172.17.0.169 vipsql02.es.fon.srv        vipsql02
172.17.0.170 vipsql03.es.fon.srv        vipsql03
172.17.0.171 vipsql04.es.fon.srv        vipsql04

# Red de Almacenamiento VLAN 10 (stg)
172.17.4.232 fonessql01.es.fon.stg      fonessql01-stg
172.17.4.233 fonessql02.es.fon.stg      fonessql02-stg
172.17.4.234 fonessql03.es.fon.stg      fonessql03-stg
172.17.4.235 fonessql04.es.fon.stg      fonessql04-stg
172.17.4.244 fonesnas01.es.fon.stg      fonesnas01-stg
172.17.4.245 fonesnas02.es.fon.stg      fonesnas02-stg


----- Original Message ----- 
From: "Robert Peterson" <rpeterso at redhat.com>
To: "linux clustering" <linux-cluster at redhat.com>
Sent: Thursday, June 22, 2006 6:03 PM
Subject: Re: [Linux-cluster] MySQL + RH Cluster Suite + GFS


> Eloy Acosta Toscano wrote:
>> The problem is that when I power off one node ( I mean disable its power 
>> supply totally ) to test it, the fencing does not work, on the log I get:
>>
>> fenced: fencing node "nodename"
>> fenced: fence "nodename" failed
>>
>> This will show forerer and the cluster will hang, until the other node 
>> join the cluster again.
>> The problem is that, the fenced daemon waits (forever) until the iLO of 
>> the dead node back to life.
>>
>> So my cluster is node reboot/reset fault tolerant, but not totally power 
>> off tolerant.
>> Do you know what I mean?
>>
>> Can anybody helps me to workaround the problem?
>>
>> thak U very much
> Hi Eloy,
>
> Sounds like maybe there's something wrong in your cluster.conf file.
> Perhaps you can post it plus the names of the systems in your cluster
> and what kind of fencing you have, and I'll see what I can do.
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From rpeterso at redhat.com  Thu Jun 22 18:29:09 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 22 Jun 2006 13:29:09 -0500
Subject: [Linux-cluster] MySQL + RH Cluster Suite + GFS
In-Reply-To: <014901c6961d$4e5d6780$6b0da8c0@eacosta>
References: <012e01c695e0$3c505b70$790da8c0@eacosta>	<449AB481.2030802@redhat.com><00bc01c69613$0e0bfed0$6b0da8c0@eacosta>	<449ABF4F.4020208@redhat.com>
	<014901c6961d$4e5d6780$6b0da8c0@eacosta>
Message-ID: <449AE175.6020503@redhat.com>

Eloy Acosta Toscano wrote:
> This is my cluster.conf
Hi Eloy,

Your cluster.conf looked good.  I spoke to the fencing guy, Jim, and he 
says the
next thing to do is see if it fences via the command line.   For example:

/sbin/fence_ilo -a fonessql01-ilo -l Administrator -p sadsa -o off -v


He suspects that it is just taking atoo long for the fence command to 
return. 
If it is working, but just taking too much time, perhaps you'd be 
willing to try
out a highly experimental new version of the agent that cuts the time 
down to
one third?  Also, please let us know the RIBCL version you are using.

Regards,

Bob Peterson
Red Hat Cluster Suite


From wcheng at redhat.com  Thu Jun 22 19:06:37 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 22 Jun 2006 15:06:37 -0400
Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel
	panics	on stress.
In-Reply-To: <4497C5B1.3090501@seanodes.com>
References: <44968F28.8070505@seanodes.com> <4496ADDC.8010206@redhat.com>
	<4497C5B1.3090501@seanodes.com>
Message-ID: <449AEA3D.1070507@redhat.com>

Mathieu Avila wrote:

>
> Thank you Wendy,
>
> Do you have any idea on the other problem (crash of GNBD+GFS under 
> heavy stress) ? Are there any known problems with the versions I use ? 
> Do you need additional information to deal with this issue ?

Hi, Mathieu,

I am not aware of any relevant GNBD+GFS panic issue at this moment. Our 
current workloads are kind of heavy - will look into the issue when 
there are free cycles available.

-- Wendy


From swhiteho at redhat.com  Thu Jun 22 10:25:37 2006
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Thu, 22 Jun 2006 11:25:37 +0100
Subject: [Linux-cluster] Re: [-mm patch] fs/gfs2/: make code static
In-Reply-To: <20060622100324.GZ9111@stusta.de>
References: <20060621034857.35cfe36f.akpm@osdl.org>
	<20060622100324.GZ9111@stusta.de>
Message-ID: <1150971937.3856.1501.camel@quoit.chygwyn.com>

Hi,

On Thu, 2006-06-22 at 12:03 +0200, Adrian Bunk wrote:
> On Wed, Jun 21, 2006 at 03:48:57AM -0700, Andrew Morton wrote:
> >...
> > Changes since 2.6.17-rc6-mm2:
> >...
> >  git-gfs2.patch
> >...
> >  git trees
> >...
> 
> This patch makes the following needlessly global code static:
> - eaops.c: struct gfs2_security_eaops
> - rgrp.c: gfs2_free_uninit_di()
> 
> Signed-off-by: Adrian Bunk <bunk at stusta.de>
> 
Thanks for the patch. I've added it to the git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git

Steve.


From akornev at gmail.com  Fri Jun 23 00:19:52 2006
From: akornev at gmail.com (Anton Kornev)
Date: Fri, 23 Jun 2006 03:19:52 +0300
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <20060621175430.GB4706@redhat.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
	<20060615190959.GB1913@redhat.com>
	<433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com>
	<20060621175430.GB4706@redhat.com>
Message-ID: <433fd2630606221719q649fc46bv97e94a929c3d5427@mail.gmail.com>

David,

Thanks a lot for your comments.
Actually it sounds rather strange for me.

I tried to grep the /var/log/messages log with "gnbd" word and found that
there are also
other messages like this even on the working host with no GFS problems.

bash-3.00# grep gnbd /var/log/messages
Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 9
Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4)
Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 15
Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4)
Jun 19 09:51:59 node1 kernel: gnbd (pid 26259: find) got signal 9
Jun 19 09:51:59 node1 kernel: gnbd0: Send control failed (result -4)
Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 9
Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4)
Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 15
Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4)
Jun 19 12:51:12 node1 kernel: gnbd (pid 19463: vi) got signal 1
Jun 19 12:51:12 node1 kernel: gnbd0: Send control failed (result -4)
Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 9
Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4)
Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 15
Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4)


I tried to check gnbd-kernel sources (latest available SRPM - not CVS
version)
and I found that the first message (gnbd ... got signal) is produced by the
sock_xmit() function with the such a piece of code:

                if (signal_pending(current)) {
                        siginfo_t info;
                        spin_lock_irqsave(&current->sighand->siglock,
flags);
                        printk(KERN_WARNING "gnbd (pid %d: %s) got signal
%d\n",
                                current->pid, current->comm,
                                dequeue_signal(current, &current->blocked,
&info));
                        spin_unlock_irqrestore(&current->sighand->siglock,
flags);
                        result = -EINTR;
                        break;
                }

And the second message is generated inside the gnbd_send_req() by the code

        result = sock_xmit(sock, 1, &request, sizeof(request),
                        (gnbd_cmd(req) == GNBD_CMD_WRITE)? MSG_MORE: 0);
        if (result < 0) {
                printk(KERN_ERR "%s: Send control failed (result %d)\n",
                                dev->disk->disk_name, result);
                goto error_out;
        }


So at the first glance it seems like a normal messages from gnbd - if there
is signal received during sock_xmit - don't send anyting and return -EINTR.

I am not sure that it might be a problem but I take a look on the
sock_xmit() code and
there are at least two things that seems strange for me.

1. There is an inconsistancy between comment and code:

        /* Allow interception of SIGKILL only
         * Don't allow other signals to interrupt the transmission */
        spin_lock_irqsave(&current->sighand->siglock, flags);
        oldset = current->blocked;
        sigfillset(&current->blocked);
        sigdelsetmask(&current->blocked, sigmask(SIGKILL) | sigmask(SIGTERM)
|
                      sigmask(SIGHUP));
        recalc_sigpending();
        spin_unlock_irqrestore(&current->sighand->siglock, flags);

So, inside the comment there is a suggestion that only SIGKILL can interrupt
the transmission but the real mask is for KILL/TERM/HUP signals (btw: in my
case it is a SIGTERM who locks everything).

2. There are two blocks of code following each other

                if (send)
                        result = sock_sendmsg(sock, &msg, size);
                else
                        result = sock_recvmsg(sock, &msg, size, 0);


                if (signal_pending(current)) {
                        siginfo_t info;
                        spin_lock_irqsave(&current->sighand->siglock,
flags);
                        printk(KERN_WARNING "gnbd (pid %d: %s) got signal
%d\n",
                                current->pid, current->comm,
                                dequeue_signal(current, &current->blocked,
&info));
                        spin_unlock_irqrestore(&current->sighand->siglock,
flags);
                        result = -EINTR;
                        break;
                }

Why do we need to return -EINTR as a result if we have already done the real
sock_sendmsg() / sock_recvmsg()?  What if the real transmission was okay and
real result has no mistake?

I am not a kernel developer and I haven't spent a lot of time on the issue,
so it might make no sense at all.

Please, let me know what do you think about it?

On 6/21/06, David Teigland <teigland at redhat.com> wrote:
>
> On Fri, Jun 16, 2006 at 06:37:14PM +0300, Anton Kornev wrote:
> > gnbd (pid 5836: alogc.pl) got signal 9
> > gnbd0: Send control failed (result -4)
> > gnbd (pid 5836: alogc.pl) got signal 15
> > gnbd0: Send control failed (result -4)
>
> This and the fact that a number of processes appear to be blocked in the
> i/o path seem to point at gnbd as the hold-up.
>
> Dave
>
>
> >   51 D wait_on_buffer                   pdflush
> > 5771 D lock_page                        lock_dlm1
> > 5776 D -                                gfs_logd
> > 5777 D -                                gfs_quotad
> > 5778 D -                                gfs_inoded
> > 5892 D -                                httpd
> > 5895 D glock_wait_internal              httpd
> > 5896 D glock_wait_internal              httpd
> > 5897 D glock_wait_internal              httpd
> > 5911 D glock_wait_internal              httpd
> > 5915 D wait_on_buffer                   httpd
> > 5930 D wait_on_buffer                   sh
>
> > pdflush       D ffffffff8014aabc     0    51      6            53    50
> > (L-TLB)
> > 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00
> >       0000000000000216 ffffffffa0042916 000001011aca60c0
> 0000000000000008
> >       000001011fdef7f0 0000000000000dfa
> > Call Trace:<ffffffffa0042916>{:dm_mod:dm_request+396}
> > <ffffffff8014aabc>{keventd_create_kthread+0}
> >       <ffffffff803053ef>{io_schedule+38}
> > <ffffffff80178c4c>{__wait_on_buffer+125}
> >       <ffffffff80178ad2>{bh_wake_function+0}
> > <ffffffff80178ad2>{bh_wake_function+0}
> >       <ffffffffa0235c5d>{:gfs:gfs_logbh_wait+49}
> > <ffffffffa024a6a6>{:gfs:disk_commit+794}
> >       <ffffffffa024a877>{:gfs:log_refund+111}
> > <ffffffffa024ad8e>{:gfs:log_flush_internal+510}
> >       <ffffffff8017d682>{sync_supers+167}
> <ffffffff8015e310>{wb_kupdate+36}
> >
> >       <ffffffff8015edb4>{pdflush+323} <ffffffff8015e2ec>{wb_kupdate+0}
> >       <ffffffff8015ec71>{pdflush+0} <ffffffff8014aa93>{kthread+200}
> >       <ffffffff80110e17>{child_rip+8}
> > <ffffffff8014aabc>{keventd_create_kthread+0}
> >       <ffffffff8014a9cb>{kthread+0} <ffffffff80110e0f>{child_rip+0}
> > lock_dlm1     D 000001000c0096e0     0  5771      6          5772  5766
> > (L-TLB)
> > 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069
> >       000001011420b030 0000000000000069 000001000c00a940
> 000000010000eb10
> >       000001011a887030 0000000000001cae
> > Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
> > <ffffffff803053ef>{io_schedule+38}
> >       <ffffffff80159215>{__lock_page+191}
> > <ffffffff80158cfa>{page_wake_function+0}
> >       <ffffffff80158cfa>{page_wake_function+0}
> > <ffffffff80163125>{truncate_inode_pages+519}
> >       <ffffffffa0258f35>{:gfs:gfs_inval_page+63}
> > <ffffffffa02401b5>{:gfs:drop_bh+233}
> >       <ffffffffa0242138>{:gfs:gfs_glock_cb+194}
> > <ffffffffa02869dd>{:lock_dlm:dlm_async+1989}
> >       <ffffffff801333c8>{default_wake_function+0}
> > <ffffffff8014aabc>{keventd_create_kthread+0}
> >       <ffffffffa0286218>{:lock_dlm:dlm_async+0}
> > <ffffffff8014aabc>{keventd_create_kthread+0}
> >       <ffffffff8014aa93>{kthread+200} <ffffffff80110e17>{child_rip+8}
> >       <ffffffff8014aabc>{keventd_create_kthread+0}
> > <ffffffff8014a9cb>{kthread+0}
> >       <ffffffff80110e0f>{child_rip+0}
> > gfs_logd      D 0000000000000000     0  5776      1          5777  5775
> > (L-TLB)
> > 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85
> >       000001011387fe58 ffffffff80304add ffffffff803cca80
> 0000000000000246
> >       00000101143fe030 00000000000000b5
> > Call Trace:<ffffffff80304a85>{thread_return+0}
> > <ffffffff80304add>{thread_return+88}
> >       <ffffffffa023e8d3>{:gfs:lock_on_glock+112}
> > <ffffffff8030565b>{__down_write+134}
> >       <ffffffffa0249cdb>{:gfs:gfs_ail_empty+56}
> > <ffffffffa0233930>{:gfs:gfs_logd+77}
> >       <ffffffff80110e17>{child_rip+8}
> > <ffffffff801cccff>{dummy_d_instantiate+0}
> >       <ffffffffa02338e3>{:gfs:gfs_logd+0}
> <ffffffff80110e0f>{child_rip+0}
> >
> > gfs_quotad    D 0000000000000000     0  5777      1          5778  5776
> > (L-TLB)
> > 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85
> >       0000010113881eb8 ffffffff80304add 000001011ff87030
> 0000000100000074
> >       000001011430f7f0 0000000000000128
> > Call Trace:<ffffffff80304a85>{thread_return+0}
> > <ffffffff80304add>{thread_return+88}
> >       <ffffffff8030565b>{__down_write+134}
> > <ffffffffa025b55a>{:gfs:gfs_quota_sync+226}
> >       <ffffffffa0233ab1>{:gfs:gfs_quotad+127}
> > <ffffffff80110e17>{child_rip+8}
> >       <ffffffff801cccff>{dummy_d_instantiate+0}
> > <ffffffff801cccff>{dummy_d_instantiate+0}
> >       <ffffffff801cccff>{dummy_d_instantiate+0}
> > <ffffffffa0233a32>{:gfs:gfs_quotad+0}
> >       <ffffffff80110e0f>{child_rip+0}
> > gfs_inoded    D 0000000000000000     0  5778      1          5807  5777
> > (L-TLB)
> > 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0
> >       0000000000000000 ffffffff80304a85 0000010113883ec8
> 0000000180304add
> >       000001011e2937f0 00000000000000c2
> > Call Trace:<ffffffff80304a85>{thread_return+0}
> > <ffffffff8030565b>{__down_write+134}
> >       <ffffffffa026160d>{:gfs:unlinked_find+115}
> > <ffffffffa0261c6c>{:gfs:gfs_unlinked_dealloc+25}
> >       <ffffffffa0233bd5>{:gfs:gfs_inoded+66}
> > <ffffffff80110e17>{child_rip+8}
> >       <ffffffffa0233b93>{:gfs:gfs_inoded+0}
> <ffffffff80110e0f>{child_rip+0}
> >
> >
> > httpd         D ffffffff80304190     0  5892      1  5893          5826
> > (NOTLB)
> > 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001
> >       0000000000000000 0000000000000000 0000010114667980
> 0000000111b75bc0
> >       00000101143fe7f0 00000000000009ad
> > Call Trace:<ffffffff80303d6f>{__down+147}
> > <ffffffff801333c8>{default_wake_function+0}
> >       <ffffffff8015b3a2>{generic_file_write_nolock+158}
> > <ffffffff80305780>{__down_failed+53}
> >       <ffffffffa0236986>{:gfs:.text.lock.dio+95}
> > <ffffffffa0260e4c>{:gfs:gfs_trans_add_bh+205}
> >       <ffffffffa0253efc>{:gfs:do_write_buf+1138}
> > <ffffffffa0252db3>{:gfs:walk_vm+278}
> >       <ffffffffa0253a8a>{:gfs:do_write_buf+0}
> > <ffffffffa0253a8a>{:gfs:do_write_buf+0}
> >       <ffffffffa025415b>{:gfs:__gfs_write+201}
> > <ffffffff80177c60>{vfs_write+207}
> >       <ffffffff80177d48>{sys_write+69}
> <ffffffff801101c6>{system_call+126}
> >
> > httpd         D 0000010110ad7d48     0  5895   5892          5896  5893
> > (NOTLB)
> > 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075
> >       0000010117002030 0000000000000075 000001000c002940
> 0000000000000001
> >       00000101170027f0 000000000001300e
> > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> > <ffffffff80304cbd>{wait_for_completion+167}
> >       <ffffffff801333c8>{default_wake_function+0}
> > <ffffffff801333c8>{default_wake_function+0}
> >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
> >       <ffffffff80168211>{do_no_page+1003}
> > <ffffffff80167b13>{do_wp_page+948}
> >       <ffffffff8016858f>{handle_mm_fault+343}
> > <ffffffff80142a06>{get_signal_to_deliver+1118}
> >       <ffffffff801236d2>{do_page_fault+518}
> > <ffffffff80304a85>{thread_return+0}
> >       <ffffffff80304add>{thread_return+88}
> <ffffffff80110c61>{error_exit+0}
> >
> >
> > httpd         D 0000010110b5bd48     0  5896   5892          5897  5895
> > (NOTLB)
> > 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075
> >       00000101114787f0 0000000000000075 000001000c002940
> 0000000000000001
> >       0000010117002030 000000000000fb3e
> > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> > <ffffffff80304cbd>{wait_for_completion+167}
> >       <ffffffff801333c8>{default_wake_function+0}
> > <ffffffff801333c8>{default_wake_function+0}
> >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
> >       <ffffffff80168211>{do_no_page+1003}
> > <ffffffff80167b13>{do_wp_page+948}
> >       <ffffffff8016858f>{handle_mm_fault+343}
> > <ffffffff80142a06>{get_signal_to_deliver+1118}
> >       <ffffffff801236d2>{do_page_fault+518}
> > <ffffffff802a3445>{sys_accept+327}
> >       <ffffffff80182e88>{pipe_read+26} <ffffffff80110c61>{error_exit+0}
> >
> > httpd         D 0000000000000000     0  5897   5892          5911  5896
> > (NOTLB)
> > 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075
> >       0000010117002030 0000000000000075 000001000c00a940
> 000000001b16e030
> >       00000101114787f0 000000000000fbe0
> > Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
> > <ffffffff80304cbd>{wait_for_completion+167}
> >       <ffffffff801333c8>{default_wake_function+0}
> > <ffffffff801333c8>{default_wake_function+0}
> >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
> >       <ffffffff80168211>{do_no_page+1003}
> > <ffffffff80167b13>{do_wp_page+948}
> >       <ffffffff8016858f>{handle_mm_fault+343}
> > <ffffffff80142a06>{get_signal_to_deliver+1118}
> >       <ffffffff801236d2>{do_page_fault+518}
> > <ffffffff80304a85>{thread_return+0}
> >       <ffffffff80304add>{thread_return+88}
> <ffffffff80110c61>{error_exit+0}
> >
> >
> > httpd         D 00000101100c3d48     0  5911   5892          5915  5897
> > (NOTLB)
> > 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075
> >       00000101170027f0 0000000000000075 000001000c002940
> 0000000000000000
> >       000001011b16e030 000000000000187e
> > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
> > <ffffffff80304cbd>{wait_for_completion+167}
> >       <ffffffff801333c8>{default_wake_function+0}
> > <ffffffff801333c8>{default_wake_function+0}
> >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
> > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
> >       <ffffffff80168211>{do_no_page+1003}
> > <ffffffff80167b13>{do_wp_page+948}
> >       <ffffffff8016858f>{handle_mm_fault+343}
> > <ffffffff80142a06>{get_signal_to_deliver+1118}
> >       <ffffffff801236d2>{do_page_fault+518}
> > <ffffffff80304a85>{thread_return+0}
> >       <ffffffff80304add>{thread_return+88}
> <ffffffff80110c61>{error_exit+0}
> >
> >
> > httpd         D 0000000000006a36     0  5915   5892                5911
> > (NOTLB)
> > 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791
> >       0000000000000000 0000000000000000 0000030348ac8c1c
> 0000000114a217f0
> >       0000010114c997f0 000000000000076a
> > Call Trace:<ffffffffa020c791>{:dlm:lkb_swqueue+43}
> > <ffffffff803053ef>{io_schedule+38}
> >       <ffffffff80178c4c>{__wait_on_buffer+125}
> > <ffffffff80178ad2>{bh_wake_function+0}
> >       <ffffffff80178ad2>{bh_wake_function+0}
> > <ffffffffa02352c6>{:gfs:gfs_dreread+154}
> >       <ffffffffa0235332>{:gfs:gfs_dread+40}
> > <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
> >       <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
> > <ffffffffa0242461>{:gfs:inode_go_lock+38}
> >       <ffffffffa023f586>{:gfs:glock_wait_internal+563}
> > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
> >       <ffffffff80168211>{do_no_page+1003}
> > <ffffffff80167b13>{do_wp_page+948}
> >       <ffffffff8016858f>{handle_mm_fault+343}
> > <ffffffff80142a06>{get_signal_to_deliver+1118}
> >       <ffffffff801236d2>{do_page_fault+518}
> > <ffffffff80304a85>{thread_return+0}
> >       <ffffffff80304add>{thread_return+88}
> <ffffffff80110c61>{error_exit+0}
> >
> >
> > sh            D 000000000000001a     0  5930   2547
> > (NOTLB)
> > 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00
> >       0000010111293d88 0000000000000000 00000100dfc02400
> 0000000000010000
> >       00000101148557f0 0000000000002010
> > Call Trace:<ffffffff803053ef>{io_schedule+38}
> > <ffffffff80178c4c>{__wait_on_buffer+125}
> >       <ffffffff80178ad2>{bh_wake_function+0}
> > <ffffffff80178ad2>{bh_wake_function+0}
> >       <ffffffffa02352c6>{:gfs:gfs_dreread+154}
> > <ffffffffa0235332>{:gfs:gfs_dread+40}
> >       <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
> > <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
> >       <ffffffffa0242461>{:gfs:inode_go_lock+38}
> > <ffffffffa023f586>{:gfs:glock_wait_internal+563}
> >       <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
> > <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
> >       <ffffffff801ccb78>{dummy_inode_permission+0}
> > <ffffffffa0257aca>{:gfs:gfs_permission+64}
> >       <ffffffff8018d475>{dput+56} <ffffffff80183d32>{permission+51}
> >       <ffffffff801844aa>{__link_path_walk+372}
> > <ffffffff801851c2>{link_path_walk+82}
> >       <ffffffff8012370b>{do_page_fault+575}
> > <ffffffff801849b0>{__link_path_walk+1658}
> >       <ffffffff801851c2>{link_path_walk+82}
> > <ffffffff8012370b>{do_page_fault+575}
> >       <ffffffff8018540f>{path_lookup+451}
> > <ffffffff801856bb>{__user_walk+47}
> >       <ffffffff8017ff1a>{vfs_stat+24}
> <ffffffff8012370b>{do_page_fault+575}
> >
> >       <ffffffff80180264>{sys_newstat+17}
> <ffffffff80110c61>{error_exit+0}
> >       <ffffffff801101c6>{system_call+126}
>
>


-- 
Best Regards,
Anton Kornev.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060623/a47a727c/attachment.htm>

From jim at ond.vein.hu  Fri Jun 23 08:47:35 2006
From: jim at ond.vein.hu (JiM)
Date: Fri, 23 Jun 2006 10:47:35 +0200
Subject: [Linux-cluster] cluster slows down when a cluster member fails
Message-ID: <449BAAA7.4010305@ond.vein.hu>

Hi,

I'm newbie to clustering. I set up a 2 nodes cluster.
I don't claim any special.
I just want 2 linuxes to see the same GFS partition on an IBM storage.
It works but if one of the cluster members fails the other one slows
down. Mainly when I try to come at the mounted GFS partition.

What can be the problem?

thanks in advance

Attila

cluster.conf:

<?xml version="1.0"?>
<cluster config_version="1" name="ldap">
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>

       <clusternodes>
               <clusternode name="umbilical-cord1" votes="1">
                       <fence>
                               <method name="1">
                                <device ipaddress="192.168.200.1" 
name="manual"/>
                               </method>
                       </fence>
               </clusternode>
               <clusternode name="umbilical-cord2" votes="1">
                       <fence>
                               <method name="1">
                                <device ipaddress="192.168.200.2" 
name="manual"/>
                               </method>
                       </fence>
               </clusternode>
       </clusternodes>
        <cman two_node="1" expected_votes="1">
        </cman>
       <fencedevices>
        <fencedevice agent="fence_manual" name="manual"/>
       </fencedevices>
       <rm>
               <failoverdomains/>
               <resources/>
       </rm>
</cluster>


ha.cf:

logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 192.168.200.1
auto_failback on
node    cluster1
node    cluster2


From gstaltari at arnet.net.ar  Fri Jun 23 15:16:22 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Fri, 23 Jun 2006 12:16:22 -0300
Subject: [Linux-cluster] files with unknown state - locking problem?
Message-ID: <449C05C6.9070406@arnet.net.ar>

Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the CVS STABLE 
branch of the cluster software. Sometimes, some processes (courier imap) 
hangs in D state. When I execute "ls -la" in the "tmp" directory (the 
directory is always the same, the same mailbox) of the mailbox that it's 
triyng to access the process, the answer is really slow and this is the 
output:

?---------  ? ?        ?           ?            ? 
1151074448.M345358P6861_courierlock.qmail-be-04
?---------  ? ?        ?           ?            ? 
1151074497.M326691P7647_courierlock.qmail-be-04
?---------  ? ?        ?           ?            ? 
1151074534.M524707P2198_courierlock.qmail-be-05
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:07 
1151074538.M785749P13408_courierlock.qmail-be-03
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:09 
1151074588.M917441P3132_courierlock.qmail-be-05
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:09 
1151074593.M62901P3189_courierlock.qmail-be-05
?---------  ? ?        ?           ?            ? 
1151074649.M845223P5214_courierlock.qmail-be-02
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
1151074656.M448306P28724_courierlock.qmail-be-06
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
1151074657.M188653P5302_courierlock.qmail-be-02
?---------  ? ?        ?           ?            ? 
1151074679.M821433P4979_courierlock.qmail-be-05
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
1151074690.M360083P5741_courierlock.qmail-be-02
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:07 
1151074701.M709923P29422_courierlock.qmail-be-06
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
1151074716.M544858P6016_courierlock.qmail-be-02
-rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
1151074731.M21587P6179_courierlock.qmail-be-02
?---------  ? ?        ?           ?            ? 
1151074804.M241436P7410_courierlock.qmail-be-02
?---------  ? ?        ?           ?            ? 
1151074831.M678238P17302_courierlock.qmail-be-03
?---------  ? ?        ?           ?            ? 
1151074917.M42708P8494_courierlock.qmail-be-05
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
1151074918.M541477P14716_courierlock.qmail-be-04
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
1151074946.M520653P15248_courierlock.qmail-be-04
?---------  ? ?        ?           ?            ? 
1151075037.M234721P11020_courierlock.qmail-be-02
?---------  ? ?        ?           ?            ? 
1151075065.M951224P8598_courierlock.qmail-be-01
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
1151075082.M788480P11712_courierlock.qmail-be-02
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
1151075186.M911867P18565_courierlock.qmail-be-04
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
1151075210.M366861P13891_courierlock.qmail-be-02
-rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
1151075217.M850817P13366_courierlock.qmail-be-05
?---------  ? ?        ?           ?            ? 
1151075252.M599978P32483_imapuid_4.qmail-be-05

It seems like a lock problem, but not sure. Is there any other tool that 
I can use to debug this?
Thanks
German


From lhh at redhat.com  Fri Jun 23 15:41:05 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 23 Jun 2006 11:41:05 -0400
Subject: [Linux-cluster] Cluster Failover Scripts...
In-Reply-To: <15509160.1150950238347.JavaMail.root@brimley.host.advance.net>
References: <15509160.1150950238347.JavaMail.root@brimley.host.advance.net>
Message-ID: <1151077265.20766.476.camel@ayanami.boston.redhat.com>

On Thu, 2006-06-22 at 00:23 -0400, Vito Laurenza wrote:
> Hello all,
>  
> I've written a script to notify via email on Cluster Suite events
> (failovers, etc) and have added it to my service in cluster.conf, but
> I've noticed that manual failovers are not processing properly.  My
> failover script runs, serviceB is relocated, but serviceA only stops
> on the source node and does not start on the destination node.

That's odd; what version of rgmanager is it?
 
> Is it ok to list more than one script per service in cluster.conf?  Am
> I going about this the wrong way?

Perfectly okay to list as many scripts as you want.  Out of curiosity,
does service B depend on service A?

>  <rm>
>    <failoverdomains>
>         <failoverdomain name="scalix_cluster_001" ordered="1"
> restricted="1">
>                 <failoverdomainnode name="serverA" priority="1"/>
>                 <failoverdomainnode name="serverB" priority="2"/>
>         </failoverdomain>
>    </failoverdomains>
>    <resources>
>         <ip address="<myip>" monitor_link="1"/>
>    </resources>
>    <service autostart="1" name="myservice">
>         <ip ref="<myip>"/>
>         <fs device="/dev/device" fstype="ext3" mountpoint="/somewhere"
> force_unmount="1" name="mysharedstorage"/>
>         <script file="/usr/local/bin/failover-notify"
> name="notify_script"/>
>         <scritp file="/etc/init.d/serviceA" name="serviceA_script"/>
>         <script file="/etc/init.d/serviceB" name="serviceB_script"/>
>    </service>
>  </rm>

That looks fine.

-- Lon


From lhh at redhat.com  Fri Jun 23 15:47:10 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 23 Jun 2006 11:47:10 -0400
Subject: [Linux-cluster] Using RHCS with drbd
In-Reply-To: <449A78E5.9000004@gmail.com>
References: <449A78E5.9000004@gmail.com>
Message-ID: <1151077630.20766.482.camel@ayanami.boston.redhat.com>

On Thu, 2006-06-22 at 13:03 +0200, carlopmart wrote:
> Hi all,
> 
>   I need to deploy different drbd devices using RHCS (without GFS and a 
> shared storage). Is it possible to mount drbd devices before rgmanager 
> starts up configrued services? Do I use /etc/ha.d/resources.d/drbddisk 
> provided by drbd package (heatbeat use this script to accomplish this)?
> 
> Many thans.
> 

I haven't tried this; I assume /etc/ha.d/resources.d/drbddisk is a
script, right?  If so...

  <service name="foo" ...>
    <script name="drbd" file="/etc/ha.d/resources.d/drbddisk">
      <script name="httpd" file="/etc/init.d/httpd"/>
    </script>
  </service>

This is basically a parent/child dependency: "drbd" will be started
before "httpd", and stopped after httpd has been stopped.

It might be possible to grab linux-ha's drbd resource and use it almost
out of the box with RHCS, but I haven't tried this either.  Either way,
rgmanager should probably have a DRBD resource agent at some point.

-- Lon


From rpeterso at redhat.com  Fri Jun 23 20:57:55 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Fri, 23 Jun 2006 15:57:55 -0500
Subject: [Linux-cluster] files with unknown state - locking problem?
In-Reply-To: <449C05C6.9070406@arnet.net.ar>
References: <449C05C6.9070406@arnet.net.ar>
Message-ID: <449C55D3.6000901@redhat.com>

German Staltari wrote:
> Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the CVS 
> STABLE branch of the cluster software. Sometimes, some processes 
> (courier imap) hangs in D state. When I execute "ls -la" in the "tmp" 
> directory (the directory is always the same, the same mailbox) of the 
> mailbox that it's triyng to access the process, the answer is really 
> slow and this is the output:
>
> ?---------  ? ?        ?           ?            ? 
> 1151074448.M345358P6861_courierlock.qmail-be-04
> ?---------  ? ?        ?           ?            ? 
> 1151074497.M326691P7647_courierlock.qmail-be-04
> ?---------  ? ?        ?           ?            ? 
> 1151074534.M524707P2198_courierlock.qmail-be-05
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:07 
> 1151074538.M785749P13408_courierlock.qmail-be-03
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:09 
> 1151074588.M917441P3132_courierlock.qmail-be-05
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:09 
> 1151074593.M62901P3189_courierlock.qmail-be-05
> ?---------  ? ?        ?           ?            ? 
> 1151074649.M845223P5214_courierlock.qmail-be-02
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
> 1151074656.M448306P28724_courierlock.qmail-be-06
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
> 1151074657.M188653P5302_courierlock.qmail-be-02
> ?---------  ? ?        ?           ?            ? 
> 1151074679.M821433P4979_courierlock.qmail-be-05
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
> 1151074690.M360083P5741_courierlock.qmail-be-02
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:07 
> 1151074701.M709923P29422_courierlock.qmail-be-06
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
> 1151074716.M544858P6016_courierlock.qmail-be-02
> -rw-r--r--  1 mailuser mailuser   16 Jun 23 12:07 
> 1151074731.M21587P6179_courierlock.qmail-be-02
> ?---------  ? ?        ?           ?            ? 
> 1151074804.M241436P7410_courierlock.qmail-be-02
> ?---------  ? ?        ?           ?            ? 
> 1151074831.M678238P17302_courierlock.qmail-be-03
> ?---------  ? ?        ?           ?            ? 
> 1151074917.M42708P8494_courierlock.qmail-be-05
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
> 1151074918.M541477P14716_courierlock.qmail-be-04
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
> 1151074946.M520653P15248_courierlock.qmail-be-04
> ?---------  ? ?        ?           ?            ? 
> 1151075037.M234721P11020_courierlock.qmail-be-02
> ?---------  ? ?        ?           ?            ? 
> 1151075065.M951224P8598_courierlock.qmail-be-01
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
> 1151075082.M788480P11712_courierlock.qmail-be-02
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
> 1151075186.M911867P18565_courierlock.qmail-be-04
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:08 
> 1151075210.M366861P13891_courierlock.qmail-be-02
> -rw-r--r--  1 mailuser mailuser   17 Jun 23 12:09 
> 1151075217.M850817P13366_courierlock.qmail-be-05
> ?---------  ? ?        ?           ?            ? 
> 1151075252.M599978P32483_imapuid_4.qmail-be-05
>
> It seems like a lock problem, but not sure. Is there any other tool 
> that I can use to debug this?
> Thanks
> German
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
Hi German,

I suspect you are right: The question marks in ls -l leads me to believe 
there
might be a problem somewhere regarding the locking of the files.  My 
theory is this:
ls -l calls a kernel stat function to get file statistics.  The stat 
tries to acquire an internal
lock (glock), but can't, so it displays what you see instead of valid 
values.

Perhaps courier imap is locking files, then hanging, and the process is 
somehow
hanging around with the lock intact, or else killed abnormally where the 
lock is not released.
Do you have any suggestions how we can recreate this problem in our lab?

Regards,

Bob Peterson
Red Hat Cluster Suite


From Quentin.Arce at Sun.COM  Fri Jun 23 23:20:24 2006
From: Quentin.Arce at Sun.COM (qarce)
Date: Fri, 23 Jun 2006 16:20:24 -0700
Subject: [Linux-cluster] GFS and iscsi ?
Message-ID: <449C7738.2070507@Sun.Com>

Hi,

I just sent this to the irc room ... but everyone seems to be out.


(15:53:58) *qarce:* I have a GFS ?
(15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can 
mount my GFS block device and use it.
(15:54:38) *qarce:* but I would like to to happen automatically
(15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk
(16:07:29) *qarce:* Comments anyone?

I have tried adding a line to the fstab

/dev/sdd1 /data gfs defaults 0 0

This didn't work.

I tried adding a line to /etc/rc.local to mount it but this didn't work.
iscsi-rescan
mount -t gfs /dev/sdd1 /data

If I reboot, login and just run

mount -t gfs /dev/sdd1 /data

it mounts just fine.

Comments / Ideas / Your thoughts.

Oh.  I can't change change to a different back end block device.  I have 
to use iSCSI.

Thank you,

Quentin


From gstaltari at arnet.net.ar  Fri Jun 23 22:23:05 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Fri, 23 Jun 2006 19:23:05 -0300
Subject: [Linux-cluster] files with unknown state - locking problem?
In-Reply-To: <449C55D3.6000901@redhat.com>
References: <449C05C6.9070406@arnet.net.ar> <449C55D3.6000901@redhat.com>
Message-ID: <449C69C9.8070300@arnet.net.ar>

Robert Peterson wrote:
> I suspect you are right: The question marks in ls -l leads me to 
> believe there
> might be a problem somewhere regarding the locking of the files.  My 
> theory is this:
> ls -l calls a kernel stat function to get file statistics.  The stat 
> tries to acquire an internal
> lock (glock), but can't, so it displays what you see instead of valid 
> values.
>
> Perhaps courier imap is locking files, then hanging, and the process 
> is somehow
> hanging around with the lock intact, or else killed abnormally where 
> the lock is not released.
> Do you have any suggestions how we can recreate this problem in our lab?
>
Robert, here is the scenario: we have a pool of webmail servers 
accessing mailboxes in the cluster via IMAP (courier-imap 3.0.8). One 
user may hit the mailbox a couple of times in a short period of time 
when using the webmail (webmail is stateless, so it has to reread all 
mailbox structure in each webmail action).

Maybe the problem is in courier-imap, so I was digging it's source code, 
and found all locking stuff in liblock and maildir directories 
(maildir/maildirlock.c -> maildir_lock(), liblock/mail.c -> 
ll_dotlock()). I know that we are not debugging courier-imap, but it may 
help.
I found a post of Lon 
http://www.redhat.com/archives/linux-cluster/2004-October/msg00306.html, 
that recommends using IMAP_USELOCKS and I've checked our conf and it's 
enabled.

So, maybe the best way to recreate the problem is installing 
courier-imap, and access a mailbox user from different clients at the 
same time (in a short period of time would be best).
I hope this helps,
Thanks for your help and time,
German


So, I think it could be possible


From michaelc at cs.wisc.edu  Sat Jun 24 00:13:36 2006
From: michaelc at cs.wisc.edu (Mike Christie)
Date: Fri, 23 Jun 2006 19:13:36 -0500
Subject: [Linux-cluster] GFS and iscsi ?
In-Reply-To: <449C7738.2070507@Sun.Com>
References: <449C7738.2070507@Sun.Com>
Message-ID: <449C83B0.5060000@cs.wisc.edu>

qarce wrote:
> Hi,
> 
> I just sent this to the irc room ... but everyone seems to be out.
> 
> 
> (15:53:58) *qarce:* I have a GFS ?
> (15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can
> mount my GFS block device and use it.
> (15:54:38) *qarce:* but I would like to to happen automatically
> (15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk
> (16:07:29) *qarce:* Comments anyone?
> 
> I have tried adding a line to the fstab
> 
> /dev/sdd1 /data gfs defaults 0 0
> 

See the readme
http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme

You need to use _netdev in that fstab entry.


From Quentin.Arce at Sun.COM  Sat Jun 24 00:17:26 2006
From: Quentin.Arce at Sun.COM (qarce)
Date: Fri, 23 Jun 2006 17:17:26 -0700
Subject: [Linux-cluster] GFS and iscsi ?
In-Reply-To: <449C83B0.5060000@cs.wisc.edu>
References: <449C7738.2070507@Sun.Com> <449C83B0.5060000@cs.wisc.edu>
Message-ID: <449C8496.40801@Sun.Com>

Mike Christie wrote:

>qarce wrote:
>  
>
>>Hi,
>>
>>I just sent this to the irc room ... but everyone seems to be out.
>>
>>
>>(15:53:58) *qarce:* I have a GFS ?
>>(15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can
>>mount my GFS block device and use it.
>>(15:54:38) *qarce:* but I would like to to happen automatically
>>(15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk
>>(16:07:29) *qarce:* Comments anyone?
>>
>>I have tried adding a line to the fstab
>>
>>/dev/sdd1 /data gfs defaults 0 0
>>
>>    
>>
>
>See the readme
>http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme
>
>You need to use _netdev in that fstab entry.
>  
>
THANK 
YOU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


SO, much :-)

Q

>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>  
>


From aneesh.kumar at gmail.com  Mon Jun 26 17:58:30 2006
From: aneesh.kumar at gmail.com (Aneesh Kumar)
Date: Mon, 26 Jun 2006 23:28:30 +0530
Subject: [Linux-cluster] Help with CMAN
Message-ID: <cc723f590606261058q6dd1e29cp98a5dfcff910f274@mail.gmail.com>

Hi All,

I am looking at using CMAN as the membership manager for the cluster
project i am working on. The project ( http://ci-linux.sf.net ) helps
in writing kernel cluster services much easier and also make the
service independent of transport. We TCP/IP and IP over infiniband
already done. People are working on tranport using IB verbs.

  I am right now trying to understand how to use CMAN as the
membership service. Which is the source code against i should work on.
 i was going through the code and found a cman-kernel and cman
directory. But then cman-kernel module was making some  level
communication from within the kernel.  Is there a documentation
explaining these components and how they interact ?

-aneesh


From sdake at redhat.com  Mon Jun 26 19:47:38 2006
From: sdake at redhat.com (Steven Dake)
Date: Mon, 26 Jun 2006 12:47:38 -0700
Subject: [Linux-cluster] Help with CMAN
In-Reply-To: <cc723f590606261058q6dd1e29cp98a5dfcff910f274@mail.gmail.com>
References: <cc723f590606261058q6dd1e29cp98a5dfcff910f274@mail.gmail.com>
Message-ID: <1151351258.30084.40.camel@shih.broked.org>

Aneesh,

In the latest code, the membership layer is handled entirely in
userspace.  The CMAN component is a plugin of the openais standards
based cluster framework.  openais uses a protocol called The Totem
Single Ring Ordering and Membership protocol for all communication.

It would be possible to feed membership messages and regular messages
from totem into the kernel using configfs or some other system.  I
believe some of this work has already been done.  Dave would know more
since its his area of expertise.

Regards
-steve

On Mon, 2006-06-26 at 23:28 +0530, Aneesh Kumar wrote:
> Hi All,
> 
> I am looking at using CMAN as the membership manager for the cluster
> project i am working on. The project ( http://ci-linux.sf.net ) helps
> in writing kernel cluster services much easier and also make the
> service independent of transport. We TCP/IP and IP over infiniband
> already done. People are working on tranport using IB verbs.
> 
>   I am right now trying to understand how to use CMAN as the
> membership service. Which is the source code against i should work on.
>  i was going through the code and found a cman-kernel and cman
> directory. But then cman-kernel module was making some  level
> communication from within the kernel.  Is there a documentation
> explaining these components and how they interact ?
> 
> -aneesh
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From teigland at redhat.com  Mon Jun 26 20:29:00 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 26 Jun 2006 15:29:00 -0500
Subject: [Linux-cluster] Help with CMAN
In-Reply-To: <1151351258.30084.40.camel@shih.broked.org>
References: <cc723f590606261058q6dd1e29cp98a5dfcff910f274@mail.gmail.com>
	<1151351258.30084.40.camel@shih.broked.org>
Message-ID: <20060626202859.GD1375@redhat.com>

On Mon, Jun 26, 2006 at 12:47:38PM -0700, Steven Dake wrote:
> Aneesh,
> 
> In the latest code, the membership layer is handled entirely in
> userspace.  The CMAN component is a plugin of the openais standards
> based cluster framework.  openais uses a protocol called The Totem
> Single Ring Ordering and Membership protocol for all communication.
> 
> It would be possible to feed membership messages and regular messages
> from totem into the kernel using configfs or some other system.  I
> believe some of this work has already been done.  Dave would know more
> since its his area of expertise.

GFS and DLM now both have userland components to interact with
cman/openais clustering infrastructure, see the gfs_controld and
dlm_controld daemons.

Dave


From Fabrizio.Lippolis at AurigaInformatica.it  Tue Jun 27 08:40:21 2006
From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis)
Date: Tue, 27 Jun 2006 10:40:21 +0200
Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung
	cluster
Message-ID: <44A0EEF5.2090501@aurigainformatica.it>

I have configured two machines in a cluster domain to run mysql and ldap 
services. Everything works correctly except that from time to time, 
seems randomly, the two machines hung. Recently this is what I see in 
the log of the second machine:

Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the 
cluster : Missed too many heartbeats
Jun 23 23:37:17 AICLSRV02 fenced[2004]: AICLSRV01 not a cluster member 
after 0 sec post_fail_delay
Jun 23 23:37:17 AICLSRV02 fenced[2004]: fencing node "AICLSRV01"
Jun 23 23:37:17 AICLSRV02 fence_manual: Node AICLSRV01 needs to be reset 
before recovery can procede.  Waiting for AICLSRV01 to rejoin the 
cluster or for manual acknowledgement that it has been reset (i.e. 
fence_ack_manual -n AICLSRV01)

A few seconds later the same messages appeared on the first machine:

Jun 23 23:37:36 AICLSRV01 kernel: CMAN: removing node AICLSRV02 from the 
cluster : Missed too many heartbeats
Jun 23 23:37:36 AICLSRV01 fenced[2084]: AICLSRV02 not a cluster member 
after 0 sec post_fail_delay
Jun 23 23:37:36 AICLSRV01 fenced[2084]: fencing node "AICLSRV02"
Jun 23 23:37:39 AICLSRV01 fence_manual: Node AICLSRV02 needs to be reset 
before recovery can procede.  Waiting for AICLSRV02 to rejoin the 
cluster or for manual acknowledgement that it has been reset (i.e. 
fence_ack_manual -n AICLSRV02)

The two machines have been resetted to let them work again. Anybody 
could please explain what happened to cause this problem? I would also 
need a suggestion on how to configure a fence device so that the 
services could still continue to work. As you see actually I configured 
manual fence but that's not much useful. Thank you in advance.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/


From l.dardini at comune.prato.it  Tue Jun 27 08:51:12 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 27 Jun 2006 10:51:12 +0200
Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and
	hungcluster
In-Reply-To: <44A0EEF5.2090501@aurigainformatica.it>
Message-ID: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local>

Running a two machine cluster is a bad thing (but due to budget limitation, I am doing the same bad thing). If something happens between the two machine, they fence each other. In this particular case, I think you have some sort of network problem between the two machine. You can try to "ping" each other and see, when the problem arise, the connectivity state. Maybe a "too much intelligent switch" is handling the traffic and have some sort of "traffic shaping and control".

Leandro


-----Messaggio originale-----
Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di Fabrizio Lippolis
Inviato: marted? 27 giugno 2006 10.40
A: linux-cluster at redhat.com
Oggetto: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster

I have configured two machines in a cluster domain to run mysql and ldap services. Everything works correctly except that from time to time, seems randomly, the two machines hung. Recently this is what I see in the log of the second machine:

Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the cluster : Missed too many heartbeats Jun 23 23:37:17 AICLSRV02 fenced[2004]: AICLSRV01 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:17 AICLSRV02 fenced[2004]: fencing node "AICLSRV01"
Jun 23 23:37:17 AICLSRV02 fence_manual: Node AICLSRV01 needs to be reset before recovery can procede.  Waiting for AICLSRV01 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. 
fence_ack_manual -n AICLSRV01)

A few seconds later the same messages appeared on the first machine:

Jun 23 23:37:36 AICLSRV01 kernel: CMAN: removing node AICLSRV02 from the cluster : Missed too many heartbeats Jun 23 23:37:36 AICLSRV01 fenced[2084]: AICLSRV02 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:36 AICLSRV01 fenced[2084]: fencing node "AICLSRV02"
Jun 23 23:37:39 AICLSRV01 fence_manual: Node AICLSRV02 needs to be reset before recovery can procede.  Waiting for AICLSRV02 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. 
fence_ack_manual -n AICLSRV02)

The two machines have been resetted to let them work again. Anybody could please explain what happened to cause this problem? I would also need a suggestion on how to configure a fence device so that the services could still continue to work. As you see actually I configured manual fence but that's not much useful. Thank you in advance.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From pcaulfie at redhat.com  Tue Jun 27 09:01:59 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 27 Jun 2006 10:01:59 +0100
Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung
	cluster
In-Reply-To: <44A0EEF5.2090501@aurigainformatica.it>
References: <44A0EEF5.2090501@aurigainformatica.it>
Message-ID: <44A0F407.4090403@redhat.com>

Fabrizio Lippolis wrote:
> I have configured two machines in a cluster domain to run mysql and ldap
> services. Everything works correctly except that from time to time,
> seems randomly, the two machines hung. Recently this is what I see in
> the log of the second machine:
> 
> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the
> cluster : Missed too many heartbeats


That message means that the heartbeat messages are getting lost somehow.
either through an unreliable network link or something else odd happening on
the machine to prevent the heartbeat packets reaching the network.

> 
> The two machines have been resetted to let them work again. Anybody
> could please explain what happened to cause this problem? I would also
> need a suggestion on how to configure a fence device so that the
> services could still continue to work. As you see actually I configured
> manual fence but that's not much useful. Thank you in advance.
> 


-- 

patrick


From Fabrizio.Lippolis at AurigaInformatica.it  Tue Jun 27 09:51:58 2006
From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis)
Date: Tue, 27 Jun 2006 11:51:58 +0200
Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and
	hung cluster
In-Reply-To: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local>
References: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local>
Message-ID: <44A0FFBE.4040904@aurigainformatica.it>

Leandro Dardini ha scritto:

> If something happens between the two machine, they fence each other.

I have configured manual fencing but as I wrote it's not much useful 
since, I think, requires manual handling which couldn't be possible 
immediately. Therefore I am looking for a method to let the services run 
even if such a thing happens. This is not the first time the problem 
arises, apparently without a reason, though the last time happened long 
time ago.

> You can try to "ping" each other and see, when the problem arise, the connectivity state.

Sometimes the machines are completely locked and it's not even possible 
to log in. A brute force switch off is necessary in this case. Sometimes 
looks like only the cluster service is locked and I can regularly ping 
the other machine though the cluster is not working.

> Maybe a "too much intelligent switch" is handling the traffic and have some sort of "traffic shaping and control".

There is nothing like that, the two machines are connected by a 1GB 
crossover cable, not even so long, provided by HP with the two machines.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/


From l.dardini at comune.prato.it  Tue Jun 27 10:04:36 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 27 Jun 2006 12:04:36 +0200
Subject: R: R: [Linux-cluster] "Missed too many heartbeats" messages andhung
	cluster
In-Reply-To: <44A0FFBE.4040904@aurigainformatica.it>
Message-ID: <0C5C8B118420264EBB94D7D7050150012A0034@exchange2.comune.prato.local>

 
> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di 
> Fabrizio Lippolis
> Inviato: marted? 27 giugno 2006 11.52
> A: linux clustering
> Oggetto: Re: R: [Linux-cluster] "Missed too many heartbeats" 
> messages andhung cluster
> 
> Leandro Dardini ha scritto:
> 
> > If something happens between the two machine, they fence each other.
> 
> I have configured manual fencing but as I wrote it's not much 
> useful since, I think, requires manual handling which 
> couldn't be possible immediately. Therefore I am looking for 
> a method to let the services run even if such a thing 
> happens. This is not the first time the problem arises, 
> apparently without a reason, though the last time happened 
> long time ago.
> 
> > You can try to "ping" each other and see, when the problem 
> arise, the connectivity state.
> 
> Sometimes the machines are completely locked and it's not 
> even possible to log in. A brute force switch off is 
> necessary in this case. Sometimes looks like only the cluster 
> service is locked and I can regularly ping the other machine 
> though the cluster is not working.

This is really bad. This smells like an hardware problem or buggy kernel driver. Try to stress test the machines individually without cluster support. I usually start with a memtest from a Knoppix CD and then build a kernel for CPU stress. Try to transfer huge chunk of data to test the lan.

Leandro

> 
> > Maybe a "too much intelligent switch" is handling the 
> traffic and have some sort of "traffic shaping and control".
> 
> There is nothing like that, the two machines are connected by 
> a 1GB crossover cable, not even so long, provided by HP with 
> the two machines.
> 
> -- 
> Fabrizio Lippolis                
> fabrizio.lippolis at aurigainformatica.it
> Auriga Informatica s.r.l.            Via Don Guanella 15/B - 
> 70124 Bari
> Tel.: 080/5025414 - Fax: 080/5027448 - 
> http://www.aurigainformatica.it/
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From vaden at texoma.net  Tue Jun 27 13:24:43 2006
From: vaden at texoma.net (Larry Vaden)
Date: Tue, 27 Jun 2006 08:24:43 -0500
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
Message-ID: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>

The RH documentation apparently presumes/requires the use of gui/X11.

What's the best howto if one chooses not to use gui/X11 on the servers
to be clustered?

Kind regards,

Larry Vaden
Internet Texoma, Inc.


From Fabrizio.Lippolis at AurigaInformatica.it  Tue Jun 27 13:35:35 2006
From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis)
Date: Tue, 27 Jun 2006 15:35:35 +0200
Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung
	cluster
In-Reply-To: <44A0F407.4090403@redhat.com>
References: <44A0EEF5.2090501@aurigainformatica.it>
	<44A0F407.4090403@redhat.com>
Message-ID: <44A13427.7080709@aurigainformatica.it>

Patrick Caulfield ha scritto:

>> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the
>> cluster : Missed too many heartbeats
> 
> 
> That message means that the heartbeat messages are getting lost somehow.
> either through an unreliable network link or something else odd happening on
> the machine to prevent the heartbeat packets reaching the network.

This is very strange since the two machines are connected by a gigabit 
crossover cable and no other device is in the middle. Also, no firewall 
rules are configured on any machine.

By the way, actually I am using the fence manual method but it isn't 
much helpful and I would like to switch to a method that ensures a 
reliable service. Does it mean I have to buy a device sitting in the 
middle of the machines that connects network and power cables? I am 
rather new to it so please any suggestion is welcome.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/


From Matthew.Patton.ctr at osd.mil  Tue Jun 27 13:41:07 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Tue, 27 Jun 2006 09:41:07 -0400
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
Message-ID: <D8063DF686D10247B0A49D01271285690CE91E4B@osdn06.osd.mil>

Classification: UNCLASSIFIED

I noticed that too. Redhat, please make it a policy such that RHEL4.4 onward
that command-line tools will be the first and primary means of configuring
anything not expressly GUI-related. requiring GUI tools to admin a server is
well, unprintable.

> -----Original Message-----
> The RH documentation apparently presumes/requires the use of gui/X11.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060627/bf691210/attachment.htm>

From rpeterso at redhat.com  Tue Jun 27 14:06:34 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 27 Jun 2006 09:06:34 -0500
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
In-Reply-To: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>
References: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>
Message-ID: <44A13B6A.2060503@redhat.com>

Larry Vaden wrote:
> The RH documentation apparently presumes/requires the use of gui/X11.
>
> What's the best howto if one chooses not to use gui/X11 on the servers
> to be clustered?
>
> Kind regards,
>
> Larry Vaden
> Internet Texoma, Inc.
Hi Larry,

I've mentioned this before, but I've got an "NFS/GFS Cookbook" that has 
step-by-step
instructions for setting up a cluster using both the GUI or 
command-line.  I think it's
more geared toward command-line because I didn't even include 
screen-shots of the gui.
It's located here:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf - The Unofficial 
NFS/GFS Cookbook.

I don't think I'd call it "the best howto" because I know it needs some 
work.  
(People have sent me corrections that I haven't had time to implement yet).
It's not even an official Red Hat document, but I've been trying to push 
it that
direction.  I hope this helps.

And of course, if you have corrections, please send them my way and I'll 
eventually
get time to implement them.

Regards,

Bob Peterson
Red Hat Cluster Suite


From doc at zwecker.de  Tue Jun 27 15:30:56 2006
From: doc at zwecker.de (Christophe Zwecker)
Date: Tue, 27 Jun 2006 17:30:56 +0200
Subject: [Linux-cluster] is PVFS Raid0 or Raid1 over Network
Message-ID: <44A14F30.80600@zwecker.de>

Hi,

I (newbie) am currently looking into getting somthing like Raid1 over 
Network for our 2 Node Cluster, I ran over PVFS and not sure wether its 
suited for us.
We want somthing like having a local storage on each node that syncs 
withthe storage on the other node, like saying raid1 over network.

is that what pvfs is doing ?

thx for clearing this up for me.

Christophe
-- 
Christophe Zwecker                     mail: doc at zwecker.de
Hamburg, Germany                        fon: +49 179 3994867
http://www.zwecker.de

"Reality is that which, when you stop believing in it, doesn't go away"


From vaden at texoma.net  Tue Jun 27 16:49:21 2006
From: vaden at texoma.net (Larry Vaden)
Date: Tue, 27 Jun 2006 11:49:21 -0500
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
In-Reply-To: <44A13B6A.2060503@redhat.com>
References: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>
	<44A13B6A.2060503@redhat.com>
Message-ID: <c41c77a90606270949uf8dd362l251f3ab62de75742@mail.gmail.com>

On 6/27/06, Robert Peterson <rpeterso at redhat.com> wrote:
> Larry Vaden wrote:
> > The RH documentation apparently presumes/requires the use of gui/X11.
> >
> > What's the best howto if one chooses not to use gui/X11 on the servers
> > to be clustered?
> >
> > Kind regards,
> >
> > Larry Vaden
> > Internet Texoma, Inc.
> Hi Larry,
>
> I've mentioned this before, but I've got an "NFS/GFS Cookbook" that has
> step-by-step
> instructions for setting up a cluster using both the GUI or
> command-line.  I think it's
> more geared toward command-line because I didn't even include
> screen-shots of the gui.
> It's located here:
>
> http://sources.redhat.com/cluster/doc/nfscookbook.pdf - The Unofficial
> NFS/GFS Cookbook.
>
> I don't think I'd call it "the best howto" because I know it needs some
> work.
> (People have sent me corrections that I haven't had time to implement yet).
> It's not even an official Red Hat document, but I've been trying to push
> it that
> direction.  I hope this helps.
>
> And of course, if you have corrections, please send them my way and I'll
> eventually
> get time to implement them.
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite

Hi Bob,

I think I'll break list netiquette and thank you for your work and for
the clue(s).

Has GULM been more or less officially deprecated in favor of DLM?

Any other gotchas for late adopters?

rgds/ldv


From rpeterso at redhat.com  Tue Jun 27 17:57:44 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 27 Jun 2006 12:57:44 -0500
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
In-Reply-To: <c41c77a90606270949uf8dd362l251f3ab62de75742@mail.gmail.com>
References: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>	<44A13B6A.2060503@redhat.com>
	<c41c77a90606270949uf8dd362l251f3ab62de75742@mail.gmail.com>
Message-ID: <44A17198.2090807@redhat.com>

Larry Vaden wrote:
> Hi Bob,
>
> I think I'll break list netiquette and thank you for your work and for
> the clue(s).
>
> Has GULM been more or less officially deprecated in favor of DLM?
>
> Any other gotchas for late adopters?
>
> rgds/ldv
Hi Larry,

The plan, as I understand it, is to support GULM locking for RHEL4, but
starting with RHEL5, GULM will no longer be supported in favor of DLM.
That may alarm some people because Oracle is only certified to work with
GULM today, but needless to say, we'll be going through re-certification
with Oracle for RHEL5, so we will have alternatives.  It's pretty simple
to switch from GULM to DLM and back anyway.

As for gotchas:  Well, I've been working on a Cluster Suite FAQ
(Frequently Asked Questions) that I hope to make public soon, maybe
as early as this week.  (I'll post something here when I do).
I'm just waiting for feedback from some of the developers before I post it.

Regards,

Bob Peterson
Red Hat Cluster Suite


From kanderso at redhat.com  Tue Jun 27 18:30:37 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Tue, 27 Jun 2006 13:30:37 -0500
Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui
In-Reply-To: <44A17198.2090807@redhat.com>
References: <c41c77a90606270624p39382bdeo3e844e21fd6e0d5@mail.gmail.com>
	<44A13B6A.2060503@redhat.com>
	<c41c77a90606270949uf8dd362l251f3ab62de75742@mail.gmail.com>
	<44A17198.2090807@redhat.com>
Message-ID: <1151433037.2784.9.camel@dhcp80-204.msp.redhat.com>

On Tue, 2006-06-27 at 12:57 -0500, Robert Peterson wrote:
> Larry Vaden wrote:

> The plan, as I understand it, is to support GULM locking for RHEL4, but
> starting with RHEL5, GULM will no longer be supported in favor of DLM.
> That may alarm some people because Oracle is only certified to work with
> GULM today, but needless to say, we'll be going through re-certification
> with Oracle for RHEL5, so we will have alternatives.  It's pretty simple
> to switch from GULM to DLM and back anyway.
> 
Also, the new upstream DLM has the ability to be configured in a
client/server model arrangement, where all lock requests go to specific
DLM nodes rather than being resolved locally.  You can tune the usage to
your configuration and still have dedicated lock manager servers if you
choose.  This capability makes GuLM redundant with the DLM functionality
and doesn't make sense to continue to port GuLM forward and support.  

I think Dave Teigland has documented how to do it in either previous
mailings or on the website, will let him comment with the details.

Thanks
Kevin


From Stefano.Schiavi at aem.torino.it  Mon Jun 26 13:29:38 2006
From: Stefano.Schiavi at aem.torino.it (Stefano Schiavi)
Date: Mon, 26 Jun 2006 15:29:38 +0200
Subject: [Linux-cluster] GFS-6.0.2.20-2 doesn't accept rebooted nodes
Message-ID: <s49ffd6f.096@gemini.aem.torino.it>

Hi gurus.

We have a three nodes Itanium 64 with GFS in conjuction with OCFS for a
Oracle RAC

We have find many phisical problems in our switch , and made a
sobstitution of the switches.
Here is the problem :
the first node of the cluster doesn't re-login to the gfs:
here is the situation:made from the master :

[root at sapcl02 spool]# gulm_tool nodelist sapcl02:core
 Name: sapcl03.aem.torino.it
  ip    = 100.2.254.210
  state = Logged in
  mode = Slave
  missed beats = 0
  last beat = 1151328027843676
  delay avg = 10000443
  max delay = 13047588

 Name: sapcl01.aem.torino.it
  ip    = 100.2.254.208
  state = Expired
  mode = Slave
  missed beats = 0
  last beat = 0
  delay avg = 0
  max delay = 0

 Name: sapcl02.aem.torino.it
  ip    = 100.2.254.209
  state = Logged in
  mode = Master
  missed beats = 0
  last beat = 1151328021593557
  delay avg = 10000849
  max delay = 113821588141

as you can see ...sapcl01 is in state expired.
In sapcl01 the startint of lock_gulmd hung ....
>From the /var/log/message of the master i see ....infinitely repetuted
....


Jun 26 15:23:32 sapcl02 lock_gulmd_core[22601]: Gonna exec fence_node
sapcl01.aem.torino.it 
Jun 26 15:23:32 sapcl02 fence_node[22601]: Cannot locate the cluster
node, sapcl01.aem.torino.it 
Jun 26 15:23:32 sapcl02 fence_node[22601]: All fencing methods FAILED!

Jun 26 15:23:32 sapcl02 fence_node[22601]: Fence of
"sapcl01.aem.torino.it" was unsuccessful. 
Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Fence failed. [22601]
Exit code:1 Running it again. 
Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Forked [22604]
fence_node sapcl01.aem.torino.it with a 5 pause. 

also if i power down the sapcl01 node , the master try and try to fence
the slave node.
Also , in the master and the slave , i try to manually fence for
eliminate the expiration . But no results.
It seems that the only way to reallinate the cluster is to GLOBALLY
power down the entire nodes , and restart.

here is the configuration files:
########### fence.ccs ########################################
fence_devices {
nps {
agent = "fence_wti"
ipaddr = "100.2.254.254"
login = "nps"
passwd = "password"
}
}
[root at sapcl01 gfs]# more nodes.ccs 
#### nodes.ccs #######################################
nodes {
sapcl01 {
ip_interfaces {
eth1 = "192.168.2.208"
}
fence {
power {
nps {
port = 1
}
}
}
}
sapcl02 {
ip_interfaces {
eth1 = "192.168.2.209"
}
fence {
power {
nps {
port = 2
}
}
}
}
sapcl03 {
ip_interfaces {
eth1 = "192.168.2.210"
}
fence {
power {
nps {
port = 3
}
}
[root at sapcl01 gfs]# more cluster.ccs 
#### cluster.ccs #####################################
cluster {
name = "gfsrac"
lock_gulm {
servers = [ "sapcl01","sapcl02","sapcl03" ]
}
}


PS the cluster is was fully operational from 7 months ago. the change
of the switch is the problem

Best regards
Stefano


From troy.stepan at unisys.com  Mon Jun 26 14:38:43 2006
From: troy.stepan at unisys.com (Stepan, Troy)
Date: Mon, 26 Jun 2006 10:38:43 -0400
Subject: [Linux-cluster] Equivalent to RHCS?
Message-ID: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com>

I hate to bother you guys with dumb questions, but I'm confused--

 
Looking at the components, it looks like this project is the core of Red
Hat Cluster Suite.  I take it RHCS itself is not open, but its "root"
components are?  How does RHCS compare to these open projects?

 
Thanks in advance.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060626/175b00e0/attachment.htm>

From pcaulfie at redhat.com  Wed Jun 28 07:08:45 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 28 Jun 2006 08:08:45 +0100
Subject: [Linux-cluster] Equivalent to RHCS?
In-Reply-To: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com>
References: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com>
Message-ID: <44A22AFD.7090506@redhat.com>

Stepan, Troy wrote:
> I hate to bother you guys with dumb questions, but I?m confused--
> 
>  
> 
> Looking at the components, it looks like this project is the core of Red
> Hat Cluster Suite.  I take it RHCS itself is not open,

Wrong, all the source code for RHCS is open.

http://sources.redhat.com/cluster/


-- 

patrick


From jstoner at opsource.net  Wed Jun 28 14:29:07 2006
From: jstoner at opsource.net (Jeff Stoner)
Date: Wed, 28 Jun 2006 15:29:07 +0100
Subject: [Linux-cluster] Equivalent to RHCS?
Message-ID: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net>

It's all Open Source. Redhat is only obligated to make the source code
available (whether that's through CVS, tarballs or SRPMs.) Purchasing a
license entitles you to get support from Redhat (via phone, web or
email) and allows you to download binary RPMs and get compiled updates
via up2date from their servers.
 
Without a license, you have to build the software yourself or get RPMs
from some place like rpmfind.net, and support is pretty much limited to
mailing lists and message forums (where you may not get an answer.)
 

--Jeff
SME - UNIX
OpSource Inc.

PGP Key ID 0x6CB364CA 

 
	I hate to bother you guys with dumb questions, but I'm
confused--

	 
	Looking at the components, it looks like this project is the
core of Red Hat Cluster Suite.  I take it RHCS itself is not open, but
its "root" components are?  How does RHCS compare to these open
projects?

	 
	Thanks in advance.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060628/7d60222d/attachment.htm>

From riaan at obsidian.co.za  Wed Jun 28 14:41:03 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Wed, 28 Jun 2006 16:41:03 +0200 (SAST)
Subject: [Linux-cluster] Equivalent to RHCS?
In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net>
Message-ID: <Pine.LNX.4.33.0606281632200.26641-100000@isis.obsidian.co.za>

Jeff you are mostly correct, but I want to make a slight (but IMHO
important correction). 

What you pay for is not a licence or right to use but a "subscription",
which consists of access to binaries (including updates and upgrades for
the duration of your subscription), support, certification and open source
assurance.

or s/licence/subscription/ and I second your answer.

Riaan

On Wed, 28 Jun 2006, Jeff Stoner wrote:

> It's all Open Source. Redhat is only obligated to make the source code
> available (whether that's through CVS, tarballs or SRPMs.) Purchasing a
> license entitles you to get support from Redhat (via phone, web or
> email) and allows you to download binary RPMs and get compiled updates
> via up2date from their servers.
>  
> Without a license, you have to build the software yourself or get RPMs
> from some place like rpmfind.net, and support is pretty much limited to
> mailing lists and message forums (where you may not get an answer.)
>  
> 
> --Jeff
> SME - UNIX
> OpSource Inc.
> 
> PGP Key ID 0x6CB364CA 
> 
>  
> 
> 
> 	 
> 
> 	I hate to bother you guys with dumb questions, but I'm
> confused--
> 
> 	 
> 
> 	Looking at the components, it looks like this project is the
> core of Red Hat Cluster Suite.  I take it RHCS itself is not open, but
> its "root" components are?  How does RHCS compare to these open
> projects?
> 
> 	 
> 
> 	Thanks in advance.


From frank at opticalart.de  Wed Jun 28 14:51:42 2006
From: frank at opticalart.de (Frank Hellmann)
Date: Wed, 28 Jun 2006 16:51:42 +0200
Subject: [Linux-cluster] Equivalent to RHCS?
In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net>
References: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net>
Message-ID: <44A2977E.2060005@opticalart.de>

Hi!

If you look at projects like CentOS ( http://www.centos.org/ ) you will
find a complete free RedHat clone including the Cluster Suite as
precompiled packages. If you don't need any assistance and can live
without extra services this might be another route to go, instead of
compiling everything yourself.

Cheers,
Frank...

PS: I don't want to argue about the free/freedom/moral issues in using a
clone distribution. There's a thread at CentOS covering that to a good
extent:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=3642&forum=23


Jeff Stoner wrote:

> It's all Open Source. Redhat is only obligated to make the source code
> available (whether that's through CVS, tarballs or SRPMs.) Purchasing
> a license entitles you to get support from Redhat (via phone, web or
> email) and allows you to download binary RPMs and get compiled updates
> via up2date from their servers.
> Without a license, you have to build the software yourself or get RPMs
> from some place like rpmfind.net, and support is pretty much limited
> to mailing lists and message forums (where you may not get an answer.)
>
> --Jeff
> SME - UNIX
> OpSource Inc.
>
> PGP Key ID 0x6CB364CA
>
>
>     I hate to bother you guys with dumb questions, but I?m confused--
>
>     Looking at the components, it looks like this project is the core
>     of Red Hat Cluster Suite. I take it RHCS itself is not open, but
>     its ?root? components are? How does RHCS compare to these open
>     projects?
>
>     Thanks in advance.
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
--------------------------------------------------------------------------
Frank Hellmann          Optical Art GmbH           Waterloohain 7a
DI Supervisor           http://www.opticalart.de   22769 Hamburg
frank at opticalart.de     Tel: ++49 40 5111051       Fax: ++49 40 43169199 


From DERRICK.BEERY at iowa.gov  Wed Jun 28 16:16:30 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Wed, 28 Jun 2006 11:16:30 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us>

Make against 2.6.16 vanilla kernel is failing with the following errors:

 
 CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function
`user_eo_get':

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few
arguments to function `permission'

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function
`user_eo_set':

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few
arguments to function `permission'

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function
`user_eo_remove':

/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too few
arguments to function `permission'

make[5]: *** [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o]
Error 1

make[4]: *** [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs]
Error 2

 
Can anyone shed some light on this?

 
Thanks in advance!

 
Derrick Beery

DAS ITE State of Iowa

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060628/1496bda3/attachment.htm>

From ranjtech at gmail.com  Wed Jun 28 17:38:14 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 29 Jun 2006 03:38:14 +1000
Subject: [Linux-cluster] two node cluster not coming up
Message-ID: <000001c69ad9$a462d8c0$ed288a40$@com>

Hello all,

Just installed a 2-node cluster 15 mins ago, by the book, using bob
peterson's cookbook and the official CS guide, but only one of my nodes
comes up and joins the cluster, the other one stays in the "joining" state
with the message in "/var/log/messages" file stating

cman: cman_tool: Node is already active failed

it had tried pretty hard to join the cluster when I was first bringing it up
and even says 

"Connected to cluster infrastructure via: CMAN/SM Plugin v1.1.5"
Initial status:: Inquorate
Remote copy of cluster.conf is from quorate node.
Local Version # : 3
Remote version #: 3

Note, I don't have the "fenced" running yet. Also I had specified in my
cluster.conf file thusly

<cman two_node="1" expected_votes="1">
  </cman>

What am I doing wrong?

Thx
\R


From rohara at redhat.com  Wed Jun 28 18:44:48 2006
From: rohara at redhat.com (Ryan O'Hara)
Date: Wed, 28 Jun 2006 13:44:48 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
In-Reply-To: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us>
References: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us>
Message-ID: <44A2CE20.8040003@redhat.com>


Derrick,

I was able to build the cluster-1.02.00 code against the vanilla 2.6.16 
kernel. Did you run the configure script with the --kernel_src option to 
point to the correct kernel tree?

./configure --kernel_src=/path/to/kernel

Ryan


Beery, Derrick [DAS] wrote:
> Make against 2.6.16 vanilla kernel is failing with the following errors:
> 
>  
> 
>  CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function 
> `user_eo_get':
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few 
> arguments to function `permission'
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function 
> `user_eo_set':
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few 
> arguments to function `permission'
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function 
> `user_eo_remove':
> 
> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too few 
> arguments to function `permission'
> 
> make[5]: *** [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] 
> Error 1
> 
> make[4]: *** [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] 
> Error 2
> 
>  
> 
> Can anyone shed some light on this?
> 
> 
> Thanks in advance!
> 
>  
> 
> Derrick Beery
> 
> DAS ITE State of Iowa
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From rpeterso at redhat.com  Wed Jun 28 20:29:57 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Wed, 28 Jun 2006 15:29:57 -0500
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <000001c69ad9$a462d8c0$ed288a40$@com>
References: <000001c69ad9$a462d8c0$ed288a40$@com>
Message-ID: <44A2E6C5.5080802@redhat.com>

RR wrote:
> Hello all,
>
> Just installed a 2-node cluster 15 mins ago, by the book, using bob
> peterson's cookbook and the official CS guide, but only one of my nodes
> comes up and joins the cluster, the other one stays in the "joining" state
> with the message in "/var/log/messages" file stating
>
> cman: cman_tool: Node is already active failed
>
> it had tried pretty hard to join the cluster when I was first bringing it up
> and even says 
>
> "Connected to cluster infrastructure via: CMAN/SM Plugin v1.1.5"
> Initial status:: Inquorate
> Remote copy of cluster.conf is from quorate node.
> Local Version # : 3
> Remote version #: 3
>
> Note, I don't have the "fenced" running yet. Also I had specified in my
> cluster.conf file thusly
>
> <cman two_node="1" expected_votes="1">
>   </cman>
>
> What am I doing wrong?
>
> Thx
> \R
>   
Hi RR,

You didn't give us much to go on.  When looking into these kinds of 
problems,
it's always nice to see the /etc/cluster/cluster.conf file, and possibly the
output of clustat, and cman_tool status from the cluster nodes, in this 
case,
from the node that seemed to work.

Regards,

Bob Peterson
Red Hat Cluster Suite


From ranjtech at gmail.com  Thu Jun 29 04:55:33 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 29 Jun 2006 14:55:33 +1000
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <44A2E6C5.5080802@redhat.com>
References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com>
Message-ID: <000f01c69b38$42d95110$c88bf330$@com>

Hi Bob,

Attached is the cluster.conf file, and below is the status I get from the
command "service cman status" on the working node:

Svr00# service cman status
Protocol version: 5.0.1
Config version: 4
Cluster name: testcluster
Cluster ID: 27453
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: svr00
Node addresses: 10.1.3.64

svr00# clustat
Member Status: Quorate

Resource Group Manager not running; no service information available.

  Member Name                              Status
  ------ ----                              ------
  svr00                                     Online, Local
  svr01                                     Offline


I rebooted svr01 and now it just sits there at Starting clvmd: during
bootup. 

Hope this helps in anyone understanding my issue? Do I need all the other
services configured for this to work properly? i.e. clvmd, fenced, etc. etc.
I just wanted to see two nodes in a cluster first before I configured any
resources, services, fencing etc etc. 

\R 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson

Hi RR,

You didn't give us much to go on.  When looking into these kinds of
problems, it's always nice to see the /etc/cluster/cluster.conf file, and
possibly the output of clustat, and cman_tool status from the cluster nodes,
in this case, from the node that seemed to work.

Regards,

Bob Peterson
Red Hat Cluster Suite

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.1.conf
Type: application/octet-stream
Size: 1549 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060629/f55e1cbe/attachment.obj>

From frank at opticalart.de  Thu Jun 29 07:17:40 2006
From: frank at opticalart.de (Frank Hellmann)
Date: Thu, 29 Jun 2006 09:17:40 +0200
Subject: [Linux-cluster] cluster-1.02.00 make errors
In-Reply-To: <44A2CE20.8040003@redhat.com>
References: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us>
	<44A2CE20.8040003@redhat.com>
Message-ID: <44A37E94.8070103@opticalart.de>

Hi Derrick,

Make sure you build it this way:

  $ ./configure --kernel_src=/path/to/linux-2.6.x
  $ make install

the usual make; make install won't work for various reasons. Just do a
make install.

       Cheers,
         
                Frank...

Ryan O'Hara wrote:

>
> Derrick,
>
> I was able to build the cluster-1.02.00 code against the vanilla
> 2.6.16 kernel. Did you run the configure script with the --kernel_src
> option to point to the correct kernel tree?
>
> ./configure --kernel_src=/path/to/kernel
>
> Ryan
>
>
>
> Beery, Derrick [DAS] wrote:
>
>> Make against 2.6.16 vanilla kernel is failing with the following errors:
>>
>>  
>>
>>  CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_get':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_set':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_remove':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too
>> few arguments to function `permission'
>>
>> make[5]: ***
>> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1
>>
>> make[4]: ***
>> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2
>>
>>  
>>
>> Can anyone shed some light on this?
>>
>>
>> Thanks in advance!
>>
>>  
>>
>> Derrick Beery
>>
>> DAS ITE State of Iowa
>>
>>
>> ------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
--------------------------------------------------------------------------
Frank Hellmann          Optical Art GmbH           Waterloohain 7a
DI Supervisor           http://www.opticalart.de   22769 Hamburg
frank at opticalart.de     Tel: ++49 40 5111051       Fax: ++49 40 43169199 


From l.dardini at comune.prato.it  Thu Jun 29 07:27:38 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Thu, 29 Jun 2006 09:27:38 +0200
Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and
	hungcluster
In-Reply-To: <44A13427.7080709@aurigainformatica.it>
Message-ID: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local>


> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di 
> Fabrizio Lippolis
> Inviato: marted? 27 giugno 2006 15.36
> A: linux clustering
> Oggetto: Re: [Linux-cluster] "Missed too many heartbeats" 
> messages and hungcluster
> 
> Patrick Caulfield ha scritto:
> 
> >> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node 
> AICLSRV01 from 
> >> the cluster : Missed too many heartbeats
> > 
> > 
> > That message means that the heartbeat messages are getting 
> lost somehow.
> > either through an unreliable network link or something else odd 
> > happening on the machine to prevent the heartbeat packets 
> reaching the network.
> 
> This is very strange since the two machines are connected by 
> a gigabit crossover cable and no other device is in the 
> middle. Also, no firewall rules are configured on any machine.
> 
> By the way, actually I am using the fence manual method but 
> it isn't much helpful and I would like to switch to a method 
> that ensures a reliable service. Does it mean I have to buy a 
> device sitting in the middle of the machines that connects 
> network and power cables? I am rather new to it so please any 
> suggestion is welcome.
> 

A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS.
A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using?

Leandro


> -- 
> Fabrizio Lippolis                
> fabrizio.lippolis at aurigainformatica.it
> Auriga Informatica s.r.l.            Via Don Guanella 15/B - 
> 70124 Bari
> Tel.: 080/5025414 - Fax: 080/5027448 - 
> http://www.aurigainformatica.it/
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From Fabrizio.Lippolis at AurigaInformatica.it  Thu Jun 29 07:47:02 2006
From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis)
Date: Thu, 29 Jun 2006 09:47:02 +0200
Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and
	hungcluster
In-Reply-To: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local>
References: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local>
Message-ID: <44A38576.4030709@aurigainformatica.it>

Leandro Dardini ha scritto:

> A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS.
> A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using?

It's a GFS file system on a disk array. Since I built the cluster for 
MySQL and ldap services, it's the file system where actually are the 
database and directory files. The disk array is physically connected to 
both machines by a SCSI cable.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/


From ugo.parsi at gmail.com  Thu Jun 29 10:05:03 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Thu, 29 Jun 2006 12:05:03 +0200
Subject: [Linux-cluster] Using GULM with CLVM
Message-ID: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>

Hello,

I am having way too much problems with CMAN/DLM on a midsize cluster
(random kernel panics, freezes, random quorum dropping issues, cluster
split view, etc..) and just saw that it was more preferable to use
GULM for mid to large sized clusters and that CMAN wasn't tested on
more than 32 nodes.

Actually, I am using the RedHat Cluster Suite only for CLVM, and some
people are saying that CLVM+GULM is supported but as I can see on the
official 'documentation'  :

http://sourceware.org/cluster/gulm/gulmusage.txt :

'This document does not cover setting up a block device to run on.
Mostly because CLVM doesn't work with gulm yet'

What's the current position on that please ?
Is the documentation outdated ?

Also, my whole infrastructure is totally virtualized (with Xen), and
it is also said that it's better to use GULM on a dedicated computer.
Anyone tried that on a dedicated virtual machine ?
128 or 256 Megs of RAM should be enough or GULM is ressource hungry ?

Thanks,

Ugo PARSI

-- 
An apple a day, keeps the doctor away


From pcaulfie at redhat.com  Thu Jun 29 10:12:32 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 29 Jun 2006 11:12:32 +0100
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>
Message-ID: <44A3A790.8090206@redhat.com>

Ugo PARSI wrote:
> Hello,
> 
> I am having way too much problems with CMAN/DLM on a midsize cluster
> (random kernel panics, freezes, random quorum dropping issues, cluster
> split view, etc..) and just saw that it was more preferable to use
> GULM for mid to large sized clusters and that CMAN wasn't tested on
> more than 32 nodes.
> 
> Actually, I am using the RedHat Cluster Suite only for CLVM, and some
> people are saying that CLVM+GULM is supported but as I can see on the
> official 'documentation'  :
> 
> http://sourceware.org/cluster/gulm/gulmusage.txt :
> 
> 'This document does not cover setting up a block device to run on.
> Mostly because CLVM doesn't work with gulm yet'
> 
> What's the current position on that please ?
> Is the documentation outdated ?


Yes, it is outdated. clvmd does work with gulm.

> Also, my whole infrastructure is totally virtualized (with Xen), and
> it is also said that it's better to use GULM on a dedicated computer.
> Anyone tried that on a dedicated virtual machine ?
> 128 or 256 Megs of RAM should be enough or GULM is ressource hungry ?
> 

gulm is resource hungry. Get as much RAM as you can ;-)
I'm no expert on gulm, but I would expect that 256MB would not be enough for a
cluster of over 32 nodes.

-- 

patrick


From ugo.parsi at gmail.com  Thu Jun 29 10:39:04 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Thu, 29 Jun 2006 12:39:04 +0200
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <44A3A790.8090206@redhat.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>
	<44A3A790.8090206@redhat.com>
Message-ID: <f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>

Ok thanks for your quick answer :)

>
> Yes, it is outdated. clvmd does work with gulm.
>

Ok... since this is undocumented.

Are these steps ok ? :

-> Start gulm servers.
-> Update cluster.conf to remove cman and add gulm servers
-> Remove cman from the node startup scripts
-> Reboot the whole cluster.

(I'm not in production yet, so downtime is not a real matter, and I'm
trying to deal the transition the easiest way)

Nothing has to be changed for LVM / CLVM ?
I start / use them the same way ?


>
> gulm is resource hungry. Get as much RAM as you can ;-)
> I'm no expert on gulm, but I would expect that 256MB would not be enough for a
> cluster of over 32 nodes.
>

Ouch !
But gulm is just a central locking server, right ? :)
I was more thinking of something like 5 or 10 megs max, LOL :)

Thanks,

Ugo PARSI

-- 
An apple a day, keeps the doctor away


From cjk at techma.com  Thu Jun 29 11:25:56 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 29 Jun 2006 07:25:56 -0400
Subject: =?us-ascii?Q?RE:_=5BLinux-cluster=5D_two_node_cluster_not_coming_up?=
In-Reply-To: <000f01c69b38$42d95110$c88bf330$@com>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EC1@tmaemail.techma.com>

Just a thought, this sounds like what happens when the /etc/hosts file is 
not setup correctly.  If the hostname of the machines is in the loopback 
line, then take it out and put a proper entry in. I still fail to understand
why the installer doesn't add a proper entry when first installed if a
network
interface is indeed configured. That's a nother issue tho. 

Hope this helps...


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of RR
Sent: Thursday, June 29, 2006 12:56 AM
To: 'linux clustering'
Subject: RE: [Linux-cluster] two node cluster not coming up

Hi Bob,

Attached is the cluster.conf file, and below is the status I get from the
command "service cman status" on the working node:

Svr00# service cman status
Protocol version: 5.0.1
Config version: 4
Cluster name: testcluster
Cluster ID: 27453
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: svr00
Node addresses: 10.1.3.64

svr00# clustat
Member Status: Quorate

Resource Group Manager not running; no service information available.

  Member Name                              Status
  ------ ----                              ------
  svr00                                     Online, Local
  svr01                                     Offline


I rebooted svr01 and now it just sits there at Starting clvmd: during bootup.


Hope this helps in anyone understanding my issue? Do I need all the other
services configured for this to work properly? i.e. clvmd, fenced, etc. etc.
I just wanted to see two nodes in a cluster first before I configured any
resources, services, fencing etc etc. 

\R 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson

Hi RR,

You didn't give us much to go on.  When looking into these kinds of problems,
it's always nice to see the /etc/cluster/cluster.conf file, and possibly the
output of clustat, and cman_tool status from the cluster nodes, in this case,
from the node that seemed to work.

Regards,

Bob Peterson
Red Hat Cluster Suite


From pcaulfie at redhat.com  Thu Jun 29 12:02:54 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 29 Jun 2006 13:02:54 +0100
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>	<44A3A790.8090206@redhat.com>
	<f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>
Message-ID: <44A3C16E.6050603@redhat.com>

Ugo PARSI wrote:
> Ok thanks for your quick answer :)
> 
>>
>> Yes, it is outdated. clvmd does work with gulm.
>>
> 
> Ok... since this is undocumented.
> 
> Are these steps ok ? :
> 
> -> Start gulm servers.
> -> Update cluster.conf to remove cman and add gulm servers
> -> Remove cman from the node startup scripts
> -> Reboot the whole cluster.
> 
> (I'm not in production yet, so downtime is not a real matter, and I'm
> trying to deal the transition the easiest way)
> 
> Nothing has to be changed for LVM / CLVM ?
> I start / use them the same way ?

That's right. clvmd will detect that it's running with gulm rather than cman.

>>
>> gulm is resource hungry. Get as much RAM as you can ;-)
>> I'm no expert on gulm, but I would expect that 256MB would not be
>> enough for a
>> cluster of over 32 nodes.
>>
> 
> Ouch !
> But gulm is just a central locking server, right ? :)
> I was more thinking of something like 5 or 10 megs max, LOL :)

Well, it depends on how many locks there are obviously. but GFS caches locks
for speed so you can end up with quite a lot!
-- 

patrick


From ugo.parsi at gmail.com  Thu Jun 29 13:12:59 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Thu, 29 Jun 2006 15:12:59 +0200
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <44A3C16E.6050603@redhat.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>
	<44A3A790.8090206@redhat.com>
	<f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>
	<44A3C16E.6050603@redhat.com>
Message-ID: <f29fd8170606290612ubb78faarc65e18aeb9d323b3@mail.gmail.com>

> >
> > Nothing has to be changed for LVM / CLVM ?
> > I start / use them the same way ?
>
> That's right. clvmd will detect that it's running with gulm rather than cman.
>

I've tried but can't figure on how to make it work.
Do you know any logs that I could check ?

ccsd starts fine.
gulmd takes like 1 or 2 secs and seems ok.
then when I start clvmd it takes a big time (like 60 seconds).
then lvm is stuck/zombie or dies whatever the action I make (not the
cannot find socket error)

venus:~# ps aux | grep lvm
root      2987  0.0  0.9 21072 1256 ?        Ss   21:31   0:00 clvmd
root      2991  0.0  0.0     0    0 ?        Z    21:31   0:00 [lvm] <defunct>

> Well, it depends on how many locks there are obviously. but GFS caches locks
> for speed so you can end up with quite a lot!
> --
>

Okay, but I'm not planning on using GFS anyway at the moment, I'm only
using the CLVM part of the cluster package.

Thanks,

Ugo PARSI

-- 
An apple a day, keeps the doctor away


From l.dardini at comune.prato.it  Thu Jun 29 13:38:24 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Thu, 29 Jun 2006 15:38:24 +0200
Subject: R: R: [Linux-cluster] "Missed too many heartbeats" messages
	andhungcluster
In-Reply-To: <44A38576.4030709@aurigainformatica.it>
Message-ID: <0C5C8B118420264EBB94D7D7050150012A00A8@exchange2.comune.prato.local>

 
> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di 
> Fabrizio Lippolis
> Inviato: gioved? 29 giugno 2006 9.47
> A: linux clustering
> Oggetto: Re: R: [Linux-cluster] "Missed too many heartbeats" 
> messages andhungcluster
> 
> Leandro Dardini ha scritto:
> 
> > A fencing device is required for granting consistency of 
> write. If one node fails to comunicate with other devices, it 
> can write in an unconditional mode and bye bye to GFS.
> > A fencing device is not only a power-fence device. In my 
> case it is the fibre channel switch. When a node has to be 
> fenced, other telnet to the fibre channel switch and turn off 
> the port. This doesn't powercycle the device, but blocks the 
> write on the shared device. What kind of shared device are you using?
> 
> It's a GFS file system on a disk array. Since I built the 
> cluster for MySQL and ldap services, it's the file system 
> where actually are the database and directory files. The disk 
> array is physically connected to both machines by a SCSI cable.
> 

Is there a managemente console accessible via telnet/http where you can "disable" a port/host? If this is the case, you have already a fencing device.

Leandro

> -- 
> Fabrizio Lippolis                
> fabrizio.lippolis at aurigainformatica.it
> Auriga Informatica s.r.l.            Via Don Guanella 15/B - 
> 70124 Bari
> Tel.: 080/5025414 - Fax: 080/5027448 - 
> http://www.aurigainformatica.it/
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


From ramon at vanalteren.nl  Thu Jun 29 13:40:48 2006
From: ramon at vanalteren.nl (Ramon van Alteren)
Date: Thu, 29 Jun 2006 15:40:48 +0200
Subject: [Linux-cluster] Re: CLVM and AoE
Message-ID: <44A3D860.9010802@vanalteren.nl>

Hi Aaron,

> We're already committed to the AoE route unfortunately, but we're 
> setting up next week, and I'll keep everyone posted on any performance 
> benchmarks we glean.

I'm working on a similar setup (no Xen do use GFS & coRAIDS)
I was wondering whether you have any performance info about this setup 
and if you fell into any  pitfalls along the way.

TIA,

Ramon

  
From ranjtech at gmail.com  Thu Jun 29 13:46:07 2006
From: ranjtech at gmail.com (RR)
Date: Thu, 29 Jun 2006 23:46:07 +1000
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EC1@tmaemail.techma.com>
References: <000f01c69b38$42d95110$c88bf330$@com>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079EC1@tmaemail.techma.com>
Message-ID: <000801c69b82$615e1f90$241a5eb0$@com>

No, that ain't it. I install CSGFS etc. during my modified Kickstart process
and as part of my extended post-install, I fix the /etc/hosts as well and I
double checked that and its all good. Anyone else? Bob? Any ideas?


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
Sent: Thursday, June 29, 2006 9:26 PM
To: linux clustering
Subject: RE: [Linux-cluster] two node cluster not coming up

Just a thought, this sounds like what happens when the /etc/hosts file is 
not setup correctly.  If the hostname of the machines is in the loopback 
line, then take it out and put a proper entry in. I still fail to understand
why the installer doesn't add a proper entry when first installed if a
network
interface is indeed configured. That's a nother issue tho. 

Hope this helps...


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of RR
Sent: Thursday, June 29, 2006 12:56 AM
To: 'linux clustering'
Subject: RE: [Linux-cluster] two node cluster not coming up

Hi Bob,

Attached is the cluster.conf file, and below is the status I get from the
command "service cman status" on the working node:

Svr00# service cman status
Protocol version: 5.0.1
Config version: 4
Cluster name: testcluster
Cluster ID: 27453
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: svr00
Node addresses: 10.1.3.64

svr00# clustat
Member Status: Quorate

Resource Group Manager not running; no service information available.

  Member Name                              Status
  ------ ----                              ------
  svr00                                     Online, Local
  svr01                                     Offline


I rebooted svr01 and now it just sits there at Starting clvmd: during
bootup.


Hope this helps in anyone understanding my issue? Do I need all the other
services configured for this to work properly? i.e. clvmd, fenced, etc. etc.
I just wanted to see two nodes in a cluster first before I configured any
resources, services, fencing etc etc. 

\R 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson

Hi RR,

You didn't give us much to go on.  When looking into these kinds of
problems,
it's always nice to see the /etc/cluster/cluster.conf file, and possibly the
output of clustat, and cman_tool status from the cluster nodes, in this
case,
from the node that seemed to work.

Regards,

Bob Peterson
Red Hat Cluster Suite


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From kanderso at redhat.com  Thu Jun 29 14:00:04 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Thu, 29 Jun 2006 09:00:04 -0500
Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and
	hungcluster
In-Reply-To: <44A38576.4030709@aurigainformatica.it>
References: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local>
	<44A38576.4030709@aurigainformatica.it>
Message-ID: <1151589604.2864.4.camel@localhost.localdomain>

On Thu, 2006-06-29 at 09:47 +0200, Fabrizio Lippolis wrote:
> Leandro Dardini ha scritto:
> 
> > A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS.
> > A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using?
> 
> It's a GFS file system on a disk array. Since I built the cluster for 
> MySQL and ldap services, it's the file system where actually are the 
> database and directory files. The disk array is physically connected to 
> both machines by a SCSI cable.
> 

You might be getting lockouts due to the storage subsystem you are
using.  GFS requires the ability to write/read concurrently from the
storage devices and generally overwhelms a direct attached SCSI array.
The configuration you describe will not be stable since when one node is
accessing the storage, the other machine is completely locked out of the
bus.  This is probably some of the problems you are having with missing
heartbeats.  It has been a long time since we have run in that
configuration, so not sure of the current behaviors, use fibre channel,
iscsi or gnbd as proper storage infrastructure.

Kevin


From pcaulfie at redhat.com  Thu Jun 29 14:03:00 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 29 Jun 2006 15:03:00 +0100
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <f29fd8170606290612ubb78faarc65e18aeb9d323b3@mail.gmail.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>	<44A3A790.8090206@redhat.com>	<f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>	<44A3C16E.6050603@redhat.com>
	<f29fd8170606290612ubb78faarc65e18aeb9d323b3@mail.gmail.com>
Message-ID: <44A3DD94.3070001@redhat.com>

Ugo PARSI wrote:
>> >
>> > Nothing has to be changed for LVM / CLVM ?
>> > I start / use them the same way ?
>>
>> That's right. clvmd will detect that it's running with gulm rather
>> than cman.
>>
> 
> I've tried but can't figure on how to make it work.
> Do you know any logs that I could check ?
> 
> ccsd starts fine.
> gulmd takes like 1 or 2 secs and seems ok.
> then when I start clvmd it takes a big time (like 60 seconds).
> then lvm is stuck/zombie or dies whatever the action I make (not the
> cannot find socket error)
> 

clvmd should log errors to syslog. Check that the gulm cluster is quorate as
clvmd won't do anything without a quorate cluster. You might also like to run
it with -d and see if any errors appear on stderr.

-- 

patrick


From rpeterso at redhat.com  Thu Jun 29 14:36:28 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 29 Jun 2006 09:36:28 -0500
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <000f01c69b38$42d95110$c88bf330$@com>
References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com>
	<000f01c69b38$42d95110$c88bf330$@com>
Message-ID: <44A3E56C.6010001@redhat.com>

RR wrote:
> Hi Bob,
>
> Attached is the cluster.conf file, and below is the status I get from the
> command "service cman status" on the working node:
>
> Svr00# service cman status
> Protocol version: 5.0.1
> Config version: 4
> Cluster name: testcluster
> Cluster ID: 27453
> Cluster Member: Yes
> Membership state: Cluster-Member
> Nodes: 1
> Expected_votes: 1
> Total_votes: 1
> Quorum: 1
> Active subsystems: 0
> Node name: svr00
> Node addresses: 10.1.3.64
>
> svr00# clustat
> Member Status: Quorate
>
> Resource Group Manager not running; no service information available.
>
>   Member Name                              Status
>   ------ ----                              ------
>   svr00                                     Online, Local
>   svr01                                     Offline
>
>
> I rebooted svr01 and now it just sits there at Starting clvmd: during
> bootup. 
>
> Hope this helps in anyone understanding my issue? Do I need all the other
> services configured for this to work properly? i.e. clvmd, fenced, etc. etc.
> I just wanted to see two nodes in a cluster first before I configured any
> resources, services, fencing etc etc. 
>
> \R 
>   
Hi RR,

Hm.  I didn't see anything obviously wrong with your cluster.conf file.
I guess I'd reboot svr01 and try to bring it into the cluster manually, and
see if it complains about anything along the way. 
(You may need to bring it up in single-user mode so that it doesn't hang
at the service script that starts clvmd)
Something like this:

modprobe lock_dlm
modprobe gfs
ccsd
cman_tool join -w
fence_tool join -w
clvmd

I'd verify that your communications are sound, that you can ping svr00 from
svr01, and that multicast is working.  Any reason you went with multicast
rather than broadcast?  You could see if a broadcast ping (ping -b) would
work from svr01 to svr00.  Also, you could test to see if your firewall
is blocking the IO by temporarily doing "service iptables stop" on both 
nodes.
I'd hope that selinux isn't interfering either, but you could try doing
"setenforce 0" just as an experiment to make sure.  These are just some 
ideas.

Regards,

Bob Peterson
Red Hat Cluster Suite


From ranjtech at gmail.com  Thu Jun 29 14:44:49 2006
From: ranjtech at gmail.com (RR)
Date: Fri, 30 Jun 2006 00:44:49 +1000
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <44A3E56C.6010001@redhat.com>
References: <000001c69ad9$a462d8c0$ed288a40$@com>
	<44A2E6C5.5080802@redhat.com>	<000f01c69b38$42d95110$c88bf330$@com>
	<44A3E56C.6010001@redhat.com>
Message-ID: <000901c69b8a$94f95c40$beec14c0$@com>

Hi Bob,

Yeah the communication is all good, can ping each other, in fact I'm scp'ing
the cluster.conf file to svr01 and there's nothing else on that network,
they might as well be connected through a x-over cable as these two machines
are the only machines on the network. I got around the startup hang by
starting the OS with the "I" keypress and said No to all cluster services.
Once in the OS, I did

# service ccsd start
# service cman start

This works fine on svr00, on svr01 it comes back with [FAILED]. When I do,
'service cman status' it says the following

[root at svr01 ~]# service cman status
Protocol version: 5.0.1
Config version: 4
Cluster name: testcluster
Cluster ID: 27453
Cluster Member: No
Membership state: Joining

When I do a manual: cman_tool join -w I get back a "cman_tool: Node is
already active"

Also, I load my modules automatically during system startup with S99local. 

BTW, I do now see the following messages in my /var/log/messages file

Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign
requested address
Jun 29 14:36:40 svr01 kernel: CMAN: sending membership request
Jun 29 14:36:41 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign
requested address
Jun 29 14:36:45 svr01 last message repeated 2 times
Jun 29 14:36:45 svr01 kernel: CMAN: sending membership request
Jun 29 14:36:47 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign
requested address
Jun 29 14:36:50 svr01 kernel: CMAN: sending membership request
Jun 29 14:37:25 svr01 last message repeated 7 times

Does this help? Can I have the application generate more detailed logging?

Thanks in advance
\R

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
Sent: Friday, June 30, 2006 12:36 AM
To: linux clustering
Subject: Re: [Linux-cluster] two node cluster not coming up

RR wrote:
> Hi Bob,
>
> Attached is the cluster.conf file, and below is the status I get from 
> the command "service cman status" on the working node:
>
> Svr00# service cman status
> Protocol version: 5.0.1
> Config version: 4
> Cluster name: testcluster
> Cluster ID: 27453
> Cluster Member: Yes
> Membership state: Cluster-Member
> Nodes: 1
> Expected_votes: 1
> Total_votes: 1
> Quorum: 1
> Active subsystems: 0
> Node name: svr00
> Node addresses: 10.1.3.64
>
> svr00# clustat
> Member Status: Quorate
>
> Resource Group Manager not running; no service information available.
>
>   Member Name                              Status
>   ------ ----                              ------
>   svr00                                     Online, Local
>   svr01                                     Offline
>
>
> I rebooted svr01 and now it just sits there at Starting clvmd: during 
> bootup.
>
> Hope this helps in anyone understanding my issue? Do I need all the 
> other services configured for this to work properly? i.e. clvmd, fenced,
etc. etc.
> I just wanted to see two nodes in a cluster first before I configured 
> any resources, services, fencing etc etc.
>
> \R
>   
Hi RR,

Hm.  I didn't see anything obviously wrong with your cluster.conf file.
I guess I'd reboot svr01 and try to bring it into the cluster manually, and
see if it complains about anything along the way. 
(You may need to bring it up in single-user mode so that it doesn't hang at
the service script that starts clvmd) Something like this:

modprobe lock_dlm
modprobe gfs
ccsd
cman_tool join -w
fence_tool join -w
clvmd

I'd verify that your communications are sound, that you can ping svr00 from
svr01, and that multicast is working.  Any reason you went with multicast
rather than broadcast?  You could see if a broadcast ping (ping -b) would
work from svr01 to svr00.  Also, you could test to see if your firewall is
blocking the IO by temporarily doing "service iptables stop" on both nodes.
I'd hope that selinux isn't interfering either, but you could try doing
"setenforce 0" just as an experiment to make sure.  These are just some
ideas.

Regards,

Bob Peterson
Red Hat Cluster Suite

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From bmarzins at redhat.com  Thu Jun 29 14:59:24 2006
From: bmarzins at redhat.com (Benjamin Marzinski)
Date: Thu, 29 Jun 2006 09:59:24 -0500
Subject: [Linux-cluster] GFS locking issues
In-Reply-To: <433fd2630606221719q649fc46bv97e94a929c3d5427@mail.gmail.com>
References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com>
	<20060615190959.GB1913@redhat.com>
	<433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com>
	<20060621175430.GB4706@redhat.com>
	<433fd2630606221719q649fc46bv97e94a929c3d5427@mail.gmail.com>
Message-ID: <20060629145924.GA15061@ether.msp.redhat.com>

On Fri, Jun 23, 2006 at 03:19:52AM +0300, Anton Kornev wrote:

Anton,

It appears that you found a bug in the gnbd code. I have it on my todo list,
and I'll get to it as soon as possible. Frankly, I'm really suprised that this
hasn't come up earlier. The issue is that once a part of a request has been
send to the server, either the whole request must be sent or connection must
be dropped. In some cases, such as when the server is non-responsive, but still
has an open socket, it is necessary to send a signal break out of the socket
transfer and then shutdown the socket. In your case, you simply want the process
that is in the middle of the transfer to die. In this case, the appropriate
response is to finish sending the IO, and then pass the signal on. I need to
look though the code and make sure that I'm doing the appropriate thing for
all circumstances.

-Ben


>    David,
> 
>    Thanks a lot for your comments.
>    Actually it sounds rather strange for me.
> 
>    I tried to grep the /var/log/messages log with "gnbd" word and found that
>    there are also
>    other messages like this even on the working host with no GFS problems.
> 
>    bash-3.00# grep gnbd /var/log/messages
>    Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 9
>    Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 15
>    Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 19 09:51:59 node1 kernel: gnbd (pid 26259: find) got signal 9
>    Jun 19 09:51:59 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 9
>    Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 15
>    Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 19 12:51:12 node1 kernel: gnbd (pid 19463: vi) got signal 1
>    Jun 19 12:51:12 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 9
>    Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4)
>    Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 15
>    Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4)
> 
>    I tried to check gnbd-kernel sources (latest available SRPM - not CVS
>    version)
>    and I found that the first message (gnbd ... got signal) is produced by
>    the
>    sock_xmit() function with the such a piece of code:
> 
>                    if (signal_pending(current)) {
>                            siginfo_t info;
>                            spin_lock_irqsave(&current->sighand->siglock,
>    flags);
>                            printk(KERN_WARNING "gnbd (pid %d: %s) got signal
>    %d\n",
>                                    current->pid, current->comm,
>                                    dequeue_signal(current, &current->blocked,
>    &info));
>                            spin_unlock_irqrestore(&current->sighand->siglock,
>    flags);
>                            result = -EINTR;
>                            break;
>                    }
> 
>    And the second message is generated inside the gnbd_send_req() by the code
> 
>            result = sock_xmit(sock, 1, &request, sizeof(request),
>                            (gnbd_cmd(req) == GNBD_CMD_WRITE)? MSG_MORE: 0);
>            if (result < 0) {
>                    printk(KERN_ERR "%s: Send control failed (result %d)\n",
>                                    dev->disk->disk_name, result);
>                    goto error_out;
>            }
> 
>    So at the first glance it seems like a normal messages from gnbd - if
>    there is signal received during sock_xmit - don't send anyting and return
>    -EINTR.
> 
>    I am not sure that it might be a problem but I take a look on the
>    sock_xmit() code and
>    there are at least two things that seems strange for me.
> 
>    1. There is an inconsistancy between comment and code:
> 
>            /* Allow interception of SIGKILL only
>             * Don't allow other signals to interrupt the transmission */
>            spin_lock_irqsave(&current->sighand->siglock, flags);
>            oldset = current->blocked;
>            sigfillset(&current->blocked);
>            sigdelsetmask(&current->blocked, sigmask(SIGKILL) |
>    sigmask(SIGTERM) |
>                          sigmask(SIGHUP));
>            recalc_sigpending();
>            spin_unlock_irqrestore(&current->sighand->siglock, flags);
> 
>    So, inside the comment there is a suggestion that only SIGKILL can
>    interrupt the transmission but the real mask is for KILL/TERM/HUP signals
>    (btw: in my case it is a SIGTERM who locks everything).
> 
>    2. There are two blocks of code following each other
> 
>                    if (send)
>                            result = sock_sendmsg(sock, &msg, size);
>                    else
>                            result = sock_recvmsg(sock, &msg, size, 0);
> 
>                    if (signal_pending(current)) {
>                            siginfo_t info;
>                            spin_lock_irqsave(&current->sighand->siglock,
>    flags);
>                            printk(KERN_WARNING "gnbd (pid %d: %s) got signal
>    %d\n",
>                                    current->pid, current->comm,
>                                    dequeue_signal(current, &current->blocked,
>    &info));
>                            spin_unlock_irqrestore(&current->sighand->siglock,
>    flags);
>                            result = -EINTR;
>                            break;
>                    }
> 
>    Why do we need to return -EINTR as a result if we have already done the
>    real sock_sendmsg() / sock_recvmsg()?  What if the real transmission was
>    okay and real result has no mistake?
> 
>    I am not a kernel developer and I haven't spent a lot of time on the
>    issue, so it might make no sense at all.
> 
>    Please, let me know what do you think about it?
> 
>    On 6/21/06, David Teigland <[1]teigland at redhat.com> wrote:
> 
>      On Fri, Jun 16, 2006 at 06:37:14PM +0300, Anton Kornev wrote:
>      > gnbd (pid 5836: alogc.pl) got signal 9
>      > gnbd0: Send control failed (result -4)
>      > gnbd (pid 5836: alogc.pl) got signal 15
>      > gnbd0: Send control failed (result -4)
> 
>      This and the fact that a number of processes appear to be blocked in the
>      i/o path seem to point at gnbd as the hold-up.
> 
>      Dave
> 
>      >   51 D wait_on_buffer                   pdflush
>      > 5771 D lock_page                        lock_dlm1
>      > 5776 D -                                gfs_logd
>      > 5777 D -                                gfs_quotad
>      > 5778 D -                                gfs_inoded
>      > 5892 D -                                httpd
>      > 5895 D glock_wait_internal              httpd
>      > 5896 D glock_wait_internal              httpd
>      > 5897 D glock_wait_internal              httpd
>      > 5911 D glock_wait_internal              httpd
>      > 5915 D wait_on_buffer                   httpd
>      > 5930 D wait_on_buffer                   sh
> 
>      > pdflush       D ffffffff8014aabc
>      0    51      6            53    50
>      > (L-TLB)
>      > 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00
>      >       0000000000000216 ffffffffa0042916 000001011aca60c0
>      0000000000000008
>      >       000001011fdef7f0 0000000000000dfa
>      > Call Trace:<ffffffffa0042916>{:dm_mod:dm_request+396}
>      > <ffffffff8014aabc>{keventd_create_kthread+0}
>      >       <ffffffff803053ef>{io_schedule+38}
>      > <ffffffff80178c4c>{__wait_on_buffer+125}
>      >       <ffffffff80178ad2>{bh_wake_function+0}
>      > <ffffffff80178ad2>{bh_wake_function+0}
>      >       <ffffffffa0235c5d>{:gfs:gfs_logbh_wait+49}
>      > <ffffffffa024a6a6>{:gfs:disk_commit+794}
>      >       <ffffffffa024a877>{:gfs:log_refund+111}
>      > <ffffffffa024ad8e>{:gfs:log_flush_internal+510}
>      >       <ffffffff8017d682>{sync_supers+167}
>      <ffffffff8015e310>{wb_kupdate+36}
>      >
>      >       <ffffffff8015edb4>{pdflush+323} <ffffffff8015e2ec>{wb_kupdate+0}
>      >       <ffffffff8015ec71>{pdflush+0} <ffffffff8014aa93>{kthread+200}
>      >       <ffffffff80110e17>{child_rip+8}
>      > <ffffffff8014aabc>{keventd_create_kthread+0}
>      >       <ffffffff8014a9cb>{kthread+0} <ffffffff80110e0f>{child_rip+0}
>      > lock_dlm1     D 000001000c0096e0
>      0  5771      6          5772  5766
>      > (L-TLB)
>      > 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069
>      >       000001011420b030 0000000000000069 000001000c00a940
>      000000010000eb10
>      >       000001011a887030 0000000000001cae
>      > Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
>      > <ffffffff803053ef>{io_schedule+38}
>      >       <ffffffff80159215>{__lock_page+191}
>      > <ffffffff80158cfa>{page_wake_function+0}
>      >       <ffffffff80158cfa>{page_wake_function+0}
>      > <ffffffff80163125>{truncate_inode_pages+519}
>      >       <ffffffffa0258f35>{:gfs:gfs_inval_page+63}
>      > <ffffffffa02401b5>{:gfs:drop_bh+233}
>      >       <ffffffffa0242138>{:gfs:gfs_glock_cb+194}
>      > <ffffffffa02869dd>{:lock_dlm:dlm_async+1989}
>      >       <ffffffff801333c8>{default_wake_function+0}
>      > <ffffffff8014aabc>{keventd_create_kthread+0}
>      >       <ffffffffa0286218>{:lock_dlm:dlm_async+0}
>      > <ffffffff8014aabc>{keventd_create_kthread+0}
>      >       <ffffffff8014aa93>{kthread+200} <ffffffff80110e17>{child_rip+8}
>      >       <ffffffff8014aabc>{keventd_create_kthread+0}
>      > <ffffffff8014a9cb>{kthread+0}
>      >       <ffffffff80110e0f>{child_rip+0}
>      > gfs_logd      D 0000000000000000
>      0  5776      1          5777  5775
>      > (L-TLB)
>      > 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85
>      >       000001011387fe58 ffffffff80304add ffffffff803cca80
>      0000000000000246
>      >       00000101143fe030 00000000000000b5
>      > Call Trace:<ffffffff80304a85>{thread_return+0}
>      > <ffffffff80304add>{thread_return+88}
>      >       <ffffffffa023e8d3>{:gfs:lock_on_glock+112}
>      > <ffffffff8030565b>{__down_write+134}
>      >       <ffffffffa0249cdb>{:gfs:gfs_ail_empty+56}
>      > <ffffffffa0233930>{:gfs:gfs_logd+77}
>      >       <ffffffff80110e17>{child_rip+8}
>      > <ffffffff801cccff>{dummy_d_instantiate+0}
>      >       <ffffffffa02338e3>{:gfs:gfs_logd+0}
>      <ffffffff80110e0f>{child_rip+0}
>      >
>      > gfs_quotad    D 0000000000000000
>      0  5777      1          5778  5776
>      > (L-TLB)
>      > 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85
>      >       0000010113881eb8 ffffffff80304add 000001011ff87030
>      0000000100000074
>      >       000001011430f7f0 0000000000000128
>      > Call Trace:<ffffffff80304a85>{thread_return+0}
>      > <ffffffff80304add>{thread_return+88}
>      >       <ffffffff8030565b>{__down_write+134}
>      > <ffffffffa025b55a>{:gfs:gfs_quota_sync+226}
>      >       <ffffffffa0233ab1>{:gfs:gfs_quotad+127}
>      > <ffffffff80110e17>{child_rip+8}
>      >       <ffffffff801cccff>{dummy_d_instantiate+0}
>      > <ffffffff801cccff>{dummy_d_instantiate+0}
>      >       <ffffffff801cccff>{dummy_d_instantiate+0}
>      > <ffffffffa0233a32>{:gfs:gfs_quotad+0}
>      >       <ffffffff80110e0f>{child_rip+0}
>      > gfs_inoded    D 0000000000000000
>      0  5778      1          5807  5777
>      > (L-TLB)
>      > 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0
>      >       0000000000000000 ffffffff80304a85 0000010113883ec8
>      0000000180304add
>      >       000001011e2937f0 00000000000000c2
>      > Call Trace:<ffffffff80304a85>{thread_return+0}
>      > <ffffffff8030565b>{__down_write+134}
>      >       <ffffffffa026160d>{:gfs:unlinked_find+115}
>      > <ffffffffa0261c6c>{:gfs:gfs_unlinked_dealloc+25}
>      >       <ffffffffa0233bd5>{:gfs:gfs_inoded+66}
>      > <ffffffff80110e17>{child_rip+8}
>      >       <ffffffffa0233b93>{:gfs:gfs_inoded+0}
>      <ffffffff80110e0f>{child_rip+0}
>      >
>      >
>      > httpd         D ffffffff80304190
>      0  5892      1  5893          5826
>      > (NOTLB)
>      > 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001
>      >       0000000000000000 0000000000000000 0000010114667980
>      0000000111b75bc0
>      >       00000101143fe7f0 00000000000009ad
>      > Call Trace:<ffffffff80303d6f>{__down+147}
>      > <ffffffff801333c8>{default_wake_function+0}
>      >       <ffffffff8015b3a2>{generic_file_write_nolock+158}
>      > <ffffffff80305780>{__down_failed+53}
>      >       <ffffffffa0236986>{:gfs:.text.lock.dio+95}
>      > <ffffffffa0260e4c>{:gfs:gfs_trans_add_bh+205}
>      >       <ffffffffa0253efc>{:gfs:do_write_buf+1138}
>      > <ffffffffa0252db3>{:gfs:walk_vm+278}
>      >       <ffffffffa0253a8a>{:gfs:do_write_buf+0}
>      > <ffffffffa0253a8a>{:gfs:do_write_buf+0}
>      >       <ffffffffa025415b>{:gfs:__gfs_write+201}
>      > <ffffffff80177c60>{vfs_write+207}
>      >       <ffffffff80177d48>{sys_write+69}
>      <ffffffff801101c6>{system_call+126}
>      >
>      > httpd         D 0000010110ad7d48     0  5895
>      5892          5896  5893
>      > (NOTLB)
>      > 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075
>      >       0000010117002030 0000000000000075 000001000c002940
>      0000000000000001
>      >       00000101170027f0 000000000001300e
>      > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
>      > <ffffffff80304cbd>{wait_for_completion+167}
>      >       <ffffffff801333c8>{default_wake_function+0}
>      > <ffffffff801333c8>{default_wake_function+0}
>      >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
>      > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>      >       <ffffffff80168211>{do_no_page+1003}
>      > <ffffffff80167b13>{do_wp_page+948}
>      >       <ffffffff8016858f>{handle_mm_fault+343}
>      > <ffffffff80142a06>{get_signal_to_deliver+1118}
>      >       <ffffffff801236d2>{do_page_fault+518}
>      > <ffffffff80304a85>{thread_return+0}
>      >       <ffffffff80304add>{thread_return+88}
>      <ffffffff80110c61>{error_exit+0}
>      >
>      >
>      > httpd         D 0000010110b5bd48     0  5896
>      5892          5897  5895
>      > (NOTLB)
>      > 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075
>      >       00000101114787f0 0000000000000075 000001000c002940
>      0000000000000001
>      >       0000010117002030 000000000000fb3e
>      > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
>      > <ffffffff80304cbd>{wait_for_completion+167}
>      >       <ffffffff801333c8>{default_wake_function+0}
>      > <ffffffff801333c8>{default_wake_function+0}
>      >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
>      > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>      >       <ffffffff80168211>{do_no_page+1003}
>      > <ffffffff80167b13>{do_wp_page+948}
>      >       <ffffffff8016858f>{handle_mm_fault+343}
>      > <ffffffff80142a06>{get_signal_to_deliver+1118}
>      >       <ffffffff801236d2>{do_page_fault+518}
>      > <ffffffff802a3445>{sys_accept+327}
>      >       <ffffffff80182e88>{pipe_read+26}
>      <ffffffff80110c61>{error_exit+0}
>      >
>      > httpd         D 0000000000000000     0  5897
>      5892          5911  5896
>      > (NOTLB)
>      > 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075
>      >       0000010117002030 0000000000000075 000001000c00a940
>      000000001b16e030
>      >       00000101114787f0 000000000000fbe0
>      > Call Trace:<ffffffff802496d4>{__generic_unplug_device+19}
>      > <ffffffff80304cbd>{wait_for_completion+167}
>      >       <ffffffff801333c8>{default_wake_function+0}
>      > <ffffffff801333c8>{default_wake_function+0}
>      >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
>      > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>      >       <ffffffff80168211>{do_no_page+1003}
>      > <ffffffff80167b13>{do_wp_page+948}
>      >       <ffffffff8016858f>{handle_mm_fault+343}
>      > <ffffffff80142a06>{get_signal_to_deliver+1118}
>      >       <ffffffff801236d2>{do_page_fault+518}
>      > <ffffffff80304a85>{thread_return+0}
>      >       <ffffffff80304add>{thread_return+88}
>      <ffffffff80110c61>{error_exit+0}
>      >
>      >
>      > httpd         D 00000101100c3d48     0  5911
>      5892          5915  5897
>      > (NOTLB)
>      > 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075
>      >       00000101170027f0 0000000000000075 000001000c002940
>      0000000000000000
>      >       000001011b16e030 000000000000187e
>      > Call Trace:<ffffffff80131d1d>{try_to_wake_up+863}
>      > <ffffffff80304cbd>{wait_for_completion+167}
>      >       <ffffffff801333c8>{default_wake_function+0}
>      > <ffffffff801333c8>{default_wake_function+0}
>      >       <ffffffffa023f4b1>{:gfs:glock_wait_internal+350}
>      > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>      >       <ffffffff80168211>{do_no_page+1003}
>      > <ffffffff80167b13>{do_wp_page+948}
>      >       <ffffffff8016858f>{handle_mm_fault+343}
>      > <ffffffff80142a06>{get_signal_to_deliver+1118}
>      >       <ffffffff801236d2>{do_page_fault+518}
>      > <ffffffff80304a85>{thread_return+0}
>      >       <ffffffff80304add>{thread_return+88}
>      <ffffffff80110c61>{error_exit+0}
>      >
>      >
>      > httpd         D 0000000000006a36     0  5915
>      5892                5911
>      > (NOTLB)
>      > 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791
>      >       0000000000000000 0000000000000000 0000030348ac8c1c
>      0000000114a217f0
>      >       0000010114c997f0 000000000000076a
>      > Call Trace:<ffffffffa020c791>{:dlm:lkb_swqueue+43}
>      > <ffffffff803053ef>{io_schedule+38}
>      >       <ffffffff80178c4c>{__wait_on_buffer+125}
>      > <ffffffff80178ad2>{bh_wake_function+0}
>      >       <ffffffff80178ad2>{bh_wake_function+0}
>      > <ffffffffa02352c6>{:gfs:gfs_dreread+154}
>      >       <ffffffffa0235332>{:gfs:gfs_dread+40}
>      > <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
>      >       <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
>      > <ffffffffa0242461>{:gfs:inode_go_lock+38}
>      >       <ffffffffa023f586>{:gfs:glock_wait_internal+563}
>      > <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      >       <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      > <ffffffffa0258b7b>{:gfs:gfs_private_nopage+84}
>      >       <ffffffff80168211>{do_no_page+1003}
>      > <ffffffff80167b13>{do_wp_page+948}
>      >       <ffffffff8016858f>{handle_mm_fault+343}
>      > <ffffffff80142a06>{get_signal_to_deliver+1118}
>      >       <ffffffff801236d2>{do_page_fault+518}
>      > <ffffffff80304a85>{thread_return+0}
>      >       <ffffffff80304add>{thread_return+88}
>      <ffffffff80110c61>{error_exit+0}
>      >
>      >
>      > sh            D 000000000000001a     0  5930   2547
>      > (NOTLB)
>      > 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00
>      >       0000010111293d88 0000000000000000 00000100dfc02400
>      0000000000010000
>      >       00000101148557f0 0000000000002010
>      > Call Trace:<ffffffff803053ef>{io_schedule+38}
>      > <ffffffff80178c4c>{__wait_on_buffer+125}
>      >       <ffffffff80178ad2>{bh_wake_function+0}
>      > <ffffffff80178ad2>{bh_wake_function+0}
>      >       <ffffffffa02352c6>{:gfs:gfs_dreread+154}
>      > <ffffffffa0235332>{:gfs:gfs_dread+40}
>      >       <ffffffffa02363b1>{:gfs:gfs_get_meta_buffer+201}
>      > <ffffffffa0242999>{:gfs:gfs_copyin_dinode+23}
>      >       <ffffffffa0242461>{:gfs:inode_go_lock+38}
>      > <ffffffffa023f586>{:gfs:glock_wait_internal+563}
>      >       <ffffffffa023fce6>{:gfs:gfs_glock_nq+961}
>      > <ffffffffa023ff11>{:gfs:gfs_glock_nq_init+20}
>      >       <ffffffff801ccb78>{dummy_inode_permission+0}
>      > <ffffffffa0257aca>{:gfs:gfs_permission+64}
>      >       <ffffffff8018d475>{dput+56} <ffffffff80183d32>{permission+51}
>      >       <ffffffff801844aa>{__link_path_walk+372}
>      > <ffffffff801851c2>{link_path_walk+82}
>      >       <ffffffff8012370b>{do_page_fault+575}
>      > <ffffffff801849b0>{__link_path_walk+1658}
>      >       <ffffffff801851c2>{link_path_walk+82}
>      > <ffffffff8012370b>{do_page_fault+575}
>      >       <ffffffff8018540f>{path_lookup+451}
>      > <ffffffff801856bb>{__user_walk+47}
>      >       <ffffffff8017ff1a>{vfs_stat+24}
>      <ffffffff8012370b>{do_page_fault+575}
>      >
>      >       <ffffffff80180264>{sys_newstat+17}
>      <ffffffff80110c61>{error_exit+0}
>      >       <ffffffff801101c6>{system_call+126}
> 
>    --
>    Best Regards,
>    Anton Kornev.
> 
> References
> 
>    Visible links
>    1. mailto:teigland at redhat.com


From rpeterso at redhat.com  Thu Jun 29 15:34:44 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 29 Jun 2006 10:34:44 -0500
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <000901c69b8a$94f95c40$beec14c0$@com>
References: <000001c69ad9$a462d8c0$ed288a40$@com>	<44A2E6C5.5080802@redhat.com>	<000f01c69b38$42d95110$c88bf330$@com>	<44A3E56C.6010001@redhat.com>
	<000901c69b8a$94f95c40$beec14c0$@com>
Message-ID: <44A3F314.7030203@redhat.com>

RR wrote:
> Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign
> requested address
Hi RR,

These messages means that svr01 tried to send a broadcast/multicast 
message to
the socket, but the underlying communications layer returned an error.
Perhaps you can try it without the line:

<multicast addr="224.0.0.1" interface="bond0"/>

In your cluster.conf.

Regards,

Bob Peterson
Red Hat Cluster Suite


From djkast at gmail.com  Thu Jun 29 15:42:19 2006
From: djkast at gmail.com (DJ-Kast .)
Date: Thu, 29 Jun 2006 11:42:19 -0400
Subject: [Linux-cluster] I am getting the following error
Message-ID: <bed25d6e0606290842n5455e0c3h4b5a79ecf45abe50@mail.gmail.com>

Jun 29 11:36:06 jtest1 clurgmgrd[5217]: <notice> Resource Group Manager
Starting
Jun 29 11:36:06 jtest1 clurgmgrd[5217]: <info> Loading Service Data
Jun 29 11:36:06 jtest1 clurgmgrd[5217]: <info> Initializing Services
Jun 29 11:36:06 jtest1 clurgmgrd[5217]: <info> Services Initialized
Jun 29 11:37:17 jtest1 clurgmgrd[5217]: <info> Logged in SG "usrm::manager"
Jun 29 11:37:17 jtest1 clurgmgrd[5217]: <info> Magma Event: Membership
Change
Jun 29 11:37:17 jtest1 clurgmgrd[5217]: <info> State change: Local
UP
Jun 29 11:37:47 jtest1 clurgmgrd[5217]: <warning> Node ID:0000000000000001
stuck with lock usrm::rg="MountME"
Jun 29 11:38:47 jtest1 last message repeated 2 times
Jun 29 11:39:13 jtest1 clurgmgrd[5217]: <info> State change: vps3 UP
Jun 29 11:39:13 jtest1 clurgmgrd[5217]: <info> State change: vps1 UP

MountME is the name of my service

The resources include IP and NFS mount

I have 3 nodes..

If node 1 has ownership and is running the NFS Mount

and i issue a umount...

No other nodes recover
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060629/3fd16d2a/attachment.htm>

From DERRICK.BEERY at iowa.gov  Thu Jun 29 16:28:13 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Thu, 29 Jun 2006 11:28:13 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14CC@iowadsmex04.iowa.gov.state.ia.us>

Any ideas on this one?

gnbd_export.c:26:25: error: sysfs/dlist.h: No such file or directory
gnbd_export.c:27:28: error: sysfs/libsysfs.h: No such file or directory
gnbd_export.c: In function $(B!F(Bget_sysfs_name

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Frank Hellmann
Sent: Thursday, June 29, 2006 2:18 AM
To: linux clustering
Subject: Re: [Linux-cluster] cluster-1.02.00 make errors

Hi Derrick,

Make sure you build it this way:

  $ ./configure --kernel_src=/path/to/linux-2.6.x
  $ make install

the usual make; make install won't work for various reasons. Just do a
make install.

       Cheers,
         
                Frank...

Ryan O'Hara wrote:

>
> Derrick,
>
> I was able to build the cluster-1.02.00 code against the vanilla
> 2.6.16 kernel. Did you run the configure script with the --kernel_src
> option to point to the correct kernel tree?
>
> ./configure --kernel_src=/path/to/kernel
>
> Ryan
>
>
>
> Beery, Derrick [DAS] wrote:
>
>> Make against 2.6.16 vanilla kernel is failing with the following
errors:
>>
>>  
>>
>>  CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_get':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_set':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_remove':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too
>> few arguments to function `permission'
>>
>> make[5]: ***
>> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1
>>
>> make[4]: ***
>> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2
>>
>>  
>>
>> Can anyone shed some light on this?
>>
>>
>> Thanks in advance!
>>
>>  
>>
>> Derrick Beery
>>
>> DAS ITE State of Iowa
>>
>>
>>
------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
------------------------------------------------------------------------
--
Frank Hellmann          Optical Art GmbH           Waterloohain 7a
DI Supervisor           http://www.opticalart.de   22769 Hamburg
frank at opticalart.de     Tel: ++49 40 5111051       Fax: ++49 40 43169199


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From ranjtech at gmail.com  Thu Jun 29 16:28:13 2006
From: ranjtech at gmail.com (RR)
Date: Fri, 30 Jun 2006 02:28:13 +1000
Subject: [Linux-cluster] two node cluster not coming up
In-Reply-To: <44A3F314.7030203@redhat.com>
References: <000001c69ad9$a462d8c0$ed288a40$@com>	<44A2E6C5.5080802@redhat.com>	<000f01c69b38$42d95110$c88bf330$@com>	<44A3E56C.6010001@redhat.com>	<000901c69b8a$94f95c40$beec14c0$@com>
	<44A3F314.7030203@redhat.com>
Message-ID: <001601c69b99$06c98710$145c9530$@com>

Bob, mate, you've done it. Obv. Had to be a network related issue and I
should've thought of it since I was getting nada on svr00 when I tried
capturing packets from svr01. Didn't know what parameter to change. Needed
to reboot the machine and manually start the services but it's now happy I
think. I'm pretty sure the bonded Ethernet interfaces support multicast. Not
sure why it's getting rejected. My iptables and selinux are both disabled by
default during install. BTW, What's the consequence of my removing the
multicast address? Will it have a consequence in using DLM? Does it do
Broadcast by default? 

Ok, so guess I'll move down the list of steps to get this moving with
resources and services. I'm assuming I can have an active-active two node
cluster? 

Thanks again
\R

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson
Sent: Friday, June 30, 2006 1:35 AM
To: linux clustering
Subject: Re: [Linux-cluster] two node cluster not coming up

RR wrote:
> Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign
> requested address
Hi RR,

These messages means that svr01 tried to send a broadcast/multicast 
message to
the socket, but the underlying communications layer returned an error.
Perhaps you can try it without the line:

<multicast addr="224.0.0.1" interface="bond0"/>

In your cluster.conf.

Regards,

Bob Peterson
Red Hat Cluster Suite

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From DERRICK.BEERY at iowa.gov  Thu Jun 29 18:26:53 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Thu, 29 Jun 2006 13:26:53 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14CD@iowadsmex04.iowa.gov.state.ia.us>

Looks like it just needed sysfsutils-devel.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick
[DAS]
Sent: Thursday, June 29, 2006 11:28 AM
To: linux clustering
Subject: RE: [Linux-cluster] cluster-1.02.00 make errors

Any ideas on this one?

gnbd_export.c:26:25: error: sysfs/dlist.h: No such file or directory
gnbd_export.c:27:28: error: sysfs/libsysfs.h: No such file or directory
gnbd_export.c: In function $(B!F(Bget_sysfs_name

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Frank Hellmann
Sent: Thursday, June 29, 2006 2:18 AM
To: linux clustering
Subject: Re: [Linux-cluster] cluster-1.02.00 make errors

Hi Derrick,

Make sure you build it this way:

  $ ./configure --kernel_src=/path/to/linux-2.6.x
  $ make install

the usual make; make install won't work for various reasons. Just do a
make install.

       Cheers,
         
                Frank...

Ryan O'Hara wrote:

>
> Derrick,
>
> I was able to build the cluster-1.02.00 code against the vanilla
> 2.6.16 kernel. Did you run the configure script with the --kernel_src
> option to point to the correct kernel tree?
>
> ./configure --kernel_src=/path/to/kernel
>
> Ryan
>
>
>
> Beery, Derrick [DAS] wrote:
>
>> Make against 2.6.16 vanilla kernel is failing with the following
errors:
>>
>>  
>>
>>  CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_get':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_set':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_remove':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too
>> few arguments to function `permission'
>>
>> make[5]: ***
>> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1
>>
>> make[4]: ***
>> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2
>>
>>  
>>
>> Can anyone shed some light on this?
>>
>>
>> Thanks in advance!
>>
>>  
>>
>> Derrick Beery
>>
>> DAS ITE State of Iowa
>>
>>
>>
------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
------------------------------------------------------------------------
--
Frank Hellmann          Optical Art GmbH           Waterloohain 7a
DI Supervisor           http://www.opticalart.de   22769 Hamburg
frank at opticalart.de     Tel: ++49 40 5111051       Fax: ++49 40 43169199


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From DERRICK.BEERY at iowa.gov  Thu Jun 29 18:36:37 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Thu, 29 Jun 2006 13:36:37 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14CE@iowadsmex04.iowa.gov.state.ia.us>

It also seems that cluster-1.02.00 cannot be built against a kernel
including OpenVZ for some reason. Any ideas why this would be? 

Thanks,
Derrick

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick
[DAS]
Sent: Thursday, June 29, 2006 1:27 PM
To: linux clustering
Subject: RE: [Linux-cluster] cluster-1.02.00 make errors

Looks like it just needed sysfsutils-devel.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick
[DAS]
Sent: Thursday, June 29, 2006 11:28 AM
To: linux clustering
Subject: RE: [Linux-cluster] cluster-1.02.00 make errors

Any ideas on this one?

gnbd_export.c:26:25: error: sysfs/dlist.h: No such file or directory
gnbd_export.c:27:28: error: sysfs/libsysfs.h: No such file or directory
gnbd_export.c: In function $(B!F(Bget_sysfs_name

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Frank Hellmann
Sent: Thursday, June 29, 2006 2:18 AM
To: linux clustering
Subject: Re: [Linux-cluster] cluster-1.02.00 make errors

Hi Derrick,

Make sure you build it this way:

  $ ./configure --kernel_src=/path/to/linux-2.6.x
  $ make install

the usual make; make install won't work for various reasons. Just do a
make install.

       Cheers,
         
                Frank...

Ryan O'Hara wrote:

>
> Derrick,
>
> I was able to build the cluster-1.02.00 code against the vanilla
> 2.6.16 kernel. Did you run the configure script with the --kernel_src
> option to point to the correct kernel tree?
>
> ./configure --kernel_src=/path/to/kernel
>
> Ryan
>
>
>
> Beery, Derrick [DAS] wrote:
>
>> Make against 2.6.16 vanilla kernel is failing with the following
errors:
>>
>>  
>>
>>  CC [M]  /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_get':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_set':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few
>> arguments to function `permission'
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In
>> function `user_eo_remove':
>>
>> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too
>> few arguments to function `permission'
>>
>> make[5]: ***
>> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1
>>
>> make[4]: ***
>> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2
>>
>>  
>>
>> Can anyone shed some light on this?
>>
>>
>> Thanks in advance!
>>
>>  
>>
>> Derrick Beery
>>
>> DAS ITE State of Iowa
>>
>>
>>
------------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
------------------------------------------------------------------------
--
Frank Hellmann          Optical Art GmbH           Waterloohain 7a
DI Supervisor           http://www.opticalart.de   22769 Hamburg
frank at opticalart.de     Tel: ++49 40 5111051       Fax: ++49 40 43169199


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


From ugo.parsi at gmail.com  Thu Jun 29 22:53:41 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Fri, 30 Jun 2006 00:53:41 +0200
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <44A3DD94.3070001@redhat.com>
References: <f29fd8170606290305i27f60ab7h45845cacf50f84d7@mail.gmail.com>
	<44A3A790.8090206@redhat.com>
	<f29fd8170606290339o1ed8d99chf74cbd2c76844cc8@mail.gmail.com>
	<44A3C16E.6050603@redhat.com>
	<f29fd8170606290612ubb78faarc65e18aeb9d323b3@mail.gmail.com>
	<44A3DD94.3070001@redhat.com>
Message-ID: <f29fd8170606291553k33a27576j714d45691151d195@mail.gmail.com>

>
> clvmd should log errors to syslog. Check that the gulm cluster is quorate as
> clvmd won't do anything without a quorate cluster. You might also like to run
> it with -d and see if any errors appear on stderr.
>

Yes you're right, thanks, if clvmd & lvm are stuck this is because
gulm is inquorate and simply doesn't work at all....

But I still can't figure out on how making it work, I've spent a lot
of hours on it now, and all of my problems seems to be
IPv4/IPv6/hostname related (I guess)....

To ease the configuration and the trial process, I've reduced my
cluster.conf to the simplest case (I guess) : just 2 client nodes and
1 master gulm one. All of them are working on the 10.x.x.x private
IPv4 subnet.

Again I don't know if I can trust the 'documentation' or not, since it
is written that gulm is working on IPv6 sockets only and on the man
pages (man lock_gulmd) it seems that both IPv4 and IPv6 sockets are
handled by GULM.

My first problem is this one :

venus:/etc/init.d# lock_gulmd --use_ccs
Warning! You didn't specify a cluster name before --use_ccs
  Letting ccsd choose which cluster we belong to.
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0
venus:/etc/init.d# I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list.
You specified 0

Apparently, GULM forces some kind of ~IPv6-translated-IPv4~ address
that it can't find anywhere on the system.

Here's my /etc/hosts :

----------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
10.1.1.5        venus

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
------------------------------------------

And here's my cluster.conf
-----------------------------------------------------------
venus:/etc/init.d# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="iliona" config_version="1">

<gulm>
        <lockserver name="10.1.1.5"/>
</gulm>

<clusternodes>

<clusternode name="mars">
        <fence>
                <method name="single">
                        <device name="human" ipaddr="10.1.1.3"/>
                </method>
        </fence>
</clusternode>

<clusternode name="venus">
        <fence>
                <method name="single">
                        <device name="human" ipaddr="10.1.1.5"/>
                </method>
        </fence>
</clusternode>

<clusternode name="triton">
        <fence>
                <method name="single">
                        <device name="human" ipaddr="10.1.1.6"/>
                </method>
        </fence>
</clusternode>

</clusternodes>

<fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
</fencedevices>

</cluster>
---------------------------------------------------------

So in order to force it's host-matching process, I've modified my
/etc/hosts according to that :

------------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
10.1.1.5        venus
::ffff:10.1.1.5 venus

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
-------------------------------------------


With that one, it *SEEMED* to work (it's not printing messages at
runtime anymore and silently forks the daemon), but still on the logs
:

It seems it cannot find his IP or host again : Jun 30 00:19:03 venus
lock_gulmd_LTPX[26223]: I am (venus) with ip (::)  :(

Here's the whole part :

Jun 30 00:19:01 venus lock_gulmd_main[26211]: Forked lock_gulmd_core.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: Starting lock_gulmd_core
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
 All rights reserved.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am running in Standard mode.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am (venus) with ip (::)
Jun 30 00:19:01 venus lock_gulmd_core[26215]: This is cluster iliona
Jun 30 00:19:01 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198
::1 idx:1 fd:6)
Jun 30 00:19:02 venus lock_gulmd_main[26211]: Forked lock_gulmd_LT.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: Starting lock_gulmd_LT
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
 All rights reserved.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am running in Standard mode.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am (venus) with ip (::)
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: This is cluster iliona
Jun 30 00:19:02 venus lock_gulmd_LT000[26219]: Not serving locks from
this node.
Jun 30 00:19:02 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198
::1 idx:1 fd:6)
Jun 30 00:19:03 venus lock_gulmd_main[26211]: Forked lock_gulmd_LTPX.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: Starting lock_gulmd_LTPX
1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc.
 All rights reserved.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am running in Standard mode.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am (venus) with ip (::)
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: This is cluster iliona
Jun 30 00:19:03 venus ccsd[26197]: Connected to cluster infrastruture
via: GuLM Plugin v1.0.4
Jun 30 00:19:03 venus ccsd[26197]: Initial status:: Inquorate

And indeed it's not acting as a 'Server/Master' but as a 'Client' too :

venus:/etc/init.d# gulm_tool getstats venus
I_am = Client
quorum_has = 1
quorum_needs = 1
rank = -1
quorate = false
GenerationID = 0
run time = 128
pid = 27456
verbosity = Default
failover = disabled
venus:/etc/init.d#

Of course the other 2 nodes are acting the same way, and with no
master, the cluster is always in inquorate/unusable state, hence my
problems with clvmd/lvm.

I've tried many other things, like putting the names inside
cluster.conf (with the names inside /etc/hosts or DNS-based only,
etc..) instead of IPs, etc.... but still the same error.

I am getting really confused by the whole system and the lack of
documentation is really painful for me to find my mistakes as a
cluster-suite beginner :/

If you have any ideas :),

Thanks a lot,

Ugo PARSI

-- 
An apple a day, keeps the doctor away


From jason at monsterjam.org  Fri Jun 30 22:50:10 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 30 Jun 2006 18:50:10 -0400
Subject: [Linux-cluster] newbie questions
Message-ID: <20060630225010.GA3972@monsterjam.org>

so I have a 2 node cluster im setting up
and im trying to use the /usr/bin/system-config-cluster

im setting up my nodes, and setting up fencing...
and for my AP7900, ive got box1 plugged into ports 1,2
and box 2 plugged into ports 3,4..

first question, I dont see how to set up multiple fence ports for each box.

second question, what the heck does it want on the 
"edit fence properties", it says port, and I understand that, but when it asks for 
switch, what does it want there? seems to want a number, but it takes an ip address as well.
not sure.

regards,
Jason