From wendland at scan-plus.de Thu Jul 1 01:00:48 2004 From: wendland at scan-plus.de (Joerg Wendland) Date: Thu, 1 Jul 2004 03:00:48 +0200 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c Message-ID: <20040701010048.GC25028@dozer> Hi, I am running the cluster package rev. 0406282100 using a custom 2.6.7 kernel (vanilla) on three VMware ESX virtual machines sharing one SCSI disc with GFS on a 10GB logical volume. Setup as proposed in doc/usage.txt, mkfs, mount and first tests all went fine until the first machine crashed after 20 minutes with the following assertion (messages wrapped): lock_dlm: Assertion failed on line 363 of file fs/gfs_locking/lock_dlm/lock.c lock_dlm: assertion: "!error" lock_dlm: time = 2482179 testfs: num=2,178 err=-22 cur=-1 req=5 lkf=0 Kernel panic: lock_dlm: Record message above and reboot. The other two machines were still running, although any process accessing the GFS mountpoint would block indefinitely, with the effect that the whole cluster was torn down. Kind regards, Joerg -- | Entwickler Elektronische Datenverarbeitung und Dienstbetriebsmittel | | Scan-Plus GmbH Dienstbetriebsmittelherstellung fon +49-731-92013-0 | | Koenigstrasse 78, 89077 Ulm, Germany fax +49-731-92013-290 | | Geschaeftsfuehrer: Juergen Hoermann HRB 3220 Amtsgericht Ulm | | PGP-key: 51CF8417 (FP: 79C0 7671 AFC7 315E 657A F318 57A3 7FBD 51CF 8417) | From teigland at redhat.com Thu Jul 1 03:41:55 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 1 Jul 2004 11:41:55 +0800 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: References: Message-ID: <20040701034155.GB11996@redhat.com> On Wed, Jun 30, 2004 at 04:07:57PM -0400, Mark Neal wrote: > is there a place to get more in-depth documentation on GFS than just the > usage.txt file? The new cluster infrastructure Ken alluded to (and some information on how GFS uses it) is documented in the "Symmetric Cluster Architecture" paper, which is a work in progress. http://people.redhat.com/~teigland/sca.pdf -- Dave Teigland From teigland at redhat.com Thu Jul 1 03:48:59 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 1 Jul 2004 11:48:59 +0800 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040701010048.GC25028@dozer> References: <20040701010048.GC25028@dozer> Message-ID: <20040701034859.GC11996@redhat.com> On Thu, Jul 01, 2004 at 03:00:48AM +0200, Joerg Wendland wrote: > Hi, > > I am running the cluster package rev. 0406282100 using a custom 2.6.7 > kernel (vanilla) on three VMware ESX virtual machines sharing one SCSI > disc with GFS on a 10GB logical volume.
Setup as proposed in doc/usage.txt, > mkfs, mount and first tests all went fine until the first machine crashed > after 20 minutes with the following assertion (messages wrapped): > > lock_dlm: Assertion failed on line 363 of file > fs/gfs_locking/lock_dlm/lock.c > lock_dlm: assertion: "!error" > lock_dlm: time = 2482179 > testfs: num=2,178 err=-22 cur=-1 req=5 lkf=0 > > Kernel panic: lock_dlm: Record message above and reboot. This is a bug we know of and are working on right now. > The other two machines were still running although any process accessing > the GFS mountpoint would block infinitely with the effect that the whole > cluster is torn down. You're using manual fencing so I suspect the remaining nodes are waiting for you to verify the node is dead and then run "fence_ack_manual" on the node that's running fence_manual (look in /var/log/messages for the relevant message on that machine.) -- Dave Teigland From dice at mfa.kfki.hu Thu Jul 1 04:28:16 2004 From: dice at mfa.kfki.hu (Gergely Tamas) Date: Thu, 1 Jul 2004 06:28:16 +0200 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: <20040630200604.GA26510@potassium.msp.redhat.com> References: <4diduga40c0c63s.300620041051@mail.nextresponse.com> <20040630200604.GA26510@potassium.msp.redhat.com> Message-ID: <20040701042816.GA361@mfa.kfki.hu> Hi! > The most simple setup is to have one machine with a big pile of IDE disks > export thost disks to an IP network with GNBD (or iSCSI). Does anyone know a reliable iSCSI (server side) software implementation? Thanks in advance, Gergely From tom at regio.net Fri Jul 2 08:30:49 2004 From: tom at regio.net (tom at regio.net) Date: Fri, 2 Jul 2004 10:30:49 +0200 Subject: [Linux-cluster] Problems with gnbd Message-ID: Hi all, i have a little problem with gnbd_import : if i start gnbd_import the following error appears : gnbd_import gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory Anyone have a idea? -tom From Gareth at Linux.co.uk Fri Jul 2 08:41:46 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 09:41:46 +0100 Subject: [Linux-cluster] Compiling GFS .. Message-ID: <1088757706.721.198.camel@squizzey> Hi, I'm trying to compile the current /cluster cvs against 2.6.7 and get the following error .. anyone any idea what I'm doing wrong ? (cd /custer && ./configure && make) tia Gareth. make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' rm -f cluster service.h cnxman.h cnxman-socket.h ln -s . cluster ln -s //usr/include/cluster/service.h . ln -s //usr/include/cluster/cnxman.h . ln -s //usr/include/cluster/cnxman-socket.h . make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules USING_KBUILD=yes make[3]: Entering directory `/usr/src/linux-2.6.7' CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. h: No such file or directory make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 make[3]: Leaving directory `/usr/src/linux-2.6.7' make[2]: *** [all] Error 2 make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' make: *** [all] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erik at debian.franken.de Fri Jul 2 09:10:35 2004 From: erik at debian.franken.de (Erik Tews) Date: Fri, 02 Jul 2004 11:10:35 +0200 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: References: Message-ID: <1088759434.6929.4.camel@localhost> On Fri, 02.07.2004, at 10:30, tom at regio.net wrote: > gnbd_import > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory First idea: do you have sysfs mounted? From tom at regio.net Fri Jul 2 09:35:24 2004 From: tom at regio.net (tom at regio.net) Date: Fri, 2 Jul 2004 11:35:24 +0200 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: <1088759434.6929.4.camel@localhost> Message-ID: Hi, I think the problem is /sys/class/gnbd/gnbd0/name -- I don't have this path/device or whatever it is ;) I just have /dev/gnbd and /dev/gnbd_ctl -tom On 02.07.2004 11:10, Erik Tews wrote (Re: [Linux-cluster] Problems with gnbd): On Fri, 02.07.2004, at 10:30, tom at regio.net wrote: > gnbd_import > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory First idea: do you have sysfs mounted? -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From rmayhew at mweb.com Fri Jul 2 13:18:46 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Fri, 2 Jul 2004 15:18:46 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Hi All, I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade from 2.4.21-15.02 to be able to install the GFS RPMS). I built and installed the supplied ES 3.0 RPMs, but when it comes to doing the depmod -a I end up with this. #depmod -a Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o Does anyone have any pointers? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za From danderso at redhat.com Fri Jul 2 13:35:43 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 08:35:43 -0500 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <1088757706.721.198.camel@squizzey> References: <1088757706.721.198.camel@squizzey> Message-ID: <200407020835.43026.danderso@redhat.com> Gareth: (cd cluster && ./configure && make install) See if that works. On Friday 02 July 2004 03:41, Gareth Bult wrote: > Hi, > > I'm trying to compile the current /cluster cvs against 2.6.7 and get the > following error .. anyone any idea what I'm doing wrong ? > (cd /custer && ./configure && make) > > tia > Gareth.
> > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > rm -f cluster service.h cnxman.h cnxman-socket.h > ln -s . cluster > ln -s //usr/include/cluster/service.h . > ln -s //usr/include/cluster/cnxman.h . > ln -s //usr/include/cluster/cnxman-socket.h . > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > USING_KBUILD=yes > make[3]: Entering directory `/usr/src/linux-2.6.7' > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. > h: No such file or directory > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > make[3]: Leaving directory `/usr/src/linux-2.6.7' > make[2]: *** [all] Error 2 > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > make: *** [all] Error 2 From danderso at redhat.com Fri Jul 2 13:38:46 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 08:38:46 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Message-ID: <200407020838.46175.danderso@redhat.com> Richard: Which hardware architecture are your machines? Assuming Intel x86, make sure you use the i686.rpms instead of the i386.rpms. On Friday 02 July 2004 08:18, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From bdcneal at budget.state.ny.us Fri Jul 2 13:34:59 2004 From: bdcneal at budget.state.ny.us (Mark Neal) Date: Fri, 02 Jul 2004 09:34:59 -0400 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: one way to avoid this is to: 1) grab the kernel source (kernel-source-2.4.21-15.0.2.EL) 2) apply the patches that come in the gfs rpm (GFS-6.0.0-1.2.TL1.src.rpm) 3) compile with your current config file (make sure to do a make oldconfig to be safe) Mark Neal System Administrator - Web Services NYS Division of Budget (518) 402-4181 >>> rmayhew at mweb.com 07/02/04 09:19 AM >>> Hi All, I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade from 2.4.21-15.02 to be able to install the GFS RPMS). 
I build and installed the supplied ES 3.0 RPMS, but when it comes to doing the depmod -a I end up with this. #depmod -a Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o Does any one have any pointers? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From amanthei at redhat.com Fri Jul 2 13:36:48 2004 From: amanthei at redhat.com (Adam Manthei) Date: Fri, 2 Jul 2004 08:36:48 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Message-ID: <20040702133648.GC23240@redhat.com> On Fri, Jul 02, 2004 at 03:18:46PM +0200, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? Make sure the kernel versions and architectures match. For example, if your kernel is i686 SMP, then make sure you have i686 SMP gfs modules too. > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei From rmayhew at mweb.com Fri Jul 2 13:44:26 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Fri, 2 Jul 2004 15:44:26 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2833@mwjdc2.mweb.com> Hi Thanks for the quick response. I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 SAN. I grabbed the 3 RPMS from ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ Should I be downloading them for another source? Is there support yet for the latest ES 3.0 kernel? After rebuilding these RPMS' I only end up with the following. 
GFS-6.0.0-1.2.i386.rpm GFS-6.0.0-1.2.src.rpm GFS-debuginfo-6.0.0-1.2.i386.rpm GFS-devel-6.0.0-1.2.i386.rpm GFS-modules-6.0.0-1.2.i386.rpm perl-Net-Telnet-3.03-2.noarch.rpm perl-Net-Telnet-3.03-2.src.rpm rh-gfs-en-6.0-4.noarch.rpm rh-gfs-en-6.0-4.src.rpm This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm Thanks. -----Original Message----- From: Derek Anderson [mailto:danderso at redhat.com] Sent: 02 July 2004 03:39 PM To: Discussion of clustering software components including GFS; Richard Mayhew Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard: Which hardware architecture are your machines? Assuming Intel x86, make sure you use the i686.rpms instead of the i386.rpms. On Friday 02 July 2004 08:18, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From Gareth at Linux.co.uk Fri Jul 2 13:59:57 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 14:59:57 +0100 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <200407020835.43026.danderso@redhat.com> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> Message-ID: <1088776797.724.202.camel@squizzey> :) very funny. I've found by creating /usr/include/cluster and copying in a few headers I managed to make it build .. still experimenting ... Gareth. On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > Gareth: > > (cd cluster && ./configure && make install) > > See if that works. > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > Hi, > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get the > > following error .. anyone any idea what I'm doing wrong ? > > (cd /custer && ./configure && make) > > > > tia > > Gareth. > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > rm -f cluster service.h cnxman.h cnxman-socket.h > > ln -s . cluster > > ln -s //usr/include/cluster/service.h . > > ln -s //usr/include/cluster/cnxman.h . > > ln -s //usr/include/cluster/cnxman-socket.h . > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > USING_KBUILD=yes > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. 
> > h: No such file or directory > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > make[2]: *** [all] Error 2 > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > make[1]: *** [all] Error 2 > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > make: *** [all] Error 2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From danderso at redhat.com Fri Jul 2 14:42:20 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 09:42:20 -0500 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <1088776797.724.202.camel@squizzey> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> <1088776797.724.202.camel@squizzey> Message-ID: <200407020942.20781.danderso@redhat.com> No, seriously. The 'make install' target should work without moving header files around. On Friday 02 July 2004 08:59, Gareth Bult wrote: > :) very funny. > > I've found by creating /usr/include/cluster and copying in a few headers > I managed to make it build .. still experimenting ... > > Gareth. > > On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > > Gareth: > > > > (cd cluster && ./configure && make install) > > > > See if that works. > > > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > > Hi, > > > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get > > > the following error .. anyone any idea what I'm doing wrong ? > > > (cd /custer && ./configure && make) > > > > > > tia > > > Gareth. > > > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > > rm -f cluster service.h cnxman.h cnxman-socket.h > > > ln -s . cluster > > > ln -s //usr/include/cluster/service.h . > > > ln -s //usr/include/cluster/cnxman.h . > > > ln -s //usr/include/cluster/cnxman-socket.h . > > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > > USING_KBUILD=yes > > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. 
> > > h: No such file or directory > > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > > make[2]: *** [all] Error 2 > > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > > make[1]: *** [all] Error 2 > > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > > make: *** [all] Error 2 From bmarzins at redhat.com Fri Jul 2 15:28:08 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 2 Jul 2004 10:28:08 -0500 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: <20040701042816.GA361@mfa.kfki.hu> References: <4diduga40c0c63s.300620041051@mail.nextresponse.com> <20040630200604.GA26510@potassium.msp.redhat.com> <20040701042816.GA361@mfa.kfki.hu> Message-ID: <20040702152808.GB27303@phlogiston.msp.redhat.com> On Thu, Jul 01, 2004 at 06:28:16AM +0200, Gergely Tamas wrote: > Hi! > > > The most simple setup is to have one machine with a big pile of IDE disks > > export thost disks to an IP network with GNBD (or iSCSI). > > Does anyone know a reliable iSCSI (server side) software implementation? > > Thanks in advance, > Gergely There are two that I've tried. http://www.ardistech.com/iscsi/ and http://sourceforge.net/projects/unh-iscsi/ both seem to work reasonably well. The ArdisTech target is only for 2.4 but has a more reasonable UI The UNH target works better, as far as I can tell, and works on 2.6, but it was obviously designed to test the UNH iscsi initiator. The UI leaves tons to be desired. -Ben > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From bmarzins at redhat.com Fri Jul 2 15:35:02 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 2 Jul 2004 10:35:02 -0500 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: References: <1088759434.6929.4.camel@localhost> Message-ID: <20040702153502.GC27303@phlogiston.msp.redhat.com> On Fri, Jul 02, 2004 at 11:35:24AM +0200, tom at regio.net wrote: > > > > > Hi, > > in think the problem ist /sys/class/gnbd/gnbd0/name > > i dont have this path/devive or what ever it is ;) > > i just have /dev/gnbd and /dev/gnbd_ctl > > -tom That is definitely the problem. That is a sysfs file, and you probably don't have sysfs mounted. run the command: # mount -t sysfs sysfs /sys For more sysfs info, see Documentation/filesystems/sysfs.txt in your kernel directory. -Ben > > > > > Erik Tews > ken.de> To > Sent by: Discussion of clustering software > linux-cluster-bou components including GFS > nces at redhat.com > cc > > 02.07.2004 11:10 Subject > Re: [Linux-cluster] Problems with > gnbd > Please respond to > Discussion of > clustering > software > components > including GFS > dhat.com> > > > > > > > Am Fr, den 02.07.2004 schrieb tom at regio.net um 10:30: > > gnbd_import > > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > > file or directory > > First idea, do you got sysfs mounted? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From Gareth at Linux.co.uk Fri Jul 2 15:38:48 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 16:38:48 +0100 Subject: [Linux-cluster] Compiling GFS .. 
In-Reply-To: <200407020942.20781.danderso@redhat.com> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> <1088776797.724.202.camel@squizzey> <200407020942.20781.danderso@redhat.com> Message-ID: <1088782728.721.213.camel@squizzey> Mmm.. I did try various combinations and I thought I'd tried that .. I'll try another box to confirm as soon as I can get the first one to do something .. Regards, Gareth. On Fri, 2004-07-02 at 09:42 -0500, Derek Anderson wrote: > No, seriously. The 'make install' target should work without moving header > files around. > > On Friday 02 July 2004 08:59, Gareth Bult wrote: > > :) very funny. > > > > I've found by creating /usr/include/cluster and copying in a few headers > > I managed to make it build .. still experimenting ... > > > > Gareth. > > > > On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > > > Gareth: > > > > > > (cd cluster && ./configure && make install) > > > > > > See if that works. > > > > > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > > > Hi, > > > > > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get > > > > the following error .. anyone any idea what I'm doing wrong ? > > > > (cd /custer && ./configure && make) > > > > > > > > tia > > > > Gareth. > > > > > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > > > rm -f cluster service.h cnxman.h cnxman-socket.h > > > > ln -s . cluster > > > > ln -s //usr/include/cluster/service.h . > > > > ln -s //usr/include/cluster/cnxman.h . > > > > ln -s //usr/include/cluster/cnxman-socket.h . > > > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > > > USING_KBUILD=yes > > > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. > > > > h: No such file or directory > > > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > > > make[2]: *** [all] Error 2 > > > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > > > make[1]: *** [all] Error 2 > > > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > > > make: *** [all] Error 2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jul 2 18:53:35 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 02 Jul 2004 14:53:35 -0400 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) Message-ID: <1088794415.25468.8.camel@atlantis.boston.redhat.com> Place in /etc and /etc/cluster; salt to taste. You *need* fencing, even if it's just fence-manual. -- Lon From Gareth at Linux.co.uk Fri Jul 2 19:07:05 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 20:07:05 +0100 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088794415.25468.8.camel@atlantis.boston.redhat.com> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> Message-ID: <1088795225.721.218.camel@squizzey> Tvm. Is there any documentation on this anywhere ... ? tia Gareth. 
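Only a couple of attribute fragments of the XML Lon attached survive in the quote below, so as a rough sketch only -- every element, attribute and value here is a placeholder based on the cluster.conf conventions discussed on this list, not a reconstruction of the original attachment -- a minimal one-node cluster.xml with manual fencing might look something like this:

    <?xml version="1.0"?>
    <!-- hypothetical minimal one-node configuration; all names are placeholders -->
    <cluster name="alpha" config_version="1">
        <cman expected_votes="1"/>
        <clusternodes>
            <clusternode name="node1" votes="1">
                <fence>
                    <method name="single">
                        <!-- the parameter fence_manual expects here is a guess -->
                        <device name="human" nodename="node1"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <fencedevices>
            <fencedevice name="human" agent="fence_manual"/>
        </fencedevices>
    </cluster>

With a single node holding one vote, expected_votes="1" keeps the cluster quorate, and fence_manual covers the "you *need* fencing" requirement without any fencing hardware.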
On Fri, 2004-07-02 at 14:53 -0400, Lon Hohberger wrote: > Place in /etc and /etc/cluster; salt to taste. You *need* fencing, even > if it's just fence-manual. > > -- Lon > > > > > > > > > > > > > > > > > > > > > > > login="apc" password="apc"/> > password="wti"/> > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jul 2 20:12:15 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 02 Jul 2004 16:12:15 -0400 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088795225.721.218.camel@squizzey> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> <1088795225.721.218.camel@squizzey> Message-ID: <1088799135.25468.13.camel@atlantis.boston.redhat.com> On Fri, 2004-07-02 at 20:07 +0100, Gareth Bult wrote: > Tvm. > > Is there any documentation on this anywhere ... ? Not a lot. The format is still subject to change to some degree. Eventually, there will be a GUI app for configuring it, so you won't have to memorize the XML tags ;) -- Lon From lists at wikidev.net Sat Jul 3 00:15:56 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Sat, 03 Jul 2004 02:15:56 +0200 Subject: [Linux-cluster] ccsd hanging after start on debian unstable Message-ID: <1088813756.2249.15.camel@venus> Hello, i have some problems with ccsd on debian unstable- it hangs after starting and eats 100% cpu. A normal kill is enough to stop it again. I've done a strace of ccsd -n, full log at http://dl.aulinx.de/gfs/ccsd.strace. I'm willing to investigate the cause for this, any pointers appreciated- here or in #linux-cluster (nick gwicke). Thanks -- Gabriel Wicke From jeff at intersystems.com Sat Jul 3 14:33:56 2004 From: jeff at intersystems.com (Jeff) Date: Sat, 3 Jul 2004 10:33:56 -0400 Subject: [Linux-cluster] Some GDLM questions Message-ID: <104121513.20040703103356@intersystems.com> These are from reviewing http://people.redhat.com/~teigland/sca.pdf and the CVS copy of cluster/dlm/doc/libdlm.txt. ------------------------------------------------------------------ If a program requests a lock on the AST side can it wait for the lock to complete without returning from the original AST routine? Would it use the poll/select mechanism to do this? What's the best way to implement a blocking lock request in an application where some requests are synchronous and some are asynchronous? Use semop() after the lock request and in the lock completion routine? Is semop() safe to call from a thread on Linux? Would pthread_cond_wait()/pthread_cond_signal() be better? Does conversion deadlock occur only when a conversion is about to be queued and its granted/requested state is incompatible with another lock already on the conversion queue? (eg. there is a PR->EX conversion queued and another PR->EX conversion is about to be queued) Other DLMs do not deliver a blocking AST to a lock which is not on the granted queue. This means that a lock which queued for conversion will not get a blocking AST if it is interfering with another lock being added to the conversion queue. Does GDLM do this as well or are blocking ASTs delivered to all locks regardless of their state? GDLM is not listed as a client of FENCE. 
This seems to imply that a GDLM application has to interact directly with FENCE to deal with the unknown state problem in a 2 node cluster where each member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) as otherwise the same lockspace could end up existing on multiple machines in a single cluster. How would an application interact with FENCE to prevent this or does this have to be handled by configuring the cluster to reboot in this case? libdlm.txt has a vague comment which reads: One further point about lockspace operations is that there is no locking on the creating/destruction of lockspaces in the library so it is up to the application to only call dlm_*_lockspace when it is sure that no other locking operations are likely to be happening. Does this mean 'no other locking operations' by the process which is creating the lockspace? no other requests to create a lock space on that cluster member? in the cluster as a whole? Possible Enhancements: ---------------------- The following two items are areas where GDLM appears to differ from the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for Linux which is derived from IBM's DLM for AIX). These differences aren't incompatible with GFS's requirements and could be implemented as optional behaviors. I'd be happy to work on patches for these if they would be welcome. GDLM is described as granting new lock requests as long as they are compatible with the existing lock mode regardless of the existence of a conversion queue. The other DLMs mentioned above always queue new lock requests if there are any locks on the conversion queue. Certain mechanisms can't be implemented without this kind of ordering. Would it be possible to make the alternate behavior a property of the lock space or a property of a grant request so it can be utilized where necessary? Certain tasks are simplified if the return status of a lock indicates whether it was granted immediately or ended up on the waiting queue. Other DLMs which have both synchronous and asynchronous completion mechanisms implement this via a flag which requests synchronous completion if the lock is available, otherwise the request is queued and the asynchronous mechanism is used. This is particularly useful for deadman locks that control recovery to distinguish between the first instance of a service to start and recovery conditions. There are other (more complex) techniques to implement this but even though GDLM is purely an asynchronous mechanism, it still would be possible for the completion status to indicate (if requested) whether the lock was granted immediately or not. From Gareth at Linux.co.uk Sat Jul 3 16:20:39 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 17:20:39 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088813756.2249.15.camel@venus> References: <1088813756.2249.15.camel@venus> Message-ID: <1088871581.7439.1.camel@rag.linux.co.uk> Hi, fyi; I get this both on AMD64 and x86 test boxes .. I've tried two methods; a. CVS b. EBuilds from Datacore (!) Same results, strace says the "network is down" in a loop using 100% CPU Gareth. On Sat, 2004-07-03 at 01:15, Gabriel Wicke wrote: > Hello, > > i have some problems with ccsd on debian unstable- it hangs after > starting and eats 100% cpu. A normal kill is enough to stop it again. > > I've done a strace of ccsd -n, full log at > http://dl.aulinx.de/gfs/ccsd.strace. 
> > I'm willing to investigate the cause for this, any pointers appreciated- > here or in #linux-cluster (nick gwicke). > > Thanks From teigland at redhat.com Sat Jul 3 16:00:47 2004 From: teigland at redhat.com (David Teigland) Date: Sun, 4 Jul 2004 00:00:47 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040703160047.GD8257@redhat.com> > GDLM is not listed as a client of FENCE. This seems to imply > that a GDLM application has to interact directly with FENCE to > deal with the unknown state problem in a 2 node cluster where each > member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) > as otherwise the same lockspace could end up existing on multiple > machines in a single cluster. How would an application interact > with FENCE to prevent this or does this have to be handled by > configuring the cluster to reboot in this case? This is the quickest one to answer right off the bat. We'll get to the others over the next few days I expect. Fencing is a service that runs on its own in a CMAN cluster; it's entirely independent from other services. GFS simply checks to verify fencing is running before allowing a mount since it's especially dangerous for a mount to succeed without it. As soon as a node joins a fencing domain it will be fenced by another domain member if it fails. i.e. as soon as a node runs: > cman_tool join (joins the cluster) > fence_tool join (starts fenced which joins the default fence domain) it will be fenced by other domain members if it fails. So, you simply need to configure your nodes to run fence_tool join after joining the cluster if you want fencing to happen. You can add any checks later on that you think are necessary to be sure that the node is in the fence domain. (Looking at /proc/cluster/services is one way.) Running fence_tool leave will remove a node cleanly from the fence domain (it won't be fenced by other members.) One note of warning. If the fence daemon (fenced) process is killed on node X, it appears to fenced processes on other nodes that X has left the domain cleanly (just as if it had run fence_tool leave). X only leaves the domain "uncleanly" when the node itself fails (meaning the cluster manager decides X has failed.) There is some further development planned to address this. -- Dave Teigland From Gareth at Bult.co.uk Sat Jul 3 10:05:17 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 11:05:17 +0100 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088799135.25468.13.camel@atlantis.boston.redhat.com> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> <1088795225.721.218.camel@squizzey> <1088799135.25468.13.camel@atlantis.boston.redhat.com> Message-ID: <1088849117.724.222.camel@squizzey> Ok, Well here's a rather useful page someone passed me .. :) https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.Install It even has sample cluster.xml's ;-) Regards, Gareth. On Fri, 2004-07-02 at 16:12 -0400, Lon Hohberger wrote: > On Fri, 2004-07-02 at 20:07 +0100, Gareth Bult wrote: > > Tvm. > > > > Is there any documentation on this anywhere ... ? > > Not a lot. The format is still subject to change to some degree. 
> Eventually, there will be a GUI app for configuring it, so you won't > have to memorize the XML tags ;) > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-4.png Type: image/png Size: 822 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 3 10:07:25 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 11:07:25 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088813756.2249.15.camel@venus> References: <1088813756.2249.15.camel@venus> Message-ID: <1088849245.724.224.camel@squizzey> Hi, ccsd starts a number of threads .. pick the one eating the CPU and "strace -p" it .. Gareth. On Sat, 2004-07-03 at 02:15 +0200, Gabriel Wicke wrote: > Hello, > > i have some problems with ccsd on debian unstable- it hangs after > starting and eats 100% cpu. A normal kill is enough to stop it again. > > I've done a strace of ccsd -n, full log at > http://dl.aulinx.de/gfs/ccsd.strace. > > I'm willing to investigate the cause for this, any pointers appreciated- > here or in #linux-cluster (nick gwicke). > > Thanks -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 3 19:56:10 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 20:56:10 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088871581.7439.1.camel@rag.linux.co.uk> References: <1088813756.2249.15.camel@venus> <1088871581.7439.1.camel@rag.linux.co.uk> Message-ID: <1088884570.30660.1.camel@squizzey> Ok, this was as a result of an invalid tag in cluster.xml .. is there a way to validate cluster.xml is it ccsd does not appear to print any warnings/errors if the file is invalid .. ? Gareth. On Sat, 2004-07-03 at 17:20 +0100, Gareth Bult wrote: > Hi, > > fyi; I get this both on AMD64 and x86 test boxes .. > > I've tried two methods; > a. CVS > b. EBuilds from Datacore > > (!) > > Same results, strace says the "network is down" in a loop using 100% CPU > > Gareth. > > On Sat, 2004-07-03 at 01:15, Gabriel Wicke wrote: > > Hello, > > > > i have some problems with ccsd on debian unstable- it hangs after > > starting and eats 100% cpu. A normal kill is enough to stop it again. > > > > I've done a strace of ccsd -n, full log at > > http://dl.aulinx.de/gfs/ccsd.strace. > > > > I'm willing to investigate the cause for this, any pointers appreciated- > > here or in #linux-cluster (nick gwicke). > > > > Thanks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sun Jul 4 00:09:42 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sun, 04 Jul 2004 01:09:42 +0100 Subject: [Linux-cluster] Possible problem with different architectures Message-ID: <1088899782.11202.9.camel@squizzey> Hi, With help from the guys on #linux-cluster ( thanks guys :) ) I've managed to get a 3-node cluster running. Two of the nodes are x86 and the third is an amd64 - all are running identical Gentoo installs on kernel 2.6.7. All are running an up-to-date cvs /cluster. I can successfully export a device from one x86 box to another, then format/mount a gfs on it on both x86 boxes - this works great. However, I can't run gnbd_import on the amd64 box. I get; gnbd_import: /dev/gnbd/netdisc is not in use. deleting gnbd_import: created gnbd device netdisc2 gnbd_monitor: gnbd_monitor started. Monitoring device #0 gnbd_import: ERROR gnbd_recvd failed It "looks" like gnbd_recvd is failing to complete a handshake, i.e. hanging half way through .. .. Any suggestions welcome. On another note, I've had a number of kernel crashes and I'm wondering looking at the logs whether it's because I'm running a preemtable kernel ... ? Here are two sample crash dumps from syslog.. typically the machine goes D-state on the processes involved and won't shutdown cleanly ... Crash #1 (x86 box): Jul 3 22:44:48 rag CMAN: node squizzey.linux.co.uk is not responding - removing from the cluster Jul 3 22:44:53 rag dlm: clvmd: recover event 2 (first) Jul 3 22:44:53 rag dlm: clvmd: add nodes Jul 3 22:44:53 rag Unable to handle kernel paging request at virtual address 0c000000 Jul 3 22:44:53 rag printing eip: Jul 3 22:44:53 rag c013c2cb Jul 3 22:44:53 rag *pde = 00000000 Jul 3 22:44:53 rag Oops: 0000 [#1] Jul 3 22:44:53 rag PREEMPT Jul 3 22:44:53 rag Modules linked in: gnbd gfs lock_dlm dlm cman lock_harness ohci_hcd e100 mii snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd uhci_hcd intel_agp agpgart st usb_storage scsi_mod ehci_hcd usbcore Jul 3 22:44:53 rag CPU: 0 Jul 3 22:44:53 rag EIP: 0060:[] Not tainted Jul 3 22:44:53 rag EFLAGS: 00010292 (2.6.7) Jul 3 22:44:53 rag EIP is at page_address+0xb/0xb0 Jul 3 22:44:53 rag eax: 0c000000 ebx: 0c000000 ecx: 00000000 edx: 18e0e600 Jul 3 22:44:53 rag esi: 18e0e600 edi: e0e600b8 ebp: e0e600e8 esp: e0e15e1c Jul 3 22:44:53 rag ds: 007b es: 007b ss: 0068 Jul 3 22:44:53 rag Process dlm_recoverd (pid: 9579, threadinfo=e0e14000 task=e6542eb0) Jul 3 22:44:53 rag Stack: 00000000 e0e60001 18e0e600 e0e600b8 e0e600e8 e85baee1 0c000000 e85c84b7 Jul 3 22:44:53 rag 18e0e600 18000000 00000018 e0e15ee0 00000002 00000002 e85bb3cf 00000002 Jul 3 22:44:53 rag 00000018 000000d0 e0e15e6c 00000000 00000000 00000018 e0e15ee0 00000002 Jul 3 22:44:53 rag Call Trace: Jul 3 22:44:53 rag [] lowcomms_get_buffer+0x81/0x150 [dlm] Jul 3 22:44:53 rag [] lowcomms_send_message+0x3f/0xf0 [dlm] Jul 3 22:44:53 rag [] midcomms_send_message+0x44/0x70 [dlm] Jul 3 22:44:53 rag [] rcom_send_message+0xd1/0x210 [dlm] Jul 3 22:44:53 rag [] gdlm_wait_status_low+0x60/0x90 [dlm] Jul 3 22:44:53 rag [] nodes_reconfig_wait+0x2a/0x80 [dlm] Jul 3 22:44:53 rag [] ls_nodes_init+0xbf/0x150 [dlm] Jul 3 22:44:53 rag [] ls_first_start+0x62/0x160 [dlm] Jul 3 22:44:53 rag [] do_ls_recovery+0x1ed/0x430 [dlm] Jul 3 22:44:53 rag [] 
dlm_recoverd+0x143/0x180 [dlm] Jul 3 22:44:53 rag [] default_wake_function+0x0/0x20 Jul 3 22:44:53 rag [] ret_from_fork+0x6/0x14 Jul 3 22:44:53 rag [] default_wake_function+0x0/0x20 Jul 3 22:44:53 rag [] dlm_recoverd+0x0/0x180 [dlm] Jul 3 22:44:53 rag [] kernel_thread_helper+0x5/0x18 Jul 3 22:44:53 rag Jul 3 22:44:53 rag Code: 8b 03 f6 c4 01 75 1e 8b 2d 8c 63 48 c0 29 eb c1 fb 05 c1 e3 Jul 3 22:44:53 rag ccsd[9560]: Error while processing get: No data available Crash #2: (amd64) Jul 3 21:42:28 squizzey dlm: clvmd: recover event 2 (first) Jul 3 21:42:28 squizzey dlm: clvmd: add nodes Jul 3 21:42:28 squizzey Unable to handle kernel NULL pointer dereference at 000000000000008a RIP: Jul 3 21:42:28 squizzey {:dlm:send_to_sock+54} Jul 3 21:42:28 squizzey PML4 3f7a9067 PGD b591067 PMD 0 Jul 3 21:42:28 squizzey Oops: 0000 [1] PREEMPT Jul 3 21:42:28 squizzey CPU 0 Jul 3 21:42:28 squizzey Modules linked in: gnbd lock_dlm dlm cman gfs lock_harness dm_mod ipt_ttl ipt_limit ipt_state iptable_filter iptable_mangle ipt_LOG ipt_MASQUERADE ipt_TOS ipt_REDIRECT iptable_nat ipt_REJECT ip_tables ip_conntrack_irc ip_conntrack_ftp ip_conntrack nvidia usblp usbhid forcedeth ohci_hcd snd_intel8x0 snd_ac97_codec snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_page_alloc snd_timer snd_mixer_oss snd usb_storage ehci_hcd usbcore Jul 3 21:42:28 squizzey Pid: 31748, comm: dlm_sendd Tainted: P 2.6.7 Jul 3 21:42:28 squizzey RIP: 0010:[] {:dlm:send_to_sock+54} Jul 3 21:42:28 squizzey RSP: 0018:00000100319b5ec8 EFLAGS: 00010202 Jul 3 21:42:28 squizzey RAX: 0000000000000002 RBX: ffffffffa06ca0f0 RCX: 00000100139c80c0 Jul 3 21:42:28 squizzey RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000100139c80b8 Jul 3 21:42:28 squizzey RBP: 00000100139c80a8 R08: 00000100319b4000 R09: 0000000000000000 Jul 3 21:42:28 squizzey R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010030d1d150 Jul 3 21:42:28 squizzey R13: 00000100139c80a8 R14: 0000000000000000 R15: 000000358cc16f78 Jul 3 21:42:28 squizzey FS: 000000358d80f640(0000) GS:ffffffff804f61c0 (0000) knlGS:0000000000000000 Jul 3 21:42:28 squizzey CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Jul 3 21:42:28 squizzey CR2: 000000000000008a CR3: 0000000000101000 CR4: 00000000000006e0 Jul 3 21:42:28 squizzey Process dlm_sendd (pid: 31748, threadinfo 00000100319b4000, task 000001000676a000) Jul 3 21:42:28 squizzey Stack: 0000007a319b5f08 00000100139c80b8 0000000000000a64 ffffffffa06ca0f0 Jul 3 21:42:28 squizzey 00000100139c80a8 0000010030d1d150 0000000000000005 00000100297df89c Jul 3 21:42:28 squizzey 000000358cc16f78 ffffffffa06b637d Jul 3 21:42:28 squizzey Call Trace: {:dlm:process_output_queue+157} {:dlm:dlm_sendd+184} Jul 3 21:42:28 squizzey {child_rip+8} {:dlm:dlm_sendd+0} Jul 3 21:42:28 squizzey {child_rip+0} Jul 3 21:42:28 squizzey Jul 3 21:42:28 squizzey Code: 48 8b 80 88 00 00 00 48 89 44 24 10 65 48 8b 04 25 18 00 00 -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From jeff at intersystems.com Sun Jul 4 14:41:58 2004 From: jeff at intersystems.com (Jeff) Date: Sun, 4 Jul 2004 10:41:58 -0400 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <20040703160047.GD8257@redhat.com> References: <104121513.20040703103356@intersystems.com> <20040703160047.GD8257@redhat.com> Message-ID: <7810099096.20040704104158@intersystems.com> Saturday, July 3, 2004, 12:00:47 PM, David Teigland wrote: >> GDLM is not listed as a client of FENCE. This seems to imply >> that a GDLM application has to interact directly with FENCE to >> deal with the unknown state problem in a 2 node cluster where each >> member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) >> as otherwise the same lockspace could end up existing on multiple >> machines in a single cluster. How would an application interact >> with FENCE to prevent this or does this have to be handled by >> configuring the cluster to reboot in this case? > This is the quickest one to answer right off the bat. We'll get to the others > over the next few days I expect. > Fencing is a service that runs on its own in a CMAN cluster; it's entirely > independent from other services. GFS simply checks to verify fencing is > running before allowing a mount since it's especially dangerous for a mount to > succeed without it. > As soon as a node joins a fencing domain it will be fenced by another domain > member if it fails. i.e. as soon as a node runs: >> cman_tool join (joins the cluster) >> fence_tool join (starts fenced which joins the default fence domain) > it will be fenced by other domain members if it fails. So, you simply need to > configure your nodes to run fence_tool join after joining the cluster if you > want fencing to happen. You can add any checks later on that you think are > necessary to be sure that the node is in the fence domain. (Looking at > /proc/cluster/services is one way.) > Running fence_tool leave will remove a node cleanly from the fence domain (it > won't be fenced by other members.) > One note of warning. If the fence daemon (fenced) process is killed on node X, > it appears to fenced processes on other nodes that X has left the domain > cleanly (just as if it had run fence_tool leave). X only leaves the domain > "uncleanly" when the node itself fails (meaning the cluster manager decides X > has failed.) There is some further development planned to address this. I understand the above but its still not clear to me how a locking application would get fenced. On startup the application could check that the cluster member has joined the fence domain. This will ensure that it gets fenced if something goes wrong. What's not clear is how the fence process will shut down (or suspend) the locking application while fencing the node. Fencing seems to be related to blocking access to I/O devices. From lists at wikidev.net Sun Jul 4 19:27:04 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Sun, 04 Jul 2004 21:27:04 +0200 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088849245.724.224.camel@squizzey> References: <1088813756.2249.15.camel@venus> <1088849245.724.224.camel@squizzey> Message-ID: <1088969224.1246.10.camel@venus> On Sat, 2004-07-03 at 11:07 +0100, Gareth Bult wrote: > Hi, > > ccsd starts a number of threads .. > > pick the one eating the CPU and "strace -p" it .. 
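On a 2.6 kernel the clones ccsd spawns should also be listed under /proc, even when ps and top hide them, so one rough way to find and attach to the spinning one (the <tid> below is a placeholder) is:

    # thread/clone ids appear under /proc/<pid>/task/ on 2.6
    ls /proc/$(pidof ccsd)/task/
    # attach strace to the suspect thread id; -tt timestamps every call
    strace -tt -p <tid>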
Thanks for this tip, i found some information that might be useful. There's only one thread created that doesn't show up in ps aux or top, but it's possible to connect to it by using strace -p pid-of-parent+1. Output is heaps of lines like this, constantly looping/scrolling: socket(PF_BLUETOOTH, SOCK_DGRAM, 3) = -1 ENETDOWN (Network is down) So i suspected some weird Bluetooth/GFS interaction. Recompiled the kernel with Bluetooth support disabled, but same thing. ccs_test seems to work however, the results returned are correct. I've since double- checked cluster.xml a few times, that's very likely not the reason (posted it at http://dl.aulinx.de/gfs/cluster.xml). -- Gabriel Wicke From teigland at redhat.com Mon Jul 5 02:39:47 2004 From: teigland at redhat.com (David Teigland) Date: Mon, 5 Jul 2004 10:39:47 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <7810099096.20040704104158@intersystems.com> References: <104121513.20040703103356@intersystems.com> <20040703160047.GD8257@redhat.com> <7810099096.20040704104158@intersystems.com> Message-ID: <20040705023947.GA6629@redhat.com> > I understand the above but its still not clear to me how a > locking application would get fenced. On startup the application > could check that the cluster member has joined the fence domain. > This will ensure that it gets fenced if something goes wrong. > > What's not clear is how the fence process will shut down (or > suspend) the locking application while fencing the node. Fencing > seems to be related to blocking access to I/O devices. I'm not entirely sure what you're asking, but I hope a long and broad answer might answer it. say there's a two node cluster of nodes A and B both nodes are running cman, fence, dlm and some application using the dlm 1. node A: hangs and is unresponsive 2. node B: cman detects that A has failed 3. node B: all cluster services are stopped/suspended (these services are fence and dlm in this example) 4. node B: while dlm service is stopped, it blocks all lock requests 5. node B: cluster still has quorum because of special "two_node" config 6. node B: fence service is started/enabled 7. node B: fence service fences node A 8. node B: dlm service is started/enabled 9. node B: dlm service recovers the application's lock space and lock requests proceed as usual If the fencing method in step 7 only blocks access to i/o devices from node A, node A could potentially "revive" and continue running. The dlm on node B no longer accepts A as a member of the lockspace so any dlm messages from A will be ignored by B. Depending on the application this may not be sufficient to prevent a revived node A from causing problems. If so, the simplest thing is to use a fencing method that resets the power on node A rather than simply blocking its device i/o. -- Dave Teigland From teigland at redhat.com Mon Jul 5 03:22:08 2004 From: teigland at redhat.com (David Teigland) Date: Mon, 5 Jul 2004 11:22:08 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040705032208.GB6629@redhat.com> > Does conversion deadlock occur only when a conversion is > about to be queued and its granted/requested state is > incompatible with another lock already on the conversion queue? > (eg. there is a PR->EX conversion queued and another PR->EX > conversion is about to be queued) Yes. 
The application can't know ahead of time, of course, whether this will happen since both PR holders may convert to EX at the same time. > Other DLMs do not deliver a blocking AST to a lock which is not > on the granted queue. This means that a lock which queued for > conversion will not get a blocking AST if it is interfering with > another lock being added to the conversion queue. Does GDLM do this > as well or are blocking ASTs delivered to all locks regardless of > their state? We only send blocking asts for locks on the granted queue. This may be a change from what was written in the sca document (which has become incorrect in some places over the past 6 months.) > Possible Enhancements: > ---------------------- > The following two items are areas where GDLM appears to differ from > the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for > Linux which is derived from IBM's DLM for AIX). These differences > aren't incompatible with GFS's requirements and could be implemented > as optional behaviors. I'd be happy to work on patches for > these if they would be welcome. Yes, we'd be very happy to get patches. > GDLM is described as granting new lock requests as long as they > are compatible with the existing lock mode regardless of the > existence of a conversion queue. The other DLMs mentioned above > always queue new lock requests if there are any locks on the conversion > queue. Certain mechanisms can't be implemented without this kind of > ordering. Would it be possible to make the alternate behavior a property > of the lock space or a property of a grant request so it can be > utilized where necessary? If that's the more standard behavior we should look at making it default for us, too. Otherwise a new flag sounds appropriate. > Certain tasks are simplified if the return status of a lock indicates > whether it was granted immediately or ended up on the waiting queue. > Other DLMs which have both synchronous and asynchronous completion > mechanisms implement this via a flag which requests synchronous > completion if the lock is available, otherwise the request is queued > and the asynchronous mechanism is used. This is particularly useful > for deadman locks that control recovery to distinguish between > the first instance of a service to start and recovery conditions. > There are other (more complex) techniques to implement this but > even though GDLM is purely an asynchronous mechanism, it still would > be possible for the completion status to indicate (if requested) > whether the lock was granted immediately or not. A "flags" field in the LKSB can also be used to return information like this. -- Dave Teigland From pcaulfie at redhat.com Mon Jul 5 09:31:35 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 5 Jul 2004 10:31:35 +0100 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040705093135.GB30146@tykepenguin.com> On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: > These are from reviewing http://people.redhat.com/~teigland/sca.pdf > and the CVS copy of cluster/dlm/doc/libdlm.txt. > ------------------------------------------------------------------ > > If a program requests a lock on the AST side can it wait for > the lock to complete without returning from the original AST > routine? Would it use the poll/select mechanism to do this? 
In kernel space you shouldn't wait or do much work in the AST routine or you can block the kernel's AST delivery thread. You can call dlm_lock() in an AST routine though. In userspace you can do pretty much what you like in the AST routine as (by default) they run in a seperate thread - see libdlm for more details on this. > What's the best way to implement a blocking lock request in > an application where some requests are synchronous and some > are asynchronous? Use semop() after the lock request and in > the lock completion routine? Is semop() safe to call from > a thread on Linux? Would pthread_cond_wait()/pthread_cond_signal() > be better? pthreads are recommended for userspace locking. As I mentioned above libdlm uses pthreads (though you can switch this off if you want a non-threaded application and are prepared to do the work yourself). > Does conversion deadlock occur only when a conversion is > about to be queued and its granted/requested state is > incompatible with another lock already on the conversion queue? > (eg. there is a PR->EX conversion queued and another PR->EX > conversion is about to be queued) Yes > Other DLMs do not deliver a blocking AST to a lock which is not > on the granted queue. This means that a lock which queued for > conversion will not get a blocking AST if it is interfering with > another lock being added to the conversion queue. Does GDLM do this > as well or are blocking ASTs delivered to all locks regardless of > their state? Blocking ASTs are sent to locks on the granted queue. > > libdlm.txt has a vague comment which reads: > One further point about lockspace operations is that there is no locking > on the creating/destruction of lockspaces in the library so it is up to > the application to only call dlm_*_lockspace when it is sure that > no other locking operations are likely to be happening. > Does this mean 'no other locking operations' by the process which is > creating the lockspace? no other requests to create a lock space on > that cluster member? in the cluster as a whole? No other locking operation in that process tree. > > Possible Enhancements: > ---------------------- > The following two items are areas where GDLM appears to differ from > the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for > Linux which is derived from IBM's DLM for AIX). These differences > aren't incompatible with GFS's requirements and could be implemented > as optional behaviors. I'd be happy to work on patches for > these if they would be welcome. > > GDLM is described as granting new lock requests as long as they > are compatible with the existing lock mode regardless of the > existence of a conversion queue. The other DLMs mentioned above > always queue new lock requests if there are any locks on the conversion > queue. Certain mechanisms can't be implemented without this kind of > ordering. Would it be possible to make the alternate behavior a property > of the lock space or a property of a grant request so it can be > utilized where necessary? I'm ashamed to admit I didn't know this - we can add it as a lockspace option I think. > Certain tasks are simplified if the return status of a lock indicates > whether it was granted immediately or ended up on the waiting queue. > Other DLMs which have both synchronous and asynchronous completion > mechanisms implement this via a flag which requests synchronous > completion if the lock is available, otherwise the request is queued > and the asynchronous mechanism is used. 
This is particularly useful > for deadman locks that control recovery to distinguish between > the first instance of a service to start and recovery conditions. > There are other (more complex) techniques to implement this but > even though GDLM is purely an asynchronous mechanism, it still would > be possible for the completion status to indicate (if requested) > whether the lock was granted immediately or not. Off-hand I'm not sure how complex this would be to implement, I'll have a think about it. -- patrick From rmayhew at mweb.com Mon Jul 5 09:50:44 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Mon, 5 Jul 2004 11:50:44 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B28EA@mwjdc2.mweb.com> Hi Thanks for the quick response. I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 SAN. I grabbed the 3 RPMS from ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ Should I be downloading them for another source? Is there support yet for the latest ES 3.0 kernel? After rebuilding these RPMS' I only end up with the following. GFS-6.0.0-1.2.i386.rpm GFS-6.0.0-1.2.src.rpm GFS-debuginfo-6.0.0-1.2.i386.rpm GFS-devel-6.0.0-1.2.i386.rpm GFS-modules-6.0.0-1.2.i386.rpm perl-Net-Telnet-3.03-2.noarch.rpm perl-Net-Telnet-3.03-2.src.rpm rh-gfs-en-6.0-4.noarch.rpm rh-gfs-en-6.0-4.src.rpm This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm Thanks. -----Original Message----- From: Adam Manthei [mailto:amanthei at redhat.com] Sent: 02 July 2004 03:37 PM To: Discussion of clustering software components including GFS Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 On Fri, Jul 02, 2004 at 03:18:46PM +0200, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? Make sure the kernel versions and architectures match. For example, if your kernel is i686 SMP, then make sure you have i686 SMP gfs modules too. 
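A quick way to check that on the box itself (nothing GFS-specific here, just uname plus an rpm query; the GFS-modules package name is taken from the listing earlier in this thread):

uname -r
rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel kernel-smp GFS-modules

If the running kernel is, say, 2.4.21-15.ELsmp on i686, the GFS-modules package should be an i686 build against that exact release; otherwise depmod will report unresolved symbols just like the output above.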
> > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From amir at datacore.ch Sun Jul 4 23:37:16 2004 From: amir at datacore.ch (Amir Guindehi) Date: Mon, 05 Jul 2004 01:37:16 +0200 Subject: [Linux-cluster] GFS Dokumentation: GFS Installation / GNBD Usage / GFS Benchmarks Message-ID: <40E894AC.50703@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, I've consolidated the available GFS dokumentation as well as wrote and added some documentation of my own to: https://open.datacore.ch/page/GFS I hope, this can be of use to others. Regards, - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA6JSpbycOjskSVCwRAn6YAJ4s2MlB/Kcs6YtkMCwfSwUIgAMUdgCeKS6t 8J/zjBGbTd5W7pTPIfZoHgA= =YbYK -----END PGP SIGNATURE----- From pcaulfie at redhat.com Mon Jul 5 13:38:46 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 5 Jul 2004 14:38:46 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088969224.1246.10.camel@venus> References: <1088813756.2249.15.camel@venus> <1088849245.724.224.camel@squizzey> <1088969224.1246.10.camel@venus> Message-ID: <20040705133845.GI30146@tykepenguin.com> On Sun, Jul 04, 2004 at 09:27:04PM +0200, Gabriel Wicke wrote: > On Sat, 2004-07-03 at 11:07 +0100, Gareth Bult wrote: > > Hi, > > > > ccsd starts a number of threads .. > > > > pick the one eating the CPU and "strace -p" it .. > > Thanks for this tip, i found some information that might be useful. > There's only one thread created that doesn't show up in ps aux or top, > but it's possible to connect to it by using strace -p pid-of-parent+1. > Output is heaps of lines like this, constantly looping/scrolling: > > socket(PF_BLUETOOTH, SOCK_DGRAM, 3) = -1 ENETDOWN (Network is down) > > So i suspected some weird Bluetooth/GFS interaction. Recompiled the > kernel with Bluetooth support disabled, but same thing. ccs_test seems > to work however, the results returned are correct. I've since double- > checked cluster.xml a few times, that's very likely not the reason > (posted it at http://dl.aulinx.de/gfs/cluster.xml). The bluetooth thing is a red-herring. The cluster socket type clashes with AF_BLUETOOTH and strace knows about the "real" one. we need to register the AF_type properly. CCS seems to poll for the cluster to be ready so it can enable updates - maybe it's just doing that rather too enthusiatically :-) patrick From mailing-lists at hughesjr.com Tue Jul 6 02:59:24 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 05 Jul 2004 21:59:24 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <1089082764.5974.9.camel@Myth.home.local> Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. 
> >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. > > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmayhew at mweb.com Tue Jul 6 07:48:38 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 6 Jul 2004 09:48:38 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> hi I tried this some time ago and ended up with this.. Installing GFS-6.0.0-1.2.src.rpm Building target platforms: i686 Building for target i686 Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + LANG=C + export LANG + unset DISPLAY + echo ping ping + cd /usr/src/redhat/BUILD + rm -rf GFS-6.0.0 + /bin/mkdir -p GFS-6.0.0 + cd GFS-6.0.0 + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz + tar -xf - + STATUS=0 + '[' 0 -ne 0 ']' ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,g-w,o-w . + echo pong pong + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + cd GFS-6.0.0 + LANG=C + export LANG + unset DISPLAY ++ pwd + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 + BuildSistina i686 hugemem + cpu_type=i686 + flavor=hugemem + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' + echo 'Kernel not found.' Kernel not found. + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL /lib/modules/2.4.21-9.0.3.EL /lib/modules/2.4.21-15.EL: build misc modules.generic_string modules.isapnpmap modules.pcimap modules.usbmap kernel modules.dep modules.ieee1394map modules.parportmap modules.pnpbiosmap /lib/modules/2.4.21-4.EL: updates /lib/modules/2.4.21-9.0.3.EL: misc + exit 1 error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) Any ideas? ________________________________ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 04:59 AM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. 
> > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm From mailing-lists at hughesjr.com Tue Jul 6 12:22:26 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Tue, 06 Jul 2004 07:22:26 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> Message-ID: <1089116546.19333.122.camel@Myth.home.local> For building purposes, install the packages kernel, kernel-source, kernel-smp, kernel-hugemem. Then do the --target i686 command. (After you have finished building, you can remove all the kernels except the one you need to boot). Also, if you want to build this on the 2.4.21-15.0.3.EL kernel, you can download a modified source file from me that builds against that kernel (it builds against 15.EL, 15.0.2.EL and 15.0.3.EL). I built the i686 rpms, which you can download from me and try if you want (see the link below). The src.rpm file should build on any kernel where the name is 2.4.21-15.EL, 2.4.21-15.0.2.EL, or 2.4.21-15.0.3.EL. (must have kernel, kernel-source, kernel-smp, kernel-hugemem installed to build). My RPMS were built on a WBEL machine, but it shouldn't make any difference. They will only install on a 2.4.21-15.0.3.EL kernel... GFS Downloads Johnny Hughes HughesJR.com On Tue, 2004-07-06 at 02:48, Richard Mayhew wrote: > hi > I tried this some time ago and ended up with this.. > > > Installing GFS-6.0.0-1.2.src.rpm > Building target platforms: i686 > Building for target i686 > Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 > + umask 022 > + cd /usr/src/redhat/BUILD > + LANG=C > + export LANG > + unset DISPLAY > + echo ping > ping > + cd /usr/src/redhat/BUILD > + rm -rf GFS-6.0.0 > + /bin/mkdir -p GFS-6.0.0 > + cd GFS-6.0.0 > + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz > + tar -xf - > + STATUS=0 > + '[' 0 -ne 0 ']' > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chown -Rhf root . > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chgrp -Rhf root . > + /bin/chmod -Rf a+rX,g-w,o-w . > + echo pong > pong > + exit 0 > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 > + umask 022 > + cd /usr/src/redhat/BUILD > + cd GFS-6.0.0 > + LANG=C > + export LANG > + unset DISPLAY > ++ pwd > + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 > + BuildSistina i686 hugemem > + cpu_type=i686 > + flavor=hugemem > + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build > + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' > + echo 'Kernel not found.' > Kernel not found. > + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL > /lib/modules/2.4.21-9.0.3.EL > /lib/modules/2.4.21-15.EL: > build misc modules.generic_string modules.isapnpmap > modules.pcimap modules.usbmap > kernel modules.dep modules.ieee1394map modules.parportmap > modules.pnpbiosmap > > /lib/modules/2.4.21-4.EL: > updates > > /lib/modules/2.4.21-9.0.3.EL: > misc > + exit 1 > error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) > > > RPM build errors: > Bad exit status from /var/tmp/rpm-tmp.81824 (%build) > > > Any ideas? 
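Spelling the suggested procedure out as a command sequence (a sketch only; how you pull the extra kernel packages onto the ES 3.0 box depends on how it is subscribed, so the up2date line is just one way to do it):

# install every kernel flavour the GFS spec file builds against
up2date --install kernel kernel-source kernel-smp kernel-hugemem
# (or install the matching kernel RPMs by hand with rpm -ivh)

# then rebuild the source RPM for i686
rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm

Once the build has finished, the kernel flavours you do not actually boot can be removed again.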
> ________________________________ > > From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] > Sent: 06 July 2004 04:59 AM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 > > > Richard, > Try this: > > rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm > > > Johnny Hughes > HughesJR.com > > > -----Original Message----- > From: "Richard Mayhew" > To: "Discussion of clustering software components including GFS" > > Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 > Date: Mon, 5 Jul 2004 11:50:44 +0200 > > >Hi > >Thanks for the quick response. > > > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 > >SAN. > > > >I grabbed the 3 RPMS from > >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > >Should I be downloading them for another source? Is there support yet > >for the latest ES 3.0 kernel? > > > >After rebuilding these RPMS' I only end up with the following. > > > > > >GFS-6.0.0-1.2.i386.rpm > >GFS-6.0.0-1.2.src.rpm > >GFS-debuginfo-6.0.0-1.2.i386.rpm > >GFS-devel-6.0.0-1.2.i386.rpm > >GFS-modules-6.0.0-1.2.i386.rpm > >perl-Net-Telnet-3.03-2.noarch.rpm > >perl-Net-Telnet-3.03-2.src.rpm > >rh-gfs-en-6.0-4.noarch.rpm > >rh-gfs-en-6.0-4.src.rpm > > > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmayhew at mweb.com Tue Jul 6 12:43:01 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 6 Jul 2004 14:43:01 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2AD6@mwjdc2.mweb.com> Ta very much, let me get stuck in here with your RPM and give it a bash. _____ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 02:22 PM To: linux-cluster at redhat.com Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 For building purposes, install the packages kernel, kernel-source, kernel-smp, kernel-hugemem. Then do the --target i686 command. (After you have finished building, you can remove all the kernels except the one you need to boot). Also, if you want to build this on the 2.4.21-15.0.3.EL kernel, you can download a modified source file from me that builds against that kernel (it builds against 15.EL, 15.0.2.EL and 15.0.3.EL). I built the i686 rpms, which you can download from me and try if you want (see the link below). The src.rpm file should build on any kernel where the name is 2.4.21-15.EL, 2.4.21-15.0.2.EL, or 2.4.21-15.0.3.EL. (must have kernel, kernel-source, kernel-smp, kernel-hugemem installed to build). My RPMS were built on a WBEL machine, but it shouldn't make any difference. They will only install on a 2.4.21-15.0.3.EL kernel... GFS Downloads Johnny Hughes HughesJR.com On Tue, 2004-07-06 at 02:48, Richard Mayhew wrote: hi I tried this some time ago and ended up with this.. Installing GFS-6.0.0-1.2.src.rpm Building target platforms: i686 Building for target i686 Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + LANG=C + export LANG + unset DISPLAY + echo ping ping + cd /usr/src/redhat/BUILD + rm -rf GFS-6.0.0 + /bin/mkdir -p GFS-6.0.0 + cd GFS-6.0.0 + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz + tar -xf - + STATUS=0 + '[' 0 -ne 0 ']' ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,g-w,o-w . 
+ echo pong pong + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + cd GFS-6.0.0 + LANG=C + export LANG + unset DISPLAY ++ pwd + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 + BuildSistina i686 hugemem + cpu_type=i686 + flavor=hugemem + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' + echo 'Kernel not found.' Kernel not found. + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL /lib/modules/2.4.21-9.0.3.EL /lib/modules/2.4.21-15.EL: build misc modules.generic_string modules.isapnpmap modules.pcimap modules.usbmap kernel modules.dep modules.ieee1394map modules.parportmap modules.pnpbiosmap /lib/modules/2.4.21-4.EL: updates /lib/modules/2.4.21-9.0.3.EL: misc + exit 1 error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) Any ideas? ________________________________ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 04:59 AM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com > -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. > > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrassow at redhat.com Tue Jul 6 14:24:59 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Tue, 6 Jul 2004 09:24:59 -0500 Subject: [Linux-cluster] Re: ccsd hanging after start on debian unstable In-Reply-To: <20040703160112.7160674067@hormel.redhat.com> References: <20040703160112.7160674067@hormel.redhat.com> Message-ID: <42BA51AC-CF58-11D8-A8E3-000A957BB1F6@redhat.com> This has to do with 'exit' not being able to be called from interrupt context on some systems. This problem should be fixed in cvs. Additionally, for those receiving tons of messages about a network not being found... this is due to ccsd trying to communicate with cman before it is ready. These messages should no longer be printed. 
brassow On Jul 3, 2004, at 11:01 AM, linux-cluster-request at redhat.com wrote: > ccsd hanging after start on debian unstable From wendland at scan-plus.de Tue Jul 6 18:46:53 2004 From: wendland at scan-plus.de (Joerg Wendland) Date: Tue, 6 Jul 2004 20:46:53 +0200 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040701034859.GC11996@redhat.com> References: <20040701010048.GC25028@dozer> <20040701034859.GC11996@redhat.com> Message-ID: <20040706184653.GA16139@dozer> On Thu, Jul 01, 2004 at 11:48:59AM +0800, David Teigland wrote: > > Kernel panic: lock_dlm: Record message above and reboot. > > This is a bug we know of and are working on right now. Is this fixed by the latest CVS checkins? The log messages suppose so. Thanks, Joerg -- | Entwickler Elektronische Datenverarbeitung und Dienstbetriebsmittel | | Scan-Plus GmbH Dienstbetriebsmittelherstellung fon +49-731-92013-0 | | Koenigstrasse 78, 89077 Ulm, Germany fax +49-731-92013-290 | | Geschaeftsfuehrer: Juergen Hoermann HRB 3220 Amtsgericht Ulm | | PGP-key: 51CF8417 (FP: 79C0 7671 AFC7 315E 657A F318 57A3 7FBD 51CF 8417) | From teigland at redhat.com Wed Jul 7 02:43:51 2004 From: teigland at redhat.com (David Teigland) Date: Wed, 7 Jul 2004 10:43:51 +0800 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040706184653.GA16139@dozer> References: <20040701010048.GC25028@dozer> <20040701034859.GC11996@redhat.com> <20040706184653.GA16139@dozer> Message-ID: <20040707024351.GA7674@redhat.com> On Tue, Jul 06, 2004 at 08:46:53PM +0200, Joerg Wendland wrote: > On Thu, Jul 01, 2004 at 11:48:59AM +0800, David Teigland wrote: > > > Kernel panic: lock_dlm: Record message above and reboot. > > > > This is a bug we know of and are working on right now. > > Is this fixed by the latest CVS checkins? The log messages suppose so. We've fixed a couple things, so it's possible, but I suspect you may just trigger an assertion earlier now based on the debugging we're still doing. -- Dave Teigland From rmayhew at mweb.com Wed Jul 7 10:23:41 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Wed, 7 Jul 2004 12:23:41 +0200 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> Hi, Could some one explain or point me in the right direction in the differences between gfs_data and gfs_journal in the pool config file. Which is the better option, and why? Thanks Richard. From mtilstra at redhat.com Wed Jul 7 14:39:08 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 7 Jul 2004 09:39:08 -0500 Subject: [Linux-cluster] Gfs_data vs gfs_journal In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> Message-ID: <20040707143908.GA6496@redhat.com> On Wed, Jul 07, 2004 at 12:23:41PM +0200, Richard Mayhew wrote: > Could some one explain or point me in the right direction in the > differences between gfs_data and gfs_journal in the pool config file. > > Which is the better option, and why? For nearly everyone, just use gfs_data. gfs_journal is for controling where physically gfs puts the journals. (file system data goes in gfs_data, journals in gfs_journal.) The idea when we originally made this was that someon might have a really fast but small storage device, and then a bunch of more common storage. They then could use pool to combine the two devices into a single pool, putting the journal onto the faster device. 
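For illustration, a pool configuration along those lines might look roughly like this -- the field layout of the subpool/pooldevice lines is written from memory of the pool_tool/pool config man pages, so double-check it there, and the device names are made up:

poolname    pool_gfs01
subpools    2
subpool     0  0  1  gfs_journal
subpool     1  0  1  gfs_data
pooldevice  0  0  /dev/sdb1
pooldevice  1  0  /dev/sdc1

Here subpool 0 (type gfs_journal) sits on the small fast device /dev/sdb1 and subpool 1 (type gfs_data) carries the bulk of the file system on /dev/sdc1; as far as I remember the subpool fields are id, stripe size, number of devices and an optional type.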
Which then, in theory, would make gfs faster. I don't think it has ever been tested though. You have to tell mkfs.gfs to look at pool lables to use this. (I forget the cmd option, its in the man page.) If you don't tell mkfs.gfs to look at pool labels, the labels are ignored. And gfs puts the journals in the middle of the data section I think. (could be wrong on that.) Hope that helps. -- Michael Conrad Tadpol Tilstra I hate when they fix a bug that I use. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From ben.m.cahill at intel.com Wed Jul 7 15:06:37 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 08:06:37 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE5@orsmsx404.amr.corp.intel.com> They are both necessary. gfs_data is the device/partition for filesystem data (i.e. files and on-disk metadata). Each node in the cluster also needs a separate journal device/partition in which to redundantly record metadata, to enable the filesystem to recover gracefully from node failure/crash. There's some documentation about this in the OpenGFS project: opengfs.sourceforge.net/docs.php CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things (e.g. lock protocols) are different ... but the basic idea is the same. See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you understand. But remember to rely on current RedHat GFS docs for current installation, components, and capabilities info. -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Richard Mayhew > Sent: Wednesday, July 07, 2004 6:24 AM > To: Discussion of clustering software components including GFS > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > Hi, > > Could some one explain or point me in the right direction in the > differences between gfs_data and gfs_journal in the pool config file. > > Which is the better option, and why? > > Thanks > Richard. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From ben.m.cahill at intel.com Wed Jul 7 15:16:59 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 08:16:59 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com> Oops, based on Michael's response, I realized that mine might be not quite right. Both data and journal *space* are necessary, but the journals can be created, by default, within the filesystem device, with no need for gfs_journal entry in config file. BTW, OpenGFS has supported external journals for over a year at this point ... would this be a useful feature for GFS? -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cahill, Ben M > Sent: Wednesday, July 07, 2004 11:07 AM > To: linux-cluster at redhat.com > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > They are both necessary. > > gfs_data is the device/partition for filesystem data (i.e. files and > on-disk metadata). 
> > Each node in the cluster also needs a separate journal > device/partition > in which to redundantly record metadata, to enable the filesystem to > recover gracefully from node failure/crash. > > There's some documentation about this in the OpenGFS project: > > opengfs.sourceforge.net/docs.php > > CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things > (e.g. lock protocols) are different ... but the basic idea is > the same. > See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you > understand. But remember to rely on current RedHat GFS docs > for current > installation, components, and capabilities info. > > -- Ben -- > > Opinions are mine, not Intel's > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Richard Mayhew > > Sent: Wednesday, July 07, 2004 6:24 AM > > To: Discussion of clustering software components including GFS > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > Hi, > > > > Could some one explain or point me in the right direction in the > > differences between gfs_data and gfs_journal in the pool > config file. > > > > Which is the better option, and why? > > > > Thanks > > Richard. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From bruno.coudoin at storagency.com Wed Jul 7 15:47:54 2004 From: bruno.coudoin at storagency.com (Bruno Coudoin) Date: Wed, 07 Jul 2004 17:47:54 +0200 Subject: [Linux-cluster] building gfs on opteron Message-ID: <1089215274.18997.123.camel@bruno.storagency> I would like to build GFS on opterons with kernel 2.4.21. I managed to compile it on X86 using the GFS-6.0.0-1.2.src.rpm and the binary redhat kernels. Is this process appropriate for opterons as well? Bruno. From nygaard at redhat.com Wed Jul 7 16:21:26 2004 From: nygaard at redhat.com (Erling Nygaard) Date: Wed, 7 Jul 2004 11:21:26 -0500 Subject: [Linux-cluster] Gfs_data vs gfs_journal In-Reply-To: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com>; from ben.m.cahill@intel.com on Wed, Jul 07, 2004 at 08:16:59AM -0700 References: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com> Message-ID: <20040707112126.B30098@homer.msp.redhat.com> Ben You can indeed have external journals with GFS. As Mike was saying, you can specify a subpool with type "gfs_journal". And since you easily can specify what device the subpool is on you decide where the journal is. This feature has been in GFS since 'a looong time ago' and unless there have been changes to this in OpenGFS this feature works in the same way in all versions of GFS :) As Mike pointed out, this was originally done in case of Solid State Disks, where having the journals on the SSD could prove speedup. Due to lack of SSDs this has never really been tested much... Erling On Wed, Jul 07, 2004 at 08:16:59AM -0700, Cahill, Ben M wrote: > Oops, based on Michael's response, I realized that mine might be not > quite right. Both data and journal *space* are necessary, but the > journals can be created, by default, within the filesystem device, with > no need for gfs_journal entry in config file. > > BTW, OpenGFS has supported external journals for over a year at this > point ... would this be a useful feature for GFS? 
> > -- Ben -- > > Opinions are mine, not Intel's > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cahill, Ben M > > Sent: Wednesday, July 07, 2004 11:07 AM > > To: linux-cluster at redhat.com > > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > > > They are both necessary. > > > > gfs_data is the device/partition for filesystem data (i.e. files and > > on-disk metadata). > > > > Each node in the cluster also needs a separate journal > > device/partition > > in which to redundantly record metadata, to enable the filesystem to > > recover gracefully from node failure/crash. > > > > There's some documentation about this in the OpenGFS project: > > > > opengfs.sourceforge.net/docs.php > > > > CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things > > (e.g. lock protocols) are different ... but the basic idea is > > the same. > > See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you > > understand. But remember to rely on current RedHat GFS docs > > for current > > installation, components, and capabilities info. > > > > -- Ben -- > > > > Opinions are mine, not Intel's > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > > Richard Mayhew > > > Sent: Wednesday, July 07, 2004 6:24 AM > > > To: Discussion of clustering software components including GFS > > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > Hi, > > > > > > Could some one explain or point me in the right direction in the > > > differences between gfs_data and gfs_journal in the pool > > config file. > > > > > > Which is the better option, and why? > > > > > > Thanks > > > Richard. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Erling Nygaard nygaard at redhat.com Red Hat Inc From Remi.Nivet at atosorigin.com Wed Jul 7 17:26:12 2004 From: Remi.Nivet at atosorigin.com (=?iso-8859-1?Q?Nivet_R=E9mi?=) Date: Wed, 7 Jul 2004 19:26:12 +0200 Subject: [Linux-cluster] GFS data access failover Message-ID: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> Hi everyone, I successfully set up a 2-nodes cluster after three days of hard work looking for docs and crashing servers, but it finaly worked and I'm able to export GFS partition from one node to the other using gnbd ;-) Now my question is : is there any way to use 2 dataservers (maybe replicate data between them but I can manage that on myself) and use a failover (or round-robin) mechanism so that if one of the dataserver crash, my clients can still access the data from the other dataserver ? As an optional question : I'm trying to use clvmd to propagate lvm config from one node to the other, but when I try to create LV, I've got the following error : Error locking on node XXXX: Internal lvm error, check syslog Failed to activate new LV. and the only log I have on the remote node is : lvm[721]: Volume group for uuid not found: LWuaVGYKeELfrxE16CgW0pUOAilU4CSNmLXVJ3b7i5AsjhWgfszorCRkn5KRCQTU anyone knows what I'm missing ? Thanks, R?mi. 
From ben.m.cahill at intel.com Wed Jul 7 21:01:16 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 14:01:16 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE9@orsmsx404.amr.corp.intel.com> Yes, you're correct ... however, the idea in OpenGFS was to not rely on pool, but to allow other "generic" volume managers to be used instead. -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Erling Nygaard > Sent: Wednesday, July 07, 2004 12:21 PM > To: Discussion of clustering software components including GFS > Subject: Re: [Linux-cluster] Gfs_data vs gfs_journal > > Ben > > You can indeed have external journals with GFS. > > As Mike was saying, you can specify a subpool with type "gfs_journal". > And since you easily can specify what device the subpool is > on you decide > where the journal is. > > This feature has been in GFS since 'a looong time ago' and > unless there > have been changes to this in OpenGFS this feature works in > the same way in > all versions of GFS :) > > As Mike pointed out, this was originally done in case of Solid State > Disks, where having the journals on the SSD could prove > speedup. Due to > lack of SSDs this has never really been tested much... > > > Erling > > > > On Wed, Jul 07, 2004 at 08:16:59AM -0700, Cahill, Ben M wrote: > > Oops, based on Michael's response, I realized that mine might be not > > quite right. Both data and journal *space* are necessary, but the > > journals can be created, by default, within the filesystem > device, with > > no need for gfs_journal entry in config file. > > > > BTW, OpenGFS has supported external journals for over a year at this > > point ... would this be a useful feature for GFS? > > > > -- Ben -- > > > > Opinions are mine, not Intel's > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Cahill, Ben M > > > Sent: Wednesday, July 07, 2004 11:07 AM > > > To: linux-cluster at redhat.com > > > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > They are both necessary. > > > > > > gfs_data is the device/partition for filesystem data > (i.e. files and > > > on-disk metadata). > > > > > > Each node in the cluster also needs a separate journal > > > device/partition > > > in which to redundantly record metadata, to enable the > filesystem to > > > recover gracefully from node failure/crash. > > > > > > There's some documentation about this in the OpenGFS project: > > > > > > opengfs.sourceforge.net/docs.php > > > > > > CAUTION: OpenGFS is *not* the same as current RedHat > GFS; many things > > > (e.g. lock protocols) are different ... but the basic idea is > > > the same. > > > See WHATIS-OpenGFS, and HOWTO-generic, just to see if > they help you > > > understand. But remember to rely on current RedHat GFS docs > > > for current > > > installation, components, and capabilities info. 
> > > > > > -- Ben -- > > > > > > Opinions are mine, not Intel's > > > > > > > -----Original Message----- > > > > From: linux-cluster-bounces at redhat.com > > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > > > Richard Mayhew > > > > Sent: Wednesday, July 07, 2004 6:24 AM > > > > To: Discussion of clustering software components including GFS > > > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > > > Hi, > > > > > > > > Could some one explain or point me in the right direction in the > > > > differences between gfs_data and gfs_journal in the pool > > > config file. > > > > > > > > Which is the better option, and why? > > > > > > > > Thanks > > > > Richard. > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Erling Nygaard > nygaard at redhat.com > > Red Hat Inc > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From notiggy at gmail.com Wed Jul 7 22:24:08 2004 From: notiggy at gmail.com (Brian Jackson) Date: Wed, 7 Jul 2004 17:24:08 -0500 Subject: [Linux-cluster] GFS data access failover In-Reply-To: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> References: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> Message-ID: On Wed, 7 Jul 2004 19:26:12 +0200, Nivet R?mi wrote: > Hi everyone, > > I successfully set up a 2-nodes cluster after three days of hard work looking for docs and crashing servers, but it finaly worked and I'm able to export GFS partition from one node to the other using gnbd ;-) > > Now my question is : is there any way to use 2 dataservers (maybe replicate data between them but I can manage that on myself) and use a failover (or round-robin) mechanism so that if one of the dataserver crash, my clients can still access the data from the other dataserver ? > Nope, this is frequently asked. You need some sort of cluster aware mirroring. The kernels' md drivers aren't even close. I believe some people are currently looking at/working on this --Brian > As an optional question : I'm trying to use clvmd to propagate lvm config from one node to the other, but when I try to create LV, I've got the following error : > > Error locking on node XXXX: Internal lvm error, check syslog > Failed to activate new LV. > > and the only log I have on the remote node is : > > lvm[721]: Volume group for uuid not found: LWuaVGYKeELfrxE16CgW0pUOAilU4CSNmLXVJ3b7i5AsjhWgfszorCRkn5KRCQTU > > anyone knows what I'm missing ? > > Thanks, > R?mi. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From rmayhew at mweb.com Thu Jul 8 12:27:38 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Thu, 8 Jul 2004 14:27:38 +0200 Subject: [Linux-cluster] GFS Performance Message-ID: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> Hi I setup 2 nodes, on my EMC SAN. Both nodes see the storage and can access the cca device. When writing a file to the storage fs, the second node takes a couple of seconds to see the changes. Ie. 1. 
Node 1 Creates the file "dd if=/dev/zero of=test.file bs=4096 count=10240000" 2. Doing a ls -la on node 2 takes a few seconds to display the contents of the dir. After the file has finished being updates, all listings of that dir are quick, but if any changes are made, one again has to wait for the system to display the contents of the dir. Any idea? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za From jeff at intersystems.com Thu Jul 8 13:09:49 2004 From: jeff at intersystems.com (Jeff) Date: Thu, 8 Jul 2004 09:09:49 -0400 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <20040705093135.GB30146@tykepenguin.com> References: <104121513.20040703103356@intersystems.com> <20040705093135.GB30146@tykepenguin.com> Message-ID: <14710269533.20040708090949@intersystems.com> Monday, July 5, 2004, 5:31:35 AM, Patrick Caulfield wrote: > On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: >> These are from reviewing http://people.redhat.com/~teigland/sca.pdf >> and the CVS copy of cluster/dlm/doc/libdlm.txt. >> ------------------------------------------------------------------ >> >> If a program requests a lock on the AST side can it wait for >> the lock to complete without returning from the original AST >> routine? Would it use the poll/select mechanism to do this? > In kernel space you shouldn't wait or do much work in the AST routine or > you can block the kernel's AST delivery thread. You can call dlm_lock() in an > AST routine though. > In userspace you can do pretty much what you like in the AST routine as (by > default) they run in a seperate thread - see libdlm for more details on this. Thanks for the answers. One more question about acquiring locks in the worker thread. Assuming I'm using pthreads if I want to call dlm_lock() in the worker thread (AST routine) and I need to wait for that dlm_lock() call to complete would I call dlm_get_fd() and loop calling dlm_dispatch() until I see that the lock completes? This involves dlm_dispatch() calling itself recursively which I assume is going to be ok. Can you call dlm_pthread_init() more than once to start multiple service threads? I have some routines which are called both by the mainline code and as an AST routine. I'm wondering if I need them to be aware of how they're called or whether they can always simply open the fd and use dlm_dispatch if they need to issue a 'blocking' dlm_lock() call. From pcaulfie at redhat.com Thu Jul 8 13:38:02 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 8 Jul 2004 14:38:02 +0100 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <14710269533.20040708090949@intersystems.com> References: <104121513.20040703103356@intersystems.com> <20040705093135.GB30146@tykepenguin.com> <14710269533.20040708090949@intersystems.com> Message-ID: <20040708133800.GF7680@tykepenguin.com> On Thu, Jul 08, 2004 at 09:09:49AM -0400, Jeff wrote: > Monday, July 5, 2004, 5:31:35 AM, Patrick Caulfield wrote: > > > On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: > >> These are from reviewing http://people.redhat.com/~teigland/sca.pdf > >> and the CVS copy of cluster/dlm/doc/libdlm.txt. > >> ------------------------------------------------------------------ > >> > >> If a program requests a lock on the AST side can it wait for > >> the lock to complete without returning from the original AST > >> routine? Would it use the poll/select mechanism to do this? 
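For what it is worth, the fd/dispatch approach described above can be sketched like this. The dlm_lock() argument order, the struct dlm_lksb field names and the dlm_get_fd()/dlm_dispatch() usage are written from memory of libdlm.txt and may not match the headers exactly, and it assumes the default lock space is already set up, so treat it as an outline rather than tested code:

#include <string.h>
#include <poll.h>
#include <libdlm.h>

struct sync_req {
    struct dlm_lksb lksb;   /* sb_lkid/sb_status filled in on completion */
    int done;               /* set by the completion AST below           */
};

/* Completion AST: runs when dlm_dispatch() delivers our result. */
static void sync_ast(void *arg)
{
    struct sync_req *req = arg;
    req->done = 1;
}

/* Issue an asynchronous request, then loop on dlm_dispatch() until our
 * own request has completed.  Any other pending ASTs get delivered
 * along the way, which is the re-entrancy point raised above. */
static int lock_and_wait(int mode, const char *name)
{
    struct sync_req req;
    struct pollfd pfd;
    int fd = dlm_get_fd();
    int rv;

    memset(&req, 0, sizeof(req));
    rv = dlm_lock(mode, &req.lksb, 0 /* flags */,
                  name, strlen(name), 0 /* parent */,
                  sync_ast, &req,   /* completion AST + argument       */
                  NULL,             /* no blocking AST in this sketch  */
                  NULL);            /* no range                        */
    if (rv)
        return rv;                  /* request was never queued */

    while (!req.done) {
        pfd.fd = fd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, -1) < 0)  /* wait for the DLM fd to go readable */
            return -1;
        dlm_dispatch(fd);           /* read it and run the ASTs */
    }
    return req.lksb.sb_status;      /* 0 if the lock was granted */
}

Called from mainline code as, say, lock_and_wait(LKM_PRMODE, "my_resource"); calling it from inside an AST routine is exactly the nested-dispatch case asked about above.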
> > > In kernel space you shouldn't wait or do much work in the AST routine or > > you can block the kernel's AST delivery thread. You can call dlm_lock() in an > > AST routine though. > > > In userspace you can do pretty much what you like in the AST routine as (by > > default) they run in a seperate thread - see libdlm for more details on this. > > Thanks for the answers. One more question about acquiring locks in > the worker thread. > > Assuming I'm using pthreads if I want to call dlm_lock() in the worker > thread (AST routine) and I need to wait for that dlm_lock() call to complete > would I call dlm_get_fd() and loop calling dlm_dispatch() until I see that > the lock completes? This involves dlm_dispatch() calling itself > recursively which I assume is going to be ok. That should work fine, though I must admit I haven't tried it. The routines are all re-entrant (of course). > Can you call dlm_pthread_init() more than once to start multiple > service threads? I have some routines which are called > both by the mainline code and as an AST routine. I'm wondering if I > need them to be aware of how they're called or whether they can > always simply open the fd and use dlm_dispatch if they need to > issue a 'blocking' dlm_lock() call. Currently you can't have multiple dispatch routines by using library's pthread_init calls. The threads don't really care whether they are AST threads or work threads - all they do is read data from the DLM's fd and call the routine specified by astaddr. I suppose one hazard of issuing more lock requests in the AST routine is that you will need to keep a track of which lock requests you have had ASTs for - in cast mainline issues any more lock requests in the meantime. You will have to make sure that they get dispatched as well as your nested one. If course this will happen in dlm_dispatch() but you don't know whose lock has been dispatched each time. Things might be a little clearer if you had a look inside libdlm.c - it's actually quite as simple little library. One warning...try to avoid calling the kernel bits yourself; I don't want to change the userland/kernel API but if it becomes necessary the library will always be modified to cope. -- patrick From madmax at iskon.hr Thu Jul 8 14:47:24 2004 From: madmax at iskon.hr (Kresimir Kukulj) Date: Thu, 8 Jul 2004 16:47:24 +0200 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: <20040708144724.GB18751@max.zg.iskon.hr> What is the difference/development status of RedHat's (sistina) GNBD compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge project page that OpenGFS GNBD was not updated since 2002. I also found NBD, ENBD, DRBD but these don't support client nodes to be mounted (even read-only) if master node is using the device. Is there any other technology (software) that can export a block device from 1 master to couple of slave nodes ? Read only access on client nodes is good enough. Is anyone using some kind of network block device in production, and with what success ? Thanks. -- Kresimir Kukulj madmax at iskon.hr +--------------------------------------------------+ Old PC's never die. They just become Unix terminals. From notiggy at gmail.com Thu Jul 8 18:46:01 2004 From: notiggy at gmail.com (Brian Jackson) Date: Thu, 8 Jul 2004 13:46:01 -0500 Subject: [Linux-cluster] GNBD, how good it is ? 
In-Reply-To: <20040708144724.GB18751@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: On Thu, 8 Jul 2004 16:47:24 +0200, Kresimir Kukulj wrote: > > What is the difference/development status of RedHat's (sistina) GNBD > compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge > project page that OpenGFS GNBD was not updated since 2002. Once we (OpenGFS) found out that there were other alternatives, we thought we had more important things to do than to maintain gnbd (with our limited resources). It was deprecated some time ago. GFS's code however has been maintained, and updated to newer kernels, etc. GFS's is by far a better choice as far as those two are concerned. > > I also found NBD, ENBD, DRBD but these don't support client nodes to be > mounted (even read-only) if master node is using the device. > > Is there any other technology (software) that can export a block device from > 1 master to couple of slave nodes ? Read only access on client nodes is good > enough. iSCSI and HyperSCSI both work with GFS, so those are options. I suppose you'd be better off answering the question of whether they are stable enough for you. --Brian Jackson > > Is anyone using some kind of network block device in production, and with > what success ? > > Thanks. > > -- > Kresimir Kukulj madmax at iskon.hr From Gareth at Bult.co.uk Thu Jul 8 20:02:15 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Thu, 08 Jul 2004 21:02:15 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: <1089316935.6121.7.camel@squizzey> Hi, Just a general comment; I spent some time getting a cluster running, in summary - for anyone interested (IMHO); a. x86 boxes cluster relatively easily once you get hold of the right docs / examples b. amd64 boxes will not cluster with x86 c. the cluster can crash relatively easily for a number of reasons d. ccsd can be a real CPU hog, esp when waiting to connect e. after a potentially silent gfs kernel crash, there's a real nice bug that leaves your CPU floating at expected levels yet the load average is up at 15+. Summary (IMHO); a. Performance is good and it does work b. It looks promising, yet still alpha/beta c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability d. Documentation is severely lacking ... I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's not quite there yet .. Regards, Gareth. On Thu, 2004-07-08 at 13:46 -0500, Brian Jackson wrote: > On Thu, 8 Jul 2004 16:47:24 +0200, Kresimir Kukulj wrote: > > > > What is the difference/development status of RedHat's (sistina) GNBD > > compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge > > project page that OpenGFS GNBD was not updated since 2002. > > Once we (OpenGFS) found out that there were other alternatives, we > thought we had more important things to do than to maintain gnbd (with > our limited resources). It was deprecated some time ago. GFS's code > however has been maintained, and updated to newer kernels, etc. GFS's > is by far a better choice as far as those two are concerned. > > > > > I also found NBD, ENBD, DRBD but these don't support client nodes to be > > mounted (even read-only) if master node is using the device. > > > > Is there any other technology (software) that can export a block device from > > 1 master to couple of slave nodes ? 
Read only access on client nodes is good > > enough. > > iSCSI and HyperSCSI both work with GFS, so those are options. I > suppose you'd be better off answering the question of whether they are > stable enough for you. > > --Brian Jackson > > > > > Is anyone using some kind of network block device in production, and with > > what success ? > > > > Thanks. > > > > -- > > Kresimir Kukulj madmax at iskon.hr > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From notiggy at gmail.com Thu Jul 8 21:26:49 2004 From: notiggy at gmail.com (Brian Jackson) Date: Thu, 8 Jul 2004 16:26:49 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089316935.6121.7.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: > Hi, > > Just a general comment; > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > Summary (IMHO); > a. Performance is good and it does work > > b. It looks promising, yet still alpha/beta I'm sure I read somewhere, that the current code is considered beta. If I'm making that up, It's implied by the fact that the DLM is quite new, and the port to 2.6 is also pretty fresh. > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability It is if you've got shared storage that implements raid. > > d. Documentation is severely lacking ... Agreed, but that's why there's a wiki, mailing list, and irc channel. > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. I'm sure Red Hat's product for RHEL is nice, stable, and production ready. You are playing with relatively fresh code. --Brian > > Regards, > > Gareth. From kpfleming at backtobasicsmgmt.com Fri Jul 9 00:47:34 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 17:47:34 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: <40EDEB26.8020209@backtobasicsmgmt.com> Brian Jackson wrote: >>c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > It is if you've got shared storage that implements raid. Not if you are trying to avoid single points of failure, unless you have a fully redundant meshed fabric SAN, which most of us cannot afford :-) From kpfleming at backtobasicsmgmt.com Fri Jul 9 02:58:39 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 19:58:39 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: <40EE09DF.7010909@backtobasicsmgmt.com> Brian Jackson wrote: > iSCSI and HyperSCSI both work with GFS, so those are options. I > suppose you'd be better off answering the question of whether they are > stable enough for you. Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? 
From wim.coekaerts at oracle.com Fri Jul 9 03:04:54 2004 From: wim.coekaerts at oracle.com (Wim Coekaerts) Date: Thu, 8 Jul 2004 20:04:54 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <40EE09DF.7010909@backtobasicsmgmt.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <40EE09DF.7010909@backtobasicsmgmt.com> Message-ID: <20040709030453.GA13641@ca-server1.us.oracle.com> http://unh-iscsi.sourceforge.net/ On Thu, Jul 08, 2004 at 07:58:39PM -0700, Kevin P. Fleming wrote: > Brian Jackson wrote: > > >iSCSI and HyperSCSI both work with GFS, so those are options. I > >suppose you'd be better off answering the question of whether they are > >stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From RAWIPFEL at novell.com Fri Jul 9 03:07:15 2004 From: RAWIPFEL at novell.com (Robert Wipfel) Date: Thu, 08 Jul 2004 21:07:15 -0600 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: http://www.ardistech.com/iscsi/ >>> wim.coekaerts at oracle.com 7/8/2004 9:04:54 PM >>> http://unh-iscsi.sourceforge.net/ On Thu, Jul 08, 2004 at 07:58:39PM -0700, Kevin P. Fleming wrote: > Brian Jackson wrote: > > >iSCSI and HyperSCSI both work with GFS, so those are options. I > >suppose you'd be better off answering the question of whether they are > >stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From kpfleming at backtobasicsmgmt.com Fri Jul 9 03:16:08 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 20:16:08 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: Message-ID: <40EE0DF8.7020303@backtobasicsmgmt.com> Robert Wipfel wrote: > http://www.ardistech.com/iscsi/ This page says "requires a recent 2.4 kernel". > http://unh-iscsi.sourceforge.net/ This page warns that their target implementation was only created to test the initiator with, and is not intended for production use. Doesn't mean it doesn't work, but that does not give me a good feeling about relying on it :-) From Gareth at Bult.co.uk Fri Jul 9 14:53:00 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 15:53:00 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: <1089384780.6120.35.camel@squizzey> :) I do appreciate all that, however there are some press releases out there that are not so clear .. There is certainly an implication in the news items I've seen that this is "THE GFS" code .. as opposed to being a new and unstable version .. .. Incidentally, I was being kind - I've had many kernel crashes, even after getting it going .. Regards, Gareth. On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > Hi, > > > > Just a general comment; > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > Summary (IMHO); > > > a. Performance is good and it does work > > > > b. It looks promising, yet still alpha/beta > > I'm sure I read somewhere, that the current code is considered beta. 
> If I'm making that up, It's implied by the fact that the DLM is quite > new, and the port to 2.6 is also pretty fresh. > > > > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > It is if you've got shared storage that implements raid. > > > > > d. Documentation is severely lacking ... > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > I'm sure Red Hat's product for RHEL is nice, stable, and production > ready. You are playing with relatively fresh code. > > --Brian > > > > > Regards, > > > > Gareth. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From bmarzins at redhat.com Fri Jul 9 16:21:43 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 9 Jul 2004 11:21:43 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089384780.6120.35.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> Message-ID: <20040709162142.GC23619@phlogiston.msp.redhat.com> On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > :) > > I do appreciate all that, however there are some press releases out > there that are not so clear .. > > There is certainly an implication in the news items I've seen that this > is "THE GFS" code .. as opposed to being a new and unstable version .. > > .. Incidentally, I was being kind - I've had many kernel crashes, even > after getting it going .. > > Regards, > Gareth. The code you are using is not the code currently being sold by redhat. That is the 6.0 code. You can download that in SRPM form at ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ There is fairly complete documentation for this code. However it does not use the DLM. Instead, GULM handles all the cluster manager issues. This code only runs on 2.4 kernels. The CVS code is going to be sold starting with RHEL 4. Some of the components, like the dlm are just now gotten out of the development stage. Others, like gnbd have been drastically rewritten. We REALLY appreciate all the testing that people are doing on these pieces, however, if you are trying to run something in production, I would encourage you to run the 6.0 code. -Ben Marzinski bmarzins at redhat.com > On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > > > Hi, > > > > > > Just a general comment; > > > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > > > Summary (IMHO); > > > > > a. Performance is good and it does work > > > > > > b. It looks promising, yet still alpha/beta > > > > I'm sure I read somewhere, that the current code is considered beta. > > If I'm making that up, It's implied by the fact that the DLM is quite > > new, and the port to 2.6 is also pretty fresh. > > > > > > > > c. 
Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > > It is if you've got shared storage that implements raid. > > > > > > > > d. Documentation is severely lacking ... > > > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > > > I'm sure Red Hat's product for RHEL is nice, stable, and production > > ready. You are playing with relatively fresh code. > > > > --Brian > > > > > > > > Regards, > > > > > > Gareth. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Gareth Bult > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From phillips at redhat.com Fri Jul 9 16:36:40 2004 From: phillips at redhat.com (Daniel Phillips) Date: Fri, 9 Jul 2004 12:36:40 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089384780.6120.35.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089384780.6120.35.camel@squizzey> Message-ID: <200407091236.40966.phillips@redhat.com> On Friday 09 July 2004 10:53, Gareth Bult wrote: > I do appreciate all that, however there are some press releases out > there that are not so clear .. Which press releases? > There is certainly an implication in the news items I've seen that > this is "THE GFS" code .. as opposed to being a new and unstable > version .. It's very clear that the 2.6 release is out there so that hackers can get to work on it, add to it, and go find bugs. SRPMs for the stable 6.0 release are linked from the cluster page. They build against RHEL3 kernels. > .. Incidentally, I was being kind - I've had many kernel crashes, > even after getting it going .. On 2.6? No surprise. Please post any oopses to the list. Regards, Daniel From Gareth at Bult.co.uk Fri Jul 9 16:30:45 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 17:30:45 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040709162142.GC23619@phlogiston.msp.redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> Message-ID: <1089390645.6121.68.camel@squizzey> Hi, I'm afraid it's over a year since I used a 2.4 kernel, so the SRPMS aren't much use to me personally .. If someone were to make the "tools" available to configure and maintain a cluster at the same time as the beta code, it might make more sense. As things stand however I'm afraid I found finding relevant documentation too much like hard work .vs. current code stability. If you could document the current cluster.xml file I'd be happy to try again .. I keep reading the 6.0 docs which don't document the file and explicitly state "do not edit this file by hand" .. damned if you do and damned if you don't .. not least as ccsd (if not other components) crash silently if there is an error in cluster.xml (!) Regards, Gareth. On Fri, 2004-07-09 at 11:21 -0500, Benjamin Marzinski wrote: > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > :) > > > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > > > There is certainly an implication in the news items I've seen that this > > is "THE GFS" code .. 
as opposed to being a new and unstable version .. > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > after getting it going .. > > > > Regards, > > Gareth. > > The code you are using is not the code currently being sold by redhat. > That is the 6.0 code. You can download that in SRPM form at > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > There is fairly complete documentation for this code. However it does not use > the DLM. Instead, GULM handles all the cluster manager issues. This code only > runs on 2.4 kernels. > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > like the dlm are just now gotten out of the development stage. Others, like > gnbd have been drastically rewritten. We REALLY appreciate all the testing > that people are doing on these pieces, however, if you are trying to run > something in production, I would encourage you to run the 6.0 code. > > -Ben Marzinski > bmarzins at redhat.com > > > On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > > > > > Hi, > > > > > > > > Just a general comment; > > > > > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > > > > > Summary (IMHO); > > > > > > > a. Performance is good and it does work > > > > > > > > b. It looks promising, yet still alpha/beta > > > > > > I'm sure I read somewhere, that the current code is considered beta. > > > If I'm making that up, It's implied by the fact that the DLM is quite > > > new, and the port to 2.6 is also pretty fresh. > > > > > > > > > > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > > > > It is if you've got shared storage that implements raid. > > > > > > > > > > > d. Documentation is severely lacking ... > > > > > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > > > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > > > > > I'm sure Red Hat's product for RHEL is nice, stable, and production > > > ready. You are playing with relatively fresh code. > > > > > > --Brian > > > > > > > > > > > Regards, > > > > > > > > Gareth. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Gareth Bult > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Fri Jul 9 16:42:12 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 17:42:12 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <200407091236.40966.phillips@redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089384780.6120.35.camel@squizzey> <200407091236.40966.phillips@redhat.com> Message-ID: <1089391332.6126.75.camel@squizzey> >Which press releases? Urm, how 'bout this one .. you might not call it a "press release", but I found it via an announcement on a news site .. 
maybe it's me but I don't see the words "development", "beta" or "not working yet" listed anywhere .. ;-) (Oopses listed on IRC when discovered, someone has them.. bmarzins I think..) Regards, Gareth. --- From: Ken Preslan [email blocked] To: Linux Kernel Mailing List [email blocked] Subject: GFS cluster filesystem re-released Date: 2004-06-24 22:53:49 Hi, I'd like to announce that Red Hat has re-released the GFS cluster filesystem and its related infrastructure under the GPL. The different projects that make up the infrastructure are: GFS - shared-disk cluster file system CLVM - clustering extensions to the LVM2 logical volume manager toolset CMAN - general-purpose symmetric cluster manager DLM - general-purpose distributed lock manager CCS - cluster configuration system to manage the cluster config file GULM - alternative redundant server-based lock/cluster manager for GFS GNBD - network block device driver shares storage over a network Fence - I/O fencing system The source code and patches for 2.6 are available at http://sources.redhat.com/cluster/. 2.4 source should show up early tomorrow. We're looking for people help us work on this project so we can eventually get it included into the Linux kernel. Comments, suggestions, patches, and testers are more than welcome. --- On Fri, 2004-07-09 at 12:36 -0400, Daniel Phillips wrote: > On Friday 09 July 2004 10:53, Gareth Bult wrote: > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > Which press releases? > > > There is certainly an implication in the news items I've seen that > > this is "THE GFS" code .. as opposed to being a new and unstable > > version .. > > It's very clear that the 2.6 release is out there so that hackers can > get to work on it, add to it, and go find bugs. SRPMs for the stable > 6.0 release are linked from the cluster page. They build against RHEL3 > kernels. > > > .. Incidentally, I was being kind - I've had many kernel crashes, > > even after getting it going .. > > On 2.6? No surprise. Please post any oopses to the list. > > Regards, > > Daniel -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-4.png Type: image/png Size: 822 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From phillips at redhat.com Fri Jul 9 17:49:55 2004 From: phillips at redhat.com (Daniel Phillips) Date: Fri, 9 Jul 2004 13:49:55 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089391332.6126.75.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <200407091236.40966.phillips@redhat.com> <1089391332.6126.75.camel@squizzey> Message-ID: <200407091349.55362.phillips@redhat.com> On Friday 09 July 2004 12:42, Gareth Bult wrote: > >Which press releases? > > Urm, how 'bout this one .. you might not call it a "press release", > but I found it via an announcement on a news site .. > maybe it's me but I don't see the words "development", "beta" or "not > working yet" listed anywhere .. ;-) You found it on Linux Kernel Mailing List, and it's for 2.6. Please draw your own conclusion ;-) GFS 6.0 on 2.4 is the stable release. > (Oopses listed on IRC when discovered, someone has them.. bmarzins I > think..) Thanks. 
Daniel From amir at datacore.ch Fri Jul 9 18:24:47 2004 From: amir at datacore.ch (Amir Guindehi) Date: Fri, 09 Jul 2004 20:24:47 +0200 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089390645.6121.68.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> Message-ID: <40EEE2EF.8010000@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gareth, | If you could document the current cluster.xml file I'd be happy to try | again .. I keep reading the 6.0 docs which don't document the file and | explicitly state "do not edit this file by hand" .. damned if you do and | damned if you don't .. not least as ccsd (if not other components) crash | silently if there is an error in cluster.xml (!) Did you find the GFS documentation I wrote? It's available at: https://open.datacore.ch/page/GFS https://open.datacore.ch/page/GFS.Install The later document includes two sample cluster.xml files for a two node setup and for a three node setup with manual fencing. Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA7uLtbycOjskSVCwRAv5cAKCJOYl+3cxdY4FP1M7Im71P1cGVUACfSzoa jiNyYrmjyCr7GckAXGVYmVM= =9/tE -----END PGP SIGNATURE----- From Gareth at Bult.co.uk Sat Jul 10 08:57:02 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 09:57:02 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <200407091349.55362.phillips@redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <200407091236.40966.phillips@redhat.com> <1089391332.6126.75.camel@squizzey> <200407091349.55362.phillips@redhat.com> Message-ID: <1089449822.6121.103.camel@squizzey> Urm, no .. I'm not on the kernel mailing list. (Are you implying 2.6 is unstable ?! It's way safer to use that 2.4 !!) I'm inclined at this point to mention Fedora and all my years using Redhat and my relatively recent move to Gentoo .. suffice to say I picked up the code off a public news site and thought it was stable enough to play with. (and it's not) How about some big notices on the source web pages to the effect that it's for experimental use only and should not be used near a production environment (?!) Regards, Gareth. On Fri, 2004-07-09 at 13:49 -0400, Daniel Phillips wrote: > On Friday 09 July 2004 12:42, Gareth Bult wrote: > > >Which press releases? > > > > Urm, how 'bout this one .. you might not call it a "press release", > > but I found it via an announcement on a news site .. > > maybe it's me but I don't see the words "development", "beta" or "not > > working yet" listed anywhere .. ;-) > > You found it on Linux Kernel Mailing List, and it's for 2.6. Please > draw your own conclusion ;-) > > GFS 6.0 on 2.4 is the stable release. > > > (Oopses listed on IRC when discovered, someone has them.. bmarzins I > > think..) > > Thanks. > > Daniel -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mailing-lists at hughesjr.com Sat Jul 10 10:47:05 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Sat, 10 Jul 2004 05:47:05 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089449822.6121.103.camel@squizzey> References: <1089449822.6121.103.camel@squizzey> Message-ID: <1089456425.10000.10.camel@Myth.home.local> Gareth, What I think everyone is saying ... not implying, but saying... is this. RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL is stable. If you are using anything else, it is not stable. Why is that so hard to understand? The 2.6 Kernel is stable ... however, it is not stable (or supported) on RHEL ... and the code GFS code for the 2.6 kernel is not recommended for use on a production machine with a 2.6 kernel. Use the GFS code for the 2.6 kernel on a production machine at your own risk. At least that is what I got out of the posts ... maybe I'm wrong though Johnny Hughes HughesJR.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From madmax at iskon.hr Sat Jul 10 18:38:57 2004 From: madmax at iskon.hr (Kresimir Kukulj) Date: Sat, 10 Jul 2004 20:38:57 +0200 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040709162142.GC23619@phlogiston.msp.redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> Message-ID: <20040710183857.GA7532@max.zg.iskon.hr> Quoting Benjamin Marzinski (bmarzins at redhat.com): > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > :) > > > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > > > There is certainly an implication in the news items I've seen that this > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > after getting it going .. > > > > Regards, > > Gareth. > > The code you are using is not the code currently being sold by redhat. > That is the 6.0 code. You can download that in SRPM form at > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > There is fairly complete documentation for this code. However it does not use > the DLM. Instead, GULM handles all the cluster manager issues. This code only > runs on 2.4 kernels. > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > like the dlm are just now gotten out of the development stage. Others, like Is this new DLM still dependent on single lock storage or is it distributed (like in OpenDLM) ? > gnbd have been drastically rewritten. We REALLY appreciate all the testing You are saying that GNBD is rewritten. How does it compare to GNBD in GFS-6.0 (version sold by RedHat) in stability, performance, features ? > that people are doing on these pieces, however, if you are trying to run > something in production, I would encourage you to run the 6.0 code. Thanks, I'll look into it. Does anyone use some software based shared storage like GNBD, iSCSI or HyperSCSI as an alternative to expensive FibreChannel hardware ? If you do, can you describe your experiences (how stable it is, performance, which implementation)... I believe this information will be interesting to other people too. 
Browsing the net, there are couple of variants of network block device (NBD, ENBD, DRBD) but they don't support more than one client (and both sides cannot be used at the same time). There is of course GNBD: - OpenGFS version - not maintained anymore. - GFS-6.0 version sold by RedHat (2.4 kernel). - GFS-XX version from re-released sources of GFS ported to 2.6 kernel. There are two 'target' implementations of iSCSI protocol: - http://unh-iscsi.sourceforge.net/ initiator implementation is their primary development. They have target implemented but is currently mostly used to test the initiator. Runs on 2.4 and 2.6 kernels. - http://www.ardistech.com/iscsi/ iSCSI target implementation for 2.4 kernel's only. - http://linux-iscsi.sourceforge.net/ iSCSI initiator implementation for 2.4 and 2.6 kernels. Anything else? -- Kresimir Kukulj madmax at iskon.hr +--------------------------------------------------+ Old PC's never die. They just become Unix terminals. From Gareth at Bult.co.uk Sat Jul 10 21:42:45 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 22:42:45 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <40EEE2EF.8010000@datacore.ch> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> <40EEE2EF.8010000@datacore.ch> Message-ID: <1089495765.6126.148.camel@squizzey> Hi, > Did you find the GFS documentation I wrote? > It's available at: > > https://open.datacore.ch/page/GFS > https://open.datacore.ch/page/GFS.Install I certainly did, very nice it is too .. :) > The later document includes two sample cluster.xml files for a two node > setup and for a three node setup with manual fencing. Urm, yes, it does. However, after reading some of the 6.0 docs, this covers about 2% of the things you can do with cluster.xml .. (!) After getting a cluster "working" I started looking at services, shared IP's etc .. I didn't really stand a chance with all the "don't edit by hand" stuff in the 6.0 docs - they don't list the cluster.xml's to go with the examples, they just give screen shots of the config apps ... :( Regards, Gareth. > > Regards > - - Amir > - -- > Amir Guindehi, nospam.amir at datacore.ch > DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2-nr1 (Windows 2000) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFA7uLtbycOjskSVCwRAv5cAKCJOYl+3cxdY4FP1M7Im71P1cGVUACfSzoa > jiNyYrmjyCr7GckAXGVYmVM= > =9/tE > -----END PGP SIGNATURE----- > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-6.png Type: image/png Size: 796 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 10 21:46:24 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 22:46:24 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089456425.10000.10.camel@Myth.home.local> References: <1089449822.6121.103.camel@squizzey> <1089456425.10000.10.camel@Myth.home.local> Message-ID: <1089495984.6120.153.camel@squizzey> Hi, > What I think everyone is saying ... not implying, but saying... is > this. > > RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > is stable. If you are using anything else, it is not stable. Why is > that so hard to understand? Perhaps because for non-redhat users, 2.4 is considered "old hat" and they can't understand why Redhat is *still* using 2.4 (?!) > The 2.6 Kernel is stable ... however, it is not stable (or supported) > on RHEL ... and the code GFS code for the 2.6 kernel is not > recommended for use on a production machine with a 2.6 kernel. Use > the GFS code for the 2.6 kernel on a production machine at your own > risk. Urm, I guess I don't "have" to use 2.6, but it would be "really" painful for me not to use 2.6 .. for way more reasons than I want to list here. > At least that is what I got out of the posts ... maybe I'm wrong > though Sure, thought you should appreciate that for people who've been off 2.4 and in production on 2.6 for a long time, comments like "you should be using 2.4" are a little redundant. Regards, Gareth. > > Johnny Hughes > HughesJR.com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mailing-lists at hughesjr.com Sat Jul 10 22:37:32 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Sat, 10 Jul 2004 17:37:32 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089495984.6120.153.camel@squizzey> References: <1089495984.6120.153.camel@squizzey> Message-ID: <1089499052.5230.27.camel@Myth.home.local> On Sat, 2004-07-10 at 16:46, Gareth Bult wrote: > Hi, > > > >What I think everyone is saying ... not implying, but saying... is this. > > > >RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > >is stable. If you are using anything else, it is not stable. Why is > >that so hard to understand? > > > > Perhaps because for non-redhat users, 2.4 is considered "old hat" and > they can't understand why Redhat is *still* using 2.4 (?!) > RHEL is using a 2.4 kernel because that is what they chose to make stable. You are on a RedHat mailing list discussing a RedHat product. Thousands of customers running Oracle on RHEL 3 AS are quite happy that RedHat is using a 2.4.21 kernel (as an example). They are also happy that RedHat is making GFS 6.0 available for the RHEL 3 product line. > > > >The 2.6 Kernel is stable ... however, it is not stable (or supported) on > >RHEL ... and the code GFS code for the 2.6 kernel is not recommended for > >use on a production machine with a 2.6 kernel. Use the GFS code for the > >2.6 kernel on a production machine at your own risk. 
> > > > Urm, I guess I don't "have" to use 2.6, but it would be "really" painful > for me not to use 2.6 .. for way more reasons than I want to list here. > Use whatever you want ... only don't expect software that someone says is unstable to be stable. IF you want a stable GFS from RedHat ... use RHEL and GFS 6. If you want to use another distro and another GFS ... great ... just don't complain that it is not stable. > > > >At least that is what I got out of the posts ... maybe I'm wrong though > > > > Sure, thought you should appreciate that for people who've been off 2.4 > and in production on 2.6 for a long time, comments like "you should be > using 2.4" are a little redundant. Again ... you are the person who chooses what technology you deploy ... but RedHat is going to put out stable products for their supported RHEL. If you can make it work on a different distro with a different kernel, great. > > Regards, > Gareth. > > > > > Johnny Hughes > HughesJR.com > Johnny Hughes -------------- next part -------------- An HTML attachment was scrubbed... URL: From Gareth at Bult.co.uk Sun Jul 11 15:18:40 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sun, 11 Jul 2004 16:18:40 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089499052.5230.27.camel@Myth.home.local> References: <1089495984.6120.153.camel@squizzey> <1089499052.5230.27.camel@Myth.home.local> Message-ID: <1089559120.6120.163.camel@squizzey> Hi, Sorry, my mistake, I read the list title as "linux-cluster" as opposed to "redhat-linux-server-3.0-cluster". As I'm no longer a RH user, and now I know, I'll unsubscribe. Thanks for the clarification. Regards, Gareth. On Sat, 2004-07-10 at 17:37 -0500, Johnny Hughes wrote: > On Sat, 2004-07-10 at 16:46, Gareth Bult wrote: > > > Hi, > > > > > > >What I think everyone is saying ... not implying, but saying... is this. > > > > > > >RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > > >is stable. If you are using anything else, it is not stable. Why is > > >that so hard to understand? > > > > > > > > Perhaps because for non-redhat users, 2.4 is considered "old hat" and > > they can't understand why Redhat is *still* using 2.4 (?!) > > RHEL is using a 2.4 kernel because that is what they chose to make > stable. You are on a RedHat mailing list discussing a RedHat product. > Thousands of customers running Oracle on RHEL 3 AS are quite happy > that RedHat is using a 2.4.21 kernel (as an example). They are also > happy that RedHat is making GFS 6.0 available for the RHEL 3 product > line. > > > > > > > >The 2.6 Kernel is stable ... however, it is not stable (or supported) on > > >RHEL ... and the code GFS code for the 2.6 kernel is not recommended for > > >use on a production machine with a 2.6 kernel. Use the GFS code for the > > >2.6 kernel on a production machine at your own risk. > > > > > > > > Urm, I guess I don't "have" to use 2.6, but it would be "really" painful > > for me not to use 2.6 .. for way more reasons than I want to list here. > > Use whatever you want ... only don't expect software that someone says > is unstable to be stable. IF you want a stable GFS from RedHat ... > use RHEL and GFS 6. If you want to use another distro and another > GFS ... great ... just don't complain that it is not stable. > > > > > > > >At least that is what I got out of the posts ... 
maybe I'm wrong though > > > > > > > > Sure, thought you should appreciate that for people who've been off 2.4 > > and in production on 2.6 for a long time, comments like "you should be > > using 2.4" are a little redundant. > > Again ... you are the person who chooses what technology you > deploy ... but RedHat is going to put out stable products for their > supported RHEL. If you can make it work on a different distro with a > different kernel, great. > > > > > Regards, > > Gareth. > > > > > > > > > > Johnny Hughes > > HughesJR.com > > Johnny Hughes -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From pcaulfie at redhat.com Mon Jul 12 07:49:10 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 12 Jul 2004 08:49:10 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040710183857.GA7532@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <20040710183857.GA7532@max.zg.iskon.hr> Message-ID: <20040712074909.GC11355@tykepenguin.com> On Sat, Jul 10, 2004 at 08:38:57PM +0200, Kresimir Kukulj wrote: > Quoting Benjamin Marzinski (bmarzins at redhat.com): > > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > > :) > > > > > > I do appreciate all that, however there are some press releases out > > > there that are not so clear .. > > > > > > There is certainly an implication in the news items I've seen that this > > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > > after getting it going .. > > > > > > Regards, > > > Gareth. > > > > The code you are using is not the code currently being sold by redhat. > > That is the 6.0 code. You can download that in SRPM form at > > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > > There is fairly complete documentation for this code. However it does not use > > the DLM. Instead, GULM handles all the cluster manager issues. This code only > > runs on 2.4 kernels. > > > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > > like the dlm are just now gotten out of the development stage. Others, like > > Is this new DLM still dependent on single lock storage or is it distributed > (like in OpenDLM) ? The DLM is fully distributed. -- patrick From arekm at pld-linux.org Mon Jul 12 09:36:12 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Mon, 12 Jul 2004 11:36:12 +0200 Subject: [Linux-cluster] Fwd: gfs fixes for PPC32 platform... Message-ID: <200407121136.12773.arekm@pld-linux.org> ---------- Forwarded Message ---------- Subject: gfs fixes for PPC32 platform... Date: Monday 12 of July 2004 11:30 From: Pawe? Sikora To: arekm at pld-linux.org [ fs/gfs_locking/lock_gulm/utils_verb_flags.c ]: The `strncasecmp' function confilcts with arch/ppc{,64}/lib/strcase.c Please, rename it or link with proper arch/*/lib/built-in.o [ fs/gfs/log.c ]: The sequence `switch (head_wrap - dump_wrap)' uses __ucmpdi2 (for 64-bits ops) from libgcc_s.so and finally causing `unresolved symbol' in module. 
fix: __u64 tmp = head_wrap - dump_wrap; if (tmp < 0x100000000LLU) switch ((__u32)tmp) { .... } else // default action. -- /* Copyright (C) 2003, SCO, Inc. This is valuable Intellectual Property. */ #define say(x) lie(x) ------------------------------------------------------- -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux -------------- next part -------------- A non-text attachment was scrubbed... Name: 2.6.7-ppc-strncasecmp.patch Type: text/x-diff Size: 1238 bytes Desc: not available URL: From lhh at redhat.com Mon Jul 12 13:41:24 2004 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 Jul 2004 09:41:24 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089495765.6126.148.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> <40EEE2EF.8010000@datacore.ch> <1089495765.6126.148.camel@squizzey> Message-ID: <1089639684.13281.146.camel@atlantis.boston.redhat.com> On Sat, 2004-07-10 at 22:42 +0100, Gareth Bult wrote: > However, after reading some of the 6.0 docs, this covers about 2% of > the things you can do with cluster.xml .. (!) True. But cluster.xml is rather different from the one in RHCS/RHGFS. > After getting a cluster "working" I started looking at services, > shared IP's etc .. I didn't really stand a chance with all the "don't > edit by hand" stuff in the 6.0 docs - they don't list the cluster. > xml's to go with the examples, they just give screen shots of the > config apps ... :( Well, the shared IPs and stuff won't really do much until Friday... ;) I hope to commit the cold-failover component to CVS by then. The config app won't be available for awhile; it was for RHCS and RHGFS; which have much less "stuff" to deal with than this project. However, I hope to expand the preliminary things I've sent to this list so that people can define resource groups by (gasp) hand editing - at least until the GUI is built. -- Lon From bmarzins at redhat.com Mon Jul 12 15:56:51 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Mon, 12 Jul 2004 10:56:51 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040710183857.GA7532@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <20040710183857.GA7532@max.zg.iskon.hr> Message-ID: <20040712155651.GE23619@phlogiston.msp.redhat.com> On Sat, Jul 10, 2004 at 08:38:57PM +0200, Kresimir Kukulj wrote: > Quoting Benjamin Marzinski (bmarzins at redhat.com): > > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > > :) > > > > > > I do appreciate all that, however there are some press releases out > > > there that are not so clear .. > > > > > > There is certainly an implication in the news items I've seen that this > > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > > after getting it going .. > > > > > > Regards, > > > Gareth. > > > > The code you are using is not the code currently being sold by redhat. > > That is the 6.0 code. 
You can download that in SRPM form at > > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > > There is fairly complete documentation for this code. However it does not use > > the DLM. Instead, GULM handles all the cluster manager issues. This code only > > runs on 2.4 kernels. > > > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > > like the dlm are just now gotten out of the development stage. Others, like > > Is this new DLM still dependent on single lock storage or is it distributed > (like in OpenDLM) ? > > > gnbd have been drastically rewritten. We REALLY appreciate all the testing > > You are saying that GNBD is rewritten. How does it compare to GNBD in > GFS-6.0 (version sold by RedHat) in stability, performance, features ? Since it has just been rewritten, it is currently pretty unstable... The rewrite involved removing large chunks of gnbd from the kernel, and doing them in user space. This should make it easier to maintain. So once it stabilizes, it should be better. Performance testing will be done as soon as I'm happy with the stability. It should be pretty much the same... Not too much of the core functionality changed. The features are pretty much identical, except that the new code auto reconnects if it looses a connection. The rewrite was done because the block layer had some largish changes from 2.4 to 2.6, and to be more inline with the way redhat ships maintains products. -Ben Marzinski bmarzins at redhat.com From mtilstra at redhat.com Mon Jul 12 17:09:39 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Mon, 12 Jul 2004 12:09:39 -0500 Subject: [Linux-cluster] Fwd: gfs fixes for PPC32 platform... In-Reply-To: <200407121136.12773.arekm@pld-linux.org> References: <200407121136.12773.arekm@pld-linux.org> Message-ID: <20040712170939.GA1471@redhat.com> On Mon, Jul 12, 2004 at 11:36:12AM +0200, Arkadiusz Miskiewicz wrote: > ---------- Forwarded Message ---------- > > Subject: gfs fixes for PPC32 platform... > Date: Monday 12 of July 2004 11:30 > From: Pawe? Sikora > To: arekm at pld-linux.org > > [ fs/gfs_locking/lock_gulm/utils_verb_flags.c ]: > > The `strncasecmp' function confilcts with arch/ppc{,64}/lib/strcase.c > Please, rename it or link with proper arch/*/lib/built-in.o > the utils_verb_flags stuff isn't actually needed anymore, so I just removed it. Which as a side affect, should fix the compile thing you see. (odd how ppc and ppc64 seem to be the only archs that have a strncasecmp function...) Thanks for catching that. -- Michael Conrad Tadpol Tilstra Even though I feel like I might ignite, I probably won't. But I'm gonna try anyways. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From phillips at redhat.com Mon Jul 12 18:09:38 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 12 Jul 2004 14:09:38 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089559120.6120.163.camel@squizzey> References: <1089495984.6120.153.camel@squizzey> <1089499052.5230.27.camel@Myth.home.local> <1089559120.6120.163.camel@squizzey> Message-ID: <200407121409.38256.phillips@redhat.com> On Sunday 11 July 2004 11:18, Gareth Bult wrote: > Hi, > > Sorry, my mistake, I read the list title as "linux-cluster" as > opposed to "redhat-linux-server-3.0-cluster". > > As I'm no longer a RH user, and now I know, I'll unsubscribe. > > Thanks for the clarification. 
> > Regards, > Gareth. Suit yourself, however please be accurate in your comments. You have mischaracterized this list as a Red Hat product-oriented list. It is not, it is a community forum, please read the other posts to be sure of that. Regards, Daniel From phillips at redhat.com Mon Jul 12 20:18:50 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 12 Jul 2004 16:18:50 -0400 Subject: [Linux-cluster] [ANNOUNCE] Cluster Infrastructure BOF at OLS Message-ID: <200407121618.50987.phillips@redhat.com> Hi all, There will be a BOF at OLS for those interested in hammering out issues of cluster infrastructure for Linux. http://www.linuxsymposium.org/2004/view_abstract.php?content_key=203 Friday July 23rd, 8:00 to 9:00 PM, Room D (Watch for last minute schedule changes.) The format will be: * Panel discussion, 20 minutes * Open discussion, 30 minutes * Wrapup, 10 minutes. Participants should come equipped with an adequate supply of fire retardant and/or a good flamesuit. P.S., Please remove cross-posts as appropriate if you reply. Regards, Daniel From arekm at pld-linux.org Tue Jul 13 14:25:58 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Tue, 13 Jul 2004 16:25:58 +0200 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches Message-ID: <200407131625.58495.arekm@pld-linux.org> >From qboosh at pld-linux.org - include cluster config on more arches --- linux-2.6.7/arch/ppc/Kconfig.orig 2004-07-09 23:03:07.000000000 +0000 +++ linux-2.6.7/arch/ppc/Kconfig 2004-07-10 09:17:08.000000000 +0000 @@ -1247,6 +1247,8 @@ source "lib/Kconfig" +source "cluster/Kconfig" + source "arch/ppc/oprofile/Kconfig" menu "Kernel hacking" --- linux-2.6.7/arch/ia64/Kconfig.orig 2004-07-09 23:03:06.000000000 +0000 +++ linux-2.6.7/arch/ia64/Kconfig 2004-07-10 09:19:56.000000000 +0000 @@ -368,6 +368,8 @@ source "lib/Kconfig" +source "cluster/Kconfig" + source "arch/ia64/hp/sim/Kconfig" source "arch/ia64/oprofile/Kconfig" -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From pcaulfie at redhat.com Tue Jul 13 14:47:14 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 13 Jul 2004 15:47:14 +0100 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches In-Reply-To: <200407131625.58495.arekm@pld-linux.org> References: <200407131625.58495.arekm@pld-linux.org> Message-ID: <20040713144714.GG14327@tykepenguin.com> On Tue, Jul 13, 2004 at 04:25:58PM +0200, Arkadiusz Miskiewicz wrote: > > >From qboosh at pld-linux.org - include cluster config on more arches I added a few last week: $ patch -p1 < ~/dev/cluster/cman-kernel/patches/2.6.7/00001.patch patching file arch/alpha/Kconfig patching file arch/arm/Kconfig patching file arch/arm26/Kconfig patching file arch/cris/Kconfig patching file arch/i386/Kconfig patching file arch/ia64/Kconfig patching file arch/m68k/Kconfig patching file arch/mips/Kconfig patching file arch/parisc/Kconfig patching file arch/ppc/Kconfig patching file arch/ppc64/Kconfig patching file arch/s390/Kconfig patching file arch/sh/Kconfig patching file arch/sparc/Kconfig patching file arch/sparc64/Kconfig patching file arch/um/Kconfig patching file arch/x86_64/Kconfig Are there any missing from that (apart from the non-MMU ones) ? 
:-) -- patrick From ced at md3.vsnl.net.in Tue Jul 13 23:45:23 2004 From: ced at md3.vsnl.net.in (ced) Date: Tue, 13 Jul 2004 16:45:23 -0700 Subject: [Linux-cluster] Details Wanted Message-ID: <000501c46933$78188d20$0100a8c0@cedtn> Dear Sir This is Jeyaram P from India regarding to establishing a linux based cluster. I have 20 numbers of Intel Celeron 366 Mhz processor computer. Now i want to make a cluster using the above said systems. The RAM capasity is varied from 32 MBto 128 MB system to system. Plz guide me to estblish a linux cluster. My e-mail id : sadmn_cedtn at yahoo.co.in apjram at yahoo.com Looking forward With regards Jeyaram P From john.hearns at clustervision.com Tue Jul 13 15:43:01 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 13 Jul 2004 16:43:01 +0100 Subject: [Linux-cluster] Details Wanted In-Reply-To: <000501c46933$78188d20$0100a8c0@cedtn> References: <000501c46933$78188d20$0100a8c0@cedtn> Message-ID: <1089733381.4373.90.camel@vigor12> On Wed, 2004-07-14 at 00:45, ced wrote: > Dear Sir > > This is Jeyaram P from India regarding to establishing a linux based > cluster. Jeyaram, do you mean a computational cluster, also known as a Beowulf cluster? You could make a start by looking at this site http://www.beowulf.org From arekm at pld-linux.org Tue Jul 13 15:45:31 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Tue, 13 Jul 2004 17:45:31 +0200 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches In-Reply-To: <20040713144714.GG14327@tykepenguin.com> References: <200407131625.58495.arekm@pld-linux.org> <20040713144714.GG14327@tykepenguin.com> Message-ID: <200407131745.31536.arekm@pld-linux.org> On Tuesday 13 of July 2004 16:47, Patrick Caulfield wrote: > I added a few last week: Great. > Are there any missing from that (apart from the non-MMU ones) ? Probably not - thanks! -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From don at smugmug.com Tue Jul 13 22:22:51 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 15:22:51 -0700 Subject: [Linux-cluster] GFS limits? Message-ID: <40F460BB.4040603@smugmug.com> Hi there, I've been peripherally following GFS's progress for the last two years or so, and I'm very interested in using it. We were already on Red Hat when Sistina was acquired, so I've been waiting to see what Red Hat will do with it. But before I get ahold of the sales people, I thought I'd find out a little more about it. We have two use cases where I can see it being useful: - For our web server clusters to share a single "snapshot" of our application code amongst themselves. GFS obviously functions great in this environment and would be useful. - For our backend image data storage. We currently have 35TB of storage, and it's growing at a rapid rate. I'd like to be able to scale into hundreds of petabytes some day, and would like to select a solution early that will scale large. Migrating a few hundred TBs from one solution to another already keeps me up at night... PBs would make me go insane. This is the use case I'm not sure of with regards to GFS. Does GFS somehow get around the 1TB block device issue? Just how large can a single exported filesystem be with GFS? Our current (homegrown) solution will scale very well for quite some time, but eventually we're going to get saturated with write requests to individual head units. 
Does GFS intelligently "spread the load" among multiple storage entities for writing under high load? Does it always write to any available storage units, or are there thresholds where it expands the pool of units it writes to? (I'm not sure I'm making much sense, but we'll see if any of you grok it :) In the event of some multiple-catastrophe failure (where some data isn't online at all, let alone redundant), how graceful is GFS? Does it "rope off" the data that's not available and still allow full access to the data that is? Or does the whole cluster go down? I notice the pricing for GFS is $2200. Is that per seat? And if so, what's a "seat"? Each client? Each server with storage participating in the cluster? Both? Some other distinction? Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be able to participate as well? Whew, that should be enough to get us started. Thanks in advance! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From notiggy at gmail.com Tue Jul 13 22:50:00 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 13 Jul 2004 17:50:00 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F460BB.4040603@smugmug.com> References: <40F460BB.4040603@smugmug.com> Message-ID: On Tue, 13 Jul 2004 15:22:51 -0700, Don MacAskill wrote: > > Hi there, > > I've been peripherally following GFS's progress for the last two years > or so, and I'm very interested in using it. We were already on Red Hat > when Sistina was acquired, so I've been waiting to see what Red Hat will > do with it. But before I get ahold of the sales people, I thought I'd > find out a little more about it. > > We have two use cases where I can see it being useful: > > - For our web server clusters to share a single "snapshot" of our > application code amongst themselves. GFS obviously functions great in > this environment and would be useful. > > - For our backend image data storage. We currently have 35TB of > storage, and it's growing at a rapid rate. I'd like to be able to scale > into hundreds of petabytes some day, and would like to select a solution > early that will scale large. Migrating a few hundred TBs from one > solution to another already keeps me up at night... PBs would make me > go insane. This is the use case I'm not sure of with regards to GFS. > > Does GFS somehow get around the 1TB block device issue? Just how large > can a single exported filesystem be with GFS? The code that most people on this list are interested in currently is the code in cvs which is for 2.6 only. 2.6 has a config option to enable using devices larger than 2TB. I'm still reading through all the GFS code, but it's still architecturally the same as when it was closed source, so I'm pretty sure most of my knowledge from OpenGFS will still apply. GFS uses 64bit values internally, so you can have very large filesystems (larger than PBs). > > Our current (homegrown) solution will scale very well for quite some > time, but eventually we're going to get saturated with write requests to > individual head units. Does GFS intelligently "spread the load" among > multiple storage entities for writing under high load? No, each node that mounts has direct access to the storage. It writes just like any other fs, when it can. > Does it always > write to any available storage units, or are there thresholds where it > expands the pool of units it writes to? 
(I'm not sure I'm making much > sense, but we'll see if any of you grok it :)

I think you may have a little misconception about just what GFS is. You should check the WHATIS_OpenGFS doc at http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the most part, the same stuff applies to GFS.

> > In the event of some multiple-catastrophe failure (where some data isn't > online at all, let alone redundant), how graceful is GFS? Does it "rope > off" the data that's not available and still allow full access to the > data that is? Or does the whole cluster go down?

That's a good question that I don't know the answer to. But I'd imagine that it wouldn't be terribly happy. Sorry I don't know more. Maybe one of the GFS devs will know better.

> > I notice the pricing for GFS is $2200. Is that per seat? And if so, > what's a "seat"? Each client? Each server with storage participating > in the cluster? Both? Some other distinction?

Now I definitely know you have some misconception. GFS doesn't have any concept of server and client. All nodes mount the fs directly since they are all directly connected to the storage.

> > Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be > able to participate as well?

I'll punt to Red Hat people here.

> > Whew, that should be enough to get us started. > > Thanks in advance! > > Don >

--Brian Jackson

From rbrown at metservice.com Tue Jul 13 23:48:53 2004 From: rbrown at metservice.com (Royce Brown) Date: Wed, 14 Jul 2004 11:48:53 +1200 Subject: [Linux-cluster] node failing Message-ID: <200407141148292.SM01912@rbrown>

Hi,

I am trying to track down a problem I've been having with the clustering software on redhat 3.0 (supplied rpm's). I am running a 2-node cluster using Multicast Heartbeat, Network Tiebreaker IP address and have bonded Ethernet interfaces to different switches.

The problem is that you start the cluster and everything is working fine and then suddenly one node (always the same one) thinks the other node has become Inactive. It gets into a state where one node thinks both nodes are active and the other node thinks only it is active.

There are no networking problems that I can see. On the bad node I can ping the other node by its address and the multicast address. I have full debug mode on, but the log files don't show anything.

Has anyone else seen this problem, or can anyone give me some tips on what to look at next?

Cheers Royce

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From kpreslan at redhat.com Tue Jul 13 23:55:19 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Tue, 13 Jul 2004 18:55:19 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F460BB.4040603@smugmug.com> References: <40F460BB.4040603@smugmug.com> Message-ID: <20040713235519.GA11119@potassium.msp.redhat.com>

On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > Does GFS somehow get around the 1TB block device issue? Just how large > can a single exported filesystem be with GFS?

On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the limit is 8TB on 32-bit systems and some really large number (at least exabytes) on 64-bit systems.

> Our current (homegrown) solution will scale very well for quite some > time, but eventually we're going to get saturated with write requests to > individual head units. Does GFS intelligently "spread the load" among > multiple storage entities for writing under high load?
Does it always > write to any available storage units, or are there thresholds where it > expands the pool of units it writes to? (I'm not sure I'm making much > sense, but we'll see if any of you grok it :)

Our current allocation methods try to allocate from areas of the disk where there isn't much contention for the allocation bitmap locks. It doesn't know anything about spreading load on the basis of disk load. (That would be an interesting thing to add, but we don't have any plans to do so for the short term.)

> In the event of some multiple-catastrophe failure (where some data isn't > online at all, let alone redundant), how graceful is GFS? Does it "rope > off" the data that's not available and still allow full access to the > data that is? Or does the whole cluster go down?

Right now, a malfunctioning or non-present disk can cause the whole cluster to go down. That's assuming the error isn't masked by hardware RAID or CLVM mirroring (when we get there).

One of the next projects on my plate is fixing the filesystem so that a node will gracefully withdraw itself from the cluster when it sees a malfunctioning storage device. Each node will stay up and could potentially be able to continue accessing other GFS filesystems on other storage devices.

I/We haven't thought much about trying to get GFS to continue to function when only part of a filesystem is present.

> I notice the pricing for GFS is $2200. Is that per seat? And if so, > what's a "seat"? Each client? Each server with storage participating > in the cluster? Both? Some other distinction?

I'm not a marketing/sales person, just a code monkey, so take this with a grain of salt: It's per node running the filesystem. I don't think machines running GULM lock servers or GNBD block servers count as machines that need to be paid for.

> Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be > able to participate as well?

According to the web page, you should be able to add a GFS entitlement to all RHEL trimlines (WS, ES, and AS).

http://www.redhat.com/apps/commerce/rha/gfs/

-- Ken Preslan

From ebpeele2 at pams.ncsu.edu Wed Jul 14 02:18:37 2004 From: ebpeele2 at pams.ncsu.edu (Elliot Peele) Date: Tue, 13 Jul 2004 22:18:37 -0400 Subject: [Linux-cluster] GFS limits? In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <1089771517.11645.8.camel@localhost.localdomain>

On Tue, 2004-07-13 at 18:55 -0500, Ken Preslan wrote: > On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > > Does GFS somehow get around the 1TB block device issue? Just how large > > can a single exported filesystem be with GFS? > > On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the > limit is 8TB on 32-bit systems and some really large number (at least > exabytes) on 64-bit systems.

The file system size limit under 2.4 is 2TB; this can be changed to 4TB if your kernel has the LBD (Large Block Device) patches. Really, the only change is using an unsigned int instead of a signed int.

There are rpms for GFS for 2.4.21-15.EL. I have kernel packages for 2.4.21-15.EL that have xfs and lbd patches if you want them.

Elliot

-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:

From don at smugmug.com Wed Jul 14 03:17:07 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:17:07 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: References: <40F460BB.4040603@smugmug.com> Message-ID: <40F4A5B3.7010806@smugmug.com>

Brian Jackson wrote: > > The code that most people on this list are interested in currently is > the code in cvs which is for 2.6 only. 2.6 has a config option to > enable using devices larger than 2TB. I'm still reading through all > the GFS code, but it's still architecturally the same as when it was > closed source, so I'm pretty sure most of my knowledge from OpenGFS > will still apply. GFS uses 64bit values internally, so you can have > very large filesystems (larger than PBs). >

This is nice. I was specifically thinking of 64bit machines, in which case, I'd expect it to be 9EB or something.

> >>Our current (homegrown) solution will scale very well for quite some >>time, but eventually we're going to get saturated with write requests to >>individual head units. Does GFS intelligently "spread the load" among >>multiple storage entities for writing under high load? > > > No, each node that mounts has direct access to the storage. It writes > just like any other fs, when it can. >

So, if I have a dozen separate arrays in a given cluster, it will write data linearly to array #1, then array #2, then array #3? If that's the case, GFS doesn't solve my biggest fear - write performance with a huge influx of data. I'd hoped it might somehow "stripe" the data across individual units so that we can aggregate the combined interface bandwidth to some extent.

> >>Does it always >>write to any available storage units, or are there thresholds where it >>expands the pool of units it writes to? (I'm not sure I'm making much >>sense, but we'll see if any of you grok it :) > > > I think you may have a little misconception about just what GFS is. > You should check the WHATIS_OpenGFS doc at > http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the > most part, the same stuff applies to GFS. >

I've read it, and quite a few other documents and whitepapers on GFS quite a few times, but perhaps you're right - I must be missing something. More on this below...

>>I notice the pricing for GFS is $2200. Is that per seat? And if so, >>what's a "seat"? Each client? Each server with storage participating >>in the cluster? Both? Some other distinction? > > > Now I definitely know you have some misconception. GFS doesn't have > any concept of server and client. All nodes mount the fs directly > since they are all directly connected to the storage. >

Hmm, yes, this is probably my sticking point. It was my understanding (or maybe just my hope?) that servers could participate as "storage units" in the cluster by exporting their block devices, in addition to FC or iSCSI or whatever devices which aren't technically 'servers'.

In other words, I was thinking/hoping that the cluster consisted of block units aggregated into a filesystem, and that the filesystem could consist of FC RAID devices, iSCSI solutions, and "dumb servers" that just exported their local disks to the cluster FS.

Am I totally wrong? I guess it's GNBD I don't totally understand, so I'd better go read up on it.

Thanks,

Don

-------------- next part -------------- A non-text attachment was scrubbed...
Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL:

From don at smugmug.com Wed Jul 14 03:25:35 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:25:35 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <40F4A7AF.7030009@smugmug.com>

Ken Preslan wrote: > > > Our current allocation methods try to allocate from areas of the disk > where there isn't much contention for the allocation bitmap locks. It > doesn't know anything about spreading load on the basis of disk load. > (That would be an interesting thing to add, but we don't have any plans > to do so for the short term.) > >

My use case isn't very standard. Rather than needing tons of read/write random access all over the disk, we're almost completely linear write-once-per-file, read-many operations.

We do photo sharing and storage. So lots and lots of photos get uploaded, and they're serially stored on disk. Once they're on disk, though, they're rarely modified. Just read.

It's foreseeable, though, that in the future we won't be able to push these linear writes to disk fast enough as people upload photos. Either the interface (GigE, iSCSI, Fibre Channel) isn't fast enough or whatever. It's way out in the future, but it'll come faster than I like to think about.

In that case, we need a nice way to spread those writes across multiple disks/servers/whatever. GigE bonding might solve it temporarily, but that can only last so far.

Ideally, I want to scale horizontally (tons of cheap Linux boxes attached to big disks) and have the writes "passed out" among those boxes. If I have to write my own stuff to do that, fine. But if GFS can potentially provide something along those lines down the road, great.

>>In the event of some multiple-catastrophe failure (where some data isn't >>online at all, let alone redundant), how graceful is GFS? Does it "rope >>off" the data that's not available and still allow full access to the >>data that is? Or does the whole cluster go down? > > > Right now, a malfunctioning or non-present disk can cause the whole > cluster to go down. That's assuming the error isn't masked by hardware > RAID or CLVM mirroring (when we get there). > > One of the next projects on my plate is fixing the filesystem so that a > node will gracefully withdraw itself from the cluster when it sees a > malfunctioning storage device. Each node will stay up and could > potentially be able to continue accessing other GFS filesystems on > other storage devices. > > I/We haven't thought much about trying to get GFS to continue to function > when only part of a filesystem is present. >

When I'm talking about petabytes, this weighs on my mind heavily. I can't have a power outage that takes out a couple of nodes (nodes which may hold both copies of the "redundant data" for, say, 10TB) take down a 20PB cluster. I realize 20PB sounds fairly ridiculous at the moment, but I can see it coming. And it's a management nightmare when it's spread across small 1TB block devices all over the place instead of an aggregate volume. I'm sure it's a software nightmare to think of the aggregate volume, but that's not my problem. :)

> >>I notice the pricing for GFS is $2200. Is that per seat? And if so, >>what's a "seat"? Each client? Each server with storage participating >>in the cluster? Both? Some other distinction?
> > > I'm not a marketing/sales person, just a code monkey, so take this with > a grain of salt: It's per node running the filesystem. I don't think > machines running GULM lock servers or GNBD block servers count as machine > that need to be paid for. > Looks like I have more reading to do, since apparently I don't totally get what a GNDB block server is. Or a GULM lock server, for that matter. > >>Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be >>able to participate as well? > > > According to the web page, you should be able to add a GFS entitlement to > all RHEL trimlines (WS, ES, and AS). > > http://www.redhat.com/apps/commerce/rha/gfs/ > Thanks! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From don at smugmug.com Wed Jul 14 03:42:48 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:42:48 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: <1089771517.11645.8.camel@localhost.localdomain> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <1089771517.11645.8.camel@localhost.localdomain> Message-ID: <40F4ABB8.2000206@smugmug.com> Elliot Peele wrote: > On Tue, 2004-07-13 at 18:55 -0500, Ken Preslan wrote: > >>On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: >> >>>Does GFS somehow get around the 1TB block device issue? Just how large >>>can a single exported filesystem be with GFS? >> >>On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the >>limit is 8TB on 32-bit systems and some really large number (at least >>exabytes) on 64-bit systems. > > > The file system size limit under 2.4 is 2TB, this can be changed to 4TB > if your kernel has the LBD (Large Block Device) patches. Really to only > change is using a unsigned int instead of a signed int. > > There are rpms for GFS for 2.4.21-15.EL. I have kernel packages for > 2.4.21-15.EL that have xfs and lbd patches if you want them. > > Elliot I'd love to take a look at the LBD patches, yes. I've currently got systems with 2 1.2TB filesystems attached, and I'd really like to use md or LVM or something to combine them to be one fs. But that goes beyond the 2TB limit.... :) I'm on 2.4.21-15.0.3.EL right now, but I can hop back a revision to play with this. I wish we could use XFS, but until RH supports it, I'm afraid it's a no-go. Sucks, too, since I had to migrate many TBs of storage from XFS to ext3 when we moved from SuSE Enterprise to RHEL3. What a pain... Thanks! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From notiggy at gmail.com Wed Jul 14 03:59:59 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 13 Jul 2004 22:59:59 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F4A5B3.7010806@smugmug.com> References: <40F460BB.4040603@smugmug.com> <40F4A5B3.7010806@smugmug.com> Message-ID: On Tue, 13 Jul 2004 20:17:07 -0700, Don MacAskill wrote: > > > Brian Jackson wrote: > > > > > The code that most people on this list are interested in currently is > > the code in cvs which is for 2.6 only. 2.6 has a config option to > > enable using devices larger than 2TB. I'm still reading through all > > the GFS code, but it's still architecturally the same as when it was > > closed source, so I'm pretty sure most of my knowledge from OpenGFS > > will still apply. 
GFS uses 64bit values internally, so you can have > > very large filesystems (larger than PBs). > > > > This is nice. I was specifically thinking of 64bit machines, in which > case, I'd expect it to be 9EB or something. > > > > >>Our current (homegrown) solution will scale very well for quite some > >>time, but eventually we're going to get saturated with write requests to > >>individual head units. Does GFS intelligently "spread the load" among > >>multiple storage entities for writing under high load? > > > > > > No, each node that mounts has direct access to the storage. It writes > > just like any other fs, when it can. > > > > So, if I have a dozen seperate arrays in a given cluster, it will write > data linearly to array #1, then array #2, then array #3? If that's the > case, GFS doesn't solve my biggest fear - write performance with a huge > influx of data. I'd hoped it might somehow "stripe" the data across > individual units so that we can aggregate the combined interface > bandwidth to some extent. That's not the job of the filesystem, that should be done at the block layer with clvm/evms2/etc. > > > > > >>Does it always > >>write to any available storage units, or are there thresholds where it > >>expands the pool of units it writes to? (I'm not sure I'm making much > >>sense, but we'll see if any of you grok it :) > > > > > > I think you may have a little misconception about just what GFS is. > > You should check the WHATIS_OpenGFS doc at > > http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the > > most part, the same stuff applies to GFS. > > > > I've read it, and quite a few other documents and whitepapers on GFS > quite a few times, but perhaps you're right - I must be missing > something. More on this below... > > > >>I notice the pricing for GFS is $2200. Is that per seat? And if so, > >>what's a "seat"? Each client? Each server with storage participating > >>in the cluster? Both? Some other distinction? > > > > > > Now I definitely know you have some misconception. GFS doesn't have > > any concept of server and client. All nodes mount the fs directly > > since they are all directly connected to the storage. > > > > Hmm, yes, this is probably my sticking point. It was my understanding > (or maybe just my hope?) that servers could participate as "storage > units" in the cluster by exporting their block devices, in addition to > FC or iSCSI or whatever devices which aren't techincally 'servers'. You can technically use anything that the kernel sees as a block device, but I'd hesitate to put gnbd (and a few other solutions) into a production environment currently. > > In other words, I was thinking/hoping that the cluster consisted of > block units aggregated into a filesystem, and that the filesystem could > consist of FC RAID devices, iSCSI solutions, and "dumb servers" that > just exported their local disks to the cluster FS. Like I said, you can techincally do it, but it's not the filesystems job, that should all happen at the block layer. > > Am I totally wrong? I guess it's GNDB I don't totally understand, so > I'd better go read up on it. GNBD is just a way to export a block device to another host over a network (similar in concept to iSCSI/HyperSCSI) --Brian > > Thanks, > > Don > > > > From ebpeele2 at pams.ncsu.edu Wed Jul 14 04:36:20 2004 From: ebpeele2 at pams.ncsu.edu (Elliot Peele) Date: Wed, 14 Jul 2004 00:36:20 -0400 Subject: [Linux-cluster] GFS limits? 
In-Reply-To: <40F4ABB8.2000206@smugmug.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <1089771517.11645.8.camel@localhost.localdomain> <40F4ABB8.2000206@smugmug.com> Message-ID: <1089779780.11645.14.camel@localhost.localdomain>

On Tue, 2004-07-13 at 20:42 -0700, Don MacAskill wrote: > I'd love to take a look at the LBD patches, yes. I've currently got > systems with 2 1.2TB filesystems attached, and I'd really like to use md > or LVM or something to combine them to be one fs. But that goes beyond > the 2TB limit.... :)

You can find my rpms at: ftp://mirror.physics.ncsu.edu/pub/contrib/ebpeele2/cls_xfs

and gfs rpms for that kernel at: ftp://mirror.physics.ncsu.edu/pub/contrib/ebpeele2/cls_gfs

Elliot

-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:

From hlawatschek at atix.de Wed Jul 14 09:48:19 2004 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Wed, 14 Jul 2004 11:48:19 +0200 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: <1089798498.5012.7.camel@falballa.gallien.atix>

Hi,

you'll find our iSCSI target server based on Intel's iSCSI reference implementation near http://www.atix.de/iscsi-target

Brian Jackson wrote: > > > iSCSI and HyperSCSI both work with GFS, so those are options. I > > suppose you'd be better off answering the question of whether they are > > stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? >

-- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek ** ATIX - Ges. fuer Informationstechnologie und Consulting mbh Einsteinstr. 10 D-85716 Unterschleissheim Company HomePage: www.atix.de SAN Division : www.san-time.com

From lhh at redhat.com Wed Jul 14 15:20:04 2004 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 Jul 2004 11:20:04 -0400 Subject: [Linux-cluster] node failing In-Reply-To: <200407141148292.SM01912@rbrown> References: <200407141148292.SM01912@rbrown> Message-ID: <1089818404.31623.88.camel@atlantis.boston.redhat.com>

On Wed, 2004-07-14 at 11:48 +1200, Royce Brown wrote: > I am trying to track down a problem I've been having with the > clustering software on redhat 3.0 (supplied rpm's).

This would be taroon-list material, actually.

> I am running a 2-node cluster using Multicast Heartbeat, Network > Tiebreaker IP address and have bonded Ethernet interfaces to different > switches.

Good. Try running in HA-bonded/failover mode if you're not already.

> There are no networking problems that I can see. On the bad node I can > ping the other node by its address and the multicast address. I have > full debug mode on, but the log files don't show anything.

You should file a support ticket with Red Hat Support: http://www.redhat.com/apps/support

> Has anyone else seen this problem, or can anyone give me some tips on what to > look at next?

Try the latest package from the RHN beta channel if you have access to it; it fixes a problem which causes membership to enter an infinite loop in some cases where timeouts occurred. The infinite loop causes multiple clumembd (or cluquorumd) processes to appear.

Here's a ref to the bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126316

-- Lon

From cknowlton at science.edu Wed Jul 14 16:08:10 2004 From: cknowlton at science.edu (Carlos Knowlton) Date: Wed, 14 Jul 2004 11:08:10 -0500 Subject: [Linux-cluster] GFS limits?
In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <40F55A6A.4010809@science.edu>

Ken Preslan wrote: >On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > > >>Does GFS somehow get around the 1TB block device issue? Just how large >>can a single exported filesystem be with GFS? >> >> > >On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the >limit is 8TB on 32-bit systems and some really large number (at least >exabytes) on 64-bit systems. > >

On the 2.6 kernel, I've heard that the Ext2/3fs can handle 16TB by increasing the block size (I'm not sure if that number is with 4K or 8K blocks).

Is it possible to increase the FS capacity in GFS on 32bit systems in the same way (ie, by increasing the block size)? If so, what is the maximum block size supported in GFS?

Thanks! Carlos

From kpreslan at redhat.com Wed Jul 14 17:18:17 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 14 Jul 2004 12:18:17 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F55A6A.4010809@science.edu> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <40F55A6A.4010809@science.edu> Message-ID: <20040714171817.GA14278@potassium.msp.redhat.com>

On Wed, Jul 14, 2004 at 11:08:10AM -0500, Carlos Knowlton wrote: > >On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the > >limit is 8TB on 32-bit systems and some really large number (at least > >exabytes) on 64-bit systems. > > > > > On the 2.6 kernel, I've heard that the Ext2/3fs can handle 16TB by > increasing the block size (I'm not sure if that number is with 4K or 8K > blocks). > > Is it possible to increase the FS capacity in GFS on 32bit systems in > the same way (ie, by increasing the block size)? If so, what is the > maximum block size supported in GFS?

The difference in quoted max size limits (1 or 2 TB on 2.4 and 8 or 16 TB on 2.6) shows up because some people trust block/page numbers that use the sign bit and some don't. It may very well be possible to go to larger sizes on your hardware and drivers, but you need to check to verify that yourself. I'm paranoid and will quote you the smaller number. :-)

-- Ken Preslan

From deks at sbcglobal.net Wed Jul 14 18:35:21 2004 From: deks at sbcglobal.net (Dexter Eugenio) Date: Wed, 14 Jul 2004 11:35:21 -0700 (PDT) Subject: [Linux-cluster] GFS configuration help Message-ID: <20040714183521.92044.qmail@web81704.mail.yahoo.com>

Hi,

Need your help for my proposed setup below.

1. machine1 is the host server that is connected to the SAN, mounts the filesystem. This machine has read-write capability on the filesystem.
2. machine2, machine3, machine4 etc.. are the servers that mount the same filesystem with read-only capability. They are connected to a fiber switch.
3. future machines will be connected to the switch to mount the same filesystem as read only.

Upon reading various docs, it says that I have to configure clustering? I'm not sure if that is needed in my setup. All I want to do is to emulate NFS capability, but instead of using the network, my machines are connected directly to the SAN. I might be wrong?

Btw, I'm running RH ES 3.0 and have the GFS rpms from Redhat. I've installed it fine but I have no idea on how to configure it.

I hope you can help me with any information you can give.
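For a layout like the one just described (one read-write node, several read-only nodes, all on the same SAN), the broad strokes of a GFS 6.0 setup look roughly like the sketch below. This is only a sketch: the pool names, cluster name, CCS directory and journal count are invented, the contents of the .ccs files are omitted, and the Administrators Guide linked in the next reply is the authoritative walkthrough.

    # define pools on the shared storage (a small one for the CCS archive,
    # one for the filesystem), then activate them on every node
    pool_tool -c pools.cfg
    pool_assemble -a

    # build the CCS archive from cluster.ccs/nodes.ccs/fence.ccs and start the daemons
    ccs_tool create /root/mycluster-ccs /dev/pool/cca_pool
    ccsd -d /dev/pool/cca_pool     # on every node
    lock_gulmd                     # on every node; the lock servers are named in cluster.ccs

    # make the filesystem once, then mount it on each machine
    gfs_mkfs -p lock_gulm -t mycluster:gfs0 -j 4 /dev/pool/gfs0
    mount -t gfs /dev/pool/gfs0 /mnt/gfs          # the read-write machine
    mount -t gfs -o ro /dev/pool/gfs0 /mnt/gfs    # the read-only machines

Note that even the read-only mounts still talk to the lock manager, so as far as I can tell the clustering and fencing sections of the guide still apply to machines 2 through N.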
Regards, Deks

From danderso at redhat.com Wed Jul 14 18:48:28 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 14 Jul 2004 13:48:28 -0500 Subject: [Linux-cluster] GFS configuration help In-Reply-To: <20040714183521.92044.qmail@web81704.mail.yahoo.com> References: <20040714183521.92044.qmail@web81704.mail.yahoo.com> Message-ID: <200407141348.28153.danderso@redhat.com>

Deks,

The GFS 6.0.0 Administrators Guide should walk you through everything you need to do: http://www.redhat.com/docs/manuals/csgfs/admin-guide/

On Wednesday 14 July 2004 13:35, Dexter Eugenio wrote: > Hi, > > Need your help for my proposed setup below. > > 1. machine1 is the host server that is connected to the SAN, mounts the > filesystem. This machine has read-write capability on the filesystem. > 2. machine2, machine3, machine4 etc.. are the servers that mount the same > filesystem with read-only capability. They are connected to a fiber switch. > 3. future machines will be connected to the switch to mount the same > filesystem as read only. > > Upon reading various docs, it says that I have to configure clustering? I'm > not sure if that is needed in my setup. All I want to do is to emulate NFS > capability, but instead of using the network, my machines are connected > directly to the SAN. I might be wrong? > > Btw, I'm running RH ES 3.0 and have the GFS rpms from Redhat. I've installed > it fine but I have no idea on how to configure it. > > I hope you can help me with any information you can give. > > Regards, > Deks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster

From kpreslan at redhat.com Wed Jul 14 18:46:45 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 14 Jul 2004 13:46:45 -0500 Subject: [Linux-cluster] GFS Performance In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> Message-ID: <20040714184645.GA15045@potassium.msp.redhat.com>

It's not the directory that's causing the slowness, but the fact that the "ls -la" tries to do a stat() on the file that's being written to by node 1. Node 1 has to sync out all the dirty data in its cache before it can release the lock to node 2. This can take a while if Node 1 has a big (and full) cache.

You can do an ls without the -l option, so it won't stat() the files in the directory. That should be faster.

The ultimate solution is to add buffer forwarding to GFS, so node 1 can give node 2 stat() information without having to flush all its data. But that's a ways off.

On Thu, Jul 08, 2004 at 02:27:38PM +0200, Richard Mayhew wrote: > Hi > > > I setup 2 nodes, on my EMC SAN. Both nodes see the storage and can > access the cca device. > When writing a file to the storage fs, the second node takes a couple of > seconds to see the changes. > > Ie. > 1. Node 1 Creates the file "dd if=/dev/zero of=test.file bs=4096 > count=10240000" > 2. Doing a ls -la on node 2 takes a few seconds to display the contents > of the dir. > > After the file has finished being updated, all listings of that dir are > quick, but if any changes are made, one again has to wait for the system > to display the contents of the dir. > > Any idea?
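A quick way to see the difference Ken describes, from a second node while the first node is still writing (the mount point and directory here are made up):

    time ls /mnt/gfs/testdir        # readdir only, no stat(), returns quickly
    time ls -la /mnt/gfs/testdir    # stat()s the file being written, so it waits
                                    # for node 1 to flush its cache and release the lock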
> > > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From ninja at slaphack.com Thu Jul 15 03:36:05 2004 From: ninja at slaphack.com (David Masover) Date: Wed, 14 Jul 2004 22:36:05 -0500 Subject: [Linux-cluster] (gfs or coda) and reiser4 Message-ID: <40F5FBA5.3070808@slaphack.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've been having trouble finding a good distributed FS. All I want is one server with a ton of space and some aggressive client-side caching - -- so, basically, CODA (or maybe GFS). But I like the performance of Reiser4, and I like the InterMezzo concept, where the local cache might even be used as an ordinary filesystem in a pinch. Also, by God, on the server side, Samba and NFS are the two easiest to set up, in that order. I should NOT need a dedicated partition -- it should just access local files. I like the speed of reiser4 (among other things), so I want to use that for the data storage and cache. What I really want is a working implementation of InterMezzo on Linux 2.6, but that isn't going to happen. It looks like the only way to do what I want here is to code something myself. Is there a better way? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iQIVAwUBQPX7pHgHNmZLgCUhAQJgHg/+LtxH0lXbfZ+2eP2CEn84hLUZqUWYR81n 3qugRt+jW73/RssQJwEMZymjGwZqZvKP7T4nk1wSjSNo5okiqIHgH8wKCgQtMqzS RZlNAUxFbs2Z4sah07i2Tqt8rRKbM2ppT4I11WCcPJ5MD6u2mI/pmGJZE1XjGSz/ iygz+PHEbUqegwbnn2ayHC3oc1YZXDwGxDZjdjPUdWylU152RT8BfauaIclsrVTJ bVck9Uofax6aDkxCBgE811/ePTnmm8Hwf02V2aIFrPg9qZkXXK+zBH9R61nTtyzV SI8E1yGcjvYXe/0ywomnxYuis2A8M/x4Yv/0A5zLmgngu6x3bVzKqBn8HijedWc5 H9i5KZVrFMzYg+y4QMAb1EHHOPeFAt2yI5w2S+qaZ6rcmiExvgKuwHIUqnPH0hRp GWuF6jbpMgN9PsqueqXQO4rU8D72skwx2K+P77juOXiB5lryXelfTN05VfsULkdy oDa1w5xViUsAq0JVozi7k615eSMoKFVHmU/9CRO3nUKMZdA3iMaXpv1BMJf+j4mL 0PMgoHZTGAbYhUdj6V+Uab4arOX6Nwk19Ff4R4UFgZVEfYgQH8/3zpxNhvKm117T 2zP40mByFH6e+PkIpulBvciUWdtqaH9sqm8ppfgErQZpGW6lTGok/I8dWrJEz2O7 94mmoI/HtyQ= =WnAk -----END PGP SIGNATURE----- From rmayhew at mweb.com Thu Jul 15 09:38:02 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Thu, 15 Jul 2004 11:38:02 +0200 Subject: [Linux-cluster] GFS configuration help Message-ID: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> Hi, Ill send you the documentation away from this list (its 5MB) -----Original Message----- From: Dexter Eugenio [mailto:deks at sbcglobal.net] Sent: 14 July 2004 08:35 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS configuration help Hi, Need your help for my proposed setup below. 1. machine1 is the host server that is connected to the SAN, mounts the filesystem. This machine has read-write capability on the filesystem. 2. machine2, machine3, machine4 etc.. are the servers that mounts the same filesystem with read only capability. they are connected to a fiber switch. 3. future machines will be connected to the switch to mount the same filesystem as read only. Upon reading various docs, it says that i have to configure clustering? I'm not sure if that is needed in my setup. All i want to do is to emulate NFS capability, but instead of using the network, my machines are connected directly to the SAN. I might be wrong? Btw, i'm running RH ES 3.0 and has the GFS rpms from Redhat. 
I've installed it fine but I have no idea on how to configure it. I hope you can help me with any information you can give. Regards, Deks -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From jake.gold-gfs at hypermediasystems.com Thu Jul 15 18:21:20 2004 From: jake.gold-gfs at hypermediasystems.com (Jake Gold) Date: Thu, 15 Jul 2004 11:21:20 -0700 Subject: [Linux-cluster] GFS configuration help In-Reply-To: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> Message-ID: <20040715112120.1b63fbd2.jake.gold-gfs@hypermediasystems.com> All, Are there any special concerns or steps when using one read-write node and many read-only nodes? In this scenerio do you still have to setup all the usual components in the same way (locking, fencing, ...) ? Can anything be left out when you only have one node doing writes? How many people are doing this? Anyone know of any how-tos/documents regarding this specific configuration? Thanks to everyone at Sistina and Red Hat for all their hard work on GFS! Thanks, Jake On Thu, 15 Jul 2004 11:38:02 +0200 "Richard Mayhew" wrote: > Hi, > Ill send you the documentation away from this list (its 5MB) > > -----Original Message----- > From: Dexter Eugenio [mailto:deks at sbcglobal.net] > Sent: 14 July 2004 08:35 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] GFS configuration help > > Hi, > > Need your help for my proposed setup below. > > 1. machine1 is the host server that is connected to the SAN, mounts the > filesystem. This machine has read-write capability on the filesystem. > 2. machine2, machine3, machine4 etc.. are the servers that mounts the > same filesystem with read only capability. they are connected to a fiber > switch. > 3. future machines will be connected to the switch to mount the same > filesystem as read only. > > Upon reading various docs, it says that i have to configure clustering? > I'm not sure if that is needed in my setup. All i want to do is to > emulate NFS capability, but instead of using the network, my machines > are connected directly to the SAN. I might be wrong? > > Btw, i'm running RH ES 3.0 and has the GFS rpms from Redhat. I've > installed it fine but I have no idea on how to configure it. > > I hope you can help me with any information you can give. > > Regards, > Deks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From ed.mann at choicepoint.com Thu Jul 15 14:39:35 2004 From: ed.mann at choicepoint.com (Edward Mann) Date: Thu, 15 Jul 2004 09:39:35 -0500 Subject: [Linux-cluster] gfs 6.0 Firewire Message-ID: <1089902374.24755.19.camel@storm.cp-direct.com> Hello, I am using RedHat Linux Enterprise Server 3. GFS 6.0 and a firewire drive. I have done the setup and created all the files, created the gfs drive with gfs_mkfs. But when i go to mount it the mount just hangs and never returns any errors. I have let it run all night and it still has not mounted the file system. I have made sure that all my modules are installed that were listed in the docs. And all other programs are running. ccsd is running lock_gulmd is running. All i have are two machines that need to share storage off 1 drive. I am using the fence_manual. I hope that i am using it right. 
This is what my fence.ccs file looks like fence_devices { admin { agent = "fence_manual" } } Is this setup right? Any help would be appreciated. Thanks. From linux-cluster-subscription at swapoff.org Fri Jul 16 04:44:37 2004 From: linux-cluster-subscription at swapoff.org (Alec Thomas) Date: Fri, 16 Jul 2004 14:44:37 +1000 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F4A7AF.7030009@smugmug.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <40F4A7AF.7030009@smugmug.com> Message-ID: <20040716044437.GA22945@swapoff.org> > It's forseeable in the future, though, to where we can't push these > linear writes to disk fast enough as people upload photos. Either the > interface (GigE, iSCSI, Fibre Channel) isn't fast enough or whatever. > It's way out in the future, but it'll come faster than I like to think > about. > > In that case, we need a nice way to spread those writes across multiple > disks/servers/whatever. GigE bonding might solve it temporarily, but > that can only last so far. > Don, It seems you want something more like Lustre (http://www.lustre.org/): "The central target in this project is the development of Lustre, a next-generation cluster file system which can serve clusters with 10,000's of nodes, petabytes of storage, move 100's of GB/sec with state of the art security and management infrastructure." Alec -- Evolution: Taking care of those too stupid to take care of themselves. From mauelshagen at redhat.com Thu Jul 15 20:35:53 2004 From: mauelshagen at redhat.com (Heinz Mauelshagen) Date: Thu, 15 Jul 2004 22:35:53 +0200 Subject: [Linux-cluster] *** Announcement: dmraid 1.0.0-rc2 *** Message-ID: <20040715203553.GA18616@redhat.com> *** Announcement: dmraid 1.0.0-rc2 *** Following a good tradition, dmraid 1.0.0-rc2 is available at http://people.redhat.com:/~heinzm/sw/dmraid/ in source and i386 rpm, before I leave for a 2 weeks vacation trip followed by LWE ;) Won't read my email before July, 30th. dmraid (Device-Mapper Raid tool) discovers, [de]activates and displays properties of software RAID sets (ie. ATARAID) and contained DOS partitions using the device-mapper runtime of the 2.6 kernel. The following ATARAID types are supported on Linux 2.6: Highpoint HPT37X Highpoint HPT45X Intel Software RAID Promise FastTrack Silicon Image Medley This ATARAID type can be discovered only in this version: LSI Logic MegaRAID Please provide insight to support those metadata formats completely. Thanks. See files README and CHANGELOG, which come with the source tarball for prerequisites to run this software, further instructions on installing and using dmraid! CHANGELOG is contained below for your convenience as well. Call for testers: ----------------- I need testers with the above ATARAID types, to check that the mapping created by this tool is correct (see options "-t -ay") and access to the ATARAID data is proper. You can activate your ATARAID sets without danger of overwriting your metadata, because dmraid accesses it read-only unless you use option -E with -r in order to erase ATARAID metadata (see 'man dmraid')! This is a release candidate version so you want to have backups of your valuable data *and* you want to test accessing your data read-only first in order to make sure that the mapping is correct before you go for read-write access. The author is reachable at . Later, I told you ;) For test results, mapping information, discussions, questions, patches, enhancement requests and the like, please subscribe and mail to . 
CHANGELOG: --------- Changelog from dmraid 1.0.0-rc1 to 1.0.0-rc2 2004.07.15 o Intel Software RAID discovery and activation support o allow more than one format handler name with --format o display "raid10" sets properly rather than just "mirror" o enhanced activate.c to handle partial activation of sets (eg, degraded RAID0) o enhanced command line option checks o implemented a library context for variables such as debug etc. o fixed memory leak in discover_partitions o fixed recursion in _find_set() o continued writing subsets in case we fail on one because of RAID1 o format handler template update o fixed dietlibc build o fixed shared library configure o use default_list_set() instead of &raid_sets where possible o name change of list_head members to the more commonly used 'list' o renamed msdos partition format handler to dos o lots of inline comments corrected/updated o streamlined tools/*.[ch] o moved get.*level() and get_status to metadata.[ch] and changed level name to type -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 56242 Marienrachdorf Germany Mauelshagen at RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- From d.lelli at surrey.ac.uk Fri Jul 16 13:18:29 2004 From: d.lelli at surrey.ac.uk (Diego) Date: Fri, 16 Jul 2004 14:18:29 +0100 Subject: [Linux-cluster] Cluster admin Message-ID: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Hello everibody, I built a linux cluster running the RH 9. I'd like to know if is there any tool for the general cluster administration, such as add a new user to all the node, put a retrieve fil from the nodes and so on . I had a look to OSCAR, but I don't want to installl again all the machine that are already set-up to go. Many Thanks Diego --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.720 / Virus Database: 476 - Release Date: 14/07/2004 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anton at hq.310.ru Fri Jul 16 08:47:49 2004 From: anton at hq.310.ru (=?Windows-1251?B?wO3y7u0gzeX17vDu+Oj1?=) Date: Fri, 16 Jul 2004 12:47:49 +0400 Subject: [Linux-cluster] patch for gfs, add suiddir option for mount Message-ID: <61425687.20040716124749@hq.310.ru> Hi linux-cluster, I apologize for the previous letter, here full patch suiddir option for mount man 8 mount (FreeBSD) *** gfs_ioctl.h.orig 2004-07-14 13:54:39.000000000 +0400 --- gfs_ioctl.h 2004-07-14 13:57:38.000000000 +0400 *************** *** 213,218 **** --- 213,219 ---- unsigned int ar_num_glockd; int ar_posixacls; /* Enable posix acls */ + int ar_suiddir; /* suiddir support */ }; #endif /* ___GFS_IOCTL_DOT_H__ */ *** inode.c.orig 2004-07-15 19:52:33.000000000 +0400 --- inode.c 2004-07-15 19:55:36.000000000 +0400 *************** *** 1132,1138 **** struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; ! unsigned int gid; int alloc_required; int error; --- 1132,1138 ---- struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; ! unsigned int gid, uid; int alloc_required; int error; *************** *** 1148,1162 **** else gid = current->fsgid; al = gfs_alloc_get(dip); error = gfs_quota_lock_m(dip, ! current->fsuid, gid); if (error) goto fail; ! 
error = gfs_quota_check(dip, current->fsuid, gid); if (error) goto fail_gunlock_q; --- 1148,1172 ---- else gid = current->fsgid; + if ( (sdp->sd_args.ar_suiddir == TRUE) + && (dip->i_di.di_mode & S_ISUID) ) { + if (type == GFS_FILE_DIR) + mode |= S_ISUID; + uid = dip->i_di.di_uid; + gid = dip->i_di.di_gid; + } + else + uid = current->fsuid; + al = gfs_alloc_get(dip); error = gfs_quota_lock_m(dip, ! uid, gid); if (error) goto fail; ! error = gfs_quota_check(dip, uid, gid); if (error) goto fail_gunlock_q; *************** *** 1206,1212 **** if (error) goto fail_end_trans; ! error = make_dinode(dip, gl, inum, type, mode, current->fsuid, gid); if (error) goto fail_end_trans; --- 1216,1222 ---- if (error) goto fail_end_trans; ! error = make_dinode(dip, gl, inum, type, mode, uid, gid); if (error) goto fail_end_trans; *** mount.c.orig 2004-06-24 12:53:28.000000000 +0400 --- mount.c 2004-07-14 13:59:36.000000000 +0400 *************** *** 110,115 **** --- 110,118 ---- else if (!strcmp(x, "upgrade")) args->ar_upgrade = TRUE; + else if (!strcmp(x, "suiddir")) + args->ar_suiddir = TRUE; + else if (!strcmp(x, "num_glockd")) { if (!y) { printk("GFS: need argument to num_glockd\n"); -- e-mail: anton at hq.310.ru http://www.310.ru From john.hearns at clustervision.com Fri Jul 16 14:48:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 16 Jul 2004 15:48:25 +0100 Subject: [Linux-cluster] Cluster admin In-Reply-To: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> References: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Message-ID: <1089989304.14987.12.camel@vigor12> On Fri, 2004-07-16 at 14:18, Diego wrote: > Hello everibody, > > I built a linux cluster running the RH 9. > > I'd like to know if is there any tool for the general cluster > administration, such > as add a new user to all the node, put a retrieve fil from the nodes > and so on . For that sort of cluster, you would get better advice on the Beowulf list. There are parallel utilities around, including one which runs parallel terminal sessions (I forget the name for the moment). Our own clustering environment ncludes extensive parallel tools, for parallel command execution, syncing, shutdown, power control. To add users, you can use NIS or LDAP. Having nine machines is a good start, and a good introduction. However, if you don;t have some sort of clustering framework then re-installing, installing more machines and upgrading will be a pain. From zleite at its.caltech.edu Fri Jul 16 16:03:55 2004 From: zleite at its.caltech.edu (Zailo Leite) Date: Fri, 16 Jul 2004 09:03:55 -0700 Subject: [Linux-cluster] LVM2+GNBD? Message-ID: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Can I make a LVM2 logical volume using GNBD imported block devices from, say 2 GNDB servers, then make a GFS device out of the LVM volume? I'm building a test rig for trying it, but if someone knows that it won't work, I'd appreciate the head's up... From notiggy at gmail.com Fri Jul 16 18:23:59 2004 From: notiggy at gmail.com (Brian Jackson) Date: Fri, 16 Jul 2004 13:23:59 -0500 Subject: [Linux-cluster] LVM2+GNBD? In-Reply-To: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> References: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Message-ID: On Fri, 16 Jul 2004 09:03:55 -0700, Zailo Leite wrote: > Can I make a LVM2 logical volume using GNBD imported block devices from, > say 2 GNDB servers, then make a GFS device out of the LVM volume? > I'm building a test rig for trying it, but if someone knows that it > won't work, I'd appreciate the head's up... 
It should work fine. To the system gnbd is just another block device. You are limited to the different raid levels you can use in a shared device situation though. Just something to keep in mind.

--Brian Jackson

From danderso at redhat.com Fri Jul 16 18:36:45 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 16 Jul 2004 13:36:45 -0500 Subject: [Linux-cluster] LVM2+GNBD? In-Reply-To: References: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Message-ID: <200407161336.45390.danderso@redhat.com>

On Friday 16 July 2004 13:23, Brian Jackson wrote: > On Fri, 16 Jul 2004 09:03:55 -0700, Zailo Leite wrote: > > Can I make a LVM2 logical volume using GNBD imported block devices from, > > say 2 GNDB servers, then make a GFS device out of the LVM volume? > > I'm building a test rig for trying it, but if someone knows that it > > won't work, I'd appreciate the head's up...

You need to add the line: types = [ "gnbd", 1 ] to the devices section of the /etc/lvm/lvm.conf for lvm to scan for GNBD devices.

> > It should work fine. To the system gnbd is just another block device. > You are limited to the different raid levels you can use in a shared > device situation though. Just something to keep in mind. > > --Brian Jackson > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster

From lhh at redhat.com Fri Jul 16 19:09:13 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 16 Jul 2004 15:09:13 -0400 Subject: [Linux-cluster] rgmanager pre-commit Message-ID: <1090004953.3699.24.camel@atlantis.boston.redhat.com>

Here's the RM I've been working on: http://people.redhat.com/lhh/rgmanager-1.3.0.tar.gz

README (ie, we know it's broken; read this first): http://people.redhat.com/lhh/README.rgmanager

Example stuff: http://people.redhat.com/lhh/cluster.xml

It's *not* stable (= barely runs in some cases), but should give insight as to where we were going with it. I will be OOTO next week, so let the forest fire begin. It will be a few more weeks before this is integrated into the main project, and the name is probably going to change as well as some of the thread nastiness. Please read the README first.

Basically, tree-structured, user-configurable resource groups - similar to the way clumanager 1.x handled them, only a lot more flexible - and if you don't like the way they're structured, you can edit the rule sets defining how they're structured to your liking.

/me revs up his ZX6

-- Lon

From kpreslan at redhat.com Fri Jul 16 22:20:36 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Fri, 16 Jul 2004 17:20:36 -0500 Subject: [Linux-cluster] patch for gfs, add suiddir option for mount In-Reply-To: <61425687.20040716124749@hq.310.ru> References: <61425687.20040716124749@hq.310.ru> Message-ID: <20040716222036.GA31057@potassium.msp.redhat.com>

On Fri, Jul 16, 2004 at 12:47:49PM +0400, Anton Nekhoroshikh wrote: > Hi linux-cluster, > > I apologize for the previous letter, here full patch > > suiddir option for mount > man 8 mount (FreeBSD) > >

Your patch didn't actually follow the FreeBSD man page's description of suiddir. From: http://www.freebsd.org/cgi/man.cgi?query=mount&sektion=8&apropos=0&manpath=FreeBSD+5.2-RELEASE+and+Ports

A directory on the mounted file system will respond to the SUID bit being set, by setting the owner of any new files to be the same as the owner of the directory. New directories will inherit the bit from their parents. Execute bits are removed from the file, and it will not be given to root.
Note the last sentence. The below patch is acceptable to me. Is it ok with you? diff -urN crap1/gfs-kernel/src/gfs/gfs_ioctl.h crap2/gfs-kernel/src/gfs/gfs_ioctl.h --- crap1/gfs-kernel/src/gfs/gfs_ioctl.h 24 Jun 2004 08:53:27 -0000 1.1 +++ crap2/gfs-kernel/src/gfs/gfs_ioctl.h 16 Jul 2004 22:13:05 -0000 @@ -213,6 +213,7 @@ unsigned int ar_num_glockd; int ar_posixacls; /* Enable posix acls */ + int ar_suiddir; /* suiddir support */ }; #endif /* ___GFS_IOCTL_DOT_H__ */ diff -urN crap1/gfs-kernel/src/gfs/inode.c crap2/gfs-kernel/src/gfs/inode.c --- crap1/gfs-kernel/src/gfs/inode.c 16 Jul 2004 22:07:02 -0000 1.3 +++ crap2/gfs-kernel/src/gfs/inode.c 16 Jul 2004 22:13:05 -0000 @@ -1132,16 +1132,26 @@ struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; - unsigned int gid; + unsigned int uid, gid; int alloc_required; int error; + if (sdp->sd_args.ar_suiddir && + (dip->i_di.di_mode & S_ISUID) && + dip->i_di.di_uid) { + if (type == GFS_FILE_DIR) + mode |= S_ISUID; + else if (dip->i_di.di_uid != current->fsuid) + mode &= ~07111; + uid = dip->i_di.di_uid; + } else + uid = current->fsuid; + if (dip->i_di.di_mode & S_ISGID) { if (type == GFS_FILE_DIR) mode |= S_ISGID; gid = dip->i_di.di_gid; - } - else + } else gid = current->fsgid; error = gfs_setup_new_acl(dip, type, &mode, &acl); @@ -1150,13 +1160,11 @@ al = gfs_alloc_get(dip); - error = gfs_quota_lock_m(dip, - current->fsuid, - gid); + error = gfs_quota_lock_m(dip, uid, gid); if (error) goto fail; - error = gfs_quota_check(dip, current->fsuid, gid); + error = gfs_quota_check(dip, uid, gid); if (error) goto fail_gunlock_q; @@ -1206,13 +1214,13 @@ if (error) goto fail_end_trans; - error = make_dinode(dip, gl, inum, type, mode, current->fsuid, gid); + error = make_dinode(dip, gl, inum, type, mode, uid, gid); if (error) goto fail_end_trans; al->al_ul = gfs_trans_add_unlinked(sdp, GFS_LOG_DESC_IDA, &(struct gfs_inum){0, inum->no_addr}); - gfs_trans_add_quota(sdp, +1, current->fsuid, gid); + gfs_trans_add_quota(sdp, +1, uid, gid); /* Gfs_inode_get() can't fail here. But then again, it shouldn't be here (it should be in gfs_createi()). Gfs_init_acl() has no diff -urN crap1/gfs-kernel/src/gfs/mount.c crap2/gfs-kernel/src/gfs/mount.c --- crap1/gfs-kernel/src/gfs/mount.c 24 Jun 2004 08:53:28 -0000 1.1 +++ crap2/gfs-kernel/src/gfs/mount.c 16 Jul 2004 22:13:05 -0000 @@ -128,6 +128,9 @@ else if (!strcmp(x, "acl")) args->ar_posixacls = TRUE; + else if (!strcmp(x, "suiddir")) + args->ar_suiddir = TRUE; + /* Unknown */ else { -- Ken Preslan From kazutomo at powercockpit.net Sat Jul 17 04:09:50 2004 From: kazutomo at powercockpit.net (Kazutomo Yoshii) Date: Fri, 16 Jul 2004 21:09:50 -0700 Subject: [Linux-cluster] Cluster admin In-Reply-To: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> References: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Message-ID: <40F8A68E.5000107@powercockpit.net> Hi, > Hello everibody, > > I built a linux cluster running the RH 9. > > I'd like to know if is there any tool for the general cluster > administration, such > as add a new user to all the node, put a retrieve fil from the nodes > and so on . > NIS or openldap may be good for managing user in cluster. If you want to do arbitrary operation to entirer cluster, you may need cluster-wise shell such as http://sourceforge.net/projects/clusterssh/ I'm also working on similar tool. Thanks, Kaz -- My PowerCockpit page: http://powercockpit.net/hacking/ > I had a look to OSCAR, but I don't want to installl again all the > machine that are > already set-up to go. 
> > Many Thanks > > Diego > > --- > Outgoing mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.720 / Virus Database: 476 - Release Date: 14/07/2004 > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > > From chloong at nextnationnet.com Mon Jul 19 09:09:37 2004 From: chloong at nextnationnet.com (chloong) Date: Mon, 19 Jul 2004 17:09:37 +0800 Subject: [Linux-cluster] unresolved symbol Message-ID: <40FB8FD1.6070902@nextnationnet.com> hi all, I am facing this unresolved symbol error when i do a depmod -a for gfs. i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile under 2.4.21-15.0.3.EL version where as the one that i downloaded from RedHat is only be able to compile under 2.4.21-15.EL version. I was able to compile it and installed but when i run depmod -a it complain that there are unresolved symbol for all the GFS modules.... depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/block/gnbd/gnbd.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/block/gnbd/gnbd_serv. o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/md/pool/pool.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs/gfs.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_gulm/lock _gulm.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_harness/l ock_harness.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_nolock/lo ck_nolock.o I checked that my kernel version is correct using uname -r... How could i go about this....? Please help.... Thanks! From mailing-lists at hughesjr.com Mon Jul 19 10:00:39 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 19 Jul 2004 05:00:39 -0500 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <40FB8FD1.6070902@nextnationnet.com> References: <40FB8FD1.6070902@nextnationnet.com> Message-ID: <1090231239.10085.13.camel@Myth.home.local> On Mon, 2004-07-19 at 04:09, chloong wrote: > hi all, > I am facing this unresolved symbol error when i do a depmod -a for gfs. > > i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using > GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile > under 2.4.21-15.0.3.EL version where as the one that i downloaded from > RedHat is only be able to compile under 2.4.21-15.EL version. > > I was able to compile it and installed but when i run depmod -a it > complain that there are unresolved symbol for all the GFS modules.... Did you compile the version for your arch (i686, athlon)? If you are running a i686 kernel, you need to compile with a target=i686 or if you have an athlon kernel, you need to compile with target=athlon You can download the binary modules from my site as well and see if you have the same problem. In order to compile the SRPM as written for a target=athlon you will need to install (at least for the compile), kernel-unsupported, kernel-smp, and kernel-source ... for target=i686 you need to install kernel-unsupported, kernel-smp, kernel-source, and kernel-hugemem Johnny Hughes HughesJR.com > I checked that my kernel version is correct using uname -r... > > How could i go about this....? 
Please help.... From chloong at nextnationnet.com Mon Jul 19 10:21:38 2004 From: chloong at nextnationnet.com (chloong) Date: Mon, 19 Jul 2004 18:21:38 +0800 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <1090231239.10085.13.camel@Myth.home.local> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> Message-ID: <40FBA0B2.2000700@nextnationnet.com> Johnny Hughes wrote: >On Mon, 2004-07-19 at 04:09, chloong wrote: > > >>hi all, >>I am facing this unresolved symbol error when i do a depmod -a for gfs. >> >>i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using >>GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile >>under 2.4.21-15.0.3.EL version where as the one that i downloaded from >>RedHat is only be able to compile under 2.4.21-15.EL version. >> >>I was able to compile it and installed but when i run depmod -a it >>complain that there are unresolved symbol for all the GFS modules.... >> >> > >Did you compile the version for your arch (i686, athlon)? > >If you are running a i686 kernel, you need to compile with a target=i686 >or if you have an athlon kernel, you need to compile with target=athlon > >You can download the binary modules from my site as well and see if you >have the same problem. > >In order to compile the SRPM as written for a target=athlon you will >need to install (at least for the compile), kernel-unsupported, >kernel-smp, and kernel-source ... for target=i686 you need to install >kernel-unsupported, kernel-smp, kernel-source, and kernel-hugemem > >Johnny Hughes >HughesJR.com > > > >>I checked that my kernel version is correct using uname -r... >> >>How could i go about this....? Please help.... >> >> > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > > > > hi Johnny, I had tried using your bin, both i386 & i686, but still have the same problem. BTW, how could i know what target should i use? i am running on a x86 platform and i compile the kernel myself and i select the cpu type as 386 family with no smp support. the kernel source i downloaded from rpmfind. The version is kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file provided from this rpm and changed it to no smp support. Everything are fine after reboot using this kernel. Then i compile the gfs source from your side. The compilation was successful. After installation, when i do a depmod -a, it still gave me unresolved symbol for gfs modules.... Please help! Thanks From mailing-lists at hughesjr.com Mon Jul 19 12:01:40 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 19 Jul 2004 07:01:40 -0500 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <40FBA0B2.2000700@nextnationnet.com> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> <40FBA0B2.2000700@nextnationnet.com> Message-ID: <1090238500.5864.21.camel@Myth.home.local> On Mon, 2004-07-19 at 05:21, chloong wrote: > hi Johnny, > I had tried using your bin, both i386 & i686, but still have the same > problem. > BTW, how could i know what target should i use? i am running on a x86 > platform and i compile the kernel myself and i select the cpu type as > 386 family with no smp support. > > the kernel source i downloaded from rpmfind. The version is > kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file > provided from this rpm and changed it to no smp support. > OK, did you build a kernel rpm and install it ... 
if so, what was the name of the kernel's rpm. Is this on RHEL or a clone like WBEL/CentOS/TaoLinux? > Everything are fine after reboot using this kernel. Then i compile the > gfs source from your side. The compilation was successful. After > installation, when i do a depmod -a, it still gave me unresolved symbol > for gfs modules.... > I installed: GFS-6.0.0-1.2.i686.rpm GFS-devel-6.0.0-1.2.i686.rpm GFS-modules-6.0.0-1.2.i686.rpm perl-Net-Telnet-3.03-2.noarch.rpm Then do a: depmod -a No errors... Johnny Hughes HughesJR.com From merlin at studiobz.it Mon Jul 19 15:40:40 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 17:40:40 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted Message-ID: <40FBEB78.8040305@studiobz.it> Hi to all. I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, but I have a big problem when I try to export a device with GNBD: ...I've done all the steps in https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage ...but when I try gnbd_export -v -e export1 -d /dev/sdb1 it fails with: --- receiver: ERROR cannot connect to cluster manager : Operation not permitted gnbd_export: ERROR gnbd_clusterd failed --- looking at the log I have found this message: --- Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted --- What can I do? Where can I find a little explanation of the cluster.xml file ? Thanks, Christian From jbrassow at redhat.com Mon Jul 19 16:06:10 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Mon, 19 Jul 2004 11:06:10 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> Christian, I see that there are new binaries that come with gnbd now (like gnbd_monitor and gnbd_clusterd). I have not used gnbd since it has changed, and it might be a good guess to say that the documentation is out of date. Ben M is not in right now, but he would be able to answer your question. I'll try to make sure he gets this. brassow On Jul 19, 2004, at 10:40 AM, Christian Zoffoli wrote: > > Hi to all. > I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not > permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] > cannot connect to cluster manager : Operation not permitted > --- > > > What can I do? > Where can I find a little explanation of the cluster.xml file ? 
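On the cluster.xml question above: doc/usage.txt in the cluster CVS tree carries the reference example, and that is the place to check. Purely as an illustration of the general shape, here is a from-memory sketch; it is not the poster's file and not an authoritative schema, element and attribute names varied between CVS snapshots, and every name, address and fence agent below is a placeholder:

    <?xml version="1.0"?>
    <!-- illustrative skeleton only; compare against doc/usage.txt in your checkout -->
    <cluster name="example" config_version="1">
      <cman>
      </cman>
      <nodes>
        <node name="node1" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.0.0.1"/>
            </method>
          </fence>
        </node>
        <node name="node2" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.0.0.2"/>
            </method>
          </fence>
        </node>
      </nodes>
      <fence_devices>
        <device name="human" agent="fence_manual"/>
      </fence_devices>
    </cluster>

The cluster name and config_version later show up in /proc/cluster/status, which is a quick way to confirm that the file ccsd picked up is the one you intended.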
> > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From merlin at studiobz.it Mon Jul 19 18:13:38 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 20:13:38 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> References: <40FBEB78.8040305@studiobz.it> <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> Message-ID: <40FC0F52.6020508@studiobz.it> Jonathan E Brassow wrote: [cut] > > Ben M is not in right now, but he would be able to answer your > question. I'll try to make sure he gets this. thanks, I'm very interested to make extensive tests on the new code. Christian From amir at datacore.ch Mon Jul 19 18:45:06 2004 From: amir at datacore.ch (Amir Guindehi) Date: Mon, 19 Jul 2004 20:45:06 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <40FC16B2.5070703@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Christian, | Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot | connect to cluster manager : Operation not permitted Did you start gnbd_serv? If so, check the permissions of /dev/gnbd_ctl. Eventually the are wrong (they where here). If you run the 'gnbd_export' with 'strace' you will see more. I remember that using devfs one needs something along the following lines in /etc/devfs.d/gnbd: # # GNBD # gnbd needs crw------- on /dev/gnbd_ctl # REGISTER ^gnbd_ctl PERMISSIONS root.root 600 Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA/BaxbycOjskSVCwRAp+WAJwImq2LK4NvQJirXpztKLRu+d4+8ACeO8ie G4XXlqrtTMT5Wi/116uoE0M= =fNie -----END PGP SIGNATURE----- From merlin at studiobz.it Mon Jul 19 19:26:55 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 21:26:55 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FC16B2.5070703@datacore.ch> References: <40FBEB78.8040305@studiobz.it> <40FC16B2.5070703@datacore.ch> Message-ID: <40FC207F.9040301@studiobz.it> Amir Guindehi wrote: [cut] > Did you start gnbd_serv? yes > If so, check the permissions of /dev/gnbd_ctl. Eventually the are wrong > (they where here). If you run the 'gnbd_export' with 'strace' you will > see more. permissions seems correct here crw------- thanks for the infos. Christian From danderso at redhat.com Mon Jul 19 19:46:43 2004 From: danderso at redhat.com (Derek Anderson) Date: Mon, 19 Jul 2004 14:46:43 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <200407191446.43030.danderso@redhat.com> Christian, You need to execute the cluster setup steps on the page you linked below (previous to the GNBD-specific sections). Specifically, on each node you need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join after you have cman quorum. Then you should be able to gnbd_export devices. On Monday 19 July 2004 10:40, Christian Zoffoli wrote: > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > --- > > > What can I do? > Where can I find a little explanation of the cluster.xml file ? > > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From phillips at redhat.com Mon Jul 19 20:13:37 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 19 Jul 2004 16:13:37 -0400 Subject: [Linux-cluster] GFS limits? In-Reply-To: References: <40F460BB.4040603@smugmug.com> Message-ID: <200407191613.38768.phillips@redhat.com> On Tuesday 13 July 2004 18:50, Brian Jackson wrote: > > Does GFS intelligently "spread > > the load" among multiple storage entities for writing under high > > load? > > No, each node that mounts has direct access to the storage. It writes > just like any other fs, when it can. Hi Brian, He can do that at the block device level, with a device-mapper "striped" target. Regards, Daniel From merlin at studiobz.it Mon Jul 19 20:35:50 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 22:35:50 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <200407191446.43030.danderso@redhat.com> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> Message-ID: <40FC30A6.4060808@studiobz.it> Derek Anderson wrote: > Christian, > > You need to execute the cluster setup steps on the page you linked below > (previous to the GNBD-specific sections). Specifically, on each node you > need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join > after you have cman quorum. Then you should be able to gnbd_export devices. 
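Spelled out as commands, the sequence Derek lists above comes to roughly the following on each node; this is only a sketch, the gnbd_export line reuses the example device from earlier in the thread, and polling /proc/cluster/status is just one way to watch for quorum:

    modprobe lock_dlm            # load the DLM lock module (cman/dlm/lock_harness come in with it)
    ccsd                         # start the config daemon (needs cluster.xml in place)
    cman_tool join               # join or form the cluster
    cat /proc/cluster/status     # repeat until "Membership state: Cluster-Member" and quorate
    fence_tool join              # join the fence domain
    gnbd_export -v -e export1 -d /dev/sdb1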
I have done all the steps but I found these errors in the logs: node1 (GFS1) ----- Jul 20 01:29:02 gfs1 Lock_Harness (built Jul 17 2004 22:54:18) installed Jul 20 01:29:02 gfs1 CMAN (built Jul 17 2004 22:59:43) installed Jul 20 01:29:02 gfs1 NET: Registered protocol family 31 Jul 20 01:29:02 gfs1 DLM (built Jul 18 2004 00:08:18) installed Jul 20 01:29:02 gfs1 Lock_DLM (built Jul 17 2004 22:53:49) installed Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:03 gfs1 CMAN: Waiting to join or form a Linux-cluster Jul 20 01:29:14 gfs1 CMAN: forming a new cluster Jul 20 01:29:14 gfs1 CMAN: quorum regained, resuming activity Jul 20 01:29:34 gfs1 CMAN: got node gfs2 Jul 20 01:30:06 gfs1 gnbd: registered device at major 253 Jul 20 01:30:38 gfs1 gnbd_serv[8039]: startup succeeded Jul 20 01:30:38 gfs1 receiver[8043]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted ----- node2 (GFS2) ----- Jul 20 01:15:24 gfs2 Lock_Harness (built Jul 17 2004 22:54:18) installed Jul 20 01:15:24 gfs2 CMAN (built Jul 17 2004 22:59:43) installed Jul 20 01:15:24 gfs2 NET: Registered protocol family 31 Jul 20 01:15:24 gfs2 DLM (built Jul 18 2004 00:07:29) installed Jul 20 01:15:24 gfs2 Lock_DLM (built Jul 17 2004 22:53:49) installed Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 CMAN: Waiting to join or form a Linux-cluster Jul 20 01:15:55 gfs2 CMAN: sending membership request Jul 20 01:15:55 gfs2 CMAN: got node gfs1 Jul 20 01:15:55 gfs2 CMAN: quorum regained, resuming activity Jul 20 01:16:28 gfs2 gnbd: registered device at major 253 Jul 20 01:16:52 gfs2 gnbd_serv[8025]: startup succeeded Jul 20 01:17:03 gfs2 receiver[8029]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted ----- here is the cluster.xml file: ----- ----- Christian From danderso at redhat.com Mon Jul 19 21:25:52 2004 From: danderso at redhat.com (Derek Anderson) Date: Mon, 19 Jul 2004 16:25:52 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FC30A6.4060808@studiobz.it> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> <40FC30A6.4060808@studiobz.it> Message-ID: <200407191625.52391.danderso@redhat.com> Christian, I tried it again with your config file and it is working for me. What do the /proc/cluster/nodes, /proc/cluster/services, and /proc/cluster/status files look like on the nodes? On Monday 19 July 2004 15:35, Christian Zoffoli wrote: > Derek Anderson wrote: > > Christian, > > > > You need to execute the cluster setup steps on the page you linked below > > (previous to the GNBD-specific sections). Specifically, on each node you > > need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join > > after you have cman quorum. Then you should be able to gnbd_export > > devices. 
> > I have done all the steps but I found these errors in the logs: > > node1 (GFS1) > ----- > Jul 20 01:29:02 gfs1 Lock_Harness (built Jul 17 2004 22:54:18) > installed > Jul 20 01:29:02 gfs1 CMAN (built Jul 17 2004 22:59:43) installed > Jul 20 01:29:02 gfs1 NET: Registered protocol family 31 > Jul 20 01:29:02 gfs1 DLM (built Jul 18 2004 00:08:18) installed > Jul 20 01:29:02 gfs1 Lock_DLM (built Jul 17 2004 22:53:49) installed > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:03 gfs1 CMAN: Waiting to join or form a Linux-cluster > Jul 20 01:29:14 gfs1 CMAN: forming a new cluster > Jul 20 01:29:14 gfs1 CMAN: quorum regained, resuming activity > Jul 20 01:29:34 gfs1 CMAN: got node gfs2 > Jul 20 01:30:06 gfs1 gnbd: registered device at major 253 > Jul 20 01:30:38 gfs1 gnbd_serv[8039]: startup succeeded > Jul 20 01:30:38 gfs1 receiver[8043]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > ----- > > > node2 (GFS2) > ----- > Jul 20 01:15:24 gfs2 Lock_Harness (built Jul 17 2004 22:54:18) > installed > Jul 20 01:15:24 gfs2 CMAN (built Jul 17 2004 22:59:43) installed > Jul 20 01:15:24 gfs2 NET: Registered protocol family 31 > Jul 20 01:15:24 gfs2 DLM (built Jul 18 2004 00:07:29) installed > Jul 20 01:15:24 gfs2 Lock_DLM (built Jul 17 2004 22:53:49) installed > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 CMAN: Waiting to join or form a Linux-cluster > Jul 20 01:15:55 gfs2 CMAN: sending membership request > Jul 20 01:15:55 gfs2 CMAN: got node gfs1 > Jul 20 01:15:55 gfs2 CMAN: quorum regained, resuming activity > Jul 20 01:16:28 gfs2 gnbd: registered device at major 253 > Jul 20 01:16:52 gfs2 gnbd_serv[8025]: startup succeeded > Jul 20 01:17:03 gfs2 receiver[8029]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > ----- > > > here is the cluster.xml file: > ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- > > > Christian From merlin at studiobz.it Mon Jul 19 21:27:58 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 23:27:58 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <200407191625.52391.danderso@redhat.com> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> <40FC30A6.4060808@studiobz.it> <200407191625.52391.danderso@redhat.com> Message-ID: <40FC3CDE.6050104@studiobz.it> Derek Anderson wrote: > Christian, > > I tried it again with your config file and it is working for me. What do the > /proc/cluster/nodes, /proc/cluster/services, and /proc/cluster/status files > look like on the nodes? 
----- gfs1 root # cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M gfs1 2 1 1 M gfs2 ----- ----- gfs2 root # cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M gfs1 2 1 1 M gfs2 ----- ----- gfs1 root # cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2] ----- ----- gfs2 root # cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2] ----- ----- gfs1 root # cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 1 Node addresses: 10.0.4.101 ----- ----- gfs2 root # cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 1 Node addresses: 10.0.4.10 ----- From chloong at nextnationnet.com Tue Jul 20 02:11:59 2004 From: chloong at nextnationnet.com (chloong) Date: Tue, 20 Jul 2004 10:11:59 +0800 Subject: [Linux-cluster] unresolved symbol solved! In-Reply-To: <1090238500.5864.21.camel@Myth.home.local> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> <40FBA0B2.2000700@nextnationnet.com> <1090238500.5864.21.camel@Myth.home.local> Message-ID: <40FC7F6F.3090300@nextnationnet.com> Jonny, Thanks a lot man! I managed to install GFS and run it. Actually i used back the kernel-2.4.21-15.EL for smp. As follow what you said, installed all kernel-source, kernel-hugemem, kernel-unsupported, kernel-smp and then recompile from src. Now no more unresolved symbol and able to modprobe all the modules. Need to configure GFS now. Thanks again man! Johnny Hughes wrote: >On Mon, 2004-07-19 at 05:21, chloong wrote: > > >>hi Johnny, >>I had tried using your bin, both i386 & i686, but still have the same >>problem. >>BTW, how could i know what target should i use? i am running on a x86 >>platform and i compile the kernel myself and i select the cpu type as >>386 family with no smp support. >> >>the kernel source i downloaded from rpmfind. The version is >>kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file >>provided from this rpm and changed it to no smp support. >> >> >> >OK, did you build a kernel rpm and install it ... if so, what was the >name of the kernel's rpm. > >Is this on RHEL or a clone like WBEL/CentOS/TaoLinux? > > > >>Everything are fine after reboot using this kernel. Then i compile the >>gfs source from your side. The compilation was successful. After >>installation, when i do a depmod -a, it still gave me unresolved symbol >>for gfs modules.... >> >> >> >I installed: >GFS-6.0.0-1.2.i686.rpm >GFS-devel-6.0.0-1.2.i686.rpm >GFS-modules-6.0.0-1.2.i686.rpm >perl-Net-Telnet-3.03-2.noarch.rpm > >Then do a: > >depmod -a > >No errors... > >Johnny Hughes >HughesJR.com > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teigland at redhat.com Tue Jul 20 04:14:42 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 20 Jul 2004 12:14:42 +0800 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <20040720041442.GA11189@redhat.com> On Mon, Jul 19, 2004 at 05:40:40PM +0200, Christian Zoffoli wrote: > > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted There have been a lot of people who have had this same problem. In general there is no reason for gnbd to have any relation to clustering. This begs the question, why is there a "gnbd_cluster" thread trying to talk with a cluster manager? IMO it's unfortunate that gnbd is doing this at all, much more so by default, causing unnecessary problems for so many people. AFAIK, the only way to prevent gnbd from doing this is to use the "-c Enable caching" flag for gnbd_export. Try using that flag and see if it helps. Now a feeble attempt to answer the question above. When you do a "non-caching" export (don't use -c), gnbd assumes that it also needs to talk with a cluster manager because it assumes that you are going to use two gnbd servers to export the same (shared) underlying block device. This also assumes that the clients are using some form of multi-pathing in their volume manager. I may be wrong on some of that, but it's clearly not the way most people use gnbd -- people usually have SAN's precisely to avoid using gnbd, not to do gnbd multi-pathing. [If anything, people want to do mirroring between gnbd servers, not fail-over. Fail-over may be useful for some people, but I'd hope it could be done without making gnbd itself impossibly convoluted.] -- Dave Teigland From chloong at nextnationnet.com Tue Jul 20 12:13:15 2004 From: chloong at nextnationnet.com (chloong) Date: Tue, 20 Jul 2004 20:13:15 +0800 Subject: [Linux-cluster] unable to mount gfs partition Message-ID: <40FD0C5B.9060601@nextnationnet.com> hi all, I managed to setup the whole gfs clustering. i have 2 nodes servers in this gfs cluster. 1 node is mounting the gfs partition without any issue but the other one not able to mount...giving me error: #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 mount: wrong fs type, bad option, bad superblock on /dev/pool/smsgateclu_pool0, or too many mounted file systems can anyone facing this problem? Please help! Thanks From rmayhew at mweb.com Tue Jul 20 15:02:55 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 20 Jul 2004 17:02:55 +0200 Subject: [Linux-cluster] unable to mount gfs partition Message-ID: <91C4F1A7C418014D9F88E938C135545860A1A8@mwjdc2.mweb.com> Hi, Are all your Daemons running and functioning correctly (specially the lock_gulm daemon) Have you assembled your pool device? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za -----Original Message----- From: chloong [mailto:chloong at nextnationnet.com] Sent: 20 July 2004 02:13 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] unable to mount gfs partition hi all, I managed to setup the whole gfs clustering. i have 2 nodes servers in this gfs cluster. 
1 node is mounting the gfs partition without any issue but the other one not able to mount...giving me error: #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 mount: wrong fs type, bad option, bad superblock on /dev/pool/smsgateclu_pool0, or too many mounted file systems can anyone facing this problem? Please help! Thanks -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From notiggy at gmail.com Tue Jul 20 15:44:25 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 20 Jul 2004 10:44:25 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <200407191613.38768.phillips@redhat.com> References: <40F460BB.4040603@smugmug.com> <200407191613.38768.phillips@redhat.com> Message-ID: On Mon, 19 Jul 2004 16:13:37 -0400, Daniel Phillips wrote: > On Tuesday 13 July 2004 18:50, Brian Jackson wrote: > > > Does GFS intelligently "spread > > > the load" among multiple storage entities for writing under high > > > load? > > > > No, each node that mounts has direct access to the storage. It writes > > just like any other fs, when it can. > > Hi Brian, > > He can do that at the block device level, with a device-mapper "striped" > target. True but that's not very intelligent. I thought he meant some kind of hot spot tracking or something similar. --Brian > > Regards, > > Daniel > From amanthei at redhat.com Tue Jul 20 15:57:19 2004 From: amanthei at redhat.com (Adam Manthei) Date: Tue, 20 Jul 2004 10:57:19 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FD0C5B.9060601@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> Message-ID: <20040720155719.GD3866@redhat.com> On Tue, Jul 20, 2004 at 08:13:15PM +0800, chloong wrote: > hi all, > I managed to setup the whole gfs clustering. i have 2 nodes servers in > this gfs cluster. > > 1 node is mounting the gfs partition without any issue but the other one > not able to mount...giving me error: > #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 > mount: wrong fs type, bad option, bad superblock on > /dev/pool/smsgateclu_pool0, > or too many mounted file systems > > can anyone facing this problem? This is the standard error message that mount gives on error. In general it isn't very usefull. More accurate error messages are on the console. Post your `dmesg` output if you are still having problems. -- Adam Manthei From bmarzins at redhat.com Tue Jul 20 17:12:35 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Tue, 20 Jul 2004 12:12:35 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <20040720171235.GG23619@phlogiston.msp.redhat.com> On Mon, Jul 19, 2004 at 05:40:40PM +0200, Christian Zoffoli wrote: > > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > --- If you do not want to enable multipathing or run GFS on the gnbd server, you can just add a -c to your export line. Here's a guess at what you might be seeing. The behaviour that you are seeing looks like it could be caused by not having the correct magma plugins. In the cluster/magma/tests directory there is a cluster plugin test program, cpt, run # cpt null if you get something like Connect failure: Operation not permitted then either cman isn't running on the node, or magma cannot connect to it. If cman is running correctly (check you logs) Then look in /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need to install the magma plugins, which are located in /cluster/magma-plugins. -Ben > What can I do? > Where can I find a little explanation of the cluster.xml file ? > > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From chloong at nextnationnet.com Wed Jul 21 01:54:36 2004 From: chloong at nextnationnet.com (chloong) Date: Wed, 21 Jul 2004 09:54:36 +0800 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <20040720155719.GD3866@redhat.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> Message-ID: <40FDCCDC.5040402@nextnationnet.com> Adam Manthei wrote: >On Tue, Jul 20, 2004 at 08:13:15PM +0800, chloong wrote: > > >>hi all, >>I managed to setup the whole gfs clustering. i have 2 nodes servers in >>this gfs cluster. >> >>1 node is mounting the gfs partition without any issue but the other one >>not able to mount...giving me error: >>#mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 >>mount: wrong fs type, bad option, bad superblock on >>/dev/pool/smsgateclu_pool0, >> or too many mounted file systems >> >>can anyone facing this problem? >> >> > >This is the standard error message that mount gives on error. In general it >isn't very usefull. More accurate error messages are on the console. Post >your `dmesg` output if you are still having problems. > > > hi, i checked the dmesg, the error is : lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = where as in /var/log/messages the error is : lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni ng? lock_gulm: ERROR cm_login failed. -111 lock_gulm: ERROR Got a -111 trying to start the threads. lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the other one is not. the one that not a lock_gulm server giving me mount error... Did i need to start the lock_gulm daemon on this server that is not the lock_gulm server? 
When i start the lock_gulmd on this server it gave me this error in /var/log/messages: lock_gulmd[18399]: You are running in Standard mode. lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) lock_gulmd[18399]: Forked core [18400]. lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: (clu1:192. 168.11.211) 1006:Not Allowed my cluster.ccs : cluster { name = "smsgateclu" lock_gulm { servers = ["clu1"] heartbeat_rate = 0.3 allowed_misses = 1 } } nodes.ccs: nodes { clu1 { ip_interfaces { eth2 = "192.168.11.211" } fence { human { admin { ipaddr = "192.168.11.211" } } } } clu2 { ip_interfaces { eth2 = "192.168.11.212" } fence { human { admin { ipaddr = "192.168.11.212" } } } } } fence.ccs: fence_devices { admin { agent = "fence_manual" } } Please help! Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mtilstra at redhat.com Wed Jul 21 15:02:39 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 21 Jul 2004 10:02:39 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FDCCDC.5040402@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> <40FDCCDC.5040402@nextnationnet.com> Message-ID: <20040721150239.GA20220@redhat.com> On Wed, Jul 21, 2004 at 09:54:36AM +0800, chloong wrote: > hi, > i checked the dmesg, the error is : > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > where as in /var/log/messages the error is : > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it > runni > ng? > lock_gulm: ERROR cm_login failed. -111 > lock_gulm: ERROR Got a -111 trying to start the threads. > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the > other one is not. > the one that not a lock_gulm server giving me mount error... > Did i need to start the lock_gulm daemon on this server that is not > the lock_gulm server? yes, you need to start lock_gulmd on every node. > When i start the lock_gulmd on this server it gave me this error in > /var/log/messages: > lock_gulmd[18399]: You are running in Standard mode. > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > lock_gulmd[18399]: Forked core [18400]. > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > (clu1:192. > 168.11.211) 1006:Not Allowed it might be marked expired. do a 'gulm_tool nodelist clu1' that will list what state gulm thinks each node is in. If it is marked expired, and given your ccs config, you'll need to complete the fence manual action. (erm, i forget how that's done, man pages should tell.) -- Michael Conrad Tadpol Tilstra The Grand Illusion: "I am in control!" -------------- next part -------------- A non-text attachment was scrubbed... 
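For reference, the check-and-acknowledge step Michael describes above usually looks something like this, run against the lock server; the gulm_tool call is the one he names, while the exact fence_ack_manual arguments are an assumption here -- fence_manual logs the precise command to run in /var/log/messages, so treat that message as authoritative:

    gulm_tool nodelist clu1        # shows the state gulm has for each node (Logged in, Expired, ...)
    # only after verifying the expired node really is down or has been reset:
    fence_ack_manual -s 192.168.11.212

The address given to fence_ack_manual would be that of the expired node (clu2 in this thread), assuming that syntax matches your build.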
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From danderso at redhat.com Wed Jul 21 15:14:49 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 21 Jul 2004 10:14:49 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FDCCDC.5040402@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> <40FDCCDC.5040402@nextnationnet.com> Message-ID: <200407211014.49096.danderso@redhat.com> > hi, > i checked the dmesg, the error is : > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > where as in /var/log/messages the error is : > > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni > ng? > lock_gulm: ERROR cm_login failed. -111 > lock_gulm: ERROR Got a -111 trying to start the threads. > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the > other one is not. > the one that not a lock_gulm server giving me mount error... > > Did i need to start the lock_gulm daemon on this server that is not the > lock_gulm server? > > When i start the lock_gulmd on this server it gave me this error in > /var/log/messages: > > lock_gulmd[18399]: You are running in Standard mode. > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > lock_gulmd[18399]: Forked core [18400]. > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > (clu1:192. > 168.11.211) 1006:Not Allowed > > my cluster.ccs : > > cluster { > name = "smsgateclu" > lock_gulm { > servers = ["clu1"] > heartbeat_rate = 0.3 > allowed_misses = 1 > } > } Like tadpol said in the last post, you are most likely expired. Where are people getting these ridiculously low examples of heartbeat_rate and allowed_misses? No wonder you're fenced. > > nodes.ccs: > > nodes { > clu1 { > ip_interfaces { > eth2 = "192.168.11.211" > } > fence { > human { > admin { > ipaddr = "192.168.11.211" > } > } > } > } > clu2 { > ip_interfaces { > eth2 = "192.168.11.212" > } > fence { > human { > admin { > ipaddr = "192.168.11.212" > } > } > } > } > } > > fence.ccs: > > fence_devices { > admin { > agent = "fence_manual" > } > } > > Please help! > > Thanks. From danderso at redhat.com Wed Jul 21 15:21:48 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 21 Jul 2004 10:21:48 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <200407211014.49096.danderso@redhat.com> References: <40FD0C5B.9060601@nextnationnet.com> <40FDCCDC.5040402@nextnationnet.com> <200407211014.49096.danderso@redhat.com> Message-ID: <200407211021.48671.danderso@redhat.com> On Wednesday 21 July 2004 10:14, Derek Anderson wrote: > > hi, > > i checked the dmesg, the error is : > > > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > > > where as in /var/log/messages the error is : > > > > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni > > ng? > > lock_gulm: ERROR cm_login failed. -111 > > lock_gulm: ERROR Got a -111 trying to start the threads. > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > > > i got 2 nodes in the gfs cluster. 
1 is the lock_gulm server and the > > other one is not. > > the one that not a lock_gulm server giving me mount error... > > > > Did i need to start the lock_gulm daemon on this server that is not the > > lock_gulm server? > > > > When i start the lock_gulmd on this server it gave me this error in > > /var/log/messages: > > > > lock_gulmd[18399]: You are running in Standard mode. > > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > > lock_gulmd[18399]: Forked core [18400]. > > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > > (clu1:192. > > 168.11.211) 1006:Not Allowed > > > > my cluster.ccs : > > > > cluster { > > name = "smsgateclu" > > lock_gulm { > > servers = ["clu1"] > > heartbeat_rate = 0.3 > > allowed_misses = 1 > > } > > } > > Like tadpol said in the last post, you are most likely expired. Where are > people getting these ridiculously low examples of heartbeat_rate and > allowed_misses? No wonder you're fenced. Doh! Right out of the GFS 6.0 manual, huh? I think we should change that example (Table 6.1). Anyway, you should try something closer to the defaults of heartbeat_rate=15, allowed_misses=2 to keep your nodes from getting unnecessarily fenced. Depending on network traffic load you can move it down. It's one of those things you kind of have to play with. > > > nodes.ccs: > > > > nodes { > > clu1 { > > ip_interfaces { > > eth2 = "192.168.11.211" > > } > > fence { > > human { > > admin { > > ipaddr = "192.168.11.211" > > } > > } > > } > > } > > clu2 { > > ip_interfaces { > > eth2 = "192.168.11.212" > > } > > fence { > > human { > > admin { > > ipaddr = "192.168.11.212" > > } > > } > > } > > } > > } > > > > fence.ccs: > > > > fence_devices { > > admin { > > agent = "fence_manual" > > } > > } > > > > Please help! > > > > Thanks. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From adam.cassar at netregistry.com.au Wed Jul 21 22:30:37 2004 From: adam.cassar at netregistry.com.au (Adam Cassar) Date: Thu, 22 Jul 2004 08:30:37 +1000 Subject: [Linux-cluster] Quotas Message-ID: <1090449037.22972.157.camel@akira.nro.au.com> Hi Guys, I've got GFS running in a two node set up on kernel 2.6.7 on debian stable. I can mount both partitions and normal file access seems fine. However any quota related commands just hang: ie ./gfs_quota init -f /mnt just sits there and is unkillable. Below are some interesting lines from ps: 5 0 621 1 16 0 4312 1080 - Ss ? 0:00 ./ccsd 5 0 622 621 16 0 4312 1080 - S ? 0:00 ./ccsd 5 0 623 622 16 0 4312 1080 414395 S ? 0:03 ./ccsd 1 0 625 1 9 -6 0 0 cluste S< ? 0:00 [cman_comms] 5 0 626 1 10 -5 0 0 member S< ? 0:00 [cman_memb] 1 0 627 1 15 0 0 0 servic S ? 0:00 [cman_serviced] 1 0 628 1 9 -6 0 0 hello_ S< ? 0:00 [cman_hbeat] 5 0 631 1 18 0 1344 484 pause Ss ? 0:00 fenced 1 0 632 1 19 0 0 0 kcl_jo D ? 0:00 [cman_userjoin] 1 0 641 1 15 0 0 0 dlm_re S ? 0:00 [dlm_recoverd] 1 0 642 1 15 0 0 0 dlm_as S ? 0:30 [dlm_astd] 1 0 643 1 15 0 0 0 dlm_re S ? 0:13 [dlm_recvd] 1 0 644 1 15 0 0 0 dlm_se S ? 0:10 [dlm_sendd] 1 0 645 1 15 0 0 0 dlm_as S ? 0:18 [lock_dlm] 1 0 646 1 15 0 0 0 dlm_as S ? 0:20 [lock_dlm] 1 0 647 1 15 0 0 0 - S ? 0:06 [gfs_scand] 1 0 648 1 15 0 0 0 gfs_gl S ? 0:05 [gfs_glockd] 1 0 649 1 15 0 0 0 - S ? 0:00 [gfs_recoverd] 1 0 650 1 15 0 0 0 - S ? 0:00 [gfs_logd] 1 0 651 1 15 0 0 0 - S ? 0:00 [gfs_quotad] 1 0 652 1 15 0 0 0 - S ? 0:00 [gfs_inoded] 5 0 1080 286 17 0 5704 1692 - S ? 0:00 /usr/sbin/sshd 5 1002 1082 1080 16 0 5716 1784 - S ? 
0:00 /usr/sbin/sshd 0 1002 1083 1082 17 0 2208 1212 wait4 Ss pts/1 0:00 -bash 4 0 1087 1083 15 0 2288 1308 wait4 S pts/1 0:00 -su 4 0 1070 609 18 0 1284 400 - R+ pts/0 0:28 ./gfs_quota init -f /mnt From adam.cassar at netregistry.com.au Thu Jul 22 01:12:22 2004 From: adam.cassar at netregistry.com.au (Adam Cassar) Date: Thu, 22 Jul 2004 11:12:22 +1000 Subject: [Linux-cluster] gnbd crash Message-ID: <1090458742.22972.192.camel@akira.nro.au.com> Guys, I was attempting to use gnbd. I exported the device on the server and attempted to mount it on the client. mount /dev/gnbd/export1 /mnt on the client hangs. The server shows the following in dmesg: Unable to handle kernel paging request at virtual address 19191959 printing eip: f8982489 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: gnbd lock_dlm dlm cman gfs lock_harness dm_mod CPU: 0 EIP: 0060:[] Not tainted EFLAGS: 00010282 (2.6.7) EIP is at name_to_directory_nodeid+0x15/0xf9 [dlm] eax: 19191919 ebx: f1d02cd4 ecx: 00000000 edx: f1d02cc4 esi: 00000000 edi: 19191919 ebp: f1d02cc4 esp: f74d3df4 ds: 007b es: 007b ss: 0068 Process dlm_recoverd (pid: 641, threadinfo=f74d2000 task=f4a00940) Stack: 00000000 00000025 00000001 c01ed6dd c01ede1c 00000000 f1d02cd4 00000000 dc8260d4 f1d02cc4 f898258e 19191919 f1d02d3d 00000000 f8982b51 f1d02cc4 f74d3e58 00000008 c04010a0 00000246 00000001 c18d7ccc dc8260bc c18d7cd4 Call Trace: [] scrup+0x13b/0x14f [] complement_pos+0x20/0x183 [] get_directory_nodeid+0x21/0x25 [dlm] [] dlm_dir_rebuild_send+0xec/0x27d [dlm] [] rcom_process_message+0x2d6/0x558 [dlm] [] rcom_send_message+0x1c5/0x217 [dlm] [] dlm_dir_rebuild_local+0x12b/0x29c [dlm] [] ls_reconfig+0x79/0x292 [dlm] [] do_ls_recovery+0x166/0x436 [dlm] [] dlm_recoverd+0x143/0x16e [dlm] [] default_wake_function+0x0/0x12 [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x12 [] dlm_recoverd+0x0/0x16e [dlm] [] kernel_thread_helper+0x5/0xb Code: 83 7f 40 01 74 65 8b 44 24 34 89 44 24 04 8b 44 24 30 89 04 From czoffoli at xmerlin.org Wed Jul 21 00:00:28 2004 From: czoffoli at xmerlin.org (Christian Zoffoli) Date: Wed, 21 Jul 2004 02:00:28 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <20040720171235.GG23619@phlogiston.msp.redhat.com> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> Message-ID: <40FDB21C.1030401@xmerlin.org> Benjamin Marzinski wrote: [cut] > If you do not want to enable multipathing or run GFS on the gnbd server, > you can just add a -c to your export line. ...I need multipathing ;) ...with -c it works [cut] > > # cpt null > > if you get something like > > Connect failure: Operation not permitted Yes, I get a message like this one. > then either cman isn't running on the node, or magma cannot connect to it. > If cman is running correctly (check you logs) Then look in > /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need > to install the magma plugins, which are located in /cluster/magma-plugins. cman is running and I have a sm.so plugin Christian From zhuyfa at lenovo.com Thu Jul 22 05:56:37 2004 From: zhuyfa at lenovo.com (zhuyfa at lenovo.com) Date: Thu, 22 Jul 2004 13:56:37 +0800 Subject: [Linux-cluster] Does Gfs only run in kernel 2.6.7 ? (all) Message-ID: *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* ????(??)???? ???????????? ???(walkinair) 13810259866 010-58864076 zhuyfa at lenovo.com ?????????6? ??8688?? 100085 *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* ???????????,??????????; ??,?????! 
?????????,???????????; ??,?????!! ???????????,???????; ??,????!!! _________________________________________________ From lserinol at gmail.com Thu Jul 22 08:23:33 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 11:23:33 +0300 Subject: [Linux-cluster] GFS maximum filesystem size ? Message-ID: <2c1942a7040722012362fb3810@mail.gmail.com> Hi, what is the maximum GFS filesystem size ? AFAIK, documents in redhat.com site says that this limit is 2TB. Also, it says that bigger filesystem size support will be available when RHEL kernel migrated to to 2.6. anybody here knows when rhel kernel 2.6 will be released ? thanks, From john.hearns at clustervision.com Thu Jul 22 08:31:24 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 22 Jul 2004 09:31:24 +0100 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <2c1942a7040722012362fb3810@mail.gmail.com> References: <2c1942a7040722012362fb3810@mail.gmail.com> Message-ID: <1090485084.6205.7.camel@vigor12> On Thu, 2004-07-22 at 09:23, Levent Serinol wrote: > Hi, > > what is the maximum GFS filesystem size ? AFAIK, documents in > redhat.com site says that this limit is 2TB. Also, it says that > bigger filesystem size support will be available when RHEL kernel > migrated to to 2.6. > anybody here knows when rhel kernel 2.6 will be released ? This is of course best answered by someone from Redhat. But if you want to work with 2.6 now, why not install an R+D system running Fedora 2? RPMs available from: http://www2.wantstofly.org/gfs/ That's the concept of Fedora - a faster release cycle, so you can try out leading edge features, and of course a closer relationship with the community. This leaves RHEL to be more stable, less frequent release cycle,with a much longer time till End of Life. I think it s fair bet that RHEL 4 will have 2.6. From julien.senon at toulouse.inra.fr Thu Jul 22 09:16:05 2004 From: julien.senon at toulouse.inra.fr (Julien Senon) Date: Thu, 22 Jul 2004 11:16:05 +0200 Subject: [Linux-cluster] Problem with GFS Message-ID: <40FF85D5.7090804@toulouse.inra.fr> Hi, I am a problem with GFS : When I execute this command : "ls" in a directory which are mounted by GFS, and which contains 100 sub-directory, I wait 20min after the command line was type. Who are the problem ? and What is the atime ? Thank you for yours response. Julien Senon -- Julien Senon - julien.senon at toulouse.inra.fr - INRA - G??nopole plateforme Bioinformatique - Unit?? de Biom??trie et Intelligence Artificielle BP 27 - 31326 Castanet Tolosan cedex From lserinol at gmail.com Thu Jul 22 09:53:48 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 12:53:48 +0300 Subject: [Linux-cluster] lock_gulm is very slow. why ? Message-ID: <2c1942a704072202534d487950@mail.gmail.com> Hi, I have done some benchmark tests with postmark(tests repeated many times). There is one client (also it is lock server). and another one which exports it's scsi hard disk with gnbd. 
filesystem created with lock_gulm: ----------------------------------------- Time: 94 seconds total 7 seconds of transactions (142 per second) Files: 10692 created (113 per second) Creation alone: 10000 files (434 per second) Mixed with transactions: 692 files (98 per second) 899 read (128 per second) 101 appended (14 per second) 10692 deleted (113 per second) Deletion alone: 10384 files (162 per second) Mixed with transactions: 308 files (44 per second) Data: 21.05 megabytes read (229.28 kilobytes per second) 250.41 megabytes written (2.66 megabytes per second) filesystem created with no_lock: -------------------------------------- Time: 35 seconds total 4 seconds of transactions (250 per second) Files: 10692 created (305 per second) Creation alone: 10000 files (454 per second) Mixed with transactions: 692 files (173 per second) 899 read (224 per second) 101 appended (25 per second) 10692 deleted (305 per second) Deletion alone: 10384 files (1153 per second) Mixed with transactions: 308 files (77 per second) Data: 21.05 megabytes read (615.77 kilobytes per second) 250.41 megabytes written (7.15 megabytes per second) as you can see nolock results is 2 times (some parts 3 times) faster then with locked one . what could be the problem ? is there any workaround or settune option (releasing locks earlier,etc...) ? From lists at wikidev.net Thu Jul 22 10:20:46 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Thu, 22 Jul 2004 12:20:46 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FDB21C.1030401@xmerlin.org> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> Message-ID: <1090491646.1306.19.camel@venus> On Wed, 2004-07-21 at 02:00 +0200, Christian Zoffoli wrote: > Benjamin Marzinski wrote: > [cut] > > If you do not want to enable multipathing or run GFS on the gnbd server, > > you can just add a -c to your export line. > > ...I need multipathing ;) ...with -c it works > > > [cut] > > > > # cpt null > > > > if you get something like > > > > Connect failure: Operation not permitted > > Yes, I get a message like this one. > > > then either cman isn't running on the node, or magma cannot connect to it. > > If cman is running correctly (check you logs) Then look in > > /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need > > to install the magma plugins, which are located in /cluster/magma-plugins. > > cman is running and I have a sm.so plugin On my system (debian unstable) it expects the plugin folder in /lib/magma/plugins, you could add a symlink and see if it works. Else you can run a test program from the magma source dir, magma/tests/cpt null. Stracing this will show you the place it's looking for (will show an ENOENT near the end of the strace). The reason for this problem seems to be the usage of $libdir in the magma-plugins makefiles or somesuch. -- Gabriel Wicke From arekm at pld-linux.org Thu Jul 22 11:05:17 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Thu, 22 Jul 2004 13:05:17 +0200 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <1090485084.6205.7.camel@vigor12> References: <2c1942a7040722012362fb3810@mail.gmail.com> <1090485084.6205.7.camel@vigor12> Message-ID: <200407221305.17025.arekm@pld-linux.org> On Thursday 22 of July 2004 10:31, John Hearns wrote: > But if you want to work with 2.6 now, why not install an R+D system > running Fedora 2? 
> RPMs available from: http://www2.wantstofly.org/gfs/ None of gfs related rpms I have seen so far doesn't include system integration scripts like initscripts etc. Does anyone have nice conception or better working scripts to fully integrate GFS with system like Fedora? -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From mtilstra at redhat.com Thu Jul 22 14:53:45 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Thu, 22 Jul 2004 09:53:45 -0500 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <2c1942a704072202534d487950@mail.gmail.com> References: <2c1942a704072202534d487950@mail.gmail.com> Message-ID: <20040722145345.GA22628@redhat.com> On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > Hi, > I have done some benchmark tests with postmark(tests repeated many > times). There is one client (also it is lock server). and another one > which exports it's scsi hard disk with gnbd. [snipped a lot of nice data] > as you can see nolock results is 2 times (some parts 3 times) faster > then with locked one . > what could be the problem ? is there any workaround or settune option > (releasing locks earlier,etc...) ? the biggest thing you are probably running into is that when running with lock_nolock, gfs knows that it is not in a cluster, therefor it can enable some optimisations that only work for lcoal filesystems. These optimisations would corrupt disk data if you had multiple nodes mounted. There is also no network traffic for handling lock in lock_nolock, but that is minor compaired to the local file system optimisations. Basically, gfs with lock_nolock should always be quite faster than with any cluster locking (lock_gulm or lock_dlm). Ken could say more on this. -- Michael Conrad Tadpol Tilstra Reality is for people who lack imagination. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From kpreslan at redhat.com Thu Jul 22 14:58:37 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Thu, 22 Jul 2004 09:58:37 -0500 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <20040722145345.GA22628@redhat.com> References: <2c1942a704072202534d487950@mail.gmail.com> <20040722145345.GA22628@redhat.com> Message-ID: <20040722145837.GA29470@potassium.msp.redhat.com> On Thu, Jul 22, 2004 at 09:53:45AM -0500, Michael Conrad Tadpol Tilstra wrote: > On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > > Hi, > > I have done some benchmark tests with postmark(tests repeated many > > times). There is one client (also it is lock server). and another one > > which exports it's scsi hard disk with gnbd. > [snipped a lot of nice data] > > as you can see nolock results is 2 times (some parts 3 times) faster > > then with locked one . > > what could be the problem ? is there any workaround or settune option > > (releasing locks earlier,etc...) ? > > the biggest thing you are probably running into is that when running > with lock_nolock, gfs knows that it is not in a cluster, therefor it can > enable some optimisations that only work for lcoal filesystems. These > optimisations would corrupt disk data if you had multiple nodes mounted. You can turn off those optimizations with lock_nolock by mounting with "-o ignore_local_fs". That will let us figure out what is optimizations and what is lock latency. 
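As a concrete form of Ken's suggestion above, the lock_nolock filesystem in the benchmark would be remounted with the local-only optimizations disabled; the device path here is a placeholder for whichever pool or partition the test filesystem lives on:

    mount -t gfs /dev/pool/testpool /mnt -o ignore_local_fs

Re-running postmark on that mount separates what lock_gulm costs in lock latency from what lock_nolock gains through the local-filesystem optimizations.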
> There is also no network traffic for handling lock in lock_nolock, but > that is minor compaired to the local file system optimisations. > > Basically, gfs with lock_nolock should always be quite faster than with > any cluster locking (lock_gulm or lock_dlm). > > Ken could say more on this. > > -- > Michael Conrad Tadpol Tilstra > Reality is for people who lack imagination. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From amir at datacore.ch Thu Jul 22 15:05:59 2004 From: amir at datacore.ch (Amir Guindehi) Date: Thu, 22 Jul 2004 17:05:59 +0200 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <200407221305.17025.arekm@pld-linux.org> References: <2c1942a7040722012362fb3810@mail.gmail.com> <1090485084.6205.7.camel@vigor12> <200407221305.17025.arekm@pld-linux.org> Message-ID: <40FFD7D7.1020702@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Arkadiusz, | None of gfs related rpms I have seen so far doesn't include system integration | scripts like initscripts etc. | | Does anyone have nice conception or better working scripts to fully integrate | GFS with system like Fedora? I wrote init scripts for my GenToo system. You can find them inside of the Ebuilds i published in the GFS section at: https://open.datacore.ch/page/GFS.Install The scripts are able to start the cluster, join the fence domain, start gnbd inport/export and finally to mount the GFS filesystem automatically. Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA/9fVbycOjskSVCwRAhY7AKCfQUWBFgEsl7RbNr0qHiQ6kd8NWgCgmzdU 23vcTSkgfu+/0/c0VCi6pmI= =eMYj -----END PGP SIGNATURE----- From jbrassow at redhat.com Thu Jul 22 15:28:19 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Thu, 22 Jul 2004 10:28:19 -0500 Subject: [Linux-cluster] Does Gfs only run in kernel 2.6.7 ? (all) In-Reply-To: References: Message-ID: GFS 6.0.0 runs on the 2.4 kernel - look at ftp.redhat.com/pub/redhat/linux/updates/enterprise/3AS/en/RHGFS/SRPMS GFS (cvs/devel) runs on the 2.6 kernel - look at sources.redhat.com/cluster brassow On Jul 22, 2004, at 12:56 AM, zhuyfa at lenovo.com wrote: > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > ????(??)???? ???????????? > ???(walkinair) 13810259866 > 010-58864076 zhuyfa at lenovo.com > ?????????6? ??8688?? 100085 > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > ???????????,??????????; > ??,?????! > ?????????,???????????; > ??,?????!! > ???????????,???????; > ??,????!!! > _________________________________________________ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... 
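For the GFS 6.0.0 / 2.4 route Jonathan points at above, the SRPMS from that directory can be rebuilt against the running kernel; as noted earlier in the thread the --target has to match the kernel's architecture (i686, athlon, ...). The package file names below are the ones mentioned in this thread, and /usr/src/redhat is simply the default build location:

    rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm
    rpm -ivh /usr/src/redhat/RPMS/i686/GFS-*.rpm
    depmod -a                    # should come back clean if the arch matches the running kernel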
Name: not available Type: text/enriched Size: 1920 bytes Desc: not available URL: From jbrassow at redhat.com Thu Jul 22 15:31:24 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Thu, 22 Jul 2004 10:31:24 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <1090491646.1306.19.camel@venus> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> <1090491646.1306.19.camel@venus> Message-ID: <30CBEEAA-DBF4-11D8-B716-000A957BB1F6@redhat.com> On Jul 22, 2004, at 5:20 AM, Gabriel Wicke wrote: > On Wed, 2004-07-21 at 02:00 +0200, Christian Zoffoli wrote: >> Benjamin Marzinski wrote: >> [cut] >>> If you do not want to enable multipathing or run GFS on the gnbd >>> server, >>> you can just add a -c to your export line. >> >> ...I need multipathing ;) ...with -c it works >> >> >> [cut] >>> >>> # cpt null >>> >>> if you get something like >>> >>> Connect failure: Operation not permitted >> >> Yes, I get a message like this one. >> >>> then either cman isn't running on the node, or magma cannot connect >>> to it. >>> If cman is running correctly (check you logs) Then look in >>> /usr/lib/magma/plugins. You should have a sm.so file there. If not, >>> you need >>> to install the magma plugins, which are located in >>> /cluster/magma-plugins. >> >> cman is running and I have a sm.so plugin > > > On my system (debian unstable) it expects the plugin folder > in /lib/magma/plugins, you could add a symlink and see if it works. > Else > you can run a test program from the magma source dir, magma/tests/cpt > null. Stracing this will show you the place it's looking for (will show > an ENOENT near the end of the strace). > The reason for this problem seems to be the usage of $libdir in the > magma-plugins makefiles or somesuch. > I noticed the configure scripts are not always consistent WRT %{libdir}. I believe this may be causing some of the confusion... As stated, the symlinks will work, but I intend to correct the configure scripts in cvs soon. brassow From danderso at redhat.com Thu Jul 22 15:38:23 2004 From: danderso at redhat.com (Derek Anderson) Date: Thu, 22 Jul 2004 10:38:23 -0500 Subject: [Linux-cluster] Quotas In-Reply-To: <1090449037.22972.157.camel@akira.nro.au.com> References: <1090449037.22972.157.camel@akira.nro.au.com> Message-ID: <200407221038.23945.danderso@redhat.com> I am not seeing this on fedora core 2. [root at link-11 /]# mount /dev/hda2 on / type ext3 (rw) none on /proc type proc (rw) none on /sys type sysfs (rw) none on /dev/pts type devpts (rw,gid=5,mode=620) usbdevfs on /proc/bus/usb type usbdevfs (rw) /dev/hda1 on /boot type ext3 (rw) none on /dev/shm type tmpfs (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) /dev/sda1 on /data1 type gfs (rw) [root at link-11 /]# gfs_quota init -f /data1 [root at link-11 /]# gfs_quota list -f /data1 user root: limit: 880.0 warn: 870.0 value: 98.1 user bin: limit: 890.0 warn: 880.0 value: 0.0 user daemon: limit: 900.0 warn: 890.0 value: 0.0 user adm: limit: 910.0 warn: 900.0 value: 0.0 user lp: limit: 920.0 warn: 910.0 value: 0.0 user sync: limit: 930.0 warn: 920.0 value: 0.0 user shutdown: limit: 940.0 warn: 930.0 value: 0.0 user halt: limit: 950.0 warn: 940.0 value: 0.0 user mail: limit: 960.0 warn: 950.0 value: 0.0 user news: limit: 970.0 warn: 960.0 value: 0.0 user uucp: limit: 980.0 warn: 970.0 value: 0.0 user operator: limit: 990.0 warn: 980.0 value: 0.0 . . . 
group root: limit: 12600.0 warn: 12500.0 value: 98.1 group bin: limit: 12700.0 warn: 12600.0 value: 0.0 group daemon: limit: 12800.0 warn: 12700.0 value: 0.0 group sys: limit: 12900.0 warn: 12800.0 value: 0.0 group adm: limit: 13000.0 warn: 12900.0 value: 0.0 group tty: limit: 13100.0 warn: 13000.0 value: 0.0 group disk: limit: 13200.0 warn: 13100.0 value: 0.0 group lp: limit: 13300.0 warn: 13200.0 value: 0.0 group mem: limit: 13400.0 warn: 13300.0 value: 0.0 group kmem: limit: 13500.0 warn: 13400.0 value: 0.0 . . . [root at link-11 /]# gfs_quota warn -u bin -l 6666 -f /data1 [root at link-11 /]# gfs_quota list -f /data1 user root: limit: 880.0 warn: 870.0 value: 98.1 user bin: limit: 890.0 warn: 6666.0 value: 0.0 . . . On Wednesday 21 July 2004 17:30, Adam Cassar wrote: > Hi Guys, > > I've got GFS running in a two node set up on kernel 2.6.7 on debian > stable. > > I can mount both partitions and normal file access seems fine. However > any quota related commands just hang: ie > > ./gfs_quota init -f /mnt > > just sits there and is unkillable. Below are some interesting lines from > ps: > > 5 0 621 1 16 0 4312 1080 - Ss ? 0:00 > ./ccsd > 5 0 622 621 16 0 4312 1080 - S ? 0:00 > ./ccsd > 5 0 623 622 16 0 4312 1080 414395 S ? 0:03 > ./ccsd > 1 0 625 1 9 -6 0 0 cluste S< ? 0:00 > [cman_comms] > 5 0 626 1 10 -5 0 0 member S< ? 0:00 > [cman_memb] > 1 0 627 1 15 0 0 0 servic S ? 0:00 > [cman_serviced] > 1 0 628 1 9 -6 0 0 hello_ S< ? 0:00 > [cman_hbeat] > 5 0 631 1 18 0 1344 484 pause Ss ? 0:00 > fenced > 1 0 632 1 19 0 0 0 kcl_jo D ? 0:00 > [cman_userjoin] > 1 0 641 1 15 0 0 0 dlm_re S ? 0:00 > [dlm_recoverd] > 1 0 642 1 15 0 0 0 dlm_as S ? 0:30 > [dlm_astd] > 1 0 643 1 15 0 0 0 dlm_re S ? 0:13 > [dlm_recvd] > 1 0 644 1 15 0 0 0 dlm_se S ? 0:10 > [dlm_sendd] > 1 0 645 1 15 0 0 0 dlm_as S ? 0:18 > [lock_dlm] > 1 0 646 1 15 0 0 0 dlm_as S ? 0:20 > [lock_dlm] > 1 0 647 1 15 0 0 0 - S ? 0:06 > [gfs_scand] > 1 0 648 1 15 0 0 0 gfs_gl S ? 0:05 > [gfs_glockd] > 1 0 649 1 15 0 0 0 - S ? 0:00 > [gfs_recoverd] > 1 0 650 1 15 0 0 0 - S ? 0:00 > [gfs_logd] > 1 0 651 1 15 0 0 0 - S ? 0:00 > [gfs_quotad] > 1 0 652 1 15 0 0 0 - S ? 0:00 > [gfs_inoded] > 5 0 1080 286 17 0 5704 1692 - S ? 0:00 > /usr/sbin/sshd > 5 1002 1082 1080 16 0 5716 1784 - S ? 0:00 > /usr/sbin/sshd > 0 1002 1083 1082 17 0 2208 1212 wait4 Ss pts/1 0:00 -bash > 4 0 1087 1083 15 0 2288 1308 wait4 S pts/1 0:00 -su > 4 0 1070 609 18 0 1284 400 - R+ pts/0 0:28 > ./gfs_quota init -f /mnt > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From amanthei at redhat.com Thu Jul 22 15:36:41 2004 From: amanthei at redhat.com (Adam Manthei) Date: Thu, 22 Jul 2004 10:36:41 -0500 Subject: [Linux-cluster] Quotas In-Reply-To: <200407221038.23945.danderso@redhat.com> References: <1090449037.22972.157.camel@akira.nro.au.com> <200407221038.23945.danderso@redhat.com> Message-ID: <20040722153641.GI17867@redhat.com> On Thu, Jul 22, 2004 at 10:38:23AM -0500, Derek Anderson wrote: > I am not seeing this on fedora core 2. > On Wednesday 21 July 2004 17:30, Adam Cassar wrote: > > Hi Guys, > > > > I've got GFS running in a two node set up on kernel 2.6.7 on debian > > stable. > > > > I can mount both partitions and normal file access seems fine. However > > any quota related commands just hang: ie > > > > ./gfs_quota init -f /mnt > > > > just sits there and is unkillable. Below are some interesting lines from > > ps: > > > > 1 0 632 1 19 0 0 0 kcl_jo D ? 
0:00 > > [cman_userjoin] Could this be the problem? ^^^^^^^^^ -- Adam Manthei From lserinol at gmail.com Thu Jul 22 17:58:24 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 20:58:24 +0300 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <20040722145837.GA29470@potassium.msp.redhat.com> References: <2c1942a704072202534d487950@mail.gmail.com> <20040722145345.GA22628@redhat.com> <20040722145837.GA29470@potassium.msp.redhat.com> Message-ID: <2c1942a704072210583051ca1e@mail.gmail.com> here is the result with ignore_local_fs option: Time: 81 seconds total 6 seconds of transactions (166 per second) Files: 10692 created (132 per second) Creation alone: 10000 files (434 per second) Mixed with transactions: 692 files (115 per second) 899 read (149 per second) 101 appended (16 per second) 10692 deleted (132 per second) Deletion alone: 10384 files (199 per second) Mixed with transactions: 308 files (51 per second) Data: 21.05 megabytes read (266.07 kilobytes per second) 250.41 megabytes written (3.09 megabytes per second) On Thu, 22 Jul 2004 09:58:37 -0500, Ken Preslan wrote: > On Thu, Jul 22, 2004 at 09:53:45AM -0500, Michael Conrad Tadpol Tilstra wrote: > > On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > > > Hi, > > > I have done some benchmark tests with postmark(tests repeated many > > > times). There is one client (also it is lock server). and another one > > > which exports it's scsi hard disk with gnbd. > > [snipped a lot of nice data] > > > as you can see nolock results is 2 times (some parts 3 times) faster > > > then with locked one . > > > what could be the problem ? is there any workaround or settune option > > > (releasing locks earlier,etc...) ? > > > > the biggest thing you are probably running into is that when running > > with lock_nolock, gfs knows that it is not in a cluster, therefor it can > > enable some optimisations that only work for lcoal filesystems. These > > optimisations would corrupt disk data if you had multiple nodes mounted. > > You can turn off those optimizations with lock_nolock by mounting with > "-o ignore_local_fs". That will let us figure out what is optimizations > and what is lock latency. > > > There is also no network traffic for handling lock in lock_nolock, but > > that is minor compaired to the local file system optimisations. > > > > Basically, gfs with lock_nolock should always be quite faster than with > > any cluster locking (lock_gulm or lock_dlm). > > > > Ken could say more on this. > > > > -- > > Michael Conrad Tadpol Tilstra > > Reality is for people who lack imagination. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Ken Preslan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > -- -- Stay out of the road, if you want to grow old. ~ Pink Floyd ~. 
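For reference, the lock_nolock vs. ignore_local_fs vs. lock_gulm comparison
discussed in this thread can be run as three separate mounts of the same
filesystem, with the same postmark workload against each (unmounting between
runs).  This is only a rough sketch -- the device path, the mount point and
the postmark parameters below are assumptions, not taken from the thread,
and the lock_nolock mounts must only ever be done from a single node:

# 1) no cluster locking, local-fs optimisations left on (the fast baseline)
mount -t gfs /dev/your_gfs_device /gfs -o lockproto=lock_nolock

# 2) no cluster locking, but local-fs optimisations disabled, so any
#    remaining difference against (3) is mostly lock latency
mount -t gfs /dev/your_gfs_device /gfs -o lockproto=lock_nolock,ignore_local_fs

# 3) the normal clustered mount, using the lock module the filesystem
#    was built with (lock_gulm in this thread)
mount -t gfs /dev/your_gfs_device /gfs

# same postmark run against each mount, e.g.
postmark <<EOF
set location /gfs/pmtest
set number 10000
set transactions 1000
run
quit
EOF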
From merlin at studiobz.it Thu Jul 22 19:20:19 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Thu, 22 Jul 2004 21:20:19 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <1090491646.1306.19.camel@venus> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> <1090491646.1306.19.camel@venus> Message-ID: <41001373.9070001@studiobz.it> Gabriel Wicke wrote: [cut] > > On my system (debian unstable) it expects the plugin folder > in /lib/magma/plugins, you could add a symlink and see if it works. Else > you can run a test program from the magma source dir, magma/tests/cpt > null. Stracing this will show you the place it's looking for (will show > an ENOENT near the end of the strace). > The reason for this problem seems to be the usage of $libdir in the > magma-plugins makefiles or somesuch. You are right, the problem is the path ...now it works. Thank you very much. Christian From stephen.willey at framestore-cfc.com Fri Jul 23 10:53:46 2004 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Fri, 23 Jul 2004 11:53:46 +0100 Subject: [Linux-cluster] GFS is *very* slow when NFS exported Message-ID: <4100EE3A.6050403@framestore-cfc.com> We are looking at using GFS for load balanced NFS. Going with GNBD exported GFS isn't really an option since we're talking about over 1000 machines needing to access the storage. For this reason we were looking at providing a relatively small cluster of GFS machines serving the storage via NFS. The clients would then use round-robin DNS to load balance across these servers. The results we've got back from our tests are shown below: -= Setup ========================================- Two machines with GFS filesystem mounted from a dual-port RAID, connected to ethernet via GigE and serving NFS as follows: /mnt/gfs *(no_root_squash,rw,insecure,async) -= End of Setup =================================- -= Local GFS Filesystem Access ==================- The tests show Mb/s computed by using 10Gb dd 1 machine, write (gfstest1): 166 1 machine, read (gfstest2 reading file gfstest1 created): 139.5 2 machines, sim writes (different files): 113.7 (gfstest1) 108 (gfstest2) - 221.7aggr 2 machines, sim reads (different files): 101.5 (gfstest1) 101.6 (gfstest2) - 203aggr 2 machines, sim reads (same file): 130 (gfstest1) 134 (gfstest2) - 264aggr -= End of Local GFS Filesystem Access ===========- -= NFS/GFS Access ===============================- 1 write: (client1 to gfstest1) 42.3 1 read: (client1 from gfstest1) Varies enormously between 10-35Mb/s) Simultaneous writes and reads done as follows: client1 NFS mounting gfstest1 clients2 NFS mounting gfstest2 2 sim writes (different files): 39.6 (client1) 42.4 (client2) - 82aggr 2 sim reads (different files): 11.3 (client1) 11.2 (client2) - 22.5aggr -= End of NFS/GFS Access ========================- -= NFS/XFS Access ===============================- Done for comparison of XFS & GFS export speeds 1 write: 65.2 1 read: 73.5 -= End of NFS/XFS Access ========================- We know NFS isn't the highest performing thing in the world, but it's a concern that the NFS performance of a GFS mounted filesystem is so much lower than that of an XFS system. There are of course, the clustering overheads, but this would affect local performance as well. Anyone got any ideas as to why this might be and how to get more performance? 
Thanks, Stephen From linux-cluster-rhn at chaj.com Fri Jul 23 19:54:51 2004 From: linux-cluster-rhn at chaj.com (linux-cluster-rhn at chaj.com) Date: Fri, 23 Jul 2004 15:54:51 -0400 (EDT) Subject: [Linux-cluster] compilation woes Message-ID: Here is the output of my compile attempt of the cluster dir (dl'd) via CVS. I compiled the LVM and GFS stuff into my patched 2.6.7 kernel tree. Any ideas? Thanks. [root at live1 cluster]# make cd cman-kernel && make all make[1]: Entering directory `/content/src/gfs/cluster/cman-kernel' cd src && make all make[2]: Entering directory `/content/src/gfs/cluster/cman-kernel/src' rm -f cluster ln -s . cluster make -C /usr/src/linux-2.6.7 M=/content/src/gfs/cluster/cman-kernel/src modules USING_KBUILD=yes make[3]: Entering directory `/content/src/linux-2.6.7' Building modules, stage 2. MODPOST *** Warning: "sigprocmask" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "release_sock" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_destroy" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__kmalloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_init_data" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__kfree_skb" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vmalloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "del_timer" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_open" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "malloc_sizes" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "remove_wait_queue" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_release" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "simple_strtoul" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_recvmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_printf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "remove_proc_entry" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_recv_datagram" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_create_kern" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_rfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_read" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "jiffies" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__write_lock_failed" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_sendpage" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_mmap" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "default_wake_function" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "wait_for_completion" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_socketpair" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "proc_mkdir" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! 
*** Warning: "sk_alloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "printk" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "alloc_skb" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_sendmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "panic" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "copy_to_user" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_listen" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_accept" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "strstr" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sk_free" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "mod_timer" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "fput" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "lock_sock" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_over_panic" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_queue_tail" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_alloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy_toiovec" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "system_utsname" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "datagram_poll" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_register" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "schedule" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "schedule_timeout" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "local_bh_enable" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "create_proc_entry" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "put_cmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "wake_up_process" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_create" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vsnprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__wake_up" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "net_ratelimit" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_connect" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "do_gettimeofday" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "add_wait_queue" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_lseek" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sk_run_filter" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kill_proc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "___pskb_trim" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! 
*** Warning: "sock_unregister" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy_fromiovec" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "set_user_nice" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "fget" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kernel_thread" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__up_wakeup" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "complete" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "snprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_release" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__down_failed" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "copy_from_user" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "daemonize" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_free_datagram" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! CC /content/src/gfs/cluster/cman-kernel/src/cman.mod.o /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:9: error: variable `__this_module' has initializer but incomplete type /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: error: unknown field `name' specified in initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: warning: excess elements in struct initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: warning: (near initialization for `__this_module') /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: error: unknown field `init' specified in initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: warning: excess elements in struct initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: warning: (near initialization for `__this_module') /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:9: error: storage size of `__this_module' isn't known make[4]: *** [/content/src/gfs/cluster/cman-kernel/src/cman.mod.o] Error 1 make[3]: *** [modules] Error 2 make[3]: Leaving directory `/content/src/linux-2.6.7' make[2]: *** [all] Error 2 make[2]: Leaving directory `/content/src/gfs/cluster/cman-kernel/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/content/src/gfs/cluster/cman-kernel' make: *** [all] Error 2 From jbrassow at redhat.com Fri Jul 23 20:56:24 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Fri, 23 Jul 2004 15:56:24 -0500 Subject: [Linux-cluster] compilation woes In-Reply-To: References: Message-ID: looks like it can't find the kernel... ? ... Is your kernel src in /usr/src/linux-2.6 ? If not, try: > ./configure --kernel_src= > make install brassow On Jul 23, 2004, at 2:54 PM, linux-cluster-rhn at chaj.com wrote: > > Here is the output of my compile attempt of the cluster dir (dl'd) via > CVS. I > compiled the LVM and GFS stuff into my patched 2.6.7 kernel tree. Any > ideas? > Thanks. > > [root at live1 cluster]# make > cd cman-kernel && make all > make[1]: Entering directory `/content/src/gfs/cluster/cman-kernel' > cd src && make all > make[2]: Entering directory `/content/src/gfs/cluster/cman-kernel/src' > rm -f cluster > ln -s . cluster > make -C /usr/src/linux-2.6.7 M=/content/src/gfs/cluster/cman-kernel/src > modules USING_KBUILD=yes > make[3]: Entering directory `/content/src/linux-2.6.7' > Building modules, stage 2. 
> [snipped -- the rest of the quoted compile output is identical to the
> log in the original message above]
> make: *** [all] Error 2
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
> 

From lhh at redhat.com  Mon Jul 26 15:29:24 2004
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 26 Jul 2004 11:29:24 -0400
Subject: [Linux-cluster] Removal of message header cruft from magma/message.c
Message-ID: <1090855764.4427.28.camel@dhcp83-21.boston.redhat.com>

Jon Brassow noted that this wasn't necessary.  He's right; it isn't
anymore.
;) -- Lon Index: lib/Makefile =================================================================== RCS file: /cvs/cluster/cluster/magma/lib/Makefile,v retrieving revision 1.2 diff -u -r1.2 Makefile --- lib/Makefile 1 Jul 2004 13:35:46 -0000 1.2 +++ lib/Makefile 26 Jul 2004 15:22:31 -0000 @@ -72,7 +72,7 @@ memberlist.o clist.o ${AR} cr $@ $^ -libmagmamsg.a: message.o crc32.o fdops.o +libmagmamsg.a: message.o fdops.o ${AR} cr $@ $^ %.o: %.c Index: lib/message.c =================================================================== RCS file: /cvs/cluster/cluster/magma/lib/message.c,v retrieving revision 1.3 diff -u -r1.3 message.c --- lib/message.c 1 Jul 2004 13:35:46 -0000 1.3 +++ lib/message.c 26 Jul 2004 15:22:31 -0000 @@ -49,8 +49,6 @@ #define IPV6_PORT_OFFSET 1 -int clu_crc32(void *, int); - /* From fdops.c */ @@ -80,62 +78,6 @@ static pthread_mutex_t fill_mutex = PTHREAD_MUTEX_INITIALIZER; -struct __attribute__ ((packed)) msg_struct { - uint32_t ms_count; /* number of bytes in payload */ - uint32_t ms_crc32; /* CRC32 of data */ -}; - - -/** - Create a message buffer with a header including length and data CRC. - - @param payload data to send - @param len length of message to add header to - @param msg allocated within: message + header - @return Total size of allocated buffer. - */ -static unsigned long -msg_create(void *payload, ssize_t len, void **msg) -{ - unsigned long ret; - struct msg_struct msg_hdr; - - memset(&msg_hdr, 0, sizeof (msg_hdr)); - msg_hdr.ms_count = len; - msg_hdr.ms_crc32 = clu_crc32(payload, len); -#if __BYTE_ORDER == __BIG_ENDIAN - msg_hdr.ms_count = bswap_32(msg_hdr.ms_count); - msg_hdr.ms_crc32 = bswap_32(msg_hdr.ms_crc32); -#endif - - if (!len || !payload) - return sizeof (msg_hdr); - - *msg = (void *) malloc(sizeof (msg_hdr) + len); - if (*msg == NULL) { - errno = ENOMEM; - return -1; - } - memcpy(*msg, &msg_hdr, sizeof (msg_hdr)); - memcpy(*msg + sizeof (msg_hdr), payload, len); - - ret = sizeof (msg_hdr) + len; - return ret; -} - - -/** - Free a message buffer. - - @param msg Buffer to free. - */ -static inline void -msg_destroy(void *msg) -{ - if (msg != NULL) - free(msg); -} - /** Update our internal membership list with the provided list. Does NOT copy over resolved addresses; the caller may want to @@ -177,11 +119,6 @@ _msg_receive(int fd, void *buf, ssize_t count, struct timeval *tv) { - uint32_t crc; - int err; - struct msg_struct msg_hdr; - ssize_t retval = 0; - if (fd < 0) { errno = EBADF; return -1; @@ -197,36 +134,7 @@ return -1; } - if ((retval = _read_retry(fd, &msg_hdr, sizeof (msg_hdr), tv)) < - (ssize_t) sizeof (msg_hdr)) { - return -1; - } - -#if __BYTE_ORDER == __BIG_ENDIAN - msg_hdr.ms_count = bswap_32(msg_hdr.ms_count); - msg_hdr.ms_crc32 = bswap_32(msg_hdr.ms_crc32); -#endif - - if (!msg_hdr.ms_count) - return 0; - - err = errno; - retval = _read_retry(fd, buf, count, tv); - - if ((count == msg_hdr.ms_count) && (retval == count)) { - crc = clu_crc32(buf, retval); - - if (crc != msg_hdr.ms_crc32) { - /* Mangled message */ - fprintf(stderr, "CRC32 mismatch: 0x%08x vs. 0x%08x\n", - crc, msg_hdr.ms_crc32); - err = EIO; - retval = -1; - } - } - - errno = err; - return retval; + return _read_retry(fd, buf, count, tv); } @@ -234,7 +142,7 @@ Receive a message from a file descriptor w/o a timeout value. @param fd File descriptor to receive from - @param buf Pre-allocated bufffer \ + @param buf Pre-allocated bufffer @param count Size of expected message; must be <= size of preallocated buffer. 
@return -1 on failure or size of read data @@ -282,9 +190,6 @@ ssize_t msg_send(int fd, void *buf, ssize_t count) { - void *msg; - int msg_len = -1, bytes_written = 0; - if (fd == -1) { errno = EBADF; return -1; @@ -300,13 +205,7 @@ return -1; } - msg_len = msg_create(buf, count, &msg); - if ((bytes_written = write(fd, msg, msg_len)) < msg_len) { - msg_destroy(msg); - return -1; - } - msg_destroy(msg); - return (bytes_written - sizeof (struct msg_struct)); + return write(fd, buf, count); } @@ -914,50 +813,11 @@ ssize_t -_msg_peek(int sockfd, void *buf, ssize_t count) -{ - char *bigbuf; - ssize_t ret; - int bigbuf_sz; - int hdrsz = sizeof (struct msg_struct); - - bigbuf_sz = count + hdrsz; - bigbuf = (char *) malloc(bigbuf_sz); - if (bigbuf == NULL) - return -1; - - /* - * We need to account for the msg header. So we skip past it - * and decrement the return value by the number of bytes eaten - * up by the header. - */ - ret = recv(sockfd, bigbuf, bigbuf_sz, MSG_PEEK); - if (ret < 0) { - ret = errno; - free(bigbuf); - errno = ret; - return -1; - } - if (ret - hdrsz > 0) { - ret -= hdrsz; - if (ret > count) - ret = count; - memcpy(buf, bigbuf + hdrsz, ret); - } else { - ret = 0; - } - free(bigbuf); - - return ret; -} - - -ssize_t msg_peek(int sockfd, void *buf, ssize_t count) { if (sockfd < 0 || count > MSG_MAX_SIZE) { return -1; } - return (_msg_peek(sockfd, buf, count)); + return recv(sockfd, buf, count, MSG_PEEK); } From laza at yu.net Mon Jul 26 17:08:35 2004 From: laza at yu.net (Lazar Obradovic) Date: Mon, 26 Jul 2004 19:08:35 +0200 Subject: [Linux-cluster] SNMP modules? Message-ID: <1090861715.13809.3.camel@laza.eunet.yu> Hello all, I'd like to develop my own fencing agents (for IBM BladeCenter and QLogic SANBox2 switches), but they will require SNMP bindings. Is that ok with general development philosophy, since I'd like to contribude them? net-snmp-5.x.x-based API? -- Lazar Obradovic, System Engineer ----- laza at YU.net YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3119901; Fax: +381 11 3119901 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3119901. ----- From canseco at fidmail.com Mon Jul 26 17:41:24 2004 From: canseco at fidmail.com (Robert) Date: Mon, 26 Jul 2004 12:41:24 -0500 Subject: [Linux-cluster] GFS: FS Mounting Issues Message-ID: <000001c47337$c5c5da10$0b50e5d8@roadkill> All, I have a question regarding the latest implementation of GFS 6.0 with RedHat Linux 3.0 Enterprise. What my company has going on is this: We have a SAN project coming up but we do not have the SAN or a similar type shared storage device available. We have the node machines on hand and are trying to work through the GFS implementation as we are new to GFS (We have run RedHat Linux since version 5.0 and all the flavors in between.). We have tried to simulate the SAN environment by utilizing the GNBD software and we have followed the instructions available at: http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd.html This is the LOCK_GULM, SLM External, and GNBD example of GFS. We have not had any problems getting the shared devices, pools, and filesystems created as followed in the documentation. 
What we have happening is that when we mount the GFS filesystem on one node
and then try to mount the filesystem on the second node, the second node
hangs when the mount command is issued.  No errors are reported on the
console or in the logs.  No errors are reported on the lock server either.
Everything appears to be working correctly, as the log information for both
machines at that instant is the same, i.e. the one that is hung has the same
log messages as the one that is not hung.

When I go to node one, with node two still trying to mount the filesystem,
and unmount the filesystem, node two immediately finishes the mount command
and everything is fine with node two.  However, when trying to mount the
filesystem on node one again, it just hangs, and so on.  It is only allowing
one node to mount the filesystem at a time.

The configuration files are all the generic examples given in the
documentation, with GNBD as the fencing mechanism (we tried manual fencing
and the same situation exists with that method).  I can provide all
configuration files and log file information if this isn't a problem that
experienced GFS users will recognize right away.

Thank you all for your time.

Robert
Fidelity Communications

ps. I'm not sure if my first message got through as I sent it via an
alternate email address, so if this is a duplicate, please ignore.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mtilstra at redhat.com  Mon Jul 26 18:05:44 2004
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Mon, 26 Jul 2004 13:05:44 -0500
Subject: [Linux-cluster] GFS: FS Mounting Issues
In-Reply-To: <000001c47337$c5c5da10$0b50e5d8@roadkill>
References: <000001c47337$c5c5da10$0b50e5d8@roadkill>
Message-ID: <20040726180544.GA11937@redhat.com>

On Mon, Jul 26, 2004 at 12:41:24PM -0500, Robert wrote:
> We have not had any problems getting the shared devices, pools, and
> filesystems created as followed in the documentation. What we have
> happening is that when we mount the GFS filesystem on one node and
> then we try and mount the filesystem on the second node, the second
> node will hang when issued the command to mount the filesystem. No
> errors are reported on console or in logs. No errors are reported in
> the Lock Server either. Everything appears to be working correctly as
> the log information for both machines at that instant are the same,
> ie, the one that is hung, has the same log messages as the one that is
> not hung.

How long are the names of your nodes?  There is a name length issue in
the 6.0 code where the first 8 bytes of each node name in your cluster
need to be unique.  See bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828

-- 
Michael Conrad Tadpol Tilstra
I always wanted to be a procrastinator, never got around to it.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: 

From hanafim at asc.hpc.mil  Mon Jul 26 18:20:28 2004
From: hanafim at asc.hpc.mil (MAHMOUD HANAFI)
Date: Mon, 26 Jul 2004 14:20:28 -0400
Subject: [Linux-cluster] GFS rpms
Message-ID: <41054B6C.2090509@asc.hpc.mil>

We are currently running GFS 5.0 with full support.  Where do we get
updates?  I haven't been able to get any help from Red Hat.
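A quick way to check for the node-name collision described in the name-length
issue above (a rough sketch only; nodes.txt is a made-up file containing one
cluster node name per line, it is not part of any GFS configuration format):

# print any 8-character prefixes that occur more than once;
# any output here means two node names collide in their first 8 bytes
cut -c1-8 nodes.txt | sort | uniq -d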
From canseco at fidmail.com Mon Jul 26 18:16:48 2004 From: canseco at fidmail.com (Robert) Date: Mon, 26 Jul 2004 13:16:48 -0500 Subject: [Linux-cluster] GFS: FS Mounting Issues In-Reply-To: <20040726180544.GA11937@redhat.com> Message-ID: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Our node names are in the form: pe2650-ox.fidnet.com -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Conrad Tadpol Tilstra Sent: Monday, July 26, 2004 1:06 PM To: Discussion of clustering software components including GFS Subject: Re: [Linux-cluster] GFS: FS Mounting Issues On Mon, Jul 26, 2004 at 12:41:24PM -0500, Robert wrote: > We have not had any problems getting the shared devices, pools, and > filesystems created as followed in the documentation. What we have > happening is that when we mount the GFS filesystem on one node and > then we try and mount the filesystem on the second node, the second > node will hang when issued the command to mount the filesystem. No > errors are reported on console or in logs. No errors are reported in > the Lock Server either. Everything appears to be working correctly as > the log information for both machines at that instant are the same, > ie, the one that is hung, has the same log messages as the one that is > not hung. How long are the names of your nodes? There is a name length issue in the 6.0 code where the first 8 bytes of each node in you cluster needs to be unique. See bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828 -- Michael Conrad Tadpol Tilstra I always wanted to be a procrastinator, never got around to it. From Rory_Savage.consultant at peoplesoft.com Mon Jul 26 18:52:10 2004 From: Rory_Savage.consultant at peoplesoft.com (Rory_Savage.consultant at peoplesoft.com) Date: Mon, 26 Jul 2004 14:52:10 -0400 Subject: [Linux-cluster] Trying to get GNBD with GFS Working Message-ID: Please Help! I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 filesystem from hal-n2 [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 gnbd_export: created GNBD export1 serving file /dev/hda4 log file: Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving /dev/hda4 exported with 130897620 sectors While trying to import the device on hal-n1, I am reciving the following error: [root at hal-n1 src]# gnbd_import -v -i hal-n2 gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory * My first reaction is, when did the "/sys" ever need to be in existance? I examined all of the build options for GNBD and could not find a prefrecnce location setting for anything related to "/sys". And I know this directory is not native to Red Hat (that I know of). 
System Configuration and Parameters Kernel 2.6.7 from source Kernel Config Options: CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m CONFIG_MD_RAID5=m CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_DM=m CONFIG_DM_CRYPT=m CONFIG_BLK_DEV_GNBD=m CONFIG_CLUSTER=m CONFIG_CLUSTER_DLM=m CONFIG_CLUSTER_DLM_PROCLOCKS=y CONFIG_LOCK_HARNESS=m CONFIG_GFS_FS=m CONFIG_LOCK_NOLOCK=m CONFIG_LOCK_DLM=m CONFIG_LOCK_GULM=m # GFS and GNBD sources obtain via CVS [root at hal-n1 src]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n1 src]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 join S-6,20,1 [1] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n1 src]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.1 [root at hal-n2 cluster]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n2 cluster]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 0 2 join S-1,80,2 [] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n2 cluster]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.2 -- Rory Savage, Charlotte DSI Group Product & Technology PeopleSoft Inc. 14045 Ballantyne Corporate Place Suite 101 Charlotte, NC 28277 Email: rory_savage at peoplesoft.com Phone: 704.401.1104 Fax: 704.401.1240 From Rory_Savage.consultant at peoplesoft.com Mon Jul 26 19:13:35 2004 From: Rory_Savage.consultant at peoplesoft.com (Rory_Savage.consultant at peoplesoft.com) Date: Mon, 26 Jul 2004 15:13:35 -0400 Subject: [Linux-cluster] Trying to get GNBD with GFS Working Message-ID: Please Help! I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 filesystem from hal-n2 [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 gnbd_export: created GNBD export1 serving file /dev/hda4 log file: Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving /dev/hda4 exported with 130897620 sectors While trying to import the device on hal-n1, I am reciving the following error: [root at hal-n1 src]# gnbd_import -v -i hal-n2 gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory * My first reaction is, when did the "/sys" ever need to be in existance? I examined all of the build options for GNBD and could not find a prefrecnce location setting for anything related to "/sys". And I know this directory is not native to Red Hat (that I know of). 
System Configuration and Parameters Kernel 2.6.7 from source Kernel Config Options: CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m CONFIG_MD_RAID5=m CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_DM=m CONFIG_DM_CRYPT=m CONFIG_BLK_DEV_GNBD=m CONFIG_CLUSTER=m CONFIG_CLUSTER_DLM=m CONFIG_CLUSTER_DLM_PROCLOCKS=y CONFIG_LOCK_HARNESS=m CONFIG_GFS_FS=m CONFIG_LOCK_NOLOCK=m CONFIG_LOCK_DLM=m CONFIG_LOCK_GULM=m # GFS and GNBD sources obtain via CVS [root at hal-n1 src]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n1 src]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 join S-6,20,1 [1] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n1 src]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.1 [root at hal-n2 cluster]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n2 cluster]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 0 2 join S-1,80,2 [] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n2 cluster]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.2 -- Rory Savage, Charlotte DSI Group Product & Technology PeopleSoft Inc. 14045 Ballantyne Corporate Place Suite 101 Charlotte, NC 28277 Email: rory_savage at peoplesoft.com Phone: 704.401.1104 Fax: 704.401.1240 From gshi at ncsa.uiuc.edu Mon Jul 26 19:32:19 2004 From: gshi at ncsa.uiuc.edu (Guochun Shi) Date: Mon, 26 Jul 2004 14:32:19 -0500 Subject: [Linux-cluster] GFS/GNBD configuration Message-ID: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> hi, is the configuration in the attached file for gnbd and gfs feasible? Thanks -Guochun -------------- next part -------------- A non-text attachment was scrubbed... Name: config.pdf Type: application/pdf Size: 4879 bytes Desc: not available URL: From bmarzins at redhat.com Mon Jul 26 20:27:56 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Mon, 26 Jul 2004 15:27:56 -0500 Subject: [Linux-cluster] Trying to get GNBD with GFS Working In-Reply-To: References: Message-ID: <20040726202756.GK23619@phlogiston.msp.redhat.com> On Mon, Jul 26, 2004 at 03:13:35PM -0400, Rory_Savage.consultant at peoplesoft.com wrote: > > > > > Please Help! > > I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 > filesystem from hal-n2 > > [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 > gnbd_export: created GNBD export1 serving file /dev/hda4 > > log file: > > Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving > /dev/hda4 exported with 130897620 sectors > > While trying to import the device on hal-n1, I am reciving the following > error: > > [root at hal-n1 src]# gnbd_import -v -i hal-n2 > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory GNBD requires sysfs to run. Somewhere in you kernel config file, you should have: CONFIG_SYSFS=y Then run the command: # mount -t sysfs sysfs /sys to mount sysfs. 
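To avoid repeating that by hand after every reboot, something like the
following should work (a sketch, not from the original message; the paths
are the usual ones, but check your distribution):

# create the mount point if it does not already exist, then mount sysfs
mkdir -p /sys
mount -t sysfs sysfs /sys

# and an /etc/fstab entry so it is mounted automatically at boot:
sysfs   /sys    sysfs   defaults        0 0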
For more information on sysfs, see Documentation/filesystems/sysfs.txt Hope this helps -Ben bmarzins at redhat.com > * My first reaction is, when did the "/sys" ever need to be in existance? > I examined all of the build options for GNBD and could not find a > prefrecnce location setting for anything related to "/sys". And I know > this directory is not native to Red Hat (that I know of). > > System Configuration and Parameters > > Kernel 2.6.7 from source > > Kernel Config Options: > > CONFIG_MD=y > CONFIG_BLK_DEV_MD=m > CONFIG_MD_LINEAR=m > CONFIG_MD_RAID0=m > CONFIG_MD_RAID1=m > CONFIG_MD_RAID5=m > CONFIG_MD_RAID6=m > CONFIG_MD_MULTIPATH=m > CONFIG_BLK_DEV_DM=m > CONFIG_DM_CRYPT=m > CONFIG_BLK_DEV_GNBD=m > > CONFIG_CLUSTER=m > CONFIG_CLUSTER_DLM=m > CONFIG_CLUSTER_DLM_PROCLOCKS=y > > CONFIG_LOCK_HARNESS=m > CONFIG_GFS_FS=m > CONFIG_LOCK_NOLOCK=m > CONFIG_LOCK_DLM=m > CONFIG_LOCK_GULM=m > > # GFS and GNBD sources obtain via CVS > > [root at hal-n1 src]# cat /proc/cluster/nodes > Node Votes Exp Sts Name > 1 1 1 M hal-n1 > 2 1 1 M hal-n2 > > [root at hal-n1 src]# cat /proc/cluster/services > > Service Name GID LID State Code > Fence Domain: "default" 1 2 join > S-6,20,1 > [1] > > DLM Lock Space: "clvmd" 2 3 run - > [1 2] > > [root at hal-n1 src]# cat /proc/cluster/status > Version: 2.0.1 > Config version: 1 > Cluster name: xcluster > Cluster ID: 28724 > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 1 > Total_votes: 2 > Quorum: 1 > Active subsystems: 3 > Node addresses: 10.1.1.1 > > [root at hal-n2 cluster]# cat /proc/cluster/nodes > Node Votes Exp Sts Name > 1 1 1 M hal-n1 > 2 1 1 M hal-n2 > > [root at hal-n2 cluster]# cat /proc/cluster/services > > Service Name GID LID State Code > Fence Domain: "default" 0 2 join > S-1,80,2 > [] > > DLM Lock Space: "clvmd" 2 3 run - > [1 2] > > [root at hal-n2 cluster]# cat /proc/cluster/status > Version: 2.0.1 > Config version: 1 > Cluster name: xcluster > Cluster ID: 28724 > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 1 > Total_votes: 2 > Quorum: 1 > Active subsystems: 3 > Node addresses: 10.1.1.2 > > > > > -- > Rory Savage, Charlotte DSI Group > Product & Technology > PeopleSoft Inc. > 14045 Ballantyne Corporate Place > Suite 101 > Charlotte, NC 28277 > Email: rory_savage at peoplesoft.com > Phone: 704.401.1104 > Fax: 704.401.1240 > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From mailing-lists at hughesjr.com Mon Jul 26 21:08:09 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 26 Jul 2004 16:08:09 -0500 Subject: [Linux-cluster] GFS rpms In-Reply-To: <41054B6C.2090509@asc.hpc.mil> References: <41054B6C.2090509@asc.hpc.mil> Message-ID: <1090876088.18047.5.camel@Myth.home.local> On Mon, 2004-07-26 at 13:20, MAHMOUD HANAFI wrote: > We are currently running GFS5.0 with full support. Where do we get > updates because i haven't been able to get any help from redhat. > You can download rpms from my website that run on RHEL 3 and WhiteBox EL 3 for AMD and i686 (smp, hugemem, regular) for the 2.4.21-15.0.3.EL kernel. RHEL / WBEL GFS That is not the official download. Johnny Hughes HughesJR.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From linux-cluster-rhn at chaj.com Mon Jul 26 21:17:53 2004 From: linux-cluster-rhn at chaj.com (linux-cluster-rhn at chaj.com) Date: Mon, 26 Jul 2004 17:17:53 -0400 (EDT) Subject: [Linux-cluster] requirements question In-Reply-To: <001001c4733c$b763f5b0$0b50e5d8@roadkill> References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: We've got a fiber channel connection from two nodes to a single san storage share (through a brocade). We're looking to export nfs from one of the hosts with the other host as failover. As a test, I patched a 2.6.7 kernel and compiled the necessary utilities according to http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering config that I made, and successfully mounted the lv on both hosts as a local drive. Is it necessary for us to use LVM if the san is already doing the raid/redundancy? What is the bare minimum in terms of daemons that we'd need in order to run the above setup? I'm thinking it'd be something like: ccsd cman_tool join clvmd mount -t gfs /dev/sda1 /mnt/san-name Also, what suggestions do you have for an automatic failover system for the two hosts? I imagine some sort of heartbeat package. Thanks for your time. Jim From hanafim at asc.hpc.mil Mon Jul 26 21:37:48 2004 From: hanafim at asc.hpc.mil (MAHMOUD HANAFI) Date: Mon, 26 Jul 2004 17:37:48 -0400 Subject: [Linux-cluster] GFS rpms In-Reply-To: <1090876088.18047.5.camel@Myth.home.local> References: <41054B6C.2090509@asc.hpc.mil> <1090876088.18047.5.camel@Myth.home.local> Message-ID: <410579AC.7000106@asc.hpc.mil> Thanks! Johnny Hughes wrote: > On Mon, 2004-07-26 at 13:20, MAHMOUD HANAFI wrote: > >>/We are currently running GFS5.0 with full support. Where do we get >>updates because i haven't been able to get any help from redhat. >>/ >> > > You can download rpms from my website that run on RHEL 3 and WhiteBox EL > 3 for AMD and i686 (smp, hugemem, regular) for the 2.4.21-15.0.3.EL kernel. > > RHEL / WBEL GFS > > > That is not the official download. > > Johnny Hughes > _HughesJR.com_ > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From notiggy at gmail.com Mon Jul 26 23:47:04 2004 From: notiggy at gmail.com (Brian Jackson) Date: Mon, 26 Jul 2004 18:47:04 -0500 Subject: [Linux-cluster] GFS/GNBD configuration In-Reply-To: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> References: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> Message-ID: In fact that's what I think gnbd was created for. Although it's gained more popularity as a way to export local drives from a computer. --Brian Jackson On Mon, 26 Jul 2004 14:32:19 -0500, Guochun Shi wrote: > hi, > > is the configuration in the attached file for gnbd and gfs feasible? > > Thanks > -Guochun > > From notiggy at gmail.com Tue Jul 27 00:23:57 2004 From: notiggy at gmail.com (Brian Jackson) Date: Mon, 26 Jul 2004 19:23:57 -0500 Subject: [Linux-cluster] requirements question In-Reply-To: References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: On Mon, 26 Jul 2004 17:17:53 -0400 (EDT), linux-cluster-rhn at chaj.com wrote: > > We've got a fiber channel connection from two nodes to a single san storage > share (through a brocade). We're looking to export nfs from one of the hosts > with the other host as failover. 
As a test, I patched a 2.6.7 kernel and > compiled the necessary utilities according to > http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on > the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering > config that I made, and successfully mounted the lv on both hosts as a local > drive. Is it necessary for us to use LVM if the san is already doing the > raid/redundancy? Nope, in your situation it would be most useful providing stable device naming. > What is the bare minimum in terms of daemons that we'd need > in order to run the above setup? I'm thinking it'd be something like: > > ccsd > cman_tool join > clvmd > mount -t gfs /dev/sda1 /mnt/san-name looks right > > Also, what suggestions do you have for an automatic failover system for the > two hosts? I imagine some sort of heartbeat package. Thanks for your time. Currently there is heartbeat (linux-ha.org), and a few others. I believe redhat is working on one as well that will fit in with their infrastructure bits --Brian Jackson > > Jim From teigland at redhat.com Tue Jul 27 02:47:18 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 27 Jul 2004 10:47:18 +0800 Subject: [Linux-cluster] SNMP modules? In-Reply-To: <1090861715.13809.3.camel@laza.eunet.yu> References: <1090861715.13809.3.camel@laza.eunet.yu> Message-ID: <20040727024718.GC12983@redhat.com> On Mon, Jul 26, 2004 at 07:08:35PM +0200, Lazar Obradovic wrote: > Hello all, > > I'd like to develop my own fencing agents (for IBM BladeCenter and > QLogic SANBox2 switches), but they will require SNMP bindings. > > Is that ok with general development philosophy, since I'd like to > contribude them? net-snmp-5.x.x-based API? That sounds great, we'd be happy to add them to the collection. -- Dave Teigland From teigland at redhat.com Tue Jul 27 03:01:04 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 27 Jul 2004 11:01:04 +0800 Subject: [Linux-cluster] requirements question In-Reply-To: References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: <20040727030104.GD12983@redhat.com> On Mon, Jul 26, 2004 at 05:17:53PM -0400, linux-cluster-rhn at chaj.com wrote: > > We've got a fiber channel connection from two nodes to a single san storage > share (through a brocade). We're looking to export nfs from one of the hosts > with the other host as failover. As a test, I patched a 2.6.7 kernel and > compiled the necessary utilities according to > http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on > the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering > config that I made, and successfully mounted the lv on both hosts as a local > drive. Is it necessary for us to use LVM if the san is already doing the > raid/redundancy? What is the bare minimum in terms of daemons that we'd need > in order to run the above setup? I'm thinking it'd be something like: > > ccsd > cman_tool join > clvmd > mount -t gfs /dev/sda1 /mnt/san-name Without CLVM the steps reduce to: ccsd cman_tool join fence_tool join mount -t gfs /dev/sda1 /mnt > Also, what suggestions do you have for an automatic failover system for the > two hosts? I imagine some sort of heartbeat package. Thanks for your time. 
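(As an aside on the minimal sequence Dave lists above: wrapped up as a script it might look roughly like the sketch below. This is only an illustration of the ordering; the device path, the mount point and the lack of error handling are assumptions made for the example, not anything specified in this thread.

    #!/bin/sh
    # Rough sketch: bring one node into the cluster and mount the shared GFS
    # volume, in the order described above (no CLVM in this variant).
    set -e
    ccsd                          # cluster configuration daemon
    cman_tool join                # join the cluster
    fence_tool join               # join the fence domain
    mount -t gfs /dev/sda1 /mnt   # device and mount point are examples only

Shutdown would be the reverse: unmount first, then leave the fence domain and the cluster before stopping ccsd.)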
The "Resource Manager" will do NFS failover: https://www.redhat.com/archives/linux-cluster/2004-July/msg00121.html -- Dave Teigland From jeff at intersystems.com Tue Jul 27 12:28:20 2004 From: jeff at intersystems.com (Jeff) Date: Tue, 27 Jul 2004 08:28:20 -0400 Subject: [Linux-cluster] EDEADLOCK status in dlm In-Reply-To: <20040727024718.GC12983@redhat.com> References: <1090861715.13809.3.camel@laza.eunet.yu> <20040727024718.GC12983@redhat.com> Message-ID: <417837901.20040727082820@intersystems.com> The dlm document describes a return status of EDEADLOCK and this is referenced in ast.c and a couple of the tests. Using the latest version of CVS (I'm pretty sure) I can't find the definition for EDEADLOCK in a header file. The only definition is in one of the tests (which doesn't build) and it defines it as SS$_DEADLOCK :-) [Neither of the tests in the dlm\tests\locktest directory compile cleanly] I assume I'm missing something but I'm not sure what it is. From pcaulfie at redhat.com Tue Jul 27 13:09:19 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 27 Jul 2004 14:09:19 +0100 Subject: [Linux-cluster] EDEADLOCK status in dlm In-Reply-To: <417837901.20040727082820@intersystems.com> References: <1090861715.13809.3.camel@laza.eunet.yu> <20040727024718.GC12983@redhat.com> <417837901.20040727082820@intersystems.com> Message-ID: <20040727130919.GH14648@tykepenguin.com> On Tue, Jul 27, 2004 at 08:28:20AM -0400, Jeff wrote: > The dlm document describes a return status of EDEADLOCK > and this is referenced in ast.c and a couple of the tests. > > Using the latest version of CVS (I'm pretty sure) I can't find > the definition for EDEADLOCK in a header file. The only > definition is in one of the tests (which doesn't build) and it > defines it as SS$_DEADLOCK :-) EDEADLOCK should be in /usr/include/errno.h (actually I think its asm/errno.h) so should not need to be defined by the dlm headers. > [Neither of the tests in the dlm\tests\locktest directory > compile cleanly] That's quite probable, those are kernel modules written some time ago. If you can be bothered to manually hook them into the kernel build system I think locktest.c should work. I might get some time to fix the makefiles and old bits fixed in those but it's not a priority. I suspect pingtest only works on VMS by now :-) -- patrick From lhh at redhat.com Tue Jul 27 13:21:27 2004 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 27 Jul 2004 09:21:27 -0400 Subject: [Linux-cluster] requirements question In-Reply-To: <20040727030104.GD12983@redhat.com> References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> <20040727030104.GD12983@redhat.com> Message-ID: <1090934487.8748.75.camel@dhcp83-21.boston.redhat.com> On Tue, 2004-07-27 at 11:01 +0800, David Teigland wrote: > > The "Resource Manager" will do NFS failover: > https://www.redhat.com/archives/linux-cluster/2004-July/msg00121.html > True, dat. It's still got a few kinks though. -- Lon From michael.krietemeyer at informatik.uni-rostock.de Mon Jul 26 12:06:01 2004 From: michael.krietemeyer at informatik.uni-rostock.de (Michael Krietemeyer) Date: Mon, 26 Jul 2004 14:06:01 +0200 Subject: [Linux-cluster] GFS mount problem Message-ID: <4104F3A9.8080303@informatik.uni-rostock.de> Hello We have setuped our small 4-node cluster with the RedHat 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. One cluster-node exports via gnbd two disks to the three other nodes. One of these exports is used as CCA-Device ond one for a GFS share. Now we setup GFS like the example C.1. 
in the "Red Hat GFS 6.0 Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS Client, fence method: manual). All steps work fine, except the mount. On the fist node, the mount works. The mount on the second node blocks, until the node one unmounts the gfs share. (Summary: Only one node can mount the share at the same time). Can somebody help? Michael Krietemeyer From robert at dicus.org Mon Jul 26 15:47:26 2004 From: robert at dicus.org (Robert) Date: Mon, 26 Jul 2004 10:47:26 -0500 Subject: [Linux-cluster] GFS: FS Mount Issues Message-ID: <4105278E.7020804@dicus.org> An HTML attachment was scrubbed... URL: From Carl.Bavington at ca.com Wed Jul 28 08:31:22 2004 From: Carl.Bavington at ca.com (Bavington, Carl) Date: Wed, 28 Jul 2004 09:31:22 +0100 Subject: [Linux-cluster] GFS: FS Mount Issues Message-ID: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> Robert, I am also seeing the hang on a second mount, No errors reported in logs. Did you get an answer?. Thanks, Carl Bavington mob +44 (0)7793 758327 _____ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Sent: 26 July 2004 16:47 To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS: FS Mount Issues All, I have a question regarding the latest implementation of GFS 6.0 with RedHat Linux 3.0 Enterprise. What my company has going on is this: We have a SAN project coming up but we do not have the SAN or a similar type shared storage device available. We have the node machines on hand and are trying to work through the GFS implementation as we are new to GFS (We have run RedHat Linux since version 5.0 and all the flavors in between.). We have tried to simulate the SAN environment by utilizing the GNBD software and we have followed the instructions available at: http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd. html This is the LOCK_GULM, SLM External, and GNBD example of GFS. We have not had any problems getting the shared devices, pools, and filesystems created as followed in the documentation. What we have happening is that when we mount the GFS filesystem on one node and then we try and mount the filesystem on the second node, the second node will hang when issued the command to mount the filesystem. No errors are reported on console or in logs. No errors are reported in the Lock Server either. Everything appears to be working correctly as the log information for both machines at that instant are the same, ie, the one that is hung, has the same log messages as the one that is not hung. When I go to node one, with node two still trying to mount the filesystem, and unmount the filesystem, node two immediately finishes the mount command and everything is fine with node two. However, when trying to mount node one again, it just hangs and so on. It is only allowing one node to mount the filesystem at once. The configuration files are all the generic examples given in the documentation with the fencing mechanism as GNBD (Tried fencing with manual and the same situation exists with that method.). I can provide all configuration files and also give log file information if this problem isn't something that experienced GFS users know what the problem may be. Thank you all for your time. Robert Fidelity Communications -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stephen.willey at framestore-cfc.com Wed Jul 28 08:47:33 2004 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Wed, 28 Jul 2004 09:47:33 +0100 Subject: [Linux-cluster] GFS: FS Mount Issues In-Reply-To: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> References: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> Message-ID: <41076825.2040302@framestore-cfc.com> Have you mkfs'd the filesystem with enough journals for each machine? If you only created one journal I guess it'd do this... Stephen Bavington, Carl wrote: > Robert, > > I am also seeing the hang on a second mount, No errors reported in > logs. Did you get an answer?. > > Thanks, > > Carl Bavington > > mob +44 (0)7793 758327 > > ------------------------------------------------------------------------ > > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Robert > *Sent:* 26 July 2004 16:47 > *To:* linux-cluster at redhat.com > *Subject:* [Linux-cluster] GFS: FS Mount Issues > > All, > > I have a question regarding the latest implementation of GFS 6.0 with > RedHat Linux 3.0 Enterprise. What my company has going on is this: We > have a SAN project coming up but we do not have the SAN or a similar > type shared storage device available. We have the node machines on > hand and are trying to work through the GFS implementation as we are > new to GFS (We have run RedHat Linux since version 5.0 and all the > flavors in between.). We have tried to simulate the SAN environment by > utilizing the GNBD software and we have followed the instructions > available at: > http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd.html > > This is the LOCK_GULM, SLM External, and GNBD example of GFS. > > We have not had any problems getting the shared devices, pools, and > filesystems created as followed in the documentation. What we have > happening is that when we mount the GFS filesystem on one node and > then we try and mount the filesystem on the second node, the second > node will hang when issued the command to mount the filesystem. No > errors are reported on console or in logs. No errors are reported in > the Lock Server either. Everything appears to be working correctly as > the log information for both machines at that instant are the same, > ie, the one that is hung, has the same log messages as the one that is > not hung. > > When I go to node one, with node two still trying to mount the > filesystem, and unmount the filesystem, node two immediately finishes > the mount command and everything is fine with node two. However, when > trying to mount node one again, it just hangs and so on. It is only > allowing one node to mount the filesystem at once. The configuration > files are all the generic examples given in the documentation with the > fencing mechanism as GNBD (Tried fencing with manual and the same > situation exists with that method.). I can provide all configuration > files and also give log file information if this problem isn?t > something that experienced GFS users know what the problem may be. > > Thank you all for your time. 
> > Robert > > Fidelity Communications > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > >

From laza at yu.net Wed Jul 28 08:57:11 2004 From: laza at yu.net (Lazar Obradovic) Date: Wed, 28 Jul 2004 10:57:11 +0200 Subject: [Linux-cluster] Posix ACL deps for gfs-kernel modules Message-ID: <1091005031.29997.887.camel@laza.eunet.yu>
Hi, I've been playing with out-of-tree building of gfs-related modules (actually, creating ebuilds for gfs), and found out one really stupid thing: you cannot successfully build the gfs-kernel modules if you don't have at least one "regular" filesystem built into the kernel with Posix ACL support. The thing is that the gfs-kernel tree modules expect to have the posix_acl* symbols available from an already built kernel.
Perhaps it's not a "bug" in gfs-kernel after all, since the kernel documentation states:
--- fs/KConfig ---
config FS_POSIX_ACL
# Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs)
#
# NOTE: you can implement Posix ACLs without these helpers (XFS does).
# Never use this symbol for ifdefs.
#
bool
depends on EXT2_FS_POSIX_ACL || EXT3_FS_POSIX_ACL || JFS_POSIX_ACL || REISERFS_FS_POSIX_ACL
default y
--- fs/KConfig ---
but, on the other hand, it doesn't put CONFIG_FS_POSIX_ACL in .config, so fs/Makefile ignores posix_acl.o and xattr.o when compiling the kernel.
Is this a gfs issue or a kernel issue? Can we correct this locally (somehow force the compilation of fs/posix_acl.o and fs/xattr.o if they are not available) or do we have to report this to LKML? A quick fix would be to compile the kernel with EXT3_FS_POSIX_ACL, but I'm not sure what side effects that would have on ext3 filesystems.
-- Lazar Obradovic, System Engineer ----- laza at YU.net YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3119901; Fax: +381 11 3119901 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3119901. -----

From mtilstra at redhat.com Wed Jul 28 14:36:35 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 28 Jul 2004 09:36:35 -0500 Subject: [Linux-cluster] GFS: FS Mount Issues In-Reply-To: <41076825.2040302@framestore-cfc.com> References: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> <41076825.2040302@framestore-cfc.com> Message-ID: <20040728143635.GA6734@redhat.com>
On Wed, Jul 28, 2004 at 09:47:33AM +0100, Stephen Willey wrote: > Have you mkfs'd the filesystem with enough journals for each machine? If > you only created one journal I guess it'd do this...
It should not hang if there are not enough journals. The mount should fail, and a message stating why will be in dmesg. -- Michael Conrad Tadpol Tilstra It's not reality that's important, but how you perceive things.
-------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From kpreslan at redhat.com Wed Jul 28 14:49:58 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 28 Jul 2004 09:49:58 -0500 Subject: [Linux-cluster] Posix ACL deps for gfs-kernel modules In-Reply-To: <1091005031.29997.887.camel@laza.eunet.yu> References: <1091005031.29997.887.camel@laza.eunet.yu> Message-ID: <20040728144957.GA30927@potassium.msp.redhat.com> If you apply the patches in cluster/gfs-kernel/patches to the kernel and build GFS that way, things work ok. On Wed, Jul 28, 2004 at 10:57:11AM +0200, Lazar Obradovic wrote: > Hi, > > I've been playing with out-of-tree building of gfs related modules > (actually, creating ebuilds for gfs), and found out one really stupid > thing: > > You cannot successfuly build gfs-kernel modules if you don't have at > least one "regular" filesystem build in kernel with Posix ACL support. > The thing is that gfs-kernel tree modules expect to have posix_acl* > symbols available from already build kernel. > > Perhaps it's not a "bug" in gfs-kernel after all, since kernel > documentation states: > > --- fs/KConfig --- > config FS_POSIX_ACL > # Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs) > # > # NOTE: you can implement Posix ACLs without these helpers (XFS does). > # Never use this symbol for ifdefs. > # > bool > depends on EXT2_FS_POSIX_ACL || EXT3_FS_POSIX_ACL || > JFS_POSIX_ACL || REISERFS_FS_POSIX_ACL > default y > --- fs/KConfig --- > > but, on the other hand, it doesn't put CONFIG_FS_POSIX_ACL in .config, > so fs/Makefile ignores posix_acl.o and xattr.o when compiliing kernel. > > Is this gfs or kernel issue? Can we locally correct this (somehow force > the compilation of fs/posix_acl.o and fs/xattr.o if not available) or do > we have report this to LKML? > > Quick fix would be to compile kernel with EXT3_FS_POSIX_ACL, but i'm not > sure what side-effects would that have on ext3 filesystems. > > -- > Lazar Obradovic, System Engineer > ----- > laza at YU.net > YUnet International http://www.EUnet.yu > Dubrovacka 35/III, 11000 Belgrade > Tel: +381 11 3119901; Fax: +381 11 3119901 > ----- > This e-mail is confidential and intended only for the recipient. > Unauthorized distribution, modification or disclosure of its > contents is prohibited. If you have received this e-mail in error, > please notify the sender by telephone +381 11 3119901. > ----- > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From jeff at intersystems.com Wed Jul 28 15:15:17 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 11:15:17 -0400 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() Message-ID: <58786861.20040728111517@intersystems.com> dlm_unlock() is documented as being asynchronous and it takes an astarg as one of its arguments. However it does not take an AST routine as an argument. What routine gets executed when an unlock completes? 
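[The rough shape of the two calls being discussed here is sketched below. This is only an illustration: the prototypes are paraphrased from the libdlm headers of this period and may differ in detail from any given checkout, and the resource name, mode, flags and astarg strings are invented for the example.]

    /*
     * Sketch only: dlm_lock() registers a completion AST routine, while
     * dlm_unlock() accepts an astarg but no AST routine of its own,
     * which is exactly the question raised above.
     */
    #include <stdio.h>
    #include <string.h>
    #include <libdlm.h>

    static struct dlm_lksb lksb;
    static int ast_fired;

    static void compl_ast(void *astarg)          /* completion AST */
    {
        printf("completion AST (%s), sb_status=%d\n",
               (char *)astarg, lksb.sb_status);
        ast_fired = 1;
    }

    int main(void)
    {
        const char *res = "example-resource";    /* placeholder name */
        int fd = dlm_get_fd();                   /* fd to poll for ASTs */

        /* The AST routine and its argument are both supplied here... */
        if (dlm_lock(LKM_EXMODE, &lksb, LKF_NOQUEUE, res, strlen(res),
                     0, compl_ast, (void *)"lock", NULL, NULL))
            return 1;
        while (!ast_fired)
            dlm_dispatch(fd);                    /* deliver the grant AST */

        /* ...but only an astarg can be supplied here. */
        ast_fired = 0;
        if (dlm_unlock(lksb.sb_lkid, 0, &lksb, (void *)"unlock"))
            return 1;
        while (!ast_fired)
            dlm_dispatch(fd);                    /* which routine runs now? */
        return 0;
    }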
From teigland at redhat.com Wed Jul 28 15:38:40 2004 From: teigland at redhat.com (David Teigland) Date: Wed, 28 Jul 2004 23:38:40 +0800 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <58786861.20040728111517@intersystems.com> References: <58786861.20040728111517@intersystems.com> Message-ID: <20040728153840.GH13983@redhat.com> On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: > dlm_unlock() is documented as being asynchronous and it > takes an astarg as one of its arguments. However it does > not take an AST routine as an argument. > > What routine gets executed when an unlock completes? The AST routine from dlm_lock() is saved and used for dlm_unlock(). -- Dave Teigland From jeff at intersystems.com Wed Jul 28 15:57:03 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 11:57:03 -0400 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <20040728153840.GH13983@redhat.com> References: <58786861.20040728111517@intersystems.com> <20040728153840.GH13983@redhat.com> Message-ID: <1024309498.20040728115703@intersystems.com> Wednesday, July 28, 2004, 11:38:40 AM, David Teigland wrote: > On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: >> dlm_unlock() is documented as being asynchronous and it >> takes an astarg as one of its arguments. However it does >> not take an AST routine as an argument. >> >> What routine gets executed when an unlock completes? > The AST routine from dlm_lock() is saved and used for dlm_unlock(). This makes it difficult to update an application which works with other DLM's as all the completion AST routines need to be updated to test for EUNLOCK to figure out why they've been invoked. Would it be possible to add an optional argument to dlm_unlock() for the AST routine to call when the unlock completes? If this is omitted, the existing completion AST routine is executed. From teigland at redhat.com Wed Jul 28 16:06:35 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 29 Jul 2004 00:06:35 +0800 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <1024309498.20040728115703@intersystems.com> References: <58786861.20040728111517@intersystems.com> <20040728153840.GH13983@redhat.com> <1024309498.20040728115703@intersystems.com> Message-ID: <20040728160635.GK13983@redhat.com> On Wed, Jul 28, 2004 at 11:57:03AM -0400, Jeff wrote: > Wednesday, July 28, 2004, 11:38:40 AM, David Teigland wrote: > > > On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: > >> dlm_unlock() is documented as being asynchronous and it > >> takes an astarg as one of its arguments. However it does > >> not take an AST routine as an argument. > >> > >> What routine gets executed when an unlock completes? > > > The AST routine from dlm_lock() is saved and used for dlm_unlock(). > > This makes it difficult to update an application which works > with other DLM's as all the completion AST routines need to be > updated to test for EUNLOCK to figure out why they've been > invoked. > > Would it be possible to add an optional argument to dlm_unlock() > for the AST routine to call when the unlock completes? > If this is omitted, the existing completion AST routine is > executed. It should be simple to add an AST routine as an arg to dlm_unlock(). 
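(Purely as an illustration of the shape such an addition could take, and nothing more: the prototype below is hypothetical, not the interface that exists in CVS, and the existing parameters are paraphrased from the userspace header.

    /* Hypothetical variant of dlm_unlock() that carries its own completion
     * AST routine.  A NULL astaddr could fall back to the routine that was
     * registered by dlm_lock(), preserving the current behaviour. */
    int dlm_unlock(uint32_t lkid, uint32_t flags, struct dlm_lksb *lksb,
                   void *astarg, void (*astaddr)(void *astarg));

Whether the same extension would be mirrored in the kernel interface is not settled in this thread.)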
-- Dave Teigland From amanthei at redhat.com Wed Jul 28 16:46:12 2004 From: amanthei at redhat.com (Adam Manthei) Date: Wed, 28 Jul 2004 11:46:12 -0500 Subject: [Linux-cluster] GFS mount problem In-Reply-To: <4104F3A9.8080303@informatik.uni-rostock.de> References: <4104F3A9.8080303@informatik.uni-rostock.de> Message-ID: <20040728164612.GD27527@redhat.com> Has your problem been resolved yet? It sounds similar to a hostname length issue that has been reported in bugzilla: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828 If this is the case, the short term workaround is to change your hostnames. On Mon, Jul 26, 2004 at 02:06:01PM +0200, Michael Krietemeyer wrote: > Hello > > We have setuped our small 4-node cluster with the RedHat > 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. > > One cluster-node exports via gnbd two disks to the three other nodes. > One of these exports is used as CCA-Device ond one for a GFS share. > > Now we setup GFS like the example C.1. in the "Red Hat GFS 6.0 > Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS > Client, fence method: manual). All steps work fine, except the mount. > On the fist node, the mount works. The mount on the second node blocks, > until the node one unmounts the gfs share. (Summary: Only one node can > mount the share at the same time). > > Can somebody help? > > Michael Krietemeyer > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei From jeff at intersystems.com Wed Jul 28 16:49:48 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 12:49:48 -0400 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert Message-ID: <1524056551.20040728124948@intersystems.com> This is from device.c. The intent seems to be that if an argument is specified, then it overrides an existing value. However, a new completion ast address is only loaded if a new blocking ast address is specified. if (kparams->flags & DLM_LKF_CONVERT) { struct dlm_lkb *lkb = dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); if (!lkb) { return -EINVAL; } li = (struct lock_info *)lkb->lkb_astparam; /* Only override these if they are provided */ if (li->li_user_lksb) li->li_user_lksb = kparams->lksb; if (li->li_astparam) li->li_astparam = kparams->astparam; if (li->li_bastaddr) li->li_bastaddr = kparams->bastaddr; ---> if (li->li_bastaddr) ---> li->li_astaddr = kparams->astaddr; li->li_flags = 0; } From jeff at intersystems.com Wed Jul 28 16:54:45 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 12:54:45 -0400 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert In-Reply-To: <1524056551.20040728124948@intersystems.com> References: <1524056551.20040728124948@intersystems.com> Message-ID: <29859916.20040728125445@intersystems.com> Wednesday, July 28, 2004, 12:49:48 PM, Jeff wrote: > This is from device.c. The intent seems to > be that if an argument is specified, then it overrides > an existing value. However, a new completion ast address > is only loaded if a new blocking ast address is specified. 
> if (kparams->flags & DLM_LKF_CONVERT) { > struct dlm_lkb *lkb = > dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); > if (!lkb) { > return -EINVAL; > } > li = (struct lock_info *)lkb->lkb_astparam; > /* Only override these if they are provided */ > if (li->li_user_lksb) > li->li_user_lksb = kparams->lksb; > if (li->li_astparam) > li->li_astparam = kparams->astparam; > if (li->li_bastaddr) > li->li_bastaddr = kparams->bastaddr; --->> if (li->li_bastaddr) --->> li->li_astaddr = kparams->astaddr; > li->li_flags = 0; > } Looking at this again, shouldn't it be testing kparams-> in the if() rather than li->*? The current code seems to write new values if there were old ones as opposed to if new values are specified. From michael.krietemeyer at informatik.uni-rostock.de Wed Jul 28 06:00:05 2004 From: michael.krietemeyer at informatik.uni-rostock.de (Michael Krietemeyer) Date: Wed, 28 Jul 2004 08:00:05 +0200 Subject: [Linux-cluster] GFS mount problem In-Reply-To: <4104F3A9.8080303@informatik.uni-rostock.de> References: <4104F3A9.8080303@informatik.uni-rostock.de> Message-ID: <410740E5.8020809@informatik.uni-rostock.de> Hello Solved! The first 8 bytes of the node names are not equal! M. Krietemeyer Michael Krietemeyer wrote: > Hello > > We have setuped our small 4-node cluster with the RedHat > 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. > > One cluster-node exports via gnbd two disks to the three other nodes. > One of these exports is used as CCA-Device ond one for a GFS share. > > Now we setup GFS like the example C.1. in the "Red Hat GFS 6.0 > Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS > Client, fence method: manual). All steps work fine, except the mount. > On the fist node, the mount works. The mount on the second node blocks, > until the node one unmounts the gfs share. (Summary: Only one node can > mount the share at the same time). > > Can somebody help? > > Michael Krietemeyer > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Thu Jul 29 12:38:04 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 29 Jul 2004 13:38:04 +0100 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert In-Reply-To: <29859916.20040728125445@intersystems.com> References: <1524056551.20040728124948@intersystems.com> <29859916.20040728125445@intersystems.com> Message-ID: <20040729123803.GA26311@tykepenguin.com> On Wed, Jul 28, 2004 at 12:54:45PM -0400, Jeff wrote: > Wednesday, July 28, 2004, 12:49:48 PM, Jeff wrote: > > > This is from device.c. The intent seems to > > be that if an argument is specified, then it overrides > > an existing value. However, a new completion ast address > > is only loaded if a new blocking ast address is specified. > > > if (kparams->flags & DLM_LKF_CONVERT) { > > struct dlm_lkb *lkb = > > dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); > > if (!lkb) { > > return -EINVAL; > > } > > li = (struct lock_info *)lkb->lkb_astparam; > > > /* Only override these if they are provided */ > > if (li->li_user_lksb) > > li->li_user_lksb = kparams->lksb; > > if (li->li_astparam) > > li->li_astparam = kparams->astparam; > > if (li->li_bastaddr) > > li->li_bastaddr = kparams->bastaddr; > --->> if (li->li_bastaddr) > --->> li->li_astaddr = kparams->astaddr; > > li->li_flags = 0; > > } > > Looking at this again, shouldn't it be testing kparams-> in > the if() rather than li->*? 
The current code seems to write new > values if there were old ones as opposed to if new values > are specified.
er, yes it looks like it. I'll check in a fix when I get back home. -- patrick

From dascalu_dragos at bah.com Thu Jul 29 14:59:20 2004 From: dascalu_dragos at bah.com (Dascalu Dragos) Date: Thu, 29 Jul 2004 10:59:20 -0400 Subject: [Linux-cluster] Only root can write on GFS volume... Message-ID:
We are currently experimenting w/ GFS and have run into a problem we cannot seem to find an answer to. To set the stage: We have 3 machines connected through an optical switch to a SAN. For simplicity purposes we have created a LUN which can be seen by the 3 machines. GFS+modules+patches are successfully running. We are using LVM2 and created a volume group called "test" on /dev/sdb5, which is what the machines see the LUN as. We then created a logical volume called "one" on this volume group.
web3:~# ls -la /dev/test/ total 28 dr-x------ 2 root root 4096 Jul 29 10:01 . drwxr-xr-x 13 root root 24576 Jul 29 10:01 .. lrwx------ 1 root root 20 Jul 29 10:01 one -> /dev/mapper/test-one
web3:~# ls -la /dev/mapper/ total 28 drwxr-xr-x 2 tomcat tomcat 4096 Jul 29 10:13 . drwxr-xr-x 13 root root 24576 Jul 29 10:01 .. crw------- 1 root root 10, 63 Jul 29 10:13 control brw------- 1 root root 254, 0 Jul 29 10:01 test-one
/dev/test/one was formatted using "gfs_mkfs -p lock_dlm -t webserver:one -j 4 /dev/test/one". (there will be 4 machines in the future) System starts fine, all 3 machines are member nodes and can successfully mount /dev/test/one on "/test" (mount point on / we created for testing).
_______ Problem ------------
When /test is accessed and written to as root everything is fine, new data gets updated on the other nodes in real time. However, if another user besides root attempts to write to /test, the partition locks up (basically the shell we are in locks, appearing to wait for the return of the "touch new_file" command). In the process tree we can see the touch command, however it cannot be killed, nor can /test be unmounted. At this point an "ls -la /test" on any of the 3 machines has the same frozen behavior. Only a reboot gets things back to normal :(
We first thought that this may be an LVM2 problem, but if /dev/test/one is formatted as ext3 (not gfs) and then mounted on /test, it can be written to fine by all users including root (we also gave 777 permissions to all objects in /dev/test and /dev/mapper). This also does not seem to be an obvious OS permissions issue...
Scenario 1: drwxr-xr-x 2 root root 4096 Jul 29 10:02 test Here only root can write and everyone can read. If another user but root tries to write, they get a "touch: creating `/test/tom': Permission denied". This does not cause a system freeze.
Scenario 2: drwxrwxrwx 2 root root 4096 Jul 29 10:02 test Here everyone can do anything to this directory, and if another user but root tries to write, the system freezes.
It appears that when it is formatted as gfs, no one but root can write to it. Any thoughts? Dede.
-------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3399 bytes Desc: not available URL:

From teigland at redhat.com Fri Jul 30 04:59:05 2004 From: teigland at redhat.com (David Teigland) Date: Fri, 30 Jul 2004 12:59:05 +0800 Subject: [Linux-cluster] Only root can write on GFS volume...
In-Reply-To: References: Message-ID: <20040730045905.GB13525@redhat.com> On Thu, Jul 29, 2004 at 10:59:20AM -0400, Dascalu Dragos wrote: > It appears that when it is formatted as gfs no one but root can write > to it. Any thoughts? This was solved by updating to the latest dlm source code; the problem was fixed a couple weeks ago. -- Dave Teigland From kpfleming at backtobasicsmgmt.com Sat Jul 31 14:40:48 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Sat, 31 Jul 2004 07:40:48 -0700 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! In-Reply-To: <410B80BC.4060100@hp.com> References: <410B80BC.4060100@hp.com> Message-ID: <410BAF70.7010205@backtobasicsmgmt.com> Aneesh Kumar K.V wrote: > 5. Devices > * there is a clusterwide device model via the devfs code Yeah, that's we want, take buggy, unreliable, soon-to-be-removed-from-mainline code and put an entire clustering layer on top of it. Too bad someone is going to need to completely reimplement this "clusterwide device model". From bruce.walker at hp.com Sat Jul 31 16:00:34 2004 From: bruce.walker at hp.com (Walker, Bruce J) Date: Sat, 31 Jul 2004 09:00:34 -0700 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! Message-ID: <3689AF909D816446BA505D21F1461AE4C750E6@cacexc04.americas.cpqcorp.net> Kevin, Got out of bed on the wrong side? Such anger. First, the clusterwide device capability is a very small part of OpenSSI so your comment "put the entire clustering layer on top of it" is COMPLETELY wrong - you clearly are commenting about something you know nothing about. In the 2.4 implementation, providing this one capability by leveraging devfs was quite economic, efficient and has been very stable. I'm not sure who you mean by "that's what WE want". If you mean the current worldwide users of OpenSSI on 2.4, they are a very happy group with a kick-ass clustering capability. About one thing you are correct. We are going to have to have a way to lookup and name remote devices in 2.6. I believe the remote file-op mechanism we are using in 2.4 will adapt easily. Bruce Walker Architect and project manager - OpenSSI project > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kevin > P. Fleming > Sent: Saturday, July 31, 2004 7:41 AM > To: Linux Kernel Mailing List > Cc: linux-cluster at redhat.com; > opengfs-devel at lists.sourceforge.net; > opengfs-users at lists.sourceforge.net; > opendlm-devel at lists.sourceforge.net > Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! > > > Aneesh Kumar K.V wrote: > > > 5. Devices > > * there is a clusterwide device model via the devfs code > > Yeah, that's we want, take buggy, unreliable, > soon-to-be-removed-from-mainline code and put an entire > clustering layer > on top of it. Too bad someone is going to need to completely > reimplement > this "clusterwide device model". > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From aneesh.kumar at hp.com Sat Jul 31 11:21:32 2004 From: aneesh.kumar at hp.com (Aneesh Kumar K.V) Date: Sat, 31 Jul 2004 16:51:32 +0530 Subject: [Linux-cluster] [ANNOUNCE] OpenSSI 1.0.0 released!! Message-ID: <410B80BC.4060100@hp.com> Hi, Sorry for the cross post. I came across this on OpenSSI website. I guess others may also be interested. 
-aneesh The OpenSSI project leverages both HP's NonStop Clusters for Unixware technology and other open source technology to provide a full, highly available Single System Image environment for Linux. Feature list: 1. Cluster Membership * includes libcluster that application can use 2. Internode Communication 3. Filesystem * support for CFS over ext3, Lustre Lite * CFS can be used for the root * reopen of files, devices, ipc objects when processes move is supported * CFS supports file record locking and shared writable mapped files (along with all other standard POSIX capabilities * HA-CFS is configurable for the root or other filesystems 4. Process Management * almost all pieces there, including: o clusterwide PIDs o process migration and distributed rexec(), rfork() and migrate() with reopen of files, sockets, pipes, devices, etc. o vprocs o clusterwide signalling, get/setpriority o capabilities o distributed process groups, session, controlling terminal o surrogate origin functionality o no single points of failure (cleanup code to deal with nodedowns) o Mosix load leveler (with the process migration model from NSC) o clusterwide ptrace() and strace o clusterwide /proc/, ps, top, etc. 5. Devices * there is a clusterwide device model via the devfs code * each node mounts its devfs on /cluster/node#/dev and bind mounts it to /dev so all devices are visible and accessible from all nodes, but by default you see only local devices * a process on any node can open a device on any node * devices are reopened when processes move * processes retain a context, even if they move; the context determines which node's devices to access by defaul 6. IPC * all IPC objects/mechanisms are clusterwide: o pipes o fifos o signalling o message queues o semaphore o shared memory o Unix-domain sockets o Internet-domain sockets * reopen of IPC objects is there for process movement * nodedown handling is there for all IPC objects 7. Clusterwide TCP/IP * HA-LVS is integrated, with extensions * extension is that port redirection to servers in the cluster is automatic and doesn't have to be managed. 8. Kernel Data Replication Service * it is in there (cluster/ssi/clreg) 9. Shared Storage * we have tested shared FCAL and use it for HA-CFS 10. DLM * is integrated with CLMS and is HA 11. Sysadmin * services architecture has been made clusterwide 12. Init, Booting and Run Levels * system runs with a single init which will failover/restart on another node if the node it is on dies 13. Application Availability * application monitoring/restart provided by spawndaemon/keepalive * services started by RC on the initnode will automatically restart on a failure of the initnode 14. Timesync * NTP for now 15. Load Leveling * adapted the openMosix algorithm * for connection load balancing, using HA-LVS * load leveling is on by default * applications must be registered to load level 16. Packaging/Install * Have source patch, binary RPMs and CVS source options; * Debian packages also available via ap-get repository. * First node is incremental to a standard Linux install * Other nodes install via netboot, PXEboot, DHCP and simple addnode command; 17. 
Object Interfaces * standard interfaces for objects work as expected * no new interfaces for object location or movement except for processes (rexec(), migrate(), and /proc/pid/goto to move a process) From tao at acc.umu.se Sat Jul 31 16:35:58 2004 From: tao at acc.umu.se (David Weinehall) Date: Sat, 31 Jul 2004 18:35:58 +0200 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! In-Reply-To: <410B80BC.4060100@hp.com> References: <410B80BC.4060100@hp.com> Message-ID: <20040731163558.GA10689@khan.acc.umu.se> On Sat, Jul 31, 2004 at 04:51:32PM +0530, Aneesh Kumar K.V wrote: > Hi, > > Sorry for the cross post. I came across this on OpenSSI website. I guess > others may also be interested. > > -aneesh > > The OpenSSI project leverages both HP's NonStop Clusters for Unixware > technology and other open source technology to provide a full, highly > available Single System Image environment for Linux. I can already hear SCO's lawyers screaming "They are taking technology from UnixWare and incorporating in Linux! Let's sue them!!!"... That said, this looks really interesting. Regards: David Weinehall -- /) David Weinehall /) Northern lights wander (\ // Maintainer of the v2.0 kernel // Dance across the winter sky // \) http://www.acc.umu.se/~tao/ (/ Full colour fire (/ From crh at ubiqx.mn.org Fri Jul 30 23:15:51 2004 From: crh at ubiqx.mn.org (Christopher R. Hertel) Date: Fri, 30 Jul 2004 18:15:51 -0500 Subject: [Linux-cluster] Re: Welcome to the "Linux-cluster" mailing list In-Reply-To: References: Message-ID: <20040730231551.GB20038@Favog.ubiqx.mn.org> Man, that was fast... -- "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq. ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org